Sample records for ensemble support vector

  1. The assisted prediction modelling frame with hybridisation and ensemble for business risk forecasting and an implementation

    NASA Astrophysics Data System (ADS)

    Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie

    2015-08-01

    The business failure of numerous companies results in financial crises. The high social costs associated with such crises have driven the search for effective tools for business risk prediction, among which the support vector machine is very effective. Several modelling approaches, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested for forecasting business risk with support vector machines. However, the existing literature seldom focuses on a general modelling frame for business risk prediction, and seldom investigates performance differences among the different modelling approaches. We reviewed research on forecasting business risk with support vector machines, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally investigated the use of principal components analysis, support vector machines, random sampling, and group decision under the general frame in forecasting business risk. Under the APMF-WHAE frame with a support vector machine as the base predictive model, four specific predictive models were produced: a pure support vector machine, a hybrid support vector machine coupled with principal components analysis, a support vector machine ensemble built with random sampling and group decision, and an ensemble of hybrid support vector machines using group decision to integrate hybrid support vector machines trained on variables produced by principal components analysis and samples drawn by random sampling. The experimental results indicate that the hybrid support vector machine and the ensemble of hybrid support vector machines outperformed the pure support vector machine and the support vector machine ensemble.
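
    A minimal sketch in Python (scikit-learn) of two of the four models above: a hybrid SVM as a PCA-plus-SVM pipeline, and an ensemble of such hybrids combined by majority vote in the role of the group decision. The dataset, kernel, and ensemble size are illustrative assumptions, not the paper's settings.

      # Sketch of a hybrid SVM (PCA + SVM) and an ensemble of hybrid SVMs
      # combined by majority vote; all settings below are assumptions.
      from sklearn.datasets import make_classification
      from sklearn.decomposition import PCA
      from sklearn.ensemble import BaggingClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=500, n_features=30, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      # Hybrid model: principal components feed the SVM.
      hybrid = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))

      # Ensemble of hybrids: random sampling of the training set per member,
      # majority vote ("group decision") over the members' predictions.
      ensemble = BaggingClassifier(estimator=hybrid, n_estimators=15,
                                   max_samples=0.8, random_state=0)
      ensemble.fit(X_tr, y_tr)
      print("ensemble accuracy:", ensemble.score(X_te, y_te))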

  2. Currency crisis indication by using ensembles of support vector machine classifiers

    NASA Astrophysics Data System (ADS)

    Ramli, Nor Azuana; Ismail, Mohd Tahir; Wooi, Hooy Chee

    2014-07-01

    Many methods have been tried in the analysis of currency crises, but not all of them provide accurate indications. This paper introduces an ensemble of Support Vector Machine classifiers, an approach that has not previously been applied to currency crisis analysis, with the aim of increasing indication accuracy. The proposed ensemble classifiers' performance is measured using percentage accuracy, root mean squared error (RMSE), area under the Receiver Operating Characteristic (ROC) curve and Type II error. The performance of the ensemble of Support Vector Machine classifiers is compared with that of a single Support Vector Machine classifier, and both classifiers are tested on a data set covering 27 countries with 12 macroeconomic indicators for each country. Our analyses show that the ensemble of Support Vector Machine classifiers outperforms the single Support Vector Machine classifier on the problem of indicating a currency crisis across a range of standard measures of classifier performance.
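
    The four evaluation criteria can be reproduced with standard tools. The sketch below (Python, scikit-learn) computes accuracy, RMSE, ROC AUC and the Type II error, taken here as the rate of missed crises; the labels and scores are invented placeholders.

      # Sketch of the four reported performance measures; y_true/y_score are
      # illustrative stand-ins for real crisis labels and classifier outputs.
      import numpy as np
      from sklearn.metrics import (accuracy_score, confusion_matrix,
                                   mean_squared_error, roc_auc_score)

      y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])        # 1 = crisis
      y_score = np.array([0.2, 0.4, 0.8, 0.3, 0.1, 0.9, 0.6, 0.7])
      y_pred = (y_score >= 0.5).astype(int)

      tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
      print("accuracy:", accuracy_score(y_true, y_pred))
      print("RMSE    :", np.sqrt(mean_squared_error(y_true, y_score)))
      print("ROC AUC :", roc_auc_score(y_true, y_score))
      print("Type II error (missed crises):", fn / (fn + tp))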

  3. Lysine acetylation sites prediction using an ensemble of support vector machine classifiers.

    PubMed

    Xu, Yan; Wang, Xiao-Bo; Ding, Jun; Wu, Ling-Yun; Deng, Nai-Yang

    2010-05-07

    Lysine acetylation is an essentially reversible and highly regulated post-translational modification which regulates diverse protein properties. Experimental identification of acetylation sites is laborious and expensive, so there is significant interest in the development of computational methods for reliable prediction of acetylation sites from amino acid sequences. In this paper we use an ensemble of support vector machine classifiers to perform this task. The experimentally determined lysine acetylation sites were extracted from the Swiss-Prot database and the scientific literature. Experimental results show that an ensemble of support vector machine classifiers outperforms a single support vector machine classifier and other computational methods such as PAIL and LysAcet on the problem of predicting lysine acetylation sites. The resulting method has been implemented in EnsemblePail, a web server for lysine acetylation site prediction available at http://www.aporc.org/EnsemblePail/. Copyright (c) 2010 Elsevier Ltd. All rights reserved.

  4. Using Support Vector Machine Ensembles for Target Audience Classification on Twitter

    PubMed Central

    Lo, Siaw Ling; Chiong, Raymond; Cornforth, David

    2015-01-01

    The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space. PMID:25874768
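
    The bootstrap-based construction of training sets can be sketched as follows (Python, scikit-learn); the class sizes and ensemble size are assumptions. Each member SVM is trained on a balanced set built by over-sampling the minority class with replacement.

      # Sketch of bootstrap over-sampling to build balanced training sets
      # for an SVM ensemble; class sizes and member count are assumptions.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.utils import resample

      def make_balanced_set(X, y, rng):
          """Over-sample the minority class with replacement (bootstrap)."""
          X_min, X_maj = X[y == 1], X[y == 0]
          X_boot = resample(X_min, n_samples=len(X_maj),
                            random_state=int(rng.integers(1 << 30)))
          Xb = np.vstack([X_maj, X_boot])
          yb = np.array([0] * len(X_maj) + [1] * len(X_boot))
          return Xb, yb

      rng = np.random.default_rng(0)
      X = np.vstack([rng.normal(0, 1, (90, 5)), rng.normal(1, 1, (10, 5))])
      y = np.array([0] * 90 + [1] * 10)

      members = []
      for _ in range(11):                      # odd count for majority voting
          Xb, yb = make_balanced_set(X, y, rng)
          members.append(SVC().fit(Xb, yb))

      votes = np.mean([m.predict(X) for m in members], axis=0)
      y_pred = (votes >= 0.5).astype(int)      # majority vote of the ensemble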

  5. Using support vector machine ensembles for target audience classification on Twitter.

    PubMed

    Lo, Siaw Ling; Chiong, Raymond; Cornforth, David

    2015-01-01

    The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.

  6. Prediction of Weather Impacted Airport Capacity using Ensemble Learning

    NASA Technical Reports Server (NTRS)

    Wang, Yao Xun

    2011-01-01

    Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecasts at these airports. The study examines the performance of BDT, along with a traditional single Support Vector Machine (SVM), for airport runway configuration selection and airport arrival rate (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecasts, and airport operation information at the chosen airports. The experimental results show that the ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that helps air traffic flow management cope with weather-impacted airport capacity in order to reduce costs and increase safety.
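
    A minimal version of the comparison described above, with synthetic data standing in for the ASPM and weather features (Python, scikit-learn; all settings are illustrative):

      # Sketch: ensemble Bagging Decision Tree (BDT) vs. a single SVM; the
      # synthetic data stands in for weather and airport-operations features.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import BaggingClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=400, n_features=12, n_classes=3,
                                 n_informative=6, random_state=0)

      bdt = BaggingClassifier(estimator=DecisionTreeClassifier(),
                              n_estimators=50, random_state=0)
      svm = SVC(kernel="rbf")

      for name, model in [("BDT ensemble", bdt), ("single SVM", svm)]:
          scores = cross_val_score(model, X, y, cv=5)
          print(name, "mean accuracy:", scores.mean().round(3))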

  7. A comparison of breeding and ensemble transform vectors for global ensemble generation

    NASA Astrophysics Data System (ADS)

    Deng, Guo; Tian, Hua; Li, Xiaoli; Chen, Jing; Gong, Jiandong; Jiao, Meiyan

    2012-02-01

    To compare the initial perturbation techniques based on breeding vectors and ensemble transform vectors, three ensemble prediction systems using both initial perturbation methods but with different ensemble sizes, based on the spectral model T213/L31, were constructed at the National Meteorological Center, China Meteorological Administration (NMC/CMA). A series of ensemble verification scores, such as forecast skill of the ensemble mean, ensemble resolution, and ensemble reliability, are introduced to identify the most important attributes of ensemble forecast systems. The results indicate that the ensemble transform technique is superior to the breeding vector method in light of the anomaly correlation coefficient (ACC), a deterministic attribute of the ensemble mean; the root-mean-square error (RMSE) and spread, which are probabilistic attributes; and the continuous ranked probability score (CRPS) and its decomposition. The advantage of the ensemble transform approach is attributed to the orthogonality among its ensemble perturbations as well as its consistency with the data assimilation system. This study may therefore serve as a reference for configuring the best ensemble prediction system for operational use.

  8. Facial Expression Recognition using Multiclass Ensemble Least-Square Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Lawi, Armin; Sya'Rani Machrizzandi, M.

    2018-03-01

    Facial expression is one of the behavioral characteristics of human beings. Using a biometric technology system with facial expression characteristics makes it possible to recognize a person's mood or emotion. The basic components of a facial expression analysis system are face detection, face image extraction, facial classification, and facial expression recognition. This paper uses the Principal Component Analysis (PCA) algorithm to extract facial features for the expression parameters happy, sad, neutral, angry, fear, and disgusted. The Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM) is then used for the facial expression classification process. The MELS-SVM model, evaluated on our 185 different expression images of 10 persons, achieved a high accuracy of 99.998% using the RBF kernel.

  9. Evaluating uncertainties in multi-layer soil moisture estimation with support vector machines and ensemble Kalman filtering

    NASA Astrophysics Data System (ADS)

    Liu, Di; Mishra, Ashok K.; Yu, Zhongbo

    2016-07-01

    This paper examines the combination of support vector machines (SVM) and the dual ensemble Kalman filter (EnKF) technique to estimate root zone soil moisture at different soil layers down to 100 cm depth. Multiple experiments are conducted in a data-rich environment to construct and validate the SVM model and to explore the effectiveness and robustness of the EnKF technique. It was observed that the performance of the SVM relies more on the initial length of the training set than on other factors (e.g., cost function, regularization parameter, and kernel parameters). The dual EnKF technique proved efficient at improving the SVM with observed data either at each time step or at flexible time steps. The EnKF technique reaches its maximum efficiency when the updating ensemble size approaches a certain threshold. It was also observed that the SVM model performance for multi-layer soil moisture estimation can be influenced by rainfall magnitude (e.g., dry and wet spells).
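
    The EnKF update that corrects a forecast with observations is compact enough to sketch directly. The snippet below (Python/numpy) shows one stochastic analysis step for a single observation; the state size, ensemble size and error levels are assumptions.

      # Sketch of one stochastic EnKF analysis step: each ensemble member is
      # nudged toward a perturbed observation; sizes/errors are assumptions.
      import numpy as np

      rng = np.random.default_rng(0)
      n_state, n_ens, r = 4, 50, 0.05          # soil layers, members, obs var
      H = np.array([[1.0, 0.0, 0.0, 0.0]])     # observe the top layer only

      ens = rng.normal(0.3, 0.05, (n_state, n_ens))   # forecast ensemble
      obs = 0.35                                      # observed soil moisture

      A = ens - ens.mean(axis=1, keepdims=True)       # ensemble anomalies
      P_H = A @ (H @ A).T / (n_ens - 1)               # cross covariance P H^T
      S = H @ P_H + r                                 # innovation covariance
      K = P_H / S                                     # Kalman gain (scalar obs)

      y_pert = obs + rng.normal(0.0, np.sqrt(r), n_ens)  # perturbed obs
      ens_a = ens + K @ (y_pert - H @ ens)               # analysis ensemble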

  10. Ensemble support vector machine classification of dementia using structural MRI and mini-mental state examination.

    PubMed

    Sørensen, Lauge; Nielsen, Mads

    2018-05-15

    The International Challenge for Automated Prediction of MCI from MRI data offered independent, standardized comparison of machine learning algorithms for multi-class classification of normal control (NC), mild cognitive impairment (MCI), converting MCI (cMCI), and Alzheimer's disease (AD) using brain imaging and general cognition. We proposed to use an ensemble of support vector machines (SVMs) that combined bagging without replacement and feature selection. SVM is the most commonly used algorithm in multivariate classification of dementia, and it was therefore valuable to evaluate the potential benefit of ensembling this type of classifier. The ensemble SVM, using either a linear or a radial basis function (RBF) kernel, achieved multi-class classification accuracies of 55.6% and 55.0% in the challenge test set (60 NC, 60 MCI, 60 cMCI, 60 AD), resulting in a third place in the challenge. Similar feature subset sizes were obtained for both kernels, and the most frequently selected MRI features were the volumes of the two hippocampal subregions left presubiculum and right subiculum. Post-challenge analysis revealed that enforcing a minimum number of selected features and increasing the number of ensemble classifiers improved classification accuracy up to 59.1%. The ensemble SVM outperformed single SVM classifications consistently in the challenge test set. Ensemble methods using bagging and feature selection can improve the performance of the commonly applied SVM classifier in dementia classification. This resulted in competitive classification accuracies in the International Challenge for Automated Prediction of MCI from MRI data. Copyright © 2018 Elsevier B.V. All rights reserved.
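
    A simplified rendering of the classifier design described above (Python, scikit-learn): each member SVM is trained on a subsample drawn without replacement and on a selected feature subset. The data, subsample fraction and feature-selection rule are assumptions, not the challenge pipeline.

      # Sketch: ensemble SVM combining bagging WITHOUT replacement and
      # univariate feature selection per member; settings are assumptions.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import BaggingClassifier
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=240, n_features=100,
                                 n_informative=10, n_classes=4,
                                 random_state=0)

      member = make_pipeline(SelectKBest(f_classif, k=20),
                             SVC(kernel="linear"))
      ens = BaggingClassifier(estimator=member, n_estimators=25,
                              bootstrap=False, max_samples=0.7,
                              random_state=0)
      ens.fit(X, y)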

  11. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS

    NASA Astrophysics Data System (ADS)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2014-05-01

    Flood is one of the most devastating natural disasters and occurs frequently in Terengganu, Malaysia. Recently, ensemble-based techniques have become extremely popular in flood modeling. In this paper, the weights-of-evidence (WoE) model was utilized first to assess the impact of the classes of each conditioning factor on flooding through bivariate statistical analysis (BSA). Then, these factors were reclassified using the acquired weights and entered into the support vector machine (SVM) model to evaluate the correlation between flood occurrence and each conditioning factor. Through this integration, the weak point of WoE can be overcome and the performance of the SVM enhanced. The spatial database included flood inventory, slope, stream power index (SPI), topographic wetness index (TWI), altitude, curvature, distance from the river, geology, rainfall, land use/cover (LULC), and soil type. Four kernel types of SVM (linear kernel (LN), polynomial kernel (PL), radial basis function kernel (RBF), and sigmoid kernel (SIG)) were used to investigate the performance of each kernel type. The efficiency of the new ensemble WoE and SVM method was tested using the area under the curve (AUC), which measures the prediction and success rates. The validation results proved the strength and efficiency of the ensemble method over the individual methods. The best results were obtained with the RBF kernel. The success rate and prediction rate for the ensemble WoE and RBF-SVM method were 96.48% and 95.67%, respectively. The proposed ensemble flood susceptibility mapping method could assist researchers and local governments in flood mitigation strategies.
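
    The four-kernel comparison can be scripted directly; the sketch below (Python, scikit-learn) scores each kernel type by AUC, with synthetic data standing in for the eleven reclassified conditioning factors.

      # Sketch: comparing the four SVM kernel types by area under the ROC
      # curve; the synthetic data stands in for WoE-weighted factors.
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=300, n_features=11, random_state=0)

      for kernel in ("linear", "poly", "rbf", "sigmoid"):
          auc = cross_val_score(SVC(kernel=kernel), X, y,
                                cv=5, scoring="roc_auc").mean()
          print(f"{kernel:8s} AUC = {auc:.3f}")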

  12. A new Method for the Estimation of Initial Condition Uncertainty Structures in Mesoscale Models

    NASA Astrophysics Data System (ADS)

    Keller, J. D.; Bach, L.; Hense, A.

    2012-12-01

    The estimation of fast growing error modes of a system is a key interest of ensemble data assimilation when assessing uncertainty in initial conditions. Over the last two decades three methods (and variations of these methods) have evolved for global numerical weather prediction models: the ensemble Kalman filter, singular vectors, and breeding of growing modes (or now ensemble transform). While the former incorporates a priori model error information and observation error estimates to determine ensemble initial conditions, the latter two techniques directly address the error structures associated with Lyapunov vectors. However, in global models these structures are mainly associated with transient global wave patterns. When assessing initial condition uncertainty in mesoscale limited area models, several problems regarding the aforementioned techniques arise: (a) additional sources of uncertainty on the smaller scales contribute to the error and (b) error structures from the global scale may quickly move through the model domain (depending on the size of the domain). To address the latter problem, perturbation structures from global models are often included in the mesoscale predictions as perturbed boundary conditions. However, the initial perturbations (when used) are often generated with a variant of an ensemble Kalman filter which does not necessarily focus on the large scale error patterns. In the framework of the European regional reanalysis project of the Hans-Ertel-Center for Weather Research we use a mesoscale model with an implemented nudging data assimilation scheme which does not support ensemble data assimilation at all. In preparation of an ensemble-based regional reanalysis and for the estimation of three-dimensional atmospheric covariance structures, we implemented a new method for the assessment of fast growing error modes for mesoscale limited area models. The so-called self-breeding method is a development of the breeding of growing modes technique. Initial perturbations are integrated forward for a short time period and then rescaled and added to the initial state again. Iterating this rapid breeding cycle provides estimates for the initial uncertainty structure (or local Lyapunov vectors) given a specific norm. To prevent all ensemble perturbations from converging towards the leading local Lyapunov vector, we apply an ensemble transform variant to orthogonalize the perturbations in the sub-space spanned by the ensemble. By choosing different kinds of norms to measure perturbation growth, this technique allows for estimating uncertainty patterns targeted at specific sources of errors (e.g. convection, turbulence). With case study experiments we show applications of the self-breeding method for different sources of uncertainty and different horizontal scales.
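
    The self-breeding cycle reduces to a short loop: integrate a perturbed and an unperturbed copy of the state forward, rescale the difference to the initial amplitude, add it to the initial state again, and repeat. A toy sketch follows (Python/numpy), with the Lorenz-96 model, the Euclidean norm and the cycle length as illustrative choices, not the reanalysis configuration.

      # Toy sketch of self-breeding: integrate, rescale the perturbation to
      # a fixed norm, re-add to the SAME initial state, iterate.
      import numpy as np

      def l96_rk4(x, dt=0.05, forcing=8.0):
          """One RK4 step of Lorenz-96 (toy stand-in for the NWP model)."""
          f = lambda v: ((np.roll(v, -1) - np.roll(v, 2)) * np.roll(v, 1)
                         - v + forcing)
          k1 = f(x); k2 = f(x + 0.5 * dt * k1)
          k3 = f(x + 0.5 * dt * k2); k4 = f(x + dt * k3)
          return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

      rng = np.random.default_rng(0)
      x = rng.normal(size=40)
      for _ in range(500):                     # spin up onto the attractor
          x = l96_rk4(x)

      amp = 1e-3                               # fixed perturbation norm
      pert = rng.normal(size=40)
      pert *= amp / np.linalg.norm(pert)

      for _ in range(50):                      # self-breeding cycles
          xa, xb = x.copy(), x + pert
          for _ in range(8):                   # short forward integration
              xa, xb = l96_rk4(xa), l96_rk4(xb)
          pert = xb - xa                       # grown difference ...
          pert *= amp / np.linalg.norm(pert)   # ... rescaled, re-added to x
      # 'pert' approximates a fast-growing error structure (local Lyapunov
      # vector) at the fixed initial state x, under the Euclidean norm.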

  13. Classifying Physical Morphology of Cocoa Beans Digital Images using Multiclass Ensemble Least-Squares Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Lawi, Armin; Adhitya, Yudhi

    2018-03-01

    The objective of this research is to determine the quality of cocoa beans from the morphology of their digital images. Samples of cocoa beans were scattered on bright white paper under a controlled lighting condition, and a compact digital camera was used to capture the images. The images were then processed to extract their morphological parameters. The classification process begins with an analysis of the cocoa bean images based on morphological feature extraction. The extracted morphological (physical) feature parameters are Area, Perimeter, Major Axis Length, Minor Axis Length, Aspect Ratio, Circularity, Roundness, and Feret Diameter. The cocoa beans are classified into 4 groups: Normal Beans, Broken Beans, Fractured Beans, and Skin Damaged Beans. The classification model used in this paper is the Multiclass Ensemble Least-Squares Support Vector Machine (MELS-SVM), a proposed improvement of SVM using an ensemble method in which the separating hyperplanes are obtained by a least-squares approach and the multiclass procedure uses the One-Against-All method. Our proposed model achieved a classification accuracy of 99.705% across the four classes using the morphological feature input parameters.
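
    A least-squares SVM replaces the standard SVM's inequality constraints with equality constraints, so each binary classifier reduces to solving one linear system. A minimal sketch with an RBF kernel and a One-Against-All wrapper follows (Python/numpy); the kernel width and regularization values are assumptions.

      # Minimal least-squares SVM (LS-SVM) with an RBF kernel and a
      # One-Against-All wrapper; gamma and sigma values are assumptions.
      import numpy as np

      def rbf(A, B, sigma=1.0):
          d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
          return np.exp(-d2 / (2 * sigma ** 2))

      def lssvm_fit(X, y, gamma=10.0):
          """Solve the LS-SVM linear system for labels y in {-1, +1}."""
          n = len(y)
          M = np.zeros((n + 1, n + 1))
          M[0, 1:] = 1.0
          M[1:, 0] = 1.0
          M[1:, 1:] = rbf(X, X) + np.eye(n) / gamma
          sol = np.linalg.solve(M, np.concatenate([[0.0], y]))
          return sol[0], sol[1:]                # bias b, coefficients alpha

      def lssvm_decision(X_new, X, b, alpha):
          return rbf(X_new, X) @ alpha + b

      # One-Against-All: one LS-SVM per class, predict the largest output.
      def melssvm_predict(X_new, X, y, classes):
          models = [lssvm_fit(X, np.where(y == c, 1.0, -1.0))
                    for c in classes]
          scores = [lssvm_decision(X_new, X, b, a) for b, a in models]
          return np.array(classes)[np.argmax(scores, axis=0)]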

  14. A study of fuzzy logic ensemble system performance on face recognition problem

    NASA Astrophysics Data System (ADS)

    Polyakova, A.; Lipinskiy, L.

    2017-02-01

    Some problems are difficult to solve using a single intelligent information technology (IIT). An ensemble of various data mining (DM) techniques is a set of models, each able to solve the problem by itself, whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since the approach draws on the diversity of its components. A new method for designing IIT ensembles is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: artificial neural networks, support vector machines and decision trees. These algorithms and their ensemble have been tested on face recognition problems. Principal components analysis (PCA) is used for feature selection.

  15. An ensemble of SVM classifiers based on gene pairs.

    PubMed

    Tong, Muchenxuan; Liu, Kun-Hong; Xu, Chungui; Ju, Wenbin

    2013-07-01

    In this paper, a genetic algorithm (GA) based ensemble support vector machine (SVM) classifier built on gene pairs (GA-ESP) is proposed. The SVMs (base classifiers of the ensemble system) are trained on different informative gene pairs. These gene pairs are selected by the top scoring pair (TSP) criterion. Each of these pairs projects the original microarray expression onto a 2-D space. Extensive permutation of gene pairs may reveal more useful information and potentially lead to an ensemble classifier with satisfactory accuracy and interpretability. GA is further applied to select an optimized combination of base classifiers. The effectiveness of the GA-ESP classifier is evaluated on both binary-class and multi-class datasets. Copyright © 2013 Elsevier Ltd. All rights reserved.
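
    The top scoring pair criterion ranks a gene pair (i, j) by how differently the ordering of their expression values behaves in the two classes. A sketch of the score follows (Python/numpy), assuming a genes-by-samples expression matrix; the data is synthetic.

      # Sketch of the Top Scoring Pair (TSP) criterion: score a gene pair by
      # the between-class difference in P(gene_i < gene_j); data is synthetic.
      import numpy as np
      from itertools import combinations

      rng = np.random.default_rng(0)
      expr = rng.normal(size=(20, 60))          # genes x samples (assumption)
      labels = np.array([0] * 30 + [1] * 30)

      def tsp_score(i, j, expr, labels):
          less = expr[i] < expr[j]
          p0 = less[labels == 0].mean()         # P(X_i < X_j | class 0)
          p1 = less[labels == 1].mean()         # P(X_i < X_j | class 1)
          return abs(p0 - p1)

      pairs = sorted(combinations(range(expr.shape[0]), 2),
                     key=lambda p: tsp_score(*p, expr, labels), reverse=True)
      top_pairs = pairs[:10]   # each pair trains one SVM on a 2-D projection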

  16. Ensemble of classifiers for ontology enrichment

    NASA Astrophysics Data System (ADS)

    Semenova, A. V.; Kureichik, V. M.

    2018-05-01

    A classifier is the basis of ontology learning systems. Classification of text documents is used in many applications, such as information retrieval, information extraction, and spam detection. A new ensemble of classifiers based on SVM (support vector machines), LSTM (a recurrent neural network) and word embeddings is suggested. An experiment was conducted on open data, which allows us to conclude that the proposed classification method is promising. The proposed classifier is implemented in Matlab using the functions of the Text Analytics Toolbox. The principal advantage of the proposed ensemble of classifiers is the high quality of data classification at acceptable time costs.

  17. A Novel Bearing Multi-Fault Diagnosis Approach Based on Weighted Permutation Entropy and an Improved SVM Ensemble Classifier.

    PubMed

    Zhou, Shenghan; Qian, Silin; Chang, Wenbing; Xiao, Yiyong; Cheng, Yang

    2018-06-14

    Timely and accurate state detection and fault diagnosis of rolling element bearings are very critical to ensuring the reliability of rotating machinery. This paper proposes a novel method of rolling bearing fault diagnosis based on a combination of ensemble empirical mode decomposition (EEMD), weighted permutation entropy (WPE) and an improved support vector machine (SVM) ensemble classifier. A hybrid voting (HV) strategy that combines SVM-based classifiers and cloud similarity measurement (CSM) was employed to improve the classification accuracy. First, the WPE value of the bearing vibration signal was calculated to detect the fault. Secondly, if a bearing fault occurred, the vibration signal was decomposed into a set of intrinsic mode functions (IMFs) by EEMD. The WPE values of the first several IMFs were calculated to form the fault feature vectors. Then, the SVM ensemble classifier was composed of binary SVM and the HV strategy to identify the bearing multi-fault types. Finally, the proposed model was fully evaluated by experiments and comparative studies. The results demonstrate that the proposed method can effectively detect bearing faults and maintain a high accuracy rate of fault recognition when a small number of training samples are available.
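
    Weighted permutation entropy extends permutation entropy by weighting each window's ordinal pattern by the window's variance. A minimal sketch follows (Python/numpy); the embedding dimension and delay are assumptions.

      # Sketch of weighted permutation entropy (WPE): ordinal-pattern entropy
      # with each window weighted by its variance; m and delay are assumptions.
      import math
      import numpy as np
      from collections import defaultdict

      def wpe(x, m=4, delay=1):
          """Weighted permutation entropy, normalized to [0, 1]."""
          n = len(x) - (m - 1) * delay
          weights = defaultdict(float)
          for k in range(n):
              w = x[k:k + m * delay:delay]
              pattern = tuple(np.argsort(w))    # ordinal pattern of window
              weights[pattern] += np.var(w)     # weight = window variance
          p = np.array(list(weights.values()))
          p = p / p.sum()
          return float(-np.sum(p * np.log2(p)) / math.log2(math.factorial(m)))

      rng = np.random.default_rng(0)
      signal = rng.normal(size=2000)            # stand-in for vibration data
      print("normalized WPE:", round(wpe(signal), 3))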

  18. Prediction of the distillation temperatures of crude oils using ¹H NMR and support vector regression with estimated confidence intervals.

    PubMed

    Filgueiras, Paulo R; Terra, Luciana A; Castro, Eustáquio V R; Oliveira, Lize M S L; Dias, Júlio C M; Poppi, Ronei J

    2015-09-01

    This paper aims to estimate the temperature equivalent to 10% (T10%), 50% (T50%) and 90% (T90%) of distilled volume in crude oils using ¹H NMR and support vector regression (SVR). Confidence intervals for the predicted values were calculated using a boosting-type ensemble method in a procedure called ensemble support vector regression (eSVR). The estimated confidence intervals obtained by eSVR were compared with previously accepted calculations from partial least squares (PLS) models and a boosting-type ensemble applied in the PLS method (ePLS). By using the proposed boosting strategy, it was possible to identify outliers in the T10% property dataset. The eSVR procedure improved the accuracy of the distillation temperature predictions in relation to standard PLS, ePLS and SVR. For T10%, a root mean square error of prediction (RMSEP) of 11.6°C was obtained in comparison with 15.6°C for PLS, 15.1°C for ePLS and 28.4°C for SVR. The RMSEPs for T50% were 24.2°C, 23.4°C, 22.8°C and 14.4°C for PLS, ePLS, SVR and eSVR, respectively. For T90%, the values of RMSEP were 39.0°C, 39.9°C and 39.9°C for PLS, ePLS, SVR and eSVR, respectively. The confidence intervals calculated by the proposed boosting methodology presented acceptable values for the three properties analyzed; however, they were lower than those calculated by the standard methodology for PLS. Copyright © 2015 Elsevier B.V. All rights reserved.
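
    Ensemble-based confidence intervals of this kind can be illustrated in a few lines. For simplicity the sketch below (Python, scikit-learn) uses bootstrap resampling rather than the paper's boosting-type scheme, and the data and interval level are assumptions.

      # Sketch: prediction intervals from an ensemble of SVR models; plain
      # bootstrap resampling stands in for the paper's boosting-type scheme.
      import numpy as np
      from sklearn.svm import SVR
      from sklearn.utils import resample

      rng = np.random.default_rng(0)
      X = rng.normal(size=(120, 8))             # stand-in for NMR features
      y = X[:, 0] * 10 + rng.normal(0, 2, 120)  # stand-in for T10% values

      preds = []
      for i in range(50):                       # ensemble of resampled SVRs
          Xb, yb = resample(X, y, random_state=i)
          preds.append(SVR(C=10.0).fit(Xb, yb).predict(X))
      preds = np.array(preds)

      center = preds.mean(axis=0)               # ensemble point prediction
      low, high = np.percentile(preds, [2.5, 97.5], axis=0)  # ~95% interval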

  19. Active relearning for robust supervised classification of pulmonary emphysema

    NASA Astrophysics Data System (ADS)

    Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.

    2012-03-01

    Radiologists are adept at recognizing the appearance of lung parenchymal abnormalities in CT scans. However, inconsistent differential diagnosis, due to subjective aggregation, mandates supervised classification. Towards optimizing emphysema classification, we introduce a physician-in-the-loop feedback approach in order to minimize uncertainty in the selected training samples. Using multi-view inductive learning with the training samples, an ensemble of Support Vector Machine (SVM) models, each based on a specific pair-wise dissimilarity metric, was constructed in less than six seconds. In the active relearning phase, conflicts between the ensemble and expert labels were resolved by an expert. This just-in-time feedback with unoptimized SVMs yielded a 15% increase in classification accuracy and a 25% reduction in the number of support vectors. The generality of relearning was assessed in the optimized parameter space of six different classifiers across seven dissimilarity metrics, where the average accuracy improvement rose to 21%. The co-operative feedback method proposed here could enhance both diagnostic and staging throughput efficiency in chest radiology practice.

  20. Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning.

    PubMed

    Liu, Bin; Wang, Shanyi; Dong, Qiwen; Li, Shumin; Liu, Xuan

    2016-04-20

    DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. With the rapid development of next-generation sequencing techniques, the number of protein sequences is increasing at an unprecedented rate. It is therefore necessary to develop computational methods that identify DNA-binding proteins based on the protein sequence information alone. In this study, a novel method called iDNA-KACC is presented, which combines the Support Vector Machine (SVM) with the auto-cross covariance transformation. The protein sequences are first converted into a profile-based representation, and then into a series of fixed-length vectors by the auto-cross covariance transformation with Kmer composition. The sequence order effect can be effectively captured by this scheme. These vectors are then fed into an SVM to discriminate the DNA-binding proteins from the non-DNA-binding ones. iDNA-KACC achieves an overall accuracy of 75.16% and a Matthews correlation coefficient of 0.5 under a rigorous jackknife test. Its performance is further improved by employing an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset show that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA-binding protein identification.
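
    The auto-cross covariance transformation turns a variable-length profile into a fixed-length vector by correlating profile columns at several sequence lags. A sketch follows (Python/numpy); the maximum lag and the random profile are assumptions.

      # Sketch of the auto-cross covariance (ACC) transformation that turns
      # a variable-length profile (L x 20) into a fixed-length SVM input;
      # the maximum lag and the random profile are assumptions.
      import numpy as np

      def acc_transform(profile, max_lag=2):
          L, d = profile.shape
          mean = profile.mean(axis=0)
          feats = []
          for lag in range(1, max_lag + 1):
              for j in range(d):
                  for k in range(d):
                      cov = np.mean((profile[:L - lag, j] - mean[j]) *
                                    (profile[lag:, k] - mean[k]))
                      feats.append(cov)    # AC terms (j == k) and CC terms
          return np.array(feats)           # length: max_lag * d * d

      rng = np.random.default_rng(0)
      profile = rng.random((137, 20))      # stand-in profile of one protein
      vec = acc_transform(profile)         # fixed-length input for the SVM
      print(vec.shape)                     # (800,)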

  1. Ensemble Feature Learning of Genomic Data Using Support Vector Machine

    PubMed Central

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy that is the rationale of the RFE algorithm. The rationale is that building ensemble SVM models on randomly drawn bootstrap samples from the training set produces different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate features is based upon the rankings of multiple SVM models instead of one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing nearly balanced bootstrap samples. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that ESVM-RFE achieves on average 9% better accuracy than SVM-RFE, and 5% better than a random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters in the selected data. PMID:27304923
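
    The core of ESVM-RFE, as described, is aggregating per-bootstrap SVM weight rankings before each elimination step. A compressed binary-class sketch follows (Python, scikit-learn); the bootstrap count and elimination fraction are assumptions.

      # Sketch of ensemble SVM-RFE: rank features by |w| in linear SVMs fit
      # on bootstrap samples, aggregate ranks, drop the worst, repeat.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.utils import resample

      def esvm_rfe(X, y, n_keep=10, n_boot=20, drop_frac=0.2, seed=0):
          rng = np.random.default_rng(seed)
          active = np.arange(X.shape[1])
          while len(active) > n_keep:
              ranks = np.zeros(len(active))
              for _ in range(n_boot):
                  Xb, yb = resample(X[:, active], y,
                                    random_state=int(rng.integers(1 << 30)))
                  w = SVC(kernel="linear").fit(Xb, yb).coef_.ravel()
                  ranks += np.argsort(np.argsort(np.abs(w)))  # rank of |w|
              n_drop = min(max(1, int(drop_frac * len(active))),
                           len(active) - n_keep)
              active = active[np.argsort(ranks)[n_drop:]]  # drop lowest
          return active                                    # selected genes

      rng = np.random.default_rng(1)
      X = rng.normal(size=(80, 200))
      y = (X[:, :3].sum(axis=1) > 0).astype(int)
      print("selected features:", esvm_rfe(X, y))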

  2. Metal Oxide Gas Sensor Drift Compensation Using a Two-Dimensional Classifier Ensemble

    PubMed Central

    Liu, Hang; Chu, Renzhi; Tang, Zhenan

    2015-01-01

    Sensor drift is the most challenging problem in gas sensing at present. We propose a novel two-dimensional classifier ensemble strategy to solve the gas discrimination problem, regardless of the gas concentration, with high accuracy over extended periods of time. This strategy is appropriate for multi-class classifiers that consist of combinations of pairwise classifiers, such as support vector machines. We compare the performance of the strategy with those of competing methods in an experiment based on a public dataset that was compiled over a period of three years. The experimental results demonstrate that the two-dimensional ensemble outperforms the other methods considered. Furthermore, we propose a pre-aging process inspired by that applied to the sensors to improve the stability of the classifier ensemble. The experimental results demonstrate that the weight of each multi-class classifier model in the ensemble remains fairly static before and after the addition of new classifier models to the ensemble, when a pre-aging procedure is applied. PMID:25942640

  3. Gridded Calibration of Ensemble Wind Vector Forecasts Using Ensemble Model Output Statistics

    NASA Astrophysics Data System (ADS)

    Lazarus, S. M.; Holman, B. P.; Splitt, M. E.

    2017-12-01

    A computationally efficient method is developed that performs gridded post-processing of ensemble wind vector forecasts. An expansive set of idealized WRF model simulations is generated to provide physically consistent high-resolution winds over a coastal domain characterized by an intricate land/water mask. Ensemble model output statistics (EMOS) is used to calibrate the ensemble wind vector forecasts at observation locations. The local EMOS predictive parameters (mean and variance) are then spread throughout the grid utilizing flow-dependent statistical relationships extracted from the downscaled WRF winds. Using data withdrawal and 28 east central Florida stations, the method is applied to one year of 24 h wind forecasts from the Global Ensemble Forecast System (GEFS). Compared to the raw GEFS, the approach improves both the deterministic and probabilistic forecast skill. Analysis of multivariate rank histograms indicates the post-processed forecasts are calibrated. Two downscaling case studies are presented, a quiescent easterly flow event and a frontal passage. Strengths and weaknesses of the approach are presented and discussed.
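
    A univariate flavor of EMOS can be sketched in a few lines: the predictive mean is a linear function of the ensemble mean, and the predictive variance a linear function of the ensemble variance. For simplicity the sketch below (Python/numpy) fits the coefficients by least squares rather than the CRPS minimization usually used, and the data is synthetic.

      # Minimal EMOS sketch for one wind component: predictive mean
      # a + b*ensmean, variance c + d*ensvar; least-squares fitting stands
      # in for CRPS minimization, and the data below is synthetic.
      import numpy as np

      rng = np.random.default_rng(0)
      n, m = 365, 20                            # days, ensemble members
      truth = rng.normal(5, 3, n)
      ens = truth[:, None] + rng.normal(1.0, 2.0, (n, m))  # biased ensemble
      obs = truth + rng.normal(0, 0.5, n)

      ensmean, ensvar = ens.mean(axis=1), ens.var(axis=1)

      b, a = np.polyfit(ensmean, obs, 1)        # mean coefficients a, b
      resid2 = (obs - (a + b * ensmean)) ** 2
      d, c = np.polyfit(ensvar, resid2, 1)      # variance coefficients c, d

      mu = a + b * ensmean                      # calibrated predictive mean
      sigma2 = np.clip(c + d * ensvar, 1e-6, None)  # predictive variance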

  4. Spatio-temporal evolution of perturbations in ensembles initialized by bred, Lyapunov and singular vectors

    NASA Astrophysics Data System (ADS)

    Pazó, Diego; Rodríguez, Miguel A.; López, Juan M.

    2010-05-01

    We study the evolution of finite perturbations in the Lorenz '96 model, a meteorological toy model of the atmosphere. The initial perturbations are chosen to be aligned along different dynamic vectors: bred, Lyapunov, and singular vectors. Using a particular vector determines not only the amplification rate of the perturbation but also the spatial structure of the perturbation and its stability under the evolution of the flow. The evolution of perturbations is systematically studied by means of the so-called mean-variance of logarithms diagram that provides in a very compact way the basic information to analyse the spatial structure. We discuss the corresponding advantages of using those different vectors for preparing initial perturbations to be used in ensemble prediction systems, focusing on key properties: dynamic adaptation to the flow, robustness, equivalence between members of the ensemble, etc. Among all the vectors considered here, the so-called characteristic Lyapunov vectors are possibly optimal, in the sense that they are both perfectly adapted to the flow and extremely robust.

  5. Spatio-temporal evolution of perturbations in ensembles initialized by bred, Lyapunov and singular vectors

    NASA Astrophysics Data System (ADS)

    Pazó, Diego; Rodríguez, Miguel A.; López, Juan M.

    2010-01-01

    We study the evolution of finite perturbations in the Lorenz '96 model, a meteorological toy model of the atmosphere. The initial perturbations are chosen to be aligned along different dynamic vectors: bred, Lyapunov, and singular vectors. Using a particular vector determines not only the amplification rate of the perturbation but also the spatial structure of the perturbation and its stability under the evolution of the flow. The evolution of perturbations is systematically studied by means of the so-called mean-variance of logarithms diagram that provides in a very compact way the basic information to analyse the spatial structure. We discuss the corresponding advantages of using those different vectors for preparing initial perturbations to be used in ensemble prediction systems, focusing on key properties: dynamic adaptation to the flow, robustness, equivalence between members of the ensemble, etc. Among all the vectors considered here, the so-called characteristic Lyapunov vectors are possibly optimal, in the sense that they are both perfectly adapted to the flow and extremely robust.

  6. HLPI-Ensemble: Prediction of human lncRNA-protein interactions based on ensemble strategy.

    PubMed

    Hu, Huan; Zhang, Li; Ai, Haixin; Zhang, Hui; Fan, Yetian; Zhao, Qi; Liu, Hongsheng

    2018-03-27

    LncRNAs play an important role in many biological processes and in disease progression by binding to related proteins. However, experimental methods for studying lncRNA-protein interactions are time-consuming and expensive. Although a few models have been designed to predict ncRNA-protein interactions, they all have some common drawbacks that limit their predictive performance. In this study, we present a model called HLPI-Ensemble, designed specifically for human lncRNA-protein interactions. HLPI-Ensemble adopts an ensemble strategy based on three mainstream machine learning algorithms, Support Vector Machines (SVM), Random Forests (RF) and Extreme Gradient Boosting (XGB), to generate HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble, respectively. The results of 10-fold cross-validation show that HLPI-SVM Ensemble, HLPI-RF Ensemble and HLPI-XGB Ensemble achieved AUCs of 0.95, 0.96 and 0.96, respectively, on the test dataset. Furthermore, we compared the performance of the HLPI-Ensemble models with previous models on an external validation dataset. The results show that the false positives (FPs) of the HLPI-Ensemble models are much lower than those of the previous models, and the other evaluation indicators of the HLPI-Ensemble models are also higher. This further shows that the HLPI-Ensemble models are superior to previous models in predicting human lncRNA-protein interactions. HLPI-Ensemble is publicly available at: http://ccsipb.lnu.edu.cn/hlpiensemble/

  7. An efficient ensemble learning method for gene microarray classification.

    PubMed

    Osareh, Alireza; Shadgar, Bita

    2013-01-01

    Gene microarray analysis and classification have proved an effective approach for the diagnosis of diseases and cancers. However, it has also been revealed that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other non-ensemble/ensemble techniques, including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging, are also deployed. Experimental results reveal that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  8. Dropout Prediction in E-Learning Courses through the Combination of Machine Learning Techniques

    ERIC Educational Resources Information Center

    Lykourentzou, Ioanna; Giannoukos, Ioannis; Nikolopoulos, Vassilis; Mpardis, George; Loumos, Vassili

    2009-01-01

    In this paper, a dropout prediction method for e-learning courses, based on three popular machine learning techniques and detailed student data, is proposed. The machine learning techniques used are feed-forward neural networks, support vector machines and probabilistic ensemble simplified fuzzy ARTMAP. Since a single technique may fail to…

  9. Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.

    PubMed

    Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G

    2017-09-01

    To investigate whether the use of ensemble learning algorithms improve physical activity recognition accuracy compared to the single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one subject out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
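
    Of the three fusion rules listed, weighted majority voting is the simplest to state: each classifier's vote counts in proportion to a weight, for example its validation F1 score. A minimal sketch follows (Python/numpy); the predictions and weights are invented placeholders.

      # Sketch of weighted majority vote decision fusion across four base
      # classifiers; predictions and weights are invented placeholders.
      import numpy as np

      classes = ["sit", "walk", "run"]
      # Rows: base classifiers (tree, kNN, SVM, NN); columns: data windows.
      preds = np.array([[0, 1, 2, 1],
                        [0, 1, 1, 1],
                        [0, 2, 2, 1],
                        [1, 1, 2, 2]])
      weights = np.array([0.70, 0.80, 0.85, 0.75])  # e.g. validation F1

      votes = np.zeros((len(classes), preds.shape[1]))
      for c in range(len(classes)):
          votes[c] = ((preds == c) * weights[:, None]).sum(axis=0)
      fused = votes.argmax(axis=0)                  # weighted majority vote
      print([classes[i] for i in fused])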

  10. The NRL relocatable ocean/acoustic ensemble forecast system

    NASA Astrophysics Data System (ADS)

    Rowley, C.; Martin, P.; Cummings, J.; Jacobs, G.; Coelho, E.; Bishop, C.; Hong, X.; Peggion, G.; Fabre, J.

    2009-04-01

    A globally relocatable regional ocean nowcast/forecast system has been developed to support rapid implementation of new regional forecast domains. The system is in operational use at the Naval Oceanographic Office for a growing number of regional and coastal implementations. The new system is the basis for an ocean acoustic ensemble forecast and adaptive sampling capability. We present an overview of the forecast system and the ocean ensemble and adaptive sampling methods. The forecast system consists of core ocean data analysis and forecast modules, software for domain configuration, surface and boundary condition forcing processing, and job control, and global databases for ocean climatology, bathymetry, tides, and river locations and transports. The analysis component is the Navy Coupled Ocean Data Assimilation (NCODA) system, a 3D multivariate optimum interpolation system that produces simultaneous analyses of temperature, salinity, geopotential, and vector velocity using remotely-sensed SST, SSH, and sea ice concentration, plus in situ observations of temperature, salinity, and currents from ships, buoys, XBTs, CTDs, profiling floats, and autonomous gliders. The forecast component is the Navy Coastal Ocean Model (NCOM). The system supports one-way nesting and multiple assimilation methods. The ensemble system uses the ensemble transform technique with error variance estimates from the NCODA analysis to represent initial condition error. Perturbed surface forcing or an atmospheric ensemble is used to represent errors in surface forcing. The ensemble transform Kalman filter is used to assess the impact of adaptive observations on future analysis and forecast uncertainty for both ocean and acoustic properties.

  11. A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting

    NASA Astrophysics Data System (ADS)

    Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu

    2016-06-01

    To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, this proposed model has improved the prediction accuracy and hit rates of directional prediction. The proposed model involves three main steps, i.e., decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) for simplifying the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; integrating all predicted IMFs for the ensemble result as the final prediction by another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models for its higher prediction accuracy and hit rates of directional prediction.

  12. BagMOOV: A novel ensemble for heart disease prediction bootstrap aggregation with multi-objective optimized voting.

    PubMed

    Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan

    2015-06-01

    Conventional clinical decision support systems are based on individual classifiers, or simple combinations of classifiers, which tend to show moderate performance. This research paper presents a novel classifier ensemble framework based on an enhanced bagging approach with a multi-objective weighted voting scheme for the prediction and analysis of heart disease. The proposed model overcomes the limitations of conventional approaches by utilizing an ensemble of five heterogeneous classifiers: naïve Bayes, linear regression, quadratic discriminant analysis, instance-based learner and support vector machines. Five different datasets, obtained from publicly available data repositories, are used for experimentation, evaluation and validation. The effectiveness of the proposed ensemble is investigated by comparing its results with those of several classifiers. Prediction results of the proposed ensemble model are assessed by ten-fold cross validation and ANOVA statistics. The experimental evaluation shows that the proposed framework deals with all types of attributes and achieved a high diagnosis accuracy of 84.16%, 93.29% sensitivity, 96.70% specificity, and 82.15% f-measure. An f-ratio higher than the f-critical value and a p-value less than 0.05 at the 95% confidence level indicate that the results are extremely statistically significant for most of the datasets.

  13. Protein Kinase Classification with 2866 Hidden Markov Models and One Support Vector Machine

    NASA Technical Reports Server (NTRS)

    Weber, Ryan; New, Michael H.; Fonda, Mark (Technical Monitor)

    2002-01-01

    The main application considered in this paper is distinguishing true kinases from randomly permuted kinases that share the same length and amino acid distributions as the true kinases. Numerous methods already exist for this classification task, such as HMMs, motif-matchers, and sequence comparison algorithms. We build on some of these efforts by creating a vector from the output of thousands of structurally based HMMs, created offline from Pfam-A seed alignments using SAM-T99, which must then be combined into an overall classification for the protein. We then use a Support Vector Machine with polynomial and chi-squared kernels to classify this large ensemble Pfam vector. In particular, the chi-squared kernel SVM performs better than the HMMs and the BLAST pairwise comparisons in some respects when separating true from false kinases, but no one algorithm is best for all purposes or in all instances, so we consider the particular strengths and weaknesses of each.
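
    A chi-squared kernel SVM can be built by precomputing the Gram matrix; the sketch below (Python, scikit-learn) does so with random non-negative features standing in for the HMM score vectors, so the reported accuracy is meaningless by construction.

      # Sketch: SVM with a chi-squared kernel via a precomputed Gram matrix;
      # random non-negative features stand in for the HMM-score vectors.
      import numpy as np
      from sklearn.metrics.pairwise import chi2_kernel
      from sklearn.model_selection import train_test_split
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      X = rng.random((200, 50))                # chi2 kernel needs X >= 0
      y = rng.integers(0, 2, 200)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      svm = SVC(kernel="precomputed")
      svm.fit(chi2_kernel(X_tr, X_tr), y_tr)   # train on the Gram matrix
      acc = svm.score(chi2_kernel(X_te, X_tr), y_te)
      print("accuracy:", acc)                  # ~0.5 here: labels are random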

  14. [MicroRNA Target Prediction Based on Support Vector Machine Ensemble Classification Algorithm of Under-sampling Technique].

    PubMed

    Chen, Zhiru; Hong, Wenxue

    2016-02-01

    Considering the low prediction accuracy for positive samples and the poor overall classification caused by the unbalanced sample data of MicroRNA (miRNA) targets, we propose a support vector machine integration of under-sampling and weight (SVM-IUSM) algorithm in this paper, an under-sampling method based on ensemble learning. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming to reduce the unbalanced distribution of positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the prediction of the integrated miRNA target classifier is achieved by combining multiple weak classifiers through a voting mechanism. Experiments revealed that SVM-IUSM, compared with other algorithms on unbalanced dataset collections, not only improves the accuracy on positive targets and the overall classification effect but also enhances the generalization ability of the miRNA target classifier.

  15. Glyph-based analysis of multimodal directional distributions in vector field ensembles

    NASA Astrophysics Data System (ADS)

    Jarema, Mihaela; Demir, Ismail; Kehrer, Johannes; Westermann, Rüdiger

    2015-04-01

    Ensemble simulations are increasingly often performed in the geosciences in order to study the uncertainty and variability of model predictions. Describing ensemble data by mean and standard deviation can be misleading in case of multimodal distributions. We present first results of a glyph-based visualization of multimodal directional distributions in 2D and 3D vector ensemble data. Directional information on the circle/sphere is modeled using mixtures of probability density functions (pdfs), which enables us to characterize the distributions with relatively few parameters. The resulting mixture models are represented by 2D and 3D lobular glyphs showing direction, spread and strength of each principal mode of the distributions. A 3D extension of our approach is realized by means of an efficient GPU rendering technique. We demonstrate our method in the context of ensemble weather simulations.

  16. Modeling the geographic distribution of Ixodes scapularis and Ixodes pacificus (Acari: Ixodidae) in the contiguous United States

    USGS Publications Warehouse

    Hahn, Micah; Jarnevich, Catherine S.; Monaghan, Andrew J.; Eisen, Rebecca J.

    2016-01-01

    In addition to serving as vectors of several other human pathogens, the black-legged tick, Ixodes scapularis Say, and western black-legged tick, Ixodes pacificus Cooley and Kohls, are the primary vectors of the spirochete (Borrelia burgdorferi ) that causes Lyme disease, the most common vector-borne disease in the United States. Over the past two decades, the geographic range of I. pacificus has changed modestly while, in contrast, the I. scapularis range has expanded substantially, which likely contributes to the concurrent expansion in the distribution of human Lyme disease cases in the Northeastern, North-Central and Mid-Atlantic states. Identifying counties that contain suitable habitat for these ticks that have not yet reported established vector populations can aid in targeting limited vector surveillance resources to areas where tick invasion and potential human risk are likely to occur. We used county-level vector distribution information and ensemble modeling to map the potential distribution of I. scapularis and I. pacificus in the contiguous United States as a function of climate, elevation, and forest cover. Results show that I. pacificus is currently present within much of the range classified by our model as suitable for establishment. In contrast, environmental conditions are suitable for I. scapularis to continue expanding its range into northwestern Minnesota, central and northern Michigan, within the Ohio River Valley, and inland from the southeastern and Gulf coasts. Overall, our ensemble models show suitable habitat for I. scapularis in 441 eastern counties and for I. pacificus in 11 western counties where surveillance records have not yet supported classification of the counties as established.

  17. The role of model dynamics in ensemble Kalman filter performance for chaotic systems

    USGS Publications Warehouse

    Ng, G.-H.C.; McLaughlin, D.; Entekhabi, D.; Ahanin, A.

    2011-01-01

    The ensemble Kalman filter (EnKF) is susceptible to losing track of observations, or 'diverging', when applied to large chaotic systems such as atmospheric and ocean models. Past studies have demonstrated the adverse impact of sampling error during the filter's update step. We examine how system dynamics affect EnKF performance, and whether the absence of certain dynamic features in the ensemble may lead to divergence. The EnKF is applied to a simple chaotic model, and ensembles are checked against singular vectors of the tangent linear model, corresponding to short-term growth, and Lyapunov vectors, corresponding to long-term growth. Results show that the ensemble strongly aligns itself with the subspace spanned by unstable Lyapunov vectors. Furthermore, the filter avoids divergence only if the full linearized long-term unstable subspace is spanned. However, short-term dynamics also become important as non-linearity in the system increases. Non-linear movement prevents errors in the long-term stable subspace from decaying indefinitely. If these errors then undergo linear intermittent growth, a small ensemble may fail to properly represent all important modes, causing filter divergence. A combination of long and short-term growth dynamics are thus critical to EnKF performance. These findings can help in developing practical robust filters based on model dynamics. © 2011 The Authors. Tellus A © 2011 John Wiley & Sons A/S.

  18. Human Activity Recognition from Smart-Phone Sensor Data using a Multi-Class Ensemble Learning in Home Monitoring.

    PubMed

    Ghose, Soumya; Mitra, Jhimli; Karunanithi, Mohan; Dowling, Jason

    2015-01-01

    Home monitoring of chronically ill or elderly patients can reduce frequent hospitalisations and hence provide improved quality of care at a reduced cost to the community, reducing the burden on the healthcare system. Activity recognition of such patients is of high importance in such a design. In this work, a system for automatic human physical activity recognition from smart-phone inertial sensor data is proposed. An ensemble of decision trees framework is adopted to train and predict the multi-class human activity system. A comparison of our proposed method with a traditional multi-class support vector machine shows significant improvement in activity recognition accuracy.

  19. Data mining for the analysis of hippocampal zones in Alzheimer's disease

    NASA Astrophysics Data System (ADS)

    Ovando Vázquez, Cesaré M.

    2012-02-01

    In this work, a methodology to classify people with Alzheimer's Disease (AD), Healthy Controls (HC) and people with Mild Cognitive Impairment (MCI) is presented. The methodology consists of an ensemble of Support Vector Machines (SVM) with hippocampal boxes (HB) as input data; these hippocampal zones are taken from Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) images. Two ways of constructing the ensemble are presented: the first consists of linear SVM models and the second of non-linear SVM models. Results demonstrate that the linear models classify HBs between HC and MCI more accurately than the non-linear models, and that there are no differences between the two model types for HC versus AD.

  20. Intelligent postoperative morbidity prediction of heart disease using artificial intelligence techniques.

    PubMed

    Hsieh, Nan-Chen; Hung, Lun-Ping; Shih, Chun-Che; Keh, Huan-Chao; Chan, Chien-Hui

    2012-06-01

    Endovascular aneurysm repair (EVAR) is an advanced minimally invasive surgical technology that helps reduce patients' recovery time, postoperative morbidity and mortality. This study proposes an ensemble model to predict postoperative morbidity after EVAR. The ensemble model was developed using a training set of consecutive patients who underwent EVAR between 2000 and 2009. All data required for prediction modeling, including patient demographics, preoperative variables, co-morbidities, and complications as outcome variables, were collected prospectively and entered into a clinical database. A discretization approach was used to categorize numerical values into an informative feature space. The Bayesian network (BN), artificial neural network (ANN), and support vector machine (SVM) were then adopted as base models, and stacking was used to combine them. The research outcomes consisted of an ensemble model to predict postoperative morbidity after EVAR, the prospectively recorded occurrence of postoperative complications, and causal-effect knowledge obtained from BNs with the Markov blanket concept.
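
    A minimal sketch of a stacked ensemble in the spirit described, using scikit-learn; GaussianNB stands in for the Bayesian network, and the estimators, parameters, and synthetic data are illustrative assumptions rather than the study's configuration.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import StackingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neural_network import MLPClassifier
        from sklearn.svm import SVC

        # synthetic stand-in for discretized clinical features
        X, y = make_classification(n_samples=400, n_features=20, random_state=0)

        stack = StackingClassifier(
            estimators=[("bn", GaussianNB()),       # naive Bayes as a stand-in for the BN
                        ("ann", MLPClassifier(max_iter=2000, random_state=0)),
                        ("svm", SVC(probability=True))],
            final_estimator=LogisticRegression())   # meta-learner combines base outputs
        print(stack.fit(X, y).score(X, y))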

  1. Weighted K-means support vector machine for cancer prediction.

    PubMed

    Kim, SungHwan

    2016-01-01

    To date, the support vector machine (SVM) has been widely applied in diverse biomedical fields to address disease subtype identification and the pathogenicity of genetic variants. In this paper, I propose the weighted K-means support vector machine (wKM-SVM) and weighted support vector machine (wSVM), which allow the SVM to impose weights on the loss term. In addition, I demonstrate the numerical relations between the objective function of the SVM and the weights. Motivated by general ensemble techniques, which are known to improve accuracy, I directly apply the boosting algorithm to the newly proposed weighted KM-SVM (and wSVM). For predictive performance, a range of simulation studies demonstrates that the weighted KM-SVM (and wSVM) with boosting outperforms the standard KM-SVM (and SVM), including but not limited to many popular classification rules. I applied the proposed methods to simulated data and two large-scale real applications in the TCGA pan-cancer methylation data of breast and kidney cancer. In conclusion, the weighted KM-SVM (and wSVM) increases the accuracy of the classification model, and will facilitate disease diagnosis and clinical treatment decisions to benefit patients. A software package (wSVM) is publicly available at the R-project webpage (https://www.r-project.org).
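
    The boosting-plus-weighted-loss idea can be sketched generically: scikit-learn's SVC accepts per-sample weights through sample_weight, so an AdaBoost-style loop can reweight the SVM's loss term each round. This is a hedged sketch of the general technique under assumed parameters, not the author's wKM-SVM.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=400, n_features=20, random_state=0)
        y_pm = 2 * y - 1                           # labels in {-1, +1} for the boosting algebra

        w = np.full(len(y), 1.0 / len(y))          # per-sample weights = weighted loss term
        models, alphas = [], []
        for _ in range(10):
            clf = SVC(kernel="linear").fit(X, y, sample_weight=w)
            pred = 2 * clf.predict(X) - 1
            err = np.clip(w[pred != y_pm].sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)  # weight of this round's model
            w *= np.exp(-alpha * y_pm * pred)      # up-weight misclassified samples
            w /= w.sum()
            models.append(clf)
            alphas.append(alpha)

        score = sum(a * (2 * m.predict(X) - 1) for a, m in zip(alphas, models))
        print("training accuracy:", ((score > 0).astype(int) == y).mean())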

  2. Training set extension for SVM ensemble in P300-speller with familiar face paradigm.

    PubMed

    Li, Qi; Shi, Kaiyang; Gao, Ning; Li, Jian; Bai, Ou

    2018-03-27

    P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are trained with large-scale training sets and used as classifiers in these systems. However, the required large-scale training data necessitate a prolonged collection time for each subject, which results in data collected toward the end of the period being contaminated by the subject's fatigue. This study aimed to develop a method for acquiring more training data based on a collected small training set. A new method was developed in which two corresponding training datasets in two sequences are superposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm. The SVM ensemble with extended training set achieved 85% classification accuracy for the averaged results of four sequences, and 100% for 11 sequences in the P300-speller. In contrast, the conventional SVM ensemble with non-extended training set achieved only 65% accuracy for four sequences, and 92% for 11 sequences. The SVM ensemble with extended training set achieves higher classification accuracies than the conventional SVM ensemble, which verifies that the proposed method effectively improves the classification performance of BCI P300-spellers, thus enhancing their practicality.
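
    The extension step can be pictured as follows: epochs from two stimulus sequences, aligned by target, are superposed and averaged, and the averages are appended as new training samples. The array shapes and names below are illustrative assumptions.

        import numpy as np

        def extend_training_set(epochs_a, epochs_b):
            """Superpose and average corresponding epochs from two sequences,
            then append the averages as additional training samples."""
            averaged = (epochs_a + epochs_b) / 2.0
            return np.concatenate([epochs_a, epochs_b, averaged], axis=0)

        # e.g. 40 epochs x 32 channels x 200 time samples per sequence
        a = np.random.randn(40, 32, 200)
        b = np.random.randn(40, 32, 200)
        print(extend_training_set(a, b).shape)     # (120, 32, 200)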

  3. An ensemble of dynamic neural network identifiers for fault detection and isolation of gas turbine engines.

    PubMed

    Amozegar, M; Khorasani, K

    2016-04-01

    In this paper, a new approach for Fault Detection and Isolation (FDI) of gas turbine engines is proposed by developing an ensemble of dynamic neural network identifiers. For health monitoring of the gas turbine engine, its dynamics is first identified by constructing three separate or individual dynamic neural network architectures. Specifically, a dynamic multi-layer perceptron (MLP), a dynamic radial-basis function (RBF) neural network, and a dynamic support vector machine (SVM) are trained to individually identify and represent the gas turbine engine dynamics. Next, three ensemble-based techniques are developed to represent the gas turbine engine dynamics, namely, two heterogeneous ensemble models and one homogeneous ensemble model. It is first shown that all ensemble approaches do significantly improve the overall performance and accuracy of the developed system identification scheme when compared to each of the stand-alone solutions. The best selected stand-alone model (i.e., the dynamic RBF network) and the best selected ensemble architecture (i.e., the heterogeneous ensemble) in terms of their performances in achieving an accurate system identification are then selected for solving the FDI task. The required residual signals are generated by using both a single model-based solution and an ensemble-based solution under various gas turbine engine health conditions. Our extensive simulation studies demonstrate that the fault detection and isolation task achieved by using the residuals that are obtained from the dynamic ensemble scheme results in a significantly more accurate and reliable performance as illustrated through detailed quantitative confusion matrix analysis and comparative studies. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

    PubMed

    Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R

    2017-01-01

    Injury narratives are now available in real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury in large batches of narratives, such as workers compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes single-word and bi-gram models, Support Vector Machine, and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event-leading-to-injury classifications for a large workers compensation database. These algorithms are known to classify narrative text well and are fairly easy to implement with off-the-shelf software packages such as Python. We propose human-machine learning ensemble approaches which maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairing of algorithms was Naïve Bayes with Support Vector Machine: the triple ensemble requiring agreement among the single-word Naïve Bayes, bi-gram Naïve Bayes, and SVM classifiers had very high performance (0.93 overall sensitivity/positive predictive value, with high sensitivity and positive predictive values across both large and small categories), leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly. For large administrative datasets we propose incorporating human-machine pairing methods such as those used here, which utilize readily available off-the-shelf machine learning techniques and leave only a fraction of narratives requiring manual review. Human-machine ensemble methods are likely to improve performance over fully manual coding. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
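
    The agreement-based filtering strategy can be sketched directly: narratives on which all three classifiers agree receive the machine-assigned code, and disagreements are routed to human review. The function name and the -1 sentinel are illustrative assumptions.

        import numpy as np

        def route_narratives(pred_nb_sw, pred_nb_bigram, pred_svm):
            """Assign machine codes where all three classifiers agree;
            mark disagreements (-1) for manual review."""
            preds = np.stack([pred_nb_sw, pred_nb_bigram, pred_svm])
            agree = (preds[0] == preds[1]) & (preds[1] == preds[2])
            return np.where(agree, preds[0], -1)

        codes = route_narratives(np.array([3, 1, 7]),
                                 np.array([3, 2, 7]),
                                 np.array([3, 1, 7]))
        print(codes)    # [ 3 -1  7]: the middle narrative goes to a human coder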

  5. SVM and SVM Ensembles in Breast Cancer Prediction.

    PubMed

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is a common disease in women, and its effective prediction is an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to choose the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performance of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve the performance of single classifiers, can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small- and large-scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small-scale dataset, where feature selection should be performed in the data pre-processing stage. For a large-scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.
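
    A minimal sketch of one configuration discussed (a bagging ensemble of linear-kernel SVMs), using scikit-learn's bundled breast cancer dataset; the parameters are assumptions, and in older scikit-learn versions the estimator keyword is spelled base_estimator.

        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import BaggingClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = load_breast_cancer(return_X_y=True)

        # bagging: each of the 11 linear SVMs is trained on a bootstrap resample
        bagged = BaggingClassifier(estimator=SVC(kernel="linear"), n_estimators=11,
                                   random_state=0)
        print(cross_val_score(bagged, X, y, cv=5).mean())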

  6. SVM and SVM Ensembles in Breast Cancer Prediction

    PubMed Central

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is a common disease in women, and its effective prediction is an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to choose the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performance of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles, which have been proposed to improve the performance of single classifiers, can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small- and large-scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small-scale dataset, where feature selection should be performed in the data pre-processing stage. For a large-scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807

  7. Kernel parameter variation-based selective ensemble support vector data description for oil spill detection on the ocean via hyperspectral imaging

    NASA Astrophysics Data System (ADS)

    Uslu, Faruk Sukru

    2017-07-01

    Oil spills on the ocean surface cause serious environmental, political, and economic problems, and these catastrophic threats to marine ecosystems require detection and monitoring. Hyperspectral sensors are powerful optical sensors used for oil spill detection with the help of detailed spectral information about materials. However, the huge amounts of data in hyperspectral imaging (HSI) require fast and accurate computation methods for detection problems. Support vector data description (SVDD) is one of the most suitable methods for detection, especially for large data sets. Nevertheless, the selection of kernel parameters is one of the main problems in SVDD. This paper presents a method, inspired by ensemble learning, for improving the performance of SVDD without tuning its kernel parameters. Additionally, a classifier selection technique is proposed to obtain further gains. The proposed approach also aims to solve the small sample size problem, which is very important for processing high-dimensional data in HSI. The algorithm is applied to two HSI data sets: in the first, various targets are detected; in the second, oil spill detection in situ is realized. The experimental results demonstrate the feasibility and performance improvement of the proposed algorithm for oil spill detection problems.
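
    The kernel-parameter-variation idea can be sketched with scikit-learn's OneClassSVM standing in for SVDD (the two coincide for the RBF kernel under common conditions): rather than tuning a single gamma, decision scores from several kernel widths are averaged. The data and parameters below are illustrative assumptions.

        import numpy as np
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(200, 5))                       # stand-in background spectra
        X_test = np.vstack([rng.normal(size=(50, 5)),             # background-like pixels
                            rng.normal(3.0, 1.0, size=(50, 5))])  # anomalous (target) pixels

        # ensemble over kernel widths instead of tuning a single gamma
        scores = np.mean([OneClassSVM(gamma=g).fit(X_train).decision_function(X_test)
                          for g in (0.01, 0.1, 1.0)], axis=0)
        print("flagged as anomalous:", int((scores < 0).sum()))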

  8. EnsembleGASVR: a novel ensemble method for classifying missense single nucleotide polymorphisms.

    PubMed

    Rapakoulia, Trisevgeni; Theofilatos, Konstantinos; Kleftogiannis, Dimitrios; Likothanasis, Spiros; Tsakalidis, Athanasios; Mavroudi, Seferina

    2014-08-15

    Single nucleotide polymorphisms (SNPs) are considered the most frequently occurring DNA sequence variations. Several computational methods have been proposed for classifying missense SNPs as neutral or disease-associated. However, existing computational approaches fail to select relevant features, choosing them arbitrarily without sufficient documentation. Moreover, they are limited by missing values and imbalance between the learning datasets, and most do not support their predictions with confidence scores. To overcome these limitations, a novel ensemble computational methodology is proposed. EnsembleGASVR is a two-step algorithm: the first step applies a novel evolutionary embedded algorithm to locate close-to-optimal Support Vector Regression models; in the second step, these models are combined to extract a universal predictor, which is less prone to overfitting, systematizes the rebalancing of the learning sets and uses an internal approach to solve the missing-values problem without loss of information. Confidence scores support all predictions, and the model can be tuned by modifying the classification thresholds. An extensive study was performed to collect the most relevant features for the problem of classifying SNPs, and a superset of 88 features was constructed. Experimental results show that the proposed framework outperforms well-known algorithms in terms of classification performance in the examined datasets. Finally, the proposed algorithmic framework was able to uncover the significant role of certain features, such as solvent accessibility, and the top-scored predictions were further validated by linking them with disease phenotypes. Datasets and codes are freely available on the Web at http://prlab.ceid.upatras.gr/EnsembleGASVR/dataset-codes.zip. All the required information about the article is available through http://prlab.ceid.upatras.gr/EnsembleGASVR/site.html. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Rolling bearing fault detection and diagnosis based on composite multiscale fuzzy entropy and ensemble support vector machines

    NASA Astrophysics Data System (ADS)

    Zheng, Jinde; Pan, Haiyang; Cheng, Junsheng

    2017-02-01

    To detect the incipient failure of rolling bearings in a timely manner and locate faults accurately, a novel rolling bearing fault diagnosis method is proposed based on composite multiscale fuzzy entropy (CMFE) and ensemble support vector machines (ESVMs). Fuzzy entropy (FuzzyEn), an improvement of sample entropy (SampEn), is a nonlinear method for measuring the complexity of time series. Since FuzzyEn (or SampEn) at a single scale cannot reflect this complexity effectively, multiscale fuzzy entropy (MFE) is developed by defining the FuzzyEns of coarse-grained time series, which represent the system dynamics at different scales. However, MFE values are affected by the data length, especially when the data are not long enough. By combining the information of the multiple coarse-grained time series at each scale, the CMFE algorithm proposed in this paper enhances MFE as well as FuzzyEn. Compared with MFE, as the scale factor increases, CMFE obtains much more stable and consistent values for short time series. Here, CMFE is employed to measure the complexity of rolling bearing vibration signals and to extract the nonlinear features hidden in them. The physical reasons CMFE suits rolling bearing fault diagnosis are also explored. On this basis, to realize automatic fault diagnosis, an ensemble-SVM-based multi-classifier is constructed for the intelligent classification of fault features. Finally, the proposed fault diagnosis method is applied to experimental data, and the results indicate that it can effectively distinguish different fault categories and severities of rolling bearings.
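
    The composite step can be sketched as follows: at scale factor tau there are tau distinct coarse-grained series, one per starting offset, and CMFE averages an entropy estimate over all of them rather than using only the first. The fuzzy-entropy routine itself is assumed to be supplied; np.var below is only a stand-in for it.

        import numpy as np

        def coarse_grain(x, scale, offset=0):
            """Means of non-overlapping windows of length `scale`, starting at `offset`."""
            n = (len(x) - offset) // scale
            return x[offset:offset + n * scale].reshape(n, scale).mean(axis=1)

        def composite_multiscale(x, scale, entropy_fn):
            """CMFE-style estimate: average `entropy_fn` over all `scale`
            coarse-grained series of the signal, one per starting offset."""
            return float(np.mean([entropy_fn(coarse_grain(x, scale, k))
                                  for k in range(scale)]))

        # toy check with variance standing in for a fuzzy-entropy routine
        x = np.random.default_rng(0).normal(size=2048)
        print(composite_multiscale(x, scale=5, entropy_fn=np.var))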

  10. A Simple Ensemble Simulation Technique for Assessment of Future Variations in Specific High-Impact Weather Events

    NASA Astrophysics Data System (ADS)

    Taniguchi, Kenji

    2018-04-01

    To investigate future variations in high-impact weather events, numerous samples are required. For detailed assessment in a specific region, a high spatial resolution is also required. A simple ensemble simulation technique is proposed in this paper. In the proposed technique, new ensemble members were generated from one basic state vector and two perturbation vectors, which were obtained by lagged average forecasting simulations. Sensitivity experiments with different numbers of ensemble members, different simulation lengths, and different perturbation magnitudes were performed. Experimental application to a global warming study was also implemented for a typhoon event. Ensemble-mean results and ensemble spreads of total precipitation and atmospheric conditions showed similar characteristics across the sensitivity experiments. The frequencies of the maximum total and hourly precipitation also showed similar distributions. These results indicate the robustness of the proposed technique. On the other hand, considerable ensemble spread was found in each ensemble experiment. In addition, the results of the application to a global warming study showed possible variations in the future. These results indicate that the proposed technique is useful for investigating various meteorological phenomena and the impacts of global warming. The results of the ensemble simulations also enable the stochastic evaluation of differences in high-impact weather events. In addition, the impacts of a spectral nudging technique were examined. The tracks of a typhoon were quite different between cases with and without spectral nudging; however, the ranges of the tracks among ensemble members were comparable. This indicates that spectral nudging does not necessarily suppress ensemble spread.
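
    The member-generation step amounts to linear combinations of one basic state vector and two perturbation vectors; the sketch below assumes flat NumPy state vectors and experimenter-chosen coefficient pairs.

        import numpy as np

        def generate_members(x_base, p1, p2, coeff_pairs):
            """New ensemble members x = x_base + a*p1 + b*p2 for each (a, b)."""
            return [x_base + a * p1 + b * p2 for a, b in coeff_pairs]

        x0 = np.zeros(10)                       # basic state vector
        p1, p2 = np.ones(10), np.arange(10.0)   # perturbations from lagged forecasts
        members = generate_members(x0, p1, p2, [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5)])
        print(len(members), members[0][:3])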

  11. Force Sensor Based Tool Condition Monitoring Using a Heterogeneous Ensemble Learning Model

    PubMed Central

    Wang, Guofeng; Yang, Yinwei; Li, Zhimeng

    2014-01-01

    Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability. PMID:25405514

  12. Force sensor based tool condition monitoring using a heterogeneous ensemble learning model.

    PubMed

    Wang, Guofeng; Yang, Yinwei; Li, Zhimeng

    2014-11-14

    Tool condition monitoring (TCM) plays an important role in improving machining efficiency and guaranteeing workpiece quality. In order to realize reliable recognition of the tool condition, a robust classifier needs to be constructed to depict the relationship between tool wear states and sensory information. However, because of the complexity of the machining process and the uncertainty of the tool wear evolution, it is hard for a single classifier to fit all the collected samples without sacrificing generalization ability. In this paper, heterogeneous ensemble learning is proposed to realize tool condition monitoring in which the support vector machine (SVM), hidden Markov model (HMM) and radial basis function (RBF) are selected as base classifiers and a stacking ensemble strategy is further used to reflect the relationship between the outputs of these base classifiers and tool wear states. Based on the heterogeneous ensemble learning classifier, an online monitoring system is constructed in which the harmonic features are extracted from force signals and a minimal redundancy and maximal relevance (mRMR) algorithm is utilized to select the most prominent features. To verify the effectiveness of the proposed method, a titanium alloy milling experiment was carried out and samples with different tool wear states were collected to build the proposed heterogeneous ensemble learning classifier. Moreover, the homogeneous ensemble learning model and majority voting strategy are also adopted to make a comparison. The analysis and comparison results show that the proposed heterogeneous ensemble learning classifier performs better in both classification accuracy and stability.

  13. Bayesian Hierarchical Model Characterization of Model Error in Ocean Data Assimilation and Forecasts

    DTIC Science & Technology

    2013-09-30

    Proof-of-concept results compare a BHM surface wind ensemble with the increments in the surface momentum flux control vector in a four-dimensional variational (4dvar) assimilation system. Surface momentum flux ensembles are derived from summaries of BHM winds over the Mediterranean, accounting for stability effects on surface stress, by conditioning surface stress on surface wind speed given ensemble winds from a Bayesian Hierarchical Model.

  14. An Improved Ensemble of Random Vector Functional Link Networks Based on Particle Swarm Optimization with Double Optimization Strategy

    PubMed Central

    Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang

    2016-01-01

    For ensemble learning, how to select and how to combine the candidate classifiers are two key issues that dramatically influence the performance of the ensemble system. Random vector functional link networks (RVFL) without direct input-to-output links are suitable base classifiers for ensemble systems because of their fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with a double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. When using ARPSO to select the optimal base RVFL, ARPSO considers both the convergence accuracy on the validation data and the diversity of the candidate ensemble system to build the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum-norm least-squares method and then further optimized by ARPSO. Finally, a few redundant RVFL are pruned, yielding a more compact ensemble of RVFL. Moreover, theoretical analysis and justification of how to prune the base classifiers on classification problems are presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single-optimization methods. Experimental results on function approximation and classification problems verify that the proposed method improves convergence accuracy as well as reducing the complexity of the ensemble system. PMID:27835638

  15. An Improved Ensemble of Random Vector Functional Link Networks Based on Particle Swarm Optimization with Double Optimization Strategy.

    PubMed

    Ling, Qing-Hua; Song, Yu-Qing; Han, Fei; Yang, Dan; Huang, De-Shuang

    2016-01-01

    For ensemble learning, how to select and how to combine the candidate classifiers are two key issues that dramatically influence the performance of the ensemble system. Random vector functional link networks (RVFL) without direct input-to-output links are suitable base classifiers for ensemble systems because of their fast learning speed, simple structure and good generalization performance. In this paper, to obtain a more compact ensemble system with improved convergence performance, an improved ensemble of RVFL based on attractive and repulsive particle swarm optimization (ARPSO) with a double optimization strategy is proposed. In the proposed method, ARPSO is applied to select and combine the candidate RVFL. When using ARPSO to select the optimal base RVFL, ARPSO considers both the convergence accuracy on the validation data and the diversity of the candidate ensemble system to build the RVFL ensembles. In the process of combining RVFL, the ensemble weights corresponding to the base RVFL are initialized by the minimum-norm least-squares method and then further optimized by ARPSO. Finally, a few redundant RVFL are pruned, yielding a more compact ensemble of RVFL. Moreover, theoretical analysis and justification of how to prune the base classifiers on classification problems are presented, and a simple and practically feasible strategy for pruning redundant base classifiers on both classification and regression problems is proposed. Since the double optimization is performed on the basis of the single optimization, the ensemble of RVFL built by the proposed method outperforms that built by some single-optimization methods. Experimental results on function approximation and classification problems verify that the proposed method improves convergence accuracy as well as reducing the complexity of the ensemble system.

  16. An Improved Ensemble Learning Method for Classifying High-Dimensional and Imbalanced Biomedicine Data.

    PubMed

    Yu, Hualong; Ni, Jun

    2014-01-01

    Training classifiers on skewed data is technically challenging, and the task becomes more difficult when the data are also high-dimensional; such data often appear in the biomedicine field. In this study, we address this problem by combining the asymmetric bagging ensemble classifier (asBagging) presented in previous work with an improved random subspace (RS) generation strategy called feature subspace (FSS). Specifically, FSS is a novel method to promote the balance between accuracy and diversity of the base classifiers in asBagging. In view of the strong generalization capability of the support vector machine (SVM), we adopt it as the base classifier. Extensive experiments on four benchmark biomedicine data sets indicate that the proposed ensemble learning method outperforms many baseline approaches in terms of Accuracy, F-measure, G-mean and AUC evaluation criteria, and thus it can be regarded as an effective and efficient tool for dealing with high-dimensional and imbalanced biomedical data.
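
    A minimal sketch of asymmetric bagging with random feature subspaces and SVM base classifiers: each bag keeps every minority sample, bootstraps an equal number of majority samples, and draws its own feature subset. Names and parameters are illustrative, and the uniform feature sampling is a simplification of the authors' FSS strategy.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.svm import SVC

        def asym_bagging_fit(X, y, n_bags=11, n_feats=20, seed=0):
            """Asymmetric bagging: keep all minority (y == 1) samples, bootstrap an
            equal number of majority samples, and pick a feature subset per bag."""
            rng = np.random.default_rng(seed)
            minority, majority = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
            bags = []
            for _ in range(n_bags):
                rows = np.concatenate(
                    [minority, rng.choice(majority, size=len(minority), replace=True)])
                cols = rng.choice(X.shape[1], size=min(n_feats, X.shape[1]), replace=False)
                bags.append((SVC().fit(X[np.ix_(rows, cols)], y[rows]), cols))
            return bags

        def asym_bagging_predict(bags, X):
            votes = np.mean([clf.predict(X[:, cols]) for clf, cols in bags], axis=0)
            return (votes >= 0.5).astype(int)    # majority vote across bags

        X, y = make_classification(n_samples=300, n_features=100, weights=[0.9],
                                   random_state=0)
        print(asym_bagging_predict(asym_bagging_fit(X, y), X).mean())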

  17. Ensemble based on static classifier selection for automated diagnosis of Mild Cognitive Impairment.

    PubMed

    Nanni, Loris; Lumini, Alessandra; Zaffonato, Nicolò

    2018-05-15

    Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Scientific research is very active in the challenge of designing automated approaches to achieve an early and certain diagnosis. Recently an international competition among AD predictors was organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified into four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD, and healthy controls. In this work, we propose a method to perform early diagnosis of AD, which is evaluated on the MLNeCh dataset. Since the automatic classification of AD is based on feature vectors of high dimensionality, different feature selection/reduction techniques are compared in order to avoid the curse-of-dimensionality problem; the classification method is then obtained as a combination of Support Vector Machines trained on different clusters of data extracted from the whole training set. The multi-classifier approach proposed in this work outperforms all the stand-alone methods tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well using a very reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be made publicly available to other researchers for future comparisons. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Iterative-method performance evaluation for multiple vectors associated with a large-scale sparse matrix

    NASA Astrophysics Data System (ADS)

    Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo

    2016-07-01

    Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving the systems simultaneously, that is, by replacing repeated matrix-vector products with matrix-matrix products over all right-hand sides. We implemented several iterative methods and compared their performance. The maximum performance on the SPARC64 VIIIfx processor was 7.6 times that of a naïve implementation. Finally, to deal with the different convergence behaviour of the linear systems, we introduced a control method that eliminates computation for already-converged vectors.
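
    The multiple-vector idea can be sketched under simple assumptions: a Jacobi iteration applied to all right-hand sides at once, with converged columns masked out of further updates. This dense, diagonally dominant toy stands in for the Krylov solvers and sparse matrices an actual implementation would use.

        import numpy as np

        def jacobi_multi_rhs(A, B, tol=1e-10, max_iter=500):
            """Jacobi iteration on AX = B for many right-hand sides simultaneously;
            already-converged columns are removed from the active set."""
            D = np.diag(A)
            R = A - np.diag(D)
            X = np.zeros_like(B)
            active = np.ones(B.shape[1], dtype=bool)
            for _ in range(max_iter):
                if not active.any():
                    break
                X[:, active] = (B[:, active] - R @ X[:, active]) / D[:, None]
                active = np.linalg.norm(A @ X - B, axis=0) > tol
            return X

        rng = np.random.default_rng(0)
        A = rng.normal(size=(50, 50)) + 50 * np.eye(50)   # diagonally dominant
        B = rng.normal(size=(50, 8))                      # 8 right-hand sides
        print(np.allclose(jacobi_multi_rhs(A, B), np.linalg.solve(A, B)))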

  19. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    PubMed

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
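
    The rankwise averaging step can be sketched directly: each method's edge-score matrix is rank-transformed and the ranks are averaged, so methods on different score scales combine fairly. The function name and toy matrices are illustrative assumptions.

        import numpy as np
        from scipy.stats import rankdata

        def rankwise_average(score_matrices):
            """Average rank-transformed regulator-target score matrices
            from several inference methods (higher score = stronger edge)."""
            ranks = [rankdata(s, method="average").reshape(s.shape)
                     for s in score_matrices]
            return np.mean(ranks, axis=0)

        m1 = np.array([[0.0, 0.9], [0.1, 0.0]])   # e.g. GENIE3-style scores
        m2 = np.array([[0.0, 5.0], [2.0, 0.0]])   # another method, different scale
        print(rankwise_average([m1, m2]))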

  20. Conservative strategy-based ensemble surrogate model for optimal groundwater remediation design at DNAPLs-contaminated sites

    NASA Astrophysics Data System (ADS)

    Ouyang, Qi; Lu, Wenxi; Lin, Jin; Deng, Wenbing; Cheng, Weiguo

    2017-08-01

    The surrogate-based simulation-optimization techniques are frequently used for optimal groundwater remediation design. When this technique is used, surrogate errors caused by surrogate-modeling uncertainty may lead to generation of infeasible designs. In this paper, a conservative strategy that pushes the optimal design into the feasible region was used to address surrogate-modeling uncertainty. In addition, chance-constrained programming (CCP) was adopted to compare with the conservative strategy in addressing this uncertainty. Three methods, multi-gene genetic programming (MGGP), Kriging (KRG) and support vector regression (SVR), were used to construct surrogate models for a time-consuming multi-phase flow model. To improve the performance of the surrogate model, ensemble surrogates were constructed based on combinations of different stand-alone surrogate models. The results show that: (1) the surrogate-modeling uncertainty was successfully addressed by the conservative strategy, which means that this method is promising for addressing surrogate-modeling uncertainty. (2) The ensemble surrogate model that combines MGGP with KRG showed the most favorable performance, which indicates that this ensemble surrogate can utilize both stand-alone surrogate models to improve the performance of the surrogate model.

  1. Design and Implementation of a Parallel Multivariate Ensemble Kalman Filter for the Poseidon Ocean General Circulation Model

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian L.; Rienecker, Michele M.; Koblinsky, Chester (Technical Monitor)

    2001-01-01

    A multivariate ensemble Kalman filter (MvEnKF) implemented on a massively parallel computer architecture has been developed for the Poseidon ocean circulation model and tested with a Pacific Basin model configuration. There are about two million prognostic state-vector variables. Parallelism for the data assimilation step is achieved by regionalization of the background-error covariances that are calculated from the phase-space distribution of the ensemble. Each processing element (PE) collects elements of a matrix measurement functional from nearby PEs. To avoid the introduction of spurious long-range covariances associated with finite ensemble sizes, the background-error covariances are given compact support by means of a Hadamard (element by element) product with a three-dimensional canonical correlation function. The methodology and the MvEnKF configuration are discussed. It is shown that the regionalization of the background covariances has a negligible impact on the quality of the analyses. The parallel algorithm is very efficient for large numbers of observations but does not scale well beyond 100 PEs at the current model resolution. On a platform with distributed memory, memory rather than speed is the limiting factor.
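
    The compact-support step can be sketched as an element-wise taper of the sample covariance by a correlation that vanishes beyond a cutoff distance; the simple quadratic taper below is an illustrative stand-in for the canonical correlation function used in the paper.

        import numpy as np

        def localize_covariance(P, dist, L):
            """Hadamard (element-by-element) product of a sample covariance with a
            compactly supported correlation; entries beyond distance L are zeroed,
            suppressing spurious long-range covariances from a finite ensemble."""
            rho = np.where(dist < L, (1.0 - dist / L) ** 2, 0.0)
            return P * rho

        n = 100
        x = np.arange(n, dtype=float)
        dist = np.abs(x[:, None] - x[None, :])   # 1-D separation distances
        P = np.exp(-dist / 30.0)                 # toy covariance with long-range tails
        print(np.count_nonzero(localize_covariance(P, dist, L=10.0)))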

  2. Impact of climate change upon vector born diseases in Europe and Africa using ENSEMBLES Regional Climate Models

    NASA Astrophysics Data System (ADS)

    Caminade, Cyril; Morse, Andy

    2010-05-01

    Climate variability is an important component in determining the incidence of a number of diseases with significant human/animal health and socioeconomic impacts. The most important diseases affecting health are vector-borne, such as malaria and Rift Valley Fever, including those that are tick-borne, with over 3 billion people worldwide at risk. Malaria alone is responsible for at least one million deaths annually, with 80% of malaria deaths occurring in sub-Saharan Africa. The climate has a large impact upon the incidence of vector-borne diseases: directly, via the development rates and survival of both the pathogen and the vector, and indirectly, through changes in environmental conditions. A large ensemble of regional climate model simulations has been produced within the ENSEMBLES project framework for both the European and African continents. This work will present recent progress in human and animal disease modelling, based on high-resolution climate observations and regional climate simulations. Preliminary results will be given as an illustration, including the impact of climate change upon bluetongue (a disease affecting cattle) over Europe and upon malaria and Rift Valley Fever over Africa. Malaria scenarios based on RCM ensemble simulations have been produced for West Africa, carried out using the Liverpool Malaria Model. Future projections highlight that malaria incidence decreases at the northern edge of the Sahel and that the epidemic belt shifts southward in autumn. This could lead to significant public health problems in the future, as Africa's population is expected to rise dramatically over the 21st century.

  3. Exploiting ensemble learning for automatic cataract detection and grading.

    PubMed

    Yang, Ji-Jiang; Li, Jianqiang; Shen, Ruifang; Zeng, Yang; He, Jian; Bi, Jing; Li, Yong; Zhang, Qinyan; Peng, Lihui; Wang, Qing

    2016-02-01

    Cataract is defined as a lenticular opacity presenting usually with poor visual acuity. It is one of the most common causes of visual impairment worldwide. Early diagnosis demands the expertise of trained healthcare professionals, which may present a barrier to early intervention due to underlying costs. To date, studies reported in the literature utilize a single learning model for retinal image classification in grading cataract severity. We present an ensemble learning based approach as a means to improving diagnostic accuracy. Three independent feature sets, i.e., wavelet-, sketch-, and texture-based features, are extracted from each fundus image. For each feature set, two base learning models, i.e., Support Vector Machine and Back Propagation Neural Network, are built. Then, the ensemble methods, majority voting and stacking, are investigated to combine the multiple base learning models for final fundus image classification. Empirical experiments are conducted for cataract detection (two-class task, i.e., cataract or non-cataractous) and cataract grading (four-class task, i.e., non-cataractous, mild, moderate or severe) tasks. The best performance of the ensemble classifier is 93.2% and 84.5% in terms of the correct classification rates for cataract detection and grading tasks, respectively. The results demonstrate that the ensemble classifier outperforms the single learning model significantly, which also illustrates the effectiveness of the proposed approach. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  4. Proposed hybrid-classifier ensemble algorithm to map snow cover area

    NASA Astrophysics Data System (ADS)

    Nijhawan, Rahul; Raman, Balasubramanian; Das, Josodhir

    2018-01-01

    A metaclassification ensemble approach is known to improve prediction performance for snow-covered area. The methodology adopted here is based on a neural network along with four state-of-the-art machine learning algorithms (support vector machine, artificial neural networks, spectral angle mapper, and K-means clustering) and a snow index (normalized difference snow index). An AdaBoost ensemble algorithm based on decision trees for snow-cover mapping is also proposed. According to the available literature, these methods have rarely been used for snow-cover mapping. Employing the above techniques, a study was conducted for the Raktavarn and Chaturangi Bamak glaciers, Uttarakhand, Himalaya, using a multispectral Landsat 7 ETM+ (enhanced thematic mapper) image. The study also compares the results with those obtained from statistical combination methods (majority rule and belief functions) and the accuracies of the individual classifiers. Accuracy assessment is performed by computing the quantity and allocation disagreement, analyzing statistical measures (accuracy, precision, specificity, AUC, and sensitivity) and receiver operating characteristic curves. A total of 225 parameter combinations for the individual classifiers were trained and tested on the dataset and the results were compared with the proposed approach. The proposed methodology produced the highest classification accuracy (95.21%), close to that (94.01%) produced by the proposed AdaBoost ensemble algorithm. From these observations, it was concluded that the ensemble of classifiers produced better results than the individual classifiers.

  5. Linear Reconstruction of Non-Stationary Image Ensembles Incorporating Blur and Noise Models

    DTIC Science & Technology

    1998-03-01

    Compensation for phase distortions due to noise leads to less deblurring as noise increases [41]. In contrast, the vector Wiener filter incorporates a priori blur and noise models into the linear reconstruction of non-stationary image ensembles (AFIT/DS/ENG/98-06, dissertation, Stephen D. Ford, Captain, USAF).

  6. Ensemble of One-Class Classifiers for Personal Risk Detection Based on Wearable Sensor Data.

    PubMed

    Rodríguez, Jorge; Barrera-Animas, Ari Y; Trejo, Luis A; Medina-Pérez, Miguel Angel; Monroy, Raúl

    2016-09-29

    This study introduces the One-Class K-means with Randomly-projected features Algorithm (OCKRA). OCKRA is an ensemble of one-class classifiers built over multiple projections of a dataset according to random feature subsets. Algorithms found in the literature cover a wide range of applications where ensembles of one-class classifiers have been satisfactorily applied; however, none is oriented to the area under our study: personal risk detection. OCKRA has been designed with the aim of improving detection performance on the problem posed by the Personal RIsk DEtection (PRIDE) dataset. PRIDE was built from 23 test subjects, where the data for each user were captured using a set of sensors embedded in a wearable band. The performance of OCKRA was compared against a support vector machine and three versions of the Parzen window classifier. On average, experimental results show that OCKRA outperformed the other classifiers by at least 0.53% in area under the curve (AUC). In addition, OCKRA achieved an AUC above 90% for more than 57% of the users.

  7. Ensemble of One-Class Classifiers for Personal Risk Detection Based on Wearable Sensor Data

    PubMed Central

    Rodríguez, Jorge; Barrera-Animas, Ari Y.; Trejo, Luis A.; Medina-Pérez, Miguel Angel; Monroy, Raúl

    2016-01-01

    This study introduces the One-Class K-means with Randomly-projected features Algorithm (OCKRA). OCKRA is an ensemble of one-class classifiers built over multiple projections of a dataset according to random feature subsets. Algorithms found in the literature cover a wide range of applications where ensembles of one-class classifiers have been satisfactorily applied; however, none is oriented to the area under our study: personal risk detection. OCKRA has been designed with the aim of improving detection performance on the problem posed by the Personal RIsk DEtection (PRIDE) dataset. PRIDE was built from 23 test subjects, where the data for each user were captured using a set of sensors embedded in a wearable band. The performance of OCKRA was compared against a support vector machine and three versions of the Parzen window classifier. On average, experimental results show that OCKRA outperformed the other classifiers by at least 0.53% in area under the curve (AUC). In addition, OCKRA achieved an AUC above 90% for more than 57% of the users. PMID:27690054

  8. PSOFuzzySVM-TMH: identification of transmembrane helix segments using ensemble feature space by incorporated fuzzy support vector machine.

    PubMed

    Hayat, Maqsood; Tahir, Muhammad

    2015-08-01

    Membrane proteins are central components of the cell that manage intra- and extracellular processes, executing a diversity of functions that are vital for the survival of organisms. The topology of a transmembrane protein describes the number of transmembrane (TM) helix segments and their orientation. However, owing to the lack of recognized structures, identifying TM helices and their topology through experimental methods is laborious and low-throughput. In order to identify TM helix segments reliably, accurately, and effectively from topogenic sequences, we propose the PSOFuzzySVM-TMH model. In this model, evolutionary information (the position-specific scoring matrix) and discrete information (the 6-letter exchange group) are used to formulate transmembrane protein sequences. Noisy and extraneous attributes are removed from both feature spaces using particle swarm optimization as a feature selection technique. Finally, the selected feature spaces are combined to form an ensemble feature space. A fuzzy support vector machine is utilized as the classification algorithm. Two benchmark datasets, comprising low- and high-resolution data, are used. The performance of the PSOFuzzySVM-TMH model is assessed at various levels through a 10-fold cross-validation test. The empirical results reveal that the proposed framework outperforms others in terms of classification performance on the examined datasets. The proposed model may thus be a useful, high-throughput tool for the academic and research community for further structural and functional studies of transmembrane proteins.

  9. Bayesian Hierarchical Model Characterization of Model Error in Ocean Data Assimilation and Forecasts

    DTIC Science & Technology

    2013-09-30

    Proof-of-concept results compare a BHM surface wind ensemble with the increments in the surface momentum flux control vector in a four-dimensional variational (4dvar) assimilation system. Surface momentum flux ensembles are derived from summaries of BHM winds over the Mediterranean, including ocean current effects, by conditioning surface stress on surface wind speed given ensemble winds from a Bayesian Hierarchical Model.

  10. Improving precision of glomerular filtration rate estimating model by ensemble learning.

    PubMed

    Liu, Xun; Li, Ningshan; Lv, Linsheng; Fu, Yongmei; Cheng, Cailian; Wang, Caixia; Ye, Yuqiu; Li, Shaomin; Lou, Tanqi

    2017-11-09

    Accurate assessment of kidney function is clinically important, but estimates of glomerular filtration rate (GFR) by regression are imprecise. We hypothesized that ensemble learning could improve precision. A total of 1419 participants were enrolled, with 1002 in the development dataset and 417 in the external validation dataset. GFR was independently estimated from age, sex and serum creatinine using an artificial neural network (ANN), support vector machine (SVM), regression, and ensemble learning. GFR was measured by 99mTc-DTPA renal dynamic imaging calibrated with dual plasma sample 99mTc-DTPA GFR. Mean measured GFRs were 70.0 ml/min/1.73 m² in the development and 53.4 ml/min/1.73 m² in the external validation cohorts. In the external validation cohort, precision was better in the ensemble model of the ANN, SVM and regression equation (IQR = 13.5 ml/min/1.73 m²) than in the new regression model (IQR = 14.0 ml/min/1.73 m², P < 0.001). The precision of ensemble learning was the best of the three models, but the models had similar bias and accuracy. The median difference ranged from 2.3 to 3.7 ml/min/1.73 m², 30% accuracy ranged from 73.1 to 76.0%, and P was > 0.05 for all comparisons of the new regression equation and the other new models. An ensemble learning model including three variables, the average ANN, SVM, and regression equation values, was more precise than the new regression model. A more complex ensemble learning strategy may further improve GFR estimates.

  11. NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

    PubMed Central

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available. PMID:24667482

  12. IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids.

    PubMed

    Ali, Safdar; Majid, Abdul; Khan, Asifullah

    2014-04-01

    Development of accurate and reliable intelligent decision-making methods for the construction of cancer diagnosis systems is one of the fast-growing research areas of health sciences. Such decision-making systems can provide adequate information for cancer diagnosis and drug discovery. Descriptors derived from the physicochemical properties of protein sequences are very useful for classifying cancerous proteins. Recently, several interesting research studies on breast cancer classification have been reported. To this end, we propose exploiting the physicochemical properties of amino acids in protein primary sequences, such as hydrophobicity (Hd) and hydrophilicity (Hb), for breast cancer classification. The Hd and Hb properties of amino acids are reported in the recent literature to be quite effective in characterizing the constituent amino acids, and are used to study protein folding, interactions, structures, and sequence-order effects. Using these physicochemical properties, we observed that the proline, serine, tyrosine, cysteine, arginine, and asparagine amino acids offer high discrimination between cancerous and healthy proteins. In addition, unlike traditional ensemble classification approaches, the proposed 'IDM-PhyChm-Ens' method was developed by combining the decision spaces of a specific classifier trained on different feature spaces. The feature spaces used were amino acid composition, split amino acid composition, and pseudo amino acid composition. Consequently, we exploited different feature spaces using the Hd and Hb properties of amino acids to develop an accurate method for the classification of cancerous protein sequences. We developed ensemble classifiers using diverse learning algorithms, namely random forest (RF), support vector machines (SVM), and K-nearest neighbor (KNN), trained on different feature spaces. We observed that, in the case of cancer classification, ensemble-RF performed better than ensemble-SVM and ensemble-KNN. Our analysis demonstrates that ensemble-RF, ensemble-SVM and ensemble-KNN are more effective than their individual counterparts. The proposed 'IDM-PhyChm-Ens' method shows improved performance compared to existing techniques.

  13. DART: A Community Facility Providing State-of-the-Art, Efficient Ensemble Data Assimilation for Large (Coupled) Geophysical Models

    NASA Astrophysics Data System (ADS)

    Hoar, T. J.; Anderson, J. L.; Collins, N.; Kershaw, H.; Hendricks, J.; Raeder, K.; Mizzi, A. P.; Barré, J.; Gaubert, B.; Madaus, L. E.; Aydogdu, A.; Raeder, J.; Arango, H.; Moore, A. M.; Edwards, C. A.; Curchitser, E. N.; Escudier, R.; Dussin, R.; Bitz, C. M.; Zhang, Y. F.; Shrestha, P.; Rosolem, R.; Rahman, M.

    2016-12-01

    Strongly-coupled ensemble data assimilation with multiple high-resolution model components requires massive state vectors which need to be efficiently stored and accessed throughout the assimilation process. Supercomputer architectures are tending towards increasing the number of cores per node but have the same or less memory per node. Recent advances in the Data Assimilation Research Testbed (DART), a freely-available community ensemble data assimilation facility that works with dozens of large geophysical models, have addressed the need to run with a smaller memory footprint on a higher node count by utilizing MPI-2 one-sided communication to do non-blocking asynchronous access of distributed data. DART runs efficiently on many computational platforms ranging from laptops through thousands of cores on the newest supercomputers. Benefits of the new DART implementation will be shown. In addition, overviews of the most recently supported models will be presented: CAM-CHEM, WRF-CHEM, CM1, OpenGGCM, FESOM, ROMS, CICE5, TerrSysMP (COSMO, CLM, ParFlow), JULES, and CABLE. DART provides a comprehensive suite of software, documentation, and tutorials that can be used for ensemble data assimilation research, operations, and education. Scientists and software engineers at NCAR are available to support DART users who want to use existing DART products or develop their own applications. Current DART users range from university professors teaching data assimilation, to individual graduate students working with simple models, through national laboratories and state agencies doing operational prediction with large state-of-the-art models.
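
    DART itself is a Fortran-based facility, so the following is only a conceptual sketch, in Python with mpi4py, of the one-sided (RMA) access pattern described above: each rank exposes its slice of a distributed state vector in a window, and other ranks read it without the owner's participation. Run under mpiexec with at least two ranks.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank exposes its slice of the distributed state vector in an RMA window.
local_state = np.full(4, float(rank))
win = MPI.Win.Create(local_state, comm=comm)

# One-sided access: read a remote slice without the owner participating,
# so no task ever needs a full copy of the massive state vector.
remote = np.empty(4)
target = (rank + 1) % size
win.Lock(target, MPI.LOCK_SHARED)
win.Get(remote, target)
win.Unlock(target)
win.Free()
```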

  14. A transposase strategy for creating libraries of circularly permuted proteins.

    PubMed

    Mehta, Manan M; Liu, Shirley; Silberg, Jonathan J

    2012-05-01

    A simple approach for creating libraries of circularly permuted proteins is described that is called PERMutation Using Transposase Engineering (PERMUTE). In PERMUTE, the transposase MuA is used to randomly insert a minitransposon that can function as a protein expression vector into a plasmid that contains the open reading frame (ORF) being permuted. A library of vectors that express different permuted variants of the ORF-encoded protein is created by: (i) using bacteria to select for target vectors that acquire an integrated minitransposon; (ii) excising the ensemble of ORFs that contain an integrated minitransposon from the selected vectors; and (iii) circularizing the ensemble of ORFs containing integrated minitransposons using intramolecular ligation. Construction of a Thermotoga neapolitana adenylate kinase (AK) library using PERMUTE revealed that this approach produces vectors that express circularly permuted proteins with distinct sequence diversity from existing methods. In addition, selection of this library for variants that complement the growth of Escherichia coli with a temperature-sensitive AK identified functional proteins with novel architectures, suggesting that PERMUTE will be useful for the directed evolution of proteins with new functions.

  15. A transposase strategy for creating libraries of circularly permuted proteins

    PubMed Central

    Mehta, Manan M.; Liu, Shirley; Silberg, Jonathan J.

    2012-01-01

    A simple approach for creating libraries of circularly permuted proteins is described that is called PERMutation Using Transposase Engineering (PERMUTE). In PERMUTE, the transposase MuA is used to randomly insert a minitransposon that can function as a protein expression vector into a plasmid that contains the open reading frame (ORF) being permuted. A library of vectors that express different permuted variants of the ORF-encoded protein is created by: (i) using bacteria to select for target vectors that acquire an integrated minitransposon; (ii) excising the ensemble of ORFs that contain an integrated minitransposon from the selected vectors; and (iii) circularizing the ensemble of ORFs containing integrated minitransposons using intramolecular ligation. Construction of a Thermotoga neapolitana adenylate kinase (AK) library using PERMUTE revealed that this approach produces vectors that express circularly permuted proteins with distinct sequence diversity from existing methods. In addition, selection of this library for variants that complement the growth of Escherichia coli with a temperature-sensitive AK identified functional proteins with novel architectures, suggesting that PERMUTE will be useful for the directed evolution of proteins with new functions. PMID:22319214

  16. Automated detection of pulmonary nodules in CT images with support vector machines

    NASA Astrophysics Data System (ADS)

    Liu, Lu; Liu, Wanyu; Sun, Xiaoming

    2008-10-01

    Many methods have been proposed to prevent radiologists from failing to diagnose small pulmonary nodules. Recently, support vector machines (SVMs) have received increasing attention for pattern recognition. In this paper, we present a computerized system aimed at pulmonary nodule detection; it identifies the lung field, extracts a set of candidate regions with a high sensitivity ratio and then classifies the candidates by the use of SVMs. The Computer Aided Diagnosis (CAD) system presented in this paper supports the diagnosis of pulmonary nodules from Computed Tomography (CT) images as inflammation, tuberculoma, granuloma, sclerosing hemangioma, and malignant tumor. Five texture feature sets were extracted for each lesion, while a genetic algorithm based feature selection method was applied to identify the most robust features. The selected feature set was fed into an ensemble of SVM classifiers. The achieved classification performance was 100%, 92.75% and 90.23% in the training, validation and testing sets, respectively. It is concluded that computerized analysis of medical images in combination with artificial intelligence can be used in clinical practice and may contribute to more efficient diagnosis.
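
    A minimal sketch of the final classification stage, assuming the texture features and GA-based selection have already been applied (random stand-in data below): an ensemble of SVMs built by bagging, evaluated by cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Random stand-in for the selected texture features of candidate regions.
X, y = make_classification(n_samples=300, n_features=25, random_state=0)

# Ensemble of SVMs: each member is trained on a bootstrap sample.
clf = make_pipeline(StandardScaler(),
                    BaggingClassifier(SVC(kernel="rbf", C=10.0),
                                      n_estimators=15, random_state=0))
print(cross_val_score(clf, X, y, cv=5).mean())
```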

  17. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), to prevent delay and misdiagnosis of patients, using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods applied to PD diagnosis include sparse multinomial logistic regression, a rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method, comprising a Bayesian network optimised by the Tabu search algorithm as the classifier and Haar wavelets as the projection filter, is used for relevant feature selection and ranking. The highest accuracy, obtained by linear logistic regression and sparse multinomial logistic regression, is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All the experiments are conducted at 95% and 99% confidence levels and the results are established with corrected t-tests. This work shows a high degree of advancement in the software reliability and quality of the computer-aided diagnosis system and experimentally shows the best results with supporting statistical inference.

  18. SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease.

    PubMed

    Ozcift, Akin

    2012-08-01

    Parkinson disease (PD) is an age-related deterioration of certain nerve systems which affects the movement, balance, and muscle control of patients. PD is one of the common diseases, affecting 1% of people older than 60 years. A new classification scheme based on support vector machine (SVM) selected features to train rotation forest (RF) ensemble classifiers is presented for improving the diagnosis of PD. The dataset contains records of voice measurements from 31 people, 23 of whom have PD, and each record in the dataset is defined by 22 features. The diagnosis model first makes use of a linear SVM to select the ten most relevant of the 22 features. As the second step of the classification model, six different classifiers are trained with the subset of features. Subsequently, at the third step, the accuracies of the classifiers are improved by the utilization of the RF ensemble classification strategy. The results of the experiments are evaluated using three metrics: classification accuracy (ACC), Kappa Error (KE) and Area under the Receiver Operating Characteristic (ROC) Curve (AUC). Performance measures of two base classifiers, i.e. KStar and IBk, demonstrated an apparent increase in PD diagnosis accuracy compared to similar studies in the literature. Overall, the application of the RF ensemble classification scheme significantly improved PD diagnosis for 5 of the 6 classifiers. Numerically, we obtained about 97% accuracy with the RF ensemble of the IBk algorithm (a K-nearest neighbor variant), which is quite high performance for Parkinson disease diagnosis.
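
    The first two steps of this scheme map naturally onto scikit-learn, as in the hedged sketch below (synthetic stand-in data; the rotation forest wrapper of the third step has no scikit-learn implementation and is omitted): a linear SVM ranks the 22 features, the ten most relevant are retained, and a KNN-style base classifier is trained on the subset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for 195 voice records with 22 features each.
X, y = make_classification(n_samples=195, n_features=22, random_state=0)

# Step 1: rank features with a linear SVM and keep the ten most relevant.
selector = SelectFromModel(LinearSVC(C=0.5, dual=False),
                           threshold=-np.inf, max_features=10)
# Step 2: train a base classifier (KNN, standing in for IBk) on the subset.
model = make_pipeline(StandardScaler(), selector, KNeighborsClassifier())
print(cross_val_score(model, X, y, cv=10).mean())
```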

  19. Short-Circuit Fault Detection and Classification Using Empirical Wavelet Transform and Local Energy for Electric Transmission Line.

    PubMed

    Huang, Nantian; Qi, Jiajin; Li, Fuqing; Yang, Dongfeng; Cai, Guowei; Huang, Guilin; Zheng, Jian; Li, Zhenxin

    2017-09-16

    In order to improve the classification accuracy of recognizing short-circuit faults in electric transmission lines, a novel detection and diagnosis method based on the empirical wavelet transform (EWT) and local energy (LE) is proposed. First, EWT is used to process the original short-circuit fault signals from photoelectric voltage transformers, before the amplitude modulated-frequency modulated (AM-FM) mode with a compactly supported Fourier spectrum is extracted. Subsequently, the fault occurrence time is detected according to the modulus maxima of the intrinsic mode function (IMF₂) from the three-phase voltage signals processed by EWT. After this process, the feature vectors are constructed by calculating the LE of the fundamental frequency based on the three-phase voltage signals of one period after the fault occurred. Finally, the support vector machine (SVM) classifier constructed with the LE feature vectors is used to classify 10 types of short-circuit fault signals. Compared with complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and improved CEEMDAN methods, the new method using EWT has a better ability to represent the frequency content in time. The difference in the characteristics of the energy distribution in the time domain between different types of short-circuit faults can be captured by the LE feature vectors. Together, simulations and experiments on real signals demonstrate the validity and effectiveness of the new approach.
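
    Local energy, as used here, is simply the energy of a signal segment. A small sketch under stated assumptions (toy sinusoids stand in for the EWT-processed three-phase voltages; fs and f0 are illustrative values):

```python
import numpy as np

def local_energy(segment):
    """Local energy of a signal segment: the sum of its squared samples."""
    return float(np.sum(np.asarray(segment, dtype=float) ** 2))

fs, f0 = 10_000, 50                  # illustrative sampling rate and fundamental
t = np.arange(fs // f0) / fs         # one fundamental period after the fault
phases = [np.sin(2 * np.pi * f0 * t + p) for p in (0.0, -2.1, 2.1)]

# Feature vector: one LE value per phase, later fed to the SVM classifier.
features = np.array([local_energy(v) for v in phases])
```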

  20. Short-Circuit Fault Detection and Classification Using Empirical Wavelet Transform and Local Energy for Electric Transmission Line

    PubMed Central

    Huang, Nantian; Qi, Jiajin; Li, Fuqing; Yang, Dongfeng; Cai, Guowei; Huang, Guilin; Zheng, Jian; Li, Zhenxin

    2017-01-01

    In order to improve the classification accuracy of recognizing short-circuit faults in electric transmission lines, a novel detection and diagnosis method based on the empirical wavelet transform (EWT) and local energy (LE) is proposed. First, EWT is used to process the original short-circuit fault signals from photoelectric voltage transformers, before the amplitude modulated-frequency modulated (AM-FM) mode with a compactly supported Fourier spectrum is extracted. Subsequently, the fault occurrence time is detected according to the modulus maxima of the intrinsic mode function (IMF2) from the three-phase voltage signals processed by EWT. After this process, the feature vectors are constructed by calculating the LE of the fundamental frequency based on the three-phase voltage signals of one period after the fault occurred. Finally, the support vector machine (SVM) classifier constructed with the LE feature vectors is used to classify 10 types of short-circuit fault signals. Compared with complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and improved CEEMDAN methods, the new method using EWT has a better ability to represent the frequency content in time. The difference in the characteristics of the energy distribution in the time domain between different types of short-circuit faults can be captured by the LE feature vectors. Together, simulations and experiments on real signals demonstrate the validity and effectiveness of the new approach. PMID:28926953

  1. Optical vector network analysis of ultranarrow transitions in 166Er3+ : 7LiYF4 crystal.

    PubMed

    Kukharchyk, N; Sholokhov, D; Morozov, O; Korableva, S L; Cole, J H; Kalachev, A A; Bushev, P A

    2018-02-15

    We present optical vector network analysis (OVNA) of an isotopically purified 166Er3+:7LiYF4 crystal. The OVNA method is based on the generation and detection of a modulated optical sideband using a radio-frequency vector network analyzer. This technique is widely used in the field of microwave photonics for characterizing the responses of optical devices such as filters and high-Q resonators. However, dense solid-state atomic ensembles induce a large phase shift on one of the optical sidebands, which results in the appearance of extra features in the measured transmission response. We present a simple theoretical model that accurately describes the observed spectra and helps to reconstruct the absorption profile of a solid-state atomic ensemble as well as the corresponding change of the refractive index in the vicinity of atomic resonances.

  2. Ensemble of shape functions and support vector machines for the estimation of discrete arm muscle activation from external biceps 3D point clouds.

    PubMed

    Abraham, Leandro; Bromberg, Facundo; Forradellas, Raymundo

    2018-04-01

    Muscle activation level is currently being captured using impractical and expensive devices which make their use in telemedicine settings extremely difficult. To address this issue, a prototype is presented of a non-invasive, easy-to-install system for the estimation of a discrete level of muscle activation of the biceps muscle from 3D point clouds captured with RGB-D cameras. A methodology is proposed that uses the ensemble of shape functions point cloud descriptor for the geometric characterization of 3D point clouds, together with support vector machines to learn a classifier that, based on this geometric characterization for some points of view of the biceps, provides a model for the estimation of muscle activation for all neighboring points of view. This results in a classifier that is robust to small perturbations in the point of view of the capturing device, greatly simplifying the installation process for end-users. In the discrimination of five levels of effort with values up to the maximum voluntary contraction (MVC) of the biceps muscle (3800 g), the best variant of the proposed methodology achieved mean absolute errors of about 9.21% MVC - an acceptable performance for telemedicine settings where the electric measurement of muscle activation is impractical. The results prove that the correlations between the external geometry of the arm and biceps muscle activation are strong enough to consider computer vision and supervised learning an alternative with great potential for practical applications in tele-physiotherapy. Copyright © 2018 Elsevier Ltd. All rights reserved.
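
    The ESF descriptor aggregates histograms of several shape functions; as a hedged stand-in, the sketch below uses only one such function (a D2-style pairwise-distance histogram) and feeds it to an SVM. All clouds and labels are synthetic placeholders for RGB-D captures of the arm.

```python
import numpy as np
from sklearn.svm import SVC

def d2_descriptor(cloud, n_pairs=2000, bins=32, seed=0):
    """Histogram of random pairwise point distances: a crude stand-in for
    the much richer ensemble-of-shape-functions (ESF) descriptor."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(cloud), size=(n_pairs, 2))
    d = np.linalg.norm(cloud[idx[:, 0]] - cloud[idx[:, 1]], axis=1)
    hist, _ = np.histogram(d, bins=bins, range=(0.0, d.max() + 1e-9))
    return hist / hist.sum()

# Synthetic point clouds labelled with one of five discrete effort levels.
rng = np.random.default_rng(1)
clouds = [rng.normal(scale=1 + 0.1 * (k % 5), size=(500, 3)) for k in range(50)]
levels = np.array([k % 5 for k in range(50)])
X = np.array([d2_descriptor(c) for c in clouds])
clf = SVC(kernel="rbf").fit(X, levels)
```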

  3. Detecting bladder fullness through the ensemble activity patterns of the spinal cord unit population in a somatovisceral convergence environment.

    PubMed

    Park, Jae Hong; Kim, Chang-Eop; Shin, Jaewoo; Im, Changkyun; Koh, Chin Su; Seo, In Seok; Kim, Sang Jeong; Shin, Hyung-Cheul

    2013-10-01

    Chronic monitoring of the state of the bladder can be used to notify patients with urinary dysfunction when the bladder should be voided. Given that many spinal neurons respond both to somatic and visceral inputs, it is necessary to extract bladder information selectively from the spinal cord. Here, we hypothesize that sensory information with distinct modalities should be represented by the distinct ensemble activity patterns within the neuronal population and, therefore, analyzing the activity patterns of the neuronal population could distinguish bladder fullness from somatic stimuli. We simultaneously recorded 26-27 single unit activities in response to bladder distension or tactile stimuli in the dorsal spinal cord of each Sprague-Dawley rat. In order to discriminate between bladder fullness and tactile stimulus inputs, we analyzed the ensemble activity patterns of the entire neuronal population. A support vector machine (SVM) was employed as a classifier, and discrimination performance was measured by k-fold cross-validation tests. Most of the units responding to bladder fullness also responded to the tactile stimuli (88.9-100%). The SVM classifier precisely distinguished the bladder fullness from the somatic input (100%), indicating that the ensemble activity patterns of the unit population in the spinal cord are distinct enough to identify the current input modality. Moreover, our ensemble activity pattern-based classifier showed high robustness against random losses of signals. This study is the first to demonstrate that the two main issues of electroneurographic monitoring of bladder fullness, low signals and selectiveness, can be solved by an ensemble activity pattern-based approach, improving the feasibility of chronic monitoring of bladder fullness by neural recording.
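
    A minimal sketch of the decoding analysis described above, with Poisson-noise stand-ins for the recorded unit activities: an SVM is cross-validated on population firing-rate vectors, and robustness to random signal loss is probed by zeroing out a few units.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy firing-rate vectors for 26 units: tactile trials (0) vs bladder (1).
rng = np.random.default_rng(0)
X = np.vstack([rng.poisson(5, (60, 26)), rng.poisson(7, (60, 26))]).astype(float)
y = np.repeat([0, 1], 60)

# k-fold cross-validated discrimination from the population pattern.
print(cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean())

# Robustness to random signal loss: silence a few units and re-test.
X_drop = X.copy()
X_drop[:, rng.choice(26, size=5, replace=False)] = 0.0
print(cross_val_score(SVC(kernel="linear"), X_drop, y, cv=5).mean())
```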

  4. Ensemble forecasting of potential habitat for three invasive fishes

    USGS Publications Warehouse

    Poulos, Helen M.; Chernoff, Barry; Fuller, Pam L.; Butman, David

    2012-01-01

    Aquatic invasive species pose major ecological and economic threats to aquatic ecosystems worldwide via displacement, predation, or hybridization with native species and the alteration of aquatic habitats and hydrologic cycles. Modeling the habitat suitability of alien aquatic species through spatially explicit mapping is an increasingly important risk assessment tool. Habitat modeling also facilitates identification of key environmental variables influencing invasive species distributions. We compared four modeling methods to predict the potential continental United States distributions of northern snakehead Channa argus (Cantor, 1842), round goby Neogobius melanostomus (Pallas, 1814), and silver carp Hypophthalmichthys molitrix (Valenciennes, 1844) using maximum entropy (Maxent), the genetic algorithm for rule set production (GARP), DOMAIN, and support vector machines (SVM). We used inventory records from the USGS Nonindigenous Aquatic Species Database and a geographic information system of 20 climatic and environmental variables to generate individual and ensemble distribution maps for each species. The ensemble maps from our study performed as well as or better than all of the individual models except Maxent. The ensemble and Maxent models produced significantly higher accuracy individual maps than GARP, one-class SVMs, or DOMAIN. The key environmental predictor variables in the individual models were consistent with the tolerances of each species. Results from this study provide insights into which locations and environmental conditions may promote the future spread of invasive fish in the US.

  5. A mutual information-Dempster-Shafer based decision ensemble system for land cover classification of hyperspectral data

    NASA Astrophysics Data System (ADS)

    Pahlavani, Parham; Bigdeli, Behnaz

    2017-12-01

    Hyperspectral images contain extremely rich spectral information that offers great potential to discriminate between various land cover classes. However, these images are usually composed of tens or hundreds of spectrally close bands, which results in high redundancy and a great amount of computation time in hyperspectral classification. Furthermore, in the presence of mixed-coverage pixels, crisp classifiers produce omission and commission errors. This paper presents a mutual information-Dempster-Shafer system through an ensemble classification approach for the classification of hyperspectral data. First, mutual information is applied to split the data into a few independent partitions to overcome the high dimensionality. Then, a fuzzy maximum likelihood classifier is applied to each band subset. Finally, Dempster-Shafer is applied to fuse the results of the fuzzy classifiers. In order to assess the proposed method, a crisp ensemble system, based on a support vector machine as the crisp classifier and weighted majority voting as the crisp fusion method, is applied to the hyperspectral data. Furthermore, a dimension reduction system is utilized to assess the effectiveness of the mutual-information band splitting used in the proposed method. The proposed methodology provides interesting conclusions on the effectiveness and potential of mutual information-Dempster-Shafer based classification of hyperspectral data.
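
    For classifier outputs restricted to singleton classes, Dempster's rule reduces to a normalized product of the per-class masses. A hedged sketch (the paper's fuzzy maximum likelihood outputs are replaced by toy mass vectors):

```python
import numpy as np

def dempster_combine(m1, m2):
    """Dempster's rule for basic probability assignments restricted to
    singleton classes: agreement masses are multiplied, the conflict mass
    is discarded, and the result is renormalised."""
    joint = m1 * m2                  # agreement mass per class
    conflict = 1.0 - joint.sum()     # mass assigned to conflicting pairs
    return joint / (1.0 - conflict)

# Toy per-class masses from two fuzzy classifiers on different band subsets.
m_subset1 = np.array([0.6, 0.3, 0.1])
m_subset2 = np.array([0.5, 0.4, 0.1])
print(dempster_combine(m_subset1, m_subset2))   # fused land-cover belief
```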

  6. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis.

    PubMed

    You, Zhu-Hong; Lei, Ying-Ke; Zhu, Lin; Xia, Junfeng; Wang, Bing

    2013-01-01

    Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although a large amount of PPI data for different species has been generated by high-throughput experimental techniques, the PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and furthermore, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequence information. Focusing on dimension reduction, an effective feature extraction method, PCA, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machines removes the dependence of the results on the initial random weights and improves the prediction performance. When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at a precision of 87.59%. Extensive experiments were performed to compare our method with the state-of-the-art technique Support Vector Machine (SVM). Experimental results demonstrate that the proposed PCA-EELM outperforms the SVM-based method under 5-fold cross-validation. Besides, PCA-EELM runs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a promising and powerful new tool for predicting PPIs with excellent performance in less time.
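
    An extreme learning machine is a single-hidden-layer network whose hidden weights are random and whose readout is solved by ridge regression; ensembling by majority vote removes the dependence on the random weights. The sketch below is a minimal stand-in (random data, PCA step omitted), not the authors' EELM code.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random hidden layer, ridge readout."""
    def __init__(self, n_hidden=64, reg=1e-2, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)   # ridge-regularised readout
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta > 0.5).astype(int)

# Majority voting over ELMs with different random hidden weights.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 30)), rng.integers(0, 2, 200)
votes = np.stack([ELM(seed=s).fit(X, y).predict(X) for s in range(9)])
consensus = (votes.mean(axis=0) >= 0.5).astype(int)
```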

  7. A mapping of an ensemble of mitochondrial sequences for various organisms into 3D space based on the word composition.

    PubMed

    Aita, Takuyo; Nishigaki, Koichi

    2012-11-01

    To visualize a bird's-eye view of an ensemble of mitochondrial genome sequences for various species, we recently developed a novel method of mapping a biological sequence ensemble into Three-Dimensional (3D) vector space. First, we represented a biological sequence of a species s by a word-composition vector x(s), where its length |x(s)| represents the sequence length, its unit vector x(s)/|x(s)| represents the relative composition of the K-tuple words through the sequence, and the dimension N = 4^K is the number of all possible words of length K. Second, we mapped the vector x(s) to the 3D position vector y(s), based on the following two simple principles: (1) |y(s)| = |x(s)|, and (2) the angle between y(s) and y(t) maximally correlates with the angle between x(s) and x(t). The mitochondrial genome sequences for 311 species, including 177 Animalia, 85 Fungi and 49 Green plants, were mapped into 3D space by using K=7. The mapping was successful because the angles between vectors before and after the mapping highly correlated with each other (correlation coefficients were 0.92-0.97). Interestingly, the Animalia kingdom is distributed along a single arc belt (just like the Milky Way on a Celestial Globe), and the Fungi and Green plant kingdoms are distributed in a similar arc belt. These two arc belts intersect at their respective middle regions and form a cross structure just like a jet aircraft fuselage and its wings. This new mapping method will allow researchers to intuitively interpret the visual information presented in the maps in a highly effective manner. Copyright © 2012 Elsevier Inc. All rights reserved.
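
    A hedged sketch of the representation: the word-composition vector, plus a simple 3D embedding that preserves each vector's norm exactly while using PCA on the unit vectors as a crude surrogate for the paper's angle-correlation optimisation. Sequences below are random DNA strings with K=3 rather than the paper's K=7.

```python
import numpy as np
from itertools import product

def composition_vector(seq, k=3):
    """Counts of all 4**k words: the norm encodes sequence length and the
    direction encodes relative word composition."""
    index = {"".join(w): i for i, w in enumerate(product("ACGT", repeat=k))}
    x = np.zeros(4 ** k)
    for i in range(len(seq) - k + 1):
        x[index[seq[i:i + k]]] += 1
    return x

rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=rng.integers(500, 1500)))
        for _ in range(20)]
X = np.array([composition_vector(s) for s in seqs])

# 3D embedding: preserve each norm exactly; use PCA on the unit vectors as
# a crude surrogate for the paper's angle-correlation optimisation.
norms = np.linalg.norm(X, axis=1, keepdims=True)
U = X / norms - (X / norms).mean(axis=0)
_, _, Vt = np.linalg.svd(U, full_matrices=False)
Y3 = U @ Vt[:3].T
Y3 = Y3 / np.linalg.norm(Y3, axis=1, keepdims=True) * norms   # restore norms
```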

  8. Predicting Error Bars for QSAR Models

    NASA Astrophysics Data System (ADS)

    Schroeter, Timon; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-09-01

    Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the last months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and ensemble and distance based techniques for the other modelling approaches.
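
    The error-bar mechanism for the Gaussian Process case is built into scikit-learn: the posterior predictive standard deviation comes back alongside each prediction. A minimal sketch on synthetic stand-in descriptors (the record's in-house compound data are not public):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic descriptors and log D values standing in for the in-house set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X[:150], y[:150])

# The GP returns a predictive standard deviation with each prediction,
# i.e. a per-compound error bar.
mean, std = gp.predict(X[150:], return_std=True)
```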

  9. Spatial and temporal variation in the abundance of Culicoides biting midges (Diptera: Ceratopogonidae) in nine European countries.

    PubMed

    Cuéllar, Ana Carolina; Kjær, Lene Jung; Kirkeby, Carsten; Skovgard, Henrik; Nielsen, Søren Achim; Stockmarr, Anders; Andersson, Gunnar; Lindstrom, Anders; Chirico, Jan; Lühken, Renke; Steinke, Sonja; Kiel, Ellen; Gethmann, Jörn; Conraths, Franz J; Larska, Magdalena; Hamnes, Inger; Sviland, Ståle; Hopp, Petter; Brugger, Katharina; Rubel, Franz; Balenghien, Thomas; Garros, Claire; Rakotoarivony, Ignace; Allène, Xavier; Lhoir, Jonathan; Chavernac, David; Delécolle, Jean-Claude; Mathieu, Bruno; Delécolle, Delphine; Setier-Rio, Marie-Laure; Venail, Roger; Scheid, Bethsabée; Chueca, Miguel Ángel Miranda; Barceló, Carlos; Lucientes, Javier; Estrada, Rosa; Mathis, Alexander; Tack, Wesley; Bødker, Rene

    2018-02-27

    Biting midges of the genus Culicoides (Diptera: Ceratopogonidae) are vectors of bluetongue virus (BTV), African horse sickness virus and Schmallenberg virus (SBV). Outbreaks of both BTV and SBV have affected large parts of Europe. The spread of these diseases depends largely on vector distribution and abundance. The aim of this analysis was to identify and quantify major spatial patterns and temporal trends in the distribution and seasonal variation of observed Culicoides abundance in nine countries in Europe. We gathered existing Culicoides data from Spain, France, Germany, Switzerland, Austria, Denmark, Sweden, Norway and Poland. In total, 31,429 Culicoides trap collections were available from 904 ruminant farms across these countries between 2007 and 2013. The Obsoletus ensemble was distributed widely in Europe and accounted for 83% of all 8,842,998 Culicoides specimens in the dataset, with the highest mean monthly abundance recorded in France, Germany and southern Norway. The Pulicaris ensemble accounted for only 12% of the specimens and had a relatively southerly and easterly spatial distribution compared to the Obsoletus ensemble. Culicoides imicola Kieffer was only found in Spain and the southernmost part of France. There was a clear spatial trend in the accumulated annual abundance from southern to northern Europe, with the Obsoletus ensemble steadily increasing from 4000 per year in southern Europe to 500,000 in Scandinavia. The Pulicaris ensemble showed a very different pattern, with an increase in the accumulated annual abundance from 1600 in Spain, peaking at 41,000 in northern Germany and then decreasing again toward northern latitudes. For the two species ensembles and C. imicola, the season began between January and April, with later start dates and increasingly shorter vector seasons at more northerly latitudes. We present the first maps of seasonal Culicoides abundance in large parts of Europe covering a gradient from southern Spain to northern Scandinavia. The identified temporal trends and spatial patterns are useful for planning the allocation of resources for international prevention and surveillance programmes in the European Union.

  10. Per-field crop classification in irrigated agricultural regions in middle Asia using random forest and support vector machine ensemble

    NASA Astrophysics Data System (ADS)

    Löw, Fabian; Schorcht, Gunther; Michel, Ulrich; Dech, Stefan; Conrad, Christopher

    2012-10-01

    Accurate crop identification and crop area estimation are important for studies on irrigated agricultural systems, yield and water demand modeling, and agrarian policy development. In this study a novel combination of Random Forest (RF) and Support Vector Machine (SVM) classifiers is presented that (i) enhances crop classification accuracy and (ii) provides spatial information on map uncertainty. The methodology was implemented over four distinct irrigated sites in Middle Asia using RapidEye time series data. The RF feature importance statistic was used as a feature-selection strategy for the SVM to assess possible negative effects on classification accuracy caused by an oversized feature space. The results of the individual RF and SVM classifications were combined with rules based on posterior classification probability and estimates of classification probability entropy. SVM classification performance was increased by feature selection through RF. Further experimental results indicate that the hybrid classifier improves overall classification accuracy, as well as user's and producer's accuracy, in comparison to the single classifiers.
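
    A hedged sketch of the two ideas in this record, on synthetic data: random forest importances prune the feature space for the SVM, and the two classifiers' outputs are combined with a simple rule based on the entropy of their posterior class probabilities (a simplification of the paper's combination rules).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=60, n_informative=15,
                           n_classes=4, random_state=0)

# RF feature importances prune the oversized feature space for the SVM.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:20]
svm = SVC(probability=True, random_state=0).fit(X[:, top], y)

# Combine decisions: per pixel, trust the classifier whose posterior class
# distribution has the lower entropy (i.e. the more confident one).
p_rf, p_svm = rf.predict_proba(X), svm.predict_proba(X[:, top])
entropy = lambda p: -(p * np.log(p + 1e-12)).sum(axis=1)
use_rf = entropy(p_rf) < entropy(p_svm)
labels = np.where(use_rf, p_rf.argmax(axis=1), p_svm.argmax(axis=1))
```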

  11. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature

    PubMed Central

    Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

    2017-01-01

    Automatic extraction of protein-protein interaction (PPI) pairs from the biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches, such as the linear kernel, tree kernel, graph kernel and combinations of multiple kernels, have achieved promising results on the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction which exploits both syntactic (structural) and semantic vector information, known as the Distributed Smoothed Tree Kernel (DSTK). The DSTK comprises distributed trees carrying syntactic information along with distributional semantic vectors representing the semantic information of the sentences or phrases. To generate a robust machine learning model, a feature-based kernel and the DSTK were combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves a better F-score on all five corpora compared to other state-of-the-art systems. PMID:29099838
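
    Tree kernels are not available in scikit-learn, but the combination step can be sketched with precomputed kernel matrices: a convex combination of two valid kernels is again a valid kernel. The matrices below are computed from random stand-in vectors rather than parse trees and distributional vectors, and the 0.6/0.4 weights are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(100, 30))    # stand-in for tree-kernel features
X_sem = rng.normal(size=(100, 50))    # stand-in for semantic vectors
y = rng.integers(0, 2, 100)           # interacting pair: yes/no

# A convex combination of two valid kernels is itself a valid kernel.
K = 0.6 * rbf_kernel(X_syn) + 0.4 * linear_kernel(X_sem)
clf = SVC(kernel="precomputed").fit(K, y)
pred = clf.predict(K)   # at test time, pass K(test, train) instead
```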

  12. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature.

    PubMed

    Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar

    2017-01-01

    Automatic extraction of protein-protein interaction (PPI) pairs from the biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches, such as the linear kernel, tree kernel, graph kernel and combinations of multiple kernels, have achieved promising results on the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction which exploits both syntactic (structural) and semantic vector information, known as the Distributed Smoothed Tree Kernel (DSTK). The DSTK comprises distributed trees carrying syntactic information along with distributional semantic vectors representing the semantic information of the sentences or phrases. To generate a robust machine learning model, a feature-based kernel and the DSTK were combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves a better F-score on all five corpora compared to other state-of-the-art systems.

  13. Computer Based Behavioral Biometric Authentication via Multi-Modal Fusion

    DTIC Science & Technology

    2013-03-01

    ...the decisions made by each individual modality. Fusion of features is the simple concatenation of feature vectors from multiple modalities to be... [table fragment: classifier / feature selection / number of features - BayesNet / MDL / 330; LibSVM / PCA / 80; J48 / Wrapper Evaluator / 11] ...3.5.3 Ensemble Based Decision Level Fusion. In ensemble learning multiple... The high fusion percentages validate our hypothesis that by combining features from multiple modalities, classification accuracy can be improved. As...

  14. The cubic ternary complex receptor-occupancy model. III. resurrecting efficacy.

    PubMed

    Weiss, J M; Morgan, P H; Lutz, M W; Kenakin, T P

    1996-08-21

    Early work in pharmacology characterized the interaction of receptors and ligands in terms of two parameters, affinity and efficacy, an approach we term the bipartite view. A precise formulation of efficacy only exists for very simple pharmacological models. Here we extend the notion of efficacy to models that incorporate receptor activation and G-protein coupling. Using the cubic ternary complex model, we show that efficacy is not purely a property of the ligand-receptor interaction; it also depends upon the distributional details of the receptor species in the native receptor ensemble. This suggests a distinction between what we call potential efficacy (a vector) and realized efficacy (a scalar). To each receptor species in the native receptor ensemble we assign a part-worth utility; taken together these utilities comprise the potential efficacy vector. Realized efficacy is the expectation of these part-worth utilities with respect to the frequency distribution of receptor species in the native receptor ensemble. In the parlance of statistical decision theory, the binding of a ligand to a receptor ensemble is a random prospect and realized efficacy is the utility of this prospect. We explore the implications that our definition of efficacy has for understanding agonism and in assessing the legitimacy of the bipartite view in pharmacology.
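
    The definition of realized efficacy given above can be stated compactly. In the notation below (an editorial formalization, not taken verbatim from the paper), u_i is the part-worth utility of receptor species i and p_i its frequency in the native receptor ensemble:

```latex
% Realized efficacy as the expectation of the part-worth utilities:
% u_i is the utility of receptor species i, p_i its frequency in the
% native receptor ensemble.
\[
  \varepsilon_{\text{realized}}
    = \mathbb{E}_{p}[u]
    = \sum_{i} p_i\, u_i
    = \mathbf{p} \cdot \mathbf{u},
  \qquad \sum_{i} p_i = 1 .
\]
```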

  15. Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.

    PubMed

    Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott

    2009-03-01

    Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, the paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, an appropriate "feature vector" is first formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repeating this process yields water quality ensembles, and consequently the distribution and quantification of the variability. The main strengths of the approach are its flexibility, simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds, and to alkalinity and bromide at two other U.S. utilities.
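
    A minimal sketch of the resampler, assuming a decreasing 1/rank weight function (a common choice for K-nn bootstraps; the paper's exact kernel may differ). The library below is a random stand-in for the ICR records:

```python
import numpy as np

def knn_resample(feature, lib_X, lib_y, k=10, n_draws=1000, seed=0):
    """Weighted K-nn bootstrap: find the k library records closest to the
    feature vector and resample their observed values, weighting nearer
    neighbours more heavily (1/rank, a common choice of weight function)."""
    rng = np.random.default_rng(seed)
    nn = np.argsort(np.linalg.norm(lib_X - feature, axis=1))[:k]
    w = 1.0 / np.arange(1, k + 1)
    return rng.choice(lib_y[nn], size=n_draws, p=w / w.sum())

# Toy library standing in for the ICR records (explanatory variables plus
# an observed total-organic-carbon value per record).
rng = np.random.default_rng(1)
lib_X, lib_y = rng.normal(size=(500, 3)), rng.gamma(2.0, 2.0, size=500)
toc_ensemble = knn_resample(lib_X[0], lib_X, lib_y)
```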

  16. Quasi-most unstable modes: a window to 'À la carte' ensemble diversity?

    NASA Astrophysics Data System (ADS)

    Homar Santaner, Victor; Stensrud, David J.

    2010-05-01

    The atmospheric science community is facing the ambitious challenge of providing useful forecasts of atmospheric events that produce high societal impact. The low level of social resilience to false alarms creates tremendous pressure on forecasting offices to issue accurate, timely and reliable warnings. Currently, no operational numerical forecasting system is able to respond to the societal demand for high-resolution (in time and space) predictions in the 12-72 h time span. The main reasons for such deficiencies are the lack of adequate observations and the high non-linearity of the numerical models that are currently used. The whole weather forecasting problem is intrinsically probabilistic, and current methods aim at coping with the various sources of uncertainty and the error propagation throughout the forecasting system. This probabilistic perspective is often created by generating ensembles of deterministic predictions that are aimed at sampling the most important sources of uncertainty in the forecasting system. The ensemble generation/sampling strategy is a crucial aspect of their performance and various methods have been proposed. Although global forecasting offices have been using ensembles of perturbed initial conditions for medium-range operational forecasts since 1994, no consensus exists regarding the optimum sampling strategy for high-resolution short-range ensemble forecasts. Bred vectors, however, have been hypothesized to better capture the growing modes in the highly nonlinear mesoscale dynamics of severe episodes than singular vectors or observation perturbations. Yet even this technique is not able to produce enough diversity in the ensembles to accurately and routinely predict extreme phenomena such as severe weather. Thus, we propose a new method to generate ensembles of initial-condition perturbations that is based on the breeding technique. Given a standard bred mode, a set of customized perturbations is derived with specified amplitudes and horizontal scales. This allows the ensemble to excite growing modes across a wider range of scales. Results show that this approach produces significantly more spread in the ensemble prediction than standard bred modes alone. Several examples that illustrate the benefits of this approach for severe weather forecasts will be provided.
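
    The underlying breeding cycle is simple to sketch. The toy below breeds a single perturbation on the Lorenz-63 system (a stand-in for a mesoscale model); the paper's customization of amplitudes and horizontal scales is not reproduced:

```python
import numpy as np

def lorenz_step(x, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 system (the toy 'model')."""
    dx = np.array([s * (x[1] - x[0]),
                   x[0] * (r - x[2]) - x[1],
                   x[0] * x[1] - b * x[2]])
    return x + dt * dx

# Breeding: run a control and a perturbed forecast, then periodically
# rescale their difference back to a fixed amplitude; the rescaled
# difference (the bred vector) aligns with the fastest-growing mode.
amp = 1e-3
x = np.array([1.0, 1.0, 20.0])
xp = x + amp * np.random.default_rng(0).normal(size=3)
for cycle in range(50):
    for _ in range(100):                       # free integration window
        x, xp = lorenz_step(x), lorenz_step(xp)
    bred = xp - x
    bred *= amp / np.linalg.norm(bred)         # rescale to fixed amplitude
    xp = x + bred                              # re-seed the perturbed run
```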

  17. Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

    PubMed

    Hussain, Lal

    2018-06-01

    Epilepsy is a neurological disorder caused by abnormal excitability of neurons in the brain. Brain activity of patients suffering from seizures is monitored through the electroencephalogram (EEG) to detect epileptic seizures. The performance of EEG-based seizure detection depends on the feature extraction strategy. In this research, we extracted features using a variety of strategies based on time- and frequency-domain characteristics, nonlinear measures, wavelet-based entropy and a few statistical features. A deeper study was undertaken using novel machine-learning classifiers and considering multiple factors. The support vector machine kernels were evaluated based on the multiclass kernel and box-constraint level. Likewise, for K-nearest neighbors (KNN), we varied the distance metrics, neighbor weights and number of neighbors. Similarly, for the decision trees we tuned the parameters based on maximum splits and split criteria, and the ensemble classifiers were evaluated based on different ensemble methods and learning rates. For training/testing, tenfold cross-validation was employed, and performance was evaluated in terms of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis was performed using diverse feature extraction strategies and robust machine-learning classifiers with more advanced optimization options. The support vector machine with a linear kernel and KNN with the city-block distance metric gave the overall highest accuracy of 99.5%, which was higher than with the default parameters for these classifiers. Moreover, the highest separation (AUC = 0.9991, 0.9990) was obtained at different kernel scales using SVM. Additionally, K-nearest neighbors with inverse-squared-distance weighting gave higher performance for different numbers of neighbors. Moreover, in distinguishing postictal heart rate oscillations from those of epileptic ictal subjects, the highest performance of 100% was obtained using different machine-learning classifiers.
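
    The parameter searches described above correspond to standard grid searches. A hedged sketch on synthetic stand-in features ('distance' weighting stands in for the paper's inverse-squared-distance weights):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the extracted EEG feature matrix.
X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# Tune kernel and box constraint for the SVM, metric and neighbours for KNN.
svm = GridSearchCV(SVC(), {"kernel": ["linear", "rbf", "poly"],
                           "C": [0.1, 1, 10, 100]}, cv=10)
knn = GridSearchCV(KNeighborsClassifier(),
                   {"metric": ["cityblock", "euclidean", "chebyshev"],
                    "n_neighbors": [1, 3, 5, 11],
                    "weights": ["uniform", "distance"]}, cv=10)
for name, model in (("SVM", svm), ("KNN", knn)):
    model.fit(X, y)
    print(name, model.best_params_, round(model.best_score_, 3))
```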

  18. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    NASA Astrophysics Data System (ADS)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic success. Different drug-drug interactions can be examined via the drug synergy score, and efficient regression-based machine learning approaches are needed to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to meet this requirement. However, these techniques individually do not provide sufficient accuracy in the drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented on the drug synergy data. Based on the accuracy of each model, the four techniques with the highest accuracy are selected to develop the ensemble-based machine learning model. These models are Random Forest, the Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), the Adaptive-Network-Based Fuzzy Inference System (ANFIS) and the Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by a biased weighted aggregation of the selected models' predictions (i.e. assigning more weight to models with higher prediction scores). The proposed and existing machine learning techniques have been evaluated on the drug synergy score data. The comparative analysis reveals that the proposed method outperforms the others in terms of accuracy, root mean square error and coefficient of correlation.
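
    A minimal sketch of the biased weighted aggregation (toy predictions and accuracies; model names follow the record):

```python
import numpy as np

# Toy predictions of the four selected models on three validation cases,
# plus each model's validation accuracy (all numbers illustrative).
preds = {"rf":       np.array([0.42, 0.81, 0.15]),
         "gfs_gccl": np.array([0.40, 0.78, 0.22]),
         "anfis":    np.array([0.47, 0.75, 0.18]),
         "denfis":   np.array([0.45, 0.83, 0.12])}
acc = {"rf": 0.91, "gfs_gccl": 0.88, "anfis": 0.86, "denfis": 0.90}

# Biased weighted aggregation: normalise the accuracies into weights so
# that better models contribute more to the ensemble prediction.
w = np.array([acc[name] for name in preds])
w = w / w.sum()
ensemble_pred = np.stack(list(preds.values())).T @ w
```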

  19. Development and validation of a climate-based ensemble prediction model for West Nile Virus infection rates in Culex mosquitoes, Suffolk County, New York.

    PubMed

    Little, Eliza; Campbell, Scott R; Shaman, Jeffrey

    2016-08-09

    West Nile Virus (WNV) is an endemic public health concern in the United States that produces periodic seasonal epidemics. Underlying these outbreaks is the enzootic cycle of WNV between mosquito vectors and bird hosts. Identifying the key environmental conditions that facilitate and accelerate this cycle can be used to inform effective vector control. Here, we model and forecast WNV infection rates among mosquito vectors in Suffolk County, New York using readily available meteorological and hydrological conditions. We first validate a statistical model built with surveillance data between 2001 and 2009 (m09) and specify a set of new statistical models using surveillance data from 2001 to 2012 (m12). This ensemble of new models is then used to make predictions for 2013-2015, and multimodel inference is employed to provide a formal probabilistic interpretation across the disparate individual model predictions. The findings of the m09 and m12 models align, with the ensemble of m12 models indicating an association between warm, dry early spring (April) conditions and increased annual WNV infection rates in Culex mosquitoes. This study shows that real-time climate information can be used to predict WNV infection rates in Culex mosquitoes prior to the seasonal peak, before WNV spillover transmission risk to humans is greatest.

  20. Ensemble Data Assimilation Without Ensembles: Methodology and Application to Ocean Data Assimilation

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume

    2013-01-01

    Two methods to estimate background error covariances for data assimilation are introduced. While both share properties with the ensemble Kalman filter (EnKF), they differ from it in that they do not require the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The first method is referred to as SAFE (Space Adaptive Forecast error Estimation) because it estimates error covariances from the spatial distribution of model variables within a single state vector. It can thus be thought of as sampling an ensemble in space. The second method, named FAST (Flow Adaptive error Statistics from a Time series), constructs an ensemble sampled from a moving window along a model trajectory. The underlying assumption in these methods is that forecast errors in data assimilation are primarily phase errors in space and/or time.
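
    The FAST idea is the easier of the two to sketch: treat a trailing window of a single trajectory as an ensemble and compute its sample covariance. A hedged toy version:

```python
import numpy as np

def fast_covariance(trajectory, window=50):
    """FAST-style background covariance: treat the last `window` states of
    a single model trajectory as if they were an ensemble, so no extra
    model integrations are required."""
    sample = trajectory[-window:]             # (window, n_state)
    anomalies = sample - sample.mean(axis=0)
    return anomalies.T @ anomalies / (window - 1)

# Toy single-trajectory record of a 6-variable model state.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(500, 6)), axis=0)
B = fast_covariance(traj)    # use in place of an EnKF sample covariance
```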

  1. Vector magnetometer based on synchronous manipulation of nitrogen-vacancy centers in all crystal directions

    NASA Astrophysics Data System (ADS)

    Zhang, Chen; Yuan, Heng; Zhang, Ning; Xu, Lixia; Zhang, Jixing; Li, Bo; Fang, Jiancheng

    2018-04-01

    Negatively charged nitrogen vacancy (NV‑) centers in diamond have been extensively studied as high-sensitivity magnetometers, showcasing a wide range of applications. This study experimentally demonstrates a vector magnetometry scheme based on the synchronous manipulation of NV‑ center ensembles in all crystal directions using double-frequency microwaves (MWs) and a multi-coupled-strip-lines (mCSL) waveguide. The application of the mCSL waveguide ensures a high degree of synchrony (99%) for manipulating NV‑ centers in multiple orientations in a large volume. Manipulation with double-frequency MWs involves NV‑ centers of all four crystal directions and additionally enhances the manipulation field. In this work, by monitoring the changes in the slope of the resonance line consisting of multi-axis NV‑ centers, measurement of the direction of the external field vector was demonstrated with a sensitivity of 10′/√Hz. Based on this scheme, the fluorescence signal contrast was improved fourfold and the sensitivity to the magnetic field strength was improved twofold. The method provides a more practical way of achieving vector sensors based on NV‑ center ensembles in diamond.

  2. Ensembl Genomes 2013: scaling up access to genome-wide data.

    PubMed

    Kersey, Paul Julian; Allen, James E; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Hughes, Daniel Seth Toney; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Langridge, Nicholas; McDowall, Mark D; Maheswari, Uma; Maslen, Gareth; Nuhn, Michael; Ong, Chuang Kee; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Tuli, Mary Ann; Walts, Brandon; Williams, Gareth; Wilson, Derek; Youens-Clark, Ken; Monaco, Marcela K; Stein, Joshua; Wei, Xuehong; Ware, Doreen; Bolser, Daniel M; Howe, Kevin Lee; Kulesha, Eugene; Lawson, Daniel; Staines, Daniel Michael

    2014-01-01

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.

  3. Ensembles of novelty detection classifiers for structural health monitoring using guided waves

    NASA Astrophysics Data System (ADS)

    Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias; Khomenko, Anton; Haq, Mahmoodul; Udpa, Lalita

    2018-01-01

    Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge of those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions (EOC). To address this challenge, we develop a novelty classifier for damage detection based on one class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations. We adopt a two-level voting scheme by using an ensemble of classifiers and predictions. Each classifier is trained on a different segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using Monte-Carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate. We also demonstrate the classifier performance using two types of baseline signals: fixed and rolling baseline training set. The former requires prior knowledge of baseline signals from all EOC, while the latter does not and leverages the fact that EOC vary slowly over time and can be modeled as a Gaussian process.
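
    A hedged sketch of the ensemble novelty detector (random stand-ins for baseline guided-wave signals; the paper's feature aggregation is omitted): one one-class SVM per signal segment, with damage declared by majority vote.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Random stand-ins for damage-free baseline guided-wave signals.
rng = np.random.default_rng(0)
baseline = rng.normal(size=(40, 600))          # 40 baseline signals
segments = np.split(baseline, 6, axis=1)       # 6 signal segments each
detectors = [OneClassSVM(nu=0.05, gamma="scale").fit(s) for s in segments]

def is_damaged(signal, vote_threshold=0.5):
    """Flag damage when a majority of segment-level detectors see novelty."""
    votes = [det.predict(seg.reshape(1, -1))[0] == -1    # -1 means novel
             for det, seg in zip(detectors, np.split(signal, 6))]
    return np.mean(votes) >= vote_threshold

print(is_damaged(rng.normal(size=600)))
```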

  4. Ensemble predictive model for more accurate soil organic carbon spectroscopic estimation

    NASA Astrophysics Data System (ADS)

    Vašát, Radim; Kodešová, Radka; Borůvka, Luboš

    2017-07-01

    A myriad of signal pre-processing strategies and multivariate calibration techniques has been explored over the last few decades in an attempt to improve the spectroscopic prediction of soil organic carbon (SOC). Coming up with a novel, more powerful and accurate predictive approach has therefore become a challenging task. However, there may be a way: combining several individual predictions into a single final one (according to ensemble learning theory). As this approach performs best when combining inherently different predictive algorithms calibrated with structurally different predictor variables, we tested two kinds of predictors: 1) reflectance values (or transforms) at each wavelength and 2) absorption feature parameters. Consequently we applied four different calibration techniques, two per type of predictor: a) partial least squares regression and support vector machines for type 1, and b) multiple linear regression and random forest for type 2. The weights to be assigned to the individual predictions within the ensemble model (constructed as a weighted average) were determined by an automated procedure that ensured the best solution among all possible ones was selected. The approach was tested on soil samples taken from the surface horizons of four sites differing in their prevailing soil units. By employing the ensemble predictive model the prediction accuracy of SOC improved at all four sites. The coefficient of determination in cross-validation (R2cv) increased from 0.849, 0.611, 0.811 and 0.644 (the best individual predictions) to 0.864, 0.650, 0.824 and 0.698 for Sites 1, 2, 3 and 4, respectively. Generally, the ensemble model reduced the maximal deviations between predicted and observed values relative to the individual predictions, and thus the correlation cloud became thinner, as desired.

  5. Ensemble Teleportation

    NASA Astrophysics Data System (ADS)

    Krüger, Thomas

    2006-05-01

    The possibility of teleportation is surely the most interesting consequence of quantum non-separability. So far, however, teleportation schemes have been formulated using state vectors and considering individual entities only. In the present article the feasibility of teleportation is examined on the basis of the rigorous ensemble interpretation of quantum mechanics (not to be confused with a mere treatment of noisy EPR pairs), leading to results which are unexpected from the usual point of view.

  6. A two-model hydrologic ensemble prediction of hydrograph: case study from the upper Nysa Klodzka river basin (SW Poland)

    NASA Astrophysics Data System (ADS)

    Niedzielski, Tomasz; Mizinski, Bartlomiej

    2016-04-01

    The HydroProg system was developed in the frame of research project no. 2011/01/D/ST10/04171 of the National Science Centre of Poland and steadily produces multimodel ensemble predictions of the hydrograph in real time. Although six ensemble members are available at present, the longest record of predictions and their statistics exists for two data-based models (uni- and multivariate autoregressive models). Thus, we consider 3-hour predictions of water levels, with lead times ranging from 15 to 180 minutes, computed every 15 minutes since August 2013 for the Nysa Klodzka basin (SW Poland) using the two approaches and their two-model ensemble. Since the launch of the HydroProg system there have been 12 high flow episodes, and the objective of this work is to present the performance of the two-model ensemble in forecasting these events. For the sake of brevity, we limit our investigation to a single gauge located on the Nysa Klodzka river in the town of Klodzko, which is centrally located in the studied basin. We identified certain regular scenarios of how the models perform in predicting the high flows in Klodzko. At the initial phase of the high flow, well before the rising limb of the hydrograph, the two-model ensemble is found to provide the most skilful prognoses of water levels. However, while forecasting the rising limb of the hydrograph, either the two-model solution or the vector autoregressive model offers the best predictive performance. In addition, it is hypothesized that along with the development of the rising limb phase, the vector autoregression becomes the most skilful approach amongst the scrutinized ones. Our simple two-model exercise confirms that multimodel hydrologic ensemble predictions cannot be treated as universal solutions suitable for forecasting an entire high flow event, as their superior performance may hold only for certain phases of a high flow.

  7. Predicting Error Bars for QSAR Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schroeter, Timon; Technische Universitaet Berlin, Department of Computer Science, Franklinstrasse 28/29, 10587 Berlin; Schwaighofer, Anton

    2007-09-18

    Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees, and ridge regression algorithms, based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the most recent months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and by ensemble and distance-based techniques for the other modelling approaches.
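
    A minimal sketch of the error-bar idea, assuming scikit-learn's Gaussian Process implementation in place of the study's in-house models: the predictive standard deviation provides a per-compound error bar. The descriptor and measurement arrays are placeholders.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    X = np.random.rand(100, 5)          # hypothetical molecular descriptors
    y = np.random.rand(100)             # hypothetical logD7 measurements
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X, y)
    mean, std = gp.predict(X[:3], return_std=True)
    for m, s in zip(mean, std):
        print(f"predicted logD7 = {m:.2f} +/- {1.96 * s:.2f}")  # ~95% band
    ```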

  8. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis

    PubMed Central

    2013-01-01

    Background Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amounts of PPI data for different species have been generated by high-throughput experimental techniques, the PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks; furthermore, these experimental methods are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs. Results We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions using only the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequence information. Focusing on dimension reduction, an effective feature extraction method, PCA, was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machines removes the dependence of results on initial random weights and improves the prediction performance. Conclusions When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at a precision of 87.59%. Extensive experiments were performed to compare our method with the state-of-the-art technique, the Support Vector Machine (SVM). The experimental results demonstrate that the proposed PCA-EELM outperforms the SVM method under 5-fold cross-validation, and that PCA-EELM also runs faster than the PCA-SVM based method. Consequently, the proposed approach can be considered a new, promising, and powerful tool for predicting PPIs with excellent performance in less time. PMID:23815620
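
    A minimal sketch of the PCA-EELM idea under stated assumptions: PCA compresses the sequence-derived features, a bare-bones random-hidden-layer network stands in for an extreme learning machine, and nine such machines are combined by majority vote. None of this reproduces the authors' exact implementation.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    class TinyELM:
        def __init__(self, n_hidden=50, rng=None):
            self.n_hidden, self.rng = n_hidden, rng or np.random.default_rng()
        def fit(self, X, y):
            self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
            H = np.tanh(X @ self.W)                  # random hidden layer
            self.beta = np.linalg.pinv(H) @ y        # analytic output weights
            return self
        def predict(self, X):
            return (np.tanh(X @ self.W) @ self.beta > 0.5).astype(int)

    X = np.random.rand(200, 40)                      # hypothetical pair features
    y = (np.random.rand(200) > 0.5).astype(int)      # 1 = interacting pair
    Z = PCA(n_components=10).fit_transform(X)        # discriminative feature set
    ensemble = [TinyELM(rng=np.random.default_rng(s)).fit(Z, y) for s in range(9)]
    votes = np.mean([m.predict(Z) for m in ensemble], axis=0)
    y_pred = (votes > 0.5).astype(int)               # majority vote
    ```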

  9. Fixed points, stable manifolds, weather regimes, and their predictability.

    PubMed

    Deremble, Bruno; D'Andrea, Fabio; Ghil, Michael

    2009-12-01

    In a simple, one-layer atmospheric model, we study the links between low-frequency variability and the model's fixed points in phase space. The model dynamics is characterized by the coexistence of multiple "weather regimes." To investigate the transitions from one regime to another, we focus on the identification of stable manifolds associated with fixed points. We show that these manifolds act as separatrices between regimes. We track each manifold by making use of two local predictability measures arising from the meteorological applications of nonlinear dynamics, namely, "bred vectors" and singular vectors. These results are then verified in the framework of ensemble forecasts issued from "clouds" (ensembles) of initial states. The divergence of the trajectories allows us to establish the connections between zones of low predictability, the geometry of the stable manifolds, and transitions between regimes.

  10. A comparison of large-scale climate signals and the North American Multi-Model Ensemble (NMME) for drought prediction in China

    NASA Astrophysics Data System (ADS)

    Xu, Lei; Chen, Nengcheng; Zhang, Xiang

    2018-02-01

    Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction months ahead is helpful for early drought warning and preparation. In this study, we developed a statistical model, two weighted dynamic models, and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component weights climate signals by support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climate models, and the hybrid part combines the statistical and dynamic components by assigning weights based on their historical performances. The results indicate that the statistical and hybrid models show better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the spatial extent and severity of drought nationwide well, although the severity was underestimated in the mid-lower reaches of the Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected a negative precipitation anomaly (NPA) in some areas. Model ensembles such as multiple statistical approaches, multiple dynamic models, or multiple hybrid models for drought prediction are highlighted. These conclusions may be helpful for drought prediction and early drought warning in China.
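
    The hybrid weighting step can be sketched as follows, assuming (as one plausible reading of "weights based on historical performances") that each component's weight is inversely proportional to its historical RMSE; the forecast values and error figures are invented for illustration.

    ```python
    import numpy as np

    def hybrid_weights(errors):
        """errors: historical RMSE per component model."""
        inv = 1.0 / np.asarray(errors)
        return inv / inv.sum()

    f_stat, f_em, f_bma = 42.0, 55.0, 51.0     # rainfall forecasts (mm)
    w = hybrid_weights([10.0, 18.0, 15.0])     # hypothetical historical RMSEs
    print("hybrid forecast:", np.dot(w, [f_stat, f_em, f_bma]))
    ```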

  11. Prediction of N-Methyl-D-Aspartate Receptor GluN1-Ligand Binding Affinity by a Novel SVM-Pose/SVM-Score Combinatorial Ensemble Docking Scheme

    PubMed Central

    Leong, Max K.; Syu, Ren-Guei; Ding, Yi-Lung; Weng, Ching-Feng

    2017-01-01

    The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928–0.988, q2 = 0.894–0.954, RMSE = 0.002–0.412, s = 0.001–0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, q2 = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery. PMID:28059133

  12. Prediction of N-Methyl-D-Aspartate Receptor GluN1-Ligand Binding Affinity by a Novel SVM-Pose/SVM-Score Combinatorial Ensemble Docking Scheme.

    PubMed

    Leong, Max K; Syu, Ren-Guei; Ding, Yi-Lung; Weng, Ching-Feng

    2017-01-06

    The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928-0.988, q2 = 0.894-0.954, RMSE = 0.002-0.412, s = 0.001-0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, q2 = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery.

  13. Prediction of N-Methyl-D-Aspartate Receptor GluN1-Ligand Binding Affinity by a Novel SVM-Pose/SVM-Score Combinatorial Ensemble Docking Scheme

    NASA Astrophysics Data System (ADS)

    Leong, Max K.; Syu, Ren-Guei; Ding, Yi-Lung; Weng, Ching-Feng

    2017-01-01

    The glycine-binding site of the N-methyl-D-aspartate receptor (NMDAR) subunit GluN1 is a potential pharmacological target for neurodegenerative disorders. A novel combinatorial ensemble docking scheme using ligand and protein conformation ensembles and customized support vector machine (SVM)-based models to select the docked pose and to predict the docking score was generated for predicting the NMDAR GluN1-ligand binding affinity. The predicted root mean square deviation (RMSD) values in pose by SVM-Pose models were found to be in good agreement with the observed values (n = 30, r2 = 0.928-0.988, q2 = 0.894-0.954, RMSE = 0.002-0.412, s = 0.001-0.214), and the predicted pKi values by SVM-Score were found to be in good agreement with the observed values for the training samples (n = 24, r2 = 0.967, q2 = 0.899, RMSE = 0.295, s = 0.170) and test samples (n = 13, q2 = 0.894, RMSE = 0.437, s = 0.202). When subjected to various statistical validations, the developed SVM-Pose and SVM-Score models consistently met the most stringent criteria. A mock test asserted the predictivity of this novel docking scheme. Collectively, this accurate novel combinatorial ensemble docking scheme can be used to predict the NMDAR GluN1-ligand binding affinity for facilitating drug discovery.

  14. Fault detection, isolation, and diagnosis of self-validating multifunctional sensors.

    PubMed

    Yang, Jing-Li; Chen, Yin-Sheng; Zhang, Li-Li; Sun, Zhen

    2016-06-01

    A novel fault detection, isolation, and diagnosis (FDID) strategy for self-validating multifunctional sensors is presented in this paper. The sparse non-negative matrix factorization-based method can effectively detect faults by using the squared prediction error (SPE) statistic, and variable contribution plots based on the SPE statistic can help to locate and isolate the faulty sensitive units. The complete ensemble empirical mode decomposition is employed to decompose the fault signals into a series of intrinsic mode functions (IMFs) and a residual. The sample entropy (SampEn)-weighted energy values of each IMF and the residual are estimated to represent the characteristics of the fault signals. A multi-class support vector machine is introduced to identify the fault mode, with the purpose of diagnosing the status of the faulty sensitive units. The performance of the proposed strategy is compared with other fault detection strategies, such as principal component analysis and independent component analysis, and with fault diagnosis strategies such as empirical mode decomposition coupled with support vector machines. The proposed strategy is fully evaluated in a real self-validating multifunctional sensor experimental system, and the experimental results demonstrate that the proposed strategy provides an excellent solution to the FDID research topic for self-validating multifunctional sensors.

  15. A discrete wavelet based feature extraction and hybrid classification technique for microarray data analysis.

    PubMed

    Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan

    2014-01-01

    In earlier days, cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability. The recent arrival of DNA microarray technology has enabled the concurrent monitoring of thousands of gene expressions on a single chip, which has stimulated progress in cancer classification. In this paper, we propose a hybrid approach for microarray data classification based on k-nearest neighbor (KNN), naive Bayes, and support vector machine (SVM) classifiers. Feature selection prior to classification plays a vital role, and a feature selection technique combining the discrete wavelet transform (DWT) and a moving window technique (MWT) is used. The performance of the proposed method is compared with conventional classifiers such as the support vector machine, nearest neighbor, and naive Bayes. Experiments conducted on both real and benchmark datasets indicate that the ensemble approach produces higher classification accuracy than the conventional classifiers. The method serves as an automated system for the classification of cancer that can be applied by doctors in real cases, a boon to the medical community, and it further reduces the misclassification of cancers, which is unacceptable in cancer detection.
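
    A minimal sketch of the pipeline under stated assumptions: level-3 Daubechies wavelet approximation coefficients stand in for the DWT features (the moving-window step is omitted), and a soft-voting combination of KNN, naive Bayes, and SVM plays the role of the hybrid classifier.

    ```python
    import numpy as np
    import pywt
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.ensemble import VotingClassifier

    X = np.random.rand(100, 512)                     # hypothetical expression profiles
    y = (np.random.rand(100) > 0.5).astype(int)
    # Keep the level-3 approximation coefficients as compressed features.
    X_dwt = np.array([pywt.wavedec(row, "db4", level=3)[0] for row in X])
    hybrid = VotingClassifier([("knn", KNeighborsClassifier()),
                               ("nb", GaussianNB()),
                               ("svm", SVC(probability=True))], voting="soft")
    hybrid.fit(X_dwt, y)
    print(hybrid.predict(X_dwt[:5]))
    ```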

  16. Applying data fusion techniques for benthic habitat mapping and monitoring in a coral reef ecosystem

    NASA Astrophysics Data System (ADS)

    Zhang, Caiyun

    2015-06-01

    Accurate mapping and effective monitoring of benthic habitat in the Florida Keys are critical in developing management strategies for this valuable coral reef ecosystem. For this study, a framework was designed for automated benthic habitat mapping by combining multiple data sources (hyperspectral, aerial photography, and bathymetry data) and four contemporary imagery processing techniques (data fusion, Object-based Image Analysis (OBIA), machine learning, and ensemble analysis). In the framework, 1-m digital aerial photography was first merged with 17-m hyperspectral imagery and 10-m bathymetry data using a pixel/feature-level fusion strategy. The fused dataset was then pre-classified by three machine learning algorithms (Random Forest, Support Vector Machines, and k-Nearest Neighbor). Final object-based habitat maps were produced through ensemble analysis of the outcomes from the three classifiers. The framework was tested for classifying group-level (3-class) and code-level (9-class) habitats in a portion of the Florida Keys. Informative and accurate habitat maps were achieved, with overall accuracies of 88.5% and 83.5% for the group-level and code-level classifications, respectively.

  17. Learning About Climate and Atmospheric Models Through Machine Learning

    NASA Astrophysics Data System (ADS)

    Lucas, D. D.

    2017-12-01

    From the analysis of ensemble variability to improving simulation performance, machine learning algorithms can play a powerful role in understanding the behavior of atmospheric and climate models. To learn about model behavior, we create training and testing data sets through ensemble techniques that sample different model configurations and values of input parameters, and then use supervised machine learning to map the relationships between the inputs and outputs. Following this procedure, we have used support vector machines, random forests, gradient boosting and other methods to investigate a variety of atmospheric and climate model phenomena. We have used machine learning to predict simulation crashes, estimate the probability density function of climate sensitivity, optimize simulations of the Madden Julian oscillation, assess the impacts of weather and emissions uncertainty on atmospheric dispersion, and quantify the effects of model resolution changes on precipitation. This presentation highlights recent examples of our applications of machine learning to improve the understanding of climate and atmospheric models. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  18. A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem.

    PubMed

    Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul

    2013-01-01

    Better understanding of the structural class of a given protein reveals important information about its overall folding type and its domain. It can also be used directly to provide critical information on the general tertiary structure of a protein, which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.

  19. Using Chou's pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location.

    PubMed

    Jiang, Xiaoying; Wei, Rong; Zhao, Yanjun; Zhang, Tongliang

    2008-05-01

    The knowledge of subnuclear localization in eukaryotic cells is essential for understanding the life function of the nucleus. Developing prediction methods and tools for protein subnuclear localization has become an important research field in protein science, owing to the special characteristics of the cell nucleus. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a Pseudo Amino Acid (PseAA) composition based on the approximate entropy (ApEn) concept, which reflects the complexity of a time series. A novel ensemble classifier is designed incorporating three AdaBoost classifiers, whose base classifier algorithms are decision stumps, a fuzzy K nearest neighbors classifier, and radial basis-support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published works are used to validate the performance of the proposed approach. The results of the jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. These promising results indicate that the proposed approach is effective and practical, and it might become a useful tool for protein subnuclear localization. The software in Matlab and supplementary materials are available freely by contacting the corresponding author.
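
    Approximate entropy, the complexity measure underlying the PseAA composition above, can be computed as in the following sketch; the sine input is an arbitrary stand-in for a numerically encoded protein sequence.

    ```python
    import numpy as np

    def apen(series, m=2, r=0.2):
        """Approximate entropy of a 1-D series (tolerance r scaled by std)."""
        x = np.asarray(series, dtype=float)
        r *= x.std()
        def phi(m):
            emb = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
            dists = np.max(np.abs(emb[:, None] - emb[None, :]), axis=2)
            c = (dists <= r).mean(axis=1)   # fraction of close templates
            return np.mean(np.log(c))
        return phi(m) - phi(m + 1)

    print(round(apen(np.sin(np.linspace(0, 10, 200))), 3))
    ```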

  20. Spectral classifier design with ensemble classifiers and misclassification-rejection: application to elastic-scattering spectroscopy for detection of colonic neoplasia.

    PubMed

    Rodriguez-Diaz, Eladio; Castanon, David A; Singh, Satish K; Bigio, Irving J

    2011-06-01

    Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These "rejected" samples can be reexamined by simply repositioning the probe to obtain additional optical readings, or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets results in a baseline performance of sensitivity = 0.83, specificity = 0.79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20-33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk.
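
    A minimal sketch of the misclassification-rejection idea, assuming a linear SVM whose decision values near zero mark samples at higher risk of misclassification; the rejection band and data are illustrative, not the published algorithm's tuned values.

    ```python
    import numpy as np
    from sklearn.svm import LinearSVC

    X = np.random.rand(300, 20)                    # hypothetical spectral features
    y = (np.random.rand(300) > 0.5).astype(int)    # 1 = neoplastic
    clf = LinearSVC(dual=False).fit(X, y)
    scores = clf.decision_function(X)
    reject_band = 0.3                              # trades accuracy vs. yield
    decisions = np.where(np.abs(scores) < reject_band, -1,  # -1 = rejected
                         (scores > 0).astype(int))
    print("rejected fraction:", np.mean(decisions == -1))
    ```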

  1. Spectral classifier design with ensemble classifiers and misclassification-rejection: application to elastic-scattering spectroscopy for detection of colonic neoplasia

    PubMed Central

    Rodriguez-Diaz, Eladio; Castanon, David A.; Singh, Satish K.; Bigio, Irving J.

    2011-01-01

    Optical spectroscopy has shown potential as a real-time, in vivo, diagnostic tool for identifying neoplasia during endoscopy. We present the development of a diagnostic algorithm to classify elastic-scattering spectroscopy (ESS) spectra as either neoplastic or non-neoplastic. The algorithm is based on pattern recognition methods, including ensemble classifiers, in which members of the ensemble are trained on different regions of the ESS spectrum, and misclassification-rejection, where the algorithm identifies and refrains from classifying samples that are at higher risk of being misclassified. These “rejected” samples can be reexamined by simply repositioning the probe to obtain additional optical readings, or ultimately by sending the polyp for histopathological assessment, as per standard practice. Prospective validation using separate training and testing sets results in a baseline performance of sensitivity = 0.83, specificity = 0.79, using the standard framework of feature extraction (principal component analysis) followed by classification (with linear support vector machines). With the developed algorithm, performance improves to Se ∼ 0.90, Sp ∼ 0.90, at a cost of rejecting 20–33% of the samples. These results are on par with a panel of expert pathologists. For colonoscopic prevention of colorectal cancer, our system could reduce biopsy risk and cost, obviate retrieval of non-neoplastic polyps, decrease procedure time, and improve assessment of cancer risk. PMID:21721830

  2. Ensembles of novelty detection classifiers for structural health monitoring using guided waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias

    Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge for those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions. To address this challenge, we develop a novelty classifier for damage detection based on one-class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations. We adopt a two-level voting scheme by using an ensemble of classifiers and predictions: each classifier is trained on a different segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using Monte Carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate. We also demonstrate the classifier performance using two types of baseline signals: a fixed and a rolling baseline training set. The former requires prior knowledge of baseline signals from all environmental and operating conditions, while the latter does not, and leverages the fact that environmental and operating conditions vary slowly over time and can be modeled as a Gaussian process.
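
    A minimal sketch of the two-level voting scheme under stated assumptions: one one-class SVM is trained per signal segment on baseline data only, and damage is declared when a majority of segment-level classifiers flag a test signal as novel.

    ```python
    import numpy as np
    from sklearn.svm import OneClassSVM

    def segment(signals, n_seg=8):
        return np.array_split(signals, n_seg, axis=1)

    baseline = np.random.randn(40, 400)        # hypothetical guided-wave signals
    test = np.random.randn(5, 400)
    models = [OneClassSVM(nu=0.05, gamma="scale").fit(seg)
              for seg in segment(baseline)]
    votes = np.array([m.predict(seg) for m, seg in zip(models, segment(test))])
    damage = (votes == -1).mean(axis=0) > 0.5  # level-2 vote across segments
    print("damage flags:", damage)
    ```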

  3. Species-specific ecological niche modelling predicts different range contractions for Lutzomyia intermedia and a related vector of Leishmania braziliensis following climate change in South America.

    PubMed

    McIntyre, Shannon; Rangel, Elizabeth F; Ready, Paul D; Carvalho, Bruno M

    2017-03-24

    Before 1996 the phlebotomine sand fly Lutzomyia neivai was usually treated as a synonym of the morphologically similar Lutzomyia intermedia, which has long been considered a vector of Leishmania braziliensis, the causative agent of much cutaneous leishmaniasis in South America. This report investigates the likely range changes of both sand fly species in response to a stabilisation climate change scenario (RCP4.5) and a high greenhouse gas emissions one (RCP8.5). Ecological niche modelling was used to identify areas of South America with climates currently suitable for each species, and then the future distributions of these climates were predicted based on climate change scenarios. Compared with the previous ecological niche model of L. intermedia (sensu lato) produced using the GARP algorithm in 2003, the current investigation modelled the two species separately, making use of verified presence records and additional records after 2001. Also, the new ensemble approach employed ecological niche modelling algorithms (including Maximum Entropy, Random Forests and Support Vector Machines) that have been widely adopted since 2003 and perform better than GARP, as well as using a more recent climate change model (HadGEM2) considered to have better performance at higher resolution than the earlier one (HadCM2). Lutzomyia intermedia was shown to be the more tropical of the two species, with its climatic niche defined by higher annual mean temperatures and lower temperature seasonality, in contrast to the more subtropical L. neivai. These different latitudinal ranges explain the two species' predicted responses to climate change by 2050, with L. intermedia mostly contracting its range (except perhaps in northeast Brazil) and L. neivai mostly shifting its range southwards in Brazil and Argentina. This contradicts the findings of the 2003 report, which predicted more range expansion. The different findings can be explained by the improved data sets and modelling methods. Our findings indicate that climate change will not always lead to range expansion of disease vectors such as sand flies. Ecological niche models should be species specific, carefully selected and combined in an ensemble approach.

  4. Conformational and functional analysis of molecular dynamics trajectories by Self-Organising Maps

    PubMed Central

    2011-01-01

    Background Molecular dynamics (MD) simulations are powerful tools to investigate the conformational dynamics of proteins, which is often a critical element of their function. Identification of functionally relevant conformations is generally done by clustering the large ensemble of structures that are generated. Recently, Self-Organising Maps (SOMs) were reported to perform more accurately and provide more consistent results than traditional clustering algorithms in various data mining problems. We present a novel strategy to analyse and compare conformational ensembles of protein domains using a two-level approach that combines SOMs and hierarchical clustering. Results The conformational dynamics of the α-spectrin SH3 protein domain and six single mutants were analysed by MD simulations. The Cα Cartesian coordinates of conformations sampled in the essential space were used as input data vectors for SOM training; complete linkage clustering was then performed on the SOM prototype vectors. A specific protocol to optimize a SOM for structural ensembles was proposed: the optimal SOM was selected by means of a Taguchi experimental design plan applied to different data sets, and the optimal sampling rate of the MD trajectory was selected. The proposed two-level approach was applied to single trajectories of the SH3 domain independently, as well as to groups of them at the same time. The results demonstrated the potential of this approach in the analysis of large ensembles of molecular structures: the possibility of producing a topological mapping of the conformational space in a simple 2D visualisation, as well as of effectively highlighting differences in the conformational dynamics directly related to biological functions. Conclusions The use of a two-level approach combining SOMs and hierarchical clustering for conformational analysis of structural ensembles of proteins was proposed. It can easily be extended to other study cases and to conformational ensembles from other sources. PMID:21569575
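
    A minimal sketch of the two-level analysis, assuming the MiniSom package as a stand-in for the authors' SOM code: a SOM is trained on conformation vectors and its prototype vectors are then grouped by complete-linkage clustering.

    ```python
    import numpy as np
    from minisom import MiniSom
    from scipy.cluster.hierarchy import linkage, fcluster

    conf = np.random.rand(500, 30)    # hypothetical C-alpha coordinate vectors
    som = MiniSom(6, 6, conf.shape[1], sigma=1.0, learning_rate=0.5,
                  random_seed=0)
    som.train_random(conf, 5000)      # level 1: topological mapping
    prototypes = som.get_weights().reshape(-1, conf.shape[1])
    labels = fcluster(linkage(prototypes, method="complete"), t=4,
                      criterion="maxclust")   # level 2: hierarchical clustering
    print("prototype cluster labels:", labels)
    ```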

  5. Classification of large-sized hyperspectral imagery using fast machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Xia, Junshi; Yokoya, Naoto; Iwasaki, Akira

    2017-07-01

    We present a framework of fast machine learning algorithms for the classification of large-sized hyperspectral images, from a theoretical to a practical viewpoint. In particular, we assess the performance of random forests (RF), rotation forests (RoF), and extreme learning machines (ELM), as well as ensembles of RF and ELM. These classifiers are applied to two large-sized hyperspectral images and compared to support vector machines. For the quantitative analysis, we pay attention to comparing these methods when working with high input dimensions and a limited/sufficient training set. Moreover, other important issues such as computational cost and robustness against noise are also discussed.

  6. Predictability of tropical cyclone events on intraseasonal timescales with the ECMWF monthly forecast model

    NASA Astrophysics Data System (ADS)

    Elsberry, Russell L.; Jordan, Mary S.; Vitart, Frederic

    2010-05-01

    The objective of this study is to provide evidence of predictability on intraseasonal time scales (10-30 days) for western North Pacific tropical cyclone formation and subsequent tracks using the 51-member ECMWF 32-day forecasts made once a week from 5 June through 25 December 2008. Ensemble storms are defined by grouping ensemble member vortices whose positions are within a specified separation distance that is equal to 180 n mi at the initial forecast time t and increases linearly to 420 n mi at Day 14 and then is constant. The 12-h track segments are calculated with a Weighted-Mean Vector Motion technique in which the weighting factor is inversely proportional to the distance from the endpoint of the previous 12-h motion vector. Seventy-six percent of the ensemble storms had five or fewer member vortices. On average, the ensemble storms begin 2.5 days before the first entry of the Joint Typhoon Warning Center (JTWC) best-track file, tend to translate too slowly in the deep tropics, and persist for longer periods over land. A strict objective matching technique with the JTWC storms is combined with a second subjective procedure that is then applied to identify nearby ensemble storms that would indicate a greater likelihood of a tropical cyclone developing in that region with that track orientation. The ensemble storms identified in the ECMWF 32-day forecasts provided guidance on intraseasonal timescales of the formations and tracks of the three strongest typhoons and two other typhoons, but not for two early season typhoons and the late season Dolphin. Four strong tropical storms were predicted consistently over Week-1 through Week-4, as was one weak tropical storm. Two other weak tropical storms, three tropical cyclones that developed from precursor baroclinic systems, and three other tropical depressions were not predicted on intraseasonal timescales. At least for the strongest tropical cyclones during the peak season, the ECMWF 32-day ensemble provides guidance of formation and tracks on 10-30 day timescales.

  7. Development of Ensemble Model Based Water Demand Forecasting Model

    NASA Astrophysics Data System (ADS)

    Kwon, Hyun-Han; So, Byung-Jin; Kim, Seong-Hyeon; Kim, Byung-Seop

    2014-05-01

    In recent years, the Smart Water Grid (SWG) concept has emerged globally and gained significant recognition in South Korea. In particular, there has been growing interest in water demand forecasting and optimal pump operation, which has led to various studies regarding energy saving and improvement of water supply reliability. Existing water demand forecasting models fall into two groups according to how they model and predict behavior in time series. One considers embedded patterns such as seasonality, periodicity, and trends, and the other is an autoregressive model using short-memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of the abovementioned models is that the predictability of water demand at sub-daily scales is limited because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting which allows us to estimate uncertainties across different model classes. The proposed model consists of two parts. One is a multi-model scheme based on a combination of independent prediction models. The other is a cross-validation scheme, the Bagging approach introduced by Breiman (1996), used to derive weighting factors corresponding to the individual models. The individual forecasting models used in this study are a linear regression analysis model, polynomial regression, multivariate adaptive regression splines (MARS), and SVM (support vector machine). The concepts are demonstrated through application to observations from water plants at several locations in South Korea. Keywords: water demand, non-linear model, ensemble forecasting model, uncertainty. Acknowledgements This subject is supported by the Korea Ministry of Environment as "Projects for Developing Eco-Innovation Technologies (GT-11-G-02-001-6)".
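
    The bagging-based weighting step can be sketched as follows, under the assumption that each candidate model is scored on the out-of-bag points of bootstrap resamples and weighted by its inverse error; the models and data are placeholders.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.svm import SVR

    def bagged_weights(models, X, y, n_boot=50, rng=np.random.default_rng(0)):
        """Accumulate out-of-bag MSE per model, then invert into weights."""
        errs = np.zeros(len(models))
        n = len(y)
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)                 # bootstrap resample
            oob = np.setdiff1d(np.arange(n), idx)       # out-of-bag points
            if oob.size == 0:
                continue
            for i, m in enumerate(models):
                m.fit(X[idx], y[idx])
                errs[i] += np.mean((m.predict(X[oob]) - y[oob]) ** 2)
        inv = 1.0 / errs
        return inv / inv.sum()

    X = np.random.rand(200, 6)      # hypothetical hourly demand predictors
    y = np.random.rand(200)
    print("combination weights:", bagged_weights([LinearRegression(), SVR()], X, y))
    ```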

  8. Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms

    PubMed Central

    Li, Jun; Riehle, Michelle M; Zhang, Yan; Xu, Jiannong; Oduol, Frederick; Gomez, Shawn M; Eiglmeier, Karin; Ueberheide, Beatrix M; Shabanowitz, Jeffrey; Hunt, Donald F; Ribeiro, José MC; Vernick, Kenneth D

    2006-01-01

    Background Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector. Results We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to only approximately 4%. The reannotated CDS set includes a set of 4,681 novel CDSs not represented in the Ensembl annotation but with EST support, and another set of 4,031 Ensembl-supported genes that undergo major structural and, therefore, probably functional changes in the reannotated set. The quality and accuracy of the reannotation was assessed by comparison with end sequences from 20,249 full-length cDNA clones, and evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analysis of the new CDSs through the AnoXcel pipeline, which allows functional comparisons of the CDS sets within the same bioinformatic platform. CDS data are available for download. Conclusion Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms. PMID:16569258

  9. Training in cortical control of neuroprosthetic devices improves signal extraction from small neuronal ensembles.

    PubMed

    Helms Tillery, S I; Taylor, D M; Schwartz, A B

    2003-01-01

    We have recently developed a closed-loop environment in which we can test the ability of primates to control the motion of a virtual device using ensembles of simultaneously recorded neurons /29/. Here we use a maximum likelihood method to assess the information about task performance contained in the neuronal ensemble. We trained two animals to control the motion of a computer cursor in three dimensions. Initially the animals controlled cursor motion using arm movements, but eventually they learned to drive the cursor directly from cortical activity. Using a population vector (PV) based upon the relation between cortical activity and arm motion, the animals were able to control the cursor directly from the brain in a closed-loop environment, but with difficulty. We added a supervised learning method that modified the parameters of the PV according to task performance (adaptive PV), and found that animals were able to exert much finer control over the cursor motion from brain signals. Here we describe a maximum likelihood method (ML) to assess the information about target contained in neuronal ensemble activity. Using this method, we compared the information about target contained in the ensemble during arm control, during brain control early in the adaptive PV, and during brain control after the adaptive PV had settled and the animal could drive the cursor reliably and with fine gradations. During the arm-control task, the ML was able to determine the target of the movement in as few as 10% of the trials, and as many as 75% of the trials, with an average of 65%. This average dropped when the animals used a population vector to control motion of the cursor. On average we could determine the target in around 35% of the trials. This low percentage was also reflected in poor control of the cursor, so that the animal was unable to reach the target in a large percentage of trials. Supervised adjustment of the population vector parameters produced new weighting coefficients and directional tuning parameters for many neurons. This produced a much better performance of the brain-controlled cursor motion. It was also reflected in the maximum likelihood measure of cell activity, producing the correct target based only on neuronal activity in over 80% of the trials on average. The changes in maximum likelihood estimates of target location based on ensemble firing show that an animal's ability to regulate the motion of a cortically controlled device is not crucially dependent on the experimenter's ability to estimate intention from neuronal activity.
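
    A minimal numpy sketch of population-vector decoding, with invented cosine-tuned cells: each cell's preferred direction is weighted by its normalized firing rate, and the weighted sum approximates the intended movement direction.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n_cells = 64
    preferred = rng.normal(size=(n_cells, 3))            # preferred directions
    preferred /= np.linalg.norm(preferred, axis=1, keepdims=True)

    def population_vector(rates, baseline, modulation):
        """Weight each preferred direction by the cell's normalized rate."""
        w = (rates - baseline) / modulation
        return (w[:, None] * preferred).sum(axis=0)

    target = np.array([0.0, 1.0, 0.0])
    baseline, modulation = 10.0, 5.0
    rates = baseline + modulation * preferred @ target   # cosine-tuned cells
    pv = population_vector(rates, baseline, modulation)
    print("decoded direction:", pv / np.linalg.norm(pv))
    ```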

  10. ENSO Bred Vectors in Coupled Ocean-Atmosphere General Circulation Models

    NASA Technical Reports Server (NTRS)

    Yang, S. C.; Cai, Ming; Kalnay, E.; Rienecker, M.; Yuan, G.; Toth, ZA.

    2004-01-01

    The breeding method has been implemented in the NASA Seasonal-to-Interannual Prediction Project (NSIPP) Coupled General Circulation Model (CGCM) with the goal of improving operational seasonal-to-interannual climate predictions through ensemble forecasting and data assimilation. The coupled instability as captured by the breeding method is the first attempt to isolate the evolving ENSO instability and its corresponding global atmospheric response in a fully coupled ocean-atmosphere GCM. Our results show that the growth rate of the coupled bred vectors (BV) peaks at about 3 months before a background ENSO event. The dominant growing BV modes are reminiscent of the background ENSO anomalies and show a strong tropical response with wind/SST/thermocline interrelated in a manner similar to the background ENSO mode. They exhibit larger amplitudes in the eastern tropical Pacific, reflecting the natural dynamical sensitivity associated with the presence of the shallow thermocline. Moreover, the extratropical perturbations associated with these coupled BV modes reveal variations related to the atmospheric teleconnection patterns associated with background ENSO variability, e.g. over the North Pacific and North America. A similar experiment was carried out with the NCEP/CFS03 CGCM. Comparisons between bred vectors from the NSIPP CGCM and the NCEP/CFS03 CGCM demonstrate the robustness of the results. Our results strongly suggest that the breeding method can serve as a natural filter to identify the slowly varying, coupled instabilities in a coupled GCM, which can be used to construct ensemble perturbations for ensemble forecasts and to estimate the coupled background error covariance for coupled data assimilation.
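
    The breeding cycle itself is simple enough to sketch on a toy system; here the Lorenz-63 equations stand in for the coupled GCM. A small perturbation is added, both runs are integrated over the breeding interval, and the difference is rescaled to its initial size.

    ```python
    import numpy as np

    def lorenz_step(x, dt=0.01, s=10.0, r=28.0, b=8.0 / 3.0):
        dx = np.array([s * (x[1] - x[0]),
                       x[0] * (r - x[2]) - x[1],
                       x[0] * x[1] - b * x[2]])
        return x + dt * dx                    # forward Euler, fine for a demo

    x = np.array([1.0, 1.0, 1.0])
    bred = 1e-3 * np.random.default_rng(0).normal(size=3)
    size0 = np.linalg.norm(bred)
    for cycle in range(100):
        xp = x + bred                          # perturbed run
        for _ in range(8):                     # breeding interval
            x, xp = lorenz_step(x), lorenz_step(xp)
        diff = xp - x
        growth = np.linalg.norm(diff) / size0  # growth rate of the bred vector
        bred = diff * size0 / np.linalg.norm(diff)  # rescale to initial size
    print("last-cycle growth factor:", round(growth, 3))
    ```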

  11. Improvement of Disease Prediction and Modeling through the Use of Meteorological Ensembles: Human Plague in Uganda

    PubMed Central

    Moore, Sean M.; Monaghan, Andrew; Griffith, Kevin S.; Apangu, Titus; Mead, Paul S.; Eisen, Rebecca J.

    2012-01-01

    Climate and weather influence the occurrence, distribution, and incidence of infectious diseases, particularly those caused by vector-borne or zoonotic pathogens. Thus, models based on meteorological data have helped predict when and where human cases are most likely to occur. Such knowledge aids in targeting limited prevention and control resources and may ultimately reduce the burden of diseases. Paradoxically, localities where such models could yield the greatest benefits, such as tropical regions where morbidity and mortality caused by vector-borne diseases is greatest, often lack high-quality in situ local meteorological data. Satellite- and model-based gridded climate datasets can be used to approximate local meteorological conditions in data-sparse regions, however their accuracy varies. Here we investigate how the selection of a particular dataset can influence the outcomes of disease forecasting models. Our model system focuses on plague (Yersinia pestis infection) in the West Nile region of Uganda. The majority of recent human cases have been reported from East Africa and Madagascar, where meteorological observations are sparse and topography yields complex weather patterns. Using an ensemble of meteorological datasets and model-averaging techniques we find that the number of suspected cases in the West Nile region was negatively associated with dry season rainfall (December-February) and positively with rainfall prior to the plague season. We demonstrate that ensembles of available meteorological datasets can be used to quantify climatic uncertainty and minimize its impacts on infectious disease models. These methods are particularly valuable in regions with sparse observational networks and high morbidity and mortality from vector-borne diseases. PMID:23024750

  12. Shifts in the suitable habitat available for brown trout (Salmo trutta L.) under short-term climate change scenarios.

    PubMed

    Muñoz-Mas, R; Lopez-Nicolas, A; Martínez-Capel, F; Pulido-Velazquez, M

    2016-02-15

    The impact of climate change on the habitat suitability for large brown trout (Salmo trutta L.) was studied in a segment of the Cabriel River (Iberian Peninsula). The future flow and water temperature patterns were simulated at a daily time step with M5 model trees (NSE of 0.78 and 0.97, respectively) for two short-term scenarios (2011-2040) under the representative concentration pathways (RCP 4.5 and 8.5). An ensemble of five strongly regularized machine learning techniques (generalized additive models, multilayer perceptron ensembles, random forests, support vector machines and fuzzy rule base systems) was used to model the microhabitat suitability (depth, velocity and substrate) during summertime and to evaluate several flows simulated with River2D©. The simulated flow rate and water temperature were combined with the microhabitat assessment to infer bivariate habitat duration curves (BHDCs) under historical conditions and climate change scenarios, using either the weighted usable area (WUA) or the Boolean-based suitable area (SA). The forecasts for both scenarios jointly predicted a significant reduction in the flow rate and an increase in water temperature (mean rates of change of ca. -25% and +4%, respectively). The five techniques converged on the modelled suitability and habitat preferences; large brown trout selected relatively high flow velocity, large depth and coarse substrate. However, the model developed with support vector machines presented a significantly trimmed output range (max.: 0.38), and thus its predictions were excluded from the WUA-based analyses. The BHDCs based on the WUA and the SA broadly matched, indicating an increase in the number of days with less suitable habitat available (WUA and SA) and/or with higher water temperature (trout will endure impoverished environmental conditions ca. 82% of the days). Finally, our results suggested the potential extirpation of the species from the study site during short time spans. Copyright © 2015 Elsevier B.V. All rights reserved.

  13. An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier.

    PubMed

    Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo; Yang, Jianyi

    2017-03-15

    Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is through template-based fold assignment and the other is ab-initio prediction using machine learning algorithms. Combining both solutions to improve the prediction accuracy had never been explored before. We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing with ab-initio methods and template-based threading methods on six benchmark datasets. An accuracy of 0.799 was achieved by TA-fold on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolutionary information. http://yanglab.nankai.edu.cn/TA-fold/. yangjy@nankai.edu.cn or mhb-506@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
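
    One way to picture the combination is a cascade, sketched below under the assumption that a confident template hit is used directly and the SVM is the fallback; the threshold, scores, and classifier are illustrative, not the published TA-fold pipeline.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    X = np.random.rand(50, 30)                 # hypothetical profile features
    y = np.random.randint(0, 27, 50)           # 27 fold classes, as in DD
    svm = SVC().fit(X, y)                      # ab-initio fallback classifier

    def ta_fold(hh_fold, hh_prob, features, threshold=0.9):
        if hh_prob >= threshold:               # confident template hit
            return hh_fold
        return int(svm.predict(features.reshape(1, -1))[0])

    print(ta_fold(hh_fold=3, hh_prob=0.95, features=X[0]))  # template wins
    print(ta_fold(hh_fold=3, hh_prob=0.40, features=X[0]))  # SVM fallback
    ```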

  14. The Ensemble Canon

    NASA Technical Reports Server (NTRS)

    MIittman, David S

    2011-01-01

    Ensemble is an open architecture for the development, integration, and deployment of mission operations software. Fundamentally, it is an adaptation of the Eclipse Rich Client Platform (RCP), a widespread, stable, and supported framework for component-based application development. By capitalizing on the maturity and availability of the Eclipse RCP, Ensemble offers a low-risk, politically neutral path towards a tighter integration of operations tools. The Ensemble project is a highly successful, ongoing collaboration among NASA Centers. Since 2004, the Ensemble project has supported the development of mission operations software for NASA's Exploration Systems, Science, and Space Operations Directorates.

  15. Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists.

    PubMed

    Bonet, Isis; Franco-Montero, Pedro; Rivero, Virginia; Teijeira, Marta; Borges, Fernanda; Uriarte, Eugenio; Morales Helguera, Aliuska

    2013-12-23

    A(2B) adenosine receptor antagonists may be beneficial in treating diseases like asthma, diabetes, diabetic retinopathy, and certain cancers. This has stimulated research into the development of potent ligands for this subtype, based on quantitative structure-affinity relationships. In this work, a new ensemble machine learning algorithm is proposed for classification and prediction of the ligand-binding affinity of A(2B) adenosine receptor antagonists. The algorithm is based on training different classifier models with multiple training sets (composed of the same compounds but represented by diverse features). The k-nearest neighbor, decision trees, neural networks, and support vector machines were used as single classifiers. To select the base classifiers for combination into the ensemble, several diversity measures were employed. The final multiclassifier prediction results were computed from the outputs of the selected base classifiers by utilizing different mathematical functions, including majority vote and maximum and average probability. In this work, 10-fold cross-validation and external validation were used. The strategy led to the following results: i) the single classifiers, together with prior feature selection, resulted in good overall accuracy; ii) a comparison between the single classifiers and their combination in the multiclassifier model showed that using our ensemble gave better performance than the single classifier models; and iii) our multiclassifier model performed better than the most widely used multiclassifier models in the literature. The results and statistical analysis demonstrate the advantage of our multiclassifier approach for predicting the affinity of A(2B) adenosine receptor antagonists, and it can be used to develop other QSAR models.
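
    One of the simplest diversity measures usable for base-classifier selection is the pairwise disagreement rate, sketched below on invented predictions; the paper evaluates several such measures, and this shows only the general idea.

    ```python
    import numpy as np
    from itertools import combinations

    def disagreement(pred_a, pred_b):
        """Fraction of samples on which two classifiers differ."""
        return np.mean(pred_a != pred_b)

    preds = {"knn": np.array([1, 0, 1, 1, 0, 1]),
             "tree": np.array([1, 1, 1, 0, 0, 1]),
             "svm": np.array([0, 0, 1, 1, 0, 0])}
    for a, b in combinations(preds, 2):
        print(f"{a} vs {b}: {disagreement(preds[a], preds[b]):.2f}")
    ```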

  16. Ensemble of surrogates-based optimization for identifying an optimal surfactant-enhanced aquifer remediation strategy at heterogeneous DNAPL-contaminated sites

    NASA Astrophysics Data System (ADS)

    Jiang, Xue; Lu, Wenxi; Hou, Zeyu; Zhao, Haiqing; Na, Jin

    2015-11-01

    The purpose of this study was to identify an optimal surfactant-enhanced aquifer remediation (SEAR) strategy for aquifers contaminated by dense non-aqueous phase liquid (DNAPL), based on an ensemble of surrogates-based optimization technique. A saturated heterogeneous medium contaminated by nitrobenzene was selected as the case study. A new kind of surrogate-based SEAR optimization employing an ensemble surrogate (ES) model together with a genetic algorithm (GA) is presented. Four methods, namely radial basis function artificial neural networks (RBFANN), kriging (KRG), support vector regression (SVR), and kernel extreme learning machines (KELM), were used to create four individual surrogate models, which were then compared. The comparison enabled us to select the two most accurate models (KELM and KRG) to establish an ES model of the SEAR simulation model, and the developed ES model was then compared with the four stand-alone surrogate models. The results showed that the average relative error of the average nitrobenzene removal rates between the ES model and the simulation model for 20 test samples was 0.8%, a high approximation accuracy, which indicates that the ES model provides more accurate predictions than the stand-alone surrogate models. A nonlinear optimization model was then formulated for the minimum cost, and the developed ES model was embedded into this optimization model as a constraint. In addition, GA was used to solve the optimization model to provide the optimal SEAR strategy. The developed ensemble surrogate-optimization approach was effective in seeking a cost-effective SEAR strategy for heterogeneous DNAPL-contaminated sites. This research is expected to enrich and develop the theoretical and technical implications for the analysis of remediation strategy optimization of DNAPL-contaminated aquifers.

  17. Ensemble of Surrogates-based Optimization for Identifying an Optimal Surfactant-enhanced Aquifer Remediation Strategy at Heterogeneous DNAPL-contaminated Sites

    NASA Astrophysics Data System (ADS)

    Lu, W., Sr.; Xin, X.; Luo, J.; Jiang, X.; Zhang, Y.; Zhao, Y.; Chen, M.; Hou, Z.; Ouyang, Q.

    2015-12-01

    The purpose of this study was to identify an optimal surfactant-enhanced aquifer remediation (SEAR) strategy for aquifers contaminated by dense non-aqueous phase liquid (DNAPL), based on an ensemble of surrogates-based optimization technique. A saturated heterogeneous medium contaminated by nitrobenzene was selected as the case study. A new kind of surrogate-based SEAR optimization employing an ensemble surrogate (ES) model together with a genetic algorithm (GA) is presented. Four methods, namely radial basis function artificial neural networks (RBFANN), kriging (KRG), support vector regression (SVR), and kernel extreme learning machines (KELM), were used to create four individual surrogate models, which were then compared. The comparison enabled us to select the two most accurate models (KELM and KRG) to establish an ES model of the SEAR simulation model, and the developed ES model was then compared with the four stand-alone surrogate models. The results showed that the average relative error of the average nitrobenzene removal rates between the ES model and the simulation model for 20 test samples was 0.8%, a high approximation accuracy, which indicates that the ES model provides more accurate predictions than the stand-alone surrogate models. A nonlinear optimization model was then formulated for the minimum cost, and the developed ES model was embedded into this optimization model as a constraint. In addition, GA was used to solve the optimization model to provide the optimal SEAR strategy. The developed ensemble surrogate-optimization approach was effective in seeking a cost-effective SEAR strategy for heterogeneous DNAPL-contaminated sites. This research is expected to enrich and develop the theoretical and technical implications for the analysis of remediation strategy optimization of DNAPL-contaminated aquifers.

  18. Identifying pollution sources and predicting urban air quality using ensemble learning methods

    NASA Astrophysics Data System (ADS)

    Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali

    2013-12-01

    In this study, principal components analysis (PCA) was performed to identify air pollution sources, and tree-based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using air quality and meteorological databases covering a period of five years. PCA identified vehicular emissions and fuel combustion as the major air pollution sources. The air quality indices revealed that the air quality was unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities and the factors responsible for the discrimination, and to predict the air quality indices. Accordingly, a single decision tree (SDT), a decision tree forest (DTF), and a decision treeboost (DTB) were constructed, and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with a conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality with misclassification rates (MR) of 8.32% (SDT), 4.12% (DTF), 5.62% (DTB), and 6.18% (SVM), respectively, in the complete data. The AQI and CAQI regression models yielded correlations between measured and predicted values and root mean squared errors of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); and 0.890, 7.00 and 0.836, 9.16 (SVR) in the complete data. The DTF and DTB models outperformed the SVM in both classification and regression, which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.

  19. Benefits of an ultra large and multiresolution ensemble for estimating available wind power

    NASA Astrophysics Data System (ADS)

    Berndt, Jonas; Hoppe, Charlotte; Elbern, Hendrik

    2016-04-01

    In this study we investigate the benefits of an ultra-large ensemble with up to 1000 members, including multiple nesting with a target horizontal resolution of 1 km. The ensemble shall be used as a basis to detect events of extreme errors in wind power forecasting. The forecast quantity is the wind vector at wind turbine hub height (~100 m) in the short range (1 to 24 hours). Current wind power forecast systems already rest on numerical weather prediction (NWP) ensemble models. However, only calibrated ensembles from meteorological institutions serve as input so far, with limited spatial resolution (~10-80 km) and member number (~50), and perturbations tailored to the specific needs of wind power production are still missing. Thus, infrequent single extreme error events occur that are not detected by such ensemble power forecasts. The numerical forecast model used in this study is the Weather Research and Forecasting Model (WRF). Model uncertainties are represented by stochastic parametrization of sub-grid processes, via stochastically perturbed parametrization tendencies in conjunction with the complementary stochastic kinetic-energy backscatter scheme already provided by WRF. We perform continuous ensemble updates by comparing each ensemble member with available observations using a sequential importance resampling filter, to improve the model accuracy while maintaining ensemble spread. Additionally, we use different ensemble systems from global models (ECMWF and GFS) as input and boundary conditions to capture different synoptic conditions. Critical weather situations connected to extreme error events are located and corresponding perturbation techniques are applied. The demanding computational effort is overcome by utilising the supercomputer JUQUEEN at the Forschungszentrum Juelich.
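
    The sequential importance resampling step mentioned above can be sketched in a few lines: the toy below weights scalar ensemble members by a Gaussian observation likelihood and resamples them, which is the core of the filter, though the real system applies this to full WRF model states.

      # Sketch of one sequential importance resampling (SIR) update on a toy
      # scalar ensemble of hub-height wind speeds.
      import numpy as np

      rng = np.random.default_rng(1)
      members = rng.normal(10.0, 2.0, size=1000)   # ensemble of wind speeds (m/s)
      obs, obs_err = 11.5, 0.5                     # observation and its std. dev.

      # Importance weights from a Gaussian observation likelihood.
      w = np.exp(-0.5 * ((members - obs) / obs_err) ** 2)
      w /= w.sum()

      # Resample members in proportion to their weights (good members are
      # duplicated, poor ones dropped), then jitter slightly to keep spread.
      idx = rng.choice(members.size, size=members.size, p=w)
      members = members[idx] + rng.normal(0.0, 0.1, size=members.size)
      print(members.mean(), members.std())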

  20. Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology.

    PubMed

    Zhang, Jieru; Ju, Ying; Lu, Huijuan; Xuan, Ping; Zou, Quan

    2016-01-01

    Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machines with basic sequence features (n-grams), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.

  1. Identifying interactions between chemical entities in biomedical text.

    PubMed

    Lamurias, Andre; Ferreira, João D; Couto, Francisco M

    2014-10-23

    Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, "Identifying Interactions between Chemical Entities" (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identify the interactions and then an ensemble classifier validates and classifies the type of each interaction. This relation extraction module was evaluated with the corpus released for the DDI Extraction task of SemEval 2013, obtaining results comparable to state-of-the-art methods for this type of task. We integrated this module with our chemical named entity recognition module and made the whole system available as a web tool at www.lasige.di.fc.ul.pt/webtools/iice.

  2. Identifying interactions between chemical entities in biomedical text.

    PubMed

    Lamurias, Andre; Ferreira, João D; Couto, Francisco M

    2014-12-01

    Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, "Identifying Interactions between Chemical Entities" (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identify the interactions and then an ensemble classifier validates and classifies the type of each interaction. This relation extraction module was evaluated with the corpus released for the DDI Extraction task of SemEval 2013, obtaining results comparable to state-of-the-art methods for this type of task. We integrated this module with our chemical named entity recognition module and made the whole system available as a web tool at www.lasige.di.fc.ul.pt/webtools/iice.

  3. Monitoring of the Conformational Space of Dipeptides by Generative Topographic Mapping.

    PubMed

    Horvath, Dragos; Marcou, Gilles; Varnek, Alexandre

    2018-01-01

    This work describes a procedure to build generative topographic maps (GTM) as 2D representations of the conformational space (CS) of dipeptides. GTMs with excellent propensities to support highly predictive landscapes of various conformational properties were reported for three dipeptides (AA, KE and KR). CS monitoring via GTM proceeds through the projection of conformer ensembles on the map, producing cumulated responsibility (CR) vectors characteristic of the CS areas covered by the ensemble. Overlap of the CS areas visited by two distinct simulations can be expressed by the Tanimoto coefficient Tc of the associated CRs. This idea was used to monitor the reproducibility of the stochastic evolutionary conformer generation process implemented in S4MPLE. It could be shown that conformers produced by <500 S4MPLE runs reproducibly cover the relevant CS zone at a given setup of the driving force field. The propensity of a simulation to visit the native CS zone can thus be quantitatively estimated as the Tc score with respect to the "native" CR, as defined by the ensemble of dipeptide geometries extracted from PDB proteins. Low-energy CS regions were indeed found to fall within the native zone. The Tc overlap score behaved as a smooth function of force field parameters. This opens the perspective of a novel force field parameter tuning procedure, bound to simultaneously optimize the behavior of the in silico simulations for every possible dipeptide. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
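
    A small sketch of the CR-overlap computation described above, assuming the common min/max generalization of the Tanimoto coefficient for nonnegative continuous vectors; the CR vectors here are random stand-ins for projections of two conformer ensembles onto the same map grid.

      # Sketch: Tanimoto overlap Tc of two cumulated-responsibility (CR) vectors.
      import numpy as np

      def tanimoto(cr_a, cr_b):
          """Overlap of two nonnegative CR vectors: sum(min) / sum(max)."""
          return np.minimum(cr_a, cr_b).sum() / np.maximum(cr_a, cr_b).sum()

      rng = np.random.default_rng(0)
      cr_run1 = rng.random(625)   # e.g. a 25x25 GTM grid, flattened
      cr_run2 = rng.random(625)
      print(tanimoto(cr_run1, cr_run2))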

  4. Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification.

    PubMed

    Jowkar, Gholam-Hossein; Mansoori, Eghbal G

    2016-10-01

    Identification of disease genes using computational methods is an important issue in biomedical and bioinformatics research. Following the observation that diseases with the same or similar phenotype share biological characteristics, researchers have tried to identify such genes using machine learning tools. In recent attempts, semi-supervised learning methods, collectively called positive-unlabeled learning, have been used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) built on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks. In our method, a reliable set of positive and negative genes is extracted using a co-training schema. Then the similarity graph of genes is built using metric learning, concentrating on the multi-rank-walk method to perform inference from labeled genes. Finally, a Perceptron ensemble is learned from three weighted classifiers: a multilevel support vector machine, k-nearest neighbors and a decision tree. The main contributions of this paper are: (i) incorporating the statistical properties of gene data through choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) the noise robustness of PEGPUL via its multilevel schema. In order to assess PEGPUL, we applied it to 12950 disease genes, with 949 positive genes from six classes of diseases and 12001 unlabeled genes. Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance. Copyright © 2016 Elsevier Ltd. All rights reserved.
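
    As a rough analogue of the final step above, the sketch below learns a perceptron on top of SVM, k-NN and decision tree base classifiers via scikit-learn's stacking; the data are synthetic and fully labeled, so the PU-learning and graph-construction stages of PEGPUL are not reproduced.

      # Sketch: a perceptron combining three heterogeneous base classifiers.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import StackingClassifier
      from sklearn.linear_model import Perceptron
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.svm import SVC
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=500, n_features=20, random_state=0)
      ensemble = StackingClassifier(
          estimators=[("svm", SVC()),
                      ("knn", KNeighborsClassifier()),
                      ("tree", DecisionTreeClassifier(random_state=0))],
          final_estimator=Perceptron(),   # level-1 combiner
      )
      print(ensemble.fit(X[:400], y[:400]).score(X[400:], y[400:]))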

  5. Good Models Gone Bad: Quantifying and Predicting Parameter-Induced Climate Model Simulation Failures

    NASA Astrophysics Data System (ADS)

    Lucas, D. D.; Klein, R.; Tannahill, J.; Brandon, S.; Covey, C. C.; Domyancic, D.; Ivanova, D. P.

    2012-12-01

    Simulations using IPCC-class climate models can fail or crash for a variety of reasons. Statistical analysis of the failures can yield useful insights to better understand and improve the models. During the course of uncertainty quantification (UQ) ensemble simulations to assess the effects of ocean model parameter uncertainties on climate simulations, we experienced a series of simulation failures of the Parallel Ocean Program (POP2). About 8.5% of our POP2 runs failed for numerical reasons at certain combinations of parameter values. We apply support vector machine (SVM) classification from the fields of pattern recognition and machine learning to quantify and predict the probability of failure as a function of the values of 18 POP2 parameters. The SVM classifiers readily predict POP2 failures in an independent validation ensemble, and are subsequently used to determine the causes of the failures via a global sensitivity analysis. Four parameters related to ocean mixing and viscosity are identified as the major sources of POP2 failures. Our method can be used to improve the robustness of complex scientific models to parameter perturbations and to better steer UQ ensembles. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and was funded by the Uncertainty Quantification Strategic Initiative Laboratory Directed Research and Development Project at LLNL under project tracking code 10-SI-013 (UCRL LLNL-ABS-569112).
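
    A minimal sketch of the classification step described above: an SVM trained on parameter vectors labeled success/failure, read out as a failure probability. The 18-dimensional samples and the planted failure region are synthetic stand-ins for the real UQ ensemble.

      # Sketch: predicting run failure probability from model parameters.
      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      params = rng.uniform(0.0, 1.0, size=(2000, 18))
      # Toy rule: runs fail when two "mixing" parameters are jointly extreme.
      failed = ((params[:, 0] > 0.9) & (params[:, 1] > 0.8)).astype(int)

      X_tr, X_te, y_tr, y_te = train_test_split(params, failed, random_state=0)
      clf = SVC(probability=True, class_weight="balanced").fit(X_tr, y_tr)
      print("validation accuracy:", clf.score(X_te, y_te))
      print("P(failure) for one new run:", clf.predict_proba(X_te[:1])[0, 1])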

  6. Robust Framework to Combine Diverse Classifiers Assigning Distributed Confidence to Individual Classifiers at Class Level

    PubMed Central

    Arshad, Sannia; Rho, Seungmin

    2014-01-01

    We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of the various classes while identifying and filtering out noisy training data. This noise-free data is then used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for the different classifiers in order to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method over its competitors, especially in the presence of class label noise and imbalanced classes. PMID:25295302

  7. Robust framework to combine diverse classifiers assigning distributed confidence to individual classifiers at class level.

    PubMed

    Khalid, Shehzad; Arshad, Sannia; Jabbar, Sohail; Rho, Seungmin

    2014-01-01

    We have presented a classification framework that combines multiple heterogeneous classifiers in the presence of class label noise. An extension of m-Mediods based modeling is presented that generates models of the various classes while identifying and filtering out noisy training data. This noise-free data is then used to learn models for other classifiers such as GMM and SVM. A weight learning method is then introduced to learn weights on each class for the different classifiers in order to construct an ensemble. For this purpose, we applied a genetic algorithm to search for an optimal weight vector on which the classifier ensemble is expected to give the best accuracy. The proposed approach is evaluated on a variety of real-life datasets. It is also compared with existing standard ensemble techniques such as Adaboost, Bagging, and Random Subspace Methods. Experimental results show the superiority of the proposed ensemble method over its competitors, especially in the presence of class label noise and imbalanced classes.

  8. Adiabatic and nonadiabatic perturbation theory for coherence vector description of neutrino oscillations

    NASA Astrophysics Data System (ADS)

    Hollenberg, Sebastian; Päs, Heinrich

    2012-01-01

    The standard wave function approach for the treatment of neutrino oscillations fails in situations where quantum ensembles at a finite temperature with or without an interacting background plasma are encountered. As a first step to treat such phenomena in a novel way, we propose a unified approach to both adiabatic and nonadiabatic two-flavor oscillations in neutrino ensembles with finite temperature and generic (e.g., matter) potentials. Neglecting effects of ensemble decoherence for now, we study the evolution of a neutrino ensemble governed by the associated quantum kinetic equations, which apply to systems with finite temperature. The quantum kinetic equations are solved formally using the Magnus expansion and it is shown that a convenient choice of the quantum mechanical picture (e.g., the interaction picture) reveals suitable parameters to characterize the physics of the underlying system (e.g., an effective oscillation length). It is understood that this method also provides a promising starting point for the treatment of the more general case in which decoherence is taken into account.

  9. Recognition of multiple imbalanced cancer types based on DNA microarray data using ensemble classifiers.

    PubMed

    Yu, Hualong; Hong, Shufang; Yang, Xibei; Ni, Jun; Dan, Yuanyuan; Qin, Bin

    2013-01-01

    DNA microarray technology can measure the activities of tens of thousands of genes simultaneously, which provides an efficient way to diagnose cancer at the molecular level. Although this strategy has attracted significant research attention, most studies neglect an important problem, namely, that most DNA microarray datasets are skewed, which causes traditional learning algorithms to produce inaccurate results. Some studies have considered this problem, yet they focus merely on the binary-class case. In this paper, we dealt with the multiclass imbalanced classification problem, as encountered in cancer DNA microarray data, by using ensemble learning. We utilized a one-against-all coding strategy to transform the multiclass problem into multiple binary ones, applying to each the feature subspace technique, an evolved version of random subspace that generates multiple diverse training subsets. Next, we introduced one of two different correction technologies, namely decision threshold adjustment or random undersampling, into each training subset to mitigate the damage of class imbalance. Specifically, a support vector machine was used as the base classifier, and a novel voting rule called counter voting was presented for making the final decision. Experimental results on eight skewed multiclass cancer microarray datasets indicate that, unlike many traditional classification approaches, our methods are insensitive to class imbalance.
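
    The sketch below illustrates the one-against-all plus undersampling recipe above with scikit-learn SVMs on synthetic imbalanced data; a decision-value argmax replaces the paper's counter-voting rule, and the feature-subspace step is omitted for brevity.

      # Sketch: one-against-all SVMs with random undersampling of the (large)
      # negative class in each binary subproblem.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                                 n_classes=4, weights=[0.7, 0.1, 0.1, 0.1],
                                 random_state=0)
      rng = np.random.default_rng(0)
      classes = np.unique(y)
      models = []
      for c in classes:
          pos = np.where(y == c)[0]
          neg = rng.choice(np.where(y != c)[0], size=len(pos), replace=False)
          idx = np.concatenate([pos, neg])            # balanced binary subset
          models.append(SVC().fit(X[idx], (y[idx] == c).astype(int)))

      # Predict the class whose one-vs-all SVM is most confident.
      scores = np.column_stack([m.decision_function(X) for m in models])
      print("training accuracy:", (classes[scores.argmax(axis=1)] == y).mean())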

  10. Method of assessing the state of a rolling bearing based on the relative compensation distance of multiple-domain features and locally linear embedding

    NASA Astrophysics Data System (ADS)

    Kang, Shouqiang; Ma, Danyang; Wang, Yujing; Lan, Chaofeng; Chen, Qingguo; Mikulovich, V. I.

    2017-03-01

    To effectively assess different fault locations and different degrees of performance degradation of a rolling bearing with a unified assessment index, a novel state assessment method based on the relative compensation distance of multiple-domain features and locally linear embedding is proposed. First, for a single-sample signal, time-domain and frequency-domain indexes are calculated for the original vibration signal and for each sensitive intrinsic mode function obtained by improved ensemble empirical mode decomposition, and the singular values of the sensitive intrinsic mode function matrix are extracted by singular value decomposition to construct a high-dimensional hybrid-domain feature vector. Second, a feature matrix is constructed by arranging the feature vectors of multiple samples, the dimension of each row vector of the feature matrix is reduced by the locally linear embedding algorithm, and the compensation distance of each fault state of the rolling bearing is calculated using the support vector machine. Finally, the relative distance between each fault location or degree of performance degradation and the normal-state optimal classification surface is compensated, and on the basis of the proposed relative compensation distance, an assessment model is constructed and an assessment curve drawn. Experimental results show that the proposed method can effectively assess different fault locations and different degrees of performance degradation of the rolling bearing under certain conditions.
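
    The reduce-then-classify chain above can be sketched as follows, with random cluster data standing in for the EEMD/SVD-derived feature vectors; the compensation-distance construction itself is specific to the paper and is not reproduced here.

      # Minimal sketch: locally linear embedding compresses high-dimensional
      # feature vectors before an SVM separates the bearing states.
      from sklearn.datasets import make_blobs
      from sklearn.manifold import LocallyLinearEmbedding
      from sklearn.svm import SVC

      # Toy stand-ins for hybrid-domain feature vectors of four fault states.
      features, states = make_blobs(n_samples=200, n_features=40, centers=4,
                                    random_state=0)
      embedded = LocallyLinearEmbedding(n_neighbors=10,
                                        n_components=3).fit_transform(features)
      clf = SVC().fit(embedded, states)
      print("training accuracy:", clf.score(embedded, states))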

  11. Classification of Alzheimer's disease patients with hippocampal shape wrapper-based feature selection and support vector machine

    NASA Astrophysics Data System (ADS)

    Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien

    2012-02-01

    It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD), and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and to predict conversion of mild cognitive impairment (MCI) patients to AD. However, rates of atrophy are not uniform across the hippocampus, making shape analysis a potentially more accurate biomarker. This study examines the hippocampi of 226 healthy controls, 148 AD patients and 330 MCI patients, obtained from T1-weighted structural MRI images from the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as weighted sums of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper-based feature selection method was used, as this considers the utility of features in combination when discriminating classes, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier used. The leave-one-out cross-validated accuracy obtained on training data is 88.6% for classifying AD vs controls and 74% for classifying MCI-converters vs MCI-stable, with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data, indicating that the feature selection is sensitive to the data used; however, feature ensemble methods may overcome this.
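
    A small sketch of wrapper-based selection around an SVM, assuming scikit-learn's SequentialFeatureSelector as a stand-in for the study's wrapper (it likewise scores candidate feature subsets by cross-validated classifier performance); the data are synthetic.

      # Sketch: features chosen by how well the classifier they feed performs.
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import SequentialFeatureSelector
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=300, n_features=30, n_informative=6,
                                 random_state=0)
      svm = SVC(kernel="linear")
      selector = SequentialFeatureSelector(svm, n_features_to_select=6, cv=5)
      X_small = selector.fit_transform(X, y)
      print("selected columns:", selector.get_support(indices=True))
      print("CV accuracy on compact set:",
            cross_val_score(svm, X_small, y, cv=5).mean())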

  12. Joys of Community Ensemble Playing: The Case of the Happy Roll Elastic Ensemble in Taiwan

    ERIC Educational Resources Information Center

    Hsieh, Yuan-Mei; Kao, Kai-Chi

    2012-01-01

    The Happy Roll Elastic Ensemble (HREE) is a community music ensemble supported by Tainan Culture Centre in Taiwan. With enjoyment and friendship as its primary goals, it aims to facilitate the joys of ensemble playing and the spirit of social networking. This article highlights the key aspects of HREE's development in its first two years…

  13. Ensemble Kalman filter inference of spatially-varying Manning's n coefficients in the coastal ocean

    NASA Astrophysics Data System (ADS)

    Siripatana, Adil; Mayo, Talea; Knio, Omar; Dawson, Clint; Maître, Olivier Le; Hoteit, Ibrahim

    2018-07-01

    Ensemble Kalman filtering (EnKF) is an established framework for large-scale state estimation problems. EnKFs can also be used for state-parameter estimation via the so-called "Joint-EnKF" approach: the idea is simply to augment the state vector with the parameters to be estimated and assign invariant dynamics for the time evolution of the parameters. In this contribution, we investigate the efficiency of the Joint-EnKF for estimating spatially-varying Manning's n coefficients used to define the bottom roughness in the Shallow Water Equations (SWEs) of a coastal ocean model. Observation System Simulation Experiments (OSSEs) are conducted using the ADvanced CIRCulation (ADCIRC) model, which solves a modified form of the Shallow Water Equations. A deterministic EnKF, the Singular Evolutive Interpolated Kalman (SEIK) filter, is used to estimate a vector of Manning's n coefficients defined at the model nodal points by assimilating synthetic water elevation data. It is found that with a reasonable ensemble size (O(10)), the filter's estimate converges to the reference Manning's field. To enhance performance, we further reduced the dimension of the parameter search space through a Karhunen-Loève (KL) expansion. We also iterated on the filter update step to better account for the nonlinearity of the parameter estimation problem. We study the sensitivity of the system to the ensemble size, localization scale, number of retained KL modes, and number of iterations. The performance of the proposed framework in terms of estimation accuracy suggests that a well-tuned Joint-EnKF provides a promising and robust approach to infer spatially varying seabed roughness parameters in the context of coastal ocean modeling.
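
    The augmentation trick above is easy to sketch: each ensemble member carries a (state, parameter) pair, and a stochastic EnKF update of the augmented vector pulls the parameter toward values consistent with the observed state. A scalar toy follows; in the real system the state-parameter correlations that drive the update are built up by running ADCIRC between analyses.

      # Sketch of one Joint-EnKF analysis step on an augmented (state, parameter)
      # ensemble; scalar toy problem, not ADCIRC.
      import numpy as np

      rng = np.random.default_rng(0)
      n_ens = 20
      state = rng.normal(1.0, 0.3, size=n_ens)        # water elevation (m)
      manning_n = rng.normal(0.03, 0.01, size=n_ens)  # parameter to infer
      ensemble = np.stack([state, manning_n])          # augmented 2 x n_ens

      obs, obs_err = 1.2, 0.05
      H = np.array([1.0, 0.0])                         # we observe elevation only

      # Stochastic EnKF update with perturbed observations. (Here the sample
      # state-parameter correlation is accidental; in practice it is created
      # by the forecast model between updates.)
      anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
      P = anomalies @ anomalies.T / (n_ens - 1)       # ensemble covariance
      K = P @ H / (H @ P @ H + obs_err**2)            # Kalman gain (2-vector)
      perturbed_obs = obs + rng.normal(0.0, obs_err, size=n_ens)
      ensemble += np.outer(K, perturbed_obs - H @ ensemble)
      print("posterior Manning's n estimate:", ensemble[1].mean())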

  14. Integration of data mining classification techniques and ensemble learning to identify risk factors and diagnose ovarian cancer recurrence.

    PubMed

    Tseng, Chih-Jen; Lu, Chi-Jie; Chang, Chi-Chang; Chen, Gin-Den; Cheewakriangkrai, Chalong

    2017-05-01

    Ovarian cancer is the second leading cause of death among gynecologic cancers in the world. Approximately 90% of women with ovarian cancer reported having symptoms long before a diagnosis was made. The literature shows that recurrence should be predicted with regard to patients' personal risk factors and the clinical symptoms of this devastating cancer. In this study, ensemble learning and five data mining approaches, including support vector machine (SVM), C5.0, extreme learning machine (ELM), multivariate adaptive regression splines (MARS), and random forest (RF), were integrated to rank the importance of risk factors and diagnose the recurrence of ovarian cancer. The medical records and pathologic status were extracted from the Chung Shan Medical University Hospital Tumor Registry. Experimental results illustrate that the integrated C5.0 model is a superior approach for predicting the recurrence of ovarian cancer. Moreover, the classification accuracies of C5.0, ELM, MARS, RF, and SVM indeed increased after using the selected important risk factors as predictors. Our findings suggest that International Federation of Gynecology and Obstetrics (FIGO) stage, Pathologic M, Age, and Pathologic T were the four most critical risk factors for ovarian cancer recurrence. In summary, this information can support interventions that account for the influence of personal and clinical symptom representations, and for the complexities of the multiple symptoms associated with ovarian cancer, across all phases of the recurrence trajectory. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. On the asynchronously continuous control of mobile robot movement by motor cortical spiking activity.

    PubMed

    Xu, Zhiming; So, Rosa Q; Toe, Kyaw Kyar; Ang, Kai Keng; Guan, Cuntai

    2014-01-01

    This paper presents an asynchronous intracortical brain-computer interface (BCI) which allows the subject to continuously drive a mobile robot. Such a system has great implications for helping disabled patients move around. By carefully designing a multiclass support vector machine (SVM), the subject's self-paced instantaneous movement intents are continuously decoded to control the mobile robot. In particular, we studied the stability of the neural representation of the movement directions. Experimental results on a nonhuman primate showed that the overt movement directions were stably represented in the ensemble of recorded units, and that our SVM classifier could successfully decode such movements continuously along the desired movement path. However, the stop state for self-paced control was not stably represented and could drift.

  16. Comparison of initial perturbation methods for the mesoscale ensemble prediction system of the Meteorological Research Institute for the WWRP Beijing 2008 Olympics Research and Development Project (B08RDP)

    NASA Astrophysics Data System (ADS)

    Saito, Kazuo; Hara, Masahiro; Kunii, Masaru; Seko, Hiromu; Yamaguchi, Munehiko

    2011-05-01

    Different initial perturbation methods for the mesoscale ensemble prediction were compared by the Meteorological Research Institute (MRI) as a part of the intercomparison of mesoscale ensemble prediction systems (EPSs) of the World Weather Research Programme (WWRP) Beijing 2008 Olympics Research and Development Project (B08RDP). Five initial perturbation methods for mesoscale ensemble prediction were developed for B08RDP and compared at MRI: (1) a downscaling method of the Japan Meteorological Agency (JMA)'s operational one-week EPS (WEP), (2) a targeted global model singular vector (GSV) method, (3) a mesoscale model singular vector (MSV) method based on the adjoint model of the JMA non-hydrostatic model (NHM), (4) a mesoscale breeding growing mode (MBD) method based on the NHM forecast and (5) a local ensemble transform (LET) method based on the local ensemble transform Kalman filter (LETKF) using NHM. These perturbation methods were applied to the preliminary experiments of the B08RDP Tier-1 mesoscale ensemble prediction with a horizontal resolution of 15 km. To make the comparison easier, the same horizontal resolution (40 km) was employed for the three mesoscale model-based initial perturbation methods (MSV, MBD and LET). The GSV method completely outperformed the WEP method, confirming the advantage of targeting in mesoscale EPS. The GSV method generally performed well with regard to root mean square errors of the ensemble mean, large growth rates of ensemble spreads throughout the 36-h forecast period, and high detection rates and high Brier skill scores (BSSs) for weak rains. On the other hand, the mesoscale model-based initial perturbation methods showed good detection rates and BSSs for intense rains. The MSV method showed a rapid growth in the ensemble spread of precipitation up to a forecast time of 6 h, which suggests suitability of the mesoscale SV for short-range EPSs, but the initial large growth of the perturbation did not last long. The performance of the MBD method was good for ensemble prediction of intense rain with a relatively small computing cost. The LET method showed similar characteristics to the MBD method, but the spread and growth rate were slightly smaller and the relative operating characteristic area skill score and BSS did not surpass those of MBD. These characteristic features of the five methods were confirmed by checking the evolution of the total energy norms and their growth rates. Characteristics of the initial perturbations obtained by four methods (GSV, MSV, MBD and LET) were examined for the case of a synoptic low-pressure system passing over eastern China. With GSV and MSV, the regions of large spread were near the low-pressure system, but with MSV, the distribution was more concentrated on the mesoscale disturbance. On the other hand, large-spread areas were observed southwest of the disturbance in MBD and LET. The horizontal pattern of LET perturbation was similar to that of MBD, but the amplitude of the LET perturbation reflected the observation density.

  17. Black holes with halos

    NASA Astrophysics Data System (ADS)

    Monten, Ruben; Toldo, Chiara

    2018-02-01

    We present new AdS4 black hole solutions in N=2 gauged supergravity coupled to vector and hypermultiplets. We focus on a particular consistent truncation of M-theory on the homogeneous Sasaki–Einstein seven-manifold M111, characterized by the presence of one Betti vector multiplet. We numerically construct static and spherically symmetric black holes with electric and magnetic charges, corresponding to M2 and M5 branes wrapping non-contractible cycles of the internal manifold. The novel feature characterizing these nonzero-temperature configurations is the presence of a massive vector field halo. Moreover, we verify the first law of black hole mechanics and we study the thermodynamics in the canonical ensemble. We analyze the behavior of the massive vector field condensate across the small-large black hole phase transition and we interpret the process in the dual field theory.

  18. Fault Detection of Bearing Systems through EEMD and Optimization Algorithm

    PubMed Central

    Lee, Dong-Han; Ahn, Jong-Hyo; Koh, Bong-Hwan

    2017-01-01

    This study proposes a fault detection and diagnosis method for bearing systems using ensemble empirical mode decomposition (EEMD) based feature extraction, in conjunction with particle swarm optimization (PSO), principal component analysis (PCA), and Isomap. First, a mathematical model is assumed to generate vibration signals from damaged bearing components, such as the inner race, outer race, and rolling elements. The process of decomposing vibration signals into intrinsic mode functions (IMFs) and extracting statistical features is introduced to develop a damage-sensitive parameter vector. Finally, the PCA and Isomap algorithms are used to classify and visualize this parameter vector, to separate damage characteristics from those of healthy bearing components. Moreover, the PSO-based optimization algorithm improves the classification performance by selecting proper weightings for the parameter vector, to maximize the separation and grouping of parameter vectors in three-dimensional space. PMID:29143772

  19. Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling.

    PubMed

    Pourghasemi, Hamid Reza; Yousefi, Saleh; Kornejady, Aiding; Cerdà, Artemi

    2017-12-31

    Gully erosion is identified as an important sediment source in a range of environments and plays a conclusive role in the redistribution of eroded soils on a slope. Hence, addressing the spatial occurrence pattern of this phenomenon is very important. Different ensemble models and their single counterparts, mostly data mining methods, have been used for gully erosion susceptibility mapping; however, their calibration and validation procedures need to be thoroughly addressed. The current study presents a series of individual and ensemble data mining methods, including artificial neural network (ANN), support vector machine (SVM), maximum entropy (ME), ANN-SVM, ANN-ME, and SVM-ME, to map gully erosion susceptibility in the Aghemam watershed, Iran. To this aim, a gully inventory map along with sixteen gully conditioning factors was used. A 70:30 randomly partitioned training/test split was used to assess the goodness-of-fit and prediction power of the models. The robustness, taken as the stability of the models' performance in response to changes in the dataset, was assessed through three training/test replicates. Preliminary statistical tests showed that ANN has the highest concordance and spatial differentiation, with a chi-square value of 36,656 at the 95% confidence level, while ME appeared to have the lowest concordance (1772). The ME model gave an impractical result in which 45% of the study area was labeled highly susceptible to gullying; in contrast, ANN-SVM gave a practical result, focusing on only 34% of the study area. Across all three replicates, the ANN-SVM ensemble showed the highest goodness-of-fit and predictive power, with respective average values of 0.897 (area under the success rate curve) and 0.879 (area under the prediction rate curve), and correspondingly the highest robustness. This attests to the important role of ensemble modeling in building accurate and generalizable models, and emphasizes the need to examine different model integrations. The results of this study can serve as an outline for further biophysical designs on the gullies scattered across the study area. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Prediction of different stages of Alzheimer's disease using neighborhood component analysis and ensemble decision tree.

    PubMed

    Jin, Mingwu; Deng, Weishu

    2018-05-15

    There is a spectrum of progression from healthy control (HC), to mild cognitive impairment (MCI) without conversion to Alzheimer's disease (AD), to MCI with conversion to AD (cMCI), and to AD. This study aims to predict these different disease stages using brain structural information provided by magnetic resonance imaging (MRI) data. Neighborhood component analysis (NCA) is applied to select the most powerful features for prediction, and an ensemble decision tree classifier is built to predict which group a subject belongs to. The best features and model parameters are determined by cross-validation on the training data. Our results show that 16 out of a total of 429 features were selected by NCA using 240 training subjects, including the MMSE score and structural measures in memory-related regions. The boosting tree model with NCA features achieves a prediction accuracy of 56.25% on 160 test subjects. Principal component analysis (PCA) and sequential feature selection (SFS) are used as alternative feature selection methods, with support vector machine (SVM) used for classification. The boosting tree model with NCA features outperforms all other combinations of feature selection and classification methods. The results suggest that NCA is a better feature selection strategy than PCA and SFS for the data used in this study, and that an ensemble tree classifier with boosting is more powerful than SVM for predicting the subject group. However, more advanced feature selection and classification methods, or additional measures besides structural MRI, may be needed to improve the prediction performance. Copyright © 2018 Elsevier B.V. All rights reserved.
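
    The pipeline above can be approximated with scikit-learn, with the caveat that its NeighborhoodComponentsAnalysis learns a linear transform rather than ranking and selecting raw features as the study does; gradient boosting stands in for the boosted ensemble tree. Synthetic data throughout.

      # Sketch: NCA-derived features feeding a boosted tree classifier.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.neighbors import NeighborhoodComponentsAnalysis
      from sklearn.pipeline import make_pipeline

      X, y = make_classification(n_samples=400, n_features=40, n_informative=8,
                                 n_classes=4, n_clusters_per_class=1,
                                 random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      model = make_pipeline(
          NeighborhoodComponentsAnalysis(n_components=8, random_state=0),
          GradientBoostingClassifier(random_state=0),
      )
      print("test accuracy:", model.fit(X_tr, y_tr).score(X_te, y_te))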

  1. ANALYSIS OF SAMPLING TECHNIQUES FOR IMBALANCED DATA: AN N=648 ADNI STUDY

    PubMed Central

    Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M.; Ye, Jieping

    2013-01-01

    Many neuroimaging applications deal with imbalanced imaging data. For example, in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer’s disease (AD) patients for the structural magnetic resonance imaging (MRI) modality and six times the control cases for the proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over- and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers, Random Forest and Support Vector Machines, based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids based undersampling gives the best overall performance among the different data sampling techniques and the no-sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among the various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. PMID:24176869

  2. Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study.

    PubMed

    Dubey, Rashmi; Zhou, Jiayu; Wang, Yalin; Thompson, Paul M; Ye, Jieping

    2014-02-15

    Many neuroimaging applications deal with imbalanced imaging data. For example, in Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the mild cognitive impairment (MCI) cases eligible for the study are nearly two times the Alzheimer's disease (AD) patients for structural magnetic resonance imaging (MRI) modality and six times the control cases for proteomics modality. Constructing an accurate classifier from imbalanced data is a challenging task. Traditional classifiers that aim to maximize the overall prediction accuracy tend to classify all data into the majority class. In this paper, we study an ensemble system of feature selection and data sampling for the class imbalance problem. We systematically analyze various sampling techniques by examining the efficacy of different rates and types of undersampling, oversampling, and a combination of over and undersampling approaches. We thoroughly examine six widely used feature selection algorithms to identify significant biomarkers and thereby reduce the complexity of the data. The efficacy of the ensemble techniques is evaluated using two different classifiers including Random Forest and Support Vector Machines based on classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity measures. Our extensive experimental results show that for various problem settings in ADNI, (1) a balanced training set obtained with K-Medoids technique based undersampling gives the best overall performance among different data sampling techniques and no sampling approach; and (2) sparse logistic regression with stability selection achieves competitive performance among various feature selection algorithms. Comprehensive experiments with various settings show that our proposed ensemble model of multiple undersampled datasets yields stable and promising results. © 2013 Elsevier Inc. All rights reserved.
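
    The K-Medoids undersampling found best above can be approximated with core scikit-learn as sketched below: cluster the majority class with KMeans and keep the nearest real sample to each centroid as a medoid-like representative (a stand-in for a true K-Medoids implementation).

      # Sketch: cluster-based undersampling of the majority class.
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.datasets import make_classification
      from sklearn.metrics import pairwise_distances_argmin

      X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
      X_maj, X_min = X[y == 0], X[y == 1]

      km = KMeans(n_clusters=len(X_min), n_init=10, random_state=0).fit(X_maj)
      # Nearest real majority sample to each centroid: a medoid-like point.
      medoid_idx = pairwise_distances_argmin(km.cluster_centers_, X_maj)
      X_bal = np.vstack([X_maj[medoid_idx], X_min])
      y_bal = np.array([0] * len(medoid_idx) + [1] * len(X_min))
      print("balanced training set:", np.bincount(y_bal))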

  3. Coherence rephasing combined with spin-wave storage using chirped control pulses

    NASA Astrophysics Data System (ADS)

    Demeter, Gabor

    2014-06-01

    Photon-echo based optical quantum memory schemes often employ intermediate steps to transform optical coherences to spin coherences for longer storage times. We analyze a scheme that uses three identical chirped control pulses for coherence rephasing in an inhomogeneously broadened ensemble of three-level Λ systems. The pulses induce a cyclic permutation of the atomic populations in the adiabatic regime. Optical coherences created by a signal pulse are stored as spin coherences at an intermediate time interval, and are rephased for echo emission when the ensemble is returned to the initial state. Echo emission during a possible partial rephasing when the medium is inverted can be suppressed with an appropriate choice of control pulse wave vectors. We demonstrate that the scheme works in an optically dense ensemble, despite control pulse distortions during propagation. It integrates conveniently the spin-wave storage step into memory schemes based on a second rephasing of the atomic coherences.

  4. Manifestations of classical physics in the quantum evolution of correlated spin states in pulsed NMR experiments.

    PubMed

    Ligare, Martin

    2016-05-01

    Multiple-pulse NMR experiments are a powerful tool for the investigation of molecules with coupled nuclear spins. The product operator formalism provides a way to understand the quantum evolution of an ensemble of weakly coupled spins in such experiments using some of the more intuitive concepts of classical physics and semi-classical vector representations. In this paper I present a new way in which to interpret the quantum evolution of an ensemble of spins. I recast the quantum problem in terms of mixtures of pure states of two spins whose expectation values evolve identically to those of classical moments. Pictorial representations of these classically evolving states provide a way to calculate the time evolution of ensembles of weakly coupled spins without the full machinery of quantum mechanics, offering insight to anyone who understands precession of magnetic moments in magnetic fields.

  5. Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.

    PubMed

    Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J

    2015-02-01

    The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR with that of several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) an LR model that assumed linearity and additivity (simple LR model), (2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), (3) a support vector machine, (4) a random forest, and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees thus do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.
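
    A sketch of this kind of model bake-off on synthetic data follows; a degree-2 polynomial expansion loosely stands in for the study's restricted cubic splines, and the cross-validated AUC values printed are illustrative only.

      # Sketch: simple LR vs. flexible LR vs. SVM and tree ensembles.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures, StandardScaler
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=800, n_features=15, random_state=0)
      models = {
          "simple LR": LogisticRegression(max_iter=1000),
          "flexible LR": make_pipeline(PolynomialFeatures(degree=2),
                                       StandardScaler(),
                                       LogisticRegression(max_iter=1000)),
          "SVM": SVC(),
          "random forest": RandomForestClassifier(random_state=0),
          "boosted trees": GradientBoostingClassifier(random_state=0),
      }
      for name, model in models.items():
          auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
          print(f"{name}: AUC = {auc:.3f}")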

  6. sw-SVM: sensor weighting support vector machines for EEG-based brain-computer interfaces.

    PubMed

    Jrad, N; Congedo, M; Phlypo, R; Rousseau, S; Flamary, R; Yger, F; Rakotomamonjy, A

    2011-10-01

    In many machine learning applications, like brain-computer interfaces (BCI), high-dimensional sensor array data are available. Sensor measurements are often highly correlated and the signal-to-noise ratio is not homogeneously spread across sensors. Thus, the collected data are highly variable and discrimination tasks are challenging. In this work, we focus on sensor weighting as an efficient tool to improve the classification procedure. We present an approach integrating sensor weighting into the classification framework: sensor weights are considered as hyper-parameters to be learned by a support vector machine (SVM). The resulting sensor weighting SVM (sw-SVM) is designed to satisfy a margin criterion, that is, to control the generalization error. Experimental studies on two data sets are presented, a P300 data set and an error-related potential (ErrP) data set. For the P300 data set (BCI competition III), for which a large number of trials is available, the sw-SVM performs equivalently to the ensemble SVM strategy that won the competition. For the ErrP data set, for which only a small number of trials are available, the sw-SVM shows performance superior to three state-of-the-art approaches. These results suggest that the sw-SVM promises to be useful for event-related potential classification, even with a small number of training trials.

  7. Incremental Support Vector Machine Framework for Visual Sensor Networks

    NASA Astrophysics Data System (ADS)

    Awad, Mariette; Jiang, Xianhua; Motai, Yuichi

    2006-12-01

    Motivated by the emerging requirements of surveillance networks, we present in this paper an incremental multiclass support vector machine (SVM) technique as a new framework for action classification based on real-time multi-video streams collected by homogeneous sites. The technique is based on an adaptation of the least squares SVM (LS-SVM) formulation but extends beyond the static image-based learning of current SVM methodologies. In applying the technique, an initial supervised offline learning phase is followed by a visual behavior data acquisition and online learning phase, during which the cluster head performs an ensemble of model aggregations based on the sensor nodes' inputs. The cluster head then selectively switches on designated sensor nodes for future incremental learning. Combining sensor data offers an improvement over single-camera sensing, especially when the latter has an occluded view of the target object. The optimization involved alleviates the burdens of power consumption and communication bandwidth requirements. The resulting misclassification error rate, the iterative error reduction rate of the proposed incremental learning, and the decision fusion technique prove its validity when applied to visual sensor networks. Furthermore, the enabled online learning allows adaptive domain knowledge insertion and offers the advantage of reducing both the model training time and the information storage requirements of the overall system, which makes it even more attractive for distributed sensor network communication.

  8. Identification of Shearer Cutting Patterns Using Vibration Signals Based on a Least Squares Support Vector Machine with an Improved Fruit Fly Optimization Algorithm

    PubMed Central

    Si, Lei; Wang, Zhongbin; Liu, Xinhua; Tan, Chao; Liu, Ze; Xu, Jing

    2016-01-01

    Shearers play an important role at the fully mechanized coal mining face, and accurately identifying their cutting pattern is very helpful for improving the automation level of shearers and ensuring the safety of coal mining. The least squares support vector machine (LSSVM) has been proven to offer strong potential in prediction and classification problems, particularly when an appropriate meta-heuristic algorithm is employed to determine the values of its two parameters. However, these meta-heuristic algorithms have the drawbacks of being hard to understand and of reaching the global optimal solution slowly. In this paper, an improved fruit fly optimization algorithm (IFOA) for optimizing the parameters of the LSSVM is presented, and the LSSVM coupled with IFOA (IFOA-LSSVM) is used to identify the shearer cutting pattern. The vibration acceleration signals of five cutting patterns were collected, and special state features were extracted based on ensemble empirical mode decomposition (EEMD) and the kernel function. Several examples of the IFOA-LSSVM model are further presented, and the results are compared with LSSVM, PSO-LSSVM, GA-LSSVM and FOA-LSSVM models in detail. The comparison results indicate that the proposed approach is feasible and efficient, and that it outperforms the others. Finally, an industrial application example at the coal mining face is presented to demonstrate the effectiveness of the proposed system. PMID:26771615

  9. Texture Descriptors Ensembles Enable Image-Based Classification of Maturation of Human Stem Cell-Derived Retinal Pigmented Epithelium

    PubMed Central

    Caetano dos Santos, Florentino Luciano; Skottman, Heli; Juuti-Uusitalo, Kati; Hyttinen, Jari

    2016-01-01

    Aims A fast, non-invasive and observer-independent method to analyze the homogeneity and maturity of human pluripotent stem cell (hPSC) derived retinal pigment epithelial (RPE) cells is warranted to assess the suitability of hPSC-RPE cells for implantation or in vitro use. The aim of this work was to develop and validate methods to create ensembles of state-of-the-art texture descriptors and to provide a robust classification tool to separate three different maturation stages of RPE cells by using phase contrast microscopy images. The same methods were also validated on a wide variety of biological image classification problems, such as histological or virus image classification. Methods For image classification we used different texture descriptors, descriptor ensembles and preprocessing techniques. Also, three new methods were tested. The first approach was an ensemble of preprocessing methods, to create an additional set of images. The second was the region-based approach, where saliency detection and wavelet decomposition divide each image in two different regions, from which features were extracted through different descriptors. The third method was an ensemble of Binarized Statistical Image Features, based on different sizes and thresholds. A Support Vector Machine (SVM) was trained for each descriptor histogram and the set of SVMs combined by sum rule. The accuracy of the computer vision tool was verified in classifying the hPSC-RPE cell maturation level. Dataset and Results The RPE dataset contains 1862 subwindows from 195 phase contrast images. The final descriptor ensemble outperformed the most recent stand-alone texture descriptors, obtaining, for the RPE dataset, an area under ROC curve (AUC) of 86.49% with the 10-fold cross validation and 91.98% with the leave-one-image-out protocol. The generality of the three proposed approaches was ascertained with 10 more biological image datasets, obtaining an average AUC greater than 97%. Conclusions Here we showed that the developed ensembles of texture descriptors are able to classify the RPE cell maturation stage. Moreover, we proved that preprocessing and region-based decomposition improves many descriptors’ accuracy in biological dataset classification. Finally, we built the first public dataset of stem cell-derived RPE cells, which is publicly available to the scientific community for classification studies. The proposed tool is available at https://www.dei.unipd.it/node/2357 and the RPE dataset at http://www.biomeditech.fi/data/RPE_dataset/. Both are available at https://figshare.com/s/d6fb591f1beb4f8efa6f. PMID:26895509
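
    The sum-rule fusion at the heart of the method above is sketched below: one SVM per descriptor's feature block, with the calibrated class probabilities summed across blocks before taking the argmax. Random feature blocks stand in for real texture descriptors.

      # Sketch: sum-rule ensemble of per-descriptor SVMs.
      from sklearn.datasets import make_classification
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=300, n_features=60, random_state=0)
      # Pretend columns 0-19, 20-39, 40-59 come from three different descriptors.
      blocks = [X[:, 0:20], X[:, 20:40], X[:, 40:60]]

      svms = [SVC(probability=True).fit(b[:200], y[:200]) for b in blocks]
      # Sum rule: add each SVM's class-probability estimates, take the argmax.
      summed = sum(clf.predict_proba(b[200:]) for clf, b in zip(svms, blocks))
      print("test accuracy:", (summed.argmax(axis=1) == y[200:]).mean())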

  10. Recognition of medication information from discharge summaries using ensembles of classifiers.

    PubMed

    Doan, Son; Collier, Nigel; Xu, Hua; Pham, Hoang Duy; Tu, Minh Phuong

    2012-05-07

    Extraction of clinical information such as medications or problems from clinical text is an important task of clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have proven to be effective in clinical NLP as well. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble classifier presents both challenges and opportunities to improve performance in such NLP tasks. We investigated ensemble classifiers that used different voting strategies to combine outputs from three individual classifiers: a rule-based system, a support vector machine (SVM) based system, and a conditional random field (CRF) based system. Three voting methods were proposed and evaluated using the annotated data sets from the 2009 i2b2 NLP challenge: simple majority, local SVM-based voting, and local CRF-based voting. Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score of 90.84% (94.11% Precision, 87.81% Recall) for 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge by using the same training and test sets. Our system based on majority voting achieved a better F-score of 89.65% (93.91% Precision, 85.76% Recall) than the previously reported F-score of 89.19% (93.78% Precision, 85.03% Recall) by the first-ranked system in the challenge. Our experimental results using the 2009 i2b2 challenge datasets showed that ensemble classifiers that combine individual classifiers into a voting system could achieve better performance than a single classifier in recognizing medication information from clinical text. It suggests that simple strategies that can be easily implemented such as majority voting could have the potential to significantly improve clinical entity recognition.
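
    The simplest of the three strategies above, simple majority voting, is sketched here with scikit-learn stand-ins (an SVM, naive Bayes and logistic regression replacing the paper's rule-based, SVM and CRF systems) on synthetic data.

      # Sketch: hard majority voting over heterogeneous classifiers.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import VotingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.naive_bayes import GaussianNB
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=400, random_state=0)
      voter = VotingClassifier(
          estimators=[("svm", SVC()),
                      ("nb", GaussianNB()),
                      ("lr", LogisticRegression(max_iter=1000))],
          voting="hard",   # each model casts one vote per example
      )
      print(voter.fit(X[:300], y[:300]).score(X[300:], y[300:]))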

  11. Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China.

    PubMed

    Zhai, Binxu; Chen, Jianguo

    2018-04-18

    A stacked ensemble model is developed for forecasting and analyzing the daily average concentrations of fine particulate matter (PM2.5) in Beijing, China. Special feature extraction procedures, including simplification, polynomial expansion, transformation and combination, are conducted before modeling to identify potentially significant features based on an exploratory data analysis. Stability feature selection and tree-based feature selection methods are applied to select important variables and evaluate the degrees of feature importance. Single models including LASSO, Adaboost, XGBoost and a multi-layer perceptron optimized by the genetic algorithm (GA-MLP) are established in the level-0 space and are then integrated by support vector regression (SVR) in the level-1 space via stacked generalization. A feature importance analysis reveals that nitrogen dioxide (NO2) and carbon monoxide (CO) concentrations measured in the city of Zhangjiakou are the most important pollution factors for forecasting PM2.5 concentrations. Local extreme wind speeds and maximal wind speeds are the meteorological factors with the strongest effects on the cross-regional transport of contaminants. Pollutants from the cities of Zhangjiakou and Chengde have a stronger impact on air quality in Beijing than other surrounding factors. Our model evaluation shows that the ensemble model generally performs better than a single nonlinear forecasting model when applied to new data, with a coefficient of determination (R²) of 0.90 and a root mean squared error (RMSE) of 23.69 μg/m³. For single pollutant grade recognition, the proposed model performs better when applied to days characterized by good air quality than to days registering high levels of pollution. The overall classification accuracy is 73.93%, with most misclassifications made among adjacent categories. The results demonstrate the interpretability and generalizability of the stacked ensemble model. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
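
    The level-0/level-1 layout above maps directly onto scikit-learn's StackingRegressor, as sketched below on synthetic data; gradient boosting stands in for XGBoost, and the lasso, Adaboost and MLP learners are kept at their defaults rather than GA-optimized.

      # Sketch: stacked generalization with an SVR meta-learner.
      from sklearn.datasets import make_regression
      from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                                    StackingRegressor)
      from sklearn.linear_model import Lasso
      from sklearn.neural_network import MLPRegressor
      from sklearn.svm import SVR

      X, y = make_regression(n_samples=500, n_features=10, noise=10.0,
                             random_state=0)
      stack = StackingRegressor(
          estimators=[("lasso", Lasso()),                              # level 0
                      ("ada", AdaBoostRegressor(random_state=0)),
                      ("gbt", GradientBoostingRegressor(random_state=0)),
                      ("mlp", MLPRegressor(max_iter=2000, random_state=0))],
          final_estimator=SVR(),                                       # level 1
      )
      print("R^2:", stack.fit(X[:400], y[:400]).score(X[400:], y[400:]))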

  12. Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms.

    PubMed

    Barzegar, Rahim; Moghaddam, Asghar Asghari; Deo, Ravinesh; Fijani, Elham; Tziritis, Evangelos

    2018-04-15

    Constructing accurate and reliable groundwater risk maps provides scientifically prudent and strategic measures for the protection and management of groundwater. The objectives of this paper are to design and validate machine learning based risk maps using ensemble-based modelling with an integrative approach. We employ the extreme learning machine (ELM), multivariate regression splines (MARS), M5 Tree and support vector regression (SVR) models, applied to multiple aquifer systems (unconfined, semi-confined and confined) in the Marand plain, North West Iran, to encapsulate the merits of the individual learning algorithms in a final committee-based ANN model. The DRASTIC Vulnerability Index (VI) ranged from 56.7 to 128.1, categorized into no-risk, low and moderate vulnerability thresholds. The correlation coefficient (r) and Willmott's Index (d) between NO3 concentrations and VI were 0.64 and 0.314, respectively. To improve on the original DRASTIC method, the vulnerability indices were adjusted by NO3 concentrations, yielding the groundwater contamination risk (GCR). The seven DRASTIC parameters served as the inputs, and the GCR values as the outputs, of the individual machine learning models feeding the fully optimized committee-based ANN predictive model. The correlation indicators demonstrated that the ELM and SVR models outperformed the MARS and M5 Tree models, by virtue of larger d and r values. The r and d metrics for the committee-based ANN multi-model in the testing phase were 0.8889 and 0.7913, respectively, revealing the superiority of the integrated (ensemble) machine learning models over the original DRASTIC approach. The newly designed multi-model ensemble-based approach can be considered a pragmatic step for mapping the groundwater contamination risks of multiple aquifer systems, yielding the high accuracy of the ANN committee-based model. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Biomedical named entity extraction: some issues of corpus compatibilities.

    PubMed

    Ekbal, Asif; Saha, Sriparna; Sikdar, Utpal Kumar

    2013-01-01

    Named Entity (NE) extraction is one of the most fundamental and important tasks in biomedical information extraction. It involves identifying certain entities from text and classifying them into predefined categories. In the biomedical community there is as yet no general consensus on NE annotation; thus, it is very difficult to compare existing systems due to corpus incompatibilities, and these incompatibilities also prevent us from exploiting the advantages of using different corpora together. In the present work we address the issues of corpus compatibility and use a single objective optimization (SOO) based classifier ensemble technique, which uses the search capability of a genetic algorithm (GA), for NE extraction in biomedicine. We hypothesize that the reliability of the predictions of each classifier differs among the various output classes. We use Conditional Random Field (CRF) and Support Vector Machine (SVM) frameworks to build a number of models depending upon various representations of the set of features and/or feature templates. Notably, the features are extracted without using any deep domain knowledge or resources. To assess the challenges of corpus compatibility, we experiment with different benchmark datasets and their various combinations. Comparison with existing approaches demonstrates the efficacy of the technique: the GA-based ensemble achieves around 2% performance improvement over the individual classifiers. Degradation in performance on the integrated corpus clearly shows the difficulty of the task. In summary, the ensemble-based approach attains state-of-the-art performance for entity extraction on three different kinds of biomedical datasets. The likely reasons for the better performance of this approach are (i) the use of a variety of rich features, as described in the subsection "Features for named entity extraction", and (ii) the use of the GA-based classifier ensemble technique to combine the outputs of multiple classifiers.

  14. Diamond-Based Magnetic Imaging with Fourier Optical Processing

    NASA Astrophysics Data System (ADS)

    Backlund, Mikael P.; Kehayias, Pauli; Walsworth, Ronald L.

    2017-11-01

    Diamond-based magnetic field sensors have attracted great interest in recent years. In particular, wide-field magnetic imaging using nitrogen-vacancy (NV) centers in diamond has been previously demonstrated in condensed matter, biological, and paleomagnetic applications. Vector magnetic imaging with NV ensembles typically requires a significant applied field (>10 G) to resolve the contributions from four crystallographic orientations, hindering studies of magnetic samples that require measurement in low or independently specified bias fields. Here we model and measure the complex amplitude distribution of NV emission at the microscope's Fourier plane and show that by modulating this collected light at the Fourier plane, one can decompose the NV ensemble magnetic resonance spectrum into its constituent orientations by purely optical means. This decomposition effectively extends the dynamic range at a given bias field and enables wide-field vector magnetic imaging at arbitrarily low bias fields, thus broadening potential applications of NV imaging and sensing. Our results demonstrate that NV-based microscopy stands to benefit greatly from Fourier optical approaches, which have already found widespread utility in other branches of microscopy.

  15. Moisture Damage Modeling in Lime and Chemically Modified Asphalt at Nanolevel Using Ensemble Computational Intelligence

    PubMed Central

    2018-01-01

    This paper measures the adhesion/cohesion forces among asphalt molecules at the nanoscale using Atomic Force Microscopy (AFM) and models the moisture damage by applying state-of-the-art Computational Intelligence (CI) techniques (e.g., artificial neural network (ANN), support vector regression (SVR), and an Adaptive Neuro Fuzzy Inference System (ANFIS)). Various combinations of lime and chemicals as well as dry and wet environments are used to produce different asphalt samples. The parameters varied to generate the different samples and measure the corresponding adhesion/cohesion forces are the percentage of antistripping agents (e.g., Lime and Unichem), AFM tip K values, and AFM tip types. The CI methods are trained to model the adhesion/cohesion forces given the variation in the values of these parameters. To achieve enhanced performance, statistical combinations such as the average, weighted average, and regression of the outputs generated by the CI techniques are used. The experimental results show that, of the three individual CI methods, ANN models moisture damage to lime- and chemically modified asphalt better than the other two techniques for both wet and dry conditions. Moreover, the ensemble of CI methods combined through these statistical measures provides better accuracy than any individual CI technique. PMID:29849551

  16. Enhancing the discrimination accuracy between metastases, gliomas and meningiomas on brain MRI by volumetric textural features and ensemble pattern recognition methods.

    PubMed

    Georgiadis, Pantelis; Cavouras, Dionisis; Kalatzis, Ioannis; Glotsos, Dimitris; Athanasiadis, Emmanouil; Kostopoulos, Spiros; Sifaki, Koralia; Malamas, Menelaos; Nikiforidis, George; Solomou, Ekaterini

    2009-01-01

    Three-dimensional (3D) texture analysis of volumetric brain magnetic resonance (MR) images has been identified as an important indicator for discriminating among different brain pathologies. The purpose of this study was to evaluate the efficiency of 3D textural features, used with a pattern recognition system, in the task of discriminating benign, malignant and metastatic brain tissues on T1 postcontrast MR imaging (MRI) series. The dataset consisted of 67 brain MRI series obtained from patients with verified and untreated intracranial tumors. The pattern recognition system was designed as an ensemble classification scheme employing a support vector machine classifier, specially modified to integrate the least squares features transformation logic in its kernel function. The latter, in conjunction with the use of 3D textural features, boosted the performance of the system in discriminating metastatic, malignant and benign brain tumors to 77.14%, 89.19% and 93.33% accuracy, respectively. The method was evaluated using an external cross-validation process; thus, the results may be considered indicative of the generalization performance of the system on "unseen" cases. The proposed system might be used as an assisting tool for brain tumor characterization on volumetric MRI series.

  17. Boosting with Averaged Weight Vectors

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)

    2002-01-01

    AdaBoost is a well-known ensemble learning algorithm that constructs its constituent or base models in sequence. A key step in AdaBoost is constructing a distribution over the training examples to create each base model. This distribution, represented as a vector, is constructed to be orthogonal to the vector of mistakes made by the previous base model in the sequence. The idea is to make the next base model's errors uncorrelated with those of the previous model. Some researchers have pointed out the intuition that it is probably better to construct a distribution that is orthogonal to the mistake vectors of all the previous base models, but that this is not always possible. We present an algorithm that attempts to come as close as possible to this goal in an efficient manner. We present experimental results demonstrating significant improvement over AdaBoost and the Totally Corrective boosting algorithm, which also attempts to satisfy this goal.
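    The distribution update at the heart of AdaBoost fits in a few lines of numpy. The sketch below shows the standard single-model update, under which the previous model's weighted error becomes exactly 1/2 (the orthogonality property discussed above); the paper's averaged variant would, roughly, replace the single mistake vector with a running average over all previous base models, and its details differ from this sketch.

```python
import numpy as np

def adaboost_reweight(d, mistakes):
    """One AdaBoost distribution update.

    d        -- current distribution over training examples (sums to 1)
    mistakes -- boolean vector, True where the last base model erred

    Under the returned distribution the last model's weighted error is
    exactly 1/2, i.e. the new weight vector is orthogonal to the mistake
    vector in AdaBoost's sense.
    """
    eps = float(np.dot(d, mistakes))         # weighted error of last model
    beta = eps / (1.0 - eps)
    d_new = np.where(mistakes, d, d * beta)  # down-weight correct examples
    return d_new / d_new.sum()

d = np.full(6, 1 / 6)
mistakes = np.array([True, False, False, True, False, False])
d = adaboost_reweight(d, mistakes)
print(d, np.dot(d, mistakes))  # second value is 0.5
```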

  18. Visualization and Nowcasting for Aviation using online verified ensemble weather radar extrapolation.

    NASA Astrophysics Data System (ADS)

    Kaltenboeck, Rudolf; Kerschbaum, Markus; Hennermann, Karin; Mayer, Stefan

    2013-04-01

    Nowcasting of precipitation events, especially thunderstorms or winter storms, has a high impact on flight safety and efficiency for air traffic management. Future strategic planning by air traffic control will result in circumnavigation of potentially hazardous areas, reduction of load around efficiency hot spots by offering alternatives, increased handling capacity, anticipation of avoidance manoeuvres, and increased awareness before dangerous areas are entered by aircraft. To facilitate this, rapid-update forecasts of the location, intensity, size, movement and development of local storms are necessary. Weather radar data deliver precipitation analyses of high temporal and spatial resolution close to real time by using clever scanning strategies. These data are the basis for generating rapid-update forecasts on a time frame of up to 2 hours and more for applications in aviation meteorological service provision, such as optimizing safety and economic impact in the context of sub-scale phenomena. On the basis of tracking radar echoes by correlation, the movement vectors of successive weather radar images are calculated. For every new radar image a set of ensemble precipitation fields is collected by using different parameter sets, such as pattern-match size, different time steps, filter methods, and an implementation of the history of tracking vectors and plausibility checks. This method accounts for the uncertainty in rain field displacement and for different scales in time and space. By manually validating a set of case studies, the best verification method and skill score are defined and implemented in an online verification scheme which calculates optimized forecasts for different time steps and different areas using different extrapolation ensemble members. To obtain information about the quality and reliability of the extrapolation process, additional data-quality information (e.g. shielding in Alpine areas) is extrapolated and combined into an extrapolation-quality index. The probability and quality information of the forecast ensemble is then available, and flexible blending with numerical prediction models is possible for each subarea. Simultaneously with the automatic processing, the ensemble nowcasting product is visualized in a new, innovative way which combines the intensity, probability and quality information for different subareas in one forecast image.

  19. More than just trash bins? Potential roles for extracellular vesicles in the vertical and horizontal transmission of yeast prions.

    PubMed

    Kabani, Mehdi; Melki, Ronald

    2016-05-01

    In the yeast Saccharomyces cerevisiae, an ensemble of structurally and functionally diverse cytoplasmic proteins has the ability to form self-perpetuating protein aggregates (e.g. prions) which are the vectors of heritable non-Mendelian phenotypic traits. Whether harboring these prions is deleterious for yeasts (akin to mammalian degenerative disorders) or beneficial (as epigenetic modifiers of gene expression) has been intensely debated, and strong arguments have been made in support of both views. We recently reported that the yeast prion protein Sup35p is exported via extracellular vesicles (EV), both in its soluble and aggregated infectious states. Herein, we discuss the possible implications of this observation and propose several hypotheses regarding the roles of EV in both vertical and horizontal propagation of 'good' and 'bad' yeast prions.

  20. Displacement data assimilation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rosenthal, W. Steven; Venkataramani, Shankar; Mariano, Arthur J.

    We show that modifying a Bayesian data assimilation scheme by incorporating kinematically-consistent displacement corrections produces a scheme that is demonstrably better at estimating partially observed state vectors in a setting where feature information is important. While the displacement transformation is generic, here we implement it within an ensemble Kalman Filter framework and demonstrate its effectiveness in tracking stochastically perturbed vortices.
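    The displacement corrections themselves are not shown here; the sketch below is a minimal stochastic ensemble Kalman filter analysis step of the kind such a correction would be slotted into, with all arrays as toy stand-ins for the vortex-tracking setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_update(ensemble, H, R, y):
    """Stochastic ensemble Kalman filter analysis step.

    ensemble -- (n_members, n_state) forecast ensemble
    H        -- (n_obs, n_state) linear observation operator
    R        -- (n_obs, n_obs) observation-error covariance
    y        -- (n_obs,) observation vector
    """
    n = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)            # ensemble anomalies
    Pf = X.T @ X / (n - 1)                          # sample covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain
    # Perturbed observations keep the analysis spread statistically
    # consistent with the observation error.
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T

# Toy example: 20 members, 4 state variables, 2 of them observed.
ens = rng.normal(size=(20, 4))
H = np.eye(2, 4)
analysis = enkf_update(ens, H, 0.1 * np.eye(2), np.array([1.0, -0.5]))
```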

  1. Oceanic ensemble forecasting in the Gulf of Mexico: An application to the case of the Deep Water Horizon oil spill

    NASA Astrophysics Data System (ADS)

    Khade, Vikram; Kurian, Jaison; Chang, Ping; Szunyogh, Istvan; Thyng, Kristen; Montuoro, Raffaele

    2017-05-01

    This paper demonstrates the potential of ocean ensemble forecasting in the Gulf of Mexico (GoM). The Bred Vector (BV) technique with a one-week rescaling frequency is implemented on a 9 km resolution version of the Regional Ocean Modelling System (ROMS). Numerical experiments are carried out using the HYCOM analysis products to define the initial conditions and the lateral boundary conditions. The growth rates of the forecast uncertainty are estimated to be about 10% of the initial amplitude per week. By carrying out ensemble forecast experiments with and without perturbed surface forcing, it is demonstrated that in the coastal regions accounting for uncertainties in the atmospheric forcing is more important than accounting for uncertainties in the ocean initial conditions. In the Loop Current region, the initial condition uncertainties are the dominant source of the forecast uncertainty. The root-mean-square error of the Lagrangian track forecasts at the 15-day forecast lead time can be reduced by about 10-50 km by using the ensemble-mean Eulerian forecast of the oceanic flow for the computation of the tracks, instead of the single-initial-condition Eulerian forecast.
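    The breeding cycle itself is simple to state in code: evolve a control and a perturbed state, difference them, and rescale the difference back to the prescribed initial amplitude. A minimal sketch, with `step_model` as a toy stand-in for a one-week ROMS integration:

```python
import numpy as np

def breeding_cycle(x_control, bv, step_model, amplitude):
    """One bred-vector cycle: evolve control and perturbed states, then
    rescale their difference back to the prescribed initial amplitude."""
    x_pert_fcst = step_model(x_control + bv)   # perturbed forecast
    x_ctrl_fcst = step_model(x_control)        # control forecast
    bv_new = x_pert_fcst - x_ctrl_fcst         # raw bred vector
    growth = np.linalg.norm(bv_new) / amplitude
    bv_new *= amplitude / np.linalg.norm(bv_new)
    return x_ctrl_fcst, bv_new, growth

# Toy 'one-week integration' of a mildly unstable linear system.
step = lambda x: x + 0.1 * np.array([x[1], -0.5 * x[0]])
state, bv = np.array([1.0, 0.0]), np.array([0.01, 0.0])
for _ in range(5):
    state, bv, g = breeding_cycle(state, bv, step, amplitude=0.01)
```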

  2. Ensemble Nonlinear Autoregressive Exogenous Artificial Neural Networks for Short-Term Wind Speed and Power Forecasting.

    PubMed

    Men, Zhongxian; Yee, Eugene; Lien, Fue-Sang; Yang, Zhiling; Liu, Yongqian

    2014-01-01

    Short-term wind speed and wind power forecasts (for a 72 h period) are obtained using a nonlinear autoregressive exogenous artificial neural network (ANN) methodology which incorporates either numerical weather prediction or high-resolution computational fluid dynamics wind field information as an exogenous input. An ensemble approach is used to combine the predictions from many candidate ANNs in order to provide improved forecasts for wind speed and power, along with the associated uncertainties in these forecasts. More specifically, the ensemble ANN is used to quantify the uncertainties arising from the network weight initialization and from the unknown structure of the ANN. All members forming the ensemble of neural networks were trained using an efficient particle swarm optimization algorithm. The results of the proposed methodology are validated using wind speed and wind power data obtained from an operational wind farm located in Northern China. The assessment demonstrates that this methodology for wind speed and power forecasting generally provides an improvement in predictive skills when compared to the practice of using an "optimal" weight vector from a single ANN while providing additional information in the form of prediction uncertainty bounds.
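    The core ensemble idea, many networks that differ only in their random weight initialization, is easy to sketch. scikit-learn's Adam training is used below in place of the paper's particle swarm optimization, and synthetic data stand in for the wind-farm measurements.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=2.0)

# Ten members, identical architecture, different weight initializations.
members = [MLPRegressor(hidden_layer_sizes=(32,), random_state=seed,
                        max_iter=3000).fit(X, y) for seed in range(10)]

preds = np.stack([m.predict(X[:50]) for m in members])
forecast = preds.mean(axis=0)     # ensemble-mean forecast
band = 1.96 * preds.std(axis=0)   # approximate uncertainty bounds
```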

  3. Ensemble Nonlinear Autoregressive Exogenous Artificial Neural Networks for Short-Term Wind Speed and Power Forecasting

    PubMed Central

    Lien, Fue-Sang; Yang, Zhiling; Liu, Yongqian

    2014-01-01

    Short-term wind speed and wind power forecasts (for a 72 h period) are obtained using a nonlinear autoregressive exogenous artificial neural network (ANN) methodology which incorporates either numerical weather prediction or high-resolution computational fluid dynamics wind field information as an exogenous input. An ensemble approach is used to combine the predictions from many candidate ANNs in order to provide improved forecasts for wind speed and power, along with the associated uncertainties in these forecasts. More specifically, the ensemble ANN is used to quantify the uncertainties arising from the network weight initialization and from the unknown structure of the ANN. All members forming the ensemble of neural networks were trained using an efficient particle swarm optimization algorithm. The results of the proposed methodology are validated using wind speed and wind power data obtained from an operational wind farm located in Northern China. The assessment demonstrates that this methodology for wind speed and power forecasting generally provides an improvement in predictive skills when compared to the practice of using an “optimal” weight vector from a single ANN while providing additional information in the form of prediction uncertainty bounds. PMID:27382627

  4. Can-Evo-Ens: Classifier stacking based evolutionary ensemble system for prediction of human breast cancer using amino acid sequences.

    PubMed

    Ali, Safdar; Majid, Abdul

    2015-04-01

    The diagnosis of human breast cancer is an intricate process and specific indicators may produce negative results. In order to avoid misleading results, an accurate and reliable diagnostic system for breast cancer is indispensable. Recently, several interesting machine-learning (ML) approaches have been proposed for the prediction of breast cancer. To this end, we developed a novel classifier-stacking based evolutionary ensemble system, "Can-Evo-Ens", for predicting amino acid sequences associated with breast cancer. In this paper, we first selected four diverse types of ML algorithms, Naïve Bayes, K-Nearest Neighbor, Support Vector Machines, and Random Forest, as base-level classifiers. These classifiers are trained individually in different feature spaces using physicochemical properties of amino acids. In order to exploit the decision spaces, the preliminary predictions of the base-level classifiers are stacked. Genetic programming (GP) is then employed to develop a meta-classifier that optimally combines the predictions of the base classifiers. The most suitable threshold value of the best-evolved predictor is computed using the Particle Swarm Optimization technique. Our experiments have demonstrated the robustness of the Can-Evo-Ens system on an independent validation dataset. The proposed system achieved the highest Area Under the Curve (AUC) of the ROC curve, 99.95%, for cancer prediction. The comparative results revealed that the proposed approach is better than individual ML approaches and the conventional ensemble approaches AdaBoostM1, Bagging, GentleBoost, and Random Subspace. It is expected that the proposed novel system will have a major impact on the fields of Biomedicine, Genomics, Proteomics, Bioinformatics, and Drug Development. Copyright © 2015 Elsevier Inc. All rights reserved.
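    The base-level/meta-level layout can be sketched with scikit-learn's stacking API. LogisticRegression stands in here for the genetic-programming meta-classifier, and synthetic data for the physicochemical feature spaces; this is a schematic of the architecture, not the paper's implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=30)

base = [
    ("nb", GaussianNB()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC(probability=True)),
    ("rf", RandomForestClassifier(n_estimators=300)),
]
# cv=5: the meta-classifier is fit on out-of-fold base predictions.
model = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(),
                           stack_method="predict_proba", cv=5)
model.fit(X, y)
scores = model.predict_proba(X[:10])[:, 1]  # inputs to AUC / thresholding
```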

  5. Estimation of the uncertainty of a climate model using an ensemble simulation

    NASA Astrophysics Data System (ADS)

    Barth, A.; Mathiot, P.; Goosse, H.

    2012-04-01

    The atmospheric forcings play an important role in the study of the ocean and sea-ice dynamics of the Southern Ocean. Errors in the atmospheric forcings will inevitably result in uncertain model results. The sensitivity of the model results to errors in the atmospheric forcings is studied with ensemble simulations using multivariate perturbations of the atmospheric forcing fields. The numerical ocean model used is NEMO-LIM in a global configuration with a horizontal resolution of 2°. NCEP reanalyses are used to provide air temperature and wind data to force the ocean model over the last 50 years. A climatological mean is used to prescribe relative humidity, cloud cover and precipitation. In a first step, the model results are compared with OSTIA SST and OSI SAF sea ice concentration of the southern hemisphere. The seasonal behavior of the RMS difference and bias in SST and ice concentration is highlighted, as well as the regions with relatively high RMS errors and biases, such as the Antarctic Circumpolar Current and near the ice edge. Ensemble simulations are performed to statistically characterize the model error due to uncertainties in the atmospheric forcings. Such information is a crucial element for future data assimilation experiments. Ensemble simulations are performed with perturbed air temperature and wind forcings. A Fourier decomposition of the NCEP wind vectors and air temperature for 2007 is used to generate the ensemble perturbations. The perturbations are scaled such that the resulting ensemble spread matches approximately the RMS differences between the model and the satellite SST and sea ice concentration. The ensemble spread and covariance are analyzed at the minimum and maximum sea ice extent. It is shown that errors in the atmospheric forcings can extend to several hundred meters in depth near the Antarctic Circumpolar Current.
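    Phase randomization of a Fourier decomposition is one common way to build such spatially correlated perturbations; the sketch below illustrates the idea for a single 2-D forcing field, and the exact decomposition used in the study may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

def fourier_perturbation(field, scale):
    """Perturb a 2-D forcing field by randomizing Fourier phases.

    The amplitude spectrum (hence the spatial correlation structure) of
    the original field is preserved; the perturbation is rescaled so its
    standard deviation equals 'scale', e.g. an RMS model-data misfit.
    """
    spec = np.fft.rfft2(field)
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.shape)
    pert = np.fft.irfft2(np.abs(spec) * np.exp(1j * phases), s=field.shape)
    pert -= pert.mean()
    return field + scale * pert / pert.std()

wind_u = rng.normal(size=(90, 180))   # toy stand-in for an NCEP wind field
member = fourier_perturbation(wind_u, scale=0.5)
```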

  6. DART: Tools and Support for Ensemble Data Assimilation Research, Operations, and Education

    NASA Astrophysics Data System (ADS)

    Hoar, T. J.; Anderson, J. L.; Collins, N.; Raeder, K.; Kershaw, H.; Romine, G. S.; Mizzi, A. P.; Chatterjee, A.; Karspeck, A. R.; Zarzycki, C. M.; Ha, S. Y.; Barre, J.; Gaubert, B.

    2014-12-01

    The Data Assimilation Research Testbed (DART) is a community facility for ensemble data assimilation developed and supported by the National Center for Atmospheric Research. DART provides a comprehensive suite of software, documentation, examples and tutorials that can be used for ensemble data assimilation research, operations, and education. Scientists and software engineers from the Data Assimilation Research Section at NCAR are available to actively support DART users who want to use existing DART products or develop their own new applications. Current DART users range from university professors teaching data assimilation, to individual graduate students working with simple models, through national laboratories doing operational prediction with large state-of-the-art models. DART runs efficiently on many computational platforms ranging from laptops through thousands of cores on the newest supercomputers. This poster focuses on several recent research activities using DART with geophysical models. First, DART is being used with the Community Atmosphere Model Spectral Element (CAM-SE) and Model for Prediction Across Scales (MPAS) global atmospheric models that support locally enhanced grid resolution. Initial results from ensemble assimilation with both models are presented. DART is also being used to produce ensemble analyses of atmospheric tracers, in particular CO, in both the global CAM-Chem model and the regional Weather Research and Forecast with chemistry (WRF-Chem) model by assimilating observations from the Measurements of Pollution in the Troposphere (MOPITT) and Infrared Atmospheric Sounding Interferometer (IASI) instruments. Results from ensemble analyses in both models are presented. An interface between DART and the Community Atmosphere Biosphere Land Exchange (CABLE) model has been completed and ensemble land surface analyses with DART/CABLE will be discussed. Finally, an update on ensemble analyses in the fully-coupled Community Earth System Model (CESM) is presented. The poster includes instructions on how to get started using DART for research or educational applications.

  7. Representation of photon limited data in emission tomography using origin ensembles

    NASA Astrophysics Data System (ADS)

    Sitek, A.

    2008-06-01

    Representation and reconstruction of data obtained by emission tomography scanners are challenging due to high noise levels in the data. Typically, images obtained using tomographic measurements are represented using grids. In this work, we define images as sets of origins of events detected during tomographic measurements; we call these origin ensembles (OEs). A state in the ensemble is characterized by a vector of 3N parameters Y, where the parameters are the coordinates of the origins of detected events in three-dimensional space and N is the number of detected events. The 3N-dimensional probability density function (PDF) for that ensemble is derived, and we present an algorithm for OE image estimation from tomographic measurements. A displayable image (e.g. a grid-based image) is derived from the OE formulation by calculating ensemble expectations based on the PDF using the Markov chain Monte Carlo method. The approach was applied to computer-simulated 3D list-mode positron emission tomography data. The reconstruction errors for a simulated 10 000 000 event acquisition ranged from 0.1 to 34.8%, depending on object size and sampling density. The method was also applied to experimental data, and the results of the OE method were consistent with those obtained by a standard maximum-likelihood approach. The method is a new approach to the representation and reconstruction of data obtained by photon-limited emission tomography measurements.

  8. Simplified model of statistically stationary spacecraft rotation and associated induced gravity environments

    NASA Technical Reports Server (NTRS)

    Fichtl, G. H.; Holland, R. L.

    1978-01-01

    A stochastic model of spacecraft motion was developed based on the assumption that the net torque vector due to crew activity and rocket thruster firings is a statistically stationary Gaussian vector process. The process had zero ensemble mean value, and the components of the torque vector were mutually stochastically independent. The linearized rigid-body equations of motion were used to derive the autospectral density functions of the components of the spacecraft rotation vector. The cross-spectral density functions of the components of the rotation vector vanish for all frequencies so that the components of rotation were mutually stochastically independent. The autospectral and cross-spectral density functions of the induced gravity environment imparted to scientific apparatus rigidly attached to the spacecraft were calculated from the rotation rate spectral density functions via linearized inertial frame to body-fixed principal axis frame transformation formulae. The induced gravity process was a Gaussian one with zero mean value. Transformation formulae were used to rotate the principal axis body-fixed frame to which the rotation rate and induced gravity vector were referred to a body-fixed frame in which the components of the induced gravity vector were stochastically independent. Rice's theory of exceedances was used to calculate expected exceedance rates of the components of the rotation and induced gravity vector processes.

  9. Application of Bred Vectors To Data Assimilation

    NASA Astrophysics Data System (ADS)

    Corazza, M.; Kalnay, E.; Patil, Dj

    We introduced a statistic, the BV-dimension, to measure the effective local finite-time dimensionality of the atmosphere. We show that this dimension is often quite low, and suggest that this finding has important implications for data assimilation and the accuracy of weather forecasting (Patil et al, 2001). The original database for this study was the forecasts of the NCEP global ensemble forecasting system. The initial differences between the control forecast and the perturbed forecasts are called bred vectors. The control and perturbed initial conditions valid at time t = nΔt are evolved using the forecast model until time t = (n+1)Δt. The differences between the perturbed and the control forecasts are scaled down to their initial amplitude, and constitute the bred vectors valid at (n+1)Δt. Their growth rate is typically about 1.5/day. The bred vectors are similar by construction to leading Lyapunov vectors, except that they have small but finite amplitude and are valid at finite times. The original NCEP ensemble data set has 5 independent bred vectors. We define a local bred vector at each grid point by choosing the 5 by 5 grid points centered at that point (a region of about 1100 km by 1100 km), and using the north-south and east-west velocity components at the 500 mb pressure level to form a 50-dimensional column vector. Since we have k=5 global bred vectors, we also have k local bred vectors at each grid point. We estimate the effective dimensionality of the subspace spanned by the local bred vectors by performing a singular value decomposition (EOF analysis). The k local bred vector columns form a 50×k matrix M. The singular values s(i) of M measure the extent to which the k column unit vectors making up the matrix M point in the direction of v(i). We define the bred vector dimension as

    BVDIM = [Σ s(i)]² / Σ s(i)²

    For example, if 4 out of the 5 vectors lie along v(1) and one lies along v(2), the BV-dimension would be BVDIM[sqrt(4), 1, 0, 0, 0] = 1.8, less than 2 because one direction is more dominant than the other in representing the original data. The results (Patil et al, 2001) show that there are large regions where the bred vectors span a subspace of substantially lower dimension than that of the full space. These low-dimensionality regions are dominant in the baroclinic extratropics, typically have a lifetime of 3-7 days, have a well-defined horizontal and vertical structure that spans most of the atmosphere, and tend to move eastward. New results with a large number of ensemble members confirm these findings and indicate that the low-dimensionality regions are quite robust, and depend only on the verification time (i.e., the underlying flow). Corazza et al (2001) have performed experiments with a data assimilation system based on a quasi-geostrophic model and simulated observations (Morss, 1999; Hamill et al, 2000). A 3D-variational data assimilation scheme for a quasi-geostrophic channel model is used to study the structure of the background error and its relationship to the corresponding bred vectors. The "true" evolution of the model atmosphere is defined by an integration of the model, and "rawinsonde observations" are simulated by randomly perturbing the true state at fixed locations. It is found that after 3-5 days the bred vectors develop well-organized structures which are very similar for the two different norms considered in this paper (potential vorticity norm and streamfunction norm).
    The results show that the bred vectors do indeed represent well the characteristics of the data assimilation forecast errors, and that the subspace of bred vectors contains most of the forecast error, except in areas where the forecast errors are small. For example, the angle between the 6 hr forecast error and the subspace spanned by 10 bred vectors is less than 10° over 90% of the domain, indicating a pattern correlation of more than 98.5% between the forecast error and its projection onto the bred vector subspace. The presence of low-dimensional regions in the perturbations of the basic flow has important implications for data assimilation. At any given time, there is a difference between the true atmospheric state and the model forecast. Assuming that model errors are not the dominant source of errors, in a region of low BV-dimensionality the difference between the true state and the forecast should lie substantially in the low-dimensional unstable subspace of the few bred vectors that contribute most strongly to the low BV-dimension. This information should yield a substantial improvement in the forecast: the data assimilation algorithm should correct the model state by moving it closer to the observations along the unstable subspace, since this is where the true state most likely lies. Preliminary experiments have been conducted with the quasi-geostrophic data assimilation system, testing whether it is possible to add "errors of the day" based on bred vectors to the standard (constant) 3D-Var background error covariance in order to capture these important errors. The results are extremely encouraging, indicating a significant reduction (about 40%) in the analysis errors at a very low computational cost. References: Corazza, M., E. Kalnay, DJ Patil, R. Morss, M. Cai, I. Szunyogh, BR Hunt, E. Ott and JA Yorke, 2001: Use of the breeding technique to estimate the structure of the analysis "errors of the day". Submitted to Nonlinear Processes in Geophysics. Hamill, T.M., Snyder, C., and Morss, R.E., 2000: A Comparison of Probabilistic Forecasts from Bred, Singular-Vector and Perturbed Observation Ensembles, Mon. Wea. Rev., 128, 1835-1851. Kalnay, E., and Z. Toth, 1994: Removing growing errors in the analysis cycle. Preprints of the Tenth Conference on Numerical Weather Prediction, Amer. Meteor. Soc., 1994, 212-215. Morss, R. E., 1999: Adaptive observations: Idealized sampling strategies for improving numerical weather prediction. PhD thesis, Massachusetts Institute of Technology, 225 pp. Patil, D. J. S., B. R. Hunt, E. Kalnay, J. A. Yorke, and E. Ott, 2001: Local Low Dimensionality of Atmospheric Dynamics. Phys. Rev. Lett., 86, 5878.
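    The BV-dimension statistic above is straightforward to compute; a minimal numpy sketch, assuming the 50×k local bred-vector matrix M has already been assembled:

```python
import numpy as np

def bv_dimension(M):
    """BV-dimension of a local bred-vector matrix M (here 50 x k):
    BVDIM = (sum of singular values)^2 / (sum of squared singular values).
    """
    s = np.linalg.svd(M, compute_uv=False)
    return s.sum() ** 2 / (s ** 2).sum()

# The example from the text: singular values [sqrt(4), 1, 0, 0, 0]
# give (2 + 1)**2 / (4 + 1) = 1.8.
print((2 + 1) ** 2 / (4 + 1))
```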

  10. Scalar and vector Keldysh models in the time domain

    NASA Astrophysics Data System (ADS)

    Kiselev, M. N.; Kikoin, K. A.

    2009-04-01

    The exactly solvable Keldysh model of a disordered electron system in a random scattering field with extremely long correlation length is converted into a time-dependent model with extremely long relaxation. The dynamical problem is solved for an ensemble of two-level systems (TLS) with fluctuating well depths having the discrete Z2 symmetry. It is also shown that symmetric TLS with fluctuating barrier transparency may be described in terms of the vector Keldysh model with time-dependent random planar rotations in the xy plane having continuous SO(2) symmetry. Application of this model to the description of dynamic fluctuations in quantum dots and optical lattices is discussed.

  11. Group Cohesion, Collective Efficacy, and Motivational Climate As Predictors of Conductor Support in Music Ensembles

    ERIC Educational Resources Information Center

    Matthews, Wendy K.; Kitsantas, Anastasia

    2007-01-01

    In the present study, we examined whether collective efficacy, group cohesion (task and social), and perceived motivational climate (task-involving and ego-involving orientations) in a music ensemble predict instrumentalists' perceived conductor support. Ninety-one (N = 91) skilled high school instrumentalists participated in the study. To assess…

  12. Verification of Ensemble Forecasts for the New York City Operations Support Tool

    NASA Astrophysics Data System (ADS)

    Day, G.; Schaake, J. C.; Thiemann, M.; Draijer, S.; Wang, L.

    2012-12-01

    The New York City water supply system operated by the Department of Environmental Protection (DEP) serves nine million people. It covers 2,000 square miles of portions of the Catskill, Delaware, and Croton watersheds, and it includes nineteen reservoirs and three controlled lakes. DEP is developing an Operations Support Tool (OST) to support its water supply operations and planning activities. OST includes historical and real-time data, a model of the water supply system complete with operating rules, and lake water quality models developed to evaluate alternatives for managing turbidity in the New York City Catskill reservoirs. OST will enable DEP to manage turbidity in its unfiltered system while satisfying its primary objective of meeting the City's water supply needs, in addition to considering secondary objectives of maintaining ecological flows, supporting fishery and recreation releases, and mitigating downstream flood peaks. The current version of OST relies on statistical forecasts of flows in the system based on recent observed flows. To improve short-term decision making, plans are being made to transition to National Weather Service (NWS) ensemble forecasts based on hydrologic models that account for short-term weather forecast skill, longer-term climate information, as well as the hydrologic state of the watersheds and recent observed flows. To ensure that the ensemble forecasts are unbiased and that the ensemble spread reflects the actual uncertainty of the forecasts, a statistical model has been developed to post-process the NWS ensemble forecasts to account for hydrologic model error as well as any inherent bias and uncertainty in initial model states, meteorological data and forecasts. The post-processor is designed to produce adjusted ensemble forecasts that are consistent with the DEP historical flow sequences that were used to develop the system operating rules. A set of historical hindcasts that is representative of the real-time ensemble forecasts is needed to verify that the post-processed forecasts are unbiased, statistically reliable, and preserve the skill inherent in the "raw" NWS ensemble forecasts. A verification procedure and set of metrics will be presented that provide an objective assessment of ensemble forecasts. The procedure will be applied to both raw ensemble hindcasts and to post-processed ensemble hindcasts. The verification metrics will be used to validate proper functioning of the post-processor and to provide a benchmark for comparison of different types of forecasts. For example, current NWS ensemble forecasts are based on climatology, using each historical year to generate a forecast trace. The NWS Hydrologic Ensemble Forecast System (HEFS) under development will utilize output from both the National Oceanic Atmospheric Administration (NOAA) Global Ensemble Forecast System (GEFS) and the Climate Forecast System (CFS). Incorporating short-term meteorological forecasts and longer-term climate forecast information should provide sharper, more accurate forecasts. Hindcasts from HEFS will enable New York City to generate verification results to validate the new forecasts and further fine-tune system operating rules. Project verification results will be presented for different watersheds across a range of seasons, lead times, and flow levels to assess the quality of the current ensemble forecasts.
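    One standard ingredient of such a verification procedure is the rank (Talagrand) histogram, which checks whether the ensemble spread reflects the actual forecast uncertainty. A minimal sketch, with toy arrays standing in for the hindcast members and verifying observations:

```python
import numpy as np

def rank_histogram(members, obs):
    """Rank of each observation within its ensemble.

    members -- (n_cases, n_members) hindcast ensemble values
    obs     -- (n_cases,) verifying observations

    A roughly flat histogram indicates a statistically reliable ensemble;
    a U-shape indicates under-dispersion, a dome shape over-dispersion.
    """
    ranks = (members < obs[:, None]).sum(axis=1)
    return np.bincount(ranks, minlength=members.shape[1] + 1)

rng = np.random.default_rng(2)
hindcasts = rng.normal(size=(1000, 20))   # toy 20-member hindcasts
observations = rng.normal(size=1000)
print(rank_histogram(hindcasts, observations))
```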

  13. Predictability of machine learning techniques to forecast the trends of market index prices: Hypothesis testing for the Korean stock markets.

    PubMed

    Pyo, Sujin; Lee, Jaewook; Cha, Mincheol; Jang, Huisu

    2017-01-01

    The prediction of the trends of stock and index prices is one of the important issues for market participants. Investors set trading and fiscal strategies based on these trends, and considerable research in various academic fields has been devoted to forecasting financial markets. This study predicts the trends of the Korea Composite Stock Price Index 200 (KOSPI 200) prices using nonparametric machine learning models: an artificial neural network and support vector machines with polynomial and radial basis function kernels. In addition, this study states controversial issues and tests hypotheses about them. Our results are inconsistent with those of precedent research, which is generally considered to achieve high prediction performance. Moreover, Google Trends data proved not to be effective factors in predicting the KOSPI 200 index prices in our frameworks. Furthermore, the ensemble methods did not improve the accuracy of the prediction.

  14. Ensemble of sparse classifiers for high-dimensional biological data.

    PubMed

    Kim, Sunghan; Scalzo, Fabien; Telesca, Donatello; Hu, Xiao

    2015-01-01

    Biological data are often high in dimension while the number of samples is small. In such cases, the performance of classification can be improved by reducing the dimension of the data, which is referred to as feature selection. Recently, a novel feature selection method was proposed that utilises the sparsity of high-dimensional biological data, where a small subset of features accounts for most of the variance of the dataset. In this study we propose a new classification method for high-dimensional biological data which performs both feature selection and classification within a single framework. Our proposed method utilises a sparse linear solution technique and the bootstrap aggregating algorithm. We tested its performance on four public mass spectrometry cancer datasets against two conventional classification techniques, Support Vector Machines and Adaptive Boosting. The results demonstrate that our proposed method performs more accurate classification across the various cancer datasets than these conventional techniques.
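    The combination of a sparse base learner with bootstrap aggregating can be sketched as follows; an L1-penalized logistic regression stands in for the paper's sparse linear solution technique, and synthetic data for the mass-spectrometry features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

# Many features, few samples, few informative features (sparse signal).
X, y = make_classification(n_samples=200, n_features=500, n_informative=10)

sparse_base = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model = BaggingClassifier(estimator=sparse_base,  # base_estimator pre-1.2
                          n_estimators=50, max_samples=0.8)
model.fit(X, y)
print(model.score(X, y))
```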

  15. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules.

    PubMed

    Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-12-01

    We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
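    The Gaussian Process error-bar idea translates directly into code: the predictive standard deviation serves as a per-compound domain-of-applicability signal. A sketch with scikit-learn, using synthetic stand-ins for the descriptors and solubilities:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X, y = make_regression(n_samples=300, n_features=10, noise=1.0)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
gp.fit(X, y)
mean, std = gp.predict(X[:20], return_std=True)  # std = error bar
in_domain = std < np.median(std)  # illustrative DOA cutoff, not the paper's
```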

  16. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules.

    PubMed

    Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-09-01

    We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.

  17. The incorrect usage of singular spectral analysis and discrete wavelet transform in hybrid models to predict hydrological time series

    NASA Astrophysics Data System (ADS)

    Du, Kongchang; Zhao, Ying; Lei, Jiaqiang

    2017-09-01

    In hydrological time series prediction, singular spectrum analysis (SSA) and discrete wavelet transform (DWT) are widely used as preprocessing techniques for artificial neural network (ANN) and support vector machine (SVM) predictors. These hybrid or ensemble models seem to largely reduce the prediction error. In the current literature, researchers apply these techniques to the whole observed time series and then obtain a set of reconstructed or decomposed time series as inputs to the ANN or SVM. However, through two comparative experiments and mathematical deduction we found that this usage of SSA and DWT in building hybrid models is incorrect. Since SSA and DWT adopt 'future' values to perform the calculation, the series generated by SSA reconstruction or DWT decomposition contain information from 'future' values. These hybrid models therefore report spuriously 'high' prediction performance and may cause large errors in practice.
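    The leakage-free alternative implied by this argument is to recompute the decomposition only on data observed up to each forecast origin, e.g. inside a rolling window. A sketch with PyWavelets, where the series and window size are toy stand-ins:

```python
import numpy as np
import pywt

def causal_dwt_features(series, t, window=128, wavelet="db4", level=2):
    """DWT coefficients of series[t-window:t] only -- no 'future' values."""
    past = np.asarray(series[t - window:t])
    return np.concatenate(pywt.wavedec(past, wavelet, level=level))

flow = np.cumsum(np.random.default_rng(3).normal(size=1000))
x_t = causal_dwt_features(flow, t=500)
# Incorrect (leaky) usage: pywt.wavedec(flow, "db4") over the whole record,
# then slicing training/test rows out of the reconstructed components.
```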

  18. Predictability of machine learning techniques to forecast the trends of market index prices: Hypothesis testing for the Korean stock markets

    PubMed Central

    Pyo, Sujin; Lee, Jaewook; Cha, Mincheol

    2017-01-01

    The prediction of the trends of stock and index prices is one of the important issues for market participants. Investors set trading and fiscal strategies based on these trends, and considerable research in various academic fields has been devoted to forecasting financial markets. This study predicts the trends of the Korea Composite Stock Price Index 200 (KOSPI 200) prices using nonparametric machine learning models: an artificial neural network and support vector machines with polynomial and radial basis function kernels. In addition, this study states controversial issues and tests hypotheses about them. Our results are inconsistent with those of precedent research, which is generally considered to achieve high prediction performance. Moreover, Google Trends data proved not to be effective factors in predicting the KOSPI 200 index prices in our frameworks. Furthermore, the ensemble methods did not improve the accuracy of the prediction. PMID:29136004

  19. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules

    NASA Astrophysics Data System (ADS)

    Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-12-01

    We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.

  20. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules

    NASA Astrophysics Data System (ADS)

    Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-09-01

    We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.

  1. An ensemble of dissimilarity based classifiers for Mackerel gender determination

    NASA Astrophysics Data System (ADS)

    Blanco, A.; Rodriguez, R.; Martinez-Maranon, I.

    2014-03-01

    Mackerel is an undervalued fish captured by European fishing vessels. One way to add value to this species is to classify it according to its sex. Colour measurements were performed on gonads extracted from female and male Mackerel (fresh and defrosted) to find differences between the sexes. Several linear and non-linear classifiers, such as Support Vector Machines (SVM), k Nearest Neighbors (k-NN) or Diagonal Linear Discriminant Analysis (DLDA), can be applied to this problem. However, they are usually based on Euclidean distances, which fail to reflect accurately the sample proximities. Classifiers based on non-Euclidean dissimilarities misclassify different sets of patterns. We combine different kinds of dissimilarity-based classifiers. Diversity is induced by considering a set of complementary dissimilarities for each model. The experimental results suggest that our algorithm helps to improve on classifiers based on a single dissimilarity.
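    One way to realize such an ensemble is with k-NN members that differ only in their (non-Euclidean) dissimilarity, combined by majority vote. The metrics and synthetic data below are illustrative, not the paper's exact choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=6)  # colour features

# Identical learners, complementary dissimilarities -> diverse mistakes.
members = [(m, KNeighborsClassifier(n_neighbors=7, metric=m))
           for m in ("euclidean", "manhattan", "chebyshev", "canberra")]
model = VotingClassifier(estimators=members, voting="hard")
model.fit(X, y)
sex_pred = model.predict(X[:5])
```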

  2. Fatigue design of a cellular phone folder using regression model-based multi-objective optimization

    NASA Astrophysics Data System (ADS)

    Kim, Young Gyun; Lee, Jongsoo

    2016-08-01

    In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.

  3. Failure analysis of parameter-induced simulation crashes in climate models

    NASA Astrophysics Data System (ADS)

    Lucas, D. D.; Klein, R.; Tannahill, J.; Ivanova, D.; Brandon, S.; Domyancic, D.; Zhang, Y.

    2013-01-01

    Simulations using IPCC-class climate models are subject to fail or crash for a variety of reasons. Quantitative analysis of the failures can yield useful insights to better understand and improve the models. During the course of uncertainty quantification (UQ) ensemble simulations to assess the effects of ocean model parameter uncertainties on climate simulations, we experienced a series of simulation crashes within the Parallel Ocean Program (POP2) component of the Community Climate System Model (CCSM4). About 8.5% of our CCSM4 simulations failed for numerical reasons at combinations of POP2 parameter values. We apply support vector machine (SVM) classification from machine learning to quantify and predict the probability of failure as a function of the values of 18 POP2 parameters. A committee of SVM classifiers readily predicts model failures in an independent validation ensemble, as assessed by the area under the receiver operating characteristic (ROC) curve metric (AUC > 0.96). The causes of the simulation failures are determined through a global sensitivity analysis. Combinations of 8 parameters related to ocean mixing and viscosity from three different POP2 parameterizations are the major sources of the failures. This information can be used to improve POP2 and CCSM4 by incorporating correlations across the relevant parameters. Our method can also be used to quantify, predict, and understand simulation crashes in other complex geoscientific models.
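    The committee-of-SVMs idea can be sketched as several probabilistic SVCs trained on bootstrap resamples and averaged into a failure probability; the synthetic data below stand in for the 18 POP2 parameter values and the crashed/completed labels, and this is not the study's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC
from sklearn.utils import resample

# ~10% positive class, mimicking the rare-failure setting.
X, y = make_classification(n_samples=400, n_features=18, weights=[0.9])

committee = []
for seed in range(10):
    Xb, yb = resample(X, y, random_state=seed)  # bootstrap resample
    committee.append(SVC(probability=True).fit(Xb, yb))

p_fail = np.mean([m.predict_proba(X)[:, 1] for m in committee], axis=0)
print("AUC:", roc_auc_score(y, p_fail))
```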

  4. Failure analysis of parameter-induced simulation crashes in climate models

    NASA Astrophysics Data System (ADS)

    Lucas, D. D.; Klein, R.; Tannahill, J.; Ivanova, D.; Brandon, S.; Domyancic, D.; Zhang, Y.

    2013-08-01

    Simulations using IPCC (Intergovernmental Panel on Climate Change)-class climate models are subject to fail or crash for a variety of reasons. Quantitative analysis of the failures can yield useful insights to better understand and improve the models. During the course of uncertainty quantification (UQ) ensemble simulations to assess the effects of ocean model parameter uncertainties on climate simulations, we experienced a series of simulation crashes within the Parallel Ocean Program (POP2) component of the Community Climate System Model (CCSM4). About 8.5% of our CCSM4 simulations failed for numerical reasons at combinations of POP2 parameter values. We applied support vector machine (SVM) classification from machine learning to quantify and predict the probability of failure as a function of the values of 18 POP2 parameters. A committee of SVM classifiers readily predicted model failures in an independent validation ensemble, as assessed by the area under the receiver operating characteristic (ROC) curve metric (AUC > 0.96). The causes of the simulation failures were determined through a global sensitivity analysis. Combinations of 8 parameters related to ocean mixing and viscosity from three different POP2 parameterizations were the major sources of the failures. This information can be used to improve POP2 and CCSM4 by incorporating correlations across the relevant parameters. Our method can also be used to quantify, predict, and understand simulation crashes in other complex geoscientific models.

  5. Ensembles vs. information theory: supporting science under uncertainty

    NASA Astrophysics Data System (ADS)

    Nearing, Grey S.; Gupta, Hoshin V.

    2018-05-01

    Multi-model ensembles are one of the most common ways to deal with epistemic uncertainty in hydrology. This is a problem because there is no known way to sample models such that the resulting ensemble admits a measure that has any systematic (i.e., asymptotic, bounded, or consistent) relationship with uncertainty. Multi-model ensembles are effectively sensitivity analyses and cannot - even partially - quantify uncertainty. One consequence of this is that multi-model approaches cannot support a consistent scientific method - in particular, multi-model approaches yield unbounded errors in inference. In contrast, information theory supports a coherent hypothesis test that is robust to (i.e., bounded under) arbitrary epistemic uncertainty. This paper may be understood as advocating a procedure for hypothesis testing that does not require quantifying uncertainty, but is coherent and reliable (i.e., bounded) in the presence of arbitrary (unknown and unknowable) uncertainty. We conclude by offering some suggestions about how this proposed philosophy of science suggests new ways to conceptualize and construct simulation models of complex, dynamical systems.

  6. Invariant measures in brain dynamics

    NASA Astrophysics Data System (ADS)

    Boyarsky, Abraham; Góra, Paweł

    2006-10-01

    This note concerns brain activity at the level of neural ensembles and uses ideas from ergodic dynamical systems to model and characterize chaotic patterns among these ensembles during conscious mental activity. Central to our model is the definition of a space of neural ensembles and the assumption of discrete time ensemble dynamics. We argue that continuous invariant measures draw the attention of deeper brain processes, engendering emergent properties such as consciousness. Invariant measures supported on a finite set of ensembles reflect periodic behavior, whereas the existence of continuous invariant measures reflect the dynamics of nonrepeating ensemble patterns that elicit the interest of deeper mental processes. We shall consider two different ways to achieve continuous invariant measures on the space of neural ensembles: (1) via quantum jitters, and (2) via sensory input accompanied by inner thought processes which engender a “folding” property on the space of ensembles.

  7. Ensemble Simulation of the Atmospheric Radionuclides Discharged by the Fukushima Nuclear Accident

    NASA Astrophysics Data System (ADS)

    Sekiyama, Thomas; Kajino, Mizuo; Kunii, Masaru

    2013-04-01

    Enormous amounts of radionuclides were discharged into the atmosphere by the nuclear accident at the Fukushima Daiichi nuclear power plant (FDNPP) after the earthquake and tsunami on 11 March 2011. The radionuclides were dispersed from the power plant and deposited mainly over eastern Japan and the North Pacific Ocean. Many numerical simulations of the radionuclide dispersion and deposition have been attempted since the accident. However, none of them were able to perfectly simulate the distribution of dose rates observed over eastern Japan after the accident. This was partly due to errors in the wind vectors and precipitation used in the simulations; unfortunately, deterministic simulations cannot deal with the probability distribution of the simulation results and errors. Therefore, an ensemble simulation of the atmospheric radionuclides was performed using the ensemble Kalman filter (EnKF) data assimilation system coupled with the Japan Meteorological Agency (JMA) non-hydrostatic mesoscale model (NHM); this mesoscale model has been used operationally for daily weather forecasts by JMA. Meteorological observations were provided to the EnKF data assimilation system from the JMA operational-weather-forecast dataset. Through this ensemble data assimilation, twenty members of the meteorological analysis over eastern Japan from 11 to 31 March 2011 were successfully obtained. Using these meteorological ensemble analysis members, radionuclide behavior in the atmosphere, such as advection, convection, diffusion, dry deposition, and wet deposition, was simulated. This ensemble simulation provided multiple realizations of the radionuclide dispersion and distribution. Because a large ensemble deviation indicates low accuracy of the numerical simulation, probabilistic information is obtainable from the ensemble simulation results. For example, the uncertainty of precipitation triggered the uncertainty of wet deposition, which in turn triggered the uncertainty of atmospheric radionuclide amounts. The remaining radionuclides were then transported downwind; consequently, the uncertainty signal of the radionuclide amounts propagated downwind. This signal propagation was seen in the ensemble simulation by tracking the areas of large deviation in radionuclide concentration and deposition. These statistics can provide information useful for the probabilistic prediction of radionuclides.

  8. Big genomics and clinical data analytics strategies for precision cancer prognosis.

    PubMed

    Ow, Ghim Siong; Kuznetsov, Vladimir A

    2016-11-07

    The field of personalized and precise medicine in the era of big data analytics is growing rapidly. Previously, we proposed our model of patient classification termed Prognostic Signature Vector Matching (PSVM) and identified a 37-variable signature comprising 36 let-7b-associated prognostically significant mRNAs and the age risk factor, which stratified large high-grade serous ovarian cancer patient cohorts into three survival-significant risk groups. Here, we investigated the predictive performance of PSVM via optimization of the prognostic variable weights, which represent the relative importance of one prognostic variable over the others. In addition, we compared several multivariate prognostic models based on PSVM with classical machine learning techniques such as K-nearest-neighbor, support vector machine, random forest, neural networks and logistic regression. Our results revealed that negative log-rank p-values provide more robust weight values than other quantities such as hazard ratios, fold change, or a combination of those factors. PSVM and the classical machine learning classifiers were then combined in an ensemble (multi-test) voting system, which collectively provides more precise and reproducible patient stratification. The use of the multi-test system approach, rather than the search for the ideal classification/prediction method, might help to address the limitations of individual classification algorithms in specific situations.
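
    A minimal sketch of such a multi-test voting system, using only the publicly available classifier families named above (PSVM itself is not public and is omitted; the data are synthetic stand-ins):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a 37-variable prognostic signature.
X, y = make_classification(n_samples=300, n_features=37, random_state=0)

# Majority-vote ("multi-test") ensemble over the classifier families
# named in the abstract.
vote = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("svm", SVC()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("mlp", MLPClassifier(max_iter=2000, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",
)
print(cross_val_score(vote, X, y, cv=5).mean())
```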

  9. Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image

    NASA Astrophysics Data System (ADS)

    Pirotti, F.; Sunar, F.; Piragnolo, M.

    2016-06-01

    Thanks mainly to ESA and USGS, a large volume of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging task, since the land cover of a specific class may present large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layered perceptron, multi-layered perceptron ensemble, ctree, boosting, and logistic regression. The validation is carried out using a control dataset which consists of an independent classification into 11 land-cover classes of an area of about 60 km2, obtained by manual visual interpretation of high-resolution images (20 cm ground sampling distance) by experts. In this study, five of the eleven classes are used, since the others have too few samples (pixels) for the testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and the control values over three sets of data: the training dataset (train), the whole control dataset (full) and k-fold cross-validation (kfold) with ten folds. Results from validation of predictions on the whole dataset (full) show the random forests method with the highest values, the kappa index ranging from 0.55 with the most training pixels to 0.42 with the fewest. The two neural network methods (multi-layered perceptron and its ensemble) and the support vector machines, with the default radial basis function kernel, follow closely with comparable performance.
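
    A condensed sketch of this kind of benchmark with ten-fold cross-validation and the kappa index (synthetic data in place of the Sentinel-2 pixels; only four of the nine methods shown):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for labelled pixels: 5 land-cover classes, 10 band features.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=8,
                           n_classes=5, random_state=0)

models = {
    "lda": LinearDiscriminantAnalysis(),
    "knn": KNeighborsClassifier(),
    "rf": RandomForestClassifier(random_state=0),
    "svm_rbf": SVC(kernel="rbf"),   # default radial basis function kernel
}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=10)   # ten folds, as in the study
    print(name, round(cohen_kappa_score(y, pred), 3))
```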

  10. SQUEEZE-E: The Optimal Solution for Molecular Simulations with Periodic Boundary Conditions.

    PubMed

    Wassenaar, Tsjerk A; de Vries, Sjoerd; Bonvin, Alexandre M J J; Bekker, Henk

    2012-10-09

    In molecular simulations of macromolecules, it is desirable to limit the amount of solvent in the system to avoid spending computational resources on uninteresting solvent-solvent interactions. As a consequence, periodic boundary conditions are commonly used, with a simulation box chosen as small as possible for a given minimal distance between images. Here, we describe how such a simulation cell can be set up for ensembles, taking into account a priori available or estimable information regarding conformational flexibility. Doing so ensures that any conformation present in the input ensemble will satisfy the distance criterion during the simulation. This helps avoid periodicity artifacts due to conformational changes. The method introduces three new approaches in computational geometry: (1) the derivation of an optimal packing of ensembles, for which the mathematical framework is described; (2) a new method for approximating the α-hull and the contact body for single bodies and ensembles, which is orders of magnitude faster than existing routines, allowing the calculation of packings of large ensembles and/or large bodies; and (3) a routine for searching a combination of three vectors on a discretized contact body forming a reduced base for a lattice with minimal cell volume. The new algorithms reduce the time required to calculate packings of single bodies from minutes or hours to seconds. The use and efficacy of the method are demonstrated for ensembles obtained from NMR, MD simulations, and elastic network modeling. An implementation of the method has been made available online at http://haddock.chem.uu.nl/services/SQUEEZE/ and as an option for running simulations through the weNMR GRID MD server at http://haddock.science.uu.nl/enmr/services/GROMACS/main.php.
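
    The lattice-search criterion in (3) reduces to minimizing the absolute determinant of the matrix whose columns are the three candidate vectors. The toy sketch below shows only that volume criterion over random stand-in vectors; the actual routine enumerates points on the discretized contact body and enforces the minimum-distance constraint, both omitted here.

```python
import numpy as np

# Cell volume of the lattice spanned by vectors a, b, c: |det([a b c])|.
def cell_volume(a, b, c):
    return abs(np.linalg.det(np.column_stack([a, b, c])))

# Random stand-ins for vectors on a discretized contact body.
rng = np.random.default_rng(1)
candidates = rng.normal(size=(30, 3)) * 5.0

best = None
for i in range(len(candidates)):
    for j in range(i + 1, len(candidates)):
        for k in range(j + 1, len(candidates)):
            v = cell_volume(candidates[i], candidates[j], candidates[k])
            if v > 1e-6 and (best is None or v < best):
                best = v
print("minimal cell volume found:", round(best, 3))
```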

  11. Sensitivity and specificity of machine learning classifiers for glaucoma diagnosis using Spectral Domain OCT and standard automated perimetry.

    PubMed

    Silva, Fabrício R; Vidotti, Vanessa G; Cremasco, Fernanda; Dias, Marcelo; Gomi, Edson S; Costa, Vital P

    2013-01-01

    To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using Spectral Domain OCT (SD-OCT) and standard automated perimetry (SAP). Observational cross-sectional study. Sixty-two glaucoma patients and 48 healthy individuals were included. All patients underwent a complete ophthalmologic examination, achromatic standard automated perimetry (SAP) and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and global indices of SAP. Subsequently, the following MLCs were tested using parameters from the SD-OCT and SAP: Bagging (BAG), Naive-Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), AdaBoost M1 (ADA), Support Vector Machine Linear (SVML) and Support Vector Machine Gaussian (SVMG). Areas under the receiver operating characteristic curves (aROC) obtained for isolated SAP and OCT parameters were compared with MLCs using OCT+SAP data. Combining OCT and SAP data, the MLCs' aROCs varied from 0.777 (CTREE) to 0.946 (RAN). The best OCT+SAP aROC, obtained with RAN (0.946), was significantly larger than that of the best single OCT parameter (p<0.05), but was not significantly different from the aROC obtained with the best single SAP parameter (p=0.19). Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. The combination of OCT and SAP measurements improved the diagnostic accuracy compared with OCT data alone.
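
    A schematic of the aROC comparison for a few of the classifier families above, on synthetic stand-ins for the combined OCT+SAP feature vectors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for 110 subjects (glaucoma vs. healthy) with
# combined OCT+SAP features.
X, y = make_classification(n_samples=110, n_features=30, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("BAG", BaggingClassifier(random_state=0)),
                  ("RAN", RandomForestClassifier(random_state=0)),
                  ("SVMG", SVC(kernel="rbf", probability=True))]:
    score = clf.fit(Xtr, ytr).predict_proba(Xte)[:, 1]
    print(name, "aROC =", round(roc_auc_score(yte, score), 3))
```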

  12. A transient stochastic weather generator incorporating climate model uncertainty

    NASA Astrophysics Data System (ADS)

    Glenis, Vassilis; Pinamonti, Valentina; Hall, Jim W.; Kilsby, Chris G.

    2015-11-01

    Stochastic weather generators (WGs), which provide long synthetic time series of weather variables such as rainfall and potential evapotranspiration (PET), have found widespread use in water resources modelling. When conditioned upon the changes in climatic statistics (change factors, CFs) predicted by climate models, WGs provide a useful tool for climate impacts assessment and adaptation planning. The latest climate modelling exercises have involved large numbers of global and regional climate model integrations, designed to explore the implications of uncertainties in climate model formulation and parameter settings: so-called 'perturbed physics ensembles' (PPEs). In this paper we show how these climate model uncertainties can be propagated through to impact studies by testing multiple vectors of CFs, each vector derived from a different sample from a PPE. We combine this with a new methodology to parameterise the projected time-evolution of CFs. We demonstrate how, when conditioned upon these time-dependent CFs, an existing, well validated and widely used WG can be used to generate non-stationary simulations of future climate that are consistent with probabilistic outputs from the Met Office Hadley Centre's Perturbed Physics Ensemble. The WG enables extensive sampling of natural variability and climate model uncertainty, providing the basis for development of robust water resources management strategies in the context of a non-stationary climate.
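
    A rough sketch of conditioning on time-evolving change factors, with every quantity a placeholder (the actual WG and the Hadley Centre CF products are far richer): each PPE member contributes one CF vector whose effect is interpolated in time before perturbing the baseline statistics.

```python
import numpy as np

# Placeholder baseline monthly rainfall statistics (mm/month).
rng = np.random.default_rng(0)
baseline = rng.gamma(2.0, 40.0, size=12)

n_members, n_epochs = 10, 5
# One multiplicative end-of-horizon CF vector per PPE member (stand-ins).
end_cf = rng.normal(loc=1.1, scale=0.1, size=(n_members, 12))

for m in range(n_members):
    for e in range(n_epochs):
        frac = e / (n_epochs - 1)            # 0 = present, 1 = horizon
        cf = 1.0 + frac * (end_cf[m] - 1.0)  # time-interpolated CF vector
        perturbed = baseline * cf            # statistics to condition the WG on
# Each (member, epoch) slice would drive one non-stationary WG run.
print(perturbed.round(1))
```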

  13. Predicting areas of sustainable error growth in quasigeostrophic flows using perturbation alignment properties

    NASA Astrophysics Data System (ADS)

    Rivière, G.; Hua, B. L.

    2004-10-01

    A new perturbation initialization method is used to quantify error growth due to inaccuracies of the forecast model initial conditions in a quasigeostrophic box ocean model describing a wind-driven double gyre circulation. This method is based on recent analytical results on the Lagrangian alignment dynamics of the perturbation velocity vector in quasigeostrophic flows. More specifically, it consists in initializing a unique perturbation from the sole knowledge of the control flow properties at the initial time of the forecast, whose velocity vector orientation satisfies a Lagrangian equilibrium criterion. This Alignment-based Initialization method is hereafter denoted the AI method. In terms of the spatial distribution of the errors, the AI error forecast compares favorably with the mean error obtained with a Monte-Carlo ensemble prediction. It is shown that the AI forecast is on average as efficient as the error forecast initialized with the leading singular vector for the palenstrophy norm, and significantly more efficient than that for the total energy and enstrophy norms. Furthermore, a more precise examination shows that the AI forecast is systematically relevant for all control flows, whereas the palenstrophy singular vector forecast sometimes leads to very good scores and sometimes to very bad ones. A principal component analysis at the final time of the forecast shows that the AI mode spatial structure is comparable to that of the first eigenvector of the error covariance matrix for a "bred mode" ensemble. Furthermore, the kinetic energy of the AI mode grows at the same constant rate as that of the "bred modes" from the initial time to the final time of the forecast and is therefore characterized by a sustained phase of error growth. In this sense, the AI mode, based on the Lagrangian dynamics of the perturbation velocity orientation, provides a rationale for the "bred mode" behavior.
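
    For context, the singular-vector benchmark mentioned above can be sketched in a few lines: given a (stand-in) tangent-linear propagator M for the forecast interval, the leading right singular vector is the initial perturbation that maximizes ||Mx||/||x|| in the chosen norm (the L2 norm here; the paper works with energy, enstrophy, and palenstrophy norms).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(50, 50))   # stand-in tangent-linear propagator

U, s, Vt = np.linalg.svd(M)
leading_sv = Vt[0]              # optimal initial perturbation (unit norm)
growth = s[0]                   # its amplification factor over the interval
print(round(growth, 3), round(float(np.linalg.norm(M @ leading_sv)), 3))
```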

  14. Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble.

    PubMed

    Wang, Xiao; Zhang, Jun; Li, Guo-Zheng

    2015-01-01

    Predicting bacterial protein subcellular locations using computational methods has become an important and challenging task. Although many prediction methods exist for bacterial proteins, the majority can deal only with single-location proteins, yet many multi-location proteins are found in bacterial cells. Moreover, multi-location proteins have special biological functions that can aid the development of new drugs. It is therefore necessary to develop new computational methods for accurately predicting the subcellular locations of multi-location bacterial proteins. In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. The two multi-label predictors construct GO vectors using the GO terms of homologous proteins of the query proteins and then adopt a powerful multi-label ensemble classifier to make the final multi-label prediction. The two multi-label predictors have the following advantages: (1) they improve the prediction performance for multi-label proteins by taking the correlations among different labels into account; (2) they ensemble multiple CC (classifier chain) classifiers and generate better prediction results by ensemble learning; and (3) they construct the GO vectors using the frequency of occurrence of GO terms in the typical homologous set instead of 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve the prediction accuracy of subcellular localization for multi-location gram-positive and gram-negative bacterial proteins respectively. The online web servers for the Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/ respectively.
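
    The ensemble-of-classifier-chains idea can be sketched with scikit-learn's ClassifierChain: each chain is fitted with a random label order, and averaging the chains' predictions realizes the ensemble vote (synthetic data in place of the GO-term frequency vectors):

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Stand-in for GO vectors with multi-label location targets.
X, Y = make_multilabel_classification(n_samples=300, n_features=50,
                                      n_classes=5, random_state=0)

# Ensemble of classifier chains (ECC): each chain uses a random label
# order, so label correlations enter through the chained predictions.
chains = [ClassifierChain(LogisticRegression(max_iter=1000),
                          order="random", random_state=i)
          for i in range(10)]
for chain in chains:
    chain.fit(X, Y)

Y_vote = np.mean([chain.predict(X) for chain in chains], axis=0)
Y_pred = (Y_vote >= 0.5).astype(int)   # majority vote across the chains
print(Y_pred[:3])
```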

  15. Operational planning using Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS)

    NASA Astrophysics Data System (ADS)

    O'Connor, Alison; Kirtman, Benjamin; Harrison, Scott; Gorman, Joe

    2016-05-01

    The US Navy faces several limitations when planning operations with regard to forecasting environmental conditions. Currently, mission analysis and planning tools rely heavily on short-term (less than a week) forecasts or long-term statistical climate products. However, newly available weather forecast ensembles provide dynamical and statistical extended-range predictions that can yield more accurate forecasts if the ensemble members are combined correctly. Charles River Analytics is designing the Climatological Observations for Maritime Prediction and Analysis Support Service (COMPASS), which performs data fusion over extended-range multi-model ensembles, such as the North American Multi-Model Ensemble (NMME), to produce a unified forecast for several weeks to several seasons into the future. We evaluated thirty years of forecasts, using machine learning to select predictions for an all-encompassing and superior forecast that can be used to inform the Navy's decision planning process.

  16. Ensemble coding of face identity is not independent of the coding of individual identity.

    PubMed

    Neumann, Markus F; Ng, Ryan; Rhodes, Gillian; Palermo, Romina

    2018-06-01

    Information about a group of similar objects can be summarized into a compressed code, known as ensemble coding. Ensemble coding of simple stimuli (e.g., groups of circles) can occur in the absence of detailed exemplar coding, suggesting dissociable processes. Here, we investigate whether a dissociation would still be apparent when coding facial identity, where individual exemplar information is much more important. We examined whether ensemble coding can occur when exemplar coding is difficult, as a result of large sets or short viewing times, or whether the two types of coding are positively associated. We found a positive association, whereby both ensemble and exemplar coding were reduced for larger groups and shorter viewing times. There was no evidence for ensemble coding in the absence of exemplar coding. At longer presentation times, there was an unexpected dissociation, where exemplar coding increased yet ensemble coding decreased, suggesting that robust information about face identity might suppress ensemble coding. Thus, for face identity, we did not find the classic dissociation (access to ensemble information in the absence of detailed exemplar information) that has been used to support claims of distinct mechanisms for ensemble and exemplar coding.

  17. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.

  18. Ensembl comparative genomics resources

    PubMed Central

    Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  19. Ensemble modelling and structured decision-making to support Emergency Disease Management.

    PubMed

    Webb, Colleen T; Ferrari, Matthew; Lindström, Tom; Carpenter, Tim; Dürr, Salome; Garner, Graeme; Jewell, Chris; Stevenson, Mark; Ward, Michael P; Werkman, Marleen; Backer, Jantien; Tildesley, Michael

    2017-03-01

    Epidemiological models in animal health are commonly used as decision-support tools to understand the impact of various control actions on infection spread in susceptible populations. Different models contain different assumptions and parameterizations, and policy decisions might be improved by considering outputs from multiple models. However, a transparent decision-support framework to integrate outputs from multiple models is nascent in epidemiology. Ensemble modelling and structured decision-making integrate the outputs of multiple models, compare policy actions and support policy decision-making. We briefly review the epidemiological application of ensemble modelling and structured decision-making and illustrate the potential of these methods using foot and mouth disease (FMD) models. In case study one, we apply structured decision-making to compare five possible control actions across three FMD models and show which control actions and outbreak costs are robustly supported and which are impacted by model uncertainty. In case study two, we develop a methodology for weighting the outputs of different models and show how different weighting schemes may impact the choice of control action. Using these case studies, we broadly illustrate the potential of ensemble modelling and structured decision-making in epidemiology to provide better information for decision-making and outline necessary development of these methods for their further application. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  20. Research on bearing fault diagnosis of large machinery based on mathematical morphology

    NASA Astrophysics Data System (ADS)

    Wang, Yu

    2018-04-01

    To study automatic fault diagnosis for large machinery based on support vector machines, the four common fault types of such machinery are classified and identified with a support vector machine. Extracted feature vectors serve as input, and the feature vectors are trained and classified with a multi-class method. The optimal parameters of the support vector machine are found by trial and error combined with cross-validation, and the resulting model is compared with a BP neural network. The results show that the support vector machine is fast to train and achieves high classification accuracy, making it better suited than the BP neural network to fault diagnosis research in large machinery.
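
    A sketch of the parameter search described above, with cross-validation replacing manual trial and error (synthetic features stand in for the vibration feature vectors):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Stand-in for feature vectors of four bearing fault classes.
X, y = make_classification(n_samples=400, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Cross-validated grid search over (C, gamma) for the RBF-kernel SVM.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100],
                     "gamma": [0.001, 0.01, 0.1, 1]},
                    cv=5)
grid.fit(Xtr, ytr)
print("SVM:", grid.best_params_, round(grid.score(Xte, yte), 3))

# BP neural network baseline for comparison.
mlp = MLPClassifier(max_iter=2000, random_state=0).fit(Xtr, ytr)
print("MLP:", round(mlp.score(Xte, yte), 3))
```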

  1. Background Error Covariance Estimation using Information from a Single Model Trajectory with Application to Ocean Data Assimilation into the GEOS-5 Coupled Model

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian L.; Rienecker, Michele M.; Kovach, Robin M.; Vernieres, Guillaume; Koster, Randal D. (Editor)

    2014-01-01

    An attractive property of ensemble data assimilation methods is that they provide flow dependent background error covariance estimates which can be used to update fields of observed variables as well as fields of unobserved model variables. Two methods to estimate background error covariances are introduced which share the above property with ensemble data assimilation methods but do not involve the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The Space Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the spatial distribution of model variables within a single state vector. The Flow Adaptive error Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving window along a model trajectory. SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE sea ice model. The results are validated against unassimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. Because of their reduced cost, SAFE and FAST hold promise for high-resolution data assimilation applications.
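
    The FAST idea admits a very compact sketch (a stand-in random-walk trajectory replaces the ocean model state): sample a moving window of recent states, and use the window's anomalies as a surrogate ensemble for the background error covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_steps = 30, 500
# Stand-in single model trajectory (random walk in state space).
trajectory = np.cumsum(rng.normal(size=(n_steps, n_state)), axis=0)

t, window = 400, 20
ensemble = trajectory[t - window:t]           # moving window of states
anomalies = ensemble - ensemble.mean(axis=0)
B = anomalies.T @ anomalies / (window - 1)    # flow-dependent covariance
print(B.shape)                                # (n_state, n_state)
```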

  2. Dispersion Modeling Using Ensemble Forecasts Compared to ETEX Measurements.

    NASA Astrophysics Data System (ADS)

    Straume, Anne Grete; N'dri Koffi, Ernest; Nodop, Katrin

    1998-11-01

    Numerous numerical models have been developed to predict long-range transport of hazardous air pollution in connection with accidental releases. When evaluating and improving such a model, it is important to detect uncertainties connected to the meteorological input data. A Lagrangian dispersion model, the Severe Nuclear Accident Program, is used here to investigate the effect of errors in the meteorological input data due to analysis error. An ensemble forecast, produced at the European Centre for Medium-Range Weather Forecasts, is then used as model input. The ensemble forecast members are generated by perturbing the initial meteorological fields of the weather forecast. The perturbations are calculated from singular vectors meant to represent possible forecast developments generated by instabilities in the atmospheric flow during the early part of the forecast. The instabilities are generated by errors in the analyzed fields. Puff predictions from the dispersion model, using ensemble forecast input, are compared, and a large spread in the predicted puff evolutions is found. This shows that the quality of the meteorological input data is important for the success of the dispersion model. In order to evaluate the dispersion model, the calculations are compared with measurements from the European Tracer Experiment. The model predicts the shape and arrival time of the measured puff fairly well, up to 60 h after the start of the release. The modeled puff remains too narrow in the advection direction.

  3. Background Error Covariance Estimation Using Information from a Single Model Trajectory with Application to Ocean Data Assimilation

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian L.; Rienecker, Michele; Kovach, Robin M.; Vernieres, Guillaume

    2014-01-01

    An attractive property of ensemble data assimilation methods is that they provide flow dependent background error covariance estimates which can be used to update fields of observed variables as well as fields of unobserved model variables. Two methods to estimate background error covariances are introduced which share the above property with ensemble data assimilation methods but do not involve the integration of multiple model trajectories. Instead, all the necessary covariance information is obtained from a single model integration. The Space Adaptive Forecast error Estimation (SAFE) algorithm estimates error covariances from the spatial distribution of model variables within a single state vector. The Flow Adaptive error Statistics from a Time series (FAST) method constructs an ensemble sampled from a moving window along a model trajectory. SAFE and FAST are applied to the assimilation of Argo temperature profiles into version 4.1 of the Modular Ocean Model (MOM4.1) coupled to the GEOS-5 atmospheric model and to the CICE sea ice model. The results are validated against unassimilated Argo salinity data. They show that SAFE and FAST are competitive with the ensemble optimal interpolation (EnOI) used by the Global Modeling and Assimilation Office (GMAO) to produce its ocean analysis. Because of their reduced cost, SAFE and FAST hold promise for high-resolution data assimilation applications.

  4. Global ensemble texture representations are critical to rapid scene perception.

    PubMed

    Brady, Timothy F; Shafer-Skelton, Anna; Alvarez, George A

    2017-06-01

    Traditionally, recognizing the objects within a scene has been treated as a prerequisite to recognizing the scene itself. However, research now suggests that the ability to rapidly recognize visual scenes could be supported by global properties of the scene itself rather than the objects within the scene. Here, we argue for a particular instantiation of this view: that scenes are recognized by treating them as a global texture and processing the pattern of orientations and spatial frequencies across different areas of the scene without recognizing any objects. To test this model, we asked whether there is a link between how proficient individuals are at rapid scene perception and how proficiently they represent simple spatial patterns of orientation information (global ensemble texture). We find a significant and selective correlation between these tasks, suggesting a link between scene perception and spatial ensemble tasks but not nonspatial summary statistics. In a second and third experiment, we additionally show that global ensemble texture information is not only associated with scene recognition, but that preserving only global ensemble texture information from scenes is sufficient to support rapid scene perception; however, preserving the same information is not sufficient for object recognition. Thus, global ensemble texture alone is sufficient to allow activation of scene representations but not object representations. Together, these results provide evidence for a view of scene recognition based on global ensemble texture rather than a view based purely on objects or on nonspatially localized global properties. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  5. Object-based habitat mapping using very high spatial resolution multispectral and hyperspectral imagery with LiDAR data

    NASA Astrophysics Data System (ADS)

    Onojeghuo, Alex Okiemute; Onojeghuo, Ajoke Ruth

    2017-07-01

    This study investigated the combined use of multispectral/hyperspectral imagery and LiDAR data for habitat mapping across parts of south Cumbria, North West England. The methodology adopted in this study integrated spectral information contained in pansharp QuickBird multispectral/AISA Eagle hyperspectral imagery and LiDAR-derived measures with object-based machine learning classifiers and ensemble analysis techniques. Using the LiDAR point cloud data, elevation models (such as the Digital Surface Model and Digital Terrain Model rasters) and intensity features were extracted directly. The LiDAR-derived measures exploited in this study included the Canopy Height Model, intensity and topographic information (i.e. mean, maximum and standard deviation). These three LiDAR measures were combined with spectral information contained in the pansharp QuickBird and Eagle MNF transformed imagery for image classification experiments. A fusion of pansharp QuickBird multispectral and Eagle MNF hyperspectral imagery with all LiDAR-derived measures generated the best classification accuracies, 89.8% and 92.6% respectively, obtained with the Support Vector Machine and Random Forest machine learning algorithms respectively. The ensemble analysis of all three machine learning classifiers for the pansharp QuickBird and Eagle MNF fused data outputs did not significantly increase the overall classification accuracy. Results of the study demonstrate the potential of combining either very high spatial resolution multispectral or hyperspectral imagery with LiDAR data for habitat mapping.

  6. Distributed Fading Memory for Stimulus Properties in the Primary Visual Cortex

    PubMed Central

    Singer, Wolf; Maass, Wolfgang

    2009-01-01

    It is currently not known how distributed neuronal responses in early visual areas carry stimulus-related information. We made multielectrode recordings from cat primary visual cortex and applied methods from machine learning in order to analyze the temporal evolution of stimulus-related information in the spiking activity of large ensembles of around 100 neurons. We used sequences of up to three different visual stimuli (letters of the alphabet) presented for 100 ms and with intervals of 100 ms or larger. Most of the information about visual stimuli extractable by sophisticated methods of machine learning, i.e., support vector machines with nonlinear kernel functions, was also extractable by simple linear classification such as can be achieved by individual neurons. New stimuli did not erase information about previous stimuli. The responses to the most recent stimulus contained about equal amounts of information about both this and the preceding stimulus. This information was encoded both in the discharge rates (response amplitudes) of the ensemble of neurons and, when using short time constants for integration (e.g., 20 ms), in the precise timing of individual spikes (≤∼20 ms), and persisted for several hundred milliseconds beyond the offset of stimuli. The results indicate that the network from which we recorded is endowed with fading memory and is capable of performing online computations utilizing information about temporally sequential stimuli. This result challenges models assuming frame-by-frame analyses of sequential inputs. PMID:20027205

  7. Random ensemble learning for EEG classification.

    PubMed

    Hosseini, Mohammad-Parsa; Pompili, Dario; Elisevich, Kost; Soltanian-Zadeh, Hamid

    2018-01-01

    Real-time detection of seizure activity in epilepsy patients is critical in averting seizure activity and improving patients' quality of life. Accurate evaluation, presurgical assessment, seizure prevention, and emergency alerts all depend on the rapid detection of seizure onset. A new method of feature selection and classification for rapid and precise seizure detection is discussed, wherein informative components of electroencephalogram (EEG)-derived data are extracted and an automatic method is presented using infinite independent component analysis (I-ICA) to select independent features. The feature space is divided into subspaces via random selection, and multichannel support vector machines (SVMs) are used to classify these subspaces. The results of the individual classifiers are then combined by majority voting to establish the final output. In addition, a random subspace ensemble using a combination of SVM, a multilayer perceptron (MLP) neural network and an extended k-nearest neighbors (k-NN) classifier, called extended nearest neighbor (ENN), is developed for the EEG and electrocorticography (ECoG) big data problem. To evaluate the solution, a benchmark ECoG dataset of eight patients with temporal and extratemporal epilepsy was implemented in a distributed computing framework as a multitier cloud-computing architecture. Using leave-one-out cross-validation, the accuracy, sensitivity, specificity, and both false positive and false negative ratios of the proposed method were found to be 0.97, 0.98, 0.96, 0.04, and 0.02, respectively. Application of the solution to cases under investigation with ECoG has also been carried out to demonstrate its utility. Copyright © 2017 Elsevier B.V. All rights reserved.
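
    The random subspace step can be sketched directly (synthetic vectors stand in for the EEG/ECoG features; the I-ICA stage and the ENN variant are omitted): each SVM is trained on a random subset of features, and the final label is the majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for seizure / non-seizure feature vectors.
X, y = make_classification(n_samples=400, n_features=60, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
votes = []
for _ in range(15):                       # 15 random subspaces
    cols = rng.choice(X.shape[1], size=20, replace=False)
    clf = SVC().fit(Xtr[:, cols], ytr)    # one SVM per subspace
    votes.append(clf.predict(Xte[:, cols]))
majority = (np.mean(votes, axis=0) >= 0.5).astype(int)
print("accuracy:", round(float((majority == yte).mean()), 3))
```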

  8. Distinct neural patterns enable grasp types decoding in monkey dorsal premotor cortex.

    PubMed

    Hao, Yaoyao; Zhang, Qiaosheng; Controzzi, Marco; Cipriani, Christian; Li, Yue; Li, Juncheng; Zhang, Shaomin; Wang, Yiwen; Chen, Weidong; Chiara Carrozza, Maria; Zheng, Xiaoxiang

    2014-12-01

    Recent studies have shown that dorsal premotor cortex (PMd), a cortical area in the dorsomedial grasp pathway, is involved in grasp movements. However, the neural ensemble firing properties of PMd during grasp movements and the extent to which they can be used for grasp decoding are still unclear. To address these issues, we used multielectrode arrays to record both spike and local field potential (LFP) signals in PMd in macaque monkeys performing reaching and grasping of one of four differently shaped objects. Single and population neuronal activity showed distinct patterns during execution of different grip types. Cluster analysis of neural ensemble signals indicated that the grasp-related patterns emerged soon (200-300 ms) after the go cue signal, and faded away during the hold period. The timing and duration of the patterns varied depending on the behavior of the individual monkeys. Application of a support vector machine model to stable activity patterns revealed classification accuracies of 94% and 89% for the two monkeys, indicating a robust, decodable grasp pattern encoded in the PMd. Grasp decoding using LFPs, especially the high-frequency bands, also produced high decoding accuracies. This study is the first to specify the neuronal population encoding of grasp over the time course of the movement. We demonstrate high grasp decoding performance in PMd. These findings, combined with previous evidence from reach-related modulation studies, suggest that PMd may play an important role in the generation and maintenance of grasp action and may be a suitable locus for brain-machine interface applications.

  9. Ensemble Architecture for Prediction of Enzyme-ligand Binding Residues Using Evolutionary Information.

    PubMed

    Pai, Priyadarshini P; Dattatreya, Rohit Kadam; Mondal, Sukanta

    2017-11-01

    Enzyme interactions with ligands are crucial for various biochemical reactions governing life. Over many years, attempts to identify these residues for biotechnological manipulation have been made using experimental and computational techniques. The computational approaches have gathered impetus with the accruing availability of sequence and structure information, and are broadly classified into template-based and de novo methods. One of the predominant de novo methods using sequence information involves the application of biological properties for supervised machine learning. Here, we propose a support vector machine-based ensemble for prediction of protein-ligand interacting residues using one of the most discriminative properties of the interacting-residue neighbourhood, i.e., evolutionary information in the form of a position-specific scoring matrix (PSSM). The study has been performed on a non-redundant dataset comprising 9269 interacting and 91773 non-interacting residues for prediction model generation and further evaluation. Of the various PSSM-based models explored, the proposed method named ROBBY (pRediction Of Biologically relevant small molecule Binding residues on enzYmes) shows an accuracy of 84.0 %, a Matthews Correlation Coefficient of 0.343 and an F-measure of 39.0 % on 78 test enzymes. Further, the scope of adding domain knowledge such as pocket information has also been investigated; the results showed significant enhancement in method precision. These findings are expected to boost the reliability of small-molecule ligand interaction prediction for enzyme applications and drug design. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
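
    The feature construction implied by "evolutionary information in the residue neighbourhood" is commonly realized as a sliding window over PSSM rows; the sketch below (window size and zero-padding are assumptions, not the paper's stated choices) produces one flattened vector per residue for the SVM ensemble.

```python
import numpy as np

def window_features(pssm, half_window=4):
    """pssm: (sequence_length, 20) position-specific scoring matrix."""
    L, _ = pssm.shape
    padded = np.vstack([np.zeros((half_window, 20)),
                        pssm,
                        np.zeros((half_window, 20))])   # zero-pad the ends
    return np.stack([padded[i:i + 2 * half_window + 1].ravel()
                     for i in range(L)])

# Random stand-in PSSM for a 120-residue enzyme.
pssm = np.random.default_rng(0).integers(-5, 10, size=(120, 20)).astype(float)
X = window_features(pssm)
print(X.shape)    # (120, 180): one 9 x 20 window per residue
```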

  10. Ensemble of random forests One vs. Rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares.

    PubMed

    Ramírez, J; Górriz, J M; Ortiz, A; Martínez-Murcia, F J; Segovia, F; Salas-Gonzalez, D; Castillo-Barnes, D; Illán, I A; Puntonet, C G

    2018-05-15

    Alzheimer's disease (AD) is the most common cause of dementia in the elderly and affects approximately 30 million individuals worldwide. Mild cognitive impairment (MCI) is very frequently a prodromal phase of AD, and existing studies have suggested that people with MCI tend to progress to AD at a rate of about 10-15% per year. However, the ability of clinicians and machine learning systems to predict AD based on MRI biomarkers at an early stage is still a challenging problem that can have a great impact in improving treatments. The proposed system, developed by the SiPBA-UGR team for this challenge, is based on feature standardization, ANOVA feature selection, partial least squares feature dimension reduction and an ensemble of One vs. Rest random forest classifiers. With the aim of improving its performance when discriminating healthy controls (HC) from MCI, a second binary classification level was introduced that reconsiders the HC and MCI predictions of the first level. The system was trained and evaluated on an ADNI dataset that consists of T1-weighted MRI morphological measurements from HC, stable MCI, converter MCI and AD subjects. The proposed system yields a 56.25% classification score on the test subset, which consists of 160 real subjects. The classifier yielded the best performance when compared to: (i) One vs. One (OvO), One vs. Rest (OvR) and error correcting output codes (ECOC) as strategies for reducing the multiclass classification task to multiple binary classification problems, (ii) support vector machines, gradient boosting classifier and random forest as base binary classifiers, and (iii) bagging ensemble learning. A robust method has been proposed for the international challenge on MCI prediction based on MRI data. The system yielded the second best performance during the competition with an accuracy rate of 56.25% when evaluated on the real subjects of the test set. Copyright © 2017 Elsevier B.V. All rights reserved.
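
    The core pipeline (standardization, dimension reduction, One vs. Rest random forests) can be sketched as follows; the ANOVA selection step and the second HC/MCI level are omitted, and synthetic features stand in for the MRI measurements.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in for T1-weighted morphological measurements, 4 classes
# (HC, stable MCI, converter MCI, AD).
X, y = make_classification(n_samples=400, n_features=60, n_informative=20,
                           n_classes=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(Xtr)                  # feature standardization
pls = PLSRegression(n_components=10).fit(scaler.transform(Xtr), ytr)
Ztr = pls.transform(scaler.transform(Xtr))          # PLS dimension reduction
Zte = pls.transform(scaler.transform(Xte))

ovr = OneVsRestClassifier(RandomForestClassifier(random_state=0)).fit(Ztr, ytr)
print("accuracy:", round(ovr.score(Zte, yte), 3))
```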

  11. Opportunities and challenges for extended-range predictions of tropical cyclone impacts on hydrological predictions

    NASA Astrophysics Data System (ADS)

    Tsai, Hsiao-Chung; Elsberry, Russell L.

    2013-12-01

    An opportunity exists to extend support to the decision-making processes of water resource management and hydrological operations by providing extended-range tropical cyclone (TC) formation and track forecasts in the western North Pacific from the 51-member ECMWF 32-day ensemble. A new objective verification technique demonstrates that the ECMWF ensemble can predict most of the formations and tracks of the TCs during July 2009 to December 2010, even for most of the tropical depressions. Due to the relatively large number of false-alarm TCs in the ECMWF ensemble forecasts that would cause problems for support of hydrological operations, characteristics of these false alarms are discussed. Special attention is given to the ability of the ECMWF ensemble to predict periods of no-TCs in the Taiwan area, since water resource management decisions also depend on the absence of typhoon-related rainfall. A three-tier approach is proposed to provide support for hydrological operations via extended-range forecasts twice weekly on the 30-day timescale, twice-daily on the 15-day timescale, and up to four times a day with a consensus of high-resolution deterministic models.

  12. Canonical-ensemble state-averaged complete active space self-consistent field (SA-CASSCF) strategy for problems with more diabatic than adiabatic states: Charge-bond resonance in monomethine cyanines

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Olsen, Seth, E-mail: seth.olsen@uq.edu.au

    2015-01-28

    This paper reviews basic results from a theory of the a priori classical probabilities (weights) in state-averaged complete active space self-consistent field (SA-CASSCF) models. It addresses how the classical probabilities limit the invariance of the self-consistency condition to transformations of the complete active space configuration interaction (CAS-CI) problem. Such transformations are of interest for choosing representations of the SA-CASSCF solution that are diabatic with respect to some interaction. I achieve the known result that a SA-CASSCF can be self-consistently transformed only within degenerate subspaces of the CAS-CI ensemble density matrix. For uniformly distributed (“microcanonical”) SA-CASSCF ensembles, self-consistency is invariant to any unitary CAS-CI transformation that acts locally on the ensemble support. Most SA-CASSCF applications in current literature are microcanonical. A problem with microcanonical SA-CASSCF models for problems with “more diabatic than adiabatic” states is described. The problem is that not all diabatic energies and couplings are self-consistently resolvable. A canonical-ensemble SA-CASSCF strategy is proposed to solve the problem. For canonical-ensemble SA-CASSCF, the equilibrated ensemble is a Boltzmann density matrix parametrized by its own CAS-CI Hamiltonian and a Lagrange multiplier acting as an inverse “temperature,” unrelated to the physical temperature. Like the convergence criterion for microcanonical-ensemble SA-CASSCF, the equilibration condition for canonical-ensemble SA-CASSCF is invariant to transformations that act locally on the ensemble CAS-CI density matrix. The advantage of a canonical-ensemble description is that more adiabatic states can be included in the support of the ensemble without running into convergence problems. The constraint on the dimensionality of the problem is relieved by the introduction of an energy constraint. The method is illustrated with a complete active space valence-bond (CASVB) analysis of the charge/bond resonance electronic structure of a monomethine cyanine: Michler’s hydrol blue. The diabatic CASVB representation is shown to vary weakly for “temperatures” corresponding to visible photon energies. Canonical-ensemble SA-CASSCF enables the resolution of energies and couplings for all covalent and ionic CASVB structures contributing to the SA-CASSCF ensemble. The CASVB solution describes resonance of charge- and bond-localized electronic structures interacting via bridge resonance superexchange. The resonance couplings can be separated into channels associated with either covalent charge delocalization or chemical bonding interactions, with the latter significantly stronger than the former.

  13. Canonical-ensemble state-averaged complete active space self-consistent field (SA-CASSCF) strategy for problems with more diabatic than adiabatic states: charge-bond resonance in monomethine cyanines.

    PubMed

    Olsen, Seth

    2015-01-28

    This paper reviews basic results from a theory of the a priori classical probabilities (weights) in state-averaged complete active space self-consistent field (SA-CASSCF) models. It addresses how the classical probabilities limit the invariance of the self-consistency condition to transformations of the complete active space configuration interaction (CAS-CI) problem. Such transformations are of interest for choosing representations of the SA-CASSCF solution that are diabatic with respect to some interaction. I achieve the known result that a SA-CASSCF can be self-consistently transformed only within degenerate subspaces of the CAS-CI ensemble density matrix. For uniformly distributed ("microcanonical") SA-CASSCF ensembles, self-consistency is invariant to any unitary CAS-CI transformation that acts locally on the ensemble support. Most SA-CASSCF applications in current literature are microcanonical. A problem with microcanonical SA-CASSCF models for problems with "more diabatic than adiabatic" states is described. The problem is that not all diabatic energies and couplings are self-consistently resolvable. A canonical-ensemble SA-CASSCF strategy is proposed to solve the problem. For canonical-ensemble SA-CASSCF, the equilibrated ensemble is a Boltzmann density matrix parametrized by its own CAS-CI Hamiltonian and a Lagrange multiplier acting as an inverse "temperature," unrelated to the physical temperature. Like the convergence criterion for microcanonical-ensemble SA-CASSCF, the equilibration condition for canonical-ensemble SA-CASSCF is invariant to transformations that act locally on the ensemble CAS-CI density matrix. The advantage of a canonical-ensemble description is that more adiabatic states can be included in the support of the ensemble without running into convergence problems. The constraint on the dimensionality of the problem is relieved by the introduction of an energy constraint. The method is illustrated with a complete active space valence-bond (CASVB) analysis of the charge/bond resonance electronic structure of a monomethine cyanine: Michler's hydrol blue. The diabatic CASVB representation is shown to vary weakly for "temperatures" corresponding to visible photon energies. Canonical-ensemble SA-CASSCF enables the resolution of energies and couplings for all covalent and ionic CASVB structures contributing to the SA-CASSCF ensemble. The CASVB solution describes resonance of charge- and bond-localized electronic structures interacting via bridge resonance superexchange. The resonance couplings can be separated into channels associated with either covalent charge delocalization or chemical bonding interactions, with the latter significantly stronger than the former.
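
    The canonical weights themselves are a one-liner: with CAS-CI state energies E_i and the Lagrange-multiplier inverse "temperature" β, the ensemble weight of state i is proportional to exp(−βE_i). A small numerical sketch (the energies and β below are placeholders, not values from the paper):

```python
import numpy as np

def canonical_weights(energies_ev, beta_per_ev):
    """Boltzmann weights w_i proportional to exp(-beta * E_i) for state
    averaging; beta is a Lagrange multiplier, not a physical temperature."""
    e = np.asarray(energies_ev) - np.min(energies_ev)  # shift for stability
    w = np.exp(-beta_per_ev * e)
    return w / w.sum()

# Placeholder state energies (eV); 1/beta chosen near a visible photon energy.
print(canonical_weights([0.0, 2.1, 2.4, 4.0], beta_per_ev=1 / 2.3).round(3))
```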

  14. The Hydrologic Ensemble Prediction Experiment (HEPEX)

    NASA Astrophysics Data System (ADS)

    Wood, A. W.; Thielen, J.; Pappenberger, F.; Schaake, J. C.; Hartman, R. K.

    2012-12-01

    The Hydrologic Ensemble Prediction Experiment was established in March 2004, at a workshop hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF). With support from the US National Weather Service (NWS) and the European Commission (EC), the HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support in emergency management and water resources sectors. The strategy to meet this goal includes meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. HEPEX has organized about a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Today, the HEPEX mission is to demonstrate the added value of hydrological ensemble prediction systems (HEPS) for emergency management and water resources sectors to make decisions that have important consequences for economy, public health, safety, and the environment. HEPEX is now organised around six major themes that represent core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.

  15. Smile detectors correlation

    NASA Astrophysics Data System (ADS)

    Yuksel, Kivanc; Chang, Xin; Skarbek, Władysław

    2017-08-01

    A novel smile recognition algorithm is presented, based on the extraction of 68 facial salient points (fp68) using an ensemble of regression trees. The smile detector exploits a linear Support Vector Machine model, trained on a few hundred exemplar images by the SVM algorithm working in a 136-dimensional space. Strict statistical analysis of the data shows that such a geometric detector depends strongly on the geometry of the mouth opening area, measured by triangulation of the outer lip contour. To this end, two Bayesian detectors were developed and compared with the SVM detector. The first uses the mouth area in the 2D image, while the second refers to the mouth area in a 3D animated face model. The 3D modeling is based on the Candide-3 model and is performed in real time along with the three smile detectors and statistics estimators. The mouth-area/Bayesian detectors exhibit high correlation with the fp68/SVM detector, in the range [0.8, 1.0], depending mainly on lighting conditions and individual features, with an advantage for the 3D technique, especially in hard lighting conditions.
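
    The fp68-plus-linear-SVM pipeline maps naturally onto dlib (whose shape predictor is an ensemble-of-regression-trees model) and scikit-learn; the sketch below assumes the standard pretrained landmark file, and the training lines at the end are schematic.

```python
import dlib
import numpy as np
from sklearn.svm import LinearSVC

detector = dlib.get_frontal_face_detector()
# Pretrained ensemble-of-regression-trees landmark model; the file is
# distributed separately by dlib (the path here is an assumption).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def fp68_vector(gray_image):
    """Flattened 136-dim landmark vector for the first detected face."""
    faces = detector(gray_image)
    if not faces:
        return None
    shape = predictor(gray_image, faces[0])
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    return np.asarray(pts, dtype=float).ravel()

# Schematic training on labelled smile / non-smile images:
# X = np.stack([fp68_vector(img) for img in images]); y = labels
# smile_svm = LinearSVC().fit(X, y)
```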

  16. Computational neuroanatomy using brain deformations: From brain parcellation to multivariate pattern analysis and machine learning.

    PubMed

    Davatzikos, Christos

    2016-10-01

    The past 20 years have seen a mushrooming growth of the field of computational neuroanatomy. Much of this work has been enabled by the development and refinement of powerful, high-dimensional image warping methods, which have enabled detailed brain parcellation, voxel-based morphometric analyses, and multivariate pattern analyses using machine learning approaches. The evolution of these 3 types of analyses over the years has overcome many challenges. We present the evolution of our work in these 3 directions, which largely follows the evolution of this field. We discuss the progression from single-atlas, single-registration brain parcellation work to current ensemble-based parcellation; from relatively basic mass-univariate t-tests to optimized regional pattern analyses combining deformations and residuals; and from basic application of support vector machines to generative-discriminative formulations of multivariate pattern analyses, and to methods dealing with heterogeneity of neuroanatomical patterns. We conclude with discussion of some of the future directions and challenges. Copyright © 2016. Published by Elsevier B.V.

  17. Machine learning models for lipophilicity and their domain of applicability.

    PubMed

    Schroeter, Timon; Schwaighofer, Anton; Mika, Sebastian; Laak, Antonius Ter; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert

    2007-01-01

    Unfavorable lipophilicity and water solubility cause many drug failures; therefore these properties have to be taken into account early on in lead discovery. Commercial tools for predicting lipophilicity have usually been trained on small and neutral molecules, and are thus often unable to accurately predict in-house data. Using a modern Bayesian machine learning algorithm (a Gaussian process model), this study constructs a log D7 model based on 14,556 drug discovery compounds of Bayer Schering Pharma. Performance is compared with support vector machines, decision trees, ridge regression, and four commercial tools. In a blind test on 7013 new measurements from the preceding months (including compounds from new projects), 81% were predicted correctly within 1 log unit, compared to only 44% achieved by commercial software. Additional evaluations using public data are presented. We consider error bars for each method (model-based, ensemble-based, and distance-based approaches), and investigate how well they quantify the domain of applicability of each model.
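
    A Gaussian process regressor with per-compound predictive error bars, the core of the model-based applicability-domain idea, can be sketched with scikit-learn (synthetic descriptors stand in for the proprietary data):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic molecular descriptors and log D-like target values.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=200)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X[:150], y[:150])

# Model-based error bars: the predictive standard deviation per compound
# grows for compounds far from the training data (outside the domain).
mean, std = gp.predict(X[150:], return_std=True)
within_1_log = np.abs(mean - y[150:]) < 1.0
print("fraction within 1 log unit:", round(float(within_1_log.mean()), 2))
print("largest error bar:", round(float(std.max()), 3))
```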

  18. Mapping of Coral Reef Environment in the Arabian Gulf Using Multispectral Remote Sensing

    NASA Astrophysics Data System (ADS)

    Ben-Romdhane, H.; Marpu, P. R.; Ghedira, H.; Ouarda, T. B. M. J.

    2016-06-01

    Coral reefs of the Arabian Gulf are subject to several pressures, thus requiring conservation actions. Well-designed conservation plans involve efficient mapping and monitoring systems. Satellite remote sensing is a cost-effective tool for seafloor mapping at large scales. Multispectral remote sensing of coastal habitats, like those of the Arabian Gulf, presents a special challenge due to their complexity and heterogeneity. The present study evaluates the potential of the multispectral sensor DubaiSat-2 for mapping benthic communities of the United Arab Emirates. We propose a spectral-spatial method that includes multilevel segmentation, nonlinear feature analysis and ensemble learning methods. A Support Vector Machine (SVM) is used for comparison of classification performance. Comparative data were derived from the habitat maps published by the Environment Agency-Abu Dhabi. The spectral-spatial method produced 96.41% mapping accuracy; the SVM classification was 94.17% accurate. The adoption of these methods can help achieve well-designed coastal management plans in the region.

  19. Computational neuroanatomy using brain deformations: From brain parcellation to multivariate pattern analysis and machine learning

    PubMed Central

    Davatzikos, Christos

    2017-01-01

    The past 20 years have seen a mushrooming growth of the field of computational neuroanatomy. Much of this work has been enabled by the development and refinement of powerful, high-dimensional image warping methods, which have made possible detailed brain parcellation, voxel-based morphometric analyses, and multivariate pattern analyses using machine learning approaches. The evolution of these 3 types of analyses over the years has overcome many challenges. We present the evolution of our work in these 3 directions, which largely follows the evolution of this field. We discuss the progression from single-atlas, single-registration brain parcellation work to current ensemble-based parcellation; from relatively basic mass-univariate t-tests to optimized regional pattern analyses combining deformations and residuals; and from basic application of support vector machines to generative-discriminative formulations of multivariate pattern analyses, and to methods dealing with heterogeneity of neuroanatomical patterns. We conclude with a discussion of some of the future directions and challenges. PMID:27514582

  20. Wavelet images and Chou's pseudo amino acid composition for protein classification.

    PubMed

    Nanni, Loris; Brahnam, Sheryl; Lumini, Alessandra

    2012-08-01

    The last decade has seen an explosion in the collection of protein data. To actualize the potential offered by this wealth of data, it is important to develop machine systems capable of classifying and extracting features from proteins. Reliable machine systems for protein classification offer many benefits, including the promise of finding novel drugs and vaccines. In developing our system, we analyze and compare several feature extraction methods used in protein classification that are based on the calculation of texture descriptors starting from a wavelet representation of the protein. We then feed these texture-based representations of the protein into an AdaBoost ensemble of neural networks or a support vector machine classifier. In addition, we perform experiments that combine our feature extraction methods with a standard method based on Chou's pseudo amino acid composition. Using several datasets, we show that our best approach outperforms standard methods. The Matlab code of the proposed protein descriptors is available at http://bias.csr.unibo.it/nanni/wave.rar.

  1. Plurigon: three dimensional visualization and classification of high-dimensionality data

    PubMed Central

    Martin, Bronwen; Chen, Hongyu; Daimon, Caitlin M.; Chadwick, Wayne; Siddiqui, Sana; Maudsley, Stuart

    2013-01-01

    High-dimensionality data is rapidly becoming the norm for biomedical sciences and many other analytical disciplines. Not only is the collection and processing time for such data becoming problematic, but it has become increasingly difficult to form a comprehensive appreciation of high-dimensionality data. Though data analysis methods for coping with multivariate data are well-documented in technical fields such as computer science, little effort is currently being expended to condense data vectors that exist beyond the realm of physical space into an easily interpretable and aesthetic form. To address this important need, we have developed Plurigon, a data visualization and classification tool for the integration of high-dimensionality visualization algorithms with a user-friendly, interactive graphical interface. Unlike existing data visualization methods, which are focused on an ensemble of data points, Plurigon places a strong emphasis upon the visualization of a single data point and its determining characteristics. Multivariate data vectors are represented in the form of a deformed sphere with a distinct topology of hills, valleys, plateaus, peaks, and crevices. The gestalt structure of the resultant Plurigon object generates an easily appreciable model. User interaction with the Plurigon is extensive; zoom, rotation, axial and vector display, feature extraction, and anaglyph stereoscopy are currently supported. With Plurigon and its ability to analyze high-complexity data, we hope to see a unification of biomedical and computational sciences as well as practical applications in a wide array of scientific disciplines. Increased accessibility to the analysis of high-dimensionality data may increase the number of new discoveries and breakthroughs, ranging from drug screening to disease diagnosis to medical literature mining. PMID:23885241

  2. Prediction of cause of death from forensic autopsy reports using text classification techniques: A comparative study.

    PubMed

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa

    2018-07-01

    Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction. For the experiments, autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures, i.e. overall accuracy, macro precision, macro F-measure, and macro recall. From the experiments, it was found that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Among the feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar results, both better than binary frequency and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among the text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers. Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will serve as state-of-the-art baselines for comparing future proposals with existing automated text classification techniques. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
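    The best-performing configuration reported above (unigram features, TF-IDF weighting, chi-square reduction, and an SVM) can be sketched in a few lines of scikit-learn; the tiny corpus below is a stand-in, since the actual autopsy reports are not public.

```python
# Sketch of the winning configuration reported above: unigram features,
# TF-IDF weighting, chi-square feature reduction, and a linear SVM.
# The corpus is a toy stand-in for the (non-public) autopsy reports.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

reports = ["blunt force trauma to the head", "myocardial infarction found",
           "gunshot wound to the chest", "acute myocardial ischemia"]
causes = ["trauma", "cardiac", "trauma", "cardiac"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),  # unigrams only
    ("chi2", SelectKBest(chi2, k=5)),                # chi-square reduction
    ("svm", LinearSVC()),
])
model.fit(reports, causes)
print(model.predict(["gunshot wound found on chest"]))
```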

  3. Present and Future Projections of Habitat Suitability of the Asian Tiger Mosquito, a Vector of Viral Pathogens, from Global Climate Simulations.

    NASA Astrophysics Data System (ADS)

    Proestos, Y.; Christophides, G.; Erguler, K.; Tanarhte, M.; Waldock, J.; Lelieveld, J.

    2014-12-01

    Climate change can influence the transmission of vector borne diseases (VBDs) through altering the habitat suitability of insect vectors. Here we present global climate model simulations and evaluate the associated uncertainties in view of the main meteorological factors that may affect the distribution of the Asian Tiger mosquito (Aedes albopictus), which can transmit pathogens that cause Chikungunya, Dengue fever, yellow fever and various encephalitides. Using a general circulation model (GCM) at 50 km horizontal resolution to simulate mosquito survival variables including temperature, precipitation and relative humidity, we present both global and regional projections of the habitat suitability up to the middle of the 21st century. The model resolution of 50 km allows evaluation against previous projections for Europe and provides a basis for comparative analyses with other regions. Model uncertainties and performance are addressed in light of the recent CMIP5 ensemble climate model simulations for the RCP8.5 concentration pathway and using meteorological re-analysis data (ERA-Interim/ECMWF) for the recent past. Uncertainty ranges associated with the thresholds of meteorological variables that may affect the distribution of Ae. albopictus are diagnosed using fuzzy-logic methodology, notably to assess the influence of selected meteorological criteria and combinations of criteria that influence mosquito habitat suitability. From the climate projections for 2050, and adopting a habitat suitability index larger than 70%, we estimate that about 2.4 billion individuals in a land area of nearly 20 million square kilometres will potentially be exposed to Ae. albopictus. The synthesis of fuzzy-logic based on mosquito biology and climate change analysis provides new insights into the regional and global spreading of VBDs to support disease control and policy making.

  4. A new transform for the analysis of complex fractionated atrial electrograms

    PubMed Central

    2011-01-01

    Background: Representation of independent biophysical sources using Fourier analysis can be inefficient because the basis is sinusoidal and general. When complex fractionated atrial electrograms (CFAE) are acquired during atrial fibrillation (AF), the electrogram morphology depends on the mix of distinct nonsinusoidal generators. Identification of these generators using efficient methods of representation and comparison would be useful for targeting catheter ablation sites to prevent arrhythmia reinduction. Method: A data-driven basis and transform is described which utilizes the ensemble average of signal segments to identify and distinguish CFAE morphologic components and frequencies. Calculation of the dominant frequency (DF) of actual CFAE, and identification of simulated independent generator frequencies and morphologies embedded in CFAE, is done using a total of 216 recordings from 10 paroxysmal and 10 persistent AF patients. The transform is tested versus Fourier analysis to detect spectral components in the presence of phase noise and interference. Correspondence is shown between ensemble basis vectors of highest power and corresponding synthetic drivers embedded in CFAE. Results: The ensemble basis is orthogonal, and efficient for representation of CFAE components as compared with Fourier analysis (p ≤ 0.002). When three synthetic drivers with additive phase noise and interference were decomposed, the top three peaks in the ensemble power spectrum corresponded to the driver frequencies more closely as compared with top Fourier power spectrum peaks (p ≤ 0.005). The synthesized drivers with phase noise and interference were extractable from their corresponding ensemble basis with a mean error of less than 10%. Conclusions: The new transform is able to efficiently identify CFAE features using DF calculation and by discerning morphologic differences. Unlike the Fourier transform method, it does not distort CFAE signals prior to analysis, and is relatively robust to jitter in periodic events. Thus the ensemble method can provide a useful alternative for quantitative characterization of CFAE during clinical study. PMID:21569421
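    A schematic reading of the ensemble-averaging idea is sketched below: the signal is cut into segments of a candidate period, and the ensemble mean of those segments serves as a data-driven basis vector whose power indicates how strongly that period is present. This simplified sketch is one interpretation of the general principle, not the authors' full transform.

```python
# Schematic sketch: a data-driven basis from the ensemble average of signal
# segments. For each candidate period, the signal is segmented and averaged;
# noise cancels in the mean while a repeating generator survives.
# (A simplified reading of the method, not the authors' complete transform.)
import numpy as np

def ensemble_basis(signal, period):
    n_seg = len(signal) // period
    segments = signal[:n_seg * period].reshape(n_seg, period)
    basis = segments.mean(axis=0)            # ensemble average of segments
    return basis, float(np.mean(basis ** 2)) # basis vector and its power

fs = 1000                                    # sampling rate [Hz]
t = np.arange(4 * fs) / fs
rng = np.random.default_rng(8)
# Toy "electrogram": an 8 Hz nonsinusoidal generator (period 125 samples) + noise
sig = np.sign(np.sin(2 * np.pi * 8 * t)) + 0.5 * rng.normal(size=t.size)

powers = {p: ensemble_basis(sig, p)[1] for p in range(100, 201)}
print("dominant period [samples]:", max(powers, key=powers.get))  # ~125
```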

  5. GPU-Based Interactive Exploration and Online Probability Maps Calculation for Visualizing Assimilated Ocean Ensembles Data

    NASA Astrophysics Data System (ADS)

    Hoteit, I.; Hollt, T.; Hadwiger, M.; Knio, O. M.; Gopalakrishnan, G.; Zhan, P.

    2016-02-01

    Ocean reanalyses and forecasts are nowadays generated by combining ensemble simulations with data assimilation techniques. Most of these techniques resample the ensemble members after each assimilation cycle. Tracking behavior over time, such as all possible paths of a particle in an ensemble vector field, becomes very difficult, as the number of combinations rises exponentially with the number of assimilation cycles. In general a single possible path is not of interest, but only the probabilities that any point in space might be reached by a particle at some point in time. We present an approach using probability-weighted piecewise particle trajectories to allow for interactive probability mapping. This is achieved by binning the domain and splitting up the tracing process into the individual assimilation cycles, so that particles that fall into the same bin after a cycle can be treated as a single particle with a larger probability as input for the next cycle. As a result we lose the ability to track individual particles, but can create probability maps for any desired seed at interactive rates. The technique is integrated in an interactive visualization system that enables the visual analysis of the particle traces side by side with other forecast variables, such as the sea surface height, and their corresponding behavior over time. By harnessing the power of modern graphics processing units (GPUs) for visualization as well as computation, our system allows the user to browse through the simulation ensembles in real time, view specific parameter settings or simulation models, and move between different spatial or temporal regions without delay. In addition, our system provides advanced visualizations to highlight the uncertainty, or show the complete distribution of the simulations at user-defined positions over the complete time series of the domain.

  6. Snow Radiance Data Assimilation over High Mountain Asia Using the NASA Land Information System and a Well-Trained Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Kwon, Y.; Forman, B. A.; Yoon, Y.; Kumar, S.

    2017-12-01

    High Mountain Asia (HMA) has been progressively losing ice and snow in recent decades, which could negatively impact regional water supply and native ecosystems. One goal of this study is to characterize the spatiotemporal variability of snow (and ice) across the HMA region. In addition, modeled snow water equivalent (SWE) estimates will be enhanced through the assimilation of passive microwave brightness temperatures (TB) collected by the Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E) as part of a radiance assimilation system. The radiance assimilation framework includes the NASA Land Information System (LIS) in conjunction with a well-trained support vector machine (SVM) that acts as the observation operator. The Noah Land Surface Model with multi-parameterization options (Noah-MP) is used as the prior model for simulating snow dynamics. Noah-MP is forced by meteorological fields from the NASA Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2) atmospheric reanalysis for the period 01 Sep. 2002 to 01 Sep. 2011. The radiance assimilation system requires two separate phases: 1) training and 2) assimilation. During the training phase, a nonlinear SVM is generated for three different AMSR-E frequencies - 10.65, 18.7, and 36.5 GHz - at both vertical and horizontal polarization. The trained SVM is then used to predict TB during the assimilation phase. An ensemble Kalman filter will be used to condition the model on AMSR-E brightness temperatures not used during SVM training. The performance of Noah-MP (with and without radiance assimilation) will be assessed via comparison to in-situ measurements, remotely-sensed geophysical retrievals, and other reanalysis products.
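    The role of the SVM as an observation operator can be sketched as a regression from modeled snow states to brightness temperature. The toy state variables, the 36.5 GHz vertical-polarization target, and the kernel settings below are illustrative assumptions, not the actual LIS/Noah-MP configuration.

```python
# Illustrative sketch of an SVM observation operator: a regressor mapping
# modeled snow states to passive-microwave brightness temperature (TB).
# All inputs and settings are synthetic assumptions, not the LIS setup.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
# Columns: SWE [mm], snow depth [m], skin temperature [K] (synthetic)
states = rng.uniform([0.0, 0.0, 240.0], [300.0, 1.5, 280.0], size=(500, 3))
tb_36v = 270 - 0.15 * states[:, 0] + 5 * rng.normal(size=500)  # toy 36.5 GHz V-pol TB

svm_op = SVR(kernel="rbf", C=10.0).fit(states, tb_36v)  # "training phase"
tb_pred = svm_op.predict(states[:5])                     # used in "assimilation phase"
print(tb_pred)
```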

  7. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    Various kinds of data mining algorithms are continuously proposed with the development of related disciplines. The applicable scopes and performances of these algorithms differ. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers aiming to solve practical problems promptly. In this paper, seven widely used algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 most popular UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on unbalanced multi-class datasets. Simple algorithms, such as naïve Bayes and logistic regression, are suitable for small datasets with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on balanced small datasets of binary-class tasks. No algorithm maintains the best performance across all datasets. The applicability of the seven data mining algorithms on datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
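    A comparison of this kind reduces to a short cross-validation loop. The sketch below runs the seven algorithms on a small stand-in dataset (Iris) rather than the 12 UCI datasets used in the study; the C4.5-like tree is approximated with an entropy-criterion decision tree, since scikit-learn has no exact C4.5.

```python
# Sketch of the study's comparison: seven classifiers evaluated with 5-fold
# cross-validation. Iris stands in for the UCI datasets; the entropy-based
# decision tree is only an approximation of C4.5.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
models = {
    "C4.5-like tree": DecisionTreeClassifier(criterion="entropy"),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(),
    "kNN": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```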

  8. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set.

    PubMed

    Lenselink, Eelke B; Ten Dijke, Niels; Bongers, Brandon; Papadatos, George; van Vlijmen, Herman W T; Kowalczyk, Wojtek; IJzerman, Adriaan P; van Westen, Gerard J P

    2017-08-14

    The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective performance. Deep Neural Networks were the top-performing classifiers, highlighting their added value over other, more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better, at almost one standard deviation above the mean performance. Furthermore, multi-task and PCM implementations were shown to improve performance over single-task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations below the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around the mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with the unoptimized 'DNN_PCM'). Here, a standardized set to test and evaluate different machine learning algorithms in the context of multi-task learning is offered by providing the data and the protocols.

  9. Is quantum theory a form of statistical mechanics?

    NASA Astrophysics Data System (ADS)

    Adler, S. L.

    2007-05-01

    We give a review of the basic themes of my recent book: Adler S L 2004 Quantum Theory as an Emergent Phenomenon (Cambridge: Cambridge University Press). We first give motivations for considering the possibility that quantum mechanics is not exact, but is instead an accurate asymptotic approximation to a deeper level theory. For this deeper level, we propose a non-commutative generalization of classical mechanics, that we call "trace dynamics", and we give a brief survey of how it works, considering for simplicity only the bosonic case. We then discuss the statistical mechanics of trace dynamics and give our argument that with suitable approximations, the Ward identities for trace dynamics imply that ensemble averages in the canonical ensemble correspond to Wightman functions in quantum field theory. Thus, quantum theory emerges as the statistical thermodynamics of trace dynamics. Finally, we argue that Brownian motion corrections to this thermodynamics lead to stochastic corrections to the Schrödinger equation, of the type that have been much studied in the "continuous spontaneous localization" model of objective state vector reduction. In appendices to the talk, we give details of the existence of a conserved operator in trace dynamics that encodes the structure of the canonical algebra, of the derivation of the Ward identities, and of the proof that the stochastically-modified Schrödinger equation leads to state vector reduction with Born rule probabilities.

  10. Parameter estimation for stiff deterministic dynamical systems via ensemble Kalman filter

    NASA Astrophysics Data System (ADS)

    Arnold, Andrea; Calvetti, Daniela; Somersalo, Erkki

    2014-10-01

    A commonly encountered problem in numerous areas of applications is to estimate the unknown coefficients of a dynamical system from direct or indirect observations at discrete times of some of the components of the state vector. A related problem is to estimate unobserved components of the state. A striking example of such a problem is provided by metabolic models, in which the numerous model parameters and the concentrations of the metabolites in tissue are to be estimated from concentration data in the blood. A popular method for addressing similar questions in stochastic and turbulent dynamics is the ensemble Kalman filter (EnKF), a particle-based filtering method that generalizes classical Kalman filtering. In this work, we adapt the EnKF algorithm for deterministic systems in which the numerical approximation error is interpreted as a stochastic drift with variance based on classical error estimates of numerical integrators. This approach, which is particularly suitable for stiff systems where the stiffness may depend on the parameters, allows us to effectively exploit the parallel nature of particle methods. Moreover, we demonstrate how spatial prior information about the state vector, which helps the stability of the computed solution, can be incorporated into the filter. The viability of the approach is shown by computed examples, including a metabolic system modeling an ischemic episode in skeletal muscle, with a high number of unknown parameters.
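    The core of the EnKF machinery referred to here is compact enough to sketch: the state vector is augmented with the unknown parameters, and each ensemble member is nudged toward a perturbed observation. The sketch below is the generic stochastic EnKF analysis step, not the authors' stiff-solver-specific algorithm; all dimensions and values are toy choices.

```python
# Bare-bones sketch of the EnKF analysis step with state augmentation:
# unknown parameters are appended to the state so the update estimates both.
# Generic stochastic EnKF, not the paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(2)
n_ens, n_state, n_par = 50, 3, 2
ens = rng.normal(size=(n_state + n_par, n_ens))    # augmented ensemble [x; theta]
H = np.zeros((1, n_state + n_par)); H[0, 0] = 1.0  # observe first state component
r = 0.1                                            # observation error variance
y_obs = 1.2                                        # toy observation

A = ens - ens.mean(axis=1, keepdims=True)          # ensemble anomalies
P = A @ A.T / (n_ens - 1)                          # sample covariance
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + r * np.eye(1))  # Kalman gain
y_pert = y_obs + np.sqrt(r) * rng.normal(size=n_ens)      # perturbed observations
ens += K @ (y_pert[None, :] - H @ ens)             # analysis update
print("posterior parameter means:", ens[n_state:].mean(axis=1))
```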

  11. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical properties of proteins, and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in fivefold cross-validation. The accuracy rose to 77% when we used a 20-dimensional feature vector. Applied to more recent data, these methods achieved 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  12. Ensemble-based simultaneous state and parameter estimation for treatment of mesoscale model error: A real-data study

    NASA Astrophysics Data System (ADS)

    Hu, Xiao-Ming; Zhang, Fuqing; Nielsen-Gammon, John W.

    2010-04-01

    This study explores the treatment of model error and uncertainties through simultaneous state and parameter estimation (SSPE) with an ensemble Kalman filter (EnKF) in the simulation of a 2006 air pollution event over the greater Houston area during the Second Texas Air Quality Study (TexAQS-II). Two parameters in the atmospheric boundary layer parameterization associated with large model sensitivities are combined with standard prognostic variables in an augmented state vector to be continuously updated through assimilation of wind profiler observations. It is found that forecasts of the atmosphere with EnKF/SSPE are markedly improved over experiments with no state and/or parameter estimation. More specifically, the EnKF/SSPE is shown to help alleviate a near-surface cold bias and to alter the momentum mixing in the boundary layer to produce more realistic wind profiles.

  13. Motor-motor interactions in ensembles of muscle myosin: using theory to connect single molecule to ensemble measurements

    NASA Astrophysics Data System (ADS)

    Walcott, Sam

    2013-03-01

    Interactions between the proteins actin and myosin drive muscle contraction. Properties of a single myosin interacting with an actin filament are largely known, but a trillion myosins work together in muscle. We are interested in how single-molecule properties relate to ensemble function. Myosin's reaction rates depend on force, so ensemble models keep track of both molecular state and force on each molecule. These models make subtle predictions, e.g. that myosin, when part of an ensemble, moves actin faster than when isolated. This acceleration arises because forces between molecules speed reaction kinetics. Experiments support this prediction and allow parameter estimates. A model based on this analysis describes experiments from single molecule to ensemble. In vivo, actin is regulated by proteins that, when present, cause the binding of one myosin to speed the binding of its neighbors; binding becomes cooperative. Although such interactions preclude the mean field approximation, a set of linear ODEs describes these ensembles under simplified experimental conditions. In these experiments cooperativity is strong, with the binding of one molecule affecting ten neighbors on either side. We progress toward a description of myosin ensembles under physiological conditions.

  14. Classroom Environment as Related to Contest Ratings among High School Performing Ensembles.

    ERIC Educational Resources Information Center

    Hamann, Donald L.; And Others

    1990-01-01

    Examines influence of classroom environments, measured by the Classroom Environment Scale, Form R (CESR), on vocal and instrumental ensembles' musical achievement at festival contests. Using random sample, reveals subjects with higher scores on CESR scales of involvement, affiliation, teacher support, and organization received better contest…

  15. Structure D'Ensemble, Multiple Classification, Multiple Seriation and Amount of Irrelevant Information

    ERIC Educational Resources Information Center

    Hamel, B. Remmo; Van Der Veer, M. A. A.

    1972-01-01

    A significant positive correlation between multiple classification and multiple seriation was found in testing 65 children aged 6 to 8 years at the stage of concrete operations. This is interpreted as support for the existence of a structure d'ensemble of operational schemes in the period of concrete operations. (Authors)

  16. Nucleon structure from 2+1-flavor domain-wall QCD

    NASA Astrophysics Data System (ADS)

    Ohta, Shigemi

    2018-03-01

    Nucleon-structure calculations of isovector vector- and axial-vector-current form factors, transversity and scalar charge, and quark momentum and helicity fractions are reported from two recent 2+1-flavor dynamical domain-wall fermions lattice-QCD ensembles generated jointly by the RIKEN-BNL-Columbia and UKQCD Collaborations with the Iwasaki × dislocation-suppressing-determinant-ratio gauge action at an inverse lattice spacing of 1.378(7) GeV and pion mass values of 249.4(3) and 172.3(3) MeV.

  17. The Development and Application of Random Matrix Theory in Adaptive Signal Processing in the Sample Deficient Regime

    DTIC Science & Technology

    2014-09-01

    optimal diagonal loading which minimizes the MSE. The behavior of optimal diagonal loading when the arrival process is composed of plane waves embedded ... observation vectors. The examples of the ensemble correlation matrix corresponding to the input process consisting of a single or multiple plane waves ... Y*_ij is the complex conjugate of Y_ij. This result is used in order to evaluate the expectations of different quadratic forms. The Poincaré-Nash ...

  18. PIV Data Validation Software Package

    NASA Technical Reports Server (NTRS)

    Blackshire, James L.

    1997-01-01

    A PIV data validation and post-processing software package was developed to provide semi-automated data validation and data reduction capabilities for Particle Image Velocimetry data sets. The software provides three primary capabilities: (1) removal of spurious vector data, (2) filtering, smoothing, and interpolating of PIV data, and (3) calculation of out-of-plane vorticity, ensemble statistics, and turbulence statistics. The software runs on an IBM PC/AT host computer under either the Microsoft Windows 3.1 or Windows 95 operating system.

  19. Performance assessment of automated tissue characterization for prostate H and E stained histopathology

    NASA Astrophysics Data System (ADS)

    DiFranco, Matthew D.; Reynolds, Hayley M.; Mitchell, Catherine; Williams, Scott; Allan, Prue; Haworth, Annette

    2015-03-01

    Reliable automated prostate tumor detection and characterization in whole-mount histology images is sought in many applications, including post-resection tumor staging and as ground-truth data for multi-parametric MRI interpretation. In this study, an ensemble-based supervised classification algorithm for high-resolution histology images was trained on tile-based image features including histogram and gray-level co-occurrence statistics. The algorithm was assessed using different combinations of H and E prostate slides from two separate medical centers and at two different magnifications (400x and 200x), with the aim of applying tumor classification models to new data. Slides from both datasets were annotated by expert pathologists in order to identify homogeneous cancerous and non-cancerous tissue regions of interest, which were then categorized as (1) low-grade tumor (LG-PCa), including Gleason 3 and high-grade prostatic intraepithelial neoplasia (HG-PIN), (2) high-grade tumor (HG-PCa), including various Gleason 4 and 5 patterns, or (3) non-cancerous, including benign stroma and benign prostatic hyperplasia (BPH). Classification models for both LG-PCa and HG-PCa were separately trained using a support vector machine (SVM) approach, and per-tile tumor prediction maps were generated from the resulting ensembles. Results showed high sensitivity for predicting HG-PCa, with an AUC of up to 0.822 using training data from both medical centers, while LG-PCa showed a lower AUC of 0.763 with the same training data. Visual inspection of cancer probability heatmaps from 9 patients showed that 17/19 tumors were detected, and HG-PCa generally produced fewer false positives than LG-PCa.
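    The tile-based feature extraction lends itself to a short sketch: gray-level co-occurrence (GLCM) statistics plus histogram moments are computed per tile and fed to an SVM that outputs per-tile tumor probabilities. The random tiles and the particular feature choices below are placeholders, not the study's exact feature set.

```python
# Sketch of tile-based histology features: GLCM statistics plus histogram
# moments per tile, fed to an SVM producing a per-tile tumor probability.
# Random-noise tiles stand in for real H and E image tiles.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

rng = np.random.default_rng(3)

def tile_features(tile):
    glcm = graycomatrix(tile, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return [graycoprops(glcm, "contrast")[0, 0],
            graycoprops(glcm, "homogeneity")[0, 0],
            tile.mean(), tile.std()]           # GLCM + histogram moments

tiles = rng.integers(0, 256, size=(40, 32, 32), dtype=np.uint8)
labels = rng.integers(0, 2, size=40)           # 0 = benign, 1 = tumor (toy)
X = np.array([tile_features(t) for t in tiles])
clf = SVC(probability=True).fit(X, labels)
print(clf.predict_proba(X[:3])[:, 1])          # per-tile tumor probabilities
```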

  20. Projection of spatial and temporal changes of rainfall in Sarawak of Borneo Island using statistical downscaling of CMIP5 models

    NASA Astrophysics Data System (ADS)

    Sa'adi, Zulfaqar; Shahid, Shamsuddin; Chung, Eun-Sung; Ismail, Tarmizi bin

    2017-11-01

    This study assesses the possible changes in rainfall patterns of Sarawak in Borneo Island due to climate change through statistical downscaling of General Circulation Model (GCM) projections. Available in-situ observed rainfall data were used to downscale future rainfall from ensembles of 20 GCMs of the Coupled Model Intercomparison Project phase 5 (CMIP5) for four Representative Concentration Pathway (RCP) scenarios, namely, RCP2.6, RCP4.5, RCP6.0 and RCP8.5. Model Output Statistics (MOS) based downscaling models were developed using two data mining approaches known as Random Forest (RF) and Support Vector Machine (SVM). The SVM was found to downscale all GCMs with a normalized mean square error (NMSE) of 48.2-75.2 and a skill score (SS) of 0.94-0.98 during validation. The results show that projected annual rainfall increases in some regions and catchments and decreases in others, owing to the influence of the monsoon season along the coast of Sarawak. The ensemble mean of the GCM projections shows increases in mean annual precipitation at 33 stations, at rates of 0.1% to 19.6%, and decreases at one station, at rates of -7.9% to -3.1%, under all RCP scenarios. The remaining 15 stations showed no consistent change (-5.6% to 5.2%), mainly exhibiting decreasing rainfall during the first period (2010-2039) followed by increasing rainfall for the period 2070-2099.

  1. Distinct neural patterns enable grasp types decoding in monkey dorsal premotor cortex

    NASA Astrophysics Data System (ADS)

    Hao, Yaoyao; Zhang, Qiaosheng; Controzzi, Marco; Cipriani, Christian; Li, Yue; Li, Juncheng; Zhang, Shaomin; Wang, Yiwen; Chen, Weidong; Chiara Carrozza, Maria; Zheng, Xiaoxiang

    2014-12-01

    Objective. Recent studies have shown that dorsal premotor cortex (PMd), a cortical area in the dorsomedial grasp pathway, is involved in grasp movements. However, the neural ensemble firing property of PMd during grasp movements and the extent to which it can be used for grasp decoding are still unclear. Approach. To address these issues, we used multielectrode arrays to record both spike and local field potential (LFP) signals in PMd in macaque monkeys performing reaching and grasping of one of four differently shaped objects. Main results. Single and population neuronal activity showed distinct patterns during execution of different grip types. Cluster analysis of neural ensemble signals indicated that the grasp-related patterns emerged soon (200-300 ms) after the go cue signal, and faded away during the hold period. The timing and duration of the patterns varied depending on the behavior of the individual monkeys. Application of a support vector machine model to stable activity patterns revealed classification accuracies of 94% and 89% for the two monkeys, indicating a robust, decodable grasp pattern encoded in the PMd. Grasp decoding using LFPs, especially the high-frequency bands, also produced high decoding accuracies. Significance. This study is the first to specify the neuronal population encoding of grasp over the time course of the grasp. We demonstrate high grasp decoding performance in PMd. These findings, combined with previous evidence for reach-related modulation, suggest that PMd may play an important role in the generation and maintenance of grasp action and may be a suitable locus for brain-machine interface applications.

  2. Comparing writing style feature-based classification methods for estimating user reputations in social media.

    PubMed

    Suh, Jong Hwan

    2016-01-01

    In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the quality of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of state-of-the-art methods: four writing style feature sets, i.e. lexical, syntactic, structural, and content-specific, and eight classification techniques, i.e. four base learners (C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)) and four Random Subspace (RS) ensemble methods based on the four base learners. When South Korea's Web forum, Daum Agora, was selected as a test bed, the experimental results showed that the configuration of the full feature set containing content-specific features and RS-SVM, combining RS and SVM, gives the best classification accuracy if the test bed poster reputations are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations from the literature: first, the feature set adding content-specific features outperforms the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputations, i.e. like, dislike, sum, and portfolio, the results show that the portfolio approach gives the highest accuracy.
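    A Random Subspace ensemble over an SVM base learner, in the spirit of the RS-SVM configuration above, can be sketched with scikit-learn's BaggingClassifier, where each member sees a random subset of the features; the data and all settings below are toy assumptions, not the study's configuration.

```python
# Sketch of a Random Subspace (RS) ensemble with an SVM base learner:
# each member trains on all samples but only a random half of the features.
# Synthetic data stand in for the writing style feature vectors.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 40))               # 40 style features (toy)
y = (X[:, :4].sum(axis=1) > 0).astype(int)   # Good/Bad reputation (toy)

rs_svm = BaggingClassifier(SVC(),
                           n_estimators=25,
                           max_features=0.5,  # random feature subspaces
                           bootstrap=False)   # RS samples features, not rows
rs_svm.fit(X, y)
print(rs_svm.score(X, y))
```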

  3. A comparative research of different ensemble surrogate models based on set pair analysis for the DNAPL-contaminated aquifer remediation strategy optimization.

    PubMed

    Hou, Zeyu; Lu, Wenxi; Xue, Haibo; Lin, Jin

    2017-08-01

    Surrogate-based simulation-optimization is an effective technique for optimizing the surfactant enhanced aquifer remediation (SEAR) strategy for clearing DNAPLs. The performance of the surrogate model, which replaces the simulation model in order to reduce the computational burden, is central to such research. However, previous studies are generally based on a stand-alone surrogate model and rarely combine multiple methods to sufficiently improve how closely the surrogate approximates the simulation model. In this regard, we present set pair analysis (SPA) as a new method to build an ensemble surrogate (ES) model, and conducted a comparative study to select a better ES modeling pattern for SEAR strategy optimization problems. Surrogate models were developed using a radial basis function artificial neural network (RBFANN), support vector regression (SVR), and Kriging. One ES model assembles the RBFANN, SVR, and Kriging models using set pair weights according to their performance, and the other assembles several Kriging models (Kriging being the best of the three surrogate modeling methods) built with different training sample datasets. Finally, an optimization model, in which the ES model was embedded, was established to obtain the optimal remediation strategy. The results showed that the residuals between the outputs of the best ES model and the simulation model for 100 testing samples were lower than 1.5%. Using an ES model instead of the simulation model was critical for considerably reducing the computation time of the simulation-optimization process while simultaneously maintaining high computational accuracy. Copyright © 2017 Elsevier B.V. All rights reserved.
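    The ensemble-surrogate idea can be sketched generically: several surrogates are fit to the same simulator samples and combined with performance-based weights. The inverse-error weights below are a simple stand-in for the paper's set pair analysis weights, and the MLP stands in for the RBFANN; the toy function plays the role of the simulation model.

```python
# Sketch of a weighted ensemble surrogate: three surrogates fit the same
# simulator samples and are combined with validation-error-based weights.
# Inverse-error weighting is a generic stand-in for set pair analysis.
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(80, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2        # toy "simulation model"
Xtr, ytr, Xval, yval = X[:60], y[:60], X[60:], y[60:]

surrogates = [SVR().fit(Xtr, ytr),
              GaussianProcessRegressor().fit(Xtr, ytr),   # Kriging
              MLPRegressor(max_iter=2000).fit(Xtr, ytr)]  # RBFANN stand-in
errs = np.array([np.mean((s.predict(Xval) - yval) ** 2) for s in surrogates])
w = (1 / errs) / (1 / errs).sum()             # better surrogates weigh more
ens_pred = sum(wi * s.predict(Xval) for wi, s in zip(w, surrogates))
print("ensemble MSE:", np.mean((ens_pred - yval) ** 2))
```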

  4. Accurate prediction of bacterial type IV secreted effectors using amino acid composition and PSSM profiles.

    PubMed

    Zou, Lingyun; Nan, Chonghan; Hu, Fuquan

    2013-12-15

    Various human pathogens secrete effector proteins into host cells via the type IV secretion system (T4SS). These proteins play important roles in the interaction between bacteria and hosts. Computational methods for T4SS effector prediction have been developed for screening experimental targets in several isolated bacterial species; however, widely applicable prediction approaches are still unavailable. In this work, four types of distinctive features, namely, amino acid composition, dipeptide composition, position-specific scoring matrix composition, and auto covariance transformation of the position-specific scoring matrix, were calculated from primary sequences. A classifier, T4EffPred, was developed using the support vector machine with these features and their different combinations for effector prediction. Various theoretical tests were performed in a newly established dataset, and the results were measured with four indexes. We demonstrated that T4EffPred can discriminate IVA and IVB effectors in benchmark datasets with positive rates of 76.7% and 89.7%, respectively. The overall accuracy of 95.9% shows that the present method is accurate for distinguishing T4SS effectors in unidentified sequences. A classifier ensemble was designed to synthesize all single classifiers. Notable performance improvement was observed using this ensemble system in benchmark tests. To demonstrate the model's application, a genome-scale prediction of effectors was performed in Bartonella henselae, an important zoonotic pathogen. A number of putative candidates were distinguished. A web server implementing the prediction method and the source code are both available at http://bioinfo.tmmu.edu.cn/T4EffPred.
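    The simplest of the four feature types named above, the amino acid composition (AAC) vector, is easy to make concrete: it is just the frequency of each of the 20 residues in a sequence. The sketch below computes it for a toy sequence (the sequence itself is illustrative only).

```python
# Sketch of the amino acid composition (AAC) feature: the frequency of each
# of the 20 standard residues in a primary sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac_vector(sequence):
    """Return the 20-dimensional amino acid composition of a sequence."""
    seq = sequence.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

features = aac_vector("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")  # toy sequence
print([round(f, 3) for f in features])
```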

  5. A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein.

    PubMed

    Liu, Wei; Zhang, Jingfeng; Fan, Jing-Song; Tria, Giancarlo; Grüber, Gerhard; Yang, Daiwen

    2016-05-10

    Structure ensemble determination is the basis of understanding the structure-function relationship of a multidomain protein with weak domain-domain interactions. Paramagnetic relaxation enhancement has been proven a powerful tool in the study of structure ensembles, but there exist a number of challenges such as spin-label flexibility, domain dynamics, and overfitting. Here we propose a new (to our knowledge) method to describe structure ensembles using a minimal number of conformers. In this method, individual domains are considered rigid; the position of each spin-label conformer and the structure of each protein conformer are defined by three and six orthogonal parameters, respectively. First, the spin-label ensemble is determined by optimizing the positions and populations of spin-label conformers against intradomain paramagnetic relaxation enhancements with a genetic algorithm. Subsequently, the protein structure ensemble is optimized using a more efficient genetic algorithm-based approach and an overfitting indicator, both of which were established in this work. The method was validated using a reference ensemble with a set of conformers whose populations and structures are known. This method was also applied to study the structure ensemble of the tandem di-domain of a poly (U) binding protein. The determined ensemble was supported by small-angle x-ray scattering and nuclear magnetic resonance relaxation data. The ensemble obtained suggests an induced fit mechanism for recognition of target RNA by the protein. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  6. An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images.

    PubMed

    Chin Neoh, Siew; Srisukkham, Worawut; Zhang, Li; Todryk, Stephen; Greystoke, Brigit; Peng Lim, Chee; Alamgir Hossain, Mohammed; Aslam, Nauman

    2015-10-09

    This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of the nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated based on the trade-off of several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means, which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition rates of 96.72% and 96.67% using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method.

  7. An Intelligent Decision Support System for Leukaemia Diagnosis using Microscopic Blood Images

    PubMed Central

    Chin Neoh, Siew; Srisukkham, Worawut; Zhang, Li; Todryk, Stephen; Greystoke, Brigit; Peng Lim, Chee; Alamgir Hossain, Mohammed; Aslam, Nauman

    2015-01-01

    This research proposes an intelligent decision support system for acute lymphoblastic leukaemia diagnosis from microscopic blood images. A novel clustering algorithm with stimulating discriminant measures (SDM) of both within- and between-cluster scatter variances is proposed to produce robust segmentation of the nucleus and cytoplasm of lymphocytes/lymphoblasts. Specifically, the proposed between-cluster evaluation is formulated based on the trade-off of several between-cluster measures of well-known feature extraction methods. The SDM measures are used in conjunction with a Genetic Algorithm for clustering nucleus, cytoplasm, and background regions. Subsequently, a total of eighty features consisting of shape, texture, and colour information of the nucleus and cytoplasm sub-images are extracted. A number of classifiers (multi-layer perceptron, Support Vector Machine (SVM) and Dempster-Shafer ensemble) are employed for lymphocyte/lymphoblast classification. Evaluated with the ALL-IDB2 database, the proposed SDM-based clustering overcomes the shortcomings of Fuzzy C-means, which focuses purely on within-cluster scatter variance. It also outperforms Linear Discriminant Analysis and Fuzzy Compactness and Separation for nucleus-cytoplasm separation. The overall system achieves superior recognition rates of 96.72% and 96.67% using bootstrapping and 10-fold cross validation with Dempster-Shafer and SVM, respectively. The results also compare favourably with those reported in the literature, indicating the usefulness of the proposed SDM-based clustering method. PMID:26450665

  8. A multiphysical ensemble system of numerical snow modelling

    NASA Astrophysics Data System (ADS)

    Lafaysse, Matthieu; Cluzet, Bertrand; Dumont, Marie; Lejeune, Yves; Vionnet, Vincent; Morin, Samuel

    2017-05-01

    Physically based multilayer snowpack models suffer from various modelling errors. To represent these errors, we built the new multiphysical ensemble system ESCROC (Ensemble System Crocus) by implementing new representations of different physical processes in the deterministic coupled multilayer ground/snowpack model SURFEX/ISBA/Crocus. This ensemble was driven and evaluated at Col de Porte (1325 m a.s.l., French Alps) over 18 years with a high-quality meteorological and snow data set. A total of 7776 simulations were evaluated separately, accounting for the uncertainties in the evaluation data. The ability of the ensemble to capture the uncertainty associated with modelling errors is assessed for snow depth, snow water equivalent, bulk density, albedo and surface temperature. Different sub-ensembles of the ESCROC system were studied with probabilistic tools to compare their performance. Results show that optimal members of the ESCROC system are able to explain more than half of the total simulation errors. Integrating members with biases exceeding the range corresponding to observational uncertainty is necessary to obtain an optimal dispersion, but this may also be a consequence of the fact that meteorological forcing uncertainties were not accounted for. The ESCROC system promises the integration of numerical snow-modelling errors in ensemble forecasting and ensemble assimilation systems, in support of avalanche hazard forecasting and other snowpack-modelling applications.

  9. Nationwide validation of ensemble streamflow forecasts from the Hydrologic Ensemble Forecast Service (HEFS) of the U.S. National Weather Service

    NASA Astrophysics Data System (ADS)

    Lee, H. S.; Liu, Y.; Ward, J.; Brown, J.; Maestre, A.; Herr, H.; Fresch, M. A.; Wells, E.; Reed, S. M.; Jones, E.

    2017-12-01

    The National Weather Service's (NWS) Office of Water Prediction (OWP) recently launched a nationwide effort to verify streamflow forecasts from the Hydrologic Ensemble Forecast Service (HEFS) for a majority of forecast locations across the 13 River Forecast Centers (RFCs). Known as the HEFS Baseline Validation (BV), the project involves a joint effort between the OWP and the RFCs. It aims to provide a geographically consistent, statistically robust validation, and a benchmark to guide the operational implementation of the HEFS, inform practical applications, such as impact-based decision support services, and to provide an objective framework for evaluating strategic investments in the HEFS. For the BV, HEFS hindcasts are issued once per day on a 12Z cycle for the period of 1985-2015 with a forecast horizon of 30 days. For the first two weeks, the hindcasts are forced with precipitation and temperature ensemble forecasts from the Global Ensemble Forecast System of the National Centers for Environmental Prediction, and by resampled climatology for the remaining period. The HEFS-generated ensemble streamflow hindcasts are verified using the Ensemble Verification System. Skill is assessed relative to streamflow hindcasts generated from NWS' current operational system, namely climatology-based Ensemble Streamflow Prediction. In this presentation, we summarize the results and findings to date.

  10. A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment

    NASA Astrophysics Data System (ADS)

    Liu, Jingli; Li, Jianping; Xu, Weixuan; Shi, Yong

    Least squares support vector machine (LS-SVM) is a reformulation of the support vector machine (SVM) and has been proved to be a useful tool for pattern recognition. LS-SVM has excellent generalization performance and low computational cost. In this paper, we propose a new method called the two-layer least squares support vector machine, which combines kernel principal component analysis (KPCA) and the linear programming form of the least squares support vector machine. With this method, sparseness and robustness are obtained when solving high-dimensional, large-scale databases. A U.S. commercial credit card database is used to test the efficiency of our method, and the results proved satisfactory.
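    The computational appeal of LS-SVM is that training reduces to a single linear system rather than a quadratic program. The sketch below shows the textbook regression-form LS-SVM applied to ±1 labels; it is not the paper's two-layer KPCA variant, and the RBF kernel and constants are illustrative choices.

```python
# Compact sketch of LS-SVM training: the dual reduces to one linear system
# in (b, alpha). Regression-form LS-SVM on +/-1 labels; RBF kernel and all
# constants are illustrative, not the paper's two-layer variant.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, C=10.0):
    n = len(y)
    K = rbf_kernel(X, X)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K + np.eye(n) / C        # ridge term replaces SVM slack variables
    sol = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]               # bias b, dual weights alpha

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 2))
y = np.sign(X[:, 0] + X[:, 1])           # toy +/-1 labels
b, alpha = lssvm_fit(X, y)
pred = np.sign(rbf_kernel(X, X) @ alpha + b)
print("training accuracy:", (pred == y).mean())
```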

  11. Design of protein switches based on an ensemble model of allostery.

    PubMed

    Choi, Jay H; Laurent, Abigail H; Hilser, Vincent J; Ostermeier, Marc

    2015-04-22

    Switchable proteins that can be regulated through exogenous or endogenous inputs have a broad range of biotechnological and biomedical applications. Here we describe the design of switchable enzymes based on an ensemble allosteric model. First, we insert an enzyme domain into an effector-binding domain such that both domains remain functionally intact. Second, we induce the fusion to behave as a switch through the introduction of conditional conformational flexibility designed to increase the conformational entropy of the enzyme domain in a temperature- or pH-dependent fashion. We confirm the switching behaviour in vitro and in vivo. Structural and thermodynamic studies support the hypothesis that switching results from an increase in conformational entropy of the enzyme domain in the absence of effector. These results support the ensemble model of allostery and embody a strategy for the design of protein switches.

  12. Applications of Bayesian Procrustes shape analysis to ensemble radar reflectivity nowcast verification

    NASA Astrophysics Data System (ADS)

    Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang

    2016-07-01

    This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.
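    The geometric core of the method, Procrustes alignment, can be sketched with SciPy: each ensemble member's precipitation object (represented here as a set of boundary points, which is an assumption of this sketch) is aligned to a reference shape before averaging, which preserves morphology. This is the classical, non-Bayesian alignment step only, not the paper's full Bayesian scheme.

```python
# Sketch of ordinary Procrustes alignment as a morphology-preserving way to
# average ensemble members' precipitation objects (boundary-point outlines).
# Classical alignment only; the paper's Bayesian machinery is not shown.
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(7)
theta = np.linspace(0, 2 * np.pi, 30, endpoint=False)
reference = np.c_[np.cos(theta), np.sin(theta)]       # reference object outline

# Three toy "ensemble members": scaled, shifted, noisy copies of the outline
members = [reference * s + rng.normal(scale=0.05, size=reference.shape) + off
           for s, off in [(1.1, 0.3), (0.9, -0.2), (1.0, 0.5)]]

aligned = [procrustes(reference, m)[1] for m in members]  # remove shift/scale/rotation
mean_shape = np.mean(aligned, axis=0)                     # morphology-preserving mean
print(np.round(mean_shape[:3], 3))
```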

  13. Fixed points, stable manifolds, weather regimes, and their predictability

    DOE PAGES

    Deremble, Bruno; D'Andrea, Fabio; Ghil, Michael

    2009-10-27

    In a simple, one-layer atmospheric model, we study the links between low-frequency variability and the model's fixed points in phase space. The model dynamics is characterized by the coexistence of multiple "weather regimes." To investigate the transitions from one regime to another, we focus on the identification of stable manifolds associated with fixed points. We show that these manifolds act as separatrices between regimes. We track each manifold by making use of two local predictability measures arising from the meteorological applications of nonlinear dynamics, namely, "bred vectors" and singular vectors. These results are then verified in the framework of ensemble forecasts issued from clouds (ensembles) of initial states. The divergence of the trajectories allows us to establish the connections between zones of low predictability, the geometry of the stable manifolds, and transitions between regimes.

  14. Wigner functions for fermions in strong magnetic fields

    NASA Astrophysics Data System (ADS)

    Sheng, Xin-li; Rischke, Dirk H.; Vasak, David; Wang, Qun

    2018-02-01

    We compute the covariant Wigner function for spin-1/2 fermions in an arbitrarily strong magnetic field by exactly solving the Dirac equation at non-zero fermion-number and chiral-charge densities. The Landau energy levels as well as a set of orthonormal eigenfunctions are found as solutions of the Dirac equation. With these orthonormal eigenfunctions we construct the fermion field operators and the corresponding Wigner-function operator. The Wigner function is obtained by taking the ensemble average of the Wigner-function operator in global thermodynamical equilibrium, i.e., at constant temperature T and non-zero fermion-number and chiral-charge chemical potentials μ and μ_5, respectively. Extracting the vector and axial-vector components of the Wigner function, we reproduce the currents of the chiral magnetic and chiral separation effects in an arbitrarily strong magnetic field.
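    For reference, the covariant Wigner function referred to here is conventionally defined as a Fourier transform of the fermion two-point function; one common textbook form (signs and normalization are convention dependent, and may differ from the authors' conventions) is:

```latex
% Standard definition of the covariant Wigner function for spin-1/2 fermions
% (one common convention; signs and normalization vary between references).
W_{\alpha\beta}(x,p) = \int \frac{d^4 y}{(2\pi)^4} \, e^{-i p \cdot y}
  \left\langle \bar{\psi}_\beta\!\left(x + \tfrac{y}{2}\right)
  \psi_\alpha\!\left(x - \tfrac{y}{2}\right) \right\rangle
```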

  15. Massively Parallel Assimilation of TOGA/TAO and Topex/Poseidon Measurements into a Quasi Isopycnal Ocean General Circulation Model Using an Ensemble Kalman Filter

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max

    1999-01-01

    A massively parallel ensemble Kalman filter (EnKF) is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP)'s quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This implementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor process the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the EnKF are also examined.

  16. Tracking Energy Flow Using a Volumetric Acoustic Intensity Imager (VAIM)

    NASA Technical Reports Server (NTRS)

    Klos, Jacob; Williams, Earl G.; Valdivia, Nicolas P.

    2006-01-01

    A new measurement device has been invented at the Naval Research Laboratory which instantaneously images the intensity vector throughout a three-dimensional volume nearly a meter on a side. The measurement device consists of a nearly transparent spherical array of 50 inexpensive microphones optimally positioned on an imaginary spherical surface of radius 0.2 m. Front-end signal processing uses coherence analysis to produce multiple, phase-coherent holograms in the frequency domain, each related to references located on suspect sound sources in an aircraft cabin. The analysis uses either SVD or Cholesky decomposition methods applied to ensemble averages of the cross-spectral density with the fixed references. The holograms are mathematically processed using spherical NAH (nearfield acoustical holography) to convert the measured pressure field into a vector intensity field in the volume of maximum radius 0.4 m centered on the sphere origin. The utility of this probe is evaluated in a detailed analysis of a recent in-flight experiment conducted in cooperation with Boeing and NASA on NASA's Aries 757 aircraft. In this experiment the trim panels and insulation were removed over a section of the aircraft, and the bare panels and windows were instrumented with accelerometers to use as references for the VAIM. Results show excellent success at locating and identifying the sources of interior noise in flight in the frequency range of 0 to 1400 Hz. This work was supported by NASA and the Office of Naval Research.

  17. Development of Gridded Ensemble Precipitation and Temperature Datasets for the Contiguous United States Plus Hawai'i and Alaska

    NASA Astrophysics Data System (ADS)

    Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.

    2016-12-01

    Gridded precipitation and temperature products are inherently uncertain due to myriad factors, including interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this inherent uncertainty, uncertainty estimates are typically not included, or are specific to each dataset without much general applicability across different datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting, and verification. To address this gap, we have developed a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher-resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high-resolution regional climate model simulations. We will also present results on the new high-resolution products for Alaska and Hawaii (2 km and 250 m, respectively), which complete the first ensemble observation-based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.

  18. The NASA Reanalysis Ensemble Service - Advanced Capabilities for Integrated Reanalysis Access and Intercomparison

    NASA Astrophysics Data System (ADS)

    Tamkin, G.; Schnase, J. L.; Duffy, D.; Li, J.; Strong, S.; Thompson, J. H.

    2017-12-01

    NASA's efforts to advance climate analytics-as-a-service are making new capabilities available to the research community: (1) a full-featured Reanalysis Ensemble Service (RES) comprising monthly means data from multiple reanalysis data sets, accessible through an enhanced set of extraction, analytic, arithmetic, and intercomparison operations, exposed through NASA's climate data analytics Web services and our client-side Climate Data Services Python library, CDSlib; (2) a cloud-based, high-performance Virtual Real-Time Analytics Testbed supporting a select set of climate variables, a near real-time capability that enables advanced technologies like Spark and Hadoop-based MapReduce analytics over native NetCDF files; and (3) a WPS-compliant Web service interface to our climate data analytics service that will enable greater interoperability with next-generation systems such as ESGF. The Reanalysis Ensemble Service includes the following:
    - A new API that supports full temporal, spatial, and grid-based resolution services with sample queries
    - A Docker-ready RES application to deploy across platforms
    - Extended capabilities that enable single- and multiple-reanalysis area averages, vertical averages, re-gridding, standard deviations, and ensemble averages
    - Convenient, one-stop shopping for commonly used data products from multiple reanalyses, including basic sub-setting and arithmetic operations (e.g., avg, sum, max, min, var, count, anomaly)
    - Full support for the MERRA-2 reanalysis dataset in addition to ECMWF ERA-Interim, NCEP CFSR, JMA JRA-55 and NOAA/ESRL 20CR…
    - A Jupyter notebook-based distribution mechanism designed for client use cases that combines CDSlib documentation with interactive scenarios and personalized project management
    - Supporting analytic services for NASA GMAO Forward Processing datasets
    - Basic uncertainty quantification services that combine heterogeneous ensemble products with comparative observational products (e.g., reanalysis, observational, visualization)
    - The ability to compute and visualize multiple reanalyses for ease of intercomparison
    - Automated tools to retrieve and prepare data collections for analytic processing

  19. Robust support vector regression networks for function approximation with outliers.

    PubMed

    Chuang, Chen-Chia; Su, Shun-Feng; Jeng, Jin-Tsong; Hsiao, Chih-Ching

    2002-01-01

    Support vector regression (SVR) employs the support vector machine (SVM) to tackle problems of function approximation and regression estimation. SVR has been shown to be robust against noise. However, when the parameters used in SVR are improperly selected, overfitting may still occur, and the selection of these parameters is not straightforward. Moreover, in SVR, outliers may be taken as support vectors, and such an inclusion of outliers can lead to serious overfitting. In this paper, a novel regression approach, termed the robust support vector regression (RSVR) network, is proposed to enhance the robustness of SVR. In this approach, traditional robust learning methods are employed to improve the learning performance for any selected parameters. The simulation results show that RSVR consistently improves the performance of the learned systems. Moreover, even when training lasted for a long period, the testing errors did not increase; in other words, the overfitting phenomenon is indeed suppressed.
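    To make the failure mode concrete, here is a small sketch (using scikit-learn's standard epsilon-SVR, not the RSVR network) showing how gross outliers injected into the training data end up among the support vectors; all data and parameter values are synthetic.

```python
# Sketch: epsilon-SVR on noisy data with injected outliers. An outlier that
# falls outside the epsilon-tube becomes a support vector, which is exactly
# the failure mode RSVR targets. (Illustration only, not the RSVR code.)
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)[:, None]
y = np.sin(x).ravel() + rng.normal(0, 0.1, 200)
y[::25] += 3.0                      # inject gross outliers every 25th sample

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(x, y)
print("fraction of points kept as support vectors:",
      len(model.support_) / len(x))
```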

  20. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces the fuzzy support vector machine, a learning algorithm based on the combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that the fuzzy support vector machine, applied in combination with filter or wrapper feature selection methods, develops a more robust model with higher accuracy than conventional microarray classification models such as the support vector machine, artificial neural network, decision trees, k-nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule base inferred from the fuzzy support vector machine helps extract biological knowledge from microarray data. The fuzzy support vector machine, as a new classification model with high generalization power, robustness, and good interpretability, seems to be a promising tool for gene expression microarray classification.

  1. Short-range solar radiation forecasts over Sweden

    NASA Astrophysics Data System (ADS)

    Landelius, Tomas; Lindskog, Magnus; Körnich, Heiner; Andersson, Sandra

    2018-04-01

    In this article the performance of short-range solar radiation forecasts from the global deterministic and ensemble models of the European Centre for Medium-Range Weather Forecasts (ECMWF) is compared with an ensemble of the regional mesoscale model HARMONIE-AROME used by the national meteorological services in Sweden, Norway and Finland. Note, however, that only the control members and the ensemble means are included in the comparison. The models' resolutions differ considerably: 18 km for the ECMWF ensemble, 9 km for the ECMWF deterministic model, and 2.5 km for the HARMONIE-AROME ensemble. The models share the same radiation code. It turns out that they all systematically underestimate the Direct Normal Irradiance (DNI) for clear-sky conditions. Except for this shortcoming, the HARMONIE-AROME ensemble model shows the best agreement with the distribution of observed Global Horizontal Irradiance (GHI) and DNI values. During mid-day the HARMONIE-AROME ensemble mean performs best. The control member of the HARMONIE-AROME ensemble also scores better than the global deterministic ECMWF model. This is an interesting result, since mesoscale models have so far not shown good results when compared to the ECMWF models. Three days with clear, mixed and cloudy skies are used to illustrate the possible added value of a probabilistic forecast. It is shown that in these cases the mesoscale ensemble could provide decision support to a grid operator in terms of forecasts of both the amount of solar power and its probabilities.

  2. Estimation and correction of different flavors of surface observation biases in ensemble Kalman filter

    NASA Astrophysics Data System (ADS)

    Lorente-Plazas, Raquel; Hacker, Josua P.; Collins, Nancy; Lee, Jared A.

    2017-04-01

    The impact of assimilating surface observations has been shown in several publications to improve weather prediction inside the boundary layer as well as the flow aloft. However, the assimilation of surface observations is often far from optimal due to the presence of both model and observation biases. The sources of these biases can be diverse: an instrumental offset, errors associated with comparing point-based observations to grid-cell averages, etc. To overcome this challenge, a method was developed using the ensemble Kalman filter. The approach consists of representing each observation bias as a parameter. These bias parameters are added to the forward operator and they extend the state vector. As opposed to the observation bias estimation approaches most common in operational systems (e.g. for satellite radiances), the state vector and parameters are simultaneously updated by applying the Kalman filter equations to the augmented state. The method to estimate and correct the observation bias is evaluated using observing system simulation experiments (OSSEs) with the Weather Research and Forecasting (WRF) model. OSSEs are constructed for the conventional observation network including radiosondes, aircraft observations, atmospheric motion vectors, and surface observations. Three different kinds of biases are added to 2-meter temperature for synthetic METARs. From the simplest to the most sophisticated, the imposed biases are: (1) a spatially invariant bias, (2) a spatially varying bias proportional to topographic height differences between the model and the observations, and (3) a bias that is proportional to the temperature. The target region, characterized by complex terrain, is the western U.S. on a domain with 30-km grid spacing. Observations are assimilated every 3 hours using an 80-member ensemble during September 2012. Results demonstrate that the approach is able to estimate and correct the bias when it is spatially invariant (experiment 1). The more complex bias structures in experiments (2) and (3) are more difficult to estimate, but still possible. Estimating the parameter in experiments with unbiased observations results in spatial and temporal parameter variability about zero, and establishes a threshold on the accuracy of the parameter in further experiments. When the observations are biased, the mean parameter value is close to the true bias, but the temporal and spatial variability in the parameter estimates is similar to that obtained when estimating a zero bias in the observations. The distributions are related to other errors in the forecasts, indicating that the parameters are absorbing some of the forecast error from other sources. In this presentation we elucidate the reasons for the resulting parameter estimates and their variability.
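    The core of the augmented-state idea can be sketched in a few lines; the array sizes, prior spread, and forward operator below are illustrative placeholders, not the configuration of the WRF OSSEs described above.

```python
# Sketch of state augmentation for observation-bias estimation: the bias
# parameter beta is appended to the state vector, and the forward operator
# adds it to the simulated observation, so a standard EnKF update applied to
# the augmented state estimates the state and the bias jointly.
import numpy as np

n_state, n_members = 100, 80
rng = np.random.default_rng(0)
state_ens = rng.standard_normal((n_state, n_members))
beta_ens = 0.1 * rng.standard_normal((1, n_members))   # prior bias spread
augmented = np.vstack([state_ens, beta_ens])           # x_aug = [x; beta]

def simulated_obs(x_aug, obs_index):
    # H(x_aug) = x[obs_index] + beta: part of any innovation can now be
    # attributed to the bias parameter rather than to the state itself.
    return x_aug[obs_index] + x_aug[-1]

print(simulated_obs(augmented, obs_index=3).shape)     # one value per member
```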

  3. Slycat™ User Manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crossno, Patricia J.; Gittinger, Jaxon; Hunt, Warren L.

    Slycat™ is a web-based system for performing data analysis and visualization of potentially large quantities of remote, high-dimensional data. Slycat™ specializes in working with ensemble data. An ensemble is a group of related data sets, which typically consists of a set of simulation runs exploring the same problem space. An ensemble can be thought of as a set of samples within a multi-variate domain, where each sample is a vector whose value defines a point in high-dimensional space. To understand and describe the underlying problem being modeled in the simulations, ensemble analysis looks for shared behaviors and common features across the group of runs. Additionally, ensemble analysis tries to quantify differences found in any members that deviate from the rest of the group. The Slycat™ system integrates data management, scalable analysis, and visualization. Results are viewed remotely on a user’s desktop via commodity web clients using a multi-tiered hierarchy of computation and data storage, as shown in Figure 1. Our goal is to operate on data as close to the source as possible, thereby reducing time and storage costs associated with data movement. Consequently, we are working to develop parallel analysis capabilities that operate on High Performance Computing (HPC) platforms, to explore approaches for reducing data size, and to implement strategies for staging computation across the Slycat™ hierarchy. Within Slycat™, data and visual analysis are organized around projects, which are shared by a project team. Project members are explicitly added, each with a designated set of permissions. Although users sign-in to access Slycat™, individual accounts are not maintained. Instead, authentication is used to determine project access. Within projects, Slycat™ models capture analysis results and enable data exploration through various visual representations. Although for scientists each simulation run is a model of real-world phenomena given certain conditions, we use the term model to refer to our modeling of the ensemble data, not the physics. Different model types often provide complementary perspectives on data features when analyzing the same data set. Each model visualizes data at several levels of abstraction, allowing the user to range from viewing the ensemble holistically to accessing numeric parameter values for a single run. Bookmarks provide a mechanism for sharing results, enabling interesting model states to be labeled and saved.

  4. The GMAO Hybrid Ensemble-Variational Atmospheric Data Assimilation System: Version 2.0

    NASA Technical Reports Server (NTRS)

    Todling, Ricardo; El Akkraoui, Amal

    2018-01-01

    This document describes the implementation and usage of the Goddard Earth Observing System (GEOS) Hybrid Ensemble-Variational Atmospheric Data Assimilation System (Hybrid EVADAS). Its aim is to provide comprehensive guidance to users of GEOS ADAS interested in experimenting with its hybrid functionalities. The document is also aimed at providing a short summary of the state-of-science in this release of the hybrid system. As explained here, the ensemble data assimilation system (EnADAS) mechanism added to GEOS ADAS to enable hybrid data assimilation applications has been introduced to the pre-existing machinery of GEOS in the most non-intrusive way possible. Only very minor changes have been made to the original scripts controlling GEOS ADAS, with the objective of facilitating its usage by both researchers and the GMAO's near-real-time Forward Processing applications. In a hybrid scenario two data assimilation systems run concurrently in a two-way feedback mode such that: the ensemble provides background ensemble perturbations required by the ADAS deterministic (typically high resolution) hybrid analysis; and the deterministic ADAS provides analysis information for recentering of the EnADAS analyses and information necessary to ensure that observation bias correction procedures are consistent between both the deterministic ADAS and the EnADAS. The non-intrusive approach to introducing hybrid capability to GEOS ADAS means, in particular, that previously existing features continue to be available. Thus, not only is this upgraded version of GEOS ADAS capable of supporting new applications such as Hybrid 3D-Var, 3D-EnVar, 4D-EnVar and Hybrid 4D-EnVar, it remains possible to use GEOS ADAS in its traditional 3D-Var mode, which has been used in both MERRA and MERRA-2. Furthermore, as described in this document, GEOS ADAS also supports a configuration for exercising a purely ensemble-based assimilation strategy which can be fully decoupled from its variational component. We should point out that Release 1.0 of this document was made available to GMAO in mid-2013, when we introduced the Hybrid 3D-Var capability to GEOS ADAS. That initial version of the documentation included a considerably different state-of-science introductory section but much of the same detailed description of the mechanisms of GEOS EnADAS. We are glad to report that a few of the desirable future works listed in Release 1.0 have now been added to the present version of GEOS EnADAS. These include the ability to exercise an Ensemble Prediction System that uses the ensemble analyses of GEOS EnADAS and (a very early, but functional, version of) a tool to support Ensemble Forecast Sensitivity and Observation Impact applications.

  5. Deep learning ensemble with asymptotic techniques for oscillometric blood pressure estimation.

    PubMed

    Lee, Soojeong; Chang, Joon-Hyuk

    2017-11-01

    This paper proposes a deep learning based ensemble regression estimator with asymptotic techniques, and offers a method that can decrease uncertainty for oscillometric blood pressure (BP) measurements using the bootstrap and Monte-Carlo approach. While the former is used to estimate SBP and DBP, the latter attempts to determine confidence intervals (CIs) for SBP and DBP based on oscillometric BP measurements. This work originally employs deep belief networks (DBN)-deep neural networks (DNN) to effectively estimate BPs based on oscillometric measurements. However, there are some inherent problems with these methods. First, it is not easy to determine the best DBN-DNN estimator, and worthy information might be omitted when selecting one DBN-DNN estimator and discarding the others. Additionally, our input feature vectors, obtained from only five measurements per subject, represent a very small sample size; this is a critical weakness when using the DBN-DNN technique and can cause overfitting or underfitting, depending on the structure of the algorithm. To address these problems, an ensemble with an asymptotic approach (based on combining the bootstrap with the DBN-DNN technique) is utilized to generate the pseudo features needed to estimate the SBP and DBP. In the first stage, the bootstrap-aggregation technique is used to create ensemble parameters. Afterward, the AdaBoost approach is employed for the second-stage SBP and DBP estimation. We then use the bootstrap and Monte-Carlo techniques in order to determine the CIs based on the target BP estimated using the DBN-DNN ensemble regression estimator with the asymptotic technique in the third stage. The proposed method can mitigate estimation uncertainty such as a large standard deviation of error (SDE). Comparing the proposed DBN-DNN ensemble regression estimator with the single DBN-DNN regression estimator, we find that the SDEs of the SBP and DBP are reduced by 0.58 and 0.57 mmHg, respectively. These results indicate that the proposed method enhances the performance by 9.18% and 10.88% compared with the DBN-DNN single estimator. The proposed methodology improves the accuracy of BP estimation and reduces the uncertainty of BP estimation. Copyright © 2017 Elsevier B.V. All rights reserved.
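    A minimal sketch of the bootstrap-aggregation and confidence-interval stages follows, with a generic MLP standing in for the paper's DBN-DNN estimators; the resample count, network size, and percentile-based CI are placeholder assumptions, not the authors' configuration.

```python
# Sketch: bootstrap-aggregated regression plus Monte-Carlo-style CIs.
import numpy as np
from sklearn.neural_network import MLPRegressor

def bagged_bp_estimate(features, targets, new_features, n_boot=30, seed=0):
    rng = np.random.default_rng(seed)
    n = len(targets)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                 # bootstrap resample
        est = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                           random_state=0).fit(features[idx], targets[idx])
        preds.append(est.predict(new_features))
    preds = np.array(preds)
    point = preds.mean(axis=0)                      # ensemble BP estimate
    ci = np.percentile(preds, [2.5, 97.5], axis=0)  # spread-based 95% CI
    return point, ci
```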

  6. Planetary Gears Feature Extraction and Fault Diagnosis Method Based on VMD and CNN.

    PubMed

    Liu, Chang; Cheng, Gang; Chen, Xihui; Pang, Yusong

    2018-05-11

    Given local weak feature information, a novel feature extraction and fault diagnosis method for planetary gears based on variational mode decomposition (VMD), singular value decomposition (SVD), and convolutional neural network (CNN) is proposed. VMD was used to decompose the original vibration signal into mode components. The mode matrix was partitioned into a number of submatrices, and the local feature information contained in each submatrix was extracted as a singular value vector using SVD. The singular value vector matrix corresponding to the current fault state was constructed according to the location of each submatrix. Finally, by training a CNN using singular value vector matrices as inputs, planetary gear fault state identification and classification was achieved. The experimental results confirm that the proposed method can successfully extract local weak feature information and accurately identify different faults. The singular value vector matrices of different fault states show a distinct difference in element size and waveform. The VMD-based partition extraction method is better than ensemble empirical mode decomposition (EEMD), resulting in a higher CNN total recognition rate of 100% with fewer training iterations (14). Further analysis demonstrated that the method can also be applied to the degradation recognition of planetary gears. Thus, the proposed method is an effective feature extraction and fault diagnosis technique for planetary gears.

  7. Inter-model comparison of the landscape determinants of vector-borne disease: implications for epidemiological and entomological risk modeling.

    PubMed

    Lorenz, Alyson; Dhingra, Radhika; Chang, Howard H; Bisanzio, Donal; Liu, Yang; Remais, Justin V

    2014-01-01

    Extrapolating landscape regression models for use in assessing vector-borne disease risk and other applications requires thoughtful evaluation of fundamental model choice issues. To examine the implications of such choices, an analysis was conducted to explore the extent to which disparate landscape models agree in their epidemiological and entomological risk predictions when extrapolated to new regions. Agreement between six literature-drawn landscape models was examined by comparing predicted county-level distributions of either Lyme disease or the Ixodes scapularis vector using Spearman ranked correlation. AUC analyses and multinomial logistic regression were used to assess the ability of these extrapolated landscape models to predict observed national data. Three models based on measures of vegetation, habitat patch characteristics, and herbaceous landcover emerged as effective predictors of observed disease and vector distribution. An ensemble model containing these three models improved precision and predictive ability over the individual models. A priori assessment of qualitative model characteristics effectively identified models that subsequently emerged as better predictors in the quantitative analysis. Both a methodology for quantitative model comparison and a checklist for qualitative assessment of candidate models for extrapolation are provided; both tools aim to improve collaboration between those producing models and those interested in applying them to new areas and research questions.

  8. Planetary Gears Feature Extraction and Fault Diagnosis Method Based on VMD and CNN

    PubMed Central

    Cheng, Gang; Chen, Xihui

    2018-01-01

    Given local weak feature information, a novel feature extraction and fault diagnosis method for planetary gears based on variational mode decomposition (VMD), singular value decomposition (SVD), and convolutional neural network (CNN) is proposed. VMD was used to decompose the original vibration signal into mode components. The mode matrix was partitioned into a number of submatrices, and the local feature information contained in each submatrix was extracted as a singular value vector using SVD. The singular value vector matrix corresponding to the current fault state was constructed according to the location of each submatrix. Finally, by training a CNN using singular value vector matrices as inputs, planetary gear fault state identification and classification was achieved. The experimental results confirm that the proposed method can successfully extract local weak feature information and accurately identify different faults. The singular value vector matrices of different fault states show a distinct difference in element size and waveform. The VMD-based partition extraction method is better than ensemble empirical mode decomposition (EEMD), resulting in a higher CNN total recognition rate of 100% with fewer training iterations (14). Further analysis demonstrated that the method can also be applied to the degradation recognition of planetary gears. Thus, the proposed method is an effective feature extraction and fault diagnosis technique for planetary gears. PMID:29751671

  9. TWSVR: Regression via Twin Support Vector Machine.

    PubMed

    Khemchandani, Reshma; Goyal, Keshav; Chandra, Suresh

    2016-02-01

    Taking motivation from the Twin Support Vector Machine (TWSVM) formulation, Peng (2010) attempted to propose Twin Support Vector Regression (TSVR), where the regressor is obtained via solving a pair of quadratic programming problems (QPPs). In this paper we argue that the TSVR formulation is not in the true spirit of TWSVM. Further, taking motivation from Bi and Bennett (2003), we propose an alternative approach to find a formulation for Twin Support Vector Regression (TWSVR) which is in the true spirit of TWSVM. We show that our proposed TWSVR can be derived from TWSVM for an appropriately constructed classification problem. To check the efficacy of our proposed TWSVR we compare its performance with TSVR and classical Support Vector Regression (SVR) on various regression datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Multiscale asymmetric orthogonal wavelet kernel for linear programming support vector learning and nonlinear dynamic systems identification.

    PubMed

    Lu, Zhao; Sun, Jing; Butts, Kenneth

    2014-05-01

    Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
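    As an illustration of how kernel choice enters support vector learning, the sketch below plugs a translation-invariant, Morlet-style wavelet kernel into a scikit-learn SVR via a callable kernel. This is a generic wavelet kernel chosen for demonstration, not the paper's multiscale asymmetric orthogonal construction, and the dilation parameter a is arbitrary.

```python
# Sketch: a wavelet kernel K(x, z) = prod_i h((x_i - z_i)/a), with
# h(u) = cos(1.75 u) * exp(-u^2 / 2), passed to SVR as a callable.
import numpy as np
from sklearn.svm import SVR

def wavelet_kernel(X, Z, a=1.0):
    diff = (X[:, None, :] - Z[None, :, :]) / a
    return np.prod(np.cos(1.75 * diff) * np.exp(-0.5 * diff ** 2), axis=2)

x = np.linspace(-3, 3, 300)[:, None]
y = np.sinc(x).ravel()                              # toy multiscale target
model = SVR(kernel=wavelet_kernel, C=10.0, epsilon=0.01).fit(x, y)
print("training fit R^2:", model.score(x, y))
```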

  11. A mechatronics platform to study prosthetic hand control using EMG signals.

    PubMed

    Geethanjali, P

    2016-09-01

    In this paper, a low-cost mechatronics platform for the design and development of robotic hands as well as a surface electromyogram (EMG) pattern recognition system is proposed. This paper also explores various EMG classification techniques using a low-cost electronics system in prosthetic hand applications. The proposed platform involves the development of a four-channel EMG signal acquisition system; pattern recognition of acquired EMG signals; and development of a digital controller for a robotic hand. Four-channel surface EMG signals, acquired from ten healthy subjects for six different movements of the hand, were used to analyse pattern recognition in prosthetic hand control. Various time domain features were extracted and grouped into five ensembles to compare the influence of features in feature-selective classifiers (SLR) with widely considered non-feature-selective classifiers, such as neural networks (NN), linear discriminant analysis (LDA) and support vector machines (SVM) applied with different kernels. The results revealed that the average classification accuracy of the SVM with a linear kernel function outperforms the other classifiers with the feature ensembles, Hudgin's feature set, and auto-regression (AR) coefficients. However, the slight improvement in the classification accuracy of the SVM incurs more processing time and memory space in the low-level controller. The Kruskal-Wallis (KW) test also shows that there is no significant difference between the classification performance of SLR with Hudgin's feature set and that of SVM with Hudgin's features along with AR coefficients. In addition, the KW test shows that SLR is better with respect to computation time and memory space, which are vital in a low-level controller. Like SVM with a linear kernel function, the other non-feature-selective classifiers (LDA and NN) also show a slight improvement in performance using twice the features, but with the drawback of increased memory space requirements and time. This prototype facilitated the study of various issues of pattern recognition and identified an efficient classifier, along with a feature ensemble, for the implementation of EMG-controlled prosthetic hands in a laboratory setting at low cost. This platform may help to motivate and facilitate prosthetic hand research in developing countries.

  12. Automated detection of pulmonary nodules in PET/CT images: Ensemble false-positive reduction using a convolutional neural network technique

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Teramoto, Atsushi, E-mail: teramoto@fujita-hu.ac.jp; Fujita, Hiroshi; Yamamuro, Osamu

    Purpose: Automated detection of solitary pulmonary nodules using positron emission tomography (PET) and computed tomography (CT) images shows good sensitivity; however, it is difficult to detect nodules in contact with normal organs, and additional efforts are needed so that the number of false positives (FPs) can be further reduced. In this paper, the authors propose an improved FP-reduction method for the detection of pulmonary nodules in PET/CT images by means of convolutional neural networks (CNNs). Methods: The overall scheme detects pulmonary nodules using both CT and PET images. In the CT images, a massive region is first detected using an active contour filter, which is a type of contrast enhancement filter that has a deformable kernel shape. Subsequently, high-uptake regions detected by the PET images are merged with the regions detected by the CT images. FP candidates are eliminated using an ensemble method; it consists of two feature extractions, one by shape/metabolic feature analysis and the other by a CNN, followed by a two-step classifier, one step being rule based and the other being based on support vector machines. Results: The authors evaluated the detection performance using 104 PET/CT images collected by a cancer-screening program. The sensitivity in detecting candidates at an initial stage was 97.2%, with 72.8 FPs/case. After performing the proposed FP-reduction method, the sensitivity of detection was 90.1%, with 4.9 FPs/case; the proposed method eliminated approximately half the FPs existing in the previous study. Conclusions: An improved FP-reduction scheme using CNN technique has been developed for the detection of pulmonary nodules in PET/CT images. The authors’ ensemble FP-reduction method eliminated 93% of the FPs; their proposed method using CNN technique eliminates approximately half the FPs existing in the previous study. These results indicate that their method may be useful in the computer-aided detection of pulmonary nodules using PET/CT images.

  13. Prediction of lung cancer patient survival via supervised machine learning classification techniques.

    PubMed

    Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B

    2017-12-01

    Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal of enabling comparison of predictive power between the various methods. The prediction is treated as a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as they had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time, with the ultimate goal of informing patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods. Copyright © 2017 Elsevier B.V. All rights reserved.
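    A compact sketch of a custom ensemble over the model families named above, using scikit-learn's VotingRegressor; the synthetic regression data stands in for the SEER attributes, and the hyperparameters are placeholders rather than the paper's tuned configuration.

```python
# Sketch: averaging ensemble over linear, tree, GBM, and SVM regressors,
# scored by cross-validated RMSE on synthetic data.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)
ensemble = VotingRegressor([
    ("lin", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ("gbm", GradientBoostingRegressor(random_state=0)),
    ("svm", SVR(kernel="rbf")),
])
rmse = -cross_val_score(ensemble, X, y, cv=5,
                        scoring="neg_root_mean_squared_error").mean()
print(f"ensemble cross-validated RMSE: {rmse:.2f}")
```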

  14. Assessing Density Functionals Using Many Body Theory for Hybrid Perovskites

    NASA Astrophysics Data System (ADS)

    Bokdam, Menno; Lahnsteiner, Jonathan; Ramberger, Benjamin; Schäfer, Tobias; Kresse, Georg

    2017-10-01

    Which density functional is the "best" for structure simulations of a particular material? A concise, first-principles approach to answer this question is presented. The random phase approximation (RPA)—an accurate many body theory—is used to evaluate various density functionals. To demonstrate and verify the method, we apply it to the hybrid perovskite MAPbI3, a promising new solar cell material. The evaluation is done by first creating finite temperature ensembles for small supercells using RPA molecular dynamics, and then evaluating the variance between the RPA and various approximate density functionals for these ensembles. We find that, contrary to recent suggestions, van der Waals functionals do not improve the description of the material, whereas hybrid functionals and the strongly constrained appropriately normed (SCAN) density functional yield very good agreement with the RPA. Finally, our study shows that in the room temperature tetragonal phase of MAPbI3, the molecules are preferentially parallel to the shorter lattice vectors but reorientation on ps time scales is still possible.

  15. Water demand forecasting: review of soft computing methods.

    PubMed

    Ghalehkhondabi, Iman; Ardjmand, Ehsan; Young, William A; Weckman, Gary R

    2017-07-01

    Demand forecasting plays a vital role in resource management for governments and private companies. Considering the scarcity of water and its inherent constraints, demand management and forecasting in this domain are critically important. Several soft computing techniques have been developed over the last few decades for water demand forecasting. This study focuses on soft computing methods of water consumption forecasting published between 2005 and 2015. These methods include artificial neural networks (ANNs), fuzzy and neuro-fuzzy models, support vector machines, metaheuristics, and system dynamics. Furthermore, while ANNs have been superior in many short-term forecasting cases, it is still very difficult to pick a single method as the overall best. According to the literature, various methods and their hybrids are applied to water demand forecasting. However, it seems soft computing has much more to contribute to water demand forecasting. These contribution areas include, but are not limited to, various ANN architectures, unsupervised methods, deep learning, various metaheuristics, and ensemble methods. Moreover, it is found that soft computing methods are mainly used for short-term demand forecasting.

  16. Gait recognition based on Gabor wavelets and modified gait energy image for human identification

    NASA Astrophysics Data System (ADS)

    Huang, Deng-Yuan; Lin, Ta-Wei; Hu, Wu-Chih; Cheng, Chih-Hsiang

    2013-10-01

    This paper proposes a method for recognizing human identity using gait features based on Gabor wavelets and modified gait energy images (GEIs). Identity recognition by gait generally involves gait representation, extraction, and classification. In this work, a modified GEI convolved with an ensemble of Gabor wavelets is proposed as a gait feature. Principal component analysis is then used to project the Gabor-wavelet-based gait features into a lower-dimensional feature space for subsequent classification. Finally, support vector machine classifiers based on a radial basis function kernel are trained and utilized to recognize human identity. The major contributions of this paper are as follows: (1) the consideration of the shadow effect to yield a more complete segmentation of gait silhouettes; (2) the utilization of motion estimation to track people when walkers overlap; and (3) the derivation of modified GEIs to extract more useful gait information. Extensive performance evaluation shows a great improvement in recognition accuracy due to the use of shadow removal, motion estimation, and gait representation using the modified GEIs and Gabor wavelets.
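    The recognition back-end described above (PCA projection followed by RBF-kernel SVM classification) can be sketched as a scikit-learn pipeline; the component count and C value are placeholder assumptions, and the upstream silhouette/GEI/Gabor feature extraction is assumed to have already produced a feature matrix.

```python
# Sketch of the PCA + RBF-SVM recognition stage for precomputed gait features.
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

gait_classifier = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),          # project into a lower-dimensional space
    SVC(kernel="rbf", C=10.0),     # RBF-kernel SVM for identity classification
)
# Hypothetical usage, with gabor_gei_features of shape (n_samples, n_features):
# gait_classifier.fit(gabor_gei_features, subject_ids)
```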

  17. Knee Joint Vibration Signal Analysis with Matching Pursuit Decomposition and Dynamic Weighted Classifier Fusion

    PubMed Central

    Cai, Suxian; Yang, Shanshan; Zheng, Fang; Lu, Meng; Wu, Yunfeng; Krishnan, Sridhar

    2013-01-01

    Analysis of knee joint vibration (VAG) signals can provide quantitative indices for detection of knee joint pathology at an early stage. In addition to the statistical features developed in related previous studies, we extracted two separable features, namely, the number of atoms derived from the wavelet matching pursuit decomposition and the number of significant signal turns detected with a fixed threshold in the time domain. To perform a better classification over the data set of 89 VAG signals, we applied a novel classifier fusion system based on the dynamic weighted fusion (DWF) method to ameliorate the classification performance. For comparison, a single least-squares support vector machine (LS-SVM) and the Bagging ensemble were used for the classification task as well. The results in terms of overall accuracy in percentage and area under the receiver operating characteristic curve obtained with the DWF-based classifier fusion method reached 88.76% and 0.9515, respectively, which demonstrates the effectiveness and superiority of the DWF method with two distinct features for VAG signal analysis. PMID:23573175
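    A minimal sketch of the weighted-fusion idea follows: member outputs are combined with reliability weights that can be renormalized whenever the reliability estimates are updated. This is a generic reading of dynamic weighted fusion, not the authors' exact update rule, and the weights shown are arbitrary.

```python
# Sketch: reliability-weighted soft voting over classifier outputs.
import numpy as np

def fuse(probas, weights):
    """probas: (n_classifiers, n_classes) outputs for one sample."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                               # renormalize current weights
    return (w[:, None] * probas).sum(axis=0)   # weighted class posterior

p = np.array([[0.8, 0.2],      # e.g., an LS-SVM-style member
              [0.4, 0.6]])     # e.g., a Bagging-style member
print(fuse(p, weights=[0.9, 0.5]))   # fused posterior favors member 1
```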

  18. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks

    PubMed Central

    Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

    2016-01-01

    The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. The experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and in the types of abnormal attacks found, but also provides an effective tool for the study and analysis of intrusion detection in large networks. PMID:27754380

  19. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks.

    PubMed

    Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

    2016-10-13

    The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. The experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and in the types of abnormal attacks found, but also provides an effective tool for the study and analysis of intrusion detection in large networks.
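    To make the two-stage flow concrete, here is a rough sketch in which k-means stands in for the spectral-clustering stage (so that test points can be routed to cluster centres) and a small MLP stands in for each DNN; everything here is a simplified stand-in for the SCDNN pipeline, not the authors' implementation.

```python
# Rough sketch of an SCDNN-style flow: partition training data into k subsets
# around cluster centres, train one network per subset, then route each test
# point to the nearest centre and use that subset's network for prediction.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

def fit_clustered_dnn(X, y, k=4, seed=0):
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    nets = [MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                          random_state=seed).fit(X[km.labels_ == c],
                                                 y[km.labels_ == c])
            for c in range(k)]
    return km, nets

def predict_clustered_dnn(km, nets, X):
    routed = km.predict(X)                      # nearest cluster centre
    return np.array([nets[c].predict(x[None, :])[0]
                     for c, x in zip(routed, X)])
```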

  20. Comparison of machine-learning methods for above-ground biomass estimation based on Landsat imagery

    NASA Astrophysics Data System (ADS)

    Wu, Chaofan; Shen, Huanhuan; Shen, Aihua; Deng, Jinsong; Gan, Muye; Zhu, Jinxia; Xu, Hongwei; Wang, Ke

    2016-07-01

    Biomass is a significant biophysical parameter of a forest ecosystem, and accurate biomass estimation on the regional scale provides important information for carbon-cycle investigation and sustainable forest management. In this study, Landsat satellite imagery combined with field-based measurements was used to compare five regression approaches [stepwise linear regression, K-nearest neighbor, support vector regression, random forest (RF), and stochastic gradient boosting] with two different candidate-variable strategies in order to derive an optimal spatial estimate of above-ground biomass (AGB). The results suggest that the RF algorithm exhibited the best performance under 10-fold cross-validation with respect to R2 (0.63) and root-mean-square error (26.44 ton/ha). Consequently, a map of estimated AGB was generated, with a mean value of 89.34 ton/ha in northwestern Zhejiang Province, China, and a pattern similar to the distribution of local forest species. This research indicates that machine-learning approaches associated with Landsat imagery provide an economical way to estimate biomass. Moreover, ensemble methods using all candidate variables, especially for Landsat images, provide an alternative for regional biomass simulation.
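    A compact sketch of this kind of cross-validated comparison, using scikit-learn's built-in regressors with synthetic data in place of the Landsat predictors; model settings are library defaults, not the tuned configurations of the study.

```python
# Sketch: 10-fold cross-validated R^2 comparison of several regressors.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=1)
for name, model in [("kNN", KNeighborsRegressor()),
                    ("SVR", SVR()),
                    ("RF", RandomForestRegressor(random_state=1)),
                    ("SGB", GradientBoostingRegressor(random_state=1))]:
    r2 = cross_val_score(model, X, y, cv=10, scoring="r2").mean()
    print(f"{name}: mean 10-fold R^2 = {r2:.2f}")
```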

  1. On the Local Equivalence Between the Canonical and the Microcanonical Ensembles for Quantum Spin Systems

    NASA Astrophysics Data System (ADS)

    Tasaki, Hal

    2018-06-01

    We study a quantum spin system on the d-dimensional hypercubic lattice Λ with N = L^d sites with periodic boundary conditions. We take an arbitrary translation-invariant short-ranged Hamiltonian. For this system, we consider both the canonical ensemble with inverse temperature β_0 and the microcanonical ensemble with the corresponding energy U_N(β_0). For an arbitrary self-adjoint operator Â whose support is contained in a hypercubic block B inside Λ, we prove that the expectation values of Â with respect to these two ensembles are close to each other for large N, provided that β_0 is sufficiently small and the number of sites in B is o(N^{1/2}). This establishes the equivalence of ensembles on the level of local states in a large but finite system. The result is essentially that of Brandao and Cramer (here restricted to the case of the canonical and the microcanonical ensembles), but we prove improved estimates in an elementary manner. We also review and prove standard results on the thermodynamic limits of thermodynamic functions and the equivalence of ensembles in terms of thermodynamic functions. The present paper assumes only elementary knowledge of quantum statistical mechanics and quantum spin systems.
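    Restated compactly in LaTeX (using the abstract's notation; the arrow is shorthand for the quantitative bound proved in the paper):

```latex
\[
\left| \langle \hat{A} \rangle^{\mathrm{can}}_{\beta_0}
     - \langle \hat{A} \rangle^{\mathrm{mc}}_{U_N(\beta_0)} \right|
\longrightarrow 0 \quad (N \to \infty),
\qquad \text{for } \beta_0 \text{ sufficiently small and } |B| = o(N^{1/2}).
\]
```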

  2. A Code Generation Approach for Auto-Vectorization in the Spade Compiler

    NASA Astrophysics Data System (ADS)

    Wang, Huayong; Andrade, Henrique; Gedik, Buğra; Wu, Kun-Lung

    We describe an auto-vectorization approach for the Spade stream processing programming language, comprising two ideas. First, we provide support for vectors as a primitive data type. Second, we provide a C++ library with architecture-specific implementations of a large number of pre-vectorized operations as the means to support language extensions. We evaluate our approach with several stream processing operators, contrasting Spade's auto-vectorization with the native auto-vectorization provided by the GNU gcc and Intel icc compilers.

  3. Lessons Learned from Assimilating Altimeter Data into a Coupled General Circulation Model with the GMAO Augmented Ensemble Kalman Filter

    NASA Technical Reports Server (NTRS)

    Keppenne, Christian; Vernieres, Guillaume; Rienecker, Michele; Jacob, Jossy; Kovach, Robin

    2011-01-01

    Satellite altimetry measurements have provided global, evenly distributed observations of the ocean surface since 1993. However, the difficulties introduced by the presence of model biases and the requirement that data assimilation systems extrapolate the sea surface height (SSH) information to the subsurface in order to estimate the temperature, salinity and currents make it difficult to optimally exploit these measurements. This talk investigates the potential of the altimetry data assimilation once the biases are accounted for with an ad hoc bias estimation scheme. Either steady-state or state-dependent multivariate background-error covariances from an ensemble of model integrations are used to address the problem of extrapolating the information to the sub-surface. The GMAO ocean data assimilation system applied to an ensemble of coupled model instances using the GEOS-5 AGCM coupled to MOM4 is used in the investigation. To model the background error covariances, the system relies on a hybrid ensemble approach in which a small number of dynamically evolved model trajectories is augmented on the one hand with past instances of the state vector along each trajectory and, on the other, with a steady state ensemble of error estimates from a time series of short-term model forecasts. A state-dependent adaptive error-covariance localization and inflation algorithm controls how the SSH information is extrapolated to the sub-surface. A two-step predictor corrector approach is used to assimilate future information. Independent (not-assimilated) temperature and salinity observations from Argo floats are used to validate the assimilation. A two-step projection method in which the system first calculates a SSH increment and then projects this increment vertically onto the temperature, salt and current fields is found to be most effective in reconstructing the sub-surface information. The performance of the system in reconstructing the sub-surface fields is particularly impressive for temperature, but not as satisfactory for salt.

  4. Testing a multi-malaria-model ensemble against 30 years of data in the Kenyan highlands

    PubMed Central

    2014-01-01

    Background: Multi-model ensembles could overcome challenges resulting from uncertainties in models’ initial conditions, parameterization and structural imperfections. They could also quantify in a probabilistic way uncertainties in future climatic conditions and their impacts.
    Methods: A four-malaria-model ensemble was implemented to assess the impact of long-term changes in climatic conditions on Plasmodium falciparum malaria morbidity observed in Kericho, in the highlands of Western Kenya, over the period 1979–2009. Input data included quality-controlled temperature and rainfall records gathered at a nearby weather station over the historical periods 1979–2009 and 1980–2009, respectively. Simulations included models’ sensitivities to changes in sets of parameters and analysis of non-linear changes in the mean duration of the host’s infectivity to vectors due to increased resistance to anti-malarial drugs.
    Results: The ensemble explained from 32 to 38% of the variance of the observed P. falciparum malaria incidence. The obtained R2-values were above the results achieved with individual model simulation outputs. Up to 18.6% of the variance of malaria incidence could be attributed to the +0.19 to +0.25°C per decade significant long-term linear trend in near-surface air temperatures. On top of this 18.6%, at least 6% of the variance of malaria incidence could be related to the increased resistance to anti-malarial drugs. Ensemble simulations also suggest that climatic conditions have likely been less favourable to malaria transmission in Kericho in recent years.
    Conclusions: Long-term changes in climatic conditions and non-linear changes in the mean duration of the host’s infectivity are synergistically driving the increasing incidence of P. falciparum malaria in the Kenyan highlands. User-friendly, online-downloadable, open source mathematical tools, such as the one presented here, could improve the decision-making processes of local and regional health authorities. PMID:24885824

  5. Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data.

    PubMed

    Li, Jiuyong; Liu, Lin; Liu, Jixue; Green, Ryan

    2017-12-01

    It is common for a trained classification model to be applied to operating data that deviates from the training data because of noise. This paper tests an ensemble method, Diversified Multiple Tree (DMT), on its capability for classifying instances from a new laboratory using a classifier built on the instances of another laboratory. DMT is tested on three real-world biomedical data sets from different laboratories, in comparison with four benchmark ensemble methods: AdaBoost, Bagging, Random Forests, and Random Trees. Experiments have also been conducted to study the limitations of DMT and its possible variations. Experimental results show that DMT is significantly more accurate than the other benchmark ensemble classifiers when classifying new instances from a laboratory different from the one whose instances were used to build the classifier. This paper demonstrates that the ensemble classifier DMT is more robust in classifying noisy data than other widely used ensemble methods. DMT works on data sets that support building multiple simple trees.

  6. The joint methane profiles retrieval approach from GOSAT TIR and SWIR spectra

    NASA Astrophysics Data System (ADS)

    Zadvornykh, Ilya V.; Gribanov, Konstantin G.; Zakharov, Vyacheslav I.; Imasu, Ryoichi

    2017-11-01

    In this paper we present a method, using methane as an example, which allows more accurate retrieval of greenhouse gases in the Earth's atmosphere. Using the new version of the FIRE-ARMS software, supplemented with the VLIDORT vector radiative transfer model, we carried out joint methane retrieval from TIR (Thermal Infrared Range) and SWIR (Short-Wavelength Infrared Range) GOSAT spectra using the optimal estimation method. MACC reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF), supplemented by data from aircraft measurements of the HIPPO experiment, were used as a statistical ensemble.

  7. Observability under recurrent loss of data

    NASA Technical Reports Server (NTRS)

    Luck, Rogelio; Ray, Asok; Halevi, Yoram

    1992-01-01

    An account is given of the concept of extended observability in finite-dimensional linear time-invariant systems under recurrent loss of data, where the state vector has to be reconstructed from an ensemble of sensor data at nonconsecutive samples. A necessary and sufficient condition for extended observability, which can be expressed via a recursive relation, is presented, together with related conditions stated in terms of the characteristic polynomial of the state transition matrix in a discrete-time setting, or of the system matrix in a continuous-time setting.
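    The flavor of such a condition can be illustrated numerically: stack the output map at only the sample times that actually arrived and test the rank of the result. The matrices below form a toy discrete-time example chosen for illustration, not taken from the paper.

```python
# Sketch: observability from nonconsecutive samples. Stack C A^k only at the
# received sample indices and check whether the stacked matrix has full rank.
import numpy as np

def observable_from_samples(A, C, sample_times):
    blocks = [C @ np.linalg.matrix_power(A, k) for k in sample_times]
    O = np.vstack(blocks)
    return np.linalg.matrix_rank(O) == A.shape[0]

A = np.array([[1.0, 1.0], [0.0, 1.0]])     # discrete-time double integrator
C = np.array([[1.0, 0.0]])                 # position-only measurement
print(observable_from_samples(A, C, [0, 2]))   # True: the gap is tolerated
```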

  8. Multimode cavity-assisted quantum storage via continuous phase-matching control

    NASA Astrophysics Data System (ADS)

    Kalachev, Alexey; Kocharovskaya, Olga

    2013-09-01

    A scheme for spatial multimode quantum memory is developed such that the spatial-temporal structure of a weak signal pulse can be stored and recalled via cavity-assisted off-resonant Raman interaction with a strong angular-modulated control field in an extended Λ-type atomic ensemble. It is shown that effective multimode storage is possible when the Raman coherence spatial grating involves wave vectors with different longitudinal components relative to the paraxial signal field. The possibilities of implementing the scheme in solid-state materials are discussed.

  9. Timelike pion form factor in lattice QCD

    NASA Astrophysics Data System (ADS)

    Feng, Xu; Aoki, Sinya; Hashimoto, Shoji; Kaneko, Takashi

    2015-03-01

    We perform a nonperturbative lattice calculation of the complex phase and modulus of the pion form factor in the timelike momentum region using the finite-volume technique. We use two ensembles of 2+1-flavor overlap fermions at pion masses mπ = 380 and 290 MeV. By calculating the I = 1 correlators in the center-of-mass and three moving frames, we obtain the form factor at ten different values of the timelike momentum transfer around the vector resonance. We compare the results with the phenomenological model of Gounaris-Sakurai and its variant.

  10. The role of ensemble-based statistics in variational assimilation of cloud-affected observations from infrared imagers

    NASA Astrophysics Data System (ADS)

    Hacker, Joshua; Vandenberghe, Francois; Jung, Byoung-Jo; Snyder, Chris

    2017-04-01

    Effective assimilation of cloud-affected radiance observations from space-borne imagers, with the aim of improving cloud analysis and forecasting, has proven to be difficult. Large observation biases, nonlinear observation operators, and non-Gaussian innovation statistics present many challenges. Ensemble-variational data assimilation (EnVar) systems offer the benefits of flow-dependent background error statistics from an ensemble, and the ability of variational minimization to handle nonlinearity. The specific benefits of ensemble statistics, relative to the static background errors more commonly used in variational systems, have not been quantified for the problem of assimilating cloudy radiances. A simple experiment framework is constructed with a regional NWP model and an operational variational data assimilation system, to provide a basis for understanding the importance of ensemble statistics in cloudy radiance assimilation. Restricting the observations to those corresponding to clouds in the background forecast leads to innovations that are more Gaussian. The number of large innovations is reduced compared to the more general case of all observations, but not eliminated. The Huber norm is investigated to handle the fat tails of the distributions and to allow more observations to be assimilated without the need for strict background checks that eliminate them. Comparing assimilation using only ensemble background error statistics with assimilation using only static background error statistics elucidates the importance of the ensemble statistics. Although the cost functions in both experiments converge to similar values after sufficient outer-loop iterations, the resulting cloud water, ice, and snow content are greater in the ensemble-based analysis. The subsequent forecasts from the ensemble-based analysis also retain more condensed water species, indicating that the local environment is more supportive of clouds. In this presentation we provide details that explain the apparent benefit of using ensembles for cloudy radiance assimilation in an EnVar context.

  11. Are Bred Vectors The Same As Lyapunov Vectors?

    NASA Astrophysics Data System (ADS)

    Kalnay, E.; Corazza, M.; Cai, M.

    Regional loss of predictability is an indication of the instability of the underlying flow, where small errors in the initial conditions (or imperfections in the model) grow to large amplitudes in finite times. The stability properties of evolving flows have been studied using Lyapunov vectors (e.g., Alligood et al., 1996; Ott, 1993; Kalnay, 2002), singular vectors (e.g., Lorenz, 1965; Farrell, 1988; Molteni and Palmer, 1993), and, more recently, bred vectors (e.g., Szunyogh et al., 1997; Cai et al., 2001). Bred vectors (BVs) are, by construction, closely related to Lyapunov vectors (LVs). In fact, after an infinitely long breeding time, and with the use of infinitesimal amplitudes, bred vectors are identical to leading Lyapunov vectors. In practical applications, however, bred vectors differ from Lyapunov vectors in two important ways: (a) bred vectors are never globally orthogonalized and are intrinsically local in space and time, and (b) they are finite-amplitude, finite-time vectors. These two differences are very significant in a dynamical system whose size is very large. For example, the atmosphere is large enough to have "room" for several synoptic-scale instabilities (e.g., storms) to develop independently in different regions (say, North America and Australia), and it is complex enough to have several different possible types of instabilities (such as barotropic, baroclinic, convective, and even Brownian motion).

    Bred vectors share some of their properties with leading LVs (Corazza et al., 2001a, 2001b; Toth and Kalnay, 1993, 1997; Cai et al., 2001). For example: 1) Bred vectors are independent of the norm used to define the size of the perturbation. Corazza et al. (2001) showed that bred vectors obtained using a potential enstrophy norm were indistinguishable from bred vectors obtained using a streamfunction squared norm, in contrast with singular vectors. 2) Bred vectors are independent of the length of the rescaling period as long as the perturbations remain approximately linear (for example, for atmospheric models the interval for rescaling could be varied between a single time step and 1 day without qualitatively affecting the characteristics of the bred vectors).

    However, the finite amplitude, finite time, and lack of orthogonalization of the BVs introduce important differences with LVs: 1) In regions that undergo strong instabilities, the bred vectors tend to be locally dominated by simple, low-dimensional structures. Patil et al. (2001) showed that the BV-dim (see the appendix, and the numerical sketch below) gives a good estimate of the number of dominant directions (shapes) of the k local bred vectors. For example, if half of them are aligned in one direction, and half in a different direction, the BV-dim is about two. If the majority of the bred vectors are aligned predominantly in one direction and only a few are aligned in a second direction, then the BV-dim is between 1 and 2. Patil et al. (2001) showed that the regions with low dimensionality cover about 20% of the atmosphere. They also found that these low-dimensionality regions have a very well defined vertical structure and a typical lifetime of 3-7 days. The low dimensionality identifies regions where the instability of the basic flow has manifested itself in a low number of preferred directions of perturbation growth. 2) Using a quasi-geostrophic data assimilation simulation system developed by Morss (1999), Corazza et al. (2001a, b) found that bred vectors have structures that closely resemble the background (short forecasts used as first guess) errors, which in turn dominate the local analysis errors. This is especially true in regions of low dimensionality, which is not surprising if these are unstable regions where errors grow in preferred shapes. 3) The number of bred vectors needed to represent the unstable subspace in the QG system is small (about 6-10). This was shown by computing the local BV-dim as a function of the number of independent bred vectors. Convergence in the local dimension starts to occur at about 6 BVs, and is essentially complete when the number of vectors is about 10-15 (Corazza et al., 2001a). This should be contrasted with the results of Snyder and Joly (1998) and Palmer et al. (1998), who showed that hundreds of Lyapunov vectors with positive Lyapunov exponents are needed to represent the attractor of the system in quasi-geostrophic models. 4) Since only a few bred vectors are needed, and background errors project strongly on the subspace of bred vectors, Corazza et al. (2001b) were able to develop cost-efficient methods to improve 3D-Var data assimilation by adding to the background error covariance terms proportional to the outer product of the bred vectors, thus representing the "errors of the day". This approach led to a reduction of analysis error variance of about 40% at very low cost. 5) The fact that BVs have finite amplitude provides a natural way to filter out instabilities present in the system that have fast growth but saturate nonlinearly at such small amplitudes that they are irrelevant for ensemble perturbations. As shown by Lorenz (1996), Lyapunov vectors (and singular vectors) of models including these physical phenomena would be dominated by the fast but small-amplitude instabilities, unless these are explicitly excluded from the linearized models. Bred vectors, on the other hand, through the choice of an appropriate size for the perturbation, provide a natural filter based on nonlinear saturation of fast but irrelevant instabilities. 6) Every bred vector is qualitatively similar to the *leading* LV. LVs beyond the leading LV are obtained by orthogonalization after each time step with respect to the subspace of the previous LVs. The orthogonalization requires the introduction of a norm. With an enstrophy norm, the successive LVs have larger and larger horizontal scales, and a choice of a streamfunction norm would lead to successively smaller scales in the LVs. Beyond the first few LVs, there is little qualitative similarity between the background errors and the LVs.

    In summary, in a system like the atmosphere, with enough physical space for several independent local instabilities, BVs and LVs share some properties but also have significant differences. BVs are finite-amplitude, finite-time and, because they are not globally orthogonalized, have local properties in space. Bred vectors are akin to the leading LV, but bred vectors derived from different arbitrary initial perturbations remain distinct from each other, instead of collapsing into a single leading vector, presumably because the nonlinear terms and physical parameterizations introduce sufficient stochastic forcing to avoid such convergence. As a result, there is no need for global orthogonalization, and the number of bred vectors required to describe the natural instabilities in an atmospheric system (from a local point of view) is much smaller than the number of Lyapunov vectors with positive Lyapunov exponents. The BVs are independent of the norm, whereas the LVs beyond the first one do depend on the choice of norm: for example, they become larger in scale with a vorticity norm and smaller with a streamfunction norm. These properties of BVs result in significant advantages for data assimilation and ensemble forecasting for the atmosphere. Errors in the analysis have structures very similar to bred vectors, and it is found that they project very strongly on the subspace of a few bred vectors. This is not true either for Lyapunov vectors beyond the leading LVs, or for singular vectors unless they are constructed with a norm based on the analysis error covariance matrix (or a bred vector covariance). The similarity between bred vectors and analysis errors leads to the ability to include "errors of the day" in the background error covariance and a significant improvement of the analysis beyond 3D-Var at a very low cost (Corazza et al., 2001b).

    References:
    Alligood, K. T., T. D. Sauer and J. A. Yorke, 1996: Chaos: An Introduction to Dynamical Systems. Springer-Verlag, New York.
    Buizza, R., J. Tribbia, F. Molteni and T. Palmer, 1993: Computation of optimal unstable structures for numerical weather prediction models. Tellus, 45A, 388-407.
    Cai, M., E. Kalnay and Z. Toth, 2001: Potential impact of bred vectors on ensemble forecasting and data assimilation in the Zebiak-Cane model. Submitted to J. Climate.
    Corazza, M., E. Kalnay, D. J. Patil, R. Morss, M. Cai, I. Szunyogh, B. R. Hunt, E. Ott and J. Yorke, 2001: Use of the breeding technique to determine the structure of the "errors of the day". Submitted to Nonlinear Processes in Geophysics.
    Corazza, M., E. Kalnay, D. J. Patil, E. Ott, J. Yorke, I. Szunyogh and M. Cai, 2001: Use of the breeding technique in the estimation of the background error covariance matrix for a quasigeostrophic model. AMS Symposium on Observations, Data Assimilation and Predictability, preprints volume, Orlando, FL, 14-17 January 2002.
    Farrell, B., 1988: Small error dynamics and the predictability of atmospheric flow. J. Atmos. Sci., 45, 163-172.
    Kalnay, E., 2002: Atmospheric Modeling, Data Assimilation and Predictability. Chapter 6. Cambridge University Press, UK. In press.
    Kalnay, E. and Z. Toth, 1994: Removing growing errors in the analysis. Preprints, Tenth Conference on Numerical Weather Prediction, 212-215. Amer. Meteor. Soc., July 18-22, 1994.
    Lorenz, E. N., 1965: A study of the predictability of a 28-variable atmospheric model. Tellus, 21, 289-307.
    Lorenz, E. N., 1996: Predictability: a problem partly solved. Proceedings of the ECMWF Seminar on Predictability, Reading, England, Vol. 1, 1-18.
    Molteni, F. and T. N. Palmer, 1993: Predictability and finite-time instability of the northern winter circulation. Q. J. Roy. Meteorol. Soc., 119, 269-298.
    Morss, R. E., 1999: Adaptive observations: Idealized sampling strategies for improving numerical weather prediction. Ph.D. thesis, Massachusetts Institute of Technology, 225 pp.
    Ott, E., 1993: Chaos in Dynamical Systems. Cambridge University Press, New York.
    Palmer, T. N., R. Gelaro, J. Barkmeijer and R. Buizza, 1998: Singular vectors, metrics and adaptive observations. J. Atmos. Sci., 55, 633-653.
    Patil, D. J., B. R. Hunt, E. Kalnay, J. Yorke and E. Ott, 2001: Local low dimensionality of atmospheric dynamics. Phys. Rev. Lett., 86, 5878.
    Patil, D. J., I. Szunyogh, B. R. Hunt, E. Kalnay, E. Ott and J. Yorke, 2001: Using large member ensembles to isolate local low dimensionality of atmospheric dynamics. AMS Symposium on Observations, Data Assimilation and Predictability, preprints volume, Orlando, FL, 14-17 January 2002.
    Snyder, C. and A. Joly, 1998: Development of perturbations within growing baroclinic waves. Q. J. Roy. Meteor. Soc., 124, 1961.
    Szunyogh, I., E. Kalnay and Z. Toth, 1997: A comparison of Lyapunov and singular vectors in a low-resolution GCM. Tellus, 49A, 200-227.
    Toth, Z. and E. Kalnay, 1993: Ensemble forecasting at NMC: the generation of perturbations. Bull. Amer. Meteorol. Soc., 74, 2317-2330.
    Toth, Z. and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297-3319.

    Corresponding author address: Eugenia Kalnay, Meteorology Department, University of Maryland, College Park, MD 20742-2425, USA; email: ekalnay@atmos.umd.edu

    Appendix: BV-dimension. Patil et al. (2001) defined local bred vectors around a point in the 3-dimensional grid of the model by taking the 24 closest horizontal neighbors. If there are k bred vectors available, and N model variables for each grid point, the k local bred vectors form the columns of a 25N x k matrix B. The k x k covariance matrix is C = B^T B. Its eigenvalues are positive, and its eigenvectors v(i) are the singular vectors of the local bred-vector subspace. The bred vector dimension (BV-dim) measures the local effective dimension: BV-dim[s(1), ..., s(k)] = {SUM_i s(i)}^2 / SUM_i [s(i)]^2, where s(i) are the square roots of the eigenvalues of the covariance matrix.
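
    A minimal numerical sketch of the BV-dimension defined in the appendix (the array shapes, the use of numpy, and the synthetic example are our own assumptions; the formula is the one given in the record):

      # BV-dimension of Patil et al. (2001): the k local bred vectors form the
      # columns of a (25*N) x k matrix B; the s(i) are the singular values of B,
      # i.e. the square roots of the eigenvalues of C = B^T B.
      import numpy as np

      def bv_dimension(B: np.ndarray) -> float:
          """B has shape (25 * N, k): one column per local bred vector."""
          s = np.linalg.svd(B, compute_uv=False)
          return s.sum() ** 2 / (s ** 2).sum()

      # Synthetic check: 10 bred vectors clustered around one direction
      # should give a BV-dim close to 1.
      rng = np.random.default_rng(0)
      direction = rng.standard_normal(25 * 3)   # N = 3 variables per grid point
      B = np.outer(direction, np.ones(10))
      B += 0.05 * rng.standard_normal(B.shape)
      print(bv_dimension(B))                    # close to 1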

  12. Signal detection using support vector machines in the presence of ultrasonic speckle

    NASA Astrophysics Data System (ADS)

    Kotropoulos, Constantine L.; Pitas, Ioannis

    2002-04-01

    Support Vector Machines are a general algorithm based on guaranteed risk bounds of statistical learning theory. They have found numerous applications, such as in the classification of brain PET images, optical character recognition, object detection, face verification, and text categorization. In this paper we propose the use of support vector machines to segment lesions in ultrasound images, and we assess their lesion detection ability thoroughly. We demonstrate that trained support vector machines with a Radial Basis Function kernel satisfactorily segment (unseen) ultrasound B-mode images as well as clinical ultrasonic images.
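
    As a hedged illustration of the kind of classifier described above, the sketch below trains an RBF-kernel SVM on per-pixel features; the feature choice and data are placeholders of ours, not the authors' pipeline:

      # Toy lesion/background pixel classifier with an RBF-kernel SVM.
      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(1)
      # placeholder "features" for 200 pixels, e.g. local mean and variance
      # of echo intensity (assumed, for illustration only)
      X = rng.standard_normal((200, 2))
      y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # placeholder labels

      clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)
      print(clf.predict(X[:5]))                        # predicted pixel labels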

  13. Simultaneous state-parameter estimation supports the evaluation of data assimilation performance and measurement design for soil-water-atmosphere-plant system

    NASA Astrophysics Data System (ADS)

    Hu, Shun; Shi, Liangsheng; Zha, Yuanyuan; Williams, Mathew; Lin, Lin

    2017-12-01

    Improvements to agricultural water and crop management require detailed information on crop and soil states and their evolution. Data assimilation provides an attractive way of obtaining this information by integrating measurements with a model in a sequential manner. However, data assimilation for the soil-water-atmosphere-plant (SWAP) system still lacks comprehensive exploration, due to the large number of variables and parameters in the system. In this study, simultaneous state-parameter estimation using the ensemble Kalman filter (EnKF) was employed to evaluate data assimilation performance and provide advice on measurement design for the SWAP system. The results demonstrated that a proper selection of the state vector is critical to effective data assimilation. In particular, updating the development stage was able to avoid the negative effect of "phenological shift", which was caused by contrasting phenological stages in different ensemble members. The simultaneous state-parameter estimation (SSPE) assimilation strategy outperformed the updating-state-only (USO) assimilation strategy because of its ability to alleviate the inconsistency between model variables and parameters. However, the performance of the SSPE assimilation strategy could deteriorate with an increasing number of uncertain parameters, as a result of soil stratification and limited knowledge of crop parameters. In addition to the most easily available surface soil moisture (SSM) and leaf area index (LAI) measurements, deep soil moisture, grain yield or other auxiliary data were required to provide sufficient constraints on parameter estimation and to assure data assimilation performance. This study provides insight into the response of soil moisture and grain yield to data assimilation in the SWAP system and is helpful for soil moisture movement and crop growth modeling and for measurement design in practice.
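
    For readers unfamiliar with the filter, here is a minimal sketch of one stochastic (perturbed-observation) EnKF analysis step on an augmented state vector of model states plus parameters; the shapes, the linear observation operator and all numbers are illustrative assumptions, not the study's configuration:

      # One perturbed-observation EnKF update on an augmented ensemble.
      import numpy as np

      def enkf_update(ens, H, y_obs, obs_err_std, rng):
          """ens: (n_state, n_members) augmented ensemble; H: (n_obs, n_state)."""
          n_obs, n_mem = H.shape[0], ens.shape[1]
          A = ens - ens.mean(axis=1, keepdims=True)           # ensemble anomalies
          HA = H @ A
          P_hh = HA @ HA.T / (n_mem - 1) + obs_err_std**2 * np.eye(n_obs)
          K = (A @ HA.T / (n_mem - 1)) @ np.linalg.inv(P_hh)  # Kalman gain
          y_pert = y_obs[:, None] + obs_err_std * rng.standard_normal((n_obs, n_mem))
          return ens + K @ (y_pert - H @ ens)

      rng = np.random.default_rng(2)
      ens = rng.standard_normal((8, 50))    # e.g. 6 states + 2 parameters, 50 members
      H = np.zeros((1, 8)); H[0, 0] = 1.0   # observe only surface soil moisture
      ens = enkf_update(ens, H, np.array([0.3]), 0.02, rng)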

  14. The QWeCI Project: seamlessly linking climate science to society

    NASA Astrophysics Data System (ADS)

    Morse, A. P.; Caminade, C.; Jones, A. E.; MacLeod, D.; Heath, A. E.

    2012-04-01

    The EU FP7 QWeCI project, Quantifying Weather and Climate Impacts on health in developing countries (www.liv.ac.uk/qweci), has 13 partners, 7 of them in Africa. The geographical focus of the project is Senegal, Ghana and Malawi. In all three countries the project has a strong scientific dissemination outlook, as well as field-based surveillance programmes in Ghana and Senegal to learn more about the local parameters controlling the transmission of malaria and, in Senegal, of Rift Valley fever. The project has a strong and active climate science programme: using hindcasts of the new System 4 seasonal forecasting system at ECMWF; further developing the use of monthly-to-seasonal forecasts from ensemble prediction systems; within-project downscaling development; assessment of decadal ensemble prediction systems; and development and testing of vector-borne disease models for malaria and Rift Valley fever. In parallel with the science programme, the project has a large outreach activity involving regular communication, bilateral exchanges, and science- and decision-maker-focused workshops. In Malawi a long-range WiFi network has been established for the dissemination of data. In Senegal, where there is a concentration of partners and stakeholders, the project is gaining a role as a catalyst for wider health- and climate-related activity within government departments and national research bodies, along with the support and involvement of local communities. Within these wider community discussions we have interactive inputs from African and European scientists who are partners in the project. This paper will show highlights of the work completed so far, give an outline of future development, and encourage wider user interaction from outside the current project team and their direct collaborators.

  15. Influences and interactions of inundation, peat, and snow on active layer thickness: Modeling Archive

    DOE Data Explorer

    Scott Painter; Ethan Coon; Cathy Wilson; Dylan Harp; Adam Atchley

    2016-04-21

    This Modeling Archive is in support of an NGEE Arctic publication currently in review [4/2016]. The Advanced Terrestrial Simulator (ATS) was used to simulate thermal hydrological conditions across varied environmental conditions for an ensemble of 1D models of Arctic permafrost. The thickness of organic soil was varied from 2 to 40 cm, snow depth was varied from approximately 0 to 1.2 m, and water table depth was varied from 51 cm below the soil surface to 31 cm above it. A total of 15,960 ensemble members are included. The data produced cover the third and fourth simulation years: active layer thickness, time of deepest thaw depth, temperature of the unfrozen soil, and unfrozen liquid saturation, for each ensemble member. Input files used to run the ensemble are also included.

  16. Quantum memory operations in a flux qubit - spin ensemble hybrid system

    NASA Astrophysics Data System (ADS)

    Saito, S.; Zhu, X.; Amsuss, R.; Matsuzaki, Y.; Kakuyanagi, K.; Shimo-Oka, T.; Mizuochi, N.; Nemoto, K.; Munro, W. J.; Semba, K.

    2014-03-01

    Superconducting quantum bits (qubits) are one of the most promising candidates for a future large-scale quantum processor. However, for larger-scale realizations the currently reported coherence times of these macroscopic objects (superconducting qubits) have not yet reached those of microscopic systems (electron spins, nuclear spins, etc.). In this context, a superconductor-spin ensemble hybrid system has attracted considerable attention. The spin ensemble could operate as a quantum memory for superconducting qubits. We have experimentally demonstrated quantum memory operations in a superconductor-diamond hybrid system. An excited state and a superposition state prepared in the flux qubit can be transferred to, stored in, and retrieved from the NV spin ensemble in diamond. From these experiments, we have found that the coherence time of the spin ensemble is limited by the inhomogeneous broadening of the electron spin (4.4 MHz) and by the hyperfine coupling to nitrogen nuclear spins (2.3 MHz). In the future, spin echo techniques could eliminate these effects and extend the coherence time. Our results are a significant first step towards utilizing the spin ensemble as a long-lived quantum memory for superconducting flux qubits. This work was supported by the FIRST program and NICT.

  17. Collective coupling in hybrid superconducting circuits

    NASA Astrophysics Data System (ADS)

    Saito, Shiro

    Hybrid quantum systems utilizing superconducting circuits have attracted significant recent attention, not only for quantum information processing tasks but also as a way to explore fundamentally new physics regimes. In this talk, I will discuss two superconducting-circuit-based hybrid quantum system approaches. The first is a superconducting flux qubit - electron spin ensemble hybrid system in which quantum information manipulated in the flux qubit can be transferred to, stored in, and retrieved from the ensemble. Although the coherence time of the ensemble is short, about 20 ns, this is a significant first step towards utilizing the spin ensemble as quantum memory for superconducting flux qubits. The second approach is a superconducting resonator - flux qubit ensemble hybrid system in which we fabricated a superconducting LC resonator coupled to a large ensemble of flux qubits. Here we observed a dispersive frequency shift of approximately 250 MHz in the resonator's transmission spectrum. This indicates that thousands of flux qubits are coupling to the resonator collectively. Although we need to reduce the inhomogeneity of our qubits, our system has many potential uses, including the creation of new quantum metamaterials and novel applications in quantum metrology. This work was partially supported by JSPS KAKENHI Grant Number 25220601.

  18. New technique for ensemble dressing combining Multimodel SuperEnsemble and precipitation PDF

    NASA Astrophysics Data System (ADS)

    Cane, D.; Milelli, M.

    2009-09-01

    The Multimodel SuperEnsemble technique (Krishnamurti et al., Science 285, 1548-1550, 1999) is a post-processing method for the estimation of weather forecast parameters that reduces direct model output errors. It differs from other ensemble analysis techniques in its use of an adequate weighting of the input forecast models to obtain a combined estimate of meteorological parameters. Weights are calculated by least-squares minimization of the difference between the model and the observed field during a so-called training period. Although it can be applied successfully to continuous parameters like temperature, humidity, wind speed and mean sea level pressure (Cane and Milelli, Meteorologische Zeitschrift, 15, 2, 2006), the Multimodel SuperEnsemble also gives good results when applied to precipitation, a parameter quite difficult to handle with standard post-processing methods. Here we present our methodology for Multimodel precipitation forecasts, applied to a wide spectrum of results over the very dense non-GTS weather station network of Piemonte. We will focus particularly on an accurate statistical method for bias correction and on ensemble dressing in agreement with the observed forecast-conditioned precipitation PDF. Acknowledgement: this work is supported by the Italian Civil Defence Department.
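
    A hedged sketch of the weighting step (least-squares minimization over a training period, on anomalies with respect to the training means, in the spirit of Krishnamurti et al., 1999; the data below are synthetic):

      # Multimodel SuperEnsemble weights by least squares over a training period.
      import numpy as np

      rng = np.random.default_rng(3)
      F = rng.standard_normal((100, 4))                 # 100 times x 4 models
      obs = F @ np.array([0.5, 0.3, 0.1, 0.1]) + 0.1 * rng.standard_normal(100)

      Fa = F - F.mean(axis=0)                           # model anomalies
      oa = obs - obs.mean()                             # observed anomalies
      w, *_ = np.linalg.lstsq(Fa, oa, rcond=None)       # trained weights

      superensemble = obs.mean() + Fa @ w               # combined estimate
      print(w)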

  19. Support Vector Machines Model of Computed Tomography for Assessing Lymph Node Metastasis in Esophageal Cancer with Neoadjuvant Chemotherapy.

    PubMed

    Wang, Zhi-Long; Zhou, Zhi-Guo; Chen, Ying; Li, Xiao-Ting; Sun, Ying-Shi

    The aim of this study was to diagnose lymph node metastasis of esophageal cancer using a support vector machines model based on computed tomography. A total of 131 esophageal cancer patients with preoperative chemotherapy and radical surgery were included. Various indicators (tumor thickness, tumor length, tumor CT value, total number of lymph nodes, and long axis and short axis sizes of the largest lymph node) on CT images before and after neoadjuvant chemotherapy were recorded. A support vector machines model based on these CT indicators was built to predict lymph node metastasis. The support vector machines model diagnosed lymph node metastasis better than the preoperative short axis size of the largest lymph node on CT: the areas under the receiver operating characteristic curves were 0.887 and 0.705, respectively. The support vector machine model of CT images can help diagnose lymph node metastasis in esophageal cancer with preoperative chemotherapy.
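
    A hedged sketch of the evaluation in this record, i.e. an SVM on a few CT indicators compared by ROC AUC against the single short-axis measurement; the data here are synthetic stand-ins, not the study's patients:

      # SVM on CT indicators vs. single-feature baseline, compared by ROC AUC.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_predict
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(4)
      # synthetic columns: thickness, length, CT value, node count, long axis, short axis
      X = rng.standard_normal((131, 6))
      y = (X[:, 5] + 0.8 * rng.standard_normal(131) > 0).astype(int)

      p = cross_val_predict(SVC(probability=True), X, y, cv=10,
                            method="predict_proba")[:, 1]
      print("SVM AUC:", roc_auc_score(y, p))
      print("short-axis-only AUC:", roc_auc_score(y, X[:, 5]))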

  20. Comparison of different assimilation methodologies of groundwater levels to improve predictions of root zone soil moisture with an integrated terrestrial system model

    NASA Astrophysics Data System (ADS)

    Zhang, Hongjuan; Kurtz, Wolfgang; Kollet, Stefan; Vereecken, Harry; Franssen, Harrie-Jan Hendricks

    2018-01-01

    The linkage between root zone soil moisture and groundwater is either neglected or simplified in most land surface models. The fully coupled subsurface-land surface model TerrSysMP, including variably saturated groundwater dynamics, is used in this work. We test and compare five data assimilation methodologies for assimilating groundwater level data via the ensemble Kalman filter (EnKF) to improve root zone soil moisture estimation with TerrSysMP. Groundwater level data are assimilated in the form of pressure head or soil moisture (set equal to porosity in the saturated zone) to update state vectors. In the five assimilation methodologies, the state vector contains either (i) pressure head, (ii) log-transformed pressure head, (iii) soil moisture, (iv) pressure head for the saturated zone only, or (v) a combination of the two: pressure head for the saturated zone and soil moisture for the unsaturated zone. These methodologies are evaluated in synthetic experiments performed for different climate conditions, soil types and plant functional types, to simulate various root zone soil moisture distributions and groundwater levels. The results demonstrate that the EnKF cannot properly handle strongly skewed pressure distributions, which are caused by extreme negative pressure heads in the unsaturated zone during dry periods. This problem is alleviated only by methodologies (iii), (iv) and (v). The last of these gives the best results and avoids unphysical updates related to strongly skewed pressure heads in the unsaturated zone. If groundwater level data are assimilated by methodology (iii), the EnKF fails to update the state vector of soil moisture values when, for (almost) all realizations, the observation brings no significant new information. Synthetic experiments on the joint assimilation of groundwater levels and surface soil moisture support methodology (v) and show great potential for improving the representation of root zone soil moisture.

  1. Modeling the present and future geographic distribution of the Lone star tick, Amblyomma americanum (Ixodida: Ixodidae), in the continental United States

    USGS Publications Warehouse

    Springer, Yuri P.; Jarnevich, Catherine S.; Barnett, David T.; Monaghan, Andrew J.; Eisen, Rebecca J.

    2015-01-01

    The Lone star tick (Amblyomma americanum L.) is the primary vector for pathogens of significant public health importance in North America, yet relatively little is known about its current and potential future distribution. Building on a published summary of tick collection records, we used an ensemble modeling approach to predict the present-day and future distribution of climatically suitable habitat for establishment of the Lone star tick within the continental United States. Of the nine climatic predictor variables included in our five present-day models, average vapor pressure in July was by far the most important determinant of suitable habitat. The present-day ensemble model predicted an essentially contiguous distribution of suitable habitat extending to the Atlantic coast east of the 100th western meridian and south of the 40th northern parallel, but excluding a high-elevation region associated with the Appalachian Mountains. Future ensemble predictions for 2061–2080 forecast a stable western range limit, northward expansion of suitable habitat into the Upper Midwest and western Pennsylvania, and range contraction along portions of the Gulf coast and the lower Mississippi river valley. These findings are informative for raising awareness of A. americanum-transmitted pathogens in areas where the Lone star tick has recently become, or may yet become, established.

  2. VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases

    PubMed Central

    Giraldo-Calderón, Gloria I.; Emrich, Scott J.; MacCallum, Robert M.; Maslen, Gareth; Dialynas, Emmanuel; Topalis, Pantelis; Ho, Nicholas; Gesing, Sandra; Madey, Gregory; Collins, Frank H.; Lawson, Daniel

    2015-01-01

    VectorBase is a National Institute of Allergy and Infectious Diseases supported Bioinformatics Resource Center (BRC) for invertebrate vectors of human pathogens. Now in its 11th year, VectorBase currently hosts the genomes of 35 organisms including a number of non-vectors for comparative analysis. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes. Here we describe improvements to our resource and the set of tools available for interrogating and accessing BRC data including the integration of Web Apollo to facilitate community annotation and providing Galaxy to support user-based workflows. VectorBase also actively supports our community through hands-on workshops and online tutorials. All information and data are freely available from our website at https://www.vectorbase.org/. PMID:25510499

  3. Multi-model ensembles for assessment of flood losses and associated uncertainty

    NASA Astrophysics Data System (ADS)

    Figueiredo, Rui; Schröter, Kai; Weiss-Motz, Alexander; Martina, Mario L. V.; Kreibich, Heidi

    2018-05-01

    Flood loss modelling is a crucial part of risk assessments. However, it is subject to large uncertainty that is often neglected. Most models available in the literature are deterministic, providing only single point estimates of flood loss, and large disparities tend to exist among them. Adopting any one such model in a risk assessment context is likely to lead to inaccurate loss estimates and sub-optimal decision-making. In this paper, we propose the use of multi-model ensembles to address these issues. This approach, which has been applied successfully in other scientific fields, is based on the combination of different model outputs with the aim of improving the skill and usefulness of predictions. We first propose a model rating framework to support ensemble construction, based on a probability tree of model properties, which establishes relative degrees of belief between candidate models. Using 20 flood loss models in two test cases, we then construct numerous multi-model ensembles, based both on the rating framework and on a stochastic method, differing in terms of participating members, ensemble size and model weights. We evaluate the performance of ensemble means, as well as their probabilistic skill and reliability. Our results demonstrate that well-designed multi-model ensembles represent a pragmatic approach to consistently obtain more accurate flood loss estimates and reliable probability distributions of model uncertainty.
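
    A minimal sketch of the basic combination idea (the model weights stand in for the paper's rating-framework degrees of belief; all numbers are illustrative):

      # Weighted multi-model ensemble of deterministic flood-loss estimates.
      import numpy as np

      losses = np.array([1.2e6, 0.8e6, 2.1e6, 1.5e6])  # one estimate per model
      belief = np.array([0.4, 0.1, 0.2, 0.3])          # relative degrees of belief
      w = belief / belief.sum()

      ensemble_mean = w @ losses
      order = np.argsort(losses)                       # weighted empirical CDF as
      cdf = np.cumsum(w[order])                        # a simple uncertainty summary
      print(ensemble_mean, list(zip(losses[order], cdf)))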

  4. Finite temperature grand canonical ensemble study of the minimum electrophilicity principle.

    PubMed

    Miranda-Quintana, Ramón Alain; Chattaraj, Pratim K; Ayers, Paul W

    2017-09-28

    We analyze the minimum electrophilicity principle of conceptual density functional theory using the framework of the finite temperature grand canonical ensemble. We provide support for this principle, both for the cases of systems evolving from a non-equilibrium to an equilibrium state and for the change from one equilibrium state to another. In doing so, we clearly delineate the cases where this principle can, or cannot, be used.

  5. A Fractional Cartesian Composition Model for Semi-Spatial Comparative Visualization Design.

    PubMed

    Kolesar, Ivan; Bruckner, Stefan; Viola, Ivan; Hauser, Helwig

    2017-01-01

    The study of spatial data ensembles leads to substantial visualization challenges in a variety of applications. In this paper, we present a model for comparative visualization that supports the design of corresponding ensemble visualization solutions by partial automation. We focus on applications where the user is interested in preserving selected spatial characteristics of the data as much as possible, even when many ensemble members are to be studied jointly using comparative visualization. In our model, we separate the design challenge into a minimal set of user-specified parameters and an optimization component for the automatic configuration of the remaining design variables. We provide an illustrated formal description of our model and exemplify our approach in the context of several application examples from different domains, in order to demonstrate its generality within the class of comparative visualization problems for spatial data ensembles.

  6. Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomic Data.

    PubMed

    Bolser, Dan M; Staines, Daniel M; Perry, Emily; Kersey, Paul J

    2017-01-01

    Ensembl Plants (http://plants.ensembl.org) is an integrative resource presenting genome-scale information for 39 sequenced plant species. Available data includes genome sequence, gene models, functional annotation, and polymorphic loci; for the latter, additional information including population structure, individual genotypes, linkage, and phenotype data is available for some species. Comparative data is also available, including genomic alignments and "gene trees," which show the inferred evolutionary history of each gene family represented in the resource. Access to the data is provided through a genome browser, which incorporates many specialist interfaces for different data types, through a variety of programmatic interfaces, and via a specialist data mining tool supporting rapid filtering and retrieval of bulk data. Genomic data from many non-plant species, including those of plant pathogens, pests, and pollinators, is also available via the same interfaces through other divisions of Ensembl. Ensembl Plants is updated 4-6 times a year and is developed in collaboration with our international partners in the Gramene (http://www.gramene.org) and transPLANT (http://www.transplantdb.eu) projects.

  7. Modelling dynamics in protein crystal structures by ensemble refinement

    PubMed Central

    Burnley, B Tom; Afonine, Pavel V; Adams, Paul D; Gros, Piet

    2012-01-01

    Single-structure models derived from X-ray data do not adequately account for the inherent, functionally important dynamics of protein molecules. We generated ensembles of structures by time-averaged refinement, where local molecular vibrations were sampled by molecular-dynamics (MD) simulation whilst global disorder was partitioned into an underlying overall translation–libration–screw (TLS) model. Modeling of 20 protein datasets at 1.1–3.1 Å resolution reduced cross-validated Rfree values by 0.3–4.9%, indicating that ensemble models fit the X-ray data better than single structures. The ensembles revealed that, while most proteins display a well-ordered core, some proteins exhibit a ‘molten core’ likely supporting functionally important dynamics in ligand binding, enzyme activity and protomer assembly. Order–disorder changes in HIV protease indicate a mechanism of entropy compensation for ordering the catalytic residues upon ligand binding by disordering specific core residues. Thus, ensemble refinement extracts dynamical details from the X-ray data that allow a more comprehensive understanding of structure–dynamics–function relationships. DOI: http://dx.doi.org/10.7554/eLife.00311.001 PMID:23251785

  8. System Dynamics based Dengue modeling environment to simulate evolution of Dengue infection under different climate scenarios

    NASA Astrophysics Data System (ADS)

    Anwar, R.; Khan, R.; Usmani, M.; Colwell, R. R.; Jutla, A.

    2017-12-01

    Vector-borne infectious diseases such as Dengue, Zika and Chikungunya remain a public health threat. A World Health Organization (WHO) estimate suggests that about 2.5 billion people, representing ca. 40% of the human population, are at increased risk of dengue, with more than 100 million infection cases every year. Vector-borne infections cannot be eradicated, since the disease-causing pathogens survive in the environment. Over the last few decades dengue infection has been reported in more than 100 countries and is expanding geographically. The female Ae. aegypti mosquito, a daytime-active and major vector for dengue virus, is associated with urban population density and regional climatic processes. However, mathematical quantification of the relationships between the abundance of vectors and climatic processes remains a challenge, particularly in regions where such data are not routinely collected. Here, using a system dynamics based feedback mechanism, an algorithm integrating knowledge from entomological, meteorological and epidemiological processes is developed that has the potential to provide ensemble simulations of the risk of occurrence of dengue infection in human populations. Using datasets from satellite remote sensing, the algorithm was calibrated and validated against actual dengue case data from Iquitos, Peru. We will show results on the model's capability to capture initiation and peaks in the observed time series. In addition, results from several simulation scenarios under different climatic conditions will be discussed.

  9. Logarithmic violation of scaling in strongly anisotropic turbulent transfer of a passive vector field

    NASA Astrophysics Data System (ADS)

    Antonov, N. V.; Gulitskiy, N. M.

    2015-01-01

    Inertial-range asymptotic behavior of a vector (e.g., magnetic) field, passively advected by a strongly anisotropic turbulent flow, is studied by means of the field-theoretic renormalization group and the operator product expansion. The advecting velocity field is Gaussian, not correlated in time, with a pair correlation function of the form ∝ δ(t − t′)/k⊥^(d−1+ξ), where k⊥ = |k⊥| and k⊥ is the component of the wave vector perpendicular to the distinguished direction ("direction of the flow"), the d-dimensional generalization of the ensemble introduced by Avellaneda and Majda [Commun. Math. Phys. 131, 381 (1990), 10.1007/BF02161420]. The stochastic advection-diffusion equation for the transverse (divergence-free) vector field includes, as special cases, the kinematic dynamo model for magnetohydrodynamic turbulence and the linearized Navier-Stokes equation. In contrast to the well-known isotropic Kraichnan model, where various correlation functions exhibit anomalous scaling behavior with infinite sets of anomalous exponents, here the dependence on the integral turbulence scale L is logarithmic: instead of power-like corrections to ordinary scaling, determined by naive (canonical) dimensions, the anomalies manifest themselves as polynomials in logarithms of L. The key point is that the matrices of scaling dimensions of the relevant families of composite operators are nilpotent and cannot be diagonalized. A detailed proof of this fact is given for correlation functions of arbitrary order.

  10. Using R in Taverna: RShell v1.2

    PubMed Central

    Wassink, Ingo; Rauwerda, Han; Neerincx, Pieter BT; Vet, Paul E van der; Breit, Timo M; Leunissen, Jack AM; Nijholt, Anton

    2009-01-01

    Background: R is the statistical language commonly used by many life scientists in (omics) data analysis. At the same time, these complex analyses benefit from a workflow approach, such as that used by the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output; there was also no support for graphical output or persistent sessions. Altogether this made using R in Taverna impractical. Findings: We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable, allowing the user to define multiple inputs and outputs. Various data types are also supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we describe the architecture of RShell and the new features introduced in version 1.2, i.e.: i) support for R up to and including R version 2.9; ii) support for persistent sessions to limit data transfer; iii) support for vector graphics output through PDF; iv) syntax highlighting of the R code; v) improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor by a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly. Conclusion: Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way. PMID:19607662

  11. Testing of the Support Vector Machine for Binary-Class Classification

    NASA Technical Reports Server (NTRS)

    Scholten, Matthew

    2011-01-01

    The Support Vector Machine is a powerful algorithm, useful for classifying data into classes. The Support Vector Machines implemented in this research were used as classifiers for the final stage in a Multistage Autonomous Target Recognition system. A single-kernel SVM known as SVMlight, and a modified version known as a Support Vector Machine with K-Means Clustering, were used. These SVM algorithms were tested as classifiers under varying conditions: image noise levels varied, and the orientation of the targets changed. The classifiers were then optimized to demonstrate their maximum potential as classifiers. The results demonstrate the reliability of SVM as a method for classification: from trial to trial, SVM produced consistent results.

  12. Reduction of predictive uncertainty in estimating irrigation water requirement through multi-model ensembles and ensemble averaging

    NASA Astrophysics Data System (ADS)

    Multsch, S.; Exbrayat, J.-F.; Kirby, M.; Viney, N. R.; Frede, H.-G.; Breuer, L.

    2014-11-01

    Irrigation agriculture plays an increasingly important role in food supply. Many evapotranspiration models are used today to estimate the water demand for irrigation. They account for different stages of crop growth through empirical crop coefficients that adapt evapotranspiration throughout the vegetation period. We investigate the importance of model structural vs. model parametric uncertainty for irrigation simulations by considering six evapotranspiration models and five crop coefficient sets to estimate irrigation water requirements for growing wheat in the Murray-Darling Basin, Australia. The study is carried out using the spatial decision support system SPARE:WATER. We find that structural model uncertainty is far more important than model parametric uncertainty for estimating irrigation water requirements. Using the Reliability Ensemble Averaging (REA) technique, we are able to reduce the overall predictive model uncertainty by more than 10%. The exceedance probability curve of irrigation water requirements shows that a given threshold, e.g. an irrigation water limit of 400 mm due to water rights, would be exceeded less frequently by the REA ensemble average (45%) than by the equally weighted ensemble average (66%). We conclude that multi-model ensemble predictions and sophisticated model averaging techniques are helpful in predicting irrigation demand and provide relevant information for decision making.
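
    A hedged sketch of the exceedance-probability comparison described above, with synthetic requirements and stand-in REA weights (the study's actual weights derive from model reliability):

      # Exceedance probabilities: equal weighting vs. REA-style weighting.
      import numpy as np

      rng = np.random.default_rng(5)
      iwr = 380 + 40 * rng.standard_normal((50, 30))  # 50 seasons x 30 model variants (mm)

      equal_avg = iwr.mean(axis=1)
      w = rng.random(30); w /= w.sum()                # stand-in reliability weights
      rea_avg = iwr @ w

      threshold = 400.0                               # e.g. a water-rights limit (mm)
      print("P(exceed), equal weights:", (equal_avg > threshold).mean())
      print("P(exceed), REA weights:", (rea_avg > threshold).mean())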

  13. Propensity, Probability, and Quantum Theory

    NASA Astrophysics Data System (ADS)

    Ballentine, Leslie E.

    2016-08-01

    Quantum mechanics and probability theory share one peculiarity. Both have well established mathematical formalisms, yet both are subject to controversy about the meaning and interpretation of their basic concepts. Since probability plays a fundamental role in QM, the conceptual problems of one theory can affect the other. We first classify the interpretations of probability into three major classes: (a) inferential probability, (b) ensemble probability, and (c) propensity. Class (a) is the basis of inductive logic; (b) deals with the frequencies of events in repeatable experiments; (c) describes a form of causality that is weaker than determinism. An important, but neglected, paper by P. Humphreys demonstrated that propensity must differ mathematically, as well as conceptually, from probability, but he did not develop a theory of propensity. Such a theory is developed in this paper. Propensity theory shares many, but not all, of the axioms of probability theory. As a consequence, propensity supports the Law of Large Numbers from probability theory, but does not support Bayes' theorem. Although there are particular problems within QM to which any of the classes of probability may be applied, it is argued that the intrinsic quantum probabilities (calculated from a state vector or density matrix) are most naturally interpreted as quantum propensities. This does not alter the familiar statistical interpretation of QM, but the interpretation of quantum states as representing knowledge is untenable. Examples show that a density matrix fails to represent knowledge.

  14. Prediction of in vivo hepatotoxicity effects using in vitro ...

    EPA Pesticide Factsheets

    High-throughput in vitro transcriptomics data support molecular understanding of chemical-induced toxicity. Here, we evaluated the utility of such data to predict liver toxicity. First, in vitro gene expression data for 93 genes was generated following exposure of metabolically competent HepaRG cells to 1060 environmental chemicals from the US EPA ToxCast library. The empirical relationship between these data and rat chronic liver endpoints from animal studies in the Toxicity Reference Database (ToxRefDB) was then evaluated using machine learning techniques. Chemicals were classified as positive (242) or negative (135) based on observed hepatic histopathologic effects, and divided into three categories: hypertrophy (183), injury (112) and proliferative lesions (101). Hepatotoxicants were classified on the basis of the bioactivity of 93 genes (descriptors) using six machine learning algorithms: linear discriminant analysis, naïve Bayes, support vector classification, classification and regression trees, k-nearest neighbors, and an ensemble of classifiers. Classification performance was evaluated using 10-fold cross-validation testing, and in-loop, filter-based, feature subset selection. The best balanced accuracy for prediction of hypertrophy, injury and proliferative lesions were 0.81 ± 0.07, 0.79 ± 0.08 and 0.77 ± 0.09, respectively. Gene specific perturbation of xenobiotic metabolism enzymes (CYP7A1/2E1/4A11/1A1/4A22) and transporters (ABCG2, ABCB11, SLC22
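
    A hedged sketch of this evaluation protocol: a soft-voting ensemble of three of the named classifiers, with in-loop filter-based feature selection inside a 10-fold cross-validation, scored by balanced accuracy (the data are synthetic stand-ins for the 93 gene descriptors):

      # Ensemble classification with in-loop feature selection and 10-fold CV.
      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.naive_bayes import GaussianNB
      from sklearn.svm import SVC
      from sklearn.ensemble import VotingClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(6)
      X = rng.standard_normal((377, 93))               # 377 chemicals x 93 genes
      y = (X[:, :5].sum(axis=1) + rng.standard_normal(377) > 0).astype(int)

      members = [("lda", LinearDiscriminantAnalysis()),
                 ("nb", GaussianNB()),
                 ("svc", SVC(probability=True))]
      clf = make_pipeline(SelectKBest(f_classif, k=20),   # filter-based selection
                          VotingClassifier(members, voting="soft"))
      acc = cross_val_score(clf, X, y, cv=10, scoring="balanced_accuracy")
      print(acc.mean(), acc.std())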

  15. Data Analysis, Modeling, and Ensemble Forecasting to Support NOWCAST and Forecast Activities at the Fallon Naval Station

    DTIC Science & Technology

    2010-09-30

    DISTRIBUTION STATEMENT A: Approved for public release; distribution is unlimited. ...climate forecasting and use of satellite data assimilation for model evaluation. He is a task leader on another NSF EPSCoR project for the... ...observations including remotely sensed data. OBJECTIVES: The main objectives of the study are: 1) to further develop, test, and continue twice daily...

  16. Data Analysis, Modeling, and Ensemble Forecasting to Support NOWCAST and Forecast Activities at the Fallon Naval Station

    DTIC Science & Technology

    2011-09-30

    DISTRIBUTION STATEMENT A: Approved for public release; distribution is unlimited. ...forecasting and use of satellite data assimilation for model evaluation (Jiang et al., 2011a). He is a task leader on another NSF EPSCoR project... K. Horvath, R. Belu, 2011a: Application of variational data assimilation to dynamical downscaling of regional wind energy resources in the western...

  17. Multiclass Reduced-Set Support Vector Machines

    NASA Technical Reports Server (NTRS)

    Tang, Benyang; Mazzoni, Dominic

    2006-01-01

    There are well-established methods for reducing the number of support vectors in a trained binary support vector machine, often with minimal impact on accuracy. We show how reduced-set methods can be applied to multiclass SVMs made up of several binary SVMs, with significantly better results than reducing each binary SVM independently. Our approach is based on Burges' approach that constructs each reduced-set vector as the pre-image of a vector in kernel space, but we extend this by recomputing the SVM weights and bias optimally using the original SVM objective function. This leads to greater accuracy for a binary reduced-set SVM, and also allows vectors to be 'shared' between multiple binary SVMs for greater multiclass accuracy with fewer reduced-set vectors. We also propose computing pre-images using differential evolution, which we have found to be more robust than gradient descent alone. We show experimental results on a variety of problems and find that this new approach is consistently better than previous multiclass reduced-set methods, sometimes with a dramatic difference.

  18. A Subdivision-Based Representation for Vector Image Editing.

    PubMed

    Liao, Zicheng; Hoppe, Hugues; Forsyth, David; Yu, Yizhou

    2012-11-01

    Vector graphics has been employed in a wide variety of applications due to its scalability and editability. Editability is a high priority for artists and designers who wish to produce vector-based graphical content with user interaction. In this paper, we introduce a new vector image representation based on piecewise smooth subdivision surfaces, which is a simple, unified and flexible framework that supports a variety of operations, including shape editing, color editing, image stylization, and vector image processing. These operations effectively create novel vector graphics by reusing and altering existing image vectorization results. Because image vectorization yields an abstraction of the original raster image, controlling the level of detail of this abstraction is highly desirable. To this end, we design a feature-oriented vector image pyramid that offers multiple levels of abstraction simultaneously. Our new vector image representation can be rasterized efficiently using GPU-accelerated subdivision. Experiments indicate that our vector image representation achieves high visual quality and better supports editing operations than existing representations.

  19. Investigating energy-based pool structure selection in the structure ensemble modeling with experimental distance constraints: The example from a multidomain protein Pub1.

    PubMed

    Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan

    2018-05-01

    The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distances. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is shown that regularization by energy and by ensemble size prevents an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control for refining the structure pool and preventing data overfitting, because the absence of energy regularization exposes ensemble construction to noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool to enhance understanding of the ensemble results from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.
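
    A hedged sketch of the pool-selection step: fitting non-negative ensemble weights so that the weighted average of per-structure predicted observables (e.g. PRE-derived distances) matches experiment. The candidate pool, the observables and the use of non-negative least squares are our illustrative assumptions, not the authors' exact procedure:

      # Structure-ensemble weights by non-negative least squares.
      import numpy as np
      from scipy.optimize import nnls

      rng = np.random.default_rng(7)
      pred = rng.random((40, 200))   # 40 pool structures x 200 predicted observables
      w_true = np.zeros(40); w_true[[3, 17, 29]] = [0.5, 0.3, 0.2]
      obs = w_true @ pred + 0.01 * rng.standard_normal(200)

      w, _ = nnls(pred.T, obs)       # non-negative weights favour small ensembles
      w /= w.sum()
      print(np.argsort(w)[-3:])      # indices of the dominant structures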

  20. Structural anomalies in undoped Gallium Arsenide observed in high resolution diffraction imaging with monochromatic synchrotron radiation

    NASA Technical Reports Server (NTRS)

    Steiner, B.; Kuriyama, M.; Dobbyn, R. C.; Laor, U.; Larson, D.; Brown, M.

    1988-01-01

    Novel, streak-like disruption features restricted to the plane of diffraction have recently been observed in images obtained by synchrotron radiation diffraction from undoped, semi-insulating gallium arsenide crystals. These features were identified as ensembles of very thin platelets or interfaces lying in (110) planes, and a structural model consisting of antiphase domain boundaries was proposed. We report here the other principal features observed in high resolution monochromatic synchrotron radiation diffraction images: (quasi) cellular structure; linear, very low-angle subgrain boundaries in (110) directions, and surface stripes in a (110) direction. In addition, we report systematic differences in the acceptance angle for images involving various diffraction vectors. When these observations are considered together, a unifying picture emerges. The presence of ensembles of thin (110) antiphase platelet regions or boundaries is generally consistent not only with the streak-like diffraction features but with the other features reported here as well. For the formation of such regions we propose two mechanisms, operating in parallel, that appear to be consistent with the various defect features observed by a variety of techniques.

  1. Structural anomalies in undoped gallium arsenide observed in high-resolution diffraction imaging with monochromatic synchrotron radiation

    NASA Technical Reports Server (NTRS)

    Steiner, B.; Kuriyama, M.; Dobbyn, R. C.; Laor, U.; Larson, D.

    1989-01-01

    Novel, streak-like disruption features restricted to the plane of diffraction have recently been observed in images obtained by synchrotron radiation diffraction from undoped, semi-insulating gallium arsenide crystals. These features were identified as ensembles of very thin platelets or interfaces lying in (110) planes, and a structural model consisting of antiphase domain boundaries was proposed. We report here the other principal features observed in high resolution monochromatic synchrotron radiation diffraction images: (quasi) cellular structure; linear, very low-angle subgrain boundaries in (110) directions, and surface stripes in a (110) direction. In addition, we report systematic differences in the acceptance angle for images involving various diffraction vectors. When these observations are considered together, a unifying picture emerges. The presence of ensembles of thin (110) antiphase platelet regions or boundaries is generally consistent not only with the streak-like diffraction features but with the other features reported here as well. For the formation of such regions we propose two mechanisms, operating in parallel, that appear to be consistent with the various defect features observed by a variety of techniques.

  2. Single-shot quantum state estimation via a continuous measurement in the strong backaction regime

    NASA Astrophysics Data System (ADS)

    Cook, Robert L.; Riofrío, Carlos A.; Deutsch, Ivan H.

    2014-09-01

    We study quantum tomography based on a stochastic continuous-time measurement record obtained from a probe field collectively interacting with an ensemble of identically prepared systems. In comparison to previous studies, we consider here the case in which the measurement-induced backaction has a non-negligible effect on the dynamical evolution of the ensemble. We formulate a maximum likelihood estimate for the initial quantum state given only a single instance of the continuous diffusive measurement record. We apply our estimator to the simplest problem: state tomography of a single pure qubit, which, during the course of the measurement, is also subjected to dynamical control. We identify a regime where the many-body system is well approximated at all times by a separable pure spin coherent state, whose Bloch vector undergoes a conditional stochastic evolution. We simulate the results of our estimator and show that we can achieve close to the upper bound of fidelity set by the optimal generalized measurement. This estimate is compared to, and significantly outperforms, an equivalent estimator that ignores measurement backaction.

  3. Insect cell transformation vectors that support high level expression and promoter assessment in insect cell culture

    USDA-ARS's Scientific Manuscript database

    A somatic transformation vector, pDP9, was constructed that provides a simplified means of producing permanently transformed cultured insect cells that support high levels of protein expression of foreign genes. The pDP9 plasmid vector incorporates DNA sequences from the Junonia coenia densovirus th...

  4. Development and Use of the Hydrologic Ensemble Forecast System by the National Weather Service to Support the New York City Water Supply

    NASA Astrophysics Data System (ADS)

    Shedd, R.; Reed, S. M.; Porter, J. H.

    2015-12-01

    The National Weather Service (NWS) has been working for several years on the development of the Hydrologic Ensemble Forecast System (HEFS). The objective of HEFS is to provide ensemble river forecasts incorporating the best precipitation and temperature forcings at any specific time horizon. For the current implementation, this includes the Global Ensemble Forecast System (GEFS) and the Climate Forecast System (CFSv2). One of the core partners that has been working with the NWS since the beginning of the development phase of HEFS is the New York City Department of Environmental Protection (NYCDEP) which is responsible for the complex water supply system for New York City. The water supply system involves a network of reservoirs in both the Delaware and Hudson River basins. At the same time that the NWS was developing HEFS, NYCDEP was working on enhancing the operations of their water supply reservoirs through the development of a new Operations Support Tool (OST). OST is designed to guide reservoir system operations to ensure an adequate supply of high-quality drinking water for the city, as well as to meet secondary objectives for reaches downstream of the reservoirs assuming the primary water supply goals can be met. These secondary objectives include fisheries and ecosystem support, enhanced peak flow attenuation beyond that provided natively by the reservoirs, salt front management, and water supply for other cities. Since January 2014, the NWS Northeast and Middle Atlantic River Forecast Centers have provided daily one year forecasts from HEFS to NYCDEP. OST ingests these forecasts, couples them with near-real-time environmental and reservoir system data, and drives models of the water supply system. The input of ensemble forecasts results in an ensemble of model output, from which information on the range and likelihood of possible future system states can be extracted. This type of probabilistic information provides system managers with additional information not available from deterministic forecasts and allows managers to better assess risk, and provides greater context for decision-making than has been available in the past. HEFS has allowed NYCDEP water supply managers to make better decisions on reservoir operations than they likely would have in the past, using only deterministic forecasts.

  5. Ecological Niche Modelling Predicts Southward Expansion of Lutzomyia (Nyssomyia) flaviscutellata (Diptera: Psychodidae: Phlebotominae), Vector of Leishmania (Leishmania) amazonensis in South America, under Climate Change.

    PubMed

    Carvalho, Bruno M; Rangel, Elizabeth F; Ready, Paul D; Vale, Mariana M

    2015-01-01

    Vector borne diseases are susceptible to climate change because distributions and densities of many vectors are climate driven. The Amazon region is endemic for cutaneous leishmaniasis and is predicted to be severely impacted by climate change. Recent records suggest that the distributions of Lutzomyia (Nyssomyia) flaviscutellata and the parasite it transmits, Leishmania (Leishmania) amazonensis, are expanding southward, possibly due to climate change, and sometimes associated with new human infection cases. We define the vector's climatic niche and explore future projections under climate change scenarios. Vector occurrence records were compiled from the literature, museum collections and Brazilian Health Departments. Six bioclimatic variables were used as predictors in six ecological niche model algorithms (BIOCLIM, DOMAIN, MaxEnt, GARP, logistic regression and Random Forest). Projections for 2050 used 17 general circulation models in two greenhouse gas representative concentration pathways: "stabilization" and "high increase". Ensemble models and consensus maps were produced by overlapping binary predictions. Final model outputs showed good performance and significance. The use of species absence data substantially improved model performance. Currently, L. flaviscutellata is widely distributed in the Amazon region, with records in the Atlantic Forest and savannah regions of Central Brazil. Future projections indicate expansion of the climatically suitable area for the vector in both scenarios, towards higher latitudes and elevations. L. flaviscutellata is likely to find increasingly suitable conditions for its expansion into areas where human population size and density are much larger than they are in its current locations. If environmental conditions change as predicted, the range of the vector is likely to expand to southeastern and central-southern Brazil, eastern Paraguay and further into the Amazonian areas of Bolivia, Peru, Ecuador, Colombia and Venezuela. These areas will only become endemic for L. amazonensis, however, if they have competent reservoir hosts and transmission dynamics matching those in the Amazon region.
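
    The consensus-map step named above lends itself to a short sketch: given binary suitability rasters from several niche-model algorithms, keep the cells that a majority of models classify as suitable. The arrays below are random stand-ins, not real model output.

```python
# A toy sketch of building a majority-rule consensus map by overlapping
# binary suitability predictions from several niche-model algorithms.
import numpy as np

rng = np.random.default_rng(1)
n_models, height, width = 6, 100, 100
# One binary suitability raster per algorithm (1 = climatically suitable)
binary_maps = rng.integers(0, 2, (n_models, height, width))

votes = binary_maps.sum(axis=0)             # per-cell agreement count
consensus = votes >= n_models // 2 + 1      # majority-rule ensemble map
print("suitable fraction:", consensus.mean())
```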

  6. Predicting network modules of cell cycle regulators using relative protein abundance statistics.

    PubMed

    Oguz, Cihan; Watson, Layne T; Baumann, William T; Tyson, John J

    2017-02-28

    Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network. Using differential evolution, we generate an ensemble of feasible parameter vectors that reproduce the phenotypes (viable or inviable) of wild-type yeast cells and 110 mutant strains. We use this ensemble to predict the phenotypes of 129 mutant strains for which experimental data is not available. We identify 86 novel mutants that are predicted to be viable and then rank the cell cycle proteins in terms of their contributions to cumulative variability of relative protein abundance predictions. Proteins involved in "regulation of cell size" and "regulation of G1/S transition" contribute most to predictive variability, whereas proteins involved in "positive regulation of transcription involved in exit from mitosis," "mitotic spindle assembly checkpoint" and "negative regulation of cyclin-dependent protein kinase by cyclin degradation" contribute the least. These results suggest that the statistics of these predictions may be generating patterns specific to individual network modules (START, S/G2/M, and EXIT). To test this hypothesis, we develop random forest models for predicting the network modules of cell cycle regulators using relative abundance statistics as model inputs. Predictive performance is assessed by the areas under receiver operating characteristics curves (AUC). Our models generate an AUC range of 0.83-0.87 as opposed to randomized models with AUC values around 0.50. By using differential evolution and random forest modeling, we show that the model prediction statistics generate distinct network module-specific patterns within the cell cycle network.
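
    A minimal sketch of the final modeling step, assuming generic inputs: a random forest trained on per-protein relative-abundance statistics and scored by AUC. Names and data here are illustrative, not the study's ensemble.

```python
# A minimal sketch: random forest predicting module membership from
# abundance statistics, evaluated with the area under the ROC curve.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 6))     # toy per-protein abundance statistics
y = rng.integers(0, 2, 120)       # toy module membership label (e.g. START)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"AUC = {auc:.2f}")         # randomized labels should hover near 0.5
```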

  7. The prediction of surface temperature in the new seasonal prediction system based on the MPI-ESM coupled climate model

    NASA Astrophysics Data System (ADS)

    Baehr, J.; Fröhlich, K.; Botzet, M.; Domeisen, D. I. V.; Kornblueh, L.; Notz, D.; Piontek, R.; Pohlmann, H.; Tietsche, S.; Müller, W. A.

    2015-05-01

    A seasonal forecast system is presented, based on the global coupled climate model MPI-ESM as used for CMIP5 simulations. We describe the initialisation of the system and analyse its predictive skill for surface temperature. The presented system is initialised in the atmospheric, oceanic, and sea ice component of the model from reanalysis/observations with full field nudging in all three components. For the initialisation of the ensemble, bred vectors with a vertically varying norm are implemented in the ocean component to generate initial perturbations. In a set of ensemble hindcast simulations, starting each May and November between 1982 and 2010, we analyse the predictive skill. Bias-corrected ensemble forecasts for each start date reproduce the observed surface temperature anomalies at 2-4 months lead time, particularly in the tropics. Niño3.4 sea surface temperature anomalies show a small root-mean-square error and predictive skill up to 6 months. Away from the tropics, predictive skill is mostly limited to the ocean, and to regions which are strongly influenced by ENSO teleconnections. In summary, the presented seasonal prediction system based on a coupled climate model shows predictive skill for surface temperature at seasonal time scales comparable to other seasonal prediction systems using different underlying models and initialisation strategies. As the same model underlying our seasonal prediction system—with a different initialisation—is presently also used for decadal predictions, this is an important step towards seamless seasonal-to-decadal climate predictions.
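
    Breeding can be illustrated with a toy system. The sketch below grows a bred vector in the Lorenz-63 model by repeatedly integrating a control run and a perturbed run, then rescaling their difference to a fixed norm; this is only a conceptual analog of the vertically varying norm used in the ocean component above.

```python
# A toy breeding cycle on the Lorenz-63 system; the rescaled perturbation
# converges toward a bred vector (a fast-growing error direction).
import numpy as np

def lorenz63(x, s=10.0, r=28.0, b=8.0 / 3.0):
    return np.array([s * (x[1] - x[0]),
                     x[0] * (r - x[2]) - x[1],
                     x[0] * x[1] - b * x[2]])

def step(x, dt=0.01):                       # simple RK4 integrator
    k1 = lorenz63(x); k2 = lorenz63(x + 0.5 * dt * k1)
    k3 = lorenz63(x + 0.5 * dt * k2); k4 = lorenz63(x + dt * k3)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

norm0 = 1e-3
ctrl = np.array([1.0, 1.0, 20.0])
pert = ctrl + norm0 * np.array([1.0, 0.0, 0.0])
for cycle in range(100):
    for _ in range(10):                     # breed over a short window
        ctrl, pert = step(ctrl), step(pert)
    diff = pert - ctrl
    diff *= norm0 / np.linalg.norm(diff)    # rescale to the initial norm
    pert = ctrl + diff                      # re-seed the perturbed run
print("bred vector direction:", diff / np.linalg.norm(diff))
```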

  8. Strongly coupling a cavity to inhomogeneous ensembles of emitters: Potential for long-lived solid-state quantum memories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Diniz, I.; Portolan, S.; Auffeves, A.

    2011-12-15

    We investigate theoretically the coupling of a cavity mode to a continuous distribution of emitters. We discuss the influence of the emitters' inhomogeneous broadening on the existence and on the coherence properties of the polaritonic peaks. We find that their coherence depends crucially on the shape of the distribution and not only on its width. Under certain conditions the coupling to the cavity protects the polaritonic states from inhomogeneous broadening, resulting in a longer storage time for a quantum memory based on emitter ensembles. When two different ensembles of emitters are coupled to the resonator, they support a peculiar collective dark state, which is also very attractive for the storage of quantum information.

  9. NWS Operational Requirements for Ensemble-Based Hydrologic Forecasts

    NASA Astrophysics Data System (ADS)

    Hartman, R. K.

    2008-12-01

    Ensemble-based hydrologic forecasts have been developed and issued by National Weather Service (NWS) staff at River Forecast Centers (RFCs) for many years. Used principally for long-range water supply forecasts, only the uncertainty associated with weather and climate have been traditionally considered. As technology and societal expectations of resource managers increase, the use and desire for risk-based decision support tools has also increased. These tools require forecast information that includes reliable uncertainty estimates across all time and space domains. The development of reliable uncertainty estimates associated with hydrologic forecasts is being actively pursued within the United States and internationally. This presentation will describe the challenges, components, and requirements for operational hydrologic ensemble-based forecasts from the perspective of a NOAA/NWS River Forecast Center.

  10. ms2: A molecular simulation tool for thermodynamic properties

    NASA Astrophysics Data System (ADS)

    Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran

    2011-11-01

    This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for a fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 has been shown for a large variety of fluids in preceding work.
    Program summary
    Program title: ms2
    Catalogue identifier: AEJF_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Special Licence supplied by the authors
    No. of lines in distributed program, including test data, etc.: 82 794
    No. of bytes in distributed program, including test data, etc.: 793 705
    Distribution format: tar.gz
    Programming language: Fortran90
    Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.)
    Operating system: Unix/Linux, Windows
    Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol
    Scalability: Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations.
    RAM: ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules.
    Classification: 7.7, 7.9, 12
    External routines: Message Passing Interface (MPI)
    Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures.
    Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism.
    Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less.
    Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories.
    Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de.
    Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours; calculating transport properties takes between six and 24 hours.
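
    As a conceptual aside on the Green-Kubo formalism named above, the sketch below estimates a self-diffusion coefficient as the time integral of a velocity autocorrelation function. The trajectory is synthetic and the units arbitrary, so this only shows the shape of the computation, not ms2's Fortran implementation.

```python
# A minimal Green-Kubo sketch: D = (1/3) * integral of <v(0).v(t)> dt,
# computed here from a synthetic, loosely correlated velocity signal.
import numpy as np

rng = np.random.default_rng(3)
dt = 1e-3                                   # time step (arbitrary units)
v = rng.normal(size=(5000, 3))
v = np.cumsum(v, axis=0) * 0.01             # crude correlated toy signal

nlag = 500
vacf = np.array([np.mean(np.sum(v[:len(v) - k] * v[k:], axis=1))
                 for k in range(nlag)])     # <v(0).v(t)> averaged over origins
D = np.trapz(vacf, dx=dt) / 3.0             # Green-Kubo time integral
print(f"estimated D = {D:.4g}")
```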

  11. Axial-vector form factors of the nucleon from lattice QCD

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gupta, Rajan; Jang, Yong-Chull; Lin, Huey-Wen

    In this paper, we present results for the form factors of the isovector axial vector current in the nucleon state using large scale simulations of lattice QCD. The calculations were done using eight ensembles of gauge configurations generated by the MILC collaboration using the HISQ action with 2 + 1 + 1 dynamical flavors. These ensembles span three lattice spacings a ≈ 0.06, 0.09, and 0.12 fm and light-quark masses corresponding to the pion masses M_π ≈ 135, 225, and 310 MeV. High-statistics estimates allow us to quantify systematic uncertainties in the extraction of G_A(Q^2) and the induced pseudoscalar form factor G_P(Q^2). We perform a simultaneous extrapolation in the lattice spacing, lattice volume and light-quark masses of the axial charge radius r_A data to obtain physical estimates. Using the dipole ansatz to fit the Q^2 behavior we obtain r_A|dipole = 0.49(3) fm, which corresponds to M_A = 1.39(9) GeV, and is consistent with M_A = 1.35(17) GeV obtained by the MiniBooNE collaboration. The estimate obtained using the z-expansion is r_A|z-expansion = 0.46(6) fm, and the combined result is r_A|combined = 0.48(4) fm. Analysis of the induced pseudoscalar form factor G_P(Q^2) yields low estimates for g*_P and g_πNN compared to their phenomenological values. To understand these, we analyze the partially conserved axial current (PCAC) relation by also calculating the pseudoscalar form factor. Lastly, we find that these low values are due to large deviations in the PCAC relation between the three form factors, and in the pion-pole dominance hypothesis.
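
    The dipole-ansatz step can be reproduced schematically: fit G_A(Q^2) = g_A / (1 + Q^2/M_A^2)^2 and convert the fitted axial mass to a radius via <r_A^2> = 12/M_A^2. The data below are synthetic, generated to be consistent with the quoted M_A, not lattice results.

```python
# A hedged sketch of a dipole fit to synthetic axial form-factor data,
# followed by the standard conversion M_A -> r_A (hbar*c = 0.19733 GeV fm).
import numpy as np
from scipy.optimize import curve_fit

HBARC = 0.19733                              # GeV fm

def dipole(Q2, gA, MA):
    return gA / (1.0 + Q2 / MA ** 2) ** 2

Q2 = np.linspace(0.05, 1.0, 10)              # GeV^2, toy momentum transfers
rng = np.random.default_rng(4)
data = dipole(Q2, 1.27, 1.39) * (1.0 + 0.02 * rng.normal(size=Q2.size))

(gA, MA), _ = curve_fit(dipole, Q2, data, p0=(1.2, 1.0))
rA = np.sqrt(12.0) * HBARC / MA              # axial radius in fm
print(f"g_A = {gA:.3f}, M_A = {MA:.2f} GeV, r_A = {rA:.2f} fm")  # ~0.49 fm
```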

  12. Axial-vector form factors of the nucleon from lattice QCD

    DOE PAGES

    Gupta, Rajan; Jang, Yong-Chull; Lin, Huey-Wen; ...

    2017-12-04

    In this paper, we present results for the form factors of the isovector axial vector current in the nucleon state using large scale simulations of lattice QCD. The calculations were done using eight ensembles of gauge configurations generated by the MILC collaboration using the HISQ action with 2 + 1 + 1 dynamical flavors. These ensembles span three lattice spacings a ≈ 0.06, 0.09, and 0.12 fm and light-quark masses corresponding to the pion masses M_π ≈ 135, 225, and 310 MeV. High-statistics estimates allow us to quantify systematic uncertainties in the extraction of G_A(Q^2) and the induced pseudoscalar form factor G_P(Q^2). We perform a simultaneous extrapolation in the lattice spacing, lattice volume and light-quark masses of the axial charge radius r_A data to obtain physical estimates. Using the dipole ansatz to fit the Q^2 behavior we obtain r_A|dipole = 0.49(3) fm, which corresponds to M_A = 1.39(9) GeV, and is consistent with M_A = 1.35(17) GeV obtained by the MiniBooNE collaboration. The estimate obtained using the z-expansion is r_A|z-expansion = 0.46(6) fm, and the combined result is r_A|combined = 0.48(4) fm. Analysis of the induced pseudoscalar form factor G_P(Q^2) yields low estimates for g*_P and g_πNN compared to their phenomenological values. To understand these, we analyze the partially conserved axial current (PCAC) relation by also calculating the pseudoscalar form factor. Lastly, we find that these low values are due to large deviations in the PCAC relation between the three form factors, and in the pion-pole dominance hypothesis.

  13. LiCABEDS II. Modeling of ligand selectivity for G-protein-coupled cannabinoid receptors.

    PubMed

    Ma, Chao; Wang, Lirong; Yang, Peng; Myint, Kyaw Z; Xie, Xiang-Qun

    2013-01-28

    The cannabinoid receptor subtype 2 (CB2) is a promising therapeutic target for blood cancer, pain relief, osteoporosis, and immune system disease. The recent withdrawal of Rimonabant, which targets another closely related cannabinoid receptor (CB1), accentuates the importance of selectivity for the development of CB2 ligands in order to minimize their effects on the CB1 receptor. In our previous study, LiCABEDS (Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps) was reported as a generic ligand classification algorithm for the prediction of categorical molecular properties. Here, we report extension of the application of LiCABEDS to the modeling of cannabinoid ligand selectivity with molecular fingerprints as descriptors. The performance of LiCABEDS was systematically compared with another popular classification algorithm, support vector machine (SVM), according to prediction precision and recall rate. In addition, the examination of LiCABEDS models revealed the difference in structure diversity of CB1 and CB2 selective ligands. The structure determination from data mining could be useful for the design of novel cannabinoid lead compounds. More importantly, the potential of LiCABEDS was demonstrated through successful identification of newly synthesized CB2 selective compounds.
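
    As a loose analog of the comparison reported here, the sketch below pits boosted decision stumps (the model family behind LiCABEDS) against an SVM on binary fingerprint-style vectors. The data and labels are random placeholders, not cannabinoid ligand data.

```python
# A minimal comparison of adaptively boosted decision stumps versus an SVM
# on toy binary fingerprint vectors, scored by cross-validated accuracy.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.integers(0, 2, (300, 256)).astype(float)  # toy fingerprint bits
y = rng.integers(0, 2, 300)                       # toy selectivity labels

stumps = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                            n_estimators=200)     # boosted decision stumps
svm = SVC(kernel="rbf", gamma="scale")
for name, clf in [("boosted stumps", stumps), ("SVM", svm)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: CV accuracy = {acc:.2f}")
```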

  14. A novel method for in silico identification of regulatory SNPs in human genome.

    PubMed

    Li, Rong; Zhong, Dexing; Liu, Ruiling; Lv, Hongqiang; Zhang, Xinman; Liu, Jun; Han, Jiuqiang

    2017-02-21

    Regulatory single nucleotide polymorphisms (rSNPs), a class of functional noncoding genetic variants, can affect gene expression in a regulatory way, and they are thought to be associated with increased susceptibility to complex diseases. Here a novel computational approach to identify potential rSNPs is presented. Unlike most other rSNP-finding methods, which rest on the hypothesis that SNPs causing large allele-specific changes in transcription factor binding affinities are more likely to play regulatory roles, we use a set of documented, experimentally verified rSNPs and nonfunctional background SNPs to train classifiers, so that discriminating features are learned from the data. To characterize variants, an extensive range of characteristics, such as sequence context, DNA structure, and evolutionary conservation, is analyzed. A support vector machine is adopted to build the classifier model, together with an ensemble method to deal with the unbalanced data. 10-fold cross-validation shows that our method achieves a sensitivity of ~78% and a specificity of ~82%. Furthermore, our method performs better at handling false positives than other algorithms based on the aforementioned hypothesis. The original data and the source MATLAB code are available at https://sourceforge.net/projects/rsnppredict/. Copyright © 2016 Elsevier Ltd. All rights reserved.
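
    The ensemble-plus-SVM treatment of unbalanced data admits a simple sketch: train each SVM on a balanced bootstrap subsample and combine the members by voting. This is one common scheme, not necessarily the paper's exact method, and all inputs are stand-ins.

```python
# A sketch of an SVM ensemble for unbalanced data: each member sees all
# positives plus an equal-sized random draw of negatives; votes are averaged.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 12))
y = (rng.random(1000) < 0.1).astype(int)       # ~10% positive (rSNP-like)

pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
models = []
for _ in range(11):                            # odd count for clean voting
    sub_neg = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, sub_neg])       # balanced training subsample
    models.append(SVC().fit(X[idx], y[idx]))

votes = np.mean([m.predict(X) for m in models], axis=0)
y_hat = (votes >= 0.5).astype(int)             # majority vote
print("predicted positive rate:", y_hat.mean())
```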

  15. Hybrid Disease Diagnosis Using Multiobjective Optimization with Evolutionary Parameter Optimization

    PubMed Central

    Nalluri, MadhuSudana Rao; K., Kannan; M., Manisha

    2017-01-01

    With the widespread adoption of e-Healthcare and telemedicine applications, accurate, intelligent disease diagnosis systems have been profoundly coveted. In recent years, numerous individual machine learning-based classifiers have been proposed and tested, and it is now widely accepted that a single classifier cannot effectively classify and diagnose all diseases. This has prompted a number of recent research attempts to arrive at a consensus using ensemble classification techniques. In this paper, a hybrid system is proposed that diagnoses ailments by optimizing the individual parameters of two classifier techniques, namely, the support vector machine (SVM) and the multilayer perceptron (MLP). We employ three recent evolutionary algorithms to optimize the parameters of these classifiers, leading to six alternative hybrid disease diagnosis systems, also referred to as hybrid intelligent systems (HISs). Multiple objectives, namely, prediction accuracy, sensitivity, and specificity, have been considered to assess the efficacy of the proposed hybrid systems against existing ones. The proposed model is evaluated on 11 benchmark datasets, and the obtained results demonstrate that our proposed hybrid diagnosis systems perform better in terms of disease prediction accuracy, sensitivity, and specificity. Pertinent statistical tests were carried out to substantiate the efficacy of the obtained results. PMID:29065626

  16. Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data.

    PubMed

    Yılmaz Isıkhan, Selen; Karabulut, Erdem; Alpar, Celal Reha

    2016-01-01

    Background/Aim. Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was the identification of a sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT.

  17. Predictability Experiments With the Navy Operational Global Atmospheric Prediction System

    NASA Astrophysics Data System (ADS)

    Reynolds, C. A.; Gelaro, R.; Rosmond, T. E.

    2003-12-01

    There are several areas of research in numerical weather prediction and atmospheric predictability, such as targeted observations and ensemble perturbation generation, where it is desirable to combine information about the uncertainty of the initial state with information about potential rapid perturbation growth. Singular vectors (SVs) provide a framework to accomplish this task in a mathematically rigorous and computationally feasible manner. In this study, SVs are calculated using the tangent and adjoint models of the Navy Operational Global Atmospheric Prediction System (NOGAPS). The analysis error variance information produced by the NRL Atmospheric Variational Data Assimilation System is used as the initial-time SV norm. These VAR SVs are compared to SVs for which total energy is both the initial and final time norms (TE SVs). The incorporation of analysis error variance information has a significant impact on the structure and location of the SVs. This in turn has a significant impact on targeted observing applications. The utility and implications of such experiments in assessing the analysis error variance estimates will be explored. Computing support has been provided by the Department of Defense High Performance Computing Center at the Naval Oceanographic Office Major Shared Resource Center at Stennis, Mississippi.

  18. Correlation in photon pairs generated using four-wave mixing in a cold atomic ensemble

    NASA Astrophysics Data System (ADS)

    Ferdinand, Andrew Richard; Manjavacas, Alejandro; Becerra, Francisco Elohim

    2017-04-01

    Spontaneous four-wave mixing (FWM) in atomic ensembles can be used to generate narrowband entangled photon pairs at or near atomic resonances. While extensive research has been done to investigate the quantum correlations in the time and polarization of such photon pairs, the study and control of high dimensional quantum correlations contained in their spatial degrees of freedom has not been fully explored. In our work we experimentally investigate the generation of correlated light from FWM in a cold ensemble of cesium atoms as a function of the frequencies of the pump fields in the FWM process. In addition, we theoretically study the spatial correlations of the photon pairs generated in the FWM process, specifically the joint distribution of their orbital angular momentum (OAM). We investigate the width of the distribution of the OAM modes, known as the spiral bandwidth, and the purity of OAM correlations as a function of the properties of the pump fields, collected photons, and the atomic ensemble. These studies will guide experiments involving high dimensional entanglement of photons generated from this FWM process and OAM-based quantum communication with atomic ensembles. This work is supported by AFOSR Grant FA9550-14-1-0300.

  19. Heat Stress Evaluation of Two-layer Chemical Demilitarization Ensembles with a Full Face Negative Pressure Respirator

    PubMed Central

    FLETCHER, Oclla Michele; GUERRINA, Ryan; ASHLEY, Candi D.; BERNARD, Thomas E.

    2014-01-01

    The purpose of this study was to examine the heat stress effects of three protective clothing ensembles: (1) protective apron over cloth coveralls including full face negative pressure respirator (APRON); (2) the apron over cloth coveralls with respirator plus protective pants (APRON+PANTS); and (3) protective coveralls over cloth coveralls with respirator (PROTECTIVE COVERALLS). In addition, there was a no-respirator ensemble (PROTECTIVE COVERALLS-noR), and WORK CLOTHES as a reference ensemble. Four acclimatized male participants completed a full set of five trials, and two of the participants repeated the full set. The progressive heat stress protocol was used to find the critical WBGT (WBGTcrit) and apparent total evaporative resistance (Re,T,a) at the upper limit of thermal equilibrium. The results (WBGTcrit [°C-WBGT] and Re,T,a [kPa m2 W−1]) were WORK CLOTHES (35.5, 0.0115), APRON (31.6, 0.0179), APRON+PANTS (27.7, 0.0244), PROTECTIVE COVERALLS (25.9, 0.0290), and PROTECTIVE COVERALLS-noR (26.2, 0.0296). There were significant differences among the ensembles. Supporting previous studies, there was little evidence to suggest that the respirator contributed to heat stress. PMID:24705801

  20. [Support vector machine-assisted diagnosis of human malignant gastric tissues based on dielectric properties].

    PubMed

    Zhang, Sa; Li, Zhou; Xin, Xue-Gang

    2017-12-20

    To achieve differential diagnosis of normal and malignant gastric tissues based on discrepancies in their dielectric properties using support vector machine. The dielectric properties of normal and malignant gastric tissues at frequencies ranging from 42.58 to 500 MHz were measured by the coaxial probe method, and the Cole-Cole model was used to fit the measured data. Receiver operating characteristic (ROC) curve analysis was used to evaluate the discrimination capability with respect to permittivity, conductivity, and Cole-Cole fitting parameters. A support vector machine was used for discriminating normal and malignant gastric tissues, and the discrimination accuracy was calculated using k-fold cross-validation. The area under the ROC curve was above 0.8 for permittivity at the 5 frequencies at the lower end of the measured frequency range. The combination of the support vector machine with the permittivity at all these 5 frequencies achieved the highest discrimination accuracy of 84.38% with a MATLAB runtime of 3.40 s. Support vector machine-assisted diagnosis of human malignant gastric tissues based on dielectric properties is feasible.

  1. Research on intrusion detection based on Kohonen network and support vector machine

    NASA Astrophysics Data System (ADS)

    Shuai, Chunyan; Yang, Hengcheng; Gong, Zeweiyi

    2018-05-01

    A support vector machine applied directly to a network intrusion detection system suffers from low detection accuracy and long detection time. Optimizing the SVM parameters can greatly improve detection accuracy, but the long optimization time makes direct application to high-speed networks impractical. A method based on Kohonen neural network feature selection is therefore proposed to reduce the parameter-optimization time of the support vector machine. First, the weights of the KDD99 network intrusion data are computed with a Kohonen network and features are selected by weight. Then, after feature selection is completed, a genetic algorithm (GA) and grid search are used for parameter optimization to find appropriate parameters, and classification is performed with support vector machines. Comparative experiments show that feature selection reduces the parameter-optimization time while having little influence on classification accuracy. The experiments suggest that the support vector machine can be used in network intrusion detection systems while reducing the miss rate.

  2. AESS: Accelerated Exact Stochastic Simulation

    NASA Astrophysics Data System (ADS)

    Jenkins, David D.; Peterson, Gregory D.

    2011-12-01

    The Stochastic Simulation Algorithm (SSA) developed by Gillespie provides a powerful mechanism for exploring the behavior of chemical systems with small species populations or with important noise contributions. Gene circuit simulations for systems biology commonly employ the SSA method, as do ecological applications. This algorithm tends to be computationally expensive, so researchers seek an efficient implementation of SSA. In this program package, the Accelerated Exact Stochastic Simulation Algorithm (AESS) contains optimized implementations of Gillespie's SSA that improve the performance of individual simulation runs or ensembles of simulations used for sweeping parameters or to provide statistically significant results.
    Program summary
    Program title: AESS
    Catalogue identifier: AEJW_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJW_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: University of Tennessee copyright agreement
    No. of lines in distributed program, including test data, etc.: 10 861
    No. of bytes in distributed program, including test data, etc.: 394 631
    Distribution format: tar.gz
    Programming language: C for processors, CUDA for NVIDIA GPUs
    Computer: Developed and tested on various x86 computers and NVIDIA C1060 Tesla and GTX 480 Fermi GPUs. The system targets x86 workstations, optionally with multicore processors or NVIDIA GPUs as accelerators.
    Operating system: Tested under Ubuntu Linux OS and CentOS 5.5 Linux OS
    Classification: 3, 16.12
    Nature of problem: Simulation of chemical systems, particularly with low species populations, can be accurately performed using Gillespie's method of stochastic simulation. Numerous variations on the original stochastic simulation algorithm have been developed, including approaches that produce results with statistics that exactly match the chemical master equation (CME) as well as other approaches that approximate the CME.
    Solution method: The Accelerated Exact Stochastic Simulation (AESS) tool provides implementations of a wide variety of popular variations on the Gillespie method. Users can select the specific algorithm considered most appropriate. Comparisons between the methods and with other available implementations indicate that AESS provides the fastest known implementation of Gillespie's method for a variety of test models. Users may wish to execute ensembles of simulations to sweep parameters or to obtain better statistical results, so AESS supports acceleration of ensembles of simulations using parallel processing with MPI, SSE vector units on x86 processors, and/or NVIDIA GPUs with CUDA.
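
    For reference, Gillespie's direct method itself is compact enough to sketch; AESS's optimized C/CUDA variants build on this core. The single reaction channel and rate constant below are illustrative choices.

```python
# A compact direct-method SSA for the toy bimolecular reaction A + B -> C.
import numpy as np

def ssa_direct(x, c, t_end, rng):
    """x = [A, B, C] counts; single reaction channel A + B -> C with rate c."""
    t, trace = 0.0, [(0.0, x.copy())]
    while t < t_end:
        a = c * x[0] * x[1]                 # propensity of the sole reaction
        if a <= 0.0:                        # reactants exhausted
            break
        t += rng.exponential(1.0 / a)       # time to the next reaction event
        x += np.array([-1, -1, 1])          # fire A + B -> C
        trace.append((t, x.copy()))
    return trace

rng = np.random.default_rng(7)
trace = ssa_direct(np.array([100, 80, 0]), c=0.005, t_end=10.0, rng=rng)
print("final state:", trace[-1])
```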

  3. Using multiclass classification to automate the identification of patient safety incident reports by type and severity.

    PubMed

    Wang, Ying; Coiera, Enrico; Runciman, William; Magrabi, Farah

    2017-06-12

    Approximately 10% of admissions to acute-care hospitals are associated with an adverse event. Analysis of incident reports helps to understand how and why incidents occur and can inform policy and practice for safer care. Unfortunately, our capacity to monitor and respond to incident reports in a timely manner is limited by the sheer volume of data collected. In this study, we aim to evaluate the feasibility of using multiclass classification to automate the identification of patient safety incidents in hospitals. Text-based classifiers were applied to identify 10 incident types and 4 severity levels. Using the one-versus-one (OvsO) and one-versus-all (OvsA) ensemble strategies, we evaluated regularized logistic regression, linear support vector machine (SVM) and SVM with a radial-basis function (RBF) kernel. Classifiers were trained and tested with "balanced" datasets (n_Type = 2860, n_SeverityLevel = 1160) from a state-wide incident reporting system. Testing was also undertaken with imbalanced "stratified" datasets (n_Type = 6000, n_SeverityLevel = 5950) from the state-wide system and an independent hospital reporting system. Classifier performance was evaluated using a confusion matrix, as well as F-score, precision and recall. The most effective combination was an OvsO ensemble of binary SVM RBF classifiers with binary count feature extraction. For incident type, classifiers performed well on balanced and stratified datasets (F-score: 78.3 and 73.9%), but were worse on independent datasets (68.5%). Reports about falls, medications, pressure injury, aggression and blood products were identified with high recall and precision. "Documentation" was the hardest type to identify. For severity level, the F-score for severity assessment code (SAC) 1 (extreme risk) was 87.3% and 64% for SAC4 (low risk) on balanced data. With stratified data, high recall was achieved for SAC1 (82.8-84%) but precision was poor (6.8-11.2%). High-risk incidents (SAC2) were confused with medium-risk incidents (SAC3). Binary classifier ensembles appear to be a feasible method for identifying incidents by type and severity level. Automated identification should enable safety problems to be detected and addressed in a more timely manner. Multi-label classifiers may be necessary for reports that relate to more than one incident type.
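
    The best-performing configuration reported here, a one-versus-one ensemble of RBF-kernel SVMs over binary count features, maps directly onto a short sketch. The reports and labels below are invented placeholders, not the study's data.

```python
# A minimal OvsO ensemble of RBF SVMs over binary count features,
# applied to toy incident-report text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

reports = ["patient fell near bed", "wrong medication dose given",
           "pressure injury noted", "aggressive behaviour toward staff"]
incident_type = ["falls", "medications", "pressure injury", "aggression"]

clf = make_pipeline(CountVectorizer(binary=True),       # binary count features
                    OneVsOneClassifier(SVC(kernel="rbf")))
clf.fit(reports, incident_type)
print(clf.predict(["patient given the wrong tablet"]))
```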

  4. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework.

    PubMed

    Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan

    2016-02-01

    Accuracy plays a vital role in the medical field as it concerns with the life of an individual. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results. A specific classifier may be better than others for a specific dataset, but another classifier could perform better for some other dataset. Ensemble of classifiers has been proved to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model called "HM-BagMoov" overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on five different heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The analysis of the results show that ensemble framework achieved the highest accuracy, sensitivity and F-Measure when compared with individual classifiers for all the diseases. In addition to this, the ensemble framework also achieved the highest accuracy when compared with the state of the art techniques. An application named "IntelliHealth" is also developed based on proposed model that may be used by hospitals/doctors for diagnostic advice. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation.

    PubMed

    Ruffier, Magali; Kähäri, Andreas; Komorowska, Monika; Keenan, Stephen; Laird, Matthew; Longden, Ian; Proctor, Glenn; Searle, Steve; Staines, Daniel; Taylor, Kieron; Vullo, Alessandro; Yates, Andrew; Zerbino, Daniel; Flicek, Paul

    2017-01-01

    The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl 'Core' database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Initially intended to provide annotation for the reference human genome, we have extended our framework to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework storing a large diversity of genome annotations in one location serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list ( http://www.ensembl.org/info/about/contact/index.html ). http://www.ensembl.org. © The Author(s) 2017. Published by Oxford University Press.

  6. Reduction of predictive uncertainty in estimating irrigation water requirement through multi-model ensembles and ensemble averaging

    NASA Astrophysics Data System (ADS)

    Multsch, S.; Exbrayat, J.-F.; Kirby, M.; Viney, N. R.; Frede, H.-G.; Breuer, L.

    2015-04-01

    Irrigation agriculture plays an increasingly important role in food supply. Many evapotranspiration models are used today to estimate the water demand for irrigation. They consider different stages of crop growth by empirical crop coefficients to adapt evapotranspiration throughout the vegetation period. We investigate the importance of model structural versus model parametric uncertainty for irrigation simulations by considering six evapotranspiration models and five crop coefficient sets to estimate irrigation water requirements for growing wheat in the Murray-Darling Basin, Australia. The study is carried out using the spatial decision support system SPARE:WATER. We find that structural uncertainty among the reference evapotranspiration models is far more important than the parametric uncertainty introduced by the crop coefficients, which are used to estimate irrigation water requirement following the single crop coefficient approach. Using the reliability ensemble averaging (REA) technique, we are able to reduce the overall predictive model uncertainty by more than 10%. The exceedance probability curve of irrigation water requirements shows that a given threshold, e.g. an irrigation water limit of 400 mm due to water rights, would be exceeded less frequently under the REA ensemble average (45%) than under the equally weighted ensemble average (66%). We conclude that multi-model ensemble predictions and sophisticated model averaging techniques are helpful in predicting irrigation demand and provide relevant information for decision making.
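
    A simplified REA-style average can be sketched as follows: weight each member by a performance factor (closeness to a reference) times a convergence factor (closeness to the weighted consensus), iterating because the consensus depends on the weights. The numbers are synthetic and the factors are simplified relative to the full REA method of Giorgi and Mearns.

```python
# An illustrative, simplified REA-style weighted ensemble average.
import numpy as np

rng = np.random.default_rng(8)
members = rng.normal(500.0, 40.0, size=6)   # toy irrigation requirements, mm
reference = 505.0                           # toy observed/reference estimate
eps = 1.0                                   # guard against division by zero

perf = 1.0 / (np.abs(members - reference) + eps)   # performance factor
consensus = members.mean()
for _ in range(20):                         # weights and consensus interact
    conv = 1.0 / (np.abs(members - consensus) + eps)  # convergence factor
    w = perf * conv
    w /= w.sum()
    consensus = np.sum(w * members)
print(f"REA-style ensemble average = {consensus:.1f} mm")
```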

  7. A Power Transformers Fault Diagnosis Model Based on Three DGA Ratios and PSO Optimization SVM

    NASA Astrophysics Data System (ADS)

    Ma, Hongzhe; Zhang, Wei; Wu, Rongrong; Yang, Chunyan

    2018-03-01

    To make up for the shortcomings of existing transformer fault diagnosis methods in dissolved gas-in-oil analysis (DGA) feature selection and parameter optimization, a transformer fault diagnosis model based on three DGA ratios and a particle swarm optimization (PSO)-optimized support vector machine (SVM) is proposed. The support vector machine is extended to a nonlinear, multi-class SVM, particle swarm optimization is established to optimize the SVM multi-classification model, and transformer fault diagnosis is conducted in combination with the cross-validation principle. The fault diagnosis results show that the average accuracy of the proposed method is better than that of the standard support vector machine and the genetic algorithm support vector machine, which demonstrates that the proposed method can effectively improve the accuracy of transformer fault diagnosis.
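
    A bare-bones version of the PSO-over-SVM step might look like the following, with a random stand-in for the three-ratio DGA feature matrix; the swarm constants and search ranges are ad hoc choices, not the paper's settings.

```python
# A minimal PSO loop tuning (C, gamma) of an RBF SVM by cross-validated accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(9)
X = rng.normal(size=(150, 3))               # toy three-ratio DGA features
y = rng.integers(0, 4, 150)                 # toy fault classes

def fitness(p):                             # p = (log10 C, log10 gamma)
    clf = SVC(C=10 ** p[0], gamma=10 ** p[1])
    return cross_val_score(clf, X, y, cv=3).mean()

n, w, c1, c2 = 10, 0.7, 1.5, 1.5            # swarm size and PSO constants
pos = rng.uniform(-2, 2, (n, 2)); vel = np.zeros((n, 2))
pbest, pfit = pos.copy(), np.array([fitness(p) for p in pos])
for _ in range(15):
    g = pbest[pfit.argmax()]                # global best particle
    vel = w * vel + c1 * rng.random((n, 2)) * (pbest - pos) \
                  + c2 * rng.random((n, 2)) * (g - pos)
    pos = np.clip(pos + vel, -3, 3)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pfit
    pbest[better], pfit[better] = pos[better], fit[better]
print("best (log10 C, log10 gamma):", pbest[pfit.argmax()], pfit.max())
```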

  8. GenomeHubs: simple containerized setup of a custom Ensembl database and web server for any species

    PubMed Central

    Kumar, Sujai; Stevens, Lewis; Blaxter, Mark

    2017-01-01

    Abstract As the generation and use of genomic datasets is becoming increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations and support export to flat files, including EMBL format for submission of assemblies and annotations to International Nucleotide Sequence Database Collaboration. Database URL: http://GenomeHubs.org PMID:28605774

  9. Multi-criterion model ensemble of CMIP5 surface air temperature over China

    NASA Astrophysics Data System (ADS)

    Yang, Tiantian; Tao, Yumeng; Li, Jingjing; Zhu, Qian; Su, Lu; He, Xiaojia; Zhang, Xiaoming

    2018-05-01

    The global circulation models (GCMs) are useful tools for simulating climate change, projecting future temperature changes, and therefore supporting the preparation of national climate adaptation plans. However, different GCMs are not always in agreement with each other over various regions, because GCMs' configurations, module characteristics, and dynamic forcings vary from one to another. Model ensemble techniques are extensively used to post-process the outputs from GCMs and improve the variability of model outputs. Root-mean-square error (RMSE), correlation coefficient (CC, or R) and uncertainty are commonly used statistics for evaluating the performances of GCMs. However, simultaneously satisfactory values of all these statistics cannot be guaranteed with many model ensemble techniques. In this paper, we propose a multi-model ensemble framework, using a state-of-the-art evolutionary multi-objective optimization algorithm (termed MOSPD), to evaluate different characteristics of ensemble candidates and to provide comprehensive trade-off information for different model ensemble solutions. A case study of optimizing the surface air temperature (SAT) ensemble solutions over different geographical regions of China is carried out. The data cover the period from 1900 to 2100, and the projections of SAT are analyzed with regard to three different statistical indices (i.e., RMSE, CC, and uncertainty). Among the derived ensemble solutions, the trade-off information is further analyzed with a robust Pareto front with respect to the different statistics. The comparison results over the historical period (1900-2005) show that the optimized solutions are superior to those obtained by simple model averaging, as well as to any single GCM output. The improvements in the statistics vary across the climatic regions of China. Future projections (2006-2100) with the proposed ensemble method identify that the largest (smallest) temperature changes will happen in South Central China (Inner Mongolia), Northeastern China (South Central China), and Northwestern China (South Central China) under the RCP 2.6, RCP 4.5, and RCP 8.5 scenarios, respectively.

  10. Design of an Evolutionary Approach for Intrusion Detection

    PubMed Central

    2013-01-01

    A novel evolutionary approach is proposed for effective intrusion detection based on benchmark datasets. The proposed approach can generate a pool of noninferior individual solutions and ensemble solutions thereof, and the generated ensembles can be used to detect intrusions accurately. For the intrusion detection problem, the proposed approach considers multiple conflicting objectives simultaneously, such as the detection rate of each attack class, error rate, accuracy, and diversity, and it can generate a pool of noninferior solutions, and ensembles thereof, with optimized trade-offs among these objectives. The approach proceeds in three phases. In the first phase, solutions are generated with a simple chromosome design and a Pareto front of noninferior individual solutions is approximated. In the second phase, the entire solution set is further refined to determine effective ensemble solutions considering solution interaction; here another, improved Pareto front of ensemble solutions over that of individual solutions is approximated. The ensemble solutions in the improved Pareto front show improved detection results on benchmark datasets for intrusion detection. In the third phase, a combination method, such as majority voting, is used to fuse the predictions of the individual solutions into the prediction of the ensemble solution. Benchmark datasets, namely, KDD Cup 1999 and the ISCX 2012 dataset, are used to demonstrate and validate the performance of the proposed approach for intrusion detection. The proposed approach can discover individual solutions, and ensembles thereof, with good support and detection rates on benchmark datasets (in comparison with well-known ensemble methods like bagging and boosting). In addition, the proposed approach is a generalized classification approach applicable to any problem domain with multiple conflicting objectives whose dataset can be represented as labelled instances in terms of its features. PMID:24376390

  11. Extending Climate Analytics as a Service to the Earth System Grid Federation Progress Report on the Reanalysis Ensemble Service

    NASA Astrophysics Data System (ADS)

    Tamkin, G.; Schnase, J. L.; Duffy, D.; Li, J.; Strong, S.; Thompson, J. H.

    2016-12-01

    We are extending climate analytics-as-a-service, including: (1) A high-performance Virtual Real-Time Analytics Testbed supporting six major reanalysis data sets using advanced technologies like the Cloudera Impala-based SQL and Hadoop-based MapReduce analytics over native NetCDF files. (2) A Reanalysis Ensemble Service (RES) that offers a basic set of commonly used operations over the reanalysis collections that are accessible through NASA's climate data analytics Web services and our client-side Climate Data Services Python library, CDSlib. (3) An Open Geospatial Consortium (OGC) WPS-compliant Web service interface to CDSlib to accommodate ESGF's Web service endpoints. This presentation will report on the overall progress of this effort, with special attention to recent enhancements that have been made to the Reanalysis Ensemble Service, including the following:
    - A CDSlib Python library that supports full temporal, spatial, and grid-based resolution services
    - A new reanalysis collections reference model to enable operator design and implementation
    - An enhanced library of sample queries to demonstrate and develop use case scenarios
    - Extended operators that enable single- and multiple-reanalysis area average, vertical average, re-gridding, and trend, climatology, and anomaly computations
    - Full support for the MERRA-2 reanalysis and the initial integration of two additional reanalyses
    - A prototype Jupyter notebook-based distribution mechanism that combines CDSlib documentation with interactive use case scenarios and personalized project management
    - Prototyped uncertainty quantification services that combine ensemble products with comparative observational products
    - Convenient, one-stop shopping for commonly used data products from multiple reanalyses, including basic subsetting and arithmetic operations over the data and extraction of trends, climatologies, and anomalies
    - The ability to compute and visualize multiple-reanalysis intercomparisons

  12. New Developments in the Data Assimilation Research Testbed

    NASA Astrophysics Data System (ADS)

    Hoar, T. J.; Anderson, J. L.; Raeder, K.; Karspeck, A. R.; Romine, G.; Liu, H.; Collins, N.

    2011-12-01

    NCAR's Data Assimilation Research Testbed (DART) is a community facility that provides ensemble data assimilation tools for geophysical applications. DART works with an expanding set of models and a wide range of conventional and novel observations, and provides a variety of assimilation algorithms and diagnostic tools. The Kodiak release of DART became available in July 2011 and includes more than 20 major feature enhancements, support for 24 models, support for (at least) 14 observation formats, expanded documentation and diagnostic tools, and 12 new utilities. A few examples of research projects that demonstrate the effectiveness and flexibility of DART are described. The Community Atmosphere Model (CAM) and DART assimilated all the observations that were used in the NCEP/NCAR Reanalysis to produce a global, 6-hourly, 80-member ensemble reanalysis for 1998 through the present. The dataset is ideal for research applications that would benefit from an ensemble of equally-likely atmospheric states that are consistent with observations. Individual ensemble members may be used as a "data atmosphere" in any Community Earth System Model (CESM) experiment. The CESM interfaces for the Parallel Ocean Program (POP) and the Community Land Model (CLM) also support multiple instances, allowing data assimilation experiments exploiting unique atmospheric forcing for each POP or CLM model instance. A multi-year DART ocean assimilation has been completed and provides valuable insight into the successes and challenges of oceanic data assimilation. The DART/CLM research focuses on snow cover fraction and snow depth. The Weather Research and Forecasting (WRF) model was used with DART to perform a real-time CONUS domain mesoscale ensemble analysis with continuous cycling for 47 days. A member was selected once daily for high-resolution convective forecasts supporting a test phase of the Deep Convective Clouds and Chemistry experiment and the Storm Prediction Center spring experiment. The impacts of Moderate Resolution Imaging Spectroradiometer (MODIS) infrared and Advanced Microwave Scanning Radiometer (AMSR) microwave total precipitable water (TPW) observations on analyses and forecasts of tropical cyclone Sinlaku (2008) are investigated by performing assimilations with a 45 km resolution WRF model over the Western Pacific domain for 8-14 September 2008. Particular emphasis is on the performance of the assimilation algorithms in the hurricane core and the impact of novel observations in the hurricane core.

  13. The Design of a Templated C++ Small Vector Class for Numerical Computing

    NASA Technical Reports Server (NTRS)

    Moran, Patrick J.

    2000-01-01

    We describe the design and implementation of a templated C++ class for vectors. The vector class is templated both for vector length and vector component type; the vector length is fixed at template instantiation time. The vector implementation is such that for a vector of N components of type T, the total number of bytes required by the vector is equal to N * sizeof(T), where sizeof is the built-in C operator. The property of having a size no bigger than that required by the components themselves is key in many numerical computing applications, where one may allocate very large arrays of small, fixed-length vectors. In addition to the design trade-offs motivating our fixed-length vector design choice, we review some of the C++ template features essential to an efficient, succinct implementation. In particular, we highlight some of the standard C++ features, such as partial template specialization, that are not currently supported by all compilers. This report provides an inventory listing the relevant support currently provided by some key compilers, as well as test code one can use to verify compiler capabilities.

  14. An Efficient Wait-Free Vector

    DOE PAGES

    Feldman, Steven; Valera-Leon, Carlos; Dechev, Damian

    2016-03-01

    The vector is a fundamental data structure, which provides constant-time access to a dynamically-resizable range of elements. Currently, there exist no wait-free vectors. The only non-blocking version supports only a subset of the sequential vector API and exhibits significant synchronization overhead caused by supporting opposing operations. Since many applications operate in phases of execution, wherein each phase only a subset of operations is used, this overhead is unnecessary for the majority of the application. To address the limitations of the non-blocking version, we present a new design that is wait-free, supports more of the operations provided by the sequential vector, and provides alternative implementations of key operations. These alternatives allow the developer to balance the performance and functionality of the vector as requirements change throughout execution. Compared to the known non-blocking version and the concurrent vector found in Intel's TBB library, our design outperforms or provides comparable performance in the majority of tested scenarios. Over all tested scenarios, the presented design performs an average of 4.97 times more operations per second than the non-blocking vector and 1.54 times more than the TBB vector. In a scenario designed to simulate the filling of a vector, the performance improvement increases to 13.38 and 1.16 times, respectively. This work presents the first ABA-free non-blocking vector. Finally, unlike the other non-blocking approach, all operations are wait-free and bounds-checked, and elements are stored contiguously in memory.

  15. Advanced Atmospheric Ensemble Modeling Techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buckley, R.; Chiswell, S.; Kurzeja, R.

    Ensemble modeling (EM), the creation of multiple atmospheric simulations for a given time period, has become an essential tool for characterizing uncertainties in model predictions. We explore two novel ensemble modeling techniques: (1) perturbation of model parameters (Adaptive Programming, AP), and (2) data assimilation (Ensemble Kalman Filter, EnKF). The current research extends work from last year and examines transport on a small spatial scale (<100 km) in complex terrain, for more rigorous testing of the ensemble technique. Two different release cases were studied: a coastal release (SF6) and an inland release (Freon) which consisted of two release times. Observations of tracer concentration and meteorology are used to judge the ensemble results. In addition, adaptive grid techniques have been developed to reduce the computing resources required for transport calculations. Using a 20-member ensemble, the standard approach generated downwind transport that was quantitatively good for both releases; however, the EnKF method produced additional improvement for the coastal release, where spatial and temporal differences due to interior valley heating led to inland movement of the plume. The AP technique showed improvements for both release cases, with more improvement in the inland release. This research demonstrated that transport accuracy can be improved when models are adapted to a particular location and time or when important local data are assimilated into the simulation, and it enhances SRNL's capability in atmospheric transport modeling in support of its current customer base and local site missions, as well as our ability to attract new customers within the intelligence community.
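
    For readers unfamiliar with the EnKF step used above, the following is a minimal, generic sketch of a stochastic ensemble Kalman filter analysis update, not SRNL's implementation; the state size, ensemble size, and observation operator are illustrative assumptions.

      import numpy as np

      def enkf_update(X, y, H, R, rng):
          """Stochastic EnKF analysis step.
          X: (n_state, n_ens) forecast ensemble; y: (n_obs,) observations;
          H: (n_obs, n_state) observation operator; R: (n_obs, n_obs) obs-error covariance."""
          n_ens = X.shape[1]
          A = X - X.mean(axis=1, keepdims=True)                # ensemble anomalies
          HA = H @ A
          P_yy = HA @ HA.T / (n_ens - 1) + R                   # innovation covariance
          K = (A @ HA.T / (n_ens - 1)) @ np.linalg.inv(P_yy)   # Kalman gain
          # Perturb observations so the analysis spread stays statistically consistent.
          Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
          return X + K @ (Y - H @ X)

      rng = np.random.default_rng(0)
      X = rng.normal(size=(10, 20))        # 10 state variables, 20 members
      H = np.eye(3, 10)                    # observe the first 3 variables
      Xa = enkf_update(X, rng.normal(size=3), H, 0.1 * np.eye(3), rng)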

  16. SIMULATION OF THE ICELAND VOLCANIC ERUPTION OF APRIL 2010 USING THE ENSEMBLE SYSTEM

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buckley, R.

    2011-05-10

    The Eyjafjallajokull volcanic eruption in Iceland in April 2010 disrupted transportation in Europe, which ultimately affected travel plans for many on a global basis. The Volcanic Ash Advisory Centres (VAACs) are responsible for providing guidance to the aviation industry on the transport of volcanic ash clouds. There are nine such centers located globally, and the London branch (headed by the United Kingdom Meteorological Office, or UKMet) was responsible for modeling the Iceland volcano. The guidance provided by the VAAC created some controversy due to the burdensome travel restrictions and the uncertainty involved in the prediction of ash transport. The Iceland volcanic eruption provides a useful exercise for the European ENSEMBLE program, coordinated by the Joint Research Centre (JRC) in Ispra, Italy. ENSEMBLE, a decision support system for emergency response, uses transport model results from a variety of countries in an effort to better understand the uncertainty involved with a given accident scenario. Model results in the form of airborne concentration and surface deposition are required from each member of the ensemble in a prescribed format that may then be uploaded to a website for manipulation. The Savannah River National Laboratory (SRNL) has been the lone regular United States participant throughout the 10-year existence of ENSEMBLE. For the Iceland volcano, four separate source term estimates have been provided to ENSEMBLE participants. This paper focuses on only one of those source terms. The SRNL results in relation to other modeling agencies' results, along with useful information obtained using an ensemble of transport results, will be discussed.

  17. Understanding the Structural Ensembles of a Highly Extended Disordered Protein†

    PubMed Central

    Daughdrill, Gary W.; Kashtanov, Stepan; Stancik, Amber; Hill, Shannon E.; Helms, Gregory; Muschol, Martin

    2013-01-01

    Developing a comprehensive description of the equilibrium structural ensembles of intrinsically disordered proteins (IDPs) is essential to understanding their function. The p53 transactivation domain (p53TAD) is an IDP that interacts with multiple protein partners and contains numerous phosphorylation sites. Multiple techniques were used to investigate the equilibrium structural ensemble of p53TAD in its native and chemically unfolded states. The results from these experiments show that the native state of p53TAD has dimensions similar to a classical random coil, while the chemically unfolded state is more extended. To investigate the molecular properties responsible for this behavior, a novel algorithm that generates diverse and unbiased structural ensembles of IDPs was developed. This algorithm was used to generate a large pool of plausible p53TAD structures that were reweighted to identify a subset of structures with the best fit to small-angle X-ray scattering data. High-weight structures in the native-state ensemble show features that are localized to protein binding sites and regions with high proline content. The features localized to the protein binding sites are mostly eliminated in the chemically unfolded ensemble, while the regions with high proline content remain relatively unaffected. Data from NMR experiments support these results, showing that residues from the protein binding sites experience larger environmental changes upon unfolding by urea than regions with high proline content. This behavior is consistent with the urea-induced exposure of nonpolar and aromatic side-chains in the protein binding sites that are partially excluded from solvent in the native-state ensemble. PMID:21979461

  18. Cooperative emission of light by an ensemble of dipoles near a metal nanoparticle: the plasmonic Dicke effect.

    PubMed

    Pustovit, Vitaliy N; Shahbazyan, Tigran V

    2009-02-20

    We identify a new mechanism for cooperative emission of light by an ensemble of N dipoles near a metal nanostructure supporting a surface plasmon. The cross talk between emitters due to the virtual plasmon exchange leads to the formation of three plasmonic superradiant modes whose radiative decay rates scale with N, while the total radiated energy is thrice that of a single emitter. Our numerical simulations indicate that the plasmonic Dicke effect survives nonradiative losses in the metal.

  19. A Self-Organizing Map-Based Approach to Generating Reduced-Size, Statistically Similar Climate Datasets

    NASA Astrophysics Data System (ADS)

    Cabell, R.; Delle Monache, L.; Alessandrini, S.; Rodriguez, L.

    2015-12-01

    Climate-based studies require large amounts of data in order to produce accurate and reliable results. Many of these studies have used 30-plus-year data sets in order to produce stable and high-quality results, and as a result, many such data sets are available, generally in the form of global reanalyses. While the analysis of these data leads to high-fidelity results, processing them can be very computationally expensive. This computational burden prevents the utilization of these data sets for certain applications, e.g., when rapid response is needed in crisis management and disaster planning scenarios resulting from the release of toxic material into the atmosphere. We have developed a methodology to reduce large climate datasets to more manageable sizes while retaining statistically similar results when used to produce ensembles of possible outcomes. We do this by employing a Self-Organizing Map (SOM) algorithm to analyze general patterns of meteorological fields over a regional domain of interest and produce a small set of "typical days" with which to generate the model ensemble. The SOM algorithm takes as input a set of vectors and generates a 2D map of representative vectors deemed most similar to the input set and to each other. Input predictors are selected that are correlated with the model output, which in our case is an Atmospheric Transport and Dispersion (T&D) model that is highly dependent on surface winds and boundary layer depth. To choose a subset of "typical days," each input day is assigned to its closest SOM map node vector and then ranked by distance. Each node vector is treated as a distribution and days are sampled from them by percentile. Using a 30-node SOM, with sampling every 20th percentile, we have been able to reduce 30 years of the Climate Forecast System Reanalysis (CFSR) data for the month of October to 150 "typical days." To estimate the skill of this approach, the "Measure of Effectiveness" (MOE) metric is used to compare the area and overlap of statistical exceedance between the reduced data set and the full 30-year CFSR dataset. Using the MOE, we find that our SOM-derived climate subset produces statistics that fall within 85-90% overlap of the full set while using only 15% of the total data length, and consequently, 15% of the computational time required to run the T&D model for the full period.
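
    As a rough sketch of the "typical days" selection described above, assuming the third-party minisom package, a hypothetical input file, and one reading of "sampling every 20th percentile" (taking the 10th, 30th, 50th, 70th, and 90th percentile members of each node):

      import numpy as np
      from minisom import MiniSom   # assumed third-party dependency

      days = np.load("daily_predictors.npy")        # hypothetical (n_days, n_features) array

      som = MiniSom(6, 5, days.shape[1], sigma=1.0, learning_rate=0.5)  # 30 nodes
      som.train_random(days, 10000)

      # Assign each day to its best-matching node, keeping the distance.
      nodes = {}
      for idx, d in enumerate(days):
          w = som.winner(d)
          dist = np.linalg.norm(d - som.get_weights()[w])
          nodes.setdefault(w, []).append((dist, idx))

      # Sample each node's distance distribution by percentile to pick
      # "typical days" (about 150 for a 30-node SOM).
      typical = []
      for members in nodes.values():
          members.sort()
          dists = np.array([m[0] for m in members])
          for q in (10, 30, 50, 70, 90):
              j = int(np.argmin(np.abs(dists - np.percentile(dists, q))))
              typical.append(members[j][1])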

  20. Kronecker-Basis-Representation Based Tensor Sparsity and Its Applications to Tensor Recovery.

    PubMed

    Xie, Qi; Zhao, Qian; Meng, Deyu; Xu, Zongben

    2017-08-02

    It is well known that the sparsity/low-rankness of a vector/matrix can be rationally measured by the number of nonzero entries (the $l_0$ norm) and the number of nonzero singular values (the rank), respectively. However, data from real applications are often generated by the interaction of multiple factors, which cannot be sufficiently represented by a vector/matrix, while a high-order tensor is expected to provide a more faithful representation of the intrinsic structure underlying such data ensembles. Unlike the vector/matrix case, constructing a rational high-order sparsity measure for a tensor is a relatively harder task. To this end, in this paper we propose a measure of tensor sparsity, called the Kronecker-basis-representation based tensor sparsity measure (KBR for short), which encodes both of the sparsity insights delivered by the Tucker and CANDECOMP/PARAFAC (CP) low-rank decompositions for a general tensor. We then study the KBR regularization minimization (KBRM) problem and design an effective ADMM algorithm for solving it, in which each involved parameter can be updated with closed-form equations. Such an efficient solver makes it possible to extend KBR to various tasks like tensor completion and tensor robust principal component analysis. A series of experiments, including multispectral image (MSI) denoising, MSI completion, and background subtraction, substantiate the superiority of the proposed methods over the state-of-the-art.

  1. Soft-sensing model of temperature for aluminum reduction cell on improved twin support vector regression

    NASA Astrophysics Data System (ADS)

    Li, Tao

    2018-06-01

    The complexity of the aluminum electrolysis process makes the temperature of aluminum reduction cells hard to measure directly, yet temperature is a central control variable in aluminum production. To address this problem, and drawing on practice data from an aluminum plant, this paper presents a soft-sensing model of temperature for the aluminum electrolysis process based on Improved Twin Support Vector Regression (ITSVR). ITSVR avoids the slow learning speed of Support Vector Regression (SVR) and the over-fitting risk of Twin Support Vector Regression (TSVR) by introducing a regularization term into the objective function of TSVR, which enforces the structural risk minimization principle and lowers computational complexity. The model then predicts the temperature with ITSVR, using several other process parameters as auxiliary variables. Simulation results show that the ITSVR-based soft-sensing model is less time-consuming and generalizes better.

  2. Depolarization of an Ultrashort Pulse in a Disordered Ensemble of Mie Particles

    NASA Astrophysics Data System (ADS)

    Gorodnichev, E. E.; Ivliev, S. V.; Kuzovlev, A. I.; Rogozkin, D. B.

    2017-12-01

    We study propagation of an ultrashort pulse of polarized light through a turbid medium with the Reynolds-McCormick phase function. Within the basic mode approach to the vector radiative transfer equation, the temporal profile of the degree of polarization is calculated analytically with the use of the small-angle approximation. The degree of polarization is shown to be described by the self-similar dependence on some combination of the transport scattering coefficient, the temporal delay and the sample thickness. Our results are in excellent agreement with the data of numerical simulations carried out previously for aqueous suspension of polystyrene microspheres.

  3. Implementation of the ANNs ensembles in macro-BIM cost estimates of buildings' floor structural frames

    NASA Astrophysics Data System (ADS)

    Juszczyk, Michał

    2018-04-01

    This paper reports some results of studies on the use of artificial intelligence tools for cost estimation based on building information models. The problem of macro-level cost estimation based on building information models, supported by ensembles of artificial neural networks, is concisely discussed. In the course of the research, a regression model has been built for the cost estimation of buildings' floor structural frames, as higher-level elements. Building information models are intended to serve as a repository of the data used for cost estimation. The core of the model is an ensemble of neural networks. The developed model allows the prediction of cost estimates with satisfactory accuracy.

  4. Designing Computer-Supported Collaborative Learning at Work for Rural It Workers: Learning Ensembles and Geographic Isolation

    ERIC Educational Resources Information Center

    Goggins, Sean P.

    2014-01-01

    This paper presents the results of a 9-month ethnographic and action research study of rural technology workers where computer support for collaborative learning through workplace technologies was introduced to a US-based technology firm. Throughout the implementation of this support and participation, issues related to geographic isolation are…

  5. A GLM Post-processor to Adjust Ensemble Forecast Traces

    NASA Astrophysics Data System (ADS)

    Thiemann, M.; Day, G. N.; Schaake, J. C.; Draijer, S.; Wang, L.

    2011-12-01

    The skill of hydrologic ensemble forecasts has improved in recent years through a better understanding of climate variability, better climate forecasts, and new data assimilation techniques. Having been used extensively for probabilistic water supply forecasting, these forecasts are now drawing interest for operational decision making. Hydrologic ensemble forecast members typically have inherent biases in flow timing and volume caused by (1) structural errors in the models used, (2) systematic errors in the data used to calibrate those models, (3) uncertain initial hydrologic conditions, and (4) uncertainties in the forcing datasets. Furthermore, hydrologic models have often not been developed for operational decision points, so ensemble forecasts are not always available where needed. A statistical post-processor can be used to address these issues. The post-processor should (1) correct for systematic biases in flow timing and volume, (2) preserve the skill of the available raw forecasts, (3) preserve spatial and temporal correlation as well as the uncertainty in the forecasted flow data, (4) produce adjusted forecast ensembles that represent the variability of the observed hydrograph to be predicted, and (5) preserve individual forecast traces as equally likely. The post-processor should also allow for the translation of available ensemble forecasts to hydrologically similar locations where forecasts are not available. This paper introduces an ensemble post-processor (EPP) developed in support of New York City water supply operations. The EPP employs a general linear model (GLM) to (1) adjust available ensemble forecast traces and (2) create new ensembles for (nearby) locations where only historic flow observations are available. The EPP is calibrated by developing daily and aggregated statistical relationships from historical flow observations and model simulations. These are then used in operation to obtain the conditional probability density function (PDF) of the observations to be predicted, thus jointly adjusting the individual ensemble members. These steps are executed in a normalized transformed space ('z'-space) to account for the strong non-linearity in the flow observations involved. A data window centered on each calibration date is used to minimize the impact of sampling errors and data noise. Testing on datasets from California and New York suggests that the EPP can successfully minimize biases in ensemble forecasts while preserving the raw forecast skill at 'days to weeks' forecast horizons and reproducing the variability of climatology at 'weeks to years' forecast horizons.
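
    The core of such a post-processor, a regression between simulations and observations carried out in a normalized transformed z-space, can be sketched compactly. The following is a toy version with synthetic data, not the operational EPP; the empirical normal quantile transform, the plain linear fit, and all numbers are simplifying assumptions.

      import numpy as np
      from scipy.stats import norm

      rng = np.random.default_rng(0)
      obs_hist = rng.gamma(2.0, 50.0, 3000)                  # synthetic observed flows
      sim_hist = 0.8 * obs_hist + rng.normal(0, 20, 3000)    # synthetic simulated flows
      raw_trace = rng.gamma(2.0, 50.0, 90)                   # one raw forecast trace

      def to_z(x, sample):
          """Normal quantile transform via the empirical CDF of `sample`."""
          p = (np.searchsorted(np.sort(sample), x) + 0.5) / (len(sample) + 1.0)
          return norm.ppf(np.clip(p, 1e-6, 1 - 1e-6))

      def from_z(z, sample):
          """Map standard-normal values back to flow space."""
          p = np.clip(norm.cdf(z), 1e-6, 1 - 1e-6)
          return np.quantile(np.sort(sample), p)

      # Calibrate a linear (GLM-like) relation in z-space, then adjust the trace.
      b, a = np.polyfit(to_z(sim_hist, sim_hist), to_z(obs_hist, obs_hist), 1)
      adj_trace = from_z(a + b * to_z(raw_trace, sim_hist), obs_hist)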

  7. A comparative study of surface EMG classification by fuzzy relevance vector machine and fuzzy support vector machine.

    PubMed

    Xie, Hong-Bo; Huang, Hu; Wu, Jianhua; Liu, Lei

    2015-02-01

    We present a multiclass fuzzy relevance vector machine (FRVM) learning mechanism and evaluate its performance in classifying multiple hand motions using surface electromyographic (sEMG) signals. The relevance vector machine (RVM) is a sparse Bayesian kernel method which avoids some limitations of the support vector machine (SVM); however, RVM still suffers from possible unclassifiable regions in multiclass problems. We propose two fuzzy membership function-based FRVM algorithms to solve such problems and evaluate them in experiments on seven healthy subjects and two amputees performing six hand motions. Two feature sets, namely autoregressive (AR) model coefficients combined with the root mean square value (AR-RMS), and wavelet transform (WT) features, are extracted from the recorded sEMG signals. Fuzzy support vector machine (FSVM) analysis was also conducted for a broad comparison in terms of accuracy, sparsity, training and testing time, as well as the effect of training sample size. FRVM yielded comparable classification accuracy with dramatically fewer support vectors than FSVM. Furthermore, the processing delay of FRVM was much less than that of FSVM, whilst FSVM trained much faster than FRVM. The results indicate that an FRVM classifier trained with sufficient samples can achieve generalization capability comparable to FSVM with significant sparsity in multi-channel sEMG classification, making it more suitable for sEMG-based real-time control applications.

  8. A new method for the prediction of chatter stability lobes based on dynamic cutting force simulation model and support vector machine

    NASA Astrophysics Data System (ADS)

    Peng, Chong; Wang, Lun; Liao, T. Warren

    2015-10-01

    Chatter has become a critical factor limiting machining quality and productivity in machining processes. To avoid cutting chatter, a new method based on a dynamic cutting force simulation model and a support vector machine (SVM) is presented for the prediction of chatter stability lobes. The cutting force is selected as the monitoring signal, and wavelet energy entropy theory is used to extract the feature vectors. A support vector machine is constructed using the MATLAB LIBSVM toolbox for pattern classification based on the feature vectors derived from experimental cutting data. Combined with the dynamic cutting force simulation model, the stability lobe diagram (SLD) can then be estimated. Finally, the predicted results are compared with existing methods such as the zero-order analytical (ZOA) and semi-discretization (SD) methods, as well as actual cutting experimental results, to confirm the validity of the new method.
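
    A compact sketch of the feature-extraction and classification steps described above is given below, assuming the PyWavelets package and purely synthetic stand-in signals; the wavelet, decomposition level, and labels are illustrative choices, not those of the paper.

      import numpy as np
      import pywt                          # PyWavelets, assumed available
      from sklearn.svm import SVC

      def wavelet_energy_entropy(signal, wavelet="db4", level=4):
          """Relative band energies from a wavelet decomposition plus
          their Shannon entropy, as one feature vector."""
          coeffs = pywt.wavedec(signal, wavelet, level=level)
          energies = np.array([np.sum(c ** 2) for c in coeffs])
          p = energies / energies.sum()
          return np.append(p, -np.sum(p * np.log(p + 1e-12)))

      rng = np.random.default_rng(1)
      signals = rng.normal(size=(40, 1024))        # stand-in cutting-force records
      labels = rng.integers(0, 2, 40)              # 0 = stable, 1 = chatter

      X = np.vstack([wavelet_energy_entropy(s) for s in signals])
      clf = SVC(kernel="rbf").fit(X, labels)       # stands in for the LIBSVM step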

  9. Community detection in complex networks using proximate support vector clustering

    NASA Astrophysics Data System (ADS)

    Wang, Feifan; Zhang, Baihai; Chai, Senchun; Xia, Yuanqing

    2018-03-01

    Community structure, one of the most attention-attracting properties of complex networks, has been a cornerstone for advances in various scientific branches, and a number of tools have been involved in recent studies of community detection algorithms. In this paper, we propose a support vector clustering method based on a proximity graph, which allows the introduced algorithm to surpass the traditional support vector approach in both accuracy and complexity. Results of extensive experiments on computer-generated networks and real-world data sets show competitive performance in comparison with counterpart methods.

  10. A Wavelet Support Vector Machine Combination Model for Singapore Tourist Arrival to Malaysia

    NASA Astrophysics Data System (ADS)

    Rafidah, A.; Shabri, Ani; Nurulhuda, A.; Suhaila, Y.

    2017-08-01

    In this study, a wavelet support vector machine (WSVM) model is proposed and applied to the prediction of the monthly time series of Singapore tourist arrivals to Malaysia. The WSVM model is a combination of wavelet analysis and the support vector machine (SVM). The study has two parts: in the first, we compare kernel functions; in the second, we compare the developed model with the single SVM model. The results showed that the linear kernel performs better than the RBF kernel, and that WSVM outperforms the single SVM model in forecasting monthly Singapore tourist arrivals to Malaysia.

  11. Predicting primary progressive aphasias with support vector machine approaches in structural MRI data.

    PubMed

    Bisenius, Sandrine; Mueller, Karsten; Diehl-Schmid, Janine; Fassbender, Klaus; Grimmer, Timo; Jessen, Frank; Kassubek, Jan; Kornhuber, Johannes; Landwehrmeyer, Bernhard; Ludolph, Albert; Schneider, Anja; Anderl-Straub, Sarah; Stuke, Katharina; Danek, Adrian; Otto, Markus; Schroeter, Matthias L

    2017-01-01

    Primary progressive aphasia (PPA) encompasses three subtypes, the nonfluent/agrammatic variant, the semantic variant, and the logopenic variant, which are characterized by distinct patterns of language difficulties and regional brain atrophy. To validate the potential of structural magnetic resonance imaging data for early individual diagnosis, we used support vector machine classification on grey matter density maps obtained by voxel-based morphometry analysis to discriminate PPA subtypes (44 patients: 16 nonfluent/agrammatic variant PPA, 17 semantic variant PPA, 11 logopenic variant PPA) from 20 healthy controls (matched for sample size, age, and gender) in the cohort of the multi-center study of the German consortium for frontotemporal lobar degeneration. Here, we compared a whole-brain approach with a meta-analysis-based, disease-specific regions-of-interest approach for support vector machine classification. We also used support vector machine classification to discriminate the three PPA subtypes from each other. Whole-brain support vector machine classification achieved very high accuracy, between 91 and 97%, for identifying specific PPA subtypes vs. healthy controls, and 78/95% for the discrimination between the semantic variant vs. the nonfluent/agrammatic or logopenic PPA variants. Only for the discrimination between the nonfluent/agrammatic and logopenic PPA variants was accuracy low, at 55%. Interestingly, the regions that contributed the most to the support vector machine classification of patients corresponded largely to the regions that were atrophic in these patients as revealed by group comparisons. Although the whole-brain approach also took into account regions that were not covered in the regions-of-interest approach, both approaches showed similar accuracies due to the disease-specificity of the selected networks. In conclusion, support vector machine classification of multi-center structural magnetic resonance imaging data enables prediction of PPA subtypes with a very high accuracy, paving the way for its application in clinical settings.

  12. A Fast Reduced Kernel Extreme Learning Machine.

    PubMed

    Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua

    2016-04-01

    In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to work on the Support Vector Machine (SVM) or Least Squares SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on big datasets. RKELM is established on a rigorous proof of universal learning involving reduced kernel-based single-hidden-layer feedforward networks (SLFNs). In particular, we prove that RKELM can approximate any nonlinear function accurately under the condition of support-vector sufficiency. Experimental results on a wide variety of real-world small- and large-instance-size applications, covering binary classification, multi-class problems, and regression, show that RKELM can achieve generalization performance competitive with SVM/LS-SVM at only a fraction of the computational effort. Copyright © 2015 Elsevier Ltd. All rights reserved.
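
    The key idea, randomly chosen kernel centers plus a single regularized least-squares solve in place of SVM's iterative optimization, fits in a few lines. The following is a minimal sketch consistent with that description, not the authors' code; the Gaussian kernel and all hyperparameters are assumptions.

      import numpy as np

      class RKELM:
          """Reduced kernel ELM sketch: a random subset of training points serves
          as kernel centers; output weights come from one ridge-style solve."""
          def __init__(self, n_centers=100, gamma=0.5, C=1.0, seed=0):
              self.n_centers, self.gamma, self.C, self.seed = n_centers, gamma, C, seed

          def _kernel(self, A, B):
              d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
              return np.exp(-self.gamma * d)

          def fit(self, X, y):
              rng = np.random.default_rng(self.seed)
              idx = rng.choice(len(X), size=min(self.n_centers, len(X)), replace=False)
              self.centers = X[idx]
              K = self._kernel(X, self.centers)
              # Regularized least squares: no iterative support-vector selection.
              self.beta = np.linalg.solve(K.T @ K + np.eye(len(idx)) / self.C, K.T @ y)
              return self

          def predict(self, X):
              # Continuous output; threshold or argmax it for classification.
              return self._kernel(X, self.centers) @ self.beta

      rng = np.random.default_rng(0)
      X = rng.normal(size=(500, 8))
      y = (X[:, 0] * X[:, 1] > 0).astype(float)
      print(((RKELM().fit(X, y).predict(X) > 0.5) == y).mean())   # training accuracy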

  13. The Hydrologic Ensemble Prediction Experiment (HEPEX)

    NASA Astrophysics Data System (ADS)

    Wood, Andy; Wetterhall, Fredrik; Ramos, Maria-Helena

    2015-04-01

    The Hydrologic Ensemble Prediction Experiment was established in March 2004, at a workshop hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF) and co-sponsored by the US National Weather Service (NWS) and the European Commission (EC). The HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support. HEPEX pursues this goal through research efforts and practical implementations involving six core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. HEPEX has grown through meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. In the last decade, HEPEX has organized over a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Through these interactions and an active online blog (www.hepex.org), HEPEX has built a strong and active community of nearly 400 researchers and practitioners around the world. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.

  14. Improving wave forecasting by integrating ensemble modelling and machine learning

    NASA Astrophysics Data System (ADS)

    O'Donncha, F.; Zhang, Y.; James, S. C.

    2017-12-01

    Modern smart-grid networks use technologies to instantly relay information on supply and demand to support effective decision making. Integration of renewable-energy resources with these systems demands accurate forecasting of energy production (and demand) capacities. For wave-energy converters, this requires wave-condition forecasting to enable estimates of energy production. Current operational wave forecasting systems exhibit substantial errors with wave-height RMSEs of 40 to 60 cm being typical, which limits the reliability of energy-generation predictions thereby impeding integration with the distribution grid. In this study, we integrate physics-based models with statistical learning aggregation techniques that combine forecasts from multiple, independent models into a single "best-estimate" prediction of the true state. The Simulating Waves Nearshore physics-based model is used to compute wind- and currents-augmented waves in the Monterey Bay area. Ensembles are developed based on multiple simulations perturbing input data (wave characteristics supplied at the model boundaries and winds) to the model. A learning-aggregation technique uses past observations and past model forecasts to calculate a weight for each model. The aggregated forecasts are compared to observation data to quantify the performance of the model ensemble and aggregation techniques. The appropriately weighted ensemble model outperforms an individual ensemble member with regard to forecasting wave conditions.
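
    One simple instance of such a learning-aggregation rule is to weight each ensemble member inversely to its past error; the actual technique used in the study may be more sophisticated. A sketch under that assumption:

      import numpy as np

      def aggregate(forecasts, past_forecasts, past_obs):
          """Weight members by inverse past RMSE, then average.
          forecasts: (n_models,) current wave-height predictions;
          past_forecasts: (n_models, n_times); past_obs: (n_times,)."""
          rmse = np.sqrt(((past_forecasts - past_obs) ** 2).mean(axis=1))
          w = 1.0 / (rmse + 1e-9)
          return (w / w.sum()) @ forecasts

      rng = np.random.default_rng(0)
      truth = rng.normal(1.5, 0.5, 200)                            # past observations
      members = truth + rng.normal(0, [[0.2], [0.4], [0.6]], (3, 200))
      print(aggregate(np.array([1.4, 1.7, 2.1]), members, truth))  # "best estimate"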

  15. Support vector machines

    NASA Technical Reports Server (NTRS)

    Garay, Michael J.; Mazzoni, Dominic; Davies, Roger; Wagstaff, Kiri

    2004-01-01

    Support Vector Machines (SVMs) are a type of supervised learning algorithm; other examples include Artificial Neural Networks (ANNs), Decision Trees, and Naive Bayesian Classifiers. Supervised learning algorithms are used to classify objects labeled by a 'supervisor', typically a human 'expert'.
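
    As a minimal illustration of this supervised setup (expert-provided labels, then classification of held-out objects), here is a generic scikit-learn sketch on a stock dataset; it is not tied to the authors' application.

      from sklearn.datasets import load_iris
      from sklearn.model_selection import train_test_split
      from sklearn.svm import SVC

      X, y = load_iris(return_X_y=True)                 # expert-labeled examples
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)    # learn from the supervisor's labels
      print("held-out accuracy:", clf.score(X_te, y_te))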

  16. Product Quality Modelling Based on Incremental Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Wang, J.; Zhang, W.; Qin, B.; Shi, W.

    2012-05-01

    Incremental support vector machine (ISVM) learning is a method developed in recent years on the foundations of statistical learning theory. It is suitable for sequentially arriving field data and has been widely used for product quality prediction and production process optimization. However, traditional ISVM learning does not consider the quality of the incremental data, which may contain noise and redundant samples that greatly affect learning speed and accuracy. In order to improve SVM training speed and accuracy, a modified incremental support vector machine (MISVM) is proposed in this paper. First, the margin vectors are extracted according to the Karush-Kuhn-Tucker (KKT) condition; then the distance from each margin vector to the current decision hyperplane is calculated to evaluate its importance, and margin vectors whose distance exceeds a specified value are removed; finally, the original support vectors and the remaining margin vectors are used to update the SVM. The proposed MISVM can eliminate unimportant samples such as noise while preserving the important ones. The MISVM has been evaluated on two public datasets and one field dataset of zinc coating weight in strip hot-dip galvanizing, and the results show that the proposed method can improve prediction accuracy and training speed effectively. Furthermore, it can provide the necessary decision support and analysis tools for automatic control of product quality, and it can also extend to other process industries, such as chemical and manufacturing processes.
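
    A rough sketch of the margin-vector filtering idea, using scikit-learn's SVC as the base learner: new samples far from the current hyperplane are dropped, and the model is retrained on the retained samples plus the previous support vectors. The distance threshold and data are illustrative assumptions, and this simplification omits the paper's KKT-based extraction details.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=300, random_state=0)
      X_old, y_old, X_new, y_new = X[:200], y[:200], X[200:], y[200:]

      svc = SVC(kernel="rbf").fit(X_old, y_old)

      # Keep only new samples near the decision hyperplane (candidate margin
      # vectors); distant samples are treated as redundant and discarded.
      d = np.abs(svc.decision_function(X_new))
      keep = d < 1.5                                    # illustrative threshold

      # Update: retrain on the previous support vectors plus retained new samples.
      X_upd = np.vstack([X_old[svc.support_], X_new[keep]])
      y_upd = np.concatenate([y_old[svc.support_], y_new[keep]])
      svc = SVC(kernel="rbf").fit(X_upd, y_upd)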

  17. Ecological Niche Modelling Predicts Southward Expansion of Lutzomyia (Nyssomyia) flaviscutellata (Diptera: Psychodidae: Phlebotominae), Vector of Leishmania (Leishmania) amazonensis in South America, under Climate Change

    PubMed Central

    Carvalho, Bruno M.; Ready, Paul D.

    2015-01-01

    Vector borne diseases are susceptible to climate change because distributions and densities of many vectors are climate driven. The Amazon region is endemic for cutaneous leishmaniasis and is predicted to be severely impacted by climate change. Recent records suggest that the distributions of Lutzomyia (Nyssomyia) flaviscutellata and the parasite it transmits, Leishmania (Leishmania) amazonensis, are expanding southward, possibly due to climate change, and sometimes associated with new human infection cases. We define the vector’s climatic niche and explore future projections under climate change scenarios. Vector occurrence records were compiled from the literature, museum collections and Brazilian Health Departments. Six bioclimatic variables were used as predictors in six ecological niche model algorithms (BIOCLIM, DOMAIN, MaxEnt, GARP, logistic regression and Random Forest). Projections for 2050 used 17 general circulation models in two greenhouse gas representative concentration pathways: “stabilization” and “high increase”. Ensemble models and consensus maps were produced by overlapping binary predictions. Final model outputs showed good performance and significance. The use of species absence data substantially improved model performance. Currently, L. flaviscutellata is widely distributed in the Amazon region, with records in the Atlantic Forest and savannah regions of Central Brazil. Future projections indicate expansion of the climatically suitable area for the vector in both scenarios, towards higher latitudes and elevations. L. flaviscutellata is likely to find increasingly suitable conditions for its expansion into areas where human population size and density are much larger than they are in its current locations. If environmental conditions change as predicted, the range of the vector is likely to expand to southeastern and central-southern Brazil, eastern Paraguay and further into the Amazonian areas of Bolivia, Peru, Ecuador, Colombia and Venezuela. These areas will only become endemic for L. amazonensis, however, if they have competent reservoir hosts and transmission dynamics matching those in the Amazon region. PMID:26619186
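
    The consensus-map step, overlapping binary suitability predictions from several algorithms, reduces to a vote count per grid cell. A toy sketch follows; the six-model setup matches the abstract, but the majority threshold and the random stand-in maps are assumptions.

      import numpy as np

      rng = np.random.default_rng(0)
      # (n_models, n_cells) binary presence/absence predictions per algorithm.
      binary_maps = rng.integers(0, 2, size=(6, 10000))

      votes = binary_maps.sum(axis=0)
      consensus = votes >= 4        # cell deemed suitable if most models agree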

  18. Addressing model uncertainty through stochastic parameter perturbations within the High Resolution Rapid Refresh (HRRR) ensemble

    NASA Astrophysics Data System (ADS)

    Wolff, J.; Jankov, I.; Beck, J.; Carson, L.; Frimel, J.; Harrold, M.; Jiang, H.

    2016-12-01

    It is well known that global and regional numerical weather prediction ensemble systems are under-dispersive, producing unreliable and overconfident ensemble forecasts. Typical approaches to alleviate this problem include the use of multiple dynamic cores, multiple physics suite configurations, or a combination of the two. While these approaches may produce desirable results, they have practical and theoretical deficiencies and are more difficult and costly to maintain. An active area of research that promotes a more unified and sustainable system for addressing the deficiencies in ensemble modeling is the use of stochastic physics to represent model-related uncertainty. Stochastic approaches include Stochastic Parameter Perturbations (SPP), Stochastic Kinetic Energy Backscatter (SKEB), Stochastic Perturbation of Physics Tendencies (SPPT), or some combination of all three. The focus of this study is to assess the model performance within a convection-permitting ensemble at 3-km grid spacing across the Contiguous United States (CONUS) when using stochastic approaches. For this purpose, the test utilized a single physics suite configuration based on the operational High-Resolution Rapid Refresh (HRRR) model, with ensemble members produced by employing stochastic methods. Parameter perturbations were employed in the Rapid Update Cycle (RUC) land surface model and Mellor-Yamada-Nakanishi-Niino (MYNN) planetary boundary layer scheme. Results will be presented in terms of bias, error, spread, skill, accuracy, reliability, and sharpness using the Model Evaluation Tools (MET) verification package. Due to the high level of complexity of running a frequently updating (hourly), high spatial resolution (3 km), large domain (CONUS) ensemble system, extensive high performance computing (HPC) resources were needed to meet this objective. Supercomputing resources were provided through the National Center for Atmospheric Research (NCAR) Strategic Capability (NSC) project support, allowing for a more extensive set of tests over multiple seasons, consequently leading to more robust results. Through the use of these stochastic innovations and powerful supercomputing at NCAR, further insights and advancements in ensemble forecasting at convection-permitting scales will be possible.

  19. An evaluation of soil water outlooks for winter wheat in south-eastern Australia

    NASA Astrophysics Data System (ADS)

    Western, A. W.; Dassanayake, K. B.; Perera, K. C.; Alves, O.; Young, G.; Argent, R.

    2015-12-01

    Soil moisture is a key limiting resource for rain-fed cropping in Australian broad-acre cropping zones. Seasonal rainfall and temperature outlooks are standard operational services offered by the Australian Bureau of Meteorology and are routinely used to support agricultural decisions. This presentation examines the performance of proposed soil water seasonal outlooks in the context of wheat cropping in south-eastern Australia (autumn planting, late spring harvest). We used weather ensembles simulated by the Predictive Ocean-Atmosphere Model for Australia (POAMA) as input to the Agricultural Production Simulator (APSIM) to construct ensemble soil water "outlooks" at twenty sites. Hindcasts were made over a 33-year period using the 33 POAMA ensemble members. The overall modelling flow involved: 1. Downscaling the daily weather series (rainfall, minimum and maximum temperature, humidity, radiation) from the ~250 km POAMA grid scale to a local weather station using quantile-quantile correction, based on a 33-year observation record extracted from the SILO data drill product. 2. Using APSIM to produce soil water ensembles from the downscaled weather ensembles; a warm-up period of 5 years of observed weather was followed by a 9-month hindcast period based on each ensemble member. 3. Summarizing the soil water ensembles by estimating the proportion of outlook ensembles in each climatological tercile, where the climatology was constructed using APSIM and observed weather from the 33 years of hindcasts at the relevant site. 4. Evaluating the soil water outlooks for different lead times and months against a "truth" run of APSIM based on observed weather. Outlooks generally have some useful forecast skill for lead times of up to two to three months, except in late spring, in line with current useful lead times for rainfall outlooks. Better performance was found in summer and autumn, when vegetation cover and water use are low.

  20. Ensemble Sparse Classification of Alzheimer’s Disease

    PubMed Central

    Liu, Manhua; Zhang, Daoqiang; Shen, Dinggang

    2012-01-01

    The high-dimensional pattern classification methods, e.g., support vector machines (SVM), have been widely investigated for analysis of structural and functional brain images (such as magnetic resonance imaging (MRI)) to assist the diagnosis of Alzheimer's disease (AD) including its prodromal stage, i.e., mild cognitive impairment (MCI). Most existing classification methods extract features from neuroimaging data and then construct a single classifier to perform classification. However, due to noise and the small sample size of neuroimaging data, it is challenging to train a single global classifier that is robust enough to achieve good classification performance. In this paper, instead of building a single global classifier, we propose a local patch-based subspace ensemble method which builds multiple individual classifiers based on different subsets of local patches and then combines them for more accurate and robust classification. Specifically, to capture the local spatial consistency, each brain image is partitioned into a number of local patches and a subset of patches is randomly selected from the patch pool to build a weak classifier. Here, the sparse representation-based classification (SRC) method, which has been shown to be effective for classification of image data (e.g., faces), is used to construct each weak classifier. Then, multiple weak classifiers are combined to make the final decision. We evaluate our method on 652 subjects (including 198 AD patients, 225 MCI and 229 normal controls) from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database using MR images. The experimental results show that our method achieves an accuracy of 90.8% and an area under the ROC curve (AUC) of 94.86% for AD classification and an accuracy of 87.85% and an AUC of 92.90% for MCI classification, respectively, demonstrating a very promising performance of our method compared with the state-of-the-art methods for AD/MCI classification using MR images. PMID:22270352
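
    The SRC weak classifier at the heart of this ensemble codes a test sample sparsely over the training samples and picks the class with the smallest class-wise reconstruction residual. A generic sketch follows; an l1 solver from scikit-learn stands in for whichever sparse coder the authors used, and the synthetic dictionary is an assumption.

      import numpy as np
      from sklearn.linear_model import Lasso

      def src_predict(D, labels, x, alpha=0.01):
          """D: (n_features, n_train) dictionary with training samples as columns;
          classify x by the minimal class-wise reconstruction residual."""
          coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
          coder.fit(D, x)                       # solves min ||x - D c||^2 + alpha*||c||_1
          c = coder.coef_
          residuals = {}
          for cls in np.unique(labels):
              c_cls = np.where(labels == cls, c, 0.0)
              residuals[cls] = np.linalg.norm(x - D @ c_cls)
          return min(residuals, key=residuals.get)

      rng = np.random.default_rng(0)
      D = rng.normal(size=(100, 60))            # 60 training patches, 100-dim features
      labels = np.repeat([0, 1, 2], 20)
      x = D[:, 5] + 0.05 * rng.normal(size=100) # near a class-0 atom
      print(src_predict(D, labels, x))          # expected: 0

    In the full method, many such classifiers, each built on a random subset of local patches, would then be combined by majority vote.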

  1. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography.

    PubMed

    Narula, Sukrit; Shameer, Khader; Salem Omar, Alaa Mabrouk; Dudley, Joel T; Sengupta, Partho P

    2016-11-29

    Machine-learning models may aid cardiac phenotypic recognition by using features of cardiac tissue deformation. This study investigated the diagnostic value of a machine-learning framework that incorporates speckle-tracking echocardiographic data for automated discrimination of hypertrophic cardiomyopathy (HCM) from the physiological hypertrophy seen in athletes (ATH). Expert-annotated speckle-tracking echocardiographic datasets obtained from 77 ATH and 62 HCM patients were used for developing an automated system. An ensemble machine-learning model with 3 different machine-learning algorithms (support vector machines, random forests, and artificial neural networks) was developed, and a majority voting method was used for conclusive predictions with further K-fold cross-validation. Feature selection using an information gain (IG) algorithm revealed that volume was the best predictor for differentiating between HCM and ATH (IG = 0.24), followed by mid-left ventricular segmental strain (IG = 0.134) and average longitudinal strain (IG = 0.131). The ensemble machine-learning model showed increased sensitivity and specificity compared with the early-to-late diastolic transmitral velocity ratio (p < 0.01), average early diastolic tissue velocity (e') (p < 0.01), and strain (p = 0.04). Because ATH were younger, adjusted analysis was undertaken in younger HCM patients and compared with ATH with left ventricular wall thickness >13 mm. In this subgroup analysis, the automated model continued to show equal sensitivity, but increased specificity, relative to the early-to-late diastolic transmitral velocity ratio, e', and strain. Our results suggest that machine-learning algorithms can assist in the discrimination of physiological versus pathological patterns of hypertrophic remodeling. This effort represents a step toward the development of a real-time, machine-learning-based system for automated interpretation of echocardiographic images, which may help novice readers with limited experience. Copyright © 2016 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
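
    The three-algorithm majority-vote design maps directly onto scikit-learn's VotingClassifier. The sketch below uses synthetic stand-in features (the study's speckle-tracking measurements are not reproduced here), and all hyperparameters are illustrative.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier, VotingClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.neural_network import MLPClassifier
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      # 139 stand-in subjects (77 ATH + 62 HCM) with synthetic features.
      X, y = make_classification(n_samples=139, n_features=20, random_state=0)

      ensemble = VotingClassifier(
          estimators=[
              ("svm", make_pipeline(StandardScaler(), SVC())),
              ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
              ("ann", make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000))),
          ],
          voting="hard",            # majority voting across the three learners
      )
      print(cross_val_score(ensemble, X, y, cv=5).mean())   # K-fold cross-validation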

  2. Four types of ensemble coding in data visualizations.

    PubMed

    Szafir, Danielle Albers; Haroz, Steve; Gleicher, Michael; Franconeri, Steven

    2016-01-01

    Ensemble coding supports rapid extraction of visual statistics about distributed visual information. Researchers typically study this ability with the goal of drawing conclusions about how such coding extracts information from natural scenes. Here we argue that a second domain can serve as another strong inspiration for understanding ensemble coding: graphs, maps, and other visual presentations of data. Data visualizations allow observers to leverage their ability to perform visual ensemble statistics on distributions of spatial or featural visual information to estimate actual statistics on data. We survey the types of visual statistical tasks that occur within data visualizations across everyday examples, such as scatterplots, and more specialized images, such as weather maps or depictions of patterns in text. We divide these tasks into four categories: identification of sets of values, summarization across those values, segmentation of collections, and estimation of structure. We point to unanswered questions for each category and give examples of such cross-pollination in the current literature. Increased collaboration between the data visualization and perceptual psychology research communities can inspire new solutions to challenges in visualization while simultaneously exposing unsolved problems in perception research.

  3. DrugECs: An Ensemble System with Feature Subspaces for Accurate Drug-Target Interaction Prediction

    PubMed Central

    Jiang, Jinjian; Wang, Nian; Zhang, Jun

    2017-01-01

    Background: Drug-target interaction is key in drug discovery, especially in the design of new lead compounds. However, finding a new lead compound for a specific target is complicated and hard, and the process is error-prone, so computational techniques are commonly adopted in drug design, saving time and costs to a significant extent. Results: To address this issue, a new prediction system is proposed in this work to identify drug-target interactions. First, drug-target pairs are encoded with a fragment technique and the software "PaDEL-Descriptor." The fragment technique encodes the target proteins: it divides each protein sequence into several fragments in order and encodes each fragment with several physiochemical properties of amino acids. The software "PaDEL-Descriptor" creates encoding vectors for the drug molecules. Second, the dataset of drug-target pairs is resampled into several overlapping subsets, which are then fed to a kNN (k-nearest neighbor) classifier to build an ensemble system. Conclusion: Experimental results on the drug-target dataset showed that our method performs better and runs faster than state-of-the-art predictors. PMID:28744468
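
    The resampling-plus-kNN construction is essentially bagging with a kNN base learner. A generic sketch follows; the `estimator` keyword assumes scikit-learn >= 1.2 (older versions call it `base_estimator`), and the encoded feature vectors are synthetic stand-ins.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import BaggingClassifier
      from sklearn.neighbors import KNeighborsClassifier

      # Stand-in for encoded drug-target pair vectors and interaction labels.
      X, y = make_classification(n_samples=500, n_features=64, random_state=0)

      ens = BaggingClassifier(
          estimator=KNeighborsClassifier(n_neighbors=5),
          n_estimators=25,
          max_samples=0.5,      # each subset holds half the pairs
          bootstrap=True,       # sampling with replacement -> overlapping subsets
          random_state=0,
      ).fit(X, y)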

  4. [Design Method Analysis and Performance Comparison of Wall Filter for Ultrasound Color Flow Imaging].

    PubMed

    Wang, Lutao; Xiao, Jun; Chai, Hua

    2015-08-01

    Successful suppression of clutter arising from stationary or slowly moving tissue is one of the key issues in medical ultrasound color blood-flow imaging; remaining clutter may bias the mean blood frequency estimate and result in a potentially misleading description of blood flow. In this paper, based on the principle of the general wall filter, the design process of three classes of filters, infinite impulse response with projection initialization (Prj-IIR), polynomial regression (Pol-Reg), and eigen-based filters, is reviewed and analyzed. The performance of the filters was assessed by calculating the bias and variance of the mean blood velocity estimated with a standard autocorrelation estimator. Simulation results show that the performance of the Pol-Reg filter is similar to that of Prj-IIR filters: both can offer accurate estimation of the mean blood-flow speed under steady clutter conditions, and their clutter rejection ability can be enhanced by increasing the ensemble size of the Doppler vector. Eigen-based filters can effectively remove the non-stationary clutter component and further improve the estimation accuracy for low-speed blood-flow signals, with no significant increase in computational complexity when the ensemble size is less than 10.
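
    Of the three filter classes, the polynomial regression wall filter is the easiest to sketch: the slow-time ensemble is projected onto a low-order polynomial basis (the clutter subspace) and the projection is subtracted. A generic numpy sketch, with the polynomial order and test signal as illustrative choices:

      import numpy as np

      def polyreg_wall_filter(iq, order=2):
          """iq: complex Doppler data of shape (..., n_ensemble).
          Remove the component lying in the span of polynomials up to `order`."""
          n = iq.shape[-1]
          t = np.linspace(-1.0, 1.0, n)
          # Orthonormal polynomial basis via QR of a Vandermonde matrix.
          Q, _ = np.linalg.qr(np.vander(t, order + 1, increasing=True))
          clutter = (iq @ Q) @ Q.T          # projection onto the clutter subspace
          return iq - clutter

      ens = np.exp(2j * np.pi * 0.3 * np.arange(12)) + 5.0   # flow signal + DC clutter
      print(np.abs(polyreg_wall_filter(ens[None, :])).round(2))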

  5. Principal Component Analysis of AIRS and CrIS Data

    NASA Technical Reports Server (NTRS)

    Aumann, H. H.; Manning, Evan

    2015-01-01

    Synthetic eigenvectors (EVs) used for the statistical analysis of the PC reconstruction residual of large ensembles of data are a novel tool for the analysis of data from hyperspectral infrared sounders like the Atmospheric Infrared Sounder (AIRS) on EOS Aqua and the Cross-track Infrared Sounder (CrIS) on the SUOMI polar orbiting satellites. Unlike empirical EVs, which are derived from the observed spectra, synthetic EVs are derived from a large ensemble of simulated spectra, calculated under the assumption that, given a state of the atmosphere, the spectrum produced by the instrument can be computed accurately. The synthetic EVs are then used to reconstruct the observed spectra. Analysis of the differences between the observed and reconstructed spectra for Simultaneous Nadir Overpasses of tropical oceans reveals unexpected differences at more than the 200 mK level under relatively clear conditions, particularly in the mid-wave water vapor channels of CrIS. The repeatability of these differences, using independently trained synthetic EVs and results from different years, appears to rule out inconsistencies in the radiative transfer algorithm or the data simulation. The reasons for these discrepancies are under evaluation.
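
    The reconstruct-and-difference step can be sketched generically with PCA; note the study trains its eigenvectors on simulated spectra, whereas this toy uses the same synthetic data for training and reconstruction.

      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(0)
      spectra = rng.normal(size=(5000, 200))    # synthetic stand-in radiances

      pca = PCA(n_components=100).fit(spectra)  # eigenvectors of the ensemble
      recon = pca.inverse_transform(pca.transform(spectra))
      residual = spectra - recon                # object of the statistical analysis
      rms_per_channel = np.sqrt((residual ** 2).mean(axis=0))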

  6. Task-phase-specific dynamics of basal forebrain neuronal ensembles

    PubMed Central

    Tingley, David; Alexander, Andrew S.; Kolbu, Sean; de Sa, Virginia R.; Chiba, Andrea A.; Nitz, Douglas A.

    2014-01-01

    Cortically projecting basal forebrain neurons play a critical role in learning and attention, and their degeneration accompanies age-related impairments in cognition. Despite the impressive anatomical and cell-type complexity of this system, currently available data suggest that basal forebrain neurons lack complexity in their response fields, with activity primarily reflecting only macro-level brain states such as sleep and wake, onset of relevant stimuli and/or reward obtainment. The current study examined the spiking activity of basal forebrain neuron populations across multiple phases of a selective attention task, addressing, in particular, the issue of complexity in ensemble firing patterns across time. Clustering techniques applied to the full population revealed a large number of distinct categories of task-phase-specific activity patterns. Unique population firing-rate vectors defined each task phase and most categories of task-phase-specific firing had counterparts with opposing firing patterns. An analogous set of task-phase-specific firing patterns was also observed in a population of posterior parietal cortex neurons. Thus, consistent with the known anatomical complexity, basal forebrain population dynamics are capable of differentially modulating their cortical targets according to the unique sets of environmental stimuli, motor requirements, and cognitive processes associated with different task phases. PMID:25309352

  7. Seeking for the rational basis of the median model: the optimal combination of multi-model ensemble results

    NASA Astrophysics Data System (ADS)

    Riccio, A.; Giunta, G.; Galmarini, S.

    2007-04-01

    In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (with its roots in Bayes' theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called the "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.

  8. Seeking for the rational basis of the Median Model: the optimal combination of multi-model ensemble results

    NASA Astrophysics Data System (ADS)

    Riccio, A.; Giunta, G.; Galmarini, S.

    2007-12-01

    In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (with its roots in Bayes' theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called the "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.
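
    The median model itself is a one-line operation: the pointwise median across ensemble members, which is robust to single-model outliers. A toy sketch with synthetic fields:

      import numpy as np

      rng = np.random.default_rng(0)
      # (n_models, n_lat, n_lon) concentration fields from the multi-model ensemble.
      concentrations = rng.lognormal(size=(12, 40, 60))

      median_model = np.median(concentrations, axis=0)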

  9. Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors.

    PubMed

    König, Caroline; Cárdenas, Martha I; Giraldo, Jesús; Alquézar, René; Vellido, Alfredo

    2015-09-29

    The characterization of proteins in families and subfamilies, at different levels, entails the definition and use of class labels. When the adscription of a protein to a family is uncertain, or even wrong, this becomes an instance of what has come to be known as a label noise problem. Label noise has a potentially negative effect on any quantitative analysis of proteins that depends on label information. This study investigates class C of G protein-coupled receptors, which are cell membrane proteins of relevance both to biology in general and pharmacology in particular. Their supervised classification into different known subtypes, based on primary sequence data, is hampered by label noise. The latter may stem from a combination of expert knowledge limitations and the lack of a clear correspondence between labels that mostly reflect GPCR functionality and the different representations of the protein primary sequences. In this study, we describe a systematic approach, using Support Vector Machine classifiers, to the analysis of G protein-coupled receptor misclassifications. As a proof of concept, this approach is used to assist the discovery of labeling quality problems in a curated, publicly accessible database of this type of proteins. We also investigate the extent to which physico-chemical transformations of the protein sequences reflect G protein-coupled receptor subtype labeling. The candidate mislabeled cases detected with this approach are externally validated with phylogenetic trees and against further trusted sources such as the National Center for Biotechnology Information, Universal Protein Resource, European Bioinformatics Institute and Ensembl Genome Browser information repositories. In quantitative classification problems, class labels are often by default assumed to be correct. Label noise, though, is bound to be a pervasive problem in bioinformatics, where labels may be obtained indirectly through complex, many-step similarity modelling processes. In the case of G protein-coupled receptors, methods capable of singling out and characterizing those sequences with consistent misclassification behaviour are required to minimize this problem. A systematic, Support Vector Machine-based method has been proposed in this study for such purpose. The proposed method enables a filtering approach to the label noise problem and might become a support tool for database curators in proteomics.
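
    The following toy sketch illustrates the general filtering strategy described above (not the authors' exact protocol): train SVM classifiers over repeated cross-validation splits and flag samples that are misclassified almost every time as label-noise candidates. The data, injected noise rate, and flagging threshold are all synthetic assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Hypothetical stand-in for GPCR sequence descriptors: X would be
# physico-chemical transformations of primary sequences, y the subtype labels.
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
y_noisy = y.copy()
flipped = np.random.default_rng(0).choice(len(y), size=15, replace=False)
y_noisy[flipped] = (y_noisy[flipped] + 1) % 3          # inject label noise

# Count how often each sample is misclassified across repeated CV runs:
# samples that are *consistently* misclassified are mislabeling candidates.
miss = np.zeros(len(y))
for seed in range(10):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train, test in cv.split(X, y_noisy):
        clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X[train], y_noisy[train])
        miss[test] += clf.predict(X[test]) != y_noisy[test]

candidates = np.where(miss >= 9)[0]   # misclassified in >= 9 of 10 runs
print(f"flagged {len(candidates)} candidates; "
      f"{np.intersect1d(candidates, flipped).size} are true label flips")
```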

  10. Bayesian Kernel Methods for Non-Gaussian Distributions: Binary and Multi-class Classification Problems

    DTIC Science & Technology

    2013-05-28

    those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms. When one class occurs...incremental support vector machine algorithm for online learning when fewer than 50 data points are available. (a) Papers published in peer-reviewed journals...learning environments, where data processing occurs one observation at a time and the classification algorithm improves over time with new

  11. Transient Calibration of a Variably-Saturated Groundwater Flow Model by Iterative Ensemble Smoothing: Synthetic Case and Application to the Flow Induced During Shaft Excavation and Operation of the Bure Underground Research Laboratory

    NASA Astrophysics Data System (ADS)

    Lam, D. T.; Kerrou, J.; Benabderrahmane, H.; Perrochet, P.

    2017-12-01

    The calibration of groundwater flow models in transient state can be motivated by the expected improved characterization of the aquifer hydraulic properties, especially when supported by a rich transient dataset. In the prospect of setting up a calibration strategy for a variably-saturated transient groundwater flow model of the area around ANDRA's Bure Underground Research Laboratory, we wish to take advantage of the long hydraulic head and flowrate time series collected near and at the access shafts in order to help inform the model hydraulic parameters. A promising inverse approach for such a high-dimensional nonlinear model, whose applicability has been illustrated more extensively in other scientific fields, is an iterative ensemble smoother algorithm initially developed for a reservoir engineering problem. Furthermore, the ensemble-based stochastic framework allows us to address, to some extent, the uncertainty of the calibration for a subsequent analysis of a flow-process-dependent prediction. By assimilating all available data in a single step, this method iteratively updates each member of an initial ensemble of stochastic realizations of parameters until an objective function is minimized. However, as is well known for ensemble-based Kalman methods, this correction, computed from approximations of covariance matrices, is most efficient when the ensemble realizations are multi-Gaussian. As shown by the comparison of the updated ensemble means obtained for our simplified synthetic model of 2D vertical flow using either multi-Gaussian or multipoint simulations of parameters, the ensemble smoother fails to preserve the initial connectivity of the facies and the bimodal parameter distribution. Given the geological structures depicted by the multi-layered geological model built for the real case, our goal is to find how to best leverage the performance of the ensemble smoother while using an initial ensemble of conditional multi-Gaussian or multipoint simulations that is as conceptually consistent as possible. The performance of the algorithm with additional steps that help mitigate the effects of non-Gaussian patterns, such as Gaussian anamorphosis or resampling of facies from the training image using updated local probability constraints, will be assessed.
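
    A toy single-step ensemble smoother update with perturbed observations, to make the parameter-update mechanics concrete. The linear forward model, dimensions, and noise levels are assumed for illustration; the iterative scheme referenced above repeats such updates (for example with inflated observation error).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ensemble-smoother update: m = parameter vector (e.g. log-hydraulic
# conductivities), g(m) = forward model mapping parameters to predicted heads.
n_ens, n_par, n_obs = 100, 10, 5
G = rng.normal(size=(n_obs, n_par))          # hypothetical linear forward model
g = lambda m: G @ m

m_true = rng.normal(size=n_par)
obs_err = 0.1
d_obs = g(m_true) + rng.normal(0, obs_err, n_obs)

M = rng.normal(size=(n_par, n_ens))          # prior ensemble (multi-Gaussian)
D = np.column_stack([g(M[:, j]) for j in range(n_ens)])

# Ensemble covariances (anomalies about the ensemble mean).
A = M - M.mean(axis=1, keepdims=True)
Y = D - D.mean(axis=1, keepdims=True)
C_md = A @ Y.T / (n_ens - 1)
C_dd = Y @ Y.T / (n_ens - 1)
R = obs_err**2 * np.eye(n_obs)

# Update every member against perturbed observations (one smoother step;
# iterative schemes repeat this with a damped gain or inflated R).
K = C_md @ np.linalg.solve(C_dd + R, np.eye(n_obs))
D_pert = d_obs[:, None] + rng.normal(0, obs_err, (n_obs, n_ens))
M_post = M + K @ (D_pert - D)

print("prior mean error:    ", np.linalg.norm(M.mean(1) - m_true))
print("posterior mean error:", np.linalg.norm(M_post.mean(1) - m_true))
```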

  12. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as its structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical-properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.
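
    As a loose sketch of the pipeline shape (a handful of features computed from predicted secondary structure feeding an SVM over the four SCOP classes), the following uses invented content/segment features and made-up training strings; it does not reproduce SCPRED's actual 9 features.

```python
import numpy as np
from itertools import groupby
from sklearn.svm import SVC

def features(ss: str) -> np.ndarray:
    """Toy content/segment features from a predicted secondary-structure
    string over H(elix)/E(strand)/C(oil); illustrative, not SCPRED's own."""
    n = len(ss)
    runs = [(k, sum(1 for _ in g)) for k, g in groupby(ss)]
    longest = lambda c: max([L for k, L in runs if k == c], default=0)
    return np.array([ss.count("H") / n, ss.count("E") / n,
                     longest("H") / n, longest("E") / n])

# Hypothetical training pairs: (predicted SS string, SCOP class).
train = [("HHHHHHCCHHHHHH", "all-alpha"), ("EEEECCEEEECCEE", "all-beta"),
         ("HHHHCCEEEECCHH", "alpha/beta"), ("HHHCEEECCCHHHC", "alpha+beta"),
         ("HHHHHHHHCCCHHH", "all-alpha"), ("EEEEECCCEEEEEC", "all-beta")]
X = np.array([features(ss) for ss, _ in train])
y = [c for _, c in train]

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict([features("HHHHHHHHCCHHHHCC")]))   # likely "all-alpha"
```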

  13. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as its structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical-properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. Conclusion SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods. PMID:18452616

  14. An empirical comparison of different approaches for combining multimodal neuroimaging data with support vector machine

    PubMed Central

    Pettersson-Yeo, William; Benetti, Stefania; Marquand, Andre F.; Joules, Richard; Catani, Marco; Williams, Steve C. R.; Allen, Paul; McGuire, Philip; Mechelli, Andrea

    2014-01-01

    In the pursuit of clinical utility, neuroimaging researchers of psychiatric and neurological illness are increasingly using analyses, such as support vector machine, that allow inference at the single-subject level. Recent studies employing single-modality data, however, suggest that classification accuracies must be improved for such utility to be realized. One possible solution is to integrate different data types to provide a single combined output classification; either by generating a single decision function based on an integrated kernel matrix, or, by creating an ensemble of multiple single modality classifiers and integrating their predictions. Here, we describe four integrative approaches: (1) an un-weighted sum of kernels, (2) multi-kernel learning, (3) prediction averaging, and (4) majority voting, and compare their ability to enhance classification accuracy relative to the best single-modality classification accuracy. We achieve this by integrating structural, functional, and diffusion tensor magnetic resonance imaging data, in order to compare ultra-high risk (n = 19), first episode psychosis (n = 19) and healthy control subjects (n = 23). Our results show that (i) whilst integration can enhance classification accuracy by up to 13%, the frequency of such instances may be limited, (ii) where classification can be enhanced, simple methods may yield greater increases relative to more computationally complex alternatives, and, (iii) the potential for classification enhancement is highly influenced by the specific diagnostic comparison under consideration. In conclusion, our findings suggest that for moderately sized clinical neuroimaging datasets, combining different imaging modalities in a data-driven manner is no “magic bullet” for increasing classification accuracy. However, it remains possible that this conclusion is dependent on the use of neuroimaging modalities that had little, or no, complementary information to offer one another, and that the integration of more diverse types of data would have produced greater classification enhancement. We suggest that future studies ideally examine a greater variety of data types (e.g., genetic, cognitive, and neuroimaging) in order to identify the data types and combinations optimally suited to the classification of early stage psychosis. PMID:25076868
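
    Two of the four integration schemes listed above are easy to sketch with generic tools: approach (1), an un-weighted sum of per-modality RBF Gram matrices fed to a single precomputed-kernel SVM, and approach (4), majority voting over per-modality SVMs. The three "modalities" below are synthetic feature blocks, not neuroimaging data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Three hypothetical "modalities" (e.g. structural, functional, DTI features),
# simulated here as different views of the same underlying classes.
X, y = make_classification(n_samples=120, n_features=30, random_state=0)
views = [X[:, :10], X[:, 10:20], X[:, 20:]]
idx_tr, idx_te = train_test_split(np.arange(len(y)), test_size=0.3,
                                  random_state=0, stratify=y)

# (1) Un-weighted sum of kernels: add one RBF Gram matrix per modality
# and train a single precomputed-kernel SVM on the combined matrix.
K_tr = sum(rbf_kernel(v[idx_tr], v[idx_tr]) for v in views)
K_te = sum(rbf_kernel(v[idx_te], v[idx_tr]) for v in views)
svm_sum = SVC(kernel="precomputed").fit(K_tr, y[idx_tr])
acc_sum = svm_sum.score(K_te, y[idx_te])

# (4) Majority voting: one SVM per modality, combine hard predictions.
preds = np.array([
    SVC().fit(v[idx_tr], y[idx_tr]).predict(v[idx_te]) for v in views
])
vote = (preds.mean(axis=0) > 0.5).astype(int)   # binary labels, 3 voters
acc_vote = (vote == y[idx_te]).mean()

print(f"sum-of-kernels accuracy: {acc_sum:.2f}, majority vote: {acc_vote:.2f}")
```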

  15. An empirical comparison of different approaches for combining multimodal neuroimaging data with support vector machine.

    PubMed

    Pettersson-Yeo, William; Benetti, Stefania; Marquand, Andre F; Joules, Richard; Catani, Marco; Williams, Steve C R; Allen, Paul; McGuire, Philip; Mechelli, Andrea

    2014-01-01

    In the pursuit of clinical utility, neuroimaging researchers of psychiatric and neurological illness are increasingly using analyses, such as support vector machine, that allow inference at the single-subject level. Recent studies employing single-modality data, however, suggest that classification accuracies must be improved for such utility to be realized. One possible solution is to integrate different data types to provide a single combined output classification; either by generating a single decision function based on an integrated kernel matrix, or, by creating an ensemble of multiple single modality classifiers and integrating their predictions. Here, we describe four integrative approaches: (1) an un-weighted sum of kernels, (2) multi-kernel learning, (3) prediction averaging, and (4) majority voting, and compare their ability to enhance classification accuracy relative to the best single-modality classification accuracy. We achieve this by integrating structural, functional, and diffusion tensor magnetic resonance imaging data, in order to compare ultra-high risk (n = 19), first episode psychosis (n = 19) and healthy control subjects (n = 23). Our results show that (i) whilst integration can enhance classification accuracy by up to 13%, the frequency of such instances may be limited, (ii) where classification can be enhanced, simple methods may yield greater increases relative to more computationally complex alternatives, and, (iii) the potential for classification enhancement is highly influenced by the specific diagnostic comparison under consideration. In conclusion, our findings suggest that for moderately sized clinical neuroimaging datasets, combining different imaging modalities in a data-driven manner is no "magic bullet" for increasing classification accuracy. However, it remains possible that this conclusion is dependent on the use of neuroimaging modalities that had little, or no, complementary information to offer one another, and that the integration of more diverse types of data would have produced greater classification enhancement. We suggest that future studies ideally examine a greater variety of data types (e.g., genetic, cognitive, and neuroimaging) in order to identify the data types and combinations optimally suited to the classification of early stage psychosis.

  16. Uncertainty analysis of neural network based flood forecasting models: An ensemble based approach for constructing prediction interval

    NASA Astrophysics Data System (ADS)

    Kasiviswanathan, K.; Sudheer, K.

    2013-05-01

    Artificial neural network (ANN) based hydrologic models have gained a lot of attention among water resources engineers and scientists, owing to their potential for accurate prediction of flood flows compared to conceptual or physics-based hydrologic models. The ANN approximates the non-linear functional relationship between the complex hydrologic variables in arriving at the river flow forecast values. Despite a large number of applications, there is still some criticism that ANN point predictions lack reliability, since the uncertainty of the predictions is not quantified, and this limits their use in practical applications. A major concern in applying traditional uncertainty analysis techniques to the neural network framework is its parallel computing architecture with large degrees of freedom, which makes the uncertainty assessment a challenging task. Very few studies have considered assessment of the predictive uncertainty of ANN-based hydrologic models. In this study, a novel method is proposed that helps construct the prediction interval of an ANN flood forecasting model during calibration itself. The method is designed to have two stages of optimization during calibration: in stage 1, the ANN model is trained with a genetic algorithm (GA) to obtain the optimal set of weights and biases; in stage 2, the optimal variability of the ANN parameters (obtained in stage 1) is identified so as to create an ensemble of predictions. During the second stage, the optimization is performed with multiple objectives: (i) minimum residual variance for the ensemble mean, (ii) maximum number of measured data points falling within the estimated prediction interval, and (iii) minimum width of the prediction interval. The method is illustrated using a real-world case study of an Indian basin. The method was able to produce an ensemble with an average prediction interval width of 23.03 m3/s, with 97.17% of the total validation data points (measured) lying within the interval. For a selected hydrograph in the validation data set, most of the observed flows lie within the constructed prediction interval, which therefore provides information about the uncertainty of the prediction. One specific advantage of the method is that when the ensemble mean value is taken as the forecast, the peak flows are predicted with improved accuracy compared to traditional single-point-forecast ANNs.
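
    A small sketch of how the three calibration objectives can be evaluated for a given ensemble of forecasts; the ensemble here is faked as noisy copies of a synthetic hydrograph rather than produced by GA-perturbed ANN weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble of flood forecasts: each member would come from one
# perturbed ANN weight vector (stage 2 of the method); here the members are
# simulated as noisy copies of a synthetic "observed" hydrograph.
n_members, n_times = 50, 200
q_obs = 100 + 80 * np.exp(-0.5 * ((np.arange(n_times) - 80) / 15.0) ** 2)
ensemble = q_obs + rng.normal(0, 8, (n_members, n_times))

# Prediction interval from ensemble quantiles (95% here), plus the three
# quantities the calibration objectives trade off: residual variance of the
# ensemble mean, coverage of the observations, and mean interval width.
lo, hi = np.percentile(ensemble, [2.5, 97.5], axis=0)
mean_fc = ensemble.mean(axis=0)

resid_var = np.var(q_obs - mean_fc)
coverage = np.mean((q_obs >= lo) & (q_obs <= hi)) * 100
width = np.mean(hi - lo)
print(f"residual variance {resid_var:.1f}, coverage {coverage:.1f}%, "
      f"mean PI width {width:.1f} m^3/s")
```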

  17. A stochastic ensemble-based model to predict crop water requirements from numerical weather forecasts and VIS-NIR high resolution satellite images in Southern Italy

    NASA Astrophysics Data System (ADS)

    Pelosi, Anna; Falanga Bolognesi, Salvatore; De Michele, Carlo; Medina Gonzalez, Hanoi; Villani, Paolo; D'Urso, Guido; Battista Chirico, Giovanni

    2015-04-01

    Irrigation agriculture is one of the biggest consumers of water in Europe, especially in southern regions, where it accounts for up to 70% of total water consumption. The EU Common Agricultural Policy, combined with the Water Framework Directive, requires farmers and irrigation managers to substantially increase the efficiency of agricultural water use over the next decade. Ensemble numerical weather predictions can be valuable data for developing operational advisory irrigation services. We propose a stochastic ensemble-based model providing spatial and temporal estimates of crop water requirements, implemented within an advisory service offering detailed maps of irrigation water requirements and crop water consumption estimates, to be used by water irrigation managers and farmers. The stochastic model combines estimates of crop potential evapotranspiration retrieved from ensemble numerical weather forecasts (COSMO-LEPS, 16 members, 7 km resolution) with canopy parameters (LAI, albedo, fractional vegetation cover) derived from high resolution satellite images in the visible and near-infrared wavelengths. The service provides users with daily estimates of crop water requirements for lead times up to five days. The temporal evolution of the crop potential evapotranspiration is simulated with autoregressive models. An ensemble Kalman filter is employed for updating model states by assimilating both ground-based meteorological variables (where available) and numerical weather forecasts. The model has been applied in the Campania region (Southern Italy), where a satellite-assisted irrigation advisory service has been operating since 2006. This work presents the results of the system performance for one year of experimental service. The results suggest that the proposed model can be an effective support for a sustainable use and management of irrigation water under conditions of water scarcity and drought. Since the evapotranspiration term represents a staple component of the water balance of a catchment, as an outstanding future development the model could also offer advanced support for water resources management decisions at the catchment scale.
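
    A scalar toy version of the forecast/update cycle described above: AR(1) propagation of evapotranspiration for each ensemble member, followed by an ensemble Kalman filter analysis against a ground observation. The AR(1) coefficients and error levels are assumed; only the 16-member ensemble size is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy version of the forecasting scheme: daily reference ET evolves as an
# AR(1) process; an ensemble (one member per weather-forecast member) is
# propagated forward and updated with a ground observation via an EnKF step.
phi, mu, sig_model, sig_obs = 0.8, 4.0, 0.6, 0.3   # assumed AR(1) parameters
n_ens, n_days = 16, 30                              # e.g. 16 COSMO-LEPS members

et_true = mu
ens = mu + rng.normal(0, 1.0, n_ens)
for day in range(n_days):
    # Forecast step: AR(1) propagation of truth and of each ensemble member.
    et_true = mu + phi * (et_true - mu) + rng.normal(0, sig_model)
    ens = mu + phi * (ens - mu) + rng.normal(0, sig_model, n_ens)

    # Analysis step: assimilate a noisy station observation of ET.
    obs = et_true + rng.normal(0, sig_obs)
    P = np.var(ens, ddof=1)                 # ensemble forecast variance
    K = P / (P + sig_obs**2)                # scalar Kalman gain
    ens = ens + K * (obs + rng.normal(0, sig_obs, n_ens) - ens)

print(f"final truth {et_true:.2f} mm/day, "
      f"analysis mean {ens.mean():.2f} +/- {ens.std(ddof=1):.2f}")
```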

  18. Vector-model-supported approach in prostate plan optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Eva Sau Fan; Department of Health Technology and Informatics, The Hong Kong Polytechnic University; Wu, Vincent Wing Cheung

    Lengthy time consumed in traditional manual plan optimization can limit the use of step-and-shoot intensity-modulated radiotherapy/volumetric-modulated radiotherapy (S&S IMRT/VMAT). A vector-model-based system for retrieving similar radiotherapy cases was developed, using structural and physiologic features extracted from the Digital Imaging and Communications in Medicine (DICOM) files. Planning parameters were retrieved from the selected similar reference case and applied to the test case to bypass the gradual adjustment of planning parameters. Therefore, the planning time spent on the traditional trial-and-error manual optimization approach in the beginning of optimization could be reduced. Each S&S IMRT/VMAT prostate reference database comprised 100 previously treated cases. Prostate cases were replanned with both traditional optimization and vector-model-supported optimization based on the oncologists' clinical dose prescriptions. A total of 360 plans, consisting of 30 cases each of S&S IMRT, 1-arc VMAT, and 2-arc VMAT plans, including first optimization and final optimization with/without vector-model-supported optimization, were compared using the 2-sided t-test and paired Wilcoxon signed rank test, with a significance level of 0.05 and a false discovery rate of less than 0.05. For S&S IMRT, 1-arc VMAT, and 2-arc VMAT prostate plans, vector-model-supported optimization reduced the planning time and iteration number by almost 50%. When the first optimization plans were compared, 2-arc VMAT prostate plans had better plan quality than 1-arc VMAT plans. The volume receiving 35 Gy in the femoral head for 2-arc VMAT plans was reduced with vector-model-supported optimization compared with the traditional manual optimization approach. Otherwise, the quality of plans from both approaches was comparable. Vector-model-supported optimization was shown to offer much shortened planning time and iteration number without compromising plan quality.
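
    The case-retrieval step can be sketched as nearest-neighbour search over case feature vectors; the feature values, the similarity measure (cosine), and the plan IDs below are all hypothetical.

```python
import numpy as np

# Minimal sketch of the retrieval idea: represent each previously treated
# case by a feature vector (structural/physiologic features from DICOM),
# then pick the most similar case and reuse its planning parameters.
rng = np.random.default_rng(0)
n_cases, n_features = 100, 12                 # e.g. 100 prior prostate plans
library = rng.normal(size=(n_cases, n_features))
plans = [f"plan_{i:03d}" for i in range(n_cases)]   # hypothetical plan IDs

def retrieve(test_case: np.ndarray) -> str:
    """Return the plan whose feature vector is closest (cosine similarity)."""
    sims = library @ test_case / (
        np.linalg.norm(library, axis=1) * np.linalg.norm(test_case))
    return plans[int(np.argmax(sims))]

print(retrieve(rng.normal(size=n_features)))
```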

  19. A helper virus-free HSV-1 vector containing the vesicular glutamate transporter-1 promoter supports expression preferentially in VGLUT1-containing glutamatergic neurons.

    PubMed

    Zhang, Guo-rong; Geller, Alfred I

    2010-05-17

    Multiple potential uses of direct gene transfer into neurons require restricting expression to specific classes of glutamatergic neurons. Thus, it is desirable to develop vectors containing glutamatergic class-specific promoters. The three vesicular glutamate transporters (VGLUTs) are expressed in distinct populations of neurons, and VGLUT1 is the predominant VGLUT in the neocortex, hippocampus, and cerebellar cortex. We previously reported a plasmid (amplicon) Herpes Simplex Virus (HSV-1) vector that placed the Lac Z gene under the regulation of the VGLUT1 promoter (pVGLUT1lac). Using helper virus-free vector stocks, we showed that this vector supported approximately 90% glutamatergic neuron-specific expression in postrhinal (POR) cortex, in rats sacrificed at either 4 days or 2 months after gene transfer. We now show that pVGLUT1lac supports expression preferentially in VGLUT1-containing glutamatergic neurons. pVGLUT1lac vector stock was injected into either POR cortex, which contains primarily VGLUT1-containing glutamatergic neurons, or into the ventral medial hypothalamus (VMH), which contains predominantly VGLUT2-containing glutamatergic neurons. Rats were sacrificed at 4 days after gene transfer, and the types of cells expressing β-galactosidase were determined by immunofluorescent costaining. Cell counts showed that pVGLUT1lac supported expression in approximately 10-fold more cells in POR cortex than in the VMH, whereas a control vector supported expression in similar numbers of cells in these two areas. Further, in POR cortex, pVGLUT1lac supported expression predominantly in VGLUT1-containing neurons, and, in the VMH, pVGLUT1lac showed an approximately 10-fold preference for the rare VGLUT1-containing neurons. VGLUT1-specific expression may benefit specific experiments on learning or specific gene therapy approaches, particularly in the neocortex. Copyright 2010 Elsevier B.V. All rights reserved.

  20. Automatic event detection in low SNR microseismic signals based on multi-scale permutation entropy and a support vector machine

    NASA Astrophysics Data System (ADS)

    Jia, Rui-Sheng; Sun, Hong-Mei; Peng, Yan-Jun; Liang, Yong-Quan; Lu, Xin-Ming

    2017-07-01

    Microseismic monitoring is an effective means of providing early warning of rock or coal dynamical disasters, and its first step is microseismic event detection, although low-SNR microseismic signals often cannot be detected effectively by routine methods. To solve this problem, this paper applies multi-scale permutation entropy and a support vector machine to detect low-SNR microseismic events. First, an extraction method for signal features based on multi-scale permutation entropy is proposed by studying the influence of the scale factor on the signal permutation entropy. Second, a detection model for low-SNR microseismic events based on the least squares support vector machine is built by computing the multi-scale permutation entropy of the collected vibration signals to construct a feature-vector set. Finally, a comparative analysis of the microseismic events and noise signals in the experiment proves that the different characteristics of the two can be fully expressed by using multi-scale permutation entropy. The detection model combining multi-scale permutation entropy with the support vector machine offers high classification accuracy and fast, real-time operation, and can meet the requirements of online, real-time extraction of microseismic events.
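
    A compact implementation of the feature-extraction idea: compute normalized permutation entropy on coarse-grained copies of the signal, one value per scale, yielding the feature vector that would be passed to the SVM. The scales, embedding order, and synthetic signals are assumptions.

```python
import numpy as np
from math import factorial

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy of a 1-D signal (Bandt-Pompe style)."""
    n = len(x) - (order - 1) * delay
    patterns = np.array([np.argsort(x[i:i + order * delay:delay])
                         for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) / np.log2(factorial(order))

def multiscale_pe(x, scales=(1, 2, 3, 4, 5), order=3):
    """Coarse-grain the signal at each scale, then compute PE: the resulting
    vector is the feature input to the (LS-)SVM event detector."""
    feats = []
    for s in scales:
        m = len(x) // s
        coarse = x[:m * s].reshape(m, s).mean(axis=1)   # non-overlapping means
        feats.append(permutation_entropy(coarse, order=order))
    return np.array(feats)

rng = np.random.default_rng(0)
noise = rng.normal(size=2000)                               # noise-like trace
t = np.arange(2000)
event = noise + 3 * np.exp(-t / 300.0) * np.sin(0.2 * t)    # decaying event
print("noise PE per scale:", np.round(multiscale_pe(noise), 3))
print("event PE per scale:", np.round(multiscale_pe(event), 3))
```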

  1. Stimuli Reduce the Dimensionality of Cortical Activity

    PubMed Central

    Mazzucato, Luca; Fontanini, Alfredo; La Camera, Giancarlo

    2016-01-01

    The activity of ensembles of simultaneously recorded neurons can be represented as a set of points in the space of firing rates. Even though the dimension of this space is equal to the ensemble size, neural activity can be effectively localized on smaller subspaces. The dimensionality of the neural space is an important determinant of the computational tasks supported by the neural activity. Here, we investigate the dimensionality of neural ensembles from the sensory cortex of alert rats during periods of ongoing (inter-trial) and stimulus-evoked activity. We find that dimensionality grows linearly with ensemble size, and grows significantly faster during ongoing activity compared to evoked activity. We explain these results using a spiking network model based on a clustered architecture. The model captures the difference in growth rate between ongoing and evoked activity and predicts a characteristic scaling with ensemble size that could be tested in high-density multi-electrode recordings. Moreover, we present a simple theory that predicts the existence of an upper bound on dimensionality. This upper bound is inversely proportional to the amount of pair-wise correlations and, compared to a homogeneous network without clusters, it is larger by a factor equal to the number of clusters. The empirical estimation of such bounds depends on the number and duration of trials and is well predicted by the theory. Together, these results provide a framework to analyze neural dimensionality in alert animals, its behavior under stimulus presentation, and its theoretical dependence on ensemble size, number of clusters, and correlations in spiking network models. PMID:26924968
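
    Dimensionality of ensemble activity can be estimated, for example, with the participation ratio of the firing-rate covariance spectrum; the paper's exact estimator may differ, and the "ongoing" versus "evoked" rate matrices below are synthetic.

```python
import numpy as np

def participation_ratio(rates):
    """Dimensionality of ensemble activity as the participation ratio
    PR = (sum_i l_i)^2 / sum_i l_i^2 over the covariance eigenvalues l_i."""
    lam = np.linalg.eigvalsh(np.cov(rates))
    return lam.sum() ** 2 / np.sum(lam ** 2)

rng = np.random.default_rng(0)
n_neurons, n_bins = 30, 500

# Hypothetical "ongoing" activity: weakly correlated rates -> high dimension.
ongoing = rng.normal(size=(n_neurons, n_bins))

# Hypothetical "evoked" activity: a strong shared signal dominates the
# covariance, so fewer effective dimensions are occupied.
shared = rng.normal(size=n_bins)
evoked = 0.5 * rng.normal(size=(n_neurons, n_bins)) + np.outer(
    rng.uniform(0.5, 1.5, n_neurons), shared)

print(f"ongoing PR: {participation_ratio(ongoing):.1f}")
print(f"evoked  PR: {participation_ratio(evoked):.1f}")
```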

  2. Stimuli Reduce the Dimensionality of Cortical Activity.

    PubMed

    Mazzucato, Luca; Fontanini, Alfredo; La Camera, Giancarlo

    2016-01-01

    The activity of ensembles of simultaneously recorded neurons can be represented as a set of points in the space of firing rates. Even though the dimension of this space is equal to the ensemble size, neural activity can be effectively localized on smaller subspaces. The dimensionality of the neural space is an important determinant of the computational tasks supported by the neural activity. Here, we investigate the dimensionality of neural ensembles from the sensory cortex of alert rats during periods of ongoing (inter-trial) and stimulus-evoked activity. We find that dimensionality grows linearly with ensemble size, and grows significantly faster during ongoing activity compared to evoked activity. We explain these results using a spiking network model based on a clustered architecture. The model captures the difference in growth rate between ongoing and evoked activity and predicts a characteristic scaling with ensemble size that could be tested in high-density multi-electrode recordings. Moreover, we present a simple theory that predicts the existence of an upper bound on dimensionality. This upper bound is inversely proportional to the amount of pair-wise correlations and, compared to a homogeneous network without clusters, it is larger by a factor equal to the number of clusters. The empirical estimation of such bounds depends on the number and duration of trials and is well predicted by the theory. Together, these results provide a framework to analyze neural dimensionality in alert animals, its behavior under stimulus presentation, and its theoretical dependence on ensemble size, number of clusters, and correlations in spiking network models.

  3. AI User Support System for SAP ERP

    NASA Astrophysics Data System (ADS)

    Vlasov, Vladimir; Chebotareva, Victoria; Rakhimov, Marat; Kruglikov, Sergey

    2017-10-01

    An intelligent system for SAP ERP user support is proposed in this paper. It enables automatic replies to users' requests for support, saving time for problem analysis and resolution and improving responsiveness for end users. The system is based on an ensemble of machine learning algorithms for multiclass text classification, providing efficient question understanding, and a special framework for evidence retrieval, providing the best answer derivation.
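
    A minimal sketch of the classification stage using a soft-voting ensemble over standard text classifiers. The mini-corpus of tickets, the class labels, and the choice of base learners are invented; the evidence-retrieval stage is not shown.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical mini-corpus of support tickets mapped to resolution classes.
tickets = [
    "cannot post goods receipt in MM", "error when posting invoice FI",
    "user locked after failed logins", "password reset required",
    "purchase order approval workflow stuck", "invoice blocked for payment",
]
labels = ["logistics", "finance", "access", "access", "logistics", "finance"]

# Soft-voting ensemble over complementary text classifiers; an evidence-
# retrieval stage (not shown) would then pick the best stored answer.
ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier([
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ], voting="soft"),
)
ensemble.fit(tickets, labels)
print(ensemble.predict(["invoice posting error in FI"]))
```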

  4. Earth's magnetic moment during geomagnetic reversals

    NASA Astrophysics Data System (ADS)

    Sokoloff, D. D.

    2017-11-01

    The behavior of the dipole magnetic moment of the geomagnetic field during reversals is considered. By analogy with the reversals of the magnetic field of the Sun, a scenario is suggested in which, during a reversal, the mean dipole moment becomes zero, whereas the instantaneous value of the dipole magnetic moment remains nonzero and the corresponding vector rotates from the vicinity of one geographical pole to the other. A thorough discussion concerning the definition of the mean magnetic moment used in this concept is presented. Since the behavior of the geomagnetic field during a reversal is far from stationary, the ensemble average, instead of the time average, has to be considered.

  5. A hybrid approach to select features and classify diseases based on medical data

    NASA Astrophysics Data System (ADS)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases, based on three medical datasets: the Arrhythmia, Breast Cancer, and Hepatitis datasets. This methodology, called k-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses k-means clustering with the ANOVA statistic to preprocess the data and select the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we chose three classification algorithms, decision tree, Naïve Bayes, and Support Vector Machines, and applied the medical datasets directly to these algorithms. Our methodology gave much better classification accuracy: 98% on the Arrhythmia dataset, 92% on the Breast Cancer dataset, and 88% on the Hepatitis dataset, compared to using the medical data directly with decision tree, Naïve Bayes, and Support Vector Machines. The ROC curve and precision with K-ANOVA-SVM also achieved better results than the other algorithms.
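
    The ANOVA-plus-SVM portion of the pipeline maps directly onto standard tooling, sketched below on a stand-in dataset. The k-means preprocessing step of the full K-ANOVA-SVM method is omitted, and keeping 10 features is an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ANOVA F-test feature selection feeding an SVM, on the sklearn breast
# cancer data as a stand-in for the paper's datasets. (The paper's method
# additionally uses k-means clustering in preprocessing, omitted here.)
X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),   # keep the 10 most significant features
    SVC(kernel="rbf", C=1.0),
)
scores = cross_val_score(model, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```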

  6. Evaluating an ensemble classification approach for crop diversity verification in Danish greening subsidy control

    NASA Astrophysics Data System (ADS)

    Chellasamy, Menaka; Ferré, Ty Paul Andrew; Greve, Mogens Humlekrog

    2016-07-01

    Beginning in 2015, Danish farmers are obliged to meet specific crop diversification rules based on total land area and number of crops cultivated to be eligible for new greening subsidies. Hence, there is a need for the Danish government to extend their subsidy control system to verify farmers' declarations to warrant greening payments under the new crop diversification rules. Remote Sensing (RS) technology has been used since 1992 to control farmers' subsidies in Denmark. However, a proper RS-based approach is yet to be finalised to validate the new crop diversity requirements designed for assessing compliance under the recent subsidy scheme (2014-2020). This study uses an ensemble classification approach (proposed by the authors in previous studies) for validating the crop diversity requirements of the new rules. The approach uses a neural network ensemble classification system with bi-temporal (spring and early summer) WorldView-2 (WV2) imagery and includes the following steps: (1) automatic computation of pixel-based prediction probabilities using multiple neural networks; (2) quantification of the classification uncertainty using Endorsement Theory (ET); (3) discrimination of crop pixels and validation of the crop diversification rules at farm level; and (4) identification of farmers who are violating the requirements for greening subsidies. The prediction probabilities are computed by a neural network ensemble supplied with training samples selected automatically from farmers' declared parcels (field vectors containing crop information and the field boundary of each crop). Crop discrimination is performed by considering a set of conclusions derived from the individual neural networks based on ET. Verification of the diversification rules is performed by incorporating pixel-based classification uncertainty, or confidence intervals, with the class labels at the farm level. The proposed approach was tested with WV2 imagery acquired in 2011 for a study area in Vennebjerg, Denmark, containing 132 farmers, 1258 fields, and 18 crops. The classification results obtained show an overall accuracy of 90.2%. The RS-based results suggest that 36 farmers did not follow the crop diversification rules that would qualify for the greening subsidies. When compared to the farmers' reported crop mixes, irrespective of the rule, the RS results indicate that false crop declarations were made by 8 farmers, covering 15 fields. If the farmers' reports had been submitted for the new greening subsidies, 3 farmers would have made a false claim, while the remaining 5 farmers would still have met the required crop proportions, despite having submitted a false crop code, because of their small holding size. The RS results would have supported 96 farmers for greening subsidy claims, with no instances of suggesting a greening subsidy for a holding that the farmer did not report as meeting the required conditions. These results suggest that the proposed RS-based method shows great promise for validating the new greening subsidies in Denmark.

  7. Alpharetroviral Self-inactivating Vectors: Long-term Transgene Expression in Murine Hematopoietic Cells and Low Genotoxicity

    PubMed Central

    Suerth, Julia D; Maetzig, Tobias; Brugman, Martijn H; Heinz, Niels; Appelt, Jens-Uwe; Kaufmann, Kerstin B; Schmidt, Manfred; Grez, Manuel; Modlich, Ute; Baum, Christopher; Schambach, Axel

    2012-01-01

    Comparative integrome analyses have highlighted alpharetroviral vectors with a relatively neutral, and thus favorable, integration spectrum. However, previous studies used alpharetroviral vectors harboring viral coding sequences and intact long-terminal repeats (LTRs). We recently developed self-inactivating (SIN) alpharetroviral vectors with an advanced split-packaging design. In a murine bone marrow (BM) transplantation model we now compared alpharetroviral, gammaretroviral, and lentiviral SIN vectors and showed that all vectors transduced hematopoietic stem cells (HSCs), leading to comparable, sustained multilineage transgene expression in primary and secondary transplanted mice. Alpharetroviral integrations were decreased near transcription start sites, CpG islands, and potential cancer genes compared with gammaretroviral, and decreased in genes compared with lentiviral integrations. Analyzing the transcriptome and intragenic integrations in engrafting cells, we observed stronger correlations between in-gene integration targeting and transcriptional activity for gammaretroviral and lentiviral vectors than for alpharetroviral vectors. Importantly, the relatively “extragenic” alpharetroviral integration pattern still supported long-term transgene expression upon serial transplantation. Furthermore, sensitive genotoxicity studies revealed a decreased immortalization incidence compared with gammaretroviral and lentiviral SIN vectors. We conclude that alpharetroviral SIN vectors have a favorable integration pattern which lowers the risk of insertional mutagenesis while supporting long-term transgene expression in the progeny of transplanted HSCs. PMID:22334016

  8. Predicting hepatotoxicity using ToxCast in vitro bioactivity and ...

    EPA Pesticide Factsheets

    Background: The U.S. EPA ToxCast™ program is screening thousands of environmental chemicals for bioactivity using hundreds of high-throughput in vitro assays to build predictive models of toxicity. We represented chemicals based on bioactivity and chemical structure descriptors, then used supervised machine learning to predict their hepatotoxic effects. Results: A set of 677 chemicals were represented by 711 in vitro bioactivity descriptors (from ToxCast assays), 4,376 chemical structure descriptors (from QikProp, OpenBabel, PADEL, and PubChem), and three hepatotoxicity categories (from animal studies). Hepatotoxicants were defined by rat liver histopathology observed after chronic chemical testing and grouped into hypertrophy (161), injury (101) and proliferative lesions (99). Classifiers were built using six machine learning algorithms: linear discriminant analysis (LDA), Naïve Bayes (NB), support vector classification (SVM), classification and regression trees (CART), k-nearest neighbors (KNN) and an ensemble of classifiers (ENSMB). Classifiers of hepatotoxicity were built using chemical structure, ToxCast bioactivity, and a hybrid representation. Predictive performance was evaluated using 10-fold cross-validation testing and in-loop, filter-based, feature subset selection. Hybrid classifiers had the best balanced accuracy for predicting hypertrophy (0.78±0.08), injury (0.73±0.10) and proliferative lesions (0.72±0.09). Though chemical and bioactivity class

  9. FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection.

    PubMed

    Noto, Keith; Brodley, Carla; Slonim, Donna

    2012-01-01

    Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called "normal" instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach.
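
    A FRaC-flavoured sketch: learn one per-feature regressor from the remaining features on normal data, then score test points by their disagreement with those predictions. FRaC proper scores disagreement via normalized surprisal; plain squared error is used here as a simpler proxy, and the data and planted dependency are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# FRaC-style sketch: for each feature, learn to predict it from the other
# features on normal data; at test time, score an instance by how strongly
# it disagrees with those per-feature predictions.
X_normal = rng.normal(size=(300, 5))
X_normal[:, 4] = X_normal[:, :2].sum(axis=1)        # a learnable dependency

models = []
for j in range(X_normal.shape[1]):
    other = np.delete(X_normal, j, axis=1)
    models.append(RandomForestRegressor(n_estimators=50, random_state=0)
                  .fit(other, X_normal[:, j]))

def anomaly_score(x):
    """Sum of squared disagreements between observed and predicted features.
    (FRaC proper uses normalized surprisal; squared error is a simple proxy.)"""
    return sum(
        (x[j] - models[j].predict(np.delete(x, j).reshape(1, -1))[0]) ** 2
        for j in range(len(x)))

normal_pt = np.array([0.1, -0.2, 0.3, 0.0, -0.1])
odd_pt = np.array([0.1, -0.2, 0.3, 0.0, 5.0])       # violates the dependency
print(f"normal: {anomaly_score(normal_pt):.2f}, "
      f"anomaly: {anomaly_score(odd_pt):.2f}")
```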

  10. Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning.

    PubMed

    Mirzaei, Shokoufeh; Sidi, Tomer; Keasar, Chen; Crivelli, Silvia

    2016-08-24

    The function of a protein is determined by its structure, which creates a need for efficient methods of protein structure determination to advance scientific and medical research. Because current experimental structure determination methods carry a high price tag, computational predictions are highly desirable. Given a protein sequence, computational methods produce numerous 3D structures known as decoys. However, selection of the best-quality decoys is challenging, as end users can handle only a few of them. Therefore, scoring functions are central to decoy selection. They combine measurable features into a single-number indicator of decoy quality. Unfortunately, current scoring functions do not consistently select the best decoys. Machine learning techniques offer great potential to improve decoy scoring. This paper presents two machine-learning-based scoring functions to predict the quality of protein structures, i.e., the similarity between the predicted structure and the experimental one, without knowing the latter. We use different metrics to compare these scoring functions against three state-of-the-art scores. This is a first attempt at comparing different scoring functions using the same non-redundant dataset for training and testing and the same features. The results show that adding informative features may be more significant than the method used.

  11. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    NASA Astrophysics Data System (ADS)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method, which is known as an efficient ensemble technique, and CART, which is a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angle, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with the other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that the RSSCART model is a promising method for spatial landslide prediction.
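
    The RSS+CART combination can be approximated with a bagging wrapper configured to subsample features rather than samples, as sketched below on synthetic data; the hyperparameters (100 trees, half the features per tree) are illustrative, not the paper's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Random-subspace ensemble of CART trees: each tree sees all samples but
# only a random subset of the conditioning factors (features), mirroring
# the RSS + CART combination. The data here is synthetic, standing in for
# slope/lithology/distance-type factors of a landslide inventory.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           random_state=0)
rsscart = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=False,        # keep all samples...
    max_features=0.5,       # ...but subsample the feature space per tree
    random_state=0,
)
print(cross_val_score(rsscart, X, y, cv=5, scoring="roc_auc").mean())
```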

  12. Role of Artificial Intelligence Techniques (Automatic Classifiers) in Molecular Imaging Modalities in Neurodegenerative Diseases.

    PubMed

    Cascianelli, Silvia; Scialpi, Michele; Amici, Serena; Forini, Nevio; Minestrini, Matteo; Fravolini, Mario Luca; Sinzinger, Helmut; Schillaci, Orazio; Palumbo, Barbara

    2017-01-01

    Artificial Intelligence (AI) is a very active Computer Science research field aiming to develop systems that mimic human intelligence, and it is helpful in many human activities, including Medicine. In this review we present some examples of the exploitation of AI techniques, in particular automatic classifiers such as Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification Tree (ClT) and ensemble methods like Random Forest (RF), able to analyze findings obtained by positron emission tomography (PET) or single-photon emission computed tomography (SPECT) scans of patients with Neurodegenerative Diseases, in particular Alzheimer's Disease. We also focused our attention on techniques applied in order to preprocess data and reduce their dimensionality via feature selection or projection into a more representative domain (Principal Component Analysis, PCA, or Partial Least Squares, PLS, are examples of such methods); this is a crucial step while dealing with medical data, since it is necessary to compress patient information and retain only the most useful information in order to discriminate subjects into normal and pathological classes. The main literature papers on the application of these techniques to classify patients with neurodegenerative disease, extracting data from molecular imaging modalities, are reported, showing that the increasing development of computer-aided diagnosis systems is very promising to contribute to the diagnostic process.

  13. Predicting human liver microsomal stability with machine learning techniques.

    PubMed

    Sakiyama, Yojiro; Yuki, Hitomi; Moriya, Takashi; Hattori, Kazunari; Suzuki, Misaki; Shimada, Kaoru; Honma, Teruki

    2008-02-01

    To ensure a continuing pipeline in pharmaceutical research, lead candidates must possess appropriate metabolic stability in the drug discovery process. In vitro ADMET (absorption, distribution, metabolism, elimination, and toxicity) screening provides us with useful information regarding the metabolic stability of compounds. However, before the synthesis stage, an efficient process is required in order to deal with the vast quantity of data from large compound libraries and high-throughput screening. Here we have derived a relationship between the chemical structure and its metabolic stability for a data set of in-house compounds by means of various in silico machine learning such as random forest, support vector machine (SVM), logistic regression, and recursive partitioning. For model building, 1952 proprietary compounds comprising two classes (stable/unstable) were used with 193 descriptors calculated by Molecular Operating Environment. The results using test compounds have demonstrated that all classifiers yielded satisfactory results (accuracy > 0.8, sensitivity > 0.9, specificity > 0.6, and precision > 0.8). Above all, classification by random forest as well as SVM yielded kappa values of approximately 0.7 in an independent validation set, slightly higher than other classification tools. These results suggest that nonlinear/ensemble-based classification methods might prove useful in the area of in silico ADME modeling.

  14. Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging.

    PubMed

    Wang, Rui; Li, Rui; Lei, Yanyan; Zhu, Quing

    2015-01-01

    Support vector machine (SVM) is one of the most effective classification methods for cancer detection. The efficiency and quality of an SVM classifier depend strongly on several important features and a set of proper parameters. Here, a series of classification analyses, with one set of photoacoustic data from ovarian tissues ex vivo and a widely used breast cancer dataset, the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, revealed the varying accuracy of an SVM classification in terms of the number of features used and the parameters selected. A pattern recognition system is proposed by means of SVM-Recursive Feature Elimination (RFE) with the Radial Basis Function (RBF) kernel. To improve the effectiveness and robustness of the system, an optimized tuning ensemble algorithm called SVM-RFE(C), with a correlation filter, was implemented to quantify feature and parameter information based on cross validation. The proposed algorithm is first shown to outperform SVM-RFE on WDBC. The best accuracy of 94.643% and sensitivity of 94.595% were then achieved when using SVM-RFE(C) to test 57 new photoacoustic (PAT) datasets from 19 patients. The experimental results show that the classifier constructed with the SVM-RFE(C) algorithm is able to learn additional information from new data and has significant potential in ovarian cancer diagnosis.
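
    Plain SVM-RFE is sketched below on the WDBC data that the paper also uses. Note the correlation filter and RBF-kernel tuning that distinguish SVM-RFE(C) are not reproduced here, and a linear kernel is used in the elimination step so that feature weights exist.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# SVM-RFE: rank features by the weights of a linear SVM and prune
# recursively, then classify on the surviving features.
X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(
    StandardScaler(),
    RFE(SVC(kernel="linear", C=1.0), n_features_to_select=10, step=1),
    SVC(kernel="rbf", C=1.0),
)
print(f"5-fold accuracy: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```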

  15. DCS-SVM: a novel semi-automated method for human brain MR image segmentation.

    PubMed

    Ahmadvand, Ali; Daliri, Mohammad Reza; Hajiali, Mohammadtaghi

    2017-11-27

    In this paper, a novel method is proposed which appropriately segments magnetic resonance (MR) brain images into three main tissues. This paper proposes an extension of our previous work, in which we suggested a combination of multiple classifiers (CMC)-based method named dynamic classifier selection-dynamic local training local Tanimoto index (DCS-DLTLTI) for MR brain image segmentation into three main cerebral tissues. This idea is used here and a novel method is developed that tries to use more complex and accurate classifiers, like the support vector machine (SVM), in the ensemble. This work is challenging because CMC-based methods are time-consuming, especially on huge datasets like three-dimensional (3D) brain MR images. Moreover, SVM is a powerful method for modeling datasets with complex feature spaces, but it also has a huge computational cost for big datasets, especially those with strong interclass variability and more than two classes, such as 3D brain images; therefore, we cannot use SVM directly in DCS-DLTLTI. We thus propose a novel approach named "DCS-SVM" to use SVM in DCS-DLTLTI and improve the accuracy of the segmentation results. The proposed method is applied to the well-known datasets of the Internet Brain Segmentation Repository (IBSR) and promising results are obtained.

  16. FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection

    PubMed Central

    Brodley, Carla; Slonim, Donna

    2011-01-01

    Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called “normal” instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach. PMID:22639542

  17. Feature selection and classification of multiparametric medical images using bagging and SVM

    NASA Astrophysics Data System (ADS)

    Fan, Yong; Resnick, Susan M.; Davatzikos, Christos

    2008-03-01

    This paper presents a framework for brain classification based on multi-parametric medical images. This method takes advantage of multi-parametric imaging to provide a set of discriminative features for classifier construction by using a regional feature extraction method which takes into account joint correlations among different image parameters; in the experiments herein, MRI and PET images of the brain are used. Support vector machine classifiers are then trained based on the most discriminative features selected from the feature set. To facilitate robust classification and optimal selection of parameters involved in classification, in view of the well-known "curse of dimensionality", base classifiers are constructed in a bagging (bootstrap aggregating) framework for building an ensemble classifier and the classification parameters of these base classifiers are optimized by means of maximizing the area under the ROC (receiver operating characteristic) curve estimated from their prediction performance on left-out samples of bootstrap sampling. This classification system is tested on a sex classification problem, where it yields over 90% classification rates for unseen subjects. The proposed classification method is also compared with other commonly used classification algorithms, with favorable results. These results illustrate that the methods built upon information jointly extracted from multi-parametric images have the potential to perform individual classification with high sensitivity and specificity.
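
    The out-of-bag parameter-selection idea can be sketched as follows: for each candidate SVM parameter, bag SVM base classifiers and score the setting by the AUC of the out-of-bag predictions. The data, candidate C values, and ensemble size are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC

# Bagging + SVM: choose the SVM's C by maximizing AUC estimated on the
# left-out (out-of-bag) samples of the bootstrap, rather than on a
# separate validation set.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

best = None
for C in (0.1, 1.0, 10.0):
    bag = BaggingClassifier(SVC(C=C, probability=True), n_estimators=25,
                            oob_score=True, random_state=0).fit(X, y)
    # oob_decision_function_ holds class probabilities predicted for each
    # sample by the members that did not see it during training.
    auc = roc_auc_score(y, bag.oob_decision_function_[:, 1])
    print(f"C={C:5.1f}  OOB AUC={auc:.3f}")
    if best is None or auc > best[0]:
        best = (auc, C)
print("selected C:", best[1])
```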

  18. Nonlinear force dependence on optically bound micro-particle arrays in the evanescent fields of fundamental and higher order microfibre modes

    PubMed Central

    Maimaiti, Aili; Holzmann, Daniela; Truong, Viet Giang; Ritsch, Helmut; Nic Chormaic, Síle

    2016-01-01

    Particles trapped in the evanescent field of an ultrathin optical fibre interact over very long distances via multiple scattering of the fibre-guided fields. In ultrathin fibres that support higher order modes, these interactions are stronger and exhibit qualitatively new behaviour due to the coupling of different fibre modes, which have different propagation wave-vectors, by the particles. Here, we study one dimensional longitudinal optical binding interactions of chains of 3 μm polystyrene spheres under the influence of the evanescent fields of a two-mode microfibre. The observation of long-range interactions, self-ordering and speed variation of particle chains reveals strong optical binding effects between the particles that can be modelled well by a tritter scattering-matrix approach. The optical forces, optical binding interactions and the velocity of bounded particle chains are calculated using this method. Results show good agreement with finite element numerical simulations. Experimental data and theoretical analysis show that higher order modes in a microfibre offer a promising method to not only obtain stable, multiple particle trapping or faster particle propulsion speeds, but that they also allow for better control over each individual trapped object in particle ensembles near the microfibre surface. PMID:27451935

  19. Classification of Shiga toxin-producing Escherichia coli (STEC) serotypes with hyperspectral microscope imagery

    NASA Astrophysics Data System (ADS)

    Park, Bosoon; Windham, William R.; Ladely, Scott R.; Gurram, Prudhvi; Kwon, Heesung; Yoon, Seung-Chul; Lawrence, Kurt C.; Narang, Neelam; Cray, William C.

    2012-05-01

    Non-O157:H7 Shiga toxin-producing Escherichia coli (STEC) strains such as O26, O45, O103, O111, O121 and O145 are recognized as serious causes of outbreaks of human illness due to their toxicity. Conventional microbiological methods for cell counting are laborious and take a long time to yield results. Since optical detection methods are promising for real-time, in-situ foodborne pathogen detection, an acousto-optical tunable filter (AOTF)-based hyperspectral microscopic imaging (HMI) method has been developed for identifying pathogenic bacteria, owing to its capability to capture both the spatial and the spectral characteristics of each bacterial cell from microcolony samples. Using the AOTF-based HMI method, 89 contiguous spectral images could be acquired within approximately 30 seconds at a 250 ms exposure time. In this study, we successfully developed a protocol for live-cell immobilization on glass slides, based on a modified dry method, to acquire quality spectral images from STEC bacterial cells. Among the contiguous spectral images between 450 and 800 nm, the intensities at 458, 498, 522, 546, 570, 586, 670 and 690 nm were distinctive for STEC bacteria. With two different classification algorithms, Support Vector Machine (SVM) and Sparse Kernel-based Ensemble Learning (SKEL), STEC serotype O45 could be classified with 92% detection accuracy.

  20. Scalable Metropolis Monte Carlo for simulation of hard shapes

    NASA Astrophysics Data System (ADS)

    Anderson, Joshua A.; Eric Irrgang, M.; Glotzer, Sharon C.

    2016-07-01

    We design and implement a scalable hard particle Monte Carlo simulation toolkit (HPMC), and release it open source as part of HOOMD-blue. HPMC runs in parallel on many CPUs and many GPUs using domain decomposition. We employ BVH trees instead of cell lists on the CPU for fast performance, especially with large particle size disparity, and optimize inner loops with SIMD vector intrinsics on the CPU. Our GPU kernel proposes many trial moves in parallel on a checkerboard and uses a block-level queue to redistribute work among threads and avoid divergence. HPMC supports a wide variety of shape classes, including spheres/disks, unions of spheres, convex polygons, convex spheropolygons, concave polygons, ellipsoids/ellipses, convex polyhedra, convex spheropolyhedra, spheres cut by planes, and concave polyhedra. NVT and NPT ensembles can be run in 2D or 3D triclinic boxes. Additional integration schemes permit Frenkel-Ladd free energy computations and implicit depletant simulations. In a benchmark system of a fluid of 4096 pentagons, HPMC performs 10 million sweeps in 10 min on 96 CPU cores on XSEDE Comet. The same simulation would take 7.6 h in serial. HPMC also scales to large system sizes, and the same benchmark with 16.8 million particles runs in 1.4 h on 2048 GPUs on OLCF Titan.
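
    For orientation, a toy serial Metropolis sweep for 2D hard disks (NVT ensemble) illustrates the move-propose/overlap-reject logic that HPMC parallelizes; this sketch is unrelated to the HOOMD-blue code itself:

        import numpy as np

        rng = np.random.default_rng(1)
        L, sigma, n = 10.0, 1.0, 40          # box length, disk diameter, disk count
        g = int(np.ceil(np.sqrt(n)))         # start on a lattice (no overlaps)
        pos = (np.array([[i % g, i // g] for i in range(n)]) + 0.5) * (L / g)

        def overlaps(i, trial):
            d = pos - trial
            d -= L * np.round(d / L)         # minimum image (periodic box)
            r2 = (d ** 2).sum(axis=1)
            r2[i] = np.inf                   # ignore self-distance
            return (r2 < sigma ** 2).any()

        accepted = 0
        for sweep in range(200):
            for i in range(n):
                trial = (pos[i] + rng.uniform(-0.2, 0.2, 2)) % L
                if not overlaps(i, trial):   # hard particles: accept iff no overlap
                    pos[i] = trial
                    accepted += 1
        print("acceptance ratio:", accepted / (200 * n))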

  1. Gestalt Effects in Visual Working Memory.

    PubMed

    Kałamała, Patrycja; Sadowska, Aleksandra; Ordziniak, Wawrzyniec; Chuderski, Adam

    2017-01-01

    Four experiments investigated whether conforming to Gestalt principles, well known to drive visual perception, also facilitates the active maintenance of information in visual working memory (VWM). We used the change detection task, which required the memorization of visual patterns composed of several shapes. We observed no effect of the symmetry of visual patterns on VWM performance. However, there was a moderate positive effect when the particular shape that was probed matched the shape of the whole pattern (the whole-part similarity effect). The data support models assuming that VWM encodes not only the particular objects of the perceptual scene but also the spatial relations between them (the ensemble representation). The ensemble representation may prime objects similar to its shape and thereby boost access to them. In contrast, the null effect of symmetry reflects the fact that this very feature of an ensemble does not yield any useful additional information for VWM.

  2. Identifying saltcedar with hyperspectral data and support vector machines

    USDA-ARS?s Scientific Manuscript database

    Saltcedar (Tamarix spp.) are a group of dense phreatophytic shrubs and trees that are invasive to riparian areas throughout the United States. This study determined the feasibility of using hyperspectral data and a support vector machine (SVM) classifier to discriminate saltcedar from other cover t...

  3. Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

    PubMed

    Held, Elizabeth; Cape, Joshua; Tintle, Nathan

    2016-01-01

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
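
    The comparison described in this record can be reproduced in outline with scikit-learn; the synthetic high-dimensional data below are a stand-in for the Genetic Analysis Workshop 19 genotypes:

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                                   random_state=0)
        models = {"linear SVM": SVC(kernel="linear"),
                  "radial SVM": SVC(kernel="rbf"),
                  "logistic regression": LogisticRegression(max_iter=5000)}
        for name, model in models.items():
            auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
            print(f"{name}: mean AUC = {auc:.3f}")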

  4. A novel approach to malignant-benign classification of pulmonary nodules by using ensemble learning classifiers.

    PubMed

    Tartar, A; Akan, A; Kilic, N

    2014-01-01

    Computer-aided detection systems can help radiologists to detect pulmonary nodules at an early stage. In this paper, a novel computer-aided diagnosis (CAD) system is proposed for the classification of pulmonary nodules as malignant or benign. The proposed CAD system, which uses ensemble learning classifiers, provides important support to radiologists in the diagnostic process and achieves high classification performance. The proposed approach with a bagging classifier yields classification sensitivities of 94.7%, 90.0% and 77.8% for the benign, malignant and undetermined classes, respectively (89.5% accuracy).

  5. Applying spectral unmixing and support vector machine to airborne hyperspectral imagery for detecting giant reed

    USDA-ARS?s Scientific Manuscript database

    This study evaluated linear spectral unmixing (LSU), mixture tuned matched filtering (MTMF) and support vector machine (SVM) techniques for detecting and mapping giant reed (Arundo donax L.), an invasive weed that presents a severe threat to agroecosystems and riparian areas throughout the southern ...

  6. Support vector machines classifiers of physical activities in preschoolers

    USDA-ARS?s Scientific Manuscript database

    The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...

  7. Fabric wrinkle characterization and classification using modified wavelet coefficients and optimized support-vector-machine classifier

    USDA-ARS?s Scientific Manuscript database

    This paper presents a novel wrinkle evaluation method that uses modified wavelet coefficients and an optimized support-vector-machine (SVM) classification scheme to characterize and classify wrinkle appearance of fabric. Fabric images were decomposed with the wavelet transform (WT), and five parame...

  8. Comparison of Support Vector Machine, Neural Network, and CART Algorithms for the Land-Cover Classification Using Limited Training Data Points

    EPA Science Inventory

    Support vector machine (SVM) was applied for land-cover characterization using MODIS time-series data. Classification performance was examined with respect to training sample size, sample variability, and landscape homogeneity (purity). The results were compared to two convention...

  9. The influence of internal variability on Earth's energy balance framework and implications for estimating climate sensitivity

    NASA Astrophysics Data System (ADS)

    Dessler, Andrew E.; Mauritsen, Thorsten; Stevens, Bjorn

    2018-04-01

    Our climate is constrained by the balance between solar energy absorbed by the Earth and terrestrial energy radiated to space. This energy balance has been widely used to infer equilibrium climate sensitivity (ECS) from observations of 20th-century warming. Such estimates yield lower values than other methods, and these have been influential in pushing down the consensus ECS range in recent assessments. Here we test the method using a 100-member ensemble of the Max Planck Institute Earth System Model (MPI-ESM1.1) simulations of the period 1850-2005 with known forcing. We calculate ECS in each ensemble member using energy balance, yielding values ranging from 2.1 to 3.9 K. The spread in the ensemble is related to the central assumption in the energy budget framework: that global average surface temperature anomalies are indicative of anomalies in outgoing energy (either of terrestrial origin or reflected solar energy). We find that this assumption is not well supported over the historical temperature record in the model ensemble or more recent satellite observations. We find that framing energy balance in terms of 500 hPa tropical temperature better describes the planet's energy balance.

  10. Attracting Dynamics of Frontal Cortex Ensembles during Memory-Guided Decision-Making

    PubMed Central

    Seamans, Jeremy K.; Durstewitz, Daniel

    2011-01-01

    A common theoretical view is that attractor-like properties of neuronal dynamics underlie cognitive processing. However, although often proposed theoretically, direct experimental support for the convergence of neural activity to stable population patterns as a signature of attracting states has been sparse so far, especially in higher cortical areas. Combining state space reconstruction theorems and statistical learning techniques, we were able to resolve details of anterior cingulate cortex (ACC) multiple single-unit activity (MSUA) ensemble dynamics during a higher cognitive task which were not accessible previously. The approach worked by constructing high-dimensional state spaces from delays of the original single-unit firing rate variables and the interactions among them, which were then statistically analyzed using kernel methods. We observed cognitive-epoch-specific neural ensemble states in ACC which were stable across many trials (in the sense of being predictive) and depended on behavioral performance. More interestingly, attracting properties of these cognitively defined ensemble states became apparent in high-dimensional expansions of the MSUA spaces due to a proper unfolding of the neural activity flow, with properties common across different animals. These results therefore suggest that ACC networks may process different subcomponents of higher cognitive tasks by transiting among different attracting states. PMID:21625577

  11. A Thermal Physiological Comparison of Two HazMat Protective Ensembles With and Without Active Convective Cooling

    NASA Technical Reports Server (NTRS)

    Williamson, Rebecca; Carbo, Jorge; Luna, Bernadette; Webbon, Bruce W.

    1998-01-01

    Wearing impermeable garments for hazardous materials (HazMat) clean-up can present a health and safety problem for the wearer. Even short-duration clean-up activities can produce heat stress injuries in HazMat workers. It was hypothesized that an internal cooling system might increase worker productivity and decrease the likelihood of heat stress injuries in typical HazMat operations. Two HazMat protective ensembles were compared during treadmill exercise. The ensembles were created using two different suits: a Trelleborg VPS suit representative of current HazMat suits and a prototype suit developed by NASA engineers. The two life support systems used were a current-technology Interspiro Spirolite breathing apparatus and a liquid air breathing system that also provided convective cooling. Twelve local members of a HazMat team served as test subjects. They were fully instrumented to allow a complete physiological comparison of their thermal responses to the different ensembles. Results showed that cooling from the liquid air system significantly decreased thermal stress. The subjective evaluations of the new design features in the prototype suit were also highly favorable. Incorporation of these features could lead to significant operational advantages in the future.

  12. Application of new methods based on ECMWF ensemble model for predicting severe convective weather situations

    NASA Astrophysics Data System (ADS)

    Lazar, Dora; Ihasz, Istvan

    2013-04-01

    The short- and medium-range operational forecasting, warning and alarm of severe weather are among the most important activities of the Hungarian Meteorological Service. Our study provides a comprehensive summary of newly developed methods, based on ECMWF ensemble forecasts, that assist the successful prediction of convective weather situations. The first part of the study gives a brief overview of the ingredients of atmospheric convection: atmospheric lifting, moisture convergence and vertical wind shear. Atmospheric instability is often characterized by so-called instability indices, one of the most popular of which is the convective available potential energy. Severe convective events, such as intensive storms, supercells and tornadoes, require vertical instability, adequate moisture and vertical wind shear. As a first step, various statistical analyses of these three parameters were performed on a nine-year time series of the 51-member ensemble forecasting model for the convective summer period. The relationship between the ratio of convective to total precipitation and the above three parameters was also studied with different statistical methods. Four visualization methods were applied to support successful forecasts of severe weather. Two of the four, the ensemble meteogram and the ensemble vertical profiles, were already available at the beginning of our work; both show the probability distribution of meteorological parameters for a selected location. Two new methods have been developed in addition. The first provides a probability map of an event exceeding predefined thresholds, so that the spatial uncertainty of the event is well defined; since convective weather events often occur sporadically in space, delineating the probable event area in this way allows the ensemble forecasts to give very good support. The second new visualization tool shows the time evolution of multiple predefined thresholds in graphical form for any selected location, from which the severity of the expected weather conditions can be estimated well; intensive convective periods are also clearly marked within the forecast period. The developments were carried out with the MAGICS++ software under the UNIX operating system. In the third part of the study, the usefulness of these tools is demonstrated in three interesting case studies from last summer.

  13. Analysis of the hydrological response of a distributed physically-based model using post-assimilation (EnKF) diagnostics of streamflow and in situ soil moisture observations

    NASA Astrophysics Data System (ADS)

    Trudel, Mélanie; Leconte, Robert; Paniconi, Claudio

    2014-06-01

    Data assimilation techniques not only enhance model simulations and forecasts, they also provide an opportunity to obtain diagnostics of both the model and the observations used in the assimilation process. In this research, an ensemble Kalman filter was used to assimilate streamflow observations at a basin outlet and at interior locations, as well as soil moisture at two different depths (15 and 45 cm). The simulation model is the distributed physically-based hydrological model CATHY (CATchment HYdrology) and the study site is the Des Anglais watershed, a 690 km2 river basin located in southern Quebec, Canada. The use of Latin hypercube sampling instead of a conventional Monte Carlo method to generate the ensemble reduced the ensemble size, and therefore the calculation time. Different post-assimilation diagnostics, based on innovations (observation minus background), analysis residuals (observation minus analysis), and analysis increments (analysis minus background), were used to evaluate assimilation optimality. An important issue in data assimilation is the estimation of error covariance matrices; these diagnostics were therefore also used in a calibration exercise to determine the standard deviations of model parameters, forcing data, and observations that led to optimal assimilations. The analysis of innovations showed a lag between the model forecast and the observations during rainfall events; assimilation of streamflow observations corrected this discrepancy. Assimilation of outlet streamflow observations improved the Nash-Sutcliffe efficiency (NSE) between the one-day model forecast and the observations at both the outlet and interior point locations, owing to the structure of the state vector used. However, assimilation of streamflow observations systematically increased the simulated soil moisture values.
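
    A minimal stochastic EnKF analysis step, together with the three post-assimilation diagnostics named above, can be sketched as follows (a didactic toy in NumPy, not the CATHY/Des Anglais configuration):

        import numpy as np

        rng = np.random.default_rng(0)
        n_ens, n_state = 50, 3
        ens = rng.normal(1.0, 0.5, size=(n_ens, n_state))   # background ensemble
        H = np.array([[1.0, 0.0, 0.0]])                     # observe the first state
        obs, obs_sd = 1.8, 0.2

        Hx = ens @ H.T                                      # ensemble in observation space
        P_HT = np.cov(ens.T) @ H.T                          # cross-covariance
        K = P_HT @ np.linalg.inv(H @ P_HT + np.array([[obs_sd ** 2]]))  # Kalman gain
        perturbed = obs + rng.normal(0.0, obs_sd, size=(n_ens, 1))      # perturbed obs
        analysis = ens + (perturbed - Hx) @ K.T

        print("innovation (obs - background):", obs - Hx.mean())
        print("residual   (obs - analysis):  ", obs - (analysis @ H.T).mean())
        print("increment  (analysis - bg):   ", (analysis - ens).mean(axis=0))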

  14. eHive: an artificial intelligence workflow system for genomic analysis.

    PubMed

    Severin, Jessica; Beal, Kathryn; Vilella, Albert J; Fitzgerald, Stephen; Schuster, Michael; Gordon, Leo; Ureta-Vidal, Abel; Flicek, Paul; Herrero, Javier

    2010-05-11

    The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.

  15. A sub-ensemble theory of ideal quantum measurement processes

    NASA Astrophysics Data System (ADS)

    Allahverdyan, Armen E.; Balian, Roger; Nieuwenhuizen, Theo M.

    2017-01-01

    In order to elucidate the properties currently attributed to ideal measurements, one must explain how the concept of an individual event with a well-defined outcome may emerge from quantum theory, which deals with statistical ensembles, and how different runs issued from the same initial state may end up with different final states. This so-called "measurement problem" is tackled with two guidelines. On the one hand, the dynamics of the macroscopic apparatus A coupled to the tested system S is described mathematically within a standard quantum formalism, where "q-probabilities" remain devoid of interpretation. On the other hand, interpretative principles, aimed to be minimal, are introduced to account for the expected features of ideal measurements. Most of the five principles stated here, which relate the quantum formalism to physical reality, are straightforward and refer to macroscopic variables. The process can be identified with a relaxation of S + A to thermodynamic equilibrium, not only for a large ensemble E of runs but even for its sub-ensembles. The different mechanisms of quantum statistical dynamics that ensure these types of relaxation are exhibited, and the required properties of the Hamiltonian of S + A are indicated. The additional theoretical information provided by the study of sub-ensembles removes Schrödinger's quantum ambiguity of the final density operator for E, which hinders its direct interpretation, and brings out a commutative behaviour of the pointer observable at the final time. The latter property supports the introduction of a last interpretative principle, needed to switch from the statistical ensembles and sub-ensembles described by quantum theory to individual experimental events. It amounts to identifying some formal "q-probabilities" with ordinary frequencies, but only those which refer to the final indications of the pointer. The desired properties of ideal measurements, in particular the uniqueness of the result for each individual run of the ensemble and von Neumann's reduction, are thereby recovered with economical interpretations. The status of Born's rule involving both A and S is re-evaluated, and the contextuality of quantum measurements is made obvious.

  16. An ensemble-ANFIS based uncertainty assessment model for forecasting multi-scalar standardized precipitation index

    NASA Astrophysics Data System (ADS)

    Ali, Mumtaz; Deo, Ravinesh C.; Downs, Nathan J.; Maraseni, Tek

    2018-07-01

    Forecasting drought by means of the World Meteorological Organization-approved Standardized Precipitation Index (SPI) is considered a fundamental task for supporting socio-economic initiatives and effectively mitigating climate risk. This study aims to develop a robust drought modelling strategy to forecast multi-scalar SPI in drought-rich regions of Pakistan, where statistically significant lagged combinations of antecedent SPI are used to forecast future SPI. With an ensemble Adaptive Neuro-Fuzzy Inference System ('ensemble-ANFIS') executed via a 10-fold cross-validation procedure, a model is constructed on randomly partitioned input-target data. The resulting 10-member ensemble-ANFIS outputs, judged by mean square error and correlation coefficient in the training period, are averaged to obtain the optimal forecasts, and the model is benchmarked against the M5 Model Tree and Minimax Probability Machine Regression (MPMR). The results show that the proposed ensemble-ANFIS model was notably more precise (in terms of root mean square and mean absolute error as well as Willmott's, Nash-Sutcliffe and Legates-McCabe's indices) for the 6- and 12-month than for the 3-month forecasts, as verified by the largest proportions of errors registering in the smallest error band. Applying the 10-member simulations, the ensemble-ANFIS model was also validated for its ability to forecast the severity (S), duration (D) and intensity (I) of drought (including the error bound). This enabled uncertainty between the multiple models to be rationalized more efficiently, reducing the forecast error caused by stochasticity in drought behaviour. Through cross-validation at diverse sites, a geographic signature in the modelled uncertainties was also calculated. Considering the superiority of the ensemble-ANFIS approach and its ability to generate uncertainty-based information, the study advocates the versatility of a multi-model approach for drought-risk forecasting and its prime importance for estimating drought properties over confidence intervals, generating better information for strategic decision-making.

  17. Seventy Years of the EPR Paradox

    NASA Astrophysics Data System (ADS)

    Kupczynski, Marian

    2006-11-01

    In spite of the fact that the statistical predictions of quantum theory (QT) can only be tested if a large amount of data is available, a claim has been made that QT provides the most complete description of an individual physical system. Einstein's opposition to this claim, and the paradox he presented in the article written together with Podolsky and Rosen in 1935, inspired generations of physicists in their quest for a better understanding of QT. Seventy years after the EPR article, it is clear that without a deep understanding of the character and limitations of QT one may not hope to find a meaningful unified theory of all physical interactions, manipulate qubits or construct a quantum computer. In this paper we briefly present the EPR paper, the discussion that followed it, and Bell inequalities (BI). To avoid various paradoxes we advocate a purely statistical contextual interpretation (PSC) of QT. According to PSC, a state vector is not an attribute of a single electron, photon, trapped ion or quantum dot. A value of an observable assigned to a physical system has a meaning only in the context of a particular physical experiment; PSC does not provide any mental space-time picture of sub-phenomena. The EPR paradox is avoided because the reduction of the state vector in the measurement process is a passage from a description of the whole ensemble of experimental results to a particular sub-ensemble of these results. We show that the violation of BI is neither a proof of the completeness of QT nor of its non-locality. Therefore we rephrase the EPR question and ask whether QT is "predictably" complete, or in other words whether it provides a complete description of experimental data. To test this "predictable completeness" it is not necessary to perform additional experiments; it is sufficient to analyse the existing experimental data in more detail, using various non-parametric purity tests and other specific statistical tools invented to study the fine structure of time-series.

  18. Performance Evaluation of EnKF-based Hydrogeological Site Characterization using Color Coherent Vectors

    NASA Astrophysics Data System (ADS)

    Moslehi, M.; de Barros, F.

    2017-12-01

    Complexity of hydrogeological systems arises from the multi-scale heterogeneity and insufficient measurements of their underlying parameters such as hydraulic conductivity and porosity. An inadequate characterization of hydrogeological properties can significantly decrease the trustworthiness of numerical models that predict groundwater flow and solute transport. Therefore, a variety of data assimilation methods have been proposed in order to estimate hydrogeological parameters from spatially scarce data by incorporating the governing physical models. In this work, we propose a novel framework for evaluating the performance of these estimation methods. We focus on the Ensemble Kalman Filter (EnKF) approach that is a widely used data assimilation technique. It reconciles multiple sources of measurements to sequentially estimate model parameters such as the hydraulic conductivity. Several methods have been used in the literature to quantify the accuracy of the estimations obtained by EnKF, including Rank Histograms, RMSE and Ensemble Spread. However, these commonly used methods do not regard the spatial information and variability of geological formations. This can cause hydraulic conductivity fields with very different spatial structures to have similar histograms or RMSE. We propose a vision-based approach that can quantify the accuracy of estimations by considering the spatial structure embedded in the estimated fields. Our new approach consists of adapting a new metric, Color Coherent Vectors (CCV), to evaluate the accuracy of estimated fields achieved by EnKF. CCV is a histogram-based technique for comparing images that incorporate spatial information. We represent estimated fields as digital three-channel images and use CCV to compare and quantify the accuracy of estimations. The sensitivity of CCV to spatial information makes it a suitable metric for assessing the performance of spatial data assimilation techniques. Under various factors of data assimilation methods such as number, layout, and type of measurements, we compare the performance of CCV with other metrics such as RMSE. By simulating hydrogeological processes using estimated and true fields, we observe that CCV outperforms other existing evaluation metrics.
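
    The coherence computation can be outlined briefly (after the color coherence vectors of Pass et al.; a sketch assuming fields are treated as single-channel images, using SciPy's connected-component labelling):

        import numpy as np
        from scipy import ndimage

        def ccv(field, n_bins=8, tau=20):
            """(coherent, incoherent) pixel counts per intensity bin."""
            q = ((field - field.min()) / (np.ptp(field) + 1e-12) * n_bins).astype(int)
            q = np.clip(q, 0, n_bins - 1)                    # quantize into bins
            vec = []
            for b in range(n_bins):
                labels, _ = ndimage.label(q == b)            # connected components
                sizes = np.bincount(labels.ravel())[1:]
                coherent = sizes[sizes >= tau].sum()         # large components only
                vec.extend([coherent, sizes.sum() - coherent])
            return np.asarray(vec, dtype=float)

        rng = np.random.default_rng(0)
        truth = ndimage.gaussian_filter(rng.normal(size=(64, 64)), sigma=4)
        estimate = ndimage.gaussian_filter(rng.normal(size=(64, 64)), sigma=4)
        print("CCV distance:", np.abs(ccv(truth) - ccv(estimate)).sum())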

  19. Vector-model-supported optimization in volumetric-modulated arc stereotactic radiotherapy planning for brain metastasis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Eva Sau Fan; Department of Health Technology and Informatics, The Hong Kong Polytechnic University; Wu, Vincent Wing Cheung

    Long planning time in volumetric-modulated arc stereotactic radiotherapy (VMA-SRT) cases can limit its clinical efficiency and use. A vector model could retrieve previously successful radiotherapy cases that share various common anatomic features with the current case. The present study aimed to develop a vector model that could reduce planning time by applying the optimization parameters from those retrieved reference cases. Thirty-six VMA-SRT cases of brain metastasis (gender, male [n = 23], female [n = 13]; age range, 32 to 81 years old) were collected and used as a reference database. Another 10 VMA-SRT cases were planned with both conventional optimization and vector-model-supported optimization, following the oncologists' clinical dose prescriptions. Planning time and plan quality measures were compared using the 2-sided paired Wilcoxon signed rank test with a significance level of 0.05 and a positive false discovery rate (pFDR) of less than 0.05. With vector-model-supported optimization, there was a significant reduction in median planning time, a 40% reduction from 3.7 to 2.2 hours (p = 0.002, pFDR = 0.032), and in the number of iterations, a 30% reduction from 8.5 to 6.0 (p = 0.006, pFDR = 0.047). The quality of plans from both approaches was comparable. From these preliminary results, vector-model-supported optimization can expedite the optimization of VMA-SRT for brain metastasis while maintaining plan quality.

  20. PlasmoGEM, a database supporting a community resource for large-scale experimental genetics in malaria parasites.

    PubMed

    Schwach, Frank; Bushell, Ellen; Gomes, Ana Rita; Anar, Burcu; Girling, Gareth; Herd, Colin; Rayner, Julian C; Billker, Oliver

    2015-01-01

    The Plasmodium Genetic Modification (PlasmoGEM) database (http://plasmogem.sanger.ac.uk) provides access to a resource of modular, versatile and adaptable vectors for genome modification of Plasmodium spp. parasites. PlasmoGEM currently consists of >2000 plasmids designed to modify the genome of Plasmodium berghei, a malaria parasite of rodents, which can be requested by non-profit research organisations free of charge. PlasmoGEM vectors are designed with long homology arms for efficient genome integration and carry gene specific barcodes to identify individual mutants. They can be used for a wide array of applications, including protein localisation, gene interaction studies and high-throughput genetic screens. The vector production pipeline is supported by a custom software suite that automates both the vector design process and quality control by full-length sequencing of the finished vectors. The PlasmoGEM web interface allows users to search a database of finished knock-out and gene tagging vectors, view details of their designs, download vector sequence in different formats and view available quality control data as well as suggested genotyping strategies. We also make gDNA library clones and intermediate vectors available for researchers to produce vectors for themselves. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  1. Revealing Risks in Adaptation Planning: expanding Uncertainty Treatment and dealing with Large Projection Ensembles during Planning Scenario development

    NASA Astrophysics Data System (ADS)

    Brekke, L. D.; Clark, M. P.; Gutmann, E. D.; Wood, A.; Mizukami, N.; Mendoza, P. A.; Rasmussen, R.; Ikeda, K.; Pruitt, T.; Arnold, J. R.; Rajagopalan, B.

    2015-12-01

    Adaptation planning assessments often rely on single methods for climate projection downscaling and hydrologic analysis, do not reveal uncertainties from associated method choices, and thus likely produce overly confident decision-support information. Recent work by the authors has highlighted this issue by identifying strengths and weaknesses of widely applied methods for downscaling climate projections and assessing hydrologic impacts. This work has shown that many of the methodological choices made can alter the magnitude, and even the sign of the climate change signal. Such results motivate consideration of both sources of method uncertainty within an impacts assessment. Consequently, the authors have pursued development of improved downscaling techniques spanning a range of method classes (quasi-dynamical and circulation-based statistical methods) and developed approaches to better account for hydrologic analysis uncertainty (multi-model; regional parameter estimation under forcing uncertainty). This presentation summarizes progress in the development of these methods, as well as implications of pursuing these developments. First, having access to these methods creates an opportunity to better reveal impacts uncertainty through multi-method ensembles, expanding on present-practice ensembles which are often based only on emissions scenarios and GCM choices. Second, such expansion of uncertainty treatment combined with an ever-expanding wealth of global climate projection information creates a challenge of how to use such a large ensemble for local adaptation planning. To address this challenge, the authors are evaluating methods for ensemble selection (considering the principles of fidelity, diversity and sensitivity) that is compatible with present-practice approaches for abstracting change scenarios from any "ensemble of opportunity". Early examples from this development will also be presented.

  2. 1-norm support vector novelty detection and its sparseness.

    PubMed

    Zhang, Li; Zhou, WeiDa

    2013-12-01

    This paper proposes a 1-norm support vector novelty detection (SVND) method and discusses its sparseness. 1-norm SVND is formulated as a linear programming problem and uses two techniques for inducing sparseness: the 1-norm regularization and the hinge loss function. We also find two upper bounds on the sparseness of 1-norm SVND: the exact support vector (ESV) bound and the kernel Gram matrix rank bound. The ESV bound indicates that 1-norm SVND has a sparser representation model than SVND. The kernel Gram matrix rank bound can loosely estimate the sparseness of 1-norm SVND. Experimental results show that 1-norm SVND is feasible and effective. Copyright © 2013 Elsevier Ltd. All rights reserved.
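
    For orientation, standard (2-norm) support vector novelty detection is available in scikit-learn; the sketch below shows the usage pattern and the support-vector count that such sparseness arguments concern (the paper's 1-norm linear-programming variant is not implemented here):

        import numpy as np
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(0)
        X_train = rng.normal(0.0, 1.0, size=(200, 2))          # normal data only
        X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),
                            rng.normal(6.0, 1.0, size=(5, 2))])  # last 5: novelties
        clf = OneClassSVM(kernel="rbf", nu=0.05).fit(X_train)
        print(clf.predict(X_test))                             # +1 normal, -1 novelty
        print("support vectors:", len(clf.support_vectors_), "of", len(X_train))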

  3. ℓp-norm multikernel learning approach for stock market price forecasting.

    PubMed

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    Linear multiple kernel learning models have been used for predicting financial time series. However, ℓ1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓp-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and an interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily closing prices of the Shanghai Stock Index in China. Experimental results show that our proposed model performs better than the ℓ1-norm multiple support vector regression model.
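
    A pared-down stand-in for kernel mixtures, with fixed weights rather than the paper's ℓp-norm-constrained interleaved optimization, can be run through scikit-learn's SVR with a precomputed Gram matrix:

        import numpy as np
        from sklearn.metrics.pairwise import rbf_kernel
        from sklearn.svm import SVR

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(200, 1))
        y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)     # mock price series
        X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

        w = (0.7, 0.3)                                       # fixed kernel weights
        def mixed_kernel(A, B):
            return (w[0] * rbf_kernel(A, B, gamma=0.5)
                    + w[1] * rbf_kernel(A, B, gamma=5.0))

        model = SVR(kernel="precomputed", C=10.0)
        model.fit(mixed_kernel(X_tr, X_tr), y_tr)
        pred = model.predict(mixed_kernel(X_te, X_tr))
        print("test RMSE:", np.sqrt(np.mean((pred - y_te) ** 2)))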

  4. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics

    PubMed Central

    HUANG, SHUJUN; CAI, NIANGUANG; PACHECO, PEDRO PENZUTI; NARANDES, SHAVIRA; WANG, YANG; XU, WAYNE

    2017-01-01

    Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. PMID:29275361

  5. Adaptive Encoding of Outcome Prediction by Prefrontal Cortex Ensembles Supports Behavioral Flexibility.

    PubMed

    Del Arco, Alberto; Park, Junchol; Wood, Jesse; Kim, Yunbok; Moghaddam, Bita

    2017-08-30

    The prefrontal cortex (PFC) is thought to play a critical role in behavioral flexibility by monitoring action-outcome contingencies. How PFC ensembles represent shifts in behavior in response to changes in these contingencies remains unclear. We recorded single-unit activity and local field potentials in the dorsomedial PFC (dmPFC) of male rats during a set-shifting task that required them to update their behavior, among competing options, in response to changes in action-outcome contingencies. As behavior was updated, a subset of PFC ensembles encoded the current trial outcome before the outcome was presented. This novel outcome-prediction encoding was absent in a control task, in which actions were rewarded pseudorandomly, indicating that PFC neurons are not merely providing an expectancy signal. In both control and set-shifting tasks, dmPFC neurons displayed postoutcome discrimination activity, indicating that these neurons also monitor whether a behavior is successful in generating rewards. Gamma-power oscillatory activity increased before the outcome in both tasks but did not differentiate between expected outcomes, suggesting that this measure is not related to set-shifting behavior but reflects expectation of an outcome after action execution. These results demonstrate that PFC neurons support flexible rule-based action selection by predicting outcomes that follow a particular action. SIGNIFICANCE STATEMENT Tracking action-outcome contingencies and modifying behavior when those contingencies change is critical to behavioral flexibility. We find that ensembles of dorsomedial prefrontal cortex neurons differentiate between expected outcomes when action-outcome contingencies change. This predictive mode of signaling may be used to promote a new response strategy at the service of behavioral flexibility. Copyright © 2017 the authors 0270-6474/17/378363-11$15.00/0.

  6. Discriminative analysis with a limited number of MEG trials in depression.

    PubMed

    Lu, Qing; Jiang, Haiteng; Bi, Kun; Liu, Chu; Yao, Zhijian

    2014-01-01

    In studies exploring distinct patterns of functional abnormalities inherent in depression, experiments are generally repeated over many trials, and the data are averaged across those trials to improve the signal-to-noise ratio. Repeated stimuli, however, can degrade the signals unpredictably owing to material familiarity or subjects' fatigue. For this reason, signal processing tools that work well on small numbers of trials are expected to alleviate the workload on subjects, especially in studies of mental disease. Forty-four subjects, half depressed patients and half healthy controls, were recruited for MEG scanning in response to sad facial stimuli. Multichannel matching pursuit (MMP) was implemented to manage the limited number of trials. The post-MMP MEG signals were used to calculate the power topography over the whole brain, as inputs for a Support Vector Machine (SVM) classifier. Standard ICA and conventional ensemble averaging plus Butterworth filtering were employed as benchmarks for performance comparison. Only a limited number of trials were required via MMP to discriminate the depressed patients. Post-MMP discriminative analysis revealed a deficit theta pattern and an excessive alpha/beta pattern. The small sample size may impair the stability of the reported findings, and transient tiny variances of the signal were excluded from exploration. The deficit theta pattern together with the excessive alpha/beta pattern in depression may indicate dysfunction of the limbic-cortical circuit in a 'top-down' process. The post-MMP discrimination helps alleviate the scanning burden, facilitating the possibility of neuroimaging support for the clinical diagnosis of affective disorders. Copyright © 2014 Elsevier B.V. All rights reserved.

  7. State updating of a distributed hydrological model with Ensemble Kalman Filtering: Effects of updating frequency and observation network density on forecast accuracy

    NASA Astrophysics Data System (ADS)

    Rakovec, O.; Weerts, A.; Hazenberg, P.; Torfs, P.; Uijlenhoet, R.

    2012-12-01

    This paper presents a study on the optimal setup for discharge assimilation within a spatially distributed hydrological model (Rakovec et al., 2012a). The Ensemble Kalman filter (EnKF) is employed to update the grid-based distributed states of such an hourly spatially distributed version of the HBV-96 model. By using a physically based model for the routing, the time delay and attenuation are modelled more realistically. The discharge and states at a given time step are assumed to be dependent on the previous time step only (Markov property). Synthetic and real world experiments are carried out for the Upper Ourthe (1600 km2), a relatively quickly responding catchment in the Belgian Ardennes. The uncertain precipitation model forcings were obtained using a time-dependent multivariate spatial conditional simulation method (Rakovec et al., 2012b), which is further made conditional on preceding simulations. We assess the impact on the forecasted discharge of (1) various sets of the spatially distributed discharge gauges and (2) the filtering frequency. The results show that the hydrological forecast at the catchment outlet is improved by assimilating interior gauges. This augmentation of the observation vector improves the forecast more than increasing the updating frequency. In terms of the model states, the EnKF procedure is found to mainly change the pdfs of the two routing model storages, even when the uncertainty in the discharge simulations is smaller than the defined observation uncertainty. Rakovec, O., Weerts, A. H., Hazenberg, P., Torfs, P. J. J. F., and Uijlenhoet, R.: State updating of a distributed hydrological model with Ensemble Kalman Filtering: effects of updating frequency and observation network density on forecast accuracy, Hydrol. Earth Syst. Sci. Discuss., 9, 3961-3999, doi:10.5194/hessd-9-3961-2012, 2012a. Rakovec, O., Hazenberg, P., Torfs, P. J. J. F., Weerts, A. H., and Uijlenhoet, R.: Generating spatial precipitation ensembles: impact of temporal correlation structure, Hydrol. Earth Syst. Sci. Discuss., 9, 3087-3127, doi:10.5194/hessd-9-3087-2012, 2012b.

  8. Support vector machine applied to predict the zoonotic potential of E. coli O157 cattle isolates

    USDA-ARS?s Scientific Manuscript database

    Methods based on sequence data analysis facilitate the tracking of disease outbreaks, allow relationships between strains to be reconstructed and virulence factors to be identified. However, these methods are used postfactum after an outbreak has happened. Here, we show that support vector machine a...

  9. Support vector machine incremental learning triggered by wrongly predicted samples

    NASA Astrophysics Data System (ADS)

    Tang, Ting-long; Guan, Qiu; Wu, Yi-rong

    2018-05-01

    According to the classic Karush-Kuhn-Tucker (KKT) theorem, at every step of incremental support vector machine (SVM) learning, a newly added sample that violates the KKT conditions becomes a new support vector (SV) and may migrate old samples between the SV set and the non-support-vector (NSV) set; the learning model should then be updated based on the SVs. However, it is not clear in advance which of the old samples will move between the SV and NSV sets. Moreover, the learning model may be updated unnecessarily, which does little to improve its accuracy while slowing down training. Therefore, how to choose new SVs from the old sets during the incremental stages, and when to run an incremental step, greatly influence the accuracy and efficiency of incremental SVM learning. In this work, a new algorithm is proposed that selects candidate SVs and uses wrongly predicted samples to trigger the incremental processing. Experimental results show that the proposed algorithm achieves good performance, combining high efficiency and speed with good accuracy.
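
    The trigger idea can be mimicked in a few lines (full retraining stands in for the paper's candidate-SV selection; scikit-learn assumed):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=600, n_features=10, random_state=0)
        X_seen, y_seen = list(X[:50]), list(y[:50])
        model = SVC(kernel="rbf").fit(np.array(X_seen), np.array(y_seen))

        retrains = 0
        for xi, yi in zip(X[50:], y[50:]):
            X_seen.append(xi); y_seen.append(yi)
            # Retrain only when the current model mispredicts the new sample
            if model.predict(xi.reshape(1, -1))[0] != yi:
                model = SVC(kernel="rbf").fit(np.array(X_seen), np.array(y_seen))
                retrains += 1
        print("retrains triggered:", retrains, "out of", len(X) - 50, "new samples")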

  10. Prediction of Spirometric Forced Expiratory Volume (FEV1) Data Using Support Vector Regression

    NASA Astrophysics Data System (ADS)

    Kavitha, A.; Sujatha, C. M.; Ramakrishnan, S.

    2010-01-01

    In this work, prediction of the forced expiratory volume in one second (FEV1) from pulmonary function tests is carried out using a spirometer and support vector regression analysis. Pulmonary function data were measured with a flow-volume spirometer from volunteers (N = 175) using a standard data acquisition protocol. The acquired data were then used to predict FEV1. Support vector machines with polynomial kernel functions of four different orders were employed to predict the values of FEV1. Performance was evaluated by computing the average prediction accuracy for normal and abnormal cases. Results show that support vector machines are capable of predicting FEV1 in both normal and abnormal cases, and the average prediction accuracy for normal subjects was higher than that for abnormal subjects. Prediction accuracy was found to be high for a regularization constant of C = 10. Since FEV1 is the most significant parameter in the analysis of spirometric data, this method of assessment appears useful for diagnosing pulmonary abnormalities with incomplete data or data with poor recording.
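
    In outline, polynomial-kernel SVR at the four orders and the regularization constant C = 10 mentioned above looks as follows (the data here are simulated, not the study's spirometric measurements):

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVR

        rng = np.random.default_rng(0)
        X = rng.uniform(size=(175, 4))                      # mock spirometric features
        y = 2.5 * X[:, 0] + X[:, 1] ** 2 + 0.05 * rng.normal(size=175)  # mock FEV1

        for degree in (1, 2, 3, 4):
            model = SVR(kernel="poly", degree=degree, C=10.0)
            r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
            print(f"degree {degree}: mean cross-validated R^2 = {r2:.3f}")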

  11. Quantum Support Vector Machine for Big Data Classification

    NASA Astrophysics Data System (ADS)

    Rebentrost, Patrick; Mohseni, Masoud; Lloyd, Seth

    2014-09-01

    Supervised machine learning is the classification of new data based on already classified training examples. In this work, we show that the support vector machine, an optimized binary classifier, can be implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number of training examples. In cases where classical sampling algorithms require polynomial time, an exponential speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.

  12. "Intelligent Ensemble" Projections of Precipitation and Surface Radiation in Support of Agricultural Climate Change Adaptation

    NASA Technical Reports Server (NTRS)

    Taylor, Patrick C.; Baker, Noel C.

    2015-01-01

    Earth's climate is changing and will continue to change into the foreseeable future. Expected changes in the climatological distribution of precipitation, surface temperature, and surface solar radiation will significantly impact agriculture. Adaptation strategies are, therefore, required to reduce the agricultural impacts of climate change. Climate change projections of precipitation, surface temperature, and surface solar radiation distributions are necessary inputs for adaptation planning studies. These projections are conventionally constructed from an ensemble of climate model simulations (e.g., the Coupled Model Intercomparison Project 5 (CMIP5)) as an equal-weighted average: one model, one vote. Each climate model, however, represents the array of climate-relevant physical processes with varying degrees of fidelity, influencing the projection of individual climate variables differently. Presented here is a new approach, termed the "Intelligent Ensemble," that constructs climate variable projections by weighting each model according to its ability to represent key physical processes, e.g., the precipitation probability distribution. This approach provides added value over the equal-weighted average method. The physical process metrics applied in the "Intelligent Ensemble" method are created using a combination of NASA and NOAA satellite and surface-based cloud, radiation, temperature, and precipitation data sets. The "Intelligent Ensemble" method is applied to the RCP4.5 and RCP8.5 anthropogenic climate forcing simulations within the CMIP5 archive to develop a set of climate change scenarios for precipitation, temperature, and surface solar radiation in each USDA Farm Resource Region for use in climate change adaptation studies.
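
    The contrast between one-model-one-vote and process-based weighting can be made concrete with a toy calculation (the model outputs below are hypothetical; the real method weights by fidelity to satellite and surface observations):

        import numpy as np

        obs = np.array([1.0, 1.2, 0.9, 1.1])       # observed reference metric
        models = {                                  # name: (historical metric, projection)
            "modelA": (np.array([1.0, 1.1, 1.0, 1.1]), 2.0),
            "modelB": (np.array([0.5, 2.0, 0.2, 1.9]), 3.5),
            "modelC": (np.array([1.1, 1.2, 0.8, 1.0]), 2.2),
        }
        rmse = {k: np.sqrt(np.mean((h - obs) ** 2)) for k, (h, _) in models.items()}
        w = {k: 1.0 / (e + 1e-9) for k, e in rmse.items()}   # skill = inverse error
        weighted = sum(w[k] * p for k, (_, p) in models.items()) / sum(w.values())
        equal = np.mean([p for _, p in models.values()])
        print("equal-weight projection:  ", round(equal, 3))
        print("skill-weighted projection:", round(weighted, 3))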

  13. State updating of a distributed hydrological model with Ensemble Kalman Filtering: effects of updating frequency and observation network density on forecast accuracy

    NASA Astrophysics Data System (ADS)

    Rakovec, O.; Weerts, A. H.; Hazenberg, P.; Torfs, P. J. J. F.; Uijlenhoet, R.

    2012-09-01

    This paper presents a study on the optimal setup for discharge assimilation within a spatially distributed hydrological model. The Ensemble Kalman filter (EnKF) is employed to update the grid-based distributed states of such an hourly spatially distributed version of the HBV-96 model. By using a physically based model for the routing, the time delay and attenuation are modelled more realistically. The discharge and states at a given time step are assumed to be dependent on the previous time step only (Markov property). Synthetic and real world experiments are carried out for the Upper Ourthe (1600 km2), a relatively quickly responding catchment in the Belgian Ardennes. We assess the impact on the forecasted discharge of (1) various sets of the spatially distributed discharge gauges and (2) the filtering frequency. The results show that the hydrological forecast at the catchment outlet is improved by assimilating interior gauges. This augmentation of the observation vector improves the forecast more than increasing the updating frequency. In terms of the model states, the EnKF procedure is found to mainly change the pdfs of the two routing model storages, even when the uncertainty in the discharge simulations is smaller than the defined observation uncertainty.

  14. Lattice QCD calculation of the B(s)→D(s)*ℓν form factors at zero recoil and implications for |Vcb|

    NASA Astrophysics Data System (ADS)

    Harrison, Judd; Davies, Christine T. H.; Wingate, Matthew; Hpqcd Collaboration

    2018-03-01

    We present results of a lattice QCD calculation of B→D* and Bs→Ds* axial vector matrix elements with both states at rest. These zero-recoil matrix elements provide the normalization necessary to infer a value for the CKM matrix element |Vcb| from experimental measurements of B̄0→D*+ℓ−ν̄ and B̄s0→Ds*+ℓ−ν̄ decays. Results are derived from correlation functions computed with highly improved staggered quarks (HISQ) for the light, strange, and charm quark propagators, and nonrelativistic QCD for the bottom quark propagator. The calculation of correlation functions employs MILC Collaboration ensembles over a range of three lattice spacings. These gauge field configurations include sea quark effects of charm, strange, and equal-mass up and down quarks. We use ensembles with physically light up and down quarks, as well as heavier values. Our main results are F^{B→D*}(1) = 0.895 ± 0.010(stat) ± 0.024(sys) and F^{Bs→Ds*}(1) = 0.883 ± 0.012(stat) ± 0.028(sys). We discuss the consequences for |Vcb| in light of recent investigations into the extrapolation of experimental data to zero recoil.

  15. Chemical entity recognition in patents by combining dictionary-based and statistical approaches

    PubMed Central

    Akhondi, Saber A.; Pons, Ewoud; Afzal, Zubair; van Haagen, Herman; Becker, Benedikt F.H.; Hettne, Kristina M.; van Mulligen, Erik M.; Kors, Jan A.

    2016-01-01

    We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistical one. For this purpose the performance of several lexical resources was assessed using Peregrine, our open-source indexing engine. We combined our dictionary-based results on the patent corpus with the results of tmChem, a chemical recognizer using a conditional random field classifier. To improve the performance of tmChem, we utilized three additional features, viz. part-of-speech tags, lemmas and word-vector clusters. When evaluated on the training data, our final system obtained an F-score of 85.21% for the CEMP task, and an accuracy of 91.53% for the CPD task. On the test set, the best system ranked sixth among 21 teams for CEMP with an F-score of 86.82%, and second among nine teams for CPD with an accuracy of 94.23%. The differences in performance between the best ensemble system and the statistical system separately were small. Database URL: http://biosemantics.org/chemdner-patents PMID:27141091

  16. Prediction of lysine ubiquitylation with ensemble classifier and feature selection.

    PubMed

    Zhao, Xiaowei; Li, Xiangtao; Ma, Zhiqiang; Yin, Minghao

    2011-01-01

    Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.
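
    The consensus construction described above can be sketched with scikit-learn (synthetic data standing in for the encoded ubiquitylation sites; random column subsets standing in for the different feature-selection starting points):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier

        X, y = make_classification(n_samples=468, n_features=60, n_informative=12,
                                   random_state=0)
        rng = np.random.default_rng(0)
        members, subsets = [], []
        for seed in range(5):                        # different feature subsets
            cols = rng.choice(X.shape[1], size=30, replace=False)
            members.append(RandomForestClassifier(random_state=seed)
                           .fit(X[:, cols], y))
            subsets.append(cols)

        votes = np.array([m.predict(X[:, c]) for m, c in zip(members, subsets)])
        consensus = (votes.mean(axis=0) > 0.5).astype(int)   # majority voting
        print("training accuracy of consensus:", (consensus == y).mean())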

  17. Predicting complications of percutaneous coronary intervention using a novel support vector method.

    PubMed

    Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan

    2013-01-01

    To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer-Lemeshow χ² value (seven cases) and the mean cross-entropy error (eight cases). The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains.
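
    The OP-SVM algorithm itself is the authors' novel contribution and is not available in standard libraries, but the three baselines it is compared against can be sketched with scikit-learn. A minimal sketch, assuming synthetic, class-imbalanced data standing in for the BMC2 registry features:

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.svm import SVC, OneClassSVM

      # Synthetic, heavily imbalanced stand-in for the registry features.
      X_tr, y_tr = make_classification(n_samples=2000, weights=[0.97], random_state=0)
      X_te, y_te = make_classification(n_samples=1000, weights=[0.97], random_state=1)

      # LR baseline.
      lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
      auc_lr = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])

      # OC-SVM trained on complication-free cases only; its score measures
      # "normality", so negate it to score the complication class.
      oc = OneClassSVM(nu=0.05, gamma="scale").fit(X_tr[y_tr == 0])
      auc_oc = roc_auc_score(y_te, -oc.decision_function(X_te))

      # TC-SVM with cost-sensitive weighting for the rare positive class.
      tc = SVC(class_weight="balanced").fit(X_tr, y_tr)
      auc_tc = roc_auc_score(y_te, tc.decision_function(X_te))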

  18. Predicting complications of percutaneous coronary intervention using a novel support vector method

    PubMed Central

    Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan

    2013-01-01

    Objective To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Materials and methods Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. Results The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer–Lemeshow χ2 value (seven cases) and the mean cross-entropy error (eight cases). Conclusions The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains. PMID:23599229

  19. A support vector machine approach for classification of welding defects from ultrasonic signals

    NASA Astrophysics Data System (ADS)

    Chen, Yuan; Ma, Hong-Wei; Zhang, Guang-Ming

    2014-07-01

    Defect classification is an important issue in ultrasonic non-destructive evaluation. A layered multi-class support vector machine (LMSVM) classification system, which combines multiple SVM classifiers through a layered architecture, is proposed in this paper. The proposed LMSVM classification system is applied to the classification of welding defects from ultrasonic test signals. The measured ultrasonic defect echo signals are first decomposed into wavelet coefficients by the wavelet packet transform. The energies of the wavelet coefficients at different frequency channels are used to construct the feature vectors. The bees algorithm (BA) is then used for feature selection and SVM parameter optimisation for the LMSVM classification system. The BA-based feature selection optimises the energy feature vectors. The optimised feature vectors are input to the LMSVM classification system for training and testing. Experimental results of classifying welding defects demonstrate that the proposed technique is highly robust, precise and reliable for ultrasonic defect classification.
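
    The feature-extraction stage described above (wavelet packet decomposition followed by per-channel energies feeding an SVM) can be sketched as follows, assuming PyWavelets and scikit-learn. The signals and labels are placeholders, and the layered architecture and bees-algorithm optimisation are omitted.

      import numpy as np
      import pywt
      from sklearn.svm import SVC

      def wp_energy_features(signal, wavelet="db4", level=3):
          # Energy of the wavelet packet coefficients in each frequency channel.
          wp = pywt.WaveletPacket(signal, wavelet=wavelet, maxlevel=level)
          nodes = wp.get_level(level, order="freq")
          energies = np.array([np.sum(node.data ** 2) for node in nodes])
          return energies / energies.sum()  # normalise across channels

      # Hypothetical echo signals and defect-class labels.
      signals = np.random.randn(40, 512)
      labels = np.random.randint(0, 3, size=40)
      X = np.array([wp_energy_features(s) for s in signals])
      clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, labels)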

  20. Support vector machine based decision for mechanical fault condition monitoring in induction motor using an advanced Hilbert-Park transform.

    PubMed

    Ben Salem, Samira; Bacha, Khmais; Chaari, Abdelkader

    2012-09-01

    In this work we suggest an original fault signature based on an improved combination of Hilbert and Park transforms. Starting from this combination we can create two fault signatures: Hilbert modulus current space vector (HMCSV) and Hilbert phase current space vector (HPCSV). These two fault signatures are subsequently analysed using the classical fast Fourier transform (FFT). The effects of mechanical faults on the HMCSV and HPCSV spectra are described, and the related frequencies are determined. The magnitudes of spectral components, relative to the studied faults (air-gap eccentricity and outer raceway ball bearing defect), are extracted in order to develop the input vector necessary for learning and testing the support vector machine, with the aim of automatically classifying the various states of the induction motor. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
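
    One plausible reading of the signature construction, sketched with NumPy/SciPy: build the Park current space vector from the three phase currents, take its Hilbert analytic signal, and examine the FFT spectra of the modulus and phase. The exact ordering of the Hilbert and Park steps in the paper may differ, so treat this as an assumption rather than the published pipeline.

      import numpy as np
      from scipy.signal import hilbert

      def park_vector(ia, ib, ic):
          # Park (Concordia) current space vector i_d + j*i_q.
          i_d = np.sqrt(2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
          i_q = (ib - ic) / np.sqrt(2.0)
          return i_d + 1j * i_q

      def hmcsv_hpcsv_spectra(ia, ib, ic):
          # Analytic signal of the space-vector modulus, then FFT magnitudes of
          # its modulus (HMCSV) and unwrapped phase (HPCSV) signatures.
          analytic = hilbert(np.abs(park_vector(ia, ib, ic)))
          hmcsv = np.abs(analytic)
          hpcsv = np.unwrap(np.angle(analytic))
          return np.abs(np.fft.rfft(hmcsv)), np.abs(np.fft.rfft(hpcsv))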

  1. Land Warrior (LW)/Mounted Warrior (MW) DOTMLPF Assessment

    DTIC Science & Technology

    2007-06-01

    the DOTMLPF impacts of equipping a Stryker battalion with MW and LW? 3) What is the estimated life cycle cost (LCC) of each LW BOIP alternative? 4... Assessed costs included: Support Equipment/Disposal; Contractor Logistics Support; Ensemble Hardware... DOTMLPF; Cost. LUT Navigation Experiment; Lethality Experiment; AMSAA; TRAC-WSMR; ATEC; LW Test Unit.

  2. Off-forward gluonic structure of vector mesons

    NASA Astrophysics Data System (ADS)

    Detmold, W.; Pefkou, D.; Shanahan, P. E.

    2017-06-01

    The spin-independent and transversity generalized form factors (GFFs) of the ϕ meson are studied using lattice QCD calculations with light quark masses corresponding to a pion mass mπ ≈ 450(5) MeV. One transversity and three spin-independent GFFs related to the lowest moments of leading-twist spin-independent and transversity gluon distributions are obtained at six nonzero values of the momentum transfer up to 1.2 GeV². These quantities are compared with the analogous spin-independent quark GFFs and the electromagnetic form factors determined on the same lattice ensemble. The results show quantitative distinction between the spatial distribution of transversely polarized gluons, unpolarized gluons, and quarks and point the way towards further investigations of the gluon structure of nucleons and nuclei.

  3. Mathematical Design Optimization of Wide-Field X-ray Telescopes: Mirror Nodal Positions and Detector Tilts

    NASA Technical Reports Server (NTRS)

    Elsner, R. F.; O'Dell, S. L.; Ramsey, B. D.; Weisskopf, M. C.

    2011-01-01

    We describe a mathematical formalism for determining the mirror shell nodal positions and detector tilts that optimize the spatial resolution averaged over a field-of-view for a nested x-ray telescope, assuming known mirror segment surface prescriptions and known detector focal surface. The results are expressed in terms of ensemble averages over variable combinations of the ray positions and wave vectors in the flat focal plane intersecting the optical axis at the nominal on-axis focus, which can be determined by Monte-Carlo ray traces of the individual mirror shells. This work is part of our continuing efforts to provide analytical tools to aid in the design process for wide-field survey x-ray astronomy missions.

  4. D → Kℓν semileptonic decay using lattice QCD with HISQ at physical pion masses

    NASA Astrophysics Data System (ADS)

    Chakraborty, Bipasha; Davies, Christine; Koponen, Jonna; Lepage, G. Peter

    2018-03-01

    The quark flavor sector of the Standard Model is a fertile ground to look for new physics effects through a unitarity test of the Cabibbo-Kobayashi-Maskawa (CKM) matrix. We present a lattice QCD calculation of the scalar and the vector form factors (over a large q² region including q² = 0) associated with the D → Kℓν semileptonic decay. This calculation will then allow us to determine the central CKM matrix element, Vcs, in the Standard Model by comparing the lattice QCD results for the form factors and the experimental decay rate. This form factor calculation has been performed on the Nf = 2 + 1 + 1 MILC HISQ ensembles with the physical light quark masses.

  5. Telegraph noise in Markovian master equation for electron transport through molecular junctions

    NASA Astrophysics Data System (ADS)

    Kosov, Daniel S.

    2018-05-01

    We present a theoretical approach to solve the Markovian master equation for quantum transport with stochastic telegraph noise. Considering probabilities as functionals of a random telegraph process, we use Novikov's functional method to convert the stochastic master equation to a set of deterministic differential equations. The equations are then solved in the Laplace space, and the expression for the probability vector averaged over the ensemble of realisations of the stochastic process is obtained. We apply the theory to study the manifestations of telegraph noise in the transport properties of molecular junctions. We consider the quantum electron transport in a resonant-level molecule as well as polaronic regime transport in a molecular junction with electron-vibration interaction.

  6. A lattice calculation of the hadronic vacuum polarization contribution to (g - 2)µ

    NASA Astrophysics Data System (ADS)

    Della Morte, M.; Francis, A.; Gérardin, A.; Gülpers, V.; Herdoíza, G.; von Hippel, G.; Horch, H.; Jäger, B.; Meyer, H. B.; Nyffeler, A.; Wittig, H.

    2018-03-01

    We present results of calculations of the hadronic vacuum polarisation contribution to the muon anomalous magnetic moment. Specifically, we focus on controlling the infrared regime of the vacuum polarisation function. Our results are corrected for finite-size effects by combining the Gounaris-Sakurai parameterisation of the timelike pion form factor with the Lüscher formalism. The impact of quark-disconnected diagrams and the precision of the scale determination is discussed and included in our final result in two-flavour QCD, which carries an overall uncertainty of 6%. We present preliminary results computed on ensembles with Nf = 2 + 1 dynamical flavours and discuss how the long-distance contribution can be accurately constrained by a dedicated spectrum calculation in the iso-vector channel.

  7. Propellants and Life Support SCAPE Suit and ECU Capability

    NASA Technical Reports Server (NTRS)

    Goetzfried, Andreas

    2011-01-01

    This presentation outlines the details for a conference booth that exhibits the Propellant Handlers Ensemble (PHE) and the Environmental Control Unit (ECU) for personnel loading propellants. A demonstration of the ECU Loading will be performed at the conference.

  8. Bayesian data assimilation provides rapid decision support for vector-borne diseases.

    PubMed

    Jewell, Chris P; Brown, Richard G

    2015-07-06

    Predicting the spread of vector-borne diseases in response to incursions requires knowledge of both host and vector demographics in advance of an outbreak. Although host population data are typically available, for novel disease introductions there is a high chance of the pathogen using a vector for which data are unavailable. This presents a barrier to estimating the parameters of dynamical models representing host-vector-pathogen interaction, and hence limits their ability to provide quantitative risk forecasts. The Theileria orientalis (Ikeda) outbreak in New Zealand cattle demonstrates this problem: even though the vector has received extensive laboratory study, a high degree of uncertainty persists over its national demographic distribution. Addressing this, we develop a Bayesian data assimilation approach whereby indirect observations of vector activity inform a seasonal spatio-temporal risk surface within a stochastic epidemic model. We provide quantitative predictions for the future spread of the epidemic, quantifying uncertainty in the model parameters, case infection times and the disease status of undetected infections. Importantly, we demonstrate how our model learns sequentially as the epidemic unfolds and provide evidence for changing epidemic dynamics through time. Our approach therefore provides a significant advance in rapid decision support for novel vector-borne disease outbreaks. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  9. Support Vector Machines: Relevance Feedback and Information Retrieval.

    ERIC Educational Resources Information Center

    Drucker, Harris; Shahrary, Behzad; Gibbon, David C.

    2002-01-01

    Compares support vector machines (SVMs) to Rocchio, Ide regular and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred. Includes nine tables. (Contains 24…
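
    A minimal sketch of SVM-based relevance feedback of the kind compared in this study, assuming scikit-learn and a toy document set: documents the user has judged train a linear SVM over TF-IDF vectors, and unjudged documents are re-ranked by their signed distance to the hyperplane.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.svm import LinearSVC

      # Hypothetical feedback round: documents the user judged (1 = relevant).
      docs = ["grain exports rose", "wheat harvest report", "stock prices fell",
              "corn futures climbed", "movie review roundup"]
      labels = [1, 1, 0, 1, 0]

      vec = TfidfVectorizer()
      svm = LinearSVC().fit(vec.fit_transform(docs), labels)

      # Re-rank the unjudged collection by distance to the SVM hyperplane.
      candidates = ["soybean yields up", "celebrity gossip column"]
      scores = svm.decision_function(vec.transform(candidates))
      ranking = sorted(zip(candidates, scores), key=lambda t: -t[1])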

  10. Subpixel urban land cover estimation: comparing cubist, random forests, and support vector regression

    Treesearch

    Jeffrey T. Walton

    2008-01-01

    Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...

  11. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.

    PubMed

    Huang, Shujun; Cai, Nianguang; Pacheco, Pedro Penzuti; Narrandes, Shavira; Wang, Yang; Xu, Wayne

    2018-01-01

    Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications. Copyright© 2018, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.

  12. Dual linear structured support vector machine tracking method via scale correlation filter

    NASA Astrophysics Data System (ADS)

    Li, Weisheng; Chen, Yanquan; Xiao, Bin; Feng, Chen

    2018-01-01

    Adaptive tracking-by-detection methods based on structured support vector machines (SVMs) have performed well on recent visual tracking benchmarks. However, these methods do not adopt an effective strategy for object scale estimation, which limits overall tracking performance. We present a tracking method based on a dual linear structured support vector machine (DLSSVM) with a discriminative scale correlation filter. The collaborative tracker, comprising a DLSSVM model and a scale correlation filter, obtains good results in tracking target position and estimating scale. The fast Fourier transform is applied for detection. Extensive experiments show that our tracking approach outperforms many popular top-ranking trackers. On a benchmark including 100 challenging video sequences, the average precision of the proposed method is 82.8%.
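
    The DLSSVM component is the authors' own model, but a discriminative correlation filter of the kind used for scale estimation can be sketched in NumPy. This is a MOSSE-style sketch, not the paper's exact filter: the filter is learned in closed form in the Fourier domain, and scale is chosen by evaluating the response over resampled patches (the resample helper named in the comment is hypothetical).

      import numpy as np

      def train_correlation_filter(patches, target_response, eps=1e-5):
          # Closed-form Fourier-domain filter:
          #   H* = sum(G * conj(F_i)) / (sum(|F_i|^2) + eps)
          G = np.fft.fft2(target_response)
          A = np.zeros_like(G)
          B = np.zeros(G.shape)
          for p in patches:
              F = np.fft.fft2(p)
              A += G * np.conj(F)
              B += np.real(F * np.conj(F))
          return A / (B + eps)

      def response(H, patch):
          # Correlation response; its peak locates the target. Running this over
          # a small pyramid of resampled patches yields the best-matching scale:
          #   best = max(scales, key=lambda s: response(H, resample(patch, s)).max())
          return np.real(np.fft.ifft2(H * np.fft.fft2(patch)))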

  13. Object recognition of ladar with support vector machine

    NASA Astrophysics Data System (ADS)

    Sun, Jian-Feng; Li, Qi; Wang, Qi

    2005-01-01

    Intensity, range and Doppler images can be obtained by using laser radar. Laser radar can detect much more object information than other detection sensors, such as passive infrared imaging and synthetic aperture radar (SAR), so it is well suited as a sensor for object recognition. The traditional method of laser radar object recognition is to extract target features, which can be influenced by noise. In this paper, a laser radar recognition method, the Support Vector Machine, is introduced. The Support Vector Machine (SVM) is a new hotspot of recognition research after neural networks. It performs well on handwritten digit and face recognition. Two series of SVM experiments, designed for preprocessed and non-preprocessed samples, are performed on real laser radar images, and the experimental results are compared.

  14. nu-Anomica: A Fast Support Vector Based Novelty Detection Technique

    NASA Technical Reports Server (NTRS)

    Das, Santanu; Bhaduri, Kanishka; Oza, Nikunj C.; Srivastava, Ashok N.

    2009-01-01

    In this paper we propose nu-Anomica, a novel anomaly detection technique that can be trained on huge data sets with much reduced running time compared to the benchmark one-class Support Vector Machines algorithm. In nu-Anomica, the idea is to train the machine such that it can provide a close approximation to the exact decision plane using fewer training points and without losing much of the generalization performance of the classical approach. We have tested the proposed algorithm on a variety of continuous data sets under different conditions. We show that under all test conditions the developed procedure closely preserves the accuracy of standard one-class Support Vector Machines while reducing both the training time and the test time by 5-20 times.
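
    nu-Anomica itself is not part of standard libraries, but the benchmark it approximates, the one-class SVM, can be sketched with scikit-learn; the data here are synthetic placeholders. The nu parameter upper-bounds the fraction of training points treated as outliers and lower-bounds the fraction of support vectors.

      import numpy as np
      from sklearn.svm import OneClassSVM

      # Train on nominal data only.
      nominal = np.random.randn(5000, 8)
      clf = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(nominal)

      # At test time, predict() returns -1 for novelties and +1 for nominal points.
      test = np.vstack([np.random.randn(10, 8), 6.0 + np.random.randn(10, 8)])
      flags = clf.predict(test)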

  15. ℓp-Norm Multikernel Learning Approach for Stock Market Price Forecasting

    PubMed Central

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓp-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than the ℓ1-norm multiple support vector regression model. PMID:23365561
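
    A minimal sketch of a kernel-mixture SVR under stated assumptions: scikit-learn with a precomputed Gram matrix, fixed mixture weights normalised to unit ℓp-norm, and random data in place of the Shanghai index features. The interleaved optimisation that learns the weights in the paper is omitted.

      import numpy as np
      from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
      from sklearn.svm import SVR

      def mixed_kernel(X, Y, weights, p=2.0):
          # Combination of base kernels with weights normalised so ||w||_p = 1.
          w = np.asarray(weights, dtype=float)
          w = w / np.linalg.norm(w, ord=p)
          base = [linear_kernel, polynomial_kernel, rbf_kernel]
          return sum(wi * k(X, Y) for wi, k in zip(w, base))

      # Hypothetical lagged-price features and next-day targets.
      X_tr, y_tr = np.random.randn(200, 5), np.random.randn(200)
      X_te = np.random.randn(20, 5)
      w = [0.2, 0.3, 0.5]  # fixed here; learned in the paper

      svr = SVR(kernel="precomputed").fit(mixed_kernel(X_tr, X_tr, w), y_tr)
      pred = svr.predict(mixed_kernel(X_te, X_tr, w))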

  16. Support vector machine for automatic pain recognition

    NASA Astrophysics Data System (ADS)

    Monwar, Md Maruf; Rezaei, Siamak

    2009-02-01

    Facial expressions are a key index of emotion, and the interpretation of such expressions of emotion is critical to everyday social functioning. In this paper, we present an efficient video analysis technique for recognition of a specific expression, pain, from human faces. We employ an automatic face detector which detects faces in the stored video frames using a skin color modeling technique. For pain recognition, location and shape features of the detected faces are computed. These features are then used as inputs to a support vector machine (SVM) for classification. We compare the results with neural network based and eigenimage based automatic pain recognition systems. The experimental results indicate that using a support vector machine as the classifier can certainly improve the performance of an automatic pain recognition system.

  17. Design of 2D time-varying vector fields.

    PubMed

    Chen, Guoning; Kwatra, Vivek; Wei, Li-Yi; Hansen, Charles D; Zhang, Eugene

    2012-10-01

    Design of time-varying vector fields, i.e., vector fields that can change over time, has a wide variety of important applications in computer graphics. Existing vector field design techniques do not address time-varying vector fields. In this paper, we present a framework for the design of time-varying vector fields, both for planar domains as well as manifold surfaces. Our system supports the creation and modification of various time-varying vector fields with desired spatial and temporal characteristics through several design metaphors, including streamlines, pathlines, singularity paths, and bifurcations. These design metaphors are integrated into an element-based design to generate the time-varying vector fields via a sequence of basis field summations or spatial constrained optimizations at the sampled times. The key-frame design and field deformation are also introduced to support other user design scenarios. Accordingly, a spatial-temporal constrained optimization and the time-varying transformation are employed to generate the desired fields for these two design scenarios, respectively. We apply the time-varying vector fields generated using our design system to a number of important computer graphics applications that require controllable dynamic effects, such as evolving surface appearance, dynamic scene design, steerable crowd movement, and painterly animation. Many of these are difficult or impossible to achieve via prior simulation-based methods. In these applications, the time-varying vector fields have been applied as either orientation fields or advection fields to control the instantaneous appearance or evolving trajectories of the dynamic effects.
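
    A toy illustration of the element-based design idea (basis field summation) in NumPy: each design element contributes a radial or rotational basis field with Gaussian falloff, and a time-varying field is obtained by moving the element centres over time. This sketches the general idea only, not the paper's constrained-optimization system; all names and parameters here are illustrative.

      import numpy as np

      def basis_field(X, Y, cx, cy, kind="source", decay=4.0):
          # One design element: radial (source) or rotational (vortex) field
          # weighted by a Gaussian falloff around the element centre (cx, cy).
          dx, dy = X - cx, Y - cy
          w = np.exp(-decay * (dx ** 2 + dy ** 2))
          if kind == "source":
              return w * dx, w * dy
          if kind == "vortex":
              return -w * dy, w * dx
          raise ValueError(kind)

      def field_at_time(X, Y, t):
          # Sum of elements whose centres follow (key-framed) paths in time.
          u1, v1 = basis_field(X, Y, np.cos(t), np.sin(t), kind="vortex")
          u2, v2 = basis_field(X, Y, -0.5, 0.0, kind="source")
          return u1 + u2, v1 + v2

      X, Y = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
      u, v = field_at_time(X, Y, t=0.5)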

  18. Integrated Optical Dipole Trap for Cold Neutral Atoms with an Optical Waveguide Coupler

    NASA Astrophysics Data System (ADS)

    Lee, J.; Park, D. H.; Mittal, S.; Meng, Y.; Dagenais, M.; Rolston, S. L.

    2013-05-01

    Using an optical waveguide, an integrated optical dipole trap uses two-color (red and blue-detuned) traveling evanescent wave fields for trapping cold neutral atoms. To achieve longitudinal confinement, we propose using an integrated optical waveguide coupler, which provides a potential gradient along the beam propagation direction sufficient to confine atoms. This integrated optical dipole trap can support an atomic ensemble with a large optical depth due to its small mode area. Its quasi-TE0 waveguide mode has an advantage over the HE11 mode of a nanofiber, with little inhomogeneous Zeeman broadening at the trapping region. The longitudinal confinement eliminates the need for a 1D optical lattice, reducing collisional blockaded atomic loading, potentially producing larger ensembles. The waveguide trap allows for scalability and integrability with nano-fabrication technology. We analyze the potential performance of such integrated atom traps and present current research progress towards a fiber-coupled silicon nitride optical waveguide integrable with atom chips. Work is supported by the ARO Atomtronics MURI.

  19. Impaired hippocampal place cell dynamics in a mouse model of the 22q11.2 deletion

    PubMed Central

    Zaremba, Jeffrey D; Diamantopoulou, Anastasia; Danielson, Nathan B; Grosmark, Andres D; Kaifosh, Patrick W; Bowler, John C; Liao, Zhenrui; Sparks, Fraser T; Gogos, Joseph A; Losonczy, Attila

    2018-01-01

    Hippocampal place cells represent the cellular substrate of episodic memory. Place cell ensembles reorganize to support learning but must also maintain stable representations to facilitate memory recall. Despite extensive research, the learning-related role of place cell dynamics in health and disease remains elusive. Using chronic two-photon Ca2+ imaging in hippocampal area CA1 of wild-type and Df(16)A+/− mice, an animal model of 22q11.2 deletion syndrome, one of the most common genetic risk factors for cognitive dysfunction and schizophrenia, we found that goal-oriented learning in wild-type mice was supported by stable spatial maps and robust remapping of place fields toward the goal location. Df(16)A+/− mice showed a significant learning deficit accompanied by reduced spatial map stability and the absence of goal-directed place cell reorganization. These results expand our understanding of the hippocampal ensemble dynamics supporting cognitive flexibility and demonstrate their importance in a model of 22q11.2-associated cognitive dysfunction. PMID:28869582

  20. Techniques utilized in the simulated altitude testing of a 2D-CD vectoring and reversing nozzle

    NASA Technical Reports Server (NTRS)

    Block, H. Bruce; Bryant, Lively; Dicus, John H.; Moore, Allan S.; Burns, Maureen E.; Solomon, Robert F.; Sheer, Irving

    1988-01-01

    Simulated altitude testing of a two-dimensional, convergent-divergent, thrust vectoring and reversing exhaust nozzle was accomplished. An important objective of this test was to develop test hardware and techniques to properly operate a vectoring and reversing nozzle within the confines of an altitude test facility. This report presents detailed information on the major test support systems utilized, the operational performance of the systems and the problems encountered, and test equipment improvements recommended for future tests. The most challenging support systems included the multi-axis thrust measurement system, vectored and reverse exhaust gas collection systems, and infrared temperature measurement systems used to evaluate and monitor the nozzle. The feasibility of testing a vectoring and reversing nozzle of this type in an altitude chamber was successfully demonstrated. Supporting systems performed as required. During reverser operation, engine exhaust gases were successfully captured and turned downstream. However, a small amount of exhaust gas spilled out the collector ducts' inlet openings when the reverser was opened more than 60 percent. The spillage did not affect engine or nozzle performance. The three infrared systems which viewed the nozzle through the exhaust collection system worked remarkably well considering the harsh environment.
