NASA Astrophysics Data System (ADS)
Ma, Chuang; Bao, Zhong-Kui; Zhang, Hai-Feng
2017-10-01
So far, many network-structure-based link prediction methods have been proposed. However, these methods only highlight one or two structural features of networks, and then use the methods to predict missing links in different networks. The performances of these existing methods are not always satisfactory, since each network has its unique underlying structural features. In this paper, by analyzing different real networks, we find that the structural features of different networks are remarkably different. In particular, even in the same network, their inner structural features are utterly different. Therefore, more structural features should be considered. However, owing to the remarkably different structural features, the contributions of different features are hard to specify in advance. Inspired by these facts, an adaptive fusion model regarding link prediction is proposed to incorporate multiple structural features. In the model, a logistic function combining multiple structural features is defined, then the weight of each feature in the logistic function is adaptively determined by exploiting the known structure information. Last, we use the "learnt" logistic function to predict the connection probabilities of missing links. According to our experimental results, we find that the performance of our adaptive fusion model is better than many similarity indices.
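The idea of a logistic function whose feature weights are learnt from the known structure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three similarity indices (common neighbours, Jaccard, resource allocation), the toy graph, the plain gradient-descent fit and all function names are assumptions chosen for brevity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pair_features(adj, u, v):
    """Three similarity indices for the pair (u, v), computed as if the
    edge u-v itself were hidden (assumed feature set, not the paper's)."""
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    common = nu & nv
    union = nu | nv
    cn = float(len(common))                               # common neighbours
    jaccard = len(common) / len(union) if union else 0.0  # Jaccard index
    ra = sum(1.0 / len(adj[z]) for z in common)           # resource allocation
    return [cn, jaccard, ra]

def log_loss(weights, bias, X, y):
    """Mean negative log-likelihood of the logistic model."""
    total = 0.0
    for x, t in zip(X, y):
        p = sigmoid(bias + sum(w * f for w, f in zip(weights, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(X)

def fit_logistic(X, y, lr=0.1, steps=500):
    """Batch gradient descent on the log-loss, starting from zero weights."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, t in zip(X, y):
            err = sigmoid(bias + sum(w * f for w, f in zip(weights, x))) - t
            grad_b += err
            for j, f in enumerate(x):
                grad_w[j] += err * f
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Toy graph: two triangles joined by the bridge 2-3 (hypothetical data).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {node: set() for node in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

pairs = [(u, v) for u in range(6) for v in range(u + 1, 6)]
X = [pair_features(adj, u, v) for u, v in pairs]
y = [1 if v in adj[u] else 0 for u, v in pairs]   # 1 = known link
weights, bias = fit_logistic(X, y)

def link_probability(u, v):
    """Connection probability from the learnt logistic function."""
    x = pair_features(adj, u, v)
    return sigmoid(bias + sum(w * f for w, f in zip(weights, x)))
```

The learnt weights play the role of the per-feature contributions that, according to the abstract, cannot be fixed in advance; a different network would yield different weights from the same code.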
Structural features that predict real-value fluctuations of globular proteins.
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-05-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
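The two figures of merit quoted in this abstract, Pearson's correlation coefficient and root mean square error, can be computed as in the short sketch below. The function names are illustrative and no data from the paper are reproduced.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def rmse(predicted, observed):
    """Root mean square error, in the same units as the inputs (e.g. Å)."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))
```

The two measures are complementary: a predictor can rank residues well (high correlation) while still being off in absolute scale (high RMSE), which is why a method that predicts the real value of fluctuation is reported with both.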
Structural features that predict real-value fluctuations of globular proteins
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-01-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Structural features based genome-wide characterization and prediction of nucleosome organization
2012-01-01
Background Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. 
Conclusions Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online. PMID:22449207
Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi
2014-01-01
Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.
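The recursive elimination loop behind SVM-RFE can be sketched as below. This is only the skeleton of the ranking procedure: `weight_fn` is a hypothetical stand-in for retraining a linear SVM on the remaining features and reading off its weight vector, and the squared weight is assumed as the ranking score.

```python
def recursive_feature_elimination(features, weight_fn):
    """Rank features by repeatedly dropping the one with the lowest score.

    weight_fn(remaining) must return a dict mapping each remaining feature
    to a model weight (hypothetical callback; in SVM-RFE it would retrain
    a linear SVM on the remaining features at every round).
    Returns the features ordered from most to least important.
    """
    remaining = list(features)
    eliminated = []
    while remaining:
        weights = weight_fn(remaining)
        worst = min(remaining, key=lambda f: weights[f] ** 2)
        remaining.remove(worst)
        eliminated.append(worst)
    return eliminated[::-1]

# Toy stand-in with fixed weights, used only to exercise the loop.
fixed_weights = {"a": 0.1, "b": -2.0, "c": 0.5}

def toy_weight_fn(remaining):
    return {f: fixed_weights[f] for f in remaining}

ranking = recursive_feature_elimination(["a", "b", "c"], toy_weight_fn)
```

With real data the weights change after each retraining, so the final ranking can differ from a one-shot ranking by weight; that re-evaluation is the point of doing the elimination recursively.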
Wiebe, Nicholas J P; Meyer, Irmtraud M
2010-06-24
The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few, experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequences not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way.
This has the disadvantage of not being able to predict features as a function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Ni, Qianwu; Chen, Lei
2017-01-01
Correct prediction of protein structural class is beneficial to investigation on protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select a proper classification algorithm and extract essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirty-eight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. From the algorithm list, the algorithms were taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to other models that adopt only the feature selection procedure or only the algorithm selection procedure. The feature selection and algorithm selection procedures are both helpful for building an ensemble prediction model that can yield a better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Zheng, Ce; Kurgan, Lukasz
2008-10-10
beta-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of beta-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential beta-turns, while the remaining four amino acids are useful to predict non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Q total, Q predicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Q total barrier and achieves Q total = 80.9%, MCC = 0.47, and Q predicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. 
Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between beta-turns and non-beta-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at http://biomine.ece.ualberta.ca/BTNpred/BTNpred.html.
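The quality measures named in this abstract can be computed from a confusion matrix as sketched below. The definitions used here (Q total as overall accuracy, Q predicted as the share of predicted beta-turns that are correct, and the standard Matthews correlation coefficient) are the usual ones in the beta-turn prediction literature and are assumed rather than taken from the abstract; the function name is illustrative.

```python
import math

def turn_prediction_metrics(tp, tn, fp, fn):
    """Return Q total (%), Q predicted (%) and MCC for a binary predictor.

    tp/fn: beta-turn residues predicted correctly / missed
    tn/fp: non-turn residues predicted correctly / over-predicted as turns
    """
    total = tp + tn + fp + fn
    q_total = 100.0 * (tp + tn) / total
    q_predicted = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"q_total": q_total, "q_predicted": q_predicted, "mcc": mcc}
```

Because non-turn residues typically dominate, Q total alone can look high for a predictor that rarely calls a turn; MCC and Q predicted expose that, which is consistent with the abstract attributing the gain to fewer false positives.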
Zheng, Ce; Kurgan, Lukasz
2008-01-01
Background β-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. 
Conclusion Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between β-turns and non-β-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at . PMID:18847492
Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong
2011-09-01
One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score are synthesized by a sampling method. Finally, another integrated RBF model ranks the structural models according to the features of sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function in both best-model selection and overall correlation between the predicted ranking and the actual ranking of structural quality.
Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS
Li, Bi-Qing; Feng, Kai-Yan; Chen, Lei; Huang, Tao; Cai, Yu-Dong
2012-01-01
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction. PMID:22937126
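Incremental feature selection over an mRMR-ranked list can be sketched as follows. The `evaluate` callback is a hypothetical stand-in for training and cross-validating the Random Forest on a given feature subset; the toy gains below are invented numbers used only to exercise the loop and are not results from the paper.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score every prefix of a ranked feature list and keep the best one.

    ranked_features: features ordered from most to least relevant
                     (e.g. the output of mRMR).
    evaluate(subset): hypothetical callback returning a score to maximise.
    Returns (best_subset, best_score, history of (size, score)).
    """
    best_subset, best_score, history = [], None, []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        history.append((k, score))
        if best_score is None or score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score, history

# Toy scorer: informative features add a gain, every feature costs 0.02.
toy_gain = {"depth_index": 0.3, "surface_curvature": 0.2, "conservation": 0.1}

def toy_evaluate(subset):
    return sum(toy_gain.get(f, 0.0) for f in subset) - 0.02 * len(subset)

ranked = ["depth_index", "surface_curvature", "conservation", "noise_a", "noise_b"]
best_subset, best_score, history = incremental_feature_selection(ranked, toy_evaluate)
```

Only prefixes of the ranked list are tried, so the cost is one model evaluation per feature rather than one per subset; the price is that the result depends entirely on the quality of the ranking.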
Asymmetric bagging and feature selection for activities prediction of drug molecules.
Li, Guo-Zheng; Meng, Hao-Hua; Lu, Wen-Cong; Yang, Jack Y; Yang, Mary Qu
2008-05-28
Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which avoid the high cost and long cycles of the traditional experimental method. Because drug molecules with positive activity are far fewer than those with negative activity, it is important to predict molecular activities with this imbalance in mind. Here, asymmetric bagging and feature selection are introduced into the problem, and asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities under this imbalance. At the same time, the features extracted from the structures of drug molecules affect prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experimental results on a data set of molecular activities show that asBagging improves the AUC and sensitivity values of molecular activities and PRIFEAB with feature selection further helps to improve the prediction ability. Asymmetric bagging can help to improve prediction accuracy of activities of drug molecules, which can be further improved by performing feature selection to select relevant features from the drug molecules data sets.
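The sampling step of asymmetric bagging can be sketched as below. The sketch assumes one common formulation, in which every bag keeps all positive examples and bootstraps an equally sized sample from the negatives; whether asBagging uses exactly this scheme is not stated in the abstract, and the function names are illustrative.

```python
import random

def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Build balanced training bags for an imbalanced two-class problem.

    Each bag holds all positives plus len(positives) negatives drawn
    with replacement (assumed formulation of asymmetric bagging).
    """
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        sampled_negatives = [rng.choice(negatives) for _ in positives]
        bags.append((list(positives), sampled_negatives))
    return bags

def majority_vote(votes):
    """Combine 0/1 votes from the per-bag classifiers; ties go to 0."""
    return 1 if 2 * sum(votes) > len(votes) else 0
```

One classifier (an SVM in the abstract) would be trained per bag and their predictions combined, for example with `majority_vote`. Each individual model sees balanced classes, while the ensemble as a whole still draws on most of the negative examples.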
Extracting physicochemical features to predict protein secondary structure.
Huang, Yin-Fu; Chen, Shu-Ying
2013-01-01
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use a filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure improves performance.
Extracting Physicochemical Features to Predict Protein Secondary Structure
Chen, Shu-Ying
2013-01-01
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use a filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure improves performance. PMID:23766688
Defining and predicting structurally conserved regions in protein superfamilies
Huang, Ivan K.; Grishin, Nick V.
2013-01-01
Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. 
Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics Online PMID:23193223
Jin, Mingwu; Deng, Weishu
2018-05-15
There is a spectrum of the progression from healthy control (HC) to mild cognitive impairment (MCI) without conversion to Alzheimer's disease (AD), to MCI with conversion to AD (cMCI), and to AD. This study aims to predict the different disease stages using brain structural information provided by magnetic resonance imaging (MRI) data. The neighborhood component analysis (NCA) is applied to select the most powerful features for prediction. The ensemble decision tree classifier is built to predict which group the subject belongs to. The best features and model parameters are determined by cross validation of the training data. Our results show that 16 out of a total of 429 features were selected by NCA using 240 training subjects, including MMSE score and structural measures in memory-related regions. The boosting tree model with NCA features can achieve prediction accuracy of 56.25% on 160 test subjects. For comparison, principal component analysis (PCA) and sequential feature selection (SFS) are also used for feature selection, and support vector machine (SVM) is also used for classification. The boosting tree model with NCA features outperforms all other combinations of feature selection and classification methods. The results suggest that NCA is a better feature selection strategy than PCA and SFS for the data used in this study. Ensemble tree classifier with boosting is more powerful than SVM to predict the subject group. However, more advanced feature selection and classification methods or additional measures besides structural MRI may be needed to improve the prediction performance. Copyright © 2018 Elsevier B.V. All rights reserved.
Mizianty, Marcin J; Kurgan, Lukasz
2009-12-13
Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions, at 58% accuracy, for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
2009-01-01
Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions, at 58% accuracy, for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388
Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.
Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin
2016-06-07
RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .
A deep learning framework for modeling structural features of RNA-binding protein targets
Zhang, Sai; Zhou, Jingtian; Hu, Hailin; Gong, Haipeng; Chen, Ligong; Cheng, Chao; Zeng, Jianyang
2016-01-01
RNA-binding proteins (RBPs) play important roles in the post-transcriptional control of RNAs. Identifying RBP binding sites and characterizing RBP binding preferences are key steps toward understanding the basic mechanisms of post-transcriptional gene regulation. Though numerous computational methods have been developed for modeling RBP binding preferences, discovering a complete structural representation of the RBP targets by integrating their available structural features in all three dimensions is still a challenging task. In this paper, we develop a general and flexible deep learning framework for modeling structural binding preferences and predicting binding sites of RBPs, which takes (predicted) RNA tertiary structural information into account for the first time. Our framework constructs a unified representation that characterizes the structural specificities of RBP targets in all three dimensions, which can be further used to predict novel candidate binding sites and discover potential binding motifs. Through testing on real CLIP-seq datasets, we have demonstrated that our deep learning framework can automatically extract effective hidden structural features from the encoded raw sequence and structural profiles, and predict accurate RBP binding sites. In addition, we have conducted the first study to show that integrating the additional RNA tertiary structural features can improve the model performance in predicting RBP binding sites, especially for the polypyrimidine tract-binding protein (PTB), which also provides new evidence to support the view that RBPs may have specific tertiary structural binding preferences. In particular, the tests on the internal ribosome entry site (IRES) segments yield satisfactory results with experimental support from the literature and further demonstrate the necessity of incorporating RNA tertiary structural information into the prediction model.
The source code of our approach can be found in https://github.com/thucombio/deepnet-rbp. PMID:26467480
Changes in quantitative 3D shape features of the optic nerve head associated with age
NASA Astrophysics Data System (ADS)
Christopher, Mark; Tang, Li; Fingert, John H.; Scheetz, Todd E.; Abramoff, Michael D.
2013-02-01
Optic nerve head (ONH) structure is an important biological feature of the eye used by clinicians to diagnose and monitor progression of diseases such as glaucoma. ONH structure is commonly examined using stereo fundus imaging or optical coherence tomography. Stereo fundus imaging provides stereo views of the ONH that retain 3D information useful for characterizing structure. In order to quantify 3D ONH structure, we applied a stereo correspondence algorithm to a set of stereo fundus images. Using these quantitative 3D ONH structure measurements, eigen structures were derived using principal component analysis from stereo images of 565 subjects from the Ocular Hypertension Treatment Study (OHTS). To evaluate the usefulness of the eigen structures, we explored associations with the demographic variables age, gender, and race. Using regression analysis, the eigen structures were found to have significant (p < 0.05) associations with both age and race after Bonferroni correction. In addition, classifiers were constructed to predict the demographic variables based solely on the eigen structures. These classifiers achieved an area under receiver operating characteristic curve of 0.62 in predicting a binary age variable, 0.52 in predicting gender, and 0.67 in predicting race. The use of objective, quantitative features or eigen structures can reveal hidden relationships between ONH structure and demographics. The use of these features could similarly allow specific aspects of ONH structure to be isolated and associated with the diagnosis of glaucoma, disease progression and outcomes, and genetic factors.
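To put the reported areas under the ROC curve (0.62, 0.52 and 0.67) in context, the following minimal sketch computes AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so that 0.5 corresponds to chance. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = older group, 0 = younger group.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(labels, scores)
print(round(toy_auc, 3))
```

Read this way, an AUC of 0.52 for gender means the classifier ranks a random pair correctly only slightly more often than a coin flip would.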
Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.
Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing
2016-08-24
Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation. Meanwhile, their reversible oxidation plays critical roles in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases, or depend heavily on protein structural data, and thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation on the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to the current structure-based method. Further validation using an independent dataset indicates that it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations.
Accurate prediction of redox-sensitive cysteines not only enhances our understanding about the redox sensitivity of cysteine, it may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
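The RSC758 figures quoted above can be cross-checked against one another. The sketch below rebuilds accuracy and the Matthews correlation coefficient (MCC) from the reported sensitivity (0.602) and specificity (0.756). The 50/50 split between positive and negative cysteines is an assumption made here for illustration (the abstract does not state the class balance), and the function names are hypothetical.

```python
import math


def accuracy_from_rates(sensitivity, specificity, positive_fraction):
    """Overall accuracy as the class-weighted mean of sensitivity and specificity."""
    return positive_fraction * sensitivity + (1.0 - positive_fraction) * specificity


def mcc_from_rates(sensitivity, specificity, positive_fraction):
    """Matthews correlation coefficient from per-class rates.

    Confusion-matrix cells are expressed as fractions of the whole data set.
    """
    tp = positive_fraction * sensitivity
    fn = positive_fraction * (1.0 - sensitivity)
    tn = (1.0 - positive_fraction) * specificity
    fp = (1.0 - positive_fraction) * (1.0 - specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Sensitivity and specificity are the values reported in the abstract;
# the balanced class split (0.5) is an assumption, not a reported fact.
SENS, SPEC, POS_FRACTION = 0.602, 0.756, 0.5
acc = accuracy_from_rates(SENS, SPEC, POS_FRACTION)
mcc = mcc_from_rates(SENS, SPEC, POS_FRACTION)
print(round(acc, 3), round(mcc, 3))
```

Under the balanced-class assumption both derived values agree with the reported accuracy (0.679) and MCC (0.362) to three decimals, which suggests, but does not prove, that the evaluation set was balanced.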
Some of the most interesting CASP11 targets through the eyes of their authors
Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K.; Edwards, Robert A.; Fass, Deborah; Hartmann, Marcus D.; Korycinski, Mateusz; Lewis, Richard J.; Lorimer, Donald; Lupas, Andrei N.; Newman, Janet; Peat, Thomas S.; Piepenbrink, Kurt H.; Prahlad, Janani; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca M.; Seguritan, Victor; Sundberg, Eric J.; Singh, Abhimanyu K.; Wilson, Mark A.
2015-01-01
ABSTRACT The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34–50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26473983
Some of the most interesting CASP11 targets through the eyes of their authors.
Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K; Edwards, Robert A; Fass, Deborah; Hartmann, Marcus D; Korycinski, Mateusz; Lewis, Richard J; Lorimer, Donald; Lupas, Andrei N; Newman, Janet; Peat, Thomas S; Piepenbrink, Kurt H; Prahlad, Janani; van Raaij, Mark J; Rohwer, Forest; Segall, Anca M; Seguritan, Victor; Sundberg, Eric J; Singh, Abhimanyu K; Wilson, Mark A; Schwede, Torsten
2016-09-01
The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34-50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
How Structure Defines Affinity in Protein-Protein Interactions
Erijman, Ariel; Rosenthal, Eran; Shifman, Julia M.
2014-01-01
Protein-protein interactions (PPI) in nature are conveyed by a multitude of binding modes involving various surfaces, secondary structure elements and intermolecular interactions. This diversity results in PPI binding affinities that span more than nine orders of magnitude. Several early studies attempted to correlate PPI binding affinities to various structure-derived features with limited success. The growing number of high-resolution structures, the appearance of more precise methods for measuring binding affinities and the development of new computational algorithms enable more thorough investigations in this direction. Here, we use a large dataset of PPI structures with the documented binding affinities to calculate a number of structure-based features that could potentially define binding energetics. We explore how well each calculated biophysical feature alone correlates with binding affinity and determine the features that could be used to distinguish between high-, medium- and low- affinity PPIs. Furthermore, we test how various combinations of features could be applied to predict binding affinity and observe a slow improvement in correlation as more features are incorporated into the equation. In addition, we observe a considerable improvement in predictions if we exclude from our analysis low-resolution and NMR structures, revealing the importance of capturing exact intermolecular interactions in our calculations. Our analysis should facilitate prediction of new interactions on the genome scale, better characterization of signaling networks and design of novel binding partners for various target proteins. PMID:25329579
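For scale, the nine orders of magnitude in binding affinity mentioned above can be converted into a binding free-energy range using the standard relation between the dissociation constant and the free energy of binding. The temperature T = 298 K is an assumption chosen for illustration, with R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹, and the labels "weak" and "tight" are introduced here only to name the two ends of the range.

```latex
\[
\Delta G^{\circ} = RT \ln K_d
\quad\Longrightarrow\quad
\Delta\Delta G^{\circ}
  = RT \ln \frac{K_{d,\mathrm{weak}}}{K_{d,\mathrm{tight}}}
  = RT \ln 10^{9}
  = 9\,RT \ln 10
  \approx 9 \times 1.36\ \mathrm{kcal\,mol^{-1}}
  \approx 12.3\ \mathrm{kcal\,mol^{-1}}
\]
```

Any structure-derived feature set therefore has to account for differences of roughly 12 kcal/mol or more between the weakest and tightest complexes.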
Computational prediction of kink properties of helices in membrane proteins
NASA Astrophysics Data System (ADS)
Mai, T.-L.; Chen, C.-M.
2014-02-01
We have combined molecular dynamics simulations and fold identification procedures to investigate the structure of 696 kinked and 120 unkinked transmembrane (TM) helices in the PDBTM database. The main aim of this study is to understand the formation of helical kinks by simulating their quasi-equilibrium heating processes, which might be relevant to the prediction of their structural features. The simulated structural features of these TM helices, including the position and the angle of helical kinks, were analyzed and compared with statistical data from PDBTM. From quasi-equilibrium heating processes of TM helices with four very different relaxation time constants, we found that these processes gave comparable predictions of the structural features of TM helices. Overall, 95 % of our best kink position predictions have an error of no more than two residues and 75 % of our best angle predictions have an error of less than 15°. Various structure assessments have been carried out to assess our predicted models of TM helices in PDBTM. Our results show that, of the 696 predicted kinked helices, 70 % have an RMSD less than 2 Å, 71 % have a TM-score greater than 0.5, 69 % have a MaxSub score greater than 0.8, 60 % have a GDT-TS score greater than 85, and 58 % have a GDT-HA score greater than 70. For unkinked helices, our predicted models are also highly consistent with their crystal structures. These results provide strong support for our assumption that kink formation of TM helices in quasi-equilibrium heating processes is relevant to predicting the structure of TM helices.
Improve the prediction of RNA-binding residues using structural neighbours.
Li, Quan; Cao, Zanxia; Liu, Haiyan
2010-03-01
The interactions between RNA-binding proteins (RBPs) and RNA play key roles in managing some of the cell's basic functions. The identification and prediction of RNA binding sites is important for understanding the RNA-binding mechanism. Computational approaches are being developed to predict RNA-binding residues based on sequence- or structure-derived features. To achieve higher prediction accuracy, improvements on current prediction methods are necessary. We identified that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary (PSSM) and other structural information (secondary structure and solvent accessibility) significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross validation, our best model achieves an overall correct rate of 87.8% and an MCC of 0.47, with a specificity of 93.4%, and correctly predicts 52.4% of the RNA-binding residues for a dataset containing 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to a clear improvement. A web server was developed for predicting RNA binding residues in a protein sequence (or structure), which is available at http://mcgill.3322.org/RNA/.
Jalem, Randy; Nakayama, Masanobu; Noda, Yusuke; Le, Tam; Takeuchi, Ichiro; Tateyama, Yoshitaka; Yamazaki, Hisatsugu
2018-01-01
Abstract Increasing attention has been paid to materials informatics approaches that promise efficient and fast discovery and optimization of functional inorganic materials. Technical breakthroughs are urgently needed to advance this field, and efforts have been made in the development of materials descriptors to encode or represent characteristics of crystalline solids, such as chemical composition, crystal structure, electronic structure, etc. We propose a general representation scheme for crystalline solids that lifts restrictions on atom ordering, cell periodicity, and system cell size based on structural descriptors of directly binned Voronoi-tessellation real feature values and atomic/chemical descriptors based on the electronegativity of elements in the crystal. A comparison was made against the radial distribution function (RDF) feature vector, in terms of predictive accuracy on density functional theory (DFT) material properties: cohesive energy (CE), density (d), electronic band gap (BG), and decomposition energy (Ed). It was confirmed that the proposed feature vector from Voronoi real value binning generally outperforms the RDF-based one for the prediction of the aforementioned properties. Together with electronegativity-based features, Voronoi-tessellation features from a given crystal structure that are derived from second-nearest neighbor information contribute significantly towards prediction. PMID:29707064
Jalem, Randy; Nakayama, Masanobu; Noda, Yusuke; Le, Tam; Takeuchi, Ichiro; Tateyama, Yoshitaka; Yamazaki, Hisatsugu
2018-01-01
Increasing attention has been paid to materials informatics approaches that promise efficient and fast discovery and optimization of functional inorganic materials. Technical breakthroughs are urgently needed to advance this field, and efforts have been made in the development of materials descriptors to encode or represent characteristics of crystalline solids, such as chemical composition, crystal structure, electronic structure, etc. We propose a general representation scheme for crystalline solids that lifts restrictions on atom ordering, cell periodicity, and system cell size based on structural descriptors of directly binned Voronoi-tessellation real feature values and atomic/chemical descriptors based on the electronegativity of elements in the crystal. A comparison was made against the radial distribution function (RDF) feature vector, in terms of predictive accuracy on density functional theory (DFT) material properties: cohesive energy (CE), density (d), electronic band gap (BG), and decomposition energy (Ed). It was confirmed that the proposed feature vector from Voronoi real value binning generally outperforms the RDF-based one for the prediction of the aforementioned properties. Together with electronegativity-based features, Voronoi-tessellation features from a given crystal structure that are derived from second-nearest neighbor information contribute significantly towards prediction.
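The idea of binning real-valued per-atom features into a fixed-length vector can be illustrated with a minimal sketch. It shows only why a normalised histogram does not depend on atom ordering or on how many times the cell is repeated; it does not compute a Voronoi tessellation, and the function name `binned_descriptor`, the bin range, and the sample values are hypothetical rather than taken from the paper.

```python
def binned_descriptor(values, lo, hi, n_bins):
    """Normalised histogram of per-atom feature values.

    The result has a fixed length (n_bins) regardless of how many atoms
    contribute, and is unchanged when the atoms are reordered.
    """
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values to edge bins
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Hypothetical per-atom values (for example, Voronoi cell volumes in cubic angstroms).
unit_cell = [10.2, 11.8, 14.1, 10.9]
# The same crystal described by a doubled cell with a different atom order.
supercell = list(reversed(unit_cell * 2))

d_unit = binned_descriptor(unit_cell, 10.0, 15.0, 5)
d_super = binned_descriptor(supercell, 10.0, 15.0, 5)
print(d_unit)
print(d_unit == d_super)
```

Because both descriptions of the crystal map to the same vector, a downstream regression model sees one consistent input per material.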
Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features.
Li, Hongyang; Panwar, Bharat; Omenn, Gilbert S; Guan, Yuanfang
2018-02-01
The olfactory stimulus-percept problem has been studied for more than a century, yet it is still hard to precisely predict the odor given the large-scale chemoinformatic features of an odorant molecule. A major challenge is that the perceived qualities vary greatly among individuals due to different genetic and cultural backgrounds. Moreover, the combinatorial interactions between multiple odorant receptors and diverse molecules significantly complicate the olfaction prediction. Many attempts have been made to establish structure-odor relationships for intensity and pleasantness, but no models are available to predict the personalized multi-odor attributes of molecules. In this study, we describe our winning algorithm for predicting individual and population perceptual responses to various odorants in the DREAM Olfaction Prediction Challenge. We find that random forest model consisting of multiple decision trees is well suited to this prediction problem, given the large feature spaces and high variability of perceptual ratings among individuals. Integrating both population and individual perceptions into our model effectively reduces the influence of noise and outliers. By analyzing the importance of each chemical feature, we find that a small set of low- and nondegenerative features is sufficient for accurate prediction. Our random forest model successfully predicts personalized odor attributes of structurally diverse molecules. This model together with the top discriminative features has the potential to extend our understanding of olfactory perception mechanisms and provide an alternative for rational odorant design.
Sturm, Marc; Quinten, Sascha; Huber, Christian G.; Kohlbacher, Oliver
2007-01-01
We propose a new model for predicting the retention time of oligonucleotides. The model is based on ν support vector regression using features derived from base sequence and predicted secondary structure of oligonucleotides. Because of the secondary structure information, the model is applicable even at relatively low temperatures where the secondary structure is not suppressed by thermal denaturing. This makes the prediction of oligonucleotide retention time for arbitrary temperatures possible, provided that the target temperature lies within the temperature range of the training data. We describe different possibilities of feature calculation from base sequence and secondary structure, present the results and compare our model to existing models. PMID:17567619
On the structural context and identification of enzyme catalytic residues.
Chien, Yu-Tung; Huang, Shao-Wei
2013-01-01
Enzymes play important roles in most of the biological processes. Although only a small fraction of residues are directly involved in catalytic reactions, these catalytic residues are the most crucial parts in enzymes. The study of the fundamental and unique features of catalytic residues benefits the understanding of enzyme functions and catalytic mechanisms. In this work, we analyze the structural context of catalytic residues based on theoretical and experimental structure flexibility. The results show that catalytic residues have distinct structural features and context. Their neighboring residues, whether sequence or structure neighbors within specific range, are usually structurally more rigid than those of noncatalytic residues. The structural context feature is combined with support vector machine to identify catalytic residues from enzyme structure. The prediction results are better or comparable to those of recent structure-based prediction methods.
PredictProtein—an open resource for online prediction of protein structural and functional features
Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard
2014-01-01
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431
2014-01-01
Background It is important to predict the quality of a protein structural model before its native structure is known. Methods that can predict the absolute local quality of individual residues in a single protein model are rare, yet particularly needed for using, ranking and refining protein models. Results We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained an SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5 Å. The released SMOQ tool uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implements a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637 Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:24776231
Cao, Renzhi; Wang, Zheng; Wang, Yiheng; Cheng, Jianlin
2014-04-28
It is important to predict the quality of a protein structural model before its native structure is known. Methods that can predict the absolute local quality of individual residues in a single protein model are rare, yet particularly needed for using, ranking and refining protein models. We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained an SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5 Å. The released SMOQ tool uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implements a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637 Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
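The two quantities discussed above, the average difference between predicted and actual per-residue deviations and a global score derived from local scores, can be sketched as follows. The abstract does not give the local-to-global conversion, so the formula in `global_score` (a mean of 1 / (1 + (d / d0)²), a form often used to map distance deviations onto a 0 to 1 scale) is an assumption, as are the cutoff `d0`, the function names, and the toy numbers.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual deviations."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def global_score(deviations, d0=5.0):
    """Hypothetical local-to-global conversion (NOT confirmed as SMOQ's own).

    Each per-residue deviation d (in angstroms) is mapped to 1 / (1 + (d/d0)^2),
    so 0 A gives 1.0 and large deviations approach 0; the mean is returned.
    """
    if not deviations:
        raise ValueError("need at least one residue")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in deviations) / len(deviations)


# Hypothetical per-residue deviations in angstroms for a four-residue toy model.
predicted = [1.0, 2.5, 6.0, 10.0]
actual = [1.5, 2.0, 9.0, 7.0]

mae = mean_absolute_error(predicted, actual)
print(mae)
print(round(global_score(predicted), 3))
```

The reported 2.637 Å corresponds to the first quantity averaged over all residues of the CASP9 test models; the second function only illustrates how such local predictions could be condensed into one number per model.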
Critical Features of Fragment Libraries for Protein Structure Prediction
dos Santos, Karina Baptista
2017-01-01
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, the effect of different fragment sizes, and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can serve as critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928
Critical Features of Fragment Libraries for Protein Structure Prediction.
Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel
2017-01-01
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, the effect of different fragment sizes, and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can serve as critical guidelines for the use of fragment libraries in protein structure prediction.
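The idea of fragment assembly driven by an exact structure-based objective under a greedy algorithm can be illustrated with a toy Python sketch. Everything here is a simplifying assumption rather than the authors' implementation: the structure is a flat list of numbers standing in for backbone angles, the objective is a sum of squared differences to the native values, and a fragment library is a list of hypothetical (start, values) pairs.

```python
def structure_error(model, native):
    """Toy exact objective: sum of squared differences to the native structure."""
    return sum((m - n) ** 2 for m, n in zip(model, native))


def greedy_assemble(native, library, max_steps=1000):
    """Repeatedly insert the single fragment that lowers the error the most.

    `library` is a list of (start, values) pairs; a fragment overwrites
    model[start:start + len(values)]. Stops when no insertion improves the error.
    """
    model = [0.0] * len(native)
    best_error = structure_error(model, native)
    for _ in range(max_steps):
        best_trial = None
        for start, values in library:
            if start + len(values) > len(native):
                continue  # fragment would run past the end of the chain
            trial = model[:start] + list(values) + model[start + len(values):]
            error = structure_error(trial, native)
            if error < best_error:
                best_error, best_trial = error, trial
        if best_trial is None:
            break  # greedy search has converged
        model = best_trial
    return model, best_error


if __name__ == "__main__":
    native = [1.0, 2.0, 3.0, 4.0]
    library = [(0, [1.0, 2.0]), (2, [3.0, 4.0]), (1, [9.0, 9.0])]
    print(greedy_assemble(native, library))
```

Because the objective uses the native structure directly, any remaining error reflects the fragment library rather than the scoring function or the search strategy, which mirrors the isolation described in the abstract.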
2010-01-01
Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines.
Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480
On the importance of cotranscriptional RNA structure formation
Lai, Daniel; Proctor, Jeff R.; Meyer, Irmtraud M.
2013-01-01
The expression of genes, both coding and noncoding, can be significantly influenced by RNA structural features of their corresponding transcripts. There is by now mounting experimental and some theoretical evidence that structure formation in vivo starts during transcription and that this cotranscriptional folding determines the functional RNA structural features that are being formed. Several decades of research in bioinformatics have resulted in a wide range of computational methods for predicting RNA secondary structures. Almost all state-of-the-art methods in terms of prediction accuracy, however, completely ignore the process of structure formation and focus exclusively on the final RNA structure. This review hopes to bridge this gap. We summarize the existing evidence for cotranscriptional folding and then review the different, currently used strategies for RNA secondary-structure prediction. Finally, we propose a range of ideas on how state-of-the-art methods could be potentially improved by explicitly capturing the process of cotranscriptional structure formation. PMID:24131802
Insights into multimodal imaging classification of ADHD
Colby, John B.; Rudie, Jeffrey D.; Brown, Jesse A.; Douglas, Pamela K.; Cohen, Mark S.; Shehzad, Zarrar
2012-01-01
Attention deficit hyperactivity disorder (ADHD) currently is diagnosed in children by clinicians via subjective ADHD-specific behavioral instruments and by reports from the parents and teachers. Considering its high prevalence and large economic and societal costs, a quantitative tool that aids in diagnosis by characterizing underlying neurobiology would be extremely valuable. This provided motivation for the ADHD-200 machine learning (ML) competition, a multisite collaborative effort to investigate imaging classifiers for ADHD. Here we present our ML approach, which used structural and functional magnetic resonance imaging data, combined with demographic information, to distinguish individuals with ADHD from typically developing (TD) children across eight different research sites. Structural features included quantitative metrics from 113 cortical and non-cortical regions. Functional features included Pearson correlation functional connectivity matrices, nodal and global graph theoretical measures, nodal power spectra, voxelwise global connectivity, and voxelwise regional homogeneity. We performed feature ranking for each site and modality using the multiple support vector machine recursive feature elimination (SVM-RFE) algorithm, and feature subset selection by optimizing the expected generalization performance of a radial basis function kernel SVM (RBF-SVM) trained across a range of the top features. Site-specific RBF-SVMs using these optimal feature sets from each imaging modality were used to predict the class labels of an independent hold-out test set. A voting approach was used to combine these multiple predictions and assign final class labels. With this methodology we were able to predict diagnosis of ADHD with 55% accuracy (versus a 39% chance level in this sample), 33% sensitivity, and 80% specificity. 
This approach also allowed us to evaluate predictive structural and functional features giving insight into abnormal brain circuitry in ADHD. PMID:22912605
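The voting step and the reported metrics (accuracy, sensitivity, specificity) can be sketched in Python as below. This is a simplified illustration, not the authors' pipeline: it assumes a plain majority vote over per-modality predictions and a two-class ADHD versus TD labeling, and the function names and labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine labels predicted by several classifiers for one subject."""
    return Counter(predictions).most_common(1)[0][0]


def binary_metrics(true_labels, predicted_labels, positive="ADHD"):
    """Accuracy, sensitivity and specificity for a two-class problem."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # fraction of true ADHD cases detected
        "specificity": tn / (tn + fp),   # fraction of TD children correctly cleared
    }


if __name__ == "__main__":
    # Hypothetical votes from three modality-specific classifiers per subject.
    votes = [["ADHD", "ADHD", "TD"], ["TD", "TD", "ADHD"], ["TD", "TD", "TD"]]
    final = [majority_vote(v) for v in votes]
    print(final)
    print(binary_metrics(["ADHD", "ADHD", "TD"], final))
```

Low sensitivity combined with high specificity, as in the figures reported above, is the pattern this breakdown makes visible: a classifier can be correct on most TD children while still missing many ADHD cases.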
Knowledge-based fragment binding prediction.
Tang, Grace W; Altman, Russ B
2014-04-01
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening.
Knowledge-based Fragment Binding Prediction
Tang, Grace W.; Altman, Russ B.
2014-01-01
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971
Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J. Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V.; Craig, Timothy K.; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C.; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten
2014-01-01
For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, over 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this paper, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict trans-membrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin IL-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fibre protein gp17 from bacteriophage T7; the Bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. PMID:24318984
Fan, Ming; Zheng, Bin; Li, Lihua
2015-10-01
Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although much effort has been made, predicting protein structural class solely from protein sequences remains a challenging problem. Feature extraction and classification are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. For protein feature extraction, we proposed a scheme that calculates the word frequency and word position from sequences of amino acids, reduced amino acids, and secondary structure. For accurate classification of the structural class of a protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of a Multi-Agent system into the Ada-Boost algorithm. Extensive experiments were conducted to test and compare the proposed method using four low-homology benchmark datasets. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better than those of the existing methods. The source code and dataset are available on request.
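A minimal Python sketch of word-frequency and word-position features computed from a sequence is shown below. The abstract does not define these features precisely, so the choices here are assumptions: a "word" is an overlapping k-mer, its frequency is its share of all words, and its position is the mean start index divided by the number of words. The function name is hypothetical.

```python
from collections import defaultdict


def word_features(sequence, k=2):
    """Return {word: (frequency, mean_normalized_position)} for overlapping k-mers."""
    n_words = len(sequence) - k + 1
    if n_words < 1:
        raise ValueError("sequence is shorter than the word length")
    starts = defaultdict(list)
    for i in range(n_words):
        starts[sequence[i:i + k]].append(i)
    return {
        word: (len(idx) / n_words, sum(idx) / len(idx) / n_words)
        for word, idx in starts.items()
    }


if __name__ == "__main__":
    # Works the same way on amino acid, reduced-alphabet, or secondary-structure strings.
    print(word_features("HHEEHH", k=2))
```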
Complete fold annotation of the human proteome using a novel structural feature space.
Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong
2017-04-13
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
Complete fold annotation of the human proteome using a novel structural feature space
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
2017-01-01
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families. PMID:28406174
Automated prediction of protein function and detection of functional sites from structure.
Pazos, Florencio; Sternberg, Michael J E
2004-10-12
Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated to functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and hence is applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation in 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.
Critical Song Features for Auditory Pattern Recognition in Crickets
Meckenhäuser, Gundula; Hennig, R. Matthias; Nawrot, Martin P.
2013-01-01
Many different invertebrate and vertebrate species use acoustic communication for pair formation. In the cricket Gryllus bimaculatus, females recognize their species-specific calling song and localize singing males by positive phonotaxis. The song pattern of males has a clear structure consisting of brief and regular pulses that are grouped into repetitive chirps. Information is thus present on a short and a long time scale. Here, we ask which structural features of the song critically determine the phonotactic performance. To this end we employed artificial neural networks to analyze a large body of behavioral data that measured females’ phonotactic behavior under systematic variation of artificially generated song patterns. In a first step we used four non-redundant descriptive temporal features to predict the female response. The model prediction showed a high correlation with the experimental results. We used this behavioral model to explore the integration of the two different time scales. Our result suggested that only an attractive pulse structure in combination with an attractive chirp structure reliably induced phonotactic behavior to signals. In a further step we investigated all feature sets, each one consisting of a different combination of eight proposed temporal features. We identified feature sets of size two, three, and four that achieve highest prediction power by using the pulse period from the short time scale plus additional information from the long time scale. PMID:23437054
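The search over feature sets built from eight candidate features can be sketched in Python as below. The feature names are placeholders, since the abstract names only the pulse period; the point is the size of the search space, not the actual features.

```python
from itertools import combinations

# Placeholder names; the abstract does not list all eight temporal features.
FEATURES = [f"feature_{i}" for i in range(1, 9)]


def feature_sets(features, sizes):
    """Enumerate every subset of `features` whose size is in `sizes`."""
    return [subset for k in sizes for subset in combinations(features, k)]


if __name__ == "__main__":
    for size in (2, 3, 4):
        print(size, len(feature_sets(FEATURES, [size])))
    print("all non-empty sets:", len(feature_sets(FEATURES, range(1, 9))))
```

With eight features there are 28, 56, and 70 sets of size two, three, and four, and 255 non-empty sets in total, so exhaustive evaluation of every combination is feasible at this scale.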
Mining the key predictors for event outbreaks in social networks
NASA Astrophysics Data System (ADS)
Yi, Chengqi; Bao, Yuanyuan; Xue, Yibo
2016-04-01
It would be beneficial to devise a method to predict a so-called event outbreak. Existing works mainly focus on exploring effective methods for improving the accuracy of predictions, while ignoring the underlying causes: What makes an event go viral? What factors significantly influence the prediction of an event outbreak in social networks? In this paper, we proposed a novel definition for an event outbreak, taking into account the structural changes to a network during the propagation of content. In addition, we investigated features that were sensitive to predicting an event outbreak. In order to investigate the universality of these features at different stages of an event, we split the entire lifecycle of an event into 20 equal segments according to the proportion of the propagation time. We extracted 44 features, including features related to content, users, structure, and time, from each segment of the event. Based on these features, we proposed a prediction method using supervised classification algorithms to predict event outbreaks. Experimental results indicate that, as time goes by, our method is highly accurate, with a precision rate ranging from 79% to 97% and a recall rate ranging from 74% to 97%. In addition, after applying a feature-selection algorithm, the top five selected features can considerably improve the accuracy of the prediction. Data-driven experimental results show that the entropy of the eigenvector centrality, the entropy of the PageRank, the standard deviation of the betweenness centrality, the proportion of re-shares without content, and the average path length are the key predictors for an event outbreak. Our findings are especially useful for further exploring the intrinsic characteristics of outbreak prediction.
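Two of the steps described above, splitting an event's lifecycle into 20 equal time segments and taking the entropy of a centrality measure, can be sketched in Python. The abstract does not define the entropy, so the version here is an assumption: per-node centrality values are normalized to sum to 1 and the Shannon entropy (natural logarithm) of that distribution is returned. Function names are hypothetical.

```python
import math


def segment_index(t, start, end, n_segments=20):
    """Index (0-based) of the equal-length time segment that contains time t."""
    if end <= start or not start <= t <= end:
        raise ValueError("need start < end and start <= t <= end")
    # The final instant t == end is assigned to the last segment.
    return min(int((t - start) / (end - start) * n_segments), n_segments - 1)


def distribution_entropy(values):
    """Shannon entropy of non-negative scores after normalizing them to sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


if __name__ == "__main__":
    print(segment_index(50.0, 0.0, 100.0))            # re-share halfway through the event
    print(distribution_entropy([0.4, 0.3, 0.2, 0.1])) # hypothetical centrality scores
```

Under this definition, entropy is highest when centrality is spread evenly over nodes and drops toward zero when a few nodes dominate.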
Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin
2007-12-01
Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
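What a disulfide connectivity pattern is, and how many patterns exist, can be shown with a short Python sketch. A pattern is a complete pairing of the bonded cysteines, so B bridges give (2B-1)!! candidates: 3, 15, 105, and 945 for two to five bridges. The selection step below, which picks the pattern with the highest summed pair score, is an assumption for illustration; the abstract says only that SVR is used, and the scores in the example are invented.

```python
def connectivity_patterns(cysteines):
    """Yield every way to pair up an even-length list of cysteine positions."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in connectivity_patterns(remaining):
            yield [(first, partner)] + pattern


def best_pattern(cysteines, pair_score):
    """Pick the pattern whose bonded pairs have the highest total score.

    `pair_score` maps (i, j) with i < j to a hypothetical predicted bonding score.
    """
    return max(
        connectivity_patterns(cysteines),
        key=lambda pattern: sum(pair_score[pair] for pair in pattern),
    )


if __name__ == "__main__":
    scores = {(1, 2): 0.1, (3, 4): 0.2, (1, 3): 0.9, (2, 4): 0.8, (1, 4): 0.3, (2, 3): 0.4}
    print(best_pattern([1, 2, 3, 4], scores))
```

Exhaustive enumeration is practical for the two to five bridges considered in the abstract, since the largest case has only 945 patterns.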
Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V; Craig, Timothy K; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C; van Raaij, Mark J; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten
2014-02-01
For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. Copyright © 2013 The Authors. Wiley Periodicals, Inc.
DemQSAR: predicting human volume of distribution and clearance of drugs
NASA Astrophysics Data System (ADS)
Demir-Kavuk, Ozgur; Bentzien, Jörg; Muegge, Ingo; Knapp, Ernst-Walter
2011-12-01
In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure-activity/-property relationships (QSAR/QSPR) correlate structure and physico-chemical properties of molecular compounds with a specific functional activity/property under study. Typically a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e. the method adjusts to very specific features of the training data, which are not characteristic of the considered property. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach, which combines feature generation, feature selection, model building and control of overtraining into a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VDss) and human clearance (CL). To control overtraining, quadratic and linear regularization terms were employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power. 
A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VDss and CL is available on the webpage of DemPRED: http://agknapp.chemie.fu-berlin.de/dempred/.
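The role of quadratic and linear regularization terms in controlling overtraining can be illustrated with a small Python sketch. The objective below, squared error plus an L1 (linear) and an L2 (quadratic) penalty on the weights of a linear model, is an assumed form; the abstract does not give DemQSAR's actual objective, and the function name and the parameters `l1` and `l2` are hypothetical.

```python
def penalized_loss(weights, rows, targets, l1=0.1, l2=0.1):
    """Squared error of a linear model plus linear (L1) and quadratic (L2) penalties."""
    squared_error = sum(
        (sum(w * x for w, x in zip(weights, row)) - y) ** 2
        for row, y in zip(rows, targets)
    )
    linear_penalty = l1 * sum(abs(w) for w in weights)      # pushes weights to exactly 0
    quadratic_penalty = l2 * sum(w * w for w in weights)    # shrinks large weights
    return squared_error + linear_penalty + quadratic_penalty


if __name__ == "__main__":
    rows = [[1.0, 1.0]]          # one compound with two hypothetical descriptors
    targets = [0.0]              # hypothetical property value
    print(penalized_loss([1.0, -2.0], rows, targets))
```

Penalties of this kind make a model with many descriptors pay for every non-zero weight, which complements feature selection when there are more descriptors than training compounds.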
DemQSAR: predicting human volume of distribution and clearance of drugs.
Demir-Kavuk, Ozgur; Bentzien, Jörg; Muegge, Ingo; Knapp, Ernst-Walter
2011-12-01
In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure-activity/-property relationships (QSAR/QSPR) correlate structure and physico-chemical properties of molecular compounds with a specific functional activity/property under study. Typically a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e. the method adjusts to very specific features of the training data, which are not characteristic of the considered property. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach, which combines feature generation, feature selection, model building and control of overtraining into a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VD(ss)) and human clearance (CL). To control overtraining, quadratic and linear regularization terms were employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power. 
A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VD(ss) and CL is available on the webpage of DemPRED: http://agknapp.chemie.fu-berlin.de/dempred/.
Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong
2009-07-01
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One of the keys to a computational method is accurate representation of protein samples. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combines continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant information on sequence order in the frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality was improved, and the current approach to protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure.
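The AAC vector that the pseudo-amino acid composition builds on is the plain 20-dimensional amino acid composition, which can be computed in a few lines of Python. This sketch covers only that base vector; the CWT and PCA components described above are not reproduced, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid, in AMINO_ACIDS order."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols such as X are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in counts]


if __name__ == "__main__":
    composition = amino_acid_composition("MKVLAAGLLK")  # hypothetical short peptide
    print(dict(zip(AMINO_ACIDS, composition)))
```

Composition alone discards residue order, which is the gap that sequence-order features such as the wavelet power spectrum are meant to fill.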
Predictive information processing in music cognition. A critical review.
Rohrmeier, Martin A; Koelsch, Stefan
2012-02-01
Expectation and prediction constitute central mechanisms in the perception and cognition of music, which have been explored in theoretical and empirical accounts. We review the scope and limits of theoretical accounts of musical prediction with respect to feature-based and temporal prediction. While the concept of prediction is unproblematic for basic single-stream features such as melody, it is not straightforward for polyphonic structures or higher-order features such as formal predictions. Behavioural results based on explicit and implicit (priming) paradigms provide evidence of priming in various domains that may reflect predictive behaviour. Computational learning models, including symbolic (fragment-based), probabilistic/graphical, or connectionist approaches, provide well-specified predictive models of specific features and feature combinations. While models match some experimental results, full-fledged music prediction cannot yet be modelled. Neuroscientific results regarding the early right-anterior negativity (ERAN) and mismatch negativity (MMN) reflect expectancy violations on different levels of processing complexity, and provide some neural evidence for different predictive mechanisms. At present, the combinations of neural and computational modelling methodologies are at early stages and require further research. Copyright © 2012 Elsevier B.V. All rights reserved.
Krol, Jacek; Sobczak, Krzysztof; Wilczynska, Urszula; Drath, Maria; Jasinska, Anna; Kaczynska, Danuta; Krzyzosiak, Wlodzimierz J
2004-10-01
We have established the structures of 10 human microRNA (miRNA) precursors using biochemical methods. Eight of these structures turned out to be different from those that were computer-predicted. The differences were localized to the terminal loop region and the opposite side of the precursor hairpin stem. We have analyzed the features of these structures from the perspectives of miRNA biogenesis and active strand selection. We demonstrated the different thermodynamic stability profiles for pre-miRNA hairpins harboring miRNAs at their 5'- and 3'-sides and discussed their functional implications. Our results showed that miRNA prediction based on predicted precursor structures may give ambiguous results, and the success rate is significantly higher for the experimentally determined structures. On the other hand, the differences between the predicted and experimentally determined structures did not affect the stability of termini produced through "conceptual dicing." This result confirms the value of thermodynamic analysis based on mfold as a predictor of strand selection by the RNA-induced silencing complex (RISC).
Kirschner, Andreas; Frishman, Dmitrij
2008-10-01
Prediction of beta-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics due to their frequent occurrence as well as their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has often been employed as an additional input for machine learning algorithms when predicting beta-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three aspects of protein structure: beta-turns, beta-turn types, and secondary structure. The advantage of our method compared to other predictors is that it does not require any external input except for sequence profiles because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset our method exhibits a total prediction accuracy of 77.9% and a Matthews correlation coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually dependent target classes need to be predicted. http://webclu.bio.wzw.tum.de/predator-web/.
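Several abstracts in this listing report both accuracy and the Matthews correlation coefficient (MCC). The following Python sketch shows how the two are computed from a binary confusion matrix; the function names are illustrative, and returning 0.0 for a zero denominator is a convention chosen here, not something taken from the abstracts.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from true/false positive/negative counts; ranges from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for an imbalanced problem: 90 negatives, 10 positives,
# and a predictor that labels everything negative.
print(accuracy(tp=0, tn=90, fp=0, fn=10))
print(matthews_corrcoef(tp=0, tn=90, fp=0, fn=10))
```

With these toy counts the accuracy is high while the MCC is zero, which is why MCC is commonly reported alongside accuracy when the classes are imbalanced, as turn and non-turn residues typically are.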
Suresh, V; Parthasarathy, S
2014-01-01
We developed a support vector machine based web server called SVM-PB-Pred to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include (i) sequence profiles (PSSM) and (ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four resulting SVM models were tested using three benchmarking tests, namely (i) a self-consistency test, (ii) a sevenfold cross-validation test and (iii) an independent case test. The maximum prediction accuracy of ~70% was observed in the self-consistency test for the SVM models of both the LI1264 and SP1577 datasets, where the PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physicochemical properties of proteins, and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source code and dataset, together with a web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
Gao, Yu-Fei; Li, Bi-Qing; Cai, Yu-Dong; Feng, Kai-Yan; Li, Zhan-Dong; Jiang, Yang
2013-01-27
Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, the prediction accuracy remains relatively low with high false positives. In this work, we developed a novel predictor based on the Random Forest algorithm (RF) aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict active sites of enzymes and achieved an overall accuracy of 0.885687 and MCC of 0.689226 on an independent test dataset. Feature analysis showed that every category of the features except disorder contributed to the identification of active sites. It was also shown via the site-specific feature analysis that the features derived from the active site itself contributed most to the active site determination. Our prediction method may become a useful tool for identifying the active sites and the key features identified by the paper may provide valuable insights into the mechanism of catalysis.
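Incremental feature selection (IFS), as named in the abstract above, is usually described as evaluating nested subsets of a ranked feature list and keeping the best one. The Python sketch below follows that general description. The ranking is assumed to come from an earlier step such as mRMR (not implemented here), and the `toy_evaluate` scorer and feature names are hypothetical stand-ins for a cross-validated classifier score.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Score the top-1, top-2, ... feature subsets and return the best one.

    ranked_features: feature names ordered from most to least relevant.
    evaluate: callable mapping a list of features to a score (higher is better).
    """
    best_score = float("-inf")
    best_subset = []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, list(subset)
    return best_subset, best_score


def toy_evaluate(subset):
    """Hypothetical scorer: rewards informative features, penalizes subset size."""
    informative = {"conservation", "accessibility"}
    hits = sum(1 for feature in subset if feature in informative)
    return hits - 0.1 * len(subset)


ranked = ["conservation", "accessibility", "disorder", "noise"]
subset, score = incremental_feature_selection(ranked, toy_evaluate)
print(subset, round(score, 2))
```

In practice `evaluate` would train and validate a model, such as a random forest, on each subset, so the cost grows with the number of ranked features.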
Antibody-protein interactions: benchmark datasets and prediction tools evaluation
Ponomarenko, Julia V; Bourne, Philip E
2007-01-01
Background: The ability to predict antibody binding sites (also known as antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Computational methods that use these experimental data exist for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results: Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web servers developed for antibody and protein binding site prediction were evaluated. No method exceeded 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for the ConSurf, DiscoTope, and PPI-PRED methods and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion: It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It is an open question as to whether ultimately discriminatory features can be found. PMID:17910770
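The benchmark above reports precision, recall, and the area under the ROC curve (AUC). A minimal Python sketch of these metrics follows. The AUC is computed through its pairwise-ranking interpretation, which is one standard way to obtain it; the per-residue scores and labels are hypothetical.

```python
def precision_recall(predicted, actual):
    """Precision and recall for binary labels (truthy means epitope residue)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(scores, actual):
    """Probability that a random positive scores above a random negative.

    Ties count as one half. This equals the area under the ROC curve.
    """
    pos = [s for s, a in zip(scores, actual) if a]
    neg = [s for s, a in zip(scores, actual) if not a]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0)) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue scores and true epitope labels.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(precision_recall([s >= 0.5 for s in scores], labels))
print(roc_auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking, which gives a reference point for the values of about 0.6 to 0.7 quoted in the abstract.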
regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution.
Zhang, Xinjun; Li, Meng; Lin, Hai; Rao, Xi; Feng, Weixing; Yang, Yuedong; Mort, Matthew; Cooper, David N; Wang, Yue; Wang, Yadong; Wells, Clark; Zhou, Yaoqi; Liu, Yunlong
2017-09-01
While synonymous single-nucleotide variants (sSNVs) have largely been unstudied, since they do not alter protein sequence, mounting evidence suggests that they may affect RNA conformation, splicing, and the stability of nascent-mRNAs to promote various diseases. Accurately prioritizing deleterious sSNVs from a pool of neutral ones can significantly improve our ability to select functional genetic variants identified from various genome-sequencing projects and, therefore, advance our understanding of disease etiology. In this study, we develop a computational algorithm to prioritize sSNVs based on their impact on mRNA splicing and protein function. In addition to genomic features that potentially affect splicing regulation, our proposed algorithm also includes dozens of structural features that characterize the functions of alternatively spliced exons on protein function. Our systematic evaluation of thousands of sSNVs suggests that several structural features, including intrinsic disorder protein scores, solvent accessible surface areas, protein secondary structures, and known and predicted protein family domains, show significant differences between disease-causing and neutral sSNVs. Our result suggests that the protein structure features offer an added dimension of information while distinguishing disease-causing and neutral synonymous variants. The inclusion of structural features increases the predictive accuracy for functional sSNV prioritization.
Complete fold annotation of the human proteome using a novel structural feature space
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
2017-04-13
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding, and recognizing novel folds is difficult, such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
Prediction of interface residue based on the features of residue interaction network.
Jiao, Xiong; Ranganathan, Shoba
2017-11-07
Protein-protein interactions play a crucial role in cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize interface residues based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction, and 16 network parameters and the B-factor serve as the elements of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features is also analyzed to identify the key features for the prediction. Some biological meaning of the important features is explained. The results of this work can be used in related work on structure-function relationship analysis via a residue interaction network model. Copyright © 2017 Elsevier Ltd. All rights reserved.
PrAS: Prediction of amidation sites using multiple feature extraction.
Wang, Tong; Zheng, Wei; Wuyun, Qiqige; Wu, Zhenfeng; Ruan, Jishou; Hu, Gang; Gao, Jianzhao
2017-02-01
Amidation plays an important role in a variety of pathological processes and serious diseases like neural dysfunction and hypertension. However, identification of protein amidation sites through traditional experimental methods is time consuming and expensive. In this paper, we proposed a novel predictor for Prediction of Amidation Sites (PrAS), which is the first software package for academic users. The method incorporated four representative feature types, which are position-based features, physicochemical and biochemical properties features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection was proposed to optimize features. PrAS achieved AUC of 0.96, accuracy of 92.1%, sensitivity of 81.2%, specificity of 94.9% and MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg. Copyright © 2016 Elsevier Ltd. All rights reserved.
DeLeon, Orlando; Hodis, Hagit; O’Malley, Yunxia; Johnson, Jacklyn; Salimi, Hamid; Zhai, Yinjie; Winter, Elizabeth; Remec, Claire; Eichelberger, Noah; Van Cleave, Brandon; Puliadi, Ramya; Harrington, Robert D.; Stapleton, Jack T.; Haim, Hillel
2017-01-01
The envelope glycoproteins (Envs) of HIV-1 continuously evolve in the host by random mutations and recombination events. The resulting diversity of Env variants circulating in the population and their continuing diversification process limit the efficacy of AIDS vaccines. We examined the historic changes in Env sequence and structural features (measured by integrity of epitopes on the Env trimer) in a geographically defined population in the United States. As expected, many Env features were relatively conserved during the 1980s. From this state, some features diversified whereas others remained conserved across the years. We sought to identify “clues” to predict the observed historic diversification patterns. Comparison of viruses that cocirculate in patients at any given time revealed that each feature of Env (sequence or structural) exists at a defined level of variance. The in-host variance of each feature is highly conserved among individuals but can vary between different HIV-1 clades. We designate this property “volatility” and apply it to model evolution of features as a linear diffusion process that progresses with increasing genetic distance. Volatilities of different features are highly correlated with their divergence in longitudinally monitored patients. Volatilities of features also correlate highly with their population-level diversification. Using volatility indices measured from a small number of patient samples, we accurately predict the population diversity that developed for each feature over the course of 30 years. Amino acid variants that evolved at key antigenic sites are also predicted well. Therefore, small “fluctuations” in feature values measured in isolated patient samples accurately describe their potential for population-level diversification. 
These tools will likely contribute to the design of population-targeted AIDS vaccines by effectively capturing the diversity of currently circulating strains and addressing properties of variants expected to appear in the future. PMID:28384158
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is important for the prediction of protein structural class and that it mainly uses the protein primary sequence, the predicted secondary structure sequence, and the position-specific scoring matrix (PSSM). Currently, prediction based solely on the PSSM plays a key role in improving prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP that fuses the consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on the PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. A 700-dimensional (700D) feature vector is then constructed, and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
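The autocovariance transformation (ACT) mentioned above turns a variable-length PSSM into a fixed-length vector. The Python sketch below uses a commonly cited definition, the mean lagged product of mean-centered column scores. The segmentation and the exact normalization used by CSP-SegPseP-SegACP are not given in the abstract, so this is an assumption, and the two-column toy matrix stands in for a real 20-column PSSM.

```python
def autocovariance(pssm, max_lag):
    """Autocovariance features of a PSSM given as a list of rows (one per residue).

    Returns max_lag * n_columns values, ordered by lag and then by column.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Hypothetical 4-residue, 2-column matrix; column 0 alternates, column 1 is flat.
toy_pssm = [[1, 0], [3, 0], [1, 0], [3, 0]]
act = autocovariance(toy_pssm, max_lag=2)
print(act)
```

The alternating column yields a negative value at lag 1 and a positive value at lag 2, showing how the transformation captures periodicity along the sequence.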
Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning.
Mirzaei, Shokoufeh; Sidi, Tomer; Keasar, Chen; Crivelli, Silvia
2016-08-24
The function of a protein is determined by its structure, which creates a need for efficient methods of protein structure determination to advance scientific and medical research. Because current experimental structure determination methods carry a high price tag, computational predictions are highly desirable. Given a protein sequence, computational methods produce numerous 3D structures known as decoys. However, selecting the best-quality decoys is challenging, as end users can handle only a few. Therefore, scoring functions are central to decoy selection. They combine measurable features into a single-number indicator of decoy quality. Unfortunately, current scoring functions do not consistently select the best decoys. Machine learning techniques offer great potential to improve decoy scoring. This paper presents two machine-learning based scoring functions to predict the quality of protein structures, i.e., the similarity between the predicted structure and the experimental one without knowing the latter. We use different metrics to compare these scoring functions against three state-of-the-art scores. This is a first attempt at comparing different scoring functions using the same non-redundant dataset for training and testing and the same features. The results show that adding informative features may be more significant than the method used.
Predicting Turns in Proteins with a Unified Model
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Motivation: Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results: In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP uses predicted secondary structures and predicted shape strings, both of which have greater accuracy and are based on innovative technologies developed by our group. Then, sequence and structural evolution features, namely profiles of sequence, of secondary structures, and of shape strings, are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of individual turn types. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872
NASA Astrophysics Data System (ADS)
Nagarajan, Mahesh B.; Checefsky, Walter A.; Abidin, Anas Z.; Tsai, Halley; Wang, Xixi; Hobbs, Susan K.; Bauer, Jan S.; Baum, Thomas; Wismüller, Axel
2015-03-01
While the proximal femur is preferred for measuring bone mineral density (BMD) in fracture risk estimation, the introduction of volumetric quantitative computed tomography has revealed stronger associations between BMD and spinal fracture status. In this study, we propose to capture properties of trabecular bone structure in spinal vertebrae with advanced second-order statistical features for purposes of fracture risk assessment. For this purpose, axial multi-detector CT (MDCT) images were acquired from 28 spinal vertebrae specimens using a whole-body 256-row CT scanner with a dedicated calibration phantom. A semi-automated method was used to annotate the trabecular compartment in the central vertebral slice with a circular region of interest (ROI) to exclude cortical bone; pixels within were converted to values indicative of BMD. Six second-order statistical features derived from gray-level co-occurrence matrices (GLCM) and the mean BMD within the ROI were then extracted and used in conjunction with a generalized radial basis functions (GRBF) neural network to predict the failure load of the specimens; true failure load was measured through biomechanical testing. Prediction performance was evaluated with a root-mean-square error (RMSE) metric. The best prediction performance was observed with GLCM feature 'correlation' (RMSE = 1.02 ± 0.18), which significantly outperformed all other GLCM features (p < 0.01). GLCM feature correlation also significantly outperformed MDCT-measured mean BMD (RMSE = 1.11 ± 0.17) (p < 10⁻⁴). These results suggest that biomechanical strength prediction in spinal vertebrae can be significantly improved through characterization of trabecular bone structure with GLCM-derived texture features.
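The GLCM 'correlation' feature and the RMSE metric named above can be sketched in plain Python. The sketch assumes an image already quantized to integer gray levels, a single pixel offset, and a symmetric, normalized co-occurrence matrix. The offsets, quantization, and ROI handling actually used in the study are not stated in the abstract, and the toy images are hypothetical.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def glcm_correlation(p):
    """Correlation between the gray levels of co-occurring pixel pairs."""
    n = range(len(p))
    mu_i = sum(i * p[i][j] for i in n for j in n)
    mu_j = sum(j * p[i][j] for i in n for j in n)
    var_i = sum((i - mu_i) ** 2 * p[i][j] for i in n for j in n)
    var_j = sum((j - mu_j) ** 2 * p[i][j] for i in n for j in n)
    if var_i == 0 or var_j == 0:
        raise ValueError("correlation is undefined for a constant image")
    cov = sum((i - mu_i) * (j - mu_j) * p[i][j] for i in n for j in n)
    return cov / math.sqrt(var_i * var_j)


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured values."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


stripes = [[0, 1], [0, 1]]  # horizontal neighbours always differ
bands = [[0, 0], [1, 1]]    # horizontal neighbours always match
print(glcm_correlation(glcm(stripes, levels=2)))
print(glcm_correlation(glcm(bands, levels=2)))
```

The two toy images give correlations at opposite ends of the range, showing that the feature measures how linearly the gray levels of neighbouring pixels depend on each other.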
In silico systems for the prediction of the ability of chemicals to induce carcinogenicity in rodents have generally relied on knowledge of the structure and physical-chemical features of the compound, as well as the mutagenic and genotoxic features of the compound in various bio...
Zheng, Lu-Lu; Niu, Shen; Hao, Pei; Feng, KaiYan; Cai, Yu-Dong; Li, Yixue
2011-01-01
Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of PCA formation and guide relevant experimental validations. PMID:22174779
Biological and functional relevance of CASP predictions
Liu, Tianyun; Ish‐Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D.
2017-01-01
Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. PMID:28975675
Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.
2009-01-01
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
bpRNA: large-scale automated annotation and analysis of RNA secondary structure.
Danaee, Padideh; Rouches, Mason; Wiley, Michelle; Deng, Dezhong; Huang, Liang; Hendrix, David
2018-05-09
While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20-times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and large-scale analyses such as are useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and for benchmarking structure-prediction algorithms.
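As a rough illustration of what parsing a secondary structure into stems involves, the sketch below reads a dot-bracket string, including extra bracket types that are commonly used to mark pseudoknots, and groups stacked base pairs into stems. It is a simplified assumption-based example, not the bpRNA algorithm; the function names are hypothetical and positions are 0-based.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of base pairs (i, j), 0-based, from a dot-bracket string."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character '{ch}' at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def group_stems(pairs):
    """Group pairs into stems: maximal runs (i, j), (i+1, j-1), (i+2, j-2), ..."""
    pair_set = set(pairs)
    stems = []
    for i, j in pairs:
        if (i - 1, j + 1) in pair_set:
            continue  # this pair is inside a stem that starts further out
        stem = [(i, j)]
        while (stem[-1][0] + 1, stem[-1][1] - 1) in pair_set:
            stem.append((stem[-1][0] + 1, stem[-1][1] - 1))
        stems.append(stem)
    return stems
```

Unpaired positions between stems correspond to the loop regions; classifying them as hairpin, bulge, internal, or multiloop would be the next step and is left out of this sketch.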
Rtools: a web server for various secondary structural analyses on single RNA sequences.
Hamada, Michiaki; Ono, Yukiteru; Kiryu, Hisanori; Sato, Kengo; Kato, Yuki; Fukunaga, Tsukasa; Mori, Ryota; Asai, Kiyoshi
2016-07-08
The secondary structures, as well as the nucleotide sequences, are the important features of RNA molecules to characterize their functions. According to the thermodynamic model, however, the probability of any secondary structure is very small. As a consequence, any tool to predict the secondary structures of RNAs has limited accuracy. On the other hand, there are a few tools that compensate for the imperfect predictions by calculating and visualizing the secondary structural information from RNA sequences. It is desirable to obtain the rich information from those tools through a friendly interface. We implemented a web server of the tools to predict secondary structures and to calculate various structural features based on the energy models of secondary structures. By just giving an RNA sequence to the web server, the user can get the different types of solutions of the secondary structures, the marginal probabilities such as base-pairing probabilities, loop probabilities and accessibilities of the local bases, the energy changes by arbitrary base mutations as well as the measures for validations of the predicted secondary structures. The web server is available at http://rtools.cbrc.jp, which integrates software tools, CentroidFold, CentroidHomfold, IPKnot, CapR, Raccess, Rchange and RintD. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Effects of metric hierarchy and rhyme predictability on word duration in The Cat in the Hat.
Breen, Mara
2018-05-01
Word durations convey many types of linguistic information, including intrinsic lexical features like length and frequency and contextual features like syntactic and semantic structure. The current study was designed to investigate whether hierarchical metric structure and rhyme predictability account for durational variation over and above other features in productions of a rhyming, metrically-regular children's book: The Cat in the Hat (Dr. Seuss, 1957). One-syllable word durations and inter-onset intervals were modeled as functions of segment number, lexical frequency, word class, syntactic structure, repetition, and font emphasis. Consistent with prior work, factors predicting longer word durations and inter-onset intervals included more phonemes, lower frequency, first mention, alignment with a syntactic boundary, and capitalization. A model parameter corresponding to metric grid height improved model fit of word durations and inter-onset intervals. Specifically, speakers realized five levels of metric hierarchy with inter-onset intervals such that interval duration increased linearly with increased height in the metric hierarchy. Conversely, speakers realized only three levels of metric hierarchy with word duration, demonstrating that they shortened the highly predictable rhyme resolutions. These results further understanding of the factors that affect spoken word duration, and demonstrate the myriad cues that children receive about linguistic structure from nursery rhymes. Copyright © 2018 Elsevier B.V. All rights reserved.
Biological and functional relevance of CASP predictions.
Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D; Altman, Russ B
2018-03-01
Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. © 2017 The Authors Proteins: Structure, Function and Bioinformatics Published by Wiley Periodicals, Inc.
Davie, Stuart J; Di Pasquale, Nicodemo; Popelier, Paul L A
2016-10-15
Machine learning algorithms have been demonstrated to predict atomistic properties approaching the accuracy of quantum chemical calculations at significantly less computational cost. Difficulties arise, however, when attempting to apply these techniques to large systems, or systems possessing excessive conformational freedom. In this article, the machine learning method kriging is applied to predict both the intra-atomic and interatomic energies, as well as the electrostatic multipole moments, of the atoms of a water molecule at the center of a 10 water molecule (decamer) cluster. Unlike previous work, where the properties of small water clusters were predicted using a molecular local frame, and where training set inputs (features) were based on atomic index, a variety of feature definitions and coordinate frames are considered here to increase prediction accuracy. It is shown that, for a water molecule at the center of a decamer, no single method of defining features or coordinate schemes is optimal for every property. However, explicitly accounting for the structure of the first solvation shell in the definition of the features of the kriging training set, and centring the coordinate frame on the atom-of-interest will, in general, return better predictions than models that apply the standard methods of feature definition, or a molecular coordinate frame. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo
2017-12-01
Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. The existing approaches to fold recognition mainly exploit the features derived from alignments of query protein against templates. These approaches have been shown to be successful for fold recognition at family level, but usually fail at superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit this information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using the deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculated similarity between query protein and templates, and then assigned query protein with fold type of the most similar template. DCNN has shown excellent performance in image feature extraction and image recognition; the rationale underlying the application of DCNN for fold recognition is that contact likelihood maps are essentially analogous to images, as they both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that even using the extracted fold-specific features alone, our approach achieved a success rate comparable to the state-of-the-art approaches. When further combining these features with traditional alignment-related features, the success rate of our approach increased to 92.3%, 82.5% and 78.8% at family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at fold level, 6% higher at superfamily level and 1% higher at family level.
An independent assessment on SCOP_TEST dataset showed consistent performance improvement, indicating robustness of our approach. Furthermore, bi-clustering results of the extracted features are compatible with fold hierarchy of proteins, implying that these features are fold-specific. Together, these results suggest that the features extracted from predicted contacts are orthogonal to alignment-related features, and the combination of them could greatly facilitate fold recognition at superfamily/fold levels and template-based prediction of protein structures. Source code of DeepFR is freely available through https://github.com/zhujianwei31415/deepfr, and a web server is available through http://protein.ict.ac.cn/deepfr. Contact: zheng@itp.ac.cn or dbu@ict.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Prediction of phenotypes of missense mutations in human proteins from biological assemblies.
Wei, Qiong; Xu, Qifang; Dunbrack, Roland L
2013-02-01
Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.
Wang, Yong-Cui; Wang, Yong; Yang, Zhi-Xia; Deng, Nai-Yang
2011-06-20
Enzymes are known as the largest class of proteins and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchy structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing enzyme into the EC hierarchy is crucial to understand its specific molecular mechanism. In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. One is to introduce the efficient sequence encoding methods for representing given proteins. The second one is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named as SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature both in predictive accuracy and Matthew's correlation coefficient (MCC). In addition, SVMHL with the CTF obtains the accuracy and MCC ranging from 81% to 98% and 0.82 to 0.98 when predicting the first three EC digits on a low-homologous enzyme dataset. We further demonstrate that our method outperforms the methods which do not take account of hierarchical relationship among enzyme categories and alternative methods which incorporate prior knowledge about inter-class relationships. Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore our new method will be a useful tool for enzyme function prediction community.
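The conjoint triad encoding mentioned in this abstract can be sketched as follows. The seven amino acid classes used below are the grouping commonly associated with the conjoint triad method and are stated here as an assumption, since the abstract does not list them; the frequency normalization and the function name are also illustrative choices rather than the authors' exact procedure.

```python
# Assumed seven-class grouping of the 20 standard amino acids.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of class-triad frequencies (7 * 7 * 7).

    Each window of three consecutive residues is mapped to its triple of
    class indices; windows containing non-standard residues are skipped.
    """
    counts = [0] * 343
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if all(aa in AA_TO_CLASS for aa in window):
            a, b, c = (AA_TO_CLASS[aa] for aa in window)
            counts[a * 49 + b * 7 + c] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 343
    return [count / total for count in counts]
```

Unlike plain amino acid composition (20 values), this vector records which classes occur next to each other, which is the neighbor-relationship information the abstract refers to.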
MUFOLD-SS: New deep inception-inside-inception networks for protein secondary structure prediction.
Fang, Chao; Shang, Yi; Xu, Dong
2018-05-01
Protein secondary structure prediction can provide important information for protein 3D structure prediction and protein functions. Deep learning offers a new opportunity to significantly improve prediction accuracy. In this article, a new deep neural network architecture, named the Deep inception-inside-inception (Deep3I) network, is proposed for protein secondary structure prediction and implemented as a software tool MUFOLD-SS. The input to MUFOLD-SS is a carefully designed feature matrix corresponding to the primary amino acid sequence of a protein, which consists of a rich set of information derived from individual amino acid, as well as the context of the protein sequence. Specifically, the feature matrix is a composition of physio-chemical properties of amino acids, PSI-BLAST profile, and HHBlits profile. MUFOLD-SS is composed of a sequence of nested inception modules and maps the input matrix to either eight states or three states of secondary structures. The architecture of MUFOLD-SS enables effective processing of local and global interactions between amino acids in making accurate prediction. In extensive experiments on multiple datasets, MUFOLD-SS outperformed the best existing methods and other deep neural networks significantly. MUFold-SS can be downloaded from http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html. © 2018 Wiley Periodicals, Inc.
Mooney, Catherine; Haslam, Niall J.; Pollastri, Gianluca; Shields, Denis C.
2012-01-01
The conventional wisdom is that certain classes of bioactive peptides have specific structural features that endow their particular functions. Accordingly, predictions of bioactivity have focused on particular subgroups, such as antimicrobial peptides. We hypothesized that bioactive peptides may share more general features, and assessed this by contrasting the predictive power of existing antimicrobial predictors as well as a novel general predictor, PeptideRanker, across different classes of peptides. We observed that existing antimicrobial predictors had reasonable predictive power to identify peptides of certain other classes i.e. toxin and venom peptides. We trained two general predictors of peptide bioactivity, one focused on short peptides (4–20 amino acids) and one focused on long peptides ( amino acids). These general predictors had performance that was typically as good as, or better than, that of specific predictors. We noted some striking differences in the features of short peptide and long peptide predictions, in particular, high scoring short peptides favour phenylalanine. This is consistent with the hypothesis that short and long peptides have different functional constraints, perhaps reflecting the difficulty for typical short peptides in supporting independent tertiary structure. We conclude that there are general shared features of bioactive peptides across different functional classes, indicating that computational prediction may accelerate the discovery of novel bioactive peptides and aid in the improved design of existing peptides, across many functional classes. An implementation of the predictive method, PeptideRanker, may be used to identify among a set of peptides those that may be more likely to be bioactive. PMID:23056189
NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data.
Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua
2013-01-01
Shape string is structural sequence and is an extremely important structure representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
Predicting beta-turns in proteins using support vector machines with fractional polynomials
2013-01-01
Background: β-turns are a secondary structure type that has an essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and can provide valuable insights and inputs for fold recognition and drug design. Results: We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. Conclusions: In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods. PMID:24565438
Predicting beta-turns in proteins using support vector machines with fractional polynomials.
Elbashir, Murtada; Wang, Jianxin; Wu, Fang-Xiang; Wang, Lusheng
2013-11-07
β-turns are a secondary structure type that has an essential role in molecular recognition, protein folding, and stability. They are found to be the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered to be one of the crucial problems in bioinformatics and molecular biology, and can provide valuable insights and inputs for fold recognition and drug design. We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance compared to other competing prediction methods.
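The two evaluation measures quoted in this abstract can be computed from a binary confusion matrix. The sketch below uses the standard definitions, with Qtotal taken as the percentage of residues classified correctly; the function names are illustrative and the abstract itself does not spell out the formulas.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = turn, 0 = non-turn)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def q_total(tp, tn, fp, fn):
    """Percentage of residues predicted correctly."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Because only about a quarter of residues lie in β-turns, Qtotal can look high for a predictor that rarely predicts turns, which is why MCC is usually reported alongside it.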
Predicting human age using regional morphometry and inter-regional morphological similarity
NASA Astrophysics Data System (ADS)
Wang, Xun-Heng; Li, Lihua
2016-03-01
The goal of this study is to predict human age using neuro-metrics derived from structural MRI, and to investigate the relationships between age and predictive neuro-metrics. To this end, a cohort of healthy subjects was recruited from the 1000 Functional Connectomes Project. The ages of the participants ranged from 7 to 83 (36.17 ± 20.46). The structural MRI for each subject was preprocessed using FreeSurfer, resulting in regional cortical thickness, mean curvature, regional volume and regional surface area for 148 anatomical parcellations. The individual age was predicted from the combination of regional and inter-regional neuro-metrics. The prediction accuracy is r = 0.835, p < 0.00001, evaluated by the Pearson correlation coefficient between predicted ages and actual ages. Moreover, the LASSO linear regression also found certain predictive features, most of which were inter-regional features. The turning point of the developmental trajectories in the human brain was around 40 years old based on regional cortical thickness. In conclusion, structural MRI could provide potential biomarkers for aging in the human brain. Human age could be successfully predicted from the combination of regional morphometry and inter-regional morphological similarity. The inter-regional measures could be beneficial for investigating the human brain connectome.
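The accuracy figure in this abstract is a Pearson correlation between predicted and actual ages. A minimal sketch of that calculation is given below; the function name is illustrative, and nothing here reproduces the study's data or its regression model.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A call such as `pearson_r(predicted_ages, actual_ages)`, with both arguments as hypothetical lists of ages in years, gives the r value; note that r measures linear association and does not by itself show that predictions are unbiased.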
Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul
2013-01-01
Better understanding of structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on general tertiary structure of a protein which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains as an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers namely, Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM) we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.
Prediction of protein secondary structure content for the twilight zone sequences.
Homaeian, Leila; Kurgan, Lukasz A; Ruan, Jishou; Cios, Krzysztof J; Chen, Ke
2007-11-15
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. Significant majority of successful methods for prediction of the secondary structure is based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. 
At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure. (c) 2007 Wiley-Liss, Inc.
NASA Astrophysics Data System (ADS)
Beguet, Benoit; Guyon, Dominique; Boukir, Samia; Chehata, Nesrine
2014-10-01
The main goal of this study is to design a method to describe the structure of forest stands from Very High Resolution satellite imagery, relying on some typical variables such as crown diameter, tree height, trunk diameter, tree density and tree spacing. The emphasis is placed on the automatization of the process of identification of the most relevant image features for the forest structure retrieval task, exploiting both spectral and spatial information. Our approach is based on linear regressions between the forest structure variables to be estimated and various spectral and Haralick's texture features. The main drawback of this well-known texture representation is the underlying parameters which are extremely difficult to set due to the spatial complexity of the forest structure. To tackle this major issue, an automated feature selection process is proposed which is based on statistical modeling, exploring a wide range of parameter values. It provides texture measures of diverse spatial parameters hence implicitly inducing a multi-scale texture analysis. A new feature selection technique, we called Random PRiF, is proposed. It relies on random sampling in feature space, carefully addresses the multicollinearity issue in multiple-linear regression while ensuring accurate prediction of forest variables. Our automated forest variable estimation scheme was tested on Quickbird and Pléiades panchromatic and multispectral images, acquired at different periods on the maritime pine stands of two sites in South-Western France. It outperforms two well-established variable subset selection techniques. It has been successfully applied to identify the best texture features in modeling the five considered forest structure variables. 
The RMSE of all predicted forest variables is improved by combining multispectral and panchromatic texture features, with various parameterizations, highlighting the potential of a multi-resolution approach for retrieving forest structure variables from VHR satellite images. Thus an average prediction error of ~1.1 m is expected on crown diameter, ~0.9 m on tree spacing, ~3 m on height and ~0.06 m on diameter at breast height.
Mwangi, Benson; Ebmeier, Klaus P; Matthews, Keith; Steele, J Douglas
2012-05-01
Quantitative abnormalities of brain structure in patients with major depressive disorder have been reported at a group level for decades. However, these structural differences appear subtle in comparison with conventional radiologically defined abnormalities, with considerable inter-subject variability. Consequently, it has not been possible to readily identify scans from patients with major depressive disorder at an individual level. Recently, machine learning techniques such as relevance vector machines and support vector machines have been applied to predictive classification of individual scans with variable success. Here we describe a novel hybrid method, which combines machine learning with feature selection and characterization, with the latter aimed at maximizing the accuracy of machine learning prediction. The method was tested using a multi-centre dataset of T1-weighted 'structural' scans. A total of 62 patients with major depressive disorder and matched controls were recruited from referred secondary care clinical populations in Aberdeen and Edinburgh, UK. The generalization ability and predictive accuracy of the classifiers was tested using data left out of the training process. High prediction accuracy was achieved (~90%). While feature selection was important for maximizing high predictive accuracy with machine learning, feature characterization contributed only a modest improvement to relevance vector machine-based prediction (~5%). Notably, while the only information provided for training the classifiers was T1-weighted scans plus a categorical label (major depressive disorder versus controls), both relevance vector machine and support vector machine 'weighting factors' (used for making predictions) correlated strongly with subjective ratings of illness severity.
These results indicate that machine learning techniques have the potential to inform clinical practice and research, as they can make accurate predictions about brain scan data from individual subjects. Furthermore, machine learning weighting factors may reflect an objective biomarker of major depressive disorder illness severity, based on abnormalities of brain structure.
Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui
2012-11-07
RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infections by RNA viruses. In this study, using the Gene Ontology Annotated (GOA) and Structural Classification of Proteins (SCOP) databases, an automatic procedure was designed to capture structurally solved RNA-binding protein domains in different subclasses. Subsequently, we applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) to analyze and classify RNA-binding protein domains based on a comprehensive set of sequence and structural features. We compared the prediction accuracy of these three state-of-the-art predictor methods. In our results, TMCSVM outperforms the other methods, which suggests its potential as a useful tool for facilitating the multi-class prediction of RNA-binding protein domains. MCRLR, on the other hand, elucidates the contribution of individual features to the predictive accuracy for RNA-binding protein domain subclasses, and so helps provide some biological insights into the roles of sequences and structures in protein-RNA interactions.
Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.
Funk, Christopher S; Kahanda, Indika; Ben-Hur, Asa; Verspoor, Karin M
2015-01-01
Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: Molecular Function = 0.408, Biological Process = 0.461, Cellular Component = 0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.
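The F-max figures quoted above can be illustrated with a short sketch. It assumes the protein-centric definition commonly used in function-prediction benchmarks: at each score threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over all annotated proteins, and F-max is the best harmonic mean over thresholds. The abstract does not spell out its exact evaluation protocol, and the function name, term labels, and scores below are hypothetical.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric F-max (assumed definition, see lead-in).

    predictions: {protein: {term: score}}; truth: {protein: non-empty set of terms}.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            overlap = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with a prediction
                precisions.append(overlap / len(predicted))
            recalls.append(overlap / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

# Hypothetical toy data, not taken from the paper.
truth = {"P1": {"term_a", "term_b"}, "P2": {"term_c"}}
predictions = {
    "P1": {"term_a": 0.9, "term_b": 0.4, "term_x": 0.3},
    "P2": {"term_c": 0.8, "term_y": 0.6},
}
score = f_max(predictions, truth)
```

Sweeping the threshold is what distinguishes F-max from a single F1 value: it reports the best achievable trade-off rather than the trade-off at one fixed cut-off.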
Tadayyon, Hadi; Sannachi, Lakshmanan; Gangeh, Mehrdad J.; Kim, Christina; Ghandi, Sonal; Trudeau, Maureen; Pritchard, Kathleen; Tran, William T.; Slodkowska, Elzbieta; Sadeghi-Naini, Ali; Czarnota, Gregory J.
2017-01-01
Quantitative ultrasound (QUS) can probe tissue structure and analyze tumour characteristics. Using a 6-MHz ultrasound system, radiofrequency data were acquired from 56 locally advanced breast cancer patients prior to their neoadjuvant chemotherapy (NAC) and QUS texture features were computed from regions of interest in tumour cores and their margins as potential predictive and prognostic indicators. Breast tumour molecular features were also collected and used for analysis. A multiparametric QUS model was constructed, which demonstrated a response prediction accuracy of 88% and ability to predict patient 5-year survival rates (p = 0.01). QUS features demonstrated superior performance in comparison to molecular markers and the combination of QUS and molecular markers did not improve response prediction. This study demonstrates, for the first time, that non-invasive QUS features in the core and margin of breast tumours can indicate breast cancer response to neoadjuvant chemotherapy (NAC) and predict five-year recurrence-free survival. PMID:28401902
Meng, Jun; Liu, Dong; Sun, Chao; Luan, Yushi
2014-12-30
MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcription or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identification of these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far less than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs. A novel classification model based on a support vector machine (SVM) was trained to identify real and pseudo plant pre-miRNAs together with their miRNAs. An initial set of 152 novel features related to sequential structures was used to train the model. By applying feature selection, we obtained the best subset of 47 features for use with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method for the classification of plant pre-miRNAs. Using this method, 63 features were obtained for plant miRNA classification. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. 
This model achieved approximately 90% accuracy using plant datasets from nine plant species, including Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence. We developed an integrated classification model, miPlantPreMat, based on structure-sequence features and SVM. MiPlantPreMat was used to identify both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity.
Fox, Charles W; Burns, C Sean
2015-01-01
A poorly chosen article title may make a paper difficult to discover or discourage readership when discovered, reducing an article's impact. Yet, it is unclear how the structure of a manuscript's title influences readership and impact. We used manuscript tracking data for all manuscripts submitted to the journal Functional Ecology from 2004 to 2013 and citation data for papers published in this journal from 1987 to 2011 to examine how title features changed and whether a manuscript's title structure was predictive of success during the manuscript review process and/or impact (citation) after publication. Titles of manuscripts submitted to Functional Ecology became marginally longer (after controlling for other variables), broader in focus (less frequent inclusion of genus and species names), and included more humor and subtitles over the period of the study. Papers with subtitles were less likely to be rejected by editors both pre- and post-peer review, although both effects were small and the presence of subtitles in published papers was not predictive of citations. Papers with specific names of study organisms in their titles fared poorly during editorial (but not peer) review and, if published, were less well cited than papers whose titles did not include specific names. Papers with intermediate length titles were more successful during editorial review, although the effect was small and title word count was not predictive of citations. No features of titles were predictive of reviewer willingness to review papers or the length of time a paper was in peer review. We conclude that titles have changed in structure over time, but features of title structure have only small or no relationship with success during editorial review and post-publication impact. 
The title feature that was most predictive of manuscript success: papers whose titles emphasize broader conceptual or comparative issues fare better both pre- and post-publication than do papers with organism-specific titles. PMID:26045949
Qi, Miao; Wang, Ting; Yi, Yugen; Gao, Na; Kong, Jun; Wang, Jianzhong
2017-04-01
Feature selection has been regarded as an effective tool to help researchers understand the generating process of data. For mining the synthesis mechanism of microporous AlPOs, this paper proposes a novel feature selection method based on joint ℓ2,1-norm and Fisher discrimination constraints (JNFDC). To obtain a more effective feature subset, the proposed method proceeds in two steps. The first step is to rank the features according to sparse and discriminative constraints. The second step is to establish a predictive model with the ranked features and select the most significant features according to their contribution to improving the predictive accuracy. To the best of our knowledge, JNFDC is the first work that employs sparse representation theory to explore the synthesis mechanism of six kinds of pore rings. Numerical simulations demonstrate that our proposed method can select significant features affecting the specified structural property and improve the predictive accuracy. Moreover, comparison results show that JNFDC can obtain better predictive performance than some other state-of-the-art feature selection methods. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
A feature-based approach to modeling protein-protein interaction hot spots.
Cho, Kyu-il; Kim, Dongsup; Lee, Doheon
2009-05-01
Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π-related interactions, especially π···π interactions.
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
NASA Astrophysics Data System (ADS)
Draghici, Sorin; Cumberland, Lonnie T., Jr.; Kovari, Ladislau C.
2000-04-01
This paper presents some results of data mining HIV genotypic and structural data. Our aim is to try to relate structural features of HIV enzymes essential to its reproductive abilities to the drug resistance phenomenon. This paper concentrates on the HIV protease enzyme and Indinavir, which is one of the FDA-approved protease inhibitors. Our starting point was the current list of HIV mutations related to drug resistance. We used the fact that some molecular structures determined through high-resolution X-ray crystallography were available for the protease-Indinavir complex. Starting with these structures and the known mutations, we modelled the mutant proteases and studied the pattern of atomic contacts between the protease and the drug. After suitable pre-processing, these patterns have been used as the input of our data mining process. We have used both supervised and unsupervised learning techniques with the aim of understanding the relationship between structural features at a molecular level and resistance to Indinavir. The supervised learning was aimed at predicting IC90 values for arbitrary mutants. The SOFM was aimed at identifying those structural features that are important for drug resistance and discovering a classifier based on such features. We have used validation and cross validation to test the generalization abilities of the learning paradigm we have designed. The straightforward supervised learning was able to learn very successfully but validation results are less than satisfactory. This is due to the insufficient number of patterns in the training set which in turn is due to the scarcity of the available data. The data mining using SOFM was very successful. We have managed to distinguish between resistant and non-resistant mutants using structural features.
We have been able to divide all reported HIV mutants into several categories based on their 3-dimensional molecular structures and the pattern of contacts between the mutant protease and Indinavir. Our classifier shows reasonably good prediction performance, being able to predict the drug resistance of previously unseen mutants with an accuracy of between 60% and 70%. We believe that this performance can be greatly improved once more data becomes available. The results presented here support the hypothesis that structural features of the molecular structure can be used in antiviral drug treatment selection and drug design.
Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou
2016-01-01
The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming more important for research on drug design and cancer development. Building on our previous methods (APIS and KFC2), we propose a novel hot spot prediction method. For each residue, we first constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional ones and a new one (pseudo hydrophobicity) exploited in this study. We then selected the 3 top-ranking features that contribute the most to the classification by a two-step feature selection process consisting of a minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When tested on an independent test set, our method showed the highest F1-score (0.70) and MCC (0.46) compared with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
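For readers unfamiliar with the two scores reported above, the following minimal sketch computes the F1-score and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The function name and the counts in the example are hypothetical and are not taken from the paper's test set.

```python
import math

def f1_and_mcc(tp, fp, tn, fn):
    """Return (F1, MCC) for a binary confusion matrix; undefined ratios become 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc

# Hypothetical counts for illustration only.
f1, mcc = f1_and_mcc(tp=30, fp=10, tn=50, fn=10)
```

Reporting both is a common choice for hot spot prediction because hot spots are a minority class: F1 ignores true negatives, whereas MCC penalizes a classifier that does well on only one class.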
SEGMENTATION OF MITOCHONDRIA IN ELECTRON MICROSCOPY IMAGES USING ALGEBRAIC CURVES.
Seyedhosseini, Mojtaba; Ellisman, Mark H; Tasdizen, Tolga
2013-01-01
High-resolution microscopy techniques have been used to generate large volumes of data with enough details for understanding the complex structure of the nervous system. However, automatic techniques are required to segment cells and intracellular structures in these multi-terabyte datasets and make anatomical analysis possible on a large scale. We propose a fully automated method that exploits both shape information and regional statistics to segment irregularly shaped intracellular structures such as mitochondria in electron microscopy (EM) images. The main idea is to use algebraic curves to extract shape features together with texture features from image patches. Then, these powerful features are used to learn a random forest classifier, which can predict mitochondria locations precisely. Finally, the algebraic curves together with regional information are used to segment the mitochondria at the predicted locations. We demonstrate that our method outperforms the state-of-the-art algorithms in segmentation of mitochondria in EM images.
Adhikari, Badri; Hou, Jie; Cheng, Jianlin
2018-03-01
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
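The "top L/5 long-range" precision used above can be sketched as follows, where L is the sequence length. The sketch assumes the usual convention that a long-range pair has a sequence separation of at least 24 residues; the abstract does not state the cut-off, and the function name and toy data are hypothetical.

```python
def top_l5_long_range_precision(predictions, true_contacts, length,
                                min_separation=24):
    """Precision of the L/5 highest-scoring long-range predicted contacts.

    predictions: list of (i, j, score); true_contacts: set of (i, j) with i < j.
    min_separation=24 is an assumed convention, not stated in the abstract.
    """
    long_range = [p for p in predictions if abs(p[0] - p[1]) >= min_separation]
    long_range.sort(key=lambda p: p[2], reverse=True)
    top = long_range[:max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)

# Toy example for a 30-residue chain (L/5 = 6); values are illustrative only.
predicted = [
    (0, 24, 0.95), (1, 26, 0.90), (2, 27, 0.85), (3, 28, 0.60),
    (4, 29, 0.55), (5, 29, 0.50), (0, 29, 0.10),
    (10, 12, 0.99), (7, 15, 0.97),  # short-range pairs, filtered out
]
native = {(0, 24), (2, 27), (4, 29), (10, 12)}
precision = top_l5_long_range_precision(predicted, native, length=30)
```

Note that the two highest-scoring pairs in the toy data are ignored because they are short-range; only the six best long-range pairs are scored, three of which are native contacts.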
What are the structural features that drive partitioning of proteins in aqueous two-phase systems?
Wu, Zhonghua; Hu, Gang; Wang, Kui; Zaslavsky, Boris Yu; Kurgan, Lukasz; Uversky, Vladimir N
2017-01-01
Protein partitioning in aqueous two-phase systems (ATPSs) represents a convenient, inexpensive, and easy to scale-up protein separation technique. Since partition behavior of a protein dramatically depends on an ATPS composition, it would be highly beneficial to have reliable means for (even qualitative) prediction of partitioning of a target protein under different conditions. Our aim was to understand which structural features of proteins contribute to partitioning of a query protein in a given ATPS. We undertook a systematic empirical analysis of relations between 57 numerical structural descriptors derived from the corresponding amino acid sequences and crystal structures of 10 well-characterized proteins and the partition behavior of these proteins in 29 different ATPSs. This analysis revealed that just a few structural characteristics of proteins can accurately determine behavior of these proteins in a given ATPS. However, partition behavior of proteins in different ATPSs relies on different structural features. In other words, we could not find a unique set of protein structural features derived from their crystal structures that could be used for the description of the protein partition behavior of all proteins in all ATPSs analyzed in this study. We likely need to gain better insight into relationships between protein-solvent interactions and protein structure peculiarities, in particular given the limitations of the crystal structures used here, to be able to construct a model that accurately predicts protein partition behavior across all ATPSs. Copyright © 2016 Elsevier B.V. All rights reserved.
Predicting the performance of fingerprint similarity searching.
Vogt, Martin; Bajorath, Jürgen
2011-01-01
Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the "background." By quantifying the difference in feature distribution using the Kullback-Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
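The divergence step described above can be sketched in a few lines. The snippet estimates per-bit frequencies for a reference set and a screening database, then sums the Kullback-Leibler divergence between the corresponding Bernoulli distributions over all bit positions. Treating bits as independent, the pseudocount smoothing, and all names and data are assumptions made for illustration; the abstract does not give these details.

```python
import math

def bit_frequencies(fingerprints, pseudocount=1.0):
    """Smoothed probability that each bit is set, over equal-length bit lists."""
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    return [
        (sum(fp[i] for fp in fingerprints) + pseudocount) / (n + 2.0 * pseudocount)
        for i in range(n_bits)
    ]

def kl_divergence(p, q):
    """Sum over bits of KL(Bernoulli(p_i) || Bernoulli(q_i)), in nats."""
    total = 0.0
    for pi, qi in zip(p, q):
        total += pi * math.log(pi / qi)
        total += (1.0 - pi) * math.log((1.0 - pi) / (1.0 - qi))
    return total

# Hypothetical 4-bit fingerprints for illustration only.
reference = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1]]
database = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0]]
p_ref = bit_frequencies(reference)
p_db = bit_frequencies(database)
divergence = kl_divergence(p_ref, p_db)
```

A divergence near zero corresponds to the situation in which reference-like actives disappear into the "background"; relating the value to expected recovery rates would require calibration on benchmark classes, which this sketch does not attempt.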
High Precision Prediction of Functional Sites in Protein Structures
Buturovic, Ljubomir; Wong, Mike; Tang, Grace W.; Altman, Russ B.; Petkovic, Dragutin
2014-01-01
We address the problem of assigning biological function to solved protein structures. Computational tools play a critical role in identifying potential active sites and informing screening decisions for further lab analysis. A critical parameter in the practical application of computational methods is the precision, or positive predictive value. Precision measures the level of confidence the user should have in a particular computed functional assignment. Low precision annotations lead to futile laboratory investigations and waste scarce research resources. In this paper we describe an advanced version of the protein function annotation system FEATURE, which achieved 99% precision and average recall of 95% across 20 representative functional sites. The system uses a Support Vector Machine classifier operating on the microenvironment of physicochemical features around an amino acid. We also compared performance of our method with state-of-the-art sequence-level annotator Pfam in terms of precision, recall and localization. To our knowledge, no other functional site annotator has been rigorously evaluated against these key criteria. The software and predictive models are incorporated into the WebFEATURE service at http://feature.stanford.edu/wf4.0-beta. PMID:24632601
A feature-based developmental model of the infant brain in structural MRI.
Toews, Matthew; Wells, William M; Zöllei, Lilla
2012-01-01
In this paper, anatomical development is modeled as a collection of distinctive image patterns localized in space and time. A Bayesian posterior probability is defined over a random variable of subject age, conditioned on data in the form of scale-invariant image features. The model is automatically learned from a large set of images exhibiting significant variation, used to discover anatomical structure related to age and development, and fit to new images to predict age. The model is applied to a set of 230 infant structural MRIs of 92 subjects acquired at multiple sites over an age range of 8-590 days. Experiments demonstrate that the model can be used to identify age-related anatomical structure, and to predict the age of new subjects with an average error of 72 days.
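One way to read the posterior described above is as Bayes' rule over subject age $a$ given the extracted image features $f_1, \dots, f_N$. The conditional-independence factorization, the discretization of age into bins $a'$, and the maximum a posteriori estimate $\hat{a}$ below are assumptions made for illustration; the abstract does not state the exact form of the model.

```latex
p(a \mid f_1, \dots, f_N)
  = \frac{p(a) \prod_{i=1}^{N} p(f_i \mid a)}
         {\sum_{a'} p(a') \prod_{i=1}^{N} p(f_i \mid a')},
\qquad
\hat{a} = \arg\max_{a} \; p(a \mid f_1, \dots, f_N)
```

Under this reading, features whose likelihood $p(f_i \mid a)$ varies strongly with $a$ are the ones that identify age-related anatomical structure.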
Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan
2015-06-01
Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.
Computer vision system for egg volume prediction using backpropagation neural network
NASA Astrophysics Data System (ADS)
Siswantoro, J.; Hilman, M. Y.; Widiasri, M.
2017-11-01
Volume is one of the aspects considered in the egg sorting process. A rapid and accurate volume measurement method is needed to develop an egg sorting system. A computer vision system (CVS) provides a promising solution to the volume measurement problem. Artificial neural networks (ANNs) have been used to predict egg volume in several CVSs. However, volume prediction from an ANN can be less accurate due to inappropriate input features or an inappropriate ANN structure. This paper proposes a CVS for predicting egg volume using an ANN. The CVS acquired a top-view image of an egg and then processed the image to extract its 1D and 2D size features. The features were used as input to the ANN to predict egg volume. The experimental results show that the proposed CVS can predict egg volume with good accuracy and less computation time.
A Method for WD40 Repeat Detection and Secondary Structure Prediction
Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong
2013-01-01
WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program, WDSP (WD40 repeat protein Structure Predictor), has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in the jackknife test reaches 93.7%. A disease-related protein, LRRK2, was used as a representative example to demonstrate the structure prediction. PMID:23776530
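The Q3 accuracy quoted above is conventionally the fraction of residues whose three-state secondary structure label (helix, strand, coil) is predicted correctly. A minimal Python sketch, assuming that definition; `q3_accuracy` is a hypothetical helper name and the label strings are made up for illustration, not WDSP output:

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the three-state labels (H, E, C) agree."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy labels for illustration only (not taken from the paper).
observed = "CEEEECCEEEEC"
predicted = "CEEEECCCEEEC"  # one strand residue mislabeled as coil
score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.3f}")
```

In a jackknife (leave-one-out) evaluation, a score of this kind would be computed for each held-out protein and then averaged.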
Sparse Zero-Sum Games as Stable Functional Feature Selection
Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel
2015-01-01
In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection, therefore, can lead not only to a problem with a reduced dimensionality, but also reveal some knowledge on functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game which performs a stable functional feature selection. In particular, the approach is based on feature subsets ranking by a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268
A feature-based approach to modeling protein–protein interaction hot spots
Cho, Kyu-il; Kim, Dongsup; Lee, Doheon
2009-01-01
Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π–related interactions, especially π · · · π interactions. PMID:19273533
Analysis of Physicochemical and Structural Properties Determining HIV-1 Coreceptor Usage
Bozek, Katarzyna; Lengauer, Thomas; Sierra, Saleta; Kaiser, Rolf; Domingues, Francisco S.
2013-01-01
The relationship of HIV tropism with disease progression and the recent development of CCR5-blocking drugs underscore the importance of monitoring virus coreceptor usage. As an alternative to costly phenotypic assays, computational methods aim at predicting virus tropism based on the sequence and structure of the V3 loop of the virus gp120 protein. Here we present a numerical descriptor of the V3 loop encoding its physicochemical and structural properties. The descriptor allows for structure-based prediction of HIV tropism and identification of properties of the V3 loop that are crucial for coreceptor usage. Use of the proposed descriptor for prediction results in a statistically significant improvement over the prediction based solely on V3 sequence with 3 percentage points improvement in AUC and 7 percentage points in sensitivity at the specificity of the 11/25 rule (95%). We additionally assessed the predictive power of the new method on clinically derived ‘bulk’ sequence data and obtained a statistically significant improvement in AUC of 3 percentage points over sequence-based prediction. Furthermore, we demonstrated the capacity of our method to predict therapy outcome by applying it to 53 samples from patients undergoing Maraviroc therapy. The analysis of structural features of the loop informative of tropism indicates the importance of two loop regions and their physicochemical properties. The regions are located on opposite strands of the loop stem and the respective features are predominantly charge-, hydrophobicity- and structure-related. These regions are in close proximity in the bound conformation of the loop potentially forming a site determinant for the coreceptor binding. The method is available via server under http://structure.bioinf.mpi-inf.mpg.de/. PMID:23555214
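The 11/25 rule used above as the specificity reference is a simple sequence heuristic. As it is commonly stated, a V3 loop is called CXCR4-using (X4) when position 11 or 25 carries a positively charged residue (arginine or lysine). A minimal Python sketch under that assumption; the function name and both sequences are illustrative, and real use would first require aligning the loop to the standard 35-residue V3 numbering:

```python
# Assumption: "positively charged" means arginine (R) or lysine (K).
BASIC_RESIDUES = {"R", "K"}


def rule_11_25(v3_loop: str) -> str:
    """Return 'X4' if residue 11 or 25 (1-based) is basic, otherwise 'R5'."""
    if len(v3_loop) < 25:
        raise ValueError("V3 sequence too short to inspect position 25")
    pos11, pos25 = v3_loop[10], v3_loop[24]
    if pos11 in BASIC_RESIDUES or pos25 in BASIC_RESIDUES:
        return "X4"
    return "R5"


# Illustrative 35-residue loop with S at position 11 and E at position 25.
r5_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
# Same loop with a hypothetical S11R substitution.
x4_like = r5_like[:10] + "R" + r5_like[11:]

print(rule_11_25(r5_like), rule_11_25(x4_like))
```

The rule inspects only two positions, which is why descriptors that encode the physicochemical and structural properties of the whole loop can improve on it.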
In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features.
Ding, Yiliang; Tang, Yin; Kwok, Chun Kit; Zhang, Yu; Bevilacqua, Philip C; Assmann, Sarah M
2014-01-30
RNA structure has critical roles in processes ranging from ligand sensing to the regulation of translation, polyadenylation and splicing. However, a lack of genome-wide in vivo RNA structural data has limited our understanding of how RNA structure regulates gene expression in living cells. Here we present a high-throughput, genome-wide in vivo RNA structure probing method, structure-seq, in which dimethyl sulphate methylation of unprotected adenines and cytosines is identified by next-generation sequencing. Application of this method to Arabidopsis thaliana seedlings yielded the first in vivo genome-wide RNA structure map at nucleotide resolution for any organism, with quantitative structural information across more than 10,000 transcripts. Our analysis reveals a three-nucleotide periodic repeat pattern in the structure of coding regions, as well as a less-structured region immediately upstream of the start codon, and shows that these features are strongly correlated with translation efficiency. We also find patterns of strong and weak secondary structure at sites of alternative polyadenylation, as well as strong secondary structure at 5' splice sites that correlates with unspliced events. Notably, in vivo structures of messenger RNAs annotated for stress responses are poorly predicted in silico, whereas mRNA structures of genes related to cell function maintenance are well predicted. Global comparison of several structural features between these two categories shows that the mRNAs associated with stress responses tend to have more single-strandedness, longer maximal loop length and higher free energy per nucleotide, features that may allow these RNAs to undergo conformational changes in response to environmental conditions. Structure-seq allows the RNA structurome and its biological roles to be interrogated on a genome-wide scale and should be applicable to any organism.
XtalOpt version r9: An open-source evolutionary algorithm for crystal structure prediction
Falls, Zackary; Lonie, David C.; Avery, Patrick; ...
2015-10-23
This is a new version of XtalOpt, an evolutionary algorithm for crystal structure prediction available for download from the CPC library or the XtalOpt website, http://xtalopt.github.io. XtalOpt is published under the GNU General Public License (GPL), an open source license recognized by the Open Source Initiative. Here we detail the new version, which incorporates many bug fixes and new features and predicts the crystal structure of a system from its stoichiometry alone, using evolutionary algorithms.
NASA Astrophysics Data System (ADS)
Krishnamurthy, Narayanan; Maddali, Siddharth; Romanov, Vyacheslav; Hawk, Jeffrey
We present some structural properties of multi-component steel alloys as predicted by a random forest machine-learning model. These non-parametric models are trained on high-dimensional data sets defined by features such as chemical composition, pre-processing temperatures and environmental influences, the latter of which are based upon standardized testing procedures for tensile, creep and rupture properties as defined by the American Society for Testing and Materials (ASTM). We quantify the goodness of fit of these models as well as the inferred relative importance of each of these features, all with a conveniently defined metric and scale. The models are tested with synthetic data points, generated subject to the appropriate mathematical constraints for the various features. In this way we highlight possible trends in the increase or degradation of the structural properties with perturbations in the features of importance. This work is presented as part of the Data Science Initiative at the National Energy Technology Laboratory, directed specifically towards the computational design of steel alloys.
An Efficient Scheme for Crystal Structure Prediction Based on Structural Motifs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Zizhong; Wu, Ping; Wu, Shunqing; ...
2017-05-15
An efficient scheme based on structural motifs is proposed for the crystal structure prediction of materials. The key advantage of the present method is twofold: first, the degrees of freedom of the system are greatly reduced, since each structural motif, regardless of its size, can always be described by a set of parameters (R, θ, φ) with five degrees of freedom; second, the motifs could always appear in the predicted structures when the energies of the structures are relatively low. Both features make the present scheme a very efficient method for predicting desired materials. The method has been applied to the case of LiFePO4, an important cathode material for lithium-ion batteries. Numerous new structures of LiFePO4 have been found, compared to those currently available, demonstrating the reliability of the present methodology and illustrating the promise of the concept of structural motifs.
NASA Astrophysics Data System (ADS)
Li, Yane; Fan, Ming; Li, Lihua; Zheng, Bin
2017-03-01
This study proposed a near-term breast cancer risk assessment model based on local-region bilateral asymmetry features in mammography. The database includes 566 cases, each of whom underwent at least two sequential FFDM examinations. The "prior" examinations were all interpreted as negative (not recalled). In the "current" examination, 283 women were diagnosed with cancer and 283 remained negative. The ages of the cancer and negative cases were completely matched. The cases were divided into three subgroups according to age: 152 cases in the 37-49 age bracket, 220 cases in the 50-60 age bracket, and 194 cases in the 61-86 age bracket. For each image, two types of local regions, strip-based regions and difference-of-Gaussian basic element regions, were segmented. Structural variation features among pixel values and structural similarity features were then computed for the strip regions, and positional features were extracted for the basic element regions. The absolute difference was computed between each feature of the left and right local regions. Next, a multi-layer perceptron classifier was implemented to assess the performance of the features for prediction. Features were then selected by stepwise regression analysis. The AUC reached 0.72, 0.75 and 0.71 for the three age-based subgroups, respectively. The maximum adjusted odds ratios were 12.4, 20.56 and 4.91 for the three groups, respectively. This study demonstrates that local region-based bilateral asymmetry features extracted from CC-view mammography could provide useful information for predicting near-term breast cancer risk.
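The bilateral asymmetry step described above reduces to an element-wise absolute difference between matching left-breast and right-breast feature vectors. A minimal Python sketch, assuming the features have already been extracted; the function name and the numeric values are hypothetical:

```python
def bilateral_asymmetry(left: list[float], right: list[float]) -> list[float]:
    """Absolute left/right difference for each matched feature."""
    if len(left) != len(right):
        raise ValueError("left and right feature vectors must match in length")
    return [abs(l - r) for l, r in zip(left, right)]


# Hypothetical feature values for one case (not data from the study).
left_features = [0.82, 12.0, 3.5]
right_features = [0.79, 15.5, 3.5]
asymmetry = bilateral_asymmetry(left_features, right_features)
print(asymmetry)
```

The resulting vector, rather than the raw per-breast features, would then be the input to the classifier.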
Wang, Tsang-Hsiu; Chu, Hsing-Yu; Wang, I-Teng
2014-10-15
Methyl 1-benzyl-1H-1,2,3-triazole-4-carboxylate (C11H11N3O2) has been studied by theoretical methods. The structure of this compound is optimized at the density functional theory (DFT), second-order Møller-Plesset perturbation theory (MP2) and G3 theory (G3(MP2)) levels. Our calculated results are in very good agreement with experimental values. Compared to a perfect pentagonal structure, the geometry of C11H11N3O2 shows a slight distortion of the 1,2,3-triazole ring due to the high electronegativity of the substituent groups. In addition, the dipole moment and frontier molecular orbitals (FMOs) of C11H11N3O2 are calculated. Because of the solvent effect, the HOMO-LUMO energy gap in methanol is predicted to be smaller than in the gas phase by 0.367 eV. The simulated UV-vis spectra are investigated by time-dependent density functional theory (TD-DFT), and two distinct absorption features are predicted. These two absorption features are located between 170 nm and 210 nm, which is in the ultraviolet C range. Moreover, the UV absorption features in methanol are predicted to be more intense than in the gas phase, and a red shift is predicted in methanol as well. Copyright © 2014 Elsevier B.V. All rights reserved.
A Feature-based Developmental Model of the Infant Brain in Structural MRI
Toews, Matthew; Wells, William M.; Zöllei, Lilla
2014-01-01
In this paper, anatomical development is modeled as a collection of distinctive image patterns localized in space and time. A Bayesian posterior probability is defined over a random variable of subject age, conditioned on data in the form of scale-invariant image features. The model is automatically learned from a large set of images exhibiting significant variation, used to discover anatomical structure related to age and development, and fit to new images to predict age. The model is applied to a set of 230 infant structural MRIs of 92 subjects acquired at multiple sites over an age range of 8-590 days. Experiments demonstrate that the model can be used to identify age-related anatomical structure, and to predict the age of new subjects with an average error of 72 days. PMID:23286050
Judycka-Proma, U; Bober, L; Gajewicz, A; Puzyn, T; Błażejowski, J
2015-03-05
Forty ampholytic compounds of biological and pharmaceutical relevance were subjected to chemometric analysis based on unsupervised and supervised learning algorithms. This enabled relations to be found between empirical spectral characteristics derived from electronic absorption data and structural and physicochemical parameters predicted by quantum chemistry methods or phenomenological relationships based on additivity rules. It was found that the energies of long wavelength absorption bands are correlated through multiparametric linear relationships with parameters reflecting the bulkiness features of the absorbing molecules as well as their nucleophilicity and electrophilicity. These dependences enable the quantitative analysis of spectral features of the compounds, as well as a comparison of their similarities and certain pharmaceutical and biological features. Three QSPR models to predict the energies of long-wavelength absorption in buffers with pH=2.5 and pH=7.0, as well as in methanol, were developed and validated in this study. These models can be further used to predict the long-wavelength absorption energies of untested substances (if they are structurally similar to the training compounds). Copyright © 2014 Elsevier B.V. All rights reserved.
Hayat, Maqsood; Khan, Asifullah
2013-05-01
Membrane proteins are prime constituents of a cell and mediate between intra- and extracellular processes. The prediction of transmembrane (TM) helices and their topology provides essential information regarding the function and structure of membrane proteins. However, predicting TM helices and their topology is a challenging problem in bioinformatics and computational biology because of experimental complexities and the lack of established structures. Therefore, the location and orientation of TM helix segments are predicted from topogenic sequences. In this regard, we propose the WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using a compositional index and physicochemical properties. Redundant and irrelevant features are eliminated through singular value decomposition. The selected features provided by these feature extraction strategies are then fused to develop a hybrid model. Weighted random forest is adopted as the classification approach. We used two benchmark datasets, a low-resolution and a high-resolution dataset. Tenfold cross-validation is employed to assess the performance of the WRF-TMH model at different levels: per protein, per segment, and per residue. The success rates of the WRF-TMH model are quite promising and are the best reported so far on the same datasets. The WRF-TMH model might play a substantial role in, and provide essential information for, further structural and functional studies of membrane proteins. The accompanying web predictor is accessible at http://111.68.99.218/WRF-TMH/.
Combination of lateral and PA view radiographs to study development of knee OA and associated pain
NASA Astrophysics Data System (ADS)
Minciullo, Luca; Thomson, Jessie; Cootes, Timothy F.
2017-03-01
Knee Osteoarthritis (OA) is the most common form of arthritis, affecting millions of people around the world. The effects of the disease have been studied using the shape and texture features of bones in Posterior-Anterior (PA) and Lateral radiographs separately. In this work we compare the utility of features from each view, and evaluate whether combining features from both is advantageous. We built a fully automated system to independently locate landmark points in both radiographic images using Random Forest Constrained Local Models. We extracted discriminative features from the two bony outlines using Appearance Models. The features were used to train Random Forest classifiers to solve three specific tasks: (i) OA classification, distinguishing patients with structural signs of OA from the others; (ii) predicting future onset of the disease; and (iii) predicting which patients with no current pain will have a positive pain score at a later follow-up visit. Using a subset of the MOST dataset we show that the PA view has more discriminative features to classify and predict OA, while the lateral view contains features that achieve better performance in predicting pain, and that combining the features from both views gives a small improvement in classification accuracy compared to the individual views.
NASA Astrophysics Data System (ADS)
Pandremmenou, K.; Shahid, M.; Kondi, L. P.; Lövström, B.
2015-03-01
In this work, we propose a No-Reference (NR) bitstream-based model for predicting the quality of H.264/AVC video sequences affected by both compression artifacts and transmission impairments. The proposed model is based on a feature extraction procedure, where a large number of features are calculated from the packet-loss impaired bitstream. Many of the features are proposed for the first time in this work, and the specific set of features as a whole is applied for the first time to NR video quality prediction. All feature observations are taken as input to the Least Absolute Shrinkage and Selection Operator (LASSO) regression method. LASSO indicates the most important features, and using only them, it is possible to estimate the Mean Opinion Score (MOS) with high accuracy. For example, only 13 features are needed to produce a Pearson Correlation Coefficient of 0.92 with the MOS. Interestingly, the performance statistics we computed to assess our method for predicting the Structural Similarity Index and the Video Quality Metric are equally good. Thus, the experimental results verify the suitability of the features selected by LASSO as well as the ability of LASSO to make accurate predictions through sparse modeling.
Classification of ABO3 perovskite solids: a machine learning study
Pilania, G.; Balachandran, P. V.; Gubernatis, J. E.; ...
2015-07-23
Here we explored the use of machine learning methods for classifying whether a particular ABO3 chemistry forms a perovskite or non-perovskite structured solid. Starting with three sets of feature pairs (the tolerance and octahedral factors, the A and B ionic radii relative to the radius of O, and the bond valence distances between the A and B ions from the O atoms), we used machine learning to create a hyper-dimensional partial dependency structure plot using all three feature pairs or any two of them. Doing so increased the accuracy of our predictions by 2–3 percentage points over using any one pair. We also added the Mendeleev numbers of the A and B atoms to this set of feature pairs. Moreover, doing this and using the capabilities of our machine learning algorithm, the gradient tree boosting classifier, enabled us to generate a new type of structure plot that has the simplicity of one based on using just the Mendeleev numbers, but with the added advantages of having a higher accuracy and providing a measure of likelihood of the predicted structure.
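The first feature pair named above has simple closed forms. Assuming the usual definitions, the Goldschmidt tolerance factor is t = (r_A + r_O) / (√2 (r_B + r_O)) and the octahedral factor is μ = r_B / r_O. A minimal Python sketch; the function names are hypothetical, and the radii (in Å) are approximate illustrative inputs for an SrTiO3-like composition, not values from the paper:

```python
import math


def tolerance_factor(r_a: float, r_b: float, r_o: float) -> float:
    """Goldschmidt tolerance factor t = (r_A + r_O) / (sqrt(2) * (r_B + r_O))."""
    return (r_a + r_o) / (math.sqrt(2.0) * (r_b + r_o))


def octahedral_factor(r_b: float, r_o: float) -> float:
    """Octahedral factor mu = r_B / r_O."""
    return r_b / r_o


# Approximate ionic radii in angstroms, chosen for illustration only.
r_a, r_b, r_o = 1.44, 0.605, 1.40
t = tolerance_factor(r_a, r_b, r_o)
mu = octahedral_factor(r_b, r_o)
print(f"t = {t:.3f}, mu = {mu:.3f}")
```

A classifier such as gradient tree boosting would take pairs like (t, μ) as input features for each candidate composition.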
NASA Astrophysics Data System (ADS)
Jaenisch, Holger; Handley, James
2013-06-01
We introduce a generalized numerical prediction and forecasting algorithm. We have previously published it for malware byte sequence feature prediction and generalized distribution modeling for disparate test article analysis. We show how non-trivial non-periodic extrapolation of a numerical sequence (forecast and backcast) from the starting data is possible. Our ancestor-progeny prediction can yield new options for evolutionary programming. Our equations enable analytical integrals and derivatives to any order. Interpolation is controllable from smooth continuous to fractal structure estimation. We show how our generalized trigonometric polynomial can be derived using a Fourier transform.
Materials prediction via classification learning
Balachandran, Prasanna V.; Theiler, James; Rondinelli, James M.; ...
2015-08-25
In the paradigm of materials informatics for accelerated materials discovery, the choice of feature set (i.e. attributes that capture aspects of structure, chemistry and/or bonding) is critical. Ideally, the feature sets should provide a simple physical basis for extracting major structural and chemical trends and furthermore, enable rapid predictions of new material chemistries. Orbital radii calculated from model pseudopotential fits to spectroscopic data are potential candidates to satisfy these conditions. Although these radii (and their linear combinations) have been utilized in the past, their functional forms are largely justified with heuristic arguments. Here we show that machine learning methods naturally uncover the functional forms that mimic most frequently used features in the literature, thereby providing a mathematical basis for feature set construction without a priori assumptions. We apply these principles to study two broad materials classes: (i) wide band gap AB compounds and (ii) rare earth-main group RM intermetallics. The AB compounds serve as a prototypical example to demonstrate our approach, whereas the RM intermetallics show how these concepts can be used to rapidly design new ductile materials. In conclusion, our predictive models indicate that ScCo, ScIr, and YCd should be ductile, whereas each was previously proposed to be brittle.
Materials Prediction via Classification Learning
Balachandran, Prasanna V.; Theiler, James; Rondinelli, James M.; Lookman, Turab
2015-01-01
In the paradigm of materials informatics for accelerated materials discovery, the choice of feature set (i.e. attributes that capture aspects of structure, chemistry and/or bonding) is critical. Ideally, the feature sets should provide a simple physical basis for extracting major structural and chemical trends and furthermore, enable rapid predictions of new material chemistries. Orbital radii calculated from model pseudopotential fits to spectroscopic data are potential candidates to satisfy these conditions. Although these radii (and their linear combinations) have been utilized in the past, their functional forms are largely justified with heuristic arguments. Here we show that machine learning methods naturally uncover the functional forms that mimic most frequently used features in the literature, thereby providing a mathematical basis for feature set construction without a priori assumptions. We apply these principles to study two broad materials classes: (i) wide band gap AB compounds and (ii) rare earth-main group RM intermetallics. The AB compounds serve as a prototypical example to demonstrate our approach, whereas the RM intermetallics show how these concepts can be used to rapidly design new ductile materials. Our predictive models indicate that ScCo, ScIr, and YCd should be ductile, whereas each was previously proposed to be brittle. PMID:26304800
Barik, Amita; Das, Santasabuj
2018-01-02
Small RNAs (sRNAs) in bacteria have emerged as key players in transcriptional and post-transcriptional regulation of gene expression. Here, we present a statistical analysis of different sequence- and structure-related features of bacterial sRNAs to identify the descriptors that could discriminate sRNAs from other bacterial RNAs. We investigated a comprehensive and heterogeneous collection of 816 sRNAs, identified by northern blotting across 33 bacterial species and compared their various features with other classes of bacterial RNAs, such as tRNAs, rRNAs and mRNAs. We observed that sRNAs differed significantly from the rest with respect to G+C composition, normalized minimum free energy of folding, motif frequency and several RNA-folding parameters like base-pairing propensity, Shannon entropy and base-pair distance. Based on the selected features, we developed a predictive model using Random Forests (RF) method to classify the above four classes of RNAs. Our model displayed an overall predictive accuracy of 89.5%. These findings would help to differentiate bacterial sRNAs from other RNAs and further promote prediction of novel sRNAs in different bacterial species.
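G+C composition, one of the discriminating descriptors listed above, is the fraction of G and C bases in a sequence. A minimal Python sketch of that one descriptor; the function name and the example sequence are hypothetical, and the folding-based descriptors (minimum free energy, base-pairing propensity and so on) would need an RNA folding tool that is not shown here:

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the recognized bases (A, C, G, T, U)."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no recognized bases")
    return sum(b in "GC" for b in bases) / len(bases)


# Toy sequence for illustration only.
example = "GCGCAUUA"
gc = gc_fraction(example)
print(f"G+C fraction = {gc:.2f}")
```

Descriptors of this kind, computed per RNA, would form one column of the feature table passed to a classifier such as a random forest.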
Zhou, Jingyu; Tian, Shulin; Yang, Chenglin
2014-01-01
Little research has addressed prognostics for analog circuits. The few existing methods do not tie feature extraction and calculation to circuit analysis, so the fault indicator (FI) calculation often lacks rationality, which degrades prognostic performance. To solve this problem, this paper proposes a novel prediction method for single components of analog circuits based on complex field modeling. Because single-component faults are the most numerous faults in analog circuits, the method starts from the circuit structure, analyzes the transfer function of the circuit, and implements complex field modeling. Then, using an established parameter scanning model related to the complex field, it analyzes the relationship between parameter variation and the degeneration of single components in order to obtain a more reasonable FI feature set. From the obtained FI feature set, it establishes a novel model of the degeneration trend of single components in analog circuits. Finally, it uses a particle filter (PF) to update the model parameters and predicts the remaining useful performance (RUP) of the single components. Because the FI feature set is calculated more reasonably, prediction accuracy is improved to some extent. The foregoing conclusions are verified by experiments.
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys. PMID:25148528
Non-parametric adaptative JPEG fragments carving
NASA Astrophysics Data System (ADS)
Amrouche, Sabrina Cherifa; Salamani, Dalila
2018-04-01
The most challenging JPEG recovery tasks arise when the file header is missing. In this paper we propose a two-layer machine learning model to restore headerless JPEG images. We first build a classifier able to identify the structural properties of the images/fragments and then use an AutoEncoder (AE) to learn the fragment features for header prediction. We define a universal JPEG header, and the remaining free image parameters (Height, Width) are predicted with a Gradient Boosting Classifier. Our approach achieved 90% accuracy using the manually defined features and 78% accuracy using the AE features.
The calcium binding properties and structure prediction of the Hax-1 protein.
Balcerak, Anna; Rowinski, Sebastian; Szafron, Lukasz M; Grzybowska, Ewa A
2017-01-01
Hax-1 is a protein involved in regulation of different cellular processes, but its properties and exact mechanisms of action remain unknown. In this work, using purified, recombinant Hax-1 and by applying an in vitro autoradiography assay we have shown that this protein binds Ca²⁺. Additionally, we performed structure prediction analysis which shows that Hax-1 displays definitive structural features, such as two α-helices, short β-strands and four disordered segments.
Protein structure based prediction of catalytic residues.
Fajardo, J Eduardo; Fiser, Andras
2013-02-22
Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
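The sensitivity, precision and enrichment figures quoted in this abstract can be re-derived from the four counts it reports. The short check below is only a worked arithmetic sketch; the variable names are illustrative and not taken from the paper.

```python
# Counts reported in the abstract above.
total_residues = 9262       # residues in the 29 test structures
annotated_catalytic = 111   # annotated catalytic residues among them
returned = 411              # residues the method flagged as likely functional
true_positives = 70         # flagged residues that are annotated catalytic

sensitivity = true_positives / annotated_catalytic        # 70 / 111
precision = true_positives / returned                     # 70 / 411
background_rate = annotated_catalytic / total_residues    # 111 / 9262
enrichment = precision / background_rate                  # fold over random picking

print(f"sensitivity = {sensitivity:.1%}")
print(f"precision   = {precision:.1%}")
print(f"enrichment  = {enrichment:.1f}-fold")
```

Here enrichment is taken to mean the precision among returned residues divided by the background rate of catalytic residues in the whole test set, which reproduces the roughly 14-fold figure.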
Sun, Rongrong; Wang, Yuanyuan
2008-11-01
Predicting the spontaneous termination of atrial fibrillation (AF) can lead not only to a better understanding of the mechanisms of the arrhythmia but also to improved treatment of sustained AF. A novel method is proposed to characterize AF based on the structure and quantification of the recurrence plot (RP) in order to predict AF termination. The RP of the electrocardiogram (ECG) signal is first obtained and eleven features are extracted to characterize its three basic patterns. Then the sequential forward search (SFS) algorithm and the Davies-Bouldin criterion are used to select the feature subset that predicts AF termination most effectively. Finally, a multilayer perceptron (MLP) neural network is applied to predict AF termination. An AF database which includes one training set and two testing sets (A and B) of Holter ECG recordings is studied. Experimental results show that 97% of testing set A and 95% of testing set B are correctly classified. This demonstrates that the algorithm can effectively predict the spontaneous termination of AF.
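For readers unfamiliar with recurrence plots, the following minimal sketch shows how a binary recurrence matrix can be built from a one-dimensional signal and how one simple quantifier, the recurrence rate, is read from it. It is an illustration under assumptions: the threshold `eps`, the absence of time-delay embedding, the toy sine signal and the choice of recurrence rate are not taken from the paper, whose eleven features are not specified in the abstract.

```python
import math

def recurrence_matrix(signal, eps):
    """Return R with R[i][j] = 1 when |signal[i] - signal[j]| <= eps, else 0."""
    n = len(signal)
    return [[1 if abs(signal[i] - signal[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

def recurrence_rate(matrix):
    """Fraction of recurrent points in the plot (a basic RP quantifier)."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)

# Toy periodic signal standing in for an ECG segment (not real data).
signal = [math.sin(2 * math.pi * k / 8) for k in range(32)]
rp = recurrence_matrix(signal, eps=0.1)
print(recurrence_rate(rp))
```

A periodic signal produces diagonal line structures in the matrix; quantifiers of such structures are the kind of feature a classifier could then be trained on.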
Nano-textured high sensitivity ion sensitive field effect transistors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hajmirzaheydarali, M.; Sadeghipari, M.; Akbari, M.
2016-02-07
Nano-textured gate engineered ion sensitive field effect transistors (ISFETs), suitable for high sensitivity pH sensors, have been realized. Utilizing a mask-less deep reactive ion etching results in ultra-fine poly-Si features on the gate of ISFET devices where spacing of the order of 10 nm and less is achieved. Incorporation of these nano-sized features on the gate is responsible for high sensitivities up to 400 mV/pH in contrast to conventional planar structures. The fabrication process for this transistor is inexpensive, and it is fully compatible with standard complementary metal oxide semiconductor fabrication procedure. A theoretical modeling has also been presented to predict the extension of the diffuse layer into the electrolyte solution for highly featured structures and to correlate this extension with the high sensitivity of the device. The observed ultra-fine features by means of scanning electron microscopy and transmission electron microscopy tools corroborate the theoretical prediction.
Zhou, Wengang; Dickerson, Julie A
2012-01-01
Knowledge of protein subcellular locations can help decipher a protein's biological function. This work proposes three new features: one sequence-based, Hybrid Amino Acid Pair (HAAP), and two structure-based, Secondary Structural Element Composition (SSEC) and solvent accessibility state frequency. A multi-class Support Vector Machine is developed to predict the locations. Testing on two established data sets yields better prediction accuracies than the best available systems. Comparisons with existing methods show comparable results to ESLPred2. When StruLocPred is applied to the entire Arabidopsis proteome, over 77% of proteins with known locations match the prediction results. An implementation of this system is at http://wgzhou.ece.iastate.edu/StruLocPred/.
Nagarajan, Mahesh B.; De, Titas; Lochmüller, Eva-Maria; Eckstein, Felix; Wismüller, Axel
2017-01-01
The ability of Anisotropic Minkowski Functionals (AMFs) to capture local anisotropy while evaluating topological properties of the underlying gray-level structures has been previously demonstrated. We evaluate the ability of this approach to characterize local structure properties of trabecular bone micro-architecture in ex vivo proximal femur specimens, as visualized on multi-detector CT, for purposes of biomechanical bone strength prediction. To this end, volumetric AMFs were computed locally for each voxel of volumes of interest (VOI) extracted from the femoral head of 146 specimens. The local anisotropy captured by such AMFs was quantified using a fractional anisotropy measure; the magnitude and direction of anisotropy at every pixel were stored in histograms that served as feature vectors characterizing the VOIs. A linear multi-regression analysis algorithm was used to predict the failure load (FL) from the feature sets; the predicted FL was compared to the true FL determined through biomechanical testing. The prediction performance was measured by the root mean square error (RMSE) for each feature set. The best prediction performance was obtained from the fractional anisotropy histogram of AMF Euler Characteristic (RMSE = 1.01 ± 0.13), which was significantly better than MDCT-derived mean BMD (RMSE = 1.12 ± 0.16, p<0.05). We conclude that such anisotropic Minkowski Functionals can capture valuable information regarding regional trabecular bone quality and contribute to improved bone strength prediction, which is important for improving the clinical assessment of osteoporotic fracture risk. PMID:29170581
Protein single-model quality assessment by feature-based probability density functions.
Cao, Renzhi; Cheng, Jianlin
2016-04-04
Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method, Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob has been blindly tested in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob contributes to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.
Evaluation of 3D-Jury on CASP7 models.
Kaján, László; Rychlewski, Leszek
2007-08-21
3D-Jury, the structure prediction consensus method publicly available in the Meta Server http://meta.bioinfo.pl/, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers. The performance of 3D-Jury was analysed for three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is good enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models. The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, especially important for models that were not assigned such by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models in the new instant scoring feature http://meta.bioinfo.pl/compare_your_model_example.pl available in the Meta Server.
Evaluation of 3D-Jury on CASP7 models
Kaján, László; Rychlewski, Leszek
2007-01-01
Background: 3D-Jury, the structure prediction consensus method publicly available in the Meta Server, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers. Results: The performance of 3D-Jury was analysed for three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is good enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models. Conclusion: The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, especially important for models that were not assigned such by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models in the new instant scoring feature available in the Meta Server. PMID:17711571
Lattice-free prediction of three-dimensional structure of programmed DNA assemblies
Pan, Keyao; Kim, Do-Nyun; Zhang, Fei; Adendorff, Matthew R.; Yan, Hao; Bathe, Mark
2014-01-01
DNA can be programmed to self-assemble into high molecular weight 3D assemblies with precise nanometer-scale structural features. Although numerous sequence design strategies exist to realize these assemblies in solution, there is currently no computational framework to predict their 3D structures on the basis of programmed underlying multi-way junction topologies constrained by DNA duplexes. Here, we introduce such an approach and apply it to assemblies designed using the canonical immobile four-way junction. The procedure is used to predict the 3D structure of high molecular weight planar and spherical ring-like origami objects, a tile-based sheet-like ribbon, and a 3D crystalline tensegrity motif, in quantitative agreement with experiments. Our framework provides a new approach to predict programmed nucleic acid 3D structure on the basis of prescribed secondary structure motifs, with possible application to the design of such assemblies for use in biomolecular and materials science. PMID:25470497
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.
Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E
2018-04-25
Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Ding, Feng; Sharma, Shantanu; Chalasani, Poornima; Demidov, Vadim V.; Broude, Natalia E.; Dokholyan, Nikolay V.
2008-01-01
RNA molecules with novel functions have revived interest in the accurate prediction of RNA three-dimensional (3D) structure and folding dynamics. However, existing methods are inefficient in automated 3D structure prediction. Here, we report a robust computational approach for rapid folding of RNA molecules. We develop a simplified RNA model for discrete molecular dynamics (DMD) simulations, incorporating base-pairing and base-stacking interactions. We demonstrate correct folding of 150 structurally diverse RNA sequences. The majority of DMD-predicted 3D structures have <4 Å deviations from experimental structures. The secondary structures corresponding to the predicted 3D structures consist of 94% native base-pair interactions. Folding thermodynamics and kinetics of tRNAPhe, pseudoknots, and mRNA fragments in DMD simulations are in agreement with previous experimental findings. Folding of RNA molecules features transient, non-native conformations, suggesting non-hierarchical RNA folding. Our method allows rapid conformational sampling of RNA folding, with computational time increasing linearly with RNA length. We envision this approach as a promising tool for RNA structural and functional analyses. PMID:18456842
Tensor-driven extraction of developmental features from varying paediatric EEG datasets.
Kinney-Lang, Eli; Spyrou, Loukianos; Ebied, Ahmed; Chin, Richard Fm; Escudero, Javier
2018-05-21
Constant changes in developing children's brains can pose a challenge in EEG-dependent technologies. Advancing signal processing methods to identify developmental differences in paediatric populations could help improve function and usability of such technologies. Taking advantage of the multi-dimensional structure of EEG data through tensor analysis may offer a framework for extracting relevant developmental features of paediatric datasets. A proof of concept is demonstrated through identifying latent developmental features in resting-state EEG. Approach. Three paediatric datasets (n = 50, 17, 44) were analyzed using a two-step constrained parallel factor (PARAFAC) tensor decomposition. Subject age was used as a proxy measure of development. Classification used support vector machines (SVM) to test if PARAFAC-identified features could predict subject age. The results were cross-validated within each dataset. Classification analysis was complemented by visualization of the high-dimensional feature structures using t-distributed Stochastic Neighbour Embedding (t-SNE) maps. Main Results. Development-related features were successfully identified for the developmental conditions of each dataset. SVM classification showed the identified features could accurately predict subject age at a level significantly above chance for both healthy and impaired populations. t-SNE maps revealed suitable tensor factorization was key in extracting the developmental features. Significance. The described methods are a promising tool for identifying latent developmental features occurring throughout childhood EEG. © 2018 IOP Publishing Ltd.
Predicting film genres with implicit ideals.
Olney, Andrew McGregor
2012-01-01
We present a new approach to defining film genre based on implicit ideals. When viewers rate the likability of a film, they indirectly express their ideal of what a film should be. Across six studies we investigate the category structure that emerges from likability ratings and the category structure that emerges from the features of film. We further compare these data-driven category structures with human-annotated film genres. We conclude that film genres are structured more around ideals than around features of film. This finding lends experimental support to the notion that film genres are a set of shifting, fuzzy, and highly contextualized psychological categories.
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
NASA Astrophysics Data System (ADS)
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
Linear discriminant analysis (LDA) is one of the popular means of linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used approaches to feature extraction usually require linearity, independence, or large-sample conditions. However, in real-world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA has the advantages of both parametric and nonparametric algorithms and higher classification accuracy. The quartic unilateral kernel function may provide more robust prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. Finally, the application of LKNDA to complex feature extraction of financial market activities is proposed.
Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I
2018-04-14
Determining the catalytic residues in an enzyme is critical to understanding the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.
Copyright © 2018 Elsevier Ltd. All rights reserved.
Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu
2012-12-01
Membrane proteins are encoded by about 30% of the genome and play important roles in living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, given the extreme difficulty of wet-lab studies of membrane proteins, it is highly desirable to predict a membrane protein's subcellular location from its primary sequence. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply concatenated to form a super feature vector. However, such simple combination of features also increases information redundancy, which can in turn degrade the final prediction accuracy. That is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. All the experimental results through different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems.
The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
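The contrast between serial and parallel fusion drawn in this abstract can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it assumes that "parallel" fusion means packing two real feature vectors into the real and imaginary parts of one complex vector, and the zero-padding of the shorter vector, the function names and the toy values are all hypothetical.

```python
def serial_fusion(a, b):
    """Concatenate two feature vectors into one longer real-valued vector."""
    return list(a) + list(b)

def parallel_fusion(a, b):
    """Combine two feature vectors into one complex vector a + i*b.

    The shorter vector is zero-padded (an assumption of this sketch).
    """
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [complex(x, y) for x, y in zip(a, b)]

view_a = [0.2, 0.5, 0.1]   # toy features from one view
view_b = [1.0, 0.3]        # toy features from a second view

print(serial_fusion(view_a, view_b))    # dimension grows to len(a) + len(b)
print(parallel_fusion(view_a, view_b))  # dimension stays at max(len(a), len(b))
```

The point of the comparison is dimensionality: serial fusion adds the lengths of the views, while the complex representation keeps the length of the longer view, which is where a complex-space reduction step such as the GPCA mentioned above would then operate.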
REVIEWS OF TOPICAL PROBLEMS: Prediction and discovery of new structures in spiral galaxies
NASA Astrophysics Data System (ADS)
Fridman, Aleksei M.
2007-02-01
A review is given of the last 20 years of published research into the nature, origin mechanisms, and observed features of spiral-vortex structures found in galaxies. The so-called rotating shallow water experiments are briefly discussed, carried out with a facility designed by the present author and built at the Russian Scientific Center 'Kurchatov Institute' to model the origin of galactic spiral structures. The discovery of new vortex-anticyclone structures in these experiments stimulated searching for them astronomically using the RAS Special Astrophysical Observatory's 6-meter BTA optical telescope, formerly the world's and now Europe's largest. Seven years after the pioneering experiments, Afanasyev and the present author discovered the predicted giant anticyclones in the galaxy Mrk 1040 by using BTA. Somewhat later, the theoretical prediction of giant cyclones in spiral galaxies was made, also to be verified by BTA afterwards. To use the observed line-of-sight velocity field for reconstructing the 3D velocity vector distribution in a galactic disk, a method for solving a problem from the class of ill-posed astrophysical problems was developed by the present author and colleagues. In addition to the vortex structure, other new features were discovered — in particular, slow bars (another theoretical prediction), for whose discovery an observational test capable of distinguishing them from their earlier-studied normal (fast) counterparts was designed.
NASA Astrophysics Data System (ADS)
Christopher, Mark; Tang, Li; Fingert, John H.; Scheetz, Todd E.; Abramoff, Michael D.
2014-03-01
Evaluation of optic nerve head (ONH) structure is a commonly used clinical technique for both diagnosis and monitoring of glaucoma. Glaucoma is associated with characteristic changes in the structure of the ONH. We present a method for computationally identifying ONH structural features using both imaging and genetic data from a large cohort of participants at risk for primary open angle glaucoma (POAG). Using 1054 participants from the Ocular Hypertension Treatment Study, ONH structure was measured by application of a stereo correspondence algorithm to stereo fundus images. In addition, the genotypes of several known POAG genetic risk factors were considered for each participant. ONH structural features were discovered using both a principal component analysis approach to identify the major modes of variance within structural measurements and a linear discriminant analysis approach to capture the relationship between genetic risk factors and ONH structure. The identified ONH structural features were evaluated based on the strength of their associations with genotype and development of POAG by the end of the OHTS study. ONH structural features with strong associations with genotype were identified for each of the genetic loci considered. Several identified ONH structural features were significantly associated (p < 0.05) with the development of POAG after Bonferroni correction. Further, incorporation of genetic risk status was found to substantially increase performance of early POAG prediction. These results suggest incorporating both imaging and genetic data into ONH structural modeling significantly improves the ability to explain POAG-related changes to ONH structure.
Odor Impression Prediction from Mass Spectra.
Nozaki, Yuji; Nakamoto, Takamichi
2016-01-01
The sense of smell arises from the perception of odors from chemicals. However, the relationship between the impression of odor and the numerous physicochemical parameters has yet to be understood owing to its complexity. As such, there is no established general method for predicting the impression of odor of a chemical only from its physicochemical properties. In this study, we designed a novel predictive model based on an artificial neural network with a deep structure for predicting odor impression utilizing the mass spectra of chemicals, and we conducted a series of computational analyses to evaluate its performance. Feature vectors extracted from the original high-dimensional space using two autoencoders equipped with both input and output layers in the model are used to build a mapping function from the feature space of mass spectra to the feature space of sensory data. The results of predictions obtained by the proposed new method have notable accuracy (R≅0.76) in comparison with a conventional method (R≅0.61).
Huang, Ying; Chen, Shi-Yi; Deng, Feilong
2016-01-01
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values used to describe sequence features and the algorithm used to train the discriminant function; various bioinformatic tools employ different combinations of the two. Herein, we briefly review the well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
A Novel Prediction Method about Single Components of Analog Circuits Based on Complex Field Modeling
Tian, Shulin; Yang, Chenglin
2014-01-01
Few studies have addressed prediction for analog circuits. The few existing methods do not tie feature extraction and calculation to circuit analysis, so the fault indicator (FI) calculation often lacks rationality, which degrades prognostic performance. To solve this problem, this paper proposes a novel prediction method for single components of analog circuits based on complex field modeling. Because single-component faults are the most numerous faults in analog circuits, the method starts from the circuit structure, analyzes the transfer function of the circuit, and implements complex field modeling. Then, using an established parameter scanning model related to the complex field, it analyzes the relationship between parameter variation and degeneration of single components in the model in order to obtain a more reasonable FI feature set via calculation. From the obtained FI feature set, it establishes a novel model of the degeneration trend of single components of analog circuits. Finally, it uses a particle filter (PF) to update the model parameters and predicts the remaining useful performance (RUP) of single components of analog circuits. Since the calculation of the FI feature set is more reasonable, prediction accuracy is improved to some extent. The foregoing conclusions are verified by experiments. PMID:25147853
Molecular Docking for Prediction and Interpretation of Adverse Drug Reactions.
Luo, Heng; Fokoue-Nkoutche, Achille; Singh, Nalini; Yang, Lun; Hu, Jianying; Zhang, Ping
2018-05-23
Adverse drug reactions (ADRs) present a major burden for patients and the healthcare industry. Various computational methods have been developed to predict ADRs for drug molecules. However, many of these methods require experimental or surveillance data and cannot be used when only structural information is available. We collected 1,231 small molecule drugs and 600 human proteins and utilized molecular docking to generate binding features among them. We developed machine learning models that use these docking features to make predictions for 1,533 ADRs. These models obtain an overall area under the receiver operating characteristic curve (AUROC) of 0.843 and an overall area under the precision-recall curve (AUPR) of 0.395, outperforming seven structural fingerprint-based prediction models. Using the method, we predicted skin striae for fluticasone propionate, dermatitis acneiform for mometasone, and decreased libido for irinotecan, as demonstrations. Furthermore, we analyzed the top binding proteins associated with some of the ADRs, which can help to understand and/or generate hypotheses for underlying mechanisms of ADRs. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Graph wavelet alignment kernels for drug virtual screening.
Smalter, Aaron; Huan, Jun; Lushington, Gerald
2009-06-01
In this paper, we introduce a novel statistical modeling technique for target property prediction, with applications to virtual screening and drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to summarize features capturing graph local topology. We design a novel graph kernel function to utilize the topology features to build predictive models for chemicals via Support Vector Machine classifier. We call the new graph kernel a graph wavelet-alignment kernel. We have evaluated the efficacy of the wavelet-alignment kernel using a set of chemical structure-activity prediction benchmarks. Our results indicate that the use of the kernel function yields performance profiles comparable to, and sometimes exceeding that of the existing state-of-the-art chemical classification approaches. In addition, our results also show that the use of wavelet functions significantly decreases the computational costs for graph kernel computation with more than ten fold speedup.
A computational method for predicting regulation of human microRNAs on the influenza virus genome
2013-01-01
Background While it has been suggested that host microRNAs (miRNAs) may downregulate viral gene expression as an antiviral defense mechanism, such a mechanism has not been explored in the influenza virus for human flu studies. As it is difficult to conduct related experiments on humans, computational studies can provide some insight. Although many computational tools have been designed for miRNA target prediction, there is a need for cross-species prediction, especially for predicting viral targets of human miRNAs. However, finding putative human miRNAs targeting influenza virus genome is still challenging. Results We developed machine-learning features and conducted comprehensive data training for predicting interactions between H1N1 genome segments and host miRNA. We defined our seed region as the first ten nucleotides from the 5' end of the miRNA to the 3' end of the miRNA and integrated various features including the number of consecutive matching bases in the seed region of 10 bases, a triplet feature in seed regions, thermodynamic energy, penalty of bulges and wobbles at binding sites, and the secondary structure of viral RNA for the prediction. Conclusions Compared to general predictive models, our model fully takes into account the conservation patterns and features of viral RNA secondary structures, and greatly improves the prediction accuracy. Our model identified some key miRNAs including hsa-miR-489, hsa-miR-325, hsa-miR-876-3p and hsa-miR-2117, which target HA, PB2, MP and NS of H1N1, respectively. Our study provided an interesting hypothesis concerning the miRNA-based antiviral defense mechanism against influenza virus in human, i.e., the binding between human miRNA and viral RNAs may not result in gene silencing but rather may block the viral RNA replication. PMID:24565017
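As a rough illustration of the seed-region feature described in the abstract above, the sketch below counts the longest run of consecutive Watson-Crick pairs between the first ten miRNA nucleotides and a candidate target site. The function name, the choice to treat G-U wobbles as run-breaking mismatches, and the assumption that the site is supplied already aligned in 3'->5' orientation are illustrative and are not taken from the paper.

```python
# Hypothetical helper for illustration; the paper does not publish this routine.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_seed_match(mirna, site, seed_len=10):
    """Longest run of consecutive Watson-Crick pairs within the seed.

    mirna: miRNA sequence written 5'->3'.
    site:  target site written 3'->5', so site[i] faces mirna[i].
    """
    best = run = 0
    for m, t in zip(mirna[:seed_len].upper(), site[:seed_len].upper()):
        if (m, t) in WATSON_CRICK:
            run += 1
            best = max(best, run)
        else:
            run = 0  # mismatch or G-U wobble breaks the run (assumption)
    return best
```

A real feature set would score wobbles and bulges with separate penalties, as the abstract indicates; this sketch covers only the consecutive-match count.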
Masso, Majid; Vaisman, Iosif I
2014-01-01
The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Three of the predictors evaluate changes to protein stability upon mutation, each complementing a distinct experimental approach. Two additional classifiers are available, one for predicting activity changes due to residue replacements and the other for determining the disease potential of mutations associated with nonsynonymous single nucleotide polymorphisms (nsSNPs) in human proteins. These five command-line driven tools, as well as all the supporting programs, complement those that run our AUTO-MUTE web-based server. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run "big data" batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models.
Mohebbi, Maryam; Ghassemian, Hassan; Asl, Babak Mohammadzadeh
2011-05-01
This paper aims to propose an effective paroxysmal atrial fibrillation (PAF) predictor which is based on the analysis of the heart rate variability (HRV) signal. Predicting the onset of PAF, based on non-invasive techniques, is clinically important and can be invaluable in order to avoid useless therapeutic interventions and to minimize the risks for the patients. This method consists of four steps: preprocessing, feature extraction, feature reduction, and classification. In the first step, the QRS complexes are detected from the electrocardiogram (ECG) signal and then the HRV signal is extracted. In the next step, the recurrence plot (RP) of the HRV signal is obtained and six features are extracted to characterize the basic patterns of the RP. These features consist of length of longest diagonal segments, average length of the diagonal lines, entropy, trapping time, length of longest vertical line, and recurrence trend. In the third step, these features are reduced to three features by the linear discriminant analysis (LDA) technique. Using LDA not only reduces the number of the input features, but also increases the classification accuracy by selecting the most discriminating features. Finally, a support vector machine-based classifier is used to classify the HRV signals. The performance of the proposed method in prediction of PAF episodes was evaluated using the Atrial Fibrillation Prediction Database, which consists of both 30-minute ECG recordings that end just prior to the onset of PAF and segments at least 45 min distant from any PAF events. The obtained sensitivity, specificity, and positive predictivity were 96.55%, 100%, and 100%, respectively.
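To make the recurrence-plot step in the abstract above more concrete, here is a minimal sketch that builds a binary recurrence matrix from a scalar series and extracts one of the six listed features, the length of the longest diagonal line. It assumes no time-delay embedding and a fixed threshold `eps`; both are simplifications chosen for illustration, and the function names are hypothetical rather than the authors' implementation.

```python
def recurrence_matrix(series, eps):
    """Binary recurrence plot: 1 where two samples lie within eps of each other."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def longest_diagonal(rp):
    """Length of the longest diagonal line, excluding the main diagonal."""
    n = len(rp)
    best = 0
    for k in range(1, n):  # the matrix is symmetric, so upper diagonals suffice
        run = 0
        for i in range(n - k):
            if rp[i][i + k]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best
```

For example, a strictly alternating series such as `[0, 1, 0, 1, 0, 1]` recurs every two samples, so its longest off-diagonal line has length 4.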
Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya
2012-01-01
Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565
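For readers unfamiliar with torsion angles, the following sketch computes the dihedral angle defined by four points in 3D using the standard atan2 formulation. It is a generic geometry routine, not part of TANGLE; the function name and tuple-based coordinates are illustrative assumptions. Applied to backbone atoms, Phi would use C(i-1), N, Cα, C and Psi would use N, Cα, C, N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Because torsion angles are periodic, an error metric such as the MAE quoted above is normally computed on the wrapped difference between predicted and observed angles; that step is not shown here.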
Designing and benchmarking the MULTICOM protein structure prediction system
2013-01-01
Background Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819
Exploiting Information Diffusion Feature for Link Prediction in Sina Weibo
NASA Astrophysics Data System (ADS)
Li, Dong; Zhang, Yongchao; Xu, Zhiming; Chu, Dianhui; Li, Sheng
2016-01-01
The rapid development of online social networks (e.g., Twitter and Facebook) has promoted research related to social networks in which link prediction is a key problem. Although numerous attempts have been made for link prediction based on network structure, node attribute and so on, few of the current studies have considered the impact of information diffusion on link creation and prediction. This paper mainly addresses Sina Weibo, which is the largest microblog platform with Chinese characteristics, and proposes the hypothesis that information diffusion influences link creation and verifies the hypothesis based on real data analysis. We also detect an important feature from the information diffusion process, which is used to promote link prediction performance. Finally, the experimental results on Sina Weibo dataset have demonstrated the effectiveness of our methods.
2011-01-01
Background Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction is increasingly important for understanding the essence of protein interactions and for helping narrow down the search space for drug design. Many computational methods have been developed by proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes excluded), however, there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes.
Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots. PMID:21798070
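The precision, recall, and F1 figures quoted in the abstract above follow the usual binary-classification definitions. The sketch below shows how they are derived from predicted and true labels; the function name and the convention of returning 0.0 when a denominator is zero are assumptions made for this example, and the labels in the usage note are invented for illustration only.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = hot spot, 0 = non-hot spot)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # zero-division convention (assumption)
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With `y_true = [1, 1, 1, 0, 0, 0]` and `y_pred = [1, 1, 0, 1, 0, 0]` there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3.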
Prediction of distal residue participation in enzyme catalysis
Brodkin, Heather R; DeLateur, Nicholas A; Somarowthu, Srinivas; Mills, Caitlyn L; Novak, Walter R; Beuning, Penny J; Ringe, Dagmar; Ondrechen, Mary Jo
2015-01-01
A scoring method for the prediction of catalytically important residues in enzyme structures is presented and used to examine the participation of distal residues in enzyme catalysis. Scores are based on the Partial Order Optimum Likelihood (POOL) machine learning method, using computed electrostatic properties, surface geometric features, and information obtained from the phylogenetic tree as input features. Predictions of distal residue participation in catalysis are compared with experimental kinetics data from the literature on variants of the featured enzymes; some additional kinetics measurements are reported for variants of Pseudomonas putida nitrile hydratase (ppNH) and for Escherichia coli alkaline phosphatase (AP). The multilayer active sites of P. putida nitrile hydratase and of human phosphoglucose isomerase are predicted by the POOL log ZP scores, as is the single-layer active site of P. putida ketosteroid isomerase. The log ZP score cutoff utilized here results in over-prediction of distal residue involvement in E. coli alkaline phosphatase. While fewer experimental data points are available for P. putida mandelate racemase and for human carbonic anhydrase II, the POOL log ZP scores properly predict the previously reported participation of distal residues. PMID:25627867
Soil-pipe interaction modeling for pipe behavior prediction with super learning based methods
NASA Astrophysics Data System (ADS)
Shi, Fang; Peng, Xiang; Liu, Huan; Hu, Yafei; Liu, Zheng; Li, Eric
2018-03-01
Underground pipelines are subject to severe distress from the surrounding expansive soil. To investigate the structural response of water mains to varying soil movements, field data, including pipe wall strains, in situ soil water content, soil pressure and temperature, were collected. Research on monitoring data analysis has been reported, but the relationship between soil properties and pipe deformation has not been well interpreted. To characterize the relationship between soil properties and pipe deformation, this paper presents a super learning based approach combined with feature selection algorithms to predict the structural behavior of water mains in different soil environments. Furthermore, an automatic variable selection method, i.e., the recursive feature elimination algorithm, was used to identify the critical predictors contributing to the pipe deformations. To investigate the adaptability of super learning to different predictive models, this research applied super learning based methods to three different datasets. The predictive performance was evaluated by R-squared, root-mean-square error and mean absolute error. Based on the prediction performance evaluation, the superiority of super learning was validated and demonstrated by predicting three types of pipe deformations accurately. In addition, a comprehensive understanding of the working environments of water mains becomes possible.
The expanding universe of thiolated gold nanoclusters and beyond.
Jiang, De-en
2013-08-21
Thiolated gold nanoclusters form a universe of their own. Researchers in this field are constantly pushing the boundary of this universe by identifying new compositions and, in a few "lucky" cases, solving their structures. Such solved structures, even if there are only a few, provide important hints for predicting the many identified compositions that are yet to be crystallized or structurally determined. Structure prediction is the most pressing issue for a computational chemist in this field. The success of the density functional theory method in gauging the energetic ordering of isomers for thiolated gold clusters has been truly remarkable, but predicting the most stable structure for a given composition remains a great challenge. In this feature article from a computational chemist's point of view, the author shows how one understands and predicts structures for thiolated gold nanoclusters based on his old and new results. To further entertain the reader, the author also offers several "imaginative" structures, claims, and challenges for this field.
Zhou, Peng; Wang, Congcong; Tian, Feifei; Ren, Yanrong; Yang, Chao; Huang, Jian
2013-01-01
Quantitative structure-activity relationship (QSAR), a regression modeling methodology that establishes statistical correlation between structure feature and apparent behavior for a series of congeneric molecules quantitatively, has been widely used to evaluate the activity, toxicity and property of various small-molecule compounds such as drugs, toxicants and surfactants. However, it is surprising to see that such useful technique has only very limited applications to biomacromolecules, albeit the solved 3D atom-resolution structures of proteins, nucleic acids and their complexes have accumulated rapidly in past decades. Here, we present a proof-of-concept paradigm for the modeling, prediction and interpretation of the binding affinity of 144 sequence-nonredundant, structure-available and affinity-known protein complexes (Kastritis et al. Protein Sci 20:482-491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to or even better than that of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, while BioQSAR possesses certain additional features compared to the traditional methods, such as adaptability, interpretability, deep-validation and high-efficiency. The BioQSAR scheme could be readily modified to infer the biological behavior and functions of other biomacromolecules, if their X-ray crystal structures, NMR conformation assemblies or computationally modeled structures are available.
Abriata, Luciano A; Kinch, Lisa N; Tamò, Giorgio E; Monastyrskyy, Bohdan; Kryshtafovych, Andriy; Dal Peraro, Matteo
2018-03-01
For assessment purposes, CASP targets are split into evaluation units. We herein present the official definition of CASP12 evaluation units (EUs) and their classification into difficulty categories. Each target can be evaluated as one EU (the whole target) or/and several EUs (separate structural domains or groups of structural domains). The specific scenario for a target split is determined by the domain organization of available templates, the difference in server performance on separate domains versus combination of the domains, and visual inspection. In the end, 71 targets were split into 96 EUs. Classification of the EUs into difficulty categories was done semi-automatically with the assistance of metrics provided by the Prediction Center. These metrics account for sequence and structural similarities of the EUs to potential structural templates from the Protein Data Bank, and for the baseline performance of automated server predictions. The metrics readily separate the 96 EUs into 38 EUs that should be straightforward for template-based modeling (TBM) and 39 that are expected to be hard for homology modeling and are thus left for free modeling (FM). The remaining 19 borderline evaluation units were dubbed FM/TBM, and were inspected case by case. The article also overviews structural and evolutionary features of selected targets relevant to our accompanying article presenting the assessment of FM and FM/TBM predictions, and overviews structural features of the hardest evaluation units from the FM category. We finally suggest improvements for the EU definition and classification procedures. © 2017 Wiley Periodicals, Inc.
Sound transmission through lightweight double-leaf partitions: theoretical modelling
NASA Astrophysics Data System (ADS)
Wang, J.; Lu, T. J.; Woodhouse, J.; Langley, R. S.; Evans, J.
2005-09-01
This paper presents theoretical modelling of the sound transmission loss through double-leaf lightweight partitions stiffened with periodically placed studs. First, by assuming that the effect of the studs can be replaced with elastic springs uniformly distributed between the sheathing panels, a simple smeared model is established. Second, periodic structure theory is used to develop a more accurate model taking account of the discrete placing of the studs. Both models treat incident sound waves in the horizontal plane only, for simplicity. The predictions of the two models are compared, to reveal the physical mechanisms determining sound transmission. The smeared model predicts relatively simple behaviour, in which the only conspicuous features are associated with coincidence effects with the two types of structural wave allowed by the partition model, and internal resonances of the air between the panels. In the periodic model, many more features are evident, associated with the structure of pass- and stop-bands for structural waves in the partition. The models are used to explain the effects of incidence angle and of the various system parameters. The predictions are compared with existing test data for steel plates with wooden stiffeners, and good agreement is obtained.
The RNA Newton polytope and learnability of energy parameters.
Forouzmand, Elmirasadat; Chitsaz, Hamidreza
2013-07-01
Computational RNA structure prediction is a mature and important problem that has received a new wave of attention with the discovery of regulatory non-coding RNAs and the advent of high-throughput transcriptome sequencing. Despite nearly two score years of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of the state-of-the-art algorithms is still far from satisfactory. So far, researchers have proposed increasingly complex energy models and improved parameter estimation methods, experimental and/or computational, in anticipation of endowing their methods with enough power to solve the problem. The output has disappointingly been only modest improvements, not matching the expectations. Even recent massively featured machine learning approaches were not able to break the barrier. Why is that? The first step toward high-accuracy structure prediction is to pick an energy model that is inherently capable of predicting each and every one of the known structures to date. In this article, we introduce the notion of learnability of the parameters of an energy model as a measure of such an inherent capability. We say that the parameters of an energy model are learnable iff there exists at least one set of such parameters that renders every known RNA structure to date the minimum free energy structure. We derive a necessary condition for the learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. To the best of our knowledge, this is the first approach toward computing the RNA Newton polytope and a systematic assessment of the inherent capabilities of an energy model. The worst case complexity of our algorithm is exponential in the number of features.
However, dimensionality reduction techniques can provide approximate solutions to avoid the curse of dimensionality. We demonstrated the application of our theory to a simple energy model consisting of a weighted count of A-U, C-G and G-U base pairs. Our results show that this simple energy model satisfies the necessary condition for more than half of the input unpseudoknotted sequence-structure pairs (55%) chosen from the RNA STRAND v2.0 database and severely violates the condition for ~ 13%, which provide a set of hard cases that require further investigation. From 1350 RNA strands, the observed 3D feature vector for 749 strands is on the surface of the computed polytope. For 289 RNA strands, the observed feature vector is not on the boundary of the polytope but its distance from the boundary is not more than one. A distance of one essentially means one base pair difference between the observed structure and the closest point on the boundary of the polytope, which need not be the feature vector of a structure. For 171 sequences, this distance is larger than two, and for only 11 sequences, this distance is larger than five. The source code is available on http://compbio.cs.wayne.edu/software/rna-newton-polytope.
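The simple energy model mentioned in the abstract above maps each structure to a 3D feature vector of A-U, C-G and G-U base-pair counts and scores it with a weighted sum. The sketch below computes that vector from a sequence and an unpseudoknotted dot-bracket string. The function names, the dot-bracket input format, and the example weights are assumptions for illustration; this is not the authors' polytope algorithm, only the feature map it operates on.

```python
def pair_counts(seq, dotbracket):
    """Return (A-U, C-G, G-U) base-pair counts for an unpseudoknotted structure."""
    counts = {"AU": 0, "CG": 0, "GU": 0}
    stack = []
    for i, ch in enumerate(dotbracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()  # index of the matching opening bracket
            key = "".join(sorted((seq[j].upper(), seq[i].upper())))
            if key in counts:  # non-canonical pairs are ignored (assumption)
                counts[key] += 1
    return (counts["AU"], counts["CG"], counts["GU"])


def energy(features, weights):
    """Weighted count of base pairs, i.e. the dot product of features and weights."""
    return sum(f * w for f, w in zip(features, weights))
```

For the sequence `GGGAAAUCC` with structure `(((...)))`, the pairs are G-C, G-C and G-U, so the feature vector is `(0, 2, 1)`. With hypothetical weights `(-2, -3, -1)` the energy is -7. In the paper's terms, the observed structure can be a minimum free energy structure for some weights only if its feature vector lies on the boundary of the convex hull of all feasible feature vectors.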
STRUM: structure-based prediction of protein stability changes upon single-point mutation.
Quan, Lijun; Lv, Qiang; Zhang, Yang
2016-10-01
Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. 
These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. http://zhanglab.ccmb.med.umich.edu/STRUM/ CONTACT: qiang@suda.edu.cn and zhng@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
STRUM: structure-based prediction of protein stability changes upon single-point mutation
Quan, Lijun; Lv, Qiang; Zhang, Yang
2016-01-01
Motivation: Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. Results: We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. 
These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability and Implementation: http://zhanglab.ccmb.med.umich.edu/STRUM/ Contact: qiang@suda.edu.cn and zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318206
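The two evaluation figures quoted in this abstract, the Pearson correlation coefficient (PCC) and the root-mean-square error (RMSE) between predicted and measured ΔΔG, can be computed as in the minimal sketch below. The function name and the sample values are illustrative assumptions, not data or code from STRUM.

```python
import math

def pearson_and_rmse(predicted, measured):
    """Return (PCC, RMSE) for two equal-length sequences of ddG values."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel in the PCC).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    pcc = cov / math.sqrt(var_p * var_m)
    rmse = math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)
    return pcc, rmse

# Made-up illustrative values in kcal/mol, NOT data from the study.
predicted = [-1.0, 0.5, -2.0, 0.0]
measured = [-1.5, 0.0, -2.5, 0.5]
pcc, rmse = pearson_and_rmse(predicted, measured)
print(f"PCC={pcc:.2f} RMSE={rmse:.2f} kcal/mol")
```

A high PCC with a non-trivial RMSE, as in this toy input, shows why the abstract reports both: correlation measures ranking agreement, while RMSE measures the absolute size of the errors.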
Predicting Film Genres with Implicit Ideals
Olney, Andrew McGregor
2013-01-01
We present a new approach to defining film genre based on implicit ideals. When viewers rate the likability of a film, they indirectly express their ideal of what a film should be. Across six studies we investigate the category structure that emerges from likability ratings and the category structure that emerges from the features of film. We further compare these data-driven category structures with human-annotated film genres. We conclude that film genres are structured more around ideals than around features of film. This finding lends experimental support to the notion that film genres are a set of shifting, fuzzy, and highly contextualized psychological categories. PMID:23423823
Computational modeling of membrane proteins
Leman, Julia Koehler; Ulmschneider, Martin B.; Gray, Jeffrey J.
2014-01-01
The determination of membrane protein (MP) structures has always trailed that of soluble proteins due to difficulties in their overexpression, reconstitution into membrane mimetics, and subsequent structure determination. The percentage of MP structures in the protein databank (PDB) has been at a constant 1-2% for the last decade. In contrast, over half of all drugs target MPs, only highlighting how little we understand about drug-specific effects in the human body. To reduce this gap, researchers have attempted to predict structural features of MPs even before the first structure was experimentally elucidated. In this review, we present current computational methods to predict MP structure, starting with secondary structure prediction, prediction of trans-membrane spans, and topology. Even though these methods generate reliable predictions, challenges such as predicting kinks or precise beginnings and ends of secondary structure elements are still waiting to be addressed. We describe recent developments in the prediction of 3D structures of both α-helical MPs as well as β-barrels using comparative modeling techniques, de novo methods, and molecular dynamics (MD) simulations. The increase of MP structures has (1) facilitated comparative modeling due to availability of more and better templates, and (2) improved the statistics for knowledge-based scoring functions. Moreover, de novo methods have benefitted from the use of correlated mutations as restraints. Finally, we outline current advances that will likely shape the field in the forthcoming decade. PMID:25355688
Age structure is critical to the population dynamics and survival of honeybee colonies.
Betti, M I; Wahl, L M; Zamir, M
2016-11-01
Age structure is an important feature of the division of labour within honeybee colonies, but its effects on colony dynamics have rarely been explored. We present a model of a honeybee colony that incorporates this key feature, and use this model to explore the effects of both winter and disease on the fate of the colony. The model offers a novel explanation for the frequently observed phenomenon of 'spring dwindle', which emerges as a natural consequence of the age-structured dynamics. Furthermore, the results indicate that a model taking age structure into account markedly affects the predicted timing and severity of disease within a bee colony. The timing of the onset of disease with respect to the changing seasons may also have a substantial impact on the fate of a honeybee colony. Finally, simulations predict that an infection may persist in a honeybee colony over several years, with effects that compound over time. Thus, the ultimate collapse of the colony may be the result of events several years past.
The Response of Simple Polymer Structures Under Dynamic Loading
NASA Astrophysics Data System (ADS)
Proud, William; Ellison, Kay; Yapp, Su; Cole, Cloe; Galimberti, Stefano; Institute of Shock Physics Team
2017-06-01
The dynamic response of polymeric materials has been widely studied, with the effects of degree of crystallinity, strain rate, temperature and sample size being commonly reported. This study uses a simple PMMA structure, a right cylindrical sample, with structural features such as holes. The features are added and varied in a systematic fashion. Samples were dynamically loaded using a Split Hopkinson Pressure Bar up to failure. The resulting stress-strain curves are presented, showing the change in sample response. The strain to failure is shown to increase initially with the presence of holes, while failure stress is relatively unaffected. The fracture patterns seen in the failed samples change, with tensile cracks, Hertzian cones and shear effects being dominant for different hole sizes and geometries. The samples were prepared by laser cutting and checked for residual stress before the experiment. The data are used to validate predictive models in which material, structure and damage are included. The Institute of Shock Physics acknowledges the support of Imperial College London and the Atomic Weapons Establishment.
New features in Saturn's atmosphere revealed by high-resolution thermal infrared images
NASA Technical Reports Server (NTRS)
Gezari, D. Y.; Mumma, M. J.; Espenak, F.; Deming, D.; Bjoraker, G.; Woods, L.; Folz, W.
1989-01-01
Observations of the stratospheric IR emission structure on Saturn are presented. The high-spatial-resolution global images show a variety of new features, including a narrow equatorial belt of enhanced emission at 7.8 micron, a prominent symmetrical north polar hotspot at all three wavelengths, and a midlatitude structure which is asymmetrically brightened at the east limb. The results confirm the polar brightening and reversal in position predicted by recent models for seasonal thermal variations of Saturn's stratosphere.
Protein structure based prediction of catalytic residues
2013-01-01
Background: Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. Results: We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. Conclusions: We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be more simply and effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases. PMID:23433045
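The sensitivity, precision and fold-enrichment figures quoted in this abstract follow directly from the four counts it reports (70 true catalytic residues among 411 predictions, out of 111 annotated catalytic residues in 9262 residues). The short worked example below reproduces that arithmetic; the function and argument names are illustrative assumptions.

```python
def enrichment_metrics(true_hits, predicted, annotated, total):
    """Sensitivity, precision and fold enrichment from raw residue counts."""
    sensitivity = true_hits / annotated   # fraction of catalytic residues recovered
    precision = true_hits / predicted     # fraction of predictions that are catalytic
    baseline = annotated / total          # catalytic fraction in the whole input set
    enrichment = precision / baseline     # how much denser the output is than the input
    return sensitivity, precision, enrichment

# Counts taken from the abstract above.
sens, prec, enrich = enrichment_metrics(true_hits=70, predicted=411,
                                        annotated=111, total=9262)
print(f"sensitivity={sens:.0%} precision={prec:.0%} enrichment={enrich:.1f}x")
# prints: sensitivity=63% precision=17% enrichment=14.2x
```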
A multivariate prediction model for Rho-dependent termination of transcription.
Nadiras, Cédric; Eveno, Eric; Schwartz, Annie; Figueroa-Bossi, Nara; Boudvillain, Marc
2018-06-21
Bacterial transcription termination proceeds via two main mechanisms triggered either by simple, well-conserved (intrinsic) nucleic acid motifs or by the motor protein Rho. Although bacterial genomes can harbor hundreds of termination signals of either type, only intrinsic terminators are reliably predicted. Computational tools to detect the more complex and diversiform Rho-dependent terminators are lacking. To tackle this issue, we devised a prediction method based on Orthogonal Projections to Latent Structures Discriminant Analysis [OPLS-DA] of a large set of in vitro termination data. Using previously uncharacterized genomic sequences for biochemical evaluation and OPLS-DA, we identified new Rho-dependent signals and quantitative sequence descriptors with significant predictive value. Most relevant descriptors specify features of transcript C>G skewness, secondary structure, and richness in regularly-spaced 5'CC/UC dinucleotides that are consistent with known principles for Rho-RNA interaction. Descriptors collectively warrant OPLS-DA predictions of Rho-dependent termination with a ∼85% success rate. Scanning of the Escherichia coli genome with the OPLS-DA model identifies significantly more termination-competent regions than anticipated from transcriptomics and predicts that regions intrinsically refractory to Rho are primarily located in open reading frames. Altogether, this work delineates features important for Rho activity and describes the first method able to predict Rho-dependent terminators in bacterial genomes.
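As an illustration of one kind of sequence descriptor mentioned in this abstract, the sketch below computes a sliding-window C-versus-G skew. The abstract does not give the formula the authors used, so the common definition (C − G)/(C + G) is assumed here, and the function name, window size and step are hypothetical choices.

```python
def cg_skew(seq, window, step=1):
    """Sliding-window skew (C - G) / (C + G); 0.0 where a window has no C or G.

    Assumed definition for illustration only, not the descriptor from the paper.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        c = chunk.count("C")
        g = chunk.count("G")
        values.append((c - g) / (c + g) if c + g else 0.0)
    return values

# A C-rich, G-poor stretch gives positive skew values.
print(cg_skew("CCUCCAUCCG", window=5))
```

Positive values mark C-rich, G-poor stretches, which is the direction of skew the abstract associates with Rho-dependent signals.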
Zhou, Hang; Yang, Yang; Shen, Hong-Bin
2017-03-15
Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large-scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which creates more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. Availability: www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Models of the elastic x-ray scattering feature for warm dense aluminum
Starrett, Charles Edward; Saumon, Didier
2015-09-03
The elastic feature of x-ray scattering from warm dense aluminum has recently been measured by Fletcher et al. [Nature Photonics 9, 274 (2015)] with much higher accuracy than had hitherto been possible. This measurement is a direct test of the ionic structure predicted by models of warm dense matter. We use the method of pseudoatom molecular dynamics to predict this elastic feature for warm dense aluminum with temperatures of 1–100 eV and densities of 2.7–8.1 g/cm^3. We compare these predictions to experiments, finding good agreement with Fletcher et al. and corroborating the discrepancy found in analyses of an earlier experiment of Ma et al. [Phys. Rev. Lett. 110, 065001 (2013)]. Lastly, we also evaluate the validity of the Thomas-Fermi model of the electrons and of the hypernetted chain approximation in computing the elastic feature and find them both wanting in the regime currently probed by experiments.
Prediction of Peptide and Protein Propensity for Amyloid Formation
Família, Carlos; Dennison, Sarah R.; Quintas, Alexandre; Phoenix, David A.
2015-01-01
Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone, with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation. PMID:26241652
Lee, Hasup; Baek, Minkyung; Lee, Gyu Rie; Park, Sangwoo; Seok, Chaok
2017-03-01
Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences. Such structural differences may be predicted by loop modeling of local regions or refinement of the overall structure. In CAPRI (Critical Assessment of PRotein Interactions) round 30, we used recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking to predict protein complex structures from amino acid sequences. Out of the 25 CAPRI targets, medium and acceptable quality models were obtained for 14 and 1 target(s), respectively, for which proper oligomer or monomer templates could be detected. Symmetric interface loop modeling on oligomer model structures successfully improved model quality, while loop modeling on monomer model structures failed. Overall refinement of the predicted oligomer structures consistently improved the model quality, in particular in interface contacts. Proteins 2017; 85:399-407. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Mohebbi, Maryam; Ghassemian, Hassan; Asl, Babak Mohammadzadeh
2011-01-01
This paper proposes an effective paroxysmal atrial fibrillation (PAF) predictor based on the analysis of the heart rate variability (HRV) signal. Predicting the onset of PAF with non-invasive techniques is clinically important and can be invaluable for avoiding useless therapeutic interventions and minimizing the risks for patients. The method consists of four steps: preprocessing, feature extraction, feature reduction, and classification. In the first step, the QRS complexes are detected from the electrocardiogram (ECG) signal and the HRV signal is then extracted. In the next step, the recurrence plot (RP) of the HRV signal is obtained and six features are extracted to characterize the basic patterns of the RP. These features consist of the length of the longest diagonal segments, the average length of the diagonal lines, entropy, trapping time, the length of the longest vertical line, and recurrence trend. In the third step, these features are reduced to three features by the linear discriminant analysis (LDA) technique. Using LDA not only reduces the number of input features, but also increases the classification accuracy by selecting the most discriminating features. Finally, a support vector machine-based classifier is used to classify the HRV signals. The performance of the proposed method in predicting PAF episodes was evaluated using the Atrial Fibrillation Prediction Database, which consists of both 30-minute ECG recordings that end just prior to the onset of PAF and segments at least 45 min distant from any PAF events. The obtained sensitivity, specificity, and positive predictivity were 96.55%, 100%, and 100%, respectively. PMID:22606666
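The feature-extraction step described in this abstract can be illustrated with a minimal sketch that builds a recurrence plot and measures one of the six listed features, the longest diagonal line. This is a simplification under stated assumptions: it compares raw scalar samples directly (no time-delay embedding), the threshold `eps` and both function names are hypothetical, and the LDA and SVM stages are not shown.

```python
def recurrence_matrix(signal, eps):
    """R[i][j] = 1 when samples i and j are within eps of each other."""
    n = len(signal)
    return [[1 if abs(signal[i] - signal[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

def longest_diagonal(matrix):
    """Longest run of 1s on any diagonal parallel to the main diagonal.

    The main diagonal itself is excluded because it is trivially all 1s.
    The matrix is symmetric, so scanning the upper triangle is sufficient.
    """
    n = len(matrix)
    best = 0
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if matrix[i][i + offset]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

# A strictly alternating toy "HRV" series recurs every second sample.
toy_signal = [1.0, 2.0, 1.0, 2.0, 1.0]
rp = recurrence_matrix(toy_signal, eps=0.1)
print(longest_diagonal(rp))  # prints 3
```

Long diagonal lines indicate stretches where the signal repeats an earlier trajectory, which is why periodic input like the toy series produces a long run.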
Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin
2016-11-01
Protein-protein interactions play essential roles in many biological processes. Acquiring knowledge of the residue-residue contact information of two interacting proteins is not only helpful in annotating functions for proteins, but also critical for structure-based drug design. The prediction of the protein residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackle the residue-residue contact prediction problem. In tandem with interaction profile Hidden Markov Models, which were used first to extract Fisher score features from protein sequences, stacked autoencoders were deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over the traditional machine learning model, Support Vector Machines (SVM), with the overall accuracy increased by 15% from 65.40% to 80.82%. We showed that the stacked autoencoders could extract novel features out of the Fisher score features, which can be utilized by deep neural networks and other classifiers to enhance learning. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.
Using Conversation Topics for Predicting Therapy Outcomes in Schizophrenia
Howes, Christine; Purver, Matthew; McCabe, Rose
2013-01-01
Previous research shows that aspects of doctor-patient communication in therapy can predict patient symptoms, satisfaction and future adherence to treatment (a significant problem with conditions such as schizophrenia). However, automatic prediction has so far shown success only when based on low-level lexical features, and it is unclear how well these can generalize to new data, or whether their effectiveness is due to their capturing aspects of style, structure or content. Here, we examine the use of topic as a higher-level measure of content, more likely to generalize and to have more explanatory power. Investigations show that while topics predict some important factors such as patient satisfaction and ratings of therapy quality, they lack the full predictive power of lower-level features. For some factors, unsupervised methods produce models comparable to manual annotation. PMID:23943658
Against Structural Constraints in Subject-Verb Agreement Production
ERIC Educational Resources Information Center
Gillespie, Maureen; Pearlmutter, Neal J.
2013-01-01
Syntactic structure has been considered an integral component of agreement computation in language production. In agreement error studies, clause-boundedness (Bock & Cutting, 1992) and hierarchical feature-passing (Franck, Vigliocco, & Nicol, 2002) predict that local nouns within clausal modifiers should produce fewer errors than do those within…
The Norm of Teacher Autonomy/Equality: Measurement & Findings.
ERIC Educational Resources Information Center
Packard, John S.
As part of a larger investigation of the effects of introducing a formal unit structure into elementary schools, an attempt was made to predict in which of the newly unitized schools teachers would first show an increase in task interdependence. Prominent among the various features under consideration in the prediction study are teacher norms…
Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.
Liu, Zhi-Ping; Chen, Luonan
2016-01-01
Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time-consuming and labor-intensive. Thus, it is important to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognition will greatly help to decipher the interaction mechanisms between protein and RNA, as well as to improve RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies for predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from the perspective of structural complementarity.
Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin
2013-08-01
Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored, and the classifiers that appeared frequently in the top-performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that works in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.
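The general idea of an ensemble of per-feature nearest-neighbor classifiers that votes on the final label can be sketched as follows. This is not the authors' implementation: it uses plain 1-nearest-neighbor on one scalar feature per classifier instead of IBk with their encoding schema, and every name and data value below is a hypothetical placeholder.

```python
def nearest_neighbor_label(train, query):
    """1-NN on a single scalar feature; train is a list of (value, label) pairs."""
    return min(train, key=lambda pair: abs(pair[0] - query))[1]

def ensemble_predict(per_feature_train, query_features):
    """Strict majority vote over one 1-NN classifier per feature.

    Labels are booleans (True = hot spot); a tie counts as False.
    """
    votes = [nearest_neighbor_label(train, q)
             for train, q in zip(per_feature_train, query_features)]
    return sum(votes) * 2 > len(votes)

# Three toy single-feature training sets (made-up numbers).
training = [
    [(0.1, False), (0.9, True)],
    [(1.0, False), (5.0, True)],
    [(10.0, True), (20.0, False)],
]
print(ensemble_predict(training, [0.8, 4.0, 19.0]))  # prints True (2 of 3 votes)
```

Voting lets a query be classified correctly even when one individual feature points the wrong way, as the third classifier does in the usage line.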
Hao, Xiaohu; Zhang, Guijun; Zhou, Xiaogen
2018-04-01
Computing conformations, which is essential for associating structural and functional information with gene sequences, is challenging due to the high dimensionality and rugged energy surface of the protein conformational space. Consequently, the dimension of the protein conformational space should be reduced to a proper level, and an effective exploration algorithm should be proposed. In this paper, a plug-in method for guiding exploration in conformational feature space with Lipschitz underestimation (LUE) for ab initio protein structure prediction is proposed. The conformational space is first converted into ultrafast shape recognition (USR) feature space. Based on the USR feature space, the conformational space can be further converted into an underestimation space according to Lipschitz estimation theory for guiding exploration. As a consequence of using the underestimation model, the tight lower-bound estimate can be used to guide exploration, invalid sampling areas can be eliminated in advance, and the number of energy function evaluations can be reduced. The proposed method provides a novel technique for exploring the protein conformational space. LUE is applied to the differential evolution (DE) algorithm and to the Metropolis Monte Carlo (MMC) algorithm available in Rosetta; when LUE is applied to DE and MMC, candidate conformations are screened by the underestimation method prior to energy calculation and selection. Further, LUE is compared with DE and MMC by testing on 15 small-to-medium structurally diverse proteins. Test results show that near-native protein structures with higher accuracy can be obtained more rapidly and efficiently with the use of LUE. Copyright © 2018 Elsevier Ltd. All rights reserved.
Predictive modeling: Solubility of C60 and C70 fullerenes in diverse solvents.
Gupta, Shikha; Basant, Nikita
2018-06-01
Solubility of fullerenes imposes a major limitation on further advanced research and technological development using these novel materials. There have been continued efforts to discover better solvents and the solvent properties that influence the solubility of fullerenes. Here, we have developed QSPR (quantitative structure-property relationship) models based on structural features of diverse solvents and large experimental data sets for predicting the solubility of C60 and C70 fullerenes. The developed models identified the most relevant features of the solvents, which encode the polarizability, polarity and lipophilicity properties that largely influence the solubilizing potential of the solvent for the fullerenes. We also established inter-moieties solubility correlations (IMSC) based quantitative property-property relationship (QPPR) models for predicting the solubility of C60 and C70 fullerenes. The QSPR and QPPR models were internally and externally validated against the most stringent statistical criteria, and the predicted C60 and C70 solubility values in different solvents were in close agreement with the experimental values. In test sets, the QSPR models yielded high correlations (R^2 > 0.964) and low root mean squared errors of prediction (RMSEP < 0.25). Comparison with other studies indicated that the proposed models could effectively improve the accuracy and ability for predicting the solubility of C60 and C70 fullerenes in solvents with diverse structures and would be useful in the development of more effective solvents. Copyright © 2018 Elsevier Ltd. All rights reserved.
Protein–DNA Interactions: The Story so Far and a New Method for Prediction
Jones, Susan; Thornton, Janet M.
2003-01-01
This review describes methods for the prediction of DNA binding function, and specifically summarizes a new method using 3D structural templates. The new method features the HTH motif that is found in approximately one-third of DNA-binding protein families. A library of 3D structural templates of HTH motifs was derived from proteins in the PDB. Templates were scanned against complete protein structures and the optimal superposition of a template on a structure calculated. Significance thresholds in terms of a minimum root mean squared deviation (rmsd) of an optimal superposition, and a minimum motif accessible surface area (ASA), have been calculated. In this way, it is possible to scan the template library against proteins of unknown function to make predictions about DNA-binding functionality.
Sterling, Mark; Huang, David T; Ghoraani, Behnaz
2015-01-01
We propose a new algorithm to predict the outcome of direct-current electric (DCE) cardioversion for atrial fibrillation (AF) patients. AF is the most common cardiac arrhythmia and DCE cardioversion is a noninvasive treatment to end AF and return the patient to sinus rhythm (SR). Unfortunately, there is a high risk of AF recurrence in persistent AF patients; hence clinically it is important to predict the DCE outcome in order to avoid the procedure's side effects. This study develops a feature extraction and classification framework to predict AF recurrence patients from the underlying structure of atrial activity (AA). A multiresolution signal decomposition technique, based on matching pursuit (MP), was used to project the AA over a dictionary of wavelets. Seven novel features were derived from the decompositions and were employed in a quadratic discrimination analysis classification to predict the success of post-DCE cardioversion in 40 patients with persistent AF. The proposed algorithm achieved 100% sensitivity and 95% specificity, indicating that the proposed computational approach captures detailed structural information about the underlying AA and could provide reliable information for effective management of AF.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baghram, Shant; Abolhasani, Ali Akbar; Firouzjahi, Hassan
We study the predictions of anomalous inflationary models on the abundance of structures in large scale structure observations. The anomalous features encoded in the primordial curvature perturbation power spectrum are (a) a localized feature in momentum space, (b) hemispherical asymmetry and (c) statistical anisotropies. We present a model-independent expression relating the number density of structures to the changes in the matter density variance. Models with a localized feature can alleviate the tension between observations and numerical simulations of cold dark matter structures on galactic scales as a possible solution to the missing satellite problem. In models with hemispherical asymmetry we show that the abundance of structures becomes asymmetric depending on the direction of observation on the sky. In addition, we study the effects of scale-dependent dipole amplitude on the abundance of structures. Using the quasar data and adopting the power-law scaling k^(n_A - 1) for the amplitude of the dipole, we find the upper bound n_A < 0.6 for the spectral index of the dipole asymmetry. In all cases there is a critical mass scale M_c such that for M < M_c (M > M_c) the enhancement in variance induced by the anomalous feature decreases (increases) the abundance of dark matter structures in the Universe.
Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning
2018-03-08
Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Contact: jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.
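To make the notion of a sequence encoding scheme concrete, the sketch below computes amino acid composition, a simple and widely used descriptor that maps a protein sequence to a 20-dimensional vector of residue frequencies. This is an independent illustration, not iFeature's code or API; the function name and the choice to ignore non-standard residue letters are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical

def amino_acid_composition(sequence):
    """Return a 20-element list of residue frequencies for a protein sequence.

    Characters outside the 20 standard residues (e.g. 'X' or '-') are ignored.
    """
    counted = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not counted:
        raise ValueError("sequence contains no standard amino acids")
    total = len(counted)
    return [counted.count(aa) / total for aa in AMINO_ACIDS]

vector = amino_acid_composition("MKTAYIAKQR")
print(len(vector), round(sum(vector), 6))  # prints: 20 1.0
```

Fixed-length vectors of this kind are what allow sequences of different lengths to be fed to the clustering, selection and machine-learning steps the abstract mentions.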
Statistical analyses and computational prediction of helical kinks in membrane proteins
NASA Astrophysics Data System (ADS)
Huang, Y.-H.; Chen, C.-M.
2012-10-01
We have carried out statistical analyses and computer simulations of helical kinks for TM helices in the PDBTM database. About 59 % of 1562 TM helices showed a significant kink, and 38 % of these kinks are associated with prolines in a range of ±4 residues. Our analyses show that helical kinks are more populated in the central region of helices, particularly in the range of 1-3 residues away from the helix center. Among 1,053 helical kinks analyzed, 88 % of kinks are bends (change in helix axis without loss of helical character) and 12 % are disruptions (change in helix axis and loss of helical character). It is found that proline residues tend to cause larger kink angles in helical bends, while this effect is not observed in helical disruptions. A further analysis of these kinked helices suggests that a kinked helix usually has 1-2 broken backbone hydrogen bonds with the corresponding N-O distance in the range of 4.2-8.7 Å, whose distribution is sharply peaked at 4.9 Å followed by an exponential decay with increasing distance. The main aims of this study are to understand the formation of helical kinks and to predict their structural features. Therefore, we further performed molecular dynamics (MD) simulations under four simulation scenarios to investigate kink formation in 37 kinked TM helices and 5 unkinked TM helices. The representative models of these kinked helices are predicted by a clustering algorithm, SPICKER, from numerous decoy structures possessing the above generic features of kinked helices. Our results show an accuracy of 95 % in predicting the kink position of kinked TM helices and an error of less than 10° in the angle prediction for 71.4 % of kinked helices. For unkinked helices, based on various structure similarity tests, our predicted models are highly consistent with their crystal structures. These results provide strong support for the validity of our method in predicting the structure of TM helices.
NASA Astrophysics Data System (ADS)
Wang, Yu; Guo, Yanzhi; Kuang, Qifan; Pu, Xuemei; Ji, Yue; Zhang, Zhihang; Li, Menglong
2015-04-01
The assessment of binding affinity between ligands and their target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have also been proposed for fast prediction of the binding affinity with promising results, but most of them were developed as all-purpose models despite the specific functions of different protein families, even though proteins from different functional families have different structures and physicochemical features. In this study, we proposed a random forest method to predict the protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were implemented separately for the different protein family datasets, which indicates that different features contribute to different models, so individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. As a comparison, two generic models including diverse protein families were also built. The evaluation results show that models on family-specific datasets perform better than those on the generic datasets, and the Pearson and Spearman correlation coefficients (R_p and R_s) on the test sets are 0.740, 0.874, 0.735 and 0.697, 0.853, 0.723 for HIV-1 protease, trypsin and carbonic anhydrase, respectively. Comparisons with other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way of predicting the affinity of one particular protein family.
ERIC Educational Resources Information Center
Michmerhuizen, Anna; Rose, Karine; Annankra, Wentiirim; Vander Griend, Douglas A.
2017-01-01
Making optimal pedagogical and predictive use of the radius ratio rule to distinguish between solid state structures that feature tetrahedral, octahedral and cubic holes requires several updated insights. A comparative analysis of the Born-Landé equation for lattice energy is developed to show that the rock salt structure is a suitable choice for…
GeneSilico protein structure prediction meta-server.
Kurowski, Michal A; Bujnicki, Janusz M
2003-07-01
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
GeneSilico protein structure prediction meta-server
Kurowski, Michal A.; Bujnicki, Janusz M.
2003-01-01
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313
Prediction of distal residue participation in enzyme catalysis.
Brodkin, Heather R; DeLateur, Nicholas A; Somarowthu, Srinivas; Mills, Caitlyn L; Novak, Walter R; Beuning, Penny J; Ringe, Dagmar; Ondrechen, Mary Jo
2015-05-01
A scoring method for the prediction of catalytically important residues in enzyme structures is presented and used to examine the participation of distal residues in enzyme catalysis. Scores are based on the Partial Order Optimum Likelihood (POOL) machine learning method, using computed electrostatic properties, surface geometric features, and information obtained from the phylogenetic tree as input features. Predictions of distal residue participation in catalysis are compared with experimental kinetics data from the literature on variants of the featured enzymes; some additional kinetics measurements are reported for variants of Pseudomonas putida nitrile hydratase (ppNH) and for Escherichia coli alkaline phosphatase (AP). The multilayer active sites of P. putida nitrile hydratase and of human phosphoglucose isomerase are predicted by the POOL log ZP scores, as is the single-layer active site of P. putida ketosteroid isomerase. The log ZP score cutoff utilized here results in over-prediction of distal residue involvement in E. coli alkaline phosphatase. While fewer experimental data points are available for P. putida mandelate racemase and for human carbonic anhydrase II, the POOL log ZP scores properly predict the previously reported participation of distal residues. © 2015 The Authors. Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Park, Hahnbeom; Bradley, Philip; Greisen, Per; Liu, Yuan; Mulligan, Vikram Khipple; Kim, David E.; Baker, David; DiMaio, Frank
2017-01-01
Most biomolecular modeling energy functions for structure prediction, sequence design, and molecular docking, have been parameterized using existing macromolecular structural data; this contrasts molecular mechanics force fields which are largely optimized using small-molecule data. In this study, we describe an integrated method that enables optimization of a biomolecular modeling energy function simultaneously against small-molecule thermodynamic data and high-resolution macromolecular structural data. We use this approach to develop a next-generation Rosetta energy function that utilizes a new anisotropic implicit solvation model, and an improved electrostatics and Lennard-Jones model, illustrating how energy functions can be considerably improved in their ability to describe large-scale energy landscapes by incorporating both small-molecule and macromolecule data. The energy function improves performance in a wide range of protein structure prediction challenges, including monomeric structure prediction, protein-protein and protein-ligand docking, protein sequence design, and prediction of the free energy changes by mutation, while reasonably recapitulating small-molecule thermodynamic properties. PMID:27766851
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-11
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields
NASA Astrophysics Data System (ADS)
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-01
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
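Illustrative sketch (not from the abstract, and not DeepCNF code): Q3 and Q8 accuracy are per-residue agreement rates between predicted and observed secondary structure, over 3 or 8 states respectively. The 8-to-3 reduction below (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here; other reductions are also in use and give slightly different Q3 values.

```python
# One common DSSP 8-state -> 3-state reduction (an assumption for this
# example; published work does not always use the same mapping).
DSSP_8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "L": "C", "-": "C", "C": "C",  # turns, bends, coil
}


def reduce_to_three_states(ss8):
    """Map an 8-state secondary structure string to 3 states (H/E/C)."""
    return "".join(DSSP_8_TO_3[state] for state in ss8)


def per_residue_accuracy(true_ss, pred_ss):
    """Fraction of residues whose predicted state equals the observed state.

    Gives Q8 on 8-state strings and Q3 on 3-state strings.
    """
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have the same length")
    if not true_ss:
        raise ValueError("label strings must not be empty")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return correct / len(true_ss)


if __name__ == "__main__":
    observed = "HHGGEEB-TS"
    predicted = "HHHHEEE-SS"
    print("Q8:", per_residue_accuracy(observed, predicted))
    print("Q3:", per_residue_accuracy(reduce_to_three_states(observed),
                                      reduce_to_three_states(predicted)))
```

Q3 is never lower than Q8 for the same pair of strings, since any residue that matches in 8 states also matches after reduction.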
Structured Light-Based 3D Reconstruction System for Plants.
Nguyen, Thuy Tuong; Slaughter, David C; Max, Nelson; Maloof, Julin N; Sinha, Neelima
2015-07-29
Camera-based 3D reconstruction of physical objects is one of the most popular computer vision trends in recent years. Many systems have been built to model different real-world subjects, but there is a lack of a completely robust system for plants. This paper presents a full 3D reconstruction system that incorporates both hardware structures (including the proposed structured light system to enhance textures on object surfaces) and software algorithms (including the proposed 3D point cloud registration and plant feature measurement). This paper demonstrates the ability to produce 3D models of whole plants created from multiple pairs of stereo images taken at different viewing angles, without the need to destructively cut away any parts of a plant. The ability to accurately predict phenotyping features, such as the number of leaves, plant height, leaf size and internode distances, is also demonstrated. Experimental results show that, for plants having a range of leaf sizes and a distance between leaves appropriate for the hardware design, the algorithms successfully predict phenotyping features in the target crops, with a recall of 0.97 and a precision of 0.89 for leaf detection and less than a 13-mm error for plant size, leaf size and internode distance.
Thermodynamic database for proteins: features and applications.
Gromiha, M Michael; Sarai, Akinori
2010-01-01
We have developed a thermodynamic database for proteins and mutants, ProTherm, which is a collection of a large number of thermodynamic data on protein stability along with the sequence and structure information, experimental methods and conditions, and literature information. This is a valuable resource for understanding/predicting the stability of proteins, and it is accessible at http://www.gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html. ProTherm has several features including various search, display, and sorting options and visualization tools. We have analyzed the data in ProTherm to examine the relationship among thermodynamics, structure, and function of proteins. We describe the progress on the development of methods for understanding/predicting protein stability, such as (i) relationship between the stability of protein mutants and amino acid properties, (ii) average assignment method, (iii) empirical energy functions, (iv) torsion, distance, and contact potentials, and (v) machine learning techniques. A list of online resources for predicting protein stability is also provided.
Discriminative prediction of mammalian enhancers from DNA sequence
Lee, Dongwon; Karchin, Rachel; Beer, Michael A.
2011-01-01
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers. PMID:21875935
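Illustrative sketch (an assumption, not the authors' implementation): the abstract describes an SVM trained on "an unbiased set of general sequence features" without listing them here. One simple feature set of that general kind is the count of every possible k-mer in a DNA sequence, shown below. The function name and the default k are hypothetical.

```python
from itertools import product


def kmer_counts(sequence, k=3):
    """Count occurrences of every possible DNA k-mer in a sequence.

    Hypothetical helper for illustration. Returns a dict with all 4**k
    k-mers as keys, so vectors from different sequences line up.
    Windows containing characters other than A, C, G, T are skipped.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if window in counts:
            counts[window] += 1
    return counts


if __name__ == "__main__":
    features = kmer_counts("ACGTACGTNAC", k=2)
    # Show only the k-mers that actually occur.
    print({kmer: n for kmer, n in features.items() if n})
```

The resulting fixed-length count vector could then be passed to any standard classifier; which classifier and kernel to use is outside the scope of this sketch.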
Santos, Jose C A; Nassif, Houssam; Page, David; Muggleton, Stephen H; Sternberg, Michael J E
2012-07-11
There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. In addition to confirming literature results, ProGolem's model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.
The persuasion network is modulated by drug-use risk and predicts anti-drug message effectiveness
Mangus, J Michael; Turner, Benjamin O
2017-01-01
While a persuasion network has been proposed, little is known about how network connections between brain regions contribute to attitude change. Two possible mechanisms have been advanced. One hypothesis predicts that attitude change results from increased connectivity between structures implicated in affective and executive processing in response to increases in argument strength. A second functional perspective suggests that highly arousing messages reduce connectivity between structures implicated in the encoding of sensory information, which disrupts message processing and thereby inhibits attitude change. However, persuasion is a multi-determined construct that results from both message features and audience characteristics. Therefore, persuasive messages should lead to specific functional connectivity patterns among a priori defined structures within the persuasion network. The present study exposed 28 subjects to anti-drug public service announcements where arousal, argument strength, and subject drug-use risk were systematically varied. Psychophysiological interaction analyses provide support for the affective-executive hypothesis but not for the encoding-disruption hypothesis. Secondary analyses show that video-level connectivity patterns among structures within the persuasion network predict audience responses in independent samples (one college-aged, one nationally representative). We propose that persuasion neuroscience research is best advanced by considering network-level effects while accounting for interactions between message features and target audience characteristics. PMID:29140500
Prediction of brain-computer interface aptitude from individual brain structure.
Halder, S; Varkuti, B; Bogdan, M; Kübler, A; Rosenstiel, W; Sitaram, R; Birbaumer, N
2013-01-01
Brain-computer interfaces (BCIs) provide a non-muscular communication channel for patients with impairments of the motor system. A significant number of BCI users are unable to obtain voluntary control of a BCI-system in proper time. This makes methods that can be used to determine the aptitude of a user necessary. We hypothesized that integrity and connectivity of involved white matter connections may serve as a predictor of individual BCI-performance. Therefore, we analyzed structural data from anatomical scans and DTI of motor imagery BCI-users differentiated into high and low BCI-aptitude groups based on their overall performance. Using a machine learning classification method we identified discriminating structural brain trait features and correlated the best features with a continuous measure of individual BCI-performance. Prediction of the aptitude group of each participant was possible with near perfect accuracy (one error). Tissue volumetric analysis yielded only poor classification results. In contrast, the structural integrity and myelination quality of deep white matter structures such as the Corpus Callosum, Cingulum, and Superior Fronto-Occipital Fascicle were positively correlated with individual BCI-performance. This confirms that structural brain traits contribute to individual performance in BCI use.
Prediction of brain-computer interface aptitude from individual brain structure
Halder, S.; Varkuti, B.; Bogdan, M.; Kübler, A.; Rosenstiel, W.; Sitaram, R.; Birbaumer, N.
2013-01-01
Objective: Brain-computer interfaces (BCIs) provide a non-muscular communication channel for patients with impairments of the motor system. A significant number of BCI users are unable to obtain voluntary control of a BCI-system in proper time. This makes methods that can be used to determine the aptitude of a user necessary. Methods: We hypothesized that integrity and connectivity of involved white matter connections may serve as a predictor of individual BCI-performance. Therefore, we analyzed structural data from anatomical scans and DTI of motor imagery BCI-users differentiated into high and low BCI-aptitude groups based on their overall performance. Results: Using a machine learning classification method we identified discriminating structural brain trait features and correlated the best features with a continuous measure of individual BCI-performance. Prediction of the aptitude group of each participant was possible with near perfect accuracy (one error). Conclusions: Tissue volumetric analysis yielded only poor classification results. In contrast, the structural integrity and myelination quality of deep white matter structures such as the Corpus Callosum, Cingulum, and Superior Fronto-Occipital Fascicle were positively correlated with individual BCI-performance. Significance: This confirms that structural brain traits contribute to individual performance in BCI use. PMID:23565083
Structure-activity relationships for skin sensitization: recent improvements to Derek for Windows.
Langton, Kate; Patlewicz, Grace Y; Long, Anthony; Marchant, Carol A; Basketter, David A
2006-12-01
Derek for Windows (DfW) is a knowledge-based expert system that predicts the toxicity of a chemical from its structure. Its predictions are based in part on alerts that describe structural features or toxicophores associated with toxicity. Recently, improvements have been made to skin sensitization alerts within the DfW knowledge base in collaboration with Unilever. These include modifications to the alerts describing the skin sensitization potential of aldehydes, 1,2-diketones, and isothiazolinones and consist of enhancements to the toxicophore definition, the mechanistic classification, and the extent of supporting evidence provided. The outcomes from this collaboration demonstrate the importance of updating and refining computer models for the prediction of skin sensitization as new information from experimental and theoretical studies becomes available.
Our study assesses the value of both in vitro assay and quantitative structure activity relationship (QSAR) data in predicting in vivo toxicity using numerous statistical models and approaches to process the data. Our models are built on datasets of (i) 586 chemicals for which bo...
Protein 8-class secondary structure prediction using conditional neural fields.
Wang, Zhiyong; Zhao, Feng; Peng, Jian; Xu, Jinbo
2011-10-01
Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Chira, Camelia; Horvath, Dragos; Dumitrescu, D
2011-07-30
Proteins are complex structures made of amino acids and have a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a minimum-energy protein structure starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm, the hill-climbing mechanism and the diversification strategy, are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
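Illustrative sketch (not the authors' algorithm): in the standard 2D HP lattice model, a conformation is a self-avoiding walk on the square lattice, and its energy is minus the number of hydrophobic (H) residue pairs that sit on neighbouring lattice sites without being consecutive in the chain. The move encoding (U, D, L, R) and the function name below are assumptions made for the example; the abstract does not specify how conformations are represented.

```python
# Absolute lattice moves; this encoding is an assumption for the example.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def hp_energy(sequence, moves):
    """Energy of a 2D HP lattice conformation.

    sequence: string over {'H', 'P'}.
    moves: string over {'U', 'D', 'L', 'R'} of length len(sequence) - 1,
           giving the step from each residue to the next.
    Returns -(number of non-consecutive H-H lattice contacts).
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per chain bond")
    coords = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


if __name__ == "__main__":
    # A four-residue chain folded into a unit square: the two end H
    # residues become lattice neighbours.
    print(hp_energy("HPPH", "RUL"))
```

A search method such as the hill-climbing evolutionary model described above would call an energy function of this kind to score candidate conformations; lower values are better.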
Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. In a jackknife test, the prediction performance was a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are greater than those in low binding affinity complexes.
2015-01-01
Background: Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results: This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. In a jackknife test, the prediction performance was a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn.
Conclusions: The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are greater than those in low binding affinity complexes. PMID:26681483
Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng
2009-04-21
In this paper, we intend to predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two data sets were used widely, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins) with sequence homology being 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as feature representation of protein sequences. Based on such feature representation, the structural class for each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with step-by-step procedure are 65.8% and 64.2% for 1189 and 25PDB data sets, respectively. With one-against-others procedure used widely, we compare our method with five other existing methods. Especially, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, which is less than that used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising to perform the prediction of protein structural classes.
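The jackknife test mentioned in this abstract is leave-one-out cross-validation: each sample is held out in turn, the classifier is fitted on the remaining samples, and the held-out sample is then predicted. A minimal sketch of that loop follows. It assumes a nearest-centroid classifier as a simple stand-in for Fisher's linear discriminant and toy two-dimensional features rather than the 16 RQA parameters; the function names and the data are hypothetical.

```python
from collections import defaultdict


def nearest_centroid_predict(train, query):
    """Return the label whose class centroid is closest to `query`."""
    sums = defaultdict(lambda: [0.0] * len(query))
    counts = defaultdict(int)
    for features, label in train:
        counts[label] += 1
        for i, value in enumerate(features):
            sums[label][i] += value
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(samples):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # everything except sample i
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: two well-separated classes.
TOY_SAMPLES = [
    ((0.0, 0.1), "alpha"), ((0.2, 0.0), "alpha"), ((0.1, 0.2), "alpha"),
    ((1.0, 1.1), "beta"), ((0.9, 1.0), "beta"), ((1.1, 0.9), "beta"),
]
```

Calling `jackknife_accuracy(TOY_SAMPLES)` gives the fraction of held-out samples that were classified correctly. Any classifier with a fit-and-predict step could replace the nearest-centroid stand-in.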
Age structure is critical to the population dynamics and survival of honeybee colonies
Betti, M. I.; Wahl, L. M.
2016-01-01
Age structure is an important feature of the division of labour within honeybee colonies, but its effects on colony dynamics have rarely been explored. We present a model of a honeybee colony that incorporates this key feature, and use this model to explore the effects of both winter and disease on the fate of the colony. The model offers a novel explanation for the frequently observed phenomenon of ‘spring dwindle’, which emerges as a natural consequence of the age-structured dynamics. Furthermore, the results indicate that a model taking age structure into account markedly affects the predicted timing and severity of disease within a bee colony. The timing of the onset of disease with respect to the changing seasons may also have a substantial impact on the fate of a honeybee colony. Finally, simulations predict that an infection may persist in a honeybee colony over several years, with effects that compound over time. Thus, the ultimate collapse of the colony may be the result of events several years past. PMID:28018627
RRCRank: a fusion method using rank strategy for residue-residue contact prediction.
Jing, Xiaoyang; Dong, Qiwen; Lu, Ruqian
2017-09-02
In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, researchers have developed various methods to predict residue-residue contacts; in recent years, fusion methods in particular have achieved significant performance. In this work, a novel fusion method based on a rank strategy is proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. Two kinds of features are first extracted from correlated mutation methods and ensemble machine-learning classifiers, and the proposed method then uses a learning-to-rank algorithm to predict the contact probability of each residue pair. First, we perform two benchmark tests for the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets, respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of the ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as the ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and is better than three methods in most assessment metrics.
The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.
NASA Astrophysics Data System (ADS)
Irvine, John M.; Ghadar, Nastaran; Duncan, Steve; Floyd, David; O'Dowd, David; Lin, Kristie; Chang, Tom
2017-03-01
Quantitative biomarkers for assessing the presence, severity, and progression of age-related macular degeneration (AMD) would benefit research, diagnosis, and treatment. This paper explores development of quantitative biomarkers derived from OCT imagery of the retina. OCT images for approximately 75 patients with Wet AMD, Dry AMD, and no AMD (healthy eyes) were analyzed to identify image features indicative of the patients' conditions. OCT image features provide a statistical characterization of the retina. Healthy eyes exhibit a layered structure, whereas chaotic patterns indicate the deterioration associated with AMD. Our approach uses wavelet and Frangi filtering, combined with statistical features that do not rely on image segmentation, to assess patient conditions. Classification analysis indicates clear separability of Wet AMD from other conditions, including Dry AMD and healthy retinas. The probability of correct classification was 95.7%, as determined from cross validation. Similar classification analysis predicts the response of Wet AMD patients to treatment, as measured by the Best Corrected Visual Acuity (BCVA). A statistical model predicts BCVA from the imagery features with R2 = 0.846. Initial analysis of OCT imagery indicates that imagery-derived features can provide useful biomarkers for characterization and quantification of AMD: accurate assessment of Wet AMD compared to other conditions; image-based prediction of outcome for Wet AMD treatment; and accurate prediction of BCVA from features derived from the OCT imagery. Unlike many methods in the literature, our techniques do not rely on segmentation of the OCT image. Next steps include larger scale testing and validation.
Communication: Finding destructive interference features in molecular transport junctions.
Reuter, Matthew G; Hansen, Thorsten
2014-11-14
Associating molecular structure with quantum interference features in electrode-molecule-electrode transport junctions has been difficult because existing guidelines for understanding interferences only apply to conjugated hydrocarbons. Herein we use linear algebra and the Landauer-Büttiker theory for electron transport to derive a general rule for predicting the existence and locations of interference features. Our analysis illustrates that interferences can be directly determined from the molecular Hamiltonian and the molecule-electrode couplings, and we demonstrate its utility with several examples.
Compound Structure-Independent Activity Prediction in High-Dimensional Target Space.
Balfer, Jenny; Hu, Ye; Bajorath, Jürgen
2014-08-01
Profiling of compound libraries against arrays of targets has become an important approach in pharmaceutical research. The prediction of multi-target compound activities also represents an attractive task for machine learning with potential for drug discovery applications. Herein, we have explored activity prediction in high-dimensional target space. Different types of models were derived to predict multi-target activities. The models included naïve Bayesian (NB) and support vector machine (SVM) classifiers based upon compound structure information and NB models derived on the basis of activity profiles, without considering compound structure. Because the latter approach can be applied to incomplete training data and principally depends on the feature independence assumption, SVM modeling was not applicable in this case. Furthermore, iterative hybrid NB models making use of both activity profiles and compound structure information were built. In high-dimensional target space, NB models utilizing activity profile data were found to yield more accurate activity predictions than structure-based NB and SVM models or hybrid models. An in-depth analysis of activity profile-based models revealed the presence of correlation effects across different targets and rationalized prediction accuracy. Taken together, the results indicate that activity profile information can be effectively used to predict the activity of test compounds against novel targets. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Ulomov, V. I.; Danilova, T. I.; Medvedeva, N. S.; Polyakova, T. P.
2006-07-01
The Scythian-Turan platform, together with the Alpine Iran-Caucasus-Anatolia and Hercynian Central Tien Shan orogenic structures adjacent to it, represents a coherent seismogeodynamic system responsible for regional seismicity features in the territory under consideration. Investigations of the spatiotemporal and energy evolution of seismogeodynamic processes along the main lineament structures of the orogen reveal characteristic features directly related to the prediction of seismic hazard in this region, as well as in southern European Russia. These characteristics primarily include kinematic features in the sequences of seismic events of various magnitudes and an ordered migration of seismic activation, enabling the more or less reliable determination of the occurrence time intervals (years) and areas of forthcoming large earthquakes (magnitudes of 7.0 ± 0.2, 7.5 ± 0.2, and 8.0 ± 0.2).
Using Deep Learning Model for Meteorological Satellite Cloud Image Prediction
NASA Astrophysics Data System (ADS)
Su, X.
2017-12-01
A satellite cloud image contains much weather information, such as precipitation information. Short-term cloud movement forecasting is important for precipitation forecasting and is the primary means of typhoon monitoring. Traditional methods mostly use cloud feature matching and linear extrapolation to predict cloud movement, so nonstationary processes such as inversion and deformation during the movement of the cloud are largely not considered. Predicting cloud movement in a timely and accurate manner remains a hard task. Because deep learning models can perform well in learning spatiotemporal features, we regard cloud image prediction as a spatiotemporal sequence forecasting problem and introduce a deep learning model to solve it. In this research, we use a variant of the Gated Recurrent Unit (GRU) that has convolutional structures to deal with spatiotemporal features and build an end-to-end model to solve this forecasting problem. In this model, both the input and output are spatiotemporal sequences. Compared to the Convolutional LSTM (ConvLSTM) model, this model has fewer parameters. We apply this model to GOES satellite data, and the model performs well.
Velankar, Sameer; Kryshtafovych, Andriy; Huang, Shen‐You; Schneidman‐Duhovny, Dina; Sali, Andrej; Segura, Joan; Fernandez‐Fuentes, Narcis; Viswanath, Shruthi; Elber, Ron; Grudinin, Sergei; Popov, Petr; Neveu, Emilie; Lee, Hasup; Baek, Minkyung; Park, Sangwoo; Heo, Lim; Rie Lee, Gyu; Seok, Chaok; Qin, Sanbo; Zhou, Huan‐Xiang; Ritchie, David W.; Maigret, Bernard; Devignes, Marie‐Dominique; Ghoorah, Anisah; Torchala, Mieczyslaw; Chaleil, Raphaël A.G.; Bates, Paul A.; Ben‐Zeev, Efrat; Eisenstein, Miriam; Negi, Surendra S.; Weng, Zhiping; Vreven, Thom; Pierce, Brian G.; Borrman, Tyler M.; Yu, Jinchao; Ochsenbein, Françoise; Guerois, Raphaël; Vangone, Anna; Rodrigues, João P.G.L.M.; van Zundert, Gydo; Nellen, Mehdi; Xue, Li; Karaca, Ezgi; Melquiond, Adrien S.J.; Visscher, Koen; Kastritis, Panagiotis L.; Bonvin, Alexandre M.J.J.; Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Li, Jilong; Ma, Zhiwei; Cheng, Jianlin; Zou, Xiaoqin; Shen, Yang; Peterson, Lenna X.; Kim, Hyung‐Rae; Roy, Amit; Han, Xusi; Esquivel‐Rodriguez, Juan; Kihara, Daisuke; Yu, Xiaofeng; Bruce, Neil J.; Fuller, Jonathan C.; Wade, Rebecca C.; Anishchenko, Ivan; Kundrotas, Petras J.; Vakser, Ilya A.; Imai, Kenichiro; Yamada, Kazunori; Oda, Toshiyuki; Nakamura, Tsukasa; Tomii, Kentaro; Pallara, Chiara; Romero‐Durana, Miguel; Jiménez‐García, Brian; Moal, Iain H.; Férnandez‐Recio, Juan; Joung, Jong Young; Kim, Jong Yun; Joo, Keehyoung; Lee, Jooyoung; Kozakov, Dima; Vajda, Sandor; Mottarella, Scott; Hall, David R.; Beglov, Dmitri; Mamonov, Artem; Xia, Bing; Bohnuud, Tanggis; Del Carpio, Carlos A.; Ichiishi, Eichiro; Marze, Nicholas; Kuroda, Daisuke; Roy Burman, Shourya S.; Gray, Jeffrey J.; Chermak, Edrisse; Cavallo, Luigi; Oliva, Romina; Tovchigrechko, Andrey
2016-01-01
ABSTRACT We present the results for CAPRI Round 30, the first joint CASP‐CAPRI experiment, which brought together experts from the protein structure prediction and protein–protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact‐sized interfaces, represented a confounding factor. For those, a much poorer prediction performance was achieved, while nonetheless often providing helpful clues on the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology‐built subunit models and the smaller pair‐wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323–348. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:27122118
Computational approaches for predicting biomedical research collaborations.
Zhang, Qing; Yu, Hong
2014-01-01
Biomedical research is increasingly collaborative, and successful collaborations often produce high impact work. Computational approaches can be developed for automatically predicting biomedical research collaborations. Previous works of collaboration prediction mainly explored the topological structures of research collaboration networks, leaving out rich semantic information from the publications themselves. In this paper, we propose supervised machine learning approaches to predict research collaborations in the biomedical field. We explored both the semantic features extracted from author research interest profile and the author network topological features. We found that the most informative semantic features for author collaborations are related to research interest, including similarity of out-citing citations, similarity of abstracts. Of the four supervised machine learning models (naïve Bayes, naïve Bayes multinomial, SVMs, and logistic regression), the best performing model is logistic regression with an ROC ranging from 0.766 to 0.980 on different datasets. To our knowledge we are the first to study in depth how research interest and productivities can be used for collaboration prediction. Our approach is computationally efficient, scalable and yet simple to implement. The datasets of this study are available at https://github.com/qingzhanggithub/medline-collaboration-datasets.
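The ROC figures quoted in this abstract are presumably areas under the ROC curve. That quantity equals the probability that a randomly chosen positive pair (authors who did collaborate) receives a higher score than a randomly chosen negative pair, with ties counted as one half. A minimal sketch of that pairwise definition follows; the scores and labels are invented for illustration and are not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted collaboration scores, higher means more likely
    labels: 1 for a true collaboration, 0 otherwise
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four candidate author pairs.
EXAMPLE_SCORES = [0.9, 0.8, 0.4, 0.3]
EXAMPLE_LABELS = [1, 0, 1, 0]
```

The double loop is quadratic in the number of examples, which is fine for a sketch; a rank-based formulation would be preferable on large datasets.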
KFC Server: interactive forecasting of protein interaction hot spots.
Darnell, Steven J; LeGault, Laura; Mitchell, Julie C
2008-07-01
The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, or the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user submitted protein-protein or protein-DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts if the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer able to quickly highlight predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org.
KFC Server: interactive forecasting of protein interaction hot spots
Darnell, Steven J.; LeGault, Laura; Mitchell, Julie C.
2008-01-01
The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model, a machine learning approach for the prediction of binding hot spots, or the subset of residues that account for most of a protein interface's binding free energy. The server facilitates the automated analysis of a user submitted protein–protein or protein–DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts if the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer able to quickly highlight predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org. PMID:18539611
The motor origins of human and avian song structure
Tierney, Adam T.; Russo, Frank A.; Patel, Aniruddh D.
2011-01-01
Human song exhibits great structural diversity, yet certain aspects of melodic shape (how pitch is patterned over time) are widespread. These include a predominance of arch-shaped and descending melodic contours in musical phrases, a tendency for phrase-final notes to be relatively long, and a bias toward small pitch movements between adjacent notes in a melody [Huron D (2006) Sweet Anticipation: Music and the Psychology of Expectation (MIT Press, Cambridge, MA)]. What is the origin of these features? We hypothesize that they stem from motor constraints on song production (i.e., the energetic efficiency of their underlying motor actions) rather than being innately specified. One prediction of this hypothesis is that any animals subject to similar motor constraints on song will exhibit similar melodic shapes, no matter how distantly related those animals are to humans. Conversely, animals who do not share similar motor constraints on song will not exhibit convergent melodic shapes. Birds provide an ideal case for testing these predictions, because their peripheral mechanisms of song production have both notable similarities and differences from human vocal mechanisms [Riede T, Goller F (2010) Brain Lang 115:69–80]. We use these similarities and differences to make specific predictions about shared and distinct features of human and avian song structure and find that these predictions are confirmed by empirical analysis of diverse human and avian song samples. PMID:21876156
Impact of mutations on the allosteric conformational equilibrium
Weinkam, Patrick; Chen, Yao Chi; Pons, Jaume; Sali, Andrej
2012-01-01
Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server: http://salilab.org/allosmod/. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction. PMID:23228330
2010-09-01
Figure 23. Flow Type and the reference empirical model. Figure 24. Baseline...Trajectory. Figure 25. Flow Features Important... GLOSSARY: ACCTE, Advanced Ceramic Composites for Turbine Engines; AFRL, Air Force Research Laboratory; AoA, Angle of Attack; ASE
Lim, Chun Shen; Brown, Chris M
2017-01-01
Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community.
Lim, Chun Shen; Brown, Chris M.
2018-01-01
Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community. PMID:29354101
Understanding the biological underpinnings of ecohydrological processes
NASA Astrophysics Data System (ADS)
Huxman, T. E.; Scott, R. L.; Barron-Gafford, G. A.; Hamerlynck, E. P.; Jenerette, D.; Tissue, D. T.; Breshears, D. D.; Saleska, S. R.
2012-12-01
Climate change presents a challenge for predicting ecosystem response, as multiple factors drive both the physical and life processes happening on the land surface and their interactions result in a complex, evolving coupled system. For example, changes in surface temperature and precipitation influence near-surface hydrology through impacts on system energy balance, affecting a range of physical processes. These changes in the salient features of the environment affect biological processes and elicit responses along the hierarchy of life (biochemistry to community composition). Many of these structural or process changes can alter patterns of soil water-use and influence land surface characteristics that affect local climate. Of the many features that affect our ability to predict the future dynamics of ecosystems, it is this hierarchical response of life that creates substantial complexity. Advances in the ability to predict or understand aspects of demography help describe thresholds in coupled ecohydrological system. Disentangling the physical and biological features that underlie land surface dynamics following disturbance are allowing a better understanding of the partitioning of water in the time-course of recovery. Better predicting the timing of phenology and key seasonal events allow for a more accurate description of the full functional response of the land surface to climate. In addition, explicitly considering the hierarchical structural features of life are helping to describe complex time-dependent behavior in ecosystems. However, despite this progress, we have yet to build an ability to fully account for the generalization of the main features of living systems into models that can describe ecohydrological processes, especially acclimation, assembly and adaptation. This is unfortunate, given that many key ecosystem services are functions of these coupled co-evolutionary processes. 
To date, the lack of both controlled measurements and experimentation has precluded sufficient theoretical development. Understanding the land-surface response and feedback to climate change requires a mechanistic understanding of the coupling of ecological and hydrological processes and an expansion of theory from the life sciences to appropriately contribute to the broader Earth system science goal.
Link Prediction in Evolving Networks Based on Popularity of Nodes.
Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian
2017-08-02
Link prediction aims to uncover the underlying relationships behind networks, which can be used to predict missing edges or identify spurious edges. The key issue in link prediction is estimating the likelihood of potential links in networks. Most classical static-structure-based methods ignore the temporal aspects of networks; limited by time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose the hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes are much more likely to attract future links. A novel approach named the popularity based structural perturbation method (PBSPM) and its fast algorithm are then proposed to characterize the likelihood of an edge from both the existing connectivity structure and the current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. In addition, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.
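The idea of combining connectivity structure with the current popularity of the two endpoints can be illustrated with a much simpler score than PBSPM. The sketch below is not the structural perturbation method from the paper. As an assumed stand-in, it multiplies the common-neighbour count of a candidate pair by an activity factor derived from how many links each endpoint gained in a recent time window. The function name, the scoring formula and the example network are all hypothetical.

```python
from itertools import combinations


def score_candidate_links(edges, recent_edges):
    """Rank non-adjacent node pairs by common neighbours times endpoint activity.

    edges: all known undirected links as (u, v) pairs
    recent_edges: links formed in the latest time window
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Activity = number of recent links touching a node (a crude popularity proxy).
    activity = {}
    for u, v in recent_edges:
        activity[u] = activity.get(u, 0) + 1
        activity[v] = activity.get(v, 0) + 1

    scores = {}
    for u, v in combinations(sorted(neighbours), 2):
        if v in neighbours[u]:
            continue  # already linked, not a candidate
        common = len(neighbours[u] & neighbours[v])
        scores[(u, v)] = common * (1 + activity.get(u, 0)) * (1 + activity.get(v, 0))

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical five-node network in which the link c-e was formed most recently.
EXAMPLE_EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]
EXAMPLE_RECENT = [("c", "e")]
```

In this toy network the pairs (a, d) and (b, c) both have two common neighbours, but (b, c) ranks higher because node c was recently active, which is the qualitative behaviour the abstract describes.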
Coutinho, Eduardo; Cangelosi, Angelo
2011-08-01
We maintain that the structure of affect elicited by music is largely dependent on dynamic temporal patterns in low-level music structural parameters. In support of this claim, we have previously provided evidence that spatiotemporal dynamics in psychoacoustic features resonate with two psychological dimensions of affect underlying judgments of subjective feelings: arousal and valence. In this article we extend our previous investigations in two respects. First, we focus on the emotions experienced rather than perceived while listening to music. Second, we evaluate the extent to which peripheral feedback in music can account for the predicted emotional responses, that is, the role of physiological arousal in determining the intensity and valence of musical emotions. In line with our previous findings, we show that a significant part of the listeners' reported emotions can be predicted from a set of six psychoacoustic features: loudness, pitch level, pitch contour, tempo, texture, and sharpness. Furthermore, the accuracy of those predictions is improved by including physiological cues: skin conductance and heart rate. The interdisciplinary work presented here provides a new methodology for the field of music and emotion research, based on the combination of computational and experimental work, which aids the analysis of emotional responses to music while offering a platform for the abstract representation of those complex relationships. Future developments may aid specific areas, such as psychology and music therapy, by providing coherent descriptions of the emotional effects of specific music stimuli. 2011 APA, all rights reserved
NASA Astrophysics Data System (ADS)
BozorgMagham, Amir E.; Ross, Shane D.; Schmale, David G.
2013-09-01
The language of Lagrangian coherent structures (LCSs) provides a new means for studying transport and mixing of passive particles advected by an atmospheric flow field. Recent observations suggest that LCSs govern the large-scale atmospheric motion of airborne microorganisms, paving the way for more efficient models and management strategies for the spread of infectious diseases affecting plants, domestic animals, and humans. In addition, having reliable predictions of the timing of hyperbolic LCSs may contribute to improved aerobiological sampling of microorganisms with unmanned aerial vehicles and LCS-based early warning systems. Chaotic atmospheric dynamics lead to unavoidable forecasting errors in the wind velocity field, which compounds errors in LCS forecasting. In this study, we reveal the cumulative effects of errors of (short-term) wind field forecasts on the finite-time Lyapunov exponent (FTLE) fields and the associated LCSs when realistic forecast plans impose certain limits on the forecasting parameters. Objectives of this paper are to (a) quantify the accuracy of prediction of FTLE-LCS features and (b) determine the sensitivity of such predictions to forecasting parameters. Results indicate that forecasts of attracting LCSs exhibit less divergence from the archive-based LCSs than the repelling features. This result is important since attracting LCSs are the backbone of long-lived features in moving fluids. We also show under what circumstances one can trust the forecast results if one merely wants to know if an LCS passed over a region and does not need to precisely know the passage time.
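The finite-time Lyapunov exponent used in this abstract can be computed from the gradient F of the flow map over an integration time T as FTLE = ln(sqrt(lambda_max(F^T F))) / |T|. Ridges of the forward-time field are commonly associated with repelling LCSs and ridges of the backward-time field with attracting LCSs. A minimal sketch follows. It assumes a steady analytic saddle flow (dx/dt = x, dy/dt = -y) in place of forecast wind fields, a fixed-step RK4 integrator and central differences for the gradient; all function names are hypothetical.

```python
import math


def saddle_velocity(x, y):
    """Steady linear saddle flow, a toy stand-in for a wind field."""
    return x, -y


def advect(x, y, duration, velocity=saddle_velocity, steps=1000):
    """Integrate a particle for `duration` (negative = backward) with RK4."""
    h = duration / steps
    for _ in range(steps):
        k1x, k1y = velocity(x, y)
        k2x, k2y = velocity(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
        k3x, k3y = velocity(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
        k4x, k4y = velocity(x + h * k3x, y + h * k3y)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
    return x, y


def ftle(x, y, duration, delta=1e-4):
    """Finite-time Lyapunov exponent at (x, y) for the given integration time."""
    xr, yr = advect(x + delta, y, duration)
    xl, yl = advect(x - delta, y, duration)
    xu, yu = advect(x, y + delta, duration)
    xd, yd = advect(x, y - delta, duration)
    # Flow-map gradient F by central differences.
    a = (xr - xl) / (2 * delta)  # dX/dx
    b = (xu - xd) / (2 * delta)  # dX/dy
    c = (yr - yl) / (2 * delta)  # dY/dx
    d = (yu - yd) / (2 * delta)  # dY/dy
    # Cauchy-Green tensor C = F^T F (symmetric 2x2).
    c11 = a * a + c * c
    c12 = a * b + c * d
    c22 = b * b + d * d
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_max = 0.5 * (c11 + c22) + math.sqrt((0.5 * (c11 - c22)) ** 2 + c12 ** 2)
    return math.log(math.sqrt(lam_max)) / abs(duration)
```

For this saddle flow the stretching rate is 1 everywhere, so `ftle(0.3, 0.2, 2.0)` should come out very close to 1, which makes the flow a convenient sanity check. With gridded forecast winds, `saddle_velocity` would be replaced by an interpolated, time-dependent velocity, and forecast errors would enter through that function.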
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bajaj, R. Alexandra; Arbing, Mark A.; Shin, Annie
The structure of Msmeg_6760, a protein of unknown function, has been determined. Biochemical and bioinformatics analyses determined that Msmeg_6760 interacts with a protein encoded in the same operon, Msmeg_6762, and predicted that the operon is a toxin–antitoxin (TA) system. Structural comparison of Msmeg_6760 with proteins of known function suggests that Msmeg_6760 binds a hydrophobic ligand in a buried cavity lined by large hydrophobic residues. Access to this cavity could be controlled by a gate–latch mechanism. The function of the Msmeg_6760 toxin is unknown, but structure-based predictions revealed that Msmeg_6760 and Msmeg_6762 are homologous to Rv2034 and Rv2035, a predicted novel TA system involved in Mycobacterium tuberculosis latency during macrophage infection. The Msmeg_6760 toxin fold has not been previously described for bacterial toxins and its unique structural features suggest that toxin activation is likely to be mediated by a novel mechanism.
Antunes, Deborah; Jorge, Natasha A. N.; Caffarena, Ernesto R.; Passetti, Fabio
2018-01-01
RNA molecules are essential players in many fundamental biological processes. Prokaryotes and eukaryotes have distinct RNA classes with specific structural features and functional roles. Computational prediction of protein structures is a research field in which high-confidence three-dimensional protein models can be proposed based on the sequence alignment between target and templates. However, to date, only a few approaches have been developed for the computational prediction of RNA structures. Similar to proteins, RNA structures may be altered due to the interaction with various ligands, including proteins, other RNAs, and metabolites. A riboswitch is a molecular mechanism, found in the three kingdoms of life, in which the RNA structure is modified by the binding of a metabolite. It can regulate multiple gene expression mechanisms, such as transcription, translation initiation, and mRNA splicing and processing. Due to their nature, these entities also act on the regulation of gene expression and detection of small metabolites and have the potential to help in the discovery of new classes of antimicrobial agents. In this review, we describe software and web servers currently available for riboswitch aptamer identification and secondary and tertiary structure prediction, including applications. PMID:29403526
Discrimination Enhancement with Transient Feature Analysis of a Graphene Chemical Sensor.
Nallon, Eric C; Schnee, Vincent P; Bright, Collin J; Polcha, Michael P; Li, Qiliang
2016-01-19
A graphene chemical sensor is subjected to a set of structurally and chemically similar hydrocarbon compounds consisting of toluene, o-xylene, p-xylene, and mesitylene. The fractional change in resistance of the sensor upon exposure to these compounds exhibits a similar response magnitude among compounds, whereas large variation is observed within repetitions for each compound, causing a response overlap. Therefore, traditional features depending on maximum response change will cause confusion during further discrimination and classification analysis. More robust features that are less sensitive to concentration, sampling, and drift variability would provide higher quality information. In this work, we have explored the advantage of using transient-based exponential fitting coefficients to enhance the discrimination of similar compounds. The advantage of such feature analysis for discriminating each compound is evaluated using principal component analysis (PCA). In addition, machine learning-based classification algorithms were used to compare the prediction accuracies when using fitting coefficients as features. The additional features greatly enhanced the discrimination between compounds while performing PCA and also improved the prediction accuracy by 34% when using linear discriminant analysis.
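To make the idea of transient-based exponential fitting coefficients concrete, the following is a minimal sketch that fits a single decaying exponential y = A·exp(-k·t) by least squares on log(y) and uses (A, k) as a feature vector. The single-exponential form, the function name `fit_exponential`, and the synthetic data are assumptions made for illustration; the abstract does not state which fitting model the authors used.

```python
import math

def fit_exponential(times, values):
    """Fit y = A * exp(-k * t) by linear least squares on log(y).

    Assumes every value is strictly positive. Returns (A, k).
    """
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_t
    return math.exp(intercept), -slope

# Synthetic transient with amplitude 2.0 and rate constant 0.5 (hypothetical values)
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [2.0 * math.exp(-0.5 * t) for t in times]
amplitude, rate = fit_exponential(times, values)
features = [amplitude, rate]  # fitted coefficients serve as the feature vector
```

Unlike a maximum-response feature, the rate constant in this sketch does not change when the whole curve is rescaled, which is one plausible reason such coefficients could be less sensitive to concentration.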
DOE Office of Scientific and Technical Information (OSTI.GOV)
Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong
Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.
NASA Astrophysics Data System (ADS)
Rudskoy, A. I.; Kondrat'ev, S. Yu.; Sokolov, Yu. A.
2016-05-01
Possibilities of electron beam synthesis of structural and tool composite materials are considered. It is shown that a novel process involving mathematical modeling of each individual operation makes it possible to create materials with programmable structure and predictable properties from granules of various specified chemical compositions and sizes.
ERIC Educational Resources Information Center
Petrov, Mark G.
2016-01-01
Thermally activated analysis of experimental data allows the structural features of each material to be considered. By modelling the structural heterogeneity of materials by means of rheological models, general and local plastic flows in metals and alloys can be described. Based on physical fundamentals of failure and deformation of materials…
Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing
2016-01-01
Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have therapeutic potential for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthews Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651
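The averaging step and the reported evaluation measures can be illustrated with a short sketch. The function names, the four classifier outputs, and the confusion counts below are all hypothetical; the counts were simply chosen as a balanced 100/100 example whose ratios match the reported sensitivity and specificity, since the abstract does not give the class sizes.

```python
import math

def ensemble_average(probabilities):
    """Average the positive-class probabilities from several base classifiers."""
    return sum(probabilities) / len(probabilities)

def confusion_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc

# Hypothetical outputs of four base classifiers for one protein
score = ensemble_average([0.9, 0.7, 0.8, 0.6])
label = 1 if score >= 0.5 else 0  # 0.5 threshold is an assumption

# Hypothetical confusion counts, not taken from the paper
sens, spec, acc, mcc = confusion_metrics(tp=95, fp=7, tn=93, fn=5)
```

Because MCC uses all four cells of the confusion matrix, it stays informative when the two classes are of very different sizes, which is why it is often reported alongside accuracy.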
2012-01-01
Background: There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. Results: The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. Conclusions: In addition to confirming literature results, ProGolem’s model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners. PMID:22783946
Structured Light-Based 3D Reconstruction System for Plants
Nguyen, Thuy Tuong; Slaughter, David C.; Max, Nelson; Maloof, Julin N.; Sinha, Neelima
2015-01-01
Camera-based 3D reconstruction of physical objects is one of the most popular computer vision trends in recent years. Many systems have been built to model different real-world subjects, but there is a lack of a completely robust system for plants. This paper presents a full 3D reconstruction system that incorporates both hardware structures (including the proposed structured light system to enhance textures on object surfaces) and software algorithms (including the proposed 3D point cloud registration and plant feature measurement). This paper demonstrates the ability to produce 3D models of whole plants created from multiple pairs of stereo images taken at different viewing angles, without the need to destructively cut away any parts of a plant. The ability to accurately predict phenotyping features, such as the number of leaves, plant height, leaf size and internode distances, is also demonstrated. Experimental results show that, for plants having a range of leaf sizes and a distance between leaves appropriate for the hardware design, the algorithms successfully predict phenotyping features in the target crops, with a recall of 0.97 and a precision of 0.89 for leaf detection and less than a 13-mm error for plant size, leaf size and internode distance. PMID:26230701
Soft Computing Methods for Disulfide Connectivity Prediction.
Márquez-Chamorro, Alfonso E; Aguilar-Ruiz, Jesús S
2015-01-01
The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists in identifying which nonadjacent cysteines would be cross-linked from all possible candidates. Determining the disulfide bond connectivity between the cysteines of a protein is desirable as a preliminary step to 3D PSP, because it greatly reduces the protein conformational search space. The most representative soft computing approaches for the disulfide bond connectivity prediction problem of the last decade are summarized in this paper. Certain aspects, such as the different methodologies based on soft computing approaches (artificial neural network or support vector machine) or features of the algorithms, are used for the classification of these methods.
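The size of the candidate space in disulfide connectivity prediction can be seen by enumerating every way of pairing up the cysteines. The sketch below assumes that every cysteine takes part in exactly one bond (so the number of cysteines is even) and ignores the nonadjacency constraint; the function name and the residue positions are hypothetical.

```python
def connectivity_patterns(cysteines):
    """Yield every way to pair up all cysteines (perfect matchings).

    Assumes each cysteine forms exactly one bond; an odd-length input
    therefore yields no patterns.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in connectivity_patterns(remaining):
            yield [(first, partner)] + pattern

positions = [3, 14, 27, 41]  # hypothetical cysteine positions in a sequence
patterns = list(connectivity_patterns(positions))
```

For 2n bonded cysteines the number of patterns is (2n-1)!! = 1·3·5·…·(2n-1), so four cysteines give 3 candidates and six give 15. A predictor typically scores each candidate pair and then picks the best-scoring pattern among these.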
NASA Astrophysics Data System (ADS)
Kawata, Y.; Niki, N.; Ohmatsu, H.; Aokage, K.; Kusumoto, M.; Tsuchida, T.; Eguchi, K.; Kaneko, M.
2015-03-01
Advantages of CT scanners with high resolution have allowed the improved detection of lung cancers. The recent release of positive results from the National Lung Screening Trial (NLST) in the US showed that CT screening does in fact have a positive impact on the reduction of lung cancer related mortality. While this study does show the efficacy of CT based screening, physicians often face the problems of deciding appropriate management strategies for maximizing patient survival and for preserving lung function. Several key manifold-learning approaches efficiently reveal intrinsic low-dimensional structures latent in high-dimensional data spaces. This study was performed to investigate whether the dimensionality reduction can identify embedded structures from the CT histogram feature of non-small-cell lung cancer (NSCLC) space to improve the performance in predicting the likelihood of RFS for patients with NSCLC.
Optimal Design of Experiments by Combining Coarse and Fine Measurements
NASA Astrophysics Data System (ADS)
Lee, Alpha A.; Brenner, Michael P.; Colwell, Lucy J.
2017-11-01
In many contexts, it is extremely costly to perform enough high-quality experimental measurements to accurately parametrize a predictive quantitative model. However, it is often much easier to carry out large numbers of experiments that indicate whether each sample is above or below a given threshold. Can many such categorical or "coarse" measurements be combined with a much smaller number of high-resolution or "fine" measurements to yield accurate models? Here, we demonstrate an intuitive strategy, inspired by statistical physics, wherein the coarse measurements are used to identify the salient features of the data, while the fine measurements determine the relative importance of these features. A linear model is inferred from the fine measurements, augmented by a quadratic term that captures the correlation structure of the coarse data. We illustrate our strategy by considering the problems of predicting the antimalarial potency and aqueous solubility of small organic molecules from their 2D molecular structure.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).
Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C
2015-01-01
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
The persuasion network is modulated by drug-use risk and predicts anti-drug message effectiveness.
Huskey, Richard; Mangus, J Michael; Turner, Benjamin O; Weber, René
2017-12-01
While a persuasion network has been proposed, little is known about how network connections between brain regions contribute to attitude change. Two possible mechanisms have been advanced. One hypothesis predicts that attitude change results from increased connectivity between structures implicated in affective and executive processing in response to increases in argument strength. A second functional perspective suggests that highly arousing messages reduce connectivity between structures implicated in the encoding of sensory information, which disrupts message processing and thereby inhibits attitude change. However, persuasion is a multi-determined construct that results from both message features and audience characteristics. Therefore, persuasive messages should lead to specific functional connectivity patterns among a priori defined structures within the persuasion network. The present study exposed 28 subjects to anti-drug public service announcements where arousal, argument strength, and subject drug-use risk were systematically varied. Psychophysiological interaction analyses provide support for the affective-executive hypothesis but not for the encoding-disruption hypothesis. Secondary analyses show that video-level connectivity patterns among structures within the persuasion network predict audience responses in independent samples (one college-aged, one nationally representative). We propose that persuasion neuroscience research is best advanced by considering network-level effects while accounting for interactions between message features and target audience characteristics. © The Author (2017). Published by Oxford University Press.
Fast metabolite identification with Input Output Kernel Regression.
Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho
2016-06-15
An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting in mapping back the predicted output feature vectors to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Contact: celine.brouard@aalto.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
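The two phases described in this abstract can be sketched with kernel ridge regression followed by a search over candidate outputs. This is only an illustration under stated assumptions: the Gaussian input kernel, the Tanimoto output kernel on fingerprint sets, the regularization value, the function names, and the toy data are all choices made here, not details taken from the paper.

```python
import math

def gaussian_kernel(a, b, gamma=1.0):
    """Input kernel on numeric vectors (assumed stand-in for a spectrum kernel)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def tanimoto_kernel(a, b):
    """Output kernel on binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def predict(train_x, train_y, query, candidates, lam=1e-3):
    n = len(train_x)
    # Phase 1: ridge regression into the output feature space, kept implicit
    # as weights over training outputs: w = (K_x + lam * I)^-1 k_x(query)
    gram = []
    for i in range(n):
        row = [gaussian_kernel(train_x[i], train_x[j]) for j in range(n)]
        row[i] += lam
        gram.append(row)
    k_query = [gaussian_kernel(x, query) for x in train_x]
    weights = solve(gram, k_query)

    # Phase 2: preimage step, pick the candidate closest to the prediction.
    # With a normalized output kernel (k(y, y) = 1) this is an argmax of the
    # inner product between the candidate and the predicted feature vector.
    def score(candidate):
        return sum(w * tanimoto_kernel(y, candidate)
                   for w, y in zip(weights, train_y))

    return max(candidates, key=score)

# Toy data: 2-D "spectra" and fingerprint bit sets (hypothetical)
train_x = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
train_y = [frozenset({1, 2}), frozenset({2, 3}), frozenset({7, 8})]
best = predict(train_x, train_y, query=(0.1, 0.0), candidates=train_y)
```

Training here amounts to one linear solve shared by all output dimensions, which gives some intuition for why an approach of this shape can be fast compared with training one predictor per fingerprint bit.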
Fast metabolite identification with Input Output Kernel Regression
Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho
2016-01-01
Motivation: An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting in mapping back the predicted output feature vectors to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307628
Return to Work After Lumbar Microdiscectomy - Personalizing Approach Through Predictive Modeling.
Papić, Monika; Brdar, Sanja; Papić, Vladimir; Lončar-Turukalo, Tatjana
2016-01-01
Lumbar disc herniation (LDH) is the most common disease among the working population requiring surgical intervention. This study aims to predict the return to work after operative treatment of LDH based on an observational study including 153 patients. The classification problem was approached using decision trees (DT), support vector machines (SVM) and multilayer perceptron (MLP) combined with the RELIEF algorithm for feature selection. MLP provided the best recall of 0.86 for the class of patients not returning to work, which combined with the selected features enables early identification and personalized targeted interventions towards subjects at risk of prolonged disability. The predictive modeling pointed to the most decisive risk factors in prolongation of work absence: psychosocial factors, mobility of the spine and structural changes of facet joints and professional factors including standing, sitting and microclimate.
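For readers unfamiliar with RELIEF-style feature selection, the following is a minimal sketch of the basic two-class weighting scheme: a feature gains weight when it differs between a sample and its nearest neighbor of the other class, and loses weight when it differs from the nearest neighbor of the same class. This version visits every sample once instead of drawing random samples, assumes numeric features already scaled to [0, 1] and at least two samples per class, and uses toy data; none of this is taken from the study itself.

```python
def relief_weights(samples, labels):
    """Basic Relief weights for two classes, iterating over every sample."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    m = len(samples)

    def distance(a, b):
        # Manhattan distance over features scaled to [0, 1]
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (sample, label) in enumerate(zip(samples, labels)):
        hits = [s for j, s in enumerate(samples) if j != i and labels[j] == label]
        misses = [s for j, s in enumerate(samples) if labels[j] != label]
        near_hit = min(hits, key=lambda s: distance(sample, s))
        near_miss = min(misses, key=lambda s: distance(sample, s))
        for f in range(n_features):
            # Reward separation from the other class, penalize spread within the class
            weights[f] += (abs(sample[f] - near_miss[f])
                           - abs(sample[f] - near_hit[f])) / m
    return weights

# Toy data: feature 0 separates the classes, feature 1 is noise (hypothetical)
samples = [(0.0, 0.2), (0.1, 0.9), (0.9, 0.1), (1.0, 0.8)]
labels = [0, 0, 1, 1]
weights = relief_weights(samples, labels)
ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
```

Features would then be kept in `ranked` order until classifier performance stops improving.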
Goldstein-Piekarski, Andrea N.; Greer, Stephanie M.; Stark, Shauna; Stark, Craig E.
2016-01-01
Sleep deprivation impairs the formation of new memories. However, marked interindividual variability exists in the degree to which sleep loss compromises learning, the mechanistic reasons for which are unclear. Furthermore, which physiological sleep processes restore learning ability following sleep deprivation are similarly unknown. Here, we demonstrate that the structural morphology of human hippocampal subfields represents one factor determining vulnerability (and conversely, resilience) to the impact of sleep deprivation on memory formation. Moreover, this same measure of brain morphology was further associated with the quality of nonrapid eye movement slow wave oscillations during recovery sleep, and by way of such activity, determined the success of memory restoration. Such findings provide a novel human biomarker of cognitive susceptibility to, and recovery from, sleep deprivation. Moreover, this metric may be of special predictive utility for professions in which memory function is paramount yet insufficient sleep is pervasive (e.g., aviation, military, and medicine). SIGNIFICANCE STATEMENT Sleep deprivation does not impact all people equally. Some individuals show cognitive resilience to the effects of sleep loss, whereas others express striking vulnerability, the reasons for which remain largely unknown. Here, we demonstrate that structural features of the human brain, specifically those within the hippocampus, accurately predict which individuals are susceptible (or conversely, resilient) to memory impairments caused by sleep deprivation. Moreover, this same structural feature determines the success of memory restoration following subsequent recovery sleep. Therefore, structural properties of the human brain represent a novel biomarker predicting individual vulnerability to (and recovery from) the effects of sleep loss, one with occupational relevance in professions where insufficient sleep is pervasive yet memory function is paramount. PMID:26911684
Conceptual Hierarchies in a Flat Attractor Network
O’Connor, Christopher M.; Cree, George S.; McRae, Ken
2009-01-01
The structure of people’s conceptual knowledge of concrete nouns has traditionally been viewed as hierarchical (Collins & Quillian, 1969). For example, superordinate concepts (vegetable) are assumed to reside at a higher level than basic-level concepts (carrot). A feature-based attractor network with a single layer of semantic features developed representations of both basic-level and superordinate concepts. No hierarchical structure was built into the network. In Experiment and Simulation 1, the graded structure of categories (typicality ratings) is accounted for by the flat attractor-network. Experiment and Simulation 2 show that, as with basic-level concepts, such a network predicts feature verification latencies for superordinate concepts (vegetable
Fatigue Analyses Under Constant- and Variable-Amplitude Loading Using Small-Crack Theory
NASA Technical Reports Server (NTRS)
Newman, J. C., Jr.; Phillips, E. P.; Everett, R. A., Jr.
1999-01-01
Studies on the growth of small cracks have led to the observation that fatigue life of many engineering materials is primarily "crack growth" from micro-structural features, such as inclusion particles, voids, slip-bands or from manufacturing defects. This paper reviews the capabilities of a plasticity-induced crack-closure model to predict fatigue lives of metallic materials using "small-crack theory" under various loading conditions. Constraint factors, to account for three-dimensional effects, were selected to correlate large-crack growth rate data as a function of the effective stress-intensity factor range (delta-Keff) under constant-amplitude loading. Modifications to the delta-Keff-rate relations in the near-threshold regime were needed to fit measured small-crack growth rate behavior. The model was then used to calculate small-and large-crack growth rates, and to predict total fatigue lives, for notched and un-notched specimens under constant-amplitude and spectrum loading. Fatigue lives were predicted using crack-growth relations and micro-structural features like those that initiated cracks in the fatigue specimens for most of the materials analyzed. Results from the tests and analyses agreed well.
A systematic investigation of computation models for predicting Adverse Drug Reactions (ADRs).
Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong
2014-01-01
Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have been earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, we found that the final formulas of these algorithms could all be converted to a linear form. Based on this finding, we propose a new algorithm called the general weighted profile method, which yielded the best overall performance among the algorithms investigated in this paper. Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms.
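The role of the Jaccard coefficient in drug-ADR prediction can be illustrated with a similarity-weighted vote: each known drug votes for its own ADRs with a weight equal to its Jaccard similarity to the query drug. This is a generic sketch loosely in the spirit of a weighted profile, not the paper's general weighted profile method, whose formula the abstract does not give. The drug names, target sets, and ADR labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_adr_scores(query_features, known_drugs):
    """Score each ADR by Jaccard-weighted votes from drugs with known ADRs."""
    scores = {}
    for features, adrs in known_drugs.values():
        similarity = jaccard(query_features, features)
        for adr in adrs:
            scores[adr] = scores.get(adr, 0.0) + similarity
    return scores

# Hypothetical drugs: (feature set, known ADR set)
known_drugs = {
    "drug_a": ({"T1", "T2"}, {"nausea"}),
    "drug_b": ({"T2", "T3"}, {"nausea", "rash"}),
    "drug_c": ({"T9"}, {"headache"}),
}
scores = predict_adr_scores({"T1", "T2", "T3"}, known_drugs)
```

The score for each ADR is a linear combination of known associations with similarity weights, which is the kind of linear form the abstract says the compared algorithms reduce to.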
Integrated Optical Design Analysis (IODA): New Test Data and Modeling Features
NASA Technical Reports Server (NTRS)
Moore, Jim; Troy, Ed; Patrick, Brian
2003-01-01
A general overview of the capabilities of the IODA ("Integrated Optical Design Analysis") software is presented. IODA promotes efficient exchange of data and modeling results between thermal, structures, optical design, and testing engineering disciplines. This presentation focuses on new features added to the software that allow measured test data to be imported into the IODA environment for post processing or comparisons with pretest model predictions.
Shatabda, Swakkhar; Saha, Sanjay; Sharma, Alok; Dehzangi, Abdollah
2017-12-21
Bacteriophages are viruses that can significantly affect the functioning of bacteria and can be used in phage-based therapy. The functioning of a bacteriophage in host bacteria depends on its location in the host cells. It is very important to know the subcellular location of the phage proteins in a host cell in order to understand their working mechanism. In this paper, we propose iPHLoc-ES, a prediction method for subcellular localization of bacteriophage proteins. We aim to solve two problems: discriminating between host-located and non-host-located phage proteins, and discriminating between the locations of host-located proteins in a host cell (membrane or cytoplasm). To do this, we extract sets of evolutionary and structural features of phage proteins and employ a Support Vector Machine (SVM) as our classifier. We also use recursive feature elimination (RFE) to reduce the number of features for effective prediction. On a standard dataset using standard evaluation criteria, our method significantly outperforms the state-of-the-art predictor. iPHLoc-ES is readily available to use as a standalone tool from: https://github.com/swakkhar/iPHLoc-ES/ and as a web application from: http://brl.uiu.ac.bd/iPHLoc-ES/. Copyright © 2017 Elsevier Ltd. All rights reserved.
Butts, Carter T.; Bierma, Jan C.; Martin, Rachel W.
2016-01-01
In his 1875 monograph on insectivorous plants, Darwin described the feeding reactions of Drosera flypaper traps and predicted that their secretions contained a “ferment” similar to mammalian pepsin, an aspartic protease. Here we report a high-quality draft genome sequence for the cape sundew, Drosera capensis, the first genome of a carnivorous plant from order Caryophyllales, which also includes the Venus flytrap (Dionaea) and the tropical pitcher plants (Nepenthes). This species was selected in part for its hardiness and ease of cultivation, making it an excellent model organism for further investigations of plant carnivory. Analysis of predicted protein sequences yields genes encoding proteases homologous to those found in other plants, some of which display sequence and structural features that suggest novel functionalities. Because the sequence similarity to proteins of known structure is in most cases too low for traditional homology modeling, 3D structures of representative proteases are predicted using comparative modeling with all-atom refinement. Although the overall folds and active residues for these proteins are conserved, we find structural and sequence differences consistent with a diversity of substrate recognition patterns. Finally, we predict differences in substrate specificities using in silico experiments, providing targets for structure/function studies of novel enzymes with biological and technological significance. PMID:27353064
Toward link predictability of complex networks
Lü, Linyuan; Pan, Liming; Zhou, Tao; Zhang, Yi-Cheng; Stanley, H. Eugene
2015-01-01
The organization of real networks usually embodies both regularities and irregularities, and, in principle, the former can be modeled. The extent to which the formation of a network can be explained coincides with our ability to predict missing links. To understand network organization, we should be able to estimate link predictability. We assume that the regularity of a network is reflected in the consistency of structural features before and after a random removal of a small set of links. Based on the perturbation of the adjacency matrix, we propose a universal structural consistency index that is free of prior knowledge of network organization. Extensive experiments on disparate real-world networks demonstrate that (i) structural consistency is a good estimation of link predictability and (ii) a derivative algorithm outperforms state-of-the-art link prediction methods in both accuracy and robustness. This analysis has further applications in evaluating link prediction algorithms and monitoring sudden changes in evolving network mechanisms. It will provide unique fundamental insights into the above-mentioned academic research fields, and will foster the development of advanced information filtering technologies of interest to information technology practitioners. PMID:25659742
Zhang, Jian; Gao, Bo; Chai, Haiting; Ma, Zhiqiang; Yang, Guifu
2016-08-26
DNA-binding proteins (DBPs) play fundamental roles in many biological processes. Therefore, the development of effective computational tools for identifying DBPs is becoming highly desirable. In this study, we proposed an accurate method for the prediction of DBPs. Firstly, we focused on the challenge of improving DBP prediction accuracy with information solely from the sequence. Secondly, we used multiple informative features to encode the protein. These features included evolutionary conservation profile, secondary structure motifs, and physicochemical properties. Thirdly, we introduced a novel improved Binary Firefly Algorithm (BFA) to remove redundant or noisy features as well as select optimal parameters for the classifier. On two benchmark datasets, our predictor outperformed many state-of-the-art predictors, which demonstrates the effectiveness of our method. The promising prediction performance on a newly compiled independent testing dataset from PDB and a large-scale dataset from UniProt indicated the good generalization ability of our method. In addition, the BFA developed in this research has great potential for practical applications in optimization, especially in feature selection problems. A highly accurate method was proposed for the identification of DBPs. A user-friendly web server named iDbP (identification of DNA-binding Proteins) was constructed and provided for academic use.
Link prediction based on local weighted paths for complex networks
NASA Astrophysics Data System (ADS)
Yao, Yabing; Zhang, Ruisheng; Yang, Fan; Yuan, Yongna; Hu, Rongjing; Zhao, Zhili
As a significant problem in complex networks, link prediction aims to find missing and future links between unconnected node pairs by estimating the existence likelihood of potential links. It plays an important role in understanding the evolution mechanism of networks and has broad applications in practice. In order to improve prediction performance, a variety of structural similarity-based methods that rely on different topological features have been put forward. As one topological feature, the path information between node pairs is utilized to calculate the node similarity. However, many path-dependent methods neglect the different contributions of paths for a pair of nodes. In this paper, a local weighted path (LWP) index is proposed to differentiate the contributions between paths. The LWP index considers the effect of the link degrees of intermediate links and the connectivity influence of intermediate nodes on paths to quantify the path weight in the prediction procedure. The experimental results on 12 real-world networks show that the LWP index outperforms seven other prediction baselines.
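The idea of weighting paths instead of merely counting them can be sketched as follows. This is not the LWP formula from the paper, which the abstract does not give. It is a hypothetical degree-penalized score over paths of length 2 and 3; the names `build_graph`, `weighted_path_score` and the damping parameter `eps` are illustrative assumptions.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def weighted_path_score(adj, x, y, eps=0.1):
    """Hypothetical path-weighted similarity between nodes x and y.

    Each path of length 2 or 3 contributes the product of 1/degree over its
    intermediate nodes, so paths through hubs count less. Length-3 paths are
    additionally damped by eps. Illustrative only; not the published index.
    """
    score = 0.0
    # Paths x - z - y through a common neighbor z.
    for z in adj[x] & adj[y]:
        score += 1.0 / len(adj[z])
    # Paths x - a - b - y through two distinct intermediate nodes.
    for a in adj[x]:
        if a == y:
            continue
        for b in adj[a]:
            if b in (x, y) or b not in adj[y]:
                continue
            score += eps / (len(adj[a]) * len(adj[b]))
    return score


# Toy network: 1-2-3 is a two-step path, 1-4-5-3 is a three-step path.
toy = build_graph([(1, 2), (2, 3), (1, 4), (4, 5), (5, 3)])
score_1_3 = weighted_path_score(toy, 1, 3)
```

In the toy network every intermediate node has degree 2, so the two-step path contributes 1/2 and the three-step path contributes 0.1/4. Candidate links would then be ranked by this score, highest first.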
Zhang, Qingqing; Huo, Mengqi; Zhang, Yanling; Qiao, Yanjiang; Gao, Xiaoyan
2018-06-01
High-resolution mass spectrometry (HRMS) provides a powerful tool for the rapid analysis and identification of compounds in herbs. However, the diversity and large differences in the content of the chemical constituents in herbal medicines, especially isomerisms, are a great challenge for mass spectrometry-based structural identification. In the current study, a new strategy for the structural characterization of potential new phthalide compounds was proposed by isomer structure predictions combined with a quantitative structure-retention relationship (QSRR) analysis using phthalide compounds in Chuanxiong as an example. This strategy consists of three steps. First, the structures of phthalide compounds were reasonably predicted on the basis of the structure features and MS/MS fragmentation patterns: (1) the collected raw HRMS data were preliminarily screened by an in-house database; (2) the MS/MS fragmentation patterns of the analogous compounds were summarized; (3) the reported phthalide compounds were identified, and the structures of the isomers were reasonably predicted. Second, the QSRR model was established and verified using representative phthalide compound standards. Finally, the retention times of the predicted isomers were calculated by the QSRR model, and the structures of these peaks were rationally characterized by matching retention times of the detected chromatographic peaks and the predicted isomers. A multiple linear regression QSRR model in which 6 physicochemical variables were screened was built using 23 phthalide standards. The retention times of the phthalide isomers in Chuanxiong were well predicted by the QSRR model combined with reasonable structure predictions (R² = 0.955). A total of 81 peaks were detected from Chuanxiong and assigned to reasonable structures, and 26 potential new phthalide compounds were structurally characterized. This strategy can improve the identification efficiency and reliability of homologues in complex materials. Copyright © 2018 Elsevier B.V. All rights reserved.
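For readers unfamiliar with the fit statistic quoted for the QSRR model, the sketch below shows one common definition, the coefficient of determination between observed and predicted retention times. The abstract does not say which definition of R² was used, so this is an assumption, and the retention times are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical retention times in minutes (not data from the study).
observed_rt = [5.2, 7.9, 10.4, 12.8, 15.1]
predicted_rt = [5.6, 7.5, 10.9, 12.2, 15.4]
fit_quality = r_squared(observed_rt, predicted_rt)
```

A value near 1 means the model reproduces most of the variance in the observed retention times, which is what makes retention-time matching usable for assigning isomer structures.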
Three-Dimensional Molecular Modeling of a Diverse Range of SC Clan Serine Proteases
Laskar, Aparna; Chatterjee, Aniruddha; Chatterjee, Somnath; Rodger, Euan J.
2012-01-01
Serine proteases are involved in a variety of biological processes and are classified into clans sharing structural homology. Although various three-dimensional structures of SC clan proteases have been experimentally determined, they are mostly bacterial and animal proteases, with some from archaea, plants, and fungi, and as yet no structures have been determined for protozoa. To bridge this gap, we have used molecular modeling techniques to investigate the structural properties of different SC clan serine proteases from a diverse range of taxa. Either SWISS-MODEL was used for homology-based structure prediction or the LOOPP server was used for threading-based structure prediction. The predicted models were refined using Insight II and SCRWL and validated against experimental structures. Investigation of secondary structures and electrostatic surface potential was performed using MOLMOL. The structural geometry of the catalytic core shows clear deviations between taxa, but the relative positions of the catalytic triad residues were conserved. Evolutionary divergence was also exhibited by large variation in secondary structure features outside the core, differences in overall amino acid distribution, and unique surface electrostatic potential patterns between species. Encompassing a wide range of taxa, our structural analysis provides an evolutionary perspective on SC clan serine proteases. PMID:23213528
Intelligent seismic risk mitigation system on structure building
NASA Astrophysics Data System (ADS)
Suryanita, R.; Maizir, H.; Yuniorto, E.; Jingga, H.
2018-01-01
Indonesia, located on the Pacific Ring of Fire, is one of the highest-risk seismic zones in the world. Strong ground motion can cause catastrophic building collapse, leading to casualties and property damage. Therefore, it is imperative to properly design the structural response of buildings against seismic hazards. The seismic-resistant building design process requires structural analysis to obtain the necessary building responses. However, the structural analysis can be very difficult and time-consuming. This study aims to predict the structural response, including displacement, velocity, and acceleration, of multi-storey buildings with a fixed floor plan using the Artificial Neural Network (ANN) method based on the 2010 Indonesian seismic hazard map. By varying the building height, soil condition, and seismic location across 47 cities in Indonesia, 6345 data sets were obtained and fed into the ANN model for the learning process. The trained ANN can predict the displacement, velocity, and acceleration responses with a prediction rate of up to 96%. The trained ANN architecture and weight factors were later used to build a simple tool in Visual Basic that provides the structural response predictions described above.
A Predictive Model of Intein Insertion Site for Use in the Engineering of Molecular Switches
Apgar, James; Ross, Mary; Zuo, Xiao; Dohle, Sarah; Sturtevant, Derek; Shen, Binzhang; de la Vega, Humberto; Lessard, Philip; Lazar, Gabor; Raab, R. Michael
2012-01-01
Inteins are intervening protein domains with self-splicing ability that can be used as molecular switches to control activity of their host protein. Successfully engineering an intein into a host protein requires identifying an insertion site that permits intein insertion and splicing while allowing for proper folding of the mature protein post-splicing. By analyzing sequence and structure based properties of native intein insertion sites we have identified four features that showed significant correlation with the location of the intein insertion sites, and therefore may be useful in predicting insertion sites in other proteins that provide native-like intein function. Three of these properties, the distance to the active site and dimer interface site, the SVM score of the splice site cassette, and the sequence conservation of the site showed statistically significant correlation and strong predictive power, with area under the curve (AUC) values of 0.79, 0.76, and 0.73 respectively, while the distance to secondary structure/loop junction showed significance but with less predictive power (AUC of 0.54). In a case study of 20 insertion sites in the XynB xylanase, two features of native insertion sites showed correlation with the splice sites and demonstrated predictive value in selecting non-native splice sites. Structural modeling of intein insertions at two sites highlighted the role that the insertion site location could play on the ability of the intein to modulate activity of the host protein. These findings can be used to enrich the selection of insertion sites capable of supporting intein splicing and hosting an intein switch. PMID:22649521
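The AUC values quoted above can be read as the probability that a feature ranks a randomly chosen native insertion site above a randomly chosen non-native site, with 0.5 meaning chance. The following is a minimal sketch of that computation; the scores and labels are invented for illustration and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the fraction of (positive, negative) pairs in which the positive
    has the higher score, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical feature scores for four candidate sites (1 = native site).
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Here three of the four positive/negative pairs are ordered correctly. On this reading, the 0.54 reported for the distance to the secondary structure/loop junction is only slightly better than a coin flip, even though the correlation was statistically significant.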
Longitudinal Validation of General and Specific Structural Features of Personality Pathology
Wright, Aidan G.C.; Hopwood, Christopher J.; Skodol, Andrew E.; Morey, Leslie C.
2016-01-01
Theorists have long argued that personality disorder (PD) is best understood in terms of general impairments shared across the disorders as well as more specific instantiations of pathology. A model based on this theoretical structure was proposed as part of the DSM-5 revision process. However, only recently has this structure been subjected to formal quantitative evaluation, with little in the way of validation efforts via external correlates or prospective longitudinal prediction. We used the Collaborative Longitudinal Study of Personality Disorders dataset to: (1) estimate structural models that parse general from specific variance in personality disorder features, (2) examine patterns of growth in general and specific features over the course of 10 years, and (3) establish concurrent and dynamic longitudinal associations in PD features and a host of external validators including basic personality traits and psychosocial functioning scales. We found that general PD exhibited much lower absolute stability and was most strongly related to broad markers of psychosocial functioning, concurrently and longitudinally, whereas specific features had much higher mean stability and exhibited more circumscribed associations with functioning. However, both general and specific factors showed recognizable associations with normative and pathological traits. These results can inform efforts to refine the conceptualization and diagnosis of personality pathology. PMID:27819472
NASA Astrophysics Data System (ADS)
Islam, Atiq; Iftekharuddin, Khan M.; Ogg, Robert J.; Laningham, Fred H.; Sivakumar, Bhuvaneswari
2008-03-01
In this paper, we characterize the tumor texture in pediatric brain magnetic resonance images (MRIs) and exploit these features for automatic segmentation of posterior fossa (PF) tumors. We focus on PF tumor because of the prevalence of such tumor in pediatric patients. Due to varying appearance in MRI, we propose to model the tumor texture with a multi-fractal process, such as a multi-fractional Brownian motion (mBm). In mBm, the time-varying Holder exponent provides flexibility in modeling irregular tumor texture. We develop a detailed mathematical framework for mBm in two-dimension and propose a novel algorithm to estimate the multi-fractal structure of tissue texture in brain MRI based on wavelet coefficients. This wavelet based multi-fractal feature along with MR image intensity and a regular fractal feature obtained using our existing piecewise-triangular-prism-surface-area (PTPSA) method, are fused in segmenting PF tumor and non-tumor regions in brain T1, T2, and FLAIR MR images respectively. We also demonstrate a non-patient-specific automated tumor prediction scheme based on these image features. We experimentally show the tumor discriminating power of our novel multi-fractal texture along with intensity and fractal features in automated tumor segmentation and statistical prediction. To evaluate the performance of our tumor prediction scheme, we obtain ROCs and demonstrate how sharply the curves reach the specificity of 1.0 sacrificing minimal sensitivity. Experimental results show the effectiveness of our proposed techniques in automatic detection of PF tumors in pediatric MRIs.
NASA Astrophysics Data System (ADS)
Marhadi, Kun Saptohartyadi
Structural optimization for damage tolerance under various unforeseen damage scenarios is computationally challenging. It couples non-linear progressive failure analysis with sampling-based stochastic analysis of random damage. The goal of this research was to understand the relationship between alternate load paths available in a structure and its damage tolerance, and to use this information to develop computationally efficient methods for designing damage tolerant structures. Progressive failure of a redundant truss structure subjected to small random variability was investigated to identify features that correlate with robustness and predictability of the structure's progressive failure. The identified features were used to develop numerical surrogate measures that permit computationally efficient deterministic optimization to achieve robustness and predictability of progressive failure. Analysis of damage tolerance on designs with robust progressive failure indicated that robustness and predictability of progressive failure do not guarantee damage tolerance. Damage tolerance requires a structure to redistribute its load to alternate load paths. In order to investigate the load distribution characteristics that lead to damage tolerance in structures, designs with varying degrees of damage tolerance were generated using brute force stochastic optimization. A method based on principal component analysis was used to describe load distributions (alternate load paths) in the structures. Results indicate that a structure that can develop alternate paths is not necessarily damage tolerant. The alternate load paths must have a required minimum load capability. Robustness analysis of damage tolerant optimum designs indicates that designs are tailored to specified damage. A design optimized under one damage specification can be sensitive to other damage scenarios not considered.
Effectiveness of existing load path definitions and characterizations were investigated for continuum structures. A load path definition using a relative compliance change measure (U* field) was demonstrated to be the most useful measure of load path. This measure provides quantitative information on load path trajectories and qualitative information on the effectiveness of the load path. The use of the U* description of load paths in optimizing structures for effective load paths was investigated.
Chen, Shangying; Zhang, Peng; Liu, Xin; Qin, Chu; Tao, Lin; Zhang, Cheng; Yang, Sheng Yong; Chen, Yu Zong; Chui, Wai Keung
2016-06-01
The overall efficacy and safety profile of a new drug is partially evaluated by the therapeutic index in clinical studies and by the protective index (PI) in preclinical studies. In-silico predictive methods may facilitate the assessment of these indicators. Although QSAR and QSTR models can be used for predicting PI, their predictive capability has not been evaluated. To test this capability, we developed QSAR and QSTR models for predicting the activity and toxicity of anticonvulsants at accuracy levels above the literature-reported threshold (LT) of good QSAR models as tested by both the internal 5-fold cross validation and external validation method. These models showed significantly compromised PI predictive capability due to the cumulative errors of the QSAR and QSTR models. Therefore, in this investigation a new quantitative structure-index relationship (QSIR) model was devised and it showed improved PI predictive capability that superseded the LT of good QSAR models. The QSAR, QSTR and QSIR models were developed using support vector regression (SVR) method with the parameters optimized by using the greedy search method. The molecular descriptors relevant to the prediction of anticonvulsant activities, toxicities and PIs were analyzed by a recursive feature elimination method. The selected molecular descriptors are primarily associated with the drug-like, pharmacological and toxicological features and those used in the published anticonvulsant QSAR and QSTR models. This study suggested that QSIR is useful for estimating the therapeutic index of drug candidates. Copyright © 2016. Published by Elsevier Inc.
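The cumulative-error argument can be made concrete with a small sketch. It uses the standard definition of the protective index as TD50 divided by ED50, and assumes that the two models predict log-scale values with independent errors; that independence and all numbers below are assumptions for illustration, not results from the study.

```python
import math


def protective_index(td50, ed50):
    """Protective index: median toxic dose divided by median effective dose."""
    if ed50 <= 0 or td50 <= 0:
        raise ValueError("doses must be positive")
    return td50 / ed50


def combined_log_error(sigma_activity, sigma_toxicity):
    """Standard deviation of log(PI) = log(TD50) - log(ED50).

    Assumes the activity and toxicity models have independent errors, so
    their variances add when the two predictions are subtracted.
    """
    return math.sqrt(sigma_activity ** 2 + sigma_toxicity ** 2)


# Hypothetical doses (mg/kg) and hypothetical per-model log-scale errors.
pi_value = protective_index(td50=300.0, ed50=30.0)
pi_error = combined_log_error(sigma_activity=0.3, sigma_toxicity=0.4)
```

Under these assumptions the error on the derived index is larger than the error of either model alone, which is one reason a model fitted directly to the index can do better than chaining two separate predictions.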
Kahan, Tracey L; Claudatos, Stephanie
2016-04-01
Self-ratings of dream experiences were obtained from 144 college women for 788 dreams, using the Subjective Experiences Rating Scale (SERS). Consistent with past studies, dreams were characterized by a greater prevalence of vision, audition, and movement than smell, touch, or taste, by both positive and negative emotion, and by a range of cognitive processes. A Principal Components Analysis of SERS ratings revealed ten subscales: four sensory, three affective, one cognitive, and two structural (events/actions, locations). Correlations (Pearson r) among subscale means showed a stronger relationship among the process-oriented features (sensory, cognitive, affective) than between the process-oriented and content-centered (structural) features--a pattern predicted from past research (e.g., Bulkeley & Kahan, 2008). Notably, cognition and positive emotion were associated with a greater number of other phenomenal features than was negative emotion; these findings are consistent with studies of the qualitative features of waking autobiographical memory (e.g., Fredrickson, 2001). Copyright © 2016 Elsevier Inc. All rights reserved.
Rauscher, S; Flamm, C; Mandl, C W; Heinz, F X; Stadler, P F
1997-07-01
The prediction of the complete matrix of base pairing probabilities was applied to the 3' noncoding region (NCR) of flavivirus genomes. This approach identifies not only well-defined secondary structure elements, but also regions of high structural flexibility. Flaviviruses, many of which are important human pathogens, have a common genomic organization, but exhibit a significant degree of RNA sequence diversity in the functionally important 3'-NCR. We demonstrate the presence of secondary structures shared by all flaviviruses, as well as structural features that are characteristic for groups of viruses within the genus reflecting the established classification scheme. The significance of most of the predicted structures is corroborated by compensatory mutations. The availability of infectious clones for several flaviviruses will allow the assessment of these structural elements in processes of the viral life cycle, such as replication and assembly.
deepNF: Deep network fusion for protein function prediction.
Gligorijevic, Vladimir; Barot, Meet; Bonneau, Richard
2018-06-01
The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. deepNF is freely available at: https://github.com/VGligorijevic/deepNF. vgligorijevic@flatironinstitute.org, rb133@nyu.edu. Supplementary data are available at Bioinformatics online.
Hall, Matthew D.; Salam, Noeris K.; Hellawell, Jennifer L.; Fales, Henry M.; Kensler, Caroline B.; Ludwig, Joseph A.; Szakacs, Gergely; Hibbs, David E.; Gottesman, Michael M.
2009-01-01
We have recently identified a new class of compounds that selectively kill cells that express P-glycoprotein (P-gp, MDR1), the ATPase efflux pump that confers multidrug resistance on cancer cells. Several isatin-β-thiosemicarbazones from our initial study have been validated, and a range of analogs synthesized and tested. A number demonstrated improved MDR1-selective activity over the lead, NSC73306 (1). Pharmacophores for cytotoxicity and MDR1-selectivity were generated to delineate the structural features required for activity. The MDR1-selective pharmacophore highlights the importance of aromatic/hydrophobic features at the N4 position of the thiosemicarbazone, and the reliance on the isatin moiety as key bioisosteric contributors. Additionally, a quantitative structure-activity relationship (QSAR) model that yielded a cross-validated correlation coefficient of 0.85 effectively predicts the cytotoxicity of untested thiosemicarbazones. Together, the models serve as effective approaches for predicting structures with MDR1-selective activity, and aid in directing the search for the mechanism of action of 1. PMID:19397322
Huang, Liqiang
2015-05-01
Basic visual features (e.g., color, orientation) are assumed to be processed in the same general way across different visual tasks. Here, a significant deviation from this assumption was predicted on the basis of the analysis of stimulus spatial structure, as characterized by the Boolean-map notion. If a task requires memorizing the orientations of a set of bars, then the map consisting of those bars can be readily used to hold the overall structure in memory and will thus be especially useful. If the task requires visual search for a target, then the map, which contains only an overall structure, will be of little use. Supporting these predictions, the present study demonstrated that in comparison to stimulus colors, bar orientations were processed more efficiently in change-detection tasks but less efficiently in visual search tasks (Cohen's d = 4.24). In addition to offering support for the role of the Boolean map in conscious access, the present work also throws doubts on the generality of processing visual features. © The Author(s) 2015.
Detecting nonsense for Chinese comments based on logistic regression
NASA Astrophysics Data System (ADS)
Zhuolin, Ren; Guang, Chen; Shu, Chen
2016-07-01
To understand cyber citizens' opinions accurately from Chinese news comments, a clear definition of nonsense is presented, and a detection model based on logistic regression (LR) is proposed. The detection of nonsense can be treated as a binary classification problem. Besides traditional lexical features, we propose three kinds of features in terms of emotion, structure and relevance. Using these features, we train an LR model and demonstrate its effect in understanding Chinese news comments. We find that each of the proposed features significantly improves the result. In our experiments, we achieve a prediction accuracy of 84.3%, which improves on the 77.3% baseline by 7 percentage points.
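A logistic-regression classifier over three numeric features can be sketched in a few lines. The feature values below are invented stand-ins for emotion, structure and relevance scores; how the authors actually compute those features is not described in the abstract, and the training loop is plain batch gradient descent rather than whatever solver they used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (nonsense) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical (emotion, structure, relevance) scores; label 1 = nonsense.
X = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.3], [0.8, 0.9, 0.7], [0.9, 0.7, 0.8]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
train_predictions = [predict(w, b, x) for x in X]
```

On this toy data, low feature scores are labeled as nonsense, so the learned weights come out negative. With real comments, the lexical features would be appended to each row alongside the three proposed feature groups.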
NASA Astrophysics Data System (ADS)
Richa, Tambi; Ide, Soichiro; Suzuki, Ryosuke; Ebina, Teppei; Kuroda, Yutaka
2017-02-01
Efficient and rapid prediction of domain regions from amino acid sequence information alone is often required for swift structural and functional characterization of large multi-domain proteins. Here we introduce Fast H-DROP, a thirty times accelerated version of our previously reported H-DROP (Helical Domain linker pRediction using OPtimal features), which is unique in specifically predicting helical domain linkers (boundaries). Fast H-DROP, analogously to H-DROP, uses optimum features selected from a set of 3000 by combining a random forest and a stepwise feature selection protocol. We reduced the computational time from 8.5 min per sequence in H-DROP to 14 s per sequence in Fast H-DROP on an 8 Xeon processor Linux server by using SWISS-PROT instead of the GenBank non-redundant (nr) database for generating the PSSMs. The sensitivity and precision of Fast H-DROP assessed by cross-validation were 33.7% and 36.2%, respectively, merely 2% lower than those of H-DROP. The reduced computational time of Fast H-DROP, achieved without affecting prediction performance, makes it more interactive and user-friendly. Fast H-DROP and H-DROP are freely available from http://domserv.lab.tuat.ac.jp/.
Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong
2015-01-01
Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Deep learning methods for protein torsion angle prediction.
Li, Haiou; Hou, Jie; Adhikari, Badri; Lyu, Qiang; Cheng, Jianlin
2017-09-18
Deep learning is one of the most powerful machine learning methods and has achieved state-of-the-art performance in many domains. Since deep learning was introduced to the field of bioinformatics in 2012, it has achieved success in a number of areas such as protein residue-residue contact prediction, secondary structure prediction, and fold recognition. In this work, we developed deep learning methods to improve the prediction of torsion (dihedral) angles of proteins. We design four different deep learning architectures to predict protein torsion angles. The architectures include a deep neural network (DNN), a deep restricted Boltzmann machine (DRBM), and, since protein torsion angle prediction is a sequence-related problem, a deep recurrent neural network (DRNN) and a deep recurrent restricted Boltzmann machine (DReRBM). In addition to existing protein features, two new features (predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments) are used as input to each of the four deep learning architectures to predict phi and psi angles of the protein backbone. The mean absolute error (MAE) of phi and psi angles predicted by DRNN, DReRBM, DRBM and DNN is about 20-21° and 29-30°, respectively, on an independent dataset. The MAE of the phi angle is comparable to that of existing methods, but the MAE of the psi angle is 29°, 2° lower than that of existing methods. On the latest CASP12 targets, our methods also achieved performance better than or comparable to a state-of-the-art method. Our experiment demonstrates that deep learning is a valuable method for predicting protein torsion angles. The deep recurrent network architecture performs slightly better than the deep feed-forward architecture, and the predicted residue contact number and the error distribution of torsion angles extracted from sequence fragments are useful features for improving prediction accuracy.
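Because torsion angles are periodic, a mean absolute error on them is usually computed on the shorter arc between prediction and truth. The abstract does not state how the authors handle the wrap-around at ±180°, so the convention below is an assumption, and the angles are invented for illustration.

```python
def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def angular_mae(pred, true):
    """Mean absolute error over paired angle lists, respecting periodicity."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(angular_error(p, t) for p, t in zip(pred, true)) / len(pred)


# Hypothetical psi angles: 179 and -179 degrees are only 2 degrees apart.
psi_mae = angular_mae([179.0, -60.0], [-179.0, -50.0])
```

Without the wrap-around step, the first pair would be scored as a 358° error instead of 2°, which would badly distort an MAE in the 20 to 30° range reported above.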
Jaspard, Emmanuel; Macherel, David; Hunault, Gilles
2012-01-01
Late Embryogenesis Abundant Proteins (LEAPs) are ubiquitous proteins expected to play major roles in desiccation tolerance. Little is known about their structure - function relationships because of the scarcity of 3-D structures for LEAPs. The previous building of LEAPdb, a database dedicated to LEAPs from plants and other organisms, led to the classification of 710 LEAPs into 12 non-overlapping classes with distinct properties. Using this resource, numerous physico-chemical properties of LEAPs and amino acid usage by LEAPs have been computed and statistically analyzed, revealing distinctive features for each class. This unprecedented analysis allowed a rigorous characterization of the 12 LEAP classes, which differed also in multiple structural and physico-chemical features. Although most LEAPs can be predicted as intrinsically disordered proteins, the analysis indicates that LEAP class 7 (PF03168) and probably LEAP class 11 (PF04927) are natively folded proteins. This study thus provides a detailed description of the structural properties of this protein family opening the path toward further LEAP structure - function analysis. Finally, since each LEAP class can be clearly characterized by a unique set of physico-chemical properties, this will allow development of software to predict proteins as LEAPs. PMID:22615859
Miao, Zhichao; Westhof, Eric
2016-07-08
RBscore&NBench combines a web server, RBscore and a database, NBench. RBscore predicts RNA-/DNA-binding residues in proteins and visualizes the prediction scores and features on protein structures. The scoring scheme of RBscore directly links feature values to nucleic acid binding probabilities and illustrates the nucleic acid binding energy funnel on the protein surface. To avoid dataset, binding site definition and assessment metric biases, we compared RBscore with 18 web servers and 3 stand-alone programs on 41 datasets, which demonstrated the high and stable accuracy of RBscore. A comprehensive comparison led us to develop a benchmark database named NBench. The web server is available on: http://ahsoka.u-strasbg.fr/rbscorenbench/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Structural protein descriptors in 1-dimension and their sequence-based predictions.
Kurgan, Lukasz; Disfani, Fatemeh Miri
2011-09-01
The last few decades have seen an increasing interest in the development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform a first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.
Tarafder, Sumit; Toukir Ahmed, Md; Iqbal, Sumaiya; Tamjidul Hoque, Md; Sohel Rahman, M
2018-03-14
Accessible surface area (ASA) of a protein residue is an effective feature for protein structure prediction, binding region identification, fold recognition and similar problems. Improving the prediction of ASA by the application of effective feature variables is a challenging but explorable task, especially in the field of machine learning. Among the existing predictors of ASA, REGAd3p is a highly accurate ASA predictor which is based on regularized exact regression with a polynomial kernel of degree 3. In this work, we present a new predictor, RBSURFpred, which extends REGAd3p on several dimensions by incorporating 58 physicochemical, evolutionary and structural properties into 9-tuple peptides via Chou's general PseAAC, which allowed us to obtain higher accuracies in predicting both real-valued and binary ASA. We have compared RBSURFpred for both real and binary space predictions with state-of-the-art predictors, such as REGAd3p and SPIDER2. We have also carried out a rigorous analysis of the performance of RBSURFpred in terms of different amino acids and their properties, and also with biologically relevant case studies. The performance of RBSURFpred establishes it as a useful tool for the community. Copyright © 2018 Elsevier Ltd. All rights reserved.
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.
Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F
2017-08-18
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
A novel knowledge-based potential for RNA 3D structure evaluation
NASA Astrophysics Data System (ADS)
Yang, Yi; Gu, Qi; Zhang, Ben-Gong; Shi, Ya-Zhou; Shao, Zhi-Gang
2018-03-01
Ribonucleic acids (RNAs) play a vital role in biology, and knowledge of their three-dimensional (3D) structure is required to understand their biological functions. Recently structural prediction methods have been developed to address this issue, but a series of RNA 3D structures are generally predicted by most existing methods. Therefore, the evaluation of the predicted structures is generally indispensable. Although several methods have been proposed to assess RNA 3D structures, the existing methods are not precise enough. In this work, a new all-atom knowledge-based potential is developed for more accurately evaluating RNA 3D structures. The potential not only includes local and nonlocal interactions but also fully considers the specificity of each RNA by introducing a retraining mechanism. Based on extensive test sets generated from independent methods, the proposed potential correctly distinguished the native state and ranked near-native conformations to effectively select the best. Furthermore, the proposed potential precisely captured RNA structural features such as base-stacking and base-pairing. Comparisons with existing potential methods show that the proposed potential is very reliable and accurate in RNA 3D structure evaluation. Project supported by the National Science Foundation of China (Grants Nos. 11605125, 11105054, 11274124, and 11401448).
MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.
Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra
2011-01-01
Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity due to the lack of a gold standard of negative examples, of miRNA-targeting site context specific relevant features and of an efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to distribute the true positive predictions preferentially at the top of the ranked list; hence the algorithms become unreliable for biologists. In addition, these algorithms fail to obtain a considerable combination of precision and recall for the target transcripts that are translationally repressed at protein level. In the proposed article, we introduce an efficient miRNA-target prediction system, MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high quality negative examples and selection of biologically relevant miRNA-targeting site context specific features. The features are selected by using a novel feature selection technique, AMOSA-SVM, that integrates the multi objective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) and SVM. MultiMiTar is found to achieve a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 compared to the other target prediction methods for a completely independent test data set. The obtained MCC and ACA values of these algorithms range from -0.269 to 0.155 and 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set as compared to all the other existing methods.
An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list that makes MultiMiTar reliable for the biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
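The two figures of merit quoted in this abstract, MCC and ACA, can be computed from the counts of a binary confusion matrix. The sketch below is illustrative only: the abstract does not define ACA, so it is assumed here to be the mean of sensitivity and specificity, and the counts in the example are hypothetical rather than taken from the paper.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def average_classwise_accuracy(tp, fp, tn, fn):
    """Assumed definition of ACA: mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2

if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    tp, fp, tn, fn = 40, 10, 40, 10
    print(round(mcc(tp, fp, tn, fn), 3))
    print(round(average_classwise_accuracy(tp, fp, tn, fn), 3))
```

Unlike plain accuracy, both quantities stay informative when positives and negatives are unbalanced, which is presumably why the abstract reports them together.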
2011-01-01
Background Existing methods of predicting DNA-binding proteins used valuable features of physicochemical properties to design support vector machine (SVM) based classifiers. Generally, selection of physicochemical properties and determination of their corresponding feature vectors rely mainly on known properties of binding mechanism and experience of designers. However, there exists a troublesome problem for designers that some different physicochemical properties have similar vectors of representing 20 amino acids and some closely related physicochemical properties have dissimilar vectors. Results This study proposes a systematic approach (named Auto-IDPCPs) to automatically identify a set of physicochemical and biochemical properties in the AAindex database to design SVM-based classifiers for predicting and analyzing DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) utilizing an efficient genetic algorithm based optimization method IBCGA to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties which may affect the binding mechanism of DNA-binding domains/proteins. The proposed Auto-IDPCPs identified m=22 features of properties belonging to five clusters for predicting DNA-binding domains with a five-fold cross-validation accuracy of 87.12%, which is promising compared with the accuracy of 86.62% of the existing method PSSM-400. For predicting DNA-binding sequences, the accuracy of 75.50% was obtained using m=28 features, where PSSM-400 has an accuracy of 74.22%. Auto-IDPCPs and PSSM-400 have accuracies of 80.73% and 82.81%, respectively, applied to an independent test data set of DNA-binding domains. 
Some typical physicochemical properties discovered are hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized Van Der Waals volume, pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)), etc. Conclusions The proposed approach Auto-IDPCPs would help designers to investigate informative physicochemical and biochemical properties by considering both prediction accuracy and analysis of binding mechanism simultaneously. The approach Auto-IDPCPs can be also applicable to predict and analyze other protein functions from sequences. PMID:21342579
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...
2015-10-26
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thor, M; Tyagi, N; Deasy, J
2015-06-15
Purpose: The aim of this study was to explore the use of Magnetic Resonance Imaging (MRI)-derived features as indicators of Radiotherapy (RT)-induced normal tissue morbidity. We also investigate the relationship between these features and RT dose in four critical structures. Methods: We demonstrate our approach for four patients treated with RT for base of tongue cancer in 2005–2007. For each patient, two MRI scans (T1-weighted pre (T1pre) and post (T1post) gadolinium contrast-enhancement) were acquired within the first six months after RT. The assessed morbidity endpoint observed in 2/4 patients was Grade 2+ CTCAEv.3 trismus. Four ipsilateral masticatory-related structures (masseter, lateral and medial pterygoid, and the temporal muscles) were delineated on both T1pre and T1post and these scans were co-registered to the treatment planning CT using a deformable demons algorithm. For each structure, the maximum and mean RT dose, and six MRI-derived features (the second order texture features entropy and homogeneity, and the first order mean, median, kurtosis, and skewness) were extracted and compared structure-wise between patients with and without trismus. All MRI-derived features were calculated as the difference between T1pre and T1post, ΔS. Results: For 5/6 features and all structures, ΔS diverged between trismus and non-trismus patients, particularly for the masseter, lateral pterygoid, and temporal muscles using the kurtosis feature (−0.2 vs. 6.4 for lateral pterygoid). Both the maximum and mean RT dose in all four muscles were higher amongst the trismus patients (with the maximum dose being up to 25 Gy higher). Conclusion: Using MRI-derived features to quantify RT-induced normal tissue complications is feasible. We showed that several features are different between patients with and without morbidity and that the RT dose in all investigated structures is higher amongst patients with morbidity. MRI-derived features, therefore, have the potential to improve predictions of normal tissue morbidity.
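As a rough illustration of the first-order features named in this abstract (mean, median, skewness, kurtosis) and of a per-structure difference ΔS, the following sketch operates on plain lists of voxel intensities. Several choices are assumptions not stated in the abstract: population moments are used, kurtosis is the non-excess (Pearson) form, and ΔS is taken as post minus pre. The function names are hypothetical.

```python
import statistics

def first_order_features(values):
    """Mean, median, skewness and kurtosis of a list of intensities.

    Assumptions: population (1/n) central moments; kurtosis is the
    non-excess form, so a normal distribution would give about 3.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }

def delta_features(pre, post):
    """Feature-wise difference between two scans (assumed sign: post - pre)."""
    f_pre = first_order_features(pre)
    f_post = first_order_features(post)
    return {name: f_post[name] - f_pre[name] for name in f_pre}

if __name__ == "__main__":
    # Hypothetical intensities inside one delineated structure.
    t1_pre = [1.0, 2.0, 3.0, 4.0, 5.0]
    t1_post = [1.0, 2.0, 3.0, 4.0, 10.0]
    print(delta_features(t1_pre, t1_post))
```

The second-order texture features (entropy, homogeneity) depend on a grey-level co-occurrence matrix and are deliberately left out of this sketch.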
Power law tails in phylogenetic systems.
Qin, Chongli; Colwell, Lucy J
2018-01-23
Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters-the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.
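The idea of removing the covariance eigenvectors with the largest eigenvalues can be sketched with power iteration and deflation. This is a minimal, dependency-free illustration and not the authors' procedure: the abstract does not say how many modes to drop or how they are found, so the parameter `k`, the fixed iteration count and the helper names are all assumptions.

```python
import math

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpair(m, iters=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix.

    Uses plain power iteration; the slightly non-uniform start vector
    makes an exactly orthogonal start less likely (an assumption, not a guarantee).
    """
    n = len(m)
    v = [1.0 + 0.01 * i for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero matrix: keep the current vector
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return lam, v

def remove_top_modes(cov, k=1):
    """Return a copy of cov with its k largest eigen-modes subtracted."""
    c = [row[:] for row in cov]
    n = len(c)
    for _ in range(k):
        lam, v = top_eigenpair(c)
        for i in range(n):
            for j in range(n):
                c[i][j] -= lam * v[i] * v[j]
    return c

if __name__ == "__main__":
    # Toy 2x2 "covariance" with eigenvalues 3 and 1.
    cov = [[2.0, 1.0], [1.0, 2.0]]
    print(remove_top_modes(cov, k=1))
```

For real alignments one would normally use a numerical library's symmetric eigendecomposition instead; the loop above only shows what "truncating" an eigenvector does to the matrix.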
Quantifying side-chain conformational variations in protein structure
Miao, Zhichao; Cao, Yang
2016-01-01
Protein side-chain conformation is closely related to their biological functions. The side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism comprehensively exists in protein as various types and has been long overlooked by side-chain prediction. But such conformational variations have not been quantitatively studied and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large scale data sets and found that the side-chain conformational flexibility is closely related to the exposure to solvent, degree of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variabilities in PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider the assessment approaches of side-chain prediction programs. PMID:27845406
Deng, Lei; Fan, Chao; Zeng, Zhiwen
2017-12-28
Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restraints in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number by using a deep neural network, which is built on a stacked autoencoder and a dropout method. The results demonstrate that our proposed DeepSacon achieves a significant improvement in prediction quality compared with the state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, and 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number, on the 5729 monomeric soluble globular protein dataset. We also evaluate the performance on the CASP11 benchmark dataset, where DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with a stacked sparse autoencoder and a dropout approach.
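The two evaluation measures reported here, multi-state accuracy and the Pearson correlation coefficient (PCC), are simple to compute. The sketch below assumes per-residue labels given as sequences of state symbols and contact numbers given as lists of floats; the state letters and example values are hypothetical.

```python
import math

def state_accuracy(true_states, pred_states):
    """Fraction of positions where the predicted state equals the true state."""
    if len(true_states) != len(pred_states) or not true_states:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_states, pred_states))
    return correct / len(true_states)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / (sx * sy)

if __name__ == "__main__":
    # Hypothetical three-state labels: B = buried, I = intermediate, E = exposed.
    print(state_accuracy("BIEB", "BIEE"))
    # Hypothetical true vs. predicted contact numbers.
    print(pearson([4.0, 7.0, 9.0, 12.0], [5.0, 6.0, 10.0, 11.0]))
```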
The devil is in the detail: brain dynamics in preparation for a global-local task.
Leaver, Echo E; Low, Kathy A; DiVacri, Assunta; Merla, Arcangelo; Fabiani, Monica; Gratton, Gabriele
2015-08-01
When analyzing visual scenes, it is sometimes important to determine the relevant "grain" size. Attention control mechanisms may help direct our processing to the intended grain size. Here we used the event-related optical signal, a method possessing high temporal and spatial resolution, to examine the involvement of brain structures within the dorsal attention network (DAN) and the visual processing network (VPN) in preparation for the appropriate level of analysis. Behavioral data indicate that the small features of a hierarchical stimulus (local condition) are more difficult to process than the large features (global condition). Consistent with this finding, cues predicting a local trial were associated with greater DAN activation. This activity was bilateral but more pronounced in the left hemisphere, where it showed a frontal-to-parietal progression over time. Furthermore, the amount of DAN activation, especially in the left hemisphere and in parietal regions, was predictive of subsequent performance. Although local cues elicited left-lateralized DAN activity, no preponderantly right activity was observed for global cues; however, the data indicated an interaction between level of analysis (local vs. global) and hemisphere in VPN. They further showed that local processing involves structures in the ventral VPN, whereas global processing involves structures in the dorsal VPN. These results indicate that in our study preparation for analyzing different size features is an asymmetric process, in which greater preparation is required to focus on small rather than large features, perhaps because of their lesser salience. This preparation involves the same DAN used for other attention control operations.
Hull, Damien C; Williams, Glenn A; Griffiths, Mark D
2013-09-01
Video games provide opportunities for positive psychological experiences such as flow-like phenomena during play and general happiness that could be associated with gaming achievements. However, research has shown that specific features of game play may be associated with problematic behaviour associated with addiction-like experiences. The study was aimed at analysing whether certain structural characteristics of video games, flow, and global happiness could be predictive of video game addiction. A total of 110 video game players were surveyed about a game they had recently played by using a 24-item checklist of structural characteristics, an adapted Flow State Scale, the Oxford Happiness Questionnaire, and the Game Addiction Scale. The study revealed decreases in general happiness had the strongest role in predicting increases in gaming addiction. One of the nine factors of the flow experience was a significant predictor of gaming addiction - perceptions of time being altered during play. The structural characteristic that significantly predicted addiction was its social element with increased sociability being associated with higher levels of addictive-like experiences. Overall, the structural characteristics of video games, elements of the flow experience, and general happiness accounted for 49.2% of the total variance in Game Addiction Scale levels. Implications for interventions are discussed, particularly with regard to making players more aware of time passing and in capitalising on benefits of social features of video game play to guard against addictive-like tendencies among video game players.
Hull, Damien C.; Williams, Glenn A.; Griffiths, Mark D.
2013-01-01
Aims: Video games provide opportunities for positive psychological experiences such as flow-like phenomena during play and general happiness that could be associated with gaming achievements. However, research has shown that specific features of game play may be associated with problematic behaviour associated with addiction-like experiences. The study was aimed at analysing whether certain structural characteristics of video games, flow, and global happiness could be predictive of video game addiction. Method: A total of 110 video game players were surveyed about a game they had recently played by using a 24-item checklist of structural characteristics, an adapted Flow State Scale, the Oxford Happiness Questionnaire, and the Game Addiction Scale. Results: The study revealed decreases in general happiness had the strongest role in predicting increases in gaming addiction. One of the nine factors of the flow experience was a significant predictor of gaming addiction – perceptions of time being altered during play. The structural characteristic that significantly predicted addiction was its social element with increased sociability being associated with higher levels of addictive-like experiences. Overall, the structural characteristics of video games, elements of the flow experience, and general happiness accounted for 49.2% of the total variance in Game Addiction Scale levels. Conclusions: Implications for interventions are discussed, particularly with regard to making players more aware of time passing and in capitalising on benefits of social features of video game play to guard against addictive-like tendencies among video game players. PMID:25215196
NASA Astrophysics Data System (ADS)
Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.
2017-11-01
A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.
Environmental modeling and recognition for an autonomous land vehicle
NASA Technical Reports Server (NTRS)
Lawton, D. T.; Levitt, T. S.; Mcconnell, C. C.; Nelson, P. C.
1987-01-01
An architecture for object modeling and recognition for an autonomous land vehicle is presented. Examples of objects of interest include terrain features, fields, roads, horizon features, trees, etc. The architecture is organized around a set of data bases for generic object models and perceptual structures, temporary memory for the instantiation of object and relational hypotheses, and a long term memory for storing stable hypotheses that are affixed to the terrain representation. Multiple inference processes operate over these databases. Researchers describe these particular components: the perceptual structure database, the grouping processes that operate over this, schemas, and the long term terrain database. A processing example that matches predictions from the long term terrain model to imagery, extracts significant perceptual structures for consideration as potential landmarks, and extracts a relational structure to update the long term terrain database is given.
A Systematic Investigation of Computation Models for Predicting Adverse Drug Reactions (ADRs)
Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong
2014-01-01
Background Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computational models to obtain general conclusions that can provide useful guidance for constructing more effective computational models to predict ADRs. Principal Findings In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have previously been used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and that the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, we found that the final formulas of these algorithms could all be converted to a linear model in form. Based on this finding, we propose a new algorithm called the general weighted profile method, which yielded the best overall performance among the algorithms investigated in this paper. Conclusion Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms. PMID:25180585
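To make the role of the Jaccard coefficient concrete, the sketch below computes it between feature sets and then uses it as the weight in a simple similarity-weighted average over known drugs. This is only a generic weighted-profile style score under stated assumptions: the abstract does not give the formula of the general weighted profile method, and the drug names, feature identifiers and ADR labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention for two empty sets
    return len(a & b) / len(union)

def weighted_profile_score(query_features, known_drugs, adr):
    """Similarity-weighted fraction of known drugs that carry the given ADR.

    known_drugs maps a drug name to (feature_set, adr_set).
    Assumed linear form: sum(sim * has_adr) / sum(sim).
    """
    num = 0.0
    den = 0.0
    for features, adrs in known_drugs.values():
        sim = jaccard(query_features, features)
        num += sim * (1.0 if adr in adrs else 0.0)
        den += sim
    return num / den if den else 0.0

if __name__ == "__main__":
    # Hypothetical training data: substructure-like features and known ADRs.
    known = {
        "drugA": ({"f1", "f2", "f3"}, {"nausea"}),
        "drugB": ({"f3", "f4"}, {"rash"}),
    }
    query = {"f1", "f2"}
    for adr in ("nausea", "rash"):
        print(adr, weighted_profile_score(query, known, adr))
```

The score is linear in the known drug-ADR labels, which matches the abstract's remark that the compared algorithms reduce to a linear form, though the exact weights used by the authors may differ.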
Predicting β-Turns in Protein Using Kernel Logistic Regression
Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, FangXiang; Li, Min
2013-01-01
A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develop an accurate and efficient method for β-turn prediction. Most of the current successful β-turn prediction methods use support vector machines (SVMs) or neural networks (NNs). Kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is rarely used in β-turn classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turn prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved a Q_total of 80.7% and an MCC of 50% on the BT426 dataset. These results show that the KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turn prediction. In addition, KLR yields a probabilistic outcome and has a well-defined extension to the multiclass case. PMID:23509793
Prediction of missing links and reconstruction of complex networks
NASA Astrophysics Data System (ADS)
Zhang, Cheng-Jun; Zeng, An
2016-04-01
Predicting missing links in complex networks is of great significance from both theoretical and practical point of view, which not only helps us understand the evolution of real systems but also relates to many applications in social, biological and online systems. In this paper, we study the features of different simple link prediction methods, revealing that they may lead to the distortion of networks’ structural and dynamical properties. Moreover, we find that high prediction accuracy is not definitely corresponding to a high performance in preserving the network properties when using link prediction methods to reconstruct networks. Our work highlights the importance of considering the feedback effect of the link prediction methods on network properties when designing the algorithms.
Quantifying the Hierarchical Order in Self-Aligned Carbon Nanotubes from Atomic to Micrometer Scale.
Meshot, Eric R; Zwissler, Darwin W; Bui, Ngoc; Kuykendall, Tevye R; Wang, Cheng; Hexemer, Alexander; Wu, Kuang Jen J; Fornasiero, Francesco
2017-06-27
Fundamental understanding of structure-property relationships in hierarchically organized nanostructures is crucial for the development of new functionality, yet quantifying structure across multiple length scales is challenging. In this work, we used nondestructive X-ray scattering to quantitatively map the multiscale structure of hierarchically self-organized carbon nanotube (CNT) "forests" across 4 orders of magnitude in length scale, from 2.0 Å to 1.5 μm. Fully resolved structural features include the graphitic honeycomb lattice and interlayer walls (atomic), CNT diameter (nano), as well as the greater CNT ensemble (meso) and large corrugations (micro). Correlating orientational order across hierarchical levels revealed a cascading decrease as we probed finer structural feature sizes with enhanced sensitivity to small-scale disorder. Furthermore, we established qualitative relationships for single-, few-, and multiwall CNT forest characteristics, showing that multiscale orientational order is directly correlated with number density spanning 10⁹-10¹² cm⁻², yet order is inversely proportional to CNT diameter, number of walls, and atomic defects. Lastly, we captured and quantified ultralow-q meridional scattering features and built a phenomenological model of the large-scale CNT forest morphology, which predicted and confirmed that these features arise due to microscale corrugations along the vertical forest direction. Providing detailed structural information at multiple length scales is important for design and synthesis of CNT materials as well as other hierarchically organized nanostructures.
Proteome-wide Prediction of Self-interacting Proteins Based on Multiple Properties*
Liu, Zhongyang; Guo, Feifei; Zhang, Jiyang; Wang, Jian; Lu, Liang; Li, Dong; He, Fuchu
2013-01-01
Self-interacting proteins, whose two or more copies can interact with each other, play important roles in cellular functions and the evolution of protein interaction networks (PINs). Knowing whether a protein can self-interact can contribute to and sometimes is crucial for the elucidation of its functions. Previous related research has mainly focused on the structures and functions of specific self-interacting proteins, whereas knowledge on their overall properties is limited. Meanwhile, the two current most common high throughput protein interaction assays have limited ability to detect self-interactions because of biological artifacts and design limitations, whereas the bioinformatic prediction method of self-interacting proteins is lacking. This study aims to systematically study and predict self-interacting proteins from an overall perspective. We find that compared with other proteins the self-interacting proteins in the structural aspect contain more domains; in the evolutionary aspect they tend to be conserved and ancient; in the functional aspect they are significantly enriched with enzyme genes, housekeeping genes, and drug targets, and in the topological aspect tend to occupy important positions in PINs. Furthermore, based on these features, after feature selection, we use logistic regression to integrate six representative features, including Gene Ontology term, domain, paralogous interactor, enzyme, model organism self-interacting protein, and betweenness centrality in the PIN, to develop a proteome-wide prediction model of self-interacting proteins. Using 5-fold cross-validation and an independent test, this model shows good performance. Finally, the prediction model is developed into a user-friendly web service SLIPPER (SeLf-Interacting Protein PrEdictoR). Users may submit a list of proteins, and then SLIPPER will return the probability scores measuring their possibility to be self-interacting proteins and various related annotation information.
This work helps us understand the role self-interacting proteins play in cellular functions from an overall perspective, and the constructed prediction model may contribute to the high throughput finding of self-interacting proteins and provide clues for elucidating their functions. PMID:23422585
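The abstract above describes integrating six features with logistic regression to produce a probability score. A minimal sketch of that scoring step follows, assuming the features have already been encoded as numbers. The weights, bias and feature values are invented for illustration and are not the fitted SLIPPER parameters.

```python
import math


def logistic_score(features, weights, bias):
    """Map a weighted sum of feature values to a probability in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical encoding of the six feature types named in the abstract:
    # GO term, domain, paralogous interactor, enzyme,
    # model-organism self-interactor, betweenness centrality.
    features = [1.0, 1.0, 0.0, 1.0, 0.0, 0.3]
    weights = [0.8, 0.6, 0.9, 0.4, 1.2, 0.5]  # assumed, not fitted values
    bias = -2.0                               # assumed, not a fitted value
    print(round(logistic_score(features, weights, bias), 3))
```

In practice the weights would be estimated from labelled proteins and checked by cross-validation, as the abstract reports. Only the final scoring function is shown here.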
FRAGSION: ultra-fast protein fragment library generation by IOHMM sampling.
Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin
2016-07-01
Speed, accuracy and robustness of building protein fragment library have important implications in de novo protein structure prediction since fragment-based methods are one of the most successful approaches in template-free modeling (FM). The majority of the existing fragment detection methods rely on database-driven search strategies to identify candidate fragments, which are inherently time-consuming and often hinder the possibility to locate longer fragments due to the limited sizes of databases. Also, it is difficult to alleviate the effect of noisy sequence-based predicted features such as secondary structures on the quality of fragment. Here, we present FRAGSION, a database-free method to efficiently generate protein fragment library by sampling from an Input-Output Hidden Markov Model. FRAGSION offers some unique features compared to existing approaches in that it (i) is lightning-fast, consuming only a few seconds of CPU time to generate fragment library for a protein of typical length (300 residues); (ii) can generate dynamic-size fragments of any length (even for the whole protein sequence) and (iii) offers ways to handle noise in predicted secondary structure during fragment sampling. On a FM dataset from the most recent Critical Assessment of Structure Prediction, we demonstrate that FRAGSION provides advantages over the state-of-the-art fragment picking protocol of ROSETTA suite by speeding up computation by several orders of magnitude while achieving comparable performance in fragment quality. Source code and executable versions of FRAGSION for Linux and MacOS are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/FRAGSION/. It is bundled with a manual and example data. Contact: chengji@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Rapid experimental measurements of physicochemical properties to inform models and testing.
Nicolas, Chantel I; Mansouri, Kamel; Phillips, Katherine A; Grulke, Christopher M; Richard, Ann M; Williams, Antony J; Rabinowitz, James; Isaacs, Kristin K; Yau, Alice; Wambaugh, John F
2018-05-02
The structures and physicochemical properties of chemicals are important for determining their potential toxicological effects, toxicokinetics, and route(s) of exposure. These data are needed to prioritize the risk for thousands of environmental chemicals, but experimental values are often lacking. In an attempt to efficiently fill data gaps in physicochemical property information, we generated new data for 200 structurally diverse compounds, which were rigorously selected from the USEPA ToxCast chemical library, and whose structures are available within the Distributed Structure-Searchable Toxicity Database (DSSTox). This pilot study evaluated rapid experimental methods to determine five physicochemical properties, including the log of the octanol:water partition coefficient (known as log(Kow) or logP), vapor pressure, water solubility, Henry's law constant, and the acid dissociation constant (pKa). For most compounds, experiments were successful for at least one property; log(Kow) yielded the largest return (176 values). It was determined that 77 ToxPrint structural features were enriched in chemicals with at least one measurement failure, indicating which features may have played a role in rapid method failures. To gauge consistency with traditional measurement methods, the new measurements were compared with previous measurements (where available). Since quantitative structure-activity/property relationship (QSAR/QSPR) models are used to fill gaps in physicochemical property information, 5 suites of QSPRs were evaluated for their predictive ability and chemical coverage or applicability domain of new experimental measurements.
The ability to have accurate measurements of these properties will facilitate better exposure predictions in two ways: 1) direct input of these experimental measurements into exposure models; and 2) construction of QSPRs with a wider applicability domain, as their predicted physicochemical values can be used to parameterize exposure models in the absence of experimental data. Published by Elsevier B.V.
Watanabe, Takanori; Kessler, Daniel; Scott, Clayton; Angstadt, Michael; Sripada, Chandra
2014-01-01
Substantial evidence indicates that major psychiatric disorders are associated with distributed neural dysconnectivity, leading to strong interest in using neuroimaging methods to accurately predict disorder status. In this work, we are specifically interested in a multivariate approach that uses features derived from whole-brain resting state functional connectomes. However, functional connectomes reside in a high dimensional space, which complicates model interpretation and introduces numerous statistical and computational challenges. Traditional feature selection techniques are used to reduce data dimensionality, but are blind to the spatial structure of the connectomes. We propose a regularization framework where the 6-D structure of the functional connectome (defined by pairs of points in 3-D space) is explicitly taken into account via the fused Lasso or the GraphNet regularizer. Our method only restricts the loss function to be convex and margin-based, allowing non-differentiable loss functions such as the hinge-loss to be used. Using the fused Lasso or GraphNet regularizer with the hinge-loss leads to a structured sparse support vector machine (SVM) with embedded feature selection. We introduce a novel efficient optimization algorithm based on the augmented Lagrangian and the classical alternating direction method, which can solve both fused Lasso and GraphNet regularized SVM with very little modification. We also demonstrate that the inner subproblems of the algorithm can be solved efficiently in analytic form by coupling the variable splitting strategy with a data augmentation scheme. Experiments on simulated data and resting state scans from a large schizophrenia dataset show that our proposed approach can identify predictive regions that are spatially contiguous in the 6-D “connectome space,” offering an additional layer of interpretability that could provide new insights about various disease processes. PMID:24704268
POOL server: machine learning application for functional site prediction in proteins.
Somarowthu, Srinivas; Ondrechen, Mary Jo
2012-08-01
We present an automated web server for partial order optimum likelihood (POOL), a machine learning application that combines computed electrostatic and geometric information for high-performance prediction of catalytic residues from 3D structures. Input features consist of THEMATICS electrostatics data and pocket information from ConCavity. THEMATICS measures deviation from typical, sigmoidal titration behavior to identify functionally important residues and ConCavity identifies binding pockets by analyzing the surface geometry of protein structures. Both THEMATICS and ConCavity (structure only) do not require the query protein to have any sequence or structure similarity to other proteins. Hence, POOL is applicable to proteins with novel folds and engineered proteins. As an additional option for cases where sequence homologues are available, users can include evolutionary information from INTREPID for enhanced accuracy in site prediction. The web site is free and open to all users with no login requirements at http://www.pool.neu.edu. Contact: m.ondrechen@neu.edu. Supplementary data are available at Bioinformatics online.
Discriminant analysis of multiple cortical changes in mild cognitive impairment
NASA Astrophysics Data System (ADS)
Wu, Congling; Guo, Shengwen; Lai, Chunren; Wu, Yupeng; Zhao, Di; Jiang, Xingjun
2017-02-01
This study aims to reveal the differences in brain structure and morphological change between patients with mild cognitive impairment (MCI) and normal controls (NC), and to analyze and predict the risk of MCI conversion. First, the baseline and 2-year longitudinal follow-up magnetic resonance (MR) images of 73 NC, 46 patients with stable MCI (sMCI) and 40 patients with converted MCI (cMCI) were selected. Second, FreeSurfer was used to extract the cortical features, including cortical thickness, surface area, gray matter volume and mean curvature. Third, the support vector machine-recursive feature elimination (SVM-RFE) method was adopted to determine salient features for effective discrimination. Finally, the distribution and importance of essential brain regions were described. The experimental results showed that cortical thickness and gray matter volume exhibited prominent discriminative capability, whereas surface area and mean curvature were relatively weak. Furthermore, combining different morphological features, especially the baseline features with the longitudinal changes, clearly improved classification performance. In addition, brain regions with high weights were predominantly located in the temporal lobe and the frontal lobe, which are related to emotional control and memory functions. These results suggest that there are significantly different patterns in brain structure and its changes between the compared groups, which can be applied not only for classification but also to evaluate and predict the conversion of patients with MCI.
Lei, Tailong; Sun, Huiyong; Kang, Yu; Zhu, Feng; Liu, Hui; Zhou, Wenfang; Wang, Zhe; Li, Dan; Li, Youyong; Hou, Tingjun
2017-11-06
Xenobiotic chemicals and their metabolites are mainly excreted out of our bodies by the urinary tract through the urine. Chemical-induced urinary tract toxicity is one of the main reasons that cause failure during drug development, and it is a common adverse event for medications, natural supplements, and environmental chemicals. Despite its importance, there are only a few in silico models for assessing urinary tract toxicity for a large number of compounds with diverse chemical structures. Here, we developed a series of qualitative and quantitative structure-activity relationship (QSAR) models for predicting urinary tract toxicity. In our study, the recursive feature elimination method incorporated with random forests (RFE-RF) was used for dimension reduction, and then eight machine learning approaches were used for QSAR modeling, i.e., relevance vector machine (RVM), support vector machine (SVM), regularized random forest (RRF), C5.0 trees, eXtreme gradient boosting (XGBoost), AdaBoost.M1, SVM boosting (SVMBoost), and RVM boosting (RVMBoost). For building classification models, the synthetic minority oversampling technique was used to handle the imbalanced data set problem. Among all the machine learning approaches, SVMBoost based on the RBF kernel achieves both the best quantitative (q_ext² = 0.845) and qualitative predictions for the test set (MCC of 0.787, AUC of 0.893, sensitivity of 89.6%, specificity of 94.1%, and global accuracy of 90.8%). The application domains were then analyzed, and all of the tested chemicals fall within the application domain coverage. We also examined the structure features of the chemicals with large prediction errors. In brief, both the regression and classification models developed by the SVMBoost approach have reliable prediction capability for assessing chemical-induced urinary tract toxicity.
LRSSLMDA: Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction
Huang, Li
2017-01-01
Predicting novel microRNA (miRNA)-disease associations is clinically significant due to miRNAs’ potential roles of diagnostic biomarkers and therapeutic targets for various human diseases. Previous studies have demonstrated the viability of utilizing different types of biological data to computationally infer new disease-related miRNAs. Yet researchers face the challenge of how to effectively integrate diverse datasets and make reliable predictions. In this study, we presented a computational model named Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction (LRSSLMDA), which projected miRNAs/diseases’ statistical feature profile and graph theoretical feature profile to a common subspace. It used Laplacian regularization to preserve the local structures of the training data and a L1-norm constraint to select important miRNA/disease features for prediction. The strength of dimensionality reduction enabled the model to be easily extended to much higher dimensional datasets than those exploited in this study. Experimental results showed that LRSSLMDA outperformed ten previous models: the AUC of 0.9178 in global leave-one-out cross validation (LOOCV) and the AUC of 0.8418 in local LOOCV indicated the model’s superior prediction accuracy; and the average AUC of 0.9181+/-0.0004 in 5-fold cross validation justified its accuracy and stability. In addition, three types of case studies further demonstrated its predictive power. Potential miRNAs related to Colon Neoplasms, Lymphoma, Kidney Neoplasms, Esophageal Neoplasms and Breast Neoplasms were predicted by LRSSLMDA. Respectively, 98%, 88%, 96%, 98% and 98% out of the top 50 predictions were validated by experimental evidences. Therefore, we conclude that LRSSLMDA would be a valuable computational tool for miRNA-disease association prediction. PMID:29253885
Wang, Xun-Heng; Jiao, Yun; Li, Lihua
2017-10-24
Attention deficit hyperactivity disorder (ADHD) is a common brain disorder with high prevalence in school-age children. Previously developed machine learning-based methods have discriminated patients with ADHD from normal controls by providing label information of the disease for individuals. Inattention and impulsivity are the two most significant clinical symptoms of ADHD. However, predicting clinical symptoms (i.e., inattention and impulsivity) is a challenging task based on neuroimaging data. The goal of this study is twofold: to build predictive models for clinical symptoms of ADHD based on resting-state fMRI and to mine brain networks for predictive patterns of inattention and impulsivity. To achieve this goal, a cohort of 74 boys with ADHD and a cohort of 69 age-matched normal controls were recruited from the ADHD-200 Consortium. Both structural and resting-state fMRI images were obtained for each participant. Temporal patterns between and within intrinsic connectivity networks (ICNs) were applied as raw features in the predictive models. Specifically, sample entropy was taken as an intra-ICN feature, and phase synchronization (PS) was used as an inter-ICN feature. The predictive models were based on the least absolute shrinkage and selection operator (LASSO) algorithm. The performance of the predictive model for inattention is r=0.79 (p<10⁻⁸), and the performance of the predictive model for impulsivity is r=0.48 (p<10⁻⁸). The ICN-related predictive patterns may provide valuable information for investigating the brain network mechanisms of ADHD. In summary, the predictive models for clinical symptoms could be beneficial for personalizing ADHD medications. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.
Analyses of Fatigue and Fatigue-Crack Growth under Constant- and Variable-Amplitude Loading
NASA Technical Reports Server (NTRS)
Newman, J. C., Jr.
1999-01-01
Studies on the growth of small cracks have led to the observation that fatigue life of many engineering materials is primarily crack growth from micro-structural features, such as inclusion particles, voids, slip-bands or from manufacturing defects. This paper reviews the capabilities of a plasticity-induced crack-closure model to predict fatigue lives of metallic materials using small-crack theory under various loading conditions. Constraint factors, to account for three-dimensional effects, were selected to correlate large-crack growth rate data as a function of the effective stress-intensity factor range (ΔK_eff) under constant-amplitude loading. Modifications to the ΔK_eff-rate relations in the near-threshold regime were needed to fit measured small-crack growth rate behavior. The model was then used to calculate small- and large-crack growth rates, and to predict total fatigue lives, for notched and un-notched specimens under constant-amplitude and spectrum loading. Fatigue lives were predicted using crack-growth relations and micro-structural features like those that initiated cracks in the fatigue specimens for most of the materials analyzed. Results from the tests and analyses agreed well.
VarMod: modelling the functional effects of non-synonymous variants
Pappalardo, Morena; Wass, Mark N.
2014-01-01
Unravelling the genotype–phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs present in proteins the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants and in turn linking this functional change to phenotype. Here we present VarMod (Variant Modeller) a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod develops recent observations that functional nsSNVs are enriched at protein–protein interfaces and protein–ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs VarMod performance is comparable to an existing state of the art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. PMID:24906884
Promotion and resignation in employee networks
NASA Astrophysics Data System (ADS)
Yuan, Jia; Zhang, Qian-Ming; Gao, Jian; Zhang, Linyan; Wan, Xue-Song; Yu, Xiao-Jun; Zhou, Tao
2016-02-01
Enterprises have put more and more emphasis on data analysis so as to obtain effective management advice. Managers and researchers are trying to identify the major factors that lead to employees' promotion and resignation. Most previous analyses are based on questionnaire surveys, which usually cover only a small fraction of samples and contain biases caused by psychological defense. In this paper, we collect a data set consisting of all the employees' work-related interactions (action network, AN for short) and online social connections (social network, SN for short) of a company, which allows us to reveal the correlations between structural features and employees' career development, namely promotion and resignation. Through statistical analysis, we show that the structural features of both AN and SN are correlated with and predictive of employees' promotion and resignation, and that the AN has higher correlation and predictability. More specifically, the in-degree in AN is the most relevant indicator for promotion, while the k-shell index in AN and the in-degree in SN are both very predictive of resignation. Our results provide a novel and actionable understanding of enterprise management and suggest that enhancing the interactions among employees, whether work-related or social, can help reduce staff turnover risk.
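The two structural indicators named in the abstract above, in-degree and k-shell index, can be computed from an edge list in a few lines. The sketch below uses a hypothetical four-person network. Treating edges as undirected for the k-shell decomposition is an assumption made here, not a detail given in the abstract.

```python
from collections import defaultdict


def in_degree(edges):
    """Count incoming edges per node for directed (source, target) pairs."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[dst] += 1
        deg[src] += 0  # make sure nodes with no incoming edge still appear
    return dict(deg)


def k_shell(edges):
    """k-shell index per node, ignoring edge direction and self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    while remaining:
        # The smallest remaining degree defines the shell being peeled.
        k = min(degree[n] for n in remaining)
        queue = [n for n in remaining if degree[n] <= k]
        while queue:
            n = queue.pop()
            if n not in remaining:
                continue
            remaining.discard(n)
            shell[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        queue.append(m)
    return shell


if __name__ == "__main__":
    # Hypothetical interactions: a triangle a-b-c plus one peripheral node d.
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
    print(in_degree(edges))
    print(k_shell(edges))
```

In-degree is a purely local count, whereas the k-shell index reflects how deeply a node sits in the densely connected core of the network, so the two indicators can rank the same employee quite differently.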
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe
2016-01-01
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe
2016-06-20
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
NASA technology utilization house
NASA Technical Reports Server (NTRS)
1977-01-01
The following systems and features, which are predicted to save approximately $20,000 in utility costs over a twenty-year period, are incorporated into a single-level, contemporarily designed, energy-efficient residential structure: solar heating and cooling; energy-efficient appliances; water recycling; security, smoke, and tornado detectors; and flat conductor electrical wiring.
Gonsalves, Valerie M; McLawsen, Julia E; Huss, Matthew T; Scalora, Mario J
2013-01-01
A wealth of research has underscored the strong relationship between PCL-R scores and recidivism. However, mounting criticism cites the PCL-R's cumbersome administration procedures and failure to adequately measure core features associated with the construct of psychopathy (Skeem, Polaschek, Patrick, & Lilienfeld, 2011). In light of these concerns, this study examined the PPI and the PPI-R, which were designed to measure core personality features associated with psychopathy (Lilienfeld & Andrews, 1996; Lilienfeld & Widows, 2005). Study one examined the PPI relative to the PCL-R and examined its factor structure. The instruments shared few significant correlations and neither the PCL-R nor the PPI significantly predicted recidivism. Study two examined the PPI-R relative to the PCL-R, the PPI, both history of violence and future criminal activity and measure of related constructs. The PPI-R was significantly correlated with measures of empathy and criminal thinking and the factors were related to a history of violence and predicted future violent criminal behavior. Copyright © 2013 Elsevier Ltd. All rights reserved.
Chen, Ying Pin; Liu, Tian Fu; Fordham, Stephen; Zhou, Hong Cai
2015-12-01
Two metal-organic frameworks [PCN-426(Ni) and PCN-427(Cu)] have been designed and synthesized to investigate the structure predictability using a SBB (supermolecular building blocks) approach. Tetratopic ligands featuring 120° angular carboxylate moieties were coordinated with a [Ni3(μ3-O)] cluster and a [Cu2O2] unit, respectively. As topologically predicted, 4-connected networks with square coordination adopted the nbo net for the Ni-MOF and ssb net for the Cu-MOF. PCN-426(Ni) was augmented with 12-connected octahedral SBBs, while PCN-427(Cu) was constructed with tetragonal open channels. After a CO2 supercritical drying procedure, the PCN-426(Ni) possessed a Brunauer-Emmett-Teller (BET) surface area as high as 3935 m² g⁻¹ and impressively high N2 uptake of 1500 cm³ g⁻¹. This work demonstrates the generalization of the SBB strategy, finding an alternative to inconvenient synthetic processes to achieve the desired structural features.
2014-01-01
Background: Osteopontin (Eta, secreted sialoprotein 1, opn) is secreted from different cell types including cancer cells. Three splice variant forms namely osteopontin-a, osteopontin-b and osteopontin-c have been identified. The main astonishing feature is that osteopontin-c is found to be elevated in almost all types of cancer cells. This was the vital point to consider it for sequence analysis and structure predictions which provide ample chances for prognostic, therapeutic and preventive cancer research. Methods: Osteopontin-c gene sequence was determined from Breast Cancer sample and was translated to protein sequence. It was then analyzed using various software and web tools for binding pockets, docking and druggability analysis. Due to the lack of homologous templates, tertiary structure was predicted using ab-initio method server – I-TASSER and was evaluated after refinement using web tools. Refined structure was compared with known bone sialoprotein electron microscopic structure and docked with CD44 for binding analysis and binding pockets were identified for drug designing. Results: Signal sequence of about sixteen amino acid residues was identified using signal sequence prediction servers. Due to the absence of known structures of similar proteins, three dimensional structure of osteopontin-c was predicted using I-TASSER server. The predicted structure was refined with the help of SUMMA server and was validated using SAVES server. Molecular dynamic analysis was carried out using GROMACS software. The final model was built and was used for docking with CD44. Druggable pockets were identified using pocket energies. Conclusions: The tertiary structure of osteopontin-c was predicted successfully using the ab-initio method and the predictions showed that osteopontin-c is of fibrous nature comparable to fibronectin.
Docking studies showed the significant similarities of QSAET motif in the interaction of CD44 and osteopontins between the normal and splice variant forms of osteopontins and binding pockets analyses revealed several pockets which paved the way to the identification of a druggable pocket. PMID:24401206
Compact structure and non-Gaussian dynamics of ring polymer melts.
Brás, Ana R; Goossen, Sebastian; Krutyeva, Margarita; Radulescu, Aurel; Farago, Bela; Allgaier, Jürgen; Pyckhout-Hintzen, Wim; Wischnewski, Andreas; Richter, Dieter
2014-05-28
We present a neutron scattering analysis of the structure and dynamics of PEO polymer rings with a molecular weight 2.5 times higher than the entanglement mass. The melt structure was found to be more compact than a Gaussian model would suggest. With increasing time the center of mass (c.o.m.) diffusion undergoes a transition from sub-diffusive to diffusive behavior. The transition time agrees well with the decorrelation time predicted by a mode coupling approach. As a novel feature well pronounced non-Gaussian behavior of the c.o.m. diffusion was found that shows surprising analogies to the cage effect known from glassy systems. Finally, the longest wavelength Rouse modes are suppressed possibly as a consequence of an onset of lattice animal features as hypothesized in theoretical approaches.
Improved model quality assessment using ProQ2.
Ray, Arjun; Lindahl, Erik; Wallner, Björn
2012-09-10
Employing methods to assess the quality of modeled protein structures is now standard practice in bioinformatics. In a broad sense, the techniques can be divided into methods relying on consensus prediction on the one hand, and single-model methods on the other. Consensus methods frequently perform very well when there is a clear consensus, but this is not always the case. In particular, they frequently fail in selecting the best possible model in the hard cases (lacking consensus) or in the easy cases where models are very similar. In contrast, single-model methods do not suffer from these drawbacks and could potentially be applied on any protein of interest to assess quality or as a scoring function for sampling-based refinement. Here, we present a new single-model method, ProQ2, based on ideas from its predecessor, ProQ. ProQ2 is a model quality assessment algorithm that uses support vector machines to predict local as well as global quality of protein models. Improved performance is obtained by combining previously used features with updated structural and predicted features. The most important contribution can be attributed to the use of profile weighting of the residue-specific features and the use of features averaged over the whole model even though the prediction is still local. ProQ2 is significantly better than its predecessors at detecting high quality models, improving the sum of Z-scores for the selected first-ranked models by 20% and 32% compared to the second-best single-model method in CASP8 and CASP9, respectively. The absolute quality assessment of the models at both local and global level is also improved. The Pearson's correlation between the correct and local predicted score is improved from 0.59 to 0.70 on CASP8 and from 0.62 to 0.68 on CASP9; for global score to the correct GDT_TS from 0.75 to 0.80 and from 0.77 to 0.80, again compared to the second-best single-model methods in CASP8 and CASP9, respectively. ProQ2 is available at http://proq2.wallnerlab.org.
Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Möller, Steffen; Hartmann, Enno; Kalies, Kai-Uwe; Suganthan, P N; Martinetz, Thomas
2010-12-01
Apoptosis is an essential process for controlling tissue homeostasis by regulating a physiological balance between cell proliferation and cell death. The subcellular locations of proteins performing the cell death are determined by mostly independent cellular mechanisms. The regular bioinformatics tools to predict the subcellular locations of such apoptotic proteins often fail. This work proposes a model for the sorting of proteins that are involved in apoptosis, allowing both the prediction of their subcellular locations and the identification of the molecular properties that contribute to it. We report a novel hybrid Genetic Algorithm (GA)/Support Vector Machine (SVM) approach to predict apoptotic protein sequences using 119 sequence-derived properties such as frequency of amino acid groups, secondary structure, and physicochemical properties. GA is used for selecting a near-optimal subset of informative features that is most relevant for the classification. Jackknife cross-validation is applied to test the predictive capability of the proposed method on 317 apoptosis proteins. Our method achieved 85.80% accuracy using all 119 features and 89.91% accuracy for 25 features selected by GA. Our models were examined on a test dataset of 98 apoptosis proteins and obtained an overall accuracy of 90.34%. The results show that the proposed approach is promising; it is able to select small subsets of features and still improve the classification accuracy. Our model can contribute to the understanding of programmed cell death and drug discovery. The software and dataset are available at http://www.inb.uni-luebeck.de/tools-demos/apoptosis/GASVM.
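The jackknife (leave-one-out) evaluation of a feature subset mentioned in this abstract can be sketched as follows. This is an illustration only: a nearest-centroid classifier stands in for the SVM, the genetic algorithm is omitted, and the names `nearest_centroid_predict` and `jackknife_accuracy` are hypothetical rather than taken from the authors' software.

```python
from collections import Counter

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (SVM stand-in)."""
    sums, counts = {}, Counter(train_y)
    for features, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def jackknife_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy using only the selected feature columns."""
    Xs = [[row[i] for i in feature_idx] for row in X]
    hits = 0
    for k in range(len(Xs)):
        # Hold out sample k, train on everything else, then score sample k.
        train_X = Xs[:k] + Xs[k + 1:]
        train_y = y[:k] + y[k + 1:]
        if nearest_centroid_predict(train_X, train_y, Xs[k]) == y[k]:
            hits += 1
    return hits / len(Xs)
```

A feature-selection loop such as a GA would call `jackknife_accuracy` once per candidate subset and keep the subsets that score best.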
The sequence, structure and evolutionary features of HOTAIR in mammals
2011-01-01
Background An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Different from all the others that function in cis to regulate local gene expression, the newly identified HOTAIR is located between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci, but unlike Xist, many details of its structure and function, as well as the trans regulation, remain unclear. Moreover, HOTAIR is involved in the aberrant regulation of gene expression in cancer. Results To identify conserved domains in HOTAIR and study the phylogenetic distribution of this lncRNA, we searched the genomes of 10 mammalian and 3 non-mammalian vertebrates for matches to its 6 exons and the two conserved domains within the 1800 bp exon6 using Infernal. There was just one high-scoring hit for each mammal, but many low-scoring hits were found in both mammals and non-mammalian vertebrates. These hits and their flanking genes in four placental mammals and platypus were examined to determine whether HOTAIR contained elements shared by other lncRNAs. Several of the hits were within unknown transcripts or ncRNAs, many were within introns of, or antisense to, protein-coding genes, and conservation of the flanking genes was observed only between human and chimpanzee. Phylogenetic analysis revealed discrete evolutionary dynamics for orthologous sequences of HOTAIR exons. Exon1 at the 5' end and a domain in exon6 near the 3' end, which contain domains that bind to multiple proteins, have evolved faster in primates than in other mammals. Structures were predicted for exon1, two domains of exon6 and the full HOTAIR sequence. The sequence and structure of two fragments, in exon1 and the domain B of exon6 respectively, were identified to robustly occur in predicted structures of exon1, domain B of exon6 and the full HOTAIR in mammals. 
Conclusions HOTAIR exists in mammals, has poorly conserved sequences and considerably conserved structures, and has evolved faster than nearby HoxC genes. Exons of HOTAIR show distinct evolutionary features, and a 239 bp domain in the 1804 bp exon6 is especially conserved. These features, together with the absence of some exons and sequences in mouse, rat and kangaroo, suggest ab initio generation of HOTAIR in marsupials. Structure prediction identifies two fragments in the 5' end exon1 and the 3' end domain B of exon6, with sequence and structure invariably occurring in various predicted structures of exon1, the domain B of exon6 and the full HOTAIR. PMID:21496275
Predicting Chemically Induced Duodenal Ulcer and Adrenal Necrosis with Classification Trees
NASA Astrophysics Data System (ADS)
Giampaolo, Casimiro; Gray, Andrew T.; Olshen, Richard A.; Szabo, Sandor
1991-07-01
Binary tree-structured statistical classification algorithms and properties of 56 model alkyl nucleophiles were brought to bear on two problems of experimental pharmacology and toxicology. Each rat of a learning sample of 745 was administered one compound and autopsied to determine the presence of duodenal ulcer or adrenal hemorrhagic necrosis. The cited statistical classification schemes were then applied to these outcomes and 67 features of the compounds to ascertain those characteristics that are associated with biologic activity. For predicting duodenal ulceration, dipole moment, melting point, and solubility in octanol are particularly important, while for predicting adrenal necrosis, important features include the number of sulfhydryl groups and double bonds. These methods may constitute inexpensive but powerful ways to screen untested compounds for possible organ-specific toxicity. Mechanisms for the etiology and pathogenesis of the duodenal and adrenal lesions are suggested, as are additional avenues for drug design.
Lee, Yong-Jik; Lee, Sang-Jae; Kim, Seong-Bo; Lee, Sang Jun; Lee, Sung Haeng; Lee, Dong-Woo
2014-03-18
Structural genomics demonstrates that despite low levels of structural similarity of proteins comprising a metabolic pathway, their substrate binding regions are likely to be conserved. Here, based on the 3D structures of the α/β-fold proteins involved in the ara operon, we attempted to predict the substrate binding residues of thermophilic Geobacillus stearothermophilus L-arabinose isomerase (GSAI), for which no 3D structure is available. Comparison of the structures of L-arabinose catabolic enzymes revealed a conserved feature that forms the substrate-binding modules, which can be extended to predict the substrate binding site of GSAI (i.e., D195, E261 and E333). Moreover, these data imply that proteins in the L-arabinose metabolic pathway might retain their substrate binding niches as a modular structure through conserved molecular evolution, even with totally different structural scaffolds. Copyright © 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Checefsky, Walter A.; Abidin, Anas Z.; Nagarajan, Mahesh B.; Bauer, Jan S.; Baum, Thomas; Wismüller, Axel
2016-03-01
The current clinical standard for measuring Bone Mineral Density (BMD) is dual X-ray absorptiometry; however, more recently BMD derived from volumetric quantitative computed tomography has been shown to demonstrate a high association with spinal fracture susceptibility. In this study, we propose a method of fracture risk assessment using structural properties of trabecular bone in spinal vertebrae. Experimental data was acquired via axial multi-detector CT (MDCT) from 12 spinal vertebrae specimens using a whole-body 256-row CT scanner with a dedicated calibration phantom. Common image processing methods were used to annotate the trabecular compartment in the vertebral slices, creating a circular region of interest (ROI) that excluded cortical bone for each slice. The pixels inside the ROI were converted to values indicative of BMD. High dimensional geometrical features were derived using the scaling index method (SIM) at different radii and scaling factors (SF). The mean BMD values within the ROI were then extracted and used in conjunction with a support vector machine to predict the failure load of the specimens. Prediction performance was measured using the root-mean-square error (RMSE); SIM combined with mean BMD features (RMSE = 0.82 ± 0.37) outperformed MDCT-measured mean BMD (RMSE = 1.11 ± 0.33) (p < 10⁻⁴). These results demonstrate that biomechanical strength prediction in vertebrae can be significantly improved through the use of SIM-derived texture features from trabecular bone.
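For reference, the RMSE metric used above to compare predicted and measured failure loads can be computed as in this minimal sketch. The function name `rmse` and its argument names are illustrative and not taken from the study.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Lower values indicate closer agreement; the result carries the same units as the predicted quantity.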
Phase Transition and Physical Properties of InS
NASA Astrophysics Data System (ADS)
Wang, Hai-Yan; Li, Xiao-Feng; Xu, Lei; Li, Xu-Sheng; Hu, Qian-Ku
2018-02-01
Using a crystal structure prediction method based on the particle swarm optimization algorithm, three phases (Pnnm, C2/m and Pm-3m) are predicted for InS. The new high-pressure Pm-3m phase of InS is reported for the first time in this work. The structural features and electronic structure of InS under high pressure are fully investigated. We predict that the stable ground-state structure of InS is the Pnnm phase, and a phase transformation from the Pnnm phase to the Pm-3m phase is found for the first time at a pressure of about 29.5 GPa. According to the calculated enthalpies of InS with four structures in the pressure range from 20 GPa to 45 GPa, we find that the C2/m phase is a metastable phase. The calculated band gap of about 2.08 eV for InS with the Pnnm structure at 0 GPa agrees well with the experimental value. Moreover, the electronic structure suggests that the C2/m and Pm-3m phases are metallic. Supported by the National Natural Science Foundation of China under Grant Nos. 11404099, 11304140, 11147167 and Funds of Outstanding Youth of Henan Polytechnic University, China under Grant No. J2014-05
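The abstract does not spell out how the transition pressure is obtained from the enthalpies. Under the usual zero-temperature assumption (an assumption here, not a statement from the paper), the stable phase at each pressure is the one with the lowest enthalpy, and the transition pressure is where two enthalpy curves cross:

```latex
H_\alpha(P) = E_\alpha\bigl(V_\alpha(P)\bigr) + P\,V_\alpha(P),
\qquad
H_{Pnnm}(P_t) = H_{Pm\bar{3}m}(P_t), \quad P_t \approx 29.5\ \text{GPa}.
```

Here $E_\alpha$ and $V_\alpha$ denote the total energy and volume per formula unit of phase $\alpha$. A metastable phase such as C2/m is one whose enthalpy curve never becomes the lowest in the range considered.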
NASA Astrophysics Data System (ADS)
Ivanov, M. A.; Head, J. W.
2008-12-01
Detailed geological analysis of the Lakshmi Planum region of western Ishtar Terra results in the establishment of the sequence of major events during the formation and evolution of western Ishtar Terra, an important and somewhat unique area on Venus characterized by a raised volcanic plateau surrounded by distinctive folded mountain belts, such as Maxwell Montes. These mapping results and the stratigraphic and structural relationships provide a basis for addressing the complicated problem of Lakshmi Planum formation and for testing the suite of models previously proposed to explain this structure. We review and classify previous models of formation for western Ishtar Terra into "downwelling" models (generally involving convergence and underthrusting) and "upwelling" models (generally involving plume-like upwelling and divergence). The interpreted nature of units and the sequence of events derived from geological mapping are in contrast to the predictions of the divergent models. The major contradictions are as follows: (1) The very likely presence of an ancient (craton-like) tessera massif in the core of Lakshmi, which is inconsistent with the model of formation of Lakshmi due to rise and collapse of a mantle diapir; (2) The absence of rift zones in the interior of Lakshmi that are predicted by the divergent models; (3) The apparent migration of volcanic activity toward the center of Lakshmi, whereas divergent models predict the opposite trend; (4) The abrupt cessation of ridges of the mountain ranges at the edge of Lakshmi Planum and propagation of these ridges over hundreds of kilometers outside Lakshmi; the divergent models predict the opposite progression in the development of major contractional features. 
In contrast, convergent models of formation and evolution of Lakshmi Planum appear to be more consistent with the observations and explain this structure by collision and underthrusting/subduction of lower-lying plains with the elevated and rigid block of tessera. These models are capable of explaining formation of the major features of western Ishtar (for example, the mountain belts), the sequences of events, and principal volcanic and tectonic trends during the evolution of Lakshmi. To explain the pronounced north-south asymmetry of Lakshmi these models need to consider the likelihood that the major focal points of collision are at the north and north-west margins of the plateau. We note that pure downwelling models, however, face three important difficulties: (1) The possibly unrealistically long time span that appears to be required to produce the major features of Lakshmi; (2) The strong north-south asymmetry of the Planum; the pure downwelling models predict the formation of a more symmetrical structure; and (3) The absence of radial contractional structures (arches and ridges) in the interior of Lakshmi that would represent the predictions of the downwelling models.
Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan
2016-10-07
RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of the RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, such as overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam .
Einhäuser, Wolfgang; Nuthmann, Antje
2016-09-01
During natural scene viewing, humans typically attend and fixate selected locations for about 200-400 ms. Two variables characterize such "overt" attention: the probability of a location being fixated, and the fixation's duration. Both variables have been widely researched, but little is known about their relation. We use a two-step approach to investigate the relation between fixation probability and duration. In the first step, we use a large corpus of fixation data. We demonstrate that fixation probability (empirical salience) predicts fixation duration across different observers and tasks. Linear mixed-effects modeling shows that this relation is explained neither by joint dependencies on simple image features (luminance, contrast, edge density) nor by spatial biases (central bias). In the second step, we experimentally manipulate some of these features. We find that fixation probability from the corpus data still predicts fixation duration for this new set of experimental data. This holds even if stimuli are deprived of low-level image features, as long as higher-level scene structure remains intact. Together, this shows a robust relation between fixation duration and probability, which does not depend on simple image features. Moreover, the study exemplifies the combination of empirical research on a large corpus of data with targeted experimental manipulations.
Fabrication of submicron proteinaceous structures by direct laser writing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Serien, Daniela; Takeuchi, Shoji, E-mail: takeuchi@iis.u-tokyo.ac.jp; ERATO Takeuchi Biohybrid Innovation Project, Japan Science and Technology Agency, 4-6-1 Komaba, Meguro-ku, 153-8505 Tokyo
In this paper, we provide a characterization of truly free-standing proteinaceous structures with submicron feature sizes depending on the fabrication conditions by model-based analysis. Protein cross-linking of bovine serum albumin is performed by direct laser writing and two-photon excitation of flavin adenine dinucleotide. We analyze the obtainable fabrication resolution and required threshold energy for polymerization. The applied polymerization model allows prediction of fabrication conditions and resulting fabrication size, facilitating the application of proteinaceous structure fabrication.
Essentialist beliefs about homosexuality: structure and implications for prejudice.
Haslam, Nick; Levy, Sheri R
2006-04-01
The structure of beliefs about the nature of homosexuality, and their association with antigay attitudes, were examined in three studies (Ns = 309, 487, and 216). Contrary to previous research, three dimensions were obtained: the belief that homosexuality is biologically based, immutable, and fixed early in life; the belief that it is cross-culturally and historically universal; and the belief that it constitutes a discrete, entitative type with defining features. Study 1 supported a three-factor structure for essentialist beliefs about male homosexuality. Study 2 replicated this structure with confirmatory factor analysis, extended it to beliefs about lesbianism, showed that all three dimensions predicted antigay attitudes, and demonstrated that essentialist beliefs mediate associations between prejudice and gender, ethnicity, and religiosity. Study 3 replicated the belief structure and mediation effects in a community sample and showed that essentialist beliefs predict antigay prejudice independently of right-wing authoritarianism, social dominance orientation, and political conservatism.
Camproux, A C; Tufféry, P
2005-08-05
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.
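The "logic of connections" between structural letters is summarized by a transition matrix. The sketch below estimates such a matrix by simple counting over series that are already encoded as letters. This is a simplification for illustration: the abstract describes learning the matrix jointly with the letter shapes inside a Hidden Markov Model, and the function name `transition_matrix` is hypothetical.

```python
from collections import defaultdict

def transition_matrix(letter_series):
    """Estimate P(next letter | current letter) from encoded letter series."""
    counts = defaultdict(lambda: defaultdict(int))
    for series in letter_series:
        # Count each adjacent pair (current letter, next letter).
        for current, following in zip(series, series[1:]):
            counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix
```

Each row of the result sums to one, so a row can be read directly as the distribution of letters that tend to follow a given local conformation.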
Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches
Sael, Lee; Kihara, Daisuke
2010-01-01
Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. The function of proteins, specifically the binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over the global pocket shape-based method which was previously developed by our group. PMID:21614188
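The idea of weighted bipartite matching between surface patches can be illustrated with a small exhaustive search. This is a sketch under stated assumptions, not the authors' implementation: the cost values would in practice come from distances between patch descriptors, the function name `best_patch_matching` is hypothetical, and brute force is only practical for a handful of patches.

```python
from itertools import permutations

def best_patch_matching(cost):
    """Minimum-cost assignment of query patches (rows) to target patches (columns).

    cost[i][j] is the dissimilarity between query patch i and target patch j.
    Every query patch is matched to a distinct target patch; extra target
    patches stay unmatched, which is what allows a partial comparison.
    """
    n_query, n_target = len(cost), len(cost[0])
    if n_query > n_target:
        raise ValueError("expected at most as many query patches as target patches")
    best_total, best_pairs = float("inf"), None
    for targets in permutations(range(n_target), n_query):
        total = sum(cost[i][j] for i, j in enumerate(targets))
        if total < best_total:
            best_total, best_pairs = total, list(enumerate(targets))
    return best_total, best_pairs
```

For realistic pocket sizes a polynomial-time solver such as the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) would replace the exhaustive loop.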
Predicting Greater Prairie-Chicken Lek Site Suitability to Inform Conservation Actions
Hovick, Torre J.; Dahlgren, David K.; Papeş, Monica; Elmore, R. Dwayne; Pitman, James C.
2015-01-01
The demands of a growing human population dictate that expansion of energy infrastructure, roads, and other development frequently takes place in native rangelands. In particular, transmission lines and roads commonly divide rural landscapes and increase fragmentation. This has direct and indirect consequences for native wildlife that can be mitigated through thoughtful planning and proactive approaches to identifying areas of high conservation priority. We used nine years (2003–2011) of Greater Prairie-Chicken (Tympanuchus cupido) lek locations totaling 870 unique lek sites in Kansas and seven geographic information system (GIS) layers describing land cover, topography, and anthropogenic structures to model habitat suitability across the state. The models obtained had low omission rates (<0.18) and high area under the curve scores (AUC >0.81), indicating high model performance and reliability of predicted habitat suitability for Greater Prairie-Chickens. We found that elevation was the most influential variable in predicting lek locations, contributing three times more predictive power than any other variable. However, models were improved by the addition of land cover and anthropogenic features (transmission lines, roads, and oil and gas structures). Overall, our analysis provides a hierarchical understanding of Greater Prairie-Chicken habitat suitability that is broadly based on geomorphological features followed by land cover suitability. We found that when land features and vegetation cover are suitable for Greater Prairie-Chickens, fragmentation by anthropogenic sources such as roadways and transmission lines is a concern. Therefore, it is our recommendation that future human development in Kansas avoid areas that our models identified as highly suitable for Greater Prairie-Chickens and focus development on land cover types that are of lower conservation concern. PMID:26317349
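The two evaluation measures quoted above, the omission rate and the AUC, can be sketched as follows. The function names, the score lists, and the choice of threshold are illustrative assumptions; the abstract does not state which threshold rule or background sample the authors used.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence site outscores a random background site.

    Ties count as half a win, which matches the rank-based definition of AUC.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

def omission_rate(presence_scores, threshold):
    """Fraction of known presence sites scored below the suitability threshold."""
    missed = sum(1 for p in presence_scores if p < threshold)
    return missed / len(presence_scores)
```

An AUC of 0.5 corresponds to a model no better than chance, and an omission rate of 0 means that every known lek falls in habitat the model calls suitable.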
Kavianpour, Hamidreza; Vasighi, Mahdi
2017-02-01
Knowledge of the cellular attributes of proteins plays an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods for better understanding of protein functionality and folding patterns. Computational methods and intelligent systems can play an important role in the structural classification of proteins. Most protein sequences are stored in databanks as character strings, and a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acid alphabets according to the surrounding hydrophobicity index. Many important features that are hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help in revealing some inherent features deeply hidden in protein sequences and improve the quality of predicting protein structural class.
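The pipeline described above (sequence to binary string to cellular automaton image) can be sketched as follows. The residue grouping in `HYDROPHOBIC`, the use of elementary rule 90, and the periodic boundary are placeholder assumptions chosen for illustration; the abstract does not specify the reduced alphabet or the automaton rule the authors used.

```python
# Assumed two-class reduced alphabet; NOT the grouping used in the paper.
HYDROPHOBIC = set("AVLIMFWC")

def to_binary(sequence):
    """Encode a protein sequence as 1 (hydrophobic) or 0 (other) per residue."""
    return [1 if residue in HYDROPHOBIC else 0 for residue in sequence.upper()]

def evolve(cells, rule=90, steps=4):
    """Run an elementary cellular automaton and return all rows as an image.

    Each new cell is looked up from the rule number using the 3-bit
    neighborhood (left, center, right); boundaries wrap around.
    """
    rows = [list(cells)]
    n = len(cells)
    for _ in range(steps):
        prev = rows[-1]
        nxt = []
        for i in range(n):
            neighborhood = (prev[(i - 1) % n] << 2) | (prev[i] << 1) | prev[(i + 1) % n]
            nxt.append((rule >> neighborhood) & 1)
        rows.append(nxt)
    return rows
```

The list of rows returned by `evolve(to_binary(sequence))` is a binary image from which texture-like features could then be extracted for a classifier.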
Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)
Singh, Ranjan K.; Tanner, John J.
2013-01-01
Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760
Pre-Test Analysis Predictions for the Shell Buckling Knockdown Factor Checkout Tests - TA01 and TA02
NASA Technical Reports Server (NTRS)
Thornburgh, Robert P.; Hilburger, Mark W.
2011-01-01
This report summarizes the pre-test analysis predictions for the SBKF-P2-CYL-TA01 and SBKF-P2-CYL-TA02 shell buckling tests conducted at the Marshall Space Flight Center (MSFC) in support of the Shell Buckling Knockdown Factor (SBKF) Project, NASA Engineering and Safety Center (NESC) Assessment. The test article (TA) is an 8-foot-diameter aluminum-lithium (Al-Li) orthogrid cylindrical shell with design features similar to those of the proposed Ares-I and Ares-V barrel structures. In support of the testing effort, detailed structural analyses were conducted and the results were used to monitor the behavior of the TA during the testing. A summary of predicted results for each of the five load sequences is presented herein.
Hammond, Matthew D; Cimpian, Andrei
2017-05-01
Stereotypes are typically defined as beliefs about groups, but this definition is underspecified. Beliefs about groups can be generic or statistical. Generic beliefs attribute features to entire groups (e.g., men are strong), whereas statistical beliefs encode the perceived prevalence of features (e.g., how common it is for men to be strong). In the present research, we sought to determine which beliefs, generic or statistical, are more central to the cognitive structure of stereotypes. Specifically, we tested whether generic or statistical beliefs are more influential in people's social judgments, on the assumption that greater functional importance indicates greater centrality in stereotype structure. Relative to statistical beliefs, generic beliefs about social groups were significantly stronger predictors of expectations (Studies 1-3) and explanations (Study 4) for unfamiliar individuals' traits. In addition, consistent with prior evidence that generic beliefs are cognitively simpler than statistical beliefs, generic beliefs were particularly predictive of social judgments for participants with more intuitive (vs. analytic) cognitive styles and for participants higher (vs. lower) in authoritarianism, who tend to view outgroups in simplistic, all-or-none terms. The present studies suggest that generic beliefs about groups are more central than statistical beliefs to the cognitive structure of stereotypes. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Crystal Structure Prediction and its Application in Earth and Materials Sciences
NASA Astrophysics Data System (ADS)
Zhu, Qiang
First, we describe how to predict crystal structures with an evolutionary approach, and extend this method to study the packing of organic molecules using our specially designed constrained evolutionary algorithm. The main feature of this new approach is that each unit or molecule is treated as a whole body, which drastically reduces the search space and improves the efficiency. The improved method can potentially be applied to (1) high-pressure phases of simple molecules (H2O, NH3, CH4, etc.); (2) pharmaceutical molecules (glycine, aspirin, etc.); (3) complex inorganic crystals containing cluster or molecular units (Mg(BH4)2, Ca(BH4)2, etc.). One application of the constrained evolutionary algorithm is the study of Mg(BH4)2, which is a promising material for hydrogen storage. Our prediction not only reproduces the previous work on Mg(BH4)2 at ambient conditions, but also yields two new tetragonal structures at high pressure, with space groups P4 and I41/acd, which are predicted to be lower in enthalpy, by 15.4 kJ/mol and 21.2 kJ/mol, respectively, than the earlier proposed P42nm phase. We have simulated X-ray diffraction spectra, lattice dynamics, and equations of state of these phases. The density, volume contraction, bulk modulus, and the simulated XRD patterns of the P4 and I41/acd structures are in excellent agreement with the experimental results. Two kinds of oxides (Xe-O and Mg-O) have been studied under megabar pressures. For Xe-O, we predict the existence of thermodynamically stable Xe-O compounds at high pressures (XeO, XeO2 and XeO3 become stable at pressures of 83, 102 and 114 GPa, respectively). For Mg-O, our calculations find that two extraordinary compounds, MgO2 and Mg3O2, become thermodynamically stable at 116 GPa and 500 GPa, respectively. Our calculations indicate large charge transfer in these oxides for both systems, suggesting that a large electronegativity difference and pressure are the key factors favouring their formation. 
We also discuss whether these oxides might exist at Earth and planetary conditions. If the target properties are set as the global fitness functions while structure relaxations are energy/enthalpy minimizations, such a hybrid optimization technique could effectively explore the landscape of properties for the given systems. Here we illustrate this function by the case of searching for superdense carbon allotropes. We find three structures (hP3, tI12, and tP12) that have significantly greater density. Furthermore, we find a collection of other superdense structures based on different ways of packing carbon tetrahedra. Superdense carbon allotropes are predicted to have remarkably high refractive indices and strong dispersion of light. Apart from the evolutionary approach, there also exist other methods for structure prediction. One can also combine the features of different methods. We develop a novel method for crystal structure prediction, based on metadynamics and evolutionary algorithms. This technique can be used to efficiently produce both the ground state and metastable states easily reachable from a reasonable initial structure. We use the cell shape as the collective variable and evolutionary variation operators developed in the context of the USPEX method to equilibrate the system as a function of the collective variables. We illustrate how this approach helps one to find stable and metastable states for Al2SiO5, SiO2, MgSiO3. Apart from predicting crystal structures, the new method can also provide insight into mechanisms of phase transitions. This method is especially powerful in sampling the metastable structures from a given configuration. Experiments on cold compression indicated the existence of a new superhard carbon allotrope. Numerous metastable candidate structures featuring different topologies have been proposed for this allotrope. We use evolutionary metadynamics to systematically search for possible candidates which could be accessible from graphite. 
(Abstract shortened by UMI.)
Beheshti, Iman; Demirel, Hasan; Matsuda, Hiroshi
2017-04-01
We developed a novel computer-aided diagnosis (CAD) system that uses feature-ranking and a genetic algorithm to analyze structural magnetic resonance imaging data; using this system, we can predict conversion of mild cognitive impairment (MCI)-to-Alzheimer's disease (AD) at between one and three years before clinical diagnosis. The CAD system was developed in four stages. First, we used a voxel-based morphometry technique to investigate global and local gray matter (GM) atrophy in an AD group compared with healthy controls (HCs). Regions with significant GM volume reduction were segmented as volumes of interest (VOIs). Second, these VOIs were used to extract voxel values from the respective atrophy regions in AD, HC, stable MCI (sMCI) and progressive MCI (pMCI) patient groups. The voxel values were then extracted into a feature vector. Third, at the feature-selection stage, all features were ranked according to their respective t-test scores and a genetic algorithm designed to find the optimal feature subset. The Fisher criterion was used as part of the objective function in the genetic algorithm. Finally, the classification was carried out using a support vector machine (SVM) with 10-fold cross validation. We evaluated the proposed automatic CAD system by applying it to baseline values from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset (160 AD, 162 HC, 65 sMCI and 71 pMCI subjects). The experimental results indicated that the proposed system is capable of distinguishing between sMCI and pMCI patients, and would be appropriate for practical use in a clinical setting. Copyright © 2017 Elsevier Ltd. All rights reserved.
Setting an Agenda for a Person-Centered Approach to Personality Development.
ERIC Educational Resources Information Center
Robins, Richard W.; Tracy, Jessica L.
2003-01-01
Describes features and benefits of the person-centered approach to studying personality, identifies unanswered questions, and suggests research directions. Benefits noted include focus on intraindividual structure, descriptive efficiency, use of types as moderator variables, predictive validity, and conceptual clarity and intuitive appeal.…
The vanishing cryovolcanoes of Ceres
Sori, Michael M.; Byrne, Shane; Bland, Michael T.; Bramson, Ali; Ermakov, Anton; Hamilton, Christoper; Otto, Katharina; Ruesch, Ottaviano; Russell, Christopher
2017-01-01
Ahuna Mons is a 4 km tall mountain on Ceres interpreted as a geologically young cryovolcanic dome. Other possible cryovolcanic features are more ambiguous, implying that cryovolcanism is only a recent phenomenon or that other cryovolcanic structures have been modified beyond easy identification. We test the hypothesis that Cerean cryovolcanic domes viscously relax, precluding ancient domes from recognition. We use numerical models to predict flow velocities of Ahuna Mons to be 10–500 m/Myr, depending upon assumptions about ice content, rheology, grain size, and thermal parameters. Slower flow rates in this range are sufficiently fast to induce extensive relaxation of cryovolcanic structures over 10^8–10^9 years, but gradual enough for Ahuna Mons to remain identifiable today. Positive topographic features, including a tholus underlying Ahuna Mons, may represent relaxed cryovolcanic structures. A composition for Ahuna Mons of >40% ice explains the observed distribution of cryovolcanic structures because viscous relaxation renders old cryovolcanoes unrecognizable.
DNCON2: improved protein contact prediction using two-level deep convolutional neural networks.
Adhikari, Badri; Hou, Jie; Cheng, Jianlin
2018-05-01
Significant improvements in the prediction of protein residue-residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction. In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks: the first five predict contacts at 6, 7.5, 8, 8.5 and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in the CASP10, 11 and 12 experiments, DNCON2 achieves mean precisions of 35, 50 and 53.4%, respectively, higher than 30.6% by MetaPSICOV on the CASP10 dataset, 34% by MetaPSICOV on the CASP11 dataset and 46.3% by Raptor-X on the CASP12 dataset, when the top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts in training, the two-level approach to prediction, the use of state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length. The web server of DNCON2 is at http://sysbio.rnet.missouri.edu/dncon2/ where training and testing datasets as well as the predictions for the CASP10, 11 and 12 free-modeling datasets can also be downloaded. Its source code is available at https://github.com/multicom-toolbox/DNCON2/. Contact: chengji@missouri.edu. Supplementary data are available at Bioinformatics online.
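The evaluation criterion quoted above (precision of the top L/5 long-range contacts) can be sketched as follows. This is a minimal illustration, not DNCON2's own evaluation code: the function name, the input layout, and the sequence-separation threshold of 24 residues for "long-range" are assumptions (24 is a commonly used convention, but the abstract does not state it).

```python
def top_l5_long_range_precision(scores, true_contacts, seq_len, min_sep=24):
    """Precision of the top L/5 predicted long-range contacts.

    scores:        dict mapping residue-index pairs (i, j), i < j, to a
                   predicted contact probability (hypothetical layout).
    true_contacts: set of (i, j) pairs, i < j, in contact in the native
                   structure.
    seq_len:       protein length L.
    min_sep:       minimum sequence separation treated as long-range
                   (assumed value, not taken from the abstract).
    """
    # Keep only pairs far enough apart in sequence.
    long_range = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    # Rank by predicted probability, highest first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    top_n = max(1, seq_len // 5)
    selected = [pair for pair, _ in long_range[:top_n]]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in true_contacts)
    return hits / len(selected)
```

With L = 10 and a toy separation threshold of 3, the two best-scoring eligible pairs are evaluated, so one true contact among them gives a precision of 0.5.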
Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity
Elias-Kirma, Shani; Nir, Ronit; Segal, Eran
2017-01-01
Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However due to the small number of IRESs known, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in human and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements. PMID:28922394
Predicting structured metadata from unstructured metadata.
Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier
2016-01-01
Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation model (LDA) to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are most accurately predicted using the TF-IDF approach, followed by LDA, with both outperforming the majority vote baseline. While some accuracy is lost by the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction. © The Author(s) 2016. Published by Oxford University Press.
Predicting structured metadata from unstructured metadata
Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier
2016-01-01
Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation model (LDA) to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are most accurately predicted using the TF-IDF approach, followed by LDA, with both outperforming the majority vote baseline. While some accuracy is lost by the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction. Database URL: http://www.yeastgenome.org/ PMID:28637268
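The TF-IDF features mentioned in this abstract can be illustrated with a small, dependency-free sketch. It uses one common textbook weighting (term frequency normalised by document length, multiplied by log(N / document frequency)); the authors' exact weighting, tokenisation and classifier are not given in the abstract, so the function name and every choice below are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Weight = (count / document length) * ln(N / document frequency).
    This is a textbook variant chosen for illustration only.
    """
    # Naive whitespace tokenisation (an assumption; real pipelines differ).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

In this variant a term that occurs in every record (for example "sample" in a set of sample descriptions) receives weight zero, while a term specific to a single record is weighted most heavily, which is the property that makes such features useful as classifier input.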
NASA Astrophysics Data System (ADS)
Ghavami, Raouf; Sadeghi, Faridoon; Rasouli, Zolikha; Djannati, Farhad
2012-12-01
Experimental values for the 13C NMR chemical shifts (ppm, TMS = 0) at 300 K ranging from 96.28 ppm (C4' of indole derivative 17) to 159.93 ppm (C4' of indole derivative 23) relative to deuterated chloroform (CDCl3, 77.0 ppm) or dimethylsulfoxide (DMSO, 39.50 ppm) as internal reference in CDCl3 or DMSO-d6 solutions have been collected from the literature for thirty 2-functionalized 5-(methylsulfonyl)-1-phenyl-1H-indole derivatives containing different substituted groups. Effective quantitative structure-property relationship (QSPR) models were built using a hybrid method combining a genetic algorithm (GA) based on stepwise selection multiple linear regression (SWS-MLR) as feature-selection tools and correlation models between each carbon atom of the indole derivatives and calculated descriptors. Each compound was depicted by molecular structural descriptors that encode constitutional, topological, geometrical, electrostatic, and quantum chemical features. The accuracy of all developed models was confirmed using different types of internal and external procedures and various statistical tests. Furthermore, the domain of applicability for each model, which indicates the area of reliable predictions, was defined.
Impact of experimental design on PET radiomics in predicting somatic mutation status.
Yip, Stephen S F; Parmar, Chintan; Kim, John; Huynh, Elizabeth; Mak, Raymond H; Aerts, Hugo J W L
2017-12-01
PET-based radiomic features have demonstrated great promise in predicting genetic data. However, various experimental parameters can influence the feature extraction pipeline. Here, we investigated how experimental settings affect the performance of radiomic features in predicting somatic mutation status in non-small cell lung cancer (NSCLC) patients. 348 NSCLC patients with somatic mutation testing and diagnostic PET images were included in our analysis. Radiomic feature extractions were analyzed for varying voxel sizes, filters and bin widths. 66 radiomic features were evaluated. The performance of features in predicting mutation status was assessed using the area under the receiver-operating-characteristic curve (AUC). The influence of experimental parameters on feature predictability was quantified as the relative difference between the minimum and maximum AUC (δ). The large majority of features (n=56, 85%) were significantly predictive for EGFR mutation status (AUC≥0.61). 29 radiomic features significantly predicted EGFR mutations and were robust to experimental settings with δ_Overall < 5%. The overall influence (δ_Overall) of the voxel size, filter and bin width for all features ranged from 5% to 15%. For all features, none of the experimental designs could discriminate KRAS+ from KRAS- (AUC≤0.56). The predictability of 29 radiomic features was robust to the choice of experimental settings; however, these settings need to be carefully chosen for all other features. The combined effect of the investigated processing methods could be substantial and must be considered. Optimized settings that will maximize the predictive performance of individual radiomic features should be investigated in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
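The robustness measure δ described above, the relative difference between the minimum and maximum AUC of a feature across experimental settings, can be sketched in a few lines. The abstract does not say which value the difference is normalised by; dividing by the maximum AUC is an assumption made here for illustration, and the function name is hypothetical.

```python
def relative_auc_spread(aucs):
    """Relative difference (in percent) between the largest and smallest AUC
    obtained for one feature across experimental settings.

    Normalising by the maximum AUC is an assumption; the abstract only says
    "relative difference between the minimum and maximum AUC".
    """
    lowest, highest = min(aucs), max(aucs)
    return 100.0 * (highest - lowest) / highest
```

Under this assumed definition, a feature whose AUC varies between 0.60 and 0.64 across voxel sizes, filters and bin widths has δ of about 6.25%, so it would fall outside the δ < 5% robustness criterion quoted in the abstract.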
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. 
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
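To make the quantity being predicted concrete, the sketch below computes an observed RWCO profile from residue coordinates: for each residue, the sequence separations to all residues in spatial contact with it are summed and divided by the chain length. The 10 Å contact cutoff, the exclusion of neighbours closer than three positions in sequence, the normalisation by chain length, and the use of one representative atom per residue are assumptions for illustration; the abstract does not specify them.

```python
import math


def residue_wise_contact_order(coords, cutoff=10.0, min_sep=3):
    """Observed RWCO for each residue of one chain.

    coords:  list of (x, y, z) tuples, one representative atom per residue
             (which atom to use is left to the caller; an assumption).
    cutoff:  contact distance in angstroms (assumed value).
    min_sep: smallest sequence separation counted as a contact (assumed).
    """
    n = len(coords)
    rwco = []
    for i in range(n):
        total = 0
        for j in range(n):
            sep = abs(i - j)
            # Count only sequence-distant residues that are spatially close.
            if sep >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
                total += sep
        rwco.append(total / n)
    return rwco
```

Values computed this way from known structures would serve as the regression targets against which predicted profiles are scored with the correlation coefficient and RMSE reported above.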
On mathematical modelling of flameless combustion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mancini, Marco; Schwoeppe, Patrick; Weber, Roman
2007-07-15
A further analysis of the IFRF semi-industrial-scale experiments on flameless (mild) combustion of natural gas is carried out. The experimental burner features a strong oxidizer jet and two weak natural gas jets. Numerous publications have shown the inability of various RANS-based mathematical models to predict the structure of the weak jet. We have proven that the failure stems from erroneous predictions of the entrainment and is therefore not related to any chemistry submodels, as has been postulated. (author)
Contact and Impact Dynamic Modeling Capabilities of LS-DYNA for Fluid-Structure Interaction Problems
2010-12-02
rigid sphere in a vertical water entry,” Applied Ocean Research, 13(1), pp. 43-48. Monaghan, J.J., 1994. “ Simulating free surface flows with SPH ...The kinematic free surface condition was used to determine the intersection between the free surface and the body in the outer flow domain...and the results were compared with analytical and numerical predictions. The predictive capability of ALE and SPH features of LS-DYNA for simulation
Zhang, Hua; Zhang, Tuo; Gao, Jianzhao; Ruan, Jishou; Shen, Shiyi; Kurgan, Lukasz
2012-01-01
Proteins fold through either a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features which hybridize information concerning amino acid composition and predicted secondary structure and solvent accessibility. FOKIT provides predictions with an average Matthews correlation coefficient (MCC) between 0.58 and 0.91 measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that inclusion of solvent accessibility helps in discrimination of the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure/burial of certain hydrophobic residues may play an important role in the formation of the folding intermediates. Our conclusions are supported by two case studies.
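The Matthews correlation coefficient used to report FOKIT's performance is computed from the four cells of a binary confusion matrix. The sketch below implements the standard formula; treating MS folders as the positive class, and returning 0 when the denominator vanishes, are conventions assumed here rather than details taken from the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect prediction), which makes it more informative than plain accuracy when the TS and MS classes are unevenly represented.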
Protein functional features are reflected in the patterns of mRNA translation speed.
López, Daniel; Pazos, Florencio
2015-07-09
The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These "synonymous mRNAs" may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency also play a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of "silent" single-nucleotide polymorphisms (SNPs). In this work we perform the first large-scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein's important structural and functional features. This supports the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein's functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.
Multiresolution texture models for brain tumor segmentation in MRI.
Iftekharuddin, Khan M; Ahmed, Shaheen; Hossen, Jakir
2011-01-01
In this study we discuss different types of texture features such as Fractal Dimension (FD) and Multifractional Brownian Motion (mBm) for estimating random structures and varying appearance of brain tissues and tumors in magnetic resonance images (MRI). We use different selection techniques including Kullback-Leibler Divergence (KLD) for ranking different texture and intensity features. We then exploit graph cut, self-organizing maps (SOM) and expectation maximization (EM) techniques to fuse selected features for brain tumor segmentation in multimodality T1, T2, and FLAIR MRI. We use different similarity metrics to evaluate quality and robustness of these selected features for tumor segmentation in MRI for real pediatric patients. We also demonstrate a non-patient-specific automated tumor prediction scheme by using improved AdaBoost classification based on these image features.
Simmering, Vanessa R; Wood, Chelsey M
2017-08-01
Working memory is a basic cognitive process that predicts higher-level skills. A central question in theories of working memory development is the generality of the mechanisms proposed to explain improvements in performance. Prior theories have been closely tied to particular tasks and/or age groups, limiting their generalizability. The cognitive dynamics theory of visual working memory development has been proposed to overcome this limitation. From this perspective, developmental improvements arise through the coordination of cognitive processes to meet demands of different behavioral tasks. This notion is described as real-time stability, and can be probed through experiments that assess how changing task demands impact children's performance. The current studies test this account by probing visual working memory for colors and shapes in a change detection task that compares detection of changes to new features versus swaps in color-shape binding. In Experiment 1, 3- to 4-year-old children showed impairments specific to binding swaps, as predicted by decreased real-time stability early in development; 5- to 6-year-old children showed a slight advantage on binding swaps, but 7- to 8-year-old children and adults showed no difference across trial types. Experiment 2 tested the proposed explanation of young children's binding impairment through added perceptual structure, which supported the stability and precision of feature localization in memory, a process key to detecting binding swaps. This additional structure improved young children's binding swap detection, but not new-feature detection or adults' performance. These results provide further evidence for the cognitive dynamics and real-time stability explanation of visual working memory development. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Molecular modeling of the microstructure evolution during carbon fiber processing
NASA Astrophysics Data System (ADS)
Desai, Saaketh; Li, Chunyu; Shen, Tongtong; Strachan, Alejandro
2017-12-01
The rational design of carbon fibers with desired properties requires quantitative relationships between the processing conditions, microstructure, and resulting properties. We developed a molecular model that combines kinetic Monte Carlo and molecular dynamics techniques to predict the microstructure evolution during the processes of carbonization and graphitization of polyacrylonitrile (PAN)-based carbon fibers. The model accurately predicts the cross-sectional microstructure of the fibers with the molecular structure of the stabilized PAN fibers and physics-based chemical reaction rates as the only inputs. The resulting structures exhibit key features observed in electron microscopy studies such as curved graphitic sheets and hairpin structures. In addition, computed X-ray diffraction patterns are in good agreement with experiments. We predict the transverse moduli of the resulting fibers to be between 1 GPa and 5 GPa, in good agreement with experimental results for high-modulus fibers and slightly lower than those of high-strength fibers. The transverse modulus is governed by sliding between graphitic sheets, and the relatively low value for the predicted microstructures can be attributed to their perfect longitudinal texture. Finally, the simulations provide insight into the relationships between chemical kinetics and the final microstructure; we observe that high reaction rates result in porous structures with lower moduli.
Lu, Xiaobing; Yang, Yongzhe; Wu, Fengchun; Gao, Minjian; Xu, Yong; Zhang, Yue; Yao, Yongcheng; Du, Xin; Li, Chengwei; Wu, Lei; Zhong, Xiaomei; Zhou, Yanling; Fan, Ni; Zheng, Yingjun; Xiong, Dongsheng; Peng, Hongjun; Escudero, Javier; Huang, Biao; Li, Xiaobo; Ning, Yuping; Wu, Kai
2016-07-01
Structural abnormalities in schizophrenia (SZ) patients have been well documented with structural magnetic resonance imaging (MRI) data using voxel-based morphometry (VBM) and region of interest (ROI) analyses. However, these analyses can only detect group-wise differences and thus, have a poor predictive value for individuals. In the present study, we applied a machine learning method that combined support vector machine (SVM) with recursive feature elimination (RFE) to discriminate SZ patients from normal controls (NCs) using their structural MRI data. We first employed both VBM and ROI analyses to compare gray matter volume (GMV) and white matter volume (WMV) between 41 SZ patients and 42 age- and sex-matched NCs. The method of SVM combined with RFE was used to discriminate SZ patients from NCs using significant between-group differences in both GMV and WMV as input features. We found that SZ patients showed GM and WM abnormalities in several brain structures primarily involved in the emotion, memory, and visual systems. An SVM with a RFE classifier using the significant structural abnormalities identified by the VBM analysis as input features achieved the best performance (an accuracy of 88.4%, a sensitivity of 91.9%, and a specificity of 84.4%) in the discriminative analyses of SZ patients. These results suggested that distinct neuroanatomical profiles associated with SZ patients might provide a potential biomarker for disease diagnosis, and machine-learning methods can reveal neurobiological mechanisms in psychiatric diseases.
Global Organization of a Positive-strand RNA Virus Genome
Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew
2013-01-01
The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements is that they are local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and functional long-range intra-genomic base pairing interactions. The ∼4.8 kb long RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must utilize a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted different sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure level of viral genome structure could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202
Kulik, Natallia; Slámová, Kristýna; Ettrich, Rüdiger; Křen, Vladimír
2015-01-28
β-N-Acetylhexosaminidase (GH20) from the filamentous fungus Talaromyces flavus, previously identified as a prominent enzyme in the biosynthesis of modified glycosides, still lacks a high-resolution three-dimensional structure. Despite high sequence identity to the previously reported Aspergillus oryzae and Penicillium oxalicum β-N-acetylhexosaminidases, this enzyme tolerates substrate modification significantly better. Understanding of key structural features, prediction of effective mutants and potential substrate characteristics prior to their synthesis are of general interest. Computational methods including homology modeling and molecular dynamics simulations were applied to shed light on the structure-activity relationship in the enzyme. Primary sequence analysis revealed some variable regions that could influence differences in substrate affinity among hexosaminidases. Moreover, docking in combination with subsequent molecular dynamics simulations of C-6 modified glycosides enabled us to identify the structural features required for accommodation and processing of these bulky substrates in the active site of hexosaminidase from T. flavus. To assess the reliability of predictions on the basis of the reported model, all results were compared with available experimental data, which demonstrated the principal correctness of the predictions as well as the model. The main variable regions in β-N-acetylhexosaminidases determining differences in modified substrate affinity are located close to the active site entrance and engage two loops. Differences in primary sequence and the spatial arrangement of these loops and their interplay with active site amino acids, reflected by interaction energies and dynamics, account for the different catalytic activity and substrate specificity of the various fungal and bacterial β-N-acetylhexosaminidases.
Bao, Yu; Hayashida, Morihiro; Akutsu, Tatsuya
2016-11-25
Dicer is necessary for the process of mature microRNA (miRNA) formation because the Dicer enzyme cleaves pre-miRNA correctly to generate miRNA with correct seed regions. Nonetheless, the mechanism underlying the selection of a Dicer cleavage site is still not fully understood. To date, several studies have been conducted to solve this problem, for example, a recent discovery indicates that the loop/bulge structure plays a central role in the selection of Dicer cleavage sites. In accordance with this breakthrough, a support vector machine (SVM)-based method called PHDCleav was developed to predict Dicer cleavage sites which outperforms other methods based on random forest and naive Bayes. PHDCleav, however, tests only whether a position in the shift window belongs to a loop/bulge structure. In this paper, we used the length of loop/bulge structures (in addition to their presence or absence) to develop an improved method, LBSizeCleav, for predicting Dicer cleavage sites. To evaluate our method, we used 810 empirically validated sequences of human pre-miRNAs and performed fivefold cross-validation. In both 5p and 3p arms of pre-miRNAs, LBSizeCleav showed greater prediction accuracy than PHDCleav did. This result suggests that the length of loop/bulge structures is useful for prediction of Dicer cleavage sites. We developed a novel algorithm for feature space mapping based on the length of a loop/bulge for predicting Dicer cleavage sites. The better performance of our method indicates the usefulness of the length of loop/bulge structures for such predictions.
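The difference between a presence/absence encoding and a length-aware encoding of loop/bulge structures can be sketched as follows. This is a minimal illustration, not the LBSizeCleav feature mapping itself: it assumes the secondary structure is supplied in dot-bracket notation, treats every run of unpaired positions as one loop/bulge, and uses hypothetical function names and a toy hairpin rather than a real pre-miRNA.

```python
def unpaired_run_lengths(dot_bracket):
    """For each position, return the length of the unpaired run it lies in
    (0 for paired positions). Assumes dot-bracket notation ('.' = unpaired)."""
    lengths = [0] * len(dot_bracket)
    i = 0
    while i < len(dot_bracket):
        if dot_bracket[i] == ".":
            j = i
            while j < len(dot_bracket) and dot_bracket[j] == ".":
                j += 1
            for k in range(i, j):
                lengths[k] = j - i  # every position in the run gets the run length
            i = j
        else:
            i += 1
    return lengths


def window_features(dot_bracket, center, half_width, use_length=True):
    """Features for a window around a candidate cleavage position.
    Positions outside the sequence are padded with 0 (an assumption)."""
    runs = unpaired_run_lengths(dot_bracket)
    feats = []
    for pos in range(center - half_width, center + half_width + 1):
        value = runs[pos] if 0 <= pos < len(runs) else 0
        feats.append(value if use_length else int(value > 0))
    return feats


structure = "((..(((...)))..))"  # toy hairpin, hypothetical input
print(window_features(structure, center=7, half_width=3, use_length=False))
print(window_features(structure, center=7, half_width=3, use_length=True))
```

With `use_length=False` the window only records whether a position is unpaired; with `use_length=True` the same positions carry the size of the loop/bulge they belong to, which is the kind of extra information the abstract reports as useful.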
Length-independent structural similarities enrich the antibody CDR canonical class model.
Nowak, Jaroslaw; Baker, Terry; Georges, Guy; Kelm, Sebastian; Klostermann, Stefan; Shi, Jiye; Sridharan, Sudharsan; Deane, Charlotte M
2016-01-01
Complementarity-determining regions (CDRs) are antibody loops that make up the antigen binding site. Here, we show that all CDR types have structurally similar loops of different lengths. Based on these findings, we created length-independent canonical classes for the non-H3 CDRs. Our length variable structural clusters show strong sequence patterns suggesting either that they evolved from the same original structure or result from some form of convergence. We find that our length-independent method not only clusters a larger number of CDRs, but also predicts canonical class from sequence better than the standard length-dependent approach. To demonstrate the usefulness of our findings, we predicted cluster membership of CDR-L3 sequences from 3 next-generation sequencing datasets of the antibody repertoire (over 1,000,000 sequences). Using the length-independent clusters, we can structurally classify an additional 135,000 sequences, which represents a ∼20% improvement over the standard approach. This suggests that our length-independent canonical classes might be a highly prevalent feature of antibody space, and could substantially improve our ability to accurately predict the structure of novel CDRs identified by next-generation sequencing.
Four structural risk factors identify most fibril-forming kappa light chains.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stevens, F. J.; Biosciences Division
2000-09-01
Antibody light chains (LCs) comprise the most structurally diverse family of proteins involved in amyloidosis. Many antibody LCs incorporate structural features that impair their stability and solubility, leading to their assembly into fibrils and to their subsequent pathological deposition when produced in excess during multiple myeloma and primary amyloidosis. The particular amino acid variations in antibody LCs that account for fibril formation and amyloidogenesis have not been identified. This study focuses on amyloidogenesis within the κI family of human LCs. Reanalysis of the current database of primary structures of proteins from more than 100 patients who produced κI LCs, 37 of which were amyloidogenic, reveals apparent structural features that may contribute to amyloidosis. These features include loss of conserved residues or the gain of particular residues through mutation at sites involving a repertoire of approximately 20% of the amino acid positions in the light chain variable domain (V_L). Moreover, 80% of all κI amyloidogenic V_Ls are identifiable by the presence of at least one of three single-site substitutions or the acquisition of an N-linked glycosylation site through mutations. These findings suggest that it is feasible to predict fibril propensity by analysis of primary structure.
EFFECTS OF CYTOSOLIC CONVERSION OF ESTRONE TO ESTRADIOL ON RAINBOW TROUT ER BINDING AFFINITY
Relative binding affinity (RBA) for estrone (E1) to the rainbow trout (Oncorhynchus mykiss) estrogen receptor (rtER) was measured as part of a larger effort to determine chemical structural features predictive of chemical estrogenicity in fish. Estrone RBA was found to vary consi...
A Rational Analysis of Rule-Based Concept Learning
ERIC Educational Resources Information Center
Goodman, Noah D.; Tenenbaum, Joshua B.; Feldman, Jacob; Griffiths, Thomas L.
2008-01-01
This article proposes a new model of human concept learning that provides a rational analysis of learning feature-based concepts. This model is built upon Bayesian inference for a grammatically structured hypothesis space--a concept language of logical rules. This article compares the model predictions to human generalization judgments in several…
Ning, Kaida; Chen, Bo; Sun, Fengzhu; Hobel, Zachary; Zhao, Lu; Matloff, Will; Toga, Arthur W
2018-08-01
A long-standing question is how to best use brain morphometric and genetic data to distinguish Alzheimer's disease (AD) patients from cognitively normal (CN) subjects and to predict those who will progress from mild cognitive impairment (MCI) to AD. Here, we use a neural network (NN) framework on both magnetic resonance imaging-derived quantitative structural brain measures and genetic data to address this question. We tested the effectiveness of NN models in classifying and predicting AD. We further performed a novel analysis of the NN model to gain insight into the most predictive imaging and genetics features and to identify possible interactions between features that affect AD risk. Data were obtained from the AD Neuroimaging Initiative cohort and included baseline structural MRI data and single nucleotide polymorphism (SNP) data for 138 AD patients, 225 CN subjects, and 358 MCI patients. We found that NN models with both brain and SNP features as predictors perform significantly better than models with either alone in classifying AD and CN subjects, with an area under the receiver operating characteristic curve (AUC) of 0.992, and in predicting the progression from MCI to AD (AUC=0.835). The most important predictors in the NN model were the left middle temporal gyrus volume, the left hippocampus volume, the right entorhinal cortex volume, and the APOE (a gene that encodes apolipoprotein E) ɛ4 risk allele. Furthermore, we identified interactions between the right parahippocampal gyrus and the right lateral occipital gyrus, the right banks of the superior temporal sulcus and the left posterior cingulate, and SNP rs10838725 and the left lateral occipital gyrus. Our work shows the ability of NN models to not only classify and predict AD occurrence but also to identify important AD risk factors and interactions among them. Copyright © 2018 Elsevier Inc. All rights reserved.
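The AUC values quoted above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition; the scores are invented for illustration and the function name is hypothetical, so it says nothing about the authors' actual evaluation code.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier outputs, not data from the study.
ad_scores = [0.9, 0.8, 0.4]
cn_scores = [0.7, 0.3, 0.2, 0.4]
print(auc(ad_scores, cn_scores))
```

The quadratic pair loop is chosen for readability; for large cohorts a rank-based computation gives the same value more efficiently.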
Brain properties predict proximity to symptom onset in sporadic Alzheimer's disease.
Vogel, Jacob W; Vachon-Presseau, Etienne; Pichet Binette, Alexa; Tam, Angela; Orban, Pierre; La Joie, Renaud; Savard, Mélissa; Picard, Cynthia; Poirier, Judes; Bellec, Pierre; Breitner, John C S; Villeneuve, Sylvia
2018-06-01
See Tijms and Visser (doi:10.1093/brain/awy113) for a scientific commentary on this article. Alzheimer's disease is preceded by a lengthy 'preclinical' stage spanning many years, during which subtle brain changes occur in the absence of overt cognitive symptoms. Predicting when the onset of disease symptoms will occur is an unsolved challenge in individuals with sporadic Alzheimer's disease. In individuals with autosomal dominant genetic Alzheimer's disease, the age of symptom onset is similar across generations, allowing the prediction of individual onset times with some accuracy. We extend this concept to persons with a parental history of sporadic Alzheimer's disease to test whether an individual's symptom onset age can be informed by the onset age of their affected parent, and whether this estimated onset age can be predicted using only MRI. Structural and functional MRIs were acquired from 255 ageing cognitively healthy subjects with a parental history of sporadic Alzheimer's disease from the PREVENT-AD cohort. Years to estimated symptom onset was calculated as participant age minus age of parental symptom onset. Grey matter volume was extracted from T1-weighted images and whole-brain resting state functional connectivity was evaluated using degree count. Both modalities were summarized using a 444-region cortical-subcortical atlas. The entire sample was divided into training (n = 138) and testing (n = 68) sets. Within the training set, individuals closer to or beyond their parent's symptom onset demonstrated reduced grey matter volume and altered functional connectivity, specifically in regions known to be vulnerable in Alzheimer's disease. Machine learning was used to identify a weighted set of imaging features trained to predict years to estimated symptom onset. This feature set alone significantly predicted years to estimated symptom onset in the unseen testing data.
This model, using only neuroimaging features, significantly outperformed a similar model instead trained with cognitive, genetic, imaging and demographic features used in a traditional clinical setting. We next tested if these brain properties could be generalized to predict time to clinical progression in a subgroup of 26 individuals from the Alzheimer's Disease Neuroimaging Initiative, who eventually converted either to mild cognitive impairment or to Alzheimer's dementia. The feature set trained on years to estimated symptom onset in the PREVENT-AD predicted variance in time to clinical conversion in this separate longitudinal dataset. Adjusting for participant age did not impact any of the results. These findings demonstrate that years to estimated symptom onset or similar measures can be predicted from brain features and may help estimate presymptomatic disease progression in at-risk individuals.
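The target variable defined in the abstract, years to estimated symptom onset, is a simple difference. A minimal sketch of that definition follows; the function name and the example ages are hypothetical and are not taken from the PREVENT-AD data.

```python
def years_to_estimated_onset(participant_age, parental_onset_age):
    """Participant age minus age of parental symptom onset.
    Negative values mean the participant is still younger than the
    age at which the parent developed symptoms."""
    return participant_age - parental_onset_age


# Hypothetical participants: (own age, parent's age at symptom onset).
participants = [(63, 71), (70, 70), (68, 65)]
for age, parent_onset in participants:
    print(age, parent_onset, years_to_estimated_onset(age, parent_onset))
```

Under this sign convention, values approaching or exceeding zero correspond to the individuals described as "closer to or beyond their parent's symptom onset".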
Topology of membrane proteins-predictions, limitations and variations.
Tsirigos, Konstantinos D; Govindarajan, Sudha; Bassot, Claudio; Västermark, Åke; Lamb, John; Shu, Nanjiang; Elofsson, Arne
2017-10-26
Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of the cells. Membrane proteins are built up by transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many predictive algorithms for the prediction of the topology of alpha-helical and beta-barrel transmembrane proteins exist. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these non-standard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.
IRESPred: Web Server for Prediction of Cellular and Viral Internal Ribosome Entry Site (IRES)
Kolekar, Pandurang; Pataskar, Abhijeet; Kulkarni-Kale, Urmila; Pal, Jayanta; Kulkarni, Abhijeet
2016-01-01
Cellular mRNAs are predominantly translated in a cap-dependent manner. However, some viral and a subset of cellular mRNAs initiate their translation in a cap-independent manner. This requires the presence of a structured RNA element, known as an internal ribosome entry site (IRES), in their 5′ untranslated regions (UTRs). Experimental demonstration of an IRES in a UTR remains a challenging task. Computational prediction of IRES based merely on sequence and structure conservation is also difficult, particularly for cellular IRES. A web server, IRESPred, has been developed for prediction of both viral and cellular IRES using a Support Vector Machine (SVM). The predictive model was built using 35 features that are based on sequence and structural properties of UTRs and the probabilities of interactions between UTR and small subunit ribosomal proteins (SSRPs). The model was found to have 75.51% accuracy, 75.75% sensitivity, 75.25% specificity, 75.75% precision and a Matthews Correlation Coefficient (MCC) of 0.51 in blind testing. IRESPred was found to perform better than the only available viral IRES prediction server, VIPS. The IRESPred server is freely available at http://bioinfo.net.in/IRESPred/. PMID:27264539
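The five performance figures reported above all derive from a binary confusion matrix. The sketch below shows the standard definitions; the counts are invented round numbers chosen only to land near the reported values, since the abstract does not give the underlying confusion matrix, and the function name is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


# Hypothetical counts, not the IRESPred blind-test data.
print(binary_metrics(tp=75, fp=25, tn=75, fn=25))
```

With balanced classes and roughly 75% of each class classified correctly, the MCC comes out near 0.5, which is consistent in magnitude with the 0.51 quoted for blind testing.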
Mining protein database using machine learning techniques.
Camargo, Renata da Silva; Niranjan, Mahesan
2008-08-25
With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help in achieving a reduction in the space over which experiment designers need to search in order to improve our understanding of the biochemical properties. Previously it has been suggested that an integration of features computable by comparing a pair of proteins can be achieved by an artificial neural network, hence predicting the degree to which they may be evolutionarily related and homologous.
We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features for the problem of separating remote homologous from analogous pairs, and we note that a significant performance gain was obtained by the inclusion of sequence and structure information. We find that a linear classifier was enough to discriminate a protein pair at the family level. However, at the superfamily level, detecting remote homologous pairs was a relatively harder problem, and we find that nonlinear classifiers achieve significantly higher accuracies.
In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross validation and feature selection based studies quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
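An exhaustive search over feature combinations is tractable here because seven features yield only 2^7 - 1 = 127 non-empty subsets. The sketch below enumerates them and picks the best one under a scoring function; the feature names and the toy score are placeholders, whereas in the study the score would come from cross-validated classifier performance.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of the feature list."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset


def best_subset(features, score):
    """Return the subset with the highest score (exhaustive search)."""
    return max(all_feature_subsets(features), key=score)


features = [f"f{i}" for i in range(1, 8)]  # seven placeholder feature names
subsets = list(all_feature_subsets(features))
print(len(subsets))  # 2**7 - 1 non-empty subsets


def toy_score(subset):
    """Hypothetical score: reward two 'useful' features, penalise subset size."""
    useful = {"f2", "f5"}
    return len(useful & set(subset)) - 0.1 * len(subset)


print(best_subset(features, toy_score))
```

The subset count doubles with every added feature, so this brute-force approach is only reasonable for small feature sets like the one described.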
Application of Functional Use Predictions to Aid in Structure ...
Humans are potentially exposed to thousands of anthropogenic chemicals in commerce. Recent work has shown that the bulk of this exposure may occur in near-field indoor environments (e.g., home, school, work, etc.). Advances in suspect screening analyses (SSA) now allow an improved understanding of the chemicals present in these environments. However, due to the nature of suspect screening techniques, investigators are often left with chemical formula predictions, with the possibility of many chemical structures matching to each formula. Here, newly developed quantitative structure-use relationship (QSUR) models are used to identify potential exposure sources for candidate structures. Previously, a suspect screening workflow was introduced and applied to house dust samples collected from the U.S. Department of Housing and Urban Development’s American Healthy Homes Survey (AHHS) [Rager, et al., Env. Int. 88 (2016)]. This workflow utilized the US EPA’s Distributed Structure-Searchable Toxicity (DSSTox) Database to link identified molecular features to molecular formulas, and ultimately chemical structures. Multiple QSUR models were applied to support the evaluation of candidate structures. These QSURs predict the likelihood of a chemical having a functional use commonly associated with consumer products having near-field use. For 3,228 structures identified as possible chemicals in AHHS house dust samples, we were able to obtain the required descriptors to appl
Yang, Jie; Yin, Yingying; Zhang, Zuping; Long, Jun; Dong, Jian; Zhang, Yuqun; Xu, Zhi; Li, Lei; Liu, Jie; Yuan, Yonggui
2018-02-05
Major depressive disorder (MDD) is characterized by dysregulation of distributed structural and functional networks. It is now recognized that structural and functional networks are related at multiple temporal scales. The recent emergence of multimodal fusion methods has made it possible to comprehensively and systematically investigate brain networks and thereby provide essential information for influencing disease diagnosis and prognosis. However, such investigations are hampered by the inconsistent dimensionality features between structural and functional networks. Thus, a semi-multimodal fusion hierarchical feature reduction framework is proposed. Feature reduction is a vital procedure in classification that can be used to eliminate irrelevant and redundant information and thereby improve the accuracy of disease diagnosis. Our proposed framework primarily consists of two steps. The first step considers the connection distances in both structural and functional networks between MDD and healthy control (HC) groups. By adding a constraint based on sparsity regularization, the second step fully utilizes the inter-relationship between the two modalities. However, in contrast to conventional multi-modality multi-task methods, the structural networks were considered to play only a subsidiary role in feature reduction and were not included in the following classification. The proposed method achieved a classification accuracy, specificity, sensitivity, and area under the curve of 84.91%, 88.6%, 81.29%, and 0.91, respectively. Moreover, the frontal-limbic system contributed the most to disease diagnosis. Importantly, by taking full advantage of the complementary information from multimodal neuroimaging data, the selected consensus connections may be highly reliable biomarkers of MDD. Copyright © 2017 Elsevier B.V. All rights reserved.
Jones, David T; Kandathil, Shaun M
2018-04-26
In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov.
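The pairwise covariance data mentioned above can be illustrated with the textbook definition cov(a, b) = f_ij(a, b) - f_i(a) f_j(b), computed for two alignment columns i and j. The sketch below is a simplified, unweighted version restricted to the residues actually observed in those columns; it is not the DeepCov code, and the function name and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import product


def pair_covariance(alignment, i, j):
    """Covariance of residue pairs between columns i and j:
    cov[(a, b)] = f_ij(a, b) - f_i(a) * f_j(b),
    using plain (unweighted) frequencies over the aligned sequences."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    cov = {}
    for a, b in product(col_i, col_j):
        cov[(a, b)] = pairs[(a, b)] / n - (col_i[a] / n) * (col_j[b] / n)
    return cov


# Toy alignment: columns 0 and 1 co-vary, column 2 is independent of column 0.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]
print(pair_covariance(toy_alignment, 0, 1))
print(pair_covariance(toy_alignment, 0, 2))
```

Columns that change together give covariances far from zero, while independent columns give values near zero; a network operating on such values for every column pair sees only local alignment statistics rather than a globally fitted model.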
Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. 
The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969
Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely
2013-02-05
The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.
Palacios, P; Aguilera, I; Sánchez, K; Conesa, J C; Wahnón, P
2008-07-25
Results of density-functional calculations for indium thiospinel semiconductors substituted at octahedral sites with isolated transition metals (M=Ti,V) show an isolated partially filled narrow band containing three t2g-type states per M atom inside the usual semiconductor band gap. Thanks to this electronic structure feature, these materials will allow the absorption of photons with energy below the band gap, in addition to the normal light absorption of a semiconductor. To our knowledge, we demonstrate for the first time the formation of an isolated intermediate electronic band structure through M substitution at octahedral sites in a semiconductor, leading to an enhancement of the absorption coefficient in both infrared and visible ranges of the solar spectrum. This electronic structure feature could be applied for developing a new third-generation photovoltaic cell.
Trezza, Alfonso; Bernini, Andrea; Langella, Andrea; Ascher, David B; Pires, Douglas E V; Sodi, Andrea; Passerini, Ilaria; Pelo, Elisabetta; Rizzo, Stanislao; Niccolai, Neri; Spiga, Ottavia
2017-10-01
The aim of this article is to report the investigation of the structural features of ABCA4, a protein associated with a genetic retinal disease. A new database collecting knowledge of ABCA4 structure may facilitate predictions about the possible functional consequences of gene mutations observed in clinical practice. In order to correlate structural and functional effects of the observed mutations, the structure of mouse P-glycoprotein was used as a template for homology modeling. The obtained structural information and genetic data are the basis of our relational database (ABCA4Database). Sequence variability among all ABCA4-deposited entries was calculated and reported as Shannon entropy score at the residue level. The three-dimensional model of ABCA4 structure was used to locate the spatial distribution of the observed variable regions. Our predictions from structural in silico tools were able to accurately link the functional effects of mutations to phenotype. The development of the ABCA4Database gathers all the available genetic and structural information, yielding a global view of the molecular basis of some retinal diseases. ABCA4 modeled structure provides a molecular basis on which to analyze protein sequence mutations related to genetic retinal disease in order to predict the risk of retinal disease across all possible ABCA4 mutations. Additionally, our ABCA4 predicted structure is a good starting point for the creation of a new data analysis model, appropriate for precision medicine, in order to develop a deeper knowledge network of the disease and to improve the management of patients.
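The abstract above reports sequence variability as a Shannon entropy score at the residue level. A minimal sketch of that calculation on aligned sequences might look as follows; the toy alignment is hypothetical (not ABCA4 data), and treating every character, gaps included, as an ordinary symbol is an assumption of this sketch rather than something the abstract specifies.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def per_residue_entropy(sequences):
    """Entropy of each position across equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment for illustration only.
aligned = ["MKVA", "MKIA", "MRLA", "MKLG"]
scores = per_residue_entropy(aligned)
for position, score in enumerate(scores, start=1):
    print(position, round(score, 3))
```

A fully conserved position scores 0 bits, and the score grows as more distinct residues appear with more even frequencies.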
Löhner, Alexander; Cogdell, Richard
2018-01-01
As the electronic energies of the chromophores in a pigment–protein complex are imposed by the geometrical structure of the protein, this allows the spectral information obtained to be compared with predictions derived from structural models. Thereby, the single-molecule approach is particularly suited for the elucidation of specific, distinctive spectral features that are key for a particular model structure, and that would not be observable in ensemble-averaged spectra due to the heterogeneity of the biological objects. In this concise review, we illustrate with the example of the light-harvesting complexes from photosynthetic purple bacteria how results from low-temperature single-molecule spectroscopy can be used to discriminate between different structural models. Thereby the low-temperature approach provides two advantages: (i) owing to the negligible photobleaching, very long observation times become possible, and more importantly, (ii) at cryogenic temperatures, vibrational degrees of freedom are frozen out, leading to sharper spectral features and in turn to better resolved spectra. PMID:29321265
Dos Santos Vasconcelos, Crhisllane Rafaele; de Lima Campos, Túlio; Rezende, Antonio Mauro
2018-03-06
Systematic analysis of a parasite interactome is a key approach to understand different biological processes. It makes it possible to elucidate disease mechanisms, to predict protein functions and to select promising targets for drug development. Currently, several approaches for protein interaction prediction for non-model species incorporate only small fractions of the entire proteomes and their interactions. Based on this perspective, this study presents an integration of computational methodologies, protein network predictions and comparative analysis of the protozoan species Leishmania braziliensis and Leishmania infantum. These parasites cause Leishmaniasis, a neglected disease distributed worldwide, with limited treatment options using currently available drugs. The predicted interactions were obtained from a meta-approach, applying rigid body docking tests and template-based docking on protein structures predicted by different comparative modeling techniques. In addition, we trained a machine-learning algorithm (Gradient Boosting) using docking information performed on a curated set of positive and negative protein interaction data. Our final model obtained an AUC = 0.88, with recall = 0.69, specificity = 0.88 and precision = 0.83. Using this approach, it was possible to confidently predict 681 protein structures and 6198 protein interactions for L. braziliensis, and 708 protein structures and 7391 protein interactions for L. infantum. The predicted networks were integrated with protein interaction data already available, analyzed using several topological features and used to classify proteins as essential for network stability. The present study demonstrated the importance of integrating different methodologies of interaction prediction to increase the coverage of the protein interaction of the studied protocols; in addition, it made available protein structures and interactions not previously reported.
Garcia Lopez, Sebastian; Kim, Philip M.
2014-01-01
Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability, and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing the protein instability, and help us better understand the molecular causes of diseases. PMID:25243403
The sequential structure of brain activation predicts skill.
Anderson, John R; Bothell, Daniel; Fincham, Jon M; Moon, Jungaa
2016-01-29
In an fMRI study, participants were trained to play a complex video game. They were scanned early and then again after substantial practice. While better players showed greater activation in one region (right dorsal striatum) their relative skill was better diagnosed by considering the sequential structure of whole brain activation. Using a cognitive model that played this game, we extracted a characterization of the mental states that are involved in playing a game and the statistical structure of the transitions among these states. There was a strong correspondence between this measure of sequential structure and the skill of different players. Using multi-voxel pattern analysis, it was possible to recognize, with relatively high accuracy, the cognitive states participants were in during particular scans. We used the sequential structure of these activation-recognized states to predict the skill of individual players. These findings indicate that important features about information-processing strategies can be identified from a model-based analysis of the sequential structure of brain activation. Copyright © 2015 Elsevier Ltd. All rights reserved.
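The abstract above refers to the statistical structure of the transitions among cognitive states. One simple way to summarize such structure, offered here only as an illustrative assumption since the abstract does not say how the authors estimated it, is a first-order transition-probability table built from a sequence of recognized states. The state labels below are hypothetical.

```python
from collections import defaultdict


def transition_probabilities(states):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs


# Hypothetical sequence of recognized states, one label per scan.
sequence = ["plan", "act", "act", "plan", "act", "check", "plan"]
probs = transition_probabilities(sequence)
for state, row in probs.items():
    print(state, row)
```

Tables of this kind from different players could then be compared with a reference table, for example one derived from a cognitive model, to obtain a per-player measure of sequential structure.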
Elemental Water Impact Test: Phase 3 Plunge Depth of a 36-Inch Aluminum Tank Head
NASA Technical Reports Server (NTRS)
Vassilakos, Gregory J.
2014-01-01
Spacecraft are being designed based on LS-DYNA water landing simulations. The Elemental Water Impact Test (EWIT) series was undertaken to assess the accuracy of LS-DYNA water impact simulations. Phase 3 featured a composite tank head that was tested at a range of heights to verify the ability to predict structural failure of composites. To support planning for Phase 3, a test series was conducted with an aluminum tank head dropped from heights of 2, 6, 10, and 12 feet to verify that the test article would not impact the bottom of the test pool. This report focuses on the comparisons of the measured plunge depths to LS-DYNA predictions. The results for the tank head model demonstrated the following. 1. LS-DYNA provides accurate predictions for peak accelerations. 2. LS-DYNA consistently under-predicts plunge depth. An allowance of at least 20% should be added to the LS-DYNA predictions. 3. The LS-DYNA predictions for plunge depth are relatively insensitive to the fluid-structure coupling stiffness.
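The second finding above recommends adding an allowance of at least 20% to LS-DYNA plunge-depth predictions. The arithmetic can be sketched as below; the function names and the depth values are hypothetical, and only the 20% figure comes from the report.

```python
def plunge_depth_with_allowance(predicted_depth, allowance=0.20):
    """Scale a predicted plunge depth upward by a fractional allowance."""
    if predicted_depth < 0 or allowance < 0:
        raise ValueError("depth and allowance must be non-negative")
    return predicted_depth * (1.0 + allowance)


def clears_pool_bottom(predicted_depth, pool_depth, allowance=0.20):
    """True if the padded plunge depth stays above the pool bottom."""
    return plunge_depth_with_allowance(predicted_depth, allowance) < pool_depth


# Hypothetical values in feet, for illustration only.
padded = plunge_depth_with_allowance(5.0)
print(round(padded, 2), clears_pool_bottom(5.0, pool_depth=8.0))
```

Because the report says "at least 20%", the allowance is kept as a parameter so that a more conservative margin can be substituted.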
A Time-dependant atmospheric model of HD209458b
NASA Astrophysics Data System (ADS)
Iro, N.; Bézard, B.; Guillot, T.
2004-11-01
Charbonneau et al. (2002) conducted HST spectroscopic observations of HD209458 centered on the sodium doublet at 589.3 nm. An absorption feature was found, interpreted as an absorption from the sodium in the planet's atmosphere. However, this feature is weaker than predicted by static radiative equilibrium atmospheric models of HD209458b. We present a time-dependent radiative model of the atmosphere of HD209458b and investigate its thermal structure and chemical composition. Time-dependent temperature profiles are calculated, assuming a constant-with-height zonal wind, modelled as a solid body rotation. We predict day-night variations of the effective temperature of ~600 K, for an equatorial rotation rate of 1 km s^-1, in good agreement with the predictions by Showman & Guillot (2002). At high altitudes (mbar pressures or less), the night temperatures are low enough to allow sodium to condense into Na2S. Synthetic transit spectra of the visible Na doublet show a much weaker sodium absorption on the morning limb than on the evening limb. The calculated dimming of the sodium feature during a planetary transit agrees with the value reported by Charbonneau et al. (2002).
Nonlinear random response prediction using MSC/NASTRAN
NASA Technical Reports Server (NTRS)
Robinson, J. H.; Chiang, C. K.; Rizzi, S. A.
1993-01-01
An equivalent linearization technique was incorporated into MSC/NASTRAN to predict the nonlinear random response of structures by means of Direct Matrix Abstract Programming (DMAP) modifications and inclusion of the nonlinear differential stiffness module inside the iteration loop. An iterative process was used to determine the rms displacements. Numerical results obtained for validation on simple plates and beams are in good agreement with existing solutions in both the linear and linearized regions. The versatility of the implementation will enable the analyst to determine the nonlinear random responses for complex structures under combined loads. The thermo-acoustic response of a hexagonal thermal protection system panel is used to highlight some of the features of the program.
Navigating at Will on the Water Phase Diagram
NASA Astrophysics Data System (ADS)
Pipolo, S.; Salanne, M.; Ferlat, G.; Klotz, S.; Saitta, A. M.; Pietrucci, F.
2017-12-01
Despite the simplicity of its molecular unit, water is a challenging system because of its uniquely rich polymorphism and predicted but yet unconfirmed features. Introducing a novel space of generalized coordinates that capture changes in the topology of the interatomic network, we are able to systematically track transitions among liquid, amorphous, and crystalline forms throughout the whole phase diagram of water, including the nucleation of crystals above and below the melting point. Our approach, based on molecular dynamics and enhanced sampling or free energy calculation techniques, is not specific to water and could be applied to very different structural phase transitions, paving the way towards the prediction of kinetic routes connecting polymorphic structures in a range of materials.
A general prediction model for the detection of ADHD and Autism using structural and functional MRI.
Sen, Bhaskar; Borle, Neil C; Greiner, Russell; Brown, Matthew R G
2018-01-01
This work presents a novel method for learning a model that can diagnose Attention Deficit Hyperactivity Disorder (ADHD), as well as Autism, using structural texture and functional connectivity features obtained from 3-dimensional structural magnetic resonance imaging (MRI) and 4-dimensional resting-state functional magnetic resonance imaging (fMRI) scans of subjects. We explore a series of three learners: (1) The LeFMS learner first extracts features from the structural MRI images using the texture-based filters produced by a sparse autoencoder. These filters are then convolved with the original MRI image using an unsupervised convolutional network. The resulting features are used as input to a linear support vector machine (SVM) classifier. (2) The LeFMF learner produces a diagnostic model by first computing spatial non-stationary independent components of the fMRI scans, which it uses to decompose each subject's fMRI scan into the time courses of these common spatial components. These features can then be used with a learner by themselves or in combination with other features to produce the model. Regardless of which approach is used, the final set of features is input to a linear support vector machine (SVM) classifier. (3) Finally, the overall LeFMSF learner uses the combined features obtained from the two feature extraction processes in (1) and (2) above as input to an SVM classifier, achieving an accuracy of 0.673 on the ADHD-200 holdout data and 0.643 on the ABIDE holdout data. Both of these results, obtained with the same LeFMSF framework, are the best known hold-out accuracies on these datasets when using only imaging data, exceeding previously published results by 0.012 for ADHD and 0.042 for Autism. Our results show that combining multi-modal features can yield good classification accuracy for diagnosis of ADHD and Autism, which is an important step towards computer-aided diagnosis of these psychiatric diseases and perhaps others as well.
Structure and non-structure of centrosomal proteins.
Dos Santos, Helena G; Abia, David; Janowski, Robert; Mortuza, Gulnahar; Bertero, Michela G; Boutin, Maïlys; Guarín, Nayibe; Méndez-Giraldez, Raúl; Nuñez, Alfonso; Pedrero, Juan G; Redondo, Pilar; Sanz, María; Speroni, Silvia; Teichert, Florian; Bruix, Marta; Carazo, José M; Gonzalez, Cayetano; Reina, José; Valpuesta, José M; Vernos, Isabelle; Zabala, Juan C; Montoya, Guillermo; Coll, Miquel; Bastolla, Ugo; Serrano, Luis
2013-01-01
Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human Centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain on average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are dually predicted to be disordered and coiled-coil at the same time: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% predicted helical residues against 17% predicted as strands), and even more in the whole centrosomal proteome (52% against 7%), while for control human proteins 34.5% of the residues are predicted as helical and 12.8% are predicted as strands. This difference is mainly due to residues predicted as disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we have predicted to be globular. These full-length proteins are often insoluble: only 39 out of 120 expressed proteins (32%) and 19 out of 72 domains (26%) were soluble. We built or retrieved structural models for 277 out of 361 human proteins whose centrosomal localization has been experimentally verified. We could not find any suitable structural template with more than 20% sequence identity for 84 centrosomal proteins (23%), for which around 74% of the residues are predicted to be disordered or coiled-coils. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.
Kinact: a computational approach for predicting activating missense mutations in protein kinases.
Rodrigues, Carlos H M; Ascher, David B; Pires, Douglas E V
2018-05-21
Protein phosphorylation is tightly regulated due to its vital role in many cellular processes. While gain of function mutations leading to constitutive activation of protein kinases are known to be driver events of many cancers, the identification of these mutations has proven challenging. Here we present Kinact, a novel machine learning approach for predicting kinase activating missense mutations using information from sequence and structure. By adapting our graph-based signatures, Kinact represents both structural and sequence information, which are used as evidence to train predictive models. We show the combination of structural and sequence features significantly improved the overall accuracy compared to considering either primary or tertiary structure alone, highlighting their complementarity. Kinact achieved a precision of 87% and 94% and Area Under ROC Curve of 0.89 and 0.92 on 10-fold cross-validation, and on blind tests, respectively, outperforming well established tools (P < 0.01). We further show that Kinact performs equally well on homology models built using templates with sequence identity as low as 33%. Kinact is freely available as a user-friendly web server at http://biosig.unimelb.edu.au/kinact/.
Driven to distraction: A lack of change gives rise to mind wandering.
Faber, Myrthe; Radvansky, Gabriel A; D'Mello, Sidney K
2018-04-01
How does the dynamic structure of the external world direct attention? We examined the relationship between event structure and attention to test the hypothesis that narrative shifts (both theoretical and perceived) negatively predict attentional lapses. Self-caught instances of mind wandering were collected while 108 participants watched a 32.5 min film called The Red Balloon. We used theoretical codings of situational change and human perceptions of event boundaries to predict mind wandering in 5-s intervals. Our findings suggest a temporal alignment between the structural dynamics of the film and mind wandering reports. Specifically, the number of situational changes and likelihood of perceiving event boundaries in the prior 0-15 s interval negatively predicted mind wandering net of low-level audiovisual features. Thus, mind wandering is less likely to occur when there is more event change, suggesting that narrative shifts keep attention from drifting inwards. Copyright © 2018 Elsevier B.V. All rights reserved.
High-pressure phase of brucite stable at Earth's mantle transition zone and lower mantle conditions.
Hermann, Andreas; Mookherjee, Mainak
2016-12-06
We investigate the high-pressure phase diagram of the hydrous mineral brucite, Mg(OH)2, using structure search algorithms and ab initio simulations. We predict a high-pressure phase stable at pressure and temperature conditions found in cold subducting slabs in Earth's mantle transition zone and lower mantle. This prediction implies that brucite can play a much more important role in water transport and storage in Earth's interior than hitherto thought. The predicted high-pressure phase, stable in calculations between 20 and 35 GPa and up to 800 K, features MgO6 octahedral units arranged in the anatase-TiO2 structure. Our findings suggest that brucite will transform from a layered to a compact 3D network structure before eventual decomposition into periclase and ice. We show that the high-pressure phase has unique spectroscopic fingerprints that should allow for straightforward detection in experiments. The phase also has distinct elastic properties that might make its direct detection in the deep Earth possible with geophysical methods.
NASA Astrophysics Data System (ADS)
Palu, J. M.; Burberry, C. M.
2014-12-01
The reactivation potential of pre-existing basement structures affects the geometry of subsequent deformation structures. A conceptual model depicting the results of these interactions can be applied to multiple fold-thrust systems and lead to valuable deformation predictions. These predictions include the potential for hydrocarbon traps or seismic risk in an actively deforming area. The Sawtooth Range, Montana, has been used as a study area. A model for the development of structures close to the Augusta Syncline in the Sawtooth Range is being developed using: 1) an ArcGIS map of the basement structures of the belt based on analysis of geophysical data indicating gravity anomalies and aeromagnetic lineations, seismic data indicating deformation structures, and well logs for establishing lithologies, previously collected by others and 2) an ArcGIS map of the surface deformation structures of the belt based on interpretation of remote sensing images and verification through the collection of surface field data indicating stress directions and age relationships, resulting in a conceptual model based on the understanding of the interaction of the two previous maps including statistical correlations of data and development of balanced cross-sections using Midland Valley's 2D/3D Move software. An analysis of the model will then indicate viable deformation paths where prominent basement structures influenced subsequently developed deformation structures and reactivated faults. Preliminary results indicate that the change in orientation of thrust faults observed in the Sawtooth Range, from a NNW-SSE orientation near the Gibson Reservoir to a WNW-ESE trend near Haystack Butte correlates with pre-existing deformation structures lying within the Great Falls Tectonic Zone. The Scapegoat-Bannatyne trend appears to be responsible for this orientation change and rather than being a single feature, may be composed of up to 4 NE-SW oriented basement strike-slip faults. 
This indicates that the pre-existing basement features have a profound effect on the geometry of the later deformation. This conceptual model can also be applied to other deformed belts to provide a prediction for the potential hydrocarbon trap locations of the belt as well as their seismic risk.
Auditory sensitivity of seals and sea lions in complex listening scenarios.
Cunningham, Kane A; Southall, Brandon L; Reichmuth, Colleen
2014-12-01
Standard audiometric data, such as audiograms and critical ratios, are often used to inform marine mammal noise-exposure criteria. However, these measurements are obtained using simple, artificial stimuli (i.e., pure tones and flat-spectrum noise), while natural sounds typically have more complex structure. In this study, detection thresholds for complex signals were measured in (I) quiet and (II) masked conditions for one California sea lion (Zalophus californianus) and one harbor seal (Phoca vitulina). In Experiment I, detection thresholds in quiet conditions were obtained for complex signals designed to isolate three common features of natural sounds: frequency modulation, amplitude modulation, and harmonic structure. In Experiment II, detection thresholds were obtained for the same complex signals embedded in two types of masking noise: synthetic flat-spectrum noise and recorded shipping noise. To evaluate how accurately standard hearing data predict detection of complex sounds, the results of Experiments I and II were compared to predictions based on subject audiograms and critical ratios combined with a basic hearing model. Both subjects exhibited greater-than-predicted sensitivity to harmonic signals in quiet and masked conditions, as well as to frequency-modulated signals in masked conditions. These differences indicate that the complex features of naturally occurring sounds enhance detectability relative to simple stimuli.
VarMod: modelling the functional effects of non-synonymous variants.
Pappalardo, Morena; Wass, Mark N
2014-07-01
Unravelling the genotype-phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs (nsSNVs) present in proteins, the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants and in turn linking this functional change to phenotype. Here we present VarMod (Variant Modeller), a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod develops recent observations that functional nsSNVs are enriched at protein-protein interfaces and protein-ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs, VarMod performance is comparable to an existing state-of-the-art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Iwata, Hiroaki; Sawada, Ryusuke; Mizutani, Sayaka; Yamanishi, Yoshihiro
2015-02-23
Drug repositioning, or the application of known drugs to new indications, is a challenging issue in pharmaceutical science. In this study, we developed a new computational method to predict unknown drug indications for systematic drug repositioning in a framework of supervised network inference. We defined a descriptor for each drug-disease pair based on the phenotypic features of drugs (e.g., medicinal effects and side effects) and various molecular features of diseases (e.g., disease-causing genes, diagnostic markers, disease-related pathways, and environmental factors) and constructed a statistical model to predict new drug-disease associations for a wide range of diseases in the International Classification of Diseases. Our results show that the proposed method outperforms previous methods in terms of accuracy and applicability, and its performance does not depend on drug chemical structure similarity. Finally, we performed a comprehensive prediction of a drug-disease association network consisting of 2349 drugs and 858 diseases and described biologically meaningful examples of newly predicted drug indications for several types of cancers and nonhereditary diseases.
Similarity-based Regularized Latent Feature Model for Link Prediction in Bipartite Networks.
Wang, Wenjun; Chen, Xue; Jiao, Pengfei; Jin, Di
2017-12-05
Link prediction is an attractive research topic in the field of data mining and has significant applications in improving the performance of recommendation systems and exploring the evolving mechanisms of complex networks. A variety of complex systems in the real world can be abstractly represented as bipartite networks, in which there are two types of nodes and no links connect nodes of the same type. In this paper, we propose a framework for link prediction in bipartite networks by combining the similarity-based structure and the latent feature model from a new perspective. The framework is called Similarity Regularized Nonnegative Matrix Factorization (SRNMF), which explicitly takes the local characteristics into consideration and encodes the geometrical information of the networks by constructing a similarity-based matrix. We also develop an iterative scheme to solve the objective function based on gradient descent. Extensive experiments on a variety of real-world bipartite networks show that the proposed link prediction framework performs more competitively and more stably than state-of-the-art methods.
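The abstract above describes a latent feature model fitted by an iterative gradient-descent scheme. The sketch below shows only the generic ingredient, a plain nonnegative matrix factorization of a bipartite adjacency matrix by projected gradient descent. It omits the similarity regularization term that defines SRNMF, whose objective the abstract does not give, so it should not be read as the authors' algorithm. The matrix, rank, learning rate and step count are all hypothetical choices.

```python
import random


def reconstruct(W, H):
    """Return the matrix product W @ H as nested lists."""
    return [[sum(w * H[k][j] for k, w in enumerate(row))
             for j in range(len(H[0]))] for row in W]


def squared_error(A, W, H):
    """Sum of squared differences between A and its reconstruction."""
    P = reconstruct(W, H)
    return sum((a - p) ** 2 for ra, rp in zip(A, P) for a, p in zip(ra, rp))


def factorize(A, rank=2, lr=0.05, steps=500, seed=0):
    """Plain NMF by projected gradient descent (no similarity regularizer)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[0.5 * rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[0.5 * rng.random() for _ in range(n)] for _ in range(rank)]
    start_error = squared_error(A, W, H)
    for _ in range(steps):
        P = reconstruct(W, H)
        R = [[A[i][j] - P[i][j] for j in range(n)] for i in range(m)]
        # Gradients of 0.5 * ||A - WH||^2 are -R H^T (for W) and -W^T R (for H);
        # step against them, then clip at zero to keep the factors nonnegative.
        new_W = [[max(0.0, W[i][k] + lr * sum(R[i][j] * H[k][j] for j in range(n)))
                  for k in range(rank)] for i in range(m)]
        new_H = [[max(0.0, H[k][j] + lr * sum(W[i][k] * R[i][j] for i in range(m)))
                  for j in range(n)] for k in range(rank)]
        W, H = new_W, new_H
    return W, H, start_error


# Hypothetical bipartite adjacency matrix: rows are one node type, columns the other.
A = [[1, 1, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 1, 1, 1],
     [0, 0, 1, 1, 0]]
W, H, start_error = factorize(A)
scores = reconstruct(W, H)  # higher score = more plausible link
print(round(scores[3][4], 3))  # score for the unobserved pair (row 3, column 4)
```

Entries of the reconstructed matrix at positions where no link is observed can be ranked as candidate links; a similarity-regularized variant would add a penalty term to the objective before taking gradients.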
Preprocessing Structured Clinical Data for Predictive Modeling and Decision Support
Oliveira, Mónica Duarte; Janela, Filipe; Martins, Henrique M. G.
2016-01-01
Background: EHR systems have high potential to improve healthcare delivery and management. Although structured EHR data generates information in machine-readable formats, their use for decision support still poses technical challenges for researchers due to the need to preprocess and convert data into a matrix format. During our research, we observed that clinical informatics literature does not provide guidance for researchers on how to build this matrix while avoiding potential pitfalls. Objectives: This article aims to provide researchers a roadmap of the main technical challenges of preprocessing structured EHR data and possible strategies to overcome them. Methods: Along the standard data processing stages of an EDPAI framework (extracting database entries, defining features, processing data, assessing feature values and integrating data elements), we identified the main challenges faced by researchers and reflected on how to address those challenges based on lessons learned from our research experience and on best practices from related literature. We highlight the main potential sources of error, present strategies to approach those challenges and discuss implications of these strategies. Results: Following the EDPAI framework, researchers face five key challenges: (1) gathering and integrating data, (2) identifying and handling different feature types, (3) combining features to handle redundancy and granularity, (4) addressing data missingness, and (5) handling multiple feature values. Strategies to address these challenges include: cross-checking identifiers for robust data retrieval and integration; applying clinical knowledge in identifying feature types, in addressing redundancy and granularity, and in accommodating multiple feature values; and investigating missing patterns adequately. Conclusions: This article contributes to literature by providing a roadmap to inform structured EHR data preprocessing.
It may advise researchers on potential pitfalls and implications of methodological decisions in handling structured data, so as to avoid biases and help realize the benefits of the secondary use of EHR data. PMID:27924347
Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S
2017-01-01
Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
Geologic evidence of hotspot activity of Venus - Predictions for Magellan
NASA Technical Reports Server (NTRS)
Stofan, Ellen R.; Saunders, R. Stephen
1990-01-01
A number of distinctive types of geologic features have been identified on Venus that are interpreted to be related to thermal plumes including domal rises, coronae, and major composite shield volcanoes. The basic characteristics of these features as well as their distribution are documented. The three types of features have related morphologies and are interpreted to represent a continuum of features formed by mantle plumes at scales from 100s to over 1000 km. The Artemis structure, located in Aphrodite Terra, is proposed to be a large corona. If crustal spreading processes are operating on Venus, hotspot features should form chains on the surface as seen in terrestrial ocean basins. On the basis of current data on hotspot-related feature distribution on Venus, no clear evidence exists for hotspot chains. The complete distribution of hotspot features in Magellan data will be used to understand better the relationship between interior processes and surface features, as well as to provide a test for the crustal spreading hypothesis.
Liu, Zhihong; Zheng, Minghao; Yan, Xin; Gu, Qiong; Gasteiger, Johann; Tijhuis, Johan; Maas, Peter; Li, Jiabo; Xu, Jun
2014-09-01
Predicting compound chemical stability is important because unstable compounds can lead to either false positive or false negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predict stability by applying rule-embedded naïve Bayesian learning, based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning to the data set, each ACF is assigned an expected stable probability (p(s)) and an unstable probability (p(uns)). 13,340 ACFs, together with their p(s) and p(uns) data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs were derived from its structure connection table with the same protocol used to derive ACFs from the training data. Then, the Bayesian classifier assigned p(s) and p(uns) values to the compound ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is calculated, with Bayes' theorem, based upon the p(s) and p(uns) values of the compound ACFs. We were able to achieve performance with an AUC value of 84% and a tenfold cross validation accuracy of 76.5%. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the prediction of the stabilities of organic compounds.
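How per-fragment probabilities might be combined with Bayes' theorem can be sketched as follows. This is an illustration only: it assumes that p(s) and p(uns) act as class-conditional likelihoods and that fragments are independent (the naïve Bayes assumption). The function name and the default prior are hypothetical, and the abstract does not specify the authors' exact formula.

```python
import math


def instability_posterior(fragment_probs, prior_unstable=0.5):
    """Posterior probability that a compound is unstable.

    fragment_probs: iterable of (p_s, p_uns) pairs, one per fragment, each
    strictly between 0 and 1 (assumed to be class-conditional likelihoods).
    Sums are taken in log space to avoid underflow with many fragments.
    """
    log_stable = math.log(1.0 - prior_unstable)
    log_unstable = math.log(prior_unstable)
    for p_s, p_uns in fragment_probs:
        log_stable += math.log(p_s)
        log_unstable += math.log(p_uns)

    # Normalise the two class scores (Bayes' theorem) in a stable way.
    shift = max(log_stable, log_unstable)
    w_stable = math.exp(log_stable - shift)
    w_unstable = math.exp(log_unstable - shift)
    return w_unstable / (w_stable + w_unstable)
```

With equal priors, a single fragment with p(s) = 0.2 and p(uns) = 0.8 gives a posterior of 0.8, and each further fragment of the same kind pushes the posterior closer to 1.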
Patel, Meenal J; Andreescu, Carmen; Price, Julie C; Edelman, Kathryn L; Reynolds, Charles F; Aizenstein, Howard J
2015-10-01
Currently, depression diagnosis relies primarily on behavioral symptoms and signs, and treatment is guided by trial and error instead of evaluating associated underlying brain characteristics. Unlike past studies, we attempted to estimate accurate prediction models for late-life depression diagnosis and treatment response using multiple machine learning methods with inputs of multi-modal imaging and non-imaging whole brain and network-based features. Late-life depression patients (medicated post-recruitment) (n = 33) and older non-depressed individuals (n = 35) were recruited. Their demographics and cognitive ability scores were recorded, and brain characteristics were acquired using multi-modal magnetic resonance imaging pretreatment. Linear and nonlinear learning methods were tested for estimating accurate prediction models. A learning method called alternating decision trees estimated the most accurate prediction models for late-life depression diagnosis (87.27% accuracy) and treatment response (89.47% accuracy). The diagnosis model included measures of age, Mini-Mental State Examination score, and structural imaging (e.g. whole brain atrophy and global white matter hyperintensity burden). The treatment response model included measures of structural and functional connectivity. Combinations of multi-modal imaging and/or non-imaging measures may help better predict late-life depression diagnosis and treatment response. As a preliminary observation, we speculate that the results may also suggest that different underlying brain characteristics defined by multi-modal imaging measures, rather than region-based differences, are associated with depression versus depression recovery because to our knowledge this is the first depression study to accurately predict both using the same approach. These findings may help better understand late-life depression and identify preliminary steps toward personalized late-life depression treatment. 
Copyright © 2015 John Wiley & Sons, Ltd.
Learning-based prediction of gestational age from ultrasound images of the fetal brain.
Namburete, Ana I L; Stebbing, Richard V; Kemp, Bryn; Yaqub, Mohammad; Papageorghiou, Aris T; Alison Noble, J
2015-04-01
We propose an automated framework for predicting gestational age (GA) and neurodevelopmental maturation of a fetus based on 3D ultrasound (US) brain image appearance. Our method capitalizes on age-related sonographic image patterns in conjunction with clinical measurements to develop, for the first time, a predictive age model which improves on the GA-prediction potential of US images. The framework benefits from a manifold surface representation of the fetal head which delineates the inner skull boundary and serves as a common coordinate system based on cranial position. This allows for fast and efficient sampling of anatomically-corresponding brain regions to achieve like-for-like structural comparison of different developmental stages. We develop bespoke features which capture neurosonographic patterns in 3D images, and using a regression forest classifier, we characterize structural brain development both spatially and temporally to capture the natural variation existing in a healthy population (N = 447) over an age range of active brain maturation (18-34 weeks). On a routine clinical dataset (N = 187) our age prediction results strongly correlate with true GA (r = 0.98, accurate within ±6.10 days), confirming the link between maturational progression and neurosonographic activity observable across gestation. Our model also outperforms current clinical methods by ±4.57 days in the third trimester, a period complicated by biological variations in the fetal population. Through feature selection, the model successfully identified the most age-discriminating anatomies over this age range as being the Sylvian fissure, cingulate, and callosal sulci. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Automated classifiers for early detection and diagnosis of retinopathy in diabetic eyes.
Somfai, Gábor Márk; Tátrai, Erika; Laurik, Lenke; Varga, Boglárka; Ölvedy, Veronika; Jiang, Hong; Wang, Jianhua; Smiddy, William E; Somogyi, Anikó; DeBuc, Delia Cabrera
2014-04-12
Artificial neural networks (ANNs) have been used to classify eye diseases, such as diabetic retinopathy (DR) and glaucoma. DR is the leading cause of blindness in working-age adults in the developed world. The implementation of DR diagnostic routines could feasibly be improved by integrating structural and optical property test measurements of the retinal structure, which provide important and complementary information for reaching a diagnosis. In this study, we evaluate the capability of several structural and optical features (thickness, total reflectance and fractal dimension) of various intraretinal layers extracted from optical coherence tomography images to train a Bayesian ANN to discriminate between healthy eyes and diabetic eyes with and without mild retinopathy. When exploring the probability as to whether the subject's eye was healthy (diagnostic condition, Test 1), we found that the structural and optical property features of the outer plexiform layer (OPL) and the complex formed by the ganglion cell and inner plexiform layers (GCL + IPL) provided the highest probability (positive predictive value (PPV) of 91% and 89%, respectively) for the proportion of patients with positive test results (healthy condition) who were correctly diagnosed (Test 1). The true negative, true positive and PPV values remained stable despite the different sizes of training data sets (Test 2). The sensitivity, specificity and PPV were greater than or close to 0.70 for the features of the retinal nerve fiber layer, photoreceptor outer segments and retinal pigment epithelium when 23 diabetic eyes with mild retinopathy were mixed with 38 diabetic eyes with no retinopathy (Test 3). A Bayesian ANN trained on structural and optical features from optical coherence tomography data can successfully discriminate between healthy eyes and diabetic eyes with and without retinopathy. 
The fractal dimension of the OPL and the GCL + IPL complex predicted by the Bayesian radial basis function network provides better diagnostic utility to classify diabetic eyes with mild retinopathy. Moreover, the thickness and fractal dimension parameters of the retinal nerve fiber layer, photoreceptor outer segments and retinal pigment epithelium show promise for the diagnostic classification between diabetic eyes with and without mild retinopathy.
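The sensitivity, specificity and PPV figures quoted in this abstract follow the standard confusion-matrix definitions, which the short Python sketch below makes explicit. The function name is hypothetical, and any counts passed to it are illustrative rather than taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix summaries for a binary diagnostic test.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly detected
        "specificity": tn / (tn + fp),  # negatives correctly rejected
        "ppv": tp / (tp + fp),          # positive calls that are correct
    }
```

Note that PPV depends on how common the positive class is in the tested sample, so a PPV measured on one mix of eyes does not automatically transfer to a population with a different prevalence.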
Lou, Wangchao; Wang, Xiaoqing; Chen, Fan; Chen, Yixiao; Jiang, Bo; Zhang, Hua
2014-01-01
Because DNA-binding proteins play vital roles in gene regulation, an efficient method for identifying them is highly desirable and would be invaluable for advancing our understanding of protein function. In this study, we propose a new method for predicting DNA-binding proteins that ranks features using random forest and applies wrapper-based feature selection with a forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, uses Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 were performed with the proposed model trained on the entire PDB594 dataset and with five other existing methods (iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. Independent tests of DBPPred on a large dataset consisting entirely of non-DNA-binding proteins and on two RNA-binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that, for the majority of the features selected by the proposed method, the mean feature values differ statistically significantly between DNA-binding and non-DNA-binding proteins. 
All of the experimental results indicate that the proposed DBPPred can serve as an alternative predictor for large-scale determination of DNA-binding proteins. PMID:24475169
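For readers unfamiliar with the Matthews correlation coefficient (MCC) reported alongside accuracy above, a minimal Python sketch of both measures follows. The function names are hypothetical, and returning 0.0 when the MCC denominator vanishes is a common convention assumed here, not something stated in the abstract.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    0.0 is returned when any marginal count is zero (assumed convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, MCC uses all four cells of the confusion matrix symmetrically, which is why it is often reported for benchmark sets whose two classes differ in size.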
Attending to Structural Programming Features Predicts Differences in Learning and Motivation
ERIC Educational Resources Information Center
Witherspoon, Eben B.; Schunn, Christian D.; Higashi, Ross M.; Shoop, Robin
2018-01-01
Educational robotics programs offer an engaging opportunity to potentially teach core computer science concepts and practices in K-12 classrooms. Here, we test the effects of units with different programming content within a virtual robotics context on both learning gains and motivational changes in middle school (6th-8th grade) robotics…
NASA Astrophysics Data System (ADS)
Wen, Hongwei; Liu, Yue; Wang, Jieqiong; Zhang, Jishui; Peng, Yun; He, Huiguang
2016-03-01
Tourette syndrome (TS) is a childhood-onset neurobehavioral disorder characterized by the presence of multiple motor and vocal tics. Tic generation has been linked to disturbed networks of brain areas involved in planning, controlling and execution of action. The aim of our work is to select the topological characteristics of the structural network that are most effective for estimating classification models to identify TS children early. Here we employed diffusion tensor imaging (DTI) and deterministic tractography to construct the structural networks of 44 TS children and 48 age- and gender-matched healthy children. We calculated four different connection matrices (fiber number, mean FA, averaged fiber length weighted and binary matrices) and then applied graph theoretical methods to extract the regional nodal characteristics of the structural network. For each weighted or binary network, nodal degree, nodal efficiency and nodal betweenness were selected as features. The Support Vector Machine Recursive Feature Elimination (SVM-RFE) algorithm was used to estimate the best feature subset for classification. An accuracy of 88.26%, evaluated by nested cross validation, was achieved by combining the best feature subset of each network characteristic. The identified discriminative brain nodes were mostly located in the basal ganglia and frontal cortico-cortical networks involved in TS children, which were associated with tic severity. Our study holds promise for early identification and prognosis prediction of TS children.
Thermomechanical deformation in the presence of metallurgical changes
NASA Technical Reports Server (NTRS)
Robinson, D. N.
1985-01-01
Nonisothermal testing that can be used as a basis of a nonisothermal representation is discussed. Related tests regarding metallurgical changes that occur in other high temperature structural alloys are discussed. A viscoplastic constitutive model capable of qualitatively representing the behavioral features was formulated. This model is used to assess the differences in ultimate life prediction in some typical nonisothermal structural problems when the constitutive model does or does not account for metallurgically induced thermomechanical history dependence.
Qidwai, Tabish; Yadav, Dharmendra K; Khan, Feroz; Dhawan, Sangeeta; Bhakuni, R S
2012-01-01
This work presents the development of a quantitative structure activity relationship (QSAR) model to predict the antimalarial activity of artemisinin derivatives. The structures of the molecules are represented by chemical descriptors that encode topological, geometric, and electronic structure features. Screening through the QSAR model suggested that compounds A24, A24a, A53, A54, A62 and A64 possess significant antimalarial activity. A linear model was developed by the multiple linear regression method to link structures to their reported antimalarial activity. The correlation in terms of regression coefficient (r²) was 0.90 and the prediction accuracy of the model in terms of cross validation regression coefficient (rCV²) was 0.82. This study indicates that chemical properties, viz. atom count (all atoms), connectivity index (order 1, standard), ring count (all rings), shape index (basic kappa, order 2), and solvent accessibility surface area, are well correlated with antimalarial activity. The docking study showed high binding affinity of the predicted active compounds against the antimalarial target Plasmepsins (Plm-II). Further studies for oral bioavailability, ADMET and toxicity risk assessment suggest that compounds A24, A24a, A53, A54, A62 and A64 exhibit marked antimalarial activity comparable to standard antimalarial drugs. Later, one of the predicted active compounds, A64, was chemically synthesized, structurally elucidated by NMR and tested in vivo in mice infected with a multidrug-resistant strain of Plasmodium yoelii nigeriensis. The experimental results obtained agreed well with the predicted values.
Automatic prediction of facial trait judgments: appearance vs. structural models.
Rojas, Mario; Masip, David; Todorov, Alexander; Vitria, Jordi
2011-01-01
Evaluating other individuals with respect to personality characteristics plays a crucial role in human relations and it is the focus of attention for research in diverse fields such as psychology and interactive computer systems. In psychology, face perception has been recognized as a key component of this evaluation system. Multiple studies suggest that observers use face information to infer personality characteristics. Interactive computer systems are trying to take advantage of these findings and apply them to make interaction more natural and to improve system performance. Here, we experimentally test whether the automatic prediction of facial trait judgments (e.g. dominance) can be made by using the full appearance information of the face and whether a reduced representation of its structure is sufficient. We evaluate two separate approaches: a holistic representation model using the facial appearance information and a structural model constructed from the relations among facial salient points. State-of-the-art machine learning methods are applied to a) derive a facial trait judgment model from training data and b) predict a facial trait value for any face. Furthermore, we address the issue of whether there are specific structural relations among facial points that predict perception of facial traits. Experimental results over a set of labeled data (9 different trait evaluations) and classification rules (4 rules) suggest that a) prediction of perception of facial traits is learnable by both holistic and structural approaches; b) the most reliable prediction of facial trait judgments is obtained by certain types of holistic descriptions of the face appearance; and c) for some traits such as attractiveness and extroversion, there are relationships between specific structural features and social perceptions.
Janet, Jon Paul; Kulik, Heather J
2017-11-22
Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.
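As background for the autocorrelation descriptors discussed above, the sketch below computes a standard graph autocorrelation: the sum of products of an atomic property over atom pairs separated by a given number of bonds. It illustrates the standard form only, not the revised variants introduced in the paper. Summing over ordered pairs (so each unordered pair counts twice and depth 0 gives the sum of squares) is an assumption of this sketch, and all names are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Bond-count distances from `source` to every reachable atom (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def autocorrelation(adjacency, prop, depth):
    """Sum of prop[i] * prop[j] over ordered atom pairs exactly `depth` bonds apart."""
    total = 0.0
    for i in adjacency:
        for j, d in shortest_path_lengths(adjacency, i).items():
            if d == depth:
                total += prop[i] * prop[j]
    return total


# Illustrative three-atom chain H-O-H with atomic number as the property.
water_bonds = {0: [1], 1: [0, 2], 2: [1]}
water_z = {0: 1, 1: 8, 2: 1}
```

Evaluating the descriptor at several depths yields a fixed-length vector regardless of molecule size, which is what makes this family of features convenient as ML input.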
Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.
2014-01-01
Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953
ERIC Educational Resources Information Center
Sweller, Naomi
2015-01-01
Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…
Low-Frequency Cortical Oscillations Entrain to Subthreshold Rhythmic Auditory Stimuli
Schroeder, Charles E.; Poeppel, David; van Atteveldt, Nienke
2017-01-01
Many environmental stimuli contain temporal regularities, a feature that can help predict forthcoming input. Phase locking (entrainment) of ongoing low-frequency neuronal oscillations to rhythmic stimuli is proposed as a potential mechanism for enhancing neuronal responses and perceptual sensitivity, by aligning high-excitability phases to events within a stimulus stream. Previous experiments show that rhythmic structure has a behavioral benefit even when the rhythm itself is below perceptual detection thresholds (ten Oever et al., 2014). It is not known whether this “inaudible” rhythmic sound stream also induces entrainment. Here we tested this hypothesis using magnetoencephalography and electrocorticography in humans to record changes in neuronal activity as subthreshold rhythmic stimuli gradually became audible. We found that significant phase locking to the rhythmic sounds preceded participants' detection of them. Moreover, no significant auditory-evoked responses accompanied this prethreshold entrainment. These auditory-evoked responses, distinguished by robust, broad-band increases in intertrial coherence, only appeared after sounds were reported as audible. Taken together with the reduced perceptual thresholds observed for rhythmic sequences, these findings support the proposition that entrainment of low-frequency oscillations serves a mechanistic role in enhancing perceptual sensitivity for temporally predictive sounds. This framework has broad implications for understanding the neural mechanisms involved in generating temporal predictions and their relevance for perception, attention, and awareness. SIGNIFICANCE STATEMENT The environment is full of rhythmically structured signals that the nervous system can exploit for information processing. Thus, it is important to understand how the brain processes such temporally structured, regular features of external stimuli. 
Here we report the alignment of slowly fluctuating oscillatory brain activity to external rhythmic structure before its behavioral detection. These results indicate that phase alignment is a general mechanism of the brain to process rhythmic structure and can occur without the perceptual detection of this temporal structure. PMID:28411273
[Prediction of ETA oligopeptides antagonists from Glycine max based on in silico proteolysis].
Qiao, Lian-Sheng; Jiang, Lu-di; Luo, Gang-Gang; Lu, Fang; Chen, Yan-Kun; Wang, Ling-Zhi; Li, Gong-Yu; Zhang, Yan-Ling
2017-02-01
Oligopeptides are one of the key pharmaceutically effective constituents of traditional Chinese medicine (TCM). Systematic study of the composition and efficacy of TCM oligopeptides is essential for the analysis of the material basis and mechanism of TCM. In this study, the potential anti-hypertensive oligopeptides from Glycine max and their endothelin receptor A (ETA) antagonistic activity were discovered and predicted based on in silico technologies. The main protein sequences of G. max were collected and oligopeptides were obtained using in silico gastrointestinal tract proteolysis. Then, the pharmacophore of ETA antagonistic peptides was constructed; it included one hydrophobic feature, one ionizable negative feature, one ring aromatic feature and five excluded volumes. Meanwhile, the three-dimensional structure of ETA was developed by homology modeling methods for further docking studies. According to docking analysis and consensus score, the key amino acid GLN165 was identified for ETA antagonistic activity, and 27 oligopeptides from G. max were predicted as potential ETA antagonists by pharmacophore and docking studies. In silico proteolysis could be used to analyze the protein sequences from TCM. By combining in silico proteolysis and molecular simulation, the biological activities of oligopeptides could be predicted rapidly based on known TCM protein sequences. This might provide a methodological basis for rapidly and efficiently analyzing the mechanism of TCM oligopeptides. Copyright© by the Chinese Pharmaceutical Association.
Realizing drug repositioning by adapting a recommendation system to handle the process.
Ozsoy, Makbule Guclin; Özyer, Tansel; Polat, Faruk; Alhajj, Reda
2018-04-12
Drug repositioning is the process of identifying new targets for known drugs. It can be used to overcome problems associated with traditional drug discovery by adapting existing drugs to treat newly discovered diseases. Thus, it may reduce the associated risk, cost and time required to identify and verify new drugs. Drug repositioning has recently received more attention from industry and academia. To tackle this problem, researchers have applied many different computational methods and have used various features of drugs and diseases. In this study, we contribute to the ongoing research efforts by combining multiple features, namely chemical structures, protein interactions and side-effects, to predict new indications of target drugs. To this end, we frame drug repositioning as a recommendation process, which leads to a new perspective on the problem. The recommendation method used is based on Pareto dominance and collaborative filtering. It can also integrate multiple data sources and multiple features. For the computation part, we applied several settings and compared their performance. Evaluation results show that the proposed method can achieve more concentrated predictions with high precision, where nearly half of the predictions are true. Compared to other state-of-the-art methods described in the literature, the proposed method makes more correct predictions, with higher precision. The reported results demonstrate the applicability and effectiveness of recommendation methods for drug repositioning.
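The notion of Pareto dominance mentioned above can be illustrated with a minimal Python sketch. How the paper applies it inside the recommender is not described in the abstract, so the setup here is an assumption: each candidate is scored on several similarity features, higher is better, and the names and scores are hypothetical.

```python
def dominates(a, b):
    """True if score vector `a` Pareto-dominates `b` (higher is better).

    `a` must be at least as good as `b` on every feature and strictly
    better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates not dominated by any other candidate.

    candidates: dict mapping a name to a tuple of per-feature scores.
    """
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]
```

Because dominance compares features one by one, no weighting between features (for example chemical structure versus side-effect similarity) has to be chosen in advance.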
Grasso, Ernesto J.; Sottile, Adolfo E.; Coronel, Carlos E.
2016-01-01
It is known that caltrin (calcium transport inhibitor) protein binds to sperm cells during ejaculation and inhibits extracellular Ca2+ uptake. Although the sequence and some biological features of mouse caltrin I and bovine caltrin are known, their physicochemical properties and tertiary structure are mainly unknown. We predicted the 3D structures of mouse caltrin I and bovine caltrin by molecular homology modeling and threading. Surface electrostatic potentials and electric fields were calculated using the Poisson–Boltzmann equation. Several different bioinformatics tools and available web servers were used to thoroughly analyze the physicochemical characteristics of both proteins, such as their Kyte and Doolittle hydropathy scores and helical wheel projections. The results presented in this work significantly aid further understanding of the molecular mechanisms of caltrin proteins modulating physiological processes associated with fertilization. PMID:27812283
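The Kyte and Doolittle hydropathy score mentioned in this abstract is a sliding-window average of per-residue hydropathy values. The Python sketch below illustrates that calculation. The function name and the window length of 9 are assumptions, and the abstract does not state which tool or settings the authors used.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of each full window sliding along the sequence.

    Returns one value per window position; a sequence shorter than the
    window yields an empty list. Non-standard residues raise KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Positive stretches in the resulting profile indicate hydrophobic segments and negative stretches hydrophilic ones; shorter windows give a noisier profile, longer windows a smoother one.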
NASA Astrophysics Data System (ADS)
Costache, G. N.; Gavat, I.
2004-09-01
With the rapid growth in the amount of digital data available (text, audio samples, digital photos and digital movies, all joined in the multimedia domain), the need for classification, recognition and retrieval of this kind of data has become very important. This paper presents a system structure for handling multimedia data from a recognition perspective. The main processing steps applied to the multimedia objects of interest are: first, parameterization by analysis, in order to obtain a feature-based description forming the parameter vector; second, classification, generally with a hierarchical structure, to make the necessary decisions. For audio signals, both speech and music, the derived perceptual features are the mel-cepstral (MFCC) and the perceptual linear predictive (PLP) coefficients. For images, the derived features are the geometric parameters of the speaker's mouth. The hierarchical classifier generally consists of a clustering stage, based on Kohonen Self-Organizing Maps (SOM), and a final stage, based on a powerful classification algorithm called Support Vector Machines (SVM). The system, in specific variants, is applied with good results in two tasks: the first is bimodal speech recognition, which uses features obtained from the speech signal fused with features obtained from the speaker's image, and the second is music retrieval from a large music database.
Quantitative theory of hydrophobic effect as a driving force of protein structure
Perunov, Nikolay; England, Jeremy L
2014-01-01
Various studies suggest that the hydrophobic effect plays a major role in driving the folding of proteins. In the past, however, it has been challenging to translate this understanding into a predictive, quantitative theory of how the full pattern of sequence hydrophobicity in a protein shapes functionally important features of its tertiary structure. Here, we extend and apply such a phenomenological theory of the sequence-structure relationship in globular protein domains, which had previously been applied to the study of allosteric motion. In an effort to optimize parameters for the model, we first analyze the patterns of backbone burial found in single-domain crystal structures, and discover that classic hydrophobicity scales derived from bulk physicochemical properties of amino acids are already nearly optimal for prediction of burial using the model. Subsequently, we apply the model to studying structural fluctuations in proteins and establish a means of identifying ligand-binding and protein–protein interaction sites using this approach. PMID:24408023
Spike synchrony reveals emergence of proto-objects in visual cortex.
Martin, Anne B; von der Heydt, Rüdiger
2015-04-29
Neurons at early stages of the visual cortex signal elemental features, such as pieces of contour, but how these signals are organized into perceptual objects is unclear. Theories have proposed that spiking synchrony between these neurons encodes how features are grouped (binding-by-synchrony), but recent studies did not find the predicted increase in synchrony with binding. Here we propose that features are grouped to "proto-objects" by intrinsic feedback circuits that enhance the responses of the participating feature neurons. This hypothesis predicts synchrony exclusively between feature neurons that receive feedback from the same grouping circuit. We recorded from neurons in macaque visual cortex and used border-ownership selectivity, an intrinsic property of the neurons, to infer whether or not two neurons are part of the same grouping circuit. We found that binding produced synchrony between same-circuit neurons, but not between other pairs of neurons, as predicted by the grouping hypothesis. In a selective attention task, synchrony emerged with ignored as well as attended objects, and higher synchrony was associated with faster behavioral responses, as would be expected from early grouping mechanisms that provide the structure for object-based processing. Thus, synchrony could be produced by automatic activation of intrinsic grouping circuits. However, the binding-related elevation of synchrony was weak compared with its random fluctuations, arguing against synchrony as a code for binding. In contrast, feedback grouping circuits encode binding by modulating the response strength of related feature neurons. Thus, our results suggest a novel coding mechanism that might underlie the proto-objects of perception. Copyright © 2015 the authors 0270-6474/15/356860-11$15.00/0.
Brender, Jeffrey R.; Zhang, Yang
2015-01-01
The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the differences between the interface profile score of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through the random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. PMID:26506533
NASA Astrophysics Data System (ADS)
Saarinen, N.; Vastaranta, M.; Näsi, R.; Rosnell, T.; Hakala, T.; Honkavaara, E.; Wulder, M. A.; Luoma, V.; Tommaselli, A. M. G.; Imai, N. N.; Ribeiro, E. A. W.; Guimarães, R. B.; Holopainen, M.; Hyyppä, J.
2017-10-01
Biodiversity is commonly referred to as species diversity but in forest ecosystems variability in structural and functional characteristics can also be treated as measures of biodiversity. Small unmanned aerial vehicles (UAVs) provide a means for characterizing forest ecosystem with high spatial resolution, permitting measuring physical characteristics of a forest ecosystem from a viewpoint of biodiversity. The objective of this study is to examine the applicability of photogrammetric point clouds and hyperspectral imaging acquired with a small UAV helicopter in mapping biodiversity indicators, such as structural complexity as well as the amount of deciduous and dead trees at plot level in southern boreal forests. Standard deviation of tree heights within a sample plot, used as a proxy for structural complexity, was the most accurately derived biodiversity indicator resulting in a mean error of 0.5 m, with a standard deviation of 0.9 m. The volume predictions for deciduous and dead trees were underestimated by 32.4 m3/ha and 1.7 m3/ha, respectively, with standard deviation of 50.2 m3/ha for deciduous and 3.2 m3/ha for dead trees. The spectral features describing brightness (i.e. higher reflectance values) were prevailing in feature selection but several wavelengths were represented. Thus, it can be concluded that structural complexity can be predicted reliably but at the same time can be expected to be underestimated with photogrammetric point clouds obtained with a small UAV. Additionally, plot-level volume of dead trees can be predicted with small mean error whereas identifying deciduous species was more challenging at plot level.
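The accuracy figures in the abstract above are a mean error and a standard deviation of errors. A minimal sketch of how such figures are typically computed follows; the sign convention (predicted minus observed, so a negative mean indicates underestimation) and the sample values are assumptions, not taken from the study.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def error_summary(predicted: Sequence[float], observed: Sequence[float]) -> Tuple[float, float]:
    """Return (mean error, sample standard deviation of errors).

    Errors are predicted minus observed, so a negative mean
    means the predictions underestimate the field values.
    """
    errors = [p - o for p, o in zip(predicted, observed)]
    return mean(errors), stdev(errors)


# Hypothetical plot-level values (e.g. volumes in m3/ha).
predicted = [10.0, 12.0, 14.0]
observed = [11.0, 12.0, 16.0]
bias, spread = error_summary(predicted, observed)
print(bias, spread)  # -> -1.0 1.0
```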
Development of a brain MRI-based hidden Markov model for dementia recognition.
Chen, Ying; Pham, Tuan D
2013-01-01
Dementia is an age-related cognitive decline which is indicated by an early degeneration of cortical and sub-cortical structures. Characterizing those morphological changes can help to understand the disease development and contribute to early prediction and prevention of the disease. But modeling that can best capture brain structural variability and be valid in both disease classification and interpretation is extremely challenging. The current study aimed to establish a computational approach for modeling the magnetic resonance imaging (MRI)-based structural complexity of the brain using the framework of hidden Markov models (HMMs) for dementia recognition. Regularity dimension and semi-variogram were used to extract structural features of the brains, and a vector quantization (VQ) method was applied to convert the extracted feature vectors to prototype vectors. The output VQ indices were then utilized to estimate parameters for the HMMs. To validate accuracy and robustness, experiments were carried out on individuals who were characterized as non-demented or as having mild Alzheimer's disease. Four HMMs were constructed based on cohorts of non-demented young, middle-aged and elderly subjects and demented elderly subjects, separately. Classification was carried out using a data set including both non-demented and demented individuals with a wide age range. The proposed HMMs succeeded in recognizing individuals who have mild Alzheimer's disease and achieved better classification accuracy than other related works using different classifiers. The results show the ability of the proposed modeling to recognize early dementia. The findings from this research will allow individual classification to support the early diagnosis and prediction of dementia. The brain MRI-based HMMs developed in the proposed research are expected to be efficient and robust, and to be easily used by clinicians as a computer-aided tool for validating imaging biomarkers for early prediction of dementia.
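The vector quantization step described above, which turns continuous feature vectors into discrete indices that an HMM can consume, can be sketched as a nearest-prototype lookup. This is only an illustration: the codebook here is hand-written and hypothetical, whereas in practice it would be learned from training data by a method the abstract does not specify.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook prototype."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


# Hypothetical 2-D prototypes and feature vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, -0.2), (0.9, 1.2), (4.0, 6.0), (0.6, 0.6)]
print(quantize(features, codebook))  # -> [0, 1, 2, 1]
```

The resulting index sequence is the kind of discrete observation sequence from which HMM parameters can be estimated.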
A protein coevolution method uncovers critical features of the Hepatitis C Virus fusion mechanism
Douam, Florian; Mancip, Jimmy; Mailly, Laurent; Montserret, Roland; Ding, Qiang; Verhoeyen, Els; Baumert, Thomas F.; Ploss, Alexander; Carbone, Alessandra
2018-01-01
Amino-acid coevolution can be referred to mutational compensatory patterns preserving the function of a protein. Viral envelope glycoproteins, which mediate entry of enveloped viruses into their host cells, are shaped by coevolution signals that confer to viruses the plasticity to evade neutralizing antibodies without altering viral entry mechanisms. The functions and structures of the two envelope glycoproteins of the Hepatitis C Virus (HCV), E1 and E2, are poorly described. Especially, how these two proteins mediate the HCV fusion process between the viral and the cell membrane remains elusive. Here, as a proof of concept, we aimed to take advantage of an original coevolution method recently developed to shed light on the HCV fusion mechanism. When first applied to the well-characterized Dengue Virus (DENV) envelope glycoproteins, coevolution analysis was able to predict important structural features and rearrangements of these viral protein complexes. When applied to HCV E1E2, computational coevolution analysis predicted that E1 and E2 refold interdependently during fusion through rearrangements of the E2 Back Layer (BL). Consistently, a soluble BL-derived polypeptide inhibited HCV infection of hepatoma cell lines, primary human hepatocytes and humanized liver mice. We showed that this polypeptide specifically inhibited HCV fusogenic rearrangements, hence supporting the critical role of this domain during HCV fusion. By combining coevolution analysis and in vitro assays, we also uncovered functionally-significant coevolving signals between E1 and E2 BL/Stem regions that govern HCV fusion, demonstrating the accuracy of our coevolution predictions. Altogether, our work shed light on important structural features of the HCV fusion mechanism and contributes to advance our functional understanding of this process. 
This study also provides an important proof of concept that coevolution can be employed to explore viral protein mediated-processes, and can guide the development of innovative translational strategies against challenging human-tropic viruses. PMID:29505618
Gao, Zhen; Chen, Yang; Cai, Xiaoshu; Xu, Rong
2017-01-01
Motivation: Blood–Brain-Barrier (BBB) is a rigorous permeability barrier for maintaining homeostasis of Central Nervous System (CNS). Determination of a compound’s permeability to BBB is a prerequisite in CNS drug discovery. Existing computational methods usually predict drug BBB permeability from chemical structure and they generally apply to small compounds passing BBB through passive diffusion. As abundant information on drug side effects and indications has been recorded over time through extensive clinical usage, we aim to explore BBB permeability prediction from a new angle and introduce a novel approach to predict BBB permeability from drug clinical phenotypes (drug side effects and drug indications). This method can apply to both small compounds and macro-molecules penetrating BBB through various mechanisms besides passive diffusion. Results: We composed a training dataset of 213 drugs with known brain and blood steady-state concentrations ratio and extracted their side effects and indications as features. Next, we trained SVM models with polynomial kernel and obtained accuracy of 76.0%, AUC 0.739, and F1 score (macro weighted) 0.760 with Monte Carlo cross validation. The independent test accuracy was 68.3%, AUC 0.692, F1 score 0.676. When both chemical features and clinical phenotypes were available, combining the two types of features achieved significantly better performance than the chemical feature based approach (accuracy 85.5% versus 72.9%, AUC 0.854 versus 0.733, F1 score 0.854 versus 0.725; P < e−90). We also conducted de novo prediction and identified 110 drugs in the SIDER database having the potential to penetrate BBB, which could serve as a starting point for CNS drug repositioning research. Availability and Implementation: https://github.com/bioinformatics-gao/CASE-BBB-prediction-Data Contact: rxx@case.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27993785
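Monte Carlo cross validation, mentioned in the abstract above, means averaging a score over many independent random train/test splits. The sketch below shows only that evaluation loop. The classifier is a simple stand-in (a threshold halfway between two class means on one feature), not the polynomial-kernel SVM used in the study, and the data, split fraction and number of repeats are all hypothetical.

```python
import random
from typing import Callable, List


def nearest_mean_rule(train_x: List[float], train_y: List[int], test_x: List[float]) -> List[int]:
    """Stand-in classifier (NOT an SVM): threshold halfway between the class means.

    Assumes both classes occur in the training data and class 1 has the larger mean.
    """
    mean0 = sum(x for x, label in zip(train_x, train_y) if label == 0) / train_y.count(0)
    mean1 = sum(x for x, label in zip(train_x, train_y) if label == 1) / train_y.count(1)
    threshold = (mean0 + mean1) / 2
    return [1 if x > threshold else 0 for x in test_x]


def monte_carlo_cv(
    xs: List[float],
    ys: List[int],
    train_and_predict: Callable[[List[float], List[int], List[float]], List[int]],
    n_splits: int = 100,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> float:
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, round(len(xs) * test_fraction))
    accuracies = []
    for _ in range(n_splits):
        order = list(range(len(xs)))
        rng.shuffle(order)  # a fresh random split on every repeat
        test, train = order[:n_test], order[n_test:]
        predicted = train_and_predict(
            [xs[i] for i in train], [ys[i] for i in train], [xs[i] for i in test]
        )
        correct = sum(p == ys[i] for p, i in zip(predicted, test))
        accuracies.append(correct / n_test)
    return sum(accuracies) / len(accuracies)


# Hypothetical, cleanly separable one-feature data set.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.4]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(monte_carlo_cv(xs, ys, nearest_mean_rule))  # -> 1.0
```

Unlike k-fold cross validation, the test sets of different repeats may overlap, which is why the number of repeats can be chosen independently of the data set size.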
A framework for feature extraction from hospital medical data with applications in risk prediction.
Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha
2014-12-30
Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features to baselines generated from the Elixhauser comorbidities. Hospital medical records were transformed to event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task, comparing with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6, 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from the socio-demographic information and Elixhauser comorbidities over 20 settings (5 prediction horizons over 4 diseases). In particular, for 30-day prediction, the AUCs are: COPD-baseline: 0.60 (95% CI: 0.57, 0.63), auto-extracted: 0.67 (0.64, 0.70); diabetes-baseline: 0.60 (0.58, 0.63), auto-extracted: 0.67 (0.64, 0.69); mental disorders-baseline: 0.57 (0.54, 0.60), auto-extracted: 0.69 (0.64, 0.70); pneumonia-baseline: 0.61 (0.59, 0.63), auto-extracted: 0.70 (0.67, 0.72). The advantages of auto-extracted standard features from complex medical records, in a disease- and task-agnostic manner, were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have potential to form the foundation of complex automated analytic tasks.
Factors influencing protein tyrosine nitration – structure-based predictive models
Bayden, Alexander S.; Yakovlev, Vasily A.; Graves, Paul R.; Mikkelsen, Ross B.; Kellogg, Glen E.
2010-01-01
Models for exploring tyrosine nitration in proteins have been created based on 3D structural features of 20 proteins for which high resolution X-ray crystallographic or NMR data are available and for which nitration of 35 total tyrosines has been experimentally proven under oxidative stress. Factors suggested in previous work to enhance nitration were examined with quantitative structural descriptors. The role of neighboring acidic and basic residues is complex: for the majority of tyrosines that are nitrated the distance to the heteroatom of the closest charged sidechain corresponds to the distance needed for suspected nitrating species to form hydrogen bond bridges between the tyrosine and that charged amino acid. This suggests that such bridges play a very important role in tyrosine nitration. Nitration is generally hindered for tyrosines that are buried and for those tyrosines where there is insufficient space for the nitro group. For in vitro nitration, closed environments with nearby heteroatoms or unsaturated centers that can stabilize radicals are somewhat favored. Four quantitative structure-based models, depending on the conditions of nitration, have been developed for predicting site-specific tyrosine nitration. The best model, relevant for both in vitro and in vivo cases predicts 30 of 35 tyrosine nitrations (positive predictive value) and has a sensitivity of 60/71 (11 false positives). PMID:21172423
Zurek, Eva; Grochala, Wojciech
2014-11-27
Experimental studies of compressed matter are now routinely conducted at pressures exceeding 1 million atm (100 GPa) and occasionally they even surpass 10 million atm (1 TPa). The structure and properties of solids that have been so significantly squeezed differ considerably from those known at ambient pressure (1 atm), oftentimes leading to new and unexpected physics. Chemical reactivity is also substantially altered in the extreme pressure regime. In this feature paper we describe how synergy between theory and experiment can pave the road towards new experimental discoveries. Because chemical rules-of-thumb established at 1 atm often fail to predict the structures of solids under high pressure, automated crystal structure prediction (CSP) methods have been increasingly employed. After outlining the most important CSP techniques, we showcase a few examples from the recent literature that exemplify just how useful theory can be as an aid in the interpretation of experimental data, describe exciting theoretical predictions that are guiding experiment, and discuss when the computational methods that are currently routinely employed fail. Lastly, we forecast important problems that will be targeted by theory as theoretical methods undergo rapid development, along with the simultaneous increase of computational power.
3dRPC: a web server for 3D RNA-protein structure prediction.
Huang, Yangyu; Li, Haotian; Xiao, Yi
2018-04-01
RNA-protein interactions occur in many biological processes. To understand the mechanism of these interactions one needs to know three-dimensional (3D) structures of RNA-protein complexes. 3dRPC is an algorithm for prediction of 3D RNA-protein complex structures and consists of a docking algorithm RPDOCK and a scoring function 3dRPC-Score. RPDOCK is used to sample possible complex conformations of an RNA and a protein by calculating the geometric and electrostatic complementarities and stacking interactions at the RNA-protein interface according to the features of atom packing of the interface. 3dRPC-Score is a knowledge-based potential that uses the conformations of nucleotide-amino-acid pairs as statistical variables and that is used to choose the near-native complex-conformations obtained from the docking method above. Recently, we built a web server for 3dRPC. The users can easily use 3dRPC without installing it locally. RNA and protein structures in PDB (Protein Data Bank) format are the only needed input files. It can also incorporate the information of interface residues or residue-pairs obtained from experiments or theoretical predictions to improve the prediction. The address of 3dRPC web server is http://biophy.hust.edu.cn/3dRPC. yxiao@hust.edu.cn.
Ensembles of novelty detection classifiers for structural health monitoring using guided waves
NASA Astrophysics Data System (ADS)
Dib, Gerges; Karpenko, Oleksii; Koricho, Ermias; Khomenko, Anton; Haq, Mahmoodul; Udpa, Lalita
2018-01-01
Guided wave structural health monitoring uses sparse sensor networks embedded in sophisticated structures for defect detection and characterization. The biggest challenge of those sensor networks is developing robust techniques for reliable damage detection under changing environmental and operating conditions (EOC). To address this challenge, we develop a novelty classifier for damage detection based on one class support vector machines. We identify appropriate features for damage detection and introduce a feature aggregation method which quadratically increases the number of available training observations. We adopt a two-level voting scheme by using an ensemble of classifiers and predictions. Each classifier is trained on a different segment of the guided wave signal, and each classifier makes an ensemble of predictions based on a single observation. Using this approach, the classifier can be trained using a small number of baseline signals. We study the performance using Monte-Carlo simulations of an analytical model and data from impact damage experiments on a glass fiber composite plate. We also demonstrate the classifier performance using two types of baseline signals: fixed and rolling baseline training set. The former requires prior knowledge of baseline signals from all EOC, while the latter does not and leverages the fact that EOC vary slowly over time and can be modeled as a Gaussian process.
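One possible reading of the two-level voting scheme described above is sketched here: each classifier (one per signal segment) issues several predictions for a single observation, those predictions are reduced to one vote per classifier, and the per-classifier votes are then reduced to a final decision. The use of simple majority at both levels and the tie rule (ties count as undamaged) are assumptions; the abstract does not give these details.

```python
from typing import List


def majority(votes: List[bool]) -> bool:
    """True when strictly more than half of the votes flag damage (ties count as undamaged)."""
    return sum(votes) * 2 > len(votes)


def two_level_vote(predictions: List[List[bool]]) -> bool:
    """predictions[i][j] is the j-th damage prediction of the classifier for segment i."""
    per_classifier = [majority(p) for p in predictions]  # level 1: within each classifier
    return majority(per_classifier)                      # level 2: across classifiers


# Hypothetical outputs of three segment classifiers, three predictions each.
predictions = [
    [True, True, False],   # segment 1 classifier votes "damage"
    [True, False, False],  # segment 2 classifier votes "no damage"
    [True, True, True],    # segment 3 classifier votes "damage"
]
print(two_level_vote(predictions))  # -> True
```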
ProQ3: Improved model quality assessments using Rosetta energy terms
Uziela, Karolis; Shu, Nanjiang; Wallner, Björn; Elofsson, Arne
2016-01-01
Quality assessment of protein models using no other information than the structure of the model itself has been shown to be useful for structure prediction. Here, we introduce two novel methods, ProQRosFA and ProQRosCen, inspired by the state-of-art method ProQ2, but using a completely different description of a protein model. ProQ2 uses contacts and other features calculated from a model, while the new predictors are based on Rosetta energies: ProQRosFA uses the full-atom energy function that takes into account all atoms, while ProQRosCen uses the coarse-grained centroid energy function. The two new predictors also include residue conservation and terms corresponding to the agreement of a model with predicted secondary structure and surface area, as in ProQ2. We show that the performance of these predictors is on par with ProQ2 and significantly better than all other model quality assessment programs. Furthermore, we show that combining the input features from all three predictors, the resulting predictor ProQ3 performs better than any of the individual methods. ProQ3, ProQRosFA and ProQRosCen are freely available both as a webserver and stand-alone programs at http://proq3.bioinfo.se/. PMID:27698390
The structure of distractor-response bindings: Conditions for configural and elemental integration.
Moeller, Birte; Frings, Christian; Pfister, Roland
2016-04-01
Human action control is influenced by bindings between perceived stimuli and responses carried out in their presence. Notably, responses given to a target stimulus can also be integrated with additional response-irrelevant distractor stimuli that accompany the target (distractor-response binding). Subsequently reencountering such a distractor then retrieves the associated response. Although a large body of evidence supports the existence of this effect, the specific structure of distractor-response bindings is still unclear. Here, we test the predictions derived from 2 possible assumptions about the structure of bindings between distractors and responses. According to a configural approach, the entire distractor object is integrated with a response, and only upon repetition of the entire distractor object the associated response would be retrieved. According to an elemental approach, one would predict integration of individual distractor features with the response and retrieval due to the repetition of an individual distractor feature. Four experiments indicate that both, configural and elemental bindings exist and specify boundary conditions for each type of binding. These findings provide detailed insights into the architecture of bindings between response-irrelevant stimuli and actions and thus allow for specifying how distractor stimuli influence human behavior. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Optimized Structure of the Traffic Flow Forecasting Model With a Deep Learning Approach.
Yang, Hao-Fan; Dillon, Tharam S; Chen, Yi-Ping Phoebe
2017-10-01
Forecasting accuracy is an important issue for successful intelligent traffic management, especially in the domain of traffic efficiency and congestion reduction. The dawning of the big data era brings opportunities to greatly improve prediction accuracy. In this paper, we propose a novel model, stacked autoencoder Levenberg-Marquardt model, which is a type of deep architecture of neural network approach aiming to improve forecasting accuracy. The proposed model is designed using the Taguchi method to develop an optimized structure and to learn traffic flow features through layer-by-layer feature granulation with a greedy layerwise unsupervised learning algorithm. It is applied to real-world data collected from the M6 freeway in the U.K. and is compared with three existing traffic predictors. To the best of our knowledge, this is the first time that an optimized structure of the traffic flow forecasting model with a deep learning approach is presented. The evaluation results demonstrate that the proposed model with an optimized structure has superior performance in traffic flow forecasting.
Predicting the effectiveness of viscoelastic damping pockets in beams
NASA Astrophysics Data System (ADS)
Butler, Nigel D.; Oyadiji, S. O.
2005-05-01
This paper looks at the use of viscoelastic damping pockets in the suppression of structural vibration. These are in the form of cavities filled with a viscoelastic material. The benefits and uses of these designed-in damping treatments are highlighted. The vibration responses of viscoelastically-damped beams are predicted using the finite element method. A series of cantilevered beams are considered and the damping performance of several configurations of designed-in dampers are predicted and compared to that of a traditional CLD treatment. It is shown that the effectiveness of the damping pockets and sinks depends on their location and size with respect to the highly stressed regions of the beams. Although there is a practical limit on the sizes of the geometrical features that can be designed-in, it is shown that if located correctly the damping pockets and sinks can be more effective at suppressing structural vibration than traditional CLD treatments.
Modelling biological invasions: species traits, species interactions, and habitat heterogeneity.
Cannas, Sergio A; Marco, Diana E; Páez, Sergio A
2003-05-01
In this paper we explore the integration of different factors to understand, predict and control ecological invasions, through a specially developed general cellular automaton model. The model includes life history traits of several species in a modular structure of multiple interacting cellular automata. We performed simulations using field values corresponding to the exotic Gleditsia triacanthos and native co-dominant trees in a montane area. Presence of a G. triacanthos juvenile bank was a determinant condition for invasion success. The main parameters influencing invasion velocity were mean seed dispersal distance and minimum reproductive age. Seed production had a small influence on the invasion velocity. Velocities predicted by the model agreed well with estimations from field data. Predicted values of population density matched field values closely. The modular structure of the model, the explicit interaction between the invader and the native species, and the simplicity of parameters and transition rules are novel features of the model.
Communication: Theoretical prediction of free-energy landscapes for complex self-assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacobs, William M.; Reinhardt, Aleks; Frenkel, Daan
2015-01-14
We present a technique for calculating free-energy profiles for the nucleation of multicomponent structures that contain as many species as building blocks. We find that a key factor is the topology of the graph describing the connectivity of the target assembly. By considering the designed interactions separately from weaker, incidental interactions, our approach yields predictions for the equilibrium yield and nucleation barriers. These predictions are in good agreement with corresponding Monte Carlo simulations. We show that a few fundamental properties of the connectivity graph determine the most prominent features of the assembly thermodynamics. Surprisingly, we find that polydispersity in the strengths of the designed interactions stabilizes intermediate structures and can be used to sculpt the free-energy landscape for self-assembly. Finally, we demonstrate that weak incidental interactions can preclude assembly at equilibrium due to the combinatorial possibilities for incorrect association.
Static and fatigue testing of full-scale fuselage panels fabricated using a Therm-X(R) process
NASA Technical Reports Server (NTRS)
Dinicola, Albert J.; Kassapoglou, Christos; Chou, Jack C.
1992-01-01
Large, curved, integrally stiffened composite panels representative of an aircraft fuselage structure were fabricated using a Therm-X process, an alternative concept to conventional two-sided hard tooling and contour vacuum bagging. Panels subsequently were tested under pure shear loading in both static and fatigue regimes to assess the adequacy of the manufacturing process, the effectiveness of damage tolerant design features co-cured with the structure, and the accuracy of finite element and closed-form predictions of postbuckling capability and failure load. Test results indicated the process yielded panels of high quality and increased damage tolerance through suppression of common failure modes such as skin-stiffener separation and frame-stiffener corner failure. Finite element analyses generally produced good predictions of postbuckled shape, and a global-local modelling technique yielded failure load predictions that were within 7% of the experimental mean.
NASA Astrophysics Data System (ADS)
Klose, C. D.; Giese, R.; Löw, S.; Borm, G.
Especially for deep underground excavations, the prediction of the locations of small-scale hazardous geotechnical structures is nearly impossible when exploration is restricted to surface-based methods. Hence, for the AlpTransit base tunnels, exploration ahead has become an essential component of the excavation plan. The project described in this talk aims at improving the technology for the geological interpretation of reflection seismic data. The discovered geological-seismic relations will be used to develop an interpretation system based on artificial intelligence to predict hazardous geotechnical structures of the advancing tunnel face. This talk first gives an overview of the data mining of geological and seismic properties of metamorphic rocks within the Penninic gneiss zone in Southern Switzerland. The data result from measurements of a specific geophysical prediction system developed by the GFZ Potsdam, Germany, along the 2600 m long and 1400 m deep Faido access tunnel. The goal is to find those seismic features (i.e. compression and shear wave velocities, velocity ratios and velocity gradients) which show a significant relation to geological properties (i.e. fracturing and fabric features). The seismic properties were acquired from different tomograms, whereas the geological features derive from tunnel face maps. The features are statistically compared with the seismic rock properties, taking into account the different methods used for the tunnel excavation (TBM and Drill/Blast). Fracturing and the mica content stand in a positive relation to the velocity values. Both P- and S-wave velocities near the tunnel surface describe the petrology better, whereas in the interior of the rock mass they correlate to natural micro- and macroscopic fractures surrounding tectonites, i.e. cataclasites. The latter lie outside of the excavation damage zone and the tunnel loosening zone.
The shear wave velocities are better indicators for rock fracturing than compression wave velocities. The velocity ratios indicate the mica content and the water content of the rocks.
Brain properties predict proximity to symptom onset in sporadic Alzheimer’s disease
Vogel, Jacob W; Vachon-Presseau, Etienne; Pichet Binette, Alexa; Tam, Angela; Orban, Pierre; La Joie, Renaud; Savard, Mélissa; Picard, Cynthia; Poirier, Judes; Bellec, Pierre; Breitner, John C S; Villeneuve, Sylvia
2018-01-01
Abstract See Tijms and Visser (doi:10.1093/brain/awy113) for a scientific commentary on this article. Alzheimer’s disease is preceded by a lengthy ‘preclinical’ stage spanning many years, during which subtle brain changes occur in the absence of overt cognitive symptoms. Predicting when the onset of disease symptoms will occur is an unsolved challenge in individuals with sporadic Alzheimer’s disease. In individuals with autosomal dominant genetic Alzheimer’s disease, the age of symptom onset is similar across generations, allowing the prediction of individual onset times with some accuracy. We extend this concept to persons with a parental history of sporadic Alzheimer’s disease to test whether an individual’s symptom onset age can be informed by the onset age of their affected parent, and whether this estimated onset age can be predicted using only MRI. Structural and functional MRIs were acquired from 255 ageing cognitively healthy subjects with a parental history of sporadic Alzheimer’s disease from the PREVENT-AD cohort. Years to estimated symptom onset was calculated as participant age minus age of parental symptom onset. Grey matter volume was extracted from T1-weighted images and whole-brain resting state functional connectivity was evaluated using degree count. Both modalities were summarized using a 444-region cortical-subcortical atlas. The entire sample was divided into training (n = 138) and testing (n = 68) sets. Within the training set, individuals closer to or beyond their parent’s symptom onset demonstrated reduced grey matter volume and altered functional connectivity, specifically in regions known to be vulnerable in Alzheimer’s disease. Machine learning was used to identify a weighted set of imaging features trained to predict years to estimated symptom onset. This feature set alone significantly predicted years to estimated symptom onset in the unseen testing data. 
This model, using only neuroimaging features, significantly outperformed a similar model instead trained with cognitive, genetic, imaging and demographic features used in a traditional clinical setting. We next tested if these brain properties could be generalized to predict time to clinical progression in a subgroup of 26 individuals from the Alzheimer’s Disease Neuroimaging Initiative, who eventually converted either to mild cognitive impairment or to Alzheimer’s dementia. The feature set trained on years to estimated symptom onset in the PREVENT-AD predicted variance in time to clinical conversion in this separate longitudinal dataset. Adjusting for participant age did not impact any of the results. These findings demonstrate that years to estimated symptom onset or similar measures can be predicted from brain features and may help estimate presymptomatic disease progression in at-risk individuals. PMID:29688388
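The abstract above defines years to estimated symptom onset as participant age minus age of parental symptom onset. A minimal sketch of that arithmetic follows; the function name and the example ages are hypothetical illustrations, not values or code from the study.

```python
def years_to_estimated_onset(participant_age, parental_onset_age):
    """Participant age minus the age at which the affected parent showed symptoms.

    Negative values mean the participant is still younger than the parent was
    at symptom onset; positive values mean that age has already been passed.
    """
    return participant_age - parental_onset_age


# Hypothetical ages, chosen only to illustrate the sign convention.
print(years_to_estimated_onset(62.0, 70.0))  # 8 years before the parental onset age
print(years_to_estimated_onset(75.0, 70.0))  # 5 years past the parental onset age
```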
Features of Inner Structure of Placer Gold of the North-Eastern Part Siberian Platform
NASA Astrophysics Data System (ADS)
Gerasimov, Boris; Zhuravlev, Anatolii; Ivanov, Alexey
2017-12-01
The mineral and raw material base of placer and ore gold is based on prognosis evaluation, which makes it possible to define promising areas for gold-bearing deposit prospecting. However, predicting and prospecting primary gold sources in the north-east of the Siberian platform is difficult, because the studied area is overlain by a thick cover of Cenozoic deposits, where traditional methods of gold deposit prospecting are ineffective. In this connection, detailed study of the typomorphic features of placer gold is important, because it contains key genetic information necessary for the development of mineralogical criteria for prognosis evaluation of ore gold content. The authors studied mineralogical-geochemical features of placer gold of the Anabar placer area for 15 years, with a view to identifying indicators of gold typical for different formation types of primary sources. This article presents the results of this work. In placer regions where primary sources of gold have not been identified, the typomorphic features of placer gold need to be studied, because they contain important genetic information necessary for the development of mineralogical criteria for prognosis evaluation of ore gold content. The inner structures of gold from the Anabar placer region are studied as one of the diagnostic typomorphic criteria, following the well-known method developed by N.V. Petrovskaya [1980]. Etching of gold was carried out using the reagent HCl + HNO3 + FeCl3 × 6H2O + CrO3 + thiourea + water. The identified inner structures were studied in detail by means of a JEOL JSM-6480LV scanning electron microscope. Two types of gold are identified according to the features of the inner structure of placer gold of the Anabar region. The first type is medium-high karat, fine, well processed gold with a significantly changed inner structure. This gold is allochthonous and was redeposited many times from ancient intermediate reservoirs to younger deposits.
The second type is low-medium karat, poorly rounded gold with an unchanged inner structure. The poor roundness of the gold particles and the preservation of their primary inner structures indicate close proximity to the primary source.
Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.
Liu, Liang; Cai, Yudong; Lu, Wencong; Feng, Kaiyan; Peng, Chunrong; Niu, Bing
2009-03-06
Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection framework, this paper presents a computational system to predict protein-protein interactions (PPIs) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poorly performing and/or redundant features, leaving 103 features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall accurate prediction rate of 76.18%, evaluated by a 10-fold cross-validation test, which is 1.46% higher than using the initial 114 features and 6.51% higher than using the 20 features coded by amino acid compositions. The PPI predictor developed for this research is available for public use at http://chemdata.shu.edu.cn/ppi.
NASA Astrophysics Data System (ADS)
Mishra, Aashwin; Iaccarino, Gianluca
2017-11-01
In spite of their deficiencies, RANS models represent the workhorse for industrial investigations into turbulent flows. In this context, it is essential to provide diagnostic measures to assess the quality of RANS predictions. To this end, the primary step is to identify feature importances amongst massive sets of potentially descriptive and discriminative flow features. This aids the physical interpretability of the resultant discrepancy model and its extensibility to similar problems. Recent investigations have utilized approaches such as Random Forests, Support Vector Machines and the Least Absolute Shrinkage and Selection Operator for feature selection. With examples, we exhibit how such methods may not be suitable for turbulent flow datasets. The underlying rationale, such as the correlation bias and the required conditions for the success of penalized algorithms, are discussed with illustrative examples. Finally, we provide alternate approaches using convex combinations of regularized regression approaches and randomized sub-sampling in combination with feature selection algorithms, to infer model structure from data. This research was supported by the Defense Advanced Research Projects Agency under the Enabling Quantification of Uncertainty in Physical Systems (EQUiPS) project (technical monitor: Dr Fariba Fahroo).
Quantitative radiomic profiling of glioblastoma represents transcriptomic expression.
Kong, Doo-Sik; Kim, Junhyung; Ryu, Gyuha; You, Hye-Jin; Sung, Joon Kyung; Han, Yong Hee; Shin, Hye-Mi; Lee, In-Hee; Kim, Sung-Tae; Park, Chul-Kee; Choi, Seung Hong; Choi, Jeong Won; Seol, Ho Jun; Lee, Jung-Il; Nam, Do-Hyun
2018-01-19
Quantitative imaging biomarkers have increasingly emerged in the field of research utilizing available imaging modalities. We aimed to identify good surrogate radiomic features that can represent genetic changes of tumors, thereby establishing noninvasive means for predicting treatment outcome. From May 2012 to June 2014, we retrospectively identified 65 patients with treatment-naïve glioblastoma with available clinical information from the Samsung Medical Center data registry. Preoperative MR imaging data were obtained for all 65 patients with primary glioblastoma. A total of 82 imaging features, including first-order statistics, volume, and size features, were semi-automatically extracted from structural and physiologic images such as apparent diffusion coefficient and perfusion images. Using commercially available software, NordicICE, we performed quantitative imaging analysis and collected the dataset composed of radiophenotypic parameters. Unsupervised clustering methods revealed that the radiophenotypic dataset was composed of three clusters. Each cluster represented a distinct molecular classification of glioblastoma: classical type, proneural and neural types, and mesenchymal type. These clusters also reflected differential clinical outcomes. We found that the extracted imaging signatures do not represent copy number variation or somatic mutation. Quantitative radiomic features provide potential evidence for predicting molecular phenotype and treatment outcome. Radiomic profiles better represent transcriptomic phenotypes.
Radiomics-based Prognosis Analysis for Non-Small Cell Lung Cancer
NASA Astrophysics Data System (ADS)
Zhang, Yucheng; Oikonomou, Anastasia; Wong, Alexander; Haider, Masoom A.; Khalvati, Farzad
2017-04-01
Radiomics characterizes tumor phenotypes by extracting large numbers of quantitative features from radiological images. Radiomic features have been shown to provide prognostic value in predicting clinical outcomes in several studies. However, several challenges including feature redundancy, unbalanced data, and small sample sizes have led to relatively low predictive accuracy. In this study, we explore different strategies for overcoming these challenges and improving predictive performance of radiomics-based prognosis for non-small cell lung cancer (NSCLC). CT images of 112 patients (mean age 75 years) with NSCLC who underwent stereotactic body radiotherapy were used to predict recurrence, death, and recurrence-free survival using a comprehensive radiomics analysis. Different feature selection and predictive modeling techniques were used to determine the optimal configuration of prognosis analysis. To address feature redundancy, comprehensive analysis indicated that Random Forest models and Principal Component Analysis were optimum predictive modeling and feature selection methods, respectively, for achieving high prognosis performance. To address unbalanced data, Synthetic Minority Over-sampling technique was found to significantly increase predictive accuracy. A full analysis of variance showed that data endpoints, feature selection techniques, and classifiers were significant factors in affecting predictive accuracy, suggesting that these factors must be investigated when building radiomics-based predictive models for cancer prognosis.
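The abstract above names the Synthetic Minority Over-sampling technique as the remedy for unbalanced data. The sketch below illustrates only the core idea of that technique, interpolating between a minority-class sample and one of its nearest minority-class neighbours; it is not the authors' implementation. The function name `smote_like_samples`, its parameters, and the toy points are assumptions made for illustration.

```python
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length tuples of floats (assumed: at least two points).
    k: number of nearest minority neighbours to choose from (hypothetical default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class in two dimensions (hypothetical feature values).
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like_samples(toy_minority, n_new=5)
print(len(new_points))
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class is enlarged without simply duplicating records.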
NASA Astrophysics Data System (ADS)
Ivanova, Julia
2014-05-01
The complexity of solving any task, including tasks in the Earth sciences, depends on the completeness of the available information. Predicting the localization of a mineralization zone is a task with incomplete information. Prediction tasks are complicated because search data are difficult to formalize and because there is no single information structure for representing the search data. These facts complicate the structuring, processing and analysis of information. The geodata to be processed are presented in various formats: raster two-dimensional and three-dimensional fields, vector layers of polygons and lines, point markable layers, spectral and discrete, quantized and continuous, analog and digital forms, as well as chemical formalization. In this form, representative data cannot be combined into superclasses. At the same time, the information content of geodata applied individually is very small, while the many low-information features that can be obtained when studying mineralization zones are usually redundant. As a result, the quality of the knowledge that can be obtained from the search data decreases and the technological cycle of information processing lengthens. Additionally, this leads to exploitation of datasets and production of large shared datasets [1]. To solve prediction tasks efficiently, it is necessary to use a union of heterogeneous search features, accumulated factual data and a modern, science-based mathematical apparatus for processing and analysing the information. Young branches of human knowledge also help to solve this task: remote sensing, geoinformatics, Earth and Space Science Informatics [2], the apparatus of catastrophe theory and nonlinear dynamics, and game theory. The purpose of the suggested approach is to increase the information content and reduce the redundancy of geodata in order to improve the accuracy of the prediction of mineralization zones.
The developed algorithm for predicting the localization of a mineralization zone consists of the following steps: 1. Collection of information about the territory of the upcoming work from various sources, i.e. building a database (DB). The DB includes a variety of geodata. 2. Formalization, concatenation and union of geodata. Study of the correlation characteristics of features. Generation of new formal and functional search features. 3. Formation of a number of hypotheses based on the initial data. Refinement of search features. 4. Preliminary mathematical modeling of prospective mineralized zones. Study of the obtained results and formation of a list of additional features. 5. Collection of additional features by field methods to verify the hypotheses. 6. Processing and analysis of the obtained data and specification of the preliminary mathematical model. 7. Examination of the hypotheses using the obtained results. Study of prediction errors. 8. Building of multidimensional risk matrices of detection and bifurcation diagrams of mineralization [3]. 9. Final mathematical modeling of prospective mineralized zones. Thus, the proposed approach makes it possible to increase the information content of geodata significantly, to reduce the redundancy of geodata, and to increase the accuracy of predicting zones of gold mineralization. The approach suggested by the author is currently applied to the prediction of the localization of gold mineralization in the territory of the Polar Urals. References: 1. W. J. Som de Cerff, M. Petitdidier, A. Gemünd, L. Horstink, H. Schwichtenberg, Earth Science Test Suites to Evaluate Grid Tools and Middleware-Examples for Grid Data Access Tools, Earth Science Informatics, Vol. 2, 117-131, 2009. DOI 10.1007/s12145-009-0022-y. 2. P. Mazzetti, S. Nativi, J. Caron, RESTful implementation of Geospatial Services for Earth and Space Science Applications, International Journal of Digital Earth, Vol. 2, Supplement 1, 40-61, 2009.
DOI: 10.1080/175389409028661532. 3. Arnold V.I., Catastrophe Theory, 4th ed. Moscow, Editorial-URSS (2004), ISBN 5-354-00674-0 (in Russian).
Predicting hepatotoxicity using ToxCast in vitro bioactivity and ...
Background: The U.S. EPA ToxCast™ program is screening thousands of environmental chemicals for bioactivity using hundreds of high-throughput in vitro assays to build predictive models of toxicity. We represented chemicals based on bioactivity and chemical structure descriptors then used supervised machine learning to predict their hepatotoxic effects. Results: A set of 677 chemicals was represented by 711 in vitro bioactivity descriptors (from ToxCast assays), 4,376 chemical structure descriptors (from QikProp, OpenBabel, PADEL, and PubChem), and three hepatotoxicity categories (from animal studies). Hepatotoxicants were defined by rat liver histopathology observed after chronic chemical testing and grouped into hypertrophy (161), injury (101) and proliferative lesions (99). Classifiers were built using six machine learning algorithms: linear discriminant analysis (LDA), Naïve Bayes (NB), support vector classification (SVM), classification and regression trees (CART), k-nearest neighbors (KNN) and an ensemble of classifiers (ENSMB). Classifiers of hepatotoxicity were built using chemical structure, ToxCast bioactivity, and a hybrid representation. Predictive performance was evaluated using 10-fold cross-validation testing and in-loop, filter-based, feature subset selection. Hybrid classifiers had the best balanced accuracy for predicting hypertrophy (0.78±0.08), injury (0.73±0.10) and proliferative lesions (0.72±0.09). Though chemical and bioactivity class
Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties.
Yue, Zhenyu; Zhang, Wenna; Lu, Yongming; Yang, Qiaoyue; Ding, Qiuying; Xia, Junfeng; Chen, Yan
2015-01-01
Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures, which can be used as templates for the construction of novel drugs with enhanced antitumor activity. Traditional research approaches studied structure-activity relationship of natural products and obtained key structural properties, such as chemical bond or group, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural products responses against a panel of cancer cell lines based on both the gene expression and the chemical properties of natural products. The results on two datasets, training set and independent test set, show that this proposed method yields significantly better prediction accuracy. In addition, we also demonstrate the predictive power of our proposed method by modeling the cancer cell sensitivity to two natural products, Curcumin and Resveratrol, which indicate that our method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.
Energy efficient engine fan component detailed design report
NASA Technical Reports Server (NTRS)
Halle, J. E.; Michael, C. J.
1981-01-01
The fan component designed for the energy efficient engine is an advanced, high-performance, single-stage system based on technology advancements in aerodynamics and structural mechanics. Two fan components were designed, both meeting the integrated core/low spool engine efficiency goal of 84.5%. The primary configuration, envisioned for a future flight propulsion system, features a shroudless, hollow blade and offers a predicted efficiency of 87.3%. A more conventional blade was designed as a backup for the integrated core/low spool demonstrator engine. The alternate blade configuration has a predicted efficiency of 86.3% for the future flight propulsion system. Both fan configurations meet the goals established for efficiency, surge margin, structural integrity and durability.
Hao, Xiao-Hu; Zhang, Gui-Jun; Zhou, Xiao-Gen; Yu, Xu-Feng
2016-01-01
To address the searching problem of protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) based on the framework of evolutionary algorithm was proposed. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high-dimensionality and rugged energy surface of the protein conformational space. As a consequence, the dimension of protein conformational space should be reduced to a proper level. In this paper, the high-dimensionality original conformational space was converted into feature space whose dimension is considerably reduced by feature extraction technique. And, the underestimate space could be constructed according to abstract convex theory. Thus, the entropy effect caused by searching in the high-dimensionality conformational space could be avoided through such conversion. The tight lower bound estimate information was obtained to guide the searching direction, and the invalid searching area in which the global optimal solution is not located could be eliminated in advance. Moreover, instead of expensively calculating the energy of conformations in the original conformational space, the estimate value is employed to judge if the conformation is worth exploring to reduce the evaluation time, thereby making computational cost lower and the searching process more efficient. Additionally, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling in the conformational space. The proposed method provides a novel technique to solve the searching problem of protein conformational space. Twenty small-to-medium structurally diverse proteins were tested, and the proposed ACUE method was compared with It Fix, HEA, Rosetta and the developed method LEDE without underestimate information. 
Test results show that the ACUE method can more rapidly and more efficiently obtain the near-native protein structure.
Neuro-symbolic representation learning on biological knowledge graphs.
Alshahrani, Mona; Khan, Mohammad Asif; Maddouri, Omar; Kinjo, Akira R; Queralt-Rosinach, Núria; Hoehndorf, Robert
2017-09-01
Motivation: Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In the past years, feature learning methods that are applicable to graph-structured data have become available, but they have not yet been widely applied and evaluated on structured biological knowledge. Results: We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing number of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. Availability and implementation: https://github.com/bio-ontology-research-group/walking-rdf-and-owl. Contact: robert.hoehndorf@kaust.edu.sa. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Burganos, Vasilis N.; Skouras, Eugene D.; Kalarakis, Alexandros N.
2017-10-01
The lattice-Boltzmann (LB) method is used in this work to reproduce the controlled addition of binder and hydrophobicity-promoting agents, like polytetrafluoroethylene (PTFE), into gas diffusion layers (GDLs) and to predict flow permeabilities in the through- and in-plane directions. The present simulator manages to reproduce spreading of binder and hydrophobic additives, sequentially, into the neat fibrous layer using a two-phase flow model. Gas flow simulation is achieved by the same code, sidestepping the need for a post-processing flow code and avoiding the usual input/output and data interface problems that arise in other techniques. Compression effects on flow anisotropy of the impregnated GDL are also studied. The permeability predictions for different compression levels and for different binder or PTFE loadings are found to compare well with experimental data for commercial GDL products and with computational fluid dynamics (CFD) predictions. Alternatively, the PTFE-impregnated structure is reproduced from Scanning Electron Microscopy (SEM) images using an independent, purely geometrical approach. A comparison of the two approaches is made regarding their adequacy to reproduce correctly the main structural features of the GDL and to predict anisotropic flow permeabilities at different volume fractions of binder and hydrophobic additives.
Material Parameter Sensitivity of Predicted Injury in the Lower Leg
2015-06-01
in a region of the structure that experienced the largest strains due to geometric or structural features, e.g., a sharp curve or point. The specific...Annals of Biomedical Engineering. 2012;40(12):2519–2531. 23. Iwamoto M, Omori K, Kimpara H, Nakahira Y, Tamura A, Watanabe I, Miki K, Hasegawa J...cortical layer; the void space between the inner scaled bone and the original outer bone was considered the cortical shell. Thus, a sharp interface exists
Gromiha, M Michael; Anoosha, P; Huang, Liang-Tsung
2016-01-01
Protein stability is the free energy difference between unfolded and folded states of a protein, which lies in the range of 5-25 kcal/mol. Experimentally, protein stability is measured with circular dichroism, differential scanning calorimetry, and fluorescence spectroscopy using thermal and denaturant denaturation methods. These experimental data have been accumulated in the form of a database, ProTherm, thermodynamic database for proteins and mutants. It also contains sequence and structure information of a protein, experimental methods and conditions, and literature information. Different features such as search, display, and sorting options and visualization tools have been incorporated in the database. ProTherm is a valuable resource for understanding/predicting the stability of proteins and it can be accessed at http://www.abren.net/protherm/ . ProTherm has been effectively used to examine the relationship among thermodynamics, structure, and function of proteins. We describe the recent progress on the development of methods for understanding/predicting protein stability, such as (1) general trends on mutational effects on stability, (2) relationship between the stability of protein mutants and amino acid properties, (3) applications of protein three-dimensional structures for predicting their stability upon point mutations, (4) prediction of protein stability upon single mutations from amino acid sequence, and (5) prediction methods for addressing double mutants. A list of online resources for predicting has also been provided.
The value of nodal information in predicting lung cancer relapse using 4DPET/4DCT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Heyse, E-mail: heyse.li@mail.utoronto.ca; Becker, Nathan; Raman, Srinivas
2015-08-15
Purpose: There is evidence that computed tomography (CT) and positron emission tomography (PET) imaging metrics are prognostic and predictive in non-small cell lung cancer (NSCLC) treatment outcomes. However, few studies have explored the use of standardized uptake value (SUV)-based image features of nodal regions as predictive features. The authors investigated and compared the use of tumor and node image features extracted from the radiotherapy target volumes to predict relapse in a cohort of NSCLC patients undergoing chemoradiation treatment. Methods: A prospective cohort of 25 patients with locally advanced NSCLC underwent 4DPET/4DCT imaging for radiation planning. Thirty-seven image features were derived from the CT-defined volumes and SUVs of the PET image from both the tumor and nodal target regions. The machine learning methods of logistic regression and repeated stratified five-fold cross-validation (CV) were used to predict local and overall relapses in 2 yr. The authors used well-known feature selection methods (Spearman’s rank correlation, recursive feature elimination) within each fold of CV. Classifiers were ranked on their Matthews correlation coefficient (MCC) after CV. Area under the curve, sensitivity, and specificity values are also presented. Results: For predicting local relapse, the best classifier found had a mean MCC of 0.07 and was composed of eight tumor features. For predicting overall relapse, the best classifier found had a mean MCC of 0.29 and was composed of a single feature: the volume greater than 0.5 times the maximum SUV (N). Conclusions: The best classifier for predicting local relapse had only tumor features. In contrast, the best classifier for predicting overall relapse included a node feature. Overall, the methods showed that nodes add value in predicting overall relapse but not local relapse.
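The classifiers in the abstract above are ranked by their correlation coefficient (MCC) computed from binary predictions. As a reading aid, the sketch below computes the standard MCC from the four cells of a binary confusion matrix. The function name, the zero-denominator convention, and the example counts are assumptions for illustration and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts: +1 perfect, 0 chance level, -1 inverted."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a 25-patient cohort, chosen only for illustration.
print(matthews_corrcoef(tp=6, tn=10, fp=5, fn=4))
```

Unlike plain accuracy, this score stays near zero for a classifier that always predicts the majority class, which is why it suits small, unbalanced cohorts.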
The nop gene from Phanerochaete chrysosporium encodes a peroxidase with novel structural features
Luis F. Larrondo; Angel Gonzalez; Tomas Perez-Acle; Dan Cullen; Rafael Vicuna
2005-01-01
Inspection of the genome of the ligninolytic basidiomycete Phanerochaete chrysosporium revealed an unusual peroxidase-like sequence. The corresponding full length cDNA was sequenced and an archetypal secretion signal predicted. The deduced mature protein (NoP, novel peroxidase) contains 295 aa residues and is therefore considerably shorter than other Class II (fungal)...
Structural impact and crashworthiness. Volume 1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davies, G.A.O.
1984-01-01
This volume contains the keynote addresses of those speakers invited to the International Conference on Structural Impact and Crashworthiness held at Imperial College, London, in 1984. The speakers represent authoritative views on topics covering the spectrum of impact and crashworthiness involving several materials. The theme of this book may be summarized as 'understanding/modelling/prediction.' Ultimately a crashworthy design depends on many conceptual decisions being correct in the initial design phase. The overall configuration of a structure may be paramount; the detail design of joints and so on has to enable the structure to exploit energy absorption; the fail-safe features must not be prohibitively expensive.
Wang, Zhenhai; Zhou, Xiang-Feng; Zhang, Xiaoming; Zhu, Qiang; Dong, Huafeng; Zhao, Mingwen; Oganov, Artem R
2015-09-09
Using systematic evolutionary structure searching we propose a new carbon allotrope, phagraphene [fæ'græfi:n], standing for penta-hexa-hepta-graphene, because the structure is composed of 5-6-7 carbon rings. This two-dimensional (2D) carbon structure is lower in energy than most of the predicted 2D carbon allotropes due to its sp(2)-binding features and density of atomic packing comparable to graphene. More interestingly, the electronic structure of phagraphene has distorted Dirac cones. The direction-dependent cones are further proved to be robust against external strain with tunable Fermi velocities.
Scattering by ensembles of small particles experiment, theory and application
NASA Technical Reports Server (NTRS)
Gustafson, B. A. S.
1980-01-01
A hypothetical self-consistent picture of the evolution of prestellar interstellar dust through a comet phase leads to predictions about the composition of the circum-solar dust cloud. Scattering properties of the resulting conglomerates with a bird's-nest type of structure are investigated using a microwave analogue technique. Approximate theoretical methods of general interest are developed which compare favorably with the experimental results. The principal features of scattering of visible radiation by zodiacal light particles are reasonably reproduced. A component which is suggestive of (ALPHA)-meteoroids is also predicted.
Characterizing primary refractory neuroblastoma: prediction of outcome by microscopic image analysis
NASA Astrophysics Data System (ADS)
Niazi, M. Khalid Khan; Weiser, Daniel A.; Pawel, Bruce R.; Gurcan, Metin N.
2015-03-01
Neuroblastoma is a childhood cancer that starts in very early forms of nerve cells found in an embryo or fetus. It is a highly lethal cancer of the sympathetic nervous system that commonly affects children of age five or younger. It accounts for a disproportionate number of childhood cancer deaths and remains a difficult cancer to eradicate despite intensive treatment that includes chemotherapy, surgery, hematopoietic stem cell transplantation, radiation therapy and immunotherapy. A poorly characterized group of patients is the 15% with primary refractory neuroblastoma (PRN), which is uniformly lethal due to de novo chemotherapy resistance. The lack of response to therapy is currently assessed after multiple months of cytotoxic therapy, driving the critical need to develop pretreatment clinic-biological biomarkers that can guide precise and effective therapeutic strategies. Therefore, our guiding hypothesis is that PRN has distinct biological features present at diagnosis that can be identified for prediction modeling. During a visual analysis of PRN slides, stained with hematoxylin and eosin, we observed that slides from patients who survived for less than three years contained large eosin-stained structures as compared to those from patients who survived for more than three years. So, our hypothesis is that the size of eosin-stained structures can be used as a differentiating feature to characterize recurrence in neuroblastoma. To test this hypothesis, we developed an image analysis method that performs stain separation, followed by the detection of large structures stained with eosin. On a set of 21 PRN slides, stained with hematoxylin and eosin, our image analysis method predicted the outcome with 85.7% accuracy.
Extraction of business relationships in supply networks using statistical learning theory.
Zuo, Yi; Kajikawa, Yuya; Mori, Junichiro
2016-06-01
Supply chain management represents one of the most important scientific streams of operations research. The supply of energy, materials, products, and services involves millions of transactions conducted among national and local business enterprises. To deliver efficient and effective support for supply chain design and management, structural analyses and predictive models of customer-supplier relationships are expected to clarify current enterprise business conditions and to help enterprises identify innovative business partners for future success. This article presents the outcomes of a recent structural investigation concerning a supply network in the central area of Japan. We investigated the effectiveness of statistical learning theory to express the individual differences of a supply chain of enterprises within a certain business community using social network analysis. In the experiments, we employ a support vector machine to train a customer-supplier relationship model on one of the main communities extracted from a supply network in the central area of Japan. The prediction results reveal an F-value of approximately 70% when the model is built by using network-based features, and an F-value of approximately 77% when the model is built by using attribute-based features. When we build the model based on both, F-values are improved to approximately 82%. The results of this research can help to dispel the implicit design space concerning customer-supplier relationships, which can be explored and refined from detailed topological information provided by network structures rather than from traditional and attribute-related enterprise profiles. We also investigate and discuss differences in the predictive accuracy of the model for different sizes of enterprises and types of business communities.
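The F-values quoted in this abstract combine precision and recall into one score. The abstract does not state which variant was used; the sketch below assumes the balanced F1 (beta = 1) and uses hypothetical link counts purely for illustration.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract only says "F-value").
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts of predicted customer-supplier links.
score = f_measure(tp=82, fp=18, fn=18)
```

Because the score is a harmonic mean, it stays low unless both precision and recall are high, which makes it a stricter summary than plain accuracy when true links are rare.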
NASA Astrophysics Data System (ADS)
Bhattacharya, Jishnu
We perform first-principles investigations of thermally activated phase transitions and diffusion in solids. The atomic scale energy landscapes are evaluated with first-principles total energy calculations for different structural and configurational microstates. Effective Hamiltonians constructed from the total energies are subjected to Monte Carlo simulations to study thermodynamic and kinetic properties of the solids at finite temperatures. Cubic to tetragonal martensitic phase transitions are investigated beyond the harmonic approximation. As an example, stoichiometric TiH2 is studied where a cubic phase becomes stable at high temperature while ab-initio energy calculations predict the cubic phase to be mechanically unstable with respect to tetragonal distortions at zero Kelvin. An anharmonic Hamiltonian is used to explain the stability of the cubic phase at higher temperature. The importance of anharmonic terms is emphasized and the true nature of the high temperature phase is elucidated beyond the traditional Landau-like explanation. In Li-ion battery electrodes, phase transitions due to atomic redistribution with changes in Li concentration occur with insertion (removal) of Li-ions during discharge (charge). A comprehensive study of the thermodynamics and the non-dilute Li-diffusion mechanisms in spinel-Li1+xTi2O4 is performed. Two distinct phases are predicted at different lithium compositions. The predicted voltage curve qualitatively matches experimental observation. The predicted fast diffusion arises from crystallographic features unique to the spinel crystal structure, elucidating the crucial role of crystal structure in Li diffusion in intercalation compounds. Effects of anion and guest species on diffusion are elucidated with Li- and Cu-diffusion in spinel-LixTiS2. We predict strong composition dependence of the diffusion coefficients.
A unique feature of spinel-LixTiS2 is that the intermediate site of a Li-hop is coordinated by four Li-sites. This results in di- and triple-vacancy mechanisms at non-dilute concentrations with very different migration barriers. The strong dependence of hop mechanisms on local Li-arrangement is at the origin of the large concentration dependence of the diffusion coefficients. This contrasts with spinel-LixTiO2, where the transition states are coordinated only by the end states of the hop, thereby restricting hops to a single vacancy mechanism. Cu ions are predicted to have a much slower diffusion rate in the TiS2 host compared to Li ions.
Tuning transport properties on graphene multiterminal structures by mechanical deformations
NASA Astrophysics Data System (ADS)
Latge, Andrea; Torres, Vanessa; Faria, Daiara
The realization of mechanical strain on graphene structures is viewed as a promising route to tune electronic and transport properties such as changing energy band-gaps and promoting localization of states. Using continuum models, mechanical deformations are described by effective gauge fields, mirrored as pseudomagnetic fields that may reach quite high values. Interesting symmetry features are developed due to out-of-plane deformations on graphene; lifting of the sublattice symmetry was predicted and observed in centrosymmetric bumps and strained nanobubbles. Here we discuss the effects of Gaussian-like strain on a hexagonal graphene flake connected to three leads, modeled as perfect graphene nanoribbons. The Green function formalism is used within a tight-binding approximation. For this particular deformation sharp resonant states are achieved depending on the strained structure details. We also study a fold-strained structure in which the three leads are deformed extending up to the very center of the hexagonal flake. We show that conductance suppressions can be controlled by the strain intensity and important transport features are modeled by the electronic band structure of the leads.
SeqRate: sequence-based protein folding type classification and rates prediction
2010-01-01
Background: Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results: We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec-1) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions: Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647
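The two evaluation figures reported for SeqRate, the Pearson correlation coefficient and the mean absolute difference on a base-10 logarithmic scale, can be sketched as follows. This is an illustrative calculation, not SeqRate code; the folding rates below are hypothetical placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_diff(x, y):
    """Mean absolute difference between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical experimental and predicted folding rates in 1/s.
experimental = [1.0e3, 1.0e1, 1.0e-1, 1.0e2]
predicted = [5.0e2, 3.0e1, 5.0e-1, 4.0e1]

# Both metrics are evaluated on log10(rate), as in the abstract.
log_exp = [math.log10(k) for k in experimental]
log_pred = [math.log10(k) for k in predicted]

r = pearson(log_exp, log_pred)
mad = mean_abs_diff(log_exp, log_pred)
```

Working in log10 units means a mean absolute difference of 0.79 corresponds to predictions that are off by a factor of about six on average, since 10 raised to 0.79 is roughly 6.2.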
Fabrication and Characterization of Woodpile Structures for Direct Laser Acceleration
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGuinness, C.; Colby, E.; England, R.J.
2010-08-26
An eight and nine layer three dimensional photonic crystal with a defect designed specifically for accelerator applications has been fabricated. The structures were fabricated using a combination of nanofabrication techniques, including low pressure chemical vapor deposition, optical lithography, and chemical mechanical polishing. Limits imposed by the optical lithography set the minimum feature size to 400 nm, corresponding to a structure with a bandgap centered at 4.26 µm. Reflection spectroscopy reveals a peak in reflectivity about the predicted region, and good agreement with simulation is shown. The eight and nine layer structures will be aligned and bonded together to form the complete seventeen layer woodpile accelerator structure.
NASA Astrophysics Data System (ADS)
Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron
2017-05-01
This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.
Meyer, Irmtraud M
2017-05-01
RNA transcripts are the primary products of active genes in any living organism, including many viruses. Their cellular destiny not only depends on primary sequence signals, but can also be determined by RNA structure. Recent experimental evidence shows that many transcripts can be assigned more than a single functional RNA structure throughout their cellular life and that structure formation happens co-transcriptionally, i.e. as the transcript is synthesised in the cell. Moreover, functional RNA structures are not limited to non-coding transcripts, but can also feature in coding transcripts. The picture that now emerges is that RNA structures constitute an additional layer of information that can be encoded in any RNA transcript (and on top of other layers of information such as protein-context) in order to exert a wide range of functional roles. Moreover, different encoded RNA structures can be expressed at different stages of a transcript's life in order to alter the transcript's behaviour depending on its actual cellular context. Similar to the concept of alternative splicing for protein-coding genes, where a single transcript can yield different proteins depending on cellular context, it is thus appropriate to propose the notion of alternative RNA structure expression for any given transcript. This review introduces several computational strategies that my group developed to detect different aspects of RNA structure expression in vivo. Two aspects are of particular interest to us: (1) RNA secondary structure features that emerge during co-transcriptional folding and (2) functional RNA structure features that are expressed at different times of a transcript's life and potentially mutually exclusive. Copyright © 2017. Published by Elsevier Inc.
R-chie: a web server and R package for visualizing RNA secondary structures
Lai, Daniel; Proctor, Jeff R.; Zhu, Jing Yun A.; Meyer, Irmtraud M.
2012-01-01
Visually examining RNA structures can greatly aid in understanding their potential functional roles and in evaluating the performance of structure prediction algorithms. As many functional roles of RNA structures can already be studied given the secondary structure of the RNA, various methods have been devised for visualizing RNA secondary structures. Most of these methods depict a given RNA secondary structure as a planar graph consisting of base-paired stems interconnected by roundish loops. In this article, we present an alternative method of depicting RNA secondary structure as arc diagrams. This is well suited for structures that are difficult or impossible to represent as planar stem-loop diagrams. Arc diagrams can intuitively display pseudo-knotted structures, as well as transient and alternative structural features. In addition, they facilitate the comparison of known and predicted RNA secondary structures. An added benefit is that structure information can be displayed in conjunction with a corresponding multiple sequence alignment, thereby highlighting structure and primary sequence conservation and variation. We have implemented the visualization algorithm as a web server R-chie as well as a corresponding R package called R4RNA, which allows users to run the software locally and across a range of common operating systems. PMID:22434875
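An arc diagram draws one arc per base pair over the linear sequence, so its input is simply a list of paired positions. The sketch below is an illustration in Python and is not the R4RNA interface; it assumes the structure is given in dot-bracket notation (an assumption, since the abstract does not name an input format), with square brackets marking a second, possibly crossing, set of pairs.

```python
def dot_bracket_to_pairs(structure):
    """Convert a dot-bracket string into sorted (i, j) base pairs, 0-indexed.

    "(" / ")" and "[" / "]" are tracked on separate stacks so that
    crossing (pseudoknotted) pairs can be expressed.
    """
    closer_to_opener = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closer_to_opener:
            stack = stacks[closer_to_opener[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def arcs_cross(a, b):
    """True if two arcs interleave, which is the signature of a pseudoknot."""
    (i, j), (k, l) = sorted([a, b])
    return i < k < j < l


nested = dot_bracket_to_pairs("((..))")
knotted = dot_bracket_to_pairs("([)]")
```

Nested pairs produce arcs that sit inside one another, whereas crossing arcs cannot be drawn as a planar stem-loop diagram, which is the case the abstract says arc diagrams handle naturally.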
An ensemble model of QSAR tools for regulatory risk assessment.
Pradeep, Prachi; Povinelli, Richard J; White, Shannon; Merrill, Stephen J
2016-01-01
Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model allows for varying a cut-off parameter that allows for a selection in the desirable trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals). 
Leave-one-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %, and balanced accuracy: 80.6 % and 80.8 %) and highest inter-rater agreement [kappa ( κ ): 0.63 and 0.62] for both the datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. This feature provides an additional control to the regulators in grading a chemical based on the severity of the toxic endpoint under study.
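The quantities reported here (accuracy, balanced accuracy, Cohen's kappa) and the cut-off trade-off between sensitivity and specificity can be sketched from a confusion matrix. This is a generic illustration, not the authors' ensemble model; the probabilities, labels and counts are hypothetical.

```python
def confusion_at_cutoff(probs, labels, cutoff):
    """Count (tp, fp, fn, tn) when calling a chemical positive if prob >= cutoff."""
    tp = fp = fn = tn = 0
    for prob, label in zip(probs, labels):
        predicted_positive = prob >= cutoff
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, balanced accuracy and Cohen's kappa for a 2x2 table.

    Assumes both classes are present and that chance agreement is below 1.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": kappa,
    }


# Hypothetical posterior probabilities and true carcinogenicity labels.
probs = [0.9, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0]
strict = confusion_at_cutoff(probs, labels, cutoff=0.5)
lenient = confusion_at_cutoff(probs, labels, cutoff=0.3)
```

Lowering the cut-off converts false negatives into positives at the cost of more false positives, which is the kind of control a regulator might prefer when the toxic endpoint is severe.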
An ensemble model of QSAR tools for regulatory risk assessment
Pradeep, Prachi; Povinelli, Richard J.; White, Shannon; ...
2016-09-22
Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model allows for varying a cut-off parameter that allows for a selection in the desirable trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals).
Leave-one-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %, and balanced accuracy: 80.6 % and 80.8 %) and highest inter-rater agreement [kappa (κ): 0.63 and 0.62] for both the datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. In conclusion, this feature provides an additional control to the regulators in grading a chemical based on the severity of the toxic endpoint under study.
Structure Elucidation of Unknown Metabolites in Metabolomics by Combined NMR and MS/MS Prediction
Boiteau, Rene M.; Hoyt, David W.; Nicora, Carrie D.; ...
2018-01-17
Here, we introduce a cheminformatics approach that combines highly selective and orthogonal structure elucidation parameters; accurate mass, MS/MS (MS2), and NMR in a single analysis platform to accurately identify unknown metabolites in untargeted studies. The approach starts with an unknown LC-MS feature, and then combines the experimental MS/MS and NMR information of the unknown to effectively filter the false positive candidate structures based on their predicted MS/MS and NMR spectra. We demonstrate the approach on a model mixture and then we identify an uncatalogued secondary metabolite in Arabidopsis thaliana. The NMR/MS2 approach is well suited for discovery of new metabolites in plant extracts, microbes, soils, dissolved organic matter, food extracts, biofuels, and biomedical samples, facilitating the identification of metabolites that are not present in experimental NMR and MS metabolomics databases.
System among the corticosteroids: specificity and molecular dynamics
Brookes, Jennifer C.; Galigniana, Mario D.; Harker, Anthony H.; Stoneham, A. Marshall; Vinson, Gavin P.
2012-01-01
Understanding how structural features determine specific biological activities has often proved elusive. With over 161 000 steroid structures described, an algorithm able to predict activity from structural attributes would provide manifest benefits. Molecular simulations of a range of 35 corticosteroids show striking correlations between conformational mobility and biological specificity. Thus steroid ring A is important for glucocorticoid action, and is rigid in the most specific (and potent) examples, such as dexamethasone. By contrast, ring C conformation is important for the mineralocorticoids, and is rigid in aldosterone. Other steroids that are less specific, or have mixed functions, or none at all, are more flexible. One unexpected example is 11-deoxycorticosterone, which the methods predict (and our activity studies confirm) is not only a specific mineralocorticoid, but also has significant glucocorticoid activity. These methods may guide the design of new corticosteroid agonists and antagonists. They will also have application in other examples of ligand–receptor interactions. PMID:21613285
Structure Elucidation of Unknown Metabolites in Metabolomics by Combined NMR and MS/MS Prediction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boiteau, Rene M.; Hoyt, David W.; Nicora, Carrie D.
Here, we introduce a cheminformatics approach that combines highly selective and orthogonal structure elucidation parameters; accurate mass, MS/MS (MS2), and NMR in a single analysis platform to accurately identify unknown metabolites in untargeted studies. The approach starts with an unknown LC-MS feature, and then combines the experimental MS/MS and NMR information of the unknown to effectively filter the false positive candidate structures based on their predicted MS/MS and NMR spectra. We demonstrate the approach on a model mixture and then we identify an uncatalogued secondary metabolite in Arabidopsis thaliana. The NMR/MS2 approach is well suited for discovery of new metabolites in plant extracts, microbes, soils, dissolved organic matter, food extracts, biofuels, and biomedical samples, facilitating the identification of metabolites that are not present in experimental NMR and MS metabolomics databases.
A shock absorber model for structure-borne noise analyses
NASA Astrophysics Data System (ADS)
Benaziz, Marouane; Nacivet, Samuel; Thouverez, Fabrice
2015-08-01
Shock absorbers are often responsible for undesirable structure-borne noise in cars. The early numerical prediction of this noise in the automobile development process can save time and money and yet remains a challenge for industry. In this paper, a new approach to predicting shock absorber structure-borne noise is proposed; it consists in modelling the shock absorber and including the main nonlinear phenomena responsible for discontinuities in the response. The model set forth herein features: compressible fluid behaviour, nonlinear flow rate-pressure relations, valve mechanical equations and rubber mounts. The piston, base valve and complete shock absorber model are compared with experimental results. Sensitivity of the shock absorber response is evaluated and the most important parameters are classified. The response envelope is also computed. This shock absorber model is able to accurately reproduce local nonlinear phenomena and improves our state of knowledge on potential noise sources within the shock absorber.
Structure Elucidation of Unknown Metabolites in Metabolomics by Combined NMR and MS/MS Prediction
Hoyt, David W.; Nicora, Carrie D.; Kinmonth-Schultz, Hannah A.; Ward, Joy K.
2018-01-01
We introduce a cheminformatics approach that combines highly selective and orthogonal structure elucidation parameters; accurate mass, MS/MS (MS2), and NMR into a single analysis platform to accurately identify unknown metabolites in untargeted studies. The approach starts with an unknown LC-MS feature, and then combines the experimental MS/MS and NMR information of the unknown to effectively filter out the false positive candidate structures based on their predicted MS/MS and NMR spectra. We demonstrate the approach on a model mixture, and then we identify an uncatalogued secondary metabolite in Arabidopsis thaliana. The NMR/MS2 approach is well suited to the discovery of new metabolites in plant extracts, microbes, soils, dissolved organic matter, food extracts, biofuels, and biomedical samples, facilitating the identification of metabolites that are not present in experimental NMR and MS metabolomics databases. PMID:29342073
Factors Influencing Progressive Failure Analysis Predictions for Laminated Composite Structure
NASA Technical Reports Server (NTRS)
Knight, Norman F., Jr.
2008-01-01
Progressive failure material modeling methods used for structural analysis including failure initiation and material degradation are presented. Different failure initiation criteria and material degradation models are described that define progressive failure formulations. These progressive failure formulations are implemented in a user-defined material model for use with a nonlinear finite element analysis tool. The failure initiation criteria include the maximum stress criteria, maximum strain criteria, the Tsai-Wu failure polynomial, and the Hashin criteria. The material degradation model is based on the ply-discounting approach where the local material constitutive coefficients are degraded. Applications and extensions of the progressive failure analysis material model address two-dimensional plate and shell finite elements and three-dimensional solid finite elements. Implementation details are described in the present paper. Parametric studies for laminated composite structures are discussed to illustrate the features of the progressive failure modeling methods that have been implemented and to demonstrate their influence on progressive failure analysis predictions.
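Two of the failure initiation criteria listed in this abstract, together with a ply-discounting step, can be sketched for a single ply under plane stress. This is a textbook-style illustration rather than the paper's user-defined material model; the strength values, the choice of the interaction term F12, and the degradation factor are all assumptions.

```python
import math


def max_stress_failed(s1, s2, t12, Xt, Xc, Yt, Yc, S):
    """Maximum stress criterion; strengths are positive magnitudes."""
    return (s1 >= Xt or s1 <= -Xc or
            s2 >= Yt or s2 <= -Yc or
            abs(t12) >= S)


def tsai_wu_index(s1, s2, t12, Xt, Xc, Yt, Yc, S):
    """Tsai-Wu failure polynomial for plane stress; failure when index >= 1."""
    F1 = 1.0 / Xt - 1.0 / Xc
    F2 = 1.0 / Yt - 1.0 / Yc
    F11 = 1.0 / (Xt * Xc)
    F22 = 1.0 / (Yt * Yc)
    F66 = 1.0 / (S * S)
    # Interaction term: a commonly used default, assumed here because the
    # abstract does not specify how F12 is chosen.
    F12 = -0.5 * math.sqrt(F11 * F22)
    return (F1 * s1 + F2 * s2 + F11 * s1 ** 2 + F22 * s2 ** 2
            + F66 * t12 ** 2 + 2.0 * F12 * s1 * s2)


def discount_ply(moduli, factor=1.0e-6):
    """Ply discounting: scale the ply stiffness coefficients after failure.

    `factor` is a hypothetical degradation factor, kept non-zero so that
    the stiffness matrix stays invertible.
    """
    return {name: value * factor for name, value in moduli.items()}


# Hypothetical unidirectional ply strengths and moduli in MPa.
strengths = dict(Xt=1500.0, Xc=1200.0, Yt=50.0, Yc=200.0, S=70.0)
moduli = {"E1": 140000.0, "E2": 10000.0, "G12": 5000.0}

index = tsai_wu_index(400.0, 30.0, 20.0, **strengths)
if index >= 1.0 or max_stress_failed(400.0, 30.0, 20.0, **strengths):
    moduli = discount_ply(moduli)
```

In a progressive failure analysis this check would run at each material point and load increment, with the degraded coefficients fed back into the next equilibrium iteration.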
Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.
Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi
2017-09-22
DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.
Fronto-Temporal Connectivity Predicts ECT Outcome in Major Depression.
Leaver, Amber M; Wade, Benjamin; Vasavada, Megha; Hellemann, Gerhard; Joshi, Shantanu H; Espinoza, Randall; Narr, Katherine L
2018-01-01
Electroconvulsive therapy (ECT) is arguably the most effective available treatment for severe depression. Recent studies have used MRI data to predict clinical outcome to ECT and other antidepressant therapies. One challenge facing such studies is selecting from among the many available metrics, which characterize complementary and sometimes non-overlapping aspects of brain function and connectomics. Here, we assessed the ability of aggregated, functional MRI metrics of basal brain activity and connectivity to predict antidepressant response to ECT using machine learning. A radial support vector machine was trained using arterial spin labeling (ASL) and blood-oxygen-level-dependent (BOLD) functional magnetic resonance imaging (fMRI) metrics from n = 46 (26 female, mean age 42) depressed patients prior to ECT (majority right-unilateral stimulation). Image preprocessing was applied using standard procedures, and metrics included cerebral blood flow in ASL, and regional homogeneity, fractional amplitude of low-frequency modulations, and graph theory metrics (strength, local efficiency, and clustering) in BOLD data. A 5-repeated 5-fold cross-validation procedure with nested feature-selection validated model performance. Linear regressions were applied post hoc to aid interpretation of discriminative features. The range of balanced accuracy in models performing statistically above chance was 58-68%. Here, prediction of non-responders was slightly higher than for responders (maximum performance 74 and 64%, respectively). Several features were consistently selected across cross-validation folds, mostly within frontal and temporal regions. Among these were connectivity strength among: a fronto-parietal network [including left dorsolateral prefrontal cortex (DLPFC)], motor and temporal networks (near ECT electrodes), and/or subgenual anterior cingulate cortex (sgACC). 
Our data indicate that pattern classification of multimodal fMRI metrics can successfully predict ECT outcome, particularly for individuals who will not respond to treatment. Notably, connectivity with networks highly relevant to ECT and depression were consistently selected as important predictive features. These included the left DLPFC and the sgACC, which are both targets of other neurostimulation therapies for depression, as well as connectivity between motor and right temporal cortices near electrode sites. Future studies that probe additional functional and structural MRI metrics and other patient characteristics may further improve the predictive power of these and similar models.
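The evaluation scheme described above (5-repeated 5-fold cross-validation scored by balanced accuracy) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names, the random seed, and the toy labels are assumptions, and the classifier and nested feature selection are left out. In a nested scheme, feature selection would be fitted on each training split only.

```python
import random

def repeated_kfold(n_samples, n_folds=5, n_repeats=5, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for k in range(n_folds):
            test = order[k::n_folds]          # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            # Nested feature selection would be fitted on `train` only here.
            yield train, test

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so both classes count equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical toy labels (1 = responder, 0 = non-responder), not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(balanced_accuracy(y_true, y_pred))
print(sum(1 for _ in repeated_kfold(46)))  # number of train/test splits
```

Balanced accuracy is a natural score here because responders and non-responders need not be equally frequent, and plain accuracy would reward always predicting the majority class.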
Visual Prediction Error Spreads Across Object Features in Human Visual Cortex
Summerfield, Christopher; Egner, Tobias
2016-01-01
Visual cognition is thought to rely heavily on contextual expectations. Accordingly, previous studies have revealed distinct neural signatures for expected versus unexpected stimuli in visual cortex. However, it is presently unknown how the brain combines multiple concurrent stimulus expectations such as those we have for different features of a familiar object. To understand how an unexpected object feature affects the simultaneous processing of other expected feature(s), we combined human fMRI with a task that independently manipulated expectations for color and motion features of moving-dot stimuli. Behavioral data and neural signals from visual cortex were then interrogated to adjudicate between three possible ways in which prediction error (surprise) in the processing of one feature might affect the concurrent processing of another, expected feature: (1) feature processing may be independent; (2) surprise might “spread” from the unexpected to the expected feature, rendering the entire object unexpected; or (3) pairing a surprising feature with an expected feature might promote the inference that the two features are not in fact part of the same object. To formalize these rival hypotheses, we implemented them in a simple computational model of multifeature expectations. Across a range of analyses, behavior and visual neural signals consistently supported a model that assumes a mixing of prediction error signals across features: surprise in one object feature spreads to its other feature(s), thus rendering the entire object unexpected. These results reveal neurocomputational principles of multifeature expectations and indicate that objects are the unit of selection for predictive vision. SIGNIFICANCE STATEMENT We address a key question in predictive visual cognition: how does the brain combine multiple concurrent expectations for different features of a single object such as its color and motion trajectory? 
By combining a behavioral protocol that independently varies expectation of (and attention to) multiple object features with computational modeling and fMRI, we demonstrate that behavior and fMRI activity patterns in visual cortex are best accounted for by a model in which prediction error in one object feature spreads to other object features. These results demonstrate how predictive vision forms object-level expectations out of multiple independent features. PMID:27810936
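The contrast between independent feature processing and the "spreading" account can be illustrated with a toy calculation. The abstract does not specify the authors' computational model, so everything here is an assumption: surprise is taken to be Shannon surprise, the mixing weight `w` is hypothetical, and the third hypothesis (inferring that the features belong to different objects) is omitted.

```python
import math

def surprise(prob):
    """Shannon surprise (bits) of an outcome that was expected with probability prob."""
    return -math.log2(prob)

def independent(pe_color, pe_motion):
    """Hypothesis 1: each feature keeps its own prediction error."""
    return pe_color, pe_motion

def spread(pe_color, pe_motion, w=0.5):
    """Hypothesis 2: each feature's signal mixes in the other feature's error.

    w is a hypothetical mixing weight; w = 0 recovers the independent case.
    """
    mixed_color = (1 - w) * pe_color + w * pe_motion
    mixed_motion = (1 - w) * pe_motion + w * pe_color
    return mixed_color, mixed_motion

# Toy trial: the color was expected (p = 0.75), the motion was not (p = 0.25).
pe_color, pe_motion = surprise(0.75), surprise(0.25)
print(independent(pe_color, pe_motion))
print(spread(pe_color, pe_motion))
```

Under the spreading account, the signal for the expected color rises because the motion was surprising, which is the qualitative pattern the abstract reports.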
NASA Astrophysics Data System (ADS)
Chan, Lie Ping
The understanding of the electronic structure of the high-T_c superconductors could be important for a full theoretical description of the mechanism behind superconductivity in these materials. In this thesis, we present our measurements of the positron-electron momentum distributions of the cuprate superconductors Bi_2Sr_2CaCu_2O_8 and Tl_2Ba_2Ca_2Cu_3O_{10}, and of the organic superconductor kappa-(BEDT)_2Cu(NCS)_2. We use the positron Two-dimensional Angular Correlation of Annihilation Radiation technique to make the measurements on single crystals and compare our high-statistics data with band structure calculations to determine the existence and nature of the respective Fermi surfaces. The spectra from unannealed Bi_2Sr_2CaCu_2O_8 exhibit effects of the superlattice modulation in the BiO_2 layers, and a theoretical understanding of the modulation effects on the electronic band structure is required to interpret these spectra. Since the present theory does not consider the modulation, we have developed a technique to remove the modulation effects from our spectra, and the resultant data, when compared with the positron-electron momentum distribution calculation, yield features consistent with the predicted CuO_2 and BiO_2 Fermi surfaces. In the data from unannealed Tl_2Ba_2Ca_2Cu_3O_{10}, we observe only indications of the TlO Fermi surfaces, and attribute the absence of the predicted CuO_2 Fermi surfaces to the poor sample quality. In the absence of positron-electron momentum calculations for kappa-(BEDT)_2Cu(NCS)_2, we compare our data to electronic band structure calculations, and observe features suggestive of the predicted Fermi surface contributions from the BEDT cation layers. A complete positron-electron calculation for kappa-(BEDT)_2Cu(NCS)_2 is required to understand the positron wavefunction effects in this material.
Jelínek, Jan; Škoda, Petr; Hoksza, David
2017-12-06
Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
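The knowledge-base idea described above (encode each residue's structural neighborhood, count how often that pattern occurs as interface or non-interface, then label new residues by frequency) can be sketched as follows. This is not the INSPiRE implementation: the encoding, the 0.5 threshold, the handling of unseen patterns, and all names are assumptions made for illustration. The Matthews correlation coefficient used for evaluation is included in its standard form.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_neighborhood(residue, neighbors):
    """Hypothetical encoding: central residue plus a 20-bit presence string."""
    bits = "".join("1" if aa in neighbors else "0" for aa in AMINO_ACIDS)
    return residue + ":" + bits

class KnowledgeBase:
    """Counts how often each encoded neighborhood is seen as interface or not."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # key -> [non-interface, interface]

    def add(self, key, is_interface):
        self.counts[key][int(is_interface)] += 1

    def predict(self, key, threshold=0.5):
        non, inter = self.counts.get(key, (0, 0))
        if non + inter == 0:
            return False  # assumption: unseen patterns default to non-interface
        return inter / (non + inter) >= threshold

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

kb = KnowledgeBase()
key = encode_neighborhood("K", ["A", "C"])
kb.add(key, True)
kb.add(key, True)
kb.add(key, False)
print(kb.predict(key))
```

A frequency lookup of this kind needs no model training, which is why the size and type of the encoded neighborhood become the main design parameters.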
modPDZpep: a web resource for structure based analysis of human PDZ-mediated interaction networks.
Sain, Neetu; Mohanty, Debasisa
2016-09-21
PDZ domains recognize short sequence stretches usually present at the C-terminus of their interaction partners. Because of the involvement of PDZ domains in many important biological processes, several attempts have been made to develop bioinformatics tools for genome-wide identification of PDZ interaction networks. Currently available tools for predicting interaction partners of PDZ domains use a machine learning approach. Since they have been trained using experimental substrate specificity data for specific PDZ families, their applicability is limited to PDZ families closely related to the training set. These tools also do not allow analysis of PDZ-peptide interaction interfaces. We have used a structure based approach to develop modPDZpep, a program to predict the interaction partners of human PDZ domains and analyze structural details of PDZ interaction interfaces. modPDZpep predicts interaction partners by using structural models of PDZ-peptide complexes and evaluating binding energy scores using residue based statistical pair potentials. Since it does not require training using experimental data on peptide binding affinity, it can predict substrates for diverse PDZ families. Because it uses a simple scoring function for binding energy, it is also fast enough for genome scale structure based analysis of PDZ interaction networks. Benchmarking using artificial as well as real negative datasets indicates good predictive power, with ROC-AUC values in the range of 0.7 to 0.9 for a large number of human PDZ domains. Another novel feature of modPDZpep is its ability to map novel PDZ mediated interactions in human protein-protein interaction networks, either by utilizing available experimental phage display data or by structure based predictions. In summary, we have developed modPDZpep, a web-server for structure based analysis of human PDZ domains. It is freely available at http://www.nii.ac.in/modPDZpep.html or http://202.54.226.235/modPDZpep.html.
This article was reviewed by Michael Gromiha and Zoltán Gáspári.
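The ROC-AUC values quoted in the modPDZpep benchmark measure how well a score separates true binders from negatives. A minimal sketch of the pairwise (Mann-Whitney) form of the statistic follows; the function name and the toy scores are assumptions, and the scores are oriented so that higher means stronger predicted binding (an energy-like score, where lower is better, would be negated first).

```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties contribute 0.5, which matches the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores for binding and non-binding peptides, not benchmark data.
binders = [0.9, 0.8, 0.4]
non_binders = [0.7, 0.3, 0.2]
print(roc_auc(binders, non_binders))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported range of 0.7 to 0.9 sits between the two.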
Algorithm-Dependent Generalization Bounds for Multi-Task Learning.
Liu, Tongliang; Tao, Dacheng; Song, Mingli; Maybank, Stephen J
2017-02-01
Often, tasks are collected for multi-task learning (MTL) because they share similar feature structures. Based on this observation, in this paper, we present novel algorithm-dependent generalization bounds for MTL by exploiting the notion of algorithmic stability. We focus on the performance of one particular task and the average performance over multiple tasks by analyzing the generalization ability of a common parameter that is shared in MTL. When focusing on one particular task, with the help of a mild assumption on the feature structures, we interpret the function of the other tasks as a regularizer that produces a specific inductive bias. The algorithm for learning the common parameter, as well as the predictor, is thereby uniformly stable with respect to the domain of the particular task and has a generalization bound with a fast convergence rate of order O(1/n), where n is the sample size of the particular task. When focusing on the average performance over multiple tasks, we prove that a similar inductive bias exists under certain conditions on the feature structures. Thus, the corresponding algorithm for learning the common parameter is also uniformly stable with respect to the domains of the multiple tasks, and its generalization bound is of the order O(1/T), where T is the number of tasks. These theoretical analyses naturally show that the similarity of feature structures in MTL will lead to specific regularizations for predicting, which enables the learning algorithms to generalize fast and correctly from a few examples.
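For reference, the notion of uniform stability invoked above is usually stated as follows. This is the standard textbook form, given here as an aid to reading; the symbols $A$, $S$, $S^{i}$, $\ell$, and $\beta$ are notation assumed for this note, and the paper's own definitions and constants may differ.

```latex
% A learning algorithm A, trained on a sample S = (z_1, \dots, z_n), is
% \beta-uniformly stable with respect to a loss \ell if replacing any single
% training example changes the loss at any test point z by at most \beta:
\[
  \sup_{z}\,\bigl|\ell(A_{S}, z) - \ell(A_{S^{i}}, z)\bigr| \le \beta
  \qquad \text{for all } S \text{ and all } i \in \{1, \dots, n\},
\]
% where S^{i} denotes S with its i-th example replaced by an arbitrary one.
```

The smaller $\beta$ is as a function of the sample size, the less the learned predictor depends on any single example, which is what allows stability to be converted into a generalization bound.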