Sample records for molecular features predicting

  1. Asymmetric bagging and feature selection for activities prediction of drug molecules.

    PubMed

    Li, Guo-Zheng; Meng, Hao-Hua; Lu, Wen-Cong; Yang, Jack Y; Yang, Mary Qu

    2008-05-28

    Activities of drug molecules can be predicted by QSAR (quantitative structure-activity relationship) models, which avoid the high cost and long cycle time of the traditional experimental method. Because far fewer drug molecules have positive activity than negative activity, it is important to account for this imbalance when predicting molecular activities. Here, asymmetric bagging and feature selection are introduced into the problem, and asymmetric bagging of support vector machines (asBagging) is proposed to handle the imbalance when predicting drug activities. At the same time, the features extracted from the structures of drug molecules affect the prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experimental results on a data set of molecular activities show that asBagging improves the AUC and sensitivity values of molecular activity prediction, and that PRIFEAB with feature selection further improves the prediction ability. Asymmetric bagging can help to improve the prediction accuracy of activities of drug molecules, which can be further improved by performing feature selection to select relevant features from the drug molecule data sets.
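
    As a rough illustration of the asymmetric bagging idea, the sketch below assumes one common formulation: every bag keeps all positive examples and bootstraps an equal number of negatives, and the bag-level classifiers vote. The nearest-centroid base learner, the function names, and the toy data are assumptions made for illustration; the paper itself uses support vector machines, and its exact sampling scheme is not given in the abstract.

```python
import random


def asymmetric_bags(positives, negatives, n_bags, seed=0):
    """Keep every positive; bootstrap (with replacement) as many negatives."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        sampled_neg = [rng.choice(negatives) for _ in positives]
        bags.append((list(positives), sampled_neg))
    return bags


def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroid_classifier(pos, neg):
    """Stand-in base learner (assumption): predict the nearer class centroid."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: 1 if sq_dist(x, c_pos) <= sq_dist(x, c_neg) else 0


def as_bagging_predict(models, x):
    """Majority vote over the bag-level classifiers."""
    votes = sum(m(x) for m in models)
    return 1 if 2 * votes >= len(models) else 0


# Toy unbalanced data: 3 positives, 12 negatives (hypothetical values).
positives = [(2.0, 2.1), (1.8, 2.4), (2.2, 1.9)]
negatives = [(0.1 * i, 0.05 * i) for i in range(12)]

bags = asymmetric_bags(positives, negatives, n_bags=5)
models = [train_centroid_classifier(p, n) for p, n in bags]
print(as_bagging_predict(models, (2.0, 2.0)), as_bagging_predict(models, (0.0, 0.0)))
```

    Each base learner sees a balanced training set, so the majority class cannot dominate any single model.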

  2. Structural features that predict real-value fluctuations of globular proteins

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-01-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193
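
    The two figures of merit quoted above, Pearson's correlation coefficient and root mean square error, can be computed as in the following minimal sketch. The function names and the sample fluctuation values are illustrative assumptions, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rmse(xs, ys):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical per-residue fluctuations in angstroms (made-up numbers).
observed = [0.5, 0.8, 1.2, 2.0, 3.1]
predicted = [0.6, 0.7, 1.5, 1.8, 2.9]
print(pearson_r(observed, predicted), rmse(observed, predicted))
```

    Correlation rewards getting the ranking and shape of the fluctuation profile right, while RMSE also penalizes errors in absolute magnitude, which is why a real-valued predictor is reported with both.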

  3. Communication: Finding destructive interference features in molecular transport junctions.

    PubMed

    Reuter, Matthew G; Hansen, Thorsten

    2014-11-14

    Associating molecular structure with quantum interference features in electrode-molecule-electrode transport junctions has been difficult because existing guidelines for understanding interferences only apply to conjugated hydrocarbons. Herein we use linear algebra and the Landauer-Büttiker theory for electron transport to derive a general rule for predicting the existence and locations of interference features. Our analysis illustrates that interferences can be directly determined from the molecular Hamiltonian and the molecule-electrode couplings, and we demonstrate its utility with several examples.
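
    For orientation, the standard textbook form of the Landauer-Büttiker transmission through a junction is sketched below. The symbols (H for the molecular Hamiltonian, Σ for the electrode self-energies, Γ for the couplings) follow common usage and are assumptions here; the paper's own general rule is not reproduced in the abstract.

```latex
\begin{align}
  G(E) &= \bigl[\, E\,I - H - \Sigma_L(E) - \Sigma_R(E) \,\bigr]^{-1}, \\
  \Gamma_\alpha(E) &= i\bigl[\, \Sigma_\alpha(E) - \Sigma_\alpha^\dagger(E) \,\bigr],
      \qquad \alpha \in \{L, R\}, \\
  T(E) &= \operatorname{Tr}\bigl[\, \Gamma_L(E)\, G(E)\, \Gamma_R(E)\, G^\dagger(E) \,\bigr].
\end{align}
```

    In this picture a destructive interference feature appears at an energy E where T(E) vanishes or is strongly suppressed, which depends only on H and the molecule-electrode couplings.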

  4. Breast cancer molecular subtype classification using deep features: preliminary results

    NASA Astrophysics Data System (ADS)

    Zhu, Zhe; Albadawy, Ehab; Saha, Ashirbani; Zhang, Jun; Harowicz, Michael R.; Mazurowski, Maciej A.

    2018-02-01

    Radiogenomics is a field of investigation that attempts to examine the relationship between imaging characteristics of cancerous lesions and their genomic composition. This could offer a noninvasive alternative to establishing genomic characteristics of tumors and aid cancer treatment planning. While deep learning has shown its superiority in many detection and classification tasks, breast cancer radiogenomic data suffers from a very limited number of training examples, which renders the training of the neural network for this problem directly and with no pretraining a very difficult task. In this study, we investigated an alternative deep learning approach referred to as deep features or off-the-shelf network approach to classify breast cancer molecular subtypes using breast dynamic contrast enhanced MRIs. We used the feature maps of different convolution layers and fully connected layers as features and trained support vector machines using these features for prediction. For the feature maps that have multiple layers, max-pooling was performed along each channel. We focused on distinguishing the Luminal A subtype from other subtypes. To evaluate the models, 10-fold cross-validation was performed and the final AUC was obtained by averaging the performance of all the folds. The highest average AUC obtained was 0.64 (0.95 CI: 0.57-0.71), using the feature maps of the last fully connected layer. This indicates the promise of using this approach to predict the breast cancer molecular subtypes. Since the best performance appears in the last fully connected layer, it also implies that breast cancer molecular subtypes may relate to high level image features.
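
    The evaluation step described above (one AUC per cross-validation fold, then the mean across folds) can be sketched as follows. The pairwise AUC definition is standard; the function names and the two toy folds are assumptions for illustration and are unrelated to the study's data.

```python
def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_fold_auc(folds):
    """Average the per-fold AUCs, as in k-fold cross-validation reporting."""
    return sum(auc(y, s) for y, s in folds) / len(folds)


# Two hypothetical held-out folds: (true labels, classifier scores).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1]),
]
print(mean_fold_auc(folds))
```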

  5. Structural features that predict real-value fluctuations of globular proteins.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.

  6. Prediction of interface residue based on the features of residue interaction network.

    PubMed

    Jiao, Xiong; Ranganathan, Shoba

    2017-11-07

    Protein-protein interaction plays a crucial role in cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize interface residues based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction, with 16 network parameters and the B-factor serving as the elements of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features is also analyzed to identify the key features for the prediction. Some biological meaning of the important features is explained. The results of this work can be used in related work on structure-function relationship analysis via a residue interaction network model. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Prediction of lysine glutarylation sites by maximum relevance minimum redundancy feature selection.

    PubMed

    Ju, Zhe; He, Jian-Jun

    2018-06-01

    Lysine glutarylation is a new type of protein acylation modification in both prokaryotes and eukaryotes. To better understand the molecular mechanism of glutarylation, it is important to identify glutarylated substrates and their corresponding glutarylation sites accurately. In this study, a novel bioinformatics tool named GlutPred is developed to predict glutarylation sites by using multiple feature extraction and maximum relevance minimum redundancy feature selection. On the one hand, amino acid factors, binary encoding, and the composition of k-spaced amino acid pairs features are incorporated to encode glutarylation sites, and the maximum relevance minimum redundancy method and the incremental feature selection algorithm are adopted to remove the redundant features. On the other hand, a biased support vector machine algorithm is used to handle the class imbalance in the glutarylation site training dataset. As illustrated by 10-fold cross-validation, GlutPred achieves a satisfactory performance with a Sensitivity of 64.80%, a Specificity of 76.60%, an Accuracy of 74.90% and a Matthews correlation coefficient of 0.3194. Feature analysis shows that some k-spaced amino acid pair features play the most important roles in the prediction of glutarylation sites. The conclusions derived from this study might provide some clues for understanding the molecular mechanisms of glutarylation. Copyright © 2018 Elsevier Inc. All rights reserved.

  8. Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.

    PubMed

    Liu, Zhi-Ping; Chen, Luonan

    2016-01-01

    Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time-consuming and labor-intensive. Thus, it is important to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognition will greatly help to decipher the interaction mechanisms between protein and RNA, as well as to improve RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies for predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on the feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from the perspective of structural complementarity.

  9. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

    PubMed

    Funk, Christopher S; Kahanda, Indika; Ben-Hur, Asa; Verspoor, Karin M

    2015-01-01

    Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature based features are useful for predicting human protein function (F-max: Molecular Function =0.408, Biological Process =0.461, Cellular Component =0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.

  10. Response monitoring of breast cancer patients receiving neoadjuvant chemotherapy using quantitative ultrasound, texture, and molecular features

    PubMed Central

    Gangeh, Mehrdad; Tadayyon, Hadi; Sadeghi-Naini, Ali; Gandhi, Sonal; Wright, Frances C.; Slodkowska, Elzbieta; Curpen, Belinda; Tran, William; Czarnota, Gregory J.

    2018-01-01

    Background: Pathological response of breast cancer to chemotherapy is a prognostic indicator for long-term disease-free and overall survival. Responses of locally advanced breast cancer in the neoadjuvant chemotherapy (NAC) settings are often variable, and the prediction of response is imperfect. The purpose of this study was to detect primary tumor responses early after the start of neoadjuvant chemotherapy using quantitative ultrasound (QUS), textural analysis and molecular features in patients with locally advanced breast cancer. Methods: The study included ninety-six patients treated with neoadjuvant chemotherapy. Breast tumors were scanned with a clinical ultrasound system prior to chemotherapy treatment, during the first, fourth and eighth week of treatment, and prior to surgery. Quantitative ultrasound parameters and scatterer-based features were calculated from ultrasound radio frequency (RF) data within tumor regions of interest. Additionally, texture features were extracted from QUS parametric maps. Prior to therapy, all patients underwent a core needle biopsy, and histological subtypes and biomarker ER, PR, and HER2 status were determined. Patients were classified into three treatment response groups based on a combination of clinical and pathological analyses: complete responders (CR), partial responders (PR), and non-responders (NR). Response classifications from QUS parameters, receptor status, and pathological analysis were compared. Discriminant analysis was performed on extracted parameters using a support vector machine classifier to categorize subjects into CR, PR, and NR groups at all scan times. Results: Of the 96 patients, the numbers of CR, PR and NR patients were 21, 52, and 23, respectively. The best prediction of treatment response was achieved with the combination of mean QUS values, texture and molecular features, with accuracies of 78%, 86% and 83% at weeks 1, 4, and 8 after treatment, respectively. Mean QUS parameters or clinical receptors status alone

  11. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques

    PubMed Central

    Macyszyn, Luke; Akbari, Hamed; Pisapia, Jared M.; Da, Xiao; Attiah, Mark; Pigrish, Vadim; Bi, Yingtao; Pal, Sharmistha; Davuluri, Ramana V.; Roccograndi, Laura; Dahmane, Nadia; Martinez-Lage, Maria; Biros, George; Wolf, Ronald L.; Bilello, Michel; O'Rourke, Donald M.; Davatzikos, Christos

    2016-01-01

    Background: MRI characteristics of brain gliomas have been used to predict clinical outcome and molecular tumor characteristics. However, previously reported imaging biomarkers have not been sufficiently accurate or reproducible to enter routine clinical practice and often rely on relatively simple MRI measures. The current study leverages advanced image analysis and machine learning algorithms to identify complex and reproducible imaging patterns predictive of overall survival and molecular subtype in glioblastoma (GB). Methods: One hundred five patients with GB were first used to extract approximately 60 diverse features from preoperative multiparametric MRIs. These imaging features were used by a machine learning algorithm to derive imaging predictors of patient survival and molecular subtype. Cross-validation ensured generalizability of these predictors to new patients. Subsequently, the predictors were evaluated in a prospective cohort of 29 new patients. Results: Survival curves yielded a hazard ratio of 10.64 for predicted long versus short survivors. The overall, 3-way (long/medium/short survival) accuracy in the prospective cohort approached 80%. Classification of patients into the 4 molecular subtypes of GB achieved 76% accuracy. Conclusions: By employing machine learning techniques, we were able to demonstrate that imaging patterns are highly predictive of patient survival. Additionally, we found that GB subtypes have distinctive imaging phenotypes. These results reveal that when imaging markers related to infiltration, cell density, microvascularity, and blood–brain barrier compromise are integrated via advanced pattern analysis methods, they form very accurate predictive biomarkers. These predictive markers used solely preoperative images, hence they can significantly augment diagnosis and treatment of GB patients. PMID:26188015

  12. Sparse feature selection for classification and prediction of metastasis in endometrial cancer.

    PubMed

    Ahsen, Mehmet Eren; Boren, Todd P; Singh, Nitin K; Misganaw, Burook; Mutch, David G; Moore, Kathleen N; Backes, Floor J; McCourt, Carolyn K; Lea, Jayanthi S; Miller, David S; White, Michael A; Vidyasagar, Mathukumalli

    2017-03-27

    Metastasis via pelvic and/or para-aortic lymph nodes is a major risk factor for endometrial cancer. Lymph-node resection ameliorates risk but is associated with significant co-morbidities. Incidence in patients with stage I disease is 4-22% but no mechanism exists to accurately predict it. Therefore, national guidelines for primary staging surgery include pelvic and para-aortic lymph node dissection for all patients whose tumor exceeds 2cm in diameter. We sought to identify a robust molecular signature that can accurately classify risk of lymph node metastasis in endometrial cancer patients. 86 tumors matched for age and race, and evenly distributed between lymph node-positive and lymph node-negative cases, were selected as a training cohort. Genomic micro-RNA expression was profiled for each sample to serve as the predictive feature matrix. An independent set of 28 tumor samples was collected and similarly characterized to serve as a test cohort. A feature selection algorithm was designed for applications where the number of samples is far smaller than the number of measured features per sample. A predictive miRNA expression signature was developed using this algorithm, which was then used to predict the metastatic status of the independent test cohort. A weighted classifier, using 18 micro-RNAs, achieved 100% accuracy on the training cohort. When applied to the testing cohort, the classifier correctly predicted 90% of node-positive cases, and 80% of node-negative cases (FDR = 6.25%). Results indicate that the evaluation of the quantitative sparse-feature classifier proposed here in clinical trials may lead to significant improvement in the prediction of lymphatic metastases in endometrial cancer patients.

  13. MOlecular MAterials Property Prediction Package (MOMAP) 1.0: a software package for predicting the luminescent properties and mobility of organic functional materials

    NASA Astrophysics Data System (ADS)

    Niu, Yingli; Li, Wenqiang; Peng, Qian; Geng, Hua; Yi, Yuanping; Wang, Linjun; Nan, Guangjun; Wang, Dong; Shuai, Zhigang

    2018-04-01

    MOlecular MAterials Property Prediction Package (MOMAP) is a software toolkit for molecular materials property prediction. It focuses on luminescent properties and charge mobility properties. This article contains a brief descriptive introduction of key features, theoretical models and algorithms of the software, together with examples that illustrate the performance. First, we present the theoretical models and algorithms for molecular luminescent properties calculation, which includes the excited-state radiative/non-radiative decay rate constant and the optical spectra. Then, a multi-scale simulation approach and its algorithm for the molecular charge mobility are described. This approach is based on the hopping model combined with kinetic Monte Carlo and molecular dynamics simulations, and it is especially applicable to a large category of organic semiconductors whose inter-molecular electronic coupling is much smaller than the intra-molecular charge reorganisation energy.

  14. Prediction of lysine ubiquitylation with ensemble classifier and feature selection.

    PubMed

    Zhao, Xiaowei; Li, Xiangtao; Ma, Zhiqiang; Yin, Minghao

    2011-01-01

    Ubiquitylation is an important process of post-translational modification. Correct identification of protein lysine ubiquitylation sites is of fundamental importance to understand the molecular mechanism of lysine ubiquitylation in biological systems. This paper develops a novel computational method to effectively identify the lysine ubiquitylation sites based on the ensemble approach. In the proposed method, 468 ubiquitylation sites from 323 proteins retrieved from the Swiss-Prot database were encoded into feature vectors by using four kinds of protein sequences information. An effective feature selection method was then applied to extract informative feature subsets. After different feature subsets were obtained by setting different starting points in the search procedure, they were used to train multiple random forests classifiers and then aggregated into a consensus classifier by majority voting. Evaluated by jackknife tests and independent tests respectively, the accuracy of the proposed predictor reached 76.82% for the training dataset and 79.16% for the test dataset, indicating that this predictor is a useful tool to predict lysine ubiquitylation sites. Furthermore, site-specific feature analysis was performed and it was shown that ubiquitylation is intimately correlated with the features of its surrounding sites in addition to features derived from the lysine site itself. The feature selection method is available upon request.
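
    The aggregation step described above, in which several classifiers trained on different feature subsets are combined into a consensus by majority voting, can be sketched as follows. The function name and the three hypothetical prediction lists are assumptions; the random forest training itself is omitted.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (aligned by sample) into one consensus list."""
    consensus = []
    for votes in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label.
        consensus.append(Counter(votes).most_common(1)[0][0])
    return consensus


# Hypothetical outputs of three classifiers on four candidate lysine sites
# (1 = predicted ubiquitylation site, 0 = not a site).
clf_a = [1, 0, 1, 0]
clf_b = [1, 1, 0, 0]
clf_c = [0, 1, 1, 0]
print(majority_vote([clf_a, clf_b, clf_c]))
```

    Using an odd number of binary classifiers avoids tied votes.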

  15. Predictive and Prognostic Molecular Biomarkers for Response to Neoadjuvant Chemoradiation in Rectal Cancer.

    PubMed

    Dayde, Delphine; Tanaka, Ichidai; Jain, Rekha; Tai, Mei Chee; Taguchi, Ayumu

    2017-03-07

    The standard of care in locally advanced rectal cancer is neoadjuvant chemoradiation (nCRT) followed by radical surgery. Response to nCRT varies among patients, and pathological complete response is associated with better outcome. However, there is a lack of effective methods to select rectal cancer patients who would or would not benefit from nCRT. The utility of clinicopathological and radiological features is limited due to a lack of adequate sensitivity and specificity. Molecular biomarkers have the potential to predict response to nCRT at an early time point, but none have currently reached the clinic. Integration of diverse types of biomarkers including clinicopathological and imaging features, identification of mechanistic links to tumor biology, and rigorous validation using samples which represent disease heterogeneity will allow the development of a sensitive and cost-effective molecular biomarker panel for precision medicine in rectal cancer. Here, we aim to review recent advances in tissue- and blood-based molecular biomarker research and illustrate their potential in predicting nCRT response in rectal cancer.

  16. Learning through Feature Prediction: An Initial Investigation into Teaching Categories to Children with Autism through Predicting Missing Features

    ERIC Educational Resources Information Center

    Sweller, Naomi

    2015-01-01

    Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…

  17. Molecular Docking for Prediction and Interpretation of Adverse Drug Reactions.

    PubMed

    Luo, Heng; Fokoue-Nkoutche, Achille; Singh, Nalini; Yang, Lun; Hu, Jianying; Zhang, Ping

    2018-05-23

    Adverse drug reactions (ADRs) present a major burden for patients and the healthcare industry. Various computational methods have been developed to predict ADRs for drug molecules. However, many of these methods require experimental or surveillance data and cannot be used when only structural information is available. We collected 1,231 small molecule drugs and 600 human proteins and utilized molecular docking to generate binding features among them. We developed machine learning models that use these docking features to make predictions for 1,533 ADRs. These models obtain an overall area under the receiver operating characteristic curve (AUROC) of 0.843 and an overall area under the precision-recall curve (AUPR) of 0.395, outperforming seven structural fingerprint-based prediction models. Using the method, we predicted skin striae for fluticasone propionate, dermatitis acneiform for mometasone, and decreased libido for irinotecan, as demonstrations. Furthermore, we analyzed the top binding proteins associated with some of the ADRs, which can help to understand and/or generate hypotheses for underlying mechanisms of ADRs. Copyright © Bentham Science Publishers.

  18. Macromolecular target prediction by self-organizing feature maps.

    PubMed

    Schneider, Gisbert; Schneider, Petra

    2017-03-01

    Rational drug discovery would greatly benefit from a more nuanced appreciation of the activity of pharmacologically active compounds against a diverse panel of macromolecular targets. Already, computational target-prediction models assist medicinal chemists in library screening, de novo molecular design, optimization of active chemical agents, drug re-purposing, in the spotting of potential undesired off-target activities, and in the 'de-orphaning' of phenotypic screening hits. The self-organizing map (SOM) algorithm has been employed successfully for these and other purposes. Areas covered: The authors recapitulate contemporary artificial neural network methods for macromolecular target prediction, and present the basic SOM algorithm at a conceptual level. Specifically, they highlight consensus target-scoring by the employment of multiple SOMs, and discuss the opportunities and limitations of this technique. Expert opinion: Self-organizing feature maps represent a straightforward approach to ligand clustering and classification. Some of the appeal lies in their conceptual simplicity and broad applicability domain. Despite known algorithmic shortcomings, this computational target prediction concept has been proven to work in prospective settings with high success rates. It represents a prototypic technique for future advances in the in silico identification of the modes of action and macromolecular targets of bioactive molecules.
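
    To make the basic SOM algorithm concrete, the following is a minimal sketch of a one-dimensional self-organizing map: find the best-matching unit for each input, then pull that unit and its map neighbors toward the input with a decaying learning rate and neighborhood radius. The function names, the decay schedules, and the toy two-cluster data are assumptions for illustration and do not reproduce the authors' consensus multi-SOM scoring.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM with a Gaussian neighborhood (linear decay is an assumption)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = n_units / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate shrinks over time
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):    # visit inputs in random order
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Two hypothetical ligand clusters in a 2-D descriptor space scaled to [0, 1].
data = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (0.9, 1.0), (1.0, 0.9), (0.95, 0.95)]
som = train_som(data, n_units=4)
print([best_matching_unit(som, x) for x in data])
```

    After training, compounds that map to the same or neighboring units are treated as similar, which is the basis for using a SOM for ligand clustering and classification.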

  19. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques.

    PubMed

    Macyszyn, Luke; Akbari, Hamed; Pisapia, Jared M; Da, Xiao; Attiah, Mark; Pigrish, Vadim; Bi, Yingtao; Pal, Sharmistha; Davuluri, Ramana V; Roccograndi, Laura; Dahmane, Nadia; Martinez-Lage, Maria; Biros, George; Wolf, Ronald L; Bilello, Michel; O'Rourke, Donald M; Davatzikos, Christos

    2016-03-01

    MRI characteristics of brain gliomas have been used to predict clinical outcome and molecular tumor characteristics. However, previously reported imaging biomarkers have not been sufficiently accurate or reproducible to enter routine clinical practice and often rely on relatively simple MRI measures. The current study leverages advanced image analysis and machine learning algorithms to identify complex and reproducible imaging patterns predictive of overall survival and molecular subtype in glioblastoma (GB). One hundred five patients with GB were first used to extract approximately 60 diverse features from preoperative multiparametric MRIs. These imaging features were used by a machine learning algorithm to derive imaging predictors of patient survival and molecular subtype. Cross-validation ensured generalizability of these predictors to new patients. Subsequently, the predictors were evaluated in a prospective cohort of 29 new patients. Survival curves yielded a hazard ratio of 10.64 for predicted long versus short survivors. The overall, 3-way (long/medium/short survival) accuracy in the prospective cohort approached 80%. Classification of patients into the 4 molecular subtypes of GB achieved 76% accuracy. By employing machine learning techniques, we were able to demonstrate that imaging patterns are highly predictive of patient survival. Additionally, we found that GB subtypes have distinctive imaging phenotypes. These results reveal that when imaging markers related to infiltration, cell density, microvascularity, and blood-brain barrier compromise are integrated via advanced pattern analysis methods, they form very accurate predictive biomarkers. These predictive markers used solely preoperative images, hence they can significantly augment diagnosis and treatment of GB patients. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. 

  20. Predicting Relapse in Patients With Medulloblastoma by Integrating Evidence From Clinical and Genomic Features

    PubMed Central

    Tamayo, Pablo; Cho, Yoon-Jae; Tsherniak, Aviad; Greulich, Heidi; Ambrogio, Lauren; Schouten-van Meeteren, Netteke; Zhou, Tianni; Buxton, Allen; Kool, Marcel; Meyerson, Matthew; Pomeroy, Scott L.; Mesirov, Jill P.

    2011-01-01

    Purpose: Despite significant progress in the molecular understanding of medulloblastoma, stratification of risk in patients remains a challenge. Focus has shifted from clinical parameters to molecular markers, such as expression of specific genes and selected genomic abnormalities, to improve accuracy of treatment outcome prediction. Here, we show how integration of high-level clinical and genomic features or risk factors, including disease subtype, can yield more comprehensive, accurate, and biologically interpretable prediction models for relapse versus no-relapse classification. We also introduce a novel Bayesian nomogram indicating the amount of evidence that each feature contributes on a patient-by-patient basis. Patients and Methods: A Bayesian cumulative log-odds model of outcome was developed from a training cohort of 96 children treated for medulloblastoma, starting with the evidence provided by clinical features of metastasis and histology (model A) and incrementally adding the evidence from gene-expression–derived features representing disease subtype–independent (model B) and disease subtype–dependent (model C) pathways, and finally high-level copy-number genomic abnormalities (model D). The models were validated on an independent test cohort (n = 78). Results: On an independent multi-institutional test data set, models A to D attain an area under receiver operating characteristic (au-ROC) curve of 0.73 (95% CI, 0.60 to 0.84), 0.75 (95% CI, 0.64 to 0.86), 0.80 (95% CI, 0.70 to 0.90), and 0.78 (95% CI, 0.68 to 0.88), respectively, for predicting relapse versus no relapse. Conclusion: The proposed models C and D outperform the current clinical classification schema (au-ROC, 0.68), our previously published eight-gene outcome signature (au-ROC, 0.71), and several new schemas recently proposed in the literature for medulloblastoma risk stratification. PMID:21357789
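
    The idea of accumulating evidence in log-odds form can be illustrated with a generic naive-Bayes-style sketch: start from the prior log-odds of relapse and add one log-likelihood ratio per feature. This is an assumption-laden illustration of the general technique, not the authors' model; the function name, the prior, and the per-feature likelihood ratios are all hypothetical.

```python
import math


def posterior_probability(prior_prob, log_likelihood_ratios):
    """Add each feature's evidence (a log-likelihood ratio) to the prior log-odds."""
    log_odds = math.log(prior_prob / (1 - prior_prob)) + sum(log_likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical patient: even prior odds, one feature favoring relapse (LR = 3)
# and one feature favoring no relapse (LR = 0.5).
evidence = [math.log(3), math.log(0.5)]
print(posterior_probability(0.5, evidence))
```

    Because the contributions are additive, each feature's term can be read off individually for a given patient, which is the property a nomogram of per-feature evidence relies on.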

  1. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http

  2. Quantitative diffusion weighted imaging parameters in tumor and peritumoral stroma for prediction of molecular subtypes in breast cancer

    NASA Astrophysics Data System (ADS)

    He, Ting; Fan, Ming; Zhang, Peng; Li, Hui; Zhang, Juan; Shao, Guoliang; Li, Lihua

    2018-03-01

    Breast cancer can be classified into four molecular subtypes, Luminal A, Luminal B, HER2 and Basal-like, which differ significantly in treatment and survival outcomes. In this study, we aim to predict immunohistochemistry (IHC)-determined molecular subtypes of breast cancer using image features derived from the tumor and peritumoral stroma regions on diffusion weighted imaging (DWI). A dataset of 126 breast cancer patients who underwent preoperative breast MRI with a 3T scanner was collected. The apparent diffusion coefficients (ADCs) were recorded from DWI, and each breast image was segmented into regions comprising the tumor and the surrounding stroma. Statistical characteristics in various breast tumor and peritumoral regions were computed, including mean, minimum, maximum, variance, interquartile range, range, skewness, and kurtosis of ADC values. Additionally, the differences in features between each pair of regions were calculated. A univariate logistic classifier was used to evaluate the performance of individual features for discriminating subtypes. For multi-class classification, a multivariate logistic regression model was trained and validated. The results showed that features derived from the tumor boundary and proximal peritumoral stroma region perform better in classification than those from the other regions. Furthermore, the prediction models using statistical features, difference features and all the features combined from these regions generated AUC values of 0.774, 0.796 and 0.811, respectively. The results of this study indicate that ADC features in the tumor and peritumoral stromal regions would be valuable for estimating the molecular subtype in breast cancer.
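
    The eight ADC summary statistics named in this record, and the between-region difference features, could be computed along the following lines. This is a minimal sketch under stated assumptions: population (not sample) moments are used for variance, skewness and kurtosis, quartiles use the "inclusive" method of Python's statistics module, kurtosis is the non-excess form, and the function names and example values are hypothetical. The abstract does not say which definitions the authors used.

```python
import statistics


def adc_summary(values):
    """Summary statistics of ADC values inside one region of interest."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)  # population variance
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "variance": var,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "skewness": m3 / var ** 1.5 if var > 0 else 0.0,
        "kurtosis": m4 / var ** 2 if var > 0 else 0.0,  # non-excess
    }


def region_difference(region_a, region_b):
    """Difference features: statistic-by-statistic gap between two regions."""
    a, b = adc_summary(region_a), adc_summary(region_b)
    return {key: a[key] - b[key] for key in a}


if __name__ == "__main__":
    tumor = [0.9, 1.0, 1.1, 1.2, 1.4]    # hypothetical ADC values
    stroma = [1.3, 1.5, 1.6, 1.8, 2.0]   # hypothetical ADC values
    print(adc_summary(tumor))
    print(region_difference(tumor, stroma))
```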

  3. Molecular Pathogenesis and Diagnostic, Prognostic and Predictive Molecular Markers in Sarcoma.

    PubMed

    Mariño-Enríquez, Adrián; Bovée, Judith V M G

    2016-09-01

    Sarcomas are infrequent mesenchymal neoplasms characterized by notable morphological and molecular heterogeneity. Molecular studies in sarcoma provide refinements to morphologic classification, and contribute diagnostic information (frequently), prognostic stratification (rarely) and predict therapeutic response (occasionally). Herein, we summarize the major molecular mechanisms underlying sarcoma pathogenesis and present clinically useful diagnostic, prognostic and predictive molecular markers for sarcoma. Five major molecular alterations are discussed, illustrated with representative sarcoma types, including 1. the presence of chimeric transcription factors, in vascular tumors; 2. abnormal kinase signaling, in gastrointestinal stromal tumor; 3. epigenetic deregulation, in chondrosarcoma, chondroblastoma, and other tumors; 4. deregulated cell survival and proliferation, due to focal copy number alterations, in dedifferentiated liposarcoma; 5. extreme genomic instability, in conventional osteosarcoma as a representative example of sarcomas with highly complex karyotype. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Predicting lysine glycation sites using bi-profile bayes feature extraction.

    PubMed

    Ju, Zhe; Sun, Juhe; Li, Yanjie; Wang, Li

    2017-12-01

    Glycation is a nonenzymatic post-translational modification which has been found to be involved in various biological processes and closely associated with many metabolic diseases. The accurate identification of glycation sites is important to understand the underlying molecular mechanisms of glycation. As the traditional experimental methods are often labor-intensive and time-consuming, it is desired to develop computational methods to predict glycation sites. In this study, a novel predictor named BPB_GlySite is proposed to predict lysine glycation sites by using bi-profile bayes feature extraction and support vector machine algorithm. As illustrated by 10-fold cross-validation, BPB_GlySite achieves a satisfactory performance with a Sensitivity of 63.68%, a Specificity of 72.60%, an Accuracy of 69.63% and a Matthew's correlation coefficient of 0.3499. Experimental results also indicate that BPB_GlySite significantly outperforms three existing glycation sites predictors: NetGlycate, PreGly and Gly-PseAAC. Therefore, BPB_GlySite can be a useful bioinformatics tool for the prediction of glycation sites. A user-friendly web-server for BPB_GlySite is established at 123.206.31.171/BPB_GlySite/. Copyright © 2017 Elsevier Ltd. All rights reserved.
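
    The four performance measures reported in this record are all functions of a binary confusion matrix. The sketch below shows the standard formulas; the function name and the counts in the example are hypothetical and are not the BPB_GlySite results.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient.

    tp, fn, tn, fp: counts of true positives, false negatives,
    true negatives and false positives.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=6, fn=4, tn=8, fp=2).items():
        print(f"{name}: {value:.4f}")
```

    Unlike accuracy, the MCC stays informative when positives and negatives are unbalanced, which is why it is commonly reported alongside sensitivity and specificity for site predictors.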

  5. Prediction of acoustic feature parameters using myoelectric signals.

    PubMed

    Lee, Ki-Seung

    2010-07-01

    It is well-known that a clear relationship exists between human voices and myoelectric signals (MESs) from the area of the speaker's mouth. In this study, we utilized this information to implement a speech synthesis scheme in which MES alone was used to predict the parameters characterizing the vocal-tract transfer function of specific speech signals. Several feature parameters derived from MES were investigated to find the optimal feature for maximization of the mutual information between the acoustic and the MES features. After the optimal feature was determined, an estimation rule for the acoustic parameters was proposed, based on a minimum mean square error (MMSE) criterion. In a preliminary study, 60 isolated words were used for both objective and subjective evaluations. The results showed that the average Euclidean distance between the original and predicted acoustic parameters was reduced by about 30% compared with the average Euclidean distance of the original parameters. The intelligibility of the synthesized speech signals using the predicted features was also evaluated. A word-level identification ratio of 65.5% and a syllable-level identification ratio of 73% were obtained through a listening test.

  6. Extracting physicochemical features to predict protein secondary structure.

    PubMed

    Huang, Yin-Fu; Chen, Shu-Ying

    2013-01-01

    We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure yields better performance.

  7. Extracting Physicochemical Features to Predict Protein Secondary Structure

    PubMed Central

    Chen, Shu-Ying

    2013-01-01

    We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure yields better performance. PMID:23766688

  8. Which ante mortem clinical features predict progressive supranuclear palsy pathology?

    PubMed

    Respondek, Gesine; Kurz, Carolin; Arzberger, Thomas; Compta, Yaroslau; Englund, Elisabet; Ferguson, Leslie W; Gelpi, Ellen; Giese, Armin; Irwin, David J; Meissner, Wassilios G; Nilsson, Christer; Pantelyat, Alexander; Rajput, Alex; van Swieten, John C; Troakes, Claire; Josephs, Keith A; Lang, Anthony E; Mollenhauer, Brit; Müller, Ulrich; Whitwell, Jennifer L; Antonini, Angelo; Bhatia, Kailash P; Bordelon, Yvette; Corvol, Jean-Christophe; Colosimo, Carlo; Dodel, Richard; Grossman, Murray; Kassubek, Jan; Krismer, Florian; Levin, Johannes; Lorenzl, Stefan; Morris, Huw; Nestor, Peter; Oertel, Wolfgang H; Rabinovici, Gil D; Rowe, James B; van Eimeren, Thilo; Wenning, Gregor K; Boxer, Adam; Golbe, Lawrence I; Litvan, Irene; Stamelou, Maria; Höglinger, Günter U

    2017-07-01

    Progressive supranuclear palsy (PSP) is a neuropathologically defined disease presenting with a broad spectrum of clinical phenotypes. The objective was to identify clinical features and investigations that predict or exclude PSP pathology during life, with the aim of optimizing the clinical diagnostic criteria for PSP. We performed a systematic review of the literature published since 1996 to identify clinical features and investigations that may predict or exclude PSP pathology. We then extracted standardized data from clinical charts of patients with pathologically diagnosed PSP and relevant disease controls and calculated the sensitivity, specificity, and positive predictive value of key clinical features for PSP in this cohort. Of 4166 articles identified by the database inquiry, 269 met predefined standards. The literature review identified clinical features predictive of PSP, including features of the following 4 functional domains: ocular motor dysfunction, postural instability, akinesia, and cognitive dysfunction. No biomarker or genetic feature was found reliably validated to predict definite PSP. High-quality original natural history data were available from 206 patients with pathologically diagnosed PSP and from 231 pathologically diagnosed disease controls (54 corticobasal degeneration, 51 multiple system atrophy with predominant parkinsonism, 53 Parkinson's disease, 73 behavioral variant frontotemporal dementia). We identified clinical features that predicted PSP pathology, including phenotypes other than Richardson's syndrome, with varying sensitivity and specificity. Our results highlight the clinical variability of PSP and the high prevalence of phenotypes other than Richardson's syndrome. The features of variant phenotypes with high specificity and sensitivity should serve to optimize clinical diagnosis of PSP. © 2017 International Parkinson and Movement Disorder Society.

  9. Visual Prediction Error Spreads Across Object Features in Human Visual Cortex

    PubMed Central

    Summerfield, Christopher; Egner, Tobias

    2016-01-01

    Visual cognition is thought to rely heavily on contextual expectations. Accordingly, previous studies have revealed distinct neural signatures for expected versus unexpected stimuli in visual cortex. However, it is presently unknown how the brain combines multiple concurrent stimulus expectations such as those we have for different features of a familiar object. To understand how an unexpected object feature affects the simultaneous processing of other expected feature(s), we combined human fMRI with a task that independently manipulated expectations for color and motion features of moving-dot stimuli. Behavioral data and neural signals from visual cortex were then interrogated to adjudicate between three possible ways in which prediction error (surprise) in the processing of one feature might affect the concurrent processing of another, expected feature: (1) feature processing may be independent; (2) surprise might “spread” from the unexpected to the expected feature, rendering the entire object unexpected; or (3) pairing a surprising feature with an expected feature might promote the inference that the two features are not in fact part of the same object. To formalize these rival hypotheses, we implemented them in a simple computational model of multifeature expectations. Across a range of analyses, behavior and visual neural signals consistently supported a model that assumes a mixing of prediction error signals across features: surprise in one object feature spreads to its other feature(s), thus rendering the entire object unexpected. These results reveal neurocomputational principles of multifeature expectations and indicate that objects are the unit of selection for predictive vision. SIGNIFICANCE STATEMENT We address a key question in predictive visual cognition: how does the brain combine multiple concurrent expectations for different features of a single object such as its color and motion trajectory? By combining a behavioral protocol that

  10. Feature selection using probabilistic prediction of support vector regression.

    PubMed

    Yang, Jian-Bo; Ong, Chong-Jin

    2011-06-01

    This paper presents a new wrapper-based feature selection method for support vector regression (SVR) using its probabilistic predictions. The method computes the importance of a feature by aggregating the difference, over the feature space, of the conditional density functions of the SVR prediction with and without the feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure using these approximations, in comparison to several other existing feature selection methods for SVR, is evaluated on both artificial and real-world problems. The results of the experiments show that the proposed method generally performs better than, or at least as well as, the existing methods, with a notable advantage when the dataset is sparse.

  11. Prior probability and feature predictability interactively bias perceptual decisions

    PubMed Central

    Dunovan, Kyle E.; Tremel, Joshua J.; Wheeler, Mark E.

    2014-01-01

    Anticipating a forthcoming sensory experience facilitates perception for expected stimuli but also hinders perception for less likely alternatives. Recent neuroimaging studies suggest that expectation biases arise from feature-level predictions that enhance early sensory representations and facilitate evidence accumulation for contextually probable stimuli while suppressing alternatives. Reasonably then, the extent to which prior knowledge biases subsequent sensory processing should depend on the precision of expectations at the feature level as well as the degree to which expected features match those of an observed stimulus. In the present study we investigated how these two sources of uncertainty modulated pre- and post-stimulus bias mechanisms in the drift-diffusion model during a probabilistic face/house discrimination task. We tested several plausible models of choice bias, concluding that predictive cues led to a bias in both the starting-point and rate of evidence accumulation favoring the more probable stimulus category. We further tested the hypotheses that prior bias in the starting-point was conditional on the feature-level uncertainty of category expectations and that dynamic bias in the drift-rate was modulated by the match between expected and observed stimulus features. Starting-point estimates suggested that subjects formed a constant prior bias in favor of the face category, which exhibits less feature-level variability, that was strengthened or weakened by trial-wise predictive cues. Furthermore, we found that the gain on face/house evidence was increased for stimuli with less ambiguous features and that this relationship was enhanced by valid category expectations. These findings offer new evidence that bridges psychological models of decision-making with recent predictive coding theories of perception. PMID:24978303

  12. Image Feature Types and Their Predictions of Aesthetic Preference and Naturalness

    PubMed Central

    Ibarra, Frank F.; Kardan, Omid; Hunter, MaryCarol R.; Kotabe, Hiroki P.; Meyer, Francisco A. C.; Berman, Marc G.

    2017-01-01

    Previous research has investigated ways to quantify visual information of a scene in terms of a visual processing hierarchy, i.e., making sense of the visual environment by segmentation and integration of elementary sensory input. Guided by this research, studies have developed categories for low-level visual features (e.g., edges, colors), high-level visual features (scene-level entities that convey semantic information such as objects), and how models of those features predict aesthetic preference and naturalness. For example, in Kardan et al. (2015a), 52 participants provided aesthetic preference and naturalness ratings, which are used in the current study, for 307 images of mixed natural and urban content. Kardan et al. (2015a) then developed a model using low-level features to predict aesthetic preference and naturalness and could do so with high accuracy. What has yet to be explored is the ability of higher-level visual features (e.g., horizon line position relative to viewer, geometry of building distribution relative to visual access) to predict aesthetic preference and naturalness of scenes, and whether higher-level features mediate some of the association between the low-level features and aesthetic preference or naturalness. In this study we investigated these relationships and found that low- and high-level features explain 68.4% of the variance in aesthetic preference ratings and 88.7% of the variance in naturalness ratings. Additionally, several high-level features mediated the relationship between the low-level visual features and aesthetic preference. In a multiple mediation analysis, the high-level feature mediators accounted for over 50% of the variance in predicting aesthetic preference. These results show that high-level visual features play a prominent role in predicting aesthetic preference, but do not completely eliminate the predictive power of the low-level visual features. These strong predictors provide powerful insights for future research

  13. Predicting couple therapy outcomes based on speech acoustic features

    PubMed Central

    Nasir, Md; Baucom, Brian Robert; Narayanan, Shrikanth

    2017-01-01

    Automated assessment and prediction of marital outcome in couples therapy is a challenging task but promises to be a potentially useful tool for clinical psychologists. Computational approaches for inferring therapy outcomes using observable behavioral information obtained from conversations between spouses offer objective means for understanding relationship dynamics. In this work, we explore whether the acoustics of the spoken interactions of clinically distressed spouses provide information towards assessment of therapy outcomes. The therapy outcome prediction task in this work includes detecting whether there was a relationship improvement or not (posed as a binary classification) as well as discerning varying levels of improvement or decline in the relationship status (posed as a multiclass recognition task). We use each interlocutor’s acoustic speech signal characteristics such as vocal intonation and intensity, both independently and in relation to one another, as cues for predicting the therapy outcome. We also compare prediction performance with one obtained via standardized behavioral codes characterizing the relationship dynamics provided by human experts as features for automated classification. Our experiments, using data from a longitudinal clinical study of couples in distressed relations, showed that predictions of relationship outcomes obtained directly from vocal acoustics are comparable or superior to those obtained using human-rated behavioral codes as prediction features. In addition, combining direct signal-derived features with manually coded behavioral features improved the prediction performance in most cases, indicating the complementarity of relevant information captured by humans and machine algorithms. Additionally, considering the vocal properties of the interlocutors in relation to one another, rather than in isolation, showed to be important for improving the automatic prediction. This finding supports the notion that behavioral

  14. Stabilizing l1-norm prediction models by supervised feature grouping.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2016-02-01

    Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. In the presence of correlated features, however, these methods select features that change considerably with small changes in data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. PrAS: Prediction of amidation sites using multiple feature extraction.

    PubMed

    Wang, Tong; Zheng, Wei; Wuyun, Qiqige; Wu, Zhenfeng; Ruan, Jishou; Hu, Gang; Gao, Jianzhao

    2017-02-01

    Amidation plays an important role in a variety of pathological processes and serious diseases like neural dysfunction and hypertension. However, identification of protein amidation sites through traditional experimental methods is time consuming and expensive. In this paper, we proposed a novel predictor for Prediction of Amidation Sites (PrAS), which is the first software package for academic users. The method incorporated four representative feature types, which are position-based features, physicochemical and biochemical properties features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection was proposed to optimize features. PrAS achieved AUC of 0.96, accuracy of 92.1%, sensitivity of 81.2%, specificity of 94.9% and MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features.

    PubMed

    Li, Hongyang; Panwar, Bharat; Omenn, Gilbert S; Guan, Yuanfang

    2018-02-01

    The olfactory stimulus-percept problem has been studied for more than a century, yet it is still hard to precisely predict the odor given the large-scale chemoinformatic features of an odorant molecule. A major challenge is that the perceived qualities vary greatly among individuals due to different genetic and cultural backgrounds. Moreover, the combinatorial interactions between multiple odorant receptors and diverse molecules significantly complicate the olfaction prediction. Many attempts have been made to establish structure-odor relationships for intensity and pleasantness, but no models are available to predict the personalized multi-odor attributes of molecules. In this study, we describe our winning algorithm for predicting individual and population perceptual responses to various odorants in the DREAM Olfaction Prediction Challenge. We find that random forest model consisting of multiple decision trees is well suited to this prediction problem, given the large feature spaces and high variability of perceptual ratings among individuals. Integrating both population and individual perceptions into our model effectively reduces the influence of noise and outliers. By analyzing the importance of each chemical feature, we find that a small set of low- and nondegenerative features is sufficient for accurate prediction. Our random forest model successfully predicts personalized odor attributes of structurally diverse molecules. This model together with the top discriminative features has the potential to extend our understanding of olfactory perception mechanisms and provide an alternative for rational odorant design.

  17. Predicting age groups of Twitter users based on language and metadata features.

    PubMed

    Morgan-Lopez, Antonio A; Kim, Annice E; Chew, Robert F; Ruddle, Paul

    2017-01-01

    Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fitted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1) while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for

  18. Identification of informative features for predicting proinflammatory potentials of engine exhausts.

    PubMed

    Wang, Chia-Chi; Lin, Ying-Chi; Lin, Yuan-Chung; Jhang, Syu-Ruei; Tung, Chun-Wei

    2017-08-18

    The immunotoxicity of engine exhausts is of high concern to human health due to the increasing prevalence of immune-related diseases. However, the evaluation of immunotoxicity of engine exhausts is currently based on expensive and time-consuming experiments. It is desirable to develop efficient methods for immunotoxicity assessment. To accelerate the development of safe alternative fuels, this study proposed a computational method for identifying informative features for predicting proinflammatory potentials of engine exhausts. A principal component regression (PCR) algorithm was applied to develop prediction models. The informative features were identified by a sequential backward feature elimination (SBFE) algorithm. A total of 19 informative chemical and biological features were successfully identified by SBFE algorithm. The informative features were utilized to develop a computational method named FS-CBM for predicting proinflammatory potentials of engine exhausts. FS-CBM model achieved a high performance with correlation coefficient values of 0.997 and 0.943 obtained from training and independent test sets, respectively. The FS-CBM model was developed for predicting proinflammatory potentials of engine exhausts with a large improvement on prediction performance compared with our previous CBM model. The proposed method could be further applied to construct models for bioactivities of mixtures.

  19. Cellular automata with object-oriented features for parallel molecular network modeling.

    PubMed

    Zhu, Hao; Wu, Yinghui; Huang, Sui; Sun, Yan; Dhar, Pawan

    2005-06-01

    Cellular automata are an important modeling paradigm for studying the dynamics of large, parallel systems composed of multiple, interacting components. However, to model biological systems, cellular automata need to be extended beyond the large-scale parallelism and intensive communication in order to capture two fundamental properties characteristic of complex biological systems: hierarchy and heterogeneity. This paper proposes extensions to a cellular automata language, Cellang, to meet this purpose. The extended language, with object-oriented features, can be used to describe the structure and activity of parallel molecular networks within cells. Capabilities of this new programming language include object structure to define molecular programs within a cell, floating-point data type and mathematical functions to perform quantitative computation, message passing capability to describe molecular interactions, as well as new operators, statements, and built-in functions. We discuss relevant programming issues of these features, including the object-oriented description of molecular interactions with molecule encapsulation, message passing, and the description of heterogeneity and anisotropy at the cell and molecule levels. By enabling the integration of modeling at the molecular level with system behavior at cell, tissue, organ, or even organism levels, the program will help improve our understanding of how complex and dynamic biological activities are generated and controlled by parallel functioning of molecular networks. Index Terms: cellular automata, modeling, molecular network, object-oriented.

  20. Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context.

    PubMed

    Wang, Yong-Cui; Wang, Yong; Yang, Zhi-Xia; Deng, Nai-Yang

    2011-06-20

    Enzymes are known as the largest class of proteins and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchical structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing an enzyme into the EC hierarchy is crucial to understanding its specific molecular mechanism. In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. The first is to introduce efficient sequence encoding methods for representing given proteins. The second is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature in both predictive accuracy and Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF obtains accuracy and MCC ranging from 81% to 98% and 0.82 to 0.98, respectively, when predicting the first three EC digits on a low-homologous enzyme dataset. We further demonstrate that our method outperforms the methods which do not take account of hierarchical relationships among enzyme categories and alternative methods which incorporate prior knowledge about inter-class relationships. Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore our new method will be a useful tool for the enzyme function prediction community.
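
    As a rough illustration of the conjoint triad encoding mentioned above, the sketch below maps each residue to one of seven classes and counts every run of three consecutive classes, giving a 343-dimensional vector. The seven-class grouping and the min-max scaling are assumptions taken from common descriptions of the conjoint triad method, not from this abstract, and the function name `conjoint_triad` is hypothetical.

```python
from itertools import product
from typing import List

# Assumed seven-class grouping of the 20 standard amino acids
# (the abstract does not list the classes).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 ordered class triads, in a fixed order.
TRIADS = list(product(range(len(CLASSES)), repeat=3))


def conjoint_triad(sequence: str) -> List[float]:
    """Return a 343-dimensional conjoint triad vector for a protein sequence."""
    counts = dict.fromkeys(TRIADS, 0)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three consecutive residues along the sequence.
    for triad in zip(classes, classes[1:], classes[2:]):
        if None not in triad:  # skip windows containing unknown residues
            counts[triad] += 1
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    # Min-max scaling (an assumed normalisation) reduces sequence-length effects.
    return [(counts[t] - lowest) / span if span else 0.0 for t in TRIADS]


if __name__ == "__main__":
    vector = conjoint_triad("AGVAGV")
    print(len(vector), vector[0])
```

    The resulting vector could then be passed to any classifier; the hierarchical SVM described in the abstract is not reproduced here.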

  1. Interpretable Topic Features for Post-ICU Mortality Prediction.

    PubMed

    Luo, Yen-Fu; Rumshisky, Anna

    2016-01-01

    Electronic health records provide valuable resources for understanding the correlation between various diseases and mortality. The analysis of post-discharge mortality is critical for healthcare professionals to follow up potential causes of death after a patient is discharged from the hospital and give prompt treatment. Moreover, it may reduce the cost derived from readmissions and improve the quality of healthcare. Our work focused on post-discharge ICU mortality prediction. In addition to features derived from physiological measurements, we incorporated the ICD-9-CM hierarchy into Bayesian topic model learning and extracted topic features from medical notes. We achieved highest AUCs of 0.835 and 0.829 for 30-day and 6-month post-discharge mortality prediction using baseline and topic proportions derived from Labeled-LDA. Moreover, our work emphasized the interpretability of topic features derived from the topic model, which may facilitate the understanding and investigation of the complex relationship between mortality and diseases.

  2. Combined Molecular Dynamics Simulation-Molecular-Thermodynamic Theory Framework for Predicting Surface Tensions.

    PubMed

    Sresht, Vishnu; Lewandowski, Eric P; Blankschtein, Daniel; Jusufi, Arben

    2017-08-22

    A molecular modeling approach is presented with a focus on quantitative predictions of the surface tension of aqueous surfactant solutions. The approach combines classical Molecular Dynamics (MD) simulations with a molecular-thermodynamic theory (MTT) [Y. J. Nikas, S. Puvvada, D. Blankschtein, Langmuir 1992, 8, 2680]. The MD component is used to calculate thermodynamic and molecular parameters that are needed in the MTT model to determine the surface tension isotherm. The MD/MTT approach provides the important link between the surfactant bulk concentration, the experimental control parameter, and the surfactant surface concentration, the MD control parameter. We demonstrate the capability of the MD/MTT modeling approach on nonionic alkyl polyethylene glycol surfactants at the air-water interface and observe reasonable agreement of the predicted surface tensions and the experimental surface tension data over a wide range of surfactant concentrations below the critical micelle concentration. Our modeling approach can be extended to ionic surfactants and their mixtures with both ionic and nonionic surfactants at liquid-liquid interfaces.

  3. Predicting age groups of Twitter users based on language and metadata features

    PubMed Central

    Morgan-Lopez, Antonio A.; Chew, Robert F.; Ruddle, Paul

    2017-01-01

    Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles’ metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fitted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen’s d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1) while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as “school” for youth and “college” for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be

  4. Predicting protein amidation sites by orchestrating amino acid sequence features

    NASA Astrophysics Data System (ADS)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications, which plays an important role in physiological and pathological processes. Identifying amidation sites can help us understand amidation and recognize the underlying causes of many kinds of diseases. However, the traditional experimental methods for identifying amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector that captures not only the physicochemical properties but also position-related information of the amino acids. An extremely randomized trees algorithm is applied to choose the optimal features, removing redundancy and dependence among components of the feature vector in a supervised fashion. Finally, a support vector machine classifier is used to label the amidation sites. When tested on an independent data set, the proposed method performs better than all the previous ones, with a prediction accuracy of 0.962, a Matthews correlation coefficient of 0.89, and an area under the curve of 0.964.

  5. Sonographic features of invasive ductal breast carcinomas predictive of malignancy grade.

    PubMed

    Gupta, Kanika; Kumaresan, Meenakshisundaram; Venkatesan, Bhuvaneswari; Chandra, Tushar; Patil, Aruna; Menon, Maya

    2018-01-01

    Assessment of individual sonographic features provides vital clues about the biological behavior of breast masses and can assist in determining histological grade of malignancy and thereby prognosis. Assessment of individual sonographic features of biopsy proven invasive ductal breast carcinomas as predictors of malignancy grade. A retrospective analysis of sonographic findings of 103 biopsy proven invasive ductal breast carcinomas. Tumor characteristics on gray-scale ultrasound and color flow were assessed using American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) Atlas Fifth Edition. The sonographic findings of masses were individually correlated with their histopathologic grades. Chi square test, ordinal regression, and Goodman and Kruskal tau test. Breast mass showing reversal/lack of diastolic flow has a high probability of belonging to histological high grade tumor (β 1.566, P 0.0001). The masses with abrupt interface boundary are more likely grade 3 (β 1.524, P 0.001) in comparison to masses with echogenic halos. The suspicious calcifications present in and outside the mass is a finding associated with histologically high grade tumors. The invasive ductal carcinomas (IDCs) with complex solid and cystic echotexture are more likely to be of high histological grade (β 1.146, P 0.04) as compared to masses with hypoechoic echotexture. Certain ultrasound features are associated with tumor grade on histopathology. If the radiologist is cognizant of these sonographic features, ultrasound can be a potent modality for predicting histopathological grade of IDCs of the breast, especially in settings where advanced tests such as receptor and molecular analyses are limited.

  6. Tumors of the Testis: Morphologic Features and Molecular Alterations.

    PubMed

    Howitt, Brooke E; Berney, Daniel M

    2015-12-01

    This article reviews the most frequently encountered tumor of the testis; pure and mixed malignant testicular germ cell tumors (TGCT), with emphasis on adult (postpubertal) TGCTs and their differential diagnoses. We additionally review TGCT in the postchemotherapy setting, and findings to be integrated into the surgical pathology report, including staging of testicular tumors and other problematic issues. The clinical features, gross pathologic findings, key histologic features, common differential diagnoses, the use of immunohistochemistry, and molecular alterations in TGCTs are discussed. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Molecular Pathology: Predictive, Prognostic, and Diagnostic Markers in Uterine Tumors.

    PubMed

    Ritterhouse, Lauren L; Howitt, Brooke E

    2016-09-01

    This article focuses on the diagnostic, prognostic, and predictive molecular biomarkers in uterine malignancies, in the context of morphologic diagnoses. The histologic classification of endometrial carcinomas is reviewed first, followed by the description and molecular classification of endometrial epithelial malignancies in the context of histologic classification. Taken together, the molecular and histologic classifications help clinicians to approach troublesome areas encountered in clinical practice and evaluate the utility of molecular alterations in the diagnosis and subclassification of endometrial carcinomas. Putative prognostic markers are reviewed. The use of molecular alterations and surrogate immunohistochemistry as prognostic and predictive markers is also discussed. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

    PubMed

    Ni, Qianwu; Chen, Lei

    2017-01-01

    Correct prediction of protein structural class is beneficial to the investigation of protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, given the variety of available features, it is still a great challenge to select a proper classification algorithm and to extract the essential features for classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physicochemical features were adopted to represent proteins, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirty-eight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. From the algorithm list, the algorithms were taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to other models that adopt only the feature selection procedure or only the algorithm selection procedure. The feature selection procedure and the algorithm selection procedure are both helpful for building an ensemble prediction model that can yield better performance. Copyright © Bentham Science Publishers.

  9. Improved Prediction of Blood-Brain Barrier Permeability Through Machine Learning with Combined Use of Molecular Property-Based Descriptors and Fingerprints.

    PubMed

    Yuan, Yaxia; Zheng, Fang; Zhan, Chang-Guo

    2018-03-21

    Blood-brain barrier (BBB) permeability of a compound determines whether the compound can effectively enter the brain. It is an essential property which must be accounted for in drug discovery with a target in the brain. Several computational methods have been used to predict the BBB permeability. In particular, the support vector machine (SVM), which is a kernel-based machine learning method, has been widely used in this field. For SVM training and prediction, the compounds are characterized by molecular descriptors. Some SVM models were based on the use of molecular property-based descriptors (including 1D, 2D, and 3D descriptors) or fragment-based descriptors (known as the fingerprints of a molecule). The selection of descriptors is critical for the performance of an SVM model. In this study, we aimed to develop a generally applicable new SVM model by combining all of the features of the molecular property-based descriptors and fingerprints to improve the accuracy of BBB permeability prediction. The results indicate that our SVM model has improved accuracy compared to the currently available models for BBB permeability prediction.

  10. Common features of microRNA target prediction tools

    PubMed Central

    Peterson, Sarah M.; Thompson, Jeffrey A.; Ufkin, Melanie L.; Sathyanarayana, Pradeep; Liaw, Lucy; Congdon, Clare Bates

    2014-01-01

    The human genome encodes for over 1800 microRNAs (miRNAs), which are short non-coding RNA molecules that function to regulate gene expression post-transcriptionally. Due to the potential for one miRNA to target multiple gene transcripts, miRNAs are recognized as a major mechanism to regulate gene expression and mRNA translation. Computational prediction of miRNA targets is a critical initial step in identifying miRNA:mRNA target interactions for experimental validation. The available tools for miRNA target prediction encompass a range of different computational approaches, from the modeling of physical interactions to the incorporation of machine learning. This review provides an overview of the major computational approaches to miRNA target prediction. Our discussion highlights three tools for their ease of use, reliance on relatively updated versions of miRBase, and range of capabilities, and these are DIANA-microT-CDS, miRanda-mirSVR, and TargetScan. In comparison across all miRNA target prediction tools, four main aspects of the miRNA:mRNA target interaction emerge as common features on which most target prediction is based: seed match, conservation, free energy, and site accessibility. This review explains these features and identifies how they are incorporated into currently available target prediction tools. MiRNA target prediction is a dynamic field with increasing attention on development of new analysis tools. This review attempts to provide a comprehensive assessment of these tools in a manner that is accessible across disciplines. Understanding the basis of these prediction methodologies will aid in user selection of the appropriate tools and interpretation of the tool output. PMID:24600468
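
    Of the four common features listed in this review, seed match is the simplest to illustrate. The sketch below treats miRNA nucleotides 2 to 8 as the seed and searches a transcript for the reverse complement of that seed. The function names, the choice of the 2 to 8 window, and the example sequences are assumptions for illustration only; the tools named in the abstract combine seed matching with conservation, free energy, and site accessibility, none of which is modeled here.

```python
from typing import List

# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the target-site motif (5'->3') complementary to the miRNA seed.

    `start` and `end` are 1-based, inclusive positions on the miRNA;
    the default 2..8 window is an assumed convention.
    """
    rna = mirna.upper().replace("T", "U")
    seed = rna[start - 1:end]
    # The target pairs antiparallel to the seed, so reverse the complement.
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(mirna: str, transcript: str) -> List[int]:
    """Return 0-based start positions of perfect seed matches in a transcript."""
    site = seed_site(mirna)
    rna = transcript.upper().replace("T", "U")
    return [
        i for i in range(len(rna) - len(site) + 1) if rna[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative input sequence
    example_utr = "AAACUACCUCAAA"             # made-up transcript fragment
    print(seed_site(example_mirna))
    print(find_seed_matches(example_mirna, example_utr))
```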

  11. Common features of microRNA target prediction tools.

    PubMed

    Peterson, Sarah M; Thompson, Jeffrey A; Ufkin, Melanie L; Sathyanarayana, Pradeep; Liaw, Lucy; Congdon, Clare Bates

    2014-01-01

    The human genome encodes for over 1800 microRNAs (miRNAs), which are short non-coding RNA molecules that function to regulate gene expression post-transcriptionally. Due to the potential for one miRNA to target multiple gene transcripts, miRNAs are recognized as a major mechanism to regulate gene expression and mRNA translation. Computational prediction of miRNA targets is a critical initial step in identifying miRNA:mRNA target interactions for experimental validation. The available tools for miRNA target prediction encompass a range of different computational approaches, from the modeling of physical interactions to the incorporation of machine learning. This review provides an overview of the major computational approaches to miRNA target prediction. Our discussion highlights three tools for their ease of use, reliance on relatively updated versions of miRBase, and range of capabilities, and these are DIANA-microT-CDS, miRanda-mirSVR, and TargetScan. In comparison across all miRNA target prediction tools, four main aspects of the miRNA:mRNA target interaction emerge as common features on which most target prediction is based: seed match, conservation, free energy, and site accessibility. This review explains these features and identifies how they are incorporated into currently available target prediction tools. MiRNA target prediction is a dynamic field with increasing attention on development of new analysis tools. This review attempts to provide a comprehensive assessment of these tools in a manner that is accessible across disciplines. Understanding the basis of these prediction methodologies will aid in user selection of the appropriate tools and interpretation of the tool output.

  12. A framework for feature extraction from hospital medical data with applications in risk prediction.

    PubMed

    Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha

    2014-12-30

    Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features to baselines generated from the Elixhauser comorbidities. Hospital medical records were transformed to event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task, comparing with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was built using logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6, 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from the socio-demographic information and Elixhauser comorbidities, over 20 settings (5 prediction horizons over 4 diseases). In particular over 30-day prediction, the AUCs are: COPD-baseline: 0.60 (95% CI: 0.57, 0.63), auto-extracted: 0.67 (0.64, 0.70); diabetes-baseline: 0.60 (0.58, 0.63), auto-extracted: 0.67 (0.64, 0.69); mental disorders-baseline: 0.57 (0.54, 0.60), auto-extracted: 0.69 (0.64, 0.70); pneumonia-baseline: 0.61 (0.59, 0.63), auto-extracted: 0.70 (0.67, 0.72). The advantages of auto-extracted standard features from complex medical records, in a disease and task agnostic manner, were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have potential to form the foundation of complex automated analytic tasks.

  13. A Reduced Set of Features for Chronic Kidney Disease Prediction

    PubMed Central

    Misir, Rajesh; Mitra, Malay; Samanta, Ranjit Kumar

    2017-01-01

    Chronic kidney disease (CKD) is a life-threatening disease. Early detection and proper management are needed to improve survival. In the UCI data set, there are 24 attributes for predicting CKD or non-CKD. At least 16 of these attributes need pathological investigations involving more resources, money, time, and uncertainties. The objective of this work is to explore whether we can predict CKD or non-CKD with reasonable accuracy using fewer features. An intelligent system development approach has been used in this study. We applied one important feature selection technique to discover a reduced feature set that explains the data set much better. Two intelligent binary classification techniques have been adopted to validate the reduced feature set. Performance was evaluated in terms of four important classification evaluation parameters. As our results suggest, we may concentrate more on those reduced features for identifying CKD, thereby reducing uncertainty, saving time, and reducing costs. PMID:28706750

  14. Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.

    PubMed

    Liu, Liang; Cai, Yudong; Lu, Wencong; Feng, Kaiyan; Peng, Chunrong; Niu, Bing

    2009-03-06

    Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection framework, this paper presents a computational system to predict PPIs (protein-protein interactions) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poorly performing and/or redundant features, resulting in 103 remaining features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall accurate prediction rate of 76.18%, evaluated by a 10-fold cross-validation test, which is 1.46% higher than using the initial 114 features and 6.51% higher than using the 20 features coded by amino acid compositions. The PPIs predictor, developed for this research, is available for public use at http://chemdata.shu.edu.cn/ppi.

  15. Patient feature based dosimetric Pareto front prediction in esophageal cancer radiotherapy.

    PubMed

    Wang, Jiazhou; Jin, Xiance; Zhao, Kuaike; Peng, Jiayuan; Xie, Jiang; Chen, Junchao; Zhang, Zhen; Studenski, Matthew; Hu, Weigang

    2015-02-01

    To investigate the feasibility of the dosimetric Pareto front (PF) prediction based on patient's anatomic and dosimetric parameters for esophageal cancer patients. Eighty esophageal cancer patients in the authors' institution were enrolled in this study. A total of 2928 intensity-modulated radiotherapy plans were obtained and used to generate PF for each patient. On average, each patient had 36.6 plans. The anatomic and dosimetric features were extracted from these plans. The mean lung dose (MLD), mean heart dose (MHD), spinal cord max dose, and PTV homogeneity index were recorded for each plan. Principal component analysis was used to extract overlap volume histogram (OVH) features between PTV and other organs at risk. The full dataset was separated into two parts: a training dataset and a validation dataset. The prediction outcomes were the MHD and MLD. Spearman's rank correlation coefficient was used to evaluate the correlation between the anatomical features and dosimetric features. The stepwise multiple regression method was used to fit the PF. The cross validation method was used to evaluate the model. With 1000 repetitions, the mean prediction error of the MHD was 469 cGy. The most correlated factors were the first principal components of the OVH between heart and PTV and the overlap between heart and PTV in Z-axis. The mean prediction error of the MLD was 284 cGy. The most correlated factors were the first principal components of the OVH between heart and PTV and the overlap between lung and PTV in Z-axis. It is feasible to use patients' anatomic and dosimetric features to generate a predicted Pareto front. Additional samples and further studies are required to improve the prediction model.

  16. Sonographic features of invasive ductal breast carcinomas predictive of malignancy grade

    PubMed Central

    Gupta, Kanika; Kumaresan, Meenakshisundaram; Venkatesan, Bhuvaneswari; Chandra, Tushar; Patil, Aruna; Menon, Maya

    2018-01-01

    Context: Assessment of individual sonographic features provides vital clues about the biological behavior of breast masses and can assist in determining histological grade of malignancy and thereby prognosis. Aims: Assessment of individual sonographic features of biopsy proven invasive ductal breast carcinomas as predictors of malignancy grade. Settings and Design: A retrospective analysis of sonographic findings of 103 biopsy proven invasive ductal breast carcinomas. Materials and Methods: Tumor characteristics on gray-scale ultrasound and color flow were assessed using American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) Atlas Fifth Edition. The sonographic findings of masses were individually correlated with their histopathologic grades. Statistical Analysis Used: Chi square test, ordinal regression, and Goodman and Kruskal tau test. Results: Breast mass showing reversal/lack of diastolic flow has a high probability of belonging to histological high grade tumor (β 1.566, P 0.0001). The masses with abrupt interface boundary are more likely grade 3 (β 1.524, P 0.001) in comparison to masses with echogenic halos. The suspicious calcifications present in and outside the mass is a finding associated with histologically high grade tumors. The invasive ductal carcinomas (IDCs) with complex solid and cystic echotexture are more likely to be of high histological grade (β 1.146, P 0.04) as compared to masses with hypoechoic echotexture. Conclusions: Certain ultrasound features are associated with tumor grade on histopathology. If the radiologist is cognizant of these sonographic features, ultrasound can be a potent modality for predicting histopathological grade of IDCs of the breast, especially in settings where advanced tests such as receptor and molecular analyses are limited. PMID:29692540

  17. Prediction of quantum interference in molecular junctions using a parabolic diagram: Understanding the origin of Fano and anti-resonances

    NASA Astrophysics Data System (ADS)

    Nozaki, Daijiro; Avdoshenko, Stanislav M.; Sevinçli, Hâldun; Gutierrez, Rafael; Cuniberti, Gianaurelio

    2013-03-01

    Recently the interest in quantum interference (QI) phenomena in molecular devices (molecular junctions) has been growing due to the unique features observed in the transmission spectra. In order to design single molecular devices exploiting QI effects as desired, it is necessary to provide simple rules for predicting the appearance of QI effects such as anti-resonances or Fano line shapes and for controlling them. In this study, we derive a transmission function of a generic molecular junction with a side group (T-shaped molecular junction) using a minimal toy model. We developed a simple method to predict the appearance of quantum interference, Fano resonances or anti-resonances, and its position in the conductance spectrum by introducing a simple graphical representation (parabolic model). Using it we can easily visualize the relation between the key electronic parameters and the positions of normal resonant peaks and anti-resonant peaks induced by quantum interference in the conductance spectrum. We also demonstrate Fano and anti-resonance in T-shaped molecular junctions using a simple tight-binding model. This parabolic model enables one to infer on-site energies of T-shaped molecules and the coupling between side group and main conduction channel from transmission spectra.
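
    For orientation, a textbook-style toy version of such a transmission function can be written down under strong simplifying assumptions: one backbone site with on-site energy \varepsilon_0, one side-group site with on-site energy \varepsilon_1 coupled to it by v, and two identical wide-band leads with coupling \Gamma. These symbols and this single-site reduction are assumptions made here and need not match the notation or the model of the paper.

```latex
% Green's function of the backbone site, with the side group folded into
% an energy-dependent self-energy v^2 / (E - \varepsilon_1):
G(E) = \frac{1}{E - \varepsilon_0 - \dfrac{v^{2}}{E - \varepsilon_1} + i\Gamma}

% Transmission for symmetric wide-band leads:
T(E) = \Gamma^{2}\,\lvert G(E)\rvert^{2}
     = \frac{\Gamma^{2}\,(E - \varepsilon_1)^{2}}
            {\bigl[(E - \varepsilon_0)(E - \varepsilon_1) - v^{2}\bigr]^{2}
             + \Gamma^{2}\,(E - \varepsilon_1)^{2}}

% Anti-resonance:  T(E) = 0  at  E = \varepsilon_1
% Resonances:      T(E) = 1  where  (E - \varepsilon_0)(E - \varepsilon_1) = v^{2}
```

    In this toy form the anti-resonance sits at the side-group energy, and the resonance condition is quadratic in E, which gives one way to see why a parabola is a natural graphical device for locating the peaks.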

  18. Structural features based genome-wide characterization and prediction of nucleosome organization

    PubMed Central

    2012-01-01

    Background: Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results: We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. Conclusions: Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene

  19. DemQSAR: predicting human volume of distribution and clearance of drugs

    NASA Astrophysics Data System (ADS)

    Demir-Kavuk, Ozgur; Bentzien, Jörg; Muegge, Ingo; Knapp, Ernst-Walter

    2011-12-01

    In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure-activity/-property relationships (QSAR/QSPR) correlate the structure and physico-chemical properties of molecular compounds with a specific functional activity/property under study. Typically a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e., the method adjusts to very specific features of the training data that are not characteristic of the considered property. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach that combines feature generation, feature selection, model building and control of overtraining into a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VDss) and human clearance (CL). To control overtraining, quadratic and linear regularization terms were employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power. A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VDss and CL is

  20. DemQSAR: predicting human volume of distribution and clearance of drugs.

    PubMed

    Demir-Kavuk, Ozgur; Bentzien, Jörg; Muegge, Ingo; Knapp, Ernst-Walter

    2011-12-01

    In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure-activity/-property relationships (QSAR/QSPR) correlate the structure and physico-chemical properties of molecular compounds with a specific functional activity/property under study. Typically a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e., the method adjusts to very specific features of the training data that are not characteristic of the considered property. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach that combines feature generation, feature selection, model building and control of overtraining into a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VD(ss)) and human clearance (CL). To control overtraining, quadratic and linear regularization terms were employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power. A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VD(ss) and CL is

  1. Harnessing Computational Biology for Exact Linear B-Cell Epitope Prediction: A Novel Amino Acid Composition-Based Feature Descriptor.

    PubMed

    Saravanan, Vijayakumar; Gautham, Namasivayam

    2015-10-01

    Proteins embody epitopes that serve as their antigenic determinants. Epitopes occupy a central place in integrative biology, not to mention as targets for novel vaccine, pharmaceutical, and systems diagnostics development. The presence of T-cell and B-cell epitopes has been extensively studied due to their potential in synthetic vaccine design. However, reliable prediction of linear B-cell epitopes remains a formidable challenge. Earlier studies have reported discrepancies in amino acid composition between epitopes and non-epitopes. Hence, this study proposed and developed a novel amino acid composition-based feature descriptor, Dipeptide Deviation from Expected Mean (DDE), to distinguish linear B-cell epitopes from non-epitopes effectively. In this study, for the first time, only exact linear B-cell epitopes and non-epitopes have been utilized for developing the prediction method, unlike the use of epitope-containing regions in earlier reports. To evaluate the performance of the DDE feature vector, models have been developed with two widely used machine-learning techniques: Support Vector Machine and AdaBoost-Random Forest. Five-fold cross-validation of the proposed method on an error-free dataset and on datasets from other studies achieved an overall accuracy between nearly 61% and 73%, with a balance between the sensitivity and specificity metrics. The DDE feature vector performed better (with an accuracy difference of about 2% to 12%) than other amino acid-derived features on different datasets. This study reflects the efficiency of the DDE feature vector in enhancing linear B-cell epitope prediction performance, compared to other feature representations. The proposed method is freely available as a stand-alone tool for researchers, particularly those interested in vaccine design and novel molecular target development for systems therapeutics and diagnostics: https://github.com/brsaran/LBEEP.
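
    The abstract names the DDE descriptor but does not give its formula. The sketch below assumes the commonly cited definition: for each of the 400 dipeptides, the observed frequency Dc is compared with a theoretical mean Tm derived from codon counts (61 sense codons), and the difference is scaled by the theoretical standard deviation. The function name `dde` and the handling of non-standard residues are illustrative choices, not taken from the LBEEP tool.

```python
import math
from itertools import product

# Number of codons per amino acid in the standard genetic code (sums to 61).
CODONS = {"A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
          "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
          "T": 4, "V": 4, "W": 1, "Y": 2}
TOTAL_CODONS = sum(CODONS.values())  # 61 sense codons


def dde(sequence):
    """Return a 400-element DDE vector (assumed definition, see lead-in)."""
    length = len(sequence)
    if length < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {a + b: 0 for a, b in product(CODONS, repeat=2)}
    for k in range(length - 1):
        dipeptide = sequence[k:k + 2]
        if dipeptide in counts:  # dipeptides with non-standard residues are skipped
            counts[dipeptide] += 1
    features = []
    for dipeptide, n in counts.items():
        dc = n / (length - 1)  # observed dipeptide frequency
        tm = (CODONS[dipeptide[0]] / TOTAL_CODONS) * (CODONS[dipeptide[1]] / TOTAL_CODONS)
        tv = tm * (1 - tm) / (length - 1)  # theoretical variance
        features.append((dc - tm) / math.sqrt(tv))
    return features


vector = dde("ACDEFGHIKLMNPQRSTVWY")
print(len(vector))
```

    The resulting vector could then be passed to any classifier; the choice of SVM or AdaBoost-Random Forest in the study is independent of how the descriptor is computed.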

  2. Connectome-based predictive modeling of attention: Comparing different functional connectivity features and prediction methods across datasets.

    PubMed

    Yoo, Kwangsun; Rosenberg, Monica D; Hsu, Wei-Ting; Zhang, Sheng; Li, Chiang-Shan R; Scheinost, Dustin; Constable, R Todd; Chun, Marvin M

    2018-02-15

    Connectome-based predictive modeling (CPM; Finn et al., 2015; Shen et al., 2017) was recently developed to predict individual differences in traits and behaviors, including fluid intelligence (Finn et al., 2015) and sustained attention (Rosenberg et al., 2016a), from functional brain connectivity (FC) measured with fMRI. Here, using the CPM framework, we compared the predictive power of three different measures of FC (Pearson's correlation, accordance, and discordance) and two different prediction algorithms (linear and partial least square [PLS] regression) for attention function. Accordance and discordance are recently proposed FC measures that respectively track in-phase synchronization and out-of-phase anti-correlation (Meskaldji et al., 2015). We defined connectome-based models using task-based or resting-state FC data, and tested the effects of (1) functional connectivity measure and (2) feature-selection/prediction algorithm on individualized attention predictions. Models were internally validated in a training dataset using leave-one-subject-out cross-validation, and externally validated with three independent datasets. The training dataset included fMRI data collected while participants performed a sustained attention task and rested (N = 25; Rosenberg et al., 2016a). The validation datasets included: 1) data collected during performance of a stop-signal task and at rest (N = 83, including 19 participants who were administered methylphenidate prior to scanning; Farr et al., 2014a; Rosenberg et al., 2016b), 2) data collected during Attention Network Task performance and rest (N = 41, Rosenberg et al., in press), and 3) resting-state data and ADHD symptom severity from the ADHD-200 Consortium (N = 113; Rosenberg et al., 2016a). Models defined using all combinations of functional connectivity measure (Pearson's correlation, accordance, and discordance) and prediction algorithm (linear and PLS regression) predicted attentional abilities, with

  3. Patient feature based dosimetric Pareto front prediction in esophageal cancer radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Jiazhou; Zhao, Kuaike; Peng, Jiayuan

    2015-02-15

    Purpose: To investigate the feasibility of dosimetric Pareto front (PF) prediction based on patients' anatomic and dosimetric parameters for esophageal cancer patients. Methods: Eighty esophageal cancer patients in the authors' institution were enrolled in this study. A total of 2928 intensity-modulated radiotherapy plans were obtained and used to generate a PF for each patient. On average, each patient had 36.6 plans. The anatomic and dosimetric features were extracted from these plans. The mean lung dose (MLD), mean heart dose (MHD), spinal cord max dose, and PTV homogeneity index were recorded for each plan. Principal component analysis was used to extract overlap volume histogram (OVH) features between the PTV and other organs at risk. The full dataset was separated into two parts: a training dataset and a validation dataset. The prediction outcomes were the MHD and MLD. Spearman's rank correlation coefficient was used to evaluate the correlation between the anatomical features and dosimetric features. The stepwise multiple regression method was used to fit the PF. The cross validation method was used to evaluate the model. Results: With 1000 repetitions, the mean prediction error of the MHD was 469 cGy. The most correlated factors were the first principal components of the OVH between heart and PTV and the overlap between heart and PTV in the Z-axis. The mean prediction error of the MLD was 284 cGy. The most correlated factors were the first principal components of the OVH between heart and PTV and the overlap between lung and PTV in the Z-axis. Conclusions: It is feasible to use patients' anatomic and dosimetric features to generate a predicted Pareto front. Additional samples and further studies are required to improve the prediction model.
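
    As an illustration of the correlation step mentioned in the Methods, Spearman's rank correlation can be computed as the Pearson correlation of the ranks. The sketch below is a minimal pure-Python version; the variable names and the sample values (an anatomic overlap feature versus a mean heart dose) are hypothetical and not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: heart/PTV overlap (cm) and mean heart dose (cGy).
overlap = [1.0, 2.5, 4.0, 6.0]
mean_heart_dose = [900.0, 1500.0, 1600.0, 3100.0]
print(spearman(overlap, mean_heart_dose))
```

    Because only ranks matter, any monotonic relationship yields a coefficient of magnitude 1, regardless of whether it is linear.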

  4. Glioma survival prediction with the combined analysis of in vivo 11C-MET-PET, ex vivo and patient features by supervised machine learning.

    PubMed

    Papp, Laszlo; Poetsch, Nina; Grahovac, Marko; Schmidbauer, Victor; Woehrer, Adelheid; Preusser, Matthias; Mitterhauser, Markus; Kiesel, Barbara; Wadsak, Wolfgang; Beyer, Thomas; Hacker, Marcus; Traub-Weidinger, Tatjana

    2017-11-24

    Gliomas are the most common types of tumors in the brain. While the definite diagnosis is routinely made ex vivo by histopathologic and molecular examination, diagnostic work-up of patients with suspected glioma is mainly done by using magnetic resonance imaging (MRI). Nevertheless, L-S-methyl-11C-methionine (11C-MET) Positron Emission Tomography (PET) holds a great potential in characterization of gliomas. The aim of this study was to establish machine learning (ML) driven survival models for glioma built on 11C-MET-PET, ex vivo and patient characteristics. Methods: 70 patients with a treatment-naïve glioma, who had a positive 11C-MET-PET and histopathology-derived ex vivo feature extraction, such as World Health Organization (WHO) 2007 tumor grade, histology and isocitrate dehydrogenase (IDH1-R132H) mutation status were included. The 11C-MET-positive primary tumors were delineated semi-automatically on PET images followed by the feature extraction of tumor-to-background ratio based general and higher-order textural features by applying five different binning approaches. In vivo and ex vivo features, as well as patient characteristics (age, weight, height, body-mass-index, Karnofsky-score) were merged to characterize the tumors. Machine learning approaches were utilized to identify relevant in vivo, ex vivo and patient features and their relative weights for 36 months survival prediction. The resulting feature weights were used to establish three predictive models per binning configuration based on a combination of: in vivo/ex vivo and clinical patient information (M36IEP), in vivo and patient-only information (M36IP), and in vivo only (M36I). In addition, a binning-independent ex vivo and patient-only (M36EP) model was created. The established models were validated in a Monte Carlo (MC) cross-validation scheme. Results: Most prominent ML-selected and -weighted features were patient and ex vivo based followed by in vivo features. The highest area under the

  5. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features.

    PubMed

    Zaman, Rianon; Chowdhury, Shahana Yasmin; Rashid, Mahmood A; Sharma, Alok; Dehzangi, Abdollah; Shatabda, Swakkhar

    2017-01-01

    DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to identify them. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features for the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as a classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.
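
    The abstract does not define the monogram and bigram features. A minimal sketch, assuming the usual profile-based definitions: the monogram is the column-wise mean of an L x 20 profile matrix, and the bigram entry (i, j) accumulates the product of column i at position k and column j at position k+1. The normalization by L-1 and the function names are illustrative assumptions, not HMMBinder's actual code.

```python
def monogram(profile):
    """Column-wise mean of the profile: one value per amino acid column."""
    length = len(profile)
    n_cols = len(profile[0])
    return [sum(row[j] for row in profile) / length for j in range(n_cols)]


def bigram(profile):
    """Average co-occurrence of column i at position k and column j at k+1."""
    length = len(profile)
    n_cols = len(profile[0])
    matrix = [[0.0] * n_cols for _ in range(n_cols)]
    for k in range(length - 1):
        for i in range(n_cols):
            for j in range(n_cols):
                matrix[i][j] += profile[k][i] * profile[k + 1][j]
    # Assumed normalization so proteins of different lengths are comparable.
    return [[value / (length - 1) for value in row] for row in matrix]


def profile_features(profile):
    """Concatenate monogram and flattened bigram (20 + 400 values for 20 columns)."""
    flat_bigram = [value for row in bigram(profile) for value in row]
    return monogram(profile) + flat_bigram


# Toy 3-residue profile with only 2 columns, for readability.
toy_profile = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(profile_features(toy_profile))
```

    For a real 20-column profile this yields a fixed-length vector of 420 values regardless of protein length, which is what makes it usable as SVM input.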

  6. Simulations of star-forming molecular clouds: observational predictions

    NASA Astrophysics Data System (ADS)

    Zhang, Shangjia; Hartmann, Lee; Kuznetsova, Aleksandra; Abelardo Zamora, Manuel

    2018-01-01

    Observations of protostellar molecular cloud cores can be used to test theories of star formation. However, observational results can be biased because of limited information: (a) only two spatial dimensions and one velocity dimension can be measured, and (b) cores generally are not spherically symmetric. We use numerical simulations of the formation and collapse of molecular gas with sink particles to make observational predictions. We use the radiative transfer code LIME to predict CO and NH3 channel maps. We find reasonable agreement with observed velocity structures and gradients but occasional large differences depending on viewing angle.

  7. Practical quantum mechanics-based fragment methods for predicting molecular crystal properties.

    PubMed

    Wen, Shuhao; Nanda, Kaushik; Huang, Yuanhang; Beran, Gregory J O

    2012-06-07

    Significant advances in fragment-based electronic structure methods have created a real alternative to force-field and density functional techniques in condensed-phase problems such as molecular crystals. This perspective article highlights some of the important challenges in modeling molecular crystals and discusses techniques for addressing them. First, we survey recent developments in fragment-based methods for molecular crystals. Second, we use examples from our own recent research on a fragment-based QM/MM method, the hybrid many-body interaction (HMBI) model, to analyze the physical requirements for a practical and effective molecular crystal model chemistry. We demonstrate that it is possible to predict molecular crystal lattice energies to within a couple of kJ/mol and lattice parameters to within a few percent in small-molecule crystals. Fragment methods provide a systematically improvable approach to making predictions in the condensed phase, which is critical to making robust predictions regarding the subtle energy differences found in molecular crystals.

  8. Feature maps driven no-reference image quality prediction of authentically distorted images

    NASA Astrophysics Data System (ADS)

    Ghadiyaram, Deepti; Bovik, Alan C.

    2015-03-01

    Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.

  9. Murine glomerulotropic monoclonal antibodies are highly oligoclonal and exhibit distinctive molecular features.

    PubMed

    Lefkowith, J B; Di Valerio, R; Norris, J; Glick, G D; Alexander, A L; Jackson, L; Gilkeson, G S

    1996-08-01

    We recently produced a panel of seven glomerular-binding mAbs from a nephritic MRL-lpr mouse that bind to histones/nucleosomes (group I) or DNA (group II) adherent to glomerular basement membrane. To elucidate the molecular basis of their binding and ontogeny, we sequenced their variable (V) regions, analyzed the apparent somatic mutations, and predicted their three-dimensional structures. There were two clonally related sets (3 of 4 in group I, 3 of 3 in group II) both of the VHJ1558 family, and one mAb of the VH 7183 family. V region somatic mutations within clonally related sets had little effect on glomerular binding and did not appear to be selected for based on glomerular binding. The VH regions were most homologous with those from autoantibodies to histones, DNA, or IgG (i.e., rheumatoid factors), the Vkappa regions, with those from autoantibodies to small nuclear ribonucleoproteins (snRNP). The VH regions also exhibited an unusual VD junction (in the group I clonally related set) and an overall high content of charged amino acids (arginine, aspartic acid) in complementarity-determining regions (CDRs), particularly in CDR3. Molecular modeling studies suggested that the Fv regions of these mAbs converge to form a flat, open surface with a net positive charge. The CDR arginines in group I mAbs appear to be located in Ag contact regions of the binding cleft. In sum, these data suggest that glomerulotropic mAbs are a highly restricted set of Abs with distinctive molecular features that may mediate their binding to glomeruli.

  10. Prediction of triple-charm molecular pentaquarks

    NASA Astrophysics Data System (ADS)

    Chen, Rui; Hosaka, Atsushi; Liu, Xiang

    2017-12-01

    In a one-boson-exchange model, we study molecular states of a double-charm baryon [$\Xi_{cc}(3621)$] and a charmed meson ($D$ and $D^{*}$). Our model indicates that there exist two possible triple-charm molecular pentaquarks, a $\Xi_{cc}D$ state with $I(J^{P})=0(1/2^{-})$, and a $\Xi_{cc}D^{*}$ state with $I(J^{P})=0(3/2^{-})$, and we do not find bound solutions for isotriplet states. In addition, we also extend our formula to explore $\Xi_{cc}\bar{B}^{(*)}$, $\Xi_{cc}\bar{D}^{(*)}$, and $\Xi_{cc}B^{(*)}$ systems and find more possible heavy flavor molecular pentaquarks, a $\Xi_{cc}\bar{B}$ state with $I(J^{P})=0(1/2^{-})$, a $\Xi_{cc}\bar{B}^{*}$ state with $I(J^{P})=0(3/2^{-})$, and $\Xi_{cc}\bar{D}^{*}/\Xi_{cc}B^{*}$ states with $I(J^{P})=0(1/2^{-})$. Experimental research for these predicted triple-charm molecular pentaquarks is encouraged.

  11. Prediction of subjective ratings of emotional pictures by EEG features

    NASA Astrophysics Data System (ADS)

    McFarland, Dennis J.; Parvaz, Muhammad A.; Sarnacki, William A.; Goldstein, Rita Z.; Wolpaw, Jonathan R.

    2017-02-01

    Objective. Emotion dysregulation is an important aspect of many psychiatric disorders. Brain-computer interface (BCI) technology could be a powerful new approach to facilitating therapeutic self-regulation of emotions. One possible BCI method would be to provide stimulus-specific feedback based on subject-specific electroencephalographic (EEG) responses to emotion-eliciting stimuli. Approach. To assess the feasibility of this approach, we studied the relationships between emotional valence/arousal and three EEG features: amplitude of alpha activity over frontal cortex; amplitude of theta activity over frontal midline cortex; and the late positive potential over central and posterior mid-line areas. For each feature, we evaluated its ability to predict emotional valence/arousal on both an individual and a group basis. Twenty healthy participants (9 men, 11 women; ages 22-68) rated each of 192 pictures from the IAPS collection in terms of valence and arousal twice (96 pictures on each of 4 d over 2 weeks). EEG was collected simultaneously and used to develop models based on canonical correlation to predict subject-specific single-trial ratings. Separate models were evaluated for the three EEG features: frontal alpha activity; frontal midline theta; and the late positive potential. In each case, these features were used to simultaneously predict both the normed ratings and the subject-specific ratings. Main results. Models using each of the three EEG features with data from individual subjects were generally successful at predicting subjective ratings on training data, but generalization to test data was less successful. Sparse models performed better than models without regularization. Significance. The results suggest that the frontal midline theta is a better candidate than frontal alpha activity or the late positive potential for use in a BCI-based paradigm designed to modify emotional reactions.

  12. Predicting features of breast cancer with gene expression patterns.

    PubMed

    Lu, Xuesong; Lu, Xin; Wang, Zhigang C; Iglehart, J Dirk; Zhang, Xuegong; Richardson, Andrea L

    2008-03-01

    Data from gene expression arrays hold an enormous amount of biological information. We sought to determine if global gene expression in primary breast cancers contained information about biologic, histologic, and anatomic features of the disease in individual patients. Microarray data from the tumors of 129 patients were analyzed for the ability to predict biomarkers [estrogen receptor (ER) and HER2], histologic features [grade and lymphatic-vascular invasion (LVI)], and stage parameters (tumor size and lymph node metastasis). Multiple statistical predictors were used and the prediction accuracy was determined by cross-validation error rate; multidimensional scaling (MDS) allowed visualization of the predicted states under study. Models built from gene expression data accurately predict ER and HER2 status, and divide tumor grade into high-grade and low-grade clusters; intermediate-grade tumors are not a unique group. In contrast, gene expression data is inaccurate at predicting tumor size, lymph node status or LVI. The best model for prediction of nodal status included tumor size, LVI status and pathologically defined tumor subtype (based on combinations of ER, HER2, and grade); the addition of microarray-based prediction to this model failed to improve the prediction accuracy. Global gene expression supports a binary division of ER, HER2, and grade, clearly separating tumors into two categories; intermediate values for these bio-indicators do not define intermediate tumor subsets. Results are consistent with a model of regional metastasis that depends on inherent biologic differences in metastatic propensity between breast cancer subtypes, upon which time and chance then operate.

  13. Quantitative prediction of drug side effects based on drug-related features.

    PubMed

    Niu, Yanqing; Zhang, Wen

    2017-09-01

    Unexpected side effects of drugs are a great concern in drug development, and identifying side effects is an important task. Recently, machine learning methods have been proposed to predict the presence or absence of side effects of interest for drugs, but it is difficult to make accurate predictions for all of them. In this paper, we transform the side effect profiles of drugs into quantitative scores by summing their side effects with weights. The quantitative scores may measure the dangers of drugs, and thus help to compare the risk of different drugs. Here, we attempt to predict the quantitative scores of drugs, namely the quantitative prediction. Specifically, we explore a variety of drug-related features and evaluate their discriminative power for the quantitative prediction. Then, we consider several feature combination strategies (direct combination, average scoring ensemble combination) to integrate three informative features: chemical substructures, targets, and treatment indications. Finally, the average scoring ensemble model, which performs better, is used as the final quantitative prediction model. Since the weights for side effects are empirical values, we randomly generate different weights in the simulation experiments. The experimental results show that the quantitative method is robust to different weights and produces satisfactory results. Although other state-of-the-art methods cannot make the quantitative prediction directly, their prediction results can be transformed into quantitative scores. By indirect comparison, the proposed method produces much better results than benchmark methods in the quantitative prediction. In conclusion, the proposed method is promising for the quantitative prediction of side effects, and may work cooperatively with existing state-of-the-art methods to reveal the dangers of drugs.
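
    The two ideas described above, a weighted sum over a drug's side effect profile and an average scoring ensemble across feature-specific models, can be sketched as follows. The side effect names, weights, and per-model predictions are hypothetical placeholders; the abstract does not specify the actual weights or base models.

```python
def quantitative_score(side_effects, weights):
    """Sum the weights of the side effects present in a drug's profile."""
    return sum(weight for effect, weight in weights.items() if effect in side_effects)


def average_scoring_ensemble(predicted_scores):
    """Average the scores predicted by the individual feature-specific models."""
    return sum(predicted_scores) / len(predicted_scores)


# Hypothetical empirical weights for three side effects.
weights = {"nausea": 1.0, "rash": 2.0, "arrhythmia": 5.0}
profile = {"nausea", "arrhythmia"}
print(quantitative_score(profile, weights))

# Hypothetical predictions from models trained on substructures, targets,
# and treatment indications, respectively.
print(average_scoring_ensemble([5.5, 6.5, 6.0]))
```

    Averaging treats the three feature types as equally informative; a weighted average would be the natural variation if one feature type were known to be more reliable.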

  14. Improving link prediction in complex networks by adaptively exploiting multiple structural features of networks

    NASA Astrophysics Data System (ADS)

    Ma, Chuang; Bao, Zhong-Kui; Zhang, Hai-Feng

    2017-10-01

    So far, many network-structure-based link prediction methods have been proposed. However, these methods highlight only one or two structural features of networks and are then used to predict missing links in different networks. The performance of these existing methods is not always satisfactory, since each network has its own underlying structural features. In this paper, by analyzing different real networks, we find that the structural features of different networks are remarkably different. In particular, even within the same network, the inner structural features are utterly different. Therefore, more structural features should be considered. However, owing to the remarkably different structural features, the contributions of different features are hard to specify in advance. Inspired by these facts, an adaptive fusion model for link prediction is proposed to incorporate multiple structural features. In the model, a logistic function combining multiple structural features is defined, and the weight of each feature in the logistic function is then adaptively determined by exploiting the known structure information. Last, we use the "learnt" logistic function to predict the connection probabilities of missing links. According to our experimental results, the performance of our adaptive fusion model is better than that of many similarity indices.
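
    The adaptive fusion idea can be sketched as a logistic regression over several similarity indices, with weights fitted to the observed network. The sketch below assumes three standard indices (common neighbors, Jaccard, resource allocation) and plain gradient descent; the abstract does not state which features or fitting procedure the authors used, and a real evaluation would hide a held-out set of edges rather than train on all of them.

```python
import math
from itertools import combinations


def build_graph(edges):
    """Undirected graph as a dict mapping each node to its neighbor set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(adj, u, v):
    """Bias term plus three structural similarity indices for the pair (u, v)."""
    common = adj[u] & adj[v]
    union = adj[u] | adj[v]
    jaccard = len(common) / len(union) if union else 0.0
    resource_allocation = sum(1.0 / len(adj[z]) for z in common)
    return [1.0, float(len(common)), jaccard, resource_allocation]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weights(adj, learning_rate=0.1, epochs=500):
    """Fit logistic weights: existing edges are positives, non-edges negatives."""
    pairs = list(combinations(sorted(adj), 2))
    X = [pair_features(adj, u, v) for u, v in pairs]
    y = [1.0 if v in adj[u] else 0.0 for u, v in pairs]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            error = sigmoid(sum(a * b for a, b in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += error * xj
        w = [wj - learning_rate * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def link_probability(adj, w, u, v):
    """Predicted connection probability for the pair (u, v)."""
    return sigmoid(sum(a * b for a, b in zip(w, pair_features(adj, u, v))))


graph = build_graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
weights = fit_weights(graph)
print(link_probability(graph, weights, 0, 3))
```

    Because the weights are refitted for each network, a feature that is uninformative in one network simply receives a small weight there, which is the sense in which the fusion is adaptive.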

  15. Beyond λmax Part 2: Predicting Molecular Color

    ERIC Educational Resources Information Center

    Williams, Darren L.; Flaherty, Thomas J.; Alnasleh, Bassam K.

    2009-01-01

    A concise roadmap for using computational chemistry programs (i.e., Gaussian 03W) to predict the color of a molecular species is presented. A color-predicting spreadsheet is available with the online material that uses transition wavelengths and peak-shape parameters to predict the visible absorbance spectrum, transmittance spectrum, chromaticity…

  16. Predicting discovery rates of genomic features.

    PubMed

    Gravel, Simon

    2014-06-01

    Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved. Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require ∼15% of the population. We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and subsampled 1000 Genomes Project data. Extrapolating based on the National Heart, Lung, and Blood Institute Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African Americans and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types. Copyright © 2014 by the Genetics Society of America.

  17. NetTurnP--neural network prediction of beta-turns by use of evolutionary information and predicted protein sequence features.

    PubMed

    Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl

    2010-11-30

    β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC=0.50, Qtotal=82.1%, sensitivity=75.6%, PPV=68.8% and AUC=0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17-0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
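
    The performance measures quoted above (MCC, Qtotal, sensitivity, PPV) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts in the example are hypothetical and are not the BT426 results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard two-class metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Fraction of all residues predicted correctly (Qtotal).
        "q_total": (tp + tn) / total,
        # Fraction of true beta-turn residues that are found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of predicted beta-turn residues that are correct.
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
        # Matthews correlation coefficient; defined as 0 when undefined.
        "mcc": (tp * tn - fp * fn) / denominator if denominator else 0.0,
    }


# Hypothetical counts for illustration only.
print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

    MCC is often preferred for β-turn prediction because the classes are unbalanced (turns make up roughly a quarter of residues), so accuracy alone can look high for a predictor that rarely predicts turns.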

  18. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    The subcellular location of a protein is valuable information for determining its function, screening for drug candidates, vaccine design, annotation of gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition, is effectively used with multiple physicochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes predictions with an accuracy of 100, 82.47 and 88.81 for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence will bring out the underlying property distribution in greater detail to enhance the prediction accuracy.
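
    The abstract mentions four SVM modules combined by a weighted voting system but does not give the weights or the module names. A minimal sketch of weighted voting, with hypothetical module names, weights and labels, could look like this:

```python
from collections import defaultdict


def weighted_vote(module_predictions, module_weights):
    """Return the label with the largest summed weight.

    module_predictions: dict mapping module name -> predicted label
    module_weights:     dict mapping module name -> non-negative weight
    Ties are broken alphabetically so the result is deterministic.
    """
    scores = defaultdict(float)
    for module, label in module_predictions.items():
        scores[label] += module_weights[module]
    return max(sorted(scores), key=lambda label: scores[label])


if __name__ == "__main__":
    # Hypothetical outputs of four classifier modules for one protein.
    predictions = {
        "atomic_composition": "nucleus",
        "physicochemical": "cytoplasm",
        "aa_composition": "nucleus",
        "sequence_similarity": "cytoplasm",
    }
    # Hypothetical weights, e.g. proportional to each module's validation accuracy.
    weights = {
        "atomic_composition": 0.2,
        "physicochemical": 0.3,
        "aa_composition": 0.1,
        "sequence_similarity": 0.4,
    }
    print(weighted_vote(predictions, weights))
```

    In this example two modules vote for each label, so an unweighted majority would tie; the weights decide in favour of the label backed by the more trusted modules.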

  19. Molecular biological features of male germ cell differentiation

    PubMed Central

    HIROSE, MIKA; TOKUHIRO, KEIZO; TAINAKA, HITOSHI; MIYAGAWA, YASUSHI; TSUJIMURA, AKIRA; OKUYAMA, AKIHIKO; NISHIMUNE, YOSHITAKE

    2007-01-01

    Somatic cell differentiation is required throughout the life of a multicellular organism to maintain homeostasis. In contrast, germ cells have only one specific function: to preserve the species by conveying the parental genes to the next generation. Recent studies of the development and molecular biology of the male germ cell have identified many genes, or isoforms, that are specifically expressed in the male germ cell. In the present review, we consider the unique features of male germ cell differentiation. (Reprod Med Biol 2007; 6: 1–9) PMID:29699260

  20. Ab initio RNA folding by discrete molecular dynamics: From structure prediction to folding mechanisms

    PubMed Central

    Ding, Feng; Sharma, Shantanu; Chalasani, Poornima; Demidov, Vadim V.; Broude, Natalia E.; Dokholyan, Nikolay V.

    2008-01-01

    RNA molecules with novel functions have revived interest in the accurate prediction of RNA three-dimensional (3D) structure and folding dynamics. However, existing methods are inefficient in automated 3D structure prediction. Here, we report a robust computational approach for rapid folding of RNA molecules. We develop a simplified RNA model for discrete molecular dynamics (DMD) simulations, incorporating base-pairing and base-stacking interactions. We demonstrate correct folding of 150 structurally diverse RNA sequences. The majority of DMD-predicted 3D structures have <4 Å deviations from experimental structures. The secondary structures corresponding to the predicted 3D structures consist of 94% native base-pair interactions. Folding thermodynamics and kinetics of tRNAPhe, pseudoknots, and mRNA fragments in DMD simulations are in agreement with previous experimental findings. Folding of RNA molecules features transient, non-native conformations, suggesting non-hierarchical RNA folding. Our method allows rapid conformational sampling of RNA folding, with computational time increasing linearly with RNA length. We envision this approach as a promising tool for RNA structural and functional analyses. PMID:18456842

  1. [Diagnosis, prognosis, and prediction of non-small cell lung cancer. Importance of morphology, immunohistochemistry and molecular pathology].

    PubMed

    Warth, A

    2015-11-01

    Tumor diagnostics are based on histomorphology, immunohistochemistry and molecular pathological analysis of mutations, translocations and amplifications which are of diagnostic, prognostic and/or predictive value. In recent decades only histomorphology was used to classify lung cancer as either small cell (SCLC) or non-small cell lung cancer (NSCLC), although NSCLC was further subdivided into different entities; however, as no specific therapy options were available, classification of specific subtypes was not clinically meaningful. This fundamentally changed with the discovery of specific molecular alterations in adenocarcinoma (ADC), e.g. mutations in KRAS, EGFR and BRAF or translocations of the ALK and ROS1 gene loci, which now form the basis of targeted therapies and have led to a significantly improved patient outcome. The diagnostic, prognostic and predictive value of imaging, morphological, immunohistochemical and molecular characteristics as well as their interaction were systematically assessed in a large cohort with available clinical data including patient survival. Specific and sensitive diagnostic markers and marker panels were defined and diagnostic test algorithms for predictive biomarker assessment were optimized. It was demonstrated that the semi-quantitative assessment of ADC growth patterns is a stage-independent predictor of survival and is reproducibly applicable in the routine setting. Specific histomorphological characteristics correlated with computed tomography (CT) imaging features and thus allowed an improved interdisciplinary classification, especially in the preoperative or palliative setting. Moreover, specific molecular characteristics, for example BRAF mutations and the proliferation index (Ki-67), were identified as clinically relevant prognosticators. Comprehensive clinical, morphological, immunohistochemical and molecular assessment of NSCLCs allows an optimized patient stratification. Respective algorithms now form the backbone of the 2015

  2. Local-feature analysis for automated coarse-graining of bulk-polymer molecular dynamics simulations.

    PubMed

    Xue, Y; Ludovice, P J; Grover, M A

    2012-12-01

    A method for automated coarse-graining of bulk polymers is presented, using the data-mining tool of local feature analysis. Most existing methods for polymer coarse-graining define superatoms based on their covalent bonding topology along the polymer backbone, but here superatoms are defined based only on their correlated motions, as observed in molecular dynamics simulations. Correlated atomic motions are identified in the simulation data using local feature analysis, between atoms in the same or in different polymer chains. Groups of highly correlated atoms constitute the superatoms in the coarse-graining scheme, and the positions of their seed coordinates are then projected forward in time. Based on only the seed positions, local feature analysis enables the full reconstruction of all atomic positions. This reconstruction suggests an iterative scheme to reduce the computational cost of the simulations: initialize another short molecular dynamics simulation, identify new superatoms, and again project forward in time.

  3. Category-based predictions: influence of uncertainty and feature associations.

    PubMed

    Ross, B H; Murphy, G L

    1996-05-01

    Four experiments examined how people make inductive inferences using categories. Subjects read stories in which 2 categories were mentioned as possible identities of an object. The less likely category was varied to determine if people were using it, as well as the most likely category, in making predictions about the object. Experiment 1 showed that even when categorization uncertainty was emphasized, subjects used only 1 category as the basis for their prediction. Experiments 2-4 examined whether people would use multiple categories for making predictions when the feature to be predicted was associated to the less likely category. Multiple categories were used in this case, but only in limited circumstances; furthermore, using multiple categories in 1 prediction did not cause subjects to use them for subsequent predictions. The results increase the understanding of how categories are used in inductive inference.

  4. Nonstationary time series prediction combined with slow feature analysis

    NASA Astrophysics Data System (ADS)

    Wang, G.; Chen, X.

    2015-07-01

    Almost all climate time series have some degree of nonstationarity due to external driving forces perturbing the observed system. Therefore, these external driving forces should be taken into account when constructing the climate dynamics. This paper presents a new technique of obtaining the driving forces of a time series from the slow feature analysis (SFA) approach, and then introduces them into a predictive model to predict nonstationary time series. The basic theory of the technique is to consider the driving forces as state variables and to incorporate them into the predictive model. Experiments using a modified logistic time series and winter ozone data in Arosa, Switzerland, were conducted to test the model. The results showed improved prediction skills.
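
    The core idea stated above, treating the driving force as an extra state variable of the predictive model, can be illustrated on a toy series. The sketch below does not implement SFA itself; it assumes the driving force is already known, and the particular "modified logistic" map (a logistic map whose growth parameter is slowly modulated) and all parameter values are assumptions chosen for illustration, not the authors' setup.

```python
import math


def driving_force(t, period=500.0):
    """Hypothetical slow external forcing (a low-frequency sine wave)."""
    return math.sin(2.0 * math.pi * t / period)


def simulate(n, x0=0.3, r_base=3.8, amplitude=0.1):
    """Logistic map x[t+1] = r[t] * x[t] * (1 - x[t]) with a slowly varying r[t]."""
    xs = [x0]
    for t in range(n - 1):
        r = r_base + amplitude * driving_force(t)
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def mean_one_step_error(xs, r_base=3.8, amplitude=0.1, use_force=True):
    """Mean absolute one-step-ahead error of a logistic-map predictor.

    With use_force=True the predictor receives the driving force as an
    additional input; with use_force=False it assumes a stationary system.
    """
    errors = []
    for t in range(len(xs) - 1):
        r = r_base + (amplitude * driving_force(t) if use_force else 0.0)
        prediction = r * xs[t] * (1.0 - xs[t])
        errors.append(abs(prediction - xs[t + 1]))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    series = simulate(2000)
    print("error with driving force   :", mean_one_step_error(series, use_force=True))
    print("error without driving force:", mean_one_step_error(series, use_force=False))
```

    In practice the driving force is not known in advance; in the approach described in the record it would be estimated from the series by SFA before being fed to the predictor.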

  5. Predicting Response to Neoadjuvant Chemoradiotherapy in Esophageal Cancer with Textural Features Derived from Pretreatment 18F-FDG PET/CT Imaging.

    PubMed

    Beukinga, Roelof J; Hulshoff, Jan B; van Dijk, Lisanne V; Muijs, Christina T; Burgerhof, Johannes G M; Kats-Ugurlu, Gursah; Slart, Riemer H J A; Slump, Cornelis H; Mul, Véronique E M; Plukker, John Th M

    2017-05-01

    Adequate prediction of tumor response to neoadjuvant chemoradiotherapy (nCRT) in esophageal cancer (EC) patients is important for more personalized treatment. The current best clinical method to predict pathologic complete response is SUVmax in 18F-FDG PET/CT imaging. To improve the prediction of response, we constructed a model to predict complete response to nCRT in EC based on pretreatment clinical parameters and 18F-FDG PET/CT-derived textural features. Methods: From a prospectively maintained single-institution database, we reviewed 97 consecutive patients with locally advanced EC and a pretreatment 18F-FDG PET/CT scan between 2009 and 2015. All patients were treated with nCRT (carboplatin/paclitaxel/41.4 Gy) followed by esophagectomy. We analyzed clinical, geometric, and pretreatment textural features extracted from both 18F-FDG PET and CT. The current most accurate prediction model with SUVmax as a predictor variable was compared with 6 different response prediction models constructed using least absolute shrinkage and selection operator regularized logistic regression. Internal validation was performed to estimate the models' performance. Pathologic response was defined as complete versus incomplete response (Mandard tumor regression grade system 1 vs. 2-5). Results: Pathologic examination revealed 19 (19.6%) complete and 78 (80.4%) incomplete responders. Least absolute shrinkage and selection operator regularization selected the clinical parameters histologic type and clinical T stage, the 18F-FDG PET-derived textural feature long run low gray level emphasis, and the CT-derived textural feature run percentage. Introducing these variables to a logistic regression analysis showed areas under the receiver-operating-characteristic curve (AUCs) of 0.78 compared with 0.58 in the SUVmax model. The discrimination slopes were 0.17 compared with 0.01, respectively. After internal validation, the AUCs decreased to 0.74 and 0.54, respectively. Conclusion

  6. Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

    PubMed

    Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu

    2012-12-01

    Membrane proteins are encoded by ~30% of the genome and play important roles in living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desirable to predict a membrane protein's subcellular location from its primary sequence, considering the extreme difficulties of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features will simultaneously increase the information redundancy, which could, in turn, deteriorate the final prediction accuracy. That is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. All the experimental results through different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems. The

  7. Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.

    PubMed

    Gaspar-Cunha, A; Recio, G; Costa, L; Estébanez, C

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood of bankruptcy. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of using the self-adaptation of the classifier.
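
    A multiobjective approach of this kind compares candidate feature subsets by Pareto dominance rather than by a single score. The sketch below covers only that comparison step for the two objectives named in the abstract (fewer features, higher accuracy); the evolutionary search and the self-adaptation of classifier parameters are not shown, and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if candidate a Pareto-dominates candidate b.

    Each candidate is a tuple (n_features, accuracy): n_features is
    minimised and accuracy is maximised.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    # Hypothetical (number of features, cross-validated accuracy) pairs.
    population = [(10, 0.90), (5, 0.88), (5, 0.85), (20, 0.90), (3, 0.70)]
    print(pareto_front(population))
    # -> [(10, 0.9), (5, 0.88), (3, 0.7)]
```

    The front gives the analyst a set of trade-offs to choose from, for example a compact model with slightly lower accuracy versus a larger one with the best accuracy.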

  8. Perceptual quality prediction on authentically distorted images using a bag of features approach

    PubMed Central

    Ghadiyaram, Deepti; Bovik, Alan C.

    2017-01-01

    Current top-performing blind perceptual image quality prediction models are generally trained on legacy databases of human quality opinion scores on synthetically distorted images. Therefore, they learn image features that effectively predict human visual quality judgments of inauthentic and usually isolated (single) distortions. However, real-world images usually contain complex composite mixtures of multiple distortions. We study the perceptually relevant natural scene statistics of such authentically distorted images in different color spaces and transform domains. We propose a “bag of feature maps” approach that avoids assumptions about the type of distortion(s) contained in an image and instead focuses on capturing consistencies—or departures therefrom—of the statistics of real-world images. Using a large database of authentically distorted images, human opinions of them, and bags of features computed on them, we train a regressor to conduct image quality prediction. We demonstrate the competence of the features toward improving automatic perceptual quality prediction by testing a learned algorithm using them on a benchmark legacy database as well as on a newly introduced distortion-realistic resource called the LIVE In the Wild Image Quality Challenge Database. We extensively evaluate the perceptual quality prediction model and algorithm and show that it is able to achieve good-quality prediction power that is better than other leading models. PMID:28129417

  9. Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection.

    PubMed

    Gao, Yu-Fei; Li, Bi-Qing; Cai, Yu-Dong; Feng, Kai-Yan; Li, Zhan-Dong; Jiang, Yang

    2013-01-27

    Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, the prediction accuracy remains relatively low with high false positives. In this work, we developed a novel predictor based on the Random Forest algorithm (RF) aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict active sites of enzymes and achieved an overall accuracy of 0.885687 and MCC of 0.689226 on an independent test dataset. Feature analysis showed that every category of the features except disorder contributed to the identification of active sites. It was also shown via the site-specific feature analysis that the features derived from the active site itself contributed most to the active site determination. Our prediction method may become a useful tool for identifying the active sites and the key features identified by the paper may provide valuable insights into the mechanism of catalysis.
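
    The mRMR step ranks features so that each newly added feature is relevant to the class label but not redundant with the features already chosen. The sketch below uses the common difference form of the criterion (relevance minus mean redundancy) on precomputed scores; the feature names and the numbers are hypothetical, and in the real method these scores would typically be mutual-information estimates.

```python
def mrmr_rank(relevance, redundancy):
    """Greedy mRMR ranking (difference criterion).

    relevance:  dict feature -> relevance to the target
    redundancy: dict frozenset({f, g}) -> redundancy between features f and g
    Returns all features ordered from first selected to last selected.
    """
    remaining = set(relevance)
    selected = []
    while remaining:
        def score(feature):
            if not selected:
                return relevance[feature]
            mean_redundancy = sum(
                redundancy[frozenset((feature, s))] for s in selected
            ) / len(selected)
            return relevance[feature] - mean_redundancy

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical scores for three feature categories.
    relevance = {
        "conservation": 0.9,
        "accessibility": 0.8,
        "secondary_structure": 0.5,
    }
    redundancy = {
        frozenset(("conservation", "accessibility")): 0.7,
        frozenset(("conservation", "secondary_structure")): 0.1,
        frozenset(("accessibility", "secondary_structure")): 0.2,
    }
    print(mrmr_rank(relevance, redundancy))
    # -> ['conservation', 'secondary_structure', 'accessibility']
```

    Incremental feature selection would then train a classifier on the top 1, top 2, ... features of this ranking and keep the prefix with the best validation score.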

  10. Dynamics of Molecular Emission Features from Nanosecond, Femtosecond Laser and Filament Ablation Plasmas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harilal, Sivanandan S.; Yeak, J.; Brumfield, Brian E.

    2016-06-15

    The evolutionary paths of molecular species and nanoparticles in laser ablation plumes are not well understood due to the complexity of numerous physical processes that occur simultaneously in a transient laser-plasma system. It is well known that the emission features of ions, atoms, molecules and nanoparticles in a laser ablation plume strongly depend on the laser irradiation conditions. In this letter we report the temporal emission features of AlO molecules in plasmas generated using a nanosecond laser, a femtosecond laser and filaments generated from a femtosecond laser. Our results show that, at a fixed laser energy, the persistence of AlO is highest in ns laser plasmas and lowest in filament laser plasmas, while molecular species are formed at early times in both ultrashort pulse (fs and filament) generated plasmas. Analysis of the AlO emission band features shows that the vibrational temperature of AlO decays rapidly in filament assisted laser ablation plumes.

  11. Molecular factor computing for predictive spectroscopy.

    PubMed

    Dai, Bin; Urbas, Aaron; Douglas, Craig C; Lodder, Robert A

    2007-08-01

    The concept of molecular factor computing (MFC)-based predictive spectroscopy was demonstrated here with quantitative analysis of ethanol-in-water mixtures in a MFC-based prototype instrument. Molecular computing of vectors for transformation matrices enabled spectra to be represented in a desired coordinate system. New coordinate systems were selected to reduce the dimensionality of the spectral hyperspace and simplify the mechanical/electrical/computational construction of a new MFC spectrometer employing transmission MFC filters. A library search algorithm was developed to calculate the chemical constituents of the MFC filters. The prototype instrument was used to collect data from 39 ethanol-in-water mixtures (range 0-14%). For each sample, four different voltage outputs from the detector (forming two factor scores) were measured by using four different MFC filters. Twenty samples were used to calibrate the instrument and build a multivariate linear regression prediction model, and the remaining samples were used to validate the predictive ability of the model. In engineering simulations, four MFC filters gave an adequate calibration model (r2 = 0.995, RMSEC = 0.229%, RMSECV = 0.339%, p = 0.05 by f test). This result is slightly better than a corresponding PCR calibration model based on corrected transmission spectra (r2 = 0.993, RMSEC = 0.359%, RMSECV = 0.551%, p = 0.05 by f test). The first actual MFC prototype gave an RMSECV = 0.735%. MFC was a viable alternative to conventional spectrometry with the potential to be more simply implemented and more rapid and accurate.
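
    The calibration and validation figures above (RMSEC, RMSECV) are root-mean-square errors of a regression model. The sketch below is a simplified, single-predictor version: it fits a least-squares line to calibration samples and reports the plain RMSE on calibration and held-out samples. The study used a multivariate model on factor scores; the data here are hypothetical, and degrees-of-freedom corrections that some RMSEC definitions apply are not included.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(x, y, slope, intercept):
    """Root-mean-square error of the fitted line on the given samples."""
    squared = [(slope * xi + intercept - yi) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical detector readings (x) and ethanol concentrations in % (y).
    calib_x = [0.10, 0.21, 0.29, 0.42, 0.50, 0.61]
    calib_y = [1.0, 3.1, 4.9, 7.2, 9.0, 11.1]
    valid_x = [0.15, 0.35, 0.55]
    valid_y = [2.0, 6.1, 9.8]

    slope, intercept = fit_line(calib_x, calib_y)
    print("calibration RMSE:", round(rmse(calib_x, calib_y, slope, intercept), 3))
    print("validation RMSE :", round(rmse(valid_x, valid_y, slope, intercept), 3))
```

    Reporting the error on samples that were not used for fitting, as the record does, guards against an over-optimistic calibration figure.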

  12. Molecular dissection of colorectal cancer in pre-clinical models identifies biomarkers predicting sensitivity to EGFR inhibitors

    PubMed Central

    Schütte, Moritz; Risch, Thomas; Abdavi-Azar, Nilofar; Boehnke, Karsten; Schumacher, Dirk; Keil, Marlen; Yildiriman, Reha; Jandrasits, Christine; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Worth, Catherine L.; Schweiger, Caroline; Liebs, Sandra; Lange, Martin; Warnatz, Hans-Jörg; Butcher, Lee M.; Barrett, James E.; Sultan, Marc; Wierling, Christoph; Golob-Schwarzl, Nicole; Lax, Sigurd; Uranitsch, Stefan; Becker, Michael; Welte, Yvonne; Regan, Joseph Lewis; Silvestrov, Maxine; Kehler, Inge; Fusi, Alberto; Kessler, Thomas; Herwig, Ralf; Landegren, Ulf; Wienke, Dirk; Nilsson, Mats; Velasco, Juan A.; Garin-Chesa, Pilar; Reinhard, Christoph; Beck, Stephan; Schäfer, Reinhold; Regenbrecht, Christian R. A.; Henderson, David; Lange, Bodo; Haybaeck, Johannes; Keilholz, Ulrich; Hoffmann, Jens; Lehrach, Hans; Yaspo, Marie-Laure

    2017-01-01

    Colorectal carcinoma represents a heterogeneous entity, with only a fraction of the tumours responding to available therapies, requiring a better molecular understanding of the disease in precision oncology. To address this challenge, the OncoTrack consortium recruited 106 CRC patients (stages I–IV) and developed a pre-clinical platform generating a compendium of drug sensitivity data totalling >4,000 assays testing 16 clinical drugs on patient-derived in vivo and in vitro models. This large biobank of 106 tumours, 35 organoids and 59 xenografts, with extensive omics data comparing donor tumours and derived models provides a resource for advancing our understanding of CRC. Models recapitulate many of the genetic and transcriptomic features of the donors, but defined less complex molecular sub-groups because of the loss of human stroma. Linking molecular profiles with drug sensitivity patterns identifies novel biomarkers, including a signature outperforming RAS/RAF mutations in predicting sensitivity to the EGFR inhibitor cetuximab. PMID:28186126

  13. Nonstationary time series prediction combined with slow feature analysis

    NASA Astrophysics Data System (ADS)

    Wang, G.; Chen, X.

    2015-01-01

    Almost all climate time series have some degree of nonstationarity due to perturbations of the observed system by external driving forces. Therefore, these external driving forces should be taken into account when reconstructing the climate dynamics. This paper presents a new technique that obtains the driving force of a time series using the Slow Feature Analysis (SFA) approach and then introduces the driving force into a predictive model to predict non-stationary time series. In essence, the main idea of the technique is to consider the driving forces as state variables and incorporate them into the prediction model. To test the method, experiments using a modified logistic time series and winter ozone data in Arosa, Switzerland, were conducted. The results showed improved and effective prediction skill.

  14. Molecular effective coverage surface area of optical clearing agents for predicting optical clearing potential

    NASA Astrophysics Data System (ADS)

    Feng, Wei; Ma, Ning; Zhu, Dan

    2015-03-01

    Improved methods for predicting the performance of optical clearing agents have an important impact on tissue optical clearing techniques. Molecular dynamics simulation is one of the most convincing and simplest approaches for predicting the optical clearing potential (OCP) of agents, by analyzing the hydrogen bonds, hydrogen bridges and hydrogen bridge types formed between agents and collagen. However, these analysis methods still have problems, for example in the analysis of cyclic molecules, because of molecular conformation. In this study, a molecular effective coverage surface area based on molecular dynamics simulation was proposed to predict the potential of optical clearing agents. Several typical cyclic molecules (fructose and glucose) and chain molecules (sorbitol and xylitol) were analyzed by calculating their molecular effective coverage surface area, hydrogen bonds, hydrogen bridges and hydrogen bridge types. To verify this analysis method, the optical clearing efficacy of in vitro skin samples was measured with a 1951 USAF resolution test target after 25 min of immersion in solutions of fructose, glucose, sorbitol and xylitol at a concentration of 3.5 M. The experimental results agree with the predictions from the molecular effective coverage surface area. A further comparison with the other parameters shows that the molecular effective coverage surface area performs better in predicting the OCP of agents.

  15. Mucosal melanoma: correlation of clinicopathologic, prognostic, and molecular features.

    PubMed

    Gru, Alejandro A; Becker, Nils; Dehner, Louis P; Pfeifer, John D

    2014-08-01

    Although the presence of the t(12;22)(q13;q12) translocation (the defining molecular feature of malignant melanoma of soft parts/clear cell sarcoma) in cutaneous melanoma has been investigated, no large-scale studies have been performed among mucosal melanoma (MucM). In this study we assessed the prevalence of the EWSR1 rearrangement in primary MucM, and analyzed gross and microscopic features with their potential impact on diagnosis and prognosis. Overall, 132 specimens from 84 patients were included. A total of 55 cases had an intramucosal component. Survival of MucMs of the head and neck was associated with two independent factors: size and histology. Tumors more than 3 cm in greatest dimension had an average survival of 12.75 months; those 3 cm or less had an average survival of 38.3 months (P=0.035). Purely epithelioid tumors had a worse average survival of 16.8 months (P=0.028). A cut-off value of 1 mm for Breslow depth provided a statistically significant difference in survival at both 3 and 5 years (P=0.02) by multivariate analysis in the gynecologic tract. At the molecular level three cases had an EWSR1 rearrangement by fluorescent in-situ hybridization, but only one with an intramucosal component. None of the 58 cases tested by PCR showed the presence of the EWSR1 rearrangement. With the exception of vulvar melanomas, the prognosis of mucosal-associated melanomas was poor and there was a suggestion that spindle morphology may be more favorable. Our study also showed that the EWSR1 rearrangement was very uncommon among MucM. Though 'clear cell sarcoma' is embedded in the sarcoma literature, the synonym 'melanoma of soft parts' has considerable justification in light of our evolving understanding of the molecular genetics in the family of malignant melanomas.

  16. Category labels versus feature labels: category labels polarize inferential predictions.

    PubMed

    Yamauchi, Takashi; Yu, Na-Yung

    2008-04-01

    What makes category labels different from feature labels in predictive inference? This study suggests that category labels tend to make inductive reasoning polarized and homogeneous. In two experiments, participants were shown two schematic pictures of insects side by side and predicted the value of a hidden feature of one insect on the basis of the other insect. Arbitrary verbal labels were shown above the two pictures, and the meanings of the labels were manipulated in the instructions. In one condition, the labels represented the category membership of the insects, and in the other conditions, the same labels represented attributes of the insects. When the labels represented category membership, participants' responses became substantially polarized and homogeneous, indicating that the mere reference to category membership can modify reasoning processes.

  17. OPTIC NERVE INFILTRATION BY RETINOBLASTOMA: Predictive Clinical Features and Outcome.

    PubMed

    Kaliki, Swathi; Tahiliani, Prerana; Mishra, Dilip K; Srinivasan, Visweswaran; Ali, Mohammed Hasnat; Reddy, Vijay Anand P

    2016-06-01

    To identify the clinical features predictive of any optic nerve infiltration and postlaminar optic nerve infiltration by retinoblastoma on histopathology and to report the outcome (metastasis and death) in these patients. Retrospective study. Of the 403 patients who underwent primary enucleation for retinoblastoma, 196 patients had optic nerve tumor infiltration (Group 1) and 207 patients had no evidence of optic nerve tumor infiltration (Group 2). Group 1 included patients with prelaminar (n = 47; 24%), laminar (n = 74; 38%), and postlaminar tumor infiltration with or without involving optic nerve transection (n = 74; 38%). Comparing Group 1 and Group 2, the patients in Group 1 had prolonged duration of symptoms (>6 months) (16% vs. 8%; P = 0.02) and were associated with no vision at presentation (23% vs. 10%; P = 0.01), higher rates of secondary glaucoma (42% vs. 12%; P < 0.0001), iris neovascularization (39% vs. 23%; P < 0.001), and larger tumors (mean tumor thickness, 12.8 mm vs. 12 mm; P = 0.0001). There was a higher prevalence of metastasis in Group 1 than in Group 2 (4% vs. 0%; P = 0.006). On multivariate analysis, clinical features predictive of any optic nerve tumor infiltration included secondary glaucoma (hazard ratio = 5.38; P < 0.001) and those predictive of postlaminar optic nerve tumor infiltration included iris neovascularization (hazard ratio = 2.66; P = 0.001) and secondary glaucoma (hazard ratio = 3.13; P < 0.001). In this study, clinical features predictive of any optic nerve tumor infiltration included secondary glaucoma and those predictive of postlaminar optic nerve tumor infiltration included iris neovascularization and secondary glaucoma. Despite adjuvant treatment in those with postlaminar optic nerve tumor infiltration, metastasis occurred in 8% of patients.

  18. Actigraphy features for predicting mobility disability in older adults

    USDA-ARS's Scientific Manuscript database

    Actigraphy has attracted much attention for assessing physical activity in the past decade. Many algorithms have been developed to automate the analysis process, but none has targeted a general model to discover related features for detecting or predicting mobility function, or more specifically, mo...

  19. Molecular structures of carotenoids as predicted by MNDO-AM1 molecular orbital calculations

    NASA Astrophysics Data System (ADS)

    Hashimoto, Hideki; Yoda, Takeshi; Kobayashi, Takayoshi; Young, Andrew J.

    2002-02-01

    Semi-empirical molecular orbital calculations using AM1 Hamiltonian (MNDO-AM1 method) were performed for a number of biologically important carotenoid molecules, namely all-trans-β-carotene, all-trans-zeaxanthin, and all-trans-violaxanthin (found in higher plants and algae) together with all-trans-canthaxanthin, all-trans-astaxanthin, and all-trans-tunaxanthin in order to predict their stable structures. The molecular structures of all-trans-β-carotene, all-trans-canthaxanthin, and all-trans-astaxanthin predicted based on molecular orbital calculations were compared with those determined by X-ray crystallography. Predicted bond lengths, bond angles, and dihedral angles showed an excellent agreement with those determined experimentally, a fact that validated the present theoretical calculations. Comparison of the bond lengths, bond angles and dihedral angles of the most stable conformer among all the carotenoid molecules showed that the displacements are localized around the substituent groups and hence around the cyclohexene rings. The most stable conformers of all-trans-zeaxanthin and all-trans-violaxanthin gave rise to a torsion angle around the C6-C7 bond to be ±48.7 and -84.8°, respectively. This difference is a key factor in relation to the biological function of these two carotenoids in plants and algae (the xanthophyll cycle). Further analyses by calculating the atomic charges and using enpartment calculations (division of bond energies between component atoms) were performed to ascribe the cause of the different observed torsion angles.

  20. BDDCS Class Prediction for New Molecular Entities

    PubMed Central

    Broccatelli, Fabio; Cruciani, Gabriele; Benet, Leslie Z.; Oprea, Tudor I.

    2012-01-01

    The Biopharmaceutics Drug Disposition Classification System (BDDCS) was successfully employed for predicting drug-drug interactions (DDIs) with respect to drug metabolizing enzymes (DMEs), drug transporters and their interplay. The major assumption of BDDCS is that the extent of metabolism (EoM) predicts high versus low intestinal permeability rate, and vice versa, at least when uptake transporters or paracellular transport are not involved. We recently published a collection of over 900 marketed drugs classified for BDDCS. We suggest that a reliable model for predicting BDDCS class, integrated with in vitro assays, could anticipate disposition and potential DDIs of new molecular entities (NMEs). Here we describe a computational procedure for predicting BDDCS class from molecular structures. The model was trained on a set of 300 oral drugs, and validated on an external set of 379 oral drugs, using 17 descriptors calculated or derived from the VolSurf+ software. For each molecule, a probability of BDDCS class membership was given, based on predicted EoM, FDA solubility (FDAS) and their confidence scores. The accuracy in predicting FDAS was 78% in training and 77% in validation, while for EoM prediction the accuracy was 82% in training and 79% in external validation. The actual BDDCS class corresponded to the highest ranked calculated class for 55% of the validation molecules, and it was within the top two ranked more than 92% of the times. The unbalanced stratification of the dataset didn’t affect the prediction, which showed highest accuracy in predicting classes 2 and 3 with respect to the most populated class 1. For class 4 drugs a general lack of predictability was observed. A linear discriminant analysis (LDA) confirmed the degree of accuracy for the prediction of the different BDDCS classes is tied to the structure of the dataset. This model could routinely be used in early drug discovery to prioritize in vitro tests for NMEs (e.g., affinity to transporters

  1. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  2. Many-Body Descriptors for Predicting Molecular Properties with Machine Learning: Analysis of Pairwise and Three-Body Interactions in Molecules.

    PubMed

    Pronobis, Wiktor; Tkatchenko, Alexandre; Müller, Klaus-Robert

    2018-06-12

    Machine learning (ML) based prediction of molecular properties across chemical compound space is an important and alternative approach to efficiently estimate the solutions of highly complex many-electron problems in chemistry and physics. Statistical methods represent molecules as descriptors that should encode molecular symmetries and interactions between atoms. Many such descriptors have been proposed; all of them have advantages and limitations. Here, we propose a set of general two-body and three-body interaction descriptors which are invariant to translation, rotation, and atomic indexing. By adapting the successfully used kernel ridge regression methods of machine learning, we evaluate our descriptors on predicting several properties of small organic molecules calculated using density-functional theory. We use two data sets. The GDB-7 set contains 6868 molecules with up to 7 heavy atoms of type CNO. The GDB-9 set is composed of 131722 molecules with up to 9 heavy atoms containing CNO. When trained on 5000 random molecules, our best model achieves an accuracy of 0.8 kcal/mol (on the remaining 1868 molecules of GDB-7) and 1.5 kcal/mol (on the remaining 126722 molecules of GDB-9) respectively. Applying a linear regression model on our novel many-body descriptors performs almost equal to a nonlinear kernelized model. Linear models are readily interpretable: a feature importance ranking measure helps to obtain qualitative and quantitative insights on the importance of two- and three-body molecular interactions for predicting molecular properties computed with quantum-mechanical methods.
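
    The invariance requirement described in this record can be illustrated with a minimal sketch. It is not the descriptor set proposed by the authors: it assumes a much simpler two-body feature (the sum of inverse interatomic distances for each unordered pair of element types), and the function name two_body_descriptor and the water-like geometry are hypothetical. Because only interatomic distances enter, such a feature is unchanged by translation, rotation, and atom reindexing; the example checks translation and reindexing.

```python
import math
from itertools import combinations

def two_body_descriptor(symbols, coords):
    """Sum 1/r over atom pairs, grouped by unordered element pair.

    Illustrative stand-in only, not the published descriptor.
    """
    features = {}
    for i, j in combinations(range(len(symbols)), 2):
        key = tuple(sorted((symbols[i], symbols[j])))
        r = math.dist(coords[i], coords[j])
        features[key] = features.get(key, 0.0) + 1.0 / r
    return features

# Hypothetical water-like geometry (angstroms).
symbols = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Translate every atom, then reverse the atom order.
shifted = [(x + 1.5, y - 2.0, z + 0.5) for x, y, z in coords]
reference = two_body_descriptor(symbols, coords)
permuted = two_body_descriptor(symbols[::-1], shifted[::-1])
```

    Both calls should yield the same feature values up to floating-point rounding, which is the property a descriptor needs before it is passed to a regression model.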

  3. Prediction of clathrate structure type and guest position by molecular mechanics.

    PubMed

    Fleischer, Everly B; Janda, Kenneth C

    2013-05-16

    The clathrate hydrates occur in various types in which the number, size, and shape of the various cages differ. Usually the clathrate type of a specific guest is predicted by the size and shape of the molecular guest. We have developed a methodology to determine the clathrate type using molecular mechanics with the MMFF force field, with a strategy that calculates the energy of formation of the clathrate from the sum of the guest/cage energies. The clathrate type with the most negative (most stable) energy of formation would be the type predicted (we mainly focused on type I, type II, or bromine type). This strategy allows for a calculation to predict the clathrate type for any cage guest in a few minutes on a laptop computer. It proved successful in predicting the clathrate structure for 46 out of 47 guest molecules. The molecular mechanics calculations also provide a prediction of the guest position within the cage and clathrate structure. These predictions are generally consistent with the X-ray and neutron diffraction studies. By supplementing the diffraction study with molecular mechanics, we gain a more detailed insight into the structure. We have also compared MM calculations to studies of the multiple occupancy of the cages. Finally, we present a density functional calculation that demonstrates that the inside of the clathrate cages has a relatively uniform and low electrostatic potential in comparison with the outside oxygen and hydrogen atoms. This implies that van der Waals forces will usually be dominant in the guest-cage interactions.
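
    The selection rule described in this record (sum the guest/cage energies for each clathrate type and pick the most negative total) can be sketched as follows. This is only an illustration of the decision step: the function name predict_clathrate_type and all energy values are hypothetical, and in the study the energies would come from MMFF molecular mechanics calculations rather than from a hand-written table.

```python
def predict_clathrate_type(guest_cage_energies):
    """Return the type whose summed guest/cage energy is most negative."""
    totals = {
        ctype: sum(energies) for ctype, energies in guest_cage_energies.items()
    }
    best = min(totals, key=totals.get)
    return best, totals

# Hypothetical guest/cage energies (kcal/mol), one entry per cage kind.
example = {
    "type I": [-4.1, -6.3],
    "type II": [-3.8, -7.9],
    "bromine type": [-2.5, -5.0],
}
predicted, totals = predict_clathrate_type(example)
```

    With these made-up numbers the rule selects type II, because its summed energy is the lowest of the three.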

  4. Acinar Cell Carcinoma of the Pancreas: Overview of Clinicopathologic Features and Insights into the Molecular Pathology.

    PubMed

    La Rosa, Stefano; Sessa, Fausto; Capella, Carlo

    2015-01-01

    Acinar cell carcinomas (ACCs) of the pancreas are rare pancreatic neoplasms accounting for about 1-2% of pancreatic tumors in adults and about 15% in pediatric subjects. They show different clinical symptoms at presentation, different morphological features, different outcomes, and different molecular alterations. This heterogeneous clinicopathological spectrum may give rise to difficulties in the clinical and pathological diagnosis with consequential therapeutic and prognostic implications. The molecular mechanisms involved in the onset and progression of ACCs are still not completely understood, although in recent years, several attempts have been made to clarify the molecular mechanisms involved in ACC biology. In this paper, we will review the main clinicopathological and molecular features of pancreatic ACCs of both adult and pediatric subjects to give the reader a comprehensive overview of this rare tumor type.

  5. Choroidal Infiltration by Retinoblastoma: Predictive Clinical Features and Outcome.

    PubMed

    Kaliki, Swathi; Tahiliani, Prerana; Iram, Sadiya; Ali, Mohammed Hasnat; Mishra, Dilip K; Reddy, Vijay Anand P

    2016-11-01

    To identify the clinical features predictive of choroidal infiltration by retinoblastoma on histopathology and to report the outcome in these patients. Retrospective study. Of the 403 patients who underwent primary enucleation for retinoblastoma, 113 patients had choroidal tumor infiltration and 290 patients had no choroidal tumor infiltration. There was a higher incidence of metastasis and related death in the choroidal tumor infiltration group compared to the no choroidal tumor infiltration group (4% vs 1%; P = .02). On multivariate analysis, the clinical features predictive of histopathologic massive choroidal infiltration included prolonged duration of symptoms for more than 6 months (hazard ratio [HR] = 3.04; P = .001) and secondary glaucoma (HR = 2.24; P = .005). In this study, the patients with retinoblastoma with prolonged duration of symptoms (> 6 months) had a three-fold greater risk and those with secondary glaucoma at presentation had a two-fold greater risk of massive choroidal tumor infiltration. [J Pediatr Ophthalmol Strabismus. 2016;53(6):349-356.]. Copyright 2016, SLACK Incorporated.

  6. Self-Adaptive MOEA Feature Selection for Classification of Bankruptcy Prediction Data

    PubMed Central

    Gaspar-Cunha, A.; Recio, G.; Costa, L.; Estébanez, C.

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors evaluating the likelihood of bankruptcy. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of using the self-adaptation of the classifier. PMID:24707201
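
    The two objectives named in this record (fewer features, higher classifier quality) imply a Pareto comparison between candidate feature subsets. The sketch below shows only that comparison, not the evolutionary algorithm or the self-adaptation scheme from the paper; the function name pareto_front, the subset labels, and the accuracy values are hypothetical.

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (fewer features, higher accuracy).

    Each candidate is a (name, n_features, accuracy) tuple.
    """
    front = []
    for name, n_feat, acc in candidates:
        dominated = any(
            (n2 <= n_feat and a2 >= acc) and (n2 < n_feat or a2 > acc)
            for _, n2, a2 in candidates
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical feature subsets with made-up accuracies.
candidates = [
    ("s1", 3, 0.80),
    ("s2", 5, 0.85),
    ("s3", 5, 0.82),
    ("s4", 8, 0.85),
]
front = pareto_front(candidates)
```

    Here s3 and s4 are dropped because s2 is at least as good on both objectives and strictly better on one, which leaves s1 and s2 as the trade-off set.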

  7. A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction.

    PubMed

    Du, Xiuquan; Cheng, Jiaxing; Zheng, Tingting; Duan, Zheng; Qian, Fulan

    2014-07-18

    Protein-protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to a bigger and more realistic dataset that maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp.

  8. The molecular basis of breast cancer pathological phenotypes.

    PubMed

    Heng, Yujing J; Lester, Susan C; Tse, Gary Mk; Factor, Rachel E; Allison, Kimberly H; Collins, Laura C; Chen, Yunn-Yi; Jensen, Kristin C; Johnson, Nicole B; Jeong, Jong Cheol; Punjabi, Rahi; Shin, Sandra J; Singh, Kamaljeet; Krings, Gregor; Eberhard, David A; Tan, Puay Hoon; Korski, Konstanty; Waldman, Frederic M; Gutman, David A; Sanders, Melinda; Reis-Filho, Jorge S; Flanagan, Sydney R; Gendoo, Deena Ma; Chen, Gregory M; Haibe-Kains, Benjamin; Ciriello, Giovanni; Hoadley, Katherine A; Perou, Charles M; Beck, Andrew H

    2017-02-01

    The histopathological evaluation of morphological features in breast tumours provides prognostic information to guide therapy. Adjunct molecular analyses provide further diagnostic, prognostic and predictive information. However, there is limited knowledge of the molecular basis of morphological phenotypes in invasive breast cancer. This study integrated genomic, transcriptomic and protein data to provide a comprehensive molecular profiling of morphological features in breast cancer. Fifteen pathologists assessed 850 invasive breast cancer cases from The Cancer Genome Atlas (TCGA). Morphological features were significantly associated with genomic alteration, DNA methylation subtype, PAM50 and microRNA subtypes, proliferation scores, gene expression and/or reverse-phase protein assay subtype. Marked nuclear pleomorphism, necrosis, inflammation and a high mitotic count were associated with the basal-like subtype, and had a similar molecular basis. Omics-based signatures were constructed to predict morphological features. The association of morphology transcriptome signatures with overall survival in oestrogen receptor (ER)-positive and ER-negative breast cancer was first assessed by use of the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset; signatures that remained prognostic in the METABRIC multivariate analysis were further evaluated in five additional datasets. The transcriptomic signature of poorly differentiated epithelial tubules was prognostic in ER-positive breast cancer. No signature was prognostic in ER-negative breast cancer. This study provided new insights into the molecular basis of breast cancer morphological phenotypes. The integration of morphological with molecular data has the potential to refine breast cancer classification, predict response to therapy, enhance our understanding of breast cancer biology, and improve clinical management. This work is publicly accessible at www.dx.ai/tcga_breast. Copyright © 2016

  9. Predicting human olfactory perception from chemical features of odor molecules.

    PubMed

    Keller, Andreas; Gerkin, Richard C; Guan, Yuanfang; Dhurandhar, Amit; Turu, Gabor; Szalai, Bence; Mainland, Joel D; Ihara, Yusuke; Yu, Chung Wen; Wolfinger, Russ; Vens, Celine; Schietgat, Leander; De Grave, Kurt; Norel, Raquel; Stolovitzky, Gustavo; Cecchi, Guillermo A; Vosshall, Leslie B; Meyer, Pablo

    2017-02-24

    It is still not possible to predict whether a given molecule will have a perceived odor or what olfactory percept it will produce. We therefore organized the crowd-sourced DREAM Olfaction Prediction Challenge. Using a large olfactory psychophysical data set, teams developed machine-learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models accurately predicted odor intensity and pleasantness and also successfully predicted 8 among 19 rated semantic descriptors ("garlic," "fish," "sweet," "fruit," "burnt," "spices," "flower," and "sour"). Regularized linear models performed nearly as well as random forest-based ones, with a predictive accuracy that closely approaches a key theoretical limit. These models help to predict the perceptual qualities of virtually any molecule with high accuracy and also reverse-engineer the smell of a molecule. Copyright © 2017, American Association for the Advancement of Science.

  10. Computational Prediction of Protein Epsilon Lysine Acetylation Sites Based on a Feature Selection Method.

    PubMed

    Gao, JianZhao; Tao, Xue-Wen; Zhao, Jia; Feng, Yuan-Ming; Cai, Yu-Dong; Zhang, Ning

    2017-01-01

    Lysine acetylation, one type of post-translational modification (PTM), plays key roles in cellular regulation and can be involved in a variety of human diseases. However, it is often costly and time-consuming to use traditional experimental approaches to identify lysine acetylation sites. Therefore, effective computational methods should be developed to predict the acetylation sites. In this study, we developed a position-specific method for epsilon lysine acetylation site prediction. Sequences of acetylated proteins were retrieved from the UniProt database. Various kinds of features such as position specific scoring matrix (PSSM), amino acid factors (AAF), and disorders were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed. Finally, 319 optimal features were selected from a total of 541 features. Using the 319 optimal features to encode peptides, a predictor was constructed based on dagging. As a result, an accuracy of 69.56% with MCC of 0.2792 was achieved. We analyzed the optimal features, which suggested some important factors determining the lysine acetylation sites. We developed a position-specific method for epsilon lysine acetylation site prediction. A set of optimal features was selected. Analysis of the optimal features provided insights into the mechanism of lysine acetylation sites, providing guidance for experimental validation. Copyright© Bentham Science Publishers.

  11. Prediction of Protein Modification Sites of Pyrrolidone Carboxylic Acid Using mRMR Feature Selection and Analysis

    PubMed Central

    Zheng, Lu-Lu; Niu, Shen; Hao, Pei; Feng, KaiYan; Cai, Yu-Dong; Li, Yixue

    2011-01-01

    Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum of MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations. PMID:22174779
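
    The mRMR ranking followed by incremental feature selection (IFS), as named in this record, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: features are assumed to be discrete, relevance and redundancy are plain mutual information, the score is relevance minus mean redundancy, and the names mutual_information, mrmr_rank, and the toy data are hypothetical. IFS is represented only by the nested top-k subsets that a classifier would then score (for example by MCC under cross-validation).

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def mrmr_rank(features, labels):
    """Greedy mRMR ordering: relevance minus mean redundancy."""
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    ranked, remaining = [], set(features)
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in ranked
            ) / len(ranked)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy data: the label is (a AND b); a_dup is an exact copy of a.
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "a_dup": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
}
labels = [0, 0, 0, 0, 0, 0, 1, 1]

ranked = mrmr_rank(features, labels)
# IFS step: nested subsets (top 1, top 2, ...) to be scored by a classifier.
ifs_subsets = [ranked[:k] for k in range(1, len(ranked) + 1)]
```

    In this toy case the duplicated feature is ranked last, since it adds no information beyond the copy already selected; that is the redundancy penalty at work.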

  12. PREAL: prediction of allergenic protein by maximum Relevance Minimum Redundancy (mRMR) feature selection

    PubMed Central

    2013-01-01

    Background: Assessment of the potential allergenicity of proteins is necessary whenever transgenic proteins are introduced into the food chain. Bioinformatics approaches in allergen prediction have evolved appreciably in recent years to increase sophistication and performance. However, the critical features for a protein's allergenicity have not yet been fully investigated. Results: We presented a more comprehensive model in a 128-feature space for allergenic protein prediction by integrating various properties of proteins, such as biochemical and physicochemical properties, sequential features and subcellular locations. The overall accuracy in the cross-validation reached 93.42% to 100% with our new method. The Maximum Relevance Minimum Redundancy (mRMR) method and the Incremental Feature Selection (IFS) procedure were applied to determine which features are essential for allergenicity. Performance comparisons showed the superiority of our method over widely used existing methods. More importantly, it was observed that the features of subcellular locations and amino acid composition played major roles in determining the allergenicity of proteins, particularly the extracellular/cell surface and vacuole subcellular locations for wheat and soybean. To facilitate allergen prediction, we implemented our computational method in a web application, which is available at http://gmobl.sjtu.edu.cn/PREAL/index.php. Conclusions: Our new approach could improve the accuracy of allergen prediction, and the findings may provide novel insights into the mechanism of allergies. PMID:24565053

  13. Infrared images of reflection nebulae and Orion's bar: Fluorescent molecular hydrogen and the 3.3 micron feature

    NASA Technical Reports Server (NTRS)

    Burton, Michael G.; Moorhouse, Alan; Brand, P. W. J. L.; Roche, Patrick F.; Geballe, T. R.

    1989-01-01

    Images were obtained of the (fluorescent) molecular hydrogen 1-0 S(1) line, and of the 3.3 micron emission feature, in Orion's Bar and three reflection nebulae. The emission from these species appears to come from the same spatial locations in all sources observed. This suggests that the 3.3 micron feature is excited by the same energetic UV-photons which cause the molecular hydrogen to fluoresce.

  14. Imaging features of breast cancers on digital breast tomosynthesis according to molecular subtype: association with breast cancer detection.

    PubMed

    Lee, Su Hyun; Chang, Jung Min; Shin, Sung Ui; Chu, A Jung; Yi, Ann; Cho, Nariya; Moon, Woo Kyung

    2017-12-01

    To evaluate imaging features of breast cancers on digital breast tomosynthesis (DBT) according to molecular subtype and to determine whether the molecular subtype affects breast cancer detection on DBT. This was an institutional review board-approved study with a waiver of informed consent. DBT findings of 288 invasive breast cancers were reviewed according to Breast Imaging Reporting and Data System lexicon. Detectability of breast cancer was quantified by the number of readers (0-3) who correctly detected the cancer in an independent blinded review. DBT features and the cancer detectability score according to molecular subtype were compared using Fisher's exact test and analysis of variance. Of 288 invasive cancers, 194 were hormone receptor (HR)-positive, 48 were human epidermal growth factor receptor 2 (HER2) positive and 46 were triple negative breast cancers. The most common DBT findings were irregular spiculated masses for HR-positive cancer, fine pleomorphic or linear branching calcifications for HER2 positive cancer and irregular masses with circumscribed margins for triple negative breast cancers (p < 0.001). Cancer detectability on DBT was not significantly different according to molecular subtype (p = 0.213) but rather affected by tumour size, breast density and presence of mass or calcifications. Breast cancers showed different imaging features according to molecular subtype; however, it did not affect the cancer detectability on DBT. Advances in knowledge: DBT showed characteristic imaging features of breast cancers according to molecular subtype. However, cancer detectability on DBT was not affected by molecular subtype of breast cancers.

  15. A Predictive Model of Intein Insertion Site for Use in the Engineering of Molecular Switches

    PubMed Central

    Apgar, James; Ross, Mary; Zuo, Xiao; Dohle, Sarah; Sturtevant, Derek; Shen, Binzhang; de la Vega, Humberto; Lessard, Philip; Lazar, Gabor; Raab, R. Michael

    2012-01-01

    Inteins are intervening protein domains with self-splicing ability that can be used as molecular switches to control activity of their host protein. Successfully engineering an intein into a host protein requires identifying an insertion site that permits intein insertion and splicing while allowing for proper folding of the mature protein post-splicing. By analyzing sequence and structure based properties of native intein insertion sites we have identified four features that showed significant correlation with the location of the intein insertion sites, and therefore may be useful in predicting insertion sites in other proteins that provide native-like intein function. Three of these properties, the distance to the active site and dimer interface site, the SVM score of the splice site cassette, and the sequence conservation of the site showed statistically significant correlation and strong predictive power, with area under the curve (AUC) values of 0.79, 0.76, and 0.73 respectively, while the distance to secondary structure/loop junction showed significance but with less predictive power (AUC of 0.54). In a case study of 20 insertion sites in the XynB xylanase, two features of native insertion sites showed correlation with the splice sites and demonstrated predictive value in selecting non-native splice sites. Structural modeling of intein insertions at two sites highlighted the role that the insertion site location could play on the ability of the intein to modulate activity of the host protein. These findings can be used to enrich the selection of insertion sites capable of supporting intein splicing and hosting an intein switch. PMID:22649521

  16. Universality and predictability in molecular quantitative genetics.

    PubMed

    Nourmohammad, Armita; Held, Torsten; Lässig, Michael

    2013-12-01

    Molecular traits, such as gene expression levels or protein binding affinities, are increasingly accessible to quantitative measurement by modern high-throughput techniques. Such traits measure molecular functions and, from an evolutionary point of view, are important as targets of natural selection. We review recent developments in evolutionary theory and experiments that are expected to become building blocks of a quantitative genetics of molecular traits. We focus on universal evolutionary characteristics: these are largely independent of a trait's genetic basis, which is often at least partially unknown. We show that universal measurements can be used to infer selection on a quantitative trait, which determines its evolutionary mode of conservation or adaptation. Furthermore, universality is closely linked to predictability of trait evolution across lineages. We argue that universal trait statistics extends over a range of cellular scales and opens new avenues of quantitative evolutionary systems biology. Copyright © 2013. Published by Elsevier Ltd.

  17. HIV-1 protease cleavage site prediction based on two-stage feature selection method.

    PubMed

    Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong

    2013-03-01

    Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. Searching for an accurate, robust, and rapid method to correctly predict the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with Genetic Algorithms method. Thirty important biochemical features were found based on a jackknife test from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, with increased accuracy over the original dataset by 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.

  18. NetTurnP – Neural Network Prediction of Beta-turns by Use of Evolutionary Information and Predicted Protein Sequence Features

    PubMed Central

    Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl

    2010-01-01

    β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprised of all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC = 0.50, Qtotal = 82.1%, sensitivity = 75.6%, PPV = 68.8% and AUC = 0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17 – 0.47. For the type specific β-turn predictions, only type I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. Conclusion: The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences. PMID:21152409
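
    The two-class measures quoted in this record (MCC, Qtotal, sensitivity, PPV) all follow from the four confusion-matrix counts. The sketch below shows the standard formulas, assuming that Qtotal denotes overall accuracy; the function name binary_metrics and the counts are hypothetical and are not taken from the NetTurnP evaluation.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return MCC, overall accuracy, sensitivity and PPV from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # MCC is defined as 0 here when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "q_total": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
    }

# Hypothetical counts for a turn / not-turn classifier.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

    For these counts MCC is 0.6 while accuracy, sensitivity, and PPV are each 0.8, which illustrates why MCC is usually reported alongside accuracy on class-imbalanced data.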

  19. Clinical Relevance of Prognostic and Predictive Molecular Markers in Gliomas.

    PubMed

    Siegal, Tali

    2016-01-01

    Sorting and grading of glial tumors by the WHO classification provide clinicians with guidance as to the predicted course of the disease and choice of treatment. Nonetheless, histologically identical tumors may have very different outcome and response to treatment. Molecular markers that carry both diagnostic and prognostic information add useful tools to traditional classification by redefining tumor subtypes within each WHO category. Therefore, molecular markers have become an integral part of tumor assessment in modern neuro-oncology and biomarker status now guides clinical decisions in some subtypes of gliomas. The routine assessment of IDH status improves histological diagnostic accuracy by differentiating diffuse glioma from reactive gliosis. It carries a favorable prognostic implication for all glial tumors and it is predictive for chemotherapeutic response in anaplastic oligodendrogliomas with codeletion of 1p/19q chromosomes. Glial tumors that contain chromosomal codeletion of 1p/19q are defined as tumors of oligodendroglial lineage and have favorable prognosis. MGMT promoter methylation is a favorable prognostic marker in astrocytic high-grade gliomas and it is predictive for chemotherapeutic response in anaplastic gliomas with wild-type IDH1/2 and in glioblastoma of the elderly. The clinical implication of other molecular markers of gliomas like mutations of EGFR and ATRX genes and BRAF fusion or point mutation is highlighted. The potential of molecular biomarker-based classification to guide future therapeutic approach is discussed and accentuated.

  20. Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.

    PubMed

    Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing

    2016-08-24

    Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation. Meanwhile, their reversible oxidation plays critical roles in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases or depend heavily on protein structural data, and thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation with the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to the current structure-based method. Further validation using an independent dataset indicates it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations.
Accurate prediction of redox-sensitive cysteines not

  1. Feature genes predicting the FLT3/ITD mutation in acute myeloid leukemia

    PubMed Central

    LI, CHENGLONG; ZHU, BIAO; CHEN, JIAO; HUANG, XIAOBING

    2016-01-01

    In the present study, gene expression profiles of acute myeloid leukemia (AML) samples were analyzed to identify feature genes with the capacity to predict the mutation status of FLT3/ITD. Two machine learning models, namely the support vector machine (SVM) and random forest (RF) methods, were used for classification. Four datasets were downloaded from the European Bioinformatics Institute, two of which (containing 371 samples, including 281 FLT3/ITD mutation-negative and 90 mutation-positive samples) were randomly defined as the training group, while the other two datasets (containing 488 samples, including 350 FLT3/ITD mutation-negative and 138 mutation-positive samples) were defined as the test group. Differentially expressed genes (DEGs) were identified by significance analysis of the microarray data by using the training samples. The classification efficiency of the SVM and RF methods was evaluated using the following parameters: Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the area under the receiver operating characteristic curve. Functional enrichment analysis was performed for the feature genes with DAVID. A total of 585 DEGs were identified in the training group, of which 580 were upregulated and five were downregulated. The classification accuracy rates of the two methods for the training group, the test group and the combined group using the 585 feature genes were >90%. For the SVM and RF methods, the rates of correct determination, specificity and PPV were >90%, while the sensitivity and NPV were >80%. The SVM method produced a slightly better classification effect than the RF method. A total of 13 biological pathways were overrepresented by the feature genes, mainly involving energy metabolism, chromatin organization and translation. The feature genes identified in the present study may be used to predict the mutation status of FLT3/ITD in patients with AML. PMID:27177049
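
    The evaluation parameters listed in this record follow directly from a two-class confusion matrix. The sketch below shows one way to compute them; the function name is an assumption, and the error counts in the usage note are hypothetical values chosen only to match the reported test-group class sizes (138 mutation-positive, 350 mutation-negative), not results from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, NPV and accuracy from raw counts.

    tp/fn: mutation-positive samples called positive/negative.
    tn/fp: mutation-negative samples called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # precision of positive calls
        "npv": tn / (tn + fn),           # precision of negative calls
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }
```

    For example, diagnostic_metrics(tp=115, fn=23, tn=340, fp=10) describes a hypothetical classifier on 488 samples. Because the negative class is larger, specificity and PPV can exceed 90% while sensitivity stays lower, which is the pattern the abstract describes.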

  2. Feature genes predicting the FLT3/ITD mutation in acute myeloid leukemia.

    PubMed

    Li, Chenglong; Zhu, Biao; Chen, Jiao; Huang, Xiaobing

    2016-07-01

    In the present study, gene expression profiles of acute myeloid leukemia (AML) samples were analyzed to identify feature genes with the capacity to predict the mutation status of FLT3/ITD. Two machine learning models, namely the support vector machine (SVM) and random forest (RF) methods, were used for classification. Four datasets were downloaded from the European Bioinformatics Institute, two of which (containing 371 samples, including 281 FLT3/ITD mutation-negative and 90 mutation-positive samples) were randomly defined as the training group, while the other two datasets (containing 488 samples, including 350 FLT3/ITD mutation-negative and 138 mutation-positive samples) were defined as the test group. Differentially expressed genes (DEGs) were identified by significance analysis of the microarray data by using the training samples. The classification efficiency of the SVM and RF methods was evaluated using the following parameters: Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the area under the receiver operating characteristic curve. Functional enrichment analysis was performed for the feature genes with DAVID. A total of 585 DEGs were identified in the training group, of which 580 were upregulated and five were downregulated. The classification accuracy rates of the two methods for the training group, the test group and the combined group using the 585 feature genes were >90%. For the SVM and RF methods, the rates of correct determination, specificity and PPV were >90%, while the sensitivity and NPV were >80%. The SVM method produced a slightly better classification effect than the RF method. A total of 13 biological pathways were overrepresented by the feature genes, mainly involving energy metabolism, chromatin organization and translation. The feature genes identified in the present study may be used to predict the mutation status of FLT3/ITD in patients with AML.

  3. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    PubMed Central

    2011-01-01

    Background: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction is becoming increasingly important for better understanding the essence of protein interactions and for helping to narrow down the search space for drug design. Many computational methods have been developed by proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed. Results: In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes are excluded), however, there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion: Experimental results show that support vector machine classifiers are quite

  4. Prediction of occult invasive disease in ductal carcinoma in situ using computer-extracted mammographic features

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2017-03-01

    Predicting the risk of occult invasive disease in ductal carcinoma in situ (DCIS) is an important task to help address the overdiagnosis and overtreatment problems associated with breast cancer. In this work, we investigated the feasibility of using computer-extracted mammographic features to predict occult invasive disease in patients with biopsy proven DCIS. We proposed a computer-vision algorithm based approach to extract mammographic features from magnification views of full field digital mammography (FFDM) for patients with DCIS. After an expert breast radiologist provided a region of interest (ROI) mask for the DCIS lesion, the proposed approach is able to segment individual microcalcifications (MCs), detect the boundary of the MC cluster (MCC), and extract 113 mammographic features from MCs and MCC within the ROI. In this study, we extracted mammographic features from 99 patients with DCIS (74 pure DCIS; 25 DCIS plus invasive disease). The predictive power of the mammographic features was demonstrated through binary classifications between pure DCIS and DCIS with invasive disease using linear discriminant analysis (LDA). Before classification, the minimum redundancy Maximum Relevance (mRMR) feature selection method was first applied to choose subsets of useful features. The generalization performance was assessed using Leave-One-Out Cross-Validation and Receiver Operating Characteristic (ROC) curve analysis. Using the computer-extracted mammographic features, the proposed model was able to distinguish DCIS with invasive disease from pure DCIS, with an average classification performance of AUC = 0.61 +/- 0.05. Overall, the proposed computer-extracted mammographic features are promising for predicting occult invasive disease in DCIS.
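
    The evaluation scheme described above (leave-one-out cross-validation followed by ROC analysis) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: a nearest-centroid scorer stands in for linear discriminant analysis, feature selection is omitted, and all function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(X, y):
    """Stand-in model (assumption): per-class mean feature vectors."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return (mean([x for x, t in zip(X, y) if t == 1]),
            mean([x for x, t in zip(X, y) if t == 0]))


def centroid_score(model, x):
    """Higher score means closer to the positive-class centroid."""
    mu_pos, mu_neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, mu_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, mu_neg))
    return d_neg - d_pos


def leave_one_out_scores(X, y, fit, score):
    """Score every sample with a model fitted on all other samples."""
    out = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        out.append(score(model, X[i]))
    return out
```

    A typical call would be roc_auc(y, leave_one_out_scores(X, y, fit_centroids, centroid_score)), where X is a list of feature vectors and y a list of 0/1 labels. Any feature selection step would need to be repeated inside each leave-one-out fold to avoid optimistic bias.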

  5. Radiomics biomarkers for accurate tumor progression prediction of oropharyngeal cancer

    NASA Astrophysics Data System (ADS)

    Hadjiiski, Lubomir; Chan, Heang-Ping; Cha, Kenny H.; Srinivasan, Ashok; Wei, Jun; Zhou, Chuan; Prince, Mark; Papagerakis, Silvana

    2017-03-01

    Accurate tumor progression prediction for oropharyngeal cancers is crucial for identifying patients who would best be treated with optimized treatment and therefore minimize the risk of under- or over-treatment. An objective decision support system that can merge the available radiomics, histopathologic and molecular biomarkers in a predictive model based on statistical outcomes of previous cases and machine learning may assist clinicians in making more accurate assessment of oropharyngeal tumor progression. In this study, we evaluated the feasibility of developing individual and combined predictive models based on quantitative image analysis from radiomics, histopathology and molecular biomarkers for oropharyngeal tumor progression prediction. With IRB approval, 31, 84, and 127 patients with head and neck CT (CT-HN), tumor tissue microarrays (TMAs) and molecular biomarker expressions, respectively, were collected. For 8 of the patients all 3 types of biomarkers were available and they were sequestered in a test set. The CT-HN lesions were automatically segmented using our level sets based method. Morphological, texture and molecular based features were extracted from CT-HN and TMA images, and selected features were merged by a neural network. The classification accuracy was quantified using the area under the ROC curve (AUC). Test AUCs of 0.87, 0.74, and 0.71 were obtained with the individual predictive models based on radiomics, histopathologic, and molecular features, respectively. Combining the radiomics and molecular models increased the test AUC to 0.90. Combining all 3 models increased the test AUC further to 0.94. This preliminary study demonstrates that the individual domains of biomarkers are useful and the integrated multi-domain approach is most promising for tumor progression prediction.

  6. A priori Prediction of Neoadjuvant Chemotherapy Response and Survival in Breast Cancer Patients using Quantitative Ultrasound

    PubMed Central

    Tadayyon, Hadi; Sannachi, Lakshmanan; Gangeh, Mehrdad J.; Kim, Christina; Ghandi, Sonal; Trudeau, Maureen; Pritchard, Kathleen; Tran, William T.; Slodkowska, Elzbieta; Sadeghi-Naini, Ali; Czarnota, Gregory J.

    2017-01-01

    Quantitative ultrasound (QUS) can probe tissue structure and analyze tumour characteristics. Using a 6-MHz ultrasound system, radiofrequency data were acquired from 56 locally advanced breast cancer patients prior to their neoadjuvant chemotherapy (NAC) and QUS texture features were computed from regions of interest in tumour cores and their margins as potential predictive and prognostic indicators. Breast tumour molecular features were also collected and used for analysis. A multiparametric QUS model was constructed, which demonstrated a response prediction accuracy of 88% and ability to predict patient 5-year survival rates (p = 0.01). QUS features demonstrated superior performance in comparison to molecular markers and the combination of QUS and molecular markers did not improve response prediction. This study demonstrates, for the first time, that non-invasive QUS features in the core and margin of breast tumours can indicate breast cancer response to neoadjuvant chemotherapy (NAC) and predict five-year recurrence-free survival. PMID:28401902

  7. A priori Prediction of Neoadjuvant Chemotherapy Response and Survival in Breast Cancer Patients using Quantitative Ultrasound.

    PubMed

    Tadayyon, Hadi; Sannachi, Lakshmanan; Gangeh, Mehrdad J; Kim, Christina; Ghandi, Sonal; Trudeau, Maureen; Pritchard, Kathleen; Tran, William T; Slodkowska, Elzbieta; Sadeghi-Naini, Ali; Czarnota, Gregory J

    2017-04-12

    Quantitative ultrasound (QUS) can probe tissue structure and analyze tumour characteristics. Using a 6-MHz ultrasound system, radiofrequency data were acquired from 56 locally advanced breast cancer patients prior to their neoadjuvant chemotherapy (NAC) and QUS texture features were computed from regions of interest in tumour cores and their margins as potential predictive and prognostic indicators. Breast tumour molecular features were also collected and used for analysis. A multiparametric QUS model was constructed, which demonstrated a response prediction accuracy of 88% and ability to predict patient 5-year survival rates (p = 0.01). QUS features demonstrated superior performance in comparison to molecular markers and the combination of QUS and molecular markers did not improve response prediction. This study demonstrates, for the first time, that non-invasive QUS features in the core and margin of breast tumours can indicate breast cancer response to neoadjuvant chemotherapy (NAC) and predict five-year recurrence-free survival.

  8. Predicting and explaining the movement of mesoscale oceanographic features using CLIPS

    NASA Technical Reports Server (NTRS)

    Bridges, Susan; Chen, Liang-Chun; Lybanon, Matthew

    1994-01-01

    The Naval Research Laboratory has developed an oceanographic expert system that describes the evolution of mesoscale features in the Gulf Stream region of the northwest Atlantic Ocean. These features include the Gulf Stream current and the warm and cold core eddies associated with the Gulf Stream. An explanation capability was added to the eddy prediction component of the expert system in order to allow the system to justify the reasoning process it uses to make predictions. The eddy prediction and explanation components of the system have recently been redesigned and translated from OPS83 to C and CLIPS and the new system is called WATE (Where Are Those Eddies). The new design has improved the system's readability, understandability and maintainability and will also allow the system to be incorporated into the Semi-Automated Mesoscale Analysis System which will eventually be embedded into the Navy's Tactical Environmental Support System, Third Generation, TESS(3).

  9. Predictive Value of Morphological Features in Patients with Autism versus Normal Controls

    ERIC Educational Resources Information Center

    Ozgen, H.; Hellemann, G. S.; de Jonge, M. V.; Beemer, F. A.; van Engeland, H.

    2013-01-01

    We investigated the predictive power of morphological features in 224 autistic patients and 224 matched-pairs controls. To assess the relationship between the morphological features and autism, we used receiver operating characteristic (ROC) curves. In addition, we used recursive partitioning (RP) to determine a specific pattern of abnormalities that is…

  10. An approach to predict Sudden Cardiac Death (SCD) using time domain and bispectrum features from HRV signal.

    PubMed

    Houshyarifar, Vahid; Chehel Amirani, Mehdi

    2016-08-12

    In this paper we present a method to predict Sudden Cardiac Arrest (SCA) with higher order spectral (HOS) and linear (Time) features extracted from the heart rate variability (HRV) signal. Predicting the occurrence of SCA is important in order to reduce the probability of Sudden Cardiac Death (SCD). The challenge addressed in this work is prediction five minutes before SCA onset. The method consists of four steps: pre-processing, feature extraction, feature reduction, and classification. In the first step, the QRS complexes are detected from the electrocardiogram (ECG) signal and then the HRV signal is extracted. In the second step, bispectrum features of the HRV signal and time-domain features are obtained. Six features are extracted from the bispectrum and two features from the time domain. In the next step, these features are reduced to one feature by the linear discriminant analysis (LDA) technique. Finally, KNN and support vector machine-based classifiers are used to classify the HRV signals. We used two databases, namely the MIT/BIH Sudden Cardiac Death (SCD) Database and the Physiobank Normal Sinus Rhythm (NSR) database. In this work we predicted SCD occurrence six minutes before the SCA with an accuracy of over 91%.

  11. Feature Biases in Early Word Learning: Network Distinctiveness Predicts Age of Acquisition

    ERIC Educational Resources Information Center

    Engelthaler, Tomas; Hills, Thomas T.

    2017-01-01

    Do properties of a word's features influence the order of its acquisition in early word learning? Combining the principles of mutual exclusivity and shape bias, the present work takes a network analysis approach to understanding how feature distinctiveness predicts the order of early word learning. Distance networks were built from nouns with edge…

  12. Morphological features of IFN-γ–stimulated mesenchymal stromal cells predict overall immunosuppressive capacity

    PubMed Central

    Klinker, Matthew W.; Marklein, Ross A.; Lo Surdo, Jessica L.; Wei, Cheng-Hong

    2017-01-01

    Human mesenchymal stromal cell (MSC) lines can vary significantly in their functional characteristics, and the effectiveness of MSC-based therapeutics may be realized by finding predictive features associated with MSC function. To identify features associated with immunosuppressive capacity in MSCs, we developed a robust in vitro assay that uses principal-component analysis to integrate multidimensional flow cytometry data into a single measurement of MSC-mediated inhibition of T-cell activation. We used this assay to correlate single-cell morphological data with overall immunosuppressive capacity in a cohort of MSC lines derived from different donors and manufacturing conditions. MSC morphology after IFN-γ stimulation significantly correlated with immunosuppressive capacity and accurately predicted the immunosuppressive capacity of MSC lines in a validation cohort. IFN-γ enhanced the immunosuppressive capacity of all MSC lines, and morphology predicted the magnitude of IFN-γ–enhanced immunosuppressive activity. Together, these data identify MSC morphology as a predictive feature of MSC immunosuppressive function. PMID:28283659

  13. Novel molecular subgroups for clinical classification and outcome prediction in childhood medulloblastoma: a cohort study.

    PubMed

    Schwalbe, Edward C; Lindsey, Janet C; Nakjang, Sirintra; Crosier, Stephen; Smith, Amanda J; Hicks, Debbie; Rafiee, Gholamreza; Hill, Rebecca M; Iliasova, Alice; Stone, Thomas; Pizer, Barry; Michalski, Antony; Joshi, Abhijit; Wharton, Stephen B; Jacques, Thomas S; Bailey, Simon; Williamson, Daniel; Clifford, Steven C

    2017-07-01

    International consensus recognises four medulloblastoma molecular subgroups: WNT (MB WNT), SHH (MB SHH), group 3 (MB Grp3), and group 4 (MB Grp4), each defined by their characteristic genome-wide transcriptomic and DNA methylomic profiles. These subgroups have distinct clinicopathological and molecular features, and underpin current disease subclassification and initial subgroup-directed therapies that are underway in clinical trials. However, substantial biological heterogeneity and differences in survival are apparent within each subgroup, which remain to be resolved. We aimed to investigate whether additional molecular subgroups exist within childhood medulloblastoma and whether these could be used to improve disease subclassification and prognosis predictions. In this retrospective cohort study, we assessed 428 primary medulloblastoma samples collected from UK Children's Cancer and Leukaemia Group (CCLG) treatment centres (UK), collaborating European institutions, and the UKCCSG-SIOP-PNET3 European clinical trial. An independent validation cohort (n=276) of archival tumour samples was also analysed. We analysed samples from patients with childhood medulloblastoma who were aged 0-16 years at diagnosis, and had central review of pathology and comprehensive clinical data. We did comprehensive molecular profiling, including DNA methylation microarray analysis, and did unsupervised class discovery of test and validation cohorts to identify consensus primary molecular subgroups and characterise their clinical and biological significance. We modelled survival of patients aged 3-16 years in patients (n=215) who had craniospinal irradiation and had been treated with a curative intent. Seven robust and reproducible primary molecular subgroups of childhood medulloblastoma were identified. MB WNT remained unchanged and each remaining consensus subgroup was split in two. MB SHH was split into age-dependent subgroups corresponding to infant (<4·3 years; MB SHH

  14. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.

  15. Accurate in silico prediction of species-specific methylation sites based on information gain feature optimization.

    PubMed

    Wen, Ping-Ping; Shi, Shao-Ping; Xu, Hao-Dong; Wang, Li-Na; Qiu, Jian-Ding

    2016-10-15

    As one of the most important reversible types of post-translational modification, protein methylation catalyzed by methyltransferases carries out many pivotal biological functions and participates in many essential biological processes. Identification of methylation sites is a prerequisite for decoding methylation regulatory networks in living cells and understanding their physiological roles. Experimental methods are limited by being labor-intensive and time-consuming. In silico approaches offer a cost-effective, high-throughput way to predict potential methylation sites, but previous predictors use only a mixed model and their prediction performance is not fully satisfactory. Recently, with the increasing availability of quantitative methylation datasets in diverse species (especially in eukaryotes), there is a growing need to develop a species-specific predictor. Here, we designed a tool named PSSMe based on an information gain (IG) feature optimization method for species-specific methylation site prediction. The IG method was adopted to analyze the importance and contribution of each feature, and then to select the valuable feature dimensions to reconstitute a new ordered feature vector, which was used to build the final prediction model. Our method improves prediction accuracy by about 15% compared with single features. Furthermore, our species-specific model significantly improves predictive performance compared with other general methylation prediction tools. Hence, our prediction results serve as useful resources to elucidate the mechanism of arginine or lysine methylation and facilitate hypothesis-driven experimental design and validation. The tool's online service is implemented in C# and is freely available at http://bioinfo.ncu.edu.cn/PSSMe.aspx. Contact: jdqiu@ncu.edu.cn.
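
    As background for the information gain (IG) ranking mentioned in this record, the sketch below computes IG for discrete features and orders features by it. It is a generic textbook formulation offered as an illustration; the function names, the toy feature encoding, and the ranking helper are assumptions and do not reproduce the PSSMe implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Feature names sorted from most to least informative.

    `columns` maps a feature name to its list of per-sample values.
    """
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)
```

    In a site-prediction setting, each column might hold the residue observed at one flanking position and the labels might mark methylated versus non-methylated sites; a ranked list like this can then be truncated to keep only the most informative dimensions. Continuous features would need to be discretized first.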

  16. Prediction of purification of biopharmaceuticals with molecular dynamics

    NASA Astrophysics Data System (ADS)

    Ustach, Vincent; Faller, Roland

    Purification of biopharmaceuticals remains the most expensive part of protein-based drug production. In ion exchange chromatography (IEX), prediction of the elution ionic strength of host cell and target proteins has the potential to reduce the parameter space for scale-up of protein production. The complex shape and charge distribution of proteins and pores complicate predictions of the interactions in these systems. All-atom molecular dynamics methods are beyond the scope of computational limits for mass transport regimes. We present a coarse-grained model for proteins for prediction of elution pH and ionic strength. By extending the raspberry model for colloid particles to surface shapes and charge distributions of proteins, we can reproduce the behavior of proteins in IEX. The average charge states of titratable amino acid residues at relevant pH values are determined by extrapolation from all-atom molecular dynamics at pH 7. The pH-specific all-atom electrostatic field is then mapped onto the coarse-grained surface beads of the raspberry particle. The hydrodynamics are reproduced with the lattice-Boltzmann scheme. This combination of methods allows very long simulation times. The model is being validated for known elution procedures by comparing the data with experiments. Defense Threat Reduction Agency (Grant Number HDTRA1-15-1-0054).

  17. Exploiting Information Diffusion Feature for Link Prediction in Sina Weibo

    NASA Astrophysics Data System (ADS)

    Li, Dong; Zhang, Yongchao; Xu, Zhiming; Chu, Dianhui; Li, Sheng

    2016-01-01

    The rapid development of online social networks (e.g., Twitter and Facebook) has promoted research related to social networks in which link prediction is a key problem. Although numerous attempts have been made for link prediction based on network structure, node attribute and so on, few of the current studies have considered the impact of information diffusion on link creation and prediction. This paper mainly addresses Sina Weibo, which is the largest microblog platform with Chinese characteristics, and proposes the hypothesis that information diffusion influences link creation and verifies the hypothesis based on real data analysis. We also detect an important feature from the information diffusion process, which is used to promote link prediction performance. Finally, the experimental results on Sina Weibo dataset have demonstrated the effectiveness of our methods.

  18. Exploiting Information Diffusion Feature for Link Prediction in Sina Weibo.

    PubMed

    Li, Dong; Zhang, Yongchao; Xu, Zhiming; Chu, Dianhui; Li, Sheng

    2016-01-28

    The rapid development of online social networks (e.g., Twitter and Facebook) has promoted research related to social networks in which link prediction is a key problem. Although numerous attempts have been made for link prediction based on network structure, node attribute and so on, few of the current studies have considered the impact of information diffusion on link creation and prediction. This paper mainly addresses Sina Weibo, which is the largest microblog platform with Chinese characteristics, and proposes the hypothesis that information diffusion influences link creation and verifies the hypothesis based on real data analysis. We also detect an important feature from the information diffusion process, which is used to promote link prediction performance. Finally, the experimental results on Sina Weibo dataset have demonstrated the effectiveness of our methods.

  19. A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data.

    PubMed

    Bommert, Andrea; Rahnenführer, Jörg; Lang, Michel

    2017-01-01

    Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but it is also important that this model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the stability assessment behaviour it is most important that a measure contains a correction for chance or large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
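
    As an illustration of the stability criterion described in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes stability is scored as the mean pairwise Pearson correlation between 0/1 indicator vectors of the features chosen in repeated runs; the function names and the example runs are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A run that selects every feature or none has no variance.
        raise ValueError("correlation is undefined for constant vectors")
    return cov / sqrt(var_x * var_y)


def selection_stability(selections, n_features):
    """Mean pairwise Pearson correlation between selection indicators."""
    indicators = [
        [1 if j in chosen else 0 for j in range(n_features)]
        for chosen in selections
    ]
    pairs = list(combinations(indicators, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices chosen in three resampling runs.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
stability = selection_stability(runs, n_features=10)
```

    Because the indicator vectors include the unselected features, the score depends on the total number of features as well as on the overlap between runs.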

  20. Personalized Cancer Medicine: Molecular Diagnostics, Predictive biomarkers, and Drug Resistance

    PubMed Central

    Gonzalez de Castro, D; Clarke, P A; Al-Lazikani, B; Workman, P

    2013-01-01

    The progressive elucidation of the molecular pathogenesis of cancer has fueled the rational development of targeted drugs for patient populations stratified by genetic characteristics. Here we discuss general challenges relating to molecular diagnostics and describe predictive biomarkers for personalized cancer medicine. We also highlight resistance mechanisms for epidermal growth factor receptor (EGFR) kinase inhibitors in lung cancer. We envisage a future requiring the use of longitudinal genome sequencing and other omics technologies alongside combinatorial treatment to overcome cellular and molecular heterogeneity and prevent resistance caused by clonal evolution. PMID:23361103

  1. Health Communication in Social Media: Message Features Predicting User Engagement on Diabetes-Related Facebook Pages.

    PubMed

    Rus, Holly M; Cameron, Linda D

    2016-10-01

    Social media provides unprecedented opportunities for enhancing health communication and health care, including self-management of chronic conditions such as diabetes. Creating messages that engage users is critical for enhancing message impact and dissemination. This study analyzed health communications within ten diabetes-related Facebook pages to identify message features predictive of user engagement. The Common-Sense Model of Illness Self-Regulation and established health communication techniques guided content analyses of 500 Facebook posts. Each post was coded for message features predicted to engage users and numbers of likes, shares, and comments during the week following posting. Multi-level, negative binomial regressions revealed that specific features predicted different forms of engagement. Imagery emerged as a strong predictor; messages with images had higher rates of liking and sharing relative to messages without images. Diabetes consequence information and positive identity predicted higher sharing while negative affect, social support, and crowdsourcing predicted higher commenting. Negative affect, crowdsourcing, and use of external links predicted lower sharing while positive identity predicted lower commenting. The presence of imagery weakened or reversed the positive relationships of several message features with engagement. Diabetes control information and negative affect predicted more likes in text-only messages, but fewer likes when these messages included illustrative imagery. Similar patterns of imagery's attenuating effects emerged for the positive relationships of consequence information, control information, and positive identity with shares and for positive relationships of negative affect and social support with comments. These findings hold promise for guiding communication design in health-related social media.

  2. Biologically active ligands for yersinia outer protein H (YopH): feature based pharmacophore screening, docking and molecular dynamics studies.

    PubMed

    Tamilvanan, Thangaraju; Hopper, Waheeta

    2014-01-01

    Yersinia pestis, a Gram-negative bacillus, spreads via the lymphatics to lymph nodes and to all organs through the bloodstream, causing plague. Yersinia outer protein H (YopH) is one of the important effector proteins, which paralyzes lymphocytes and macrophages by dephosphorylating critical tyrosine kinases and signal transduction molecules. The purpose of the study is to generate a three-dimensional (3D) pharmacophore model by using diverse sets of YopH inhibitors, which would be useful for designing potential antitoxins. In this study, we selected 60 biologically active inhibitors of YopH to perform a ligand-based pharmacophore study to elucidate the important structural features responsible for biological activity. The pharmacophore model demonstrated the importance of two acceptors, one hydrophobic and two aromatic features toward the biological activity. Based on these features, different databases were screened to identify novel compounds, and these ligands were subjected to docking, ADME property and binding energy prediction. Post-docking validation was performed using molecular dynamics simulation for selected ligands to calculate the Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuation (RMSF). The ligands ASN03270114, Mol_252138, Mol_31073 and ZINC04237078 may act as inhibitors against YopH of Y. pestis.

  3. MRI texture features as biomarkers to predict MGMT methylation status in glioblastomas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Korfiatis, Panagiotis; Kline, Timothy L.; Erickson, Bradley J., E-mail: bje@mayo.edu

    Purpose: Imaging biomarker research focuses on discovering relationships between radiological features and histological findings. In glioblastoma patients, methylation of the O⁶-methylguanine methyltransferase (MGMT) gene promoter is positively correlated with an increased effectiveness of current standard of care. In this paper, the authors investigate texture features as potential imaging biomarkers for capturing the MGMT methylation status of glioblastoma multiforme (GBM) tumors when combined with supervised classification schemes. Methods: A retrospective study of 155 GBM patients with known MGMT methylation status was conducted. Co-occurrence and run length texture features were calculated, and both support vector machines (SVMs) and random forest classifiers were used to predict MGMT methylation status. Results: The best classification system (an SVM-based classifier) had a maximum area under the receiver-operating characteristic (ROC) curve of 0.85 (95% CI: 0.78–0.91) using four texture features (correlation, energy, entropy, and local intensity) originating from the T2-weighted images, yielding at the optimal threshold of the ROC curve, a sensitivity of 0.803 and a specificity of 0.813. Conclusions: Results show that supervised machine learning of MRI texture features can predict MGMT methylation status in preoperative GBM tumors, thus providing a new noninvasive imaging biomarker.

  4. MRI signal and texture features for the prediction of MCI to Alzheimer's disease progression

    NASA Astrophysics Data System (ADS)

    Martínez-Torteya, Antonio; Rodríguez-Rojas, Juan; Celaya-Padilla, José M.; Galván-Tejada, Jorge I.; Treviño, Victor; Tamez-Peña, José G.

    2014-03-01

    An early diagnosis of Alzheimer's disease (AD) confers many benefits. Several biomarkers from different information modalities have been proposed for the prediction of MCI to AD progression, where features extracted from MRI have played an important role. However, studies have focused almost exclusively on the morphological characteristics of the images. This study aims to determine whether features relating to the signal and texture of the image could add predictive power. Baseline clinical, biological and PET information, and MP-RAGE images for 62 subjects from the Alzheimer's Disease Neuroimaging Initiative were used in this study. Images were divided into 83 regions and 50 features were extracted from each of these. A multimodal database was constructed, and a feature selection algorithm was used to obtain an accurate and small logistic regression model, which achieved a cross-validation accuracy of 0.96. This model included six features, five of them obtained from the MP-RAGE image, and one obtained from genotyping. A risk analysis divided the subjects into low-risk and high-risk groups according to a prognostic index, showing that the two groups are statistically different (p-value of 2.04e-11). The results demonstrate that MRI features related to both signal and texture add predictive power for MCI to AD progression, and support the idea that multimodal biomarkers outperform single-modality biomarkers.

  5. A combination of molecular markers and clinical features improve the classification of pancreatic cysts.

    PubMed

    Springer, Simeon; Wang, Yuxuan; Dal Molin, Marco; Masica, David L; Jiao, Yuchen; Kinde, Isaac; Blackford, Amanda; Raman, Siva P; Wolfgang, Christopher L; Tomita, Tyler; Niknafs, Noushin; Douville, Christopher; Ptak, Janine; Dobbyn, Lisa; Allen, Peter J; Klimstra, David S; Schattner, Mark A; Schmidt, C Max; Yip-Schneider, Michele; Cummings, Oscar W; Brand, Randall E; Zeh, Herbert J; Singhi, Aatur D; Scarpa, Aldo; Salvia, Roberto; Malleo, Giuseppe; Zamboni, Giuseppe; Falconi, Massimo; Jang, Jin-Young; Kim, Sun-Whe; Kwon, Wooil; Hong, Seung-Mo; Song, Ki-Byung; Kim, Song Cheol; Swan, Niall; Murphy, Jean; Geoghegan, Justin; Brugge, William; Fernandez-Del Castillo, Carlos; Mino-Kenudson, Mari; Schulick, Richard; Edil, Barish H; Adsay, Volkan; Paulino, Jorge; van Hooft, Jeanin; Yachida, Shinichi; Nara, Satoshi; Hiraoka, Nobuyoshi; Yamao, Kenji; Hijioka, Susuma; van der Merwe, Schalk; Goggins, Michael; Canto, Marcia Irene; Ahuja, Nita; Hirose, Kenzo; Makary, Martin; Weiss, Matthew J; Cameron, John; Pittman, Meredith; Eshleman, James R; Diaz, Luis A; Papadopoulos, Nickolas; Kinzler, Kenneth W; Karchin, Rachel; Hruban, Ralph H; Vogelstein, Bert; Lennon, Anne Marie

    2015-11-01

    The management of pancreatic cysts poses challenges to both patients and their physicians. We investigated whether a combination of molecular markers and clinical information could improve the classification of pancreatic cysts and management of patients. We performed a multi-center, retrospective study of 130 patients with resected pancreatic cystic neoplasms (12 serous cystadenomas, 10 solid pseudopapillary neoplasms, 12 mucinous cystic neoplasms, and 96 intraductal papillary mucinous neoplasms). Cyst fluid was analyzed to identify subtle mutations in genes known to be mutated in pancreatic cysts (BRAF, CDKN2A, CTNNB1, GNAS, KRAS, NRAS, PIK3CA, RNF43, SMAD4, TP53, and VHL); to identify loss of heterozygosity at CDKN2A, RNF43, SMAD4, TP53, and VHL tumor suppressor loci; and to identify aneuploidy. The analyses were performed using specialized technologies for implementing and interpreting massively parallel sequencing data acquisition. An algorithm was used to select markers that could classify cyst type and grade. The accuracy of the molecular markers was compared with that of clinical markers and a combination of molecular and clinical markers. We identified molecular markers and clinical features that classified cyst type with 90%-100% sensitivity and 92%-98% specificity. The molecular marker panel correctly identified 67 of the 74 patients who did not require surgery and could, therefore, reduce the number of unnecessary operations by 91%. We identified a panel of molecular markers and clinical features that show promise for the accurate classification of cystic neoplasms of the pancreas and identification of cysts that require surgery. Copyright © 2015 AGA Institute. Published by Elsevier Inc. All rights reserved.

  6. A Combination of Molecular Markers and Clinical Features Improve the Classification of Pancreatic Cysts

    PubMed Central

    Springer, Simeon; Wang, Yuxuan; Molin, Marco Dal; Masica, David L.; Jiao, Yuchen; Kinde, Isaac; Blackford, Amanda; Raman, Siva P.; Wolfgang, Christopher L.; Tomita, Tyler; Niknafs, Noushin; Douville, Christopher; Ptak, Janine; Dobbyn, Lisa; Allen, Peter J.; Klimstra, David S.; Schattner, Mark A.; Schmidt, C. Max; Yip-Schneider, Michele; Cummings, Oscar W.; Brand, Randall E.; Zeh, Herbert J.; Singhi, Aatur D.; Scarpa, Aldo; Salvia, Roberto; Malleo, Giuseppe; Zamboni, Giuseppe; Falconi, Massimo; Jang, Jin-Young; Kim, Sun-Whe; Kwon, Wooil; Hong, Seung-Mo; Song, Ki-Byung; Kim, Song Cheol; Swan, Niall; Murphy, Jean; Geoghegan, Justin; Brugge, William; Fernandez-Del Castillo, Carlos; Mino-Kenudson, Mari; Schulick, Richard; Edil, Barish H.; Adsay, Volkan; Paulino, Jorge; van Hooft, Jeanin; Yachida, Shinichi; Nara, Satoshi; Hiraoka, Nobuyoshi; Yamao, Kenji; Hijioka, Susuma; van der Merwe, Schalk; Goggins, Michael; Canto, Marcia Irene; Ahuja, Nita; Hirose, Kenzo; Makary, Martin; Weiss, Matthew J.; Cameron, John; Pittman, Meredith; Eshleman, James R.; Diaz, Luis A.; Papadopoulos, Nickolas; Kinzler, Kenneth W.; Karchin, Rachel; Hruban, Ralph H.; Vogelstein, Bert; Lennon, Anne Marie

    2016-01-01

    Background & Aims: The management of pancreatic cysts poses challenges to both patients and their physicians. We investigated whether a combination of molecular markers and clinical information could improve the classification of pancreatic cysts and management of patients. Methods: We performed a multi-center, retrospective study of 130 patients with resected pancreatic cystic neoplasms (12 serous cystadenomas, 10 solid-pseudopapillary neoplasms, 12 mucinous cystic neoplasms, and 96 intraductal papillary mucinous neoplasms). Cyst fluid was analyzed to identify subtle mutations in genes known to be mutated in pancreatic cysts (BRAF, CDKN2A, CTNNB1, GNAS, KRAS, NRAS, PIK3CA, RNF43, SMAD4, TP53 and VHL); to identify loss of heterozygosity at CDKN2A, RNF43, SMAD4, TP53, and VHL tumor suppressor loci; and to identify aneuploidy. The analyses were performed using specialized technologies for implementing and interpreting massively parallel sequencing data acquisition. An algorithm was used to select markers that could classify cyst type and grade. The accuracy of the molecular markers was compared with that of clinical markers, and a combination of molecular and clinical markers. Results: We identified molecular markers and clinical features that classified cyst type with 90%–100% sensitivity and 92%–98% specificity. The molecular marker panel correctly identified 67 of the 74 patients who did not require surgery, and could therefore reduce the number of unnecessary operations by 91%. Conclusions: We identified a panel of molecular markers and clinical features that show promise for the accurate classification of cystic neoplasms of the pancreas and identification of cysts that require surgery. PMID:26253305

  7. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features.

    PubMed

    Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua

    2017-02-01

    Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to protein transportation, organelle localization, and function, and it therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is urgently needed. In this study, a support vector machine (SVM)-based method was proposed through integrating PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and a Matthews Correlation Coefficient of 0.9773 for a benchmark dataset. The result indicates the efficiency and accuracy of our method in prediction of palmitoylation sites based on protein sequences.
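
    The two figures of merit quoted in this abstract can be computed from a binary confusion matrix. The sketch below uses the standard definitions of accuracy and the Matthews Correlation Coefficient; the counts are hypothetical and are not taken from the paper's benchmark.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 100 true sites and 100 non-sites.
acc = accuracy(tp=90, tn=95, fp=5, fn=10)
mcc = matthews_corrcoef(tp=90, tn=95, fp=5, fn=10)
```

    Unlike accuracy, the coefficient stays informative when positive and negative sites are unevenly represented, which is one reason the two are often reported together.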

  8. eMolTox: prediction of molecular toxicity with confidence.

    PubMed

    Ji, Changge; Svensson, Fredrik; Zoufir, Azedine; Bender, Andreas

    2018-03-07

    In this work we present eMolTox, a web server for the prediction of potential toxicity associated with a given molecule. 174 toxicology-related in vitro/vivo experimental datasets were used for model construction and Mondrian conformal prediction was used to estimate the confidence of the resulting predictions. Toxic substructure analysis is also implemented in eMolTox. eMolTox predicts and displays a wealth of information of potential molecular toxicities for safety analysis in drug development. The eMolTox Server is freely available for use on the web at http://xundrug.cn/moltox. Contact: chicago.ji@gmail.com or ab454@cam.ac.uk. Supplementary data are available at Bioinformatics online.
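
    For readers unfamiliar with Mondrian (class-conditional) conformal prediction, here is a minimal sketch of the general idea; it is not eMolTox's code. It assumes a nonconformity score has already been computed for each calibration compound and for the test compound under each candidate label; all names and numbers are hypothetical.

```python
def mondrian_p_value(calibration_scores, test_score):
    """p-value for one label, using only that label's calibration scores."""
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_by_label, test_scores, significance):
    """Labels whose class-conditional p-value exceeds the significance level."""
    return {
        label
        for label, score in test_scores.items()
        if mondrian_p_value(calibration_by_label[label], score) > significance
    }


# Hypothetical nonconformity scores (higher means less typical of the class).
calibration_by_label = {
    "toxic": [0.10, 0.20, 0.35, 0.60],
    "nontoxic": [0.05, 0.15, 0.25, 0.30, 0.40, 0.55, 0.70, 0.80, 0.90],
}
test_scores = {"toxic": 0.70, "nontoxic": 0.20}
labels = prediction_set(calibration_by_label, test_scores, significance=0.25)
```

    Calibrating each class separately is what makes the approach "Mondrian": the error rate is controlled within each class, which matters when toxic compounds are much rarer than non-toxic ones.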

  9. Feature extraction using molecular planes for fuzzy relational clustering of a flexible dopamine reuptake inhibitor.

    PubMed

    Banerjee, Amit; Misra, Milind; Pai, Deepa; Shih, Liang-Yu; Woodley, Rohan; Lu, Xiang-Jun; Srinivasan, A R; Olson, Wilma K; Davé, Rajesh N; Venanzi, Carol A

    2007-01-01

    Six rigid-body parameters (Shift, Slide, Rise, Tilt, Roll, Twist) are commonly used to describe the relative displacement and orientation of successive base pairs in a nucleic acid structure. The present work adapts this approach to describe the relative displacement and orientation of any two planes in an arbitrary molecule, specifically planes that contain important pharmacophore elements. Relevant code from the 3DNA software package (Nucleic Acids Res. 2003, 31, 5108-5121) was generalized to treat molecular fragments other than DNA bases as input for the calculation of the corresponding rigid-body (or "planes") parameters. These parameters were used to construct feature vectors for a fuzzy relational clustering study of over 700 conformations of a flexible analogue of the dopamine reuptake inhibitor, GBR 12909. Several cluster validity measures were used to determine the optimal number of clusters. Translational (Shift, Slide, Rise) rather than rotational (Tilt, Roll, Twist) features dominate clustering based on planes that are relatively far apart, whereas both types of features are important to clustering when the pair of planes is close together. This approach was able to classify the data set of molecular conformations into groups and to identify representative conformers for use as template conformers in future Comparative Molecular Field Analysis studies of GBR 12909 analogues. The advantage of using the planes parameters, rather than the combination of atomic coordinates and angles between molecular planes used in our previous fuzzy relational clustering of the same data set (J. Chem. Inf. Model. 2005, 45, 610-623), is that the present clustering results are independent of molecular superposition and the technique is able to identify clusters in the molecule considered as a whole. This approach is easily generalizable to any two planes in any molecule.

  10. Critical Features of Fragment Libraries for Protein Structure Prediction

    PubMed Central

    dos Santos, Karina Baptista

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, the effect of different fragment sizes, and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928

  11. Critical Features of Fragment Libraries for Protein Structure Prediction.

    PubMed

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, the effect of different fragment sizes, and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  12. Correlation of chemical shifts predicted by molecular dynamics simulations for partially disordered proteins.

    PubMed

    Karp, Jerome M; Eryilmaz, Ertan; Erylimaz, Ertan; Cowburn, David

    2015-01-01

    There has been a longstanding interest in being able to accurately predict NMR chemical shifts from structural data. Recent studies have focused on using molecular dynamics (MD) simulation data as input for improved prediction. Here we examine the accuracy of chemical shift prediction for intein systems, which have regions of intrinsic disorder. We find that using MD simulation data as input for chemical shift prediction does not consistently improve prediction accuracy over use of a static X-ray crystal structure. This appears to result from the complex conformational ensemble of the disordered protein segments. We show that using accelerated molecular dynamics (aMD) simulations improves chemical shift prediction, suggesting that methods which better sample the conformational ensemble like aMD are more appropriate tools for use in chemical shift prediction for proteins with disordered regions. Moreover, our study suggests that data accurately reflecting protein dynamics must be used as input for chemical shift prediction in order to correctly predict chemical shifts in systems with disorder.

  13. Urothelial dysplasia and other flat lesions of the urinary bladder: clinicopathologic and molecular features.

    PubMed

    Hodges, Kurt B; Lopez-Beltran, Antonio; Davidson, Darrell D; Montironi, Rodolfo; Cheng, Liang

    2010-02-01

    The 2004 World Health Organization classification system for urothelial neoplasia classifies flat-related preneoplastic lesions as urothelial hyperplasia (flat and papillary), reactive urothelial atypia, urothelial atypia of unknown significance, urothelial dysplasia (low-grade intraurothelial neoplasia), and urothelial carcinoma in situ (high-grade intraurothelial neoplasia). Each lesion is defined with precise nomenclature and strict morphologic criteria. In many cases, morphologic features alone suffice for diagnosis. Other cases may require a panel of immunohistochemical antibodies consisting of cytokeratin 20, p53, and CD44 for diagnosis. Recent molecular studies have provided further insight into the premalignant potential of these urothelial lesions. Herein, we present a review of flat urothelial lesions of the urinary bladder as defined by the 2004 World Health Organization classification with focus on the clinicopathologic, immunohistochemical, and molecular features. Copyright 2010 Elsevier Inc. All rights reserved.

  14. [Non-small cell lung cancer. Subtyping and predictive molecular marker investigations in cytology].

    PubMed

    Savic, S; Bihl, M P; Bubendorf, L

    2012-07-01

    The diagnosis and treatment of non-small cell lung cancer (NSCLC) have been revolutionized over the last few years. Requirements for cytopathologists in lung cancer diagnosis have therefore changed. The general diagnostic category of NSCLC is no longer sufficient. In addition, cytological specimens need to be evaluated for adequacy regarding predictive marker analyses. Accurate NSCLC subtyping with a distinction of adenocarcinoma from squamous cell carcinoma is crucial for treatment decisions, as the subtype determines the chemotherapy regimen and the choice of predictive marker analyses for targeted treatment. In the majority of cases, the subtype can be diagnosed by morphology alone. Cytology specimens are as well suited as biopsy specimens for the assessment of molecular predictive markers. The best results are achieved when both cytology and biopsy specimens are compared to choose the most appropriate specimen for morphological subtyping and molecular testing. In this paper, we discuss special issues of NSCLC subtyping and currently recommended predictive molecular marker analyses.

  15. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    PubMed

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical Abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
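
    The abstract reports model quality as a mean fold error but does not define it. The sketch below assumes one common pharmacokinetic definition, the geometric mean fold error (10 raised to the mean absolute log10 ratio of predicted to observed values); the paper may use a different variant, and the Vss values are hypothetical.

```python
from math import log10


def mean_fold_error(predicted, observed):
    """Geometric mean fold error: 10 ** mean(|log10(predicted / observed)|)."""
    logs = [abs(log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


# Hypothetical Vss values (L/kg): one twofold over- and one twofold
# under-prediction, so the fold error is 2.
mfe = mean_fold_error(predicted=[2.0, 0.5], observed=[1.0, 1.0])
```

    Under this definition a value of 1 means perfect prediction, and over- and under-prediction by the same factor are penalized equally.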

  16. Evaluation of protein-ligand affinity prediction using steered molecular dynamics simulations.

    PubMed

    Okimoto, Noriaki; Suenaga, Atsushi; Taiji, Makoto

    2017-11-01

    In computational drug design, ranking a series of compound analogs in a manner that is consistent with experimental affinities remains a challenge. In this study, we evaluated the prediction of protein-ligand binding affinities using steered molecular dynamics simulations. First, we investigated the appropriate conditions for accurate predictions in these simulations. A conic harmonic restraint was applied to the system for efficient sampling of work values on the ligand unbinding pathway. We found that pulling velocity significantly influenced affinity predictions, but that the number of collectable trajectories was less influential. We identified the appropriate pulling velocity and collectable trajectories for binding affinity predictions as 1.25 Å/ns and 100, respectively, and these parameters were used to evaluate three target proteins (FK506 binding protein, trypsin, and cyclin-dependent kinase 2). For these proteins using our parameters, the accuracy of affinity prediction was higher and more stable when Jarzynski's equality was employed compared with the second-order cumulant expansion equation of Jarzynski's equality. Our results showed that steered molecular dynamics simulations are effective for predicting the rank order of ligands; thus, they are a potential tool for compound selection in hit-to-lead and lead optimization processes.
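
    The two free-energy estimators compared in this abstract can be sketched as follows. This is an illustration of the textbook formulas, not the authors' workflow: Jarzynski's equality, ΔF = -kT ln⟨exp(-W/kT)⟩, and its second-order cumulant expansion, ΔF ≈ ⟨W⟩ - Var(W)/(2kT). The work values, temperature, and units (kcal/mol) are hypothetical.

```python
from math import exp, log
from statistics import mean, variance

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def jarzynski_free_energy(works, temperature):
    """Exponential-average estimator, shifted by min(works) for stability."""
    beta = 1.0 / (K_B * temperature)
    w_min = min(works)
    avg = mean([exp(-beta * (w - w_min)) for w in works])
    return w_min - log(avg) / beta


def cumulant_free_energy(works, temperature):
    """Second-order cumulant expansion: mean work minus beta * variance / 2."""
    beta = 1.0 / (K_B * temperature)
    return mean(works) - 0.5 * beta * variance(works)


# Hypothetical work values (kcal/mol) from repeated pulling trajectories.
works = [4.0, 5.0, 6.0]
dF_jarzynski = jarzynski_free_energy(works, temperature=300.0)
dF_cumulant = cumulant_free_energy(works, temperature=300.0)
```

    The exponential average is dominated by the lowest work values, so it converges slowly with few trajectories. The cumulant form is exact only when the work distribution is Gaussian.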

  17. Knowledge-based fragment binding prediction.

    PubMed

    Tang, Grace W; Altman, Russ B

    2014-04-01

    Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening.
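
    Precision and recall for a set of retrieved fragments, as reported in the abstract above, can be computed as in this minimal sketch. The fragment names are hypothetical placeholders, and the set-based matching is an assumption; the paper may score matches differently.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted vs. actual fragment sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical fragments predicted for a binding site vs. those in the bound ligand.
predicted_fragments = {"benzene", "amide", "imidazole", "phenol"}
ligand_fragments = {"benzene", "amide", "carboxylate"}
precision, recall = precision_recall(predicted_fragments, ligand_fragments)
print(f"precision={precision:.2f} recall={recall:.2f}")
```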

  18. Knowledge-based Fragment Binding Prediction

    PubMed Central

    Tang, Grace W.; Altman, Russ B.

    2014-01-01

    Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971

  19. Novel method to predict body weight in children based on age and morphological facial features.

    PubMed

    Huang, Ziyin; Barrett, Jeffrey S; Barrett, Kyle; Barrett, Ryan; Ng, Chee M

    2015-04-01

    A novel approach to predicting the body weight of children based on age and morphological facial features using a three-layer feed-forward artificial neural network (ANN) model is reported. The model takes in four parameters, including age-based CDC-inferred median body weight and three facial feature distances measured from digital facial images. In this study, thirty-nine volunteer subjects with ages ranging from 6 to 18 years and body weight (BW) ranging from 18.6 to 96.4 kg were used for model development and validation. The final model has a mean prediction error of 0.48, a mean squared error of 18.43, and a coefficient of correlation of 0.94. The model shows significant improvement in prediction accuracy over several age-based body weight prediction methods. Combined with a facial recognition algorithm that can detect, extract and measure the facial features used in this study, mobile applications that incorporate this body weight prediction method may be developed for clinical investigations where access to scales is limited. © 2014, The American College of Clinical Pharmacology.

  20. General morphological and biological features of neoplasms: integration of molecular findings.

    PubMed

    Diaz-Cano, S J

    2008-07-01

    This review highlights the importance of morphology-molecular correlations for a proper implementation of new markers. It covers both general aspects of tumorigenesis (which are normally omitted in papers analysing molecular pathways) and the general mechanisms for the acquired capabilities of neoplasms. The mechanisms are also supported by appropriate diagrams for each acquired capability that include overlooked features such as mobilization of cellular resources and changes in chromatin, transcription and epigenetics; fully accepted oncogenes and tumour suppressor genes are highlighted, while the pathways are also presented as activating or inactivating with appropriate colour coding. Finally, the concepts and mechanisms presented enable us to understand the basic requirements for the appropriate implementation of molecular tests in clinical practice. In summary, the basic findings are presented to serve as a bridge to clinical applications. The current definition of neoplasm is descriptive and difficult to apply routinely. Biologically, neoplasms develop through acquisition of capabilities that involve tumour cell aspects and modified microenvironment interactions, resulting in unrestricted growth due to a stepwise accumulation of cooperative genetic alterations that affect key molecular pathways. The correlation of these molecular aspects with morphological changes is essential for better understanding of essential concepts as early neoplasms/precancerous lesions, progression/dedifferentiation, and intratumour heterogeneity. The acquired capabilities include self-maintained replication (cell cycle dysregulation), extended cell survival (cell cycle arrest, apoptosis dysregulation, and replicative lifespan), genetic instability (chromosomal and microsatellite), changes of chromatin, transcription and epigenetics, mobilization of cellular resources, and modified microenvironment interactions (tumour cells, stromal cells, extracellular, endothelium). 
The acquired

  1. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
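
    The four performance figures quoted in the abstract above (accuracy, sensitivity, specificity, Matthews correlation coefficient) all derive from a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and do not reproduce the DNABP results.

```python
import math


def classification_summary(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical counts for a balanced test set of 200 proteins.
acc, sens, spec, mcc = classification_summary(tp=90, fp=10, tn=90, fn=10)
print(f"acc={acc:.3f} sens={sens:.3f} spec={spec:.3f} mcc={mcc:.3f}")
```

    Unlike accuracy, the MCC stays informative when the two classes differ greatly in size, which is why it is often reported alongside the other three.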

  2. Clinical and cytological features predictive of malignancy in thyroid follicular neoplasms.

    PubMed

    Lubitz, Carrie C; Faquin, William C; Yang, Jingyun; Mekel, Michal; Gaz, Randall D; Parangi, Sareh; Randolph, Gregory W; Hodin, Richard A; Stephen, Antonia E

    2010-01-01

    The preoperative diagnosis of malignancy in nodules suspicious for a follicular neoplasm remains challenging. A number of clinical and cytological parameters have been previously studied; however, none have significantly impacted clinical practice. The aim of this study was to determine predictive characteristics of follicular neoplasms useful for clinical application. Four clinical (age, sex, nodule size, solitary nodule) and 17 cytological variables were retrospectively reviewed for 144 patients with a nodule suspicious for follicular neoplasm, diagnosed preoperatively by fine-needle aspiration (FNA), from a single institution over a 2-year period (January 2006 to December 2007). The FNAs were examined by a single, blinded pathologist and compared with final surgical pathology. Significance of clinical and cytological variables was determined by univariate analysis and backward stepwise logistic regression. Odds ratios (ORs) for malignancy, a receiver operating characteristic curve, and predicted probabilities of combined features were determined. There was an 11% incidence of malignancy (16/144). On univariate analysis, nodule size ≥4.0 cm nears significance (p = 0.054) and 9 of 17 cytological features examined were significantly associated with malignancy. Three variables stay in the final model after performing backward stepwise selection in logistic regression: nodule size (OR = 0.25, p = 0.05), presence of a transgressing vessel (OR = 23, p < 0.0001), and nuclear grooves (OR = 4.3, p = 0.03). The predicted probability of malignancy was 88.4% with the presence of all three variables on preoperative FNA. When the two papillary carcinomas were excluded from the analysis, the presence of nuclear grooves was no longer significant, and anisokaryosis (OR = 12.74, p = 0.005) and presence of nucleolus (OR = 0.11, p = 0.04) were significantly associated with malignancy. Excluding the two papillary thyroid carcinomas, a nodule size ≥4 cm, with a transgressing

  3. Predictive Ensemble Decoding of Acoustical Features Explains Context-Dependent Receptive Fields.

    PubMed

    Yildiz, Izzet B; Mesgarani, Nima; Deneve, Sophie

    2016-12-07

    A primary goal of auditory neuroscience is to identify the sound features extracted and represented by auditory neurons. Linear encoding models, which describe neural responses as a function of the stimulus, have been primarily used for this purpose. Here, we provide theoretical arguments and experimental evidence in support of an alternative approach, based on decoding the stimulus from the neural response. We used a Bayesian normative approach to predict the responses of neurons detecting relevant auditory features, despite ambiguities and noise. We compared the model predictions to recordings from the primary auditory cortex of ferrets and found that: (1) the decoding filters of auditory neurons resemble the filters learned from the statistics of speech sounds; (2) the decoding model captures the dynamics of responses better than a linear encoding model of similar complexity; and (3) the decoding model accounts for the accuracy with which the stimulus is represented in neural activity, whereas the linear encoding model performs very poorly. Most importantly, our model predicts that neuronal responses are fundamentally shaped by "explaining away," a divisive competition between alternative interpretations of the auditory scene. Neural responses in the auditory cortex are dynamic, nonlinear, and hard to predict. Traditionally, encoding models have been used to describe neural responses as a function of the stimulus. However, in addition to external stimulation, neural activity is strongly modulated by the responses of other neurons in the network. We hypothesized that auditory neurons aim to collectively decode their stimulus. In particular, a stimulus feature that is decoded (or explained away) by one neuron is not explained by another. We demonstrated that this novel Bayesian decoding model is better at capturing the dynamic responses of cortical neurons in ferrets. Whereas the linear encoding model poorly reflects selectivity of neurons, the decoding model can

  4. Non-linear feature extraction from HRV signal for mortality prediction of ICU cardiovascular patient.

    PubMed

    Karimi Moridani, Mohammad; Setarehdan, Seyed Kamaledin; Motie Nasrabadi, Ali; Hajinasrollah, Esmaeil

    2016-01-01

    Intensive care unit (ICU) patients are at risk of in-ICU morbidities and mortality, making specific systems for identifying at-risk patients a necessity for improving clinical care. This study presents a new method for predicting in-hospital mortality using heart rate variability (HRV) collected during a patient's ICU stay. In this paper, a HRV time series processing based method is proposed for mortality prediction of ICU cardiovascular patients. HRV signals were obtained by measuring R-R time intervals. A novel method, named return map, is then developed that reveals useful information from the HRV time series. This study also proposed several features that can be extracted from the return map, including the angle between two vectors, the area of triangles formed by successive points, the shortest distance to the 45° line, and their various combinations. Finally, a thresholding technique is proposed to extract the risk period and to predict mortality. The data used to evaluate the proposed algorithm were obtained from 80 cardiovascular ICU patients, from the first 48 h of the first ICU stay of 40 males and 40 females. This study showed that the angle feature has on average a sensitivity of 87.5% (with 12 false alarms), the area feature has on average a sensitivity of 89.58% (with 10 false alarms), the shortest distance feature has on average a sensitivity of 85.42% (with 14 false alarms) and, finally, the combined feature has on average a sensitivity of 92.71% (with seven false alarms). The results showed that the last half an hour before the patient's death is very informative for diagnosing the patient's condition and for saving his/her life. These results confirm that it is possible to predict mortality based on the features introduced in this paper, relying on the variations of the HRV dynamic characteristics.
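
    The geometric features named in the abstract above can be illustrated with a short sketch. It assumes the return map plots each R-R interval against the next one, takes the 45° line to be the identity line y = x, and measures the angle between successive displacement vectors; the paper's exact definitions are not given in the abstract, so these choices and all function names are assumptions. The R-R values are hypothetical.

```python
import math


def return_map(rr):
    """Points (RR[n], RR[n+1]) for consecutive R-R intervals."""
    return list(zip(rr[:-1], rr[1:]))


def distance_to_identity(point):
    """Shortest distance from a point to the 45-degree line y = x."""
    x, y = point
    return abs(y - x) / math.sqrt(2)


def triangle_area(a, b, c):
    """Area of the triangle formed by three successive return-map points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


def angle_between(u, v):
    """Angle in degrees between two 2-D vectors (0.0 if either has zero length)."""
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        return 0.0
    cosine = max(-1.0, min(1.0, (u[0] * v[0] + u[1] * v[1]) / norm))
    return math.degrees(math.acos(cosine))


# Hypothetical R-R intervals in seconds.
rr_intervals = [0.80, 0.82, 0.78, 0.85, 0.81]
points = return_map(rr_intervals)
distances = [distance_to_identity(p) for p in points]
areas = [triangle_area(a, b, c) for a, b, c in zip(points, points[1:], points[2:])]
print(points, distances, areas)
```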

  5. Identifying predictive features in drug response using machine learning: opportunities and challenges.

    PubMed

    Vidyasagar, Mathukumalli

    2015-01-01

    This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (pattern analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.
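
    The difference between ridge regression, LASSO, and the elastic net mentioned in the abstract above is easiest to see in the special case of an orthonormal design matrix, where each coefficient has a closed form. The sketch assumes the objective ½‖y − Xβ‖² + λ₁‖β‖₁ + (λ₂/2)‖β‖² with XᵀX = I; this scaling, the function names, and the coefficient values are illustrative assumptions, not taken from the review.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_coef(b_ols, lam2):
    """Ridge: proportional shrinkage, never exactly zero for nonzero input."""
    return b_ols / (1 + lam2)


def lasso_coef(b_ols, lam1):
    """LASSO: soft-thresholding, which sets small coefficients to zero."""
    return soft_threshold(b_ols, lam1)


def elastic_net_coef(b_ols, lam1, lam2):
    """Elastic net: soft-threshold, then shrink proportionally."""
    return soft_threshold(b_ols, lam1) / (1 + lam2)


# Hypothetical ordinary least-squares coefficients for four features.
ols = [3.0, -0.4, 1.2, 0.1]
print([ridge_coef(b, 0.5) for b in ols])
print([lasso_coef(b, 0.5) for b in ols])
print([elastic_net_coef(b, 0.5, 0.5) for b in ols])
```

    Only the L1 penalty produces exact zeros, which is what makes LASSO and the elastic net usable for selecting a small number of features from many candidates.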

  6. Predicting cancer-relevant proteins using an improved molecular similarity ensemble approach.

    PubMed

    Zhou, Bin; Sun, Qi; Kong, De-Xin

    2016-05-31

    In this study, we proposed an improved algorithm for identifying proteins relevant to cancer. The algorithm was named two-layer molecular similarity ensemble approach (TL-SEA). We applied TL-SEA to analyzing the correlation between anticancer compounds (against cell lines K562, MCF7 and A549) and active compounds against separate target proteins listed in BindingDB. Several associations between cancer types and related proteins were revealed using this chemoinformatics approach. An analysis of the literature showed that 26 of 35 predicted proteins were correlated with cancer cell proliferation, apoptosis or differentiation. Additionally, interactions between proteins in BindingDB and anticancer chemicals were also predicted. We discuss the roles of the most important predicted proteins in cancer biology and conclude that TL-SEA could be a useful tool for inferring novel proteins involved in cancer and revealing underlying molecular mechanisms.

  7. Improving protein fold recognition by extracting fold-specific features from predicted residue-residue contacts.

    PubMed

    Zhu, Jianwei; Zhang, Haicang; Li, Shuai Cheng; Wang, Chao; Kong, Lupeng; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo

    2017-12-01

    Accurate recognition of protein fold types is a key step for template-based prediction of protein structures. The existing approaches to fold recognition mainly exploit the features derived from alignments of query protein against templates. These approaches have been shown to be successful for fold recognition at family level, but usually failed at superfamily/fold levels. To overcome this limitation, one of the key points is to explore more structurally informative features of proteins. Although residue-residue contacts carry abundant structural information, how to thoroughly exploit this information for fold recognition still remains a challenge. In this study, we present an approach (called DeepFR) to improve fold recognition at superfamily/fold levels. The basic idea of our approach is to extract fold-specific features from predicted residue-residue contacts of proteins using deep convolutional neural network (DCNN) technique. Based on these fold-specific features, we calculated similarity between query protein and templates, and then assigned query protein with fold type of the most similar template. DCNN has shown excellent performance in image feature extraction and image recognition; the rationale underlying the application of DCNN for fold recognition is that contact likelihood maps are essentially analogous to images, as they both display compositional hierarchy. Experimental results on the LINDAHL dataset suggest that even using the extracted fold-specific features alone, our approach achieved success rate comparable to the state-of-the-art approaches. When further combining these features with traditional alignment-related features, the success rate of our approach increased to 92.3%, 82.5% and 78.8% at family, superfamily and fold levels, respectively, which is about 18% higher than the state-of-the-art approach at fold level, 6% higher at superfamily level and 1% higher at family level. 
An independent assessment on SCOP_TEST dataset showed consistent
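
    The assignment step described in the abstract above, in which the query protein receives the fold type of its most similar template, can be sketched as a nearest-template lookup. Cosine similarity is an assumption here, since the abstract does not name the similarity measure, and the feature vectors and fold labels are hypothetical stand-ins for the DCNN-derived features.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_fold(query_features, templates):
    """Return the fold label of the template most similar to the query.

    templates is a list of (fold_label, feature_vector) pairs.
    """
    best_label, _ = max(
        templates, key=lambda t: cosine_similarity(query_features, t[1])
    )
    return best_label


# Hypothetical fold-specific feature vectors.
templates = [
    ("fold_A", [0.9, 0.1, 0.0]),
    ("fold_B", [0.1, 0.8, 0.3]),
    ("fold_C", [0.0, 0.2, 0.9]),
]
query = [0.2, 0.7, 0.4]
print(assign_fold(query, templates))
```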

  8. Identification of critical chemical features for Aurora kinase-B inhibitors using Hip-Hop, virtual screening and molecular docking

    NASA Astrophysics Data System (ADS)

    Sakkiah, Sugunadevi; Thangapandian, Sundarapandian; John, Shalini; Lee, Keun Woo

    2011-01-01

    This study was performed to find the selective chemical features for Aurora kinase-B inhibitors using methods such as Hip-Hop, virtual screening, homology modeling, molecular dynamics and docking. The best hypothesis, Hypo1, was validated against a wide-ranging test set containing the selective inhibitors of Aurora kinase-B. Homology modeling and molecular dynamics studies were carried out to perform the molecular docking studies. The best hypothesis, Hypo1, was used as a 3D query to screen the chemical databases. The screened molecules from the databases were sorted based on ADME and drug-like properties. The selective hit compounds were docked and the hydrogen bond interactions with the critical amino acids present in Aurora kinase-B were compared with the chemical features present in Hypo1. Finally, we suggest that the chemical features present in Hypo1 are vital for a molecule to inhibit the Aurora kinase-B activity.

  9. Searching for an Accurate Marker-Based Prediction of an Individual Quantitative Trait in Molecular Plant Breeding

    PubMed Central

    Fu, Yong-Bi; Yang, Mo-Hua; Zeng, Fangqin; Biligetu, Bill

    2017-01-01

    Molecular plant breeding with the aid of molecular markers has played an important role in modern plant breeding over the last two decades. Many marker-based predictions for quantitative traits have been made to enhance parental selection, but the trait prediction accuracy remains generally low, even with the aid of dense, genome-wide SNP markers. To search for more accurate trait-specific prediction with informative SNP markers, we conducted a literature review on the prediction issues in molecular plant breeding and on the applicability of an RNA-Seq technique for developing function-associated specific trait (FAST) SNP markers. To understand whether and how FAST SNP markers could enhance trait prediction, we also performed a theoretical reasoning on the effectiveness of these markers in a trait-specific prediction, and verified the reasoning through computer simulation. To this end, the search yielded an alternative to regular genomic selection with FAST SNP markers that could be explored to achieve more accurate trait-specific prediction. Continued search for better alternatives is encouraged to enhance marker-based predictions for an individual quantitative trait in molecular plant breeding. PMID:28729875

  10. Deep Residual Network Predicts Cortical Representation and Organization of Visual Features for Rapid Categorization.

    PubMed

    Wen, Haiguang; Shi, Junxing; Chen, Wei; Liu, Zhongming

    2018-02-28

    The brain represents visual objects with topographic cortical patterns. To address how distributed visual representations enable object categorization, we established predictive encoding models based on a deep residual network, and trained them to predict cortical responses to natural movies. Using this predictive model, we mapped human cortical representations to 64,000 visual objects from 80 categories with high throughput and accuracy. Such representations covered both the ventral and dorsal pathways, reflected multiple levels of object features, and preserved semantic relationships between categories. In the entire visual cortex, object representations were organized into three clusters of categories: biological objects, non-biological objects, and background scenes. In a finer scale specific to each cluster, object representations revealed sub-clusters for further categorization. Such hierarchical clustering of category representations was mostly contributed by cortical representations of object features from middle to high levels. In summary, this study demonstrates a useful computational strategy to characterize the cortical organization and representations of visual features for rapid categorization.

  11. Role of Side-Chain Molecular Features in Tuning Lower Critical Solution Temperatures (LCSTs) of Oligoethylene Glycol Modified Polypeptides.

    PubMed

    Gharakhanian, Eric G; Deming, Timothy J

    2016-07-07

    A series of thermoresponsive polypeptides has been synthesized using a methodology that allowed facile adjustment of side-chain functional groups. The lower critical solution temperature (LCST) properties of these polymers in water were then evaluated relative to systematic molecular modifications in their side-chains. It was found that in addition to the number of ethylene glycol repeats in the side-chains, terminal and linker groups also have substantial and predictable effects on cloud point temperatures (Tcp). In particular, we found that the structure of these polypeptides allowed for inclusion of polar hydroxyl groups, which significantly increased their hydrophilicity and decreased the need to use long oligoethylene glycol repeats to obtain LCSTs. The thioether linkages in these polypeptides were found to provide an additional structural feature for reversible switching of both polypeptide conformation and thermoresponsive properties.

  12. A feature-based approach to modeling protein–protein interaction hot spots

    PubMed Central

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-01-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π–related interactions, especially π · · · π interactions. PMID:19273533

  13. A feature-based approach to modeling protein-protein interaction hot spots.

    PubMed

    Cho, Kyu-il; Kim, Dongsup; Lee, Doheon

    2009-05-01

    Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to pi-related interactions, especially pi . . . pi interactions.

  14. Prediction of troponin-T degradation using color image texture features in 10d aged beef longissimus steaks.

    PubMed

    Sun, X; Chen, K J; Berg, E P; Newman, D J; Schwartz, C A; Keller, W L; Maddock Carlin, K R

    2014-02-01

    The objective was to use digital color image texture features to predict troponin-T degradation in beef. Image texture features, including 88 gray level co-occurrence texture features, 81 two-dimension fast Fourier transformation texture features, and 48 Gabor wavelet filter texture features, were extracted from color images of beef strip steaks (longissimus dorsi, n = 102) aged for 10d obtained using a digital camera and additional lighting. Steaks were designated degraded or not-degraded based on troponin-T degradation determined on d 3 and d 10 postmortem by immunoblotting. Statistical analysis (STEPWISE regression model) and artificial neural network (support vector machine model, SVM) methods were designed to classify protein degradation. The d 3 and d 10 STEPWISE models were 94% and 86% accurate, respectively, while the d 3 and d 10 SVM models were 63% and 71%, respectively, in predicting protein degradation in aged meat. STEPWISE and SVM models based on image texture features show potential to predict troponin-T degradation in meat. © 2013.
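
    Gray level co-occurrence texture features, one of the three feature families in the abstract above, are derived from a matrix counting how often pairs of gray levels occur at a fixed pixel offset. The sketch below builds a normalized co-occurrence matrix for a horizontal offset of one pixel and computes two standard descriptors, contrast and energy. The offset, the two descriptors, and the tiny image are assumptions for illustration; the abstract does not list which 88 features were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Sum of (i - j)^2 * p[i][j]; large when neighboring levels differ."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p):
    """Sum of squared probabilities; 1.0 for a perfectly uniform image."""
    return sum(v * v for row in p for v in row)


# Hypothetical 3x3 image quantized to two gray levels.
image = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
p = glcm(image, levels=2)
print(contrast(p), energy(p))
```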

  15. Predicting Intelligibility Gains in Individuals with Dysarthria from Baseline Speech Features

    ERIC Educational Resources Information Center

    Fletcher, Annalise R.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Sinex, Donal G.; Liss, Julie M.

    2017-01-01

    Purpose: Across the treatment literature, behavioral speech modifications have produced variable intelligibility changes in speakers with dysarthria. This study is the first of two articles exploring whether measurements of baseline speech features can predict speakers' responses to these modifications. Method: Fifty speakers (7 older individuals…

  16. Relationship of carbohydrate molecular spectroscopic features in combined feeds to carbohydrate utilization and availability in ruminants

    NASA Astrophysics Data System (ADS)

    Zhang, Xuewei; Yu, Peiqiang

    To date, there is no study on the relationship between carbohydrate (CHO) molecular structures and nutrient availability of combined feeds in ruminants. The objective of this study was to use molecular spectroscopy to reveal the relationship between CHO molecular spectral profiles (in terms of functional groups (biomolecular, biopolymer) spectral peak area and height intensity) and CHO chemical profiles, CHO subfractions, energy values, and CHO rumen degradation kinetics of combined feeds of hulless barley with pure wheat dried distillers grains with solubles (DDGS) at five different combination ratios (hulless barley to pure wheat DDGS: 100:0, 75:25, 50:50, 25:75, 0:100). The molecular spectroscopic parameters assessed included: lignin biopolymer molecular spectra profile (peak area and height, region and baseline: ca. 1539-1504 cm⁻¹); structural carbohydrate (STCHO, peaks area region and baseline: ca. 1485-1186 cm⁻¹) mainly associated with hemi- and cellulosic compounds; cellulosic materials peak area (centered at ca. 1240 cm⁻¹ with region and baseline: ca. 1272-1186 cm⁻¹); total carbohydrate (CHO, peaks area region and baseline: ca. 1186-946 cm⁻¹). The results showed that the functional groups (biomolecular, biopolymer) in the combined feeds are sensitive to the changes of carbohydrate chemical and nutrient profiles. The changes of the CHO molecular spectroscopic features in the combined feeds were highly correlated with CHO chemical profiles, CHO subfractions, in situ CHO rumen degradation kinetics and fermentable organic matter supply. Further study is needed to investigate the possibility of using CHO molecular spectral features as a predictor to estimate nutrient availability in combined feeds for animals and to quantify their relationship.

  17. Molecular Heterogeneity in Glioblastoma: Potential Clinical Implications

    PubMed Central

    Parker, Nicole Renee; Khong, Peter; Parkinson, Jonathon Fergus; Howell, Viive Maarika; Wheeler, Helen Ruth

    2015-01-01

    Glioblastomas (grade 4 astrocytomas) are aggressive primary brain tumors characterized by histopathological heterogeneity. High-resolution sequencing technologies have shown that these tumors also feature significant inter-tumoral molecular heterogeneity. Molecular subtyping of these tumors has revealed several predictive and prognostic biomarkers. However, intra-tumoral heterogeneity may undermine the use of single biopsy analysis for determining tumor genotype and has implications for potential targeted therapies. The clinical relevance and theories of tumoral molecular heterogeneity in glioblastoma are discussed. PMID:25785247

  18. Can upstaging of ductal carcinoma in situ be predicted at biopsy by histologic and mammographic features?

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2017-03-01

    Reducing the overdiagnosis and overtreatment associated with ductal carcinoma in situ (DCIS) requires accurate prediction of the invasive potential at cancer screening. In this work, we investigated the utility of pre-operative histologic and mammographic features to predict upstaging of DCIS. The goal was to provide intentionally conservative baseline performance using readily available data from radiologists and pathologists and only linear models. We conducted a retrospective analysis on 99 patients with DCIS. Of those, 25 were upstaged to invasive cancer at the time of definitive surgery. Pre-operative factors including both the histologic features extracted from stereotactic core needle biopsy (SCNB) reports and the mammographic features annotated by an expert breast radiologist were investigated with statistical analysis. Furthermore, we built classification models based on those features in an attempt to predict the presence of an occult invasive component in DCIS, with generalization performance assessed by receiver operating characteristic (ROC) curve analysis. Histologic features including nuclear grade and DCIS subtype did not show statistically significant differences between cases with pure DCIS and with DCIS plus invasive disease. However, three mammographic features, i.e., the major axis length of the DCIS lesion, the BI-RADS level of suspicion, and the radiologist's assessment, did achieve statistical significance. Using those three statistically significant features as input, a linear discriminant model was able to distinguish patients with DCIS plus invasive disease from those with pure DCIS, with AUC-ROC equal to 0.62. Overall, mammograms used for breast screening contain useful information that can be perceived by radiologists and help predict occult invasive components in DCIS.
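
    As a hedged illustration of the AUC-ROC figure reported above, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen upstaged case receives a higher score than a randomly chosen pure DCIS case. The function name and the scores are hypothetical and are not taken from the study.

```python
def auc_roc(scores_pos, scores_neg):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical discriminant scores (assumption, not study data).
upstaged = [0.9, 0.6, 0.4]
pure_dcis = [0.7, 0.3, 0.2, 0.1]
print(auc_roc(upstaged, pure_dcis))
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which places the reported 0.62 modestly above chance.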

  19. YamiPred: A Novel Evolutionary Method for Predicting Pre-miRNAs and Selecting Relevant Features.

    PubMed

    Kleftogiannis, Dimitrios; Theofilatos, Konstantinos; Likothanassis, Spiros; Mavroudi, Seferina

    2015-01-01

    MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameter optimization. YamiPred was tested on a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset has achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data has successfully predicted pre-miRNAs in other organisms, including the category of viruses.

  20. [Prognostic and predictive molecular markers for urologic cancers].

    PubMed

    Hartmann, A; Schlomm, T; Bertz, S; Heinzelmann, J; Hölters, S; Simon, R; Stoehr, R; Junker, K

    2014-04-01

    Molecular prognostic factors and genetic alterations as predictive markers for cancer-specific targeted therapies are used today in the clinic for many malignancies. In recent years, many molecular markers for urogenital cancers have also been identified. However, these markers are not clinically used yet. In prostate cancer, novel next-generation sequencing methods revealed a detailed picture of the molecular changes. There is growing evidence that a combination of classical histopathological and validated molecular markers could lead to a more precise estimation of prognosis, thus, resulting in an increasing number of patients with active surveillance as a possible treatment option. In patients with urothelial carcinoma, histopathological factors but also the proliferation of the tumor, mutations in oncogenes leading to an increasing proliferation rate and changes in genes responsible for invasion and metastasis are important. In addition, gene expression profiles which could distinguish aggressive tumors with high risk of metastasis from nonmetastasizing tumors have been recently identified. In the future, this could potentially allow better selection of patients needing systemic perioperative treatment. In renal cell carcinoma, many molecular markers that are associated with metastasis and survival have been identified. Some of these markers were also validated as independent prognostic markers. Selection of patients with primarily organ-confined tumors and increased risk of metastasis for adjuvant systemic therapy could be clinically relevant in the future.

  1. Systems Biological Approach of Molecular Descriptors Connectivity: Optimal Descriptors for Oral Bioavailability Prediction

    PubMed Central

    Ahmed, Shiek S. S. J.; Ramakrishnan, V.

    2012-01-01

    Background: Poor oral bioavailability is an important parameter accounting for the failure of drug candidates. Approximately 50% of developing drugs fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low, with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. Results: The models were developed based on 247 computationally derived physicochemical descriptors from 2279 molecules, of which 969, 605 and 705 molecules correspond to the oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data sets, respectively. The partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 were common to HIA and caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties with the functional relevance labeled as (+bioavailability/−bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation-based feature selection (CFS) algorithm was implemented, which confirms that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction. Conclusion: The logistic algorithm with 47 selected descriptors correctly predicted the oral

  2. Systems biological approach of molecular descriptors connectivity: optimal descriptors for oral bioavailability prediction.

    PubMed

    Ahmed, Shiek S S J; Ramakrishnan, V

    2012-01-01

    Poor oral bioavailability is an important parameter accounting for the failure of drug candidates. Approximately 50% of developing drugs fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low, with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. The models were developed based on 247 computationally derived physicochemical descriptors from 2279 molecules, of which 969, 605 and 705 molecules correspond to the oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data sets, respectively. The partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 were common to HIA and caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties with the functional relevance labeled as (+bioavailability/-bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation-based feature selection (CFS) algorithm was implemented, which confirms that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction. The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy

  3. Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method

    NASA Astrophysics Data System (ADS)

    Shamsoddini, A.; Aboodi, M. R.; Karami, J.

    2017-09-01

    Air pollution, as one of the most serious forms of environmental pollution, poses a huge threat to human life. Air pollution leads to environmental instability, and has harmful and undesirable effects on the environment. Modern prediction methods of the pollutant concentration are able to improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to achieve an efficient model to estimate carbon monoxide, nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by the Random Forest feature selection method performed more accurately than other models for the modeling of all pollutants. The estimation accuracy of sulfur dioxide emissions was lower than that of the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.

  4. A data-driven feature extraction framework for predicting the severity of condition of congestive heart failure patients.

    PubMed

    Sideris, Costas; Alshurafa, Nabil; Pourhomayoun, Mohammad; Shahmohammadi, Farhad; Samy, Lauren; Sarrafzadeh, Majid

    2015-01-01

    In this paper, we propose a novel methodology for utilizing disease diagnostic information to predict severity of condition for Congestive Heart Failure (CHF) patients. Our methodology relies on a novel, clustering-based, feature extraction framework using disease diagnostic information. To reduce the dimensionality, we identify disease clusters using co-occurrence frequencies. We then utilize these clusters as features to predict patient severity of condition. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), which contains 7 million discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 patients. We compare our cluster-based feature set with another that incorporates the Charlson comorbidity score as a feature and demonstrate an accuracy improvement of up to 14% in the predictability of the severity of condition.

  5. Diffuse gliomas with FGFR3-TACC3 fusion have characteristic histopathological and molecular features.

    PubMed

    Bielle, Franck; Di Stefano, Anna-Luisa; Meyronet, David; Picca, Alberto; Villa, Chiara; Bernier, Michèle; Schmitt, Yohann; Giry, Marine; Rousseau, Audrey; Figarella-Branger, Dominique; Maurage, Claude-Alain; Uro-Coste, Emmanuelle; Lasorella, Anna; Iavarone, Antonio; Sanson, Marc; Mokhtari, Karima

    2017-10-04

    Adult glioblastomas, IDH-wildtype, represent a heterogeneous group of diseases. They are resistant to conventional treatment by concomitant radiochemotherapy and carry a dismal prognosis. The discovery of oncogenic gene fusions in these tumors has led to prospective targeted treatments, but identification of these rare alterations in practice is challenging. Here, we report a series of 30 adult diffuse gliomas with an in-frame FGFR3-TACC3 oncogenic fusion (n = 27 WHO grade IV and n = 3 WHO grade II) as well as their histological and molecular features. We observed recurrent morphological features (monomorphous ovoid nuclei, nuclear palisading and thin parallel cytoplasmic processes, endocrinoid network of thin capillaries) associated with frequent microcalcifications and desmoplasia. We report a constant immunoreactivity for FGFR3, which is a valuable method for screening for the FGFR3-TACC3 fusion with 100% sensitivity and 92% specificity. We confirmed the associated molecular features (typical genetic alterations of glioblastoma, except the absence of EGFR amplification, and an increased frequency of CDK4 and MDM2 amplifications). FGFR3 immunopositivity is a valuable tool to identify gliomas that are likely to harbor the FGFR3-TACC3 fusion for inclusion in targeted therapeutic trials. © 2017 International Society of Neuropathology.

  6. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Feature Selection Methods

    PubMed Central

    Wang, Ping; Hu, Lele; Liu, Guiyou; Jiang, Nan; Chen, Xiaoyun; Xu, Jianyong; Zheng, Wen; Li, Li; Tan, Ming; Chen, Zugen; Song, Hui; Cai, Yu-Dong; Chou, Kuo-Chen

    2011-01-01

    Antimicrobial peptides (AMPs) represent a class of natural peptides that form a part of the innate immune system, and this kind of ‘nature's antibiotics’ is quite promising for solving the problem of increasing antibiotic resistance. In view of this, it is highly desirable to develop an effective computational method for accurately predicting novel AMPs because it can provide us with more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating the sequence alignment method and the feature selection method. It was observed that the overall jackknife success rate by the new predictor on a newly constructed benchmark dataset was over 80.23%, and the Matthews correlation coefficient was 0.73, indicating a good prediction. Moreover, an in-depth feature analysis indicates that the results are quite consistent with the previous knowledge that some amino acids are preferential in AMPs and that these amino acids do play an important role in the antimicrobial activity. For the convenience of most experimental scientists who want to use the prediction method without following the mathematical details, a user-friendly web-server is provided at http://amp.biosino.org/. PMID:21533231

  7. Multi-center prediction of hemorrhagic transformation in acute ischemic stroke using permeability imaging features.

    PubMed

    Scalzo, Fabien; Alger, Jeffry R; Hu, Xiao; Saver, Jeffrey L; Dani, Krishna A; Muir, Keith W; Demchuk, Andrew M; Coutts, Shelagh B; Luby, Marie; Warach, Steven; Liebeskind, David S

    2013-07-01

    Permeability images derived from magnetic resonance (MR) perfusion images are sensitive to blood-brain barrier derangement of the brain tissue and have been shown to correlate with subsequent development of hemorrhagic transformation (HT) in acute ischemic stroke. This paper presents a multi-center retrospective study that evaluates the predictive power in terms of HT of six permeability MRI measures including contrast slope (CS), final contrast (FC), maximum peak bolus concentration (MPB), peak bolus area (PB), relative recirculation (rR), and percentage recovery (%R). Dynamic T2*-weighted perfusion MR images were collected from 263 acute ischemic stroke patients from four medical centers. An essential aspect of this study is to exploit a classifier-based framework to automatically identify predictive patterns in the overall intensity distribution of the permeability maps. The model is based on normalized intensity histograms that are used as input features to the predictive model. Linear and nonlinear predictive models are evaluated using a cross-validation to measure generalization power on new patients and a comparative analysis is provided for the different types of parameters. Results demonstrate that perfusion imaging in acute ischemic stroke can predict HT with an average accuracy of more than 85% using a predictive model based on a nonlinear regression model. Results also indicate that the permeability feature based on the percentage of recovery performs significantly better than the other features. This novel model may be used to refine treatment decisions in acute stroke. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Prediction of solubility parameters and miscibility of pharmaceutical compounds by molecular dynamics simulations.

    PubMed

    Gupta, Jasmine; Nunes, Cletus; Vyas, Shyam; Jonnalagadda, Sriramakamal

    2011-03-10

    The objectives of this study were (i) to develop a computational model based on the molecular dynamics technique to predict the miscibility of indomethacin in carriers (polyethylene oxide, glucose, and sucrose) and (ii) to experimentally verify the in silico predictions by characterizing the drug-carrier mixtures using thermoanalytical techniques. Molecular dynamics (MD) simulations were performed using the COMPASS force field, and the cohesive energy density and the solubility parameters were determined for the model compounds. The magnitude of difference in the solubility parameters of drug and carrier is indicative of their miscibility. The MD simulations predicted indomethacin to be miscible with polyethylene oxide and to be borderline miscible with sucrose and immiscible with glucose. The solubility parameter values obtained using the MD simulations were in reasonable agreement with those calculated using group contribution methods. Differential scanning calorimetry showed melting point depression of polyethylene oxide with increasing levels of indomethacin accompanied by peak broadening, confirming miscibility. In contrast, thermal analysis of blends of indomethacin with sucrose and glucose verified general immiscibility. The findings demonstrate that molecular modeling is a powerful technique for determining the solubility parameters and predicting miscibility of pharmaceutical compounds. © 2011 American Chemical Society
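
    The link between cohesive energy density and the solubility parameter mentioned above can be sketched as follows. This is a minimal illustration that assumes the Hildebrand definition (the solubility parameter is the square root of the cohesive energy density); the abstract does not state which definition the authors used. The function names and input values are hypothetical and are not simulation results from the study.

```python
import math


def solubility_parameter(ced_j_per_cm3):
    """Hildebrand solubility parameter in MPa^0.5.

    The cohesive energy density is given in J/cm^3, which is numerically
    equal to MPa, so the square root is in MPa^0.5.
    """
    return math.sqrt(ced_j_per_cm3)


def delta_difference(ced_drug, ced_carrier):
    """Absolute difference between drug and carrier solubility parameters."""
    return abs(solubility_parameter(ced_drug) - solubility_parameter(ced_carrier))


# Hypothetical cohesive energy densities (assumption, not study data).
print(solubility_parameter(484.0))     # drug
print(solubility_parameter(400.0))     # carrier
print(delta_difference(484.0, 400.0))  # smaller differences suggest better miscibility
```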

  9. Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features

    PubMed Central

    Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou

    2016-01-01

    The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming more important for research on drug design and cancer development. Based on our previous methods (APIS and KFC2), here we propose a novel hot spot prediction method. For each residue, we first constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional ones and a new one (pseudo hydrophobicity) exploited in this study. We then selected the 3 top-ranking features that contribute the most to the classification by a two-step feature selection process consisting of the minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When tested on an independent test set, our method showed the highest F1-score of 0.70 and MCC of 0.46 compared with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
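
    For readers unfamiliar with the two metrics quoted above, the sketch below computes the F1-score and the Matthews correlation coefficient (MCC) from the four cells of a binary confusion matrix, using their standard definitions. The function names and the counts are hypothetical and are not taken from the study's test set.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts (assumption, not study data).
tp, tn, fp, fn = 30, 50, 10, 10
print(f1_score(tp, fp, fn))
print(mcc(tp, tn, fp, fn))
```

    Unlike F1, MCC also uses the true negatives, which is one reason both are often reported together on imbalanced data such as hot spot sets.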

  10. Predictive features of breast cancer on Mexican screening mammography patients

    NASA Astrophysics Data System (ADS)

    Rodriguez-Rojas, Juan; Garza-Montemayor, Margarita; Trevino-Alvarado, Victor; Tamez-Pena, José Gerardo

    2013-02-01

    Breast cancer is the most common type of cancer worldwide. In response, breast cancer screening programs are becoming common around the world and public programs now serve millions of women worldwide. These programs are expensive, requiring many specialized radiologists to examine all images. Nevertheless, there is a lack of trained radiologists in many countries, such as Mexico, which is a barrier to decreasing breast cancer mortality and points to the need for a triaging system that prioritizes high-risk cases for prompt interpretation. Therefore, we explored in an image database of Mexican patients whether high-risk cases can be distinguished using image features. We collected a set of 200 digital screening mammography cases from a hospital in Mexico, and assigned low- or high-risk labels according to their BIRADS scores. Breast tissue segmentation was performed using an automatic procedure. Image features were obtained considering only the segmented region on each view and comparing the bilateral differences of the obtained features. Predictive combinations of features were chosen using a genetic-algorithm-based feature selection procedure. The best model found was able to classify low-risk and high-risk cases with an area under the ROC curve of 0.88 on a 150-fold cross-validation test. The features selected were associated with the differences of signal distribution and tissue shape on bilateral views. The model found can be used to automatically identify high-risk cases and trigger the necessary measures to provide prompt treatment.

  11. TU-C-17A-10: Patient Features Based Dosimetric Pareto Front Prediction In Esophagus Cancer Radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, J; Zhao, K; Peng, J

    2014-06-15

    Purpose: The purpose of this study is to assess the feasibility of dosimetric Pareto front (PF) prediction based on patient anatomic and dosimetric parameters for esophagus cancer patients. Methods: Sixty esophagus patients in our institution were enrolled in this study. A total of 2920 IMRT plans were created to generate the PF for each patient. On average, each patient had 48 plans. The anatomic and dosimetric features were extracted from those plans. The mean lung dose (MLD), mean heart dose (MHD), spinal cord max dose and PTV homogeneity index (PTVHI) were recorded for each plan. Principal component analysis (PCA) was used to extract overlap volume histogram (OVH) features between the PTV and other critical organs. The full dataset was separated into two parts: the training dataset and the validation dataset. The prediction outcomes were the MHD and MLD for the current study. The Spearman rank correlation coefficient was used to evaluate the correlation between the anatomical features and dosimetric features. The PF was fit by the stepwise multiple regression method. Cross-validation was used to evaluate the model. Results: The mean prediction error of the MHD was 465 cGy with 100 repetitions. The most correlated factors were the first principal components of the OVH between heart and PTV, and the overlap between heart and PTV in the Z-axis. The mean prediction error of the MLD was 195 cGy. The most correlated factors were the first principal components of the OVH between lung and PTV, and the overlap between lung and PTV in the Z-axis. Conclusion: It is feasible to use patients' anatomic and dosimetric features to generate a predicted PF. Additional samples and further studies are required to obtain a better prediction model.
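
    The Spearman rank correlation used above to relate anatomical and dosimetric features can be sketched as the Pearson correlation of the ranks, with tied values sharing their average rank. This is a generic illustration, not the authors' implementation; the function names and the example values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical feature values (assumption, not study data).
overlap_volume = [1.0, 2.0, 3.0, 4.0]
mean_heart_dose = [1.0, 4.0, 9.0, 100.0]
print(spearman(overlap_volume, mean_heart_dose))
```

    Because only ranks are compared, any strictly increasing relationship yields a coefficient of 1 even when it is far from linear.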

  12. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, we still face challenges in robustly finding non-coding RNA genes, such as miRNA and lncRNA. Two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; different combinations of these are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.

  13. Induction of CaSR expression circumvents the molecular features of malignant CaSR null colon cancer cells.

    PubMed

    Singh, Navneet; Chakrabarty, Subhas

    2013-11-15

    We recently reported on the isolation and characterization of calcium sensing receptor (CaSR) null human colon cancer cells (Singh et al., Int J Cancer 2013; 132: 1996-2005). CaSR null cells possess a myriad of molecular features that are linked to a highly malignant and drug resistant phenotype of colon cancer. The CaSR null phenotype can be maintained in defined human embryonic stem cell culture medium. We now show that the CaSR null cells can be induced to differentiate in conventional culture medium and regain the expression of CaSR, with a concurrent reversal of the cellular and molecular features associated with the null phenotype. These features include cellular morphology, expression of colon cancer stem cell markers, expression of survivin and thymidylate synthase and sensitivity to fluorouracil. Other features include the expression of epithelial mesenchymal transition linked molecules and transcription factors, oncogenic miRNAs and tumor suppressive molecule and miRNA. With the exception of cancer stem cell markers, the reversal of molecular features, upon the induction of CaSR expression, is directly linked to the expression and function of CaSR because blocking CaSR induction by shRNA circumvented such reversal. We further report that methylation and demethylation of the CaSR gene promoter underlie CaSR expression. Due to the malignant nature of the CaSR null cells, inclusion of the CaSR null phenotype in disease management may improve on the mortality of this disease. Because CaSR is a robust promoter of differentiation and mediates its action through diverse mechanisms and pathways, inactivation of CaSR may serve as a new paradigm in colon carcinogenesis. Copyright © 2013 UICC.

  14. Predicting the excess solubility of acetanilide, acetaminophen, phenacetin, benzocaine, and caffeine in binary water/ethanol mixtures via molecular simulation.

    PubMed

    Paluch, Andrew S; Parameswaran, Sreeja; Liu, Shuai; Kolavennu, Anasuya; Mobley, David L

    2015-01-28

    We present a general framework to predict the excess solubility of small molecular solids (such as pharmaceutical solids) in binary solvents via molecular simulation free energy calculations at infinite dilution with conventional molecular models. The present study used molecular dynamics with the General AMBER Force Field to predict the excess solubility of acetanilide, acetaminophen, phenacetin, benzocaine, and caffeine in binary water/ethanol solvents. The simulations are able to predict the existence of solubility enhancement and the results are in good agreement with available experimental data. The accuracy of the predictions in addition to the generality of the method suggests that molecular simulations may be a valuable design tool for solvent selection in drug development processes.

  15. Predicting the excess solubility of acetanilide, acetaminophen, phenacetin, benzocaine, and caffeine in binary water/ethanol mixtures via molecular simulation

    PubMed Central

    Paluch, Andrew S.; Parameswaran, Sreeja; Liu, Shuai; Kolavennu, Anasuya; Mobley, David L.

    2015-01-01

    We present a general framework to predict the excess solubility of small molecular solids (such as pharmaceutical solids) in binary solvents via molecular simulation free energy calculations at infinite dilution with conventional molecular models. The present study used molecular dynamics with the General AMBER Force Field to predict the excess solubility of acetanilide, acetaminophen, phenacetin, benzocaine, and caffeine in binary water/ethanol solvents. The simulations are able to predict the existence of solubility enhancement and the results are in good agreement with available experimental data. The accuracy of the predictions in addition to the generality of the method suggests that molecular simulations may be a valuable design tool for solvent selection in drug development processes. PMID:25637996

  16. Predicting the excess solubility of acetanilide, acetaminophen, phenacetin, benzocaine, and caffeine in binary water/ethanol mixtures via molecular simulation

    NASA Astrophysics Data System (ADS)

    Paluch, Andrew S.; Parameswaran, Sreeja; Liu, Shuai; Kolavennu, Anasuya; Mobley, David L.

    2015-01-01

    We present a general framework to predict the excess solubility of small molecular solids (such as pharmaceutical solids) in binary solvents via molecular simulation free energy calculations at infinite dilution with conventional molecular models. The present study used molecular dynamics with the General AMBER Force Field to predict the excess solubility of acetanilide, acetaminophen, phenacetin, benzocaine, and caffeine in binary water/ethanol solvents. The simulations are able to predict the existence of solubility enhancement and the results are in good agreement with available experimental data. The accuracy of the predictions in addition to the generality of the method suggests that molecular simulations may be a valuable design tool for solvent selection in drug development processes.

  17. Predictive features associated with thyrotoxic storm and management.

    PubMed

    Bacuzzi, Alessandro; Dionigi, Gianlorenzo; Guzzetti, Luca; De Martino, Alessandro Ivan; Severgnini, Paolo; Cuffari, Salvatore

    2017-10-01

    Thyroid storm (TS) is an endocrine emergency characterized by rapid deterioration and a high mortality rate; rapid diagnosis and emergent treatment are therefore mandatory. In the past, thyroid surgery was the most common cause of TS, but preoperative medication is now used to create a euthyroid state before surgery. An active approach during the perioperative period could lead to effective clinical treatment of this life-threatening disease. Recently, the Japan Thyroid Association and Japan Endocrine Society developed diagnostic criteria for TS focusing on early and prompt diagnosis, avoiding inopportune and useless drugs. This review analyses predictive features associated with thyrotoxic storm, highlighting recent literature to optimize the quality of patient care.

  18. TU-CD-BRB-01: Normal Lung CT Texture Features Improve Predictive Models for Radiation Pneumonitis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krafft, S; The University of Texas Graduate School of Biomedical Sciences, Houston, TX; Briere, T

    2015-06-15

    Purpose: Existing normal tissue complication probability (NTCP) models for radiation pneumonitis (RP) traditionally rely on dosimetric and clinical data but are limited in terms of performance and generalizability. Extraction of pre-treatment image features provides a potential new category of data that can improve NTCP models for RP. We consider quantitative measures of total lung CT intensity and texture in a framework for prediction of RP. Methods: Available clinical and dosimetric data was collected for 198 NSCLC patients treated with definitive radiotherapy. Intensity- and texture-based image features were extracted from the T50 phase of the 4D-CT acquired for treatment planning. A total of 3888 features (15 clinical, 175 dosimetric, and 3698 image features) were gathered and considered candidate predictors for modeling of RP grade≥3. A baseline logistic regression model with mean lung dose (MLD) was first considered. Additionally, a least absolute shrinkage and selection operator (LASSO) logistic regression was applied to the set of clinical and dosimetric features, and subsequently to the full set of clinical, dosimetric, and image features. Model performance was assessed by comparing area under the curve (AUC). Results: A simple logistic fit of MLD was an inadequate model of the data (AUC∼0.5). Including clinical and dosimetric parameters within the framework of the LASSO resulted in improved performance (AUC=0.648). Analysis of the full cohort of clinical, dosimetric, and image features provided further and significant improvement in model performance (AUC=0.727). Conclusions: To achieve significant gains in predictive modeling of RP, new categories of data should be considered in addition to clinical and dosimetric features. We have successfully incorporated CT image features into a framework for modeling RP and have demonstrated improved predictive performance. Validation and further investigation of CT image features in the context of RP

  19. PREDICTION OF MOLECULAR PROPERTIES WITH MID-INFRARED SPECTRA AND INTERFEROGRAMS

    EPA Science Inventory

    We have built infrared spectroscopy-based partial least squares (PLS) models for molecular polarizabilities using a 97 member training set and a 59 member independent prediction set. These 156 compounds span a very wide range of chemical structure. Our goal was to use this well...

  20. Predicting Drug-Target Interaction Networks Based on Functional Groups and Biological Features

    PubMed Central

    Shi, Xiao-He; Hu, Le-Le; Kong, Xiangyin; Cai, Yu-Dong; Chou, Kuo-Chen

    2010-01-01

    Background Study of drug-target interaction networks is an important topic for drug development. It is both time-consuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, the in silico prediction methods can provide us with very useful information in a timely manner. Methods/Principal Findings To realize this, drug compounds are encoded with functional groups and proteins encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, with each to predict the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. Conclusion/Significance Our results indicate that the network prediction system thus established is quite promising and encouraging. PMID:20300175
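
    The evaluation scheme named in the record above (a nearest-neighbor predictor scored by jackknife, i.e. leave-one-out, cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the function names, and the toy feature vectors are all assumptions, and the record does not state which distance measure was used.

```python
import math

def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training point closest to `query`.

    Euclidean distance is an assumption made for illustration only.
    """
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]

def jackknife_success_rate(xs, ys):
    """Leave each sample out in turn, predict it from the rest, and
    return the fraction of samples predicted correctly."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_predict(rest_x, rest_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)

# Hypothetical toy data: two well-separated groups of 2-D feature vectors.
toy_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_y = [0, 0, 1, 1]
toy_rate = jackknife_success_rate(toy_x, toy_y)
```

    With the toy data every held-out point has a same-label neighbor closest to it, so the success rate is 1.0; an isolated point of one class would be misclassified and lower the rate.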

  1. Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features

    PubMed Central

    Mohammad-Noori, Morteza; Beer, Michael A.

    2014-01-01

    Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408

  2. Enhanced regulatory sequence prediction using gapped k-mer features.

    PubMed

    Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A

    2014-07-01

    Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
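
    To make the idea of gapped k-mer features concrete, the sketch below counts them by brute force: every window of length l is reduced to patterns that keep k informative positions and mask the rest. This is only an illustration under stated assumptions. The parameter names `l` and `k`, the `-` gap symbol, and the function name are hypothetical, and the naive enumeration is not the efficient tree-based kernel computation that the record describes.

```python
from collections import Counter
from itertools import combinations

def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers: for each length-l window of `seq`, keep k
    positions and replace the other l - k positions with '-'."""
    if not 0 < k <= l:
        raise ValueError("require 0 < k <= l")
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for positions in combinations(range(l), k):
            keep = set(positions)
            pattern = "".join(
                ch if i in keep else "-" for i, ch in enumerate(window)
            )
            counts[pattern] += 1
    return counts

# Hypothetical example: windows of length 3 with 2 informative positions.
example_counts = gapped_kmer_counts("ACGT", l=3, k=2)
```

    For "ACGT" the two windows "ACG" and "CGT" each yield three patterns ("AC-", "A-G", "-CG" and "CG-", "C-T", "-GT"). Because a masked position matches any base, each gapped pattern is observed more often than a full-length l-mer, which is the motivation given in the abstract for using gapped features when l is large.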

  3. Can we better predict the biologic behavior of incidental IPMN? A comprehensive analysis of molecular diagnostics and biomarkers in intraductal papillary mucinous neoplasms of the pancreas.

    PubMed

    Tulla, Kiara A; Maker, Ajay V

    2018-03-01

    Predicting the biologic behavior of intraductal papillary mucinous neoplasm (IPMN) remains challenging. Current guidelines utilize patient symptoms and imaging characteristics to determine appropriate surgical candidates. However, the majority of resected cysts remain low-risk lesions, many of which may be feasible to have under surveillance. We herein characterize the most promising and up-to-date molecular diagnostics in order to identify optimal components of a molecular signature to distinguish levels of IPMN dysplasia. A comprehensive systematic review of pertinent literature, including our own experience, was conducted based on the PRISMA guidelines. Molecular diagnostics in IPMN patient tissue, duodenal secretions, cyst fluid, saliva, and serum were evaluated and organized into the following categories: oncogenes, tumor suppressor genes, glycoproteins, markers of the immune response, proteomics, DNA/RNA mutations, and next-generation sequencing/microRNA. Specific targets in each of these categories, and in aggregate, were identified by their ability to both characterize a cyst as an IPMN and determine the level of cyst dysplasia. Combining molecular signatures with clinical and imaging features in this era of next-generation sequencing and advanced computational analysis will enable enhanced sensitivity and specificity of current models to predict the biologic behavior of IPMN.

  4. Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information.

    PubMed

    Hieke, Stefanie; Benner, Axel; Schlenk, Richard F; Schumacher, Martin; Bullinger, Lars; Binder, Harald

    2016-08-30

    High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building multivariable risk prediction models for a clinical endpoint, such as treatment response or survival. Unfortunately, such a high-dimensional modeling task will often be complicated by a limited overlap of molecular measurements at different levels between patients, i.e. measurements from all molecular levels are available only for a smaller proportion of patients. We propose a sequential strategy for building clinical risk prediction models that integrate genome-wide measurements from two molecular levels in a complementary way. To deal with partial overlap, we develop an imputation approach that allows us to use all available data. This approach is investigated in two acute myeloid leukemia applications combining gene expression with either SNP or DNA methylation data. After obtaining a sparse risk prediction signature e.g. from SNP data, an automatically selected set of prognostic SNPs, by componentwise likelihood-based boosting, imputation is performed for the corresponding linear predictor by a linking model that incorporates e.g. gene expression measurements. The imputed linear predictor is then used for adjustment when building a prognostic signature from the gene expression data. For evaluation, we consider stability, as quantified by inclusion frequencies across resampling data sets. Despite an extremely small overlap in the application example with gene expression and SNPs, several genes are seen to be more stably identified when taking the (imputed) linear predictor from the SNP data into account. In the application with gene expression and DNA methylation, prediction performance with respect to survival also indicates that the proposed approach might

  5. Molecular mapping of 21 features associated with partial monosomy 21: Involvement of the APP-SOD1 region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chettouh, Z.; Maunoury, C.; Sinet, P.M.

    1995-07-01

    We compared the phenotypes, karyotypes, and molecular data for six cases of partial monosomy 21. Regions of chromosome 21, the deletion of which corresponds to particular features of monosomy 21, were thereby defined. Five such regions were identified for 21 features. Ten of the features could be assigned to the region flanked by genes APP and SOD1: six facial features, transverse palmar crease, arthrogryposis-like symptoms, hypertonia, and contribution to mental retardation. This region, covering the interface of bands 21q21-21q22.1, is 4.7-6.4 Mb long and contains the gene encoding the glutamate receptor subunit GluR5 (GRIK1). 82 refs., 5 figs., 1 tab.

  6. Computer extracted texture features on T2w MRI to predict biochemical recurrence following radiation therapy for prostate cancer

    NASA Astrophysics Data System (ADS)

    Ginsburg, Shoshana B.; Rusu, Mirabela; Kurhanewicz, John; Madabhushi, Anant

    2014-03-01

    In this study we explore the ability of a novel machine learning approach, in conjunction with computer-extracted features describing prostate cancer morphology on pre-treatment MRI, to predict whether a patient will develop biochemical recurrence within ten years of radiation therapy. Biochemical recurrence, which is characterized by a rise in serum prostate-specific antigen (PSA) of at least 2 ng/mL above the nadir PSA, is associated with increased risk of metastasis and prostate cancer-related mortality. Currently, risk of biochemical recurrence is predicted by the Kattan nomogram, which incorporates several clinical factors to predict the probability of recurrence-free survival following radiation therapy (but has limited prediction accuracy). Semantic attributes on T2w MRI, such as the presence of extracapsular extension and seminal vesicle invasion and surrogate measurements of tumor size, have also been shown to be predictive of biochemical recurrence risk. While the correlation between biochemical recurrence and factors like tumor stage, Gleason grade, and extracapsular spread is well documented, it is less clear how to predict biochemical recurrence in the absence of extracapsular spread and for small tumors fully contained in the capsule. Computer-extracted texture features, which quantitatively describe tumor micro-architecture and morphology on MRI, have been shown to provide clues about a tumor's aggressiveness. However, while computer-extracted features have been employed for predicting cancer presence and grade, they have not been evaluated in the context of predicting risk of biochemical recurrence. This work seeks to evaluate the role of computer-extracted texture features in predicting risk of biochemical recurrence on a cohort of sixteen patients who underwent pre-treatment 1.5 Tesla (T) T2w MRI. We extract a combination of first-order statistical, gradient, co-occurrence, and Gabor wavelet features from T2w MRI. To identify which of these

  7. Using the Personality Assessment Inventory Antisocial and Borderline Features Scales to Predict Behavior Change.

    PubMed

    Penson, Brittany N; Ruchensky, Jared R; Morey, Leslie C; Edens, John F

    2016-11-01

    A substantial amount of research has examined the developmental trajectory of antisocial behavior and, in particular, the relationship between antisocial behavior and maladaptive personality traits. However, research typically has not controlled for previous behavior (e.g., past violence) when examining the utility of personality measures, such as self-report scales of antisocial and borderline traits, in predicting future behavior (e.g., subsequent violence). Examination of the potential interactive effects of measures of both antisocial and borderline traits also is relatively rare in longitudinal research predicting adverse outcomes. The current study utilizes a large sample of youthful offenders (N = 1,354) from the Pathways to Desistance project to examine the separate effects of the Personality Assessment Inventory Antisocial Features (ANT) and Borderline Features (BOR) scales in predicting future offending behavior as well as trends in other negative outcomes (e.g., substance abuse, violence, employment difficulties) over a 1-year follow-up period. In addition, an ANT × BOR interaction term was created to explore the predictive effects of secondary psychopathy. ANT and BOR both explained unique variance in the prediction of various negative outcomes even after controlling for past indicators of those same behaviors during the preceding year.

  8. Predictive maps for Juno perijoves and identification of significant features

    NASA Astrophysics Data System (ADS)

    Rogers, J. H.; Adamoli, G.; Jacquesson, M.; Vedovato, M.; Mettig, H.-J.; Eichstädt, G.; Caplinger, M.; Momary, T. W.; Orton, G. S.; Tabataba-Vakili, F.; Hansen, C. J.

    2017-09-01

    At each Juno perijove, JunoCam takes hi-res images of selected latitudes along the sub-spacecraft track, as determined by public voting. To inform this target election process, we use the continuous coverage of Jupiter's visible clouds by amateur imaging, and the tracking of features from those images by the JUPOS project, to identify the features which are expected to be visible at the upcoming perijove. We produce a predictive map for each perijove, and subsequently annotate the JunoCam images to locate the known jets and circulation. Up to perijove 5, this collaboration has contributed to hi-res imaging of several long-lived circulations in northern and southern hemispheres, of major new convective outbreaks in the North and South Equatorial Belts, and of the North Temperate Belt maturing after a cyclic outbreak.

  9. Feature Selection for Wheat Yield Prediction

    NASA Astrophysics Data System (ADS)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an ever-increasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  10. Predicting the Occurrence of Cave-Inhabiting Fauna Based on Features of the Earth Surface Environment.

    PubMed

    Christman, Mary C; Doctor, Daniel H; Niemiller, Matthew L; Weary, David J; Young, John A; Zigler, Kirk S; Culver, David C

    2016-01-01

    One of the most challenging fauna to study in situ is the obligate cave fauna because of the difficulty of sampling. Cave-limited species display patchy and restricted distributions, but it is often unclear whether the observed distribution is a sampling artifact or a true restriction in range. Further, the drivers of the distribution could be local environmental conditions, such as cave humidity, or they could be associated with surface features that are surrogates for cave conditions. If surface features can be used to predict the distribution of important cave taxa, then conservation management is more easily obtained. We examined the hypothesis that the presence of major faunal groups of cave obligate species could be predicted based on features of the earth surface. Georeferenced records of cave obligate amphipods, crayfish, fish, isopods, beetles, millipedes, pseudoscorpions, spiders, and springtails within the area of Appalachian Landscape Conservation Cooperative in the eastern United States (Illinois to Virginia and New York to Alabama) were assigned to 20 x 20 km grid cells. Habitat suitability for these faunal groups was modeled using logistic regression with twenty predictor variables within each grid cell, such as percent karst, soil features, temperature, precipitation, and elevation. Models successfully predicted the presence of a group greater than 65% of the time (mean = 88%) for the presence of single grid cell endemics, and for all faunal groups except pseudoscorpions. The most common predictor variables were latitude, percent karst, and the standard deviation of the Topographic Position Index (TPI), a measure of landscape rugosity within each grid cell. The overall success of these models points to a number of important connections between the surface and cave environments, and some of these, especially soil features and topographic variability, suggest new research directions. These models should prove to be useful tools in predicting the

  11. Predicting the Occurrence of Cave-Inhabiting Fauna Based on Features of the Earth Surface Environment

    PubMed Central

    Doctor, Daniel H.; Niemiller, Matthew L.; Weary, David J.; Young, John A.; Zigler, Kirk S.

    2016-01-01

    One of the most challenging fauna to study in situ is the obligate cave fauna because of the difficulty of sampling. Cave-limited species display patchy and restricted distributions, but it is often unclear whether the observed distribution is a sampling artifact or a true restriction in range. Further, the drivers of the distribution could be local environmental conditions, such as cave humidity, or they could be associated with surface features that are surrogates for cave conditions. If surface features can be used to predict the distribution of important cave taxa, then conservation management is more easily obtained. We examined the hypothesis that the presence of major faunal groups of cave obligate species could be predicted based on features of the earth surface. Georeferenced records of cave obligate amphipods, crayfish, fish, isopods, beetles, millipedes, pseudoscorpions, spiders, and springtails within the area of Appalachian Landscape Conservation Cooperative in the eastern United States (Illinois to Virginia and New York to Alabama) were assigned to 20 x 20 km grid cells. Habitat suitability for these faunal groups was modeled using logistic regression with twenty predictor variables within each grid cell, such as percent karst, soil features, temperature, precipitation, and elevation. Models successfully predicted the presence of a group greater than 65% of the time (mean = 88%) for the presence of single grid cell endemics, and for all faunal groups except pseudoscorpions. The most common predictor variables were latitude, percent karst, and the standard deviation of the Topographic Position Index (TPI), a measure of landscape rugosity within each grid cell. The overall success of these models points to a number of important connections between the surface and cave environments, and some of these, especially soil features and topographic variability, suggest new research directions. These models should prove to be useful tools in predicting the

  12. Predicting the occurrence of cave-inhabiting fauna based on features of the earth surface environment

    USGS Publications Warehouse

    Christman, Mary C.; Doctor, Daniel H.; Niemiller, Matthew L.; Weary, David J.; Young, John A.; Zigler, Kirk S.; Culver, David C.

    2016-01-01

    One of the most challenging fauna to study in situ is the obligate cave fauna because of the difficulty of sampling. Cave-limited species display patchy and restricted distributions, but it is often unclear whether the observed distribution is a sampling artifact or a true restriction in range. Further, the drivers of the distribution could be local environmental conditions, such as cave humidity, or they could be associated with surface features that are surrogates for cave conditions. If surface features can be used to predict the distribution of important cave taxa, then conservation management is more easily obtained. We examined the hypothesis that the presence of major faunal groups of cave obligate species could be predicted based on features of the earth surface. Georeferenced records of cave obligate amphipods, crayfish, fish, isopods, beetles, millipedes, pseudoscorpions, spiders, and springtails within the area of Appalachian Landscape Conservation Cooperative in the eastern United States (Illinois to Virginia and New York to Alabama) were assigned to 20 x 20 km grid cells. Habitat suitability for these faunal groups was modeled using logistic regression with twenty predictor variables within each grid cell, such as percent karst, soil features, temperature, precipitation, and elevation. Models successfully predicted the presence of a group greater than 65% of the time (mean = 88%) for the presence of single grid cell endemics, and for all faunal groups except pseudoscorpions. The most common predictor variables were latitude, percent karst, and the standard deviation of the Topographic Position Index (TPI), a measure of landscape rugosity within each grid cell. The overall success of these models points to a number of important connections between the surface and cave environments, and some of these, especially soil features and topographic variability, suggest new research directions. These models should prove to be useful tools in predicting the

  13. Genomic biomarkers for molecular imaging: predicting the future.

    PubMed

    Thakur, Mathew L

    2009-07-01

    Over the past few decades, great strides have been made in anatomical imaging of disease that have led to their diagnosis with minimal invasion. Despite these advances, diseases such as cancer continue to take one human life every minute in the United States. Complementary approaches that pertain directly to the genesis of the disease might contribute to its early diagnosis and subsequent management. In cancer, an array of molecular abnormalities leading to the modulations in expression of key proteins important in the cellular signaling pathways and cell proliferation has been identified. These specific disease fingerprints, biomarkers, are overexpressed on malignant cell surfaces or within the cytoplasm, and they provide unique targets that are promising for improving cancer diagnosis and therapy. We and others have designed, synthesized, and evaluated some novel probes specific for those oncogenes and oncogene product biomarkers for PET and SPECT molecular imaging of certain types of cancers. This article briefly describes this approach and gives specific examples that depict the ability of molecular imaging to detect occult lesions not detectable by current scintigraphic approaches. The article also outlines a few examples predicting other possible applications of targeting such specific probes not yet used.

  14. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most of other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

  15. Prediction of hot spots in protein interfaces using a random forest model with hybrid features.

    PubMed

    Wang, Lin; Liu, Zhi-Ping; Zhang, Xiang-Sun; Chen, Luonan

    2012-03-01

    Prediction of hot spots in protein interfaces provides crucial information for the research on protein-protein interaction and drug design. Existing machine learning methods generally judge whether a given residue is likely to be a hot spot by extracting features only from the target residue. However, hot spots usually form a small cluster of residues which are tightly packed together at the center of protein interface. With this in mind, we present a novel method to extract hybrid features which incorporate a wide range of information of the target residue and its spatially neighboring residues, i.e. the nearest contact residue in the other face (mirror-contact residue) and the nearest contact residue in the same face (intra-contact residue). We provide a novel random forest (RF) model to effectively integrate these hybrid features for predicting hot spots in protein interfaces. Our method can achieve accuracy (ACC) of 82.4% and Matthews correlation coefficient (MCC) of 0.482 in Alanine Scanning Energetics Database, and ACC of 77.6% and MCC of 0.429 in Binding Interface Database. In a comparison study, performance of our RF model exceeds other existing methods, such as Robetta, FOLDEF, KFC, KFC2, MINERVA and HotPoint. Of our hybrid features, three physicochemical features of target residues (mass, polarizability and isoelectric point), the relative side-chain accessible surface area and the average depth index of mirror-contact residues are found to be the main discriminative features in hot spots prediction. We also confirm that hot spots tend to form large contact surface areas between two interacting proteins. Source data and code are available at: http://www.aporc.org/doc/wiki/HotSpot.
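
    The two metrics quoted in the record above, ACC and MCC, are both computed from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions. The function names and the convention of returning 0.0 when the MCC denominator vanishes are assumptions made for illustration, and no figures from the study are reproduced here.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """Correlation between predicted and true binary labels, in [-1, 1].

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here) to avoid dividing by zero.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator
```

    Unlike accuracy, MCC stays near zero for a classifier that simply predicts the majority class, which is one reason it is often reported alongside ACC on unbalanced data sets such as hot-spot versus non-hot-spot residues.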

  16. Prediction of Occult Invasive Disease in Ductal Carcinoma in Situ Using Deep Learning Features.

    PubMed

    Shi, Bibo; Grimm, Lars J; Mazurowski, Maciej A; Baker, Jay A; Marks, Jeffrey R; King, Lorraine M; Maley, Carlo C; Hwang, E Shelley; Lo, Joseph Y

    2018-03-01

    The aim of this study was to determine whether deep features extracted from digital mammograms using a pretrained deep convolutional neural network are prognostic of occult invasive disease for patients with ductal carcinoma in situ (DCIS) on core needle biopsy. In this retrospective study, digital mammographic magnification views were collected for 99 subjects with DCIS at biopsy, 25 of which were subsequently upstaged to invasive cancer. A deep convolutional neural network model that was pretrained on nonmedical images (eg, animals, plants, instruments) was used as the feature extractor. Through a statistical pooling strategy, deep features were extracted at different levels of convolutional layers from the lesion areas, without sacrificing the original resolution or distorting the underlying topology. A multivariate classifier was then trained to predict which tumors contain occult invasive disease. This was compared with the performance of traditional "handcrafted" computer vision (CV) features previously developed specifically to assess mammographic calcifications. The generalization performance was assessed using Monte Carlo cross-validation and receiver operating characteristic curve analysis. Deep features were able to distinguish DCIS with occult invasion from pure DCIS, with an area under the receiver operating characteristic curve of 0.70 (95% confidence interval, 0.68-0.73). This performance was comparable with the handcrafted CV features (area under the curve = 0.68; 95% confidence interval, 0.66-0.71) that were designed with prior domain knowledge. Despite being pretrained on only nonmedical images, the deep features extracted from digital mammograms demonstrated comparable performance with handcrafted CV features for the challenging task of predicting DCIS upstaging. Copyright © 2017 American College of Radiology. Published by Elsevier Inc. All rights reserved.
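
    The evaluation described here combines repeated random train/test splits (Monte Carlo cross-validation) with the area under the ROC curve. A minimal sketch of both ideas follows; the function names, the test fraction and the number of repeats are assumptions for illustration, since the abstract does not give them.

```python
import random

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. This equals the normalised Mann-Whitney U statistic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def monte_carlo_splits(n_samples, n_repeats, test_fraction, seed=0):
    """Yield (train_idx, test_idx) for repeated random hold-out splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])

# 99 subjects as in the abstract; 20% test fraction and 3 repeats are hypothetical.
for train_idx, test_idx in monte_carlo_splits(99, n_repeats=3, test_fraction=0.2):
    print(len(train_idx), len(test_idx))
```

    Unlike k-fold cross-validation, the test sets of different repeats may overlap, which is why the AUC is usually summarised as a mean with a confidence interval over many repeats.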

  17. Predicting activation energy of thermolysis of polynitro arenes through molecular structure.

    PubMed

    Keshavarz, Mohammad Hossein; Pouretedal, Hamid Reza; Shokrolahi, Arash; Zali, Abbas; Semnani, Abolfazl

    2008-12-15

    The paper presents a new method for predicting the activation energy, or Arrhenius parameter E(a), of thermolysis in the condensed state for different polynitro arenes, an important class of energetic molecules. The methodology assumes that E(a) of a polynitro arene with general formula C(a)H(b)N(c)O(d) can be expressed as a function of optimized elemental composition as well as the contribution of specific molecular structural parameters. The new method can predict E(a) of thermolysis under the conditions of the Soviet Manometric Method (SMM), which can be related to other convenient methods. The new correlation has root mean square (rms) and average deviations of 13.79 and 11.94 kJ/mol, respectively, for 20 polynitro arenes with different molecular structures. The proposed method can also be used to predict E(a) of three polynitro arenes, i.e. 2,2',2'',4,4',4'',6,6',6''-nonanitro-1,1':3',1''-terphenyl (NONA), 3,3'-diamino-2,2',4,4',6,6'-hexanitro-1,1'-biphenyl-3,3'-diamine (DIPAM) and N,N-bis(2,4-dinitrophenyl)-2,4,6-trinitroaniline (NTFA), which have complex molecular structures.
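
    To put a deviation in E(a) into perspective, one can ask how much it shifts a rate constant under the standard Arrhenius relation k = A exp(-E(a)/(RT)). The sketch below assumes that relation with a fixed pre-exponential factor A; the temperature is a hypothetical value chosen for illustration and is not taken from the paper.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)

def rate_ratio(delta_ea_kj_per_mol, temperature_k):
    """Factor by which k changes when E(a) is off by delta_ea, at fixed A and T.

    Follows from k = A * exp(-Ea / (R * T)): the ratio of two rate constants
    whose activation energies differ by delta_ea is exp(delta_ea / (R * T)).
    """
    return math.exp(delta_ea_kj_per_mol * 1000.0 / (R_GAS * temperature_k))

# 13.79 kJ/mol is the rms deviation quoted in the abstract;
# 500 K is a hypothetical temperature used only for illustration.
print(f"rate-constant factor: {rate_ratio(13.79, 500.0):.1f}")
```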

  18. Rosetta Structure Prediction as a Tool for Solving Difficult Molecular Replacement Problems.

    PubMed

    DiMaio, Frank

    2017-01-01

    Molecular replacement (MR), a method for solving the crystallographic phase problem using phases derived from a model of the target structure, has proven extremely valuable, accounting for the vast majority of structures solved by X-ray crystallography. However, when the resolution of data is low, or the starting model is very dissimilar to the target protein, solving structures via molecular replacement may be very challenging. In recent years, protein structure prediction methodology has emerged as a powerful tool in model building and model refinement for difficult molecular replacement problems. This chapter describes some of the tools available in Rosetta for model building and model refinement specifically geared toward difficult molecular replacement cases.

  19. Recent advances in the development and use of molecular tests to predict antimicrobial resistance in Neisseria gonorrhoeae.

    PubMed

    Donà, Valentina; Low, Nicola; Golparian, Daniel; Unemo, Magnus

    2017-09-01

    The number of genetic tests, mostly real-time PCRs, to detect antimicrobial resistance (AMR) determinants and predict AMR in Neisseria gonorrhoeae is increasing. Several of these assays are promising, but there are important shortcomings and few assays have been adequately validated and quality assured. Areas covered: Recent advances, focusing on publications since 2012, in the development and use of molecular tests to predict gonococcal AMR for surveillance and for clinical use, advantages and disadvantages of these tests and of molecular AMR prediction compared with phenotypic AMR testing, and future perspectives for effective use of molecular AMR tests for different purposes. Expert commentary: Several challenges for direct testing of clinical, especially extra-genital, specimens remain. The choice of molecular assay needs to consider the assay target, quality controls, sample types, limitations intrinsic to molecular technologies, and specific to the chosen methodology, and the intended use of the test. Improved molecular- and particularly genome-sequencing-based methods will supplement AMR testing for surveillance purposes, and translate into point-of-care tests that will lead to personalized treatments, while sparing the last available empiric treatment option (ceftriaxone). However, genetic AMR prediction will never completely replace phenotypic AMR testing, which detects also AMR due to unknown AMR determinants.

  20. Adaptive reliance on the most stable sensory predictions enhances perceptual feature extraction of moving stimuli.

    PubMed

    Kumar, Neeraj; Mutha, Pratik K

    2016-03-01

    The prediction of the sensory outcomes of action is thought to be useful for distinguishing self- vs. externally generated sensations, correcting movements when sensory feedback is delayed, and learning predictive models for motor behavior. Here, we show that aspects of another fundamental function, perception, are enhanced when they entail the contribution of predicted sensory outcomes and that this enhancement relies on the adaptive use of the most stable predictions available. We combined a motor-learning paradigm that imposes new sensory predictions with a dynamic visual search task to first show that perceptual feature extraction of a moving stimulus is poorer when it is based on sensory feedback that is misaligned with those predictions. This was possible because our novel experimental design allowed us to override the "natural" sensory predictions present when any action is performed and separately examine the influence of these two sources on perceptual feature extraction. We then show that if the new predictions induced via motor learning are unreliable, rather than just relying on sensory information for perceptual judgments, as is conventionally thought, then subjects adaptively transition to using other stable sensory predictions to maintain greater accuracy in their perceptual judgments. Finally, we show that when sensory predictions are not modified at all, these judgments are sharper when subjects combine their natural predictions with sensory feedback. Collectively, our results highlight the crucial contribution of sensory predictions to perception and also suggest that the brain intelligently integrates the most stable predictions available with sensory information to maintain high fidelity in perceptual decisions. Copyright © 2016 the American Physiological Society.

  1. Predicting solubilisation features of ternary phase diagrams of fully dilutable lecithin linker microemulsions.

    PubMed

    Nouraei, Mehdi; Acosta, Edgar J

    2017-06-01

    Fully dilutable microemulsions (μEs), used to design self-microemulsifying delivery systems (SMEDS), are formulated as concentrate solutions containing oil and surfactants, without water. As water is added to dilute these systems, various μEs are produced (water-swollen reverse micelles, bicontinuous systems, and oil-swollen micelles), without the onset of phase separation. Currently, the formulation of dilutable μEs follows a trial-and-error approach that has had limited success. The objective of this work is to introduce the use of the hydrophilic-lipophilic-difference (HLD) and net-average-curvature (NAC) frameworks to predict the solubilisation features of ternary phase diagrams of lecithin-linker μEs and the use of these predictions to guide the formulation of dilutable μEs. To this end, the characteristic curvatures (Cc) of soybean lecithin (surfactant), glycerol monooleate (lipophilic linker) and polyglycerol caprylate (hydrophilic linker) and the equivalent alkane carbon number (EACN) of ethyl caprate (oil) were obtained via phase scans with reference surfactant-oil systems. These parameters were then used to calculate the HLD of lecithin-linkers-ethyl caprate microemulsions. The calculated HLDs were able to predict the phase transitions observed in the phase scans. The NAC was then used to fit and predict phase volumes obtained from salinity phase scans, and to predict the solubilisation features of ternary phase diagrams of the lecithin-linker formulations. The HLD-NAC predictions were reasonably accurate, and indicated that the largest region for dilutable μEs was obtained with slightly negative HLD values. The NAC framework also predicted, and explained, the changes in microemulsion properties along dilution lines. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods.

    PubMed

    Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming

    2014-12-01

    Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods including genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF) were utilized for selecting an important subset of features from a total of 16 clinical features. These feature selection methods were combined with support vector machine (SVM) for developing predictive models with better performance. Five-fold cross-validation was used to train and test SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation achieved average sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve of 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM-derived predictive model can identify patients at likely high risk of recurrence, who should be followed up closely after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
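
    The count-based measures listed in this abstract (sensitivity, specificity, accuracy, positive and negative predictive value) all follow from one confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Return five count-based measures for a binary classifier.

    Assumes every denominator is non-zero, i.e. the evaluated set contains
    both classes and the model predicts both classes at least once.
    """
    return {
        "sensitivity": tp / (tp + fn),   # recurrences correctly flagged
        "specificity": tn / (tn + fp),   # non-recurrences correctly cleared
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),           # flagged patients who did recur
        "npv": tn / (tn + fn),           # cleared patients who did not recur
    }

# Hypothetical counts for illustration only.
for name, value in diagnostic_metrics(tp=8, tn=6, fp=2, fn=4).items():
    print(f"{name}: {value:.2f}")
```

    In a cross-validated study these values are typically computed per fold and then averaged, which is one reason the reported averages need not correspond to a single whole-number confusion matrix.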

  3. Molecular features of biguanides required for targeting of mitochondrial respiratory complex I and activation of AMP-kinase.

    PubMed

    Bridges, Hannah R; Sirviö, Ville A; Agip, Ahmed-Noor A; Hirst, Judy

    2016-08-09

    The biguanides are a family of drugs with diverse clinical applications. Metformin, a widely used anti-hyperglycemic biguanide, suppresses mitochondrial respiration by inhibiting respiratory complex I. Phenformin, a related anti-hyperglycemic biguanide, also inhibits respiration, but proguanil, which is widely used for the prevention of malaria, does not. The molecular structures of phenformin and proguanil are closely related and both inhibit isolated complex I. Proguanil does not inhibit respiration in cells and mitochondria because it is unable to access complex I. The molecular features that determine which biguanides accumulate in mitochondria, enabling them to inhibit complex I in vivo, are not known. Here, a family of seven biguanides are used to reveal the molecular features that determine why phenformin enters mitochondria and inhibits respiration whereas proguanil does not. All seven biguanides inhibit isolated complex I, but only four of them inhibit respiration in cells and mitochondria. Direct conjugation of a phenyl group and bis-substitution of the biguanide moiety prevent uptake into mitochondria, irrespective of the compound hydrophobicity. This high selectivity suggests that biguanide uptake into mitochondria is protein mediated, and is not by passive diffusion. Only those biguanides that enter mitochondria and inhibit complex I activate AMP kinase, strengthening links between complex I and the downstream effects of biguanide treatments. Biguanides inhibit mitochondrial complex I, but specific molecular features control the uptake of substituted biguanides into mitochondria, so only some biguanides inhibit mitochondrial respiration in vivo. Biguanides with restricted intracellular access may be used to determine physiologically relevant targets of biguanide action, and for the rational design of substituted biguanides for diverse clinical applications.

  4. Feature selection in feature network models: finding predictive subsets of features with the Positive Lasso.

    PubMed

    Frank, Laurence E; Heiser, Willem J

    2008-05-01

    A set of features is the basis for the network representation of proximity data achieved by feature network models (FNMs). Features are binary variables that characterize the objects in an experiment, with some measure of proximity as the response variable. Sometimes features are provided by theory and play an important role in the construction of the experimental conditions. In some research settings, the features are not known a priori. This paper shows how to generate features in this situation and how to select an adequate subset of features that takes into account a good compromise between model fit and model complexity, using a new version of least angle regression that restricts coefficients to be non-negative, called the Positive Lasso. It will be shown that features can be generated efficiently with Gray codes that are naturally linked to the FNMs. The model selection strategy makes use of the fact that an FNM can be considered as a univariate multiple regression model. A simulation study shows that the proposed strategy leads to satisfactory results if the number of objects is less than or equal to 22. If the number of objects is larger than 22, the number of features selected by our method exceeds the true number of features in some conditions.
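
    Because a feature here is a binary variable over the objects, candidate features can be enumerated as binary vectors. The sketch below shows only the standard reflected binary Gray code, in which successive vectors differ in exactly one position; how the paper links Gray codes to FNMs is not described in the abstract, so this is an illustration of the ordering rather than the authors' generation procedure.

```python
def gray_code(n_bits):
    """Return all 2**n_bits binary tuples in reflected Gray-code order."""
    codes = []
    for i in range(2 ** n_bits):
        g = i ^ (i >> 1)  # standard binary-to-Gray conversion
        # Unpack the integer into a tuple of bits, most significant first.
        codes.append(tuple((g >> b) & 1 for b in reversed(range(n_bits))))
    return codes

# Each tuple marks which of three hypothetical objects possess a candidate feature.
for code in gray_code(3):
    print(code)
```

    Stepping through candidates in this order changes the membership of a single object at a time, which in general allows quantities that depend on the feature vector to be updated incrementally instead of recomputed.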

  5. Mining hidden data to predict patient prognosis: texture feature extraction and machine learning in mammography

    NASA Astrophysics Data System (ADS)

    Leighs, J. A.; Halling-Brown, M. D.; Patel, M. N.

    2018-03-01

    The UK currently has a national breast cancer-screening program and images are routinely collected from a number of screening sites, representing a wealth of invaluable data that is currently under-used. Radiologists evaluate screening images manually and recall suspicious cases for further analysis such as biopsy. Histological testing of biopsy samples confirms the malignancy of the tumour, along with other diagnostic and prognostic characteristics such as disease grade. Machine learning is becoming increasingly popular for clinical image classification problems, as it is capable of discovering patterns in data otherwise invisible. This is particularly true when applied to medical imaging features; however clinical datasets are often relatively small. A texture feature extraction toolkit has been developed to mine a wide range of features from medical images such as mammograms. This study analysed a dataset of 1,366 radiologist-marked, biopsy-proven malignant lesions obtained from the OPTIMAM Medical Image Database (OMI-DB). Exploratory data analysis methods were employed to better understand extracted features. Machine learning techniques including Classification and Regression Trees (CART), ensemble methods (e.g. random forests), and logistic regression were applied to the data to predict the disease grade of the analysed lesions. Prediction scores of up to 83% were achieved; sensitivity and specificity of the models trained have been discussed to put the results into a clinical context. The results show promise in the ability to predict prognostic indicators from the texture features extracted and thus enable prioritisation of care for patients at greatest risk.

  6. Stargardt disease: clinical features, molecular genetics, animal models and therapeutic options

    PubMed Central

    Tanna, Preena; Strauss, Rupert W; Fujinami, Kaoru; Michaelides, Michel

    2017-01-01

    Stargardt disease (STGD1; MIM 248200) is the most prevalent inherited macular dystrophy and is associated with disease-causing sequence variants in the gene ABCA4. Significant advances have been made over the last 10 years in our understanding of both the clinical and molecular features of STGD1, and also the underlying pathophysiology, which has culminated in ongoing and planned human clinical trials of novel therapies. The aims of this review are to describe the detailed phenotypic and genotypic characteristics of the disease, conventional and novel imaging findings, current knowledge of animal models and pathogenesis, and the multiple avenues of intervention being explored. PMID:27491360

  7. Prediction of paroxysmal atrial fibrillation using recurrence plot-based features of the RR-interval signal.

    PubMed

    Mohebbi, Maryam; Ghassemian, Hassan

    2011-08-01

    Atrial fibrillation (AF) is the most common cardiac arrhythmia and increases the risk of stroke. Predicting the onset of paroxysmal AF (PAF), based on noninvasive techniques, is clinically important and can be invaluable in order to avoid useless therapeutic intervention and to minimize risks for the patients. In this paper, we propose an effective PAF predictor which is based on the analysis of the RR-interval signal. This method consists of three steps: preprocessing, feature extraction and classification. In the first step, the QRS complexes are detected from the electrocardiogram (ECG) signal and then the RR-interval signal is extracted. In the next step, the recurrence plot (RP) of the RR-interval signal is obtained and five statistically significant features are extracted to characterize the basic patterns of the RP. These features consist of the recurrence rate, length of longest diagonal segments (L(max)), average length of the diagonal lines (L(mean)), entropy, and trapping time. Recurrence quantification analysis can reveal subtle aspects of dynamics not easily appreciated by other methods and exhibits characteristic patterns which are caused by the typical dynamical behavior. In the final step, a support vector machine (SVM)-based classifier is used for PAF prediction. The performance of the proposed method in prediction of PAF episodes was evaluated using the Atrial Fibrillation Prediction Database (AFPDB) which consists of both 30 min ECG recordings that end just prior to the onset of PAF and segments at least 45 min distant from any PAF events. The obtained sensitivity, specificity, positive predictivity and negative predictivity were 97%, 100%, 100%, and 96%, respectively. The proposed methodology presents better results than other existing approaches.
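
    The feature-extraction step can be illustrated with a small recurrence plot and two of the five features named above, the recurrence rate and L(max). This is a simplified sketch: it compares raw samples without time-delay embedding, and the threshold and RR values are hypothetical, since the abstract does not state the authors' settings.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when samples i and j lie within eps of each other."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

def recurrence_rate(R):
    """Fraction of recurrent points in the plot (main diagonal included)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)

def longest_diagonal(R):
    """Length of the longest diagonal line, excluding the main diagonal."""
    n = len(R)
    best = 0
    for offset in range(1, n):          # upper diagonals only; R is symmetric
        run = 0
        for i in range(n - offset):
            run = run + 1 if R[i][i + offset] else 0
            best = max(best, run)
    return best

# Hypothetical RR intervals in seconds and a hypothetical threshold.
rr = [0.80, 0.82, 0.79, 0.81, 0.95, 0.80, 0.82, 0.79]
R = recurrence_matrix(rr, eps=0.015)
print("recurrence rate:", recurrence_rate(R))
print("L(max):", longest_diagonal(R))
```

    Long diagonal lines indicate stretches where the signal repeats an earlier segment, which is why L(max) and L(mean) are read as measures of how predictable the rhythm is.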

  8. Distinct Molecular Features of Different Macroscopic Subtypes of Colorectal Neoplasms

    PubMed Central

    Konda, Kenichi; Konishi, Kazuo; Yamochi, Toshiko; Ito, Yoichi M.; Nozawa, Hisako; Tojo, Masayuki; Shinmura, Kensuke; Kogo, Mari; Katagiri, Atsushi; Kubota, Yutaro; Muramoto, Takashi; Yano, Yuichiro; Kobayashi, Yoshiya; Kihara, Toshihiro; Tagawa, Teppei; Makino, Reiko; Takimoto, Masafumi; Imawari, Michio; Yoshida, Hitoshi

    2014-01-01

    Background: Colorectal adenoma develops into cancer with the accumulation of genetic and epigenetic changes. We studied the underlying molecular and clinicopathological features to better understand the heterogeneity of colorectal neoplasms (CRNs). Methods: We evaluated both genetic (mutations of KRAS, BRAF, TP53, and PIK3CA, and microsatellite instability [MSI]) and epigenetic (methylation status of nine genes or sequences, including the CpG island methylator phenotype [CIMP] markers) alterations in 158 CRNs including 56 polypoid neoplasms (PNs), 25 granular type laterally spreading tumors (LST-Gs), 48 non-granular type LSTs (LST-NGs), 19 depressed neoplasms (DNs) and 10 small flat-elevated neoplasms (S-FNs) on the basis of macroscopic appearance. Results: S-FNs showed few molecular changes except SFRP1 methylation. Significant differences in the frequency of KRAS mutations were observed among subtypes (68% for LST-Gs, 36% for PNs, 16% for DNs and 6% for LST-NGs) (P<0.001). By contrast, the frequency of TP53 mutation was higher in DNs than PNs or LST-Gs (32% vs. 5% or 0%, respectively) (P<0.007). We also observed significant differences in the frequency of CIMP between LST-Gs and LST-NGs or PNs (32% vs. 6% or 5%, respectively) (P<0.005). Moreover, the methylation level of LINE-1 was significantly lower in DNs or LST-Gs than in PNs (58.3% or 60.5% vs. 63.2%, P<0.05). PIK3CA mutations were detected only in LSTs. Finally, multivariate analyses showed that macroscopic morphologies were significantly associated with an increased risk of molecular changes (PN or LST-G for KRAS mutation, odds ratio [OR] 9.11; LST-NG or DN for TP53 mutation, OR 5.30; LST-G for PIK3CA mutation, OR 26.53; LST-G or DN for LINE-1 hypomethylation, OR 3.41). Conclusion: We demonstrated that CRNs could be classified into five macroscopic subtypes according to clinicopathological and molecular differences, suggesting that different mechanisms are involved in the pathogenesis of colorectal

  9. Predicting the Macroscopic Fracture Energy of Epoxy Resins from Atomistic Molecular Simulations

    DOE PAGES

    Meng, Zhaoxu; Bessa, Miguel A.; Xia, Wenjie; ...

    2016-12-06

    Predicting the macroscopic fracture energy of highly crosslinked glassy polymers from atomistic simulations is challenging due to the size of the process zone being large in these systems. Here, we present a scale-bridging approach that links atomistic molecular dynamics simulations to macroscopic fracture properties on the basis of a continuum fracture mechanics model for two different epoxy materials. Our approach reveals that the fracture energy of epoxy resins strongly depends on the functionality of epoxy resin and the component ratio between the curing agent (amine) and epoxide. The most intriguing part of our study is that we demonstrate that the fracture energy exhibits a maximum value within the range of conversion degrees considered (from 65% to 95%), which can be attributed to the combined effects of structural rigidity and post-yield deformability. Our study provides physical insight into the molecular mechanisms that govern the fracture characteristics of epoxy resins and demonstrates the success of utilizing atomistic molecular simulations towards predicting macroscopic material properties.

  10. Predicting the Macroscopic Fracture Energy of Epoxy Resins from Atomistic Molecular Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Zhaoxu; Bessa, Miguel A.; Xia, Wenjie

    Predicting the macroscopic fracture energy of highly crosslinked glassy polymers from atomistic simulations is challenging due to the size of the process zone being large in these systems. Here, we present a scale-bridging approach that links atomistic molecular dynamics simulations to macroscopic fracture properties on the basis of a continuum fracture mechanics model for two different epoxy materials. Our approach reveals that the fracture energy of epoxy resins strongly depends on the functionality of epoxy resin and the component ratio between the curing agent (amine) and epoxide. The most intriguing part of our study is that we demonstrate that the fracture energy exhibits a maximum value within the range of conversion degrees considered (from 65% to 95%), which can be attributed to the combined effects of structural rigidity and post-yield deformability. Our study provides physical insight into the molecular mechanisms that govern the fracture characteristics of epoxy resins and demonstrates the success of utilizing atomistic molecular simulations towards predicting macroscopic material properties.

  11. [Neuroendocrine prostate cancer: Natural history, molecular features, therapeutic management and future directions].

    PubMed

    Campedel, Luca; Kossaï, Myriam; Blanc-Durand, Paul; Rouprêt, Morgan; Seisen, Thomas; Compérat, Eva; Spano, Jean-Philippe; Malouf, Gabriel

    2017-09-01

    Neuroendocrine prostate cancer is a rare malignancy with an adverse prognosis. Histologically, it can be pure (small cell or large cell neuroendocrine carcinoma) or mixed with an adenocarcinoma component. Rarely diagnosed de novo, neuroendocrine prostate cancer is generally associated with advanced stage disease resistant to castration. As such, this histological subtype could represent an aggressive evolution of prostatic adenocarcinoma, through the epithelio-neuroendocrine transdifferentiation mechanism (phenomenon of lineage plasticity). Nonetheless, neuroendocrine prostate cancer is a heterogeneous malignancy with multiple histopathological variants showing distinct clinical features. The broad variety of molecular analyses could help to understand the ontogeny of this histological subtype and its signaling pathways. This may also allow the identification of diagnostic and prognostic biomarkers as well as potential molecular targets. However, treatment options are currently limited and consist only of platinum-based chemotherapy for advanced stage disease. Copyright © 2017 Société Française du Cancer. Published by Elsevier Masson SAS. All rights reserved.

  12. Comparison between semantic features and lung-RADS in predicting malignancy of screening lung nodule

    PubMed Central

    Li, Qian; Balagurunathan, Yoganand; Liu, Ying; Qi, Jin; Schabath, Matthew B.; Ye, Zhaoxiang; Gillies, Robert

    2017-01-01

    Rationale: Lung-RADS is proposed for low-dose computed tomography (LDCT) interpretation in lung cancer screening, but its performance needs to be further evaluated. Objectives: To compare the value of radiological semantic features and lung-RADS in predicting nodule malignancy risk at different screening rounds, and to investigate whether the predictive power of lung-RADS could be improved by incorporating semantic features. Methods: A training cohort of 199 patients (139 benign and 60 cancerous nodules diagnosed at the third screening round), and a testing cohort of 80 patients (40 benign and 40 malignant nodules) were obtained from the National Lung Screening Trial dataset. A multivariate linear predictor model was built based on the 24 systematically scored semantic features, and the performances were compared to lung-RADS (scale 3 or above called positive). Measurements and Main Results: Among the semantic features, contour and border definition were the top individual predictors. The average area under the receiver-operating characteristic curve (AUC) of border definition at baseline (T0) was 0.724. The average AUC of contour at first (T1) and second follow-up (T2) were 0.843 and 0.878, respectively. Other significant features included size, location, vessel attachment, solidity, focal emphysema and focal fibrosis. In comparison, the average AUC of lung-RADS at T0, T1 and T2 were 0.600, 0.760 and 0.867, respectively, and could be improved to 0.743, 0.887 and 0.968 by adding semantic features. Conclusion: The semantic features performed similarly to lung-RADS at follow-ups, outperformed lung-RADS at baseline, and could improve the performance of lung-RADS for all screening rounds. PMID:29137847

  13. OH-PRED: prediction of protein hydroxylation sites by incorporating adapted normal distribution bi-profile Bayes feature extraction and physicochemical properties of amino acids.

    PubMed

    Jia, Cang-Zhi; He, Wen-Ying; Yao, Yu-Hua

    2017-03-01

    Hydroxylation of proline or lysine residues in proteins is a common post-translational modification event, and such modifications are found in many physiological and pathological processes. Nonetheless, the exact molecular mechanism of hydroxylation remains under investigation. Because experimental identification of hydroxylation is time-consuming and expensive, bioinformatics tools with high accuracy represent desirable alternatives for large-scale rapid identification of protein hydroxylation sites. In view of this, we developed a support vector machine-based tool, OH-PRED, for the prediction of protein hydroxylation sites using the adapted normal distribution bi-profile Bayes feature extraction in combination with the physicochemical property indexes of the amino acids. In a jackknife cross validation, OH-PRED yields an accuracy of 91.88% and a Matthews correlation coefficient (MCC) of 0.838 for the prediction of hydroxyproline sites, and yields an accuracy of 97.42% and an MCC of 0.949 for the prediction of hydroxylysine sites. These results demonstrate that OH-PRED significantly increased the prediction accuracy of hydroxyproline and hydroxylysine sites by 7.37 and 14.09%, respectively, when compared with the latest predictor PredHydroxy. In independent tests, OH-PRED also outperforms previously published methods.

  14. Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico

    USGS Publications Warehouse

    Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.

    2003-01-01

    Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All (100%) 37 sample sites (28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. For 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. These partial burnings may later help avoid a fire

  15. Music-induced emotions can be predicted from a combination of brain activity and acoustic features.

    PubMed

    Daly, Ian; Williams, Duncan; Hallowell, James; Hwang, Faustina; Kirke, Alexis; Malik, Asad; Weaver, James; Miranda, Eduardo; Nasuto, Slawomir J

    2015-12-01

    It is widely acknowledged that music can communicate and induce a wide range of emotions in the listener. However, music is a highly complex audio signal composed of a wide range of complex time- and frequency-varying components. Additionally, music-induced emotions are known to differ greatly between listeners. Therefore, it is not immediately clear what emotions will be induced in a given individual by a piece of music. We attempt to predict the music-induced emotional response in a listener by measuring the activity in the listener's electroencephalogram (EEG). We combine these measures with acoustic descriptors of the music, an approach that allows us to consider music as a complex set of time-varying acoustic features, independently of any specific music theory. Regression models are found which allow us to predict the music-induced emotions of our participants with a correlation between the actual and predicted responses of up to r = 0.234, p < 0.001. This regression fit suggests that over 20% of the variance of the participant's music-induced emotions can be predicted by their neural activity and the properties of the music. Given the large amount of noise, non-stationarity, and non-linearity in both EEG and music, this is an encouraging result. Additionally, the combination of measures of brain activity and acoustic features describing the music played to our participants allows us to predict music-induced emotions with significantly higher accuracies than either feature type alone (p<0.01). Copyright © 2015 Elsevier Inc. All rights reserved.

  16. Respiratory trace feature analysis for the prediction of respiratory-gated PET quantification.

    PubMed

    Wang, Shouyi; Bowen, Stephen R; Chaovalitwongse, W Art; Sandison, George A; Grabowski, Thomas J; Kinahan, Paul E

    2014-02-21

    The benefits of respiratory gating in quantitative PET/CT vary tremendously between individual patients. Respiratory pattern is among many patient-specific characteristics that are thought to play an important role in gating-induced imaging improvements. However, the quantitative relationship between patient-specific characteristics of respiratory pattern and improvements in quantitative accuracy from respiratory-gated PET/CT has not been well established. If such a relationship could be estimated, then patient-specific respiratory patterns could be used to prospectively select appropriate motion compensation during image acquisition on a per-patient basis. This study was undertaken to develop a novel statistical model that predicts quantitative changes in PET/CT imaging due to respiratory gating. Free-breathing static FDG-PET images without gating and respiratory-gated FDG-PET images were collected from 22 lung and liver cancer patients on a PET/CT scanner. PET imaging quality was quantified with peak standardized uptake value (SUV(peak)) over lesions of interest. Relative differences in SUV(peak) between static and gated PET images were calculated to indicate quantitative imaging changes due to gating. A comprehensive multidimensional extraction of the morphological and statistical characteristics of respiratory patterns was conducted, resulting in 16 features that characterize representative patterns of a single respiratory trace. The six most informative features were subsequently extracted using a stepwise feature selection approach. The multiple-regression model was trained and tested based on a leave-one-subject-out cross-validation. The predicted quantitative improvements in PET imaging achieved an accuracy higher than 90% using a criterion with a dynamic error-tolerance range for SUV(peak) values. The results of this study suggest that our prediction framework could be applied to determine which patients would likely benefit from respiratory motion

  17. Respiratory trace feature analysis for the prediction of respiratory-gated PET quantification

    NASA Astrophysics Data System (ADS)

    Wang, Shouyi; Bowen, Stephen R.; Chaovalitwongse, W. Art; Sandison, George A.; Grabowski, Thomas J.; Kinahan, Paul E.

    2014-02-01

    The benefits of respiratory gating in quantitative PET/CT vary tremendously between individual patients. Respiratory pattern is among many patient-specific characteristics that are thought to play an important role in gating-induced imaging improvements. However, the quantitative relationship between patient-specific characteristics of respiratory pattern and improvements in quantitative accuracy from respiratory-gated PET/CT has not been well established. If such a relationship could be estimated, then patient-specific respiratory patterns could be used to prospectively select appropriate motion compensation during image acquisition on a per-patient basis. This study was undertaken to develop a novel statistical model that predicts quantitative changes in PET/CT imaging due to respiratory gating. Free-breathing static FDG-PET images without gating and respiratory-gated FDG-PET images were collected from 22 lung and liver cancer patients on a PET/CT scanner. PET imaging quality was quantified with peak standardized uptake value (SUVpeak) over lesions of interest. Relative differences in SUVpeak between static and gated PET images were calculated to indicate quantitative imaging changes due to gating. A comprehensive multidimensional extraction of the morphological and statistical characteristics of respiratory patterns was conducted, resulting in 16 features that characterize representative patterns of a single respiratory trace. The six most informative features were subsequently extracted using a stepwise feature selection approach. The multiple-regression model was trained and tested based on a leave-one-subject-out cross-validation. The predicted quantitative improvements in PET imaging achieved an accuracy higher than 90% using a criterion with a dynamic error-tolerance range for SUVpeak values. The results of this study suggest that our prediction framework could be applied to determine which patients would likely benefit from respiratory motion compensation
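
    The leave-one-subject-out cross-validation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it fits a one-feature least-squares line rather than the six-feature multiple regression, and the function names and all data values are invented for illustration.

```python
# Illustrative sketch only: leave-one-subject-out cross-validation around a
# one-feature least-squares fit. All names and data are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def leave_one_subject_out(records):
    """records: list of (subject_id, feature, target).

    For each subject, fit on every other subject's rows and predict the
    held-out subject's rows. Returns (subject_id, target, prediction) tuples.
    """
    subjects = sorted({sid for sid, _, _ in records})
    results = []
    for held_out in subjects:
        train = [(x, y) for sid, x, y in records if sid != held_out]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        for sid, x, y in records:
            if sid == held_out:
                results.append((sid, y, a + b * x))
    return results


# Invented data: (subject, respiratory-trace feature, relative SUVpeak change)
records = [
    ("p1", 1.0, 0.10), ("p1", 1.5, 0.16),
    ("p2", 2.0, 0.19), ("p3", 3.0, 0.31),
    ("p4", 4.0, 0.42), ("p4", 4.5, 0.44),
]
for sid, actual, predicted in leave_one_subject_out(records):
    print(f"{sid}: actual={actual:.2f} predicted={predicted:.2f}")
```

    Holding out all rows of a subject at once, rather than single rows, keeps measurements from the same patient out of the training set for that patient's prediction.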

  18. Physical re-examination of parameters on a molecular collisions-based diffusion model for diffusivity prediction in polymers.

    PubMed

    Ohashi, Hidenori; Tamaki, Takanori; Yamaguchi, Takeo

    2011-12-29

    Molecular collisions, which are the microscopic origin of molecular diffusive motion, are affected by both the molecular surface area and the distance between molecules. Their product can be regarded as the free space around a penetrant molecule defined as the "shell-like free volume" and can be taken as a characteristic of molecular collisions. On the basis of this notion, a new diffusion theory has been developed. The model can predict molecular diffusivity in polymeric systems using only well-defined single-component parameters of molecular volume, molecular surface area, free volume, and pre-exponential factors. Considering the physical description of the model, the body that actually moves and the surface with which neighbor molecules collide correspond to the volume and the surface area of the penetrant molecular core, respectively. In the present study, a semiempirical quantum chemical calculation was used to calculate both of these parameters. The model and the newly developed parameters offer fairly good predictive ability. © 2011 American Chemical Society

  19. Energy Minimization of Molecular Features Observed on the (110) Face of Lysozyme Crystals

    NASA Technical Reports Server (NTRS)

    Perozzo, Mary A.; Konnert, John H.; Li, Huayu; Nadarajah, Arunan; Pusey, Marc

    1999-01-01

    Molecular dynamics and energy minimization have been carried out using the program XPLOR to check the plausibility of a model lysozyme crystal surface. The molecular features of the (110) face of lysozyme were observed using atomic force microscopy (AFM). A model of the crystal surface was constructed using the PDB file 193L, and was used to simulate an AFM image. Molecule translations, van der Waals radii, and assumed AFM tip shape were adjusted to maximize the correlation coefficient between the experimental and simulated images. The highest degree of correlation (0.92) was obtained with the molecules displaced over 6 Å from their positions within the bulk of the crystal. The quality of this starting model, the extent of energy minimization, and the correlation coefficient between the final model and the experimental data will be discussed.

  20. Models of the elastic x-ray scattering feature for warm dense aluminum

    DOE PAGES

    Starrett, Charles Edward; Saumon, Didier

    2015-09-03

    The elastic feature of x-ray scattering from warm dense aluminum has recently been measured by Fletcher et al. [Nature Photonics 9, 274 (2015)] with much higher accuracy than had hitherto been possible. This measurement is a direct test of the ionic structure predicted by models of warm dense matter. We use the method of pseudoatom molecular dynamics to predict this elastic feature for warm dense aluminum with temperatures of 1–100 eV and densities of 2.7–8.1 g/cm³. We compare these predictions to experiments, finding good agreement with Fletcher et al. and corroborating the discrepancy found in analyses of an earlier experiment of Ma et al. [Phys. Rev. Lett. 110, 065001 (2013)]. Lastly, we also evaluate the validity of the Thomas-Fermi model of the electrons and of the hypernetted chain approximation in computing the elastic feature and find them both wanting in the regime currently probed by experiments.

  1. Spatial-Temporal [18F]FDG-PET Features for Predicting Pathologic Response of Esophageal Cancer to Neoadjuvant Chemoradiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tan, Shan; Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan; Kligerman, Seth

    2013-04-01

    Purpose: To extract and study comprehensive spatial-temporal 18F-labeled fluorodeoxyglucose ([18F]FDG) positron emission tomography (PET) features for the prediction of pathologic tumor response to neoadjuvant chemoradiation therapy (CRT) in esophageal cancer. Methods and Materials: Twenty patients with esophageal cancer were treated with trimodal therapy (CRT plus surgery) and underwent [18F]FDG-PET/CT scans both before (pre-CRT) and after (post-CRT) CRT. The 2 scans were rigidly registered. A tumor volume was semiautomatically delineated using a threshold standardized uptake value (SUV) of ≥2.5, followed by manual editing. Comprehensive features were extracted to characterize SUV intensity distribution, spatial patterns (texture), tumor geometry, and associated changes resulting from CRT. The usefulness of each feature in predicting pathologic tumor response to CRT was evaluated using the area under the receiver operating characteristic curve (AUC) value. Results: The best traditional response measure was decline in maximum SUV (SUVmax; AUC, 0.76). Two new intensity features, decline in mean SUV (SUVmean) and skewness, and 3 texture features (inertia, correlation, and cluster prominence) were found to be significant predictors with AUC values ≥0.76. According to these features, a tumor was more likely to be a responder when the SUVmean decline was larger, when there were relatively fewer voxels with higher SUV values pre-CRT, or when [18F]FDG uptake post-CRT was relatively homogeneous. All of the most accurate predictive features were extracted from the entire tumor rather than from the most active part of the tumor. For SUV intensity features and tumor size features, changes were more predictive than pre- or post-CRT assessment alone. Conclusion: Spatial-temporal [18F]FDG-PET features were found to be useful predictors of pathologic tumor response to neoadjuvant CRT in esophageal cancer.
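
    As a rough illustration of how a single feature can be scored by AUC, the sketch below computes the AUC as the fraction of responder/non-responder pairs in which the responder has the higher feature value (ties count one half). The function name and the feature values are invented; the abstract does not describe how the authors computed their AUC values.

```python
# Illustrative sketch only: AUC of one feature via pairwise comparison.
# The feature values below are invented, not taken from the study.

def auc(scores_pos, scores_neg):
    """Probability that a random positive case outscores a random negative
    case; ties contribute one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical fractional SUVmax decline for responders and non-responders
responders = [0.62, 0.55, 0.71, 0.40]
non_responders = [0.35, 0.48, 0.30, 0.58]
print(f"AUC = {auc(responders, non_responders):.4f}")
```

    An AUC of 0.5 corresponds to a feature that carries no ranking information, and 1.0 to a feature that separates the two groups perfectly.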

  2. Three Molecular Subtypes of Gastric Adenocarcinoma Have Distinct Histochemical Features Reflecting Epstein-Barr Virus Infection Status and Neuroendocrine Differentiation.

    PubMed

    Speck, Olga; Tang, Weihua; Morgan, Douglas R; Kuan, Pei Fen; Meyers, Michael O; Dominguez, Ricardo L; Martinez, Enrique; Gulley, Margaret L

    2015-10-01

    Current histopathologic classification schemes for gastric adenocarcinoma have limited clinical utility and are difficult to apply due to tumor heterogeneity. Elucidation of molecular subtypes of gastric cancer may contribute to our understanding of gastric cancer biology and to the development of new molecular markers that may lead to improved diagnosis, therapy, or prognosis. We previously demonstrated that Epstein-Barr virus (EBV)-infected gastric cancers have a distinct human gene expression profile compared with uninfected cancers. We now examine the histopathologic features characterizing infected (n=14) and uninfected (n=89) cancers; the latter of which are now further divided into 2 major molecular subtypes based on expression patterns of 93 RNAs. One uninfected gastric cancer subtype was distinguished by upregulation of 3 genes with neuroendocrine (NE) function (CHGA, GAST, and REG4 encoding chromogranin, gastrin, and the secreted peptide REG4 involved in epithelial cell regeneration), implicating hormonal factors in the pathogenesis of a major class of gastric adenocarcinomas. Evidence of NE differentiation (molecular, immunohistochemical, or morphologic) was mutually exclusive of EBV infection. EBV-infected tumors tended to have solid-type morphology with lymphoid stroma. This study reveals novel molecular subtypes of gastric cancer and their associated morphologies that demonstrate divergent NE features.

  3. Computer-aided global breast MR image feature analysis for prediction of tumor response to chemotherapy: performance assessment

    NASA Astrophysics Data System (ADS)

    Aghaei, Faranak; Tan, Maxine; Hollingsworth, Alan B.; Zheng, Bin; Cheng, Samuel

    2016-03-01

    Dynamic contrast-enhanced breast magnetic resonance imaging (DCE-MRI) has been used increasingly in breast cancer diagnosis and assessment of cancer treatment efficacy. In this study, we applied a computer-aided detection (CAD) scheme to automatically segment breast regions depicted on MR images and used the kinetic image features computed from the global breast MR images acquired before neoadjuvant chemotherapy to build a new quantitative model to predict response of the breast cancer patients to the chemotherapy. To assess performance and robustness of this new prediction model, an image dataset involving breast MR images acquired from 151 cancer patients before undergoing neoadjuvant chemotherapy was retrospectively assembled and used. Among them, 63 patients had "complete response" (CR) to chemotherapy, in which the enhanced contrast levels inside the tumor volume (pre-treatment) were reduced to the level of the normal enhanced background parenchymal tissues (post-treatment), while 88 patients had "partial response" (PR), in which the high contrast enhancement remained in the tumor regions after treatment. We analyzed the correlation among the 22 global kinetic image features and then selected a set of 4 optimal features. Applying an artificial neural network trained with the fusion of these 4 kinetic image features, the prediction model yielded an area under ROC curve (AUC) of 0.83+/-0.04. This study demonstrated that by avoiding tumor segmentation, which is often difficult and unreliable, fusion of kinetic image features computed from global breast MR images without tumor segmentation can also generate a useful clinical marker in predicting efficacy of chemotherapy.

  4. Molecular electronegativity distance vector model for the prediction of bioconcentration factors in fish.

    PubMed

    Liu, Shu-Shen; Qin, Li-Tang; Liu, Hai-Ling; Yin, Da-Qiang

    2008-02-01

    Molecular electronegativity distance vector (MEDV) derived directly from the molecular topological structures was used to describe the structures of 122 nonionic organic compounds (NOCs) and a quantitative relationship between the MEDV descriptors and the bioconcentration factors (BCF) of NOCs in fish was developed using the variable selection and modeling based on prediction (VSMP). It was found that some main structural factors influencing the BCFs of NOCs are the substructures expressed by four atomic types of nos. 2, 3, 5, and 13, i.e., atom groups -CH2- or =CH-, -CH< or =C<, -NH2, and -Cl or -Br where the former two groups exist in the molecular skeleton of NOC and the latter three groups are related closely to the substituting groups on a benzene ring. The best 5-variable model, with the correlation coefficient (r²) of 0.9500 and the leave-one-out cross-validation correlation coefficient (q²) of 0.9428, was built by multiple linear regressions, which shows a good estimation ability and stability. A predictive power for the external samples was tested by the model from the training set of 80 NOCs and the predictive correlation coefficient (u²) for the 42 external samples in the test set was 0.9028.

  5. Predicting DNA hybridization kinetics from sequence

    NASA Astrophysics Data System (ADS)

    Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu

    2018-01-01

    Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similarity reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
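
    The general idea of predicting a rate constant from similar reactions with known rate constants can be sketched as a distance-weighted vote. This is not the published WNV algorithm: the exponential kernel, the bandwidth, the two-component feature vectors, and all numbers below are assumptions made for illustration.

```python
# Illustrative sketch only: distance-weighted neighbour voting on log10(k).
# Kernel, bandwidth, features, and data are hypothetical.
import math


def predict_log_k(query, known, bandwidth=1.0):
    """query: feature vector of a new reaction.
    known: list of (feature_vector, log10_rate_constant) pairs.
    Each known reaction votes with a weight that decays with its
    Euclidean distance to the query."""
    total_weight = 0.0
    weighted_sum = 0.0
    for features, log_k in known:
        weight = math.exp(-math.dist(query, features) / bandwidth)
        total_weight += weight
        weighted_sum += weight * log_k
    return weighted_sum / total_weight


def within_factor(pred_log_k, true_log_k, factor=3.0):
    """True if the prediction is within the given multiplicative factor."""
    return abs(pred_log_k - true_log_k) <= math.log10(factor)


# Invented reactions: (two hypothetical sequence features, log10 k)
known = [((0.2, 1.0), 6.1), ((0.4, 1.2), 6.4), ((1.5, 0.3), 5.2)]
estimate = predict_log_k((0.3, 1.1), known)
print(f"predicted log10 k = {estimate:.2f}")
```

    Working in log10 units makes the "within a factor of 3" criterion quoted in the abstract a simple absolute-difference check.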

  6. Malignant melanoma of sun-protected sites: a review of clinical, histological, and molecular features.

    PubMed

    Merkel, Emily A; Gerami, Pedram

    2017-06-01

    In most cases of cutaneous melanoma, ultraviolet (UV) radiation is recognized as a prominent risk factor. Less is known regarding the mechanisms of mutagenesis for melanoma arising in sun-protected sites, such as acral and mucosal melanoma. Acral and mucosal melanoma share many common features, including a late age of onset, a broad radial growth phase with prominent lentiginous growth, the presence of field cancerization cells, and, in most cases, lack of a precursor nevus. In addition to early chromosomal instability, many of the same genes are also involved in these two distinct melanoma subtypes. To better understand non-UV-mediated pathogenesis in melanoma, we conducted a joint literature review of clinical, histological, and molecular features in acral and mucosal melanoma. We also reviewed the current literature regarding aberrations in KIT, PDGFRA, TERT, and other commonly involved genes. By comparing common features of these two subtypes, we suggest potential mechanisms underlying acral and/or mucosal melanoma and offer direction for future investigations.

  7. Application of molecular dynamics simulations in molecular property prediction II: diffusion coefficient.

    PubMed

    Wang, Junmei; Hou, Tingjun

    2011-12-01

    In this work, we have evaluated how well the general assisted model building with energy refinement (AMBER) force field performs in studying the dynamic properties of liquids. Diffusion coefficients (D) have been predicted for 17 solvents, five organic compounds in aqueous solutions, four proteins in aqueous solutions, and nine organic compounds in nonaqueous solutions. An efficient sampling strategy has been proposed and tested in the calculation of the diffusion coefficients of solutes in solutions. There are two major findings of this study. First of all, the diffusion coefficients of organic solutes in aqueous solution can be well predicted: the average unsigned errors and the root mean square errors are 0.137 and 0.171 × 10⁻⁵ cm² s⁻¹, respectively. Second, although the absolute values of D cannot be predicted, good correlations have been achieved for eight organic solvents with experimental data (R² = 0.784), four proteins in aqueous solutions (R² = 0.996), and nine organic compounds in nonaqueous solutions (R² = 0.834). The temperature-dependent behaviors of three solvents, namely, TIP3P water, dimethyl sulfoxide, and cyclohexane, have been studied. The major molecular dynamics (MD) settings, such as the sizes of simulation boxes and with/without wrapping the coordinates of MD snapshots into the primary simulation boxes, have been explored. We have concluded that our sampling strategy of averaging the mean square displacement collected in multiple short MD simulations is efficient in predicting diffusion coefficients of solutes at infinite dilution. Copyright © 2011 Wiley Periodicals, Inc.
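
    The sampling idea in the last sentence, averaging mean square displacement (MSD) curves from several short runs before estimating D, can be sketched with the Einstein relation MSD(t) = 6Dt for three dimensions. The function names and the MSD values are invented, and the fit through the origin is a simplifying assumption rather than the authors' procedure.

```python
# Illustrative sketch only: diffusion coefficient from averaged MSD curves
# using the 3D Einstein relation MSD(t) = 6 * D * t. Data are hypothetical.

def average_msd(msd_runs):
    """Average MSD curves, sampled on the same time grid, over several runs."""
    n_runs = len(msd_runs)
    return [sum(column) / n_runs for column in zip(*msd_runs)]


def diffusion_coefficient(times, msd, dims=3):
    """Least-squares slope of MSD versus time through the origin,
    converted to D = slope / (2 * dims)."""
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / (2 * dims)


# Invented MSD curves (in square angstroms) from three short runs,
# sampled at the times below (in picoseconds)
times_ps = [10.0, 20.0, 30.0, 40.0]
runs = [
    [14.0, 29.5, 44.0, 61.0],
    [16.5, 31.0, 47.5, 59.0],
    [15.0, 30.5, 45.0, 60.5],
]
d_a2_per_ps = diffusion_coefficient(times_ps, average_msd(runs))
# 1 square angstrom per picosecond equals 1e-4 cm^2/s
print(f"D = {d_a2_per_ps * 1e-4:.3e} cm^2/s")
```

    Averaging the curves first reduces the noise of any single short trajectory before the slope is taken.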

  8. Molecular classification of breast cancer: what the pathologist needs to know.

    PubMed

    Rakha, Emad A; Green, Andrew R

    2017-02-01

    Breast cancer is a heterogeneous disease featuring distinct histological, molecular and clinical phenotypes. Although traditional classification systems utilising clinicopathological and few molecular markers are well established and validated, they remain insufficient to reflect the diverse biological and clinical heterogeneity of breast cancer. Advancements in high-throughput molecular techniques and bioinformatics have contributed to the improved understanding of breast cancer biology, refinement of molecular taxonomies and the development of novel prognostic and predictive molecular assays. Application of such technologies is already underway, and is expected to change the way we manage breast cancer. Despite the enormous amount of work that has been carried out to develop and refine breast cancer molecular prognostic and predictive assays, molecular testing is still in evolution. Pathologists should be aware of the new technology and be ready for the challenge. In this review, we provide an update on the application of molecular techniques with regard to breast cancer diagnosis, prognosis and outcome prediction. The current contribution of emerging technology to our understanding of breast cancer is also highlighted. Copyright © 2016 Royal College of Pathologists of Australasia. Published by Elsevier B.V. All rights reserved.

  9. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.

    PubMed

    Hayat, Maqsood; Khan, Asifullah

    2011-02-21

    Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides some valuable information for predicting novel examples of membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural-network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the case of the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
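
    The performance measures named in this abstract have standard definitions in terms of a binary confusion matrix, sketched below. The study itself concerns several membrane protein types, so this corresponds to scoring one type against the rest; the function name and the counts are invented for illustration.

```python
# Illustrative sketch only: standard binary metrics from confusion-matrix
# counts. The counts used in the example are hypothetical.
import math


def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, F-measure, and Matthews correlation
    coefficient (MCC) for one class scored against all others."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "mcc": mcc,
    }


for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.3f}")
```

    Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it less sensitive to imbalance between classes.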

  10. Optimal feature selection using a modified differential evolution algorithm and its effectiveness for prediction of heart disease.

    PubMed

    Vivekanandan, T; Sriman Narayana Iyengar, N Ch

    2017-11-01

    Enormous data growth in multiple domains has posed a great challenge for data processing and analysis techniques. In particular, the traditional record maintenance strategy has been replaced in the healthcare system. It is vital to develop a model that is able to handle the huge amount of e-healthcare data efficiently. In this paper, the challenging tasks of selecting critical features from the enormous set of available features and diagnosing heart disease are carried out. Feature selection is one of the most widely used pre-processing steps in classification problems. A modified differential evolution (DE) algorithm is used to perform feature selection for cardiovascular disease and optimization of selected features. Of the 10 available strategies for the traditional DE algorithm, the seventh strategy, which is represented by DE/rand/2/exp, is considered for comparative study. The performance analysis of the developed modified DE strategy is given in this paper. With the selected critical features, prediction of heart disease is carried out using fuzzy AHP and a feed-forward neural network. Various performance measures of integrating the modified differential evolution algorithm with fuzzy AHP and a feed-forward neural network in the prediction of heart disease are evaluated in this paper. The accuracy of the proposed hybrid model is 83%, which is higher than that of some other existing models. In addition, the prediction time of the proposed hybrid model is also evaluated and has shown promising results. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Genomic Signal Processing: Predicting Basic Molecular Biological Principles

    NASA Astrophysics Data System (ADS)

    Alter, Orly

    2005-03-01

    Advances in high-throughput technologies enable acquisition of different types of molecular biological data, monitoring the flow of biological information as DNA is transcribed to RNA, and RNA is translated to proteins, on a genomic scale. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment and drug development. Recently we described data-driven models for genome-scale molecular biological data, which use singular value decomposition (SVD) and the comparative generalized SVD (GSVD). Now we describe an integrative data-driven model, which uses pseudoinverse projection (1). We also demonstrate the predictive power of these matrix algebra models (2). The integrative pseudoinverse projection model formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. The mathematical variables of this integrative model, the pseudoinverse correlation patterns that are uncovered in the data, represent independent processes and corresponding cellular states (such as observed genome-wide effects of known regulators or transcription factors, the biological components of the cellular machinery that generate the genomic signals, and measured samples in which these regulators or transcription factors are over- or underactive). Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis, and gives a global picture of the correlations and possibly also causal coordination of

  12. Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection.

    PubMed

    Dong, Zuoli; Zhang, Naiqian; Li, Chun; Wang, Haiyun; Fang, Yun; Wang, Jun; Zheng, Xiaoqi

    2015-06-30

    An enduring challenge in personalized medicine is to select the right drug for individual patients. Testing drugs on patients in large clinical trials is one way to assess their efficacy and toxicity, but it is impractical to test hundreds of drugs currently under development. Preclinical prediction models are therefore highly desirable, as they enable prediction of drug response across hundreds of cell lines in parallel. Recently, two large-scale pharmacogenomic studies screened multiple anticancer drugs on over 1000 cell lines in an effort to elucidate the response mechanism of anticancer drugs. To this end, we used gene expression features and drug sensitivity data from the Cancer Cell Line Encyclopedia (CCLE) to build a predictor based on Support Vector Machine (SVM) and a recursive feature selection tool. Robustness of our model was validated by cross-validation and an independent dataset, the Cancer Genome Project (CGP). Our model achieved good cross-validation performance for most drugs in the Cancer Cell Line Encyclopedia (≥80% accuracy for 10 drugs, ≥75% accuracy for 19 drugs). Independent tests on eleven common drugs between CCLE and CGP achieved satisfactory performance for three of them, i.e., AZD6244, Erlotinib and PD-0325901, using expression levels of only twelve, six and seven genes, respectively. These results suggest that drug response could be effectively predicted from genomic features. Our model could be applied to predict drug response for certain drugs and potentially play a complementary role in personalized medicine.

  13. High-Performance First-Principles Molecular Dynamics for Predictive Theory and Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gygi, Francois; Galli, Giulia; Schwegler, Eric

    This project focused on developing high-performance software tools for First-Principles Molecular Dynamics (FPMD) simulations, and applying them in investigations of materials relevant to energy conversion processes. FPMD is an atomistic simulation method that combines a quantum-mechanical description of electronic structure with the statistical description provided by molecular dynamics (MD) simulations. This reliance on fundamental principles allows FPMD simulations to provide a consistent description of structural, dynamical and electronic properties of a material. This is particularly useful in systems for which reliable empirical models are lacking. FPMD simulations are increasingly used as a predictive tool for applications such as batteries, solar energy conversion, light-emitting devices, electro-chemical energy conversion devices and other materials. During the course of the project, several new features were developed and added to the open-source Qbox FPMD code. The code was further optimized for scalable operation of large-scale, Leadership-Class DOE computers. When combined with Many-Body Perturbation Theory (MBPT) calculations, this infrastructure was used to investigate structural and electronic properties of liquid water, ice, aqueous solutions, nanoparticles and solid-liquid interfaces. Computing both ionic trajectories and electronic structure in a consistent manner enabled the simulation of several spectroscopic properties, such as Raman spectra, infrared spectra, and sum-frequency generation spectra. The accuracy of the approximations used allowed for direct comparisons of results with experimental data such as optical spectra, X-ray and neutron diffraction spectra. The software infrastructure developed in this project, as applied to various investigations of solids, liquids and interfaces, demonstrates that FPMD simulations can provide a detailed, atomic-scale picture of structural, vibrational and electronic properties of complex

  14. Prediction of Sliding Friction Coefficient Based on a Novel Hybrid Molecular-Mechanical Model.

    PubMed

    Zhang, Xiaogang; Zhang, Yali; Wang, Jianmei; Sheng, Chenxing; Li, Zhixiong

    2018-08-01

    Sliding friction is a complex phenomenon which arises from the mechanical and molecular interactions of asperities when examined at the microscale. To reveal and further understand the effects of the microscale mechanical and molecular components of the friction coefficient on overall frictional behavior, a hybrid molecular-mechanical model is developed to investigate the effects of main factors, including different loads and surface roughness values, on the sliding friction coefficient in a boundary lubrication condition. Numerical modelling was conducted using a deterministic contact model and based on the molecular-mechanical theory of friction. In the contact model, with given external loads and surface topographies, the pressure distribution, real contact area, and elastic/plastic deformation of each single asperity contact were calculated. Then the asperity friction coefficient was predicted by the sum of the mechanical and molecular components of the friction coefficient. The mechanical component was mainly determined by the contact width and elastic/plastic deformation, and the molecular component was estimated as a function of the contact area and interfacial shear stress. Numerical results were compared with experimental results and a good agreement was obtained. The model was then used to predict friction coefficients in different operating and surface conditions. Numerical results explain why applied load has a minimal effect on the friction coefficients. They also provide insight into the effect of surface roughness on the mechanical and molecular components of friction coefficients. It is revealed that the mechanical component dominates the friction coefficient when the surface roughness is large (Rq > 0.2 μm), while the friction coefficient is mainly determined by the molecular component when the surface is relatively smooth (Rq < 0.2 μm). Furthermore, optimal roughness values for minimizing the friction coefficient are recommended.

  15. Remote health monitoring: predicting outcome success based on contextual features for cardiovascular disease.

    PubMed

    Alshurafa, Nabil; Eastwood, Jo-Ann; Pourhomayoun, Mohammad; Liu, Jason J; Sarrafzadeh, Majid

    2014-01-01

    Current studies have produced a plethora of remote health monitoring (RHM) systems designed to enhance the care of patients with chronic diseases. Many RHM systems are designed to improve patient risk factors for cardiovascular disease, including physiological parameters such as body mass index (BMI) and waist circumference, and lipid profiles such as low density lipoprotein (LDL) and high density lipoprotein (HDL). There are several patient characteristics that could be determining factors for a patient's RHM outcome success, but these characteristics have been largely unidentified. In this paper, we analyze results from an RHM system deployed in a six month Women's Heart Health study of 90 patients, and apply advanced feature selection and machine learning algorithms to identify patients' key baseline contextual features and build effective prediction models that help determine RHM outcome success. We introduce Wanda-CVD, a smartphone-based RHM system designed to help participants with cardiovascular disease risk factors by motivating participants through wireless coaching using feedback and prompts as social support. We analyze key contextual features that secure positive patient outcomes in both physiological parameters and lipid profiles. Results from the Women's Heart Health study show that health threat of heart disease, quality of life, family history, stress factors, social support, and anxiety at baseline all help predict patient RHM outcome success.

  16. In vivo placental MRI shape and textural features predict fetal growth restriction and postnatal outcome.

    PubMed

    Dahdouh, Sonia; Andescavage, Nickie; Yewale, Sayali; Yarish, Alexa; Lanham, Diane; Bulas, Dorothy; du Plessis, Adre J; Limperopoulos, Catherine

    2018-02-01

    To investigate the ability of three-dimensional (3D) MRI placental shape and textural features to predict fetal growth restriction (FGR) and birth weight (BW) for both healthy and FGR fetuses. We recruited two groups of pregnant volunteers between 18 and 39 weeks of gestation: 46 healthy subjects and 34 FGR. Both groups underwent fetal MR imaging on a 1.5 Tesla GE scanner using an eight-channel receiver coil. We acquired T2-weighted images on either the coronal or the axial plane to obtain MR volumes with a slice thickness of either 4 or 8 mm covering the full placenta. Placental shape features (volume, thickness, elongation) were combined with textural features: first order textural features (mean, variance, kurtosis, and skewness of placental gray levels), as well as textural features computed on the gray level co-occurrence and run-length matrices characterizing placental homogeneity, symmetry, and coarseness. The features were used in two machine learning frameworks to predict FGR and BW. The proposed machine-learning based method using shape and textural features identified FGR pregnancies with 86% accuracy, 77% precision and 86% recall. BW estimations were 0.3 ± 13.4% (mean percentage error ± standard error) for healthy fetuses and -2.6 ± 15.9% for FGR. The proposed FGR identification and BW estimation methods using in utero placental shape and textural features computed on 3D MR images demonstrated high accuracy in our healthy and high-risk cohorts. Future studies to assess the evolution of each feature with regard to placental development are currently underway. Level of Evidence: 2. Technical Efficacy: Stage 2. J. Magn. Reson. Imaging 2018;47:449-458. © 2017 International Society for Magnetic Resonance in Medicine.
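
    The first order textural features named in this abstract (mean, variance, skewness and kurtosis of gray levels) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the use of population (biased) moments, the non-excess kurtosis convention and the sample gray levels are all assumptions made for illustration.

```python
def first_order_features(gray_levels):
    """Mean, variance, skewness and kurtosis of a flat list of gray levels.

    Assumptions (not stated in the abstract): population (biased) moments
    and non-excess kurtosis, so a normal distribution would give about 3.
    """
    n = len(gray_levels)
    if n == 0:
        raise ValueError("need at least one gray level")
    mean = sum(gray_levels) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((g - mean) ** 2 for g in gray_levels) / n
    m3 = sum((g - mean) ** 3 for g in gray_levels) / n
    m4 = sum((g - mean) ** 4 for g in gray_levels) / n
    if m2 == 0:
        # Constant region: skewness and kurtosis are undefined.
        skewness = kurtosis = float("nan")
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2
    return {"mean": mean, "variance": m2,
            "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical gray levels, for illustration only.
features = first_order_features([1, 2, 3, 4, 5])
```

    In practice the input would be the gray levels of the voxels inside a segmented placental region; libraries differ on bias correction and on excess versus non-excess kurtosis, so the convention should be fixed before comparing values across tools.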

  17. Molecular dynamics, flexible docking, virtual screening, ADMET predictions, and molecular interaction field studies to design novel potential MAO-B inhibitors.

    PubMed

    Braun, Glaucia H; Jorge, Daniel M M; Ramos, Henrique P; Alves, Raquel M; da Silva, Vinicius B; Giuliatti, Silvana; Sampaio, Suley Vilela; Taft, Carlton A; Silva, Carlos H T P

    2008-02-01

    Monoamine oxidase is a flavoenzyme bound to the mitochondrial outer membranes of the cells, which is responsible for the oxidative deamination of neurotransmitter and dietary amines. It has two distinct isozymic forms, designated MAO-A and MAO-B, each displaying different substrate and inhibitor specificities. They are the well-known targets for antidepressant, Parkinson's disease, and neuroprotective drugs. Elucidation of the x-ray crystallographic structure of MAO-B has opened the way for the molecular modeling studies. In this work we have used molecular modeling, density functional theory with correlation, virtual screening, flexible docking, molecular dynamics, ADMET predictions, and molecular interaction field studies in order to design new molecules with potential higher selectivity and enzymatic inhibitory activity over MAO-B.

  18. Assessment of Canine Mast Cell Tumor Mortality Risk Based on Clinical, Histologic, Immunohistochemical, and Molecular Features.

    PubMed

    Horta, Rodrigo S; Lavalle, Gleidice E; Monteiro, Lidianne N; Souza, Mayara C C; Cassali, Geovanni D; Araújo, Roberto B

    2018-03-01

    Mast cell tumor (MCT) is a frequent cutaneous neoplasm in dogs that is heterogeneous in clinical presentation and biological behavior, with a variable potential for recurrence and metastasis. Accurate prediction of clinical outcomes has been challenging. The study objective was to develop a system for classification of canine MCT according to the mortality risk based on individual assessment of clinical, histologic, immunohistochemical, and molecular features. The study included 149 dogs with a histologic diagnosis of cutaneous or subcutaneous MCT. By univariate analysis, MCT metastasis and related death was significantly associated with clinical stage (P < .0001, rP = -0.610), history of tumor recurrence (P < .0001, rP = -0.550), Patnaik (P < .0001, rP = -0.380) and Kiupel grades (P < .0001, rP = -0.500), predominant organization of neoplastic cells (P < .0001, rP = -0.452), mitotic count (P < .0001, rP = -0.325), Ki-67 labeling index (P < .0001, rP = -0.414), KITr pattern (P = .02, rP = 0.207), and c-KIT mutational status (P < .0001, rP = -0.356). By multivariate analysis with Cox proportional hazard model, only 2 features were independent predictors of overall survival: an amendment of the World Health Organization clinical staging system (hazard ratio [95% CI]: 1.824 [1.210-4.481]; P = .01) and a history of tumor recurrence (hazard ratio [95% CI]: 9.250 [2.158-23.268]; P < .001). From these results, we propose an amendment of the WHO staging system, a method of risk analysis, and a suggested approach to clinical and laboratory evaluation of dogs with cutaneous MCT.

  19. Molecular Features of Wheat Endosperm Arabinoxylan Inclusion in Functional Bread

    PubMed Central

    Li, Weili; Hu, Hui; Wang, Qi; Brennan, Charles J.

    2013-01-01

    Arabinoxylan (AX) is a major dietary fibre component found in a variety of cereals. Numerous health benefits of arabinoxylans have been reported to be associated with their solubility and molecular features. The current study reports the development of a functional bread using a combination of AX-enriched material (AEM) and optimal commercial endoxylanase. The total AX content of bread was increased to 8.2 g per 100 g available carbohydrates. The extractability of AX in breads with and without endoxylanase was determined. The results demonstrate that water-extractable AX (WE-AX) increased progressively through the bread making process. The application of endoxylanase also increased WE-AX content. The presence of 360 ppm of endoxylanase had positive effects on the bread characteristics in terms of bread volume and firmness by converting the water unextractable (WU)-AX to WE-AX. In addition, the molecular weight (Mw) distribution of the WE-AX of bread with and without endoxylanase was characterized by size-exclusion chromatography. The results show that as the portion of WE-AX increased, the amount of high Mw WE-AX (higher than 100 kDa) decreased, whereas the amount of low Mw WE-AX (lower than 100 kDa) increased from 33.2% to 44.2% through the baking process. The low Mw WE-AX further increased to 75.5% with the application of the optimal endoxylanase (360 ppm). PMID:28239111

  20. A molecular topology approach to predicting pesticide pollution of groundwater

    USGS Publications Warehouse

    Worrall, Fred

    2001-01-01

    Various models have proposed methods for the discrimination of polluting and nonpolluting compounds on the basis of simple parameters, typically adsorption and degradation constants. However, such attempts are prone to site variability and measurement error to the extent that compounds cannot be reliably classified nor the chemistry of pollution extrapolated from them. Using observations of pesticide occurrence in U.S. groundwater it is possible to show that polluting compounds can be distinguished from nonpolluting compounds purely on the basis of molecular topology. Topological parameters can be derived without measurement error or site-specific variability. A logistic regression model has been developed which explains 97% of the variation in the data, with 86% of the variation being explained by the rule that a compound will be found in groundwater if 6χp < 0.55, where 6χp is the sixth-order molecular path connectivity. One group of compounds cannot be classified by this rule and prediction requires reference to higher order connectivity parameters. The use of molecular approaches for understanding pollution at the molecular level and their application to agrochemical development and risk assessment is discussed.
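
    The single-parameter rule quoted in this abstract can be written as a one-line classifier. The Python sketch below is illustrative only: the function and constant names are hypothetical, the index values passed to it are made up, and computing the sixth-order path connectivity index from a molecular structure is outside its scope.

```python
# Threshold on the sixth-order molecular path connectivity index,
# as quoted in the abstract.
CHI6P_THRESHOLD = 0.55


def predicted_in_groundwater(chi6p: float) -> bool:
    """Apply the quoted rule: found in groundwater if the index is below 0.55."""
    return chi6p < CHI6P_THRESHOLD


# Hypothetical index values, for illustration only.
low_index_result = predicted_in_groundwater(0.30)
high_index_result = predicted_in_groundwater(0.80)
```

    The abstract notes that one group of compounds cannot be classified by this rule alone, so a sketch like this covers only the part of the model attributed to the single threshold.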

  1. Feature Selection, Flaring Size and Time-to-Flare Prediction Using Support Vector Regression, and Automated Prediction of Flaring Behavior Based on Spatio-Temporal Measures Using Hidden Markov Models

    NASA Astrophysics Data System (ADS)

    Al-Ghraibah, Amani

    Solar flares release stored magnetic energy in the form of radiation and can have significant detrimental effects on Earth, including damage to technological infrastructure. Recent work has considered methods to predict future flare activity on the basis of quantitative measures of the solar magnetic field. Accurate advance warning of solar flare occurrence is an area of increasing concern and much research is ongoing in this area. Our previous work [111] utilized standard pattern recognition and classification techniques to determine (classify) whether a region is expected to flare within a predictive time window, using a Relevance Vector Machine (RVM) classification method. We extracted 38 features describing the complexity of the photospheric magnetic field; the resulting classification metrics provide the baseline against which we compare our new work. We find a true positive rate (TPR) of 0.8, true negative rate (TNR) of 0.7, and true skill score (TSS) of 0.49. This dissertation proposes three basic topics; the first topic is an extension to our previous work [111], where we consider a feature selection method to determine an appropriate feature subset with cross validation classification based on a histogram analysis of selected features. Classification using the top five features resulting from this analysis yields better classification accuracies across a large unbalanced dataset. In particular, the feature subsets provide better discrimination of the many regions that flare, where we find a TPR of 0.85, a TNR of 0.65 (slightly lower than in our previous work), and a TSS of 0.5, which is an improvement over our previous work. In the second topic, we study the prediction of solar flare size and time-to-flare using support vector regression (SVR). When we consider flaring regions only, we find an average error in estimating flare size of approximately half a GOES class. When we additionally consider non-flaring regions, we find an increased average
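
    The three metrics reported in this abstract are linked by the standard definition of the true skill score, TSS = TPR + TNR - 1, which gives 0.85 + 0.65 - 1 = 0.5 for the feature-subset result (the baseline value of 0.49 is presumably computed from unrounded rates). The Python sketch below shows the calculation from confusion-matrix counts; the function name and the counts are hypothetical and are not taken from the dissertation.

```python
def true_skill_score(tp: int, fn: int, tn: int, fp: int) -> float:
    """True skill score from confusion-matrix counts: TPR + TNR - 1."""
    tpr = tp / (tp + fn)  # true positive rate (recall on flaring regions)
    tnr = tn / (tn + fp)  # true negative rate (recall on non-flaring regions)
    return tpr + tnr - 1.0


# Hypothetical counts chosen to give TPR = 0.85 and TNR = 0.65.
tss = true_skill_score(tp=85, fn=15, tn=65, fp=35)
```

    Because TSS depends only on the two per-class rates, it is not affected by the ratio of flaring to non-flaring regions, which is why it is a common choice for unbalanced datasets such as the one described here.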

  2. Critical Features Predicting Sustained Implementation of School-Wide Positive Behavioral Interventions and Supports

    ERIC Educational Resources Information Center

    Mathews, Susanna; McIntosh, Kent; Frank, Jennifer L.; May, Seth L.

    2014-01-01

    The current study explored the extent to which a common measure of perceived implementation of critical features of Positive Behavioral Interventions and Supports (PBIS) predicted fidelity of implementation 3 years later. Respondents included school personnel from 261 schools across the United States implementing PBIS. School teams completed the…

  3. Ultra high molecular weight polyethylene: Optical features at millimeter wavelengths

    NASA Astrophysics Data System (ADS)

    D'Alessandro, G.; Paiella, A.; Coppolecchia, A.; Castellano, M. G.; Colantoni, I.; de Bernardis, P.; Lamagna, L.; Masi, S.

    2018-05-01

    The next generation of experiments for the measurement of the Cosmic Microwave Background (CMB) requires more and more the use of advanced materials, with specific physical and structural properties. An example is the material used for receiver's cryostat windows and internal lenses. The large throughput of current CMB experiments requires a large diameter (of the order of 0.5 m) of these parts, resulting in heavy structural and optical requirements on the material to be used. Ultra High Molecular Weight (UHMW) polyethylene (PE) features high resistance to traction and good transmissivity in the frequency range of interest. In this paper, we discuss the possibility of using UHMW PE for windows and lenses in experiments working at millimeter wavelengths, by measuring its optical properties: emissivity, transmission and refraction index. Our measurements show that the material is well suited to this purpose.

  4. Using vibrational molecular spectroscopy to reveal association of steam-flaking induced carbohydrates molecular structural changes with grain fractionation, biodigestion and biodegradation

    NASA Astrophysics Data System (ADS)

    Xu, Ningning; Liu, Jianxin; Yu, Peiqiang

    2018-04-01

    Advanced vibrational molecular spectroscopy has been developed as a rapid and non-destructive tool to reveal intrinsic molecular structure conformation of biological tissues. However, this technique has not been used to systematically study flaking induced structure changes at a molecular level. The objective of this study was to use vibrational molecular spectroscopy to reveal the association of steam flaking induced CHO molecular structural changes with grain CHO fractionation, predicted CHO biodegradation and biodigestion in ruminant system. The Attenuated Total Reflectance Fourier-transform Vibrational Molecular Spectroscopy (ATR-Ft/VMS) at SRP Key Lab of Molecular Structure and Molecular Nutrition, Ministry of Agriculture Strategic Research Chair Program (SRP, University of Saskatchewan) was applied in this study. The fractionation, predicted biodegradation and biodigestion were evaluated using the Cornell Net Carbohydrate Protein System. The results show that: (1) The steam flaking induced significant changes in CHO subfractions, CHO biodegradation and biodigestion in ruminant system. There were significant differences between non-processed (raw) and steam flaked grain corn (P < .01); (2) The ATR-Ft/VMS molecular technique was able to detect the processing induced CHO molecular structure changes; (3) Induced CHO molecular structure spectral features are significantly correlated (P < .05) to CHO subfractions, CHO biodegradation and biodigestion and could be applied to potentially predict CHO biodegradation (R2 = 0.87, RSD = 0.74, P < .01) and intestinal digestible undegraded CHO (R2 = 0.87, RSD = 0.24, P < .01). In summary, the processing induced molecular CHO structure changes in grain corn could be revealed by the ATR-Ft/VMS vibrational molecular spectroscopy. These molecular structure changes in grain were potentially associated with CHO biodegradation and biodigestion.

  5. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

    PubMed Central

    Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.

    2016-01-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion—the intraclonal diversity index—which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518
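
    The intraclonal diversity index is described here as the number of unique transcripts associated with an antibody clone. A minimal Python sketch of that counting step follows, assuming each distinct UID marks one original transcript and that reads have already been assigned to clones; the function name, the read layout and the sample data are hypothetical and do not reproduce the MAF pipeline.

```python
from collections import defaultdict


def intraclonal_diversity(reads):
    """Count distinct UIDs per clone from (clone_id, uid) pairs.

    Assumption: one distinct UID corresponds to one original transcript,
    so repeated reads carrying the same UID are counted once.
    """
    uids_per_clone = defaultdict(set)
    for clone_id, uid in reads:
        uids_per_clone[clone_id].add(uid)
    return {clone: len(uids) for clone, uids in uids_per_clone.items()}


# Hypothetical reads, for illustration only.
reads = [
    ("cloneA", "AAC"),
    ("cloneA", "AAC"),  # PCR duplicate of the same transcript
    ("cloneA", "GTT"),
    ("cloneB", "CCA"),
]
index = intraclonal_diversity(reads)
```

    Collapsing reads by UID before counting is what keeps amplification duplicates from inflating the index; error correction of the UIDs themselves is not handled in this sketch.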

  6. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting.

    PubMed

    Khan, Tarik A; Friedensohn, Simon; Gorter de Vries, Arthur R; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T

    2016-03-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion-the intraclonal diversity index-which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology.

  7. Assessing the performance of quantitative image features on early stage prediction of treatment effectiveness for ovary cancer patients: a preliminary investigation

    NASA Astrophysics Data System (ADS)

    Zargari, Abolfazl; Du, Yue; Thai, Theresa C.; Gunderson, Camille C.; Moore, Kathleen; Mannel, Robert S.; Liu, Hong; Zheng, Bin; Qiu, Yuchen

    2018-02-01

    The objective of this study is to investigate the performance of global and local features to better estimate the characteristics of highly heterogeneous metastatic tumours, for accurately predicting the treatment effectiveness of the advanced stage ovarian cancer patients. In order to achieve this, a quantitative image analysis scheme was developed to estimate a total of 103 features from three different groups including shape and density, Wavelet, and Gray Level Difference Method (GLDM) features. Shape and density features are global features, which are directly applied on the entire target image; wavelet and GLDM features are local features, which are applied on the divided blocks of the target image. To assess the performance, the new scheme was applied on a retrospective dataset containing 120 recurrent and high grade ovary cancer patients. The results indicate that the three best-performing features are skewness, root-mean-square (rms) and mean of local GLDM texture, indicating the importance of integrating local features. In addition, the averaged predictive performance is comparable among the three different categories. This investigation concluded that the local features contain at least as much tumour heterogeneity information as the global features, which may be meaningful for improving the predictive performance of the quantitative image markers for the diagnosis and prognosis of ovary cancer patients.

  8. Role of tumour molecular and pathology features to estimate colorectal cancer risk for first-degree relatives.

    PubMed

    Win, Aung Ko; Buchanan, Daniel D; Rosty, Christophe; MacInnis, Robert J; Dowty, James G; Dite, Gillian S; Giles, Graham G; Southey, Melissa C; Young, Joanne P; Clendenning, Mark; Walsh, Michael D; Walters, Rhiannon J; Boussioutas, Alex; Smyrk, Thomas C; Thibodeau, Stephen N; Baron, John A; Potter, John D; Newcomb, Polly A; Le Marchand, Loïc; Haile, Robert W; Gallinger, Steven; Lindor, Noralane M; Hopper, John L; Ahnen, Dennis J; Jenkins, Mark A

    2015-01-01

    To estimate risk of colorectal cancer (CRC) for first-degree relatives of CRC cases based on CRC molecular subtypes and tumour pathology features. We studied a cohort of 33,496 first-degree relatives of 4853 incident invasive CRC cases (probands) who were recruited to the Colon Cancer Family Registry through population cancer registries in the USA, Canada and Australia. We categorised the first-degree relatives into four groups: 28,156 of 4095 mismatch repair (MMR)-proficient probands, 2302 of 301 MMR-deficient non-Lynch syndrome probands, 1799 of 271 suspected Lynch syndrome probands and 1239 of 186 Lynch syndrome probands. We compared CRC risk for first-degree relatives stratified by the absence or presence of specific tumour molecular pathology features in probands across each of these four groups and for all groups combined. Compared with first-degree relatives of MMR-proficient CRC cases, a higher risk of CRC was estimated for first-degree relatives of CRC cases with suspected Lynch syndrome (HR 2.06, 95% CI 1.59 to 2.67) and with Lynch syndrome (HR 5.37, 95% CI 4.16 to 6.94), but not with MMR-deficient non-Lynch syndrome (HR 1.04, 95% CI 0.82 to 1.31). A greater risk of CRC was estimated for first-degree relatives if CRC cases were diagnosed before age 50 years, had proximal colon cancer or if their tumours had any of the following: expanding tumour margin, peritumoral lymphocytes, tumour-infiltrating lymphocytes or synchronous CRC. Molecular pathology features are potentially useful to refine screening recommendations for first-degree relatives of CRC cases and to identify which cases are more likely to be caused by genetic or other familial factors. Published by the BMJ Publishing Group Limited.

  9. “Feature Detection” vs. “Predictive Coding” Models of Plant Behavior

    PubMed Central

    Calvo, Paco; Baluška, František; Sims, Andrew

    2016-01-01

    In this article we consider the possibility that plants exhibit anticipatory behavior, a mark of intelligence. If plants are able to anticipate and respond accordingly to varying states of their surroundings, as opposed to merely responding online to environmental contingencies, then such capacity may be in principle testable, and subject to empirical scrutiny. Our main thesis is that adaptive behavior can only take place by way of a mechanism that predicts the environmental sources of sensory stimulation. We propose to test for anticipation in plants experimentally by contrasting two empirical hypotheses: “feature detection” and “predictive coding.” We spell out what these contrasting hypotheses consist of by way of illustration from the animal literature, and consider how to transfer the rationale involved to the plant literature. PMID:27757094

  10. Prediction of the Fate of Organic Compounds in the Environment From Their Molecular Properties: A Review

    PubMed Central

    Mamy, Laure; Patureau, Dominique; Barriuso, Enrique; Bedos, Carole; Bessac, Fabienne; Louchart, Xavier; Martin-laurent, Fabrice; Miege, Cecile; Benoit, Pierre

    2015-01-01

    A comprehensive review of quantitative structure-activity relationships (QSAR) allowing the prediction of the fate of organic compounds in the environment from their molecular properties was done. The considered processes were water dissolution, dissociation, volatilization, retention on soils and sediments (mainly adsorption and desorption), degradation (biotic and abiotic), and absorption by plants. A total of 790 equations involving 686 structural molecular descriptors are reported to estimate 90 environmental parameters related to these processes. A significant number of equations was found for dissociation process (pKa), water dissolution or hydrophobic behavior (especially through the KOW parameter), adsorption to soils and biodegradation. A lack of QSAR was observed to estimate desorption or potential of transfer to water. Among the 686 molecular descriptors, five were found to be dominant in the 790 collected equations and the most generic ones: four quantum-chemical descriptors, the energy of the highest occupied molecular orbital (EHOMO) and the energy of the lowest unoccupied molecular orbital (ELUMO), polarizability (α) and dipole moment (μ), and one constitutional descriptor, the molecular weight. Keeping in mind that the combination of descriptors belonging to different categories (constitutional, topological, quantum-chemical) led to improve QSAR performances, these descriptors should be considered for the development of new QSAR, for further predictions of environmental parameters. This review also allows finding of the relevant QSAR equations to predict the fate of a wide diversity of compounds in the environment. PMID:25866458

  11. Prediction of the Fate of Organic Compounds in the Environment From Their Molecular Properties: A Review.

    PubMed

    Mamy, Laure; Patureau, Dominique; Barriuso, Enrique; Bedos, Carole; Bessac, Fabienne; Louchart, Xavier; Martin-Laurent, Fabrice; Miege, Cecile; Benoit, Pierre

    2015-06-18

    A comprehensive review of quantitative structure-activity relationships (QSAR) allowing the prediction of the fate of organic compounds in the environment from their molecular properties was done. The considered processes were water dissolution, dissociation, volatilization, retention on soils and sediments (mainly adsorption and desorption), degradation (biotic and abiotic), and absorption by plants. A total of 790 equations involving 686 structural molecular descriptors are reported to estimate 90 environmental parameters related to these processes. A significant number of equations was found for dissociation process (pKa), water dissolution or hydrophobic behavior (especially through the KOW parameter), adsorption to soils and biodegradation. A lack of QSAR was observed to estimate desorption or potential of transfer to water. Among the 686 molecular descriptors, five were found to be dominant in the 790 collected equations and the most generic ones: four quantum-chemical descriptors, the energy of the highest occupied molecular orbital (EHOMO) and the energy of the lowest unoccupied molecular orbital (ELUMO), polarizability (α) and dipole moment (μ), and one constitutional descriptor, the molecular weight. Keeping in mind that the combination of descriptors belonging to different categories (constitutional, topological, quantum-chemical) led to improve QSAR performances, these descriptors should be considered for the development of new QSAR, for further predictions of environmental parameters. This review also allows finding of the relevant QSAR equations to predict the fate of a wide diversity of compounds in the environment.

  12. Prediction of near-term breast cancer risk using local region-based bilateral asymmetry features in mammography

    NASA Astrophysics Data System (ADS)

    Li, Yane; Fan, Ming; Li, Lihua; Zheng, Bin

    2017-03-01

    This study proposed a near-term breast cancer risk assessment model based on local region bilateral asymmetry features in mammography. The database includes 566 cases who underwent at least two sequential FFDM examinations. The "prior" examinations in the two series were all interpreted as negative (not recalled). In the "current" examination, 283 women were diagnosed with cancer and 283 remained negative. The ages of the cancer and negative cases were completely matched. These cases were divided into three subgroups according to age: 152 cases in the 37-49 age bracket, 220 cases in the 50-60 age bracket, and 194 cases in the 61-86 age bracket. For each image, two local regions including strip-based regions and difference-of-Gaussian basic element regions were segmented. After that, structural variation features among pixel values and structural similarity features were computed for strip regions. Meanwhile, positional features were extracted for basic element regions. The absolute subtraction value was computed between each feature of the left and right local regions. Next, a multi-layer perceptron classifier was implemented to assess performance of features for prediction. Features were then selected according to stepwise regression analysis. The AUC achieved 0.72, 0.75 and 0.71 for these three age-based subgroups, respectively. The maximum adjusted odds ratios were 12.4, 20.56 and 4.91 for these three groups, respectively. This study demonstrates that the local region-based bilateral asymmetry features extracted from CC-view mammography could provide useful information to predict near-term breast cancer risk.
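
    The asymmetry step in this abstract, taking the absolute difference between each feature of the left and right local regions, can be sketched in a few lines of Python. The function name and the feature values are hypothetical; the region segmentation and the feature definitions themselves are not reproduced.

```python
def bilateral_asymmetry(left_features, right_features):
    """Absolute difference between matched left and right feature values."""
    if len(left_features) != len(right_features):
        raise ValueError("left and right feature vectors must have equal length")
    return [abs(l - r) for l, r in zip(left_features, right_features)]


# Hypothetical feature vectors for matched left and right regions.
asymmetry = bilateral_asymmetry([0.5, 2.0, 7.0], [0.25, 3.0, 7.0])
```

    The resulting vector would then be the input to the classifier; features with identical left and right values contribute zero asymmetry.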

  13. Predictive features of chronic kidney disease in atypical haemolytic uremic syndrome

    PubMed Central

    Jamme, Matthieu; Raimbourg, Quentin; Chauveau, Dominique; Seguin, Amélie; Presne, Claire; Perez, Pierre; Gobert, Pierre; Wynckel, Alain; Provôt, François; Delmas, Yahsou; Mousson, Christiane; Servais, Aude; Vrigneaud, Laurence; Veyradier, Agnès

    2017-01-01

    Chronic kidney disease (CKD) is a frequent and serious complication of atypical haemolytic uremic syndrome (aHUS). We aimed to develop a simple accurate model to predict the risk of renal dysfunction in aHUS based on clinical and biological features available at hospital admission. Renal function at 1-year follow-up, based on an estimated glomerular filtration rate < 60 mL/min/1.73 m2 as assessed by the Modification of Diet in Renal Disease equation, was used as an indicator of significant CKD. Prospectively collected data from a cohort of 156 aHUS patients who did not receive eculizumab were used to identify predictors of CKD. Covariates associated with renal impairment were identified by multivariate analysis. The model performance was assessed and a scoring system for clinical practice was constructed from the regression coefficients. Multivariate analyses identified three predictors of CKD: a high serum creatinine level, a high mean arterial pressure and a mildly decreased platelet count. The prognostic model had a good discriminative ability (area under the curve = 0.84). The scoring system ranged from 0 to 5, with corresponding risks of CKD ranging from 18% to 100%. This model accurately predicts development of 1-year CKD in patients with aHUS using clinical and biological features available on admission. After further validation, this model may assist in clinical decision making. PMID:28542627

  14. Wolfram Syndrome in the Japanese Population; Molecular Analysis of WFS1 Gene and Characterization of Clinical Features

    PubMed Central

    Inoue, Hiroshi; Okuya, Shigeru; Ohta, Yasuharu; Akiyama, Masaru; Taguchi, Akihiko; Kora, Yukari; Okayama, Naoko; Yamada, Yuichiro; Wada, Yasuhiko; Amemiya, Shin; Sugihara, Shigetaka; Nakao, Yuzo; Oka, Yoshitomo; Tanizawa, Yukio

    2014-01-01

    Background: Wolfram syndrome (WFS) is a recessive neurologic and endocrinologic degenerative disorder, and is also known as DIDMOAD (Diabetes Insipidus, early-onset Diabetes Mellitus, progressive Optic Atrophy and Deafness) syndrome. Most affected individuals carry recessive mutations in the Wolfram syndrome 1 gene (WFS1). However, the phenotypic pleiomorphism, rarity and molecular complexity of this disease complicate our efforts to understand WFS. To address this limitation, we aimed to describe complications and to elucidate the contributions of WFS1 mutations to clinical manifestations in Japanese patients with WFS. Methodology: The minimal ascertainment criterion for diagnosing WFS was having both early-onset diabetes mellitus and bilateral optic atrophy. Genetic analysis for WFS1 was performed by direct sequencing. Principal Findings: Sixty-seven patients were identified nationally for a prevalence of one per 710,000, with 33 patients (49%) having all 4 components of DIDMOAD. In 40 subjects who agreed to participate in this investigation from 30 unrelated families, the earliest manifestation was DM at a median age of 8.7 years, followed by OA at a median age of 15.8 years. However, either OA or DI was the first diagnosed feature in 6 subjects. In 10, features other than DM predated OA. Twenty-seven patients (67.5%) had a broad spectrum of recessive mutations in WFS1. Two patients had mutations in only one allele. Eleven patients (27.5%) had intact WFS1 alleles. Ages at onset of both DM and OA in patients with recessive WFS1 mutations were indistinguishable from those in patients without WFS1 mutations. In the patients with predicted complete loss-of-function mutations, ages at onset of both DM and OA were significantly earlier than those in patients with predicted partial loss-of-function mutations. Conclusion/Significance: This study emphasizes the clinical and genetic heterogeneity in patients with WFS. Genotype-phenotype correlations may exist in patients

  15. Congenital hyperinsulinism as the presenting feature of Kabuki syndrome: clinical and molecular characterization of 10 affected individuals.

    PubMed

    Yap, Kai Lee; Johnson, Amy E Knight; Fischer, David; Kandikatla, Priscilla; Deml, Jacea; Nelakuditi, Viswateja; Halbach, Sara; Jeha, George S; Burrage, Lindsay C; Bodamer, Olaf; Benavides, Valeria C; Lewis, Andrea M; Ellard, Sian; Shah, Pratik; Cody, Declan; Diaz, Alejandro; Devarajan, Aishwarya; Truong, Lisa; Greeley, Siri Atma W; De León-Crutchlow, Diva D; Edmondson, Andrew C; Das, Soma; Thornton, Paul; Waggoner, Darrel; Del Gaudio, Daniela

    2018-06-15

    Describe the clinical and molecular findings of patients with Kabuki syndrome (KS) who present with hypoglycemia due to congenital hyperinsulinism (HI), and assess the incidence of KS in patients with HI. We documented the clinical features and molecular diagnoses of 10 infants with persistent HI and KS via a combination of sequencing and copy-number profiling methodologies. Subsequently, we retrospectively evaluated 100 infants with HI lacking a genetic diagnosis, for causative variants in KS genes. Molecular diagnoses of KS were established by identification of pathogenic variants in KMT2D (n = 5) and KDM6A (n = 5). Among the 100 infants with HI of unknown genetic etiology, a KS diagnosis was uncovered in one patient. The incidence of HI among patients with KS may be higher than previously reported, and KS may account for as much as 1% of patients diagnosed with HI. As the recognition of dysmorphic features associated with KS is challenging in the neonatal period, we propose KS should be considered in the differential diagnosis of HI. Since HI in patients with KS is well managed medically, a timely recognition of hyperinsulinemic episodes will improve outcomes, and prevent aggravation of the preexisting mild to moderate intellectual disability in KS.

  16. Prediction of cervical cancer recurrence using textural features extracted from 18F-FDG PET images acquired with different scanners.

    PubMed

    Reuzé, Sylvain; Orlhac, Fanny; Chargari, Cyrus; Nioche, Christophe; Limkin, Elaine; Riet, François; Escande, Alexandre; Haie-Meder, Christine; Dercle, Laurent; Gouy, Sébastien; Buvat, Irène; Deutsch, Eric; Robert, Charlotte

    2017-06-27

    To identify an imaging signature predicting local recurrence for locally advanced cervical cancer (LACC) treated by chemoradiation and brachytherapy from baseline 18F-FDG PET images, and to evaluate the possibility of gathering images from two different PET scanners in a radiomic study. 118 patients were included retrospectively. Two groups (G1, G2) were defined according to the PET scanner used for image acquisition. Eleven radiomic features were extracted from delineated cervical tumors to evaluate: (i) the predictive value of features for local recurrence of LACC, (ii) their reproducibility as a function of the scanner within a hepatic reference volume, (iii) the impact of voxel size on feature values. Eight features were statistically significant predictors of local recurrence in G1 (p < 0.05). The multivariate signature trained in G2 was validated in G1 (AUC=0.76, p<0.001) and identified local recurrence more accurately than SUVmax (p=0.022). Four features were significantly different between G1 and G2 in the liver. Spatial resampling was not sufficient to explain the stratification effect. This study showed that radiomic features could predict local recurrence of LACC better than SUVmax. Further investigation is needed before applying a model designed using data from one PET scanner to another.
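
    Several records in this listing report an AUC. As a reminder of what that number measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name and the scores are hypothetical; this is not the evaluation code of any study listed here.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: P(score of a positive > score of a negative).

    Ties between a positive and a negative score count as half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical signature scores (illustrative only).
recurrence = [0.9, 0.7, 0.4]
no_recurrence = [0.8, 0.3, 0.2, 0.1]
value = auc(recurrence, no_recurrence)
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.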

  17. Predicting Molecular Crystal Properties from First Principles: Finite-Temperature Thermochemistry to NMR Crystallography.

    PubMed

    Beran, Gregory J O; Hartman, Joshua D; Heit, Yonaton N

    2016-11-15

    Molecular crystals occur widely in pharmaceuticals, foods, explosives, organic semiconductors, and many other applications. Thanks to substantial progress in electronic structure modeling of molecular crystals, attention is now shifting from basic crystal structure prediction and lattice energy modeling toward the accurate prediction of experimentally observable properties at finite temperatures and pressures. This Account discusses how fragment-based electronic structure methods can be used to model a variety of experimentally relevant molecular crystal properties. First, it describes the coupling of fragment electronic structure models with quasi-harmonic techniques for modeling the thermal expansion of molecular crystals, and what effects this expansion has on thermochemical and mechanical properties. Excellent agreement with experiment is demonstrated for the molar volume, sublimation enthalpy, entropy, and free energy, and the bulk modulus of phase I carbon dioxide when large basis second-order Møller-Plesset perturbation theory (MP2) or coupled cluster theories (CCSD(T)) are used. In addition, physical insight is offered into how neglect of thermal expansion affects these properties. Zero-point vibrational motion leads to an appreciable expansion in the molar volume; in carbon dioxide, it accounts for around 30% of the overall volume expansion between the electronic structure energy minimum and the molar volume at the sublimation point. In addition, because thermal expansion typically weakens the intermolecular interactions, neglecting thermal expansion artificially stabilizes the solid and causes the sublimation enthalpy to be too large at higher temperatures. Thermal expansion also frequently weakens the lower-frequency lattice phonon modes; neglecting thermal expansion causes the entropy of sublimation to be overestimated. Interestingly, the sublimation free energy is less significantly affected by neglecting thermal expansion because the systematic

  18. ROLE OF MOLECULAR MARKERS IN THYROID NODULE MANAGEMENT: THEN AND NOW.

    PubMed

    Nikiforov, Yuri E

    2017-08-01

    To describe the evolution and clinical utility of molecular testing for thyroid nodules and cancer achieved over the last 2 decades. Scientific reports on thyroid cancer genetics and molecular diagnostics in thyroid nodules. Over the last 2 decades, our understanding of the genetic mechanisms of thyroid cancer has dramatically expanded, such that most thyroid cancers now have known gene driver events. This knowledge provides the basis for establishing and further improving molecular tests for thyroid nodules and cancer and for the introduction of new entities such as noninvasive follicular thyroid neoplasm with papillary-like nuclear features. The progress with molecular tests for thyroid nodules started in the 1990s from demonstrating feasibility of detecting various molecular alterations in fine-needle aspiration (FNA) material collected from thyroid nodules. It was followed by the introduction of the first single-gene mutational markers, such as BRAF, and a small mutational panel into clinical practice in the mid 2000s. Currently, several more advanced molecular tests are available for clinical use. They are based on multiple molecular markers and have increasing impact on the clinical management of patients with thyroid nodules. The evolution of molecular tests for thyroid nodules followed the discovery of various diagnostic and prognostic molecular markers of thyroid cancer that can be applied to thyroid FNA samples to inform more individualized management of these patients. Abbreviations: FNA = fine-needle aspiration; miRNA = micro RNA; NGS = next-generation sequencing; NIFTP = noninvasive follicular thyroid neoplasm with papillary-like nuclear features; NPV = negative predictive value; PPV = positive predictive value; PTC = papillary thyroid carcinoma; RAI = radioactive iodine.

  19. Prediction of glass transition temperature of freeze-dried formulations by molecular dynamics simulation.

    PubMed

    Yoshioka, Sumie; Aso, Yukio; Kojima, Shigeo

    2003-06-01

    To examine whether the glass transition temperature (Tg) of freeze-dried formulations containing polymer excipients can be accurately predicted by molecular dynamics simulation using software currently available on the market. Molecular dynamics simulations were carried out for isomaltodecaose, a fragment of dextran, and alpha-glucose, the repeated unit of dextran, in the presence or absence of water molecules. Estimated values of Tg were compared with experimental values obtained by differential scanning calorimetry (DSC). Isothermal-isobaric molecular dynamics simulations (NPTMD) and isothermal molecular dynamics simulations at a constant volume (NVTMD) were carried out using the software package DISCOVER (Material Studio) with the Polymer Consortium Force Field. Mean-squared displacement and radial distribution function were calculated. NVTMD using the values of density obtained by NPTMD provided the diffusivity of glucose-ring oxygen and water oxygen in amorphous alpha-glucose and isomaltodecaose, which exhibited a discontinuity in temperature dependence due to glass transition. Tg was estimated to be approximately 400 K and 500 K for pure amorphous alpha-glucose and isomaltodecaose, respectively, and in the presence of one water molecule per glucose unit, Tg was 340 K and 360 K, respectively. Estimated Tg values were higher than experimentally determined values because of the very fast cooling rates in the simulations. However, decreases in Tg on hydration and increases in Tg associated with larger fragment size could be demonstrated. The results indicate that molecular dynamics simulation is a useful method for investigating the effects of hydration and molecular weight on the Tg of lyophilized formulations containing polymer excipients, although the relationship between cooling rates and Tg must first be elucidated to predict Tg values observed by DSC measurement.
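
    One common way to locate a change in temperature dependence such as the one described in this record is to fit two straight lines to the data and take their intersection. The sketch below does that with ordinary least squares. It is an assumption about how such an estimate could be made, not the procedure used by the authors, and the data are synthetic.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def estimate_tg(temps, values, min_points=3):
    """Estimate the kink temperature from a two-segment linear fit.

    `temps` must be sorted in ascending order. Every split that leaves at
    least `min_points` on each side is tried; the split with the smallest
    total squared error wins, and the intersection of its two lines is
    returned as the estimate.
    """
    best = None
    for k in range(min_points, len(temps) - min_points + 1):
        low = fit_line(temps[:k], values[:k])
        high = fit_line(temps[k:], values[k:])
        if low[0] == high[0]:
            continue  # parallel lines have no intersection
        total = low[2] + high[2]
        if best is None or total < best[0]:
            crossing = (high[1] - low[1]) / (low[0] - high[0])
            best = (total, crossing)
    if best is None:
        raise ValueError("not enough points for a two-segment fit")
    return best[1]


# Synthetic data with a slope change at 400 K (illustrative only).
temps = [float(t) for t in range(300, 501, 20)]
values = [0.01 * (t - 400) if t <= 400 else 0.05 * (t - 400) for t in temps]
tg = estimate_tg(temps, values)
```

    In practice the fitted quantity might be a diffusivity or its logarithm; the choice of scale and the noise level both affect where the kink is found.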

  20. COPRED: prediction of fold, GO molecular function and functional residues at the domain level.

    PubMed

    López, Daniel; Pazos, Florencio

    2013-07-15

    Only recently have the first resources devoted to the functional annotation of proteins at the domain level started to appear. The next step is to develop specific methodologies for predicting function at the domain level based on these resources, and to implement them in web servers to be used by the community. In this work, we present COPRED, a web server for the concomitant prediction of fold, molecular function and functional sites at the domain level, based on a methodology for domain molecular function prediction and a resource of domain functional annotations previously developed and benchmarked. COPRED can be freely accessed at http://csbg.cnb.csic.es/copred. The interface works in all standard web browsers. WebGL (natively supported by most browsers) is required for the in-line preview and manipulation of protein 3D structures. The website includes a detailed help section and usage examples. Contact: pazos@cnb.csic.es.

  1. Molecular Features Underlying Selectivity in Chicken Bitter Taste Receptors.

    PubMed

    Di Pizio, Antonella; Shy, Nitzan; Behrens, Maik; Meyerhof, Wolfgang; Niv, Masha Y

    2018-01-01

    Chickens sense the bitter taste of structurally different molecules with merely three bitter taste receptors (Gallus gallus taste 2 receptors, ggTas2rs), representing a minimal case of bitter perception. Some bitter compounds, like quinine, diphenidol and chlorpheniramine, activate all three ggTas2rs, while others selectively activate one or two of the receptors. We focus on bitter compounds with different selectivity profiles toward the three receptors, to shed light on the molecular recognition complexity in bitter taste. Using homology modeling and induced-fit docking simulations, we investigated the binding modes of ggTas2r agonists. Interestingly, promiscuous compounds are predicted to establish polar interactions with position 6.51 and hydrophobic interactions with positions 3.32 and 5.42 in all ggTas2rs, whereas certain residues are responsible for receptor selectivity. Lys 3.29 and Asn 3.36 are suggested as ggTas2r1-specificity-conferring residues; Gln 6.55 as a ggTas2r2-specificity-conferring residue; Ser 5.38 and Gln 7.42 as ggTas2r7-specificity-conferring residues. The selectivity profile of quinine analogs, quinidine, epiquinidine and ethylhydrocupreine, was then characterized by combining calcium-imaging experiments and in silico approaches. ggTas2r models were used to virtually screen BitterDB compounds. About 50% of compounds known to be bitter to humans are likely to be bitter to chickens, with 25%, 20% and 37% predicted to be ggTas2r1, ggTas2r2 and ggTas2r7 agonists, respectively. Predicted ggTas2r agonists can be tested with in vitro and in vivo experiments, contributing to our understanding of bitter taste in chickens and, consequently, to the improvement of chicken feed.

  2. Molecular Features of Dissolved Organic Matter Produced by Picophytoplankton

    NASA Astrophysics Data System (ADS)

    Ma, X.; Coleman, M.; Waldbauer, J.

    2016-02-01

    Compounds derived from picophytoplankton through exudation, grazing and viral lysis contribute a large proportion of labile DOM to the ocean. This labile DOM is rapidly turned over by and exchanged among microbial communities. However, identifying labile DOM compounds and tracking their sources and sinks in ocean ecosystems is complicated by the presence of non-labile DOM, which has a significantly larger reservoir size and longer residence time. This study focuses on investigating labile DOM produced by single-strain cyanobacteria isolates via different modes of release and under varied nutrient conditions. DOM compounds are analyzed by high-resolution mass spectrometry. Statistical comparison between intracellular and extracellular molecular data of Synechococcus WH7803 revealed noticeable differences in terms of compound number, size and structure. Incubation experiments using combined whole seawater and diluent of grazer-free or viral-free water at the BATS time-series station in the Sargasso Sea yielded complementary data to be synthesized with data from lab cultures. The compositional features of each type of DOM could serve as future proxies for different modes of DOM production in the oceans.

  3. Biased ART: a neural architecture that shifts attention toward previously disregarded features following an incorrect prediction.

    PubMed

    Carpenter, Gail A; Gaddam, Sai Chaitanya

    2010-04-01

    Memories in Adaptive Resonance Theory (ART) networks are based on matched patterns that focus attention on those portions of bottom-up inputs that match active top-down expectations. While this learning strategy has proved successful for both brain models and applications, computational examples show that attention to early critical features may later distort memory representations during online fast learning. For supervised learning, biased ARTMAP (bARTMAP) solves the problem of over-emphasis on early critical features by directing attention away from previously attended features after the system makes a predictive error. Small-scale, hand-computed analog and binary examples illustrate key model dynamics. Two-dimensional simulation examples demonstrate the evolution of bARTMAP memories as they are learned online. Benchmark simulations show that featural biasing also improves performance on large-scale examples. One example, which predicts movie genres and is based, in part, on the Netflix Prize database, was developed for this project. Both first principles and consistent performance improvements on all simulation studies suggest that featural biasing should be incorporated by default in all ARTMAP systems. Benchmark datasets and bARTMAP code are available from the CNS Technology Lab Website: http://techlab.bu.edu/bART/. Copyright 2009 Elsevier Ltd. All rights reserved.

  4. Latent feature decompositions for integrative analysis of multi-platform genomic data

    PubMed Central

    Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran

    2015-01-01

    Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization, as well as sparse counterparts of each, are used to define the latent features, and the performance of these decompositions is compared on both real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by, and applied to, a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. On simulated data, we show that our proposed method successfully incorporates interactions within and between

  5. Ab initio NMR Confirmed Evolutionary Structure Prediction for Organic Molecular Crystals

    NASA Astrophysics Data System (ADS)

    Pham, Cong-Huy; Kucukbenli, Emine; de Gironcoli, Stefano

    2015-03-01

    Ab initio crystal structure prediction of even small organic compounds is extremely challenging due to polymorphism, molecular flexibility and difficulties in addressing the dispersion interaction from first principles. We recently implemented vdW-aware density functionals and demonstrated their success in energy ordering of amino acid crystals. In this work we combine this development with the evolutionary structure prediction method to study cholesterol polymorphs. Cholesterol crystals have paramount importance in various diseases, from cancer to atherosclerosis. The structures of some polymorphs (e.g. ChM, ChAl, ChAh) have already been resolved, while some others, which display distinct NMR spectra and are involved in disease formation, are yet to be determined. Here we thoroughly assess the applicability of evolutionary structure prediction to address such real-world problems. We validate the newly predicted structures with ab initio NMR chemical shift data using secondary referencing for an improved comparison with experiments.

  6. Molecular activity prediction by means of supervised subspace projection based ensembles of classifiers.

    PubMed

    Cerruela García, G; García-Pedrajas, N; Luque Ruiz, I; Gómez-Nieto, M Á

    2018-03-01

    This paper proposes a method for molecular activity prediction in QSAR studies using ensembles of classifiers constructed by means of two supervised subspace projection methods, namely nonparametric discriminant analysis (NDA) and hybrid discriminant analysis (HDA). We studied the performance of the proposed ensembles compared to classical ensemble methods using four molecular datasets and eight different models for the representation of the molecular structure. Using several measures and statistical tests for classifier comparison, we observe that our proposal improves the classification results with respect to classical ensemble methods. Therefore, we show that ensembles constructed using supervised subspace projections offer an effective way of creating classifiers in cheminformatics.

  7. Apocalypse...now? Molecular epidemiology, predictive genetic tests, and social communication of genetic contents.

    PubMed

    Castiel, L D

    1999-01-01

    The author analyzes the underlying theoretical aspects in the construction of the molecular watershed of epidemiology and the concept of genetic risk, focusing on issues raised by contemporary reality: new technologies, globalization, proliferation of communications strategies, and the dilution of identity matrices. He discusses problems pertaining to the establishment of such new interdisciplinary fields as molecular epidemiology and molecular genetics. Finally, he analyzes the repercussions of the social communication of genetic content, especially as related to predictive genetic tests and cloning of animals, based on triumphal, deterministic metaphors sustaining beliefs relating to the existence and supremacy of concepts such as 'purity', 'essence', and 'unification' of rational, integrated 'I's/egos'.

  8. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification.

    PubMed

    Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan

    2015-06-01

    Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. An ensemble predictive modeling framework for breast cancer classification.

    PubMed

    Nagarajan, Radhakrishnan; Upreti, Meenakshi

    2017-12-01

    Molecular changes often precede clinical presentation of diseases and can be useful surrogates with potential to assist in informed clinical decision making. Recent studies have demonstrated the usefulness of modeling approaches such as classification that can predict the clinical outcomes from molecular expression profiles. While useful, a majority of these approaches implicitly use all molecular markers as features in the classification process often resulting in sparse high-dimensional projection of the samples often comparable to that of the sample size. In this study, a variant of the recently proposed ensemble classification approach is used for predicting good and poor-prognosis breast cancer samples from their molecular expression profiles. In contrast to traditional single and ensemble classifiers, the proposed approach uses multiple base classifiers with varying feature sets obtained from two-dimensional projection of the samples in conjunction with a majority voting strategy for predicting the class labels. In contrast to our earlier implementation, base classifiers in the ensembles are chosen based on maximal sensitivity and minimal redundancy by choosing only those with low average cosine distance. The resulting ensemble sets are subsequently modeled as undirected graphs. Performance of four different classification algorithms is shown to be better within the proposed ensemble framework in contrast to using them as traditional single classifier systems. Significance of a subset of genes with high-degree centrality in the network abstractions across the poor-prognosis samples is also discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
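
    The majority voting strategy mentioned in this record can be illustrated with a short sketch. It covers only the voting step, not the selection of base classifiers or the graph analysis, and the function name, labels and predictions are hypothetical rather than taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of predicted labels per base classifier,
    all of the same length. With an odd number of classifiers and two
    classes there are no ties; otherwise the label seen first among the
    most frequent ones is returned.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    voted = []
    for i in range(n_samples):
        counts = Counter(p[i] for p in predictions)
        voted.append(counts.most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base classifiers on four samples.
base_predictions = [
    ["good", "poor", "poor", "good"],
    ["good", "good", "poor", "poor"],
    ["poor", "good", "poor", "good"],
]
ensemble_labels = majority_vote(base_predictions)
```

    Using an odd number of base classifiers is a simple way to avoid ties in a two-class problem such as good versus poor prognosis.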

  10. Correlation of tumor-infiltrating lymphocytes to histopathological features and molecular phenotypes in canine mammary carcinoma: A morphologic and immunohistochemical morphometric study.

    PubMed

    Kim, Jong-Hyuk; Chon, Seung-Ki; Im, Keum-Soon; Kim, Na-Hyun; Sur, Jung-Hyang

    2013-04-01

    Abundant lymphocyte infiltration is frequently found in canine malignant mammary tumors, but the pathological features and immunophenotypes associated with the infiltration remain to be elucidated. The aim of the present study was to evaluate the relationship between lymphocyte infiltration, histopathological features, and molecular phenotype in canine mammary carcinoma (MC). The study was done with archived formalin-fixed, paraffin-embedded samples (n = 47) by histologic and immunohistochemical methods. The degree of lymphocyte infiltration was evaluated by morphologic analysis, and the T- and B-cell populations as well as the T/B-cell ratio were evaluated by morphometric analysis; results were compared with the histologic features and molecular phenotypes. The degree of lymphocyte infiltration was significantly higher in MCs with lymphatic invasion than in those without lymphatic invasion (P < 0.0001) and in tumors of high histologic grade compared with those of lower histologic grade (P = 0.045). Morphometric analysis showed a larger amount of T-cells and B-cells in MCs with a higher histologic grade and lymphatic invasion, but the T/B ratio did not change. Lymphocyte infiltration was not associated with histologic type or molecular phenotype, as assessed from the immunohistochemical expression of epidermal growth factor receptor 2, estrogen receptor, cytokeratin 14, and p63. Since intense lymphocyte infiltration was associated with aggressive histologic features, lymphocytes may be important for tumor aggressiveness and greater malignant behavior in the tumor microenvironment.

  11. Applying quantitative adiposity feature analysis models to predict benefit of bevacizumab-based chemotherapy in ovarian cancer patients

    NASA Astrophysics Data System (ADS)

    Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin

    2016-03-01

    How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatments. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting potential benefit of EOC patients with or without receiving bevacizumab-based chemotherapy treatment using multivariate statistical models built based on quantitative adiposity image features. A dataset involving CT images from 59 advanced EOC patients was included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (SFA) and visceral fat areas (VFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were applied to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that, using all 3 statistical models, a statistically significant association was detected between the model-generated results and both clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there was no significant association for either PFS or OS in the group of patients not receiving maintenance bevacizumab. Therefore, this study demonstrated the feasibility of using statistical prediction models based on quantitative adiposity-related CT image features to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.

  12. Towards the Improved Discovery and Design of Functional Peptides: Common Features of Diverse Classes Permit Generalized Prediction of Bioactivity

    PubMed Central

    Mooney, Catherine; Haslam, Niall J.; Pollastri, Gianluca; Shields, Denis C.

    2012-01-01

    The conventional wisdom is that certain classes of bioactive peptides have specific structural features that endow their particular functions. Accordingly, predictions of bioactivity have focused on particular subgroups, such as antimicrobial peptides. We hypothesized that bioactive peptides may share more general features, and assessed this by contrasting the predictive power of existing antimicrobial predictors as well as a novel general predictor, PeptideRanker, across different classes of peptides. We observed that existing antimicrobial predictors had reasonable predictive power to identify peptides of certain other classes i.e. toxin and venom peptides. We trained two general predictors of peptide bioactivity, one focused on short peptides (4–20 amino acids) and one focused on long peptides ( amino acids). These general predictors had performance that was typically as good as, or better than, that of specific predictors. We noted some striking differences in the features of short peptide and long peptide predictions, in particular, high scoring short peptides favour phenylalanine. This is consistent with the hypothesis that short and long peptides have different functional constraints, perhaps reflecting the difficulty for typical short peptides in supporting independent tertiary structure. We conclude that there are general shared features of bioactive peptides across different functional classes, indicating that computational prediction may accelerate the discovery of novel bioactive peptides and aid in the improved design of existing peptides, across many functional classes. An implementation of the predictive method, PeptideRanker, may be used to identify among a set of peptides those that may be more likely to be bioactive. PMID:23056189

  13. Preoperative Molecular Markers in Thyroid Nodules.

    PubMed

    Sahli, Zeyad T; Smith, Philip W; Umbricht, Christopher B; Zeiger, Martha A

    2018-01-01

    The need for distinguishing benign from malignant thyroid nodules has led to the pursuit of differentiating molecular markers. The most common molecular tests in clinical use are Afirma® Gene Expression Classifier (GEC) and Thyroseq® V2. Despite the rapidly developing field of molecular markers, several limitations exist. These challenges include the recent introduction of the histopathological diagnosis "Non-Invasive Follicular Thyroid neoplasm with Papillary-like nuclear features", the correlation of genetic mutations within both benign and malignant pathologic diagnoses, the lack of follow-up of molecular marker negative nodules, and the cost-effectiveness of molecular markers. In this manuscript, we review the current published literature surrounding the diagnostic value of Afirma® GEC and Thyroseq® V2. Among Afirma® GEC studies, sensitivity (Se), specificity (Sp), positive predictive value (PPV), and negative predictive value (NPV) ranged from 75 to 100%, 5 to 53%, 13 to 100%, and 20 to 100%, respectively. Among Thyroseq® V2 studies, Se, Sp, PPV, and NPV ranged from 40 to 100%, 56 to 93%, 13 to 90%, and 48 to 97%, respectively. We also discuss current challenges to Afirma® GEC and Thyroseq® V2 utility and clinical application, and preview the future directions of these rapidly developing technologies.
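
    The four diagnostic measures quoted above all derive from the same 2x2 table of test result against final pathology. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and are not taken from any study covered by the review.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return Se, Sp, PPV and NPV from the four cells of a 2x2 table.

    tp/fn: malignant nodules with a positive/negative test result.
    fp/tn: benign nodules with a positive/negative test result.
    """
    def ratio(num, den):
        # An empty row or column makes the measure undefined.
        return num / den if den else float("nan")

    return {
        "Se": ratio(tp, tp + fn),   # sensitivity
        "Sp": ratio(tn, tn + fp),   # specificity
        "PPV": ratio(tp, tp + fp),  # positive predictive value
        "NPV": ratio(tn, tn + fn),  # negative predictive value
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = diagnostic_metrics(tp=45, fp=30, fn=5, tn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

    Unlike Se and Sp, PPV and NPV depend on the proportion of malignant nodules in the cohort, which may be one reason their reported ranges are so wide.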

  14. Thermophysical properties of liquid UO2, ZrO2 and corium by molecular dynamics and predictive models

    NASA Astrophysics Data System (ADS)

    Kim, Woong Kee; Shim, Ji Hoon; Kaviany, Massoud

    2017-08-01

    Predicting the fate of accident-melted nuclear fuel-cladding requires the understanding of the thermophysical properties which are lacking or have large scatter due to high-temperature experimental challenges. Using equilibrium classical molecular dynamics (MD), we predict the properties of melted UO2 and ZrO2 and compare them with the available experimental data and the predictive models. The existing interatomic potential models have been developed mainly for the polymorphic solid phases of these oxides, so they cannot be used to predict all the properties accurately. We compare and decipher the distinctions of those MD predictions using the specific property-related autocorrelation decays. The predicted properties are density, specific heat, heat of fusion, compressibility, viscosity, surface tension, and the molecular and electronic thermal conductivities. After the comparisons, we provide readily usable temperature-dependent correlations (including UO2-ZrO2 compounds, i.e. corium melt).
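
    The abstract does not spell out how the autocorrelation decays are turned into transport properties. In equilibrium MD this is commonly done with Green-Kubo relations, where a transport coefficient is a prefactor times the time integral of an autocorrelation function. The sketch below assumes that approach; the function names, the toy series and the prefactor are hypothetical.

```python
def autocorrelation(series, max_lag):
    """Time-averaged autocorrelation <A(0) A(t)> for lags 0..max_lag."""
    n = len(series)
    return [
        sum(series[i] * series[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]


def green_kubo_integral(acf, dt, prefactor):
    """Trapezoidal time integral of an ACF, scaled by a property prefactor.

    For shear viscosity, for example, the series would be an off-diagonal
    stress component and the prefactor would be V / (k_B * T).
    """
    area = sum(0.5 * (acf[i] + acf[i + 1]) * dt for i in range(len(acf) - 1))
    return prefactor * area


# Toy alternating signal, used only to show the call pattern.
toy_series = [1.0, -1.0] * 4
toy_acf = autocorrelation(toy_series, max_lag=2)
toy_value = green_kubo_integral(toy_acf, dt=1.0, prefactor=1.0)
print(toy_acf, toy_value)
```

    In practice the integral is usually truncated once the autocorrelation has decayed into noise, which is where the choice of decay window matters.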

  15. Modeling Far-UV Fluorescent Emission Features of Warm Molecular Hydrogen in the Inner Regions of Protoplanetary Disks

    NASA Astrophysics Data System (ADS)

    Hoadley, Keri; France, Kevin

    2015-01-01

    Probing the surviving molecular gas within the inner regions of protoplanetary disks (PPDs) around T Tauri stars (1 - 10 Myr) provides insight into the conditions in which planet formation and migration occurs while the gas disk is still present. We model observed far ultraviolet (FUV) molecular hydrogen (H₂) fluorescent emission lines that originate within the inner regions (< 10 AU) of 9 well-studied Classic T Tauri stars, using the Hubble Space Telescope Cosmic Origins Spectrograph (COS), to explore the physical structure of the molecular disk at different PPD dust evolutionary stages. We created a 2D radiative transfer model that estimates the density and temperature distributions of warm, inner radial H₂ (T > 1500 K) with a set of 6 free parameters and produces a data cube of expected emission line profiles that describe the physical structure of the inner molecular disk atmosphere. By comparing the modeled emission lines with COS H₂ fluorescence emission features, we estimate the physical structure of the molecular disk atmosphere for each target with the set of free parameters that best replicate the observed lines. First results suggest that, for all dust evolutionary stages of disks considered, ground-state H₂ populations are described by a roughly constant temperature T(H₂) = 2500 +/- 1000 K. Possible evolution of the density structure of the H₂ atmosphere between intact and depleting dust disks may be distinguishable, but large errors in the inferred best-fit parameter sets prevent us from making this conclusion. Further improvements to the modeling framework and statistical comparison in determining the best-fit model-to-data parameter sets are ongoing, beginning with improvements to the radiative transfer model and use of up-to-date HI Lyman α absorption optical depths (see McJunkin in posters) to better estimate disk structural parameters. Once improvements are implemented, we will investigate the possible presence of a molecular wind

  16. Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning.

    PubMed

    Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin

    2016-11-01

    Protein-protein interactions play essential roles in many biological processes. Acquiring knowledge of the residue-residue contact information of two interacting proteins is not only helpful in annotating functions for proteins, but also critical for structure-based drug design. The prediction of the protein residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models to tackle the residue-residue contact prediction problem. In tandem with interaction profile Hidden Markov Models, which were used first to extract Fisher score features from protein sequences, stacked autoencoders were deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over the traditional machine learning model, Support Vector Machines (SVM), with the overall accuracy increased by 15% from 65.40% to 80.82%. We showed that the stacked autoencoders could extract novel features, which can be utilized by deep neural networks and other classifiers to enhance learning, out of the Fisher score features. It is further shown that deep neural networks have significant advantages over SVM in making use of the newly extracted features. Copyright © 2016. Published by Elsevier Inc.

  17. Tracking the Correlation Between CpG Island Methylator Phenotype and Other Molecular Features and Clinicopathological Features in Human Colorectal Cancers: A Systematic Review and Meta-Analysis.

    PubMed

    Zong, Liang; Abe, Masanobu; Ji, Jiafu; Zhu, Wei-Guo; Yu, Duonan

    2016-03-10

    The controversy of CpG island methylator phenotype (CIMP) in colorectal cancers (CRCs) persists, despite many studies that have been conducted on its correlation with molecular and clinicopathological features. To derive a more precise estimate of the strength of this postulated relationship, a meta-analysis was performed. A comprehensive search for studies reporting molecular and clinicopathological features of CRCs stratified by CIMP was performed within the PubMed, EMBASE, and Cochrane Library databases. CIMP was defined by either one of the three panels of gene-specific CIMP markers (Weisenberger panel, classic panel, or a mixture panel of the previous two) or the genome-wide DNA methylation profile. The associations of CIMP with outcome parameters were estimated using odds ratio (OR) or weighted mean difference (WMD) or hazard ratios (HRs) with 95% confidence interval (CI) for each study using a fixed effects or random effects model. A total of 29 studies involving 9,393 CRC patients were included for analysis. We observed more BRAF mutations (OR 34.87; 95% CI, 22.49-54.06) and microsatellite instability (MSI) (OR 12.85; 95% CI, 8.84-18.68) in CIMP-positive vs. -negative CRCs, whereas KRAS mutations were less frequent (OR 0.47; 95% CI, 0.30-0.75). Subgroup analysis showed that only the genome-wide methylation profile-defined CIMP subset encompassed all BRAF-mutated CRCs. As expected, CIMP-positive CRCs displayed significant associations with female sex (OR 0.64; 95% CI, 0.56-0.72), older age at diagnosis (WMD 2.77; 95% CI, 1.15-4.38), proximal location (OR 6.91; 95% CI, 5.17-9.23), mucinous histology (OR 3.81; 95% CI, 2.93-4.95), and poor differentiation (OR 4.22; 95% CI, 2.52-7.08). Although CIMP did not show a correlation with tumor stage (OR 1.10; 95% CI, 0.82-1.46), it was associated with shorter overall survival (HR 1.73; 95% CI, 1.27-2.37). The meta-analysis highlights that CIMP-positive CRCs have their own molecular features, especially overlapping with BRAF mutations
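
    The odds ratios and 95% confidence intervals reported in this record can be illustrated with the standard 2x2-table calculation and an inverse-variance fixed effects pool. This is only a sketch: the record does not say which estimator the authors used, and the function names and counts below are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a, b: cases with / without the feature in the CIMP-positive group.
    c, d: cases with / without the feature in the CIMP-negative group.
    All four counts must be non-zero for this simple form.
    """
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


def pool_fixed_effect(studies):
    """Inverse-variance fixed effects pooling of per-study log odds ratios."""
    weighted_sum = 0.0
    total_weight = 0.0
    for a, b, c, d in studies:
        weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)
        weighted_sum += weight * math.log((a * d) / (b * c))
        total_weight += weight
    return math.exp(weighted_sum / total_weight)


# Hypothetical counts for one study, then two identical studies pooled.
single_or, ci_low, ci_high = odds_ratio_ci(30, 70, 10, 90)
pooled_or = pool_fixed_effect([(30, 70, 10, 90), (30, 70, 10, 90)])
print(f"OR {single_or:.2f}; 95% CI, {ci_low:.2f}-{ci_high:.2f}")
```

    A random effects model would add a between-study variance term to each weight, which widens the pooled interval when studies disagree.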

  18. Applying a machine learning model using a locally preserving projection based feature regeneration algorithm to predict breast cancer risk

    NASA Astrophysics Data System (ADS)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin

    2018-03-01

    Both conventional and deep machine learning have been used to develop decision-support tools applied in medical imaging informatics. In order to take advantage of both conventional and deep learning approaches, this study aims to investigate the feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, 44 initially computed image features related to the bilateral mammographic tissue density asymmetry were extracted. Then, an LPP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighbor (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 were positive and 250 remained negative in the next subsequent mammography screening. Applied to this dataset, the LPP-generated feature vector reduced the number of features from 44 to 4. Using a leave-one-case-out validation method, the area under the ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p < 0.05) and the odds ratio was 4.60 with a 95% confidence interval of [3.16, 6.70]. The study demonstrated that this new LPP-based feature regeneration approach was able to produce an optimal feature vector and yield improved performance in helping to predict the risk of women having breast cancer detected in the next subsequent mammography screening.
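
    The evaluation step described above (a KNN classifier scored with leave-one-case-out validation and the area under the ROC curve) can be sketched as follows. This is not the authors' implementation: the LPP step is omitted, the distance is assumed to be Euclidean, and the function names and toy cases are hypothetical.

```python
def knn_score(train, query, k):
    """Fraction of positive labels among the k nearest training cases."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(features, query)), label)
        for features, label in train
    )
    return sum(label for _, label in by_distance[:k]) / k


def leave_one_case_out(cases, k):
    """Score each case with a classifier built from all the other cases."""
    return [
        knn_score(cases[:i] + cases[i + 1:], features, k)
        for i, (features, _) in enumerate(cases)
    ]


def roc_auc(scores, labels):
    """AUC as the Mann-Whitney probability that a positive outranks a negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy one-feature cases (feature vector, label); not study data.
toy_cases = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0),
             ((1.0,), 1), ((1.1,), 1), ((1.2,), 1)]
toy_scores = leave_one_case_out(toy_cases, k=2)
toy_auc = roc_auc(toy_scores, [label for _, label in toy_cases])
print(toy_scores, toy_auc)
```

    Holding out one case at a time keeps every test score independent of that case's own label, which matters for a dataset of this size.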

  19. Assessment of Genetic and Molecular Approaches for the Prediction of Wheat Quality

    USDA-ARS's Scientific Manuscript database

    Assessment of genetic and molecular approaches for the prediction of wheat quality. R.A. Graybosch, USDA-ARS, Lincoln, NE, U.S.A. Over the past four decades, the field of plant breeding and genetics has been revolutionized by technological advances in the areas of DNA manipulation and evaluation. Fo...

  20. Predicting conformational ensembles and genome-wide transcription factor binding sites from DNA sequences.

    PubMed

    Andrabi, Munazah; Hutchins, Andrew Paul; Miranda-Saavedra, Diego; Kono, Hidetoshi; Nussinov, Ruth; Mizuguchi, Kenji; Ahmad, Shandar

    2017-06-22

    DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large scale DNA shape estimates, DNAshape was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features, Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features e.g. Shift, Slide and Opening cannot be evaluated using DNAshape. Here, we report a novel method DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS) followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.

  1. KFC2: a knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features.

    PubMed

    Zhu, Xiaolei; Mitchell, Julie C

    2011-09-01

    Hot spots constitute a small fraction of protein-protein interface residues, yet they account for a large fraction of the binding affinity. Based on our previous method (KFC), we present two new methods (KFC2a and KFC2b) that outperform other methods at hot spot prediction. A number of improvements were made in developing these new methods. First, we created a training data set that contained a similar number of hot spot and non-hot spot residues. In addition, we generated 47 different features, and different numbers of features were used to train the models to avoid over-fitting. Finally, two feature combinations were selected: One (used in KFC2a) is composed of eight features that are mainly related to solvent accessible surface area and local plasticity; the other (KFC2b) is composed of seven features, only two of which are identical to those used in KFC2a. The two models were built using support vector machines (SVM). The two KFC2 models were then tested on a mixed independent test set, and compared with other methods such as Robetta, FOLDEF, HotPoint, MINERVA, and KFC. KFC2a showed the highest predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.85); however, the false positive rate was somewhat higher than for other models. KFC2b showed the best predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.62) among all methods other than KFC2a, and the False Positive Rate (FPR = 0.15) was comparable with other highly predictive methods. Copyright © 2011 Wiley-Liss, Inc.

  2. Machine learning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space

    DOE PAGES

    Hansen, Katja; Biegler, Franziska; Ramakrishnan, Raghunathan; ...

    2015-06-04

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. The same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.
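
    The abstract names the Bag of Bonds representation without describing how it is built. The sketch below assumes the commonly described form of the descriptor: Coulomb-like terms Z_i * Z_j / r_ij are grouped into one "bag" per element pair, sorted within each bag and zero-padded to a fixed length. The function name, the bag sizes and the toy geometry (arbitrary length units) are hypothetical.

```python
import math
from collections import defaultdict


def bag_of_bonds(charges, coords, bag_sizes):
    """Build a fixed-length vector from pairwise Z_i * Z_j / r_ij terms.

    bag_sizes maps a sorted element pair, e.g. (1, 8), to the padded length
    of its bag. Pairs whose key is absent from bag_sizes are left out.
    """
    bags = defaultdict(list)
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            key = tuple(sorted((charges[i], charges[j])))
            distance = math.dist(coords[i], coords[j])
            bags[key].append(charges[i] * charges[j] / distance)

    vector = []
    for key in sorted(bag_sizes):
        entries = sorted(bags.get(key, []), reverse=True)
        if len(entries) > bag_sizes[key]:
            raise ValueError(f"bag {key} is too small for this molecule")
        # Zero-padding makes molecules of different sizes comparable.
        vector.extend(entries + [0.0] * (bag_sizes[key] - len(entries)))
    return vector


# Toy three-atom molecule (one oxygen, two hydrogens); geometry is made up.
toy_vector = bag_of_bonds(
    charges=[8, 1, 1],
    coords=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    bag_sizes={(1, 1): 2, (1, 8): 2},
)
print(toy_vector)
```

    Sorting within each bag makes the vector independent of atom ordering, so it can be passed directly to a kernel or regression model.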

  3. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space

    PubMed Central

    2015-01-01

    Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. In addition, the same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies. PMID:26113956

  4. Toward Fully in Silico Melting Point Prediction Using Molecular Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Y; Maginn, EJ

    2013-03-01

    Melting point is one of the most fundamental and practically important properties of a compound. Molecular simulation methods have been developed for the accurate computation of melting points. However, all of these methods need an experimental crystal structure as input, which means that such calculations are not really predictive since the melting point can be measured easily in experiments once a crystal structure is known. On the other hand, crystal structure prediction (CSP) has become an active field and significant progress has been made, although challenges still exist. One of the main challenges is the existence of many crystal structures (polymorphs) that are very close in energy. Thermal effects and kinetic factors make the situation even more complicated, such that it is still not trivial to predict experimental crystal structures. In this work, we exploit the fact that free energy differences are often small between crystal structures. We show that accurate melting point predictions can be made by using a reasonable crystal structure from CSP as a starting point for a free energy-based melting point calculation. The key is that most crystal structures predicted by CSP have free energies that are close to that of the experimental structure. The proposed method was tested on two rigid molecules and the results suggest that a fully in silico melting point prediction method is possible.

  5. Breaking the polar-nonpolar division in solvation free energy prediction.

    PubMed

    Wang, Bao; Wang, Chengzhang; Wu, Kedi; Wei, Guo-Wei

    2018-02-05

    Implicit solvent models divide solvation free energies into polar and nonpolar additive contributions, whereas polar and nonpolar interactions are inseparable and nonadditive. We present a feature functional theory (FFT) framework to break this ad hoc division. The essential ideas of FFT are as follows: (i) representability assumption: there exists a microscopic feature vector that can uniquely characterize and distinguish one molecule from another; (ii) feature-function relationship assumption: the macroscopic features, including solvation free energy, of a molecule are a functional of microscopic feature vectors; and (iii) similarity assumption: molecules with similar microscopic features have similar macroscopic properties, such as solvation free energies. Based on these assumptions, solvation free energy prediction is carried out in the following protocol. First, we construct a molecular microscopic feature vector that is efficient in characterizing the solvation process using quantum mechanics and Poisson-Boltzmann theory. Microscopic feature vectors are combined with macroscopic features, that is, physical observables, to form extended feature vectors. Additionally, we partition a solvation dataset into queries according to molecular compositions. Moreover, for each target molecule, we adopt a machine learning algorithm for its nearest neighbor search, based on the selected microscopic feature vectors. Finally, from the extended feature vectors of obtained nearest neighbors, we construct a functional of solvation free energy, which is employed to predict the solvation free energy of the target molecule. The proposed FFT model has been extensively validated via a large dataset of 668 molecules. The leave-one-out test gives an optimal root-mean-square error (RMSE) of 1.05 kcal/mol. FFT predictions of SAMPL0, SAMPL1, SAMPL2, SAMPL3, and SAMPL4 challenge sets deliver the RMSEs of 0.61, 1.86, 1.64, 0.86, and 1.14 kcal/mol, respectively. Using a test set of 94

  6. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

    NASA Astrophysics Data System (ADS)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  7. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    PubMed

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  8. Molecular-Scale Features that Govern the Effects of O-Glycosylation on a Carbohydrate-Binding Module

    DOE PAGES

    Guan, Xiaoyang; Chaffey, Patrick K.; Zeng, Chen; ...

    2015-09-21

    Protein glycosylation is a ubiquitous post-translational modification in all kingdoms of life. Despite its importance in molecular and cellular biology, the molecular-level ramifications of O-glycosylation on biomolecular structure and function remain elusive. Here, we took a small model glycoprotein and changed the glycan structure and size, amino acid residues near the glycosylation site, and glycosidic linkage while monitoring any corresponding changes to physical stability and cellulose binding affinity. The results of this study reveal the collective importance of all the studied features in controlling the most pronounced effects of O-glycosylation in this system. This study suggests the possibility of designing proteins with multiple improved properties by simultaneously varying the structures of O-glycans and amino acids local to the glycosylation site.

  9. Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity

    PubMed Central

    Elias-Kirma, Shani; Nir, Ronit; Segal, Eran

    2017-01-01

    Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However due to the small number of IRESs known, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in human and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements. PMID:28922394
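
    The k-mer features mentioned above, and the reported dependence on their number and position, can be illustrated with a simple sliding-window count. The record does not describe the authors' actual feature pipeline, so the function names and the toy sequence here are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_positions(sequence, kmer):
    """Return the 0-based start positions of one k-mer, overlaps included."""
    sequence = sequence.upper()
    kmer = kmer.upper()
    k = len(kmer)
    return [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] == kmer]


# Toy RNA fragment, not a real IRES.
toy_sequence = "AUGAUG"
toy_counts = kmer_counts(toy_sequence, 3)
toy_positions = kmer_positions(toy_sequence, "AUG")
print(toy_counts, toy_positions)
```

    Counts give the "number" feature and start positions give the "position" feature; either can be assembled into a fixed-length vector over a chosen k-mer vocabulary before fitting a predictive model.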

  10. HBC-Evo: predicting human breast cancer by exploiting amino acid sequence-based feature spaces and evolutionary ensemble system.

    PubMed

    Majid, Abdul; Ali, Safdar

    2015-01-01

    We developed a genetic programming (GP)-based evolutionary ensemble system for the early diagnosis, prognosis and prediction of human breast cancer. This system has effectively exploited the diversity in feature and decision spaces. First, individual learners are trained in different feature spaces using physicochemical properties of protein amino acids. Their predictions are then stacked to develop the best solution during the GP evolution process. Finally, results for the HBC-Evo system are obtained with an optimal threshold, which is computed using particle swarm optimization. Our novel approach has demonstrated promising results compared to state-of-the-art approaches.

  11. Learning better deep features for the prediction of occult invasive disease in ductal carcinoma in situ through transfer learning

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Hou, Rui; Mazurowski, Maciej A.; Grimm, Lars J.; Ren, Yinhao; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2018-02-01

    Purpose: To determine whether domain transfer learning can improve the performance of deep features extracted from digital mammograms using a pre-trained deep convolutional neural network (CNN) in the prediction of occult invasive disease for patients with ductal carcinoma in situ (DCIS) on core needle biopsy. Method: In this study, we collected digital mammography magnification views for 140 patients with DCIS at biopsy, 35 of whom were subsequently upstaged to invasive cancer. We utilized a deep CNN model that was pre-trained on two natural image data sets (ImageNet and DTD) and one mammographic data set (INbreast) as the feature extractor, hypothesizing that these data sets are increasingly more similar to our target task and will lead to better representations of deep features to describe DCIS lesions. Through a statistical pooling strategy, three sets of deep features were extracted using the CNNs at different levels of convolutional layers from the lesion areas. A logistic regression classifier was then trained to predict which tumors contain occult invasive disease. The generalization performance was assessed and compared using repeated random sub-sampling validation and receiver operating characteristic (ROC) curve analysis. Result: The best performance of deep features was from the CNN model pre-trained on INbreast, and the proposed classifier using this set of deep features was able to achieve a median classification performance of ROC-AUC equal to 0.75, which is significantly better (p<=0.05) than the performance of deep features extracted using the ImageNet data set (ROC-AUC = 0.68). Conclusion: Transfer learning is helpful for learning a better representation of deep features, and improves the prediction of occult invasive disease in DCIS.

  12. Clinicopathologic, Immunohistochemical, and Molecular Features of Histiocytoid Sweet Syndrome.

    PubMed

    Alegría-Landa, Victoria; Rodríguez-Pinilla, Socorro María; Santos-Briz, Angel; Rodríguez-Peralto, José Luis; Alegre, Victor; Cerroni, Lorenzo; Kutzner, Heinz; Requena, Luis

    2017-07-01

    Histiocytoid Sweet syndrome is a rare histopathologic variant of Sweet syndrome. The nature of the histiocytoid infiltrate has generated considerable controversy in the literature. The main goal of this study was to conduct a comprehensive overview of the immunohistochemical phenotype of the infiltrate in histiocytoid Sweet syndrome. We also analyze whether this variant of Sweet syndrome is more frequently associated with hematologic malignancies than classic Sweet syndrome. A retrospective case series study of the clinicopathologic, immunohistochemical, and molecular features of 33 patients with a clinicopathologic diagnosis of histiocytoid Sweet syndrome was conducted in the dermatology departments of 5 university hospitals and a private laboratory of dermatopathology. The clinical, histopathological, immunohistochemical, and follow-up features of 33 patients with histiocytoid Sweet syndrome were analyzed. In some cases, cytogenetic studies of the dermal infiltrate were also performed. We compare our findings with those of the literature. The dermal infiltrate from the 33 study patients (20 female; median age, 49 years; age range, 5-93 years; and 13 male; median age, 42 years; age range, 4-76 years) was mainly composed of myeloperoxidase-positive immature myelomonocytic cells with histiocytoid morphology. No cytogenetic anomalies were found in the infiltrate except in 1 case in which neoplastic cells of chronic myelogenous leukemia were intermingled with the cells of histiocytoid Sweet syndrome. Authentic histiocytes were also found in most cases, with a mature immunoprofile, but they appeared to be a minor component of the infiltrate. Histiocytoid Sweet syndrome was not more frequently associated with hematologic malignancies than classic neutrophilic Sweet syndrome. The dermal infiltrate of cutaneous lesions of histiocytoid Sweet syndrome is composed mostly of immature cells of myeloid lineage. This infiltrate should not be interpreted as leukemia cutis.

  13. mpMoRFsDB: a database of molecular recognition features in membrane proteins.

    PubMed

    Gypas, Foivos; Tsaousis, Georgios N; Hamodrakas, Stavros J

    2013-10-01

    Molecular recognition features (MoRFs) are small, intrinsically disordered regions in proteins that undergo a disorder-to-order transition on binding to their partners. MoRFs are involved in protein-protein interactions and may function as the initial step in molecular recognition. The aim of this work was to collect, organize and store all membrane proteins that contain MoRFs. Membrane proteins constitute ∼30% of fully sequenced proteomes and are responsible for a wide variety of cellular functions. MoRFs were classified according to their secondary structure, after interacting with their partners. We identified MoRFs in transmembrane and peripheral membrane proteins. The position of transmembrane protein MoRFs was determined in relation to a protein's topology. All information was stored in a publicly available mySQL database with a user-friendly web interface. A Jmol applet is integrated for visualization of the structures. mpMoRFsDB provides valuable information related to disorder-based protein-protein interactions in membrane proteins. http://bioinformatics.biol.uoa.gr/mpMoRFsDB

  14. Technique Feature Analysis or Involvement Load Hypothesis: Estimating Their Predictive Power in Vocabulary Learning.

    PubMed

    Gohar, Manoochehr Jafari; Rahmanian, Mahboubeh; Soleimani, Hassan

    2018-02-05

    Vocabulary learning has always been a great concern and has attracted the attention of many researchers. Among the vocabulary learning hypotheses, the involvement load hypothesis and technique feature analysis have been proposed, which attempt to bring concepts such as noticing, motivation, and generation into focus. In the current study, 90 high-proficiency EFL students were assigned to three vocabulary tasks (sentence making, composition, and reading comprehension) in order to examine the power of the involvement load hypothesis and technique feature analysis frameworks in predicting vocabulary learning. The results showed that the involvement load hypothesis was not a good predictor, whereas technique feature analysis was a good predictor of pretest-to-posttest score change but not of during-task activity. The implications of the results will be discussed in the light of preparing vocabulary tasks.

  15. Stargardt disease: clinical features, molecular genetics, animal models and therapeutic options.

    PubMed

    Tanna, Preena; Strauss, Rupert W; Fujinami, Kaoru; Michaelides, Michel

    2017-01-01

    Stargardt disease (STGD1; MIM 248200) is the most prevalent inherited macular dystrophy and is associated with disease-causing sequence variants in the gene ABCA4. Significant advances have been made over the last 10 years in our understanding of both the clinical and molecular features of STGD1, and also the underlying pathophysiology, which has culminated in ongoing and planned human clinical trials of novel therapies. The aims of this review are to describe the detailed phenotypic and genotypic characteristics of the disease, conventional and novel imaging findings, current knowledge of animal models and pathogenesis, and the multiple avenues of intervention being explored. Published by the BMJ Publishing Group Limited.

  16. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches.

    PubMed

    Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R² = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R² = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity

  17. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches

    PubMed Central

    Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R² = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R² = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility
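
    For readers unfamiliar with the two classification metrics quoted in this record, the sketch below computes accuracy and the Matthews correlation coefficient from a binary confusion matrix using the standard textbook formulas. The function names and the example counts are illustrative assumptions, not values from the ToxiM study.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.
    Returns 0.0 when a marginal total is zero (the usual convention)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Hypothetical counts, chosen only to illustrate the calculation.
acc = accuracy(tp=90, tn=95, fp=5, fn=10)
mcc = matthews_corrcoef(tp=90, tn=95, fp=5, fn=10)
```

    Unlike accuracy, the coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for unbalanced toxin/non-toxin sets.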

  18. Accuracy of gap analysis habitat models in predicting physical features for wildlife-habitat associations in the southwest U.S.

    USGS Publications Warehouse

    Boykin, K.G.; Thompson, B.C.; Propeck-Gray, S.

    2010-01-01

    Despite widespread and long-standing efforts to model wildlife-habitat associations using remotely sensed and other spatially explicit data, there are relatively few evaluations of the performance of variables included in predictive models relative to actual features on the landscape. As part of the National Gap Analysis Program, we specifically examined physical site features at randomly selected sample locations in the Southwestern U.S. to assess degree of concordance with predicted features used in modeling vertebrate habitat distribution. Our analysis considered hypotheses about relative accuracy with respect to 30 vertebrate species selected to represent the spectrum of habitat generalist to specialist and categorization of site by relative degree of conservation emphasis accorded to the site. Overall comparison of 19 variables observed at 382 sample sites indicated ≥60% concordance for 12 variables. Directly measured or observed variables (slope, soil composition, rock outcrop) generally displayed high concordance, while variables that required judgments regarding descriptive categories (aspect, ecological system, landform) were less concordant. There were no differences detected in concordance among taxa groups, degree of specialization or generalization of selected taxa, or land conservation categorization of sample sites with respect to all sites. We found no support for the hypothesis that accuracy of habitat models is inversely related to degree of taxa specialization when model features for a habitat specialist could be more difficult to represent spatially. Likewise, we did not find support for the hypothesis that physical features will be predicted with higher accuracy on lands with greater dedication to biodiversity conservation than on other lands because of relative differences regarding available information. Accuracy generally was similar (>60%) to that observed for land cover mapping at the ecological system level. These patterns demonstrate

  19. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

    PubMed

    Jones, David T; Kandathil, Shaun M

    2018-04-26

    In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov.
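
    The "pair frequency or covariance data" mentioned in this abstract can be illustrated with a small sketch that computes, for each pair of alignment columns i < j and each residue pair (a, b), the quantity f_ij(a, b) - f_i(a) * f_j(b). This is a simplified assumption about the input features: sequence weighting, gap handling and pseudocounts, which the actual DeepCov preprocessing may apply, are omitted, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_covariance(alignment):
    """Return {(i, j, a, b): f_ij(a, b) - f_i(a) * f_j(b)} for columns i < j.
    `alignment` is a list of equal-length aligned sequences (unweighted)."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Single-column residue counts.
    single = [Counter(seq[i] for seq in alignment) for i in range(length)]
    cov = {}
    for i, j in combinations(range(length), 2):
        # Joint counts of residue pairs observed in columns i and j.
        pair = Counter((seq[i], seq[j]) for seq in alignment)
        for a in single[i]:
            for b in single[j]:
                f_ab = pair[(a, b)] / n_seqs
                f_a = single[i][a] / n_seqs
                f_b = single[j][b] / n_seqs
                cov[(i, j, a, b)] = f_ab - f_a * f_b
    return cov

# Toy alignment in which the two columns co-vary perfectly.
cov = pair_covariance(["AC", "AC", "GT", "GT"])
```

    In the toy alignment, co-occurring residue pairs get positive values and pairs that never co-occur get negative values; for statistically independent columns all values are zero.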

  20. Ventromedial Frontal Cortex Is Critical for Guiding Attention to Reward-Predictive Visual Features in Humans.

    PubMed

    Vaidya, Avinash R; Fellows, Lesley K

    2015-09-16

    Adaptively interacting with our environment requires extracting information that will allow us to successfully predict reward. This can be a challenge, particularly when there are many candidate cues, and when rewards are probabilistic. Recent work has demonstrated that visual attention is allocated to stimulus features that have been associated with reward on previous trials. The ventromedial frontal lobe (VMF) has been implicated in learning in dynamic environments of this kind, but the mechanism by which this region influences this process is not clear. Here, we hypothesized that the VMF plays a critical role in guiding attention to reward-predictive stimulus features based on feedback. We tested the effects of VMF damage in human subjects on a visual search task in which subjects were primed to attend to task-irrelevant colors associated with different levels of reward, incidental to the search task. Consistent with previous work, we found that distractors had a greater influence on reaction time when they appeared in colors associated with high reward in the previous trial compared with colors associated with low reward in healthy control subjects and patients with prefrontal damage sparing the VMF. However, this reward modulation of attentional priming was absent in patients with VMF damage. Thus, an intact VMF is necessary for directing attention based on experience with cue-reward associations. We suggest that this region plays a role in selecting reward-predictive cues to facilitate future learning. There has been a swell of interest recently in the ventromedial frontal cortex (VMF), a brain region critical to associative learning. However, the underlying mechanism by which this region guides learning is not well understood. Here, we tested the effects of damage to this region in humans on a task in which rewards were linked incidentally to visual features, resulting in trial-by-trial attentional priming. Controls and subjects with prefrontal damage

  1. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies.

    PubMed

    Hansen, Katja; Montavon, Grégoire; Biegler, Franziska; Fazli, Siamac; Rupp, Matthias; Scheffler, Matthias; von Lilienfeld, O Anatole; Tkatchenko, Alexandre; Müller, Klaus-Robert

    2013-08-13

    The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods' performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given together with pitfalls and challenges when applying machine learning approaches to the prediction of quantum-mechanical observables.

  2. Predicting Presynaptic and Postsynaptic Neurotoxins by Developing Feature Selection Technique

    PubMed Central

    Yang, Yunchun; Zhang, Chunmei; Chen, Rong; Huang, Po

    2017-01-01

    Presynaptic and postsynaptic neurotoxins are proteins which act at the presynaptic and postsynaptic membrane. Correctly predicting presynaptic and postsynaptic neurotoxins will provide important clues for drug-target discovery and drug design. In this study, we developed a theoretical method to discriminate presynaptic neurotoxins from postsynaptic neurotoxins. A strict and objective benchmark dataset was constructed to train and test our proposed model. The dipeptide composition was used to formulate neurotoxin samples. The analysis of variance (ANOVA) was proposed to find out the optimal feature set which can produce the maximum accuracy. In the jackknife cross-validation test, the overall accuracy of 94.9% was achieved. We believe that the proposed model will provide important information to study neurotoxins. PMID:28303250
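
    The dipeptide composition used above to encode each neurotoxin can be sketched as a 400-dimensional vector of normalized counts of adjacent residue pairs. This is a generic illustration of the encoding, assuming the 20 standard amino acids; the function name is hypothetical and the ANOVA-based feature ranking from the abstract is not reproduced.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence.
    Pairs containing non-standard residues are skipped."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for k in range(len(seq) - 1):
        pair = seq[k:k + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]

features = dipeptide_composition("ACAC")  # contains AC, CA, AC
```

    Each protein then becomes a fixed-length vector regardless of its sequence length, which is what allows a feature selection step to rank individual dipeptides.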

  3. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.

  4. Predicting protein-protein interactions by combing various sequence- derived features into the general form of Chou's Pseudo amino acid composition.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2012-05-01

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information. These features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.

  5. New molecular features of cowpea bean (Vigna unguiculata, l. Walp) β-vignin.

    PubMed

    de Souza Ferreira, Ederlan; Capraro, Jessica; Sessa, Fabio; Magni, Chiara; Demonte, Aureluce; Consonni, Alessandro; Augusto Neves, Valdir; Maffud Cilli, Eduardo; Duranti, Marcello; Scarafoni, Alessio

    2018-02-01

    Cowpea seed β-vignin, a vicilin-like globulin, has been shown to exert various favourable health effects, including blood cholesterol reduction in animal models. A simple, scalable enrichment procedure is needed for further studies and tailored applications of this seed protein. A chromatography-independent fractionation method yielding a protein preparation with a high degree of homogeneity was used. Further purification was pursued to deepen the molecular characterisation of β-vignin. The results showed: (i) differing glycosylation patterns of the two constituent polypeptides, in agreement with amino acid sequence features; (ii) the seed accumulation of a gene product never identified before; (iii) metal binding capacity of the native protein, a property observed in only a few other legume seed vicilins.

  6. Use of multiple picosecond high-mass molecular dynamics simulations to predict crystallographic B-factors of folded globular proteins.

    PubMed

    Pang, Yuan-Ping

    2016-09-01

    Predicting crystallographic B-factors of a protein from a conventional molecular dynamics simulation is challenging, in part because the B-factors calculated through sampling the atomic positional fluctuations in a picosecond molecular dynamics simulation are unreliable, and the sampling of a longer simulation yields overly large root mean square deviations between calculated and experimental B-factors. This article reports improved B-factor prediction achieved by sampling the atomic positional fluctuations in multiple picosecond molecular dynamics simulations that use uniformly increased atomic masses by 100-fold to increase time resolution. Using the third immunoglobulin-binding domain of protein G, bovine pancreatic trypsin inhibitor, ubiquitin, and lysozyme as model systems, the B-factor root mean square deviations (mean ± standard error) of these proteins were 3.1 ± 0.2 to 9 ± 1 Å² for Cα and 7.3 ± 0.9 to 9.6 ± 0.2 Å² for Cγ, when the sampling was done for each of these proteins over 20 distinct, independent, and 50-picosecond high-mass molecular dynamics simulations with AMBER forcefield FF12MC or FF14SB. These results suggest that sampling the atomic positional fluctuations in multiple picosecond high-mass molecular dynamics simulations may be conducive to a priori prediction of crystallographic B-factors of a folded globular protein.
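
    The link between sampled positional fluctuations and B-factors can be sketched with the standard isotropic relation B = (8π²/3)⟨Δr²⟩, followed by a root mean square deviation against experimental values. The abstract does not spell out its exact protocol, so this is an assumption-laden illustration: frames are assumed to be already superimposed, and the function names are hypothetical.

```python
import math

def bfactor_from_positions(positions):
    """Isotropic B-factor (in Å²) of one atom from its sampled positions.
    `positions` is a list of (x, y, z) tuples in Å from aligned frames."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    # Mean square fluctuation about the average position.
    msf = sum(sum((p[k] - mean[k]) ** 2 for k in range(3))
              for p in positions) / n
    return (8.0 * math.pi ** 2 / 3.0) * msf

def rmsd(predicted, experimental):
    """Root mean square deviation between two equal-length value lists."""
    if len(predicted) != len(experimental):
        raise ValueError("lists must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(predicted, experimental))
                     / len(predicted))

# Toy example: an atom sampled at two positions 2 Å apart.
b = bfactor_from_positions([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

    In a multi-simulation setting such as the one described, the positions from all short trajectories would presumably be pooled or the per-simulation B-factors averaged before computing the deviation from experiment; which of these the authors used is not stated in the abstract.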

  7. Similarity-based Regularized Latent Feature Model for Link Prediction in Bipartite Networks.

    PubMed

    Wang, Wenjun; Chen, Xue; Jiao, Pengfei; Jin, Di

    2017-12-05

    Link prediction is an attractive research topic in the field of data mining and has significant applications in improving the performance of recommendation systems and exploring the evolving mechanisms of complex networks. A variety of complex systems in the real world can be abstractly represented as bipartite networks, in which there are two types of nodes and no links connect nodes of the same type. In this paper, we propose a framework for link prediction in bipartite networks by combining the similarity-based structure and the latent feature model from a new perspective. The framework is called Similarity Regularized Nonnegative Matrix Factorization (SRNMF), which explicitly takes the local characteristics into consideration and encodes the geometrical information of the networks by constructing a similarity-based matrix. We also develop an iterative scheme to solve the objective function based on gradient descent. Extensive experiments on a variety of real-world bipartite networks show that the proposed framework of link prediction has a more competitive, preferable and stable performance in comparison with the state-of-the-art methods.

  8. Predictive value of initial FDG-PET features for treatment response and survival in esophageal cancer patients treated with chemo-radiation therapy using a random forest classifier.

    PubMed

    Desbordes, Paul; Ruan, Su; Modzelewski, Romain; Pineau, Pascal; Vauclin, Sébastien; Gouel, Pierrick; Michel, Pierre; Di Fiore, Frédéric; Vera, Pierre; Gardin, Isabelle

    2017-01-01

    In oncology, texture features extracted from positron emission tomography with 18-fluorodeoxyglucose images (FDG-PET) are of increasing interest for predictive and prognostic studies, leading to several tens of features per tumor. To select the best features, the use of a random forest (RF) classifier was investigated. Sixty-five patients with an esophageal cancer treated with a combined chemo-radiation therapy were retrospectively included. All patients underwent a pretreatment whole-body FDG-PET. The patients were followed for 3 years after the end of the treatment. The response assessment was performed 1 month after the end of the therapy. Patients were classified as complete responders and non-complete responders. Sixty-one features were extracted from medical records and PET images. First, Spearman's analysis was performed to eliminate correlated features. Then, the best predictive and prognostic subsets of features were selected using a RF algorithm. These results were compared to those obtained by a Mann-Whitney U test (predictive study) and a univariate Kaplan-Meier analysis (prognostic study). Among the 61 initial features, 28 were not correlated. From these 28 features, the best subset of complementary features found using the RF classifier to predict response was composed of 2 features: metabolic tumor volume (MTV) and homogeneity from the co-occurrence matrix. The corresponding predictive value (AUC = 0.836 ± 0.105, Se = 82 ± 9%, Sp = 91 ± 12%) was higher than the best predictive results found using the Mann-Whitney test: busyness from the gray level difference matrix (P < 0.0001, AUC = 0.810, Se = 66%, Sp = 88%). The best prognostic subset found using RF was composed of 3 features: MTV and 2 clinical features (WHO status and nutritional risk index) (AUC = 0.822 ± 0.059, Se = 79 ± 9%, Sp = 95 ± 6%), while no feature was significantly prognostic according to the Kaplan-Meier analysis. The RF classifier can improve predictive and prognostic values

  9. Incorporation of local structure into kriging models for the prediction of atomistic properties in the water decamer.

    PubMed

    Davie, Stuart J; Di Pasquale, Nicodemo; Popelier, Paul L A

    2016-10-15

    Machine learning algorithms have been demonstrated to predict atomistic properties approaching the accuracy of quantum chemical calculations at significantly less computational cost. Difficulties arise, however, when attempting to apply these techniques to large systems, or systems possessing excessive conformational freedom. In this article, the machine learning method kriging is applied to predict both the intra-atomic and interatomic energies, as well as the electrostatic multipole moments, of the atoms of a water molecule at the center of a 10 water molecule (decamer) cluster. Unlike previous work, where the properties of small water clusters were predicted using a molecular local frame, and where training set inputs (features) were based on atomic index, a variety of feature definitions and coordinate frames are considered here to increase prediction accuracy. It is shown that, for a water molecule at the center of a decamer, no single method of defining features or coordinate schemes is optimal for every property. However, explicitly accounting for the structure of the first solvation shell in the definition of the features of the kriging training set, and centring the coordinate frame on the atom-of-interest will, in general, return better predictions than models that apply the standard methods of feature definition, or a molecular coordinate frame. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

  10. Bond-valence methods for pKa prediction. II. Bond-valence, electrostatic, molecular geometry, and solvation effects

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bickmore, Barry R.; Rosso, Kevin M.; Tadanier, Christopher J.

    2006-08-15

    In a previous contribution, we outlined a method for predicting (hydr)oxy-acid and oxide surface acidity constants based on three main factors: bond valence, Me-O bond ionicity, and molecular shape. Here electrostatics calculations and ab initio molecular dynamics simulations are used to qualitatively show that Me-O bond ionicity controls the extent to which the electrostatic work of proton removal departs from ideality, bond valence controls the extent of solvation of individual functional groups, and bond valence and molecular shape control local dielectric response. These results are consistent with our model of acidity, but completely at odds with other methods of predicting acidity constants for use in multisite complexation models. In particular, our ab initio molecular dynamics simulations of solvated monomers clearly indicate that hydrogen bonding between (hydr)oxo-groups and water molecules adjusts to obey the valence sum rule, rather than maintaining a fixed valence based on the coordination of the oxygen atom as predicted by the standard MUSIC model.

  11. Tracking the Correlation Between CpG Island Methylator Phenotype and Other Molecular Features and Clinicopathological Features in Human Colorectal Cancers: A Systematic Review and Meta-Analysis

    PubMed Central

    Zong, Liang; Abe, Masanobu; Ji, Jiafu; Zhu, Wei-Guo; Yu, Duonan

    2016-01-01

    Objectives: The controversy of CpG island methylator phenotype (CIMP) in colorectal cancers (CRCs) persists, despite many studies that have been conducted on its correlation with molecular and clinicopathological features. To derive a more precise estimate of the strength of this postulated relationship, a meta-analysis was performed. Methods: A comprehensive search for studies reporting molecular and clinicopathological features of CRCs stratified by CIMP was performed within PubMed, EMBASE, and the Cochrane Library. CIMP was defined by either one of the three panels of gene-specific CIMP markers (Weisenberger panel, classic panel, or a mixture panel of the previous two) or the genome-wide DNA methylation profile. The associations of CIMP with outcome parameters were estimated using odds ratio (OR) or weighted mean difference (WMD) or hazard ratios (HRs) with 95% confidence interval (CI) for each study using a fixed effects or random effects model. Results: A total of 29 studies involving 9,393 CRC patients were included for analysis. We observed more BRAF mutations (OR 34.87; 95% CI, 22.49–54.06) and microsatellite instability (MSI) (OR 12.85; 95% CI, 8.84–18.68) in CIMP-positive vs. -negative CRCs, whereas KRAS mutations were less frequent (OR 0.47; 95% CI, 0.30–0.75). Subgroup analysis showed that only the genome-wide methylation profile-defined CIMP subset encompassed all BRAF-mutated CRCs. As expected, CIMP-positive CRCs displayed significant associations with female (OR 0.64; 95% CI, 0.56–0.72), older age at diagnosis (WMD 2.77; 95% CI, 1.15–4.38), proximal location (OR 6.91; 95% CI, 5.17–9.23), mucinous histology (OR 3.81; 95% CI, 2.93–4.95), and poor differentiation (OR 4.22; 95% CI, 2.52–7.08). Although CIMP did not show a correlation with tumor stage (OR 1.10; 95% CI, 0.82–1.46), it was associated with shorter overall survival (HR 1.73; 95% CI, 1.27–2.37). Conclusions: The meta-analysis highlights that CIMP-positive CRCs take their own
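
The abstract above reports pooled odds ratios with 95% confidence intervals under fixed effects or random effects models. As a rough illustration only, the sketch below shows one common way such a pooled estimate can be formed: inverse-variance fixed-effect pooling on the log odds ratio scale. The function name `pool_fixed_effect`, the recovery of standard errors from the reported interval (a normal approximation), and the study values in the usage note are assumptions made for this example; none of them are taken from the paper.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval (assumed convention)


def pool_fixed_effect(studies):
    """Pool odds ratios with inverse-variance weights on the log scale.

    studies: list of (odds_ratio, ci_lower, ci_upper) tuples, one per study.
    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in studies:
        # Recover the standard error of log(OR) from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )
```

For instance, calling `pool_fixed_effect([(2.0, 1.0, 4.0), (2.0, 1.0, 4.0)])` with two made-up, identical studies leaves the point estimate at 2.0 and narrows the interval, which is the expected behavior when evidence is combined. A random effects model would add a between-study variance term to each weight and is not shown here.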

  12. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.

    PubMed

    Zheng, Ce; Kurgan, Lukasz

    2008-10-10

    Beta-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of beta-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the usage of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates recent studies. The Asn, Asp, Gly, and Pro indicate potential beta-turns, while the remaining four amino acids are useful to predict non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Q total, Q predicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Q total barrier and achieves Q total = 80.9%, MCC = 0.47, and Q predicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Experiments show that the proposed method constitutes an improvement over the competing prediction

  13. Prediction of blood-brain partitioning: a model based on molecular electronegativity distance vector descriptors.

    PubMed

    Zhang, Yong-Hong; Xia, Zhi-Ning; Qin, Li-Tang; Liu, Shu-Shen

    2010-09-01

    The objective of this paper is to build a reliable model based on the molecular electronegativity distance vector (MEDV) descriptors for predicting the blood-brain barrier (BBB) permeability and to reveal the effects of the molecular structural segments on the BBB permeability. Using 70 structurally diverse compounds, the partial least squares regression (PLSR) models between the BBB permeability and the MEDV descriptors were developed and validated by the variable selection and modeling based on prediction (VSMP) technique. The estimation ability, stability, and predictive power of a model are evaluated by the estimated correlation coefficient (r), leave-one-out (LOO) cross-validation correlation coefficient (q), and predictive correlation coefficient (R(p)). It has been found that the PLSR model has good quality, r=0.9202, q=0.7956, and R(p)=0.6649 for the M1 model based on the training set of 57 samples. To search for the most important structural factors affecting the BBB permeability of compounds, we performed variable importance in projection (VIP) analysis for the MEDV descriptors. It was found that some structural fragments in compounds, such as -CH(3), -CH(2)-, =CH-, =C, ≡C-, -CH<, =C<, =N-, -NH-, =O, and -OH, are the most important factors affecting the BBB permeability. (c) 2010. Published by Elsevier Inc.
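
The abstract above distinguishes a fitted correlation coefficient (r) from a leave-one-out cross-validated one (q). The sketch below illustrates that distinction under strong simplifying assumptions: it swaps the paper's PLSR on MEDV descriptors for ordinary least squares on a single descriptor, and the `descriptor` and `log_bb` values are invented for the example. The helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def loo_predictions(x, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds


# Invented data: one descriptor value and one permeability value per compound.
descriptor = [0.5, 1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2]
log_bb = [-1.0, -0.7, -0.6, -0.1, 0.2, 0.3, 0.9, 1.0]

slope, intercept = fit_line(descriptor, log_bb)
fitted = [slope * v + intercept for v in descriptor]
r = pearson(fitted, log_bb)  # fit quality on the training data
q = pearson(loo_predictions(descriptor, log_bb), log_bb)  # LOO estimate
```

Because each left-out compound is predicted by a model that never saw it, q is usually a more conservative indicator of predictive power than r, which is consistent with the gap between the two values reported in the abstract.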

  14. Fine-needle aspiration of lipoblastoma: Cytological, molecular, and clinical features.

    PubMed

    Ferreira, Joana; Esteves, Gonçalo; Fonseca, Ricardo; Martins, Carmo; André, Saudade; Lemos, Maria Manuel

    2017-12-01

    Lipoblastomas are rare, benign adipocytic tumors that present mostly during infancy. In about 70% of cases, these tumors carry abnormalities in chromosome 8, mainly leading to rearrangements of the PLAG1 gene. We report a series of histologically proven lipoblastomas with previous fine-needle aspiration (FNA) cytology from 9 patients (n = 10 samples) and describe their clinical, cytological, and molecular features. Our cohort included 5 boys and 4 girls (median age, 2.5 years [range, 10 months to 13 years]) who presented with soft tissue masses in the thorax (n = 3), abdomen (n = 2), axilla (n = 2), and thigh (n = 2). In 1 patient, the FNA diagnosis was inconclusive due to hypocellularity, and in another patient a diagnosis of benign lipomatous tumor was made. In the remaining 8 samples (one of which confirmed relapse), a correct preoperative FNA diagnosis was rendered. Smears were hypo- to moderately cellular and contained fragments of mature adipose tissue with thin branching vessels admixed with some lipoblasts in a myxoid matrix. Spindle cells and naked oval nuclei with no atypia were observed in the background. Of the 4 patients tested for PLAG1 rearrangement using FISH probes, 3 harbored this alteration (1 was made on a FNA smear and 1 was made in a tumor imprint). All the patients are alive and well, except for 1 patient with a retroperitoneal tumor who, after an initial incomplete excision, died of local disease progression. FNA, especially if used together with molecular biology techniques (eg, PLAG1 FISH analysis), is a reliable and accurate diagnostic tool. Cancer Cytopathol 2017;125:934-9. © 2017 American Cancer Society.

  15. Abstract Conceptual Feature Ratings Predict Gaze within Written Word Arrays: Evidence from a Visual Wor(l)d Paradigm

    ERIC Educational Resources Information Center

    Primativo, Silvia; Reilly, Jamie; Crutch, Sebastian J

    2017-01-01

    The Abstract Conceptual Feature (ACF) framework predicts that word meaning is represented within a high-dimensional semantic space bounded by weighted contributions of perceptual, affective, and encyclopedic information. The ACF, like latent semantic analysis, is amenable to distance metrics between any two words. We applied predictions of the ACF…

  16. Gas Sensors Based on Molecular Imprinting Technology.

    PubMed

    Zhang, Yumin; Zhang, Jin; Liu, Qingju

    2017-07-04

    Molecular imprinting technology (MIT), often described as a method of designing a material to remember a target molecular structure (template), is a technique for the creation of molecularly imprinted polymers (MIPs) with custom-made binding sites complementary to the target molecules in shape, size and functional groups. MIT has been successfully applied to analyze, separate and detect macromolecular organic compounds. Furthermore, it has been increasingly applied in assays of biological macromolecules. Owing to its unique features of structure specificity, predictability, recognition and universal application, there has been exploration of the possible application of MIPs in the field of highly selective gas sensors. In this study, we outline the recent advances in gas sensors based on MIT, classify and introduce the existing molecularly imprinted gas sensors, summarize their advantages and disadvantages, and analyze further research directions.

  17. Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.

    PubMed

    Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E

    2018-04-25

    Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.

  18. Computational intelligence models to predict porosity of tablets using minimum features.

    PubMed

    Khalid, Mohammad Hassan; Kazemi, Pezhman; Perez-Gandarillas, Lucia; Michrafy, Abderrahim; Szlęk, Jakub; Jachowicz, Renata; Mendyk, Aleksander

    2017-01-01

    The effects of different formulations and manufacturing process conditions on the physical properties of a solid dosage form are of importance to the pharmaceutical industry. It is vital to have in-depth understanding of the material properties and governing parameters of its processes in response to different formulations. Understanding the mentioned aspects will allow tighter control of the process, leading to implementation of quality-by-design (QbD) practices. Computational intelligence (CI) offers an opportunity to create empirical models that can be used to describe the system and predict future outcomes in silico. CI models can help explore the behavior of input parameters, unlocking deeper understanding of the system. This research endeavor presents CI models to predict the porosity of tablets created by roll-compacted binary mixtures, which were milled and compacted under systematically varying conditions. CI models were created using tree-based methods, artificial neural networks (ANNs), and symbolic regression trained on an experimental data set and screened using root-mean-square error (RMSE) scores. The experimental data were composed of proportion of microcrystalline cellulose (MCC) (in percentage), granule size fraction (in micrometers), and die compaction force (in kilonewtons) as inputs and porosity as an output. The resulting models show impressive generalization ability, with ANNs (normalized root-mean-square error [NRMSE] =1%) and symbolic regression (NRMSE =4%) as the best-performing methods, also exhibiting reliable predictive behavior when presented with a challenging external validation data set (best achieved symbolic regression: NRMSE =3%). Symbolic regression demonstrates the transition from the black box modeling paradigm to more transparent predictive models. Predictive performance and feature selection behavior of CI models hints at the most important variables within this factor space.
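
The abstract above screens and compares models by RMSE and normalized RMSE (NRMSE). A minimal sketch of both metrics follows. Note the assumption: the abstract does not say how the RMSE was normalized, so the sketch divides by the range of the observed values, which is one common convention among several (dividing by the mean is another). The function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse_percent(observed, predicted):
    """RMSE as a percentage of the observed range (assumed normalization)."""
    return 100.0 * rmse(observed, predicted) / (max(observed) - min(observed))
```

With made-up porosity values `observed = [10.0, 20.0, 30.0]` and `predicted = [11.0, 19.0, 31.0]`, every error has magnitude 1, so the RMSE is 1 and the range-normalized NRMSE is 5%.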

  19. TANDEM: a two-stage approach to maximize interpretability of drug response models based on multiple molecular data types.

    PubMed

    Aben, Nanne; Vis, Daniel J; Michaut, Magali; Wessels, Lodewyk F A

    2016-09-01

    Clinical response to anti-cancer drugs varies between patients. A large portion of this variation can be explained by differences in molecular features, such as mutation status, copy number alterations, methylation and gene expression profiles. We show that the classic approach for combining these molecular features (Elastic Net regression on all molecular features simultaneously) results in models that are almost exclusively based on gene expression. The gene expression features selected by the classic approach are difficult to interpret as they often represent poorly studied combinations of genes, activated by aberrations in upstream signaling pathways. To utilize all data types in a more balanced way, we developed TANDEM, a two-stage approach in which the first stage explains response using upstream features (mutations, copy number, methylation and cancer type) and the second stage explains the remainder using downstream features (gene expression). Applying TANDEM to 934 cell lines profiled across 265 drugs (GDSC1000), we show that the resulting models are more interpretable, while retaining the same predictive performance as the classic approach. Using the more balanced contributions per data type as determined with TANDEM, we find that response to MAPK pathway inhibitors is largely predicted by mutation data, while predicting response to DNA damaging agents requires gene expression data, in particular SLFN11 expression. TANDEM is available as an R package on CRAN (for more information, see http://ccb.nki.nl/software/tandem). Contact: m.michaut@nki.nl or l.wessels@nki.nl. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
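
The two-stage idea described above (fit the response on upstream features first, then fit what remains on downstream features) can be sketched in a few lines. This is not the TANDEM R package and does not use Elastic Net: it substitutes plain least squares with a single feature per stage purely to show the residual-fitting structure. All names (`fit_line`, `two_stage_fit`, `two_stage_predict`) and the toy data in the usage note are assumptions for illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def two_stage_fit(upstream, downstream, response):
    """Stage 1 explains the response with the upstream feature;
    stage 2 explains the stage-1 residual with the downstream feature."""
    stage1 = fit_line(upstream, response)
    stage1_pred = [stage1[0] * u + stage1[1] for u in upstream]
    residual = [r - p for r, p in zip(response, stage1_pred)]
    stage2 = fit_line(downstream, residual)
    return stage1, stage2


def two_stage_predict(stage1, stage2, upstream_value, downstream_value):
    """Sum the contributions of both stages for one sample."""
    return (
        stage1[0] * upstream_value
        + stage1[1]
        + stage2[0] * downstream_value
        + stage2[1]
    )
```

As a usage example, take a toy mutation indicator `upstream = [0, 0, 1, 1]`, a toy expression level `downstream = [0, 1, 0, 1]`, and `response = [1, 4, 3, 6]`. Stage 1 attributes a slope of 2 to the upstream feature and stage 2 attributes a slope of 3 to the downstream feature, so the upstream data type gets credit first and the downstream feature only accounts for what is left over.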

  20. Pediatric Eosinophilic Esophagitis Symptom Scores (PEESS® v2.0) identify histologic and molecular correlates of the key clinical features of disease

    PubMed Central

    Martin, Lisa J.; Franciosi, James P.; Collins, Margaret H.; Abonia, J. Pablo; Lee, James J.; Hommel, Kevin A.; Varni, James W.; Grotjan, J. Tommie; Eby, Michael; He, Hua; Marsolo, Keith; Putnam, Philip E.; Garza, Jose M.; Kaul, Ajay; Wen, Ting; Rothenberg, Marc E.

    2015-01-01

    Background The Pediatric Eosinophilic Esophagitis Symptom Score (PEESS® v2.0) measures patient-relevant outcomes. However, whether patient-identified domains (dysphagia, gastrointestinal reflux disease (GERD), nausea/vomiting, and pain) align with clinical symptomology and histopathologic and molecular features of eosinophilic esophagitis (EoE) is unclear. Objective The purpose of this study was to determine if clinical features of EoE, measured through the PEESS® v2.0, associate with histopathologic and molecular features of EoE. This represents a novel approach for analysis of allergic diseases, given the availability of allergic tissue biopsy specimens. Methods We systematically recruited treated and untreated, pediatric patients with EoE (aged 2–18 years) and examined parent proxy–reported symptoms using the PEESS® v2.0. Clinical symptomology was collected by questionnaire. Esophageal biopsy samples were quantified for levels of eosinophils, eosinophil peroxidase (EPX) immunohistochemical staining, and mast cells. Molecular features were assessed by the EoE Diagnostic Panel (94 EoE-related gene transcripts). Associations between domain scores and clinical symptoms and biologic features were analyzed using Wilcoxon Rank Sum and Spearman correlation. Results The PEESS® v2.0 domains correlated to specific parent-reported symptoms: dysphagia (p = 0.0012), GERD (p = 0.0001), and nausea/vomiting (p < 0.0001). Pain correlated with multiple symptoms (p < 0.0005). Dysphagia correlated most strongly with overall histopathology, particularly in the proximal esophagus (p ≤ 0.0049). Markers of esophageal activity (EPX) were significantly associated with dysphagia (strongest r = .37; p = 0.02). Eosinophil levels were more associated with pain (r = 0.27; p=0.06) than for dysphagia (r = 0.24; p = 0.13). The dysphagia domain correlated the most with esophageal gene transcript levels, predominantly with mast cell–specific genes. Conclusion We have 1) established a

  1. Mutational Landscape of cfDNA Identifies Distinct Molecular Features Associated With Therapeutic Response to First-Line Platinum-Based Doublet Chemotherapy in Patients with Advanced NSCLC

    PubMed Central

    Jiang, Tao; Li, Xuefei; Wang, Jianfei; Su, Chunxia; Han, Wenbo; Zhao, Chao; Wu, Fengying; Gao, Guanghui; Li, Wei; Chen, Xiaoxia; Li, Jiayu; Zhou, Fei; Zhao, Jing; Cai, Weijing; Zhang, Henghui; Du, Bo; Zhang, Jun; Ren, Shengxiang; Zhou, Caicun; Yu, Hui; Hirsch, Fred R.

    2017-01-01

    Rationale To investigate whether the mutational landscape of circulating cell-free DNA (cfDNA) could predict and dynamically monitor the response to first-line platinum-based chemotherapy in patients with advanced non-small-cell lung cancer (NSCLC). Methods Eligible patients were included and blood samples were collected from a phase III trial. Both cfDNA fragments and fragmented genomic DNA were extracted for enrichment in a 1.15M size panel covering exon regions of 1,086 genes. Molecular mutational burden (MMB) was calculated to investigate the relationship between molecular features of cfDNA and response to chemotherapy. Results In total, 52 eligible cases were enrolled and their blood samples were prospectively collected at baseline, every cycle of chemotherapy and time of disease progression. At baseline, alterations of 17 genes were found. Patients with partial response (PR) had significantly lower baseline MMB of these genes than those patients with either stable disease (SD) (P = 0.0006) or progression disease (PD) (P = 0.0074). Further analysis revealed that the mutational landscape of cfDNA from pretreatment blood samples were distinctly different among patients with PR vs. SD/PD. For patients with baseline TP53 mutation, those with PR experienced a significant reduction in MMB whereas patients with SD or PD experienced an increase after two, three or four cycles of chemotherapy. Furthermore, patients with low MMB had superior response rate and significantly longer progression-free survival than those with high MMB. Conclusion This study indicated that the mutational landscape of cfDNA has potential clinical value to predict the therapeutic response to first-line platinum-based doublet chemotherapy in NSCLC patients. At the single gene level, dynamic change of molecular mutational burden of TP53 is valuable to monitor efficacy (and, therefore, might aid in early recognition of resistance and relapse) in patients harboring this mutation at baseline. 

  2. Diagnostic, prognostic and predictive relevance of molecular markers in gliomas.

    PubMed

    Brandner, Sebastian; von Deimling, Andreas

    2015-10-01

    The advances of genome-wide 'discovery platforms' and the increasing affordability of the analysis of significant sample sizes have led to the identification of novel mutations in brain tumours that became diagnostically and prognostically relevant. The development of mutation-specific antibodies has facilitated the introduction of these convenient biomarkers into most neuropathology laboratories and has changed our approach to brain tumour diagnostics. However, tissue diagnosis will remain an essential first step for the correct stratification for subsequent molecular tests, and the combined interpretation of the molecular and tissue diagnosis ideally remains with the neuropathologist. This overview will help our understanding of the pathobiology of common intrinsic brain tumours in adults and help guide which molecular tests can supplement and refine the tissue diagnosis of the most common adult intrinsic brain tumours. This article will discuss the relevance of 1p/19q codeletions, IDH1/2 mutations, BRAF V600E and BRAF fusion mutations, and the more recently discovered mutations in ATRX, H3F3A, TERT, CIC and FUBP1, for diagnosis, prognostication and predictive testing. In a tumour-specific topic, the role of mitogen-activated protein kinase pathway mutations in the pathogenesis of pilocytic astrocytomas will be covered. © 2015 British Neuropathological Society.

  3. Morphologic, Immunophenotypic, and Molecular Features of Epithelial Ovarian Cancer.

    PubMed

    Ramalingam, Preetha

    2016-02-01

    Epithelial ovarian cancer comprises a heterogeneous group of tumors. The four most common subtypes are serous, endometrioid, clear cell, and mucinous carcinoma. Less common are transitional cell tumors, including transitional cell carcinoma and malignant Brenner tumor. While in the past these subtypes were grouped together and designated as epithelial ovarian tumors, these tumor types are now known to be separate entities with distinct clinical and biologic behaviors. From a therapeutic standpoint, current regimens employ standard chemotherapy based on stage and grade rather than histotype. However, this landscape may change in the era of personalized therapy, given that most subtypes (with the exception of high-grade serous carcinoma) are relatively resistant to chemotherapy. It is now well-accepted that high-grade and low-grade serous carcinomas represent distinct entities rather than a spectrum of the same tumor type. While they are similar in that patients present with advanced-stage disease, their histologic and molecular features are entirely different. High-grade serous carcinoma is associated with TP53 mutations, whereas low-grade serous carcinomas are associated with BRAF and KRAS mutations. Endometrioid and clear cell carcinomas typically present as early-stage disease and are frequently associated with endometriosis. Mucinous carcinomas typically present as large unilateral masses and often show areas of mucinous cystadenoma and mucinous borderline tumor. It must be emphasized that primary mucinous carcinomas are uncommon tumors, and metastasis from other sites such as the appendix, colon, stomach, and pancreaticobiliary tract must always be considered in the differential diagnosis. Lastly, transitional cell tumors of the ovary, specifically malignant Brenner tumors, are quite uncommon. High-grade serous carcinoma often has a transitional cell pattern, and adequate sampling in most cases shows more typical areas of serous carcinoma. 
Immunohistochemical

  4. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

    PubMed

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-05

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
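
The abstract above summarizes classifier quality with the Matthews correlation coefficient (MCC) and accuracy. Both follow directly from the four confusion-matrix counts, as the sketch below shows. The counts in the usage note are invented and are not the DHSpred results; returning 0.0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when any marginal total is zero; 0.0 is used by convention.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, with made-up counts of 40 true positives, 45 true negatives, 5 false positives and 10 false negatives, the accuracy is 0.85 while the MCC is about 0.70. The MCC is lower because it accounts for all four cells of the confusion matrix, which makes it more informative than accuracy when the classes are imbalanced.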

  5. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

    PubMed Central

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-01

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html PMID:29416743

  6. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments

    PubMed Central

    Zheng, Ce; Kurgan, Lukasz

    2008-01-01

    Background β-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the usage of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Conclusion Experiments show that the proposed method constitutes an improvement over the competing

  7. Quantitative imaging features: extension of the oncology medical image database

    NASA Astrophysics Data System (ADS)

    Patel, M. N.; Looney, P. T.; Young, K. C.; Halling-Brown, M. D.

    2015-03-01

    Radiological imaging is fundamental within the healthcare industry and has become routinely adopted for diagnosis, disease monitoring and treatment planning. With the advent of digital imaging modalities and the rapid growth in both diagnostic and therapeutic imaging, the ability to harness this large influx of data is of paramount importance. The Oncology Medical Image Database (OMI-DB) was created to provide a centralized, fully annotated dataset for research. The database contains both processed and unprocessed images, associated data, and annotations and, where applicable, expert-determined ground truths describing features of interest. Medical imaging provides the ability to detect and localize many changes that are important to determine whether a disease is present or a therapy is effective by depicting alterations in anatomic, physiologic, biochemical or molecular processes. Quantitative imaging features are sensitive, specific, accurate and reproducible imaging measures of these changes. Here, we describe an extension to the OMI-DB whereby a range of imaging features and descriptors are pre-calculated using a high throughput approach. The ability to calculate multiple imaging features and data from the acquired images would be valuable and facilitate further research applications investigating detection, prognosis, and classification. The resultant data store contains more than 10 million quantitative features as well as features derived from CAD predictions. These data can be used to build predictive models to aid image classification and treatment response assessment, as well as to identify prognostic imaging biomarkers.

  8. Shell feature: a new radiomics descriptor for predicting distant failure after radiotherapy in non-small cell lung cancer and cervix cancer

    NASA Astrophysics Data System (ADS)

    Hao, Hongxia; Zhou, Zhiguo; Li, Shulong; Maquilan, Genevieve; Folkert, Michael R.; Iyengar, Puneeth; Westover, Kenneth D.; Albuquerque, Kevin; Liu, Fang; Choy, Hak; Timmerman, Robert; Yang, Lin; Wang, Jing

    2018-05-01

    Distant failure is the main cause of human cancer-related mortalities. To develop a model for predicting distant failure in non-small cell lung cancer (NSCLC) and cervix cancer (CC) patients, a shell feature, consisting of outer voxels around the tumor boundary, was constructed using pre-treatment positron emission tomography (PET) images from 48 NSCLC patients who received stereotactic body radiation therapy and 52 CC patients who underwent external beam radiation therapy and concurrent chemotherapy followed by high-dose-rate intracavitary brachytherapy. The hypothesis behind this feature is that non-invasive and invasive tumors may have different morphologic patterns in the tumor periphery, in turn reflecting the differences in radiological presentations in the PET images. The utility of the shell was evaluated by the support vector machine classifier in comparison with intensity, geometry, gray level co-occurrence matrix-based texture, neighborhood gray tone difference matrix-based texture, and a combination of these four features. The results were assessed in terms of accuracy, sensitivity, specificity, and AUC. Collectively, the shell feature showed better predictive performance than all the other features for distant failure prediction in both NSCLC and CC cohorts.
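
    The record above reports accuracy, sensitivity and specificity for a binary classifier. As a minimal sketch of how these three quantities are conventionally computed from true and predicted labels (the function name and the 0/1 label encoding are assumptions made for illustration and do not come from the paper; AUC is not covered here):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = event, 0 = no event)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of events correctly flagged
        "specificity": tn / (tn + fp),  # fraction of non-events correctly cleared
    }
```

    With hypothetical labels y_true = [1, 1, 0, 0, 0] and y_pred = [1, 0, 0, 0, 1], the sketch gives an accuracy of 0.6, a sensitivity of 0.5 and a specificity of 2/3.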

  9. Modeling and Prediction of Drug Dispersability in Polyvinylpyrrolidone-Vinyl Acetate Copolymer Using a Molecular Descriptor.

    PubMed

    DeBoyace, Kevin; Buckner, Ira S; Gong, Yuchuan; Ju, Tzu-Chi Rob; Wildfong, Peter L D

    2018-01-01

    The expansion of a novel in silico model for the prediction of the dispersability of 18 model compounds with polyvinylpyrrolidone-vinyl acetate copolymer is described. The molecular descriptor R3m (atomic mass weighted 3rd-order autocorrelation index) is shown to be predictive of the formation of amorphous solid dispersions at 2 drug loadings (15% and 75% w/w) using 2 preparation methods (melt quenching and solvent evaporation using a rotary evaporator). Cosolidified samples were characterized using a suite of analytical techniques, which included differential scanning calorimetry, powder X-ray diffraction, pair distribution function analysis, polarized light microscopy, and hot stage microscopy. Logistic regression was applied, where appropriate, to model the success and failure of compound dispersability in polyvinylpyrrolidone-vinyl acetate copolymer. R3m had combined prediction accuracy greater than 90% for tested samples. The usefulness of this descriptor appears to be associated with the presence of heavy atoms in the molecular structure of the active pharmaceutical ingredient, and their location with respect to the geometric center of the molecule. Given the higher electronegativity and atomic volume of these types of atoms, it is hypothesized that they may impact the molecular mobility of the active pharmaceutical ingredient, or increase the likelihood of forming nonbonding interactions with the carrier polymer. Copyright © 2018 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.

  10. Density functional theory for prediction of far-infrared vibrational frequencies: molecular crystals of astrophysical interest

    NASA Astrophysics Data System (ADS)

    Ennis, C.; Auchettl, R.; Appadoo, D. R. T.; Robertson, E. G.

    2017-11-01

    Solid-state density functional theory code has been implemented for the structure optimization of crystalline methanol, acetaldehyde and acetic acid and for the calculation of infrared frequencies. The results are compared to thin film spectra obtained from low-temperature experiments performed at the Australian Synchrotron. Harmonic frequency calculations of the internal modes calculated at the B3LYP-D3/m-6-311G(d) level show higher deviation from infrared experiment than more advanced theory applied to the gas phase. Importantly for the solid-state, the simulation of low-frequency molecular lattice modes closely resembles the observed far-infrared features after application of a 0.92 scaling factor. This allowed experimental peaks to be assigned to specific translation and libration modes, including acetaldehyde and acetic acid lattice features for the first time. These frequency calculations have been performed without the need for supercomputing resources that are required for large molecular clusters using comparable levels of theory. This new theoretical approach will find use for the rapid characterization of intermolecular interactions and bonding in crystals, and the assignment of far-infrared spectra for crystalline samples such as pharmaceuticals and molecular ices. One interesting application may be for the detection of species of prebiotic interest on the surfaces of Kuiper-Belt and Trans-Neptunian Objects. At such locations, the three small organic molecules studied here could reside in their crystalline phase. The far-infrared spectra for their low-temperature solid phases are collected under planetary conditions, allowing us to compile and assign their most intense spectral features to assist future far-infrared surveys of icy Solar system surfaces.

  11. Quantitative imaging features of pretreatment CT predict volumetric response to chemotherapy in patients with colorectal liver metastases.

    PubMed

    Creasy, John M; Midya, Abhishek; Chakraborty, Jayasree; Adams, Lauryn B; Gomes, Camilla; Gonen, Mithat; Seastedt, Kenneth P; Sutton, Elizabeth J; Cercek, Andrea; Kemeny, Nancy E; Shia, Jinru; Balachandran, Vinod P; Kingham, T Peter; Allen, Peter J; DeMatteo, Ronald P; Jarnagin, William R; D'Angelica, Michael I; Do, Richard K G; Simpson, Amber L

    2018-06-19

    This study investigates whether quantitative image analysis of pretreatment CT scans can predict volumetric response to chemotherapy for patients with colorectal liver metastases (CRLM). Patients treated with chemotherapy for CRLM (hepatic artery infusion (HAI) combined with systemic or systemic alone) were included in the study. Patients were imaged at baseline and approximately 8 weeks after treatment. Response was measured as the percentage change in tumour volume from baseline. Quantitative imaging features were derived from the index hepatic tumour on pretreatment CT, and features statistically significant on univariate analysis were included in a linear regression model to predict volumetric response. The regression model was constructed from 70% of data, while 30% were reserved for testing. Test data were input into the trained model. Model performance was evaluated with mean absolute prediction error (MAPE) and R². Clinicopathologic factors were assessed for correlation with response. 157 patients were included, split into training (n = 110) and validation (n = 47) sets. MAPE from the multivariate linear regression model was 16.5% (R² = 0.774) and 21.5% in the training and validation sets, respectively. Stratified by HAI utilisation, MAPE in the validation set was 19.6% for HAI and 25.1% for systemic chemotherapy alone. Clinical factors associated with differences in median tumour response were treatment strategy, systemic chemotherapy regimen, age and KRAS mutation status (p < 0.05). Quantitative imaging features extracted from pretreatment CT are promising predictors of volumetric response to chemotherapy in patients with CRLM. Pretreatment predictors of response have the potential to better select patients for specific therapies. • Colorectal liver metastases (CRLM) are downsized with chemotherapy but predicting the patients that will respond to chemotherapy is currently not possible. • Heterogeneity and enhancement patterns of CRLM can be

  12. Computational intelligence models to predict porosity of tablets using minimum features

    PubMed Central

    Khalid, Mohammad Hassan; Kazemi, Pezhman; Perez-Gandarillas, Lucia; Michrafy, Abderrahim; Szlęk, Jakub; Jachowicz, Renata; Mendyk, Aleksander

    2017-01-01

    The effects of different formulations and manufacturing process conditions on the physical properties of a solid dosage form are of importance to the pharmaceutical industry. It is vital to have in-depth understanding of the material properties and governing parameters of its processes in response to different formulations. Understanding the mentioned aspects will allow tighter control of the process, leading to implementation of quality-by-design (QbD) practices. Computational intelligence (CI) offers an opportunity to create empirical models that can be used to describe the system and predict future outcomes in silico. CI models can help explore the behavior of input parameters, unlocking deeper understanding of the system. This research endeavor presents CI models to predict the porosity of tablets created by roll-compacted binary mixtures, which were milled and compacted under systematically varying conditions. CI models were created using tree-based methods, artificial neural networks (ANNs), and symbolic regression trained on an experimental data set and screened using root-mean-square error (RMSE) scores. The experimental data were composed of proportion of microcrystalline cellulose (MCC) (in percentage), granule size fraction (in micrometers), and die compaction force (in kilonewtons) as inputs and porosity as an output. The resulting models show impressive generalization ability, with ANNs (normalized root-mean-square error [NRMSE] = 1%) and symbolic regression (NRMSE = 4%) as the best-performing methods, also exhibiting reliable predictive behavior when presented with a challenging external validation data set (best achieved symbolic regression: NRMSE = 3%). Symbolic regression demonstrates the transition from the black box modeling paradigm to more transparent predictive models. Predictive performance and feature selection behavior of CI models hint at the most important variables within this factor space. PMID:28138223
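
    The record above screens models by RMSE and reports NRMSE as a percentage. The abstract does not say how the error was normalized, so the following is only a sketch under the assumption that RMSE is divided by the range of the observed values (dividing by the mean is another common convention). The function names are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def nrmse_percent(observed, predicted):
    """RMSE as a percentage of the observed range (assumed normalization)."""
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values must not all be identical")
    return 100.0 * rmse(observed, predicted) / value_range
```

    For hypothetical porosities of 10, 20 and 30 predicted as 11, 19 and 31, the RMSE is 1 and the range is 20, giving an NRMSE of 5%.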

  13. Molecular Markers for Breast Cancer: Prediction on Tumor Behavior

    PubMed Central

    Banin Hirata, Bruna Karina; Oda, Julie Massayo Maeda; Losi Guembarovski, Roberta; Ariza, Carolina Batista; de Oliveira, Carlos Eduardo Coral; Watanabe, Maria Angelica Ehara

    2014-01-01

    Breast cancer is one of the most common cancers with greater than 1,300,000 cases and 450,000 deaths each year worldwide. The development of breast cancer involves a progression through intermediate stages to invasive carcinoma and finally to metastatic disease. Given the variability in clinical progression, the identification of markers that could predict the tumor behavior is particularly important in breast cancer. The determination of tumor markers is a useful tool for clinical management in cancer patients, assisting in diagnosis, staging, evaluation of therapeutic response, detection of recurrence and metastasis, and development of new treatment modalities. In this context, this review aims to discuss the main tumor markers in breast carcinogenesis. The most well-established breast molecular markers with prognostic and/or therapeutic value like hormone receptors, HER-2 oncogene, Ki-67, and p53 proteins, and the genes for hereditary breast cancer will be presented. Furthermore, this review shows the new molecular targets in breast cancer: CXCR4, caveolin, miRNA, and FOXP3, as promising candidates for future development of effective and targeted therapies, also with lower toxicity. PMID:24591761

  14. Sparse generalized linear model with L0 approximation for feature selection and prediction with big omics data.

    PubMed

    Liu, Zhenqiu; Sun, Fengzhu; McGovern, Dermot P

    2017-01-01

    Feature selection and prediction are the most important tasks for big data mining. The common strategies for feature selection in big data mining are L1, SCAD and MC+. However, none of the existing algorithms optimizes L0, which penalizes the number of nonzero features directly. In this paper, we develop a novel sparse generalized linear model (GLM) with L0 approximation for feature selection and prediction with big omics data. The proposed approach approximates the L0 optimization directly. Even though the original L0 problem is non-convex, the problem is approximated by sequential convex optimizations with the proposed algorithm. The proposed method is easy to implement with only several lines of code. Novel adaptive ridge algorithms (L0ADRIDGE) for L0 penalized GLM with ultra high dimensional big data are developed. The proposed approach outperforms the other cutting edge regularization methods including SCAD and MC+ in simulations. When it is applied to integrated analysis of mRNA, microRNA, and methylation data from TCGA ovarian cancer, multilevel gene signatures associated with suboptimal debulking are identified simultaneously. The biological significance and potential clinical importance of those genes are further explored. The developed software L0ADRIDGE in MATLAB is available at https://github.com/liuzqx/L0adridge.

  15. Molecular classification of patients with grade II/III glioma using quantitative MRI characteristics.

    PubMed

    Bahrami, Naeim; Hartman, Stephen J; Chang, Yu-Hsuan; Delfanti, Rachel; White, Nathan S; Karunamuni, Roshan; Seibert, Tyler M; Dale, Anders M; Hattangadi-Gluth, Jona A; Piccioni, David; Farid, Nikdokht; McDonald, Carrie R

    2018-06-02

    Molecular markers of WHO grade II/III glioma are known to have important prognostic and predictive implications and may be associated with unique imaging phenotypes. The purpose of this study is to determine whether three clinically relevant molecular markers identified in gliomas (IDH, 1p/19q, and MGMT status) show distinct quantitative MRI characteristics on FLAIR imaging. Sixty-one patients with grade II/III gliomas who had molecular data and MRI available prior to radiation were included. Quantitative MRI features were extracted that measured tissue heterogeneity (homogeneity and pixel correlation) and FLAIR border distinctiveness (edge contrast; EC). T-tests were conducted to determine whether patients with different genotypes differ across the features. Logistic regression with LASSO regularization was used to determine the optimal combination of MRI and clinical features for predicting molecular subtypes. Patients with IDH wildtype tumors showed greater signal heterogeneity (p = 0.001) and lower EC (p = 0.008) within the FLAIR region compared to IDH mutant tumors. Among patients with IDH mutant tumors, 1p/19q co-deleted tumors had greater signal heterogeneity (p = 0.002) and lower EC (p = 0.005) compared to 1p/19q intact tumors. MGMT methylated tumors showed lower EC (p = 0.03) compared to the unmethylated group. The combination of FLAIR border distinctness, heterogeneity, and pixel correlation optimally classified tumors by IDH status. Quantitative imaging characteristics of FLAIR heterogeneity and border pattern in grade II/III gliomas may provide unique information for determining molecular status at time of initial diagnostic imaging, which may then guide subsequent surgical and medical management.

  16. Fusobacterium in colonic flora and molecular features of colorectal carcinoma

    PubMed Central

    Tahara, Tomomitsu; Yamamoto, Eiichiro; Suzuki, Hiromu; Maruyama, Reo; Chung, Woonbok; Garriga, Judith; Jelinek, Jaroslav; Yamano, Hiro-o; Sugai, Tamotsu; An, Byonggu; Shureiqi, Imad; Toyota, Minoru; Kondo, Yutaka; Estécio, Marcos R. H.; Issa, Jean-Pierre J.

    2015-01-01

    Fusobacterium species are part of the gut microbiome in humans. Recent studies have identified over-representation of Fusobacterium in colorectal cancer (CRC) tissues but it is not yet clear whether this is pathogenic or simply an epiphenomenon. In this study, we evaluated the relationship between Fusobacterium status and molecular features in CRCs through quantitative real-time PCR in 149 CRC tissues, 89 adjacent normal appearing mucosae and 72 colonic mucosae from cancer-free individuals. Results were correlated with CpG island methylator phenotype (CIMP) status, microsatellite instability (MSI) and mutations in BRAF, KRAS, TP53, CHD7 and CHD8. Whole exome capture sequencing data were also available in 11 cases. Fusobacterium was detectable in 111/149 (74%) CRC tissues and heavily enriched in 9% (14/149) of the cases. As expected, Fusobacterium was also detected in normal appearing mucosae from both cancer and cancer-free individuals but the amount of bacteria was much lower compared to CRC tissues (a mean of 250-fold lower for Pan-fusobacterium). We found the Fusobacterium-high CRC group (FB-high) to be associated with CIMP positivity (p=0.001), TP53 wild type (p=0.015), hMLH1 methylation positivity (p=0.0028), MSI (p=0.018) and CHD7/8 mutation positivity (p=0.002). Among the 11 cases where whole exome sequencing data were available, two that were FB-high cases also had the highest number of somatic mutations (a mean of 736 per case in FB-high vs. 225 per case in all others). Taken together, our findings show that Fusobacterium enrichment is associated with specific molecular subsets of CRCs, offering support for a pathogenic role in CRC for this gut microbiome component. PMID:24385213

  17. Endoscopic Features of Mucous Cap Polyps: A Way to Predict Serrated Polyps.

    PubMed

    Moy, Brian T; Forouhar, Faripour; Kuo, Chia-Ling; Devers, Thomas J

    2018-04-27

    The aims of the study were to identify whether a mucous cap predicts the presence of serrated polyps, and to determine whether additional endoscopic findings predict the presence of a sessile serrated adenoma/polyp (SSA/P). We analyzed 147 mucous-capped polyps with corresponding histology, during 2011-2014. Eight endoscopic features (presence of borders, elevation, rim of debris, location in the colon, size ≥10 mm, varicose vessels, nodularity, and alteration in mucosal folds) of mucous-capped polyps were examined to see if they can predict SSA/Ps. A total of 86% (n=126) of mucous-capped polyps were from the right-sided serrated pathway (right-sided hyperplastic [n=83], SSA/Ps [n=43], traditional serrated adenoma [n=1]), 10% (n=15) were left-sided hyperplastic polyps, and 3% (n=5) were from the adenoma-carcinoma sequence. The presence of a mucous cap combined with varicose vessels was the only significant predictor for SSA/Ps. The other seven characteristics were not found to be statistically significant for SSA/Ps, although location in the colon and the presence of nodularity trended towards significance. Our study suggests that mucous-capped polyps have high predictability for being a part of the serrated pathway. Gastroenterologists should be alert for a mucous-capped polyp with varicose veins, as these lesions have a higher risk of SSA/P.

  18. Binding Affinity prediction with Property Encoded Shape Distribution signatures

    PubMed Central

    Das, Sourav; Krein, Michael P.

    2010-01-01

    We report the use of the molecular signatures known as “Property-Encoded Shape Distributions” (PESD) together with standard Support Vector Machine (SVM) techniques to produce validated models that can predict the binding affinity of a large number of protein ligand complexes. This “PESD-SVM” method uses PESD signatures that encode molecular shapes and property distributions on protein and ligand surfaces as features to build SVM models that require no subjective feature selection. A simple protocol was employed for tuning the SVM models during their development, and the results were compared to SFCscore – a regression-based method that was previously shown to perform better than 14 other scoring functions. Although the PESD-SVM method is based on only two surface property maps, the overall results were comparable. For most complexes with a dominant enthalpic contribution to binding (ΔH/-TΔS > 3), a good correlation between true and predicted affinities was observed. Entropy and solvent were not considered in the present approach and further improvement in accuracy would require accounting for these components rigorously. PMID:20095526

  19. Evaluating stability of histomorphometric features across scanner and staining variations: predicting biochemical recurrence from prostate cancer whole slide images

    NASA Astrophysics Data System (ADS)

    Leo, Patrick; Lee, George; Madabhushi, Anant

    2016-03-01

    Quantitative histomorphometry (QH) is the process of computerized extraction of features from digitized tissue slide images. Typically these features are used in machine learning classifiers to predict disease presence, behavior and outcome. Successful robust classifiers require features that both discriminate between classes of interest and are stable across data from multiple sites. Feature stability may be compromised by variation in slide staining and scanning procedures. These laboratory specific variables include dye batch, slice thickness and the whole slide scanner used to digitize the slide. The key therefore is to be able to identify features that are not only discriminating between the classes of interest (e.g. cancer and non-cancer or biochemical recurrence and non-recurrence) but also features that will not wildly fluctuate on slides representing the same tissue class but from across multiple different labs and sites. While there have been some recent efforts at understanding feature stability in the context of radiomics applications (i.e. feature analysis of radiographic images), relatively few attempts have been made at studying the trade-off between feature stability and discriminability for histomorphometric and digital pathology applications. In this paper we present two new measures, preparation-induced instability score (PI) and latent instability score (LI), to quantify feature instability across and within datasets. Dividing PI by LI yields a ratio for how often a feature for a specific tissue class (e.g. low grade prostate cancer) is different between datasets from different sites versus what would be expected from random chance alone. Using this ratio we seek to quantify feature vulnerability to variations in slide preparation and digitization. Since our goal is to identify stable QH features we evaluate these features for their stability and thus inclusion in machine learning based classifiers in a use case involving prostate cancer

  20. Prediction of Heterodimeric Protein Complexes from Weighted Protein-Protein Interaction Networks Using Novel Features and Kernel Functions

    PubMed Central

    Ruan, Peiying; Hayashida, Morihiro; Maruyama, Osamu; Akutsu, Tatsuya

    2013-01-01

    Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many prediction methods for protein complexes from protein-protein interactions have been developed such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt with only complexes with size of more than three because the methods often are based on some density of subgraphs. However, heterodimeric protein complexes that consist of two distinct proteins occupy a large part according to several comprehensive databases of known complexes. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, domain composition kernel and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. These results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes. PMID:23776458

  1. Structure prediction and molecular simulation of gases diffusion pathways in hydrogenase.

    PubMed

    Sundaram, Shanthy; Tripathi, Ashutosh; Gupta, Vipul

    2010-10-06

    Although hydrogen is considered to be one of the most promising future energy sources and the technical aspects involved in using it have advanced considerably, the future supply of hydrogen from renewable sources is still unsolved. The [Fe]-hydrogenase enzymes are highly efficient H2 catalysts found in ecologically and phylogenetically diverse microorganisms, including the photosynthetic green alga, Chlamydomonas reinhardtii. While these enzymes can occur in several forms, H2 catalysis takes place at a unique [FeS] prosthetic group or H-cluster, located at the active site. The 3D structure of the protein hydA1 hydrogenase from Chlamydomonas reinhardtii was predicted using the MODELLER 8v2 software. The conserved region was identified using the NCBI CDD Search. Template selection was done on the basis of NCBI BLAST results. For the single-template model 1FEH was used, and for the multiple-template model 1FEH and 1HFE were used. The result of the homology modeling was verified by uploading the file to the SAVS server. On the basis of the SAVS result, the 3D structure predicted using a single template was chosen for performing molecular simulation. Three strategies were used for the molecular simulation. First, the molecular simulation of the protein was performed in a solvated box containing bulk water. Then 100 H2 molecules were randomly inserted in the solvated box and two simulations of 50 and 100 ps were performed. Similarly, 100 O2 molecules were randomly placed in the solvated box and again 50 and 100 ps simulations were performed. Energy minimization was performed before each simulation. Conformations were saved after each simulation. Analysis of the gas diffusion was done on the basis of RMSD, radius of gyration and number of gas molecules/ps plots.
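
    The record above analyses trajectories by RMSD and radius of gyration. The sketch below shows the standard definitions of both quantities for lists of (x, y, z) coordinates. It is an illustration only, not the authors' analysis: the function names are hypothetical, the RMSD assumes the two conformations are already superimposed (no fitting step), and equal atomic masses are assumed unless masses are supplied.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned conformations."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformations must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((a[i] - b[i]) ** 2 for i in range(3))
    return math.sqrt(total / len(coords_a))

def radius_of_gyration(coords, masses=None):
    """Mass-weighted root-mean-square distance of atoms from their center of mass."""
    if not coords:
        raise ValueError("coords must not be empty")
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses
    total_mass = sum(masses)
    center = [sum(m * c[i] for m, c in zip(masses, coords)) / total_mass
              for i in range(3)]
    weighted = sum(m * sum((c[i] - center[i]) ** 2 for i in range(3))
                   for m, c in zip(masses, coords))
    return math.sqrt(weighted / total_mass)
```

    Two equal-mass atoms at (1, 0, 0) and (-1, 0, 0) have a radius of gyration of 1, and shifting every atom of a conformation by one unit along z gives an RMSD of 1 against the original.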

  2. Performance comparison of the Prophecy (forecasting) Algorithm in FFT form for unseen feature and time-series prediction

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger; Handley, James

    2013-06-01

    We introduce a generalized numerical prediction and forecasting algorithm. We have previously published it for malware byte sequence feature prediction and generalized distribution modeling for disparate test article analysis. We show how non-trivial non-periodic extrapolation of a numerical sequence (forecast and backcast) from the starting data is possible. Our ancestor-progeny prediction can yield new options for evolutionary programming. Our equations enable analytical integrals and derivatives to any order. Interpolation is controllable from smooth continuous to fractal structure estimation. We show how our generalized trigonometric polynomial can be derived using a Fourier transform.

  3. Online feature selection with streaming features.

    PubMed

    Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan

    2013-05-01

    We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.

  4. A Prediction Model for ROS1-Rearranged Lung Adenocarcinomas based on Histologic Features

    PubMed Central

    Zheng, Jing; Kong, Mei; Sun, Ke; Wang, Bo; Chen, Xi; Ding, Wei; Zhou, Jianying

    2016-01-01

    Aims: To identify the clinical and histological characteristics of ROS1-rearranged non-small-cell lung carcinomas (NSCLCs) and build a prediction model to prescreen suitable patients for molecular testing. Methods and Results: We identified 27 cases of ROS1-rearranged lung adenocarcinomas in 1165 patients with NSCLCs confirmed by real-time PCR and FISH, performed univariate and multivariate analyses to identify predictive factors associated with ROS1 rearrangement, and finally developed a prediction model. By ROS1 immunochemistry, 59 of the 1165 patients had a certain degree of ROS1 expression. Among these cases, 19 cases (68%, 19/28) with 3+ and 8 cases (47%, 8/17) with 2+ staining had ROS1 rearrangement verified by real-time PCR and FISH. In the resected group, the acinar-predominant growth pattern was the most commonly observed (57%, 8/14), while in the biopsy group, solid patterns were the most frequently observed (78%, 7/13). Based on multiple logistic regression analysis, we determined that female sex, cribriform structure and the presence of psammoma body were the three most powerful indicators of ROS1 rearrangement, and we have developed a predictive model for the presence of ROS1 rearrangements in lung adenocarcinomas. Conclusions: Female sex, cribriform structure and the presence of psammoma body were the three most powerful indicators of ROS1 rearrangement status, and the predictive formula was helpful in screening ROS1-rearranged NSCLC, especially for cases with equivocal ROS1 immunochemistry. PMID:27648828

  5. A Prediction Model for ROS1-Rearranged Lung Adenocarcinomas based on Histologic Features.

    PubMed

    Zhou, Jianya; Zhao, Jing; Zheng, Jing; Kong, Mei; Sun, Ke; Wang, Bo; Chen, Xi; Ding, Wei; Zhou, Jianying

    2016-01-01

    To identify the clinical and histological characteristics of ROS1-rearranged non-small-cell lung carcinomas (NSCLCs) and build a prediction model to prescreen suitable patients for molecular testing. We identified 27 cases of ROS1-rearranged lung adenocarcinomas in 1165 patients with NSCLCs confirmed by real-time PCR and FISH, performed univariate and multivariate analyses to identify predictive factors associated with ROS1 rearrangement, and finally developed a prediction model. By ROS1 immunochemistry, 59 of the 1165 patients had a certain degree of ROS1 expression. Among these cases, 19 cases (68%, 19/28) with 3+ and 8 cases (47%, 8/17) with 2+ staining had ROS1 rearrangement verified by real-time PCR and FISH. In the resected group, the acinar-predominant growth pattern was the most commonly observed (57%, 8/14), while in the biopsy group, solid patterns were the most frequently observed (78%, 7/13). Based on multiple logistic regression analysis, we determined that female sex, cribriform structure and the presence of psammoma body were the three most powerful indicators of ROS1 rearrangement, and we have developed a predictive model for the presence of ROS1 rearrangements in lung adenocarcinomas. Female sex, cribriform structure and the presence of psammoma body were the three most powerful indicators of ROS1 rearrangement status, and the predictive formula was helpful in screening ROS1-rearranged NSCLC, especially for cases with equivocal ROS1 immunochemistry.

  6. TH-E-BRF-05: Comparison of Survival-Time Prediction Models After Radiotherapy for High-Grade Glioma Patients Based On Clinical and DVH Features

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Magome, T; Haga, A; Igaki, H

    Purpose: Although many outcome prediction models based on dose-volume information have been proposed, it is well known that the prognosis may also be affected by multiple clinical factors. The purpose of this study is to predict the survival time after radiotherapy for high-grade glioma patients based on features including clinical and dose-volume histogram (DVH) information. Methods: A total of 35 patients with high-grade glioma (oligodendroglioma: 2, anaplastic astrocytoma: 3, glioblastoma: 30) were selected in this study. All patients were treated with a prescribed dose of 30–80 Gy after surgical resection or biopsy from 2006 to 2013 at The University of Tokyo Hospital. All cases were randomly separated into a training dataset (30 cases) and a test dataset (5 cases). The survival time after radiotherapy was predicted with multiple linear regression analysis and an artificial neural network (ANN) using 204 candidate features. The candidate features included 12 clinical features (tumor location, extent of surgical resection, treatment duration of radiotherapy, etc.) and 192 DVH features (maximum dose, minimum dose, D95, V60, etc.). The effective features for the prediction were selected by a step-wise method using the 30 training cases. The prediction accuracy was evaluated by the coefficient of determination (R²) between the predicted and actual survival time for the training and test datasets. Results: In the multiple regression analysis, R² between the predicted and actual survival time was 0.460 for the training dataset and 0.375 for the test dataset. In the ANN analysis, by contrast, R² was 0.806 for the training dataset and 0.811 for the test dataset. Conclusion: Although a larger number of patients would be needed for more accurate and robust prediction, our preliminary results showed the potential to predict the outcome in patients with high-grade glioma. This work was partly

  7. Predicting beta-turns in proteins using support vector machines with fractional polynomials

    PubMed Central

    2013-01-01

    Background: β-turns are a secondary structure type that plays an essential role in molecular recognition, protein folding, and stability. They are the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered one of the crucial problems in bioinformatics and molecular biology, and it can provide valuable insights and inputs for fold recognition and drug design. Results: We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets, respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. Conclusions: In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance than other competing prediction methods. PMID:24565438

  8. Predicting beta-turns in proteins using support vector machines with fractional polynomials.

    PubMed

    Elbashir, Murtada; Wang, Jianxin; Wu, Fang-Xiang; Wang, Lusheng

    2013-11-07

    β-turns are a secondary structure type that plays an essential role in molecular recognition, protein folding, and stability. They are the most common type of non-repetitive structure, since 25% of amino acids in protein structures are situated on them. Their prediction is considered one of the crucial problems in bioinformatics and molecular biology, and it can provide valuable insights and inputs for fold recognition and drug design. We propose an approach that combines support vector machines (SVMs) and logistic regression (LR) in a hybrid prediction method, which we call H-SVM-LR, to predict β-turns in proteins. Fractional polynomials are used for LR modeling. We utilize position specific scoring matrices (PSSMs) and predicted secondary structure (PSS) as features. Our simulation studies show that H-SVM-LR achieves Qtotal of 82.87%, 82.84%, and 82.32% on the BT426, BT547, and BT823 datasets, respectively. These values are the highest among β-turn prediction methods that are based on PSSMs and secondary structure information. H-SVM-LR also achieves favorable performance in predicting β-turns as measured by the Matthews correlation coefficient (MCC) on these datasets. Furthermore, H-SVM-LR shows good performance when considering shape strings as additional features. In this paper, we present a comprehensive approach for β-turn prediction. Experiments show that our proposed approach achieves better performance than other competing prediction methods.
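
    The two scores quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions: Qtotal as the percentage of residues classified correctly, and MCC computed from the four cells of a binary confusion matrix. The function names and the counts below are hypothetical and are not taken from the paper.

```python
import math

def q_total(tp, tn, fp, fn):
    """Percentage of residues (turn and non-turn) predicted correctly."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts, for illustration only
tp, tn, fp, fn = 60, 140, 20, 30
print(f"Qtotal = {q_total(tp, tn, fp, fn):.2f}%")
print(f"MCC    = {mcc(tp, tn, fp, fn):.3f}")
```

    Because β-turn residues are a minority class, MCC is often reported next to Qtotal: a predictor that labels every residue as non-turn can still score a high Qtotal while its MCC stays at zero.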

  9. Application of Molecular Dynamics Simulations in Molecular Property Prediction I: Density and Heat of Vaporization

    PubMed Central

    Wang, Junmei; Tingjun, Hou

    2011-01-01

    Molecular mechanical force field (FF) methods are useful in studying condensed phase properties. They are complementary to experiment and can often go beyond experiment in atomic details. Even if an FF is designed specifically for studying the structures, dynamics and functions of biomolecules, it is still important for the FF to accurately reproduce the experimental liquid properties of small molecules that represent the chemical moieties of biomolecules. Otherwise, the force field may not describe the structures and energies of macromolecules in aqueous solutions properly. In this work, we have carried out a systematic study to evaluate the General AMBER Force Field (GAFF) in studying densities and heats of vaporization for a large set of organic molecules that covers the most common chemical functional groups. The latest techniques, such as the particle mesh Ewald (PME) for calculating electrostatic energies, and Langevin dynamics for scaling temperatures, have been applied in the molecular dynamics (MD) simulations. For density, the average percent error (APE) of 71 organic compounds is 4.43% when compared to the experimental values. More encouragingly, the APE drops to 3.43% after the exclusion of two outliers and four other compounds for which the experimental densities were measured at pressures higher than 1.0 atm. For heat of vaporization, several protocols have been investigated and the best one, P4/ntt0, achieves an average unsigned error (AUE) and a root-mean-square error (RMSE) of 0.93 and 1.20 kcal/mol, respectively. We also discuss how to reduce the prediction errors through proper van der Waals (vdW) parameterization. An encouraging finding in vdW parameterization is that both densities and heats of vaporization approach their “ideal” values in a synchronous fashion when vdW parameters are tuned. A subsequent hydration free energy calculation using thermodynamic integration further justifies the vdW refinement. We conclude that simple vdW parameterization
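
    The three error measures named in this abstract can be written out as a short sketch. It assumes APE is the mean of the unsigned percent errors relative to experiment, which the abstract does not spell out. The function names and the sample values are hypothetical.

```python
import math

def ape(pred, ref):
    """Average percent error: mean of |pred - ref| / |ref|, in percent."""
    return 100.0 * sum(abs(p - r) / abs(r) for p, r in zip(pred, ref)) / len(ref)

def aue(pred, ref):
    """Average unsigned error, in the units of the inputs."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def rmse(pred, ref):
    """Root-mean-square error, in the units of the inputs."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

# Hypothetical calculated vs. experimental values (e.g. kcal/mol)
calculated = [11.0, 18.0]
experiment = [10.0, 20.0]
print(ape(calculated, experiment), aue(calculated, experiment), rmse(calculated, experiment))
```

    RMSE weights large deviations more heavily than AUE, so RMSE is never smaller than AUE on the same data, in line with the reported pair of 1.20 and 0.93 kcal/mol.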

  10. TargetCrys: protein crystallization prediction by fusing multi-view features with two-layered SVM.

    PubMed

    Hu, Jun; Han, Ke; Li, Yang; Yang, Jing-Yu; Shen, Hong-Bin; Yu, Dong-Jun

    2016-11-01

    The accurate prediction of whether a protein will crystallize plays a crucial role in improving the success rate of protein crystallization projects. A common critical problem in the development of machine-learning-based protein crystallization predictors is how to effectively utilize protein features extracted from different views. In this study, we aimed to improve the efficiency of fusing multi-view protein features by proposing a new two-layered SVM (2L-SVM) which switches the feature-level fusion problem to a decision-level fusion problem: the SVMs in the 1st layer of the 2L-SVM are trained on each of the multi-view feature sets; then, the outputs of the 1st layer SVMs, which are the "intermediate" decisions made based on the respective feature sets, are further ensembled by a 2nd layer SVM. Based on the proposed 2L-SVM, we implemented a sequence-based protein crystallization predictor called TargetCrys. Experimental results on several benchmark datasets demonstrated the efficacy of the proposed 2L-SVM for fusing multi-view features. We also compared TargetCrys with existing sequence-based protein crystallization predictors and demonstrated that the proposed TargetCrys outperformed most of the existing predictors and is competitive with the state-of-the-art predictors. The TargetCrys webserver and datasets used in this study are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetCrys .

  11. Integrating in silico prediction methods, molecular docking, and molecular dynamics simulation to predict the impact of ALK missense mutations in structural perspective.

    PubMed

    Doss, C George Priya; Chakraborty, Chiranjib; Chen, Luonan; Zhu, Hailong

    2014-01-01

    Over the past decade, advances in next-generation sequencing technology have brought personalized genomic medicine onto the horizon. Classifying disease-causing mutations in complex diseases as pathogenic or neutral remains a major task, and in the structural context it can be practically impossible because the experiments are time-consuming and expensive. Among the various disease-causing mutations, single nucleotide polymorphisms (SNPs) play a vital role in defining an individual's susceptibility to disease and drug response. Understanding the genotype-phenotype relationship through SNPs is the first and most important step in drug research and development. Detailed understanding of the effect of SNPs on patient drug response is a key factor in the establishment of personalized medicine. In this paper, we present a computational pipeline for SNP-centred study of anaplastic lymphoma kinase (ALK) that applies in silico prediction methods, molecular docking, and molecular dynamics simulation. Combining computational methods provides a way to understand how deleterious mutations alter protein drug targets and eventually lead to variable patient drug response. We hope this rapid and cost-effective pipeline will also serve as a bridge connecting clinicians and in silico resources in tailoring treatments to patients' specific genotypes.

  12. Synaptic State Matching: A Dynamical Architecture for Predictive Internal Representation and Feature Detection

    PubMed Central

    Tavazoie, Saeed

    2013-01-01

    Here we explore the possibility that a core function of sensory cortex is the generation of an internal simulation of sensory environment in real-time. A logical elaboration of this idea leads to a dynamical neural architecture that oscillates between two fundamental network states, one driven by external input, and the other by recurrent synaptic drive in the absence of sensory input. Synaptic strength is modified by a proposed synaptic state matching (SSM) process that ensures equivalence of spike statistics between the two network states. Remarkably, SSM, operating locally at individual synapses, generates accurate and stable network-level predictive internal representations, enabling pattern completion and unsupervised feature detection from noisy sensory input. SSM is a biologically plausible substrate for learning and memory because it brings together sequence learning, feature detection, synaptic homeostasis, and network oscillations under a single unifying computational framework. PMID:23991161

  13. A novel numerical model to predict the morphological behavior of magnetic liquid marbles using coarse grained molecular dynamics concepts

    NASA Astrophysics Data System (ADS)

    Polwaththe-Gallage, Hasitha-Nayanajith; Sauret, Emilie; Nguyen, Nam-Trung; Saha, Suvash C.; Gu, YuanTong

    2018-01-01

    Liquid marbles are liquid droplets coated with superhydrophobic powders whose morphology is governed by the gravitational and surface tension forces. Small liquid marbles take spherical shapes, while larger liquid marbles exhibit puddle shapes due to the dominance of gravitational forces. Liquid marbles coated with hydrophobic magnetic powders respond to an external magnetic field. This unique feature of magnetic liquid marbles is very attractive for digital microfluidics and drug delivery systems. Several experimental studies have reported the behavior of the liquid marbles. However, the complete behavior of liquid marbles under various environmental conditions is yet to be understood. Modeling techniques can be used to predict the properties and the behavior of the liquid marbles effectively and efficiently. A robust liquid marble model will inspire new experiments and provide new insights. This paper presents a novel numerical modeling technique to predict the morphology of magnetic liquid marbles based on coarse grained molecular dynamics concepts. The proposed model is employed to predict the changes in height of a magnetic liquid marble against its width and compared with the experimental data. The model predictions agree well with the experimental findings. Subsequently, the relationship between the morphology of a liquid marble with the properties of the liquid is investigated. Furthermore, the developed model is capable of simulating the reversible process of opening and closing of the magnetic liquid marble under the action of a magnetic force. The scaling analysis shows that the model predictions are consistent with the scaling laws. Finally, the proposed model is used to assess the compressibility of the liquid marbles. The proposed modeling approach has the potential to be a powerful tool to predict the behavior of magnetic liquid marbles serving as bioreactors.

  14. Basic features of the predictive tools of early warning systems for water-related natural hazards: examples for shallow landslides

    NASA Astrophysics Data System (ADS)

    Greco, Roberto; Pagano, Luca

    2017-12-01

    To manage natural risks, increasing effort is being put into the development of early warning systems (EWS), namely, approaches that face catastrophic phenomena through timely forecasting and the spreading of alarms throughout the exposed population. Research efforts aimed at the development and implementation of effective EWS should especially concern the definition and calibration of the interpretative model. This paper analyses the main features characterizing predictive models working in EWS by discussing their aims and their features in terms of model accuracy, the evolutionary stage of the phenomenon at which the prediction is carried out, and model architecture. Original classification criteria based on these features are developed throughout the paper and shown in their practical implementation through examples of flow-like landslides and earth flows, both of which are characterized by rapid evolution and are quite representative of many applications of EWS.

  15. Conventional MRI features for predicting the clinical outcome of patients with invasive placenta

    PubMed Central

    Chen, Ting; Xu, Xiao-Quan; Shi, Hai-Bin; Yang, Zheng-Qiang; Zhou, Xin; Pan, Yi

    2017-01-01

    PURPOSE We aimed to evaluate whether morphologic magnetic resonance imaging (MRI) features could help to predict the maternal outcome after uterine artery embolization (UAE)-assisted cesarean section (CS) in patients with invasive placenta previa. METHODS We retrospectively reviewed the MRI data of 40 pregnant women who have undergone UAE-assisted cesarean section due to suspected high risk of massive hemorrhage caused by invasive placenta previa. Patients were divided into two groups based on the maternal outcome (good-outcome group: minor hemorrhage and uterus preserved; poor-outcome group: significant hemorrhage or emergency hysterectomy). Morphologic MRI features were compared between the two groups. Multivariate logistic regression analysis was used to identify the most valuable variables, and predictive value of the identified risk factor was determined. RESULTS Low signal intensity bands on T2-weighted imaging (P < 0.001), placenta percreta (P = 0.011), and placental cervical protrusion sign (P = 0.002) were more frequently observed in patients with poor outcome. Low signal intensity bands on T2-weighted imaging was the only significant predictor of poor maternal outcome in multivariate analysis (P = 0.020; odds ratio, 14.79), with 81.3% sensitivity and 84.3% specificity. CONCLUSION Low signal intensity bands on T2-weighted imaging might be a predictor of poor maternal outcome after UAE-assisted cesarean section in patients with invasive placenta previa. PMID:28345524
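
    As a reading aid for the sensitivity, specificity and odds ratio reported above, the following sketch computes the three quantities from a 2x2 table. The counts are hypothetical, since the abstract does not give the underlying table, and the crude odds ratio shown here is not the adjusted odds ratio that the authors obtained from multivariate logistic regression.

```python
def sensitivity(tp, fn):
    """Fraction of poor-outcome patients in whom the sign is present."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of good-outcome patients in whom the sign is absent."""
    return tn / (tn + fp)

def crude_odds_ratio(tp, fp, fn, tn):
    """Unadjusted odds ratio from a 2x2 table; needs non-zero fp and fn."""
    return (tp * tn) / (fp * fn)

# Hypothetical 2x2 counts: sign present/absent vs. poor/good outcome
tp, fn, tn, fp = 8, 2, 27, 3
print(sensitivity(tp, fn), specificity(tn, fp), crude_odds_ratio(tp, fp, fn, tn))
```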

  16. Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention.

    PubMed

    Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter J E; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong

    2017-08-03

    The feature selection (FS) process is essential in medicine because it reduces the effort and time physicians need to measure unnecessary features. Choosing useful variables is difficult in the presence of censoring, which is the distinctive characteristic of survival analysis. Most survival FS methods depend on Cox's proportional hazard model; machine learning techniques (MLT) are preferred but not commonly used because of censoring. Techniques that have been proposed to adapt MLT to perform FS with survival data cannot be used with a high level of censoring. The researcher's previous publications proposed a technique to deal with the high level of censoring and used existing FS techniques to reduce dataset dimension. In this paper, however, a new FS technique is proposed and combined with feature transformation and the proposed uncensoring approaches to select a reduced set of features and produce a stable predictive model. The FS technique, based on an artificial neural network (ANN) MLT, is proposed to deal with highly censored Endovascular Aortic Repair (EVAR) survival data. EVAR datasets were collected from 2004 to 2010 from two vascular centers in order to produce a final stable model. They contain almost 91% censored patients. The proposed approach used a wrapper FS method with ANN to select a reduced subset of features that predict the risk of EVAR re-intervention after 5 years in patients from two different centers located in the United Kingdom, so that it can potentially be applied to cross-center predictions. The proposed model is compared with two popular FS techniques, the Akaike and Bayesian information criteria (AIC, BIC), that are used with Cox's model. The final model outperforms other methods in distinguishing the high- and low-risk groups, as both have a concordance index and estimated AUC better than those of Cox's model based on the AIC, BIC, Lasso, and SCAD approaches. These models have p-values lower than 0

  17. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

    PubMed

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A

    2013-07-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
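
    The k-mer sequence features mentioned in this abstract can be illustrated with a small sketch that turns a DNA sequence into a fixed-length count vector, the kind of input an SVM could be trained on. This is only an illustration under simplifying assumptions: the function names are hypothetical, windows containing characters outside the alphabet are ignored in the vector, and any reverse-complement handling or kernel that the published tool may use is left out.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count every overlapping substring of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_vector(seq, k, alphabet="ACGT"):
    """Return counts ordered over all len(alphabet)**k possible k-mers."""
    counts = kmer_counts(seq, k)
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts.get(kmer, 0) for kmer in all_kmers]

# Hypothetical toy sequence
vector = kmer_vector("ACGTAC", k=2)
print(len(vector), sum(vector))
```

    The vector length grows as 4**k, so even moderate values of k give thousands of features, which is one reason a margin-based classifier such as an SVM is a natural fit for this representation.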

  18. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    PubMed Central

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147

  19. In silico modelling and molecular dynamics simulation studies of thiazolidine based PTP1B inhibitors.

    PubMed

    Mahapatra, Manoj Kumar; Bera, Krishnendu; Singh, Durg Vijay; Kumar, Rajnish; Kumar, Manoj

    2018-04-01

    Protein tyrosine phosphatase 1B (PTP1B) has been identified as a negative regulator of the insulin and leptin signalling pathways; hence, it can be considered a new therapeutic target for the treatment of type 2 diabetes. Inhibition of this molecular target addresses both diabetes and obesity, i.e. diabesity. In order to get more information on the identification and optimization of leads, pharmacophore modelling, atom-based 3D QSAR, docking and molecular dynamics studies were carried out on a set of ligands containing the thiazolidine scaffold. A six-point pharmacophore model consisting of three hydrogen bond acceptors (A), one negative ionic (N) and two aromatic rings (R) with discrete geometries as pharmacophoric features was developed for a predictive 3D QSAR model. The probable binding conformation of the ligands within the active site was studied through molecular docking. The molecular interactions and the structural features responsible for PTP1B inhibition and selectivity were further supplemented by a molecular dynamics simulation study over a time scale of 30 ns. The present investigation has identified some of the indispensable structural features of thiazolidine analogues, which can be explored further to optimize PTP1B inhibitors.

  20. Featured Image: A Molecular Cloud Outside Our Galaxy

    NASA Astrophysics Data System (ADS)

    Kohler, Susanna

    2018-06-01

    What do molecular clouds look like outside of our own galaxy? See for yourself in the images above and below of N55, a molecular cloud located in the Large Magellanic Cloud (LMC). In a recent study led by Naslim Neelamkodan (Academia Sinica Institute of Astronomy and Astrophysics, Taiwan), a team of scientists explores N55 to determine how its cloud properties differ from clouds within the Milky Way. The image above reveals the distribution of infrared-emitting gas and dust observed in three bands by the Spitzer Space Telescope. Overplotted in cyan are observations from the Atacama Submillimeter Telescope Experiment tracing the clumpy, warm molecular gas. Below, new observations from the Atacama Large Millimeter/submillimeter Array (ALMA) reveal the sub-parsec-scale molecular clumps in greater detail, showing the correlation of massive clumps with Spitzer-identified young stellar objects (crosses). The study presented here indicates that this cloud in the LMC is the site of massive star formation, with properties similar to equivalent clouds in the Milky Way. To learn more about the authors' findings, check out the article linked below. Citation: Naslim N. et al 2018 ApJ 853 175. doi:10.3847/1538-4357/aaa5b0

  1. Prediction models for solitary pulmonary nodules based on curvelet textural features and clinical parameters.

    PubMed

    Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua

    2013-01-01

    Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs) which are hard to diagnose using the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistical regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied. The results obtained, respectively, were 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. All in all, it was found that using curvelet-based textural features after dimensionality reduction and using clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.
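
    The 10-fold cross validation used to evaluate these models can be sketched as an index split. The sketch below only shows how samples might be partitioned into folds; it assumes contiguous, unshuffled folds for simplicity, whereas a real evaluation would normally shuffle or stratify first, and the abstract does not say how the authors built their folds. The function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Hypothetical dataset of 25 samples
for train, test in k_fold_indices(25, k=10):
    print(len(train), test)
```

    Back substitution, by contrast, scores a model on the same samples it was fitted to, which would explain why the second figure in each reported pair is the higher one.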

  2. Analysis of Factors Influencing Hydration Site Prediction Based on Molecular Dynamics Simulations

    PubMed Central

    2015-01-01

    Water contributes significantly to the binding of small molecules to proteins in biochemical systems. Molecular dynamics (MD) simulation based programs such as WaterMap and WATsite have been used to probe the locations and thermodynamic properties of hydration sites at the surface or in the binding site of proteins generating important information for structure-based drug design. However, questions associated with the influence of the simulation protocol on hydration site analysis remain. In this study, we use WATsite to investigate the influence of factors such as simulation length and variations in initial protein conformations on hydration site prediction. We find that 4 ns MD simulation is appropriate to obtain a reliable prediction of the locations and thermodynamic properties of hydration sites. In addition, hydration site prediction can be largely affected by the initial protein conformations used for MD simulations. Here, we provide a first quantification of this effect and further indicate that similar conformations of binding site residues (RMSD < 0.5 Å) are required to obtain consistent hydration site predictions. PMID:25252619

  3. Analysis of factors influencing hydration site prediction based on molecular dynamics simulations.

    PubMed

    Yang, Ying; Hu, Bingjie; Lill, Markus A

    2014-10-27

    Water contributes significantly to the binding of small molecules to proteins in biochemical systems. Molecular dynamics (MD) simulation based programs such as WaterMap and WATsite have been used to probe the locations and thermodynamic properties of hydration sites at the surface or in the binding site of proteins generating important information for structure-based drug design. However, questions associated with the influence of the simulation protocol on hydration site analysis remain. In this study, we use WATsite to investigate the influence of factors such as simulation length and variations in initial protein conformations on hydration site prediction. We find that 4 ns MD simulation is appropriate to obtain a reliable prediction of the locations and thermodynamic properties of hydration sites. In addition, hydration site prediction can be largely affected by the initial protein conformations used for MD simulations. Here, we provide a first quantification of this effect and further indicate that similar conformations of binding site residues (RMSD < 0.5 Å) are required to obtain consistent hydration site predictions.
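
    The RMSD threshold quoted in this abstract compares binding site conformations. A minimal sketch of the RMSD calculation follows, assuming the two coordinate sets list the same atoms in the same order and have already been superimposed (no alignment step is performed here). The function name and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical binding-site atom positions in angstroms
conf_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_2 = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
value = rmsd(conf_1, conf_2)
print(f"RMSD = {value:.3f} A, below 0.5 A cutoff: {value < 0.5}")
```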

  4. Molecular Dynamics Simulations and Kinetic Measurements to Estimate and Predict Protein-Ligand Residence Times.

    PubMed

    Mollica, Luca; Theret, Isabelle; Antoine, Mathias; Perron-Sierra, Françoise; Charton, Yves; Fourquez, Jean-Marie; Wierzbicki, Michel; Boutin, Jean A; Ferry, Gilles; Decherchi, Sergio; Bottegoni, Giovanni; Ducrot, Pierre; Cavalli, Andrea

    2016-08-11

    Ligand-target residence time is emerging as a key drug discovery parameter because it can reliably predict drug efficacy in vivo. Experimental approaches to binding and unbinding kinetics are nowadays available, but we still lack reliable computational tools for predicting kinetics and residence time. Most attempts have been based on brute-force molecular dynamics (MD) simulations, which are CPU-demanding and not yet particularly accurate. We recently reported a new scaled-MD-based protocol, which showed potential for residence time prediction in drug discovery. Here, we further challenged our procedure's predictive ability by applying our methodology to a series of glucokinase activators that could be useful for treating type 2 diabetes mellitus. We combined scaled MD with experimental kinetics measurements and X-ray crystallography, promptly checking the protocol's reliability by directly comparing computational predictions and experimental measures. The good agreement highlights the potential of our scaled-MD-based approach as an innovative method for computationally estimating and predicting drug residence times.

  5. [Molecular mechanisms of primary and secondary resistance, molecular-genetic features and characteristics of KIT/PDGFRA non-mutated GISTs].

    PubMed

    Kalfusová, Alena; Kodet, Roman

    2017-01-01

    Gastrointestinal stromal tumors (GISTs) are the most common mesenchymal tumors of the gastrointestinal tract. Most of them arise due to activating mutations in the KIT (75-85%) or PDGFRA (less than 10%) genes. Identification of the activating mutations in the KIT and PDGFRA genes, which code for receptor tyrosine kinases (RTKs), has improved the outcome of targeted therapy of metastatic, unresectable or recurrent GISTs. Primary and/or secondary resistance represents a significant problem in targeted therapy with imatinib mesylate (IM) in patients with GIST. An important mechanism of secondary resistance is the development of secondary mutations. Besides primary and secondary resistance, a further problem of disease progression is the failure to eradicate tumor cells even with long-term tyrosine kinase inhibitor therapy. GISTs without mutations in the KIT/PDGFRA genes constitute 10-15% of GISTs in adults and the majority (85%) of pediatric GISTs. KIT/PDGFRA wild-type GISTs represent a heterogeneous group of tumors with several molecular-genetic and/or morphologic differences. They differ in their molecular features, for example in mutations in the BRAF, KRAS and NF1 genes or defects of succinate dehydrogenase (SDH) subunits. KIT/PDGFRA wild-type GISTs are generally less sensitive to targeted therapy with tyrosine kinase inhibitors than KIT/PDGFRA-mutated GISTs. Inhibitors of BRAF, PI3K (mTOR), or of the IGF1R and VEGFR receptors provide alternative therapeutic strategies.

  6. Hum-mPLoc 3.0: prediction enhancement of human protein subcellular localization through modeling the hidden correlations of gene ontology and functional domain features.

    PubMed

    Zhou, Hang; Yang, Yang; Shen, Hong-Bin

    2017-03-15

    Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which creates more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. Availability: www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.

  7. [Analysis of Conformational Features of Watson-Crick Duplex Fragments by Molecular Mechanics and Quantum Mechanics Methods].

    PubMed

    Poltev, V I; Anisimov, V M; Sanchez, C; Deriabina, A; Gonzalez, E; Garcia, D; Rivas, F; Polteva, N A

    2016-01-01

    It is generally accepted that the important characteristic features of the Watson-Crick duplex originate from the molecular structure of its subunits. However, it still remains to elucidate what properties of each subunit are responsible for the significant characteristic features of the DNA structure. The computations of desoxydinucleoside monophosphates complexes with Na-ions using density functional theory revealed a pivotal role of DNA conformational properties of single-chain minimal fragments in the development of unique features of the Watson-Crick duplex. We found that directionality of the sugar-phosphate backbone and the preferable ranges of its torsion angles, combined with the difference between purines and pyrimidines in ring bases, define the dependence of three-dimensional structure of the Watson-Crick duplex on nucleotide base sequence. In this work, we extended these density functional theory computations to the minimal fragments of DNA duplex, complementary desoxydinucleoside monophosphates complexes with Na-ions. Using several computational methods and various functionals, we performed a search for energy minima of BI-conformation for complementary desoxydinucleoside monophosphates complexes with different nucleoside sequences. Two sequences are optimized using ab initio method at the MP2/6-31++G** level of theory. The analysis of torsion angles, sugar ring puckering and mutual base positions of optimized structures demonstrates that the conformational characteristic features of complementary desoxydinucleoside monophosphates complexes with Na-ions remain within BI ranges and become closer to the corresponding characteristic features of the Watson-Crick duplex crystals. Qualitatively, the main characteristic features of each studied complementary desoxydinucleoside monophosphates complex remain invariant when different computational methods are used, although the quantitative values of some conformational parameters could vary lying within the

  8. Prediction of human disease-associated phosphorylation sites with combined feature selection approach and support vector machine.

    PubMed

    Xu, Xiaoyi; Li, Ao; Wang, Minghui

    2015-08-01

    Phosphorylation is a crucial post-translational modification, which regulates almost all cellular processes in life. It has long been recognised that protein phosphorylation has a close relationship with diseases, and therefore many studies have been undertaken to predict phosphorylation sites for disease treatment and drug design. However, despite the success achieved by these approaches, no method focuses on the prediction of disease-associated phosphorylation sites. Herein, for the first time the authors propose a novel approach that is specially designed to identify associations between phosphorylation sites and human diseases. To take full advantage of local sequence information, a combined feature selection method-based support vector machine (CFS-SVM) that incorporates a minimum-redundancy-maximum-relevance filtering process and a forward feature selection process is developed. Performance evaluation shows that CFS-SVM is significantly better than widely used classifiers including Bayesian decision theory, k nearest neighbour and random forest. Even at an extremely high specificity of 99%, CFS-SVM still achieves a high sensitivity. In addition, tests on extra data confirm the effectiveness and general applicability of the CFS-SVM approach to a variety of diseases. Finally, the analysis of selected features and corresponding kinases also helps in understanding the potential mechanism of disease-phosphorylation relationships and can guide further experimental validation.

  9. CT texture features of liver parenchyma for predicting development of metastatic disease and overall survival in patients with colorectal cancer.

    PubMed

    Lee, Scott J; Zea, Ryan; Kim, David H; Lubner, Meghan G; Deming, Dustin A; Pickhardt, Perry J

    2018-04-01

    To determine if identifiable hepatic textural features are present at abdominal CT in patients with colorectal cancer (CRC) prior to the development of CT-detectable hepatic metastases. Four filtration-histogram texture features (standard deviation, skewness, entropy and kurtosis) were extracted from the liver parenchyma on portal venous phase CT images at staging and post-treatment surveillance. Surveillance scans corresponded to the last scan prior to the development of CT-detectable CRC liver metastases in 29 patients (median time interval, 6 months), and these were compared with interval-matched surveillance scans in 60 CRC patients who did not develop liver metastases. Predictive models of liver metastasis-free survival and overall survival were built using regularised Cox proportional hazards regression. Texture features did not significantly differ between cases and controls. For Cox models using all features as predictors, all coefficients were shrunk to zero, suggesting no association between any CT texture features and outcomes. Prognostic indices derived from entropy features at surveillance CT incorrectly classified patients into risk groups for future liver metastases (p < 0.001). On surveillance CT scans immediately prior to the development of CRC liver metastases, we found no evidence suggesting that changes in identifiable hepatic texture features were predictive of their development. • No correlation between liver texture features and metastasis-free survival was observed. • Liver texture features incorrectly classified patients into risk groups for liver metastases. • Standardised texture analysis workflows need to be developed to improve research reproducibility.
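
    The four histogram statistics named above (standard deviation, skewness, entropy and kurtosis) can be sketched as follows. This is an illustrative computation on a flat list of pixel values only; the filtration step of the filtration-histogram technique is not reproduced, and the bin count, the population (rather than sample) moments, the excess-kurtosis convention and the function name are all assumptions.

```python
import math
from collections import Counter

def texture_features(pixels, bins=8):
    """First-order histogram statistics of a region of interest.

    pixels: flat list of intensity values (already filtered, if filtering is used).
    Returns population SD, skewness, excess kurtosis and Shannon entropy (bits).
    """
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((p - mean) ** 2 for p in pixels) / n  # variance
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0

    # Equal-width histogram for the entropy; the top edge falls in the last bin.
    lo, hi = min(pixels), max(pixels)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant region: every pixel lands in bin 0
    counts = Counter(min(int((p - lo) / width), bins - 1) for p in pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {"sd": math.sqrt(m2), "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}

print(texture_features([0, 0, 1, 1]))
```

    For the two-level example region the distribution is symmetric, so the skewness is zero, and the two equally populated bins give an entropy of one bit.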

  10. Prediction of rat protein subcellular localization with pseudo amino acid composition based on multiple sequential features.

    PubMed

    Shi, Ruijia; Xu, Cunshuan

    2011-06-01

    The study of rat proteins is an indispensable task in experimental medicine and drug development. The function of a rat protein is closely related to its subcellular location. Based on this concept, we construct a benchmark rat proteins dataset and develop a combined approach for predicting the subcellular localization of rat proteins. From the protein primary sequence, multiple sequential features are obtained using discrete Fourier analysis, a position conservation scoring function and the increment of diversity, and these sequential features are used as input parameters of the support vector machine. In the jackknife test, the overall success rate of prediction is 95.6% on the rat proteins dataset. When our method is applied to the apoptosis proteins dataset and the Gram-negative bacterial proteins dataset with the jackknife test, the overall success rates are 89.9% and 96.4%, respectively. These results indicate that our proposed method is quite promising and may play a complementary role to the existing predictors in this area.
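
    The jackknife (leave-one-out) test used above can be sketched in a few lines. To keep the example self-contained, a 1-nearest-neighbour rule stands in for the support vector machine of the abstract; the classifier, the function names and the toy data are assumptions made purely for illustration.

```python
def nearest_label(query, samples):
    """Label of the training sample closest to `query` (squared Euclidean distance)."""
    best = min(samples,
               key=lambda s: sum((q - x) ** 2 for q, x in zip(query, s[0])))
    return best[1]

def jackknife_success_rate(dataset):
    """Leave each (features, label) pair out in turn and predict it from the rest."""
    hits = 0
    for i, (features, label) in enumerate(dataset):
        training = dataset[:i] + dataset[i + 1:]  # everything except sample i
        if nearest_label(features, training) == label:
            hits += 1
    return hits / len(dataset)

# Hypothetical one-feature dataset with two well-separated locations.
toy = [((0.0,), "nucleus"), ((0.1,), "nucleus"),
       ((1.0,), "cytoplasm"), ((1.1,), "cytoplasm")]
print(jackknife_success_rate(toy))
```

    Each sample is predicted by a model that never saw it, which is why the jackknife success rate is often reported as a less optimistic estimate than resubstitution accuracy.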

  11. Semen molecular and cellular features: these parameters can reliably predict subsequent ART outcome in a goat model

    PubMed Central

    Berlinguer, Fiammetta; Madeddu, Manuela; Pasciu, Valeria; Succu, Sara; Spezzigu, Antonio; Satta, Valentina; Mereu, Paolo; Leoni, Giovanni G; Naitana, Salvatore

    2009-01-01

    Currently, the assessment of sperm function in a raw or processed semen sample is not able to reliably predict sperm ability to withstand freezing and thawing procedures and in vivo fertility and/or assisted reproductive biotechnologies (ART) outcome. The aim of the present study was to investigate which parameters among a battery of analyses could predict subsequent spermatozoa in vitro fertilization ability and hence blastocyst output in a goat model. Ejaculates were obtained by artificial vagina from 3 adult goats (Capra hircus) aged 2 years (A, B and C). In order to assess the predictive value of viability, computer assisted sperm analyzer (CASA) motility parameters and ATP intracellular concentration before and after thawing and of DNA integrity after thawing on subsequent embryo output after an in vitro fertility test, a logistic regression analysis was used. Individual differences in semen parameters were evident for semen viability after thawing and DNA integrity. Results of the IVF test showed that spermatozoa collected from A and B led to higher cleavage rates (p < 0.01) and blastocyst output (p < 0.05) compared with C. The logistic regression model explained a deviance of 72% (p < 0.0001), directly related to the mean percentage of rapid spermatozoa in fresh semen (p < 0.01), semen viability after thawing (p < 0.01), and two of the three comet parameters considered, i.e., tail DNA percentage and comet length (p < 0.0001). DNA integrity alone had a high predictive value on IVF outcome with frozen/thawed semen (deviance explained: 57%). The model proposed here represents one of the many possible ways to explain differences found in embryo output following IVF with different semen donors and may represent a useful tool to select the most suitable donors for semen cryopreservation. PMID:19900288

  12. Analysis of DCE-MRI features in tumor and the surrounding stroma for prediction of Ki-67 proliferation status in breast cancer

    NASA Astrophysics Data System (ADS)

    Li, Hui; Fan, Ming; Zhang, Peng; Li, Yuanzhe; Cheng, Hu; Zhang, Juan; Shao, Guoliang; Li, Lihua

    2018-03-01

    Breast cancer, with its high heterogeneity, is the most common malignancy in women. In addition to the tumor itself, the tumor microenvironment could also play a fundamental role in the occurrence and development of tumors. The aim of this study is to investigate the role of heterogeneity within a tumor and the surrounding stromal tissue in predicting the Ki-67 proliferation status of oestrogen receptor (ER)-positive breast cancer patients. To this end, we collected 62 patients imaged with preoperative dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) for analysis. The tumor and the peritumoral stromal tissue were segmented into 8 shells with 5 mm width outside of the tumor. The mean enhancement rate in the stromal shells decreased as their distance from the tumor increased. Statistical and texture features were extracted from the tumor and the surrounding stromal bands, and multivariate logistic regression classifiers were trained and tested based on these features. The area under the receiver operating characteristic curve (AUC) was calculated to evaluate the performance of the classifiers. The statistical model using features extracted from the boundary shell next to the tumor produced an AUC of 0.796±0.076, which is better than that using features from the other subregions. Furthermore, the prediction model using 7 features from the entire tumor produced an AUC value of 0.855±0.065. The classifier based on 9 selected features extracted from the peritumoral stromal region showed an AUC value of 0.870±0.050. Finally, after fusion of the predictive models obtained from the entire tumor and the peritumoral stromal regions, the classifier performance was significantly improved, with an AUC of 0.920. The results indicated that heterogeneity in the tumor boundary and the peritumoral stromal region could be valuable in predicting this prognosis-associated indicator.
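
    The AUC metric used to compare the classifiers above can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below is a generic illustration with hypothetical scores and labels; it does not reproduce the study's logistic regression models or its fusion step.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    scores: classifier outputs (higher means more likely positive).
    labels: 1 for positive (e.g. high Ki-67), 0 for negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical scores for four patients.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
print(roc_auc(example_scores, example_labels))  # 3 of 4 pairs ranked correctly -> 0.75
```

    The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of this size; sorting-based implementations are preferable for large datasets.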

  13. Predictive Features of a Cockpit Traffic Display: A Workload Assessment

    NASA Technical Reports Server (NTRS)

    Wickens, Christopher D.; Morphew, Ephimia

    1997-01-01

    Eighteen pilots flew a series of traffic avoidance maneuvers in an experiment designed to assess the support offered and workload imposed by different levels of traffic display information in a free flight simulation. Three display prototypes were compared which differed in traffic information provided. A BASELINE (BL) display provided current and (2nd order) predicted information regarding ownship and current information of an intruder aircraft, represented on lateral and vertical displays in a coplanar suite. An INTRUDER PREDICTOR (IP) display augmented the baseline display by providing lateral and vertical prediction of the intruder aircraft. A THREAT VECTOR (TV) display added to the IP display a vector that indicates the direction from ownship to the intruder at the predicted point of closest contact (POCC). The length of the vector corresponds to the radius of the protected zone, and the distance of the intersection of the vector with the ownship predictor corresponds to the time available till POCC or loss of separation. Pilots time shared the traffic avoidance task with a secondary task requiring them to monitor the top of the display for faint targets. This task simulated the visual demands of out-of-cockpit scanning, and hence was used to estimate the head-down time required by the different display formats. The results revealed that both display augmentations improved performance (safety) as assessed by predicted and actual loss of separation (i.e., penetration of the protected zone). Both enhancements also reduced workload, as assessed by the NASA TLX scale. The intruder predictor display produced these benefits with no substantial impact on the qualitative nature of the avoidance maneuvers that were selected. The threat vector produced the safety benefits by inducing a greater degree of (effective) lateral maneuvering, thus partially offsetting the benefits of reduced workload. The three displays did not differ in terms of their effect on performance of

  14. Molecular modeling of the microstructure evolution during carbon fiber processing

    NASA Astrophysics Data System (ADS)

    Desai, Saaketh; Li, Chunyu; Shen, Tongtong; Strachan, Alejandro

    2017-12-01

    The rational design of carbon fibers with desired properties requires quantitative relationships between the processing conditions, microstructure, and resulting properties. We developed a molecular model that combines kinetic Monte Carlo and molecular dynamics techniques to predict the microstructure evolution during the processes of carbonization and graphitization of polyacrylonitrile (PAN)-based carbon fibers. The model accurately predicts the cross-sectional microstructure of the fibers with the molecular structure of the stabilized PAN fibers and physics-based chemical reaction rates as the only inputs. The resulting structures exhibit key features observed in electron microscopy studies such as curved graphitic sheets and hairpin structures. In addition, computed X-ray diffraction patterns are in good agreement with experiments. We predict the transverse moduli of the resulting fibers between 1 GPa and 5 GPa, in good agreement with experimental results for high modulus fibers and slightly lower than those of high-strength fibers. The transverse modulus is governed by sliding between graphitic sheets, and the relatively low value for the predicted microstructures can be attributed to their perfect longitudinal texture. Finally, the simulations provide insight into the relationships between chemical kinetics and the final microstructure; we observe that high reaction rates result in porous structures with lower moduli.

  15. Molecular diagnostics in the management of rhabdomyosarcoma.

    PubMed

    Arnold, Michael A; Barr, Fredric G

    2017-02-01

    A classification of rhabdomyosarcoma (RMS) with prognostic relevance has primarily relied on clinical features and histologic classification as either embryonal or alveolar RMS. The PAX3-FOXO1 and PAX7-FOXO1 gene fusions occur in 80% of cases with the alveolar subtype and are more predictive of outcome than histologic classification. Identifying additional molecular hallmarks that further subclassify RMS is an active area of research. Areas Covered: The authors review the current state of the PAX3-FOXO1 and PAX7-FOXO1 fusions as prognostic biomarkers. Emerging biomarkers, including mRNA expression profiling, MYOD1 mutations, RAS pathway mutations and gene fusions involving NCOA2 or VGLL2 are also reviewed. Expert commentary: Strategies for modifying RMS risk stratification based on molecular biomarkers are emerging with the potential to transform the clinical management of RMS, ultimately improving patient outcomes by tailoring therapy to predicted patient risk and identifying targets for novel therapies.

  16. Carbohydrate-protein interactions: molecular modeling insights.

    PubMed

    Pérez, Serge; Tvaroška, Igor

    2014-01-01

    The article reviews the significant contributions to, and the present status of, applications of computational methods for the characterization and prediction of protein-carbohydrate interactions. After a presentation of the specific features of carbohydrate modeling, along with a brief description of the experimental data and general features of carbohydrate-protein interactions, the survey provides a thorough coverage of the available computational methods and tools. At the quantum-mechanical level, the use of both molecular orbitals and density-functional theory is critically assessed. These are followed by a presentation and critical evaluation of the applications of semiempirical and empirical methods: QM/MM, molecular dynamics, free-energy calculations, metadynamics, molecular robotics, and others. The usefulness of molecular docking in structural glycobiology is evaluated by considering recent docking-validation studies on a range of protein targets. The range of applications of these theoretical methods provides insights into the structural, energetic, and mechanistic facets that occur in the course of the recognition processes. Selected examples are provided to exemplify the usefulness and the present limitations of these computational methods in their ability to assist in elucidation of the structural basis underlying the diverse function and biological roles of carbohydrates in their dialogue with proteins. These test cases cover the field of both carbohydrate biosynthesis and glycosyltransferases, as well as glycoside hydrolases. The phenomenon of (macro)molecular recognition is illustrated for the interactions of carbohydrates with such proteins as lectins, monoclonal antibodies, GAG-binding proteins, porins, and viruses.

  17. Predicting critical micelle concentration and micelle molecular weight of polysorbate 80 using compendial methods.

    PubMed

    Braun, Alexandra C; Ilko, David; Merget, Benjamin; Gieseler, Henning; Germershaus, Oliver; Holzgrabe, Ulrike; Meinel, Lorenz

    2015-08-01

    This manuscript addresses the capability of compendial methods in controlling polysorbate 80 (PS80) functionality. Based on the analysis of sixteen batches, functionality related characteristics (FRC) including critical micelle concentration (CMC), cloud point, hydrophilic-lipophilic balance (HLB) value and micelle molecular weight were correlated to chemical composition including fatty acids before and after hydrolysis, content of non-esterified polyethylene glycols and sorbitan polyethoxylates, sorbitan- and isosorbide polyethoxylate fatty acid mono- and diesters, polyoxyethylene diesters, and peroxide values. Batches from some suppliers had a high variability in FRC, questioning the ability of the current monograph to control these. Interestingly, the combined use of the input parameters oleic acid content and peroxide value, both of which are monographed methods, resulted in a model adequately predicting CMC. Confining the batches to those complying with specifications for peroxide value showed oleic acid content alone to be predictive for CMC. Similarly, a four-parameter model based on chemical analyses alone was instrumental in predicting the molecular weight of PS80 micelles. Improved models based on analytical outcome from fingerprint analyses are also presented. A road map for controlling PS80 batches with respect to FRC, based on chemical analyses alone, is provided for the formulator.

  18. Molecular transistors based on BDT-type molecular bridges.

    PubMed

    Wheeler, W D; Dahnovsky, Yu

    2008-10-21

    In this work we study the effect of electron correlations in molecular transistors with molecular bridges based on 1,4-benzene-dithiol (BDT) and 2-nitro-1,4-benzene-dithiol (nitro-BDT) by using ab initio electron propagator calculations. We find that there is no gate field effect for the BDT based transistor in accordance with the experimental data. After verifying the computational method on the BDT molecule, we consider a transistor with a nitro-BDT molecular bridge. From the electron propagator calculations, we predict strong negative differential resistance at small positive and negative values of source-drain voltages. The explanation of the peak and the minimum in the current is given in terms of the molecular orbital picture and switch-on (-off) properties due to the voltage dependencies of the Dyson poles (ionization potentials). When the current is off, the electronic states on both electrodes are populated resulting in the vanishing tunneling probability due to the Pauli principle. Besides the minimum and the maximum in the I-V characteristics, we find a strong gate field effect in the conductance where the peak at V_sd = 0.15 eV and E_g = 4×10^-3 a.u. switches to the minimum at E_g = -4×10^-3 a.u. A similar behavior is discovered at negative V_sd. Such a feature can be used for fast current modulation by changing the polarity of a gate field.

  19. Predicting the performance of fingerprint similarity searching.

    PubMed

    Vogt, Martin; Bajorath, Jürgen

    2011-01-01

    Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the "background." By quantifying the difference in feature distribution using the Kullback-Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
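
    The idea of comparing fingerprint feature distributions with the Kullback-Leibler divergence can be illustrated with a minimal sketch. The abstract does not specify the exact formulation, so the following is an assumption: each bit position is treated as an independent Bernoulli variable, bit frequencies are smoothed with a pseudocount to avoid zero probabilities, and the per-bit divergences are summed. All names and the toy fingerprints are hypothetical.

```python
import math

def bit_frequencies(fingerprints, pseudocount=1.0):
    """Smoothed probability that each bit is set across a set of fingerprints."""
    n = len(fingerprints)
    length = len(fingerprints[0])
    return [(sum(fp[i] for fp in fingerprints) + pseudocount) / (n + 2 * pseudocount)
            for i in range(length)]

def kl_divergence(p, q):
    """Sum of per-bit Bernoulli KL divergences D(p || q), in nats."""
    total = 0.0
    for pi, qi in zip(p, q):
        total += pi * math.log(pi / qi)
        total += (1 - pi) * math.log((1 - pi) / (1 - qi))
    return total

# Toy 3-bit fingerprints: reference actives versus screening database compounds.
references = [[1, 1, 0], [1, 0, 0]]
database = [[0, 1, 0], [0, 0, 1]]

p_ref = bit_frequencies(references)
p_db = bit_frequencies(database)
print(kl_divergence(p_ref, p_db))
```

    A divergence near zero corresponds to the situation described above in which the reference compounds disappear in the "background" of the database; larger values indicate feature distributions that differ more strongly.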

  20. Specific molecular signatures predict decitabine response in chronic myelomonocytic leukemia

    PubMed Central

    Meldi, Kristen; Qin, Tingting; Buchi, Francesca; Droin, Nathalie; Sotzen, Jason; Micol, Jean-Baptiste; Selimoglu-Buet, Dorothée; Masala, Erico; Allione, Bernardino; Gioia, Daniela; Poloni, Antonella; Lunghi, Monia; Solary, Eric; Abdel-Wahab, Omar; Santini, Valeria; Figueroa, Maria E.

    2015-01-01

    Myelodysplastic syndromes and chronic myelomonocytic leukemia (CMML) are characterized by mutations in genes encoding epigenetic modifiers and aberrant DNA methylation. DNA methyltransferase inhibitors (DMTis) are used to treat these disorders, but response is highly variable, with few means to predict which patients will benefit. Here, we examined baseline differences in mutations, DNA methylation, and gene expression in 40 CMML patients who were responsive or resistant to decitabine (DAC) in order to develop a molecular means of predicting response at diagnosis. While somatic mutations did not differentiate responders from nonresponders, we identified 167 differentially methylated regions (DMRs) of DNA at baseline that distinguished responders from nonresponders using next-generation sequencing. These DMRs were primarily localized to nonpromoter regions and overlapped with distal regulatory enhancers. Using the methylation profiles, we developed an epigenetic classifier that accurately predicted DAC response at the time of diagnosis. Transcriptional analysis revealed differences in gene expression at diagnosis between responders and nonresponders. In responders, the upregulated genes included those that are associated with the cell cycle, potentially contributing to effective DAC incorporation. Treatment with CXCL4 and CXCL7, which were overexpressed in nonresponders, blocked DAC effects in isolated normal CD34+ and primary CMML cells, suggesting that their upregulation contributes to primary DAC resistance. PMID:25822018

  1. Associations between colorectal cancer molecular markers and pathways with clinicopathologic features in older women.

    PubMed

    Samadder, N Jewel; Vierkant, Robert A; Tillmans, Lori S; Wang, Alice H; Weisenberger, Daniel J; Laird, Peter W; Lynch, Charles F; Anderson, Kristin E; French, Amy J; Haile, Robert W; Potter, John D; Slager, Susan L; Smyrk, Thomas C; Thibodeau, Stephen N; Cerhan, James R; Limburg, Paul J

    2013-08-01

    Colorectal tumors have a large degree of molecular heterogeneity. Three integrated pathways of carcinogenesis (ie, traditional, alternate, and serrated) have been proposed, based on specific combinations of microsatellite instability (MSI), CpG island methylator phenotype (CIMP), and mutations in BRAF and KRAS. We used resources from the population-based Iowa Women's Health Study (n = 41,836) to associate markers of colorectal tumors, integrated pathways, and clinical and pathology characteristics, including survival times. We assessed archived specimens from 732 incident colorectal tumors and characterized them as microsatellite stable (MSS), MSI high or MSI low, CIMP high or CIMP low, CIMP negative, and positive or negative for BRAF and/or KRAS mutations. Informative marker data were collected from 563 tumors (77%), which were assigned to the following integrated pathways: traditional (MSS, CIMP negative, BRAF mutation negative, and KRAS mutation negative; n = 170), alternate (MSS, CIMP low, BRAF mutation negative, and KRAS mutation positive; n = 58), serrated (any MSI, CIMP high, BRAF mutation positive, and KRAS mutation negative; n = 142), or unassigned (n = 193). Multivariable-adjusted Cox proportional hazards regression models were used to assess the associations of interest. Patients' mean age (P = .03) and tumors' anatomic subsite (P = .0001) and grade (P = .0001) were significantly associated with integrated pathway assignment. Colorectal cancer (CRC) mortality was not associated with the traditional, alternate, or serrated pathways, but was associated with a subset of pathway-unassigned tumors (MSS or MSI low, CIMP negative, BRAF mutation negative, and KRAS mutation positive) (n = 96 cases; relative risk = 1.76; 95% confidence interval, 1.07-2.89, compared with the traditional pathway). We identified clinical and pathology features associated with molecularly defined CRC subtypes. However, additional studies are needed to determine how these features

  2. Molecular crosstalk between tumour and brain parenchyma instructs histopathological features in glioblastoma.

    PubMed

    Bougnaud, Sébastien; Golebiewska, Anna; Oudin, Anaïs; Keunen, Olivier; Harter, Patrick N; Mäder, Lisa; Azuaje, Francisco; Fritah, Sabrina; Stieber, Daniel; Kaoma, Tony; Vallar, Laurent; Brons, Nicolaas H C; Daubon, Thomas; Miletic, Hrvoje; Sundstrøm, Terje; Herold-Mende, Christel; Mittelbronn, Michel; Bjerkvig, Rolf; Niclou, Simone P

    2016-05-31

    The histopathological and molecular heterogeneity of glioblastomas represents a major obstacle for effective therapies. Glioblastomas do not develop autonomously, but evolve in a unique environment that adapts to the growing tumour mass and contributes to the malignancy of these neoplasms. Here, we show that patient-derived glioblastoma xenografts generated in the mouse brain from organotypic spheroids reproducibly give rise to three different histological phenotypes: (i) a highly invasive phenotype with an apparent normal brain vasculature, (ii) a highly angiogenic phenotype displaying microvascular proliferation and necrosis and (iii) an intermediate phenotype combining features of invasion and vessel abnormalities. These phenotypic differences were visible during early phases of tumour development suggesting an early instructive role of tumour cells on the brain parenchyma. Conversely, we found that tumour-instructed stromal cells differentially influenced tumour cell proliferation and migration in vitro, indicating a reciprocal crosstalk between neoplastic and non-neoplastic cells. We did not detect any transdifferentiation of tumour cells into endothelial cells. Cell type-specific transcriptomic analysis of tumour and endothelial cells revealed a strong phenotype-specific molecular conversion between the two cell types, suggesting co-evolution of tumour and endothelial cells. Integrative bioinformatic analysis confirmed the reciprocal crosstalk between tumour and microenvironment and suggested a key role for TGFβ1 and extracellular matrix proteins as major interaction modules that shape glioblastoma progression. These data provide novel insight into tumour-host interactions and identify novel stroma-specific targets that may play a role in combinatorial treatment strategies against glioblastoma.

  3. Beyond intensity: Spectral features effectively predict music-induced subjective arousal.

    PubMed

    Gingras, Bruno; Marin, Manuela M; Fitch, W Tecumseh

    2014-01-01

    Emotions in music are conveyed by a variety of acoustic cues. Notably, the positive association between sound intensity and arousal has particular biological relevance. However, although amplitude normalization is a common procedure used to control for intensity in music psychology research, direct comparisons between emotional ratings of original and amplitude-normalized musical excerpts are lacking. In this study, 30 nonmusicians retrospectively rated the subjective arousal and pleasantness induced by 84 six-second classical music excerpts, and an additional 30 nonmusicians rated the same excerpts normalized for amplitude. Following the cue-redundancy and Brunswik lens models of acoustic communication, we hypothesized that arousal and pleasantness ratings would be similar for both versions of the excerpts, and that arousal could be predicted effectively by other acoustic cues besides intensity. Although the difference in mean arousal and pleasantness ratings between original and amplitude-normalized excerpts correlated significantly with the amplitude adjustment, ratings for both sets of excerpts were highly correlated and shared a similar range of values, thus validating the use of amplitude normalization in music emotion research. Two acoustic parameters, spectral flux and spectral entropy, accounted for 65% of the variance in arousal ratings for both sets, indicating that spectral features can effectively predict arousal. Additionally, we confirmed that amplitude-normalized excerpts were adequately matched for loudness. Overall, the results corroborate our hypotheses and support the cue-redundancy and Brunswik lens models.

  4. The Role of Molecular Diagnostics in the Management of Patients with Gliomas.

    PubMed

    Wirsching, Hans-Georg; Weller, Michael

    2016-10-01

    The revised World Health Organization (WHO) classification of tumors of the central nervous system of 2016 combines biology-driven molecular marker diagnostics with classical histological cancer diagnosis. Reclassification of gliomas by molecular similarity beyond histological boundaries improves outcome prediction and will increasingly guide treatment decisions. This change in paradigms implies more personalized and eventually more efficient therapeutic approaches, but the era of molecular targeted therapies for gliomas is still at its onset. Promising results of molecularly targeted therapies in genetically less complex gliomas with circumscribed growth such as subependymal giant cell astrocytoma or pilocytic astrocytoma support further development of molecularly targeted therapies. In diffuse gliomas, several molecular markers that predict benefit from alkylating agent chemotherapy have been identified in recent years. For example, co-deletion of chromosome arms 1p and 19q predicts benefit from polychemotherapy with procarbazine, CCNU (lomustine), and vincristine (PCV) in patients with anaplastic oligodendroglioma, and the presence of 1p/19q co-deletion was integrated as a defining feature of oligodendroglial tumors in the revised WHO classification. However, the tremendous increase in knowledge of molecular drivers of diffuse gliomas on genomic, epigenetic, and gene expression levels has not yet translated into effective molecular targeted therapies. Multiple reasons account for the failure of early clinical trials of molecularly targeted therapies in diffuse gliomas, including the lack of molecular entry controls as well as pharmacokinetic and pharmacodynamic issues, but the key challenge of specifically targeting the molecular backbone of diffuse gliomas is probably extensive clonal heterogeneity. A more profound understanding of clonal selection, alternative activation of oncogenic signaling pathways, and genomic instability is warranted to identify effective

  5. Gastric tumours in hereditary cancer syndromes: clinical features, molecular biology and strategies for prevention.

    PubMed

    Sereno, María; Aguayo, Cristina; Guillén Ponce, Carmen; Gómez-Raposo, César; Zambrana, Francisco; Gómez-López, Miriam; Casado, Enrique

    2011-09-01

    Gastric cancer is a major cause of cancer-related deaths worldwide. The majority of cases are classified as sporadic, whereas the remaining 10% exhibit familial clustering. Hereditary diffuse gastric cancer (HDGC) syndrome is the most important condition that leads to hereditary gastric cancer. However, other hereditary cancer syndromes, such as hereditary non-polyposis colorectal cancer, familial adenomatous polyposis, Peutz-Jeghers syndrome, Li-Fraumeni syndrome and hereditary breast and ovarian cancer, entail a higher risk of developing this kind of neoplasia than that of the general population. In this review, we briefly describe the most important aspects related to clinical features, molecular biology and strategies for prevention in hereditary gastric cancer associated with different cancer syndromes.

  6. TU-D-207B-01: A Prediction Model for Distinguishing Radiation Necrosis From Tumor Progression After Gamma Knife Radiosurgery Based On Radiomics Features From MR Images

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Z; MD Anderson Cancer Center, Houston, TX; Ho, A

    Purpose: To develop and validate a prediction model using radiomics features extracted from MR images to distinguish radiation necrosis from tumor progression for brain metastases treated with Gamma Knife radiosurgery. Methods: The images used to develop the model were T1 post-contrast MR scans from 71 patients who had pathologic confirmation of necrosis or progression; 1 lesion was identified per patient (17 necrosis and 54 progression). Radiomics features were extracted from 2 images at 2 time points per patient, both obtained prior to resection. Each lesion was manually contoured on each image, and 282 radiomics features were calculated for each lesion. The correlation for each radiomics feature between two time points was calculated within each group to identify a subset of features with distinct values between two groups. The delta of this subset of radiomics features, characterizing changes from the earlier time to the later one, was included as a covariate to build a prediction model using support vector machines with a cubic polynomial kernel function. The model was evaluated with a 10-fold cross-validation. Results: Forty radiomics features were selected based on consistent correlation values of approximately 0 for the necrosis group and >0.2 for the progression group. In performing the 10-fold cross-validation, we narrowed this number down to 11 delta radiomics features for the model. This 11-delta-feature model showed an overall prediction accuracy of 83.1%, with a true positive rate of 58.8% in predicting necrosis and 90.7% for predicting tumor progression. The area under the curve for the prediction model was 0.79. Conclusion: These delta radiomics features extracted from MR scans showed potential for distinguishing radiation necrosis from tumor progression. This tool may be a useful, noninvasive means of determining the status of an enlarging lesion after radiosurgery, aiding decision-making regarding surgical resection versus conservative
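
    The reported rates can be cross-checked with simple arithmetic. The sketch below is not from the abstract: the per-group counts of correct predictions (10 and 49) are assumptions chosen because they reproduce the stated percentages for the stated group sizes (17 necrosis, 54 progression).

```python
# Group sizes are stated in the abstract.
necrosis_total, progression_total = 17, 54

# Assumed counts of correctly classified lesions (NOT stated in the abstract);
# they are the integers that reproduce the reported percentages.
necrosis_correct, progression_correct = 10, 49

tpr_necrosis = necrosis_correct / necrosis_total
tpr_progression = progression_correct / progression_total
accuracy = (necrosis_correct + progression_correct) / (
    necrosis_total + progression_total
)

print(f"{tpr_necrosis:.1%} {tpr_progression:.1%} {accuracy:.1%}")
# prints "58.8% 90.7% 83.1%"
```

    Under these assumed counts, the overall accuracy is dominated by the larger progression group, which is why it sits well above the necrosis rate.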

  7. Predicting Ki67% expression from DCE-MR images of breast tumors using textural kinetic features in tumor habitats

    NASA Astrophysics Data System (ADS)

    Chaudhury, Baishali; Zhou, Mu; Farhidzadeh, Hamidreza; Goldgof, Dmitry B.; Hall, Lawrence O.; Gatenby, Robert A.; Gillies, Robert J.; Weinfurtner, Robert J.; Drukteinis, Jennifer S.

    2016-03-01

    The use of Ki67% expression, a cell proliferation marker, as a predictive and prognostic factor has been widely studied in the literature. Yet its usefulness is limited due to inconsistent cutoff scores for Ki67% expression, subjective differences in its assessment in various studies, and spatial variation in expression, which makes it difficult to reproduce as a reliable independent prognostic factor. Previous studies have shown that there are significant spatial variations in Ki67% expression, which may limit its clinical prognostic utility after core biopsy. These variations are most evident when examining the periphery of the tumor vs. the core. To date, prediction of Ki67% expression from quantitative image analysis of DCE-MRI is very limited. This work presents a novel computer aided diagnosis framework to use textural kinetics to (i) predict the ratio of periphery Ki67% expression to core Ki67% expression, and (ii) predict Ki67% expression from individual tumor habitats. The pilot cohort consists of T1 weighted fat saturated DCE-MR images from 17 patients. Support vector regression with a radial basis function was used for predicting the Ki67% expression and ratios. The initial results show that texture features from individual tumor habitats are more predictive of the Ki67% expression ratio and spatial Ki67% expression than features from the whole tumor. The Ki67% expression ratio could be predicted with a root mean square error (RMSE) of 1.67%. Quantitative image analysis of DCE-MRI using textural kinetic habitats has the potential to be used as a non-invasive method for predicting Ki67 percentage and ratio, thus more accurately reporting high Ki67 expression for patient prognosis.

  8. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction.

    PubMed

    Bossi, Flavia; Fan, Jue; Xiao, Jun; Chandra, Lilyana; Shen, Max; Dorone, Yanniv; Wagner, Doris; Rhee, Seung Y

    2017-06-26

    The molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. To identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.

  9. Deep learning-based features of breast MRI for prediction of occult invasive disease following a diagnosis of ductal carcinoma in situ: preliminary data

    NASA Astrophysics Data System (ADS)

    Zhu, Zhe; Harowicz, Michael; Zhang, Jun; Saha, Ashirbani; Grimm, Lars J.; Hwang, Shelley; Mazurowski, Maciej A.

    2018-02-01

    Approximately 25% of patients with ductal carcinoma in situ (DCIS) diagnosed from core needle biopsy are subsequently upstaged to invasive cancer at surgical excision. Identifying patients with occult invasive disease is important as it changes treatment and precludes enrollment in active surveillance for DCIS. In this study, we investigated upstaging of DCIS to invasive disease using deep features. While deep neural networks require large amounts of training data, the available data to predict DCIS upstaging is sparse and thus directly training a neural network is unlikely to be successful. In this work, a pre-trained neural network is used as a feature extractor and a support vector machine (SVM) is trained on the extracted features. We used the dynamic contrast-enhanced (DCE) MRIs of patients at our institution from January 1, 2000, through March 23, 2014 who underwent MRI following a diagnosis of DCIS. Among the 131 DCIS patients, there were 35 patients who were upstaged to invasive cancer. Area under the ROC curve within the 10-fold cross-validation scheme was used for validation of our predictive model. The use of deep features was able to achieve an AUC of 0.68 (95% CI: 0.56-0.78) to predict occult invasive disease. This preliminary work demonstrates the promise of deep features to predict surgical upstaging following a diagnosis of DCIS.
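
    The AUC quoted above can be read as the probability that a randomly chosen upstaged case receives a higher classifier score than a randomly chosen non-upstaged case. A minimal sketch of that rank-based definition follows; the scores are made up for illustration and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores (illustrative only).
upstaged = [0.9, 0.6, 0.4]
not_upstaged = [0.7, 0.3, 0.2, 0.1]
print(auc(upstaged, not_upstaged))
```

    In a cross-validation scheme such as the one described, the scores would come from held-out folds rather than from the cases used for training.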

  10. Cross-Platform Toxicogenomics for the Prediction of Non-Genotoxic Hepatocarcinogenesis in Rat

    PubMed Central

    Metzger, Ute; Templin, Markus F.; Plummer, Simon; Ellinger-Ziegelbauer, Heidrun; Zell, Andreas

    2014-01-01

    In the area of omics profiling in toxicology, i.e. toxicogenomics, characteristic molecular profiles have previously been incorporated into prediction models for early assessment of a carcinogenic potential and mechanism-based classification of compounds. Traditionally, the biomarker signatures used for model construction were derived from individual high-throughput techniques, such as microarrays designed for monitoring global mRNA expression. In this study, we built predictive models by integrating omics data across complementary microarray platforms and introduced new concepts for modeling of pathway alterations and molecular interactions between multiple biological layers. We trained and evaluated diverse machine learning-based models, differing in the incorporated features and learning algorithms on a cross-omics dataset encompassing mRNA, miRNA, and protein expression profiles obtained from rat liver samples treated with a heterogeneous set of substances. Most of these compounds could be unambiguously classified as genotoxic carcinogens, non-genotoxic carcinogens, or non-hepatocarcinogens based on evidence from published studies. Since mixed characteristics were reported for the compounds Cyproterone acetate, Thioacetamide, and Wy-14643, we reclassified these compounds as either genotoxic or non-genotoxic carcinogens based on their molecular profiles. Evaluating our toxicogenomics models in a repeated external cross-validation procedure, we demonstrated that the prediction accuracy of our models could be increased by joining the biomarker signatures across multiple biological layers and by adding complex features derived from cross-platform integration of the omics data. Furthermore, we found that adding these features resulted in a better separation of the compound classes and a more confident reclassification of the three undefined compounds as non-genotoxic carcinogens. PMID:24830643

  11. Predicting epidermal growth factor receptor gene amplification status in glioblastoma multiforme by quantitative enhancement and necrosis features deriving from conventional magnetic resonance imaging.

    PubMed

    Dong, Fei; Zeng, Qiang; Jiang, Biao; Yu, Xinfeng; Wang, Weiwei; Xu, Jingjing; Yu, Jinna; Li, Qian; Zhang, Minming

    2018-05-01

    To study whether some of the quantitative enhancement and necrosis features in preoperative conventional MRI (cMRI) had a predictive value for epidermal growth factor receptor (EGFR) gene amplification status in glioblastoma multiforme (GBM). Fifty-five patients with pathologically determined GBMs who underwent cMRI were retrospectively reviewed. The following cMRI features were quantitatively measured and recorded: long and short diameters of the enhanced portion (LDE and SDE), maximum and minimum thickness of the enhanced portion (MaxTE and MinTE), and long and short diameters of the necrotic portion (LDN and SDN). Univariate analysis of each feature and a decision tree model fed with all the features were performed. Area under the receiver operating characteristic (ROC) curve (AUC) was used to assess the performance of features, and predictive accuracy was used to assess the performance of the model. Among single features, MinTE showed the best performance in differentiating EGFR gene amplification negative (wild-type) (nEGFR) GBM from EGFR gene amplification positive (pEGFR) GBM, with an AUC of 0.68 at a cut-off value of 2.6 mm. The decision tree model included 2 features, MinTE and SDN, and achieved an accuracy of 0.83 in the validation dataset. Our results suggest that quantitative measurement of the features MinTE and SDN in preoperative cMRI had a high accuracy for predicting EGFR gene amplification status in GBM.

  12. The value of DCE-MRI in assessing histopathological and molecular biological features in induced rat epithelial ovarian carcinomas.

    PubMed

    Yuan, Su Juan; Qiao, Tian Kui; Qiang, Jin Wei; Cai, Song Qi; Li, Ruo Kun

    2017-09-26

    To investigate dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) for assessing histopathological and molecular biological features in induced rat epithelial ovarian carcinomas (EOCs). 7,12-dimethylbenz[a]anthracene (DMBA) was applied to induce EOCs in situ in 46 SD rats. Conventional MRI and DCE-MRI were performed to evaluate the morphology and perfusion features of the tumors, including the time-signal intensity curve (TIC), volume transfer constant (Ktrans), rate constant (Kep), extravascular extracellular space volume ratio (Ve) and initial area under the curve (IAUC). DCE-MRI parameters were correlated with histological grade, microvascular density (MVD), vascular endothelial growth factor (VEGF) and fraction of Ki67-positive cells and the serum level of cancer antigen 125 (CA125). Thirty-five of the 46 rats developed EOCs. DCE-MRI showed type III TIC more frequently than type II (29/35 vs. 6/35, p < 0.001) in EOCs. The two types of TIC of tumors had significant differences in the histological grade, MVD, expression of VEGF and Ki67, and the serum level of CA125 (all p < 0.01). Ktrans, Kep and IAUC values showed significant differences in different histological grades in overall and pairwise comparisons except for IAUC in grade 2 vs. grade 3 (all p < 0.01). There was no significant difference in Ve values among the three grade groups (p > 0.05). Ktrans, Kep and IAUC values were positively correlated with MVD, VEGF and Ki67 expression (all p < 0.01). Ve was not significantly correlated with MVD, VEGF expression, Ki67 expression and the CA125 level (all p > 0.05). TIC types and perfusion parameters of DCE-MRI can reflect tumor grade, angiogenesis and cell proliferation to some extent, thereby helping treatment planning and predicting prognosis.

  13. Predicting Displaceable Water Sites Using Mixed-Solvent Molecular Dynamics.

    PubMed

    Graham, Sarah E; Smith, Richard D; Carlson, Heather A

    2018-02-26

    Water molecules are an important factor in protein-ligand binding. Upon binding of a ligand with a protein's surface, waters can either be displaced by the ligand or may be conserved and possibly bridge interactions between the protein and ligand. Depending on the specific interactions made by the ligand, displacing waters can yield a gain in binding affinity. The extent to which binding affinity may increase is difficult to predict, as the favorable displacement of a water molecule is dependent on the site-specific interactions made by the water and the potential ligand. Several methods have been developed to predict the location of water sites on a protein's surface, but the majority of methods are not able to take into account both protein dynamics and the interactions made by specific functional groups. Mixed-solvent molecular dynamics (MixMD) is a cosolvent simulation technique that explicitly accounts for the interaction of both water and small molecule probes with a protein's surface, allowing for their direct competition. This method has previously been shown to identify both active and allosteric sites on a protein's surface. Using a test set of eight systems, we have developed a method using MixMD to identify conserved and displaceable water sites. Conserved sites can be determined by an occupancy-based metric to identify sites which are consistently occupied by water even in the presence of probe molecules. Conversely, displaceable water sites can be found by considering the sites which preferentially bind probe molecules. Furthermore, the inclusion of six probe types allows the MixMD method to predict which functional groups are capable of displacing which water sites. The MixMD method consistently identifies sites which are likely to be nondisplaceable and predicts the favorable displacement of water sites that are known to be displaced upon ligand binding.

  14. Using multiple classifiers for predicting the risk of endovascular aortic aneurysm repair re-intervention through hybrid feature selection.

    PubMed

    Attallah, Omneya; Karthikesalingam, Alan; Holt, Peter Je; Thompson, Matthew M; Sayers, Rob; Bown, Matthew J; Choke, Eddie C; Ma, Xianghong

    2017-11-01

    Feature selection is essential in the medical area; however, the process becomes complicated in the presence of censoring, which is the unique characteristic of survival analysis. Most survival feature selection methods are based on Cox's proportional hazard model, though machine learning classifiers are preferred. They are less employed in survival analysis due to censoring, which prevents them from being applied directly to survival data. Among the few works that employed machine learning classifiers, partial logistic artificial neural network with auto-relevance determination is a well-known method that deals with censoring and performs feature selection for survival data. However, it depends on data replication to handle censoring, which leads to unbalanced and biased prediction results, especially in highly censored data. Other methods cannot deal with high censoring. Therefore, in this article, a new hybrid feature selection method is proposed which presents a solution to high-level censoring. It combines support vector machine, neural network, and K-nearest neighbor classifiers using simple majority voting and a new weighted majority voting method based on a survival metric to construct a multiple classifier system. The new hybrid feature selection process uses the multiple classifier system as a wrapper method and merges it with an iterated feature ranking filter method to further reduce features. Two endovascular aortic repair datasets containing 91% censored patients collected from two centers were used to construct a multicenter study to evaluate the performance of the proposed approach. The results showed the proposed technique outperformed individual classifiers and variable selection methods based on Cox's model, such as Akaike and Bayesian information criteria and least absolute shrinkage and selection operator, in p values of the log-rank test, sensitivity, and concordance index. This indicates that the proposed classifier is more powerful in correctly predicting the risk of
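
    The two combination rules named above, simple and weighted majority voting, can be sketched as follows. This is an illustrative sketch only: the abstract does not specify how the weights are derived from the survival metric, so the weights and the example predictions below are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def majority_vote(predictions):
    """Simple majority voting is weighted voting with equal weights."""
    return weighted_vote(predictions, [1.0] * len(predictions))


# Hypothetical outputs of the SVM, neural network, and K-nearest neighbor
# classifiers for one patient (1 = re-intervention predicted, 0 = not).
votes = [0, 0, 1]
# Hypothetical per-classifier weights; the paper's weighting scheme is not
# given in the abstract.
weights = [0.2, 0.3, 0.6]

print(majority_vote(votes))           # prints 0
print(weighted_vote(votes, weights))  # prints 1
```

    The example shows the practical difference between the rules: a single classifier with a large enough weight can overturn the unweighted majority.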

  15. Assessment of global and local region-based bilateral mammographic feature asymmetry to predict short-term breast cancer risk

    NASA Astrophysics Data System (ADS)

    Li, Yane; Fan, Ming; Cheng, Hu; Zhang, Peng; Zheng, Bin; Li, Lihua

    2018-01-01

    This study aims to develop and test a new imaging marker-based short-term breast cancer risk prediction model. An age-matched dataset of 566 screening mammography cases was used. All ‘prior’ images acquired in the two screening series were negative, while in the ‘current’ screening images, 283 cases were positive for cancer and 283 cases remained negative. For each case, two bilateral cranio-caudal view mammograms acquired from the ‘prior’ negative screenings were selected and processed by a computer-aided image processing scheme, which segmented the entire breast area into nine strip-based local regions, extracted the element regions using difference of Gaussian filters, and computed both global- and local-based bilateral asymmetrical image features. An initial feature pool included 190 features related to the spatial distribution and structural similarity of grayscale values, as well as of the magnitude and phase responses of multidirectional Gabor filters. Next, a short-term breast cancer risk prediction model based on a generalized linear model was built using an embedded stepwise regression analysis method to select features and a leave-one-case-out cross-validation method to predict the likelihood of each woman having image-detectable cancer in the next sequential mammography screening. The area under the receiver operating characteristic curve (AUC) values significantly increased from 0.5863 ± 0.0237 to 0.6870 ± 0.0220 when the model was trained on features extracted from both the global and the matched local regions rather than on features extracted from the global regions alone (p = 0.0001). The odds ratio values monotonically increased from 1.00 to 8.11 with a significantly increasing trend in slope (p = 0.0028) as the model-generated risk score increased. In addition, the AUC values were 0.6555 ± 0.0437, 0.6958 ± 0.0290, and 0.7054 ± 0.0529 for the three age groups of 37

  16. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    PubMed

    Choi, Ickwon; Chung, Amy W; Suscovich, Todd J; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J; Francis, Donald; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Alter, Galit; Ackerman, Margaret E; Bailey-Kellogg, Chris

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  17. Machine Learning Methods Enable Predictive Modeling of Antibody Feature:Function Relationships in RV144 Vaccinees

    PubMed Central

    Choi, Ickwon; Chung, Amy W.; Suscovich, Todd J.; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J.; Francis, Donald; Robb, Merlin L.; Michael, Nelson L.; Kim, Jerome H.; Alter, Galit; Ackerman, Margaret E.; Bailey-Kellogg, Chris

    2015-01-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates. PMID:25874406

  18. Detecting molecular features of spectra mainly associated with structural and non-structural carbohydrates in co-products from bioEthanol production using DRIFT with uni- and multivariate molecular spectral analyses.

    PubMed

    Yu, Peiqiang; Damiran, Daalkhaijav; Azarfar, Arash; Niu, Zhiyuan

    2011-01-01

    The objective of this study was to use DRIFT spectroscopy with uni- and multivariate molecular spectral analyses as a novel approach to detect molecular features of spectra mainly associated with carbohydrate in the co-products (wheat DDGS, corn DDGS, blend DDGS) from bioethanol processing in comparison with original feedstock (wheat (Triticum), corn (Zea mays)). The carbohydrates related molecular spectral bands included: A_Cell (structural carbohydrates, peaks area region and baseline: ca. 1485-1188 cm(-1)), A_1240 (structural carbohydrates, peak area centered at ca. 1240 cm(-1) with region and baseline: ca. 1292-1198 cm(-1)), A_CHO (total carbohydrates, peaks region and baseline: ca. 1187-950 cm(-1)), A_928 (non-structural carbohydrates, peak area centered at ca. 928 cm(-1) with region and baseline: ca. 952-910 cm(-1)), A_860 (non-structural carbohydrates, peak area centered at ca. 860 cm(-1) with region and baseline: ca. 880-827 cm(-1)), H_1415 (structural carbohydrate, peak height centered at ca. 1415 cm(-1) with baseline: ca. 1485-1188 cm(-1)), H_1370 (structural carbohydrate, peak height at ca. 1370 cm(-1) with a baseline: ca. 1485-1188 cm(-1)). The study shows that the grains had lower spectral intensity (KM Unit) of the cellulosic compounds of A_1240 (8.5 vs. 36.6, P < 0.05), higher (P < 0.05) intensities of the non-structural carbohydrate of A_928 (17.3 vs. 2.0) and A_860 (20.7 vs. 7.6) than their co-products from bioethanol processing. There were no differences (P > 0.05) in the peak area intensities of A_Cell (structural CHO) at 1292-1198 cm(-1) and A_CHO (total CHO) at 1187-950 cm(-1) with average molecular infrared intensity KM unit of 226.8 and 508.1, respectively. There were no differences (P > 0.05) in the peak height intensities of H_1415 and H_1370 (structural CHOs) with average intensities 1.35 and 1.15, respectively. The multivariate molecular spectral analyses were able to discriminate and classify between the corn and corn DDGS molecular

  19. A signal detection model predicts the effects of set size on visual search accuracy for feature, conjunction, triple conjunction, and disjunction displays

    NASA Technical Reports Server (NTRS)

    Eckstein, M. P.; Thomas, J. P.; Palmer, J.; Shimozaki, S. S.

    2000-01-01

    Recently, quantitative models based on signal detection theory have been successfully applied to the prediction of human accuracy in visual search for a target that differs from distractors along a single attribute (feature search). The present paper extends these models for visual search accuracy to multidimensional search displays in which the target differs from the distractors along more than one feature dimension (conjunction, disjunction, and triple conjunction displays). The model assumes that each element in the display elicits a noisy representation for each of the relevant feature dimensions. The observer combines the representations across feature dimensions to obtain a single decision variable, and the stimulus with the maximum value determines the response. The model accurately predicts human experimental data on visual search accuracy in conjunctions and disjunctions of contrast and orientation. The model accounts for performance degradation without resorting to a limited-capacity spatially localized and temporally serial mechanism by which to bind information across feature dimensions.
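
    The decision rule described in this abstract can be illustrated with a small Monte Carlo simulation. This is a minimal sketch, not the authors' model: it assumes unit-variance Gaussian noise, a plain sum as the rule for combining feature dimensions, distractors that differ from the target on every listed dimension, and illustrative d' values. The function name search_accuracy is hypothetical.

```python
import random


def search_accuracy(set_size, d_primes, trials=20000, seed=0):
    """Estimate proportion correct for a max-rule observer (set_size >= 2).

    d_primes: assumed target-distractor separation (d') on each feature
    dimension; these values are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # Target: noisy response on each dimension, summed into one
        # decision variable (the summation rule is an assumption).
        target = sum(rng.gauss(d, 1.0) for d in d_primes)
        # Distractors: zero-mean noisy responses, combined the same way.
        best_distractor = max(
            sum(rng.gauss(0.0, 1.0) for _ in d_primes)
            for _ in range(set_size - 1)
        )
        # Max rule: the response is correct when the target yields
        # the largest decision variable in the display.
        if target > best_distractor:
            correct += 1
    return correct / trials


small_display = search_accuracy(2, [1.0])
large_display = search_accuracy(8, [1.0])
```

    Under these assumptions accuracy falls as set size grows (small_display exceeds large_display) even though no capacity limit is built into the observer.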

  20. The solid-state terahertz spectrum of MDMA (Ecstasy) - A unique test for molecular modeling assignments

    NASA Astrophysics Data System (ADS)

    Allis, Damian G.; Hakey, Patrick M.; Korter, Timothy M.

    2008-10-01

    The terahertz (THz, far-infrared) spectrum of 3,4-methylene-dioxymethamphetamine hydrochloride (Ecstasy) is simulated using solid-state density functional theory. While a previously reported isolated-molecule calculation is noteworthy for the precision of its solid-state THz reproduction, the solid-state calculation predicts that the isolated-molecule modes account for only half of the spectral features in the THz region, with the remaining structure arising from lattice vibrations that cannot be predicted without solid-state molecular modeling. The molecular origins of the internal mode contributions to the solid-state THz spectrum, as well as the proper consideration of the protonation state of the molecule, are also considered.

  1. A Systematic Prediction of Drug-Target Interactions Using Molecular Fingerprints and Protein Sequences.

    PubMed

    Huang, Yu-An; You, Zhu-Hong; Chen, Xing

    2018-01-01

    Drug-Target Interactions (DTI) play a crucial role in discovering new drug candidates and finding new proteins to target for drug development. Although the number of detected DTI obtained by high-throughput techniques has been increasing, the number of known DTI is still limited. On the other hand, the experimental methods for detecting the interactions among drugs and proteins are costly and inefficient. Therefore, computational approaches for predicting DTI have drawn increasing attention in recent years. In this paper, we report a novel computational model for predicting DTI using an extremely randomized trees model and protein amino acid information. More specifically, the protein sequence is represented as a Pseudo Substitution Matrix Representation (Pseudo-SMR) descriptor in which the influence of biological evolutionary information is retained. For the representation of drug molecules, a novel fingerprint feature vector is utilized to describe their substructure information. Then the DTI pair is characterized by concatenating the two vector spaces of protein sequence and drug substructure. Finally, the proposed method is explored for predicting the DTI on four benchmark datasets: Enzyme, Ion Channel, GPCRs and Nuclear Receptor. The experimental results demonstrate that this method achieves promising prediction accuracies of 89.85%, 87.87%, 82.99% and 81.67%, respectively. For further evaluation, we compared the performance of the Extremely Randomized Trees model with that of the state-of-the-art Support Vector Machine classifier. We also compared the proposed model with existing computational models, and confirmed 15 potential drug-target interactions by searching existing databases. The experimental results show that the proposed method is feasible and promising for predicting drug-target interactions for new drug candidate screening based on sizeable features.

  2. Molecular Diagnosis and Biomarker Identification on SELDI proteomics data by ADTBoost method.

    PubMed

    Wang, Lu-Yong; Chakraborty, Amit; Comaniciu, Dorin

    2005-01-01

    Clinical proteomics is an emerging field that will have great impact on molecular diagnosis, identification of disease biomarkers, drug discovery and clinical trials in the post-genomic era. Protein profiling in tissues and fluids in disease and pathological control and other proteomics techniques will play an important role in molecular diagnosis with therapeutics and personalized healthcare. We introduced a new robust diagnostic method based on the ADTboost algorithm, a novel algorithm in proteomics data analysis to improve classification accuracy. It generates classification rules, which are often smaller and easier to interpret. This method often gives the most discriminative features, which can be utilized as biomarkers for diagnostic purposes. Also, it has the useful feature of providing a measure of prediction confidence. We applied this method to amyotrophic lateral sclerosis (ALS) data acquired by surface enhanced laser-desorption/ionization-time-of-flight mass spectrometry (SELDI-TOF MS) experiments. Our method is shown to have outstanding prediction capacity through cross-validation, ROC analysis results and a comparative study. Our molecular diagnosis method provides an efficient way to distinguish ALS disease from neurological controls. The results are expressed in a simple and straightforward alternating decision tree format or conditional format. We identified the most discriminative peaks in proteomic data, which can be utilized as biomarkers for diagnosis. It will have broad application in molecular diagnosis through proteomics data analysis and personalized medicine in this post-genomic era.

  3. Electronic coarse graining enhances the predictive power of molecular simulation allowing challenges in water physics to be addressed

    NASA Astrophysics Data System (ADS)

    Cipcigan, Flaviu S.; Sokhan, Vlad P.; Crain, Jason; Martyna, Glenn J.

    2016-12-01

    One key factor that limits the predictive power of molecular dynamics simulations is the accuracy and transferability of the input force field. Force fields are challenged by heterogeneous environments, where electronic responses give rise to biologically important forces such as many-body polarisation and dispersion. The importance of polarisation in the condensed phase was recognised early on, as described by Cochran in 1959 [Philosophical Magazine 4 (1959) 1082-1086] [32]. Currently in molecular simulation, dispersion forces are treated at the two-body level and in the dipole limit, although the importance of three-body terms in the condensed phase was demonstrated by Barker in the 1980s [Phys. Rev. Lett. 57 (1986) 230-233] [72]. One approach for treating both polarisation and dispersion on an equal basis is to coarse grain the electrons surrounding a molecular moiety to a single quantum harmonic oscillator (cf. Hirschfelder, Curtiss and Bird 1954 [The Molecular Theory of Gases and Liquids (1954)] [37]). The approach, when solved in strong coupling beyond the dipole limit, gives a description of long-range forces that includes two- and many-body terms to all orders. In the last decade, the tools necessary to implement the strong coupling limit have been developed, culminating in a transferable model of water with excellent predictive power across the phase diagram. Transferability arises since the environment automatically identifies the important long range interactions, rather than the modeller through a limited set of expressions. Here, we discuss the role of electronic coarse-graining in predictive multiscale materials modelling and describe the first implementation of the method in a general purpose molecular dynamics software: QDO_MD.

  4. Clinical and molecular features of human rhinovirus C

    PubMed Central

    Bochkov, Yury A.; Gern, James E.

    2012-01-01

    A newly discovered group of human rhinoviruses (HRVs) has been classified as the HRV-C species based on distinct genomic features. HRV-Cs circulate worldwide, and are important causes of upper and lower respiratory illnesses. Methods to culture and produce these viruses have recently been developed, and should enable identification of unique features of HRV-C replication and biology. PMID:22285901

  5. The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features.

    PubMed

    Cui, Zaixu; Gong, Gaolang

    2018-06-02

    Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranging from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of

  6. Resolving Transition Metal Chemical Space: Feature Selection for Machine Learning and Structure-Property Relationships.

    PubMed

    Janet, Jon Paul; Kulik, Heather J

    2017-11-22

    Machine learning (ML) of quantum mechanical properties shows promise for accelerating chemical discovery. For transition metal chemistry where accurate calculations are computationally costly and available training data sets are small, the molecular representation becomes a critical ingredient in ML model predictive accuracy. We introduce a series of revised autocorrelation functions (RACs) that encode relationships of the heuristic atomic properties (e.g., size, connectivity, and electronegativity) on a molecular graph. We alter the starting point, scope, and nature of the quantities evaluated in standard ACs to make these RACs amenable to inorganic chemistry. On an organic molecule set, we first demonstrate superior standard AC performance to other presently available topological descriptors for ML model training, with mean unsigned errors (MUEs) for atomization energies on set-aside test molecules as low as 6 kcal/mol. For inorganic chemistry, our RACs yield 1 kcal/mol ML MUEs on set-aside test molecules in spin-state splitting in comparison to 15-20× higher errors for feature sets that encode whole-molecule structural information. Systematic feature selection methods including univariate filtering, recursive feature elimination, and direct optimization (e.g., random forest and LASSO) are compared. Random-forest- or LASSO-selected subsets 4-5× smaller than the full RAC set produce sub- to 1 kcal/mol spin-splitting MUEs, with good transferability to metal-ligand bond length prediction (0.004-5 Å MUE) and redox potential on a smaller data set (0.2-0.3 eV MUE). Evaluation of feature selection results across property sets reveals the relative importance of local, electronic descriptors (e.g., electronegativity, atomic number) in spin-splitting and distal, steric effects in redox potential and bond lengths.
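
    The idea of an autocorrelation descriptor on a molecular graph can be sketched in a few lines. The code below computes a standard graph autocorrelation (the sum of property products over atom pairs at each bond-path distance), not the revised RACs introduced in the paper. The counting convention (each unordered pair once, self-pairs at depth 0), the function names, and the three-atom example are assumptions made for illustration.

```python
from collections import deque


def graph_distances(adjacency, start):
    """Breadth-first search: bond-path distance from start to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def autocorrelation(adjacency, prop, max_depth):
    """AC_d = sum of prop[i] * prop[j] over atom pairs at graph distance d."""
    ac = [0.0] * (max_depth + 1)
    for i in adjacency:
        for j, d in graph_distances(adjacency, i).items():
            # j >= i counts each unordered pair once and each atom
            # with itself once (depth 0).
            if j >= i and d <= max_depth:
                ac[d] += prop[i] * prop[j]
    return ac


# Hypothetical three-atom chain C-C-O with Pauling electronegativities.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
electronegativity = {0: 2.55, 1: 2.55, 2: 3.44}
ac = autocorrelation(adjacency, electronegativity, max_depth=2)
```

    Each entry of ac is one feature; repeating the calculation for several atomic properties gives a fixed-length vector regardless of molecule size.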

  7. LICRE: unsupervised feature correlation reduction for lipidomics.

    PubMed

    Wong, Gerard; Chan, Jeffrey; Kingwell, Bronwyn A; Leckie, Christopher; Meikle, Peter J

    2014-10-01

    Recent advances in high-throughput lipid profiling by liquid chromatography electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) have made it possible to quantify hundreds of individual molecular lipid species (e.g. fatty acyls, glycerolipids, glycerophospholipids, sphingolipids) in a single experimental run for hundreds of samples. This enables the lipidome of large cohorts of subjects to be profiled to identify lipid biomarkers significantly associated with disease risk, progression and treatment response. Clinically, these lipid biomarkers can be used to construct classification models for the purpose of disease screening or diagnosis. However, the inclusion of a large number of highly correlated biomarkers within a model may reduce classification performance, unnecessarily inflate associated costs of a diagnosis or a screen and reduce the feasibility of clinical translation. An unsupervised feature reduction approach can reduce feature redundancy in lipidomic biomarkers by limiting the number of highly correlated lipids while retaining informative features to achieve good classification performance for various clinical outcomes. Good predictive models based on a reduced number of biomarkers are also more cost effective and feasible from a clinical translation perspective. The application of LICRE to various lipidomic datasets in diabetes and cardiovascular disease demonstrated superior discrimination in terms of the area under the receiver operator characteristic curve while using fewer lipid markers when predicting various clinical outcomes. The MATLAB implementation of LICRE is available from http://ww2.cs.mu.oz.au/∼gwong/LICRE
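
    The general idea of unsupervised correlation reduction can be sketched as a greedy filter. The abstract does not specify LICRE's algorithm, so the code below is a generic stand-in, not LICRE itself: the 0.9 threshold, the first-come ordering, and the lipid names and values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    No class labels are used, so the filter is unsupervised.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical measurements for three lipid species across five samples.
lipids = {
    "lipid_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lipid_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to lipid_a
    "lipid_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(lipids)
```

    In this toy example lipid_b is discarded because it is almost perfectly correlated with lipid_a, while lipid_c is retained.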

  8. Molecular crosstalk between tumour and brain parenchyma instructs histopathological features in glioblastoma

    PubMed Central

    Bougnaud, Sébastien; Golebiewska, Anna; Oudin, Anaïs; Keunen, Olivier; Harter, Patrick N.; Mäder, Lisa; Azuaje, Francisco; Fritah, Sabrina; Stieber, Daniel; Kaoma, Tony; Vallar, Laurent; Brons, Nicolaas H.C.; Daubon, Thomas; Miletic, Hrvoje; Sundstrøm, Terje; Herold-Mende, Christel; Mittelbronn, Michel; Bjerkvig, Rolf; Niclou, Simone P.

    2016-01-01

    The histopathological and molecular heterogeneity of glioblastomas represents a major obstacle for effective therapies. Glioblastomas do not develop autonomously, but evolve in a unique environment that adapts to the growing tumour mass and contributes to the malignancy of these neoplasms. Here, we show that patient-derived glioblastoma xenografts generated in the mouse brain from organotypic spheroids reproducibly give rise to three different histological phenotypes: (i) a highly invasive phenotype with an apparent normal brain vasculature, (ii) a highly angiogenic phenotype displaying microvascular proliferation and necrosis and (iii) an intermediate phenotype combining features of invasion and vessel abnormalities. These phenotypic differences were visible during early phases of tumour development suggesting an early instructive role of tumour cells on the brain parenchyma. Conversely, we found that tumour-instructed stromal cells differentially influenced tumour cell proliferation and migration in vitro, indicating a reciprocal crosstalk between neoplastic and non-neoplastic cells. We did not detect any transdifferentiation of tumour cells into endothelial cells. Cell type-specific transcriptomic analysis of tumour and endothelial cells revealed a strong phenotype-specific molecular conversion between the two cell types, suggesting co-evolution of tumour and endothelial cells. Integrative bioinformatic analysis confirmed the reciprocal crosstalk between tumour and microenvironment and suggested a key role for TGFβ1 and extracellular matrix proteins as major interaction modules that shape glioblastoma progression. These data provide novel insight into tumour-host interactions and identify novel stroma-specific targets that may play a role in combinatorial treatment strategies against glioblastoma. PMID:27049916

  9. Search performance is better predicted by tileability than presence of a unique basic feature.

    PubMed

    Chang, Honghua; Rosenholtz, Ruth

    2016-08-01

    Traditional models of visual search such as feature integration theory (FIT; Treisman & Gelade, 1980), have suggested that a key factor determining task difficulty consists of whether or not the search target contains a "basic feature" not found in the other display items (distractors). Here we discriminate between such traditional models and our recent texture tiling model (TTM) of search (Rosenholtz, Huang, Raj, Balas, & Ilie, 2012b), by designing new experiments that directly pit these models against each other. Doing so is nontrivial, for two reasons. First, the visual representation in TTM is fully specified, and makes clear testable predictions, but its complexity makes getting intuitions difficult. Here we elucidate a rule of thumb for TTM, which enables us to easily design new and interesting search experiments. FIT, on the other hand, is somewhat ill-defined and hard to pin down. To get around this, rather than designing totally new search experiments, we start with five classic experiments that FIT already claims to explain: T among Ls, 2 among 5s, Q among Os, O among Qs, and an orientation/luminance-contrast conjunction search. We find that fairly subtle changes in these search tasks lead to significant changes in performance, in a direction predicted by TTM, providing definitive evidence in favor of the texture tiling model as opposed to traditional views of search.

  10. Molecular structure of bottlebrush polymers in melts

    PubMed Central

    Paturej, Jarosław; Sheiko, Sergei S.; Panyukov, Sergey; Rubinstein, Michael

    2016-01-01

    Bottlebrushes are fascinating macromolecules that display an intriguing combination of molecular and particulate features having vital implications in both living and synthetic systems, such as cartilage and ultrasoft elastomers. However, the progress in practical applications is impeded by the lack of knowledge about the hierarchic organization of both individual bottlebrushes and their assemblies. We delineate fundamental correlations between molecular architecture, mesoscopic conformation, and macroscopic properties of polymer melts. Numerical simulations corroborate theoretical predictions for the effect of grafting density and side-chain length on the dimensions and rigidity of bottlebrushes, which effectively behave as a melt of flexible filaments. These findings provide quantitative guidelines for the design of novel materials that allow architectural tuning of their properties in a broad range without changing chemical composition. PMID:28861466

  11. A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities

    NASA Astrophysics Data System (ADS)

    Vallières, M.; Freeman, C. R.; Skamene, S. R.; El Naqa, I.

    2015-07-01

    This study aims at developing a joint FDG-PET and MRI texture-based model for the early evaluation of lung metastasis risk in soft-tissue sarcomas (STSs). We investigate if the creation of new composite textures from the combination of FDG-PET and MR imaging information could better identify aggressive tumours. Towards this goal, a cohort of 51 patients with histologically proven STSs of the extremities was retrospectively evaluated. All patients had pre-treatment FDG-PET and MRI scans comprised of T1-weighted and T2-weighted fat-suppression sequences (T2FS). Nine non-texture features (SUV metrics and shape features) and forty-one texture features were extracted from the tumour region of separate (FDG-PET, T1 and T2FS) and fused (FDG-PET/T1 and FDG-PET/T2FS) scans. Volume fusion of the FDG-PET and MRI scans was implemented using the wavelet transform. The influence of six different extraction parameters on the predictive value of textures was investigated. The incorporation of features into multivariable models was performed using logistic regression. The multivariable modeling strategy involved imbalance-adjusted bootstrap resampling in the following four steps leading to final prediction model construction: (1) feature set reduction; (2) feature selection; (3) prediction performance estimation; and (4) computation of model coefficients. Univariate analysis showed that the isotropic voxel size at which texture features were extracted had the most impact on predictive value. In multivariable analysis, texture features extracted from fused scans significantly outperformed those from separate scans in terms of lung metastases prediction estimates. The best performance was obtained using a combination of four texture features extracted from FDG-PET/T1 and FDG-PET/T2FS scans. This model reached an area under the receiver-operating characteristic curve of 0.984 ± 0.002, a sensitivity of 0.955 ± 0.006, and a specificity of 0.926 ± 0.004 in bootstrapping

  12. Molecular features of colorectal polyps presenting Kudo's type II mucosal crypt pattern: are they based on the same mechanism of tumorigenesis?

    PubMed

    Shinmura, Kensuke; Konishi, Kazuo; Yamochi, Toshiko; Kubota, Yutaro; Yano, Yuichiro; Katagiri, Atsushi; Muramoto, Takashi; Kihara, Toshihiro; Tojo, Masayuki; Konda, Kenichi; Tagawa, Teppei; Yanagisawa, Fumito; Kogo, Mari; Makino, Reiko; Takimoto, Masafumi; Yoshida, Hitoshi

    2014-09-01

    The molecular features of serrated polyps (SPs) with hyperplastic crypt pattern, also called Kudo's type II observed by chromoendoscopy, were evaluated. The clinicopathological and molecular features of 114 SPs with a hyperplastic pit pattern detected under chromoendoscopy (five dysplastic SPs, 63 sessile serrated adenoma/polyps (SSA/Ps), 36 microvesicular hyperplastic polyps (MVHPs), and 10 goblet cell-rich hyperplastic polyps (GCHPs)) were examined. The frequency of KRAS and BRAF mutations and CpG island methylator phenotype (CIMP) were investigated. Dysplastic SPs and SSA/Ps were frequently located in the proximal colon compared to others (SSA/Ps vs. MVHPs or GCHPs, P < 0.0001). No significant difference was found in the frequency of BRAF mutation among SPs apart from GCHP (60 % for dysplastic SPs, 44 % for SSA/Ps, 47 % for MVHPs, and 0 % for GCHPs). The frequency of CIMP was higher in dysplastic SPs or SSA/Ps than in MVHPs or GCHPs (60 % for dysplastic SPs, 56 % for SSA/Ps, 32 % for MVHPs, and 10 % for GCHPs) (SSA/Ps vs. GCHP, P = 0.0068). When serrated neoplasias (SNs) and MVHPs were classified into proximal and distal lesions, the frequency of CIMP was significantly higher in the proximal compared to the distal SNs (64 % vs. 11 %, P = 0.0032). Finally, multivariate analysis showed that proximal location and BRAF mutation were significantly associated with an increased risk of CIMP. Distinct molecular features were observed between proximal and distal SPs with hyperplastic crypt pattern. Proximal MVHPs may develop more frequently through SSA/Ps to CIMP cancers than distal MVHPs.

  13. Clinicopathological Features to Predict Progression of IgA Nephropathy with Mild Proteinuria.

    PubMed

    Chen, Ding; Liu, Jian; Duan, Shuwei; Chen, Pu; Tang, Li; Zhang, Li; Feng, Zhe; Cai, Guangyan; Wu, Jie; Chen, Xiangmei

    2018-03-06

    In the past, little attention has been paid to patients with IgA nephropathy (IgAN) who had minimal proteinuria upon the onset. The aim of this study was to analyze the clinicopathological features and the prognostic factors in patients with IgA nephropathy. Data of patients that had their first renal biopsy in our hospital and were diagnosed with primary IgAN with proteinuria <1 g/d from January 1995 to December 2014 were retrospectively examined. Clinical records of the clinicopathological features, renal function, and proteinuria were collected and investigated. The factors affecting the renal function and proteinuria were analyzed by Cox regression. The predictive efficiencies of clinical and pathological models were evaluated by Harrell concordance index (C-index). A total of 506 patients with IgA nephropathy were included in this study. (1) Baseline proteinuria greater than 0.5 g/d was positively associated with Oxford M, S, and T lesions. eGFR less than 90 mL/min/1.73 m2 were positively associated with Oxford T. (2) In the follow-up with a median of 50 months, 82 patients (16.2%) achieved complete clinical remission (CCR), whereas 54 patients (10.6%) showed an increase in creatinine by more than 50% (not progressing to end-stage renal disease). The cumulative proportion of creatinine increased >50%, and the values obtained by life-table analysis in 10, 15, and 20 years were 15%, 21%, and 22%, respectively. Significant differences were found in baseline age, proteinuria, and Oxford T between the group of creatinine increase >50% and the CCR group. (4) Multivariate COX regression showed that baseline age and proteinuria > 0.5 g/d were independent risk factors of adverse outcome. C-index suggested that the clinical model was more effective than the pathological models in predicting endpoint events. (5) Effect of the mean value during the follow-up on adverse endpoint events: Multivariate COX regression found that the mean proteinuria during follow-up was an
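
    The Harrell concordance index used in this record to compare models can be computed directly from follow-up times, event indicators, and model risk scores. The following is a minimal sketch under simplifying assumptions: pairs with tied follow-up times are skipped, tied risk scores count as half-concordant, and the example data are invented.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had the event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Invented data: months of follow-up, event flag (1 = endpoint reached),
# and a model's predicted risk for four patients.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risk = [0.9, 0.4, 0.1, 0.6]
c_index = harrell_c_index(times, events, risk)
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.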

  14. Predicting DNA binding proteins using support vector machine with hybrid fractal features.

    PubMed

    Niu, Xiao-Hui; Hu, Xue-Hai; Shi, Feng; Xia, Jing-Bo

    2014-02-21

    DNA-binding proteins play a vitally important role in many biological processes. Prediction of DNA-binding proteins from amino acid sequence is a significant but not well resolved scientific problem. Chaos game representation (CGR) investigates the patterns hidden in protein sequences, and visually reveals previously unknown structure. Fractal dimensions (FD) are good tools to measure sizes of complex, highly irregular geometric objects. In order to extract the intrinsic correlation with DNA-binding property from protein sequences, CGR algorithm, fractal dimension and amino acid composition are applied to formulate the numerical features of protein samples in this paper. Seven groups of features are extracted, which can be computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test and Jackknife test. In the numerical experiments, the group combining amino acid composition and fractal dimension (a 21-dimensional vector) gives the best result, with an average accuracy of 81.82% and an average Matthews correlation coefficient (MCC) of 0.6017. The resulting predictor is also compared with the existing method DNA-Prot and shows better performance.
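
    Two quantities named in this abstract, the amino acid composition feature vector and the Matthews correlation coefficient, are simple to compute. The sketch below is illustrative only: it omits the chaos game representation and fractal dimension steps, and the example sequence and confusion-matrix counts are made up.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    return [c / total for c in counts]


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


composition = amino_acid_composition("MKKA")     # made-up sequence
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)  # made-up counts
```

    Appending a single fractal dimension value to the 20 composition values would give a 21-dimensional vector of the kind the abstract describes.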

  15. A systematic identification of species-specific protein succinylation sites using joint element features information.

    PubMed

    Hasan, Md Mehedi; Khatun, Mst Shamima; Mollah, Md Nurul Haque; Yong, Cao; Guo, Dianjing

    2017-01-01

    Lysine succinylation, an important type of protein posttranslational modification, plays significant roles in many cellular processes. Accurate identification of succinylation sites can facilitate our understanding of the molecular mechanism and potential roles of lysine succinylation. However, even in well-studied systems, a majority of the succinylation sites remain undetected because the traditional experimental approaches to succinylation site identification are often costly, time-consuming, and laborious. An in silico approach, on the other hand, is potentially an alternative strategy to predict succinylation substrates. In this paper, a novel computational predictor SuccinSite2.0 was developed for predicting generic and species-specific protein succinylation sites. This predictor takes the composition of profile-based amino acid and orthogonal binary features, which were used to train a random forest classifier. We demonstrated that the proposed SuccinSite2.0 predictor outperformed other currently existing implementations on a complementarily independent dataset. Furthermore, the important features that make visible contributions to species-specific and cross-species-specific prediction of protein succinylation site were analyzed. The proposed predictor is anticipated to be a useful computational resource for lysine succinylation site prediction. The integrated species-specific online tool of SuccinSite2.0 is publicly accessible.
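
    Orthogonal binary features are a one-hot encoding of the residues surrounding a candidate lysine. The sketch below shows the general technique only; the window half-width of 3, the all-zero encoding for positions beyond the sequence ends, and the example peptide are assumptions rather than SuccinSite2.0 settings.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def binary_encode_window(sequence, site, flank=3):
    """One-hot encode the window of 2 * flank + 1 residues centred on `site`.

    Positions outside the sequence are treated as gaps and encoded
    as 20 zeros (an assumed padding convention).
    """
    features = []
    for pos in range(site - flank, site + flank + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else "-"
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


# Hypothetical peptide with a lysine (K) at index 2.
encoded = binary_encode_window("MAKVL", site=2, flank=3)
```

    The resulting vector has 20 entries per window position and can be concatenated with other feature groups before training a classifier such as a random forest.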

  16. Melancholic depression prediction by identifying representative features in metabolic and microarray profiles with missing values.

    PubMed

    Nie, Zhi; Yang, Tao; Liu, Yashu; Li, Qingyang; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping

    2015-01-01

    Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed

  17. Exploring GPCR-Lipid Interactions by Molecular Dynamics Simulations: Excitements, Challenges, and the Way Forward.

    PubMed

    Sengupta, Durba; Prasanna, Xavier; Mohole, Madhura; Chattopadhyay, Amitabha

    2018-06-07

    G protein-coupled receptors (GPCRs) are seven transmembrane receptors that mediate a large number of cellular responses and are important drug targets. One of the current challenges in GPCR biology is to analyze the molecular signatures of receptor-lipid interactions and their subsequent effects on GPCR structure, organization, and function. Molecular dynamics simulation studies have been successful in predicting molecular determinants of receptor-lipid interactions. In particular, predicted cholesterol interaction sites appear to correspond well with experimentally determined binding sites and estimated time scales of association. In spite of several success stories, the methodologies in molecular dynamics simulations are still emerging. In this Feature Article, we provide a comprehensive overview of coarse-grain and atomistic molecular dynamics simulations of GPCR-lipid interaction in the context of experimental observations. In addition, we discuss the effect of secondary and tertiary structural constraints in coarse-grain simulations in the context of functional dynamics and structural plasticity of GPCRs. We envision that this comprehensive overview will help resolve differences in computational studies and provide a way forward.

  18. A Molecular docking study to predict enantioseparation of some chiral carboxylic acid derivatives by methyl-β-cyclodextrin

    NASA Astrophysics Data System (ADS)

    Nurhidayah, E. S.; Ivansyah, A. L.; Martoprawiro, M. A.; Zulfikar, M. A.

    2018-05-01

    A molecular docking study using molecular mechanics calculations with Arguslab was carried out to help predict the enantioseparation of chiral carboxylic acid derivatives (guest molecules) by heptakis-2,6-di-O-methyl-β-cyclodextrin (DIMEB) and heptakis-2,3,6-tri-O-methyl-β-cyclodextrin (TRIMEB) as host molecules. The small differences in the binding free energy values (ΔΔG) obtained from Arguslab did not indicate any significant enantioseparation. From the docking results, with DIMEB as the host molecule, the R-enantiomers of Etodolac, Fenoprofen, Indoprofen, Ketorolac, and Naproxen are predicted to elute before the S-enantiomers, whereas the S-enantiomers of Carprofen, Flurbiprofen, Ketoprofen, Pirprofen, Proglumide, Sulindac, Surprofen, and Zaltoprofen are predicted to elute before the R-enantiomers. With TRIMEB as the host molecule, the R-enantiomers of Carprofen, Flurbiprofen, Indoprofen, Ketoprofen, Naproxen, Pirprofen, and Surprofen are predicted to elute before the S-enantiomers, whereas the S-enantiomers of Etodolac, Fenoprofen, Ketorolac, Proglumide, Sulindac, and Zaltoprofen are predicted to elute before the R-enantiomers.

  19. SU-F-R-31: Identification of Robust Normal Lung CT Texture Features for the Prediction of Radiation-Induced Lung Disease

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choi, W; Riyahi, S; Lu, W

    Purpose: Normal lung CT texture features have been used for the prediction of radiation-induced lung disease (radiation pneumonitis and radiation fibrosis). For these features to be clinically useful, they need to be relatively invariant (robust) to tumor size and not correlated with normal lung volume. Methods: The free-breathing CTs of 14 lung SBRT patients were studied. Different sizes of GTVs were simulated with spheres placed at the upper lobe and lower lobe, respectively, in the normal lung (contralateral to tumor). Twenty-seven texture features (9 from intensity histogram, 8 from grey-level co-occurrence matrix [GLCM] and 10 from grey-level run-length matrix [GLRM]) were extracted from [normal lung-GTV]. To measure the variability of a feature F, the relative difference D = |Fref - Fsim| / Fref × 100% was calculated, where Fref was for the entire normal lung and Fsim was for [normal lung-GTV]. A feature was considered robust if the largest non-outlier (Q3 + 1.5 × IQR) D was less than 5%, and considered not correlated with normal lung volume when their Pearson correlation was lower than 0.50. Results: Only 11 features were robust. All first-order intensity-histogram features (mean, max, etc.) were robust, while most higher-order features (skewness, kurtosis, etc.) were not robust. Only two of the GLCM and four of the GLRM features were robust. Larger GTVs resulted in greater feature variation; this was particularly true for non-robust features. None of the robust features were correlated with normal lung volume, while three non-robust features showed high correlation. Excessive variations were observed in two low grey-level run features and were later identified to be from one patient with local lung disease (atelectasis) in the normal lung. There was no dependence on GTV location. Conclusion: We identified 11 robust normal lung CT texture features that can be further examined for the prediction of radiation-induced lung disease. Interestingly, low grey-level run features
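
    The robustness criterion described in record 19 can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names, the "inclusive" quartile method, the use of |Fref| in the denominator (so that negative-valued features still give a non-negative D), and the sample numbers in the usage note are all assumptions. The abstract specifies only the relative difference D, the Q3 + 1.5 × IQR outlier fence, and the 5% threshold.

```python
from statistics import quantiles


def relative_difference(f_ref, f_sim):
    """D = |Fref - Fsim| / Fref * 100, in percent.

    Assumption: abs(f_ref) is used in the denominator; f_ref must be nonzero.
    """
    return abs(f_ref - f_sim) / abs(f_ref) * 100.0


def largest_non_outlier(values):
    """Largest value that does not exceed the upper fence Q3 + 1.5 * IQR."""
    # Assumption: the quartile convention is not given in the abstract.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return max(v for v in values if v <= fence)


def is_robust(f_ref_values, f_sim_values, threshold=5.0):
    """A feature is called robust if its largest non-outlier D is below threshold (%)."""
    d = [relative_difference(r, s) for r, s in zip(f_ref_values, f_sim_values)]
    return largest_non_outlier(d) < threshold
```

    With made-up values, is_robust([100.0] * 5, [99.0, 101.0, 98.0, 102.0, 60.0]) treats the 40% deviation as an outlier and judges the feature on the remaining deviations of at most 2%.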

  20. CD147 expression predicts biochemical recurrence after prostatectomy independent of histologic and pathologic features.

    PubMed

    Bauman, Tyler M; Ewald, Jonathan A; Huang, Wei; Ricke, William A

    2015-07-25

    CD147 is an MMP-inducing protein often implicated in cancer progression. The purpose of this study was to investigate the expression of CD147 in prostate cancer (PCa) progression and the prognostic ability of CD147 in predicting biochemical recurrence after prostatectomy. Plasma membrane-localized CD147 protein expression was quantified in patient samples using immunohistochemistry and multispectral imaging, and expression was compared to clinico-pathological features (pathologic stage, Gleason score, tumor volume, preoperative PSA, lymph node status, surgical margins, biochemical recurrence status). CD147 specificity and expression were confirmed with immunoblotting of prostate cell lines, and CD147 mRNA expression was evaluated in public expression microarray datasets of patient prostate tumors. Expression of CD147 protein was significantly decreased in localized tumors (pT2; p = 0.02), aggressive PCa (≥pT3; p = 0.004), and metastases (p = 0.001) compared to benign prostatic tissue. Decreased CD147 was associated with advanced pathologic stage (p = 0.009) and high Gleason score (p = 0.02), and low CD147 expression predicted biochemical recurrence (HR 0.55; 95% CI 0.31-0.97; p = 0.04) independent of clinico-pathologic features. Immunoblot bands were detected at 44 kDa and 66 kDa, representing non-glycosylated and glycosylated forms of CD147 protein, and CD147 expression was lower in tumorigenic T10 cells than in non-tumorigenic BPH-1 cells (p = 0.02). Decreased CD147 mRNA expression was associated with increased Gleason score and pathologic stage in patient tumors but was not associated with recurrence status. Membrane-associated CD147 expression is significantly decreased in PCa compared to non-malignant prostate tissue and is associated with tumor progression, and low CD147 expression predicts biochemical recurrence after prostatectomy independent of pathologic stage, Gleason score, lymph node status, surgical margins, and tumor volume in multivariable

  1. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins.

    PubMed

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2014-01-01

    Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drug targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.

  2. A Bayesian network approach to predicting nest presence of the federally-threatened piping plover (Charadrius melodus) using barrier island features

    USGS Publications Warehouse

    Gieder, Katherina D.; Karpanty, Sarah M.; Fraser, James D.; Catlin, Daniel H.; Gutierrez, Benjamin T.; Plant, Nathaniel G.; Turecek, Aaron M.; Thieler, E. Robert

    2014-01-01

    Sea-level rise and human development pose significant threats to shorebirds, particularly for species that utilize barrier island habitat. The piping plover (Charadrius melodus) is a federally-listed shorebird that nests on barrier islands and rapidly responds to changes in its physical environment, making it an excellent species with which to model how shorebird species may respond to habitat change related to sea-level rise and human development. The uncertainty and complexity in predicting sea-level rise, the responses of barrier island habitats to sea-level rise, and the responses of species to sea-level rise and human development necessitate a modelling approach that can link species to the physical habitat features that will be altered by changes in sea level and human development. We used a Bayesian network framework to develop a model that links piping plover nest presence to the physical features of their nesting habitat on a barrier island that is impacted by sea-level rise and human development, using three years of data (1999, 2002, and 2008) from Assateague Island National Seashore in Maryland. Our model performance results showed that we were able to successfully predict nest presence given a wide range of physical conditions within the model’s dataset. We found that model predictions were more successful when the range of physical conditions included in model development was varied rather than when those physical conditions were narrow. We also found that all model predictions had fewer false negatives (nests predicted to be absent when they were actually present in the dataset) than false positives (nests predicted to be present when they were actually absent in the dataset), indicating that our model correctly predicted nest presence better than nest absence. These results indicated that our approach of using a Bayesian network to link specific physical features to nest presence will be useful for modelling impacts of sea-level rise- or human

  3. Molecular markers in pediatric neuro-oncology.

    PubMed

    Ichimura, Koichi; Nishikawa, Ryo; Matsutani, Masao

    2012-09-01

    Pediatric molecular neuro-oncology is a fast developing field. A multitude of molecular profiling studies in recent years has unveiled a number of genetic abnormalities unique to pediatric brain tumors. It has now become clear that brain tumors that arise in children have distinct pathogenesis and biology, compared with their adult counterparts, even for those with indistinguishable histopathology. Some of the molecular features are so specific to a particular type of tumors, such as the presence of the KIAA1549-BRAF fusion gene for pilocytic astrocytomas or SMARCB1 mutations for atypical teratoid/rhabdoid tumors, that they could practically serve as a diagnostic marker on their own. Expression profiling has resolved the existence of 4 molecular subgroups in medulloblastomas, which positively translated into improved prognostication for the patients. The currently available molecular markers, however, do not cover all tumors even within a single tumor entity. The molecular pathogenesis of a large number of pediatric brain tumors is still unaccounted for, and the hierarchy of tumors is likely to be more complex and intricate than currently acknowledged. One of the main tasks of future molecular analyses in pediatric neuro-oncology, including the ongoing genome sequencing efforts, is to elucidate the biological basis of those orphan tumors. The ultimate goal of molecular diagnostics is to accurately predict the clinical and biological behavior of any tumor by means of their molecular characteristics, which is hoped to eventually pave the way for individualized treatment.

  4. Molecular markers in pediatric neuro-oncology

    PubMed Central

    Ichimura, Koichi; Nishikawa, Ryo; Matsutani, Masao

    2012-01-01

    Pediatric molecular neuro-oncology is a fast developing field. A multitude of molecular profiling studies in recent years has unveiled a number of genetic abnormalities unique to pediatric brain tumors. It has now become clear that brain tumors that arise in children have distinct pathogenesis and biology, compared with their adult counterparts, even for those with indistinguishable histopathology. Some of the molecular features are so specific to a particular type of tumors, such as the presence of the KIAA1549-BRAF fusion gene for pilocytic astrocytomas or SMARCB1 mutations for atypical teratoid/rhabdoid tumors, that they could practically serve as a diagnostic marker on their own. Expression profiling has resolved the existence of 4 molecular subgroups in medulloblastomas, which positively translated into improved prognostication for the patients. The currently available molecular markers, however, do not cover all tumors even within a single tumor entity. The molecular pathogenesis of a large number of pediatric brain tumors is still unaccounted for, and the hierarchy of tumors is likely to be more complex and intricate than currently acknowledged. One of the main tasks of future molecular analyses in pediatric neuro-oncology, including the ongoing genome sequencing efforts, is to elucidate the biological basis of those orphan tumors. The ultimate goal of molecular diagnostics is to accurately predict the clinical and biological behavior of any tumor by means of their molecular characteristics, which is hoped to eventually pave the way for individualized treatment. PMID:23095836

  5. Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine.

    PubMed

    Meng, Jun; Liu, Dong; Sun, Chao; Luan, Yushi

    2014-12-30

    MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcription or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identification of these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far less than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs. A novel classification model based on a support vector machine (SVM) was trained to identify real and pseudo plant pre-miRNAs together with their miRNAs. An initial set of 152 novel features related to sequential structures was used to train the model. By applying feature selection, we obtained the best subset of 47 features for use with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method for the classification of plant pre-miRNAs. Using this method, 63 features were obtained for plant miRNA classification. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. This model achieved approximately 90% accuracy using plant datasets from nine plant species, including Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum

  6. Comprehensive analysis of CpG island methylator phenotype (CIMP)-high, -low, and -negative colorectal cancers based on protein marker expression and molecular features.

    PubMed

    Zlobec, Inti; Bihl, Michel; Foerster, Anja; Rufle, Alex; Lugli, Alessandro

    2011-11-01

    CpG island methylator phenotype (CIMP) is being investigated for its role in the molecular and prognostic classification of colorectal cancer patients but is also emerging as a factor with the potential to influence clinical decision-making. We report a comprehensive analysis of clinico-pathological and molecular features (KRAS, BRAF and microsatellite instability, MSI) as well as of selected tumour- and host-related protein markers characterizing CIMP-high (CIMP-H), -low, and -negative colorectal cancers. Immunohistochemical analysis for 48 protein markers and molecular analysis of CIMP (CIMP-H: ≥ 4/5 methylated genes), MSI (MSI-H: ≥ 2 instable genes), KRAS, and BRAF were performed on 337 colorectal cancers. Simple and multiple regression analysis and receiver operating characteristic (ROC) curve analysis were performed. CIMP-H was found in 24 cases (7.1%) and linked (p < 0.0001) to more proximal tumour location, BRAF mutation, MSI-H, MGMT methylation (p = 0.022), advanced pT classification (p = 0.03), mucinous histology (p = 0.069), and less frequent KRAS mutation (p = 0.067) compared to CIMP-low or -negative cases. Of the 48 protein markers, decreased levels of RKIP (p = 0.0056), EphB2 (p = 0.0045), CK20 (p = 0.002), and Cdx2 (p < 0.0001) and increased numbers of CD8+ intra-epithelial lymphocytes (p < 0.0001) were related to CIMP-H, independently of MSI status. In addition to the expected clinico-pathological and molecular associations, CIMP-H colorectal cancers are characterized by a loss of protein markers associated with differentiation, and metastasis suppression, and have increased CD8+ T-lymphocytes regardless of MSI status. In particular, Cdx2 loss seems to strongly predict CIMP-H in both microsatellite-stable (MSS) and MSI-H colorectal cancers. Cdx2 is proposed as a surrogate marker for CIMP-H. Copyright © 2011 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.

  7. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bossi, Flavia; Fan, Jue; Xiao, Jun

    Here, the molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. As a result, to identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.

  8. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction

    DOE PAGES

    Bossi, Flavia; Fan, Jue; Xiao, Jun; ...

    2017-06-26

    Here, the molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. As a result, to identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.

  9. Alcohols Reduce Lateral Membrane Pressures: Predictions from Molecular Theory

    PubMed Central

    Frischknecht, Amalie L.; Frink, Laura J. Douglas

    2006-01-01

    We explore the effects of alcohols on fluid lipid bilayers using a molecular theory with a coarse-grained model. We show that the trends predicted from the theory in the changes in area per lipid, alcohol concentration in the bilayer, and area compressibility modulus, as a function of alcohol chain length and of the alcohol concentration in the solvent far from the bilayer, follow those found experimentally. We then use the theory to study the effect of added alcohol on the lateral pressure profile across the membrane, and find that added alcohol reduces the surface tensions at both the headgroup/solvent and headgroup/tailgroup interfaces, as well as the lateral pressures in the headgroup and tailgroup regions. These changes in lateral pressures could affect the conformations of membrane proteins, providing a nonspecific mechanism for the biological effects of alcohols on cells. PMID:16980354

  10. Detecting Molecular Features of Spectra Mainly Associated with Structural and Non-Structural Carbohydrates in Co-Products from BioEthanol Production Using DRIFT with Uni- and Multivariate Molecular Spectral Analyses

    PubMed Central

    Yu, Peiqiang; Damiran, Daalkhaijav; Azarfar, Arash; Niu, Zhiyuan

    2011-01-01

    The objective of this study was to use DRIFT spectroscopy with uni- and multivariate molecular spectral analyses as a novel approach to detect molecular features of spectra mainly associated with carbohydrate in the co-products (wheat DDGS, corn DDGS, blend DDGS) from bioethanol processing in comparison with original feedstock (wheat (Triticum), corn (Zea mays)). The carbohydrates related molecular spectral bands included: A_Cell (structural carbohydrates, peaks area region and baseline: ca. 1485–1188 cm−1), A_1240 (structural carbohydrates, peak area centered at ca. 1240 cm−1 with region and baseline: ca. 1292–1198 cm−1), A_CHO (total carbohydrates, peaks region and baseline: ca. 1187–950 cm−1), A_928 (non-structural carbohydrates, peak area centered at ca. 928 cm−1 with region and baseline: ca. 952–910 cm−1), A_860 (non-structural carbohydrates, peak area centered at ca. 860 cm−1 with region and baseline: ca. 880–827 cm−1), H_1415 (structural carbohydrate, peak height centered at ca. 1415 cm−1 with baseline: ca. 1485–1188 cm−1), H_1370 (structural carbohydrate, peak height at ca. 1370 cm−1 with a baseline: ca. 1485–1188 cm−1). The study shows that the grains had lower spectral intensity (KM Unit) of the cellulosic compounds of A_1240 (8.5 vs. 36.6, P < 0.05), higher (P < 0.05) intensities of the non-structural carbohydrate of A_928 (17.3 vs. 2.0) and A_860 (20.7 vs. 7.6) than their co-products from bioethanol processing. There were no differences (P > 0.05) in the peak area intensities of A_Cell (structural CHO) at 1292–1198 cm−1 and A_CHO (total CHO) at 1187–950 cm−1 with average molecular infrared intensity KM unit of 226.8 and 508.1, respectively. There were no differences (P > 0.05) in the peak height intensities of H_1415 and H_1370 (structural CHOs) with average intensities 1.35 and 1.15, respectively. The multivariate molecular spectral analyses were able to discriminate and classify between the corn and corn

  11. SU-E-J-256: Predicting Metastasis-Free Survival of Rectal Cancer Patients Treated with Neoadjuvant Chemo-Radiotherapy by Data-Mining of CT Texture Features of Primary Lesions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhong, H; Wang, J; Shen, L

    Purpose: The purpose of this study is to investigate the relationship between computed tomographic (CT) texture features of primary lesions and metastasis-free survival for rectal cancer patients, and to develop a data-mining prediction model using texture features. Methods: A total of 220 rectal cancer patients treated with neoadjuvant chemo-radiotherapy (CRT) were enrolled in this study. All patients underwent CT scans before CRT. The primary lesions on the CT images were delineated by two experienced oncologists. The CT images were filtered by Laplacian of Gaussian (LoG) filters with different filter values (1.0–2.5: from fine to coarse). Both filtered and unfiltered images were analyzed using gray-level co-occurrence matrix (GLCM) texture analysis with different directions (transversal, sagittal, and coronal). In total, 270 texture features with different species, directions and filter values were extracted. Texture features were examined with Student’s t-test for selecting predictive features. Principal component analysis (PCA) was performed upon the selected features to reduce the feature collinearity. Artificial neural network (ANN) and logistic regression were applied to establish metastasis prediction models. Results: Forty-six of 220 patients developed metastasis with a follow-up time of more than 2 years. Sixty-seven texture features were significantly different in t-test (p<0.05) between patients with and without metastasis, and 12 of them were extremely significant (p<0.001). The area under the curve (AUC) of ANN was 0.72, and the concordance index (CI) of logistic regression was 0.71. The predictability of ANN was slightly better than that of logistic regression. Conclusion: CT texture features of primary lesions are related to metastasis-free survival of rectal cancer patients. Both ANN and logistic regression based models can be developed for prediction.
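
    For readers unfamiliar with the gray-level co-occurrence matrix used in record 11, the following minimal Python sketch counts co-occurring gray levels for a single horizontal offset and derives one classic texture feature (contrast). It is an assumption-laden illustration, not the study's pipeline: the function names and the toy 3 × 3 image are hypothetical, and the LoG filtering, the three anatomical directions, and the remaining texture features are not reproduced.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count pairs of gray levels (i, j) for pixels separated by the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def contrast(counts):
    """GLCM contrast: sum over (i, j) of p(i, j) * (i - j)^2, with p the normalized counts."""
    total = sum(map(sum, counts))
    return sum(
        n * (i - j) ** 2
        for i, row in enumerate(counts)
        for j, n in enumerate(row)
    ) / total


# Toy image with three gray levels (0, 1, 2); values are made up.
image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
counts = glcm(image, levels=3)
```

    Each texture feature in such a study is a scalar summary of a matrix like counts, computed per lesion and per offset direction.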

  12. Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

    PubMed

    Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo

    2016-02-01

    Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.
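
    The three mechanisms listed in record 12 can be illustrated with a generic reinforcement-learning sketch in Python. Everything concrete here is an assumption made for illustration: the delta-rule update, the softmax choice rule, the additive stickiness bonus kappa, the parameter values, and the feature names are standard textbook forms and are not taken from the paper, whose exact model equations are not given in the abstract.

```python
import math


def update_value(value, reward, alpha=0.2):
    """Delta-rule update: move the feature value toward the obtained reward."""
    return value + alpha * (reward - value)


def choice_probabilities(values, previous=None, beta=3.0, kappa=1.0):
    """Softmax over feature values, plus a value-independent bonus for the previous choice.

    values   -- dict mapping feature name to its current expected value
    previous -- feature selected on the previous trial (None on the first trial)
    beta     -- inverse temperature of the stochastic selection (assumed value)
    kappa    -- stickiness bonus, independent of value (assumed value)
    """
    scores = {
        f: beta * v + (kappa if f == previous else 0.0)
        for f, v in values.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {f: math.exp(s - top) for f, s in scores.items()}
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}
```

    With equal values for two features, the stickiness term alone tilts selection toward the previously chosen feature, which is how such a term can produce perseverative selection errors even when learned values are at asymptote.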

  13. Electronic coarse graining enhances the predictive power of molecular simulation allowing challenges in water physics to be addressed

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cipcigan, Flaviu S.; Sokhan, Vlad P.

    One key factor that limits the predictive power of molecular dynamics simulations is the accuracy and transferability of the input force field. Force fields are challenged by heterogeneous environments, where electronic responses give rise to biologically important forces such as many-body polarisation and dispersion. The importance of polarisation in the condensed phase was recognised early on, as described by Cochran in 1959 [Philosophical Magazine 4 (1959) 1082–1086] [32]. Currently in molecular simulation, dispersion forces are treated at the two-body level and in the dipole limit, although the importance of three-body terms in the condensed phase was demonstrated by Barker in the 1980s [Phys. Rev. Lett. 57 (1986) 230–233] [72]. One approach for treating both polarisation and dispersion on an equal basis is to coarse grain the electrons surrounding a molecular moiety to a single quantum harmonic oscillator (cf. Hirschfelder, Curtiss and Bird 1954 [The Molecular Theory of Gases and Liquids (1954)] [37]). The approach, when solved in strong coupling beyond the dipole limit, gives a description of long-range forces that includes two- and many-body terms to all orders. In the last decade, the tools necessary to implement the strong coupling limit have been developed, culminating in a transferable model of water with excellent predictive power across the phase diagram. Transferability arises since the environment automatically identifies the important long range interactions, rather than the modeler through a limited set of expressions. Here, we discuss the role of electronic coarse-graining in predictive multiscale materials modelling and describe the first implementation of the method in a general purpose molecular dynamics software: QDO-MD. - Highlights: • Electronic coarse graining unites many-body dispersion and polarisation beyond the dipole limit. • It consists of replacing the electrons of a molecule using a quantum harmonic oscillator

  14. Theoretical Characterization of the Spectral Density of the Water-Soluble Chlorophyll-Binding Protein from Combined Quantum Mechanics/Molecular Mechanics Molecular Dynamics Simulations.

    PubMed

    Rosnik, Andreana M; Curutchet, Carles

    2015-12-08

    Over the past decade, both experimentalists and theorists have worked to develop methods to describe pigment-protein coupling in photosynthetic light-harvesting complexes in order to understand the molecular basis of quantum coherence effects observed in photosynthesis. Here we present an improved strategy based on the combination of quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulations and excited-state calculations to predict the spectral density of electronic-vibrational coupling. We study the water-soluble chlorophyll-binding protein (WSCP) reconstituted with Chl a or Chl b pigments as the system of interest and compare our work with data obtained by Pieper and co-workers from differential fluorescence line-narrowing spectra (Pieper et al. J. Phys. Chem. B 2011, 115 (14), 4042-4052). Our results demonstrate that the use of QM/MM MD simulations where the nuclear positions are still propagated at the classical level leads to a striking improvement of the predicted spectral densities in the middle- and high-frequency regions, where they nearly reach quantitative accuracy. This demonstrates that the so-called "geometry mismatch" problem related to the use of low-quality structures in QM calculations, not the quantum features of the pigments' high-frequency motions, causes the failure of previous studies relying on similar protocols. Thus, this work paves the way toward quantitative predictions of pigment-protein coupling and the comprehension of quantum coherence effects in photosynthesis.

  15. Molecular Indicators of Stress-Induced Neuroinflammation in a Mouse Model Simulating Features of Post-Traumatic Stress Disorder (Open Access)

    DTIC Science & Technology

    2017-05-23

    Molecular indicators of stress-induced neuroinflammation in a mouse model simulating features of post-traumatic stress disorder... post-traumatic stress disorder (PTSD). The model involved exposure of an intruder (male C57BL/6) mouse to a resident aggressor (male SJL) mouse for 5...revealed that neurogenesis and synaptic plasticity pathways were activated during the early responses but were inhibited after the later post-trauma

  16. Comparing Mammography Abnormality Features and Genetic Variants in the Prediction of Breast Cancer in Women Recommended for Breast Biopsy

    PubMed Central

    Burnside, Elizabeth S.; Liu, Jie; Wu, Yirong; Onitilo, Adedayo A.; McCarty, Catherine; Page, C. David; Peissig, Peggy; Trentham-Dietz, Amy; Kitchner, Terrie; Fan, Jun; Yuan, Ming

    2015-01-01

    Rationale and Objectives The discovery of germline genetic variants associated with breast cancer has engendered interest in risk stratification for improved, targeted detection and diagnosis. However, there has yet to be a comparison of the predictive ability of these genetic variants with mammography abnormality descriptors. Materials and Methods Our IRB-approved, HIPAA-compliant study utilized a personalized medicine registry in which participants consented to provide a DNA sample and participate in longitudinal follow-up. In our retrospective, age-matched, case-controlled study of 373 cases and 395 controls who underwent breast biopsy, we collected risk factors selected a priori based on the literature including: demographic variables based on the Gail model, common germline genetic variants, and diagnostic mammography findings according to BI-RADS. We developed predictive models using logistic regression to determine the predictive ability of: 1) demographic variables, 2) 10 selected genetic variants, or 3) mammography BI-RADS features. We evaluated each model in turn by calculating a risk score for each patient using 10-fold cross validation; used this risk estimate to construct ROC curves; and compared the AUC of each using the DeLong method. Results The performance of the regression model using demographic risk factors was not statistically different from the model using genetic variants (p=0.9). The model using mammography features (AUC = 0.689) was superior to both the demographic model (AUC = .598; p<0.001) and the genetic model (AUC = .601; p<0.001). Conclusion BI-RADS features exceeded the ability of demographic and 10 selected germline genetic variants to predict breast cancer in women recommended for biopsy. PMID:26514439
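
    The abstract above compares models by the area under the ROC curve (AUC) computed from per-patient risk scores. As a rough illustration of what that number measures, the sketch below computes AUC through the rank-sum identity: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from the study; the study's cross-validation and DeLong comparison are not reproduced here.

```python
def auc_from_scores(scores, labels):
    """AUC as P(score of a random case > score of a random control).

    Ties between a case and a control count as one half.
    labels: 1 for cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores for illustration only (not data from the study).
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
is_case = [1, 1, 1, 0, 0, 0]
auc = auc_from_scores(risk, is_case)
```

    An AUC of 0.5 corresponds to scores that do not separate cases from controls at all, which is the baseline against which values such as those quoted in the abstract are read.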

  17. Molecular properties of food allergens.

    PubMed

    Breiteneder, Heimo; Mills, E N Clare

    2005-01-01

    Plant food allergens belong to a rather limited number of protein families and are also characterized by a number of biochemical and physicochemical properties, many of which are also shared by food allergens of animal origin. These include thermal stability and resistance to proteolysis, which are enhanced by an ability to bind ligands, such as metal ions, lipids, or steroids. Other types of lipid interaction, including membranes or other lipid structures, represent another feature that might promote the allergenic properties of certain food proteins. A structural feature clearly related to stability is intramolecular disulfide bonds alongside posttranslational modifications, such as N-glycosylation. Some plant food allergens, such as the cereal seed storage prolamins, are rheomorphic proteins with polypeptide chains that adopt an ensemble of secondary structures resembling unfolded or partially folded proteins. Other plant food allergens are characterized by the presence of repetitive structures, the ability to form oligomers, and the tendency to aggregate. A summary of our current knowledge regarding the molecular properties of food allergens is presented. Although we cannot as yet predict the allergenicity of a given food protein, understanding of the molecular properties that might predispose them to becoming allergens is an important first step and will undoubtedly contribute to the integrative allergenic risk assessment process being adopted by regulators.

  18. Computational prediction of kink properties of helices in membrane proteins

    NASA Astrophysics Data System (ADS)

    Mai, T.-L.; Chen, C.-M.

    2014-02-01

    We have combined molecular dynamics simulations and fold identification procedures to investigate the structure of 696 kinked and 120 unkinked transmembrane (TM) helices in the PDBTM database. The main aim of this study is to understand the formation of helical kinks by simulating their quasi-equilibrium heating processes, which might be relevant to the prediction of their structural features. The simulated structural features of these TM helices, including the position and the angle of helical kinks, were analyzed and compared with statistical data from PDBTM. From quasi-equilibrium heating processes of TM helices with four very different relaxation time constants, we found that these processes gave comparable predictions of the structural features of TM helices. Overall, 95 % of our best kink position predictions have an error of no more than two residues and 75 % of our best angle predictions have an error of less than 15°. Various structure assessments have been carried out to assess our predicted models of TM helices in PDBTM. Our results show that, in 696 predicted kinked helices, 70 % have a RMSD less than 2 Å, 71 % have a TM-score greater than 0.5, 69 % have a MaxSub score greater than 0.8, 60 % have a GDT-TS score greater than 85, and 58 % have a GDT-HA score greater than 70. For unkinked helices, our predicted models are also highly consistent with their crystal structure. These results provide strong support for our assumption that kink formation of TM helices in quasi-equilibrium heating processes is relevant to predicting the structure of TM helices.

  19. Molecular markers in bladder cancer: Novel research frontiers.

    PubMed

    Sanguedolce, Francesca; Cormio, Antonella; Bufo, Pantaleo; Carrieri, Giuseppe; Cormio, Luigi

    2015-01-01

    Bladder cancer (BC) is a heterogeneous disease encompassing distinct biologic features that lead to extremely different clinical behaviors. In the last 20 years, great efforts have been made to predict disease outcome and response to treatment by developing risk assessment calculators based on multiple standard clinical-pathological factors, as well as by testing several molecular markers. Unfortunately, risk assessment calculators alone fail to accurately assess a single patient's prognosis and response to different treatment options. Several molecular markers easily assessable by routine immunohistochemical techniques hold promise for becoming widely available and cost-effective tools for a more reliable risk assessment, but none have yet entered routine clinical practice. Current research is therefore moving towards (i) identifying novel molecular markers; (ii) testing old and new markers in homogeneous patients' populations receiving homogeneous treatments; (iii) generating a multimarker panel that could be easily, and thus routinely, used in clinical practice; (iv) developing novel risk assessment tools, possibly combining standard clinical-pathological factors with molecular markers. This review analyses the emerging body of literature concerning novel biomarkers, ranging from genetic changes to altered expression of a huge variety of molecules, potentially involved in BC outcome and response to treatment. Findings suggest that some of these indicators, such as serum circulating tumor cells and tissue mitochondrial DNA, seem to be easily assessable and provide reliable information. Other markers, such as the phosphoinositide-3-kinase (PI3K)/AKT (serine-threonine kinase)/mTOR (mammalian target of rapamycin) pathway and epigenetic changes in DNA methylation seem to not only have prognostic/predictive value but also, most importantly, represent valuable therapeutic targets. 
Finally, there is increasing evidence that the development of novel risk assessment tools

  20. The Relative Importance of Family History, Gender, Mode of Onset, and Age at Onset in Predicting Clinical Features of First-Episode Psychotic Disorders.

    PubMed

    Compton, Michael T; Berez, Chantal; Walker, Elaine F

    Family history of psychosis, gender, mode of onset, and age at onset are considered prognostic factors important to clinicians evaluating first-episode psychosis; yet, clinicians have little guidance as to how these four factors differentially predict early-course substance abuse, symptomatology, and functioning. We conducted a "head-to-head comparison" of these four factors regarding their associations with key clinical features at initial hospitalization. We also assessed potential interactions between gender and family history with regard to age at onset of psychosis and symptom severity. Consecutively admitted first-episode patients (n=334) were evaluated in two studies that rigorously assessed a number of early-course variables. Associations among variables of interest were examined using Pearson correlations, χ² tests, Student's t-tests, and 2×2 factorial analyses of variance. Substance (nicotine, alcohol, and cannabis) abuse and positive symptom severity were predicted only by male gender. Negative symptom severity and global functioning impairments were predicted by earlier age at onset of psychosis. General psychopathology symptom severity was predicted by both mode of onset and age at onset. Interaction effects were not observed with regard to gender and family history in predicting age at onset or symptom severity. The four prognostic features have differential associations with substance abuse, domains of symptom severity, and global functioning. Gender and age at onset of psychosis appear to be more predictive of clinical features at the time of initial evaluation (and thus presumably longer term outcomes) than the presence of a family history of psychosis and a more gradual mode of onset.

  1. Prediction of chromatographic relative retention time of polychlorinated biphenyls from the molecular electronegativity distance vector.

    PubMed

    Liu, Shu-Shen; Liu, Yan; Yin, Da-Qian; Wang, Xiao-Dong; Wang, Lian-Sheng

    2006-02-01

    Using the molecular electronegativity distance vector (MEDV) descriptors derived directly from the molecular topological structures, the gas chromatographic relative retention times (RRTs) of 209 polychlorinated biphenyls (PCBs) on the SE-54 stationary phase were predicted. A five-variable regression equation with a correlation coefficient of 0.9964 and a root mean square error of 0.0152 was developed. The descriptors included in the equation represent degree of chlorination (nCl), nonortho index (Ino), and interactions between three pairs of atom types, i.e., atom groups -C= and -C=, -C= and >C=, -C= and -Cl. It has been shown that the retention times of all 209 PCB congeners can be accurately predicted as long as there are more than 50 calibration compounds. In the same way, the MEDV descriptors are also used to develop five- or six-variable models of the RRTs of PCBs on 18 other stationary phases; the correlation coefficients in both the modeling stage and the LOO cross-validation step are not lower than 0.99, except for two models.

  2. [Prediction of the molecular response to pertubations from single cell measurements].

    PubMed

    Remacle, Françoise; Levine, Raphael D

    2014-12-01

    The response of protein signaling networks to perturbations is analysed from single cell measurements. This experimental approach allows the fluctuations in protein expression levels from cell to cell to be characterized. The analysis is based on an information theoretic approach grounded in thermodynamics, leading to a quantitative version of Le Chatelier's principle that allows the molecular response to be predicted. Two systems are investigated: human macrophages subjected to lipopolysaccharide challenge, analogous to the immune response against Gram-negative bacteria, and the response of the proteins involved in the mTOR signaling network of GBM cancer cells to changes in partial oxygen pressure. © 2014 médecine/sciences – Inserm.

  3. LC-IMS-MS Feature Finder

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2013-03-07

    LC-IMS-MS Feature Finder is a command line software application which searches for possible molecular ion signatures in multidimensional liquid chromatography, ion mobility spectrometry, and mass spectrometry data by clustering deisotoped peaks with similar monoisotopic mass values, charge states, elution times, and drift times. The software application includes an algorithm for detecting multiple conformations and co-eluting species in the ion mobility dimension. LC-IMS-MS Feature Finder is designed to create an output file with detected features that includes associated information about the detected features.
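
    The abstract above describes grouping deisotoped peaks that have similar monoisotopic mass, charge state, elution time, and drift time. The sketch below illustrates that general idea with a simple transitive (single-linkage) grouping. It is not the LC-IMS-MS Feature Finder algorithm: the Peak fields, the tolerance values, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mass: float     # monoisotopic mass (Da)
    charge: int     # charge state
    elution: float  # LC elution time (min)
    drift: float    # IMS drift time (ms)


def similar(a, b, ppm=20.0, elution_tol=0.5, drift_tol=1.0):
    """True if two peaks agree within the (hypothetical) tolerances."""
    mass_ok = abs(a.mass - b.mass) <= ppm * 1e-6 * max(a.mass, b.mass)
    return (mass_ok
            and a.charge == b.charge
            and abs(a.elution - b.elution) <= elution_tol
            and abs(a.drift - b.drift) <= drift_tol)


def cluster_peaks(peaks, **tolerances):
    """Group peaks transitively: a peak joins every cluster it touches,
    and the clusters it touches are merged into one."""
    clusters = []
    for p in peaks:
        linked, rest = [], []
        for c in clusters:
            hit = any(similar(p, q, **tolerances) for q in c)
            (linked if hit else rest).append(c)
        merged = [p] + [q for c in linked for q in c]
        clusters = rest + [merged]
    return clusters


# Hypothetical peaks: the third has the same mass and elution time as the
# first two but a very different drift time, so it stays a separate feature.
peaks = [
    Peak(1000.000, 2, 10.0, 20.0),
    Peak(1000.010, 2, 10.2, 20.4),
    Peak(1000.005, 2, 10.1, 35.0),
    Peak(1500.000, 3, 12.0, 25.0),
]
features = cluster_peaks(peaks)
```

    Keeping drift time as its own criterion is what lets peaks with matching mass and elution time remain distinct features, which is the kind of separation the abstract attributes to the ion mobility dimension.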

  4. Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naïve Bayes

    PubMed Central

    Lou, Wangchao; Wang, Xiaoqing; Chen, Fan; Chen, Yixiao; Jiang, Bo; Zhang, Hua

    2014-01-01

    Developing an efficient method for determination of the DNA-binding proteins, due to their vital roles in gene regulation, is becoming highly desired since it would be invaluable to advance our understanding of protein functions. In this study, we proposed a new method for the prediction of the DNA-binding proteins, by ranking features using random forest and performing wrapper-based feature selection using a forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 were performed by the proposed model trained on the entire PDB594 dataset and by five other existing methods (including iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader); the proposed DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. The independent tests performed by the proposed DBPPred on a large dataset consisting entirely of non-DNA-binding proteins and on two RNA-binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that the majority of the features selected by the proposed method differ statistically significantly in their mean values between the DNA-binding and the non-DNA-binding proteins.
All of the experimental results indicate that the proposed DBPPred

  5. Molecular features of colorectal polyps presenting Kudo’s type II mucosal crypt pattern: are they based on the same mechanism of tumorigenesis?

    PubMed Central

    Shinmura, Kensuke; Konishi, Kazuo; Yamochi, Toshiko; Kubota, Yutaro; Yano, Yuichiro; Katagiri, Atsushi; Muramoto, Takashi; Kihara, Toshihiro; Tojo, Masayuki; Konda, Kenichi; Tagawa, Teppei; Yanagisawa, Fumito; Kogo, Mari; Makino, Reiko; Takimoto, Masafumi; Yoshida, Hitoshi

    2014-01-01

    Background and study aims: The molecular features of serrated polyps (SPs) with hyperplastic crypt pattern, also called Kudo’s type II observed by chromoendoscopy, were evaluated. Methods: The clinicopathological and molecular features of 114 SPs with a hyperplastic pit pattern detected under chromoendoscopy (five dysplastic SPs, 63 sessile serrated adenoma/polyps (SSA/Ps), 36 microvesicular hyperplastic polyps (MVHPs), and 10 goblet cell-rich hyperplastic polyps (GCHPs)) were examined. The frequency of KRAS and BRAF mutations and CpG island methylator phenotype (CIMP) were investigated. Results: Dysplastic SPs and SSA/Ps were frequently located in the proximal colon compared to others (SSA/Ps vs. MVHPs or GCHPs, P < 0.0001). No significant difference was found in the frequency of BRAF mutation among SPs apart from GCHP (60 % for dysplastic SPs, 44 % for SSA/Ps, 47 % for MVHPs, and 0 % for GCHPs). The frequency of CIMP was higher in dysplastic SPs or SSA/Ps than in MVHPs or GCHPs (60 % for dysplastic SPs, 56 % for SSA/Ps, 32 % for MVHPs, and 10 % for GCHPs) (SSA/Ps vs. GCHP, P = 0.0068). When serrated neoplasias (SNs) and MVHPs were classified into proximal and distal lesions, the frequency of CIMP was significantly higher in the proximal compared to the distal SNs (64 % vs. 11 %, P = 0.0032). Finally, multivariate analysis showed that proximal location and BRAF mutation were significantly associated with an increased risk of CIMP. Conclusions: Distinct molecular features were observed between proximal and distal SPs with hyperplastic crypt pattern. Proximal MVHPs may develop more frequently through SSA/Ps to CIMP cancers than distal MVHPs. PMID:26134964

  6. System and methods for predicting transmembrane domains in membrane proteins and mining the genome for recognizing G-protein coupled receptors

    DOEpatents

    Trabanino, Rene J; Vaidehi, Nagarajan; Hall, Spencer E; Goddard, William A; Floriano, Wely

    2013-02-05

    The invention provides computer-implemented methods and apparatus implementing a hierarchical protocol using multiscale molecular dynamics and molecular modeling methods to predict the presence of transmembrane regions in proteins, such as G-Protein Coupled Receptors (GPCR), and protein structural models generated according to the protocol. The protocol features a coarse grain sampling method, such as hydrophobicity analysis, to provide a fast and accurate procedure for predicting transmembrane regions. Methods and apparatus of the invention are useful to screen protein or polynucleotide databases for encoded proteins with transmembrane regions, such as GPCRs.

  7. Lattice-free prediction of three-dimensional structure of programmed DNA assemblies

    PubMed Central

    Pan, Keyao; Kim, Do-Nyun; Zhang, Fei; Adendorff, Matthew R.; Yan, Hao; Bathe, Mark

    2014-01-01

    DNA can be programmed to self-assemble into high molecular weight 3D assemblies with precise nanometer-scale structural features. Although numerous sequence design strategies exist to realize these assemblies in solution, there is currently no computational framework to predict their 3D structures on the basis of programmed underlying multi-way junction topologies constrained by DNA duplexes. Here, we introduce such an approach and apply it to assemblies designed using the canonical immobile four-way junction. The procedure is used to predict the 3D structure of high molecular weight planar and spherical ring-like origami objects, a tile-based sheet-like ribbon, and a 3D crystalline tensegrity motif, in quantitative agreement with experiments. Our framework provides a new approach to predict programmed nucleic acid 3D structure on the basis of prescribed secondary structure motifs, with possible application to the design of such assemblies for use in biomolecular and materials science. PMID:25470497

  8. Application of Deep Learning in Automated Analysis of Molecular Images in Cancer: A Survey

    PubMed Central

    Xue, Yong; Chen, Shihui; Liu, Yong

    2017-01-01

    Molecular imaging enables the visualization and quantitative analysis of the alterations of biological procedures at molecular and/or cellular level, which is of great significance for early detection of cancer. In recent years, deep learning has been widely used in medical imaging analysis, as it overcomes the limitations of visual assessment and traditional machine learning techniques by extracting hierarchical features with powerful representation capability. Research on cancer molecular images using deep learning techniques is also increasing dynamically. Hence, in this paper, we review the applications of deep learning in molecular imaging in terms of tumor lesion segmentation, tumor classification, and survival prediction. We also outline some future directions in which researchers may develop more powerful deep learning models for better performance in the applications in cancer molecular imaging. PMID:29114182

  9. Star formation in evolving molecular clouds

    NASA Astrophysics Data System (ADS)

    Völschow, M.; Banerjee, R.; Körtgen, B.

    2017-09-01

    Molecular clouds are the principal stellar nurseries of our universe; they thus remain a focus of both observational and theoretical studies. From observations, some of the key properties of molecular clouds are well known but many questions regarding their evolution and star formation activity remain open. While numerical simulations feature a large number and complexity of involved physical processes, this plethora of effects may hide the fundamentals that determine the evolution of molecular clouds and enable the formation of stars. Purely analytical models, on the other hand, tend to suffer from rough approximations or a lack of completeness, limiting their predictive power. In this paper, we present a model that incorporates central concepts of astrophysics as well as reliable results from recent simulations of molecular clouds and their evolutionary paths. Based on that, we construct a self-consistent semi-analytical framework that describes the formation, evolution, and star formation activity of molecular clouds, including a number of feedback effects to account for the complex processes inside those objects. The final equation system is solved numerically but at much lower computational expense than, for example, hydrodynamical descriptions of comparable systems. The model presented in this paper agrees well with a broad range of observational results, showing that molecular cloud evolution can be understood as an interplay between accretion, global collapse, star formation, and stellar feedback.

  10. Computer-aided molecular modeling techniques for predicting the stability of drug cyclodextrin inclusion complexes in aqueous solutions

    NASA Astrophysics Data System (ADS)

    Faucci, Maria Teresa; Melani, Fabrizio; Mura, Paola

    2002-06-01

    Molecular modeling was used to investigate factors influencing complex formation between cyclodextrins and guest molecules and predict their stability through a theoretical model based on the search for a correlation between experimental stability constants (Ks) and some theoretical parameters describing complexation (docking energy, host-guest contact surfaces, intermolecular interaction fields) calculated from complex structures at a minimum conformational energy, obtained through stochastic methods based on molecular dynamic simulations. Naproxen, ibuprofen, ketoprofen and ibuproxam were used as model drug molecules. Multiple Regression Analysis allowed identification of the significant factors for the complex stability. A mathematical model (r = 0.897) related log Ks with complex docking energy and lipophilic molecular fields of cyclodextrin and drug.

  11. Prediction and validation of diffusion coefficients in a model drug delivery system using microsecond atomistic molecular dynamics simulation and vapour sorption analysis.

    PubMed

    Forrey, Christopher; Saylor, David M; Silverstein, Joshua S; Douglas, Jack F; Davis, Eric M; Elabd, Yossef A

    2014-10-14

    Diffusion of small to medium sized molecules in polymeric medical device materials underlies a broad range of public health concerns related to unintended leaching from or uptake into implantable medical devices. However, obtaining accurate diffusion coefficients for such systems at physiological temperature represents a formidable challenge, both experimentally and computationally. While molecular dynamics simulation has been used to accurately predict the diffusion coefficients, D, of a handful of gases in various polymers, this success has not been extended to molecules larger than gases, e.g., condensable vapours, liquids, and drugs. We present atomistic molecular dynamics simulation predictions of diffusion in a model drug eluting system that represent a dramatic improvement in accuracy compared to previous simulation predictions for comparable systems. We find that, for simulations of insufficient duration, sub-diffusive dynamics can lead to dramatic over-prediction of D. We present useful metrics for monitoring the extent of sub-diffusive dynamics and explore how these metrics correlate to error in D. We also identify a relationship between diffusion and fast dynamics in our system, which may serve as a means to more rapidly predict diffusion in slowly diffusing systems. Our work provides important precedent and essential insights for utilizing atomistic molecular dynamics simulations to predict diffusion coefficients of small to medium sized molecules in condensed soft matter systems.
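
    The abstract above notes that sub-diffusive dynamics in short simulations can lead to over-prediction of D. A generic way to see why, sketched below, uses the Einstein relation in three dimensions, MSD(t) = 6 D t, together with the log-log slope of the mean squared displacement (the anomalous exponent, equal to 1 for normal diffusion and below 1 for sub-diffusion). The function names and the synthetic MSD curve are assumptions for illustration; the abstract does not specify the metrics the authors actually used.

```python
import math


def anomalous_exponent(times, msd):
    """Least-squares slope of log(MSD) against log(t)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def apparent_diffusion(t, msd_value, dim=3):
    """Apparent D from a single MSD value, D = MSD / (2 * dim * t)."""
    return msd_value / (2 * dim * t)


# Synthetic sub-diffusive curve, MSD = 6 * t**0.5, in arbitrary units.
times = [1.0, 2.0, 4.0, 8.0, 16.0]
sub_msd = [6.0 * t ** 0.5 for t in times]

alpha = anomalous_exponent(times, sub_msd)
d_short = apparent_diffusion(times[0], sub_msd[0])    # estimate from a short run
d_long = apparent_diffusion(times[-1], sub_msd[-1])   # estimate from a longer run
```

    On this synthetic curve the apparent D keeps falling as the observation window grows, so an estimate taken before the exponent approaches 1 is too high, which matches the direction of the error described in the abstract.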

  12. Relational Network for Knowledge Discovery through Heterogeneous Biomedical and Clinical Features

    PubMed Central

    Chen, Huaidong; Chen, Wei; Liu, Chenglin; Zhang, Le; Su, Jing; Zhou, Xiaobo

    2016-01-01

    Biomedical big data, as a whole, covers numerous features, while each dataset specifically delineates part of them. “Full feature spectrum” knowledge discovery across heterogeneous data sources remains a major challenge. We developed a method called bootstrapping for unified feature association measurement (BUFAM) for pairwise association analysis, and relational dependency network (RDN) modeling for global module detection on features across breast cancer cohorts. Discovered knowledge was cross-validated using data from Wake Forest Baptist Medical Center’s electronic medical records and annotated with BioCarta signaling signatures. The clinical potential of the discovered modules was exhibited by stratifying patients for drug responses. A series of discovered associations provided new insights into breast cancer, such as the effects of patient’s cultural background on preferences for surgical procedure. We also discovered two groups of highly associated features, the HER2 and the ER modules, each of which described how phenotypes were associated with molecular signatures, diagnostic features, and clinical decisions. The discovered “ER module”, which was dominated by cancer immunity, was used as an example for patient stratification and prediction of drug responses to tamoxifen and chemotherapy. BUFAM-derived RDN modeling demonstrated unique ability to discover clinically meaningful and actionable knowledge across highly heterogeneous biomedical big data sets. PMID:27427091

  13. Relational Network for Knowledge Discovery through Heterogeneous Biomedical and Clinical Features

    NASA Astrophysics Data System (ADS)

    Chen, Huaidong; Chen, Wei; Liu, Chenglin; Zhang, Le; Su, Jing; Zhou, Xiaobo

    2016-07-01

    Biomedical big data, as a whole, covers numerous features, while each dataset specifically delineates part of them. “Full feature spectrum” knowledge discovery across heterogeneous data sources remains a major challenge. We developed a method called bootstrapping for unified feature association measurement (BUFAM) for pairwise association analysis, and relational dependency network (RDN) modeling for global module detection on features across breast cancer cohorts. Discovered knowledge was cross-validated using data from Wake Forest Baptist Medical Center’s electronic medical records and annotated with BioCarta signaling signatures. The clinical potential of the discovered modules was exhibited by stratifying patients for drug responses. A series of discovered associations provided new insights into breast cancer, such as the effects of patient’s cultural background on preferences for surgical procedure. We also discovered two groups of highly associated features, the HER2 and the ER modules, each of which described how phenotypes were associated with molecular signatures, diagnostic features, and clinical decisions. The discovered “ER module”, which was dominated by cancer immunity, was used as an example for patient stratification and prediction of drug responses to tamoxifen and chemotherapy. BUFAM-derived RDN modeling demonstrated unique ability to discover clinically meaningful and actionable knowledge across highly heterogeneous biomedical big data sets.

  14. deepNF: Deep network fusion for protein function prediction.

    PubMed

    Gligorijevic, Vladimir; Barot, Meet; Bonneau, Richard

    2018-06-01

    The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. deepNF is freely available at: https://github.com/VGligorijevic/deepNF. vgligorijevic@flatironinstitute.org, rb133@nyu.edu. Supplementary data are available at Bioinformatics online.

  15. De novo PHIP-predicted deleterious variants are associated with developmental delay, intellectual disability, obesity, and dysmorphic features.

    PubMed

    Webster, Emily; Cho, Megan T; Alexander, Nora; Desai, Sonal; Naidu, Sakkubai; Bekheirnia, Mir Reza; Lewis, Andrea; Retterer, Kyle; Juusola, Jane; Chung, Wendy K

    2016-11-01

    Using whole-exome sequencing, we have identified novel de novo heterozygous pleckstrin homology domain-interacting protein (PHIP) variants that are predicted to be deleterious, including a frameshift deletion, in two unrelated patients with common clinical features of developmental delay, intellectual disability, anxiety, hypotonia, poor balance, obesity, and dysmorphic features. A nonsense mutation in PHIP has previously been associated with similar clinical features. Patients with microdeletions of 6q14.1, including PHIP, have a similar phenotype of developmental delay, intellectual disability, hypotonia, and obesity, suggesting that the phenotype of our patients is a result of loss-of-function mutations. PHIP produces multiple protein products, such as PHIP1 (also known as DCAF14), PHIP, and NDRP. PHIP1 is one of the multiple substrate receptors of the proteolytic CUL4-DDB1 ubiquitin ligase complex. CUL4B deficiency has been associated with intellectual disability, central obesity, muscle wasting, and dysmorphic features. The overlapping phenotype associated with CUL4B deficiency suggests that PHIP mutations cause disease through disruption of the ubiquitin ligase pathway.

  16. Validation of Molecular Dynamics Simulations for Prediction of Three-Dimensional Structures of Small Proteins.

    PubMed

    Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi

    2017-10-12

    Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulation to predict protein structures, we simulated seven short test proteins (10-46 residues) starting from the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure after a 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean-square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and that 200-ns simulations are frequently sufficient for estimating the secondary structures of proteins (approximately 20 residues). Structural prediction methods using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.
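
    The root-mean-square deviation used above to compare predicted and experimental structures can be sketched as follows. This is a minimal illustration, assuming the two structures are already superimposed and list the same atoms in the same order; the abstract does not state which atoms or which alignment procedure were used, and the function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z).

    No superposition is performed here; the inputs are assumed to be aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms (hypothetical values).
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
deviation = rmsd(experimental, predicted)
```

    A rigid shift of every atom by 1 Å gives an RMSD of 1 Å, which is why real comparisons first superimpose the structures.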

  17. Characterization of the glass transition of water predicted by molecular dynamics simulations using nonpolarizable intermolecular potentials.

    PubMed

    Kreck, Cara A; Mancera, Ricardo L

    2014-02-20

    Molecular dynamics simulations allow detailed study of the experimentally inaccessible liquid state of supercooled water below its homogeneous nucleation temperature and the characterization of the glass transition. Simple, nonpolarizable intermolecular potentials are commonly used in classical molecular dynamics simulations of water and aqueous systems due to their lower computational cost and their ability to reproduce a wide range of properties. Because the quality of these predictions varies between the potentials, the predicted glass transition of water is likely to be influenced by the choice of potential. We have thus conducted an extensive comparative investigation of various three-, four-, five-, and six-point water potentials in both the NPT and NVT ensembles. The T(g) predicted from NPT simulations is strongly correlated with the temperature of minimum density, whereas the maximum in the heat capacity plot corresponds to the minimum in the thermal expansion coefficient. In the NVT ensemble, these points are instead related to the maximum in the internal pressure and the minimum of its derivative, respectively. A detailed analysis of the hydrogen-bonding properties at the glass transition reveals that the extent of hydrogen-bonds lost upon the melting of the glassy state is related to the height of the heat capacity peak and varies between water potentials.

  18. Statistical modelling coupled with LC-MS analysis to predict human upper intestinal absorption of phytochemical mixtures.

    PubMed

    Selby-Pham, Sophie N B; Howell, Kate S; Dunshea, Frank R; Ludbey, Joel; Lutz, Adrian; Bennett, Louise

    2018-04-15

    A diet rich in phytochemicals confers benefits for health by reducing the risk of chronic diseases via regulation of oxidative stress and inflammation (OSI). For optimal protective bio-efficacy, the time required for phytochemicals and their metabolites to reach maximal plasma concentrations (Tmax) should be synchronised with the time of increased OSI. A statistical model has been reported to predict Tmax of individual phytochemicals based on molecular mass and lipophilicity. We report the application of the model for predicting the absorption profile of an uncharacterised phytochemical mixture, herein referred to as the 'functional fingerprint'. First, chemical profiles of phytochemical extracts were acquired using liquid chromatography mass spectrometry (LC-MS), then the molecular features for respective components were used to predict their plasma absorption maximum, based on molecular mass and lipophilicity. This method of 'functional fingerprinting' of plant extracts represents a novel tool for understanding and optimising the health efficacy of plant extracts. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Prediction of response to neoadjuvant chemotherapy in breast cancer: a radiomic study

    NASA Astrophysics Data System (ADS)

    Wu, Guolin; Fan, Ming; Zhang, Juan; Zheng, Bin; Li, Lihua

    2017-03-01

    Breast cancer is one of the most common malignancies among women worldwide. Neoadjuvant chemotherapy (NACT) has gained interest and is increasingly used in the treatment of breast cancer in recent years. Therefore, it is necessary to find a reliable non-invasive assessment and prediction method that can evaluate and predict the response to NACT. Recent studies have highlighted the use of MRI for predicting response to NACT. In addition, molecular subtype could also effectively identify patients who are likely to have a better prognosis in breast cancer. In this study, a radiomic analysis was performed by extracting features from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and using immunohistochemistry (IHC) to determine subtypes. A dataset of fifty-seven breast cancer patients was included, all of whom received a preoperative MRI examination. Among them, 47 patients had complete response (CR) or partial response (PR) and 10 had stable disease (SD) to chemotherapy based on the RECIST criterion. A total of 216 imaging features including statistical characteristics, morphology, texture and dynamic enhancement were extracted from DCE-MRI. In multivariate analysis, the proposed imaging predictors achieved an AUC of 0.923 (P = 0.0002) in leave-one-out cross-validation. The performance of the classifier increased to 0.960, 0.950 and 0.936 when the status of HER2, Luminal A and Luminal B subtypes was added into the statistical model, respectively. The results of this study demonstrated that IHC-determined molecular status combined with radiomic features from DCE-MRI could be used as a clinical marker that is associated with response to NACT.

  20. Molecular biomarkers for chronological age in animal ecology.

    PubMed

    Jarman, Simon N; Polanowski, Andrea M; Faux, Cassandra E; Robbins, Jooke; De Paoli-Iseppi, Ricardo; Bravington, Mark; Deagle, Bruce E

    2015-10-01

    The chronological age of an individual animal predicts many of its biological characteristics, and these in turn influence population-level ecological processes. Animal age information can therefore be valuable in ecological research, but many species have no external features that allow age to be reliably determined. Molecular age biomarkers provide a potential solution to this problem. Research in this area of molecular ecology has so far focused on a limited range of age biomarkers. The most commonly tested molecular age biomarker is change in average telomere length, which predicts age well in a small number of species and tissues, but performs poorly in many other situations. Epigenetic regulation of gene expression has recently been shown to cause age-related modifications to DNA and to cause changes in abundance of several RNA types throughout animal lifespans. Age biomarkers based on these epigenetic changes, and other new DNA-based assays, have already been applied to model organisms, humans and a limited number of wild animals. There is clear potential to apply these marker types more widely in ecological studies. For many species, these new approaches will produce age estimates where this was previously impractical. They will also enable age information to be gathered in cross-sectional studies and expand the range of demographic characteristics that can be quantified with molecular methods. We describe the range of molecular age biomarkers that have been investigated to date and suggest approaches for developing the newer marker types as age assays in nonmodel animal species. © 2015 John Wiley & Sons Ltd.

  1. UManSysProp: an online facility for molecular property prediction and atmospheric aerosol calculations

    NASA Astrophysics Data System (ADS)

    Topping, D.; Barley, M. H.; Bane, M.; Higham, N.; Aumont, B.; McFiggans, G.

    2015-11-01

    In this paper we describe the development and application of a new web based facility, UManSysProp (http://umansysprop.seaes.manchester.ac.uk), for automating predictions of molecular and atmospheric aerosol properties. Current facilities include: pure component vapour pressures, critical properties and sub-cooled densities of organic molecules; activity coefficient predictions for mixed inorganic-organic liquid systems; hygroscopic growth factors and CCN activation potential of mixed inorganic/organic aerosol particles; absorptive partitioning calculations with/without a treatment of non-ideality. The aim of this new facility is to provide a single point of reference for all properties relevant to atmospheric aerosol that have been checked for applicability to atmospheric compounds where possible. The group contribution approach allows users to upload molecular information in the form of SMILES strings and UManSysProp will automatically extract the relevant information for calculations. Built using open source chemical informatics, and hosted at the University of Manchester, the facilities are provided via a browser and device-friendly web-interface, or can be accessed using the user's own code via a JSON API. In this paper we demonstrate its use with specific examples that can be simulated using the web-browser interface.

  2. Predicting bioconcentration of chemicals into vegetation from soil or air using the molecular connectivity index

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dowdy, D.L.; McKone, T.E.; Hsieh, D.P.H.

    1995-12-31

    Bioconcentration factors (BCFs) are the ratio of chemical concentration found in an exposed organism (in this case a plant) to the concentration in an air or soil exposure medium. The authors examine here the use of molecular connectivity indices (MCIs) as quantitative structure-activity relationships (QSARs) for predicting BCFs for organic chemicals between plants and air or soil. The authors compare the reliability of the octanol-air partition coefficient (K_oa) to the MC-based prediction method for predicting plant/air partition coefficients. The authors also compare the reliability of the octanol/water partition coefficient (K_ow) to the MC-based prediction method for predicting plant/soil partition coefficients. The results here indicate that, relative to the use of K_ow or K_oa as predictors of BCFs, the MC can substantially increase the reliability with which BCFs can be estimated. The authors find that the MC provides a relatively precise and accurate method for predicting the potential biotransfer of a chemical from environmental media into plants. In addition, the MC is much faster and more cost effective than direct measurements.
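
    The definition of the bioconcentration factor given in this abstract is a simple ratio, shown here as a worked example. The function name and the concentration values are hypothetical, and both concentrations are assumed to be expressed in the same units.

```python
def bioconcentration_factor(conc_in_plant, conc_in_medium):
    """BCF = concentration in the exposed organism / concentration in the medium."""
    if conc_in_medium <= 0:
        raise ValueError("medium concentration must be positive")
    return conc_in_plant / conc_in_medium


# Hypothetical measurements in the same units (for example mg/kg).
bcf = bioconcentration_factor(conc_in_plant=12.0, conc_in_medium=3.0)
```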

  3. Geopositioning with a quadcopter: Extracted feature locations and predicted accuracy without a priori sensor attitude information

    NASA Astrophysics Data System (ADS)

    Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron

    2017-05-01

    This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.

  4. Predictive capabilities of statistical learning methods for lung nodule malignancy classification using diagnostic image features: an investigation using the Lung Image Database Consortium dataset

    NASA Astrophysics Data System (ADS)

    Hancock, Matthew C.; Magnan, Jerry F.

    2017-03-01

    To determine the potential usefulness of quantified diagnostic image features as inputs to a CAD system, we investigate the predictive capabilities of statistical learning methods for classifying nodule malignancy, utilizing the Lung Image Database Consortium (LIDC) dataset, and only employ the radiologist-assigned diagnostic feature values for the lung nodules therein, as well as our derived estimates of the diameter and volume of the nodules from the radiologists' annotations. We calculate theoretical upper bounds on the classification accuracy that is achievable by an ideal classifier that only uses the radiologist-assigned feature values, and we obtain an accuracy of 85.74 (+/-1.14)% which is, on average, 4.43% below the theoretical maximum of 90.17%. The corresponding area-under-the-curve (AUC) score is 0.932 (+/-0.012), which increases to 0.949 (+/-0.007) when diameter and volume features are included, along with the accuracy to 88.08 (+/-1.11)%. Our results are comparable to those in the literature that use algorithmically-derived image-based features, which supports our hypothesis that lung nodules can be classified as malignant or benign using only quantified, diagnostic image features, and indicates the competitiveness of this approach. We also analyze how the classification accuracy depends on specific features, and feature subsets, and we rank the features according to their predictive power, statistically demonstrating the top four to be spiculation, lobulation, subtlety, and calcification.

  5. The Value of 5-Aminolevulinic Acid in Low-grade Gliomas and High-grade Gliomas Lacking Glioblastoma Imaging Features: An Analysis Based on Fluorescence, Magnetic Resonance Imaging, 18F-Fluoroethyl Tyrosine Positron Emission Tomography, and Tumor Molecular Factors.

    PubMed

    Jaber, Mohammed; Wölfer, Johannes; Ewelt, Christian; Holling, Markus; Hasselblatt, Martin; Niederstadt, Thomas; Zoubi, Tarek; Weckesser, Matthias; Stummer, Walter

    2016-03-01

    Approximately 20% of grade II and most grade III gliomas fluoresce after 5-aminolevulinic acid (5-ALA) application. Conversely, approximately 30% of nonenhancing gliomas are actually high grade. The aim of this study was to identify preoperative factors (ie, age, enhancement, 18F-fluoroethyl tyrosine positron emission tomography [F-FET PET] uptake ratios) for predicting fluorescence in gliomas without typical glioblastoma imaging features and to determine whether fluorescence will allow prediction of tumor grade or molecular characteristics. Patients harboring gliomas without typical glioblastoma imaging features were given 5-ALA. Fluorescence was recorded intraoperatively, and biopsy specimens were collected from fluorescing tissue. World Health Organization (WHO) grade, Ki-67/MIB-1 index, IDH1 (R132H) mutation status, O-methylguanine DNA methyltransferase (MGMT) promoter methylation status, and 1p/19q co-deletion status were assessed. Predictive factors for fluorescence were derived from preoperative magnetic resonance imaging and F-FET PET. Classification and regression tree analysis and receiver-operating-characteristic curves were generated for defining predictors. Of 166 tumors, 82 were diagnosed as WHO grade II, 76 as grade III, and 8 as grade IV glioblastomas. Contrast enhancement, tumor volume, and F-FET PET uptake ratio >1.85 predicted fluorescence. Fluorescence correlated with WHO grade (P < .001) and Ki-67/MIB-1 index (P < .001), but not with MGMT promoter methylation status, IDH1 mutation status, or 1p/19q co-deletion status. The Ki-67/MIB-1 index in fluorescing grade III gliomas was higher than in nonfluorescing tumors, whereas in fluorescing and nonfluorescing grade II tumors, no differences were noted. Age, tumor volume, and F-FET PET uptake are factors predicting 5-ALA-induced fluorescence in gliomas without typical glioblastoma imaging features. Fluorescence was associated with an increased Ki-67/MIB-1 index and high-grade pathology. Whether

  6. Wetland features and landscape context predict the risk of wetland habitat loss.

    PubMed

    Gutzwiller, Kevin J; Flather, Curtis H

    2011-04-01

    Wetlands generally provide significant ecosystem services and function as important harbors of biodiversity. To ensure that these habitats are conserved, an efficient means of identifying wetlands at risk of conversion is needed, especially in the southern United States where the rate of wetland loss has been highest in recent decades. We used multivariate adaptive regression splines to develop a model to predict the risk of wetland habitat loss as a function of wetland features and landscape context. Fates of wetland habitats from 1992 to 1997 were obtained from the National Resources Inventory for the U.S. Forest Service's Southern Region, and land-cover data were obtained from the National Land Cover Data. We randomly selected 70% of our 40 617 observations to build the model (n = 28 432), and randomly divided the remaining 30% of the data into five Test data sets (n = 2437 each). The wetland and landscape variables that were important in the model, and their relative contributions to the model's predictive ability (100 = largest, 0 = smallest), were land-cover/land-use of the surrounding landscape (100.0), size and proximity of development patches within 570 m (39.5), land ownership (39.1), road density within 570 m (37.5), percent woody and herbaceous wetland cover within 570 m (27.8), size and proximity of development patches within 5130 m (25.7), percent grasslands/herbaceous plants and pasture/hay cover within 5130 m (21.7), wetland type (21.2), and percent woody and herbaceous wetland cover within 1710 m (16.6). For the five Test data sets, Kappa statistics (0.40, 0.50, 0.52, 0.55, 0.56; P < 0.0001), area-under-the-receiver-operating-curve (AUC) statistics (0.78, 0.82, 0.83, 0.83, 0.84; P < 0.0001), and percent correct prediction of wetland habitat loss (69.1, 80.4, 81.7, 82.3, 83.1) indicated the model generally had substantial predictive ability across the South. Policy analysts and land-use planners can use the model and associated maps to prioritize
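
    The data partition reported in this abstract (70% of 40 617 observations for model building, the remainder divided into five Test data sets) can be reproduced arithmetically with a short sketch. The function name, the seed, and the choice to round the training size to the nearest integer are assumptions; the abstract reports only the resulting counts.

```python
import random


def split_observations(n_total, train_fraction, n_test_sets, seed=0):
    """Shuffle indices, take a training share, and split the rest evenly."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_train = round(n_total * train_fraction)  # rounding rule is an assumption
    train, rest = indices[:n_train], indices[n_train:]
    size = len(rest) // n_test_sets
    tests = [rest[i * size:(i + 1) * size] for i in range(n_test_sets)]
    return train, tests


train, tests = split_observations(40617, 0.70, 5)
train_size = len(train)                 # model-building observations
test_sizes = [len(t) for t in tests]    # five equally sized Test data sets
```

    With these inputs the training set holds 28 432 observations and each of the five Test data sets holds 2437, matching the counts in the abstract (28 432 + 5 × 2437 = 40 617).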

  7. Dynamic features of carboxy cytoglobin distal mutants investigated by molecular dynamics simulations.

    PubMed

    Zhao, Cong; Du, Weihong

    2016-04-01

    Cytoglobin (Cgb) is a member of the hemoprotein family with roles in NO metabolism, fibrosis, and tumourigenesis. Similar to other hemoproteins, Cgb structure and functions are markedly influenced by distal key residues. The sixth ligand His(81) (E7) is crucial to exogenous ligand binding, heme pocket conformation, and physiological roles of this protein. However, the effects of other key residues on the heme pocket and protein biological functions are not well known. In this work, a molecular dynamics (MD) simulation study of two single mutants in CO-ligated Cgb (L46FCgbCO and L46VCgbCO) and two double mutants (L46FH81QCgbCO and L46VH81QCgbCO) was conducted to explore the effects of the key distal residues Leu(46) (B10) and His(81) (E7) on Cgb structure and functions. Results indicated that the distal mutation of B10 and E7 affected CgbCO dynamic properties on loop region fluctuation, internal cavity rearrangement, and heme motion. The distal conformation change was reflected by the distal key residues Gln(62) (CD3) and Arg(84) (E10). The hydrogen bonds between heme propionates and CD3 or E10 residues were evidently influenced by B10/E7 mutation. Furthermore, heme pocket rearrangement was also observed based on the distal pocket volume and occurrence rate of inner cavities. The mutual effects of B10 and E7 residues on protein conformational rearrangement and other dynamic features were expressed in current MD studies of CgbCO and its distal mutants, suggesting their crucial role in heme pocket stabilization, ligand binding, and Cgb biological functions. The mutation of distal B10 and E7 residues affects the dynamic features of carboxy cytoglobin.

  8. Analysis and prediction of presynaptic and postsynaptic neurotoxins by Chou's general pseudo amino acid composition and motif features.

    PubMed

    Mei, Juan; Zhao, Ji

    2018-06-14

    Presynaptic neurotoxins and postsynaptic neurotoxins are two important classes of neurotoxins isolated from the venoms of venomous animals and have proven to be potentially effective in neuroscience and pharmacology. With the number of toxin sequences that have appeared in the public databases, there is a need for a computational method for fast and accurate identification and classification of novel presynaptic and postsynaptic neurotoxins in large databases. In this study, a Multinomial Naive Bayes Classifier (MNBC) was developed to discriminate presynaptic neurotoxins from postsynaptic neurotoxins based on different kinds of features. The Minimum Redundancy Maximum Relevance (MRMR) feature selection method was used for ranking 400 pseudo amino acid (PseAA) compositions, and the 50 top-ranked PseAA compositions were selected for improving the prediction results. The motif features, 400 PseAA compositions and 50 PseAA compositions were combined and selected as the input parameters of MNBC. The best correlation coefficient (CC) value of 0.8213 was obtained when the prediction quality was evaluated by the jackknife test. It is anticipated that the algorithm presented in this study may become a useful tool for identification of presynaptic and postsynaptic neurotoxin sequences and may help in-depth investigation into the biological mechanism of presynaptic neurotoxins and postsynaptic neurotoxins. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. Prediction of anticancer activity of diterpenes isolated from the paraiban flora through a PLS model and molecular surfaces.

    PubMed

    Scotti, Luciana; Scotti, Marcus T; Ishiki, Hamilton; Junior, Francisco J B M; dos, Santos Paula F; Tavares, Josean F; da Silva, Marcelo S

    2014-05-01

    The aim of this work was to predict the anticancer potential of 3 atisane and 3 trachylobane diterpene compounds extracted from the roots of Xylopia langsdorffiana. The prediction of anticancer activity as expressed against PC-3 tumor cells was made using a PLS model built with 26 diterpenes in the training set. Significant statistical measures were obtained. The six investigated diterpenes were applied to the model and their activities against PC-3 cells were calculated. All the diterpenes were active, with atisane diterpenes showing the higher pIC50 values. In human prostate carcinoma PC-3 cells, the apoptosis mechanism is related to an inhibition of IKK/NF-κB. Antioxidant potential implies a greater electronic molecular atmosphere (increased donor electron capacity), which can reduce radical reactivity, and facilitate post donation charge accommodation. Molecular surfaces indicated a much greater electronic cloud over atisane diterpenes.

  10. T-cell epitope prediction and immune complex simulation using molecular dynamics: state of the art and persisting challenges

    PubMed Central

    2010-01-01

    Atomistic Molecular Dynamics provides powerful and flexible tools for the prediction and analysis of molecular and macromolecular systems. Specifically, it provides a means by which we can measure theoretically that which cannot be measured experimentally: the dynamic time-evolution of complex systems comprising atoms and molecules. It is particularly suitable for the simulation and analysis of the otherwise inaccessible details of MHC-peptide interaction and, on a larger scale, the simulation of the immune synapse. Progress has been relatively tentative yet the emergence of truly high-performance computing and the development of coarse-grained simulation now offers us the hope of accurately predicting thermodynamic parameters and of simulating not merely a handful of proteins but larger, longer simulations comprising thousands of protein molecules and the cellular scale structures they form. We exemplify this within the context of immunoinformatics. PMID:21067546

  11. A new predictive model for the bioconcentration factors of polychlorinated biphenyls (PCBs) based on the molecular electronegativity distance vector (MEDV).

    PubMed

    Qin, Li-Tang; Liu, Shu-Shen; Liu, Hai-Ling; Ge, Hui-Lin

    2008-02-01

    Polychlorinated biphenyls (PCBs) are some of the most prevalent pollutants in the total environment and are receiving increasing concern as a group of ubiquitous potential persistent organic pollutants. Using variable selection and modeling based on prediction (VSMP), the molecular electronegativity distance vector (MEDV) derived directly from the molecular topological structures was employed to develop a linear model (MI) between the bioconcentration factors (BCF) and two MEDV descriptors of 58 PCBs. The MI model showed good estimation ability with a correlation coefficient (r) of 0.9605 and high stability with a leave-one-out cross-validation correlation coefficient (q) of 0.9564. The MEDV-based model (MI) is easier to use than the splinoid poset method reported by Ivanciuc et al. [Ivanciuc, T., Ivanciuc, O., Klein, D.J., 2006. Modeling the bioconcentration factors and bioaccumulation factors of polychlorinated biphenyls with posetic quantitative super-structure/activity relationships (QSSAR). Mol. Divers. 10, 133-145] and gives better statistics than the molecular connectivity index (MCI)-based model developed by Hu et al. [Hu, H.Y., Xu, F.L., Li, B.G., Cao, J., Dawson, R., Tao, S., 2005. Prediction of the bioconcentration factor of PCBs in fish using the molecular connectivity index and fragment constant models. Water Environ. Res. 77, 87-97]. The main structural factors influencing the BCF of PCBs are the substructures expressed by two atomic groups, >C= and -CH=. The 58 PCBs were divided into an "odd set" and an "even set" in order to verify the predictive potential of the MI for external samples. It was shown that three models, MI, MO for the "odd set", and ME for the "even set", can be used to predict the BCF of the remaining 152 PCBs for which experimental BCFs are not available.
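
    The leave-one-out statistic reported for the two-descriptor linear model can be illustrated with a minimal sketch. The data below are toy values, not the PCB dataset, and all function names are hypothetical. Here q is taken to be the Pearson correlation between observed values and leave-one-out predictions; that is an assumption, since the abstract does not define q and other definitions (such as one based on the prediction error sum of squares) are also in use.

```python
import math


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(
        sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)
    )


def loo_predictions(rows, y):
    """Refit without sample i, then predict sample i, for every i."""
    preds = []
    for i in range(len(y)):
        coef = fit(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, rows[i]))
    return preds


# Toy data: two descriptors per sample and a noisy linear response.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 7), (8, 6)]
y = [1.1, 3.9, 3.2, 5.8, 5.1, 8.2, 7.9, 11.0]

coef = fit(rows, y)
r = pearson(y, [predict(coef, row) for row in rows])  # fitted correlation
q = pearson(y, loo_predictions(rows, y))              # leave-one-out correlation
```

    A q close to r, as in the abstract (0.9564 against 0.9605), indicates that no single sample dominates the fit.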

  12. Some Questions about Feature Re-Assembly

    ERIC Educational Resources Information Center

    White, Lydia

    2009-01-01

    In this commentary, differences between feature re-assembly and feature selection are discussed. Lardiere's proposals are compared to existing approaches to grammatical features in second language (L2) acquisition. Questions are raised about the predictive power of the feature re-assembly approach. (Contains 1 footnote.)

  13. Prediction of BRAF mutation status of craniopharyngioma using magnetic resonance imaging features.

    PubMed

    Yue, Qi; Yu, Yang; Shi, Zhifeng; Wang, Yongfei; Zhu, Wei; Du, Zunguo; Yao, Zhenwei; Chen, Liang; Mao, Ying

    2017-10-06

    .91. The area under the ROC curve for the sum of all 5 diagnostic criteria was 0.989 (p < 0.001). CONCLUSIONS The BRAF mutation status of craniopharyngiomas might be predicted using certain MRI features with relatively high sensitivity and specificity, thus offering potential guidance for the preoperative administration of BRAF mutation inhibitors.

  14. US-guided percutaneous cholecystostomy: features predicting culture-positive bile and clinical outcome.

    PubMed

    Sosna, Jacob; Kruskal, Jonathan B; Copel, Laurian; Goldberg, S Nahum; Kane, Robert A

    2004-03-01

    To assess sonographic and clinical features that might be used to predict infected bile and/or patient outcome from ultrasonography (US)-guided percutaneous cholecystostomy. Between February 1997 and August 2002 at one institution, 112 patients underwent US-guided percutaneous cholecystostomy (59 men, 53 women; average age, 69.3 years). All US images were scored on a defined semiquantitative scale according to preset parameters: (a) gallbladder distention, (b) sludge and/or stones, (c) wall appearance, (d) pericholecystic fluid, and (e) common bile duct size and/or choledocholithiasis. Separate and total scores were generated. Retrospective evaluation of (a) the bacteriologic growth of aspirated bile and its color and (b) clinical indices (fever, white blood cell count, bilirubin level, liver function test results) was conducted by reviewing medical records. For each patient, the clinical manifestation was classified into four groups: (a) localized right upper quadrant symptoms, (b) generalized abdominal symptoms, (c) unexplained sepsis, or (d) sepsis with other known infection. Logistic regression models, exact Wilcoxon-Mann-Whitney test, and the Kruskal-Wallis test were used. Forty-seven (44%) of 107 patients had infected bile. A logistic regression model showed that wall appearance, distention, bile color, and pericholecystic fluid were not individually significant predictors for culture-positive bile, leaving sludge and/or stones (P = .003, odds ratio = 1.647), common bile duct status (P = .02, odds ratio = 2.214), and total score (P = .007, odds ratio = 1.267). No US covariates or clinical indices predicted clinical outcome. Clinical manifestation was predictive of clinical outcome (P = .001) and aspirating culture-positive bile (P = .008); specifically, 30 (86%) of 35 patients with right upper quadrant symptoms had their condition improve, compared with one (7%) of 15 asymptomatic patients with other known causes of infection. US variables can be used to predict

  15. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues.

    PubMed

    Guo, Song; Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one of the prerequisites is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method is presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicated that our newly proposed method could be effective in identifying important protein posttranslational modifications and that the feature selection scheme would be powerful in research on predicting protein functional residues.

  16. A Multifeatures Fusion and Discrete Firefly Optimization Method for Prediction of Protein Tyrosine Sulfation Residues

    PubMed Central

    Liu, Chunhua; Zhou, Peng; Li, Yanling

    2016-01-01

    Tyrosine sulfation is one of the ubiquitous protein posttranslational modifications, in which sulfate groups are added to tyrosine residues. It plays significant roles in various physiological processes in eukaryotic cells. To explore the molecular mechanism of tyrosine sulfation, one of the prerequisites is to correctly identify possible protein tyrosine sulfation residues. In this paper, a novel method is presented to predict protein tyrosine sulfation residues from primary sequences. By means of informative feature construction and an elaborate feature selection and parameter optimization scheme, the proposed predictor achieved promising results and outperformed many other state-of-the-art predictors. Using the optimal feature subset, the proposed method achieved a mean MCC of 94.41% on the benchmark dataset and an MCC of 90.09% on the independent dataset. The experimental performance indicated that our newly proposed method could be effective in identifying important protein posttranslational modifications and that the feature selection scheme would be powerful in research on predicting protein functional residues. PMID:27034949
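
    The abstract above reports performance as the Matthews correlation coefficient (MCC). As a reminder of how that metric is defined, here is a minimal sketch that computes it from binary confusion-matrix counts. The function name is hypothetical and is not taken from the paper; returning 0.0 when the denominator is zero is a common convention assumed here. MCC ranges from -1 to 1, so a value reported as 94.41% would correspond to 0.9441 on that scale.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. Returns 0.0 when the denominator is zero (assumed
    convention; the metric is formally undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(round(mcc_from_counts(tp=90, tn=95, fp=5, fn=10), 4))
```

    Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often preferred when positive and negative sites are unbalanced.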

  17. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further

  18. Features in visual search combine linearly

    PubMed Central

    Pramod, R. T.; Arun, S. P.

    2014-01-01

    Single features such as line orientation and length are known to guide visual search, but relatively little is known about how multiple features combine in search. To address this question, we investigated how search for targets differing in multiple features (intensity, length, orientation) from the distracters is related to searches for targets differing in each of the individual features. We tested race models (based on reaction times) and co-activation models (based on reciprocal of reaction times) for their ability to predict multiple feature searches. Multiple feature searches were best accounted for by a co-activation model in which feature information combined linearly (r = 0.95). This result agrees with the classic finding that these features are separable i.e., subjective dissimilarity ratings sum linearly. We then replicated the classical finding that the length and width of a rectangle are integral features—in other words, they combine nonlinearly in visual search. However, to our surprise, upon including aspect ratio as an additional feature, length and width combined linearly and this model outperformed all other models. Thus, length and width of a rectangle became separable when considered together with aspect ratio. This finding predicts that searches involving shapes with identical aspect ratio should be more difficult than searches where shapes differ in aspect ratio. We confirmed this prediction on a variety of shapes. We conclude that features in visual search co-activate linearly and demonstrate for the first time that aspect ratio is a novel feature that guides visual search. PMID:24715328
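
    The co-activation idea described above, in which reciprocals of reaction times combine linearly across features, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name, the weights, and the example reaction times are hypothetical, and the authors' actual model fitting is not reproduced.

```python
def coactivation_rt(single_feature_rts, weights):
    """Predict a multi-feature search time under a linear co-activation model.

    Each single-feature search contributes a 'strength' of 1/RT. The
    strengths are combined as a weighted sum, and the predicted
    multi-feature RT is the reciprocal of that sum.
    """
    if len(single_feature_rts) != len(weights):
        raise ValueError("need one weight per single-feature RT")
    combined_strength = sum(w / rt for w, rt in zip(weights, single_feature_rts))
    return 1.0 / combined_strength


if __name__ == "__main__":
    # Hypothetical single-feature search times in seconds
    # (e.g. intensity, length, orientation) with unit weights.
    print(coactivation_rt([2.0, 2.5, 4.0], [1.0, 1.0, 1.0]))
```

    With positive weights, the predicted multi-feature search is faster than any of its weighted single-feature components, which is the qualitative behavior expected when several feature differences jointly make a target easier to find.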

  19. Molecular Identification and Epidemiological Features of Human Adenoviruses Associated with Acute Respiratory Infections in Hospitalized Children in Southern China, 2012-2013.

    PubMed

    Chen, Yi; Liu, Fanghua; Wang, Changbing; Zhao, Mingqi; Deng, Li; Zhong, Jiayu; Zhang, Yingying; Ye, Jun; Jing, Shuping; Cheng, Zetao; Guan, Yongxin; Ma, Yi; Sun, Yuanyuan; Zhu, Bing; Zhang, Qiwei

    2016-01-01

    metropolitan area. Phylogenetic analysis indicated that all the HVR sequences of the hexon gene of HAdV-3 and -7 strains have high similarity within their individual types, and these strains were also similar to those circulating in China currently, indicating the conservation of hexon genes of both HAdV-3 and HAdV-7. Knowledge of the epidemiological features and molecular types of HAdV, a major pathogen of pediatric ARI, as well as other co-infected respiratory pathogens circulating in Guangzhou, southern China, is vital to predict and prevent future disease outbreaks in children. This study will certainly facilitate HAdV vaccine development and treatment of HAdV infections in children.

  20. Assessment of two mammographic density related features in predicting near-term breast cancer risk

    NASA Astrophysics Data System (ADS)

    Zheng, Bin; Sumkin, Jules H.; Zuley, Margarita L.; Wang, Xingwei; Klym, Amy H.; Gur, David

    2012-02-01

    In order to establish a personalized breast cancer screening program, it is important to develop risk models that have high discriminatory power in predicting the likelihood of a woman developing an imaging-detectable breast cancer in the near term (e.g., <3 years after a negative examination in question). In epidemiology-based breast cancer risk models, mammographic density is considered the second highest breast cancer risk factor (second to a woman's age). In this study we explored a new feature, namely bilateral mammographic density asymmetry, and investigated the feasibility of predicting near-term screening outcome. The database consisted of 343 negative examinations, of which 187 depicted cancers that were detected during the subsequent screening examination and 155 that remained negative. We computed the average pixel value of the segmented breast areas depicted on each cranio-caudal view of the initial negative examinations. We then computed the mean and the difference in mammographic density for paired bilateral images. Using woman's age, subjectively rated density (BIRADS), and computed mammographic density related features, we compared classification performance in estimating the likelihood of detecting cancer during the subsequent examination using areas under the ROC curves (AUC). The AUCs were 0.63+/-0.03, 0.54+/-0.04, 0.57+/-0.03, and 0.68+/-0.03 when using woman's age, BIRADS rating, computed mean density, and difference in computed bilateral mammographic density, respectively. Performance increased to 0.62+/-0.03 and 0.72+/-0.03 when we fused mean and difference in density with woman's age. The results suggest that, in this study, bilateral mammographic tissue density asymmetry is a significantly stronger (p<0.01) risk indicator than both woman's age and mean breast density.
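
    The two density features described above (the mean and the difference of the average pixel values from paired bilateral images) can be sketched as follows. This is a simplified illustration under stated assumptions: images are plain nested lists, the segmentation mask is given, and the difference is taken as an absolute value. The function names are hypothetical and the authors' segmentation and classification steps are not reproduced.

```python
def average_density(pixels, mask):
    """Average pixel value over the segmented breast area.

    pixels and mask are same-shaped 2D lists; a truthy mask entry marks a
    pixel as belonging to the segmented breast region.
    """
    values = [
        p
        for pixel_row, mask_row in zip(pixels, mask)
        for p, m in zip(pixel_row, mask_row)
        if m
    ]
    if not values:
        raise ValueError("mask selects no pixels")
    return sum(values) / len(values)


def bilateral_density_features(left_density, right_density):
    """Return (mean density, absolute left-right density difference)."""
    mean_density = (left_density + right_density) / 2.0
    asymmetry = abs(left_density - right_density)  # absolute value is an assumption
    return mean_density, asymmetry


if __name__ == "__main__":
    # Tiny made-up 2x2 'images' for illustration only.
    left = average_density([[100, 120], [110, 0]], [[1, 1], [1, 0]])
    right = average_density([[90, 80], [0, 100]], [[1, 1], [0, 1]])
    print(bilateral_density_features(left, right))
```

    Features like these would then be passed, alone or together with age, to a classifier whose scores are evaluated with the area under the ROC curve.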