Science.gov

Sample records for molecular features predicting

  1. Radiomic analysis reveals DCE-MRI features for prediction of molecular subtypes of breast cancer.

    PubMed

    Fan, Ming; Li, Hui; Wang, Shijian; Zheng, Bin; Zhang, Juan; Li, Lihua

    2017-01-01

    The purpose of this study was to investigate the role of features derived from breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and to incorporate clinical information to predict the molecular subtypes of breast cancer. In particular, 60 breast cancers with the following four molecular subtypes were analyzed: luminal A, luminal B, human epidermal growth factor receptor-2 (HER2)-over-expressing and basal-like. The breast region was segmented and the suspicious tumor was depicted on sequentially scanned MR images from each case. In total, 90 features were obtained, including 88 imaging features related to morphology and texture as well as dynamic features from tumor and background parenchymal enhancement (BPE) and 2 clinical information-based parameters, namely, age and menopausal status. An evolutionary algorithm was used to select an optimal subset of features for classification. Using these features, we trained a multi-class logistic regression classifier and evaluated it by the area under the receiver operating characteristic curve (AUC). The results of a prediction model using 24 selected features showed high overall classification performance, with an AUC value of 0.869. The predictive model discriminated among the luminal A, luminal B, HER2 and basal-like subtypes, with AUC values of 0.867, 0.786, 0.888 and 0.923, respectively. An additional independent dataset with 36 patients was utilized to validate the results. A similar classification analysis of the validation dataset showed an AUC of 0.872 using 15 image features, 10 of which were identical to those from the first cohort. We identified clinical information and 3D imaging features from DCE-MRI as candidate biomarkers for discriminating among four molecular subtypes of breast cancer.
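
    As a rough illustration of how per-subtype AUC values like those above can be obtained, the sketch below computes a one-vs-rest AUC from class scores using the rank (Mann-Whitney) formulation. It is not the study's code: the function names, the toy probabilities, and the toy labels are all hypothetical.

```python
def auc_binary(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, true_classes, classes):
    """Per-class AUC: each class in turn is the positive class, all others are negative."""
    result = {}
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if t == cls else 0 for t in true_classes]
        result[cls] = auc_binary(scores, labels)
    return result


# Hypothetical toy data: predicted probabilities for the four subtypes.
CLASSES = ["luminal A", "luminal B", "HER2", "basal-like"]
PROBS = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.20, 0.10, 0.60],
    [0.30, 0.40, 0.20, 0.10],
]
TRUTH = ["luminal A", "luminal B", "luminal B", "HER2", "basal-like", "luminal A"]

if __name__ == "__main__":
    for cls, value in one_vs_rest_auc(PROBS, TRUTH, CLASSES).items():
        print(f"{cls}: AUC = {value:.3f}")
```

    The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of this size; a sort-based rank computation would be the usual choice for larger data.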

  2. Radiomic analysis reveals DCE-MRI features for prediction of molecular subtypes of breast cancer

    PubMed Central

    Fan, Ming; Li, Hui; Wang, Shijian; Zheng, Bin; Zhang, Juan; Li, Lihua

    2017-01-01

    The purpose of this study was to investigate the role of features derived from breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and to incorporate clinical information to predict the molecular subtypes of breast cancer. In particular, 60 breast cancers with the following four molecular subtypes were analyzed: luminal A, luminal B, human epidermal growth factor receptor-2 (HER2)-over-expressing and basal-like. The breast region was segmented and the suspicious tumor was depicted on sequentially scanned MR images from each case. In total, 90 features were obtained, including 88 imaging features related to morphology and texture as well as dynamic features from tumor and background parenchymal enhancement (BPE) and 2 clinical information-based parameters, namely, age and menopausal status. An evolutionary algorithm was used to select an optimal subset of features for classification. Using these features, we trained a multi-class logistic regression classifier and evaluated it by the area under the receiver operating characteristic curve (AUC). The results of a prediction model using 24 selected features showed high overall classification performance, with an AUC value of 0.869. The predictive model discriminated among the luminal A, luminal B, HER2 and basal-like subtypes, with AUC values of 0.867, 0.786, 0.888 and 0.923, respectively. An additional independent dataset with 36 patients was utilized to validate the results. A similar classification analysis of the validation dataset showed an AUC of 0.872 using 15 image features, 10 of which were identical to those from the first cohort. We identified clinical information and 3D imaging features from DCE-MRI as candidate biomarkers for discriminating among four molecular subtypes of breast cancer. PMID:28166261

  3. Using molecular features of xenobiotics to predict hepatic gene expression response.

    PubMed

    Fernald, Guy Haskin; Altman, Russ B

    2013-10-28

    Despite recent advances in molecular medicine and rational drug design, many drugs still fail because toxic effects arise at the cellular and tissue level. In order to better understand these effects, cellular assays can generate high-throughput measurements of gene expression changes induced by small molecules. However, our understanding of how the chemical features of small molecules influence gene expression is very limited. Therefore, we investigated the extent to which chemical features of small molecules can reliably be associated with significant changes in gene expression. Specifically, we analyzed the gene expression response of rat liver cells to 170 different drugs and searched for genes whose expression could be related to chemical features alone. Surprisingly, we can predict the up-regulation of 87 genes (increased expression of at least 1.5 times compared to controls). We show an average cross-validation predictive area under the receiver operating characteristic curve (AUROC) of 0.7 or greater for each of these 87 genes. We applied our method to an external data set of rat liver gene expression response to a novel drug and achieved an AUROC of 0.7. We also validated our approach by predicting up-regulation of Cytochrome P450 1A2 (CYP1A2) in three drugs known to induce CYP1A2 that were not in our data set. Finally, a detailed analysis of the CYP1A2 predictor allowed us to identify which fragments made significant contributions to the predictive scores.

  4. Supported bimetallic Pt-Au nanoparticles: Structural features predicted by molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Morrow, Brian H.; Striolo, Alberto

    2010-04-01

    We have utilized all-atom molecular dynamics simulations to study bimetallic Pt-Au nanoparticles supported by carbonaceous materials at 700 K. Nanoparticles containing 250 atoms with 25%, 50%, and 75% Pt (Pt62Au188, Pt125Au125, and Pt188Au62, respectively) were considered. A single graphite sheet and bundles of seven (10,10), (13,13), and (20,20) single-walled carbon nanotubes were used as supports. It was found that Pt125Au125 forms a well-defined Pt core covered by an Au shell, regardless of the support. Pt62Au188 exhibits a mixed Pt-Au core with an Au shell. Pt188Au62 has a Pt core with a mixed Pt-Au shell. The support affects the atomic distribution. We investigated the percentage of nanoparticle surface atoms that are Pt. Our results show that for Pt62Au188 and Pt125Au125, this percentage is lowest when there is no support and highest when carbon nanotubes are supports. We studied the size of clusters of Pt atoms on the nanoparticle surface, finding that the geometry of the support influences the distribution of cluster sizes. Finally, we found that the coordination states of the atoms on the nanoparticle surface are affected by the support structure. These results suggest that it is possible to tailor the distribution of atoms in Pt-Au nanoparticles by controlling the nanoparticle composition and the support geometry. Such a level of control is desirable for improving selectivity of catalysts.

  5. Evaluation of tumor-derived MRI-texture features for discrimination of molecular subtypes and prediction of 12-month survival status in glioblastoma

    PubMed Central

    Yang, Dalu; Rao, Ganesh; Martinez, Juan; Veeraraghavan, Ashok; Rao, Arvind

    2015-01-01

    Purpose: Glioblastoma multiforme (GBM) is the most common and aggressive primary brain cancer. Four molecular subtypes of GBM have been described but can only be determined by an invasive brain biopsy. The goal of this study is to evaluate the utility of texture features extracted from magnetic resonance imaging (MRI) scans as a potential noninvasive method to characterize molecular subtypes of GBM and to predict 12-month overall survival status for GBM patients. Methods: The authors manually segmented the tumor regions from postcontrast T1 weighted and T2 fluid-attenuated inversion recovery (FLAIR) MRI scans of 82 patients with de novo GBM. For each patient, the authors extracted five sets of computer-extracted texture features, namely, 48 segmentation-based fractal texture analysis (SFTA) features, 576 histogram of oriented gradients (HOGs) features, 44 run-length matrix (RLM) features, 256 local binary patterns features, and 52 Haralick features, from the tumor slice corresponding to the maximum tumor area in axial, sagittal, and coronal planes, respectively. The authors used an ensemble classifier called random forest on each feature family to predict GBM molecular subtypes and 12-month survival status (a dichotomized version of overall survival at the 12-month time point indicating if the patient was alive or not at 12 months). The performance of the prediction was quantified and compared using receiver operating characteristic (ROC) curves. Results: With the appropriate combination of texture feature set, image plane (axial, coronal, or sagittal), and MRI sequence, the area under ROC curve values for predicting different molecular subtypes and 12-month survival status are 0.72 for classical (with Haralick features on T1 postcontrast axial scan), 0.70 for mesenchymal (with HOG features on T2 FLAIR axial scan), 0.75 for neural (with RLM features on T2 FLAIR axial scan), 0.82 for proneural (with SFTA features on T1 postcontrast coronal scan), and 0.69 for 12

  6. In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method.

    PubMed

    Li, Bing-Ke; Cong, Yong; Yang, Xue-Gang; Xue, Ying; Chen, Yi-Zong

    2013-05-01

    We tested four machine learning methods, support vector machine (SVM), k-nearest neighbor, back-propagation neural network and C4.5 decision tree for their capability in predicting spleen tyrosine kinase (Syk) inhibitors by using 2592 compounds which are more diverse than those in other studies. The recursive feature elimination method was used for improving prediction performance and selecting molecular descriptors responsible for distinguishing Syk inhibitors and non-inhibitors. Among four machine learning models, SVM produces the best performance at 99.18% for inhibitors and 98.82% for non-inhibitors, respectively, indicating that the SVM is potentially useful for facilitating the discovery of Syk inhibitors.

  7. Semen molecular and cellular features: these parameters can reliably predict subsequent ART outcome in a goat model

    PubMed Central

    Berlinguer, Fiammetta; Madeddu, Manuela; Pasciu, Valeria; Succu, Sara; Spezzigu, Antonio; Satta, Valentina; Mereu, Paolo; Leoni, Giovanni G; Naitana, Salvatore

    2009-01-01

    Currently, the assessment of sperm function in a raw or processed semen sample is not able to reliably predict sperm ability to withstand freezing and thawing procedures and in vivo fertility and/or assisted reproductive biotechnologies (ART) outcome. The aim of the present study was to investigate which parameters among a battery of analyses could predict subsequent spermatozoa in vitro fertilization ability and hence blastocyst output in a goat model. Ejaculates were obtained by artificial vagina from 3 adult goats (Capra hircus) aged 2 years (A, B and C). In order to assess the predictive value of viability, computer assisted sperm analyzer (CASA) motility parameters and ATP intracellular concentration before and after thawing and of DNA integrity after thawing on subsequent embryo output after an in vitro fertility test, a logistic regression analysis was used. Individual differences in semen parameters were evident for semen viability after thawing and DNA integrity. Results of the IVF test showed that spermatozoa collected from A and B led to higher cleavage rates (p < 0.01) and blastocyst output (p < 0.05) compared with C. The logistic regression model explained a deviance of 72% (p < 0.0001), directly related to the mean percentage of rapid spermatozoa in fresh semen (p < 0.01), semen viability after thawing (p < 0.01), and two of the three comet parameters considered, i.e., tail DNA percentage and comet length (p < 0.0001). DNA integrity alone had a high predictive value on IVF outcome with frozen/thawed semen (deviance explained: 57%). The model proposed here represents one of the many possible ways to explain differences found in embryo output following IVF with different semen donors and may represent a useful tool to select the most suitable donors for semen cryopreservation. PMID:19900288

  8. Analysis of the molecular features of rectal carcinoid tumors to identify new biomarkers that predict biological malignancy

    PubMed Central

    Ito, Miki; Igarashi, Hisayoshi; Ishigami, Keisuke; Sukawa, Yasutaka; Tachibana, Mami; Takahashi, Hiroaki; Tokino, Takashi; Maruyama, Reo; Suzuki, Hiromu; Imai, Kohzoh; Shinomura, Yasuhisa

    2015-01-01

    Although gastrointestinal carcinoid tumors are relatively rare in the digestive tract, a quarter of them are present in the rectum. In the absence of specific tumor biomarkers, lymphatic or vascular invasion is generally used to predict the risk of lymph node metastasis. We, therefore, examined the genetic and epigenetic alterations potentially associated with lymphovascular invasion among 56 patients with rectal carcinoid tumors. We also conducted a microRNA (miRNA) array analysis. Our analysis failed to detect mutations in BRAF, KRAS, NRAS, or PIK3CA or any microsatellite instability (MSI); however, we did observe CpG island methylator phenotype (CIMP) positivity in 13% (7/56) of the carcinoid tumors. The CIMP-positive status was significantly correlated with lymphovascular invasion (P = 0.036). The array analysis revealed that microRNA-885 (miR-885)-5p was the most up-regulated miRNA in the carcinoid tumors with lymphovascular invasion compared with that in those without invasion. In addition, high miR-885-5p expression was independently associated with lymphovascular invasion (P = 0.0002). In conclusion, our findings suggest that miR-885-5p and CIMP status may be useful biomarkers for predicting biological malignancy in patients with rectal carcinoid tumors. PMID:26090613

  9. Nearly maximally predictive features and their dimensions

    NASA Astrophysics Data System (ADS)

    Marzen, Sarah E.; Crutchfield, James P.

    2017-05-01

    Scientific explanation often requires inferring maximally predictive features from a given data set. Unfortunately, the collection of minimal maximally predictive features for most stochastic processes is uncountably infinite. In such cases, one compromises and instead seeks nearly maximally predictive features. Here, we derive upper bounds on the rates at which the number and the coding cost of nearly maximally predictive features scale with desired predictive power. The rates are determined by the fractal dimensions of a process' mixed-state distribution. These results, in turn, show how widely used finite-order Markov models can fail as predictors and that mixed-state predictive features can offer a substantial improvement.

  10. Predicting aqueous solubility of environmentally relevant compounds from molecular features: a simple but highly effective four-dimensional model based on Project to Latent Structures.

    PubMed

    Xiao, Feng; Gulliver, John S; Simcik, Matt F

    2013-09-15

    The aqueous solubility (log S) of xenobiotic chemicals has been identified as a key characteristic in determining their bioaccessibility/bioavailability and their fate and transport in aquatic environments. We here explore and evaluate the use of a state-of-the-art data analysis technique (Project to Latent Structures, PLS) to estimate log S of environmentally relevant chemicals. A large number (n = 624) of molecular descriptors was computed for over 1400 organic chemicals, and then refined by a feature selection technique. Candidate predictor descriptors were fitted to data by means of PLS, which was optimized by an internal leave-one-out cross-validation technique and validated by an external data set. The final (best) PLS model with only four variables (AlogP, X1sol, Mv, and E) exhibited noteworthy stability and good predictive power. It was able to explain 91% of the data (n = 1400) variance with an average absolute error of 0.5 log units, even though the solubilities span over 12 orders of magnitude. The newly proposed model is transparent, easily portable from one user to another, and robust enough to accurately estimate log S of a wide range of emerging contaminants.

  11. Predicting discovery rates of genomic features.

    PubMed

    Gravel, Simon

    2014-06-01

    Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved. Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require ∼15% of the population. We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and subsampled 1000 Genomes Project data. Extrapolating based on the National Heart, Lung, and Blood Institute Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African Americans and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types.
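
    To make the idea of predicting undiscovered variants from a limited sample more concrete, the sketch below applies the classic first-order jackknife richness estimator, which adds a correction based on the number of variants seen in exactly one sample. This is a textbook estimator used here for illustration only; it is not claimed to be the estimator developed in the paper, and the function name and variant identifiers are hypothetical.

```python
from collections import Counter


def first_order_jackknife(samples):
    """Estimate the total number of distinct variants from per-sample variant sets.

    Classic first-order jackknife: S_hat = S_obs + f1 * (n - 1) / n, where
    S_obs is the number of distinct variants observed, f1 is the number of
    variants found in exactly one sample, and n is the number of samples.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    # Count in how many samples each variant occurs.
    counts = Counter(variant for sample in samples for variant in set(sample))
    observed = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return observed + singletons * (n - 1) / n


# Hypothetical pilot study: four sequenced individuals and the variants they carry.
PILOT = [
    {"v1", "v2", "v3"},
    {"v1", "v4"},
    {"v1", "v2", "v5"},
    {"v1", "v6"},
]

if __name__ == "__main__":
    print(f"observed variants: {len(set().union(*PILOT))}")
    print(f"jackknife estimate: {first_order_jackknife(PILOT):.1f}")
```

    Many singleton variants in the pilot data push the estimate well above the observed count, which reflects the intuition that rare variants signal further undiscovered variation.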

  12. Clinical features and molecular bases of neuroacanthocytosis.

    PubMed

    Rampoldi, Luca; Danek, Adrian; Monaco, Anthony P

    2002-08-01

    The term acanthocytosis is derived from the Greek for "thorn" and is used to describe a peculiar spiky appearance of erythrocytes. Acanthocytosis is found to be associated with at least three hereditary neurological disorders that are generally referred to as neuroacanthocytosis. Abetalipoproteinaemia is an autosomal recessive condition, characterised by absence of serum apolipoprotein B containing lipoproteins leading to fat intolerance and fat-soluble vitamin deficiency. This results in a progressive spinocerebellar ataxia with peripheral neuropathy and retinitis pigmentosa. Chorea-acanthocytosis is also an autosomal recessive condition and is characterised by chorea, orofaciolingual dyskinesia, dysphagia, dysarthria, areflexia, seizures and dementia. Some of its features, including choreic movements, peripheral neuropathy with areflexia, elevated serum creatine kinase levels and myopathy are shared by another form of neuroacanthocytosis, McLeod syndrome. Patients affected by this X-linked disorder also show abnormal expression of Kell blood group antigens and a permanent haemolytic state. In addition to these cases, acanthocytosis is occasionally associated with other neurological disorders, such as Hallervorden-Spatz disease. For each of the neuroacanthocytosis syndromes we review the main clinical features and their molecular bases. The recent molecular genetics findings are the first step towards the understanding of the pathogenetic mechanisms and eventually the search for effective treatments.

  13. Feature Selection for Wheat Yield Prediction

    NASA Astrophysics Data System (ADS)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an ever-increasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  14. Extron prediction method based on improved period-3 feature strategy

    NASA Astrophysics Data System (ADS)

    Chen, Gong; Dou, Xiao-Ming; Zhu, Xi-Fang

    2017-07-01

    To improve the accuracy of gene-coding region (exon) prediction, an exon prediction algorithm based on a near period-3 feature is proposed. The near period-3 clustering power spectra of exons and introns are extracted as template features, and the DNA sequence is divided into frames that are moved along the sequence. Each frame is compared with the template features, and the prediction is made from the Euclidean distance computed with different weights. Experiments that vary the feature, the number of features, the frame length, and the gene sequence weights show that the prediction accuracy of the proposed algorithm is better than that of the period-3 algorithm.
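
    For orientation, the conventional period-3 measure that such methods build on can be sketched as the spectral power at frequency 1/3 of the four base indicator sequences, evaluated frame by frame. The sketch below shows only this baseline; the near period-3 clustering and the weighted Euclidean distance described in the abstract are not reproduced, and the function names, frame length, and step are hypothetical choices.

```python
import cmath


def period3_power(seq):
    """Sum over A, C, G, T of the squared DFT magnitude at frequency 1/3.

    For each base, the indicator sequence is 1 where that base occurs and 0
    elsewhere; only positions holding the base contribute to the sum.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, frame_len, step):
    """Return (start, power) pairs for frames of frame_len moved along seq by step."""
    return [
        (start, period3_power(seq[start:start + frame_len]))
        for start in range(0, len(seq) - frame_len + 1, step)
    ]


if __name__ == "__main__":
    periodic = "ACG" * 10   # perfectly 3-periodic toy sequence
    uniform = "A" * 30      # no period-3 structure
    print(period3_power(periodic), period3_power(uniform))
    print(sliding_period3(periodic, frame_len=9, step=3))
```

    A perfectly 3-periodic toy sequence gives a large value and a homopolymer gives a value near zero, which is the contrast that frame-wise exon detectors exploit.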

  15. Feature Selection for Neural Network Based Stock Prediction

    NASA Astrophysics Data System (ADS)

    Sugunnasil, Prompong; Somhom, Samerkae

    We propose a new methodology of feature selection for stock movement prediction. The methodology is based upon finding those features which minimize the correlation relation function. We first produce all combinations of features and evaluate each of them using our evaluation function. We search through the generated set with a hill-climbing approach. The self-organizing map based stock prediction model is utilized as the prediction method. We conduct the experiment on data sets of the Microsoft Corporation, General Electric Co. and Ford Motor Co. The results show that our feature selection method can improve the efficiency of the neural network based stock prediction.
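
    A hill-climbing search over feature subsets can be sketched as follows. The abstract does not define its correlation relation function, so the score used here (pairwise redundancy minus relevance, lower is better) is a stand-in assumption, and the relevance and redundancy tables are invented toy values.

```python
from itertools import combinations

# Hypothetical toy values: per-feature relevance and pairwise redundancy.
RELEVANCE = [0.9, 0.8, 0.3, 0.7]
REDUNDANCY = {
    (0, 1): 0.85, (0, 2): 0.10, (0, 3): 0.20,
    (1, 2): 0.10, (1, 3): 0.15, (2, 3): 0.05,
}


def toy_score(subset):
    """Assumed objective: total pairwise redundancy minus total relevance."""
    redundancy = sum(REDUNDANCY[pair] for pair in combinations(sorted(subset), 2))
    relevance = sum(RELEVANCE[f] for f in subset)
    return redundancy - relevance


def hill_climb(n_features, score, start):
    """Steepest-descent hill climbing; neighbours differ by one toggled feature."""
    current = frozenset(start)
    current_score = score(current)
    while True:
        best, best_score = current, current_score
        for f in range(n_features):
            neighbour = current ^ {f}  # add f if absent, remove it if present
            if not neighbour:          # never evaluate the empty subset
                continue
            s = score(neighbour)
            if s < best_score:
                best, best_score = neighbour, s
        if best == current:            # no neighbour improves: local optimum
            return current, current_score
        current, current_score = best, best_score


if __name__ == "__main__":
    for start in ({0}, {1}):
        subset, value = hill_climb(len(RELEVANCE), toy_score, start)
        print(sorted(start), "->", sorted(subset), round(value, 2))
```

    With these toy values the two starting points end in different subsets, which illustrates that hill climbing finds local optima and can depend on where the search begins.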

  16. Prediction of DNA-binding proteins from relational features

    PubMed Central

    2012-01-01

    Background: The process of protein-DNA binding has an essential role in the biological processing of genetic information. We use relational machine learning to predict DNA-binding propensity of proteins from their structures. Automatically discovered structural features are able to capture some characteristic spatial configurations of amino acids in proteins. Results: Prediction based only on structural relational features already achieves competitive results to existing methods based on physicochemical properties on several protein datasets. Predictive performance is further improved when structural features are combined with physicochemical features. Moreover, the structural features provide some insights not revealed by physicochemical features. Our method is able to detect common spatial substructures. We demonstrate this in experiments with zinc finger proteins. Conclusions: We introduced a novel approach for DNA-binding propensity prediction using relational machine learning which could potentially be used also for protein function prediction in general. PMID:23146001

  17. Macromolecular target prediction by self-organizing feature maps.

    PubMed

    Schneider, Gisbert; Schneider, Petra

    2017-03-01

    Rational drug discovery would greatly benefit from a more nuanced appreciation of the activity of pharmacologically active compounds against a diverse panel of macromolecular targets. Already, computational target-prediction models assist medicinal chemists in library screening, de novo molecular design, optimization of active chemical agents, drug re-purposing, in the spotting of potential undesired off-target activities, and in the 'de-orphaning' of phenotypic screening hits. The self-organizing map (SOM) algorithm has been employed successfully for these and other purposes. Areas covered: The authors recapitulate contemporary artificial neural network methods for macromolecular target prediction, and present the basic SOM algorithm at a conceptual level. Specifically, they highlight consensus target-scoring by the employment of multiple SOMs, and discuss the opportunities and limitations of this technique. Expert opinion: Self-organizing feature maps represent a straightforward approach to ligand clustering and classification. Some of the appeal lies in their conceptual simplicity and broad applicability domain. Despite known algorithmic shortcomings, this computational target prediction concept has been proven to work in prospective settings with high success rates. It represents a prototypic technique for future advances in the in silico identification of the modes of action and macromolecular targets of bioactive molecules.

  18. Learning through Feature Prediction: An Initial Investigation into Teaching Categories to Children with Autism through Predicting Missing Features

    ERIC Educational Resources Information Center

    Sweller, Naomi

    2015-01-01

    Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…

  20. Predicting protein subcellular locations with feature selection and analysis.

    PubMed

    Cai, Yudong; He, Jianfeng; Li, Xinlei; Feng, Kaiyan; Lu, Lin; Feng, Kairui; Kong, Xiangyin; Lu, Wencong

    2010-04-01

    In this paper, we propose a strategy to predict the subcellular locations of proteins by combining various feature selection methods. First, proteins are coded by amino-acid composition and physicochemical properties; these features are then ranked by the Minimum Redundancy Maximum Relevance method and further filtered by a feature selection procedure. The Nearest Neighbor Algorithm is used as the prediction model to predict the protein subcellular locations and achieves a correct prediction rate of 70.63%, evaluated by Jackknife cross-validation. Results of feature selection also enable us to identify the most important protein properties. The prediction software is available for public access on the website http://chemdata.shu.edu.cn/sub22/, which may play an important complementary role to a series of web-server predictors summarized recently in a review by Chou and Shen (Chou, K.C., Shen, H.B. Natural Science, 2009, 2, 63-92, http://www.scirp.org/journal/NS/).
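
    The evaluation scheme named above, a nearest-neighbor classifier scored by jackknife (leave-one-out) cross-validation, can be sketched in a few lines. The two-dimensional feature vectors and location labels below are hypothetical toy data, not the encoding used in the paper, and Euclidean distance is an assumed choice of metric.

```python
import math


def nearest_neighbor_label(query, points, labels):
    """Return the label of the training point closest to query (Euclidean distance)."""
    best = min(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return labels[best]


def jackknife_accuracy(points, labels):
    """Leave each sample out in turn and classify it using all remaining samples."""
    correct = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(points[i], rest_points, rest_labels) == labels[i]:
            correct += 1
    return correct / len(points)


# Hypothetical toy feature vectors and subcellular location labels.
POINTS = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (0.45, 0.45)]
LABELS = ["cytoplasm", "cytoplasm", "cytoplasm", "nucleus", "nucleus", "nucleus"]

if __name__ == "__main__":
    print(f"jackknife accuracy: {jackknife_accuracy(POINTS, LABELS):.3f}")
```

    Because every sample is tested exactly once against a model built from all the others, the jackknife gives a single, reproducible accuracy figure for a given dataset, at the cost of one classification pass per sample.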

  1. Bayesian profiling of molecular signatures to predict event times

    PubMed Central

    Zhang, Dabao; Zhang, Min

    2007-01-01

    Background: It is of particular interest to identify cancer-specific molecular signatures for early diagnosis, monitoring effects of treatment and predicting patient survival time. Molecular information about patients is usually generated from high throughput technologies such as microarray and mass spectrometry. Statistically, we are challenged by the large number of candidates but only a small number of patients in the study, and the right-censored clinical data further complicate the analysis. Results: We present a two-stage procedure to profile molecular signatures for survival outcomes. Firstly, we group closely-related molecular features into linkage clusters, each portraying either similar or opposite functions and playing similar roles in prognosis; secondly, a Bayesian approach is developed to rank the centroids of these linkage clusters and provide a list of the main molecular features closely related to the outcome of interest. A simulation study showed the superior performance of our approach. When it was applied to data on diffuse large B-cell lymphoma (DLBCL), we were able to identify some new candidate signatures for disease prognosis. Conclusion: This multivariate approach provides researchers with a more reliable list of molecular features profiled in terms of their prognostic relationship to the event times, and generates dependable information for subsequent identification of prognostic molecular signatures through either biological procedures or further data analysis. PMID:17239251

  2. Robust feature selection to predict tumor treatment outcome.

    PubMed

    Mi, Hongmei; Petitjean, Caroline; Dubray, Bernard; Vera, Pierre; Ruan, Su

    2015-07-01

    Recurrence of cancer after treatment increases the risk of death. The ability to predict the treatment outcome can help to design the treatment planning and can thus be beneficial to the patient. We aim to select predictive features from clinical and PET (positron emission tomography) based features, in order to provide doctors with informative factors so as to anticipate the outcome of the patient treatment. In order to overcome the small sample size problem of datasets usually met in the medical domain, we propose a novel wrapper feature selection algorithm, named HFS (hierarchical forward selection), which searches forward in a hierarchical feature subset space. Feature subsets are iteratively evaluated with the prediction performance using SVM (support vector machine). All feature subsets performing better than those at the preceding iteration are retained. Moreover, as SUV (standardized uptake value) based features have been recognized as significant predictive factors for a patient outcome, we propose to incorporate this prior knowledge into the selection procedure to improve its robustness and reduce its computational cost. Two real-world datasets from cancer patients are included in the evaluation. We extract dozens of clinical and PET-based features to characterize the patient's state, including SUV parameters and texture features. We use leave-one-out cross-validation to evaluate the prediction performance, in terms of prediction accuracy and robustness. Using SVM as the classifier, our HFS method produces accuracy values of 100% and 94% on the two datasets, respectively, and robustness values of 89% and 96%. Without accuracy loss, the prior-based version (pHFS) improves the robustness up to 100% and 98% on the two datasets, respectively. Compared with other feature selection methods, the proposed HFS and pHFS provide the most promising results. 
For our HFS method, we have empirically shown that the addition of prior knowledge improves the robustness and
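    A wrapper selection loop of the kind this abstract describes can be sketched as follows. This is a minimal illustration, not the authors' HFS: it is plain greedy forward selection rather than the hierarchical search, and a 1-nearest-neighbour classifier stands in for the SVM so the sketch needs only the standard library. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def loocv_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                   features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-NN classifier (stand-in for the SVM)."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for j in range(len(X)):
            if i == j:
                continue  # the held-out sample never votes for itself
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def forward_select(X: Sequence[Sequence[float]],
                   y: Sequence[int]) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves leave-one-out accuracy."""
    remaining = list(range(len(X[0])))
    selected: List[int] = []
    best_acc = 0.0
    while remaining:
        scored = [(loocv_accuracy(X, y, selected + [f]), f) for f in remaining]
        acc, feat = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break  # no candidate improves on the current subset
        selected.append(feat)
        remaining.remove(feat)
        best_acc = acc
    return selected, best_acc


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.9, 1.5], [1.0, 4.5], [1.1, 0.5]]
y = [0, 0, 0, 1, 1, 1]
selected, acc = forward_select(X, y)
print(selected, acc)
```

    With small samples, leave-one-out evaluation inside the loop keeps every patient available for training, which is presumably why the abstract pairs the two.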

  3. Deep Feature Transfer Learning in Combination with Traditional Features Predicts Survival Among Patients with Lung Adenocarcinoma.

    PubMed

    Paul, Rahul; Hawkins, Samuel H; Balagurunathan, Yoganand; Schabath, Matthew B; Gillies, Robert J; Hall, Lawrence O; Goldgof, Dmitry B

    2016-12-01

    Lung cancer is the most common cause of cancer-related deaths in the USA. It can be detected and diagnosed using computed tomography images. For an automated classifier, identifying predictive features from medical images is a key concern. Deep feature extraction using pretrained convolutional neural networks (CNNs) has recently been successfully applied in some image domains. Here, we applied a pretrained CNN to extract deep features from 40 computed tomography images, with contrast, of non-small cell adenocarcinoma lung cancer, and combined deep features with traditional image features and trained classifiers to predict short- and long-term survivors. We experimented with several pretrained CNNs and several feature selection strategies. The best previously reported accuracy when using traditional quantitative features was 77.5% (area under the curve [AUC], 0.712), which was achieved by a decision tree classifier. The best reported accuracy from transfer learning and deep features was 77.5% (AUC, 0.713) using a decision tree classifier. When extracted deep neural network features were combined with traditional quantitative features, we obtained an accuracy of 90% (AUC, 0.935) with the 5 best post-rectified linear unit features extracted from a vgg-f pretrained CNN and the 5 best traditional features. The best results were achieved with the symmetric uncertainty feature ranking algorithm followed by a random forests classifier.

  4. Deep Feature Transfer Learning in Combination with Traditional Features Predicts Survival Among Patients with Lung Adenocarcinoma

    PubMed Central

    Paul, Rahul; Hawkins, Samuel H.; Balagurunathan, Yoganand; Schabath, Matthew B.; Gillies, Robert J.; Hall, Lawrence O.; Goldgof, Dmitry B.

    2016-01-01

    Lung cancer is the most common cause of cancer-related deaths in the USA. It can be detected and diagnosed using computed tomography images. For an automated classifier, identifying predictive features from medical images is a key concern. Deep feature extraction using pretrained convolutional neural networks (CNNs) has recently been successfully applied in some image domains. Here, we applied a pretrained CNN to extract deep features from 40 computed tomography images, with contrast, of non-small cell adenocarcinoma lung cancer, and combined deep features with traditional image features and trained classifiers to predict short- and long-term survivors. We experimented with several pretrained CNNs and several feature selection strategies. The best previously reported accuracy when using traditional quantitative features was 77.5% (area under the curve [AUC], 0.712), which was achieved by a decision tree classifier. The best reported accuracy from transfer learning and deep features was 77.5% (AUC, 0.713) using a decision tree classifier. When extracted deep neural network features were combined with traditional quantitative features, we obtained an accuracy of 90% (AUC, 0.935) with the 5 best post-rectified linear unit features extracted from a vgg-f pretrained CNN and the 5 best traditional features. The best results were achieved with the symmetric uncertainty feature ranking algorithm followed by a random forests classifier. PMID:28066809

  5. Stabilizing l1-norm prediction models by supervised feature grouping.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2016-02-01

    Emerging Electronic Medical Records (EMRs) have reshaped modern healthcare. These records have great potential for building clinical prediction models. However, a problem in using them is their high dimensionality. Since much of the information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. However, in the presence of correlated features, these methods select features that change considerably with small changes in the data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making.

  6. Prediction of interface residue based on the features of residue interaction network.

    PubMed

    Jiao, Xiong; Ranganathan, Shoba

    2017-11-07

    Protein-protein interaction plays a crucial role in cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize interface residues based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction, and 16 network parameters and the B-factor serve as the elements of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features is also analyzed to identify the key features for the prediction. The biological meaning of some important features is explained. The results of this work can be used in related work on structure-function relationship analysis via a residue interaction network model. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Using genetic algorithms to select most predictive protein features.

    PubMed

    Kernytsky, Andrew; Rost, Burkhard

    2009-04-01

    Many important characteristics of proteins such as biochemical activity and subcellular localization present a challenge to machine-learning methods: it is often difficult to encode the appropriate input features at the residue level for the purpose of making a prediction for the entire protein. The problem is usually that the biophysics of the connection between a machine-learning method's input (sequence feature) and its output (observed phenomenon to be predicted) remains unknown; in other words, we may only know that a certain protein is an enzyme (output) without knowing which region may contain the active site residues (input). The goal then becomes to dissect a protein into a vast set of sequence-derived features and to correlate those features with the desired output. We introduce a framework that begins with a set of global sequence features and then vastly expands the feature space by generically encoding the coexistence of residue-based features. It is this combination of individual features, that is, the step from the fractions of serine and buried (input space 20 + 2) to the fraction of buried serine (input space 20 * 2), that implicitly shifts the search space from global feature inputs to features that can capture very local evidence such as the individual residues of a catalytic triad. The vast feature space created is explored by a genetic algorithm (GA) paired with neural networks and support vector machines. We find that the GA is critical for selecting combinations of features that are neither too general, resulting in poor performance, nor too specific, leading to overtraining. The final framework manages to effectively sample a feature space that is far too large for exhaustive enumeration. We demonstrate the power of the concept by applying it to prediction of protein enzymatic activity. (c) 2008 Wiley-Liss, Inc.
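    The step from 20 + 2 global features to 20 * 2 combined features can be made concrete with a short sketch. It assumes a sequence over the 20 standard amino acids and a per-residue buried/exposed flag; the function names and the toy peptide are hypothetical, and the framework's actual encoding is not given in the abstract.

```python
from typing import Dict, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ("buried", "exposed")


def global_features(seq: str, buried: Sequence[bool]) -> Dict[str, float]:
    """20 amino acid fractions plus 2 burial-state fractions (20 + 2 inputs)."""
    n = len(seq)
    feats = {aa: seq.count(aa) / n for aa in AMINO_ACIDS}
    n_buried = sum(buried)
    feats["buried"] = n_buried / n
    feats["exposed"] = (n - n_buried) / n
    return feats


def combined_features(seq: str,
                      buried: Sequence[bool]) -> Dict[Tuple[str, str], float]:
    """Fraction of residues that are amino acid X AND in state S (20 * 2 inputs)."""
    n = len(seq)
    counts = {(aa, s): 0 for aa in AMINO_ACIDS for s in STATES}
    for aa, is_buried in zip(seq, buried):
        counts[(aa, "buried" if is_buried else "exposed")] += 1
    return {key: c / n for key, c in counts.items()}


# Hypothetical 4-residue example: one buried serine, one exposed serine.
seq = "SSAG"
buried = [True, False, True, False]
g = global_features(seq, buried)
c = combined_features(seq, buried)
print(len(g), len(c), c[("S", "buried")])
```

    The global features say only that half the residues are serine and half are buried; the combined feature isolates the buried serine, which is the kind of local evidence the abstract refers to.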

  8. Generalized perceptual linear prediction features for animal vocalization analysis.

    PubMed

    Clemins, Patrick J; Johnson, Michael T

    2006-07-01

    A new feature extraction model, generalized perceptual linear prediction (gPLP), is developed to calculate a set of perceptually relevant features for digital signal analysis of animal vocalizations. The gPLP model is a generalized adaptation of the perceptual linear prediction model, popular in human speech processing, which incorporates perceptual information such as frequency warping and equal loudness normalization into the feature extraction process. Since such perceptual information is available for a number of animal species, this new approach integrates that information into a generalized model to extract perceptually relevant features for a particular species. To illustrate, qualitative and quantitative comparisons are made between the species-specific model, generalized perceptual linear prediction (gPLP), and the original PLP model using a set of vocalizations collected from captive African elephants (Loxodonta africana) and wild beluga whales (Delphinapterus leucas). The models that incorporate perceptual information outperform the original human-based models in both visualization and classification tasks.

  9. Real-world predictions from ab initio molecular dynamics simulations.

    PubMed

    Kirchner, Barbara; di Dio, Philipp J; Hutter, Jürg

    2012-01-01

    In this review we present the techniques of ab initio molecular dynamics simulation improved to its current stage where the analysis of existing processes and the prediction of further chemical features and real-world processes are feasible. For this reason we describe the relevant developments in ab initio molecular dynamics leading to this stage. Among them, parallel implementations, different basis set functions, density functionals, and van der Waals corrections are reported. The chemical features accessible through AIMD are discussed. These are IR, NMR, as well as EXAFS spectra, sampling methods like metadynamics and others, Wannier functions, dipole moments of molecules in condensed phase, and many other properties. Electrochemical reactions investigated by ab initio molecular dynamics methods in solution, on surfaces as well as complex interfaces, are also presented.

  10. Which ante mortem clinical features predict progressive supranuclear palsy pathology?

    PubMed

    Respondek, Gesine; Kurz, Carolin; Arzberger, Thomas; Compta, Yaroslau; Englund, Elisabet; Ferguson, Leslie W; Gelpi, Ellen; Giese, Armin; Irwin, David J; Meissner, Wassilios G; Nilsson, Christer; Pantelyat, Alexander; Rajput, Alex; van Swieten, John C; Troakes, Claire; Josephs, Keith A; Lang, Anthony E; Mollenhauer, Brit; Müller, Ulrich; Whitwell, Jennifer L; Antonini, Angelo; Bhatia, Kailash P; Bordelon, Yvette; Corvol, Jean-Christophe; Colosimo, Carlo; Dodel, Richard; Grossman, Murray; Kassubek, Jan; Krismer, Florian; Levin, Johannes; Lorenzl, Stefan; Morris, Huw; Nestor, Peter; Oertel, Wolfgang H; Rabinovici, Gil D; Rowe, James B; van Eimeren, Thilo; Wenning, Gregor K; Boxer, Adam; Golbe, Lawrence I; Litvan, Irene; Stamelou, Maria; Höglinger, Günter U

    2017-07-01

    Progressive supranuclear palsy (PSP) is a neuropathologically defined disease presenting with a broad spectrum of clinical phenotypes. We aimed to identify clinical features and investigations that predict or exclude PSP pathology during life, with a view to optimizing the clinical diagnostic criteria for PSP. We performed a systematic review of the literature published since 1996 to identify clinical features and investigations that may predict or exclude PSP pathology. We then extracted standardized data from clinical charts of patients with pathologically diagnosed PSP and relevant disease controls and calculated the sensitivity, specificity, and positive predictive value of key clinical features for PSP in this cohort. Of 4166 articles identified by the database inquiry, 269 met predefined standards. The literature review identified clinical features predictive of PSP, including features of the following 4 functional domains: ocular motor dysfunction, postural instability, akinesia, and cognitive dysfunction. No biomarker or genetic feature was found reliably validated to predict definite PSP. High-quality original natural history data were available from 206 patients with pathologically diagnosed PSP and from 231 pathologically diagnosed disease controls (54 corticobasal degeneration, 51 multiple system atrophy with predominant parkinsonism, 53 Parkinson's disease, 73 behavioral variant frontotemporal dementia). We identified clinical features that predicted PSP pathology, including phenotypes other than Richardson's syndrome, with varying sensitivity and specificity. Our results highlight the clinical variability of PSP and the high prevalence of phenotypes other than Richardson's syndrome. The features of variant phenotypes with high specificity and sensitivity should serve to optimize clinical diagnosis of PSP. © 2017 International Parkinson and Movement Disorder Society.
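    The three diagnostic measures named in this abstract follow directly from a 2x2 table of clinical feature against pathological diagnosis. The sketch below shows the standard definitions; the counts are hypothetical placeholders and are not taken from the study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity and positive predictive value from a 2x2 table.

    tp: feature present, pathology confirmed   fp: feature present, control
    fn: feature absent, pathology confirmed    tn: feature absent, control
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the feature detects
        "specificity": tn / (tn + fp),  # share of controls the feature excludes
        "ppv": tp / (tp + fp),          # chance that a positive is a true case
    }


# Hypothetical counts for one clinical feature (illustration only).
m = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
print(m)
```

    Unlike sensitivity and specificity, the positive predictive value depends on the ratio of cases to controls in the cohort, so it does not transfer unchanged to populations with a different prevalence.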

  11. A Reduced Set of Features for Chronic Kidney Disease Prediction

    PubMed Central

    Misir, Rajesh; Mitra, Malay; Samanta, Ranjit Kumar

    2017-01-01

    Chronic kidney disease (CKD) is a life-threatening disease. Early detection and proper management are needed to improve survival. The UCI data set has 24 attributes for predicting CKD or non-CKD. At least 16 of these attributes require pathological investigations involving more resources, money, time, and uncertainties. The objective of this work is to explore whether we can predict CKD or non-CKD with reasonable accuracy using fewer features. An intelligent system development approach has been used in this study. We applied one important feature selection technique to discover a reduced set of features that explains the data set much better. Two intelligent binary classification techniques have been adopted to validate the reduced feature set. Performance was evaluated in terms of four important classification evaluation parameters. Our results suggest that concentrating on the reduced features for identifying CKD may reduce uncertainty, save time, and reduce costs. PMID:28706750

  12. Feature Fusion Based SVM Classifier for Protein Subcellular Localization Prediction.

    PubMed

    Rahman, Julia; Mondal, Md Nazrul Islam; Islam, Md Khaled Ben; Hasan, Md Al Mehedi

    2016-12-18

    Because of the importance of protein subcellular localization in different branches of life science and drug discovery, researchers have focused their attention on protein subcellular localization prediction. Effective representation of features from protein sequences plays a vital role in protein subcellular localization prediction, especially for machine learning techniques. Single feature representations such as pseudo amino acid composition (PseAAC), physiochemical property models (PPM), and amino acid index distribution (AAID) contain insufficient information from protein sequences. To deal with this problem, we have proposed two feature fusion representations, AAIDPAAC and PPMPAAC, to work with Support Vector Machine classifiers, which fuse PseAAC with AAID and PPM, respectively. We have evaluated the performance of both single and fused feature representations on a Gram-negative bacterial dataset. We obtained at least 3% higher actual accuracy with AAIDPAAC and 2% higher locative accuracy with PPMPAAC than with single feature representations.

  13. Predicting Human Olfactory Perception from Chemical Features of Odor Molecules

    PubMed Central

    Keller, Andreas; Gerkin, Richard C.; Guan, Yuanfang; Dhurandhar, Amit; Turu, Gabor; Szalai, Bence; Mainland, Joel D.; Ihara, Yusuke; Yu, Chung Wen; Wolfinger, Russ; Vens, Celine; Schietgat, Leander; De Grave, Kurt; Norel, Raquel; Stolovitzky, Gustavo; Cecchi, Guillermo; Vosshall, Leslie B.; Meyer, Pablo

    2017-01-01

    It is still not possible to predict whether a given molecule will have a perceived odor, or what olfactory percept it will produce. We therefore organized the crowd-sourced DREAM Olfaction Prediction Challenge. Using a large olfactory psychophysical dataset, teams developed machine learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models accurately predicted odor intensity and pleasantness, and also successfully predicted eight among 19 rated semantic descriptors (“garlic”, “fish”, “sweet”, “fruit,” “burnt”, “spices”, “flower”, “sour”). Regularized linear models performed nearly as well as random-forest-based ones, with a predictive accuracy that closely approaches a key theoretical limit. These models help to predict the perceptual qualities of virtually any molecule with high accuracy and also reverse-engineer the smell of a molecule. PMID:28219971

  14. Predicting human olfactory perception from chemical features of odor molecules.

    PubMed

    Keller, Andreas; Gerkin, Richard C; Guan, Yuanfang; Dhurandhar, Amit; Turu, Gabor; Szalai, Bence; Mainland, Joel D; Ihara, Yusuke; Yu, Chung Wen; Wolfinger, Russ; Vens, Celine; Schietgat, Leander; De Grave, Kurt; Norel, Raquel; Stolovitzky, Gustavo; Cecchi, Guillermo A; Vosshall, Leslie B; Meyer, Pablo

    2017-02-24

    It is still not possible to predict whether a given molecule will have a perceived odor or what olfactory percept it will produce. We therefore organized the crowd-sourced DREAM Olfaction Prediction Challenge. Using a large olfactory psychophysical data set, teams developed machine-learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models accurately predicted odor intensity and pleasantness and also successfully predicted 8 among 19 rated semantic descriptors ("garlic," "fish," "sweet," "fruit," "burnt," "spices," "flower," and "sour"). Regularized linear models performed nearly as well as random forest-based ones, with a predictive accuracy that closely approaches a key theoretical limit. These models help to predict the perceptual qualities of virtually any molecule with high accuracy and also reverse-engineer the smell of a molecule. Copyright © 2017, American Association for the Advancement of Science.

  15. Clinical and molecular features of young-onset colorectal cancer

    PubMed Central

    Ballester, Veroushka; Rashtak, Shahrooz; Boardman, Lisa

    2016-01-01

    Colorectal cancer (CRC) is one of the leading causes of cancer related mortality worldwide. Although young-onset CRC raises the possibility of a hereditary component, hereditary CRC syndromes only explain a minority of young-onset CRC cases. There is evidence to suggest that young-onset CRC have a different molecular profile than late-onset CRC. While the pathogenesis of young-onset CRC is well characterized in individuals with an inherited CRC syndrome, knowledge regarding the molecular features of sporadic young-onset CRC is limited. Understanding the molecular mechanisms of young-onset CRC can help us tailor specific screening and management strategies. While the incidence of late-onset CRC has been decreasing, mainly attributed to an increase in CRC screening, the incidence of young-onset CRC is increasing. Differences in the molecular biology of these tumors and low suspicion of CRC in young symptomatic individuals, may be possible explanations. Currently there is no evidence that supports that screening of average risk individuals less than 50 years of age will translate into early detection or increased survival. However, increasing understanding of the underlying molecular mechanisms of young-onset CRC could help us tailor specific screening and management strategies. The purpose of this review is to evaluate the current knowledge about young-onset CRC, its clinicopathologic features, and the newly recognized molecular alterations involved in tumor progression. PMID:26855533

  16. Actigraphy features for predicting mobility disability in older adults.

    PubMed

    Kheirkhahan, Matin; Tudor-Locke, Catrine; Axtell, Robert; Buman, Matthew P; Fielding, Roger A; Glynn, Nancy W; Guralnik, Jack M; King, Abby C; White, Daniel K; Miller, Michael E; Siddique, Juned; Brubaker, Peter; Rejeski, W Jack; Ranshous, Stephen; Pahor, Marco; Ranka, Sanjay; Manini, Todd M

    2016-09-21

    Actigraphy has attracted much attention for assessing physical activity in the past decade. Many algorithms have been developed to automate the analysis process, but none has targeted a general model to discover related features for detecting or predicting mobility function, or more specifically, mobility impairment and major mobility disability (MMD). Men (N = 357) and women (N = 778) aged 70-89 years wore a tri-axial accelerometer (Actigraph GT3X) on the right hip during free-living conditions for 8.4 ± 3.0 d. One-second epoch data were summarized into 67 features. Several machine learning techniques were used to select features from the free-living condition to predict mobility impairment, defined as 400 m walking speed < 0.80 m s(-1). Selected features were also included in a model to predict the first occurrence of MMD, defined as the loss of the ability to walk 400 m. Each method yielded a similar estimate of 400 m walking speed, with a root mean square error of ~0.07 m s(-1) and R-squared values ranging from 0.37 to 0.41. Sensitivity and specificity of identifying slow walkers were approximately 70% and 80% for all methods, respectively. The top five features, which were related to movement pace and amount (activity counts and steps), length in activity engagement (bout length), accumulation patterns of activity, and movement variability, significantly improved the prediction of MMD beyond that found with common covariates (age, diseases, anthropometry, etc). This study identified a subset of actigraphy features collected in free-living conditions that are moderately accurate in identifying persons with clinically assessed mobility impairment and significantly improve the prediction of MMD. These findings suggest that the combination of features, as opposed to a specific feature, is important to consider when choosing features and/or combinations of features for prediction of mobility phenotypes in older adults.

  17. Predicting couple therapy outcomes based on speech acoustic features.

    PubMed

    Nasir, Md; Baucom, Brian Robert; Georgiou, Panayiotis; Narayanan, Shrikanth

    2017-01-01

    Automated assessment and prediction of marital outcome in couples therapy is a challenging task but promises to be a potentially useful tool for clinical psychologists. Computational approaches for inferring therapy outcomes using observable behavioral information obtained from conversations between spouses offer objective means for understanding relationship dynamics. In this work, we explore whether the acoustics of the spoken interactions of clinically distressed spouses provide information towards assessment of therapy outcomes. The therapy outcome prediction task in this work includes detecting whether there was a relationship improvement or not (posed as a binary classification) as well as discerning varying levels of improvement or decline in the relationship status (posed as a multiclass recognition task). We use each interlocutor's acoustic speech signal characteristics such as vocal intonation and intensity, both independently and in relation to one another, as cues for predicting the therapy outcome. We also compare prediction performance with that obtained via standardized behavioral codes characterizing the relationship dynamics provided by human experts as features for automated classification. Our experiments, using data from a longitudinal clinical study of couples in distressed relations, showed that predictions of relationship outcomes obtained directly from vocal acoustics are comparable or superior to those obtained using human-rated behavioral codes as prediction features. In addition, combining direct signal-derived features with manually coded behavioral features improved the prediction performance in most cases, indicating the complementarity of relevant information captured by humans and machine algorithms. Additionally, considering the vocal properties of the interlocutors in relation to one another, rather than in isolation, proved to be important for improving the automatic prediction. This finding supports the notion that behavioral

  18. SCRATCH: a protein structure and structural feature prediction server

    PubMed Central

    Cheng, J.; Randall, A. Z.; Sweredoski, M. J.; Baldi, P.

    2005-01-01

    SCRATCH is a server for predicting protein tertiary structure and structural features. The SCRATCH software suite includes predictors for secondary structure, relative solvent accessibility, disordered regions, domains, disulfide bridges, single mutation stability, residue contacts versus average, individual residue contacts and tertiary structure. The user simply provides an amino acid sequence and selects the desired predictions, then submits to the server. Results are emailed to the user. The server is available at . PMID:15980571

  19. Universality and predictability in molecular quantitative genetics.

    PubMed

    Nourmohammad, Armita; Held, Torsten; Lässig, Michael

    2013-12-01

    Molecular traits, such as gene expression levels or protein binding affinities, are increasingly accessible to quantitative measurement by modern high-throughput techniques. Such traits measure molecular functions and, from an evolutionary point of view, are important as targets of natural selection. We review recent developments in evolutionary theory and experiments that are expected to become building blocks of a quantitative genetics of molecular traits. We focus on universal evolutionary characteristics: these are largely independent of a trait's genetic basis, which is often at least partially unknown. We show that universal measurements can be used to infer selection on a quantitative trait, which determines its evolutionary mode of conservation or adaptation. Furthermore, universality is closely linked to predictability of trait evolution across lineages. We argue that universal trait statistics extends over a range of cellular scales and opens new avenues of quantitative evolutionary systems biology.

  20. How to Predict Molecular Interactions between Species?

    PubMed Central

    Schulze, Sylvie; Schleicher, Jana; Guthke, Reinhard; Linde, Jörg

    2016-01-01

    Organisms constantly interact with other species through physical contact which leads to changes on the molecular level, for example the transcriptome. These changes can be monitored for all genes, with the help of high-throughput experiments such as RNA-seq or microarrays. The adaptation of the gene expression to environmental changes within cells is mediated through complex gene regulatory networks. Often, our knowledge of these networks is incomplete. Network inference predicts gene regulatory interactions based on transcriptome data. An emerging application of high-throughput transcriptome studies is dual transcriptomics experiments. Here, the transcriptome of two or more interacting species is measured simultaneously. Based on a dual RNA-seq data set of murine dendritic cells infected with the fungal pathogen Candida albicans, the software tool NetGenerator was applied to predict an inter-species gene regulatory network. To promote further investigations of molecular inter-species interactions, we recently discussed dual RNA-seq experiments for host-pathogen interactions and extended the applied tool NetGenerator (Schulze et al., 2015). The updated version of NetGenerator makes use of measurement variances in the algorithmic procedure and accepts gene expression time series data with missing values. Additionally, we tested multiple modeling scenarios regarding the stimuli functions of the gene regulatory network. Here, we summarize the work by Schulze et al. (2015) and put it into a broader context. We review various studies making use of the dual transcriptomics approach to investigate the molecular basis of interacting species. Besides the application to host-pathogen interactions, dual transcriptomics data are also utilized to study mutualistic and commensalistic interactions. Furthermore, we give a short introduction to additional approaches for the prediction of gene regulatory networks and discuss their application to dual transcriptomics data. We

  1. JWST NIRCam WFSS Ice Feature Spectroscopy in Nearby Molecular Cores

    NASA Astrophysics Data System (ADS)

    Chu, Laurie; Hodapp, Klaus W.; Rieke, Marcia J.; Meyer, Michael; Greene, Thomas P.; JWST NIRCam Science Team

    2017-06-01

    In molecular clouds above a few magnitudes of total visual extinction, some components of the molecular gas freeze out on the surfaces of dust grains. These ice mantles around dust grains are the site of complex surface chemistry that leads to the formation of simple organic molecules in these mantles. The icy surfaces also facilitate the coagulation of the dust particles, setting the stage for grain growth and ultimately the formation of planetary bodies. As part of the JWST NIRCam GTO program, we plan to observe a selection of small molecular cores using the wide field grism spectroscopy mode of NIRCam. This poster presents the results of a preliminary study of several candidate molecular cores using UKIRT, Spitzer IRAC, IRTF SpeX, Keck MOSFIRE and Subaru MOIRCS data. After the preliminary studies we have selected three molecular cores in different evolutionary stages for the GTO program: B68, a quiescent molecular core, LDN 694-2, a collapsing pre-stellar core, and B335, a protostellar core. All these cores are seen against a dense background of stars in the inner Galaxy and offer the opportunity for spatially well resolved mapping of the ice feature distribution. We will obtain slitless grism spectroscopy in six filters covering the features of H2O, CO2, CO, CH3OH, and the XCN feature. Simulations using aXeSIM have shown that spectrum overlap will occur in a fraction of the spectra, but will not be a prohibitive problem. Our poster will discuss the details of observations planned out in the APT system.

  2. Gene essentiality prediction based on fractal features and machine learning.

    PubMed

    Yu, Yongming; Yang, Licai; Liu, Zhiping; Zhu, Chuansheng

    2017-02-28

    Essential genes are required for the viability of an organism. Accurate and rapid identification of new essential genes is of substantial theoretical interest to synthetic biology and has practical applications in biomedicine. Fractals provide facilitated access to genetic structure analysis on a different scale. In this study, machine learning-based methods using solely fractal features are presented and the problem of predicting essential genes in bacterial genomes is evaluated. Six fractal features were investigated to learn the parameters of five supervised classification methods for the binary classification task. The optimal parameters of these classifiers were determined via a grid-based search technique. All the currently available identified genes from the database of essential genes were utilized to build the classifiers. The fractal features proved more robust and powerful in prediction performance. In a statistical sense, the ELM method shows superiority in predicting the essential genes. Nonparametric tests of the average AUC and ACC showed that the fractal features are much better than the other five compared feature sets. Our approach is a promising and convenient way to identify new bacterial essential genes.

  3. Predicting protein amidation sites by orchestrating amino acid sequence features

    NASA Astrophysics Data System (ADS)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications, which plays an important role in physiological and pathological processes. Identifying amidation sites can help us understand amidation and recognize the underlying causes of many kinds of diseases. But the traditional experimental methods for predicting amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector able to capture not only the physicochemical properties but also position-related information of the amino acids. An extremely randomized trees algorithm is applied to choose the optimal features and to remove redundancy and dependence among components of the feature vector in a supervised fashion. Finally, the support vector machine classifier is used to label the amidation sites. When tested on an independent data set, the proposed method performs better than all the previous ones, with a prediction accuracy of 0.962, a Matthews correlation coefficient of 0.89, and an area under the curve of 0.964.
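
    The abstract above reports accuracy together with the Matthews correlation coefficient (MCC). As a minimal sketch of how the two relate, the following computes both from confusion-matrix counts. The function names and the counts are illustrative assumptions and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the calculation.
tp, tn, fp, fn = 90, 95, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

    Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are unbalanced.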

  4. Structural features that predict real-value fluctuations of globular proteins.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-05-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
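
    The two evaluation measures quoted above, Pearson's correlation coefficient and root mean square error, can be sketched as follows. The function names and the fluctuation values are illustrative assumptions, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Hypothetical per-residue fluctuations in angstroms (illustration only).
observed = [0.8, 1.2, 2.5, 1.9, 0.6]
predicted = [1.0, 1.1, 2.0, 2.2, 0.9]
print(f"r    = {pearson_r(predicted, observed):.3f}")
print(f"RMSE = {rmse(predicted, observed):.3f}")
```

    The correlation measures only how well the predicted profile tracks the observed one, whereas the RMSE also penalizes errors in absolute magnitude, which is why a method that predicts real values is assessed with both.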

  5. Interpretable Topic Features for Post-ICU Mortality Prediction

    PubMed Central

    Luo, Yen-Fu; Rumshisky, Anna

    2016-01-01

    Electronic health records provide valuable resources for understanding the correlation between various diseases and mortality. The analysis of post-discharge mortality is critical for healthcare professionals to follow up potential causes of death after a patient is discharged from the hospital and give prompt treatment. Moreover, it may reduce the cost derived from readmissions and improve the quality of healthcare. Our work focused on post-discharge ICU mortality prediction. In addition to features derived from physiological measurements, we incorporated ICD-9-CM hierarchy into Bayesian topic model learning and extracted topic features from medical notes. We achieved highest AUCs of 0.835 and 0.829 for 30-day and 6-month post-discharge mortality prediction using baseline and topic proportions derived from Labeled-LDA. Moreover, our work emphasized the interpretability of topic features derived from the topic model, which may facilitate the understanding and investigation of the complexity between mortality and diseases. PMID:28269879

  6. Predicting polymer nanofiber interactions via molecular simulations.

    PubMed

    Buell, Sezen; Rutledge, Gregory C; Vliet, Krystyn J Van

    2010-04-01

    Physical and functional properties of nonwoven textiles and other fiberlike materials depend strongly on the number and type of fiber-fiber interactions. For nanoscale polymeric fibers in particular, these interactions are governed by the surfaces of and contacts between fibers. We employ both molecular dynamics (MD) simulations at a temperature below the glass transition temperature T(g) of the polymer bulk, and molecular statics (MS), or energy minimization, to study the interfiber interactions between prototypical polymeric fibers of 4.6 nm diameter, comprising multiple macromolecular chains each of 100 carbon atoms per chain (C100). Our MD simulations show that fibers aligned parallel and within 9 nm of one another experience a significant force of attraction. These fibers tend toward coalescence on a very short time scale, even below T(g). In contrast, our MS calculations suggest an interfiber interaction that transitions from an attractive to a repulsive force at a separation distance of 6 nm. The results of either approach can be used to obtain a quantitative, closed-form relation describing fiber-fiber interaction energies U(s). However, the predicted form of interaction is quite different for the two approaches, and can be understood in terms of differences in the extent of molecular mobility within and between fibers for these different modeling perspectives. The results of these molecular-scale calculations of U(s) are used to interpret experimental observations for electrospun polymer nanofiber mats. These findings highlight the role of temperature and kinetically accessible molecular configurations in predicting interface-dominated interactions at polymer fiber surfaces, and prompt further experiments and simulations to confirm these effects in the properties of nonwoven mats comprising such nanoscale fibers.

  7. Prediction of subjective ratings of emotional pictures by EEG features

    NASA Astrophysics Data System (ADS)

    McFarland, Dennis J.; Parvaz, Muhammad A.; Sarnacki, William A.; Goldstein, Rita Z.; Wolpaw, Jonathan R.

    2017-02-01

    Objective. Emotion dysregulation is an important aspect of many psychiatric disorders. Brain-computer interface (BCI) technology could be a powerful new approach to facilitating therapeutic self-regulation of emotions. One possible BCI method would be to provide stimulus-specific feedback based on subject-specific electroencephalographic (EEG) responses to emotion-eliciting stimuli. Approach. To assess the feasibility of this approach, we studied the relationships between emotional valence/arousal and three EEG features: amplitude of alpha activity over frontal cortex; amplitude of theta activity over frontal midline cortex; and the late positive potential over central and posterior mid-line areas. For each feature, we evaluated its ability to predict emotional valence/arousal on both an individual and a group basis. Twenty healthy participants (9 men, 11 women; ages 22-68) rated each of 192 pictures from the IAPS collection in terms of valence and arousal twice (96 pictures on each of 4 d over 2 weeks). EEG was collected simultaneously and used to develop models based on canonical correlation to predict subject-specific single-trial ratings. Separate models were evaluated for the three EEG features: frontal alpha activity; frontal midline theta; and the late positive potential. In each case, these features were used to simultaneously predict both the normed ratings and the subject-specific ratings. Main results. Models using each of the three EEG features with data from individual subjects were generally successful at predicting subjective ratings on training data, but generalization to test data was less successful. Sparse models performed better than models without regularization. Significance. The results suggest that the frontal midline theta is a better candidate than frontal alpha activity or the late positive potential for use in a BCI-based paradigm designed to modify emotional reactions.

  8. Exploiting Information Diffusion Feature for Link Prediction in Sina Weibo

    NASA Astrophysics Data System (ADS)

    Li, Dong; Zhang, Yongchao; Xu, Zhiming; Chu, Dianhui; Li, Sheng

    2016-01-01

    The rapid development of online social networks (e.g., Twitter and Facebook) has promoted research related to social networks in which link prediction is a key problem. Although numerous attempts have been made for link prediction based on network structure, node attribute and so on, few of the current studies have considered the impact of information diffusion on link creation and prediction. This paper mainly addresses Sina Weibo, which is the largest microblog platform with Chinese characteristics, and proposes the hypothesis that information diffusion influences link creation and verifies the hypothesis based on real data analysis. We also detect an important feature from the information diffusion process, which is used to promote link prediction performance. Finally, the experimental results on Sina Weibo dataset have demonstrated the effectiveness of our methods.

  9. Common features of microRNA target prediction tools.

    PubMed

    Peterson, Sarah M; Thompson, Jeffrey A; Ufkin, Melanie L; Sathyanarayana, Pradeep; Liaw, Lucy; Congdon, Clare Bates

    2014-01-01

    The human genome encodes for over 1800 microRNAs (miRNAs), which are short non-coding RNA molecules that function to regulate gene expression post-transcriptionally. Due to the potential for one miRNA to target multiple gene transcripts, miRNAs are recognized as a major mechanism to regulate gene expression and mRNA translation. Computational prediction of miRNA targets is a critical initial step in identifying miRNA:mRNA target interactions for experimental validation. The available tools for miRNA target prediction encompass a range of different computational approaches, from the modeling of physical interactions to the incorporation of machine learning. This review provides an overview of the major computational approaches to miRNA target prediction. Our discussion highlights three tools for their ease of use, reliance on relatively updated versions of miRBase, and range of capabilities, and these are DIANA-microT-CDS, miRanda-mirSVR, and TargetScan. In comparison across all miRNA target prediction tools, four main aspects of the miRNA:mRNA target interaction emerge as common features on which most target prediction is based: seed match, conservation, free energy, and site accessibility. This review explains these features and identifies how they are incorporated into currently available target prediction tools. MiRNA target prediction is a dynamic field with increasing attention on development of new analysis tools. This review attempts to provide a comprehensive assessment of these tools in a manner that is accessible across disciplines. Understanding the basis of these prediction methodologies will aid in user selection of the appropriate tools and interpretation of the tool output.
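
    Of the four common features named above, the seed match is the simplest to illustrate. The sketch below scans a 3' UTR for perfect Watson-Crick complements of miRNA nucleotides 2-8. The function names and both sequences are illustrative assumptions, and this is not how any of the cited tools is implemented: it ignores G:U wobble pairs, conservation, free energy, and site accessibility.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, start=2, end=8):
    """Find 0-based UTR positions that perfectly pair with the miRNA seed.

    The seed is taken as miRNA nucleotides start..end (1-based, inclusive),
    counted from the 5' end.
    """
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


# Toy sequences, used only to illustrate the search.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAAGGG"
print(seed_sites(mirna, utr))  # positions of perfect seed matches
```

    A real predictor would combine such matches with the other three features, typically as inputs to a scoring function or a trained model.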

  10. Predictive features of breast cancer on Mexican screening mammography patients

    NASA Astrophysics Data System (ADS)

    Rodriguez-Rojas, Juan; Garza-Montemayor, Margarita; Trevino-Alvarado, Victor; Tamez-Pena, José Gerardo

    2013-02-01

    Breast cancer is the most common type of cancer worldwide. In response, breast cancer screening programs are becoming common around the world and public programs now serve millions of women worldwide. These programs are expensive, requiring many specialized radiologists to examine all images. Nevertheless, there is a lack of trained radiologists in many countries, such as Mexico, which is a barrier towards decreasing breast cancer mortality, pointing at the need for a triaging system that prioritizes high risk cases for prompt interpretation. Therefore we explored in an image database of Mexican patients whether high risk cases can be distinguished using image features. We collected a set of 200 digital screening mammography cases from a hospital in Mexico, and assigned low or high risk labels according to their BIRADS scores. Breast tissue segmentation was performed using an automatic procedure. Image features were obtained considering only the segmented region on each view and comparing the bilateral differences of the obtained features. Predictive combinations of features were chosen using a genetic-algorithm-based feature selection procedure. The best model found was able to classify low-risk and high-risk cases with an area under the ROC curve of 0.88 on a 150-fold cross-validation test. The features selected were associated with the differences of signal distribution and tissue shape on bilateral views. The model found can be used to automatically identify high risk cases and trigger the necessary measures to provide prompt treatment.
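
    The area under the ROC curve reported above can be read as the probability that a randomly chosen high-risk case receives a higher model score than a randomly chosen low-risk case. A minimal sketch of that pairwise definition follows. The function name and the scores are illustrative assumptions, not values from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case scores higher, with ties counted as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores (illustration only).
high_risk = [0.9, 0.8, 0.4]
low_risk = [0.7, 0.3, 0.2]
print(f"AUC = {roc_auc(high_risk, low_risk):.3f}")
```

    This quadratic-time form is fine for a few hundred cases. Rank-based formulas give the same value more efficiently on large data sets.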

  11. Critical Features of Fragment Libraries for Protein Structure Prediction.

    PubMed

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  12. Critical Features of Fragment Libraries for Protein Structure Prediction

    PubMed Central

    dos Santos, Karina Baptista

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928

  13. Quantitative imaging features to predict cancer status in lung nodules

    NASA Astrophysics Data System (ADS)

    Liu, Ying; Balagurunathan, Yoganand; Atwater, Thomas; Antic, Sanja; Li, Qian; Walker, Ronald; Smith, Gary T.; Massion, Pierre P.; Schabath, Matthew B.; Gillies, Robert J.

    2016-03-01

    Background: We propose a systematic methodology to quantify incidentally identified lung nodules based on observed radiological traits on a point scale. This quantitative trait classification model was used to predict cancer status. Materials and Methods: We used low dose computed tomography (LDCT) images from 102 patients for this study; 24 semantic traits were systematically scored from each image. We built a machine learning classifier in a cross-validation setting to find the best predictive imaging features to differentiate malignant from benign lung nodules. Results: The best feature triplet to discriminate malignancy was based on long axis, concavity and lymphadenopathy with average AUC of 0.897 (Accuracy of 76.8%, Sensitivity of 64.3%, Specificity of 90%). A similar semantic triplet optimized on Sensitivity/Specificity (Youden's J index) included long axis, vascular convergence and lymphadenopathy which had an average AUC of 0.875 (Accuracy of 81.7%, Sensitivity of 76.2%, Specificity of 95%). Conclusions: Quantitative radiological image traits can differentiate malignant from benign lung nodules. These semantic features along with size measurement enhance the prediction accuracy.
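
    Youden's J index, mentioned above, is defined as sensitivity + specificity - 1. As a small worked example, the sketch below applies that definition to the sensitivity and specificity values quoted in the abstract for the two feature triplets. The function name and labels are illustrative assumptions.

```python
def youden_j(sensitivity, specificity):
    """Youden's J index: sensitivity + specificity - 1 (both as fractions)."""
    return sensitivity + specificity - 1.0


# Sensitivity and specificity as quoted in the abstract above.
triplets = {
    "long axis, concavity, lymphadenopathy": (0.643, 0.90),
    "long axis, vascular convergence, lymphadenopathy": (0.762, 0.95),
}
for name, (sens, spec) in triplets.items():
    print(f"{name}: J = {youden_j(sens, spec):.3f}")
```

    The second triplet has the higher J (about 0.712 versus 0.543) despite its slightly lower AUC, which is consistent with its having been optimized on sensitivity and specificity.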

  14. Application of optimal prediction to molecular dynamics

    SciTech Connect

    Barber, IV, John Letherman

    2004-12-01

    Optimal prediction is a general system reduction technique for large sets of differential equations. In this method, which was devised by Chorin, Hald, Kast, Kupferman, and Levy, a projection operator formalism is used to construct a smaller system of equations governing the dynamics of a subset of the original degrees of freedom. This reduced system consists of an effective Hamiltonian dynamics, augmented by an integral memory term and a random noise term. Molecular dynamics is a method for simulating large systems of interacting fluid particles. In this thesis, I construct a formalism for applying optimal prediction to molecular dynamics, producing reduced systems from which the properties of the original system can be recovered. These reduced systems require significantly less computational time than the original system. I initially consider first-order optimal prediction, in which the memory and noise terms are neglected. I construct a pair approximation to the renormalized potential, and ignore three-particle and higher interactions. This produces a reduced system that correctly reproduces static properties of the original system, such as energy and pressure, at low-to-moderate densities. However, it fails to capture dynamical quantities, such as autocorrelation functions. I next derive a short-memory approximation, in which the memory term is represented as a linear frictional force with configuration-dependent coefficients. This allows the use of a Fokker-Planck equation to show that, in this regime, the noise is δ-correlated in time. This linear friction model reproduces not only the static properties of the original system, but also the autocorrelation functions of dynamical variables.

  15. Identifying predictive morphologic features of malignancy in eyelid lesions

    PubMed Central

    Leung, Christina; Johnson, Davin; Pang, Renee; Kratky, Vladimir

    2015-01-01

    Abstract Objective To determine features of eyelid lesions most predictive of malignancy, and to design a key to assist general practitioners in the triaging of such lesions. Design Prospective observational study. Setting Department of Ophthalmology at Queen’s University in Kingston, Ont. Participants A total of 199 consecutive periocular lesions requiring biopsy or excision were included. Main outcome measures First, potential features suggestive of malignancy for eyelid lesions were identified based on a survey sent to Canadian oculoplastic surgeons. The sensitivity, specificity, and odds ratios (ORs) of these features were then determined using 199 consecutive photographed eyelid lesions of patients who presented to the Department of Ophthalmology and underwent biopsy or excision. A triage key was then created based on the features with the highest ORs, and it was pilot-tested by a group of medical students. Results Of the 199 lesions included, 161 (80.9%) were benign and 38 (19.1%) were malignant. The 3 features with the highest ORs in predicting malignancy were infiltration (OR = 18.2, P < .01), ulceration (OR = 14.7, P < .01), and loss of eyelashes (OR = 6.0, P < .01). The acronym LUI (loss of eyelashes, ulceration, infiltration) was created to assist in memory recall. After watching a video describing the LUI triage key, the mean total score of a group of medical students for correctly identifying malignant lesions increased from 46% to 70% (P < .001). Conclusion Differentiating benign from malignant eyelid lesions can be difficult even for experienced physicians. The LUI triage key provides physicians with an evidence-based, easy-to-remember system for assisting in the triaging of these lesions. PMID:25756148
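
    The sensitivity, specificity, and odds ratio used above all follow from a 2x2 table of feature presence against biopsy result. The sketch below shows the arithmetic. The class totals (38 malignant, 161 benign) come from the abstract, but the split by feature presence is a hypothetical assumption made only for illustration, as is the function name.

```python
def two_by_two_stats(tp, fn, fp, tn):
    """Sensitivity, specificity, and odds ratio for one binary feature.

    tp: malignant lesions with the feature   fn: malignant without it
    fp: benign lesions with the feature      tn: benign without it
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    odds_ratio = (tp * tn) / (fn * fp)  # undefined if fn or fp is zero
    return sensitivity, specificity, odds_ratio


# Hypothetical split: feature seen in 20 of 38 malignant, 10 of 161 benign.
sens, spec, odds = two_by_two_stats(tp=20, fn=18, fp=10, tn=151)
print(f"sensitivity = {sens:.3f}")
print(f"specificity = {spec:.3f}")
print(f"odds ratio  = {odds:.1f}")
```

    When a cell count is zero, a continuity correction (for example, adding 0.5 to every cell) is a common workaround. It is omitted here to keep the example short.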

  16. Magnetic resonance image features identify glioblastoma phenotypic subtypes with distinct molecular pathway activities

    PubMed Central

    Itakura, Haruka; Achrol, Achal S.; Mitchell, Lex A.; Loya, Joshua J.; Liu, Tiffany; Westbroek, Erick M.; Feroze, Abdullah H.; Rodriguez, Scott; Echegaray, Sebastian; Azad, Tej D.; Yeom, Kristen W.; Napel, Sandy; Rubin, Daniel L.; Chang, Steven D.; Harsh, Griffith R.; Gevaert, Olivier

    2015-01-01

    Glioblastoma (GBM) is the most common and highly lethal primary malignant brain tumor in adults. There is a dire need for easily accessible, noninvasive biomarkers that can delineate underlying molecular activities and predict response to therapy. To this end, we sought to identify subtypes of GBM, differentiated solely by quantitative MR imaging features, that could be used for better management of GBM patients. Quantitative image features capturing the shape, texture, and edge sharpness of each lesion were extracted from MR images of 121 patients with de novo, solitary, unilateral GBM. Three distinct phenotypic “clusters” emerged in the development cohort using consensus clustering with 10,000 iterations on these image features. These three clusters—pre-multifocal, spherical, and rim-enhancing, names reflecting their image features—were validated in an independent cohort consisting of 144 multi-institution patients with similar tumor characteristics from The Cancer Genome Atlas (TCGA). Each cluster mapped to a unique set of molecular signaling pathways using pathway activity estimates derived from analysis of TCGA tumor copy number and gene expression data with the PARADIGM algorithm. Distinct pathways, such as c-Kit and FOXA, were enriched in each cluster, indicating differential molecular activities as determined by image features. Each cluster also demonstrated differential probabilities of survival, indicating prognostic importance. Our imaging method offers a noninvasive approach to stratify GBM patients and also provides unique sets of molecular signatures to inform targeted therapy and personalized treatment of GBM. PMID:26333934

  17. Prediction of chemical carcinogenicity from molecular structure.

    PubMed

    Sun, Hongmao

    2004-01-01

    Carcinogens represent a serious threat to human health. In vivo determination of carcinogenicity is time-consuming and expensive, thus in silico models to predict chemical carcinogenicity are highly desirable for virtual screening of compound libraries of both pharmaceutically and other commercially interesting molecules. In the present study, a PLS-DA (partial least squares discriminant analysis) model was developed to predict carcinogenicities in each of four rodent models: male mouse (MM), female mouse (FM), male rat (MR), and female rat (FR). The data set that was used contained over 520 compounds from both the NTP and the FDA databases. All the models were built from the same molecular descriptor system, which is based on atom typing [Sun, H. J. Chem. Inf. Comput. Sci. 2004, 44, 748-757], enabling the comparison of atomic contributions to carcinogenicity with respect to species and gender. Using four components, the models were able to achieve excellent fitting and prediction, with r² = 0.987 and q² = 0.944 for MM, r² = 0.985 and q² = 0.950 for FM, r² = 0.989 and q² = 0.962 for MR, and r² = 0.990 and q² = 0.965 for FR. The models were further validated by response permutation testing and external validation, and the results indicated that the models were both statistically significant and predictive. Variable influence on projection (VIP) analysis identified the key atom types and fragments that contributed to carcinogenicities and response differences across species and gender.

  18. A Prediction Model for Membrane Proteins Using Moments Based Features

    PubMed Central

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to scientists in pharmaceutical organizations, these membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  19. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http

  20. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines.

    PubMed

    González, Alvaro J; Liao, Li

    2010-10-29

    Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http

  1. Predicting drug pharmacokinetic properties using molecular interaction fields and SIMCA

    NASA Astrophysics Data System (ADS)

    Wolohan, Philippa R. N.; Clark, Robert D.

    2003-01-01

    We have developed a method that combines molecular interaction fields with soft independent modeling of class analogy (SIMCA; Wold, 1977) to predict pharmacokinetic drug properties. Several additional considerations to those made in traditional QSAR are required in order to develop a successful QSPR strategy that is capable of accommodating the many complex factors that contribute to key pharmacokinetic properties such as ADME (absorption, distribution, metabolism, and excretion) and toxicology. An accurate prediction of oral bioavailability, for example, requires that absorption and first-pass hepatic elimination both be taken into consideration. To accomplish this, general properties of molecules must be related to their solubility and ability to penetrate biological membranes, and specific features must be related to their particular metabolic and toxicological profiles. Here we describe a method, which is applicable to structurally diverse data sets while utilizing as much detailed structural information as possible. We address the issue of the molecular alignment of a structurally diverse set of compounds using idiotropic field orientation (IFO), a generalization of inertial field orientation (Clark, 1998). We have developed a second flavor of this method, which directly incorporates electrostatics into the molecular alignment. Both variations of IFO produce a characteristic orientation for each structure and the corresponding molecular fields can then be analyzed using SIMCA. Models are presented for human intestinal absorption, blood-brain barrier penetration and bioavailability to demonstrate ways in which this tool can be used early in the drug development process to identify leads likely to exhibit poor pharmacokinetic behavior in pre-clinical studies, and we have explored the influence of conformation and molecular field type on the statistical properties of the models obtained.

  2. Features Predicting Sentinel Lymph Node Positivity in Merkel Cell Carcinoma

    PubMed Central

    Schwartz, Jennifer L.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; McLean, Scott A.; Fullen, Douglas R.; Lao, Christopher D.; Hayman, James A.; Bradford, Carol R.; Rees, Riley S.; Johnson, Timothy M.; Bichakjian, Christopher K.

    2011-01-01

    Purpose Merkel cell carcinoma (MCC) is a relatively rare, potentially aggressive cutaneous malignancy. We examined the clinical and histologic features of primary MCC that may correlate with the probability of a positive sentinel lymph node (SLN). Methods Ninety-five patients with MCC who underwent SLN biopsy at the University of Michigan were identified. SLN biopsy was performed on 97 primary tumors, and an SLN was identified in 93 instances. These were reviewed for clinical and histologic features and associated SLN positivity. Univariate associations between these characteristics and a positive SLN were tested for by using either the χ2 or the Fisher's exact test. A backward elimination algorithm was used to help create a best multiple variable model to explain a positive SLN. Results SLN positivity was significantly associated with the clinical size of the lesion, greatest horizontal histologic dimension, tumor thickness, mitotic rate, and histologic growth pattern. Two competing multivariate models were generated to predict a positive SLN. The histologic growth pattern was present in both models and combined with either tumor thickness or mitotic rate. Conclusion Increasing clinical size, increasing tumor thickness, increasing mitotic rate, and infiltrative tumor growth pattern were significantly associated with a greater likelihood of a positive SLN. By using the growth pattern and tumor thickness model, no subgroup of patients was predicted to have a lower than 15% to 20% likelihood of a positive SLN. This suggests that all patients presenting with MCC without clinical evidence of regional lymph node disease should be considered for SLN biopsy. PMID:21300936

  3. Molecular Features of Wheat Endosperm Arabinoxylan Inclusion in Functional Bread

    PubMed Central

    Li, Weili; Hu, Hui; Wang, Qi; Brennan, Charles J.

    2013-01-01

    Arabinoxylan (AX) is a major dietary fibre component found in a variety of cereals. Numerous health benefits of arabinoxylans have been reported to be associated with their solubility and molecular features. The current study reports the development of a functional bread using a combination of AX-enriched material (AEM) and optimal commercial endoxylanase. The total AX content of bread was increased to 8.2 g per 100 g available carbohydrates. The extractability of AX in breads with and without endoxylanase was determined. The results demonstrate that water-extractable AX (WE-AX) increased progressively through the bread making process. The application of endoxylanase also increased WE-AX content. The presence of 360 ppm of endoxylanase had positive effects on the bread characteristics in terms of bread volume and firmness by converting the water unextractable (WU)-AX to WE-AX. In addition, the molecular weight (Mw) distribution of the WE-AX of bread with and without endoxylanase was characterized by size-exclusion chromatography. The results show that as the portion of WE-AX increased, the amount of high Mw WE-AX (higher than 100 kDa) decreased, whereas the amount of low Mw WE-AX (lower than 100 kDa) increased from 33.2% to 44.2% through the baking process. The low Mw WE-AX further increased to 75.5% with the application of the optimal endoxylanase (360 ppm). PMID:28239111

  4. Predicting Presynaptic and Postsynaptic Neurotoxins by Developing Feature Selection Technique

    PubMed Central

    Yang, Yunchun; Zhang, Chunmei; Chen, Rong; Huang, Po

    2017-01-01

    Presynaptic and postsynaptic neurotoxins are proteins which act at the presynaptic and postsynaptic membranes, respectively. Correctly predicting presynaptic and postsynaptic neurotoxins will provide important clues for drug-target discovery and drug design. In this study, we developed a theoretical method to discriminate presynaptic neurotoxins from postsynaptic neurotoxins. A strict and objective benchmark dataset was constructed to train and test our proposed model. The dipeptide composition was used to formulate neurotoxin samples. Analysis of variance (ANOVA) was used to find the optimal feature set, that is, the one producing the maximum accuracy. In the jackknife cross-validation test, an overall accuracy of 94.9% was achieved. We believe that the proposed model will provide important information for the study of neurotoxins. PMID:28303250
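
    The dipeptide composition mentioned in this abstract is commonly computed as the relative frequency of each of the 400 ordered amino acid pairs in a sequence. The following is a minimal sketch under that assumption; the function name, the 20-letter alphabet constant and the handling of non-standard residues are illustrative choices, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids (assumed alphabet for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional list of dipeptide relative frequencies.

    Overlapping adjacent pairs are counted; pairs containing a
    non-standard residue are skipped (an assumption of this sketch).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    vec = dipeptide_composition("ACAC")
    print(len(vec), vec[DIPEPTIDES.index("AC")])
```

    Each protein then becomes a fixed-length vector regardless of its sequence length, which is what allows a feature-ranking step such as ANOVA to be applied column by column.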

  5. Predicting the Presence of Large Fish through Benthic Geomorphic Features

    NASA Astrophysics Data System (ADS)

    Knuth, F.; Sautter, L.; Levine, N. S.; Kracker, L.

    2013-12-01

    Marine Protected Areas are critical in sustaining the resilience of fish populations to commercial fishing operations. Using acoustic data to survey these areas promises efficiency, accuracy, and minimal environmental impact. In July 2013, the NOAA Ship Pisces collected bathymetric, backscatter and water column data for 10 proposed MPA sites along the U.S. Southeast Atlantic continental shelf. A total of 205 km² of seafloor were mapped between Mayport, FL and Wilmington, NC, using the SIMRAD ME70 and EK60 echosounder systems. These data were processed in Caris HIPS, QPS FMGT, MATLAB and ArcGIS. The backscatter and bathymetry reveal various benthic geomorphic features, including flat sand, rippled sand, and rugose hard bottom. Water column data directly above highly rugose hard bottom contain the greatest counts of large fish. Using spatial statistics, such as a geographically weighted regression model, we aim to identify features of the benthic profile, including rugosity, curvature and slope, that can predict the presence of large fish. The success of this approach will greatly expedite fishery surveys, minimize operational cost and aid in making timely management decisions.

  6. Mixed learning algorithms and features ensemble in hepatotoxicity prediction

    NASA Astrophysics Data System (ADS)

    Liew, Chin Yee; Lim, Yen Ching; Yap, Chun Wei

    2011-09-01

    Drug-induced liver injury, although infrequent, is an important safety concern that can lead to fatality in patients and failure in drug development. In this study, we have used an ensemble of mixed learning algorithms and mixed features for the development of a model to predict hepatic effects. This robust method is based on the premise that no single learning algorithm is optimum for all modelling problems. An ensemble model of 617 base classifiers was built from a diverse set of 1,087 compounds. The ensemble model was validated internally with five-fold cross-validation and 25 rounds of y-randomization. In the external validation of 120 compounds, the ensemble model achieved an accuracy of 75.0%, sensitivity of 81.9% and specificity of 64.6%. The model was also able to identify 22 of 23 withdrawn drugs or drugs with black box warnings against hepatotoxicity. Dronedarone, which is associated with severe liver injuries announced in a recent FDA drug safety communication, was predicted as hepatotoxic by the ensemble model. It was found that the ensemble model was capable of classifying positive compounds (with hepatic effects) well, but less so negative compounds when they were structurally similar. The ensemble model built in this study is made available for public use.
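
    The accuracy, sensitivity and specificity quoted in this abstract are standard functions of a binary confusion matrix. The sketch below shows the arithmetic only; the counts in the usage example are hypothetical and are not the paper's external validation data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion counts.

    tp/fn: positive compounds predicted positive/negative.
    tn/fp: negative compounds predicted negative/positive.
    """
    positives = tp + fn
    negatives = tn + fp
    return {
        # Fraction of all compounds classified correctly.
        "accuracy": (tp + tn) / (positives + negatives),
        # Fraction of true positives that were recovered.
        "sensitivity": tp / positives,
        # Fraction of true negatives that were recovered.
        "specificity": tn / negatives,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    print(classification_metrics(tp=8, fn=2, tn=6, fp=4))
```

    A model can score well on sensitivity while doing noticeably worse on specificity, which is the pattern the abstract describes for structurally similar negative compounds.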

  7. Mixed learning algorithms and features ensemble in hepatotoxicity prediction.

    PubMed

    Liew, Chin Yee; Lim, Yen Ching; Yap, Chun Wei

    2011-09-01

    Drug-induced liver injury, although infrequent, is an important safety concern that can lead to fatality in patients and failure in drug development. In this study, we have used an ensemble of mixed learning algorithms and mixed features for the development of a model to predict hepatic effects. This robust method is based on the premise that no single learning algorithm is optimum for all modelling problems. An ensemble model of 617 base classifiers was built from a diverse set of 1,087 compounds. The ensemble model was validated internally with five-fold cross-validation and 25 rounds of y-randomization. In the external validation of 120 compounds, the ensemble model achieved an accuracy of 75.0%, sensitivity of 81.9% and specificity of 64.6%. The model was also able to identify 22 of 23 withdrawn drugs or drugs with black box warnings against hepatotoxicity. Dronedarone, which is associated with severe liver injuries announced in a recent FDA drug safety communication, was predicted as hepatotoxic by the ensemble model. It was found that the ensemble model was capable of classifying positive compounds (with hepatic effects) well, but less so negative compounds when they were structurally similar. The ensemble model built in this study is made available for public use. © Springer Science+Business Media B.V. 2011

  8. Clinicopathologic, Immunohistochemical, and Molecular Features of Histiocytoid Sweet Syndrome.

    PubMed

    Alegría-Landa, Victoria; Rodríguez-Pinilla, Socorro María; Santos-Briz, Angel; Rodríguez-Peralto, José Luis; Alegre, Victor; Cerroni, Lorenzo; Kutzner, Heinz; Requena, Luis

    2017-07-01

    Histiocytoid Sweet syndrome is a rare histopathologic variant of Sweet syndrome. The nature of the histiocytoid infiltrate has generated considerable controversy in the literature. The main goal of this study was to conduct a comprehensive overview of the immunohistochemical phenotype of the infiltrate in histiocytoid Sweet syndrome. We also analyze whether this variant of Sweet syndrome is more frequently associated with hematologic malignancies than classic Sweet syndrome. This retrospective case series study of the clinicopathologic, immunohistochemical, and molecular features of 33 patients with a clinicopathologic diagnosis of histiocytoid Sweet syndrome was conducted in the dermatology departments of 5 university hospitals and a private laboratory of dermatopathology. The clinical, histopathological, immunohistochemical, and follow-up features of 33 patients with histiocytoid Sweet syndrome were analyzed. In some cases, cytogenetic studies of the dermal infiltrate were also performed. We compare our findings with those of the literature. The dermal infiltrate from the 33 study patients (20 female; median age, 49 years; age range, 5-93 years; and 13 male; median age, 42 years; age range, 4-76 years) was mainly composed of myeloperoxidase-positive immature myelomonocytic cells with histiocytoid morphology. No cytogenetic anomalies were found in the infiltrate except in 1 case in which neoplastic cells of chronic myelogenous leukemia were intermingled with the cells of histiocytoid Sweet syndrome. Authentic histiocytes were also found in most cases, with a mature immunoprofile, but they appeared to be a minor component of the infiltrate. Histiocytoid Sweet syndrome was not more frequently associated with hematologic malignancies than classic neutrophilic Sweet syndrome. The dermal infiltrate of cutaneous lesions of histiocytoid Sweet syndrome is composed mostly of immature cells of myeloid lineage. This infiltrate should not be interpreted as leukemia cutis.

  9. Imaging features of automated breast volume scanner: Correlation with molecular subtypes of breast cancer.

    PubMed

    Zheng, Feng-Yang; Lu, Qing; Huang, Bei-Jian; Xia, Han-Sheng; Yan, Li-Xia; Wang, Xi; Yuan, Wei; Wang, Wen-Ping

    2017-01-01

    To investigate the correlation between the imaging features obtained by an automated breast volume scanner (ABVS) and molecular subtypes of breast cancer. We examined 303 malignant breast tumours by ABVS for specific imaging features and by immunohistochemical analysis to determine the molecular subtype. ABVS imaging features, including retraction phenomenon, shape, margins, echogenicity, post-acoustic features, echogenic halo, and calcifications were analysed by univariate and multivariate logistic regression analyses to determine the significant predictive factors of the molecular subtypes. By univariate logistic regression analysis, the predictive factors of the Luminal-A subtype (n=128) were retraction phenomenon (odds ratio [OR]=10.188), post-acoustic shadowing (OR=5.112), and echogenic halo (OR=3.263, P<0.001). The predictive factors of the Human-epidermal-growth-factor-receptor-2-amplified subtype (n=39) were calcifications (OR=6.210), absence of retraction phenomenon (OR=4.375), non-mass lesions (OR=4.286, P<0.001), absence of echogenic halo (OR=3.851, P=0.035), and post-acoustic enhancement (OR=3.641, P=0.008). The predictors for the Triple-Negative subtype (n=47) were absence of retraction phenomenon (OR=5.884), post-acoustic enhancement (OR=5.255, P<0.001), absence of echogenic halo (OR=4.138, P=0.002), and absence of calcifications (OR=3.363, P=0.001). Predictors for the Luminal-B subtype (n=89) had a relatively lower association (OR≤2.328). By multivariate logistic regression analysis, retraction phenomenon was the strongest independent predictor for the Luminal-A subtype (OR=9.063, P<0.001) when present and for the Triple-Negative subtype (OR=4.875, P<0.001) when absent. ABVS imaging features, especially retraction phenomenon, have a strong correlation with the molecular subtypes, expanding the scope of ultrasound in identifying breast cancer subtypes with confidence. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  10. Predictive Features of a Cockpit Traffic Display: A Workload Assessment

    NASA Technical Reports Server (NTRS)

    Wickens, Christopher D.; Morphew, Ephimia

    1997-01-01

    Eighteen pilots flew a series of traffic avoidance maneuvers in an experiment designed to assess the support offered and workload imposed by different levels of traffic display information in a free flight simulation. Three display prototypes were compared which differed in the traffic information provided. A BASELINE (BL) display provided current and (2nd order) predicted information regarding ownship and current information of an intruder aircraft, represented on lateral and vertical displays in a coplanar suite. An INTRUDER PREDICTOR (IP) display augmented the baseline display by providing lateral and vertical prediction of the intruder aircraft. A THREAT VECTOR (TV) display added to the IP display a vector that indicates the direction from ownship to the intruder at the predicted point of closest contact (POCC). The length of the vector corresponds to the radius of the protected zone, and the distance of the intersection of the vector with the ownship predictor corresponds to the time available till POCC or loss of separation. Pilots time-shared the traffic avoidance task with a secondary task requiring them to monitor the top of the display for faint targets. This task simulated the visual demands of out-of-cockpit scanning, and hence was used to estimate the head-down time required by the different display formats. The results revealed that both display augmentations improved performance (safety) as assessed by predicted and actual loss of separation (i.e., penetration of the protected zone). Both enhancements also reduced workload, as assessed by the NASA TLX scale. The intruder predictor display produced these benefits with no substantial impact on the qualitative nature of the avoidance maneuvers that were selected. The threat vector produced the safety benefits by inducing a greater degree of (effective) lateral maneuvering, thus partially offsetting the benefits of reduced workload. The three displays did not differ in terms of their effect on performance of

  11. Delta hepatitis: molecular biology and clinical and epidemiological features.

    PubMed Central

    Polish, L B; Gallagher, M; Fields, H A; Hadler, S C

    1993-01-01

    Hepatitis delta virus, discovered in 1977, requires the help of hepatitis B virus to replicate in hepatocytes and is an important cause of acute, fulminant, and chronic liver disease in many regions of the world. Because of the helper function of hepatitis delta virus, infection with it occurs either as a coinfection with hepatitis B or as a superinfection of a carrier of hepatitis B surface antigen. Although the mechanisms of transmission are similar to those of hepatitis B virus, the patterns of transmission of delta virus vary widely around the world. In regions of the world in which hepatitis delta virus infection is not endemic, the disease is confined to groups at high risk of acquiring hepatitis B infection and high-risk hepatitis B carriers. Because of the propensity of this viral infection to cause fulminant as well as chronic liver disease, continued incursion of hepatitis delta virus into areas of the world where persistent hepatitis B infection is endemic will have serious implications. Prevention depends on the widespread use of hepatitis B vaccine. This review focuses on the molecular biology and the clinical and epidemiologic features of this important viral infection. PMID:8358704

  12. Molecular biology of testicular germ cell tumors: unique features awaiting clinical application.

    PubMed

    Boublikova, Ludmila; Buchler, Tomas; Stary, Jan; Abrahamova, Jitka; Trka, Jan

    2014-03-01

    Testicular germ cell tumors (TGCTs) are the most common solid tumors in young adult men and are characterized by distinct biologic features and clinical behavior. Both genetic predispositions and environmental factors probably play a substantial role in their etiology. TGCTs arise from a malignant transformation of primordial germ cells in a process that starts prenatally, is often associated with a certain degree of gonadal dysgenesis, and involves the acquisition of several specific aberrations, including activation of SCF-CKIT, amplification of 12p with up-regulation of stem cell genes, and subsequent genetic and epigenetic alterations. Their embryonic and germ origin determines the unique sensitivity of TGCTs to platinum-based chemotherapy. In contrast to the vast majority of other malignancies, neither molecular prognostic/predictive factors nor targeted therapy are available for patients with these tumors. This review summarizes the principal molecular characteristics of TGCTs that could represent a potential basis for development of novel diagnostic and treatment approaches.

  13. Radiogenomics of Glioblastoma: Machine Learning-based Classification of Molecular Characteristics by Using Multiparametric and Multiregional MR Imaging Features.

    PubMed

    Kickingereder, Philipp; Bonekamp, David; Nowosielski, Martha; Kratz, Annekathrin; Sill, Martin; Burth, Sina; Wick, Antje; Eidel, Oliver; Schlemmer, Heinz-Peter; Radbruch, Alexander; Debus, Jürgen; Herold-Mende, Christel; Unterberg, Andreas; Jones, David; Pfister, Stefan; Wick, Wolfgang; von Deimling, Andreas; Bendszus, Martin; Capper, David

    2016-12-01

    Purpose To evaluate the association of multiparametric and multiregional magnetic resonance (MR) imaging features with key molecular characteristics in patients with newly diagnosed glioblastoma. Materials and Methods Retrospective data evaluation was approved by the local ethics committee, and the requirement to obtain informed consent was waived. Preoperative MR imaging features were correlated with key molecular characteristics within a single-institution cohort of 152 patients with newly diagnosed glioblastoma. Preoperative MR imaging features (n = 31) included multiparametric (anatomic and diffusion-, perfusion-, and susceptibility-weighted images) and multiregional (contrast-enhancing regions and hyperintense regions at nonenhanced fluid-attenuated inversion recovery imaging) information with histogram quantification of tumor volumes, volume ratios, apparent diffusion coefficients, cerebral blood flow, cerebral blood volume, and intratumoral susceptibility signals. Molecular characteristics determined included global DNA methylation subgroups (eg, mesenchymal, RTK I "PGFRA," RTK II "classic"), MGMT promoter methylation status, and hallmark copy number variations (EGFR, PDGFRA, MDM4, and CDK4 amplification; PTEN, CDKN2A, NF1, and RB1 loss). Univariate analyses (voxel-lesion symptom mapping for tumor location, Wilcoxon test for all other MR imaging features) and machine learning models were applied to study the strength of association and discriminative value of MR imaging features for predicting underlying molecular characteristics. Results There was no tumor location predilection for any of the assessed molecular parameters (permutation-adjusted P > .05). Univariate imaging parameter associations were noted for EGFR amplification and CDKN2A loss, with both demonstrating increased Gaussian-normalized relative cerebral blood volume and Gaussian-normalized relative cerebral blood flow values (area under the receiver operating characteristics curve: 63

  14. PredictProtein--an open resource for online prediction of protein structural and functional features.

    PubMed

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-07-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein-protein binding sites (ISIS2), protein-polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Beyond [lambda][subscript max] Part 2: Predicting Molecular Color

    ERIC Educational Resources Information Center

    Williams, Darren L.; Flaherty, Thomas J.; Alnasleh, Bassam K.

    2009-01-01

    A concise roadmap for using computational chemistry programs (i.e., Gaussian 03W) to predict the color of a molecular species is presented. A color-predicting spreadsheet is available with the online material that uses transition wavelengths and peak-shape parameters to predict the visible absorbance spectrum, transmittance spectrum, chromaticity…
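
    The step from transition wavelengths and peak-shape parameters to absorbance and transmittance spectra can be illustrated with a short sketch. It assumes each transition contributes a Gaussian band in wavelength (center, peak height, width); that band shape and all names below are assumptions of this example, not details of the authors' spreadsheet. The only fixed relation used is T = 10^(-A).

```python
import math


def absorbance(wavelength_nm, bands):
    """Sum Gaussian bands at one wavelength.

    bands: iterable of (center_nm, peak_absorbance, width_nm) tuples,
    where width_nm is the Gaussian standard deviation (assumed shape).
    """
    return sum(
        height * math.exp(-0.5 * ((wavelength_nm - center) / width) ** 2)
        for center, height, width in bands
    )


def transmittance(absorbance_value):
    """Convert absorbance A to fractional transmittance T = 10**(-A)."""
    return 10.0 ** (-absorbance_value)


if __name__ == "__main__":
    # Hypothetical single transition centered at 500 nm.
    bands = [(500.0, 1.0, 30.0)]
    for wl in range(400, 701, 50):
        a = absorbance(wl, bands)
        print(wl, round(a, 4), round(transmittance(a), 4))
```

    The transmitted spectrum, rather than the absorbance maximum alone, is what determines perceived color, which is why a single λmax value is not sufficient.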

  17. Optimized feature subsets for epileptic seizure prediction studies.

    PubMed

    Direito, Bruno; Ventura, Francisco; Teixeira, César; Dourado, António

    2011-01-01

    Reducing the number of EEG features given as inputs to epileptic seizure predictors is a necessary step towards the development of a transportable device for real-time warning. This paper presents a comparative study of three feature selection methods based on Support Vector Machines. Minimum-Redundancy Maximum-Relevance, Recursive Feature Elimination, and Genetic Algorithms show that, for three patients of the European Database on Epilepsy, the most important univariate features are related to spectral information and statistical moments.

  18. Lung Cancer Prediction Using Neural Network Ensemble with Histogram of Oriented Gradient Genomic Features

    PubMed Central

    Adetiba, Emmanuel; Olugbara, Oludayo O.

    2015-01-01

    This paper reports an experimental comparison of artificial neural network (ANN) and support vector machine (SVM) ensembles and their “nonensemble” variants for lung cancer prediction. These machine learning classifiers were trained to predict lung cancer using samples of patient nucleotides with mutations in the epidermal growth factor receptor, Kirsten rat sarcoma viral oncogene, and tumor suppressor p53 genomes collected as biomarkers from the IGDB.NSCLC corpus. The Voss DNA encoding was used to map the nucleotide sequences of mutated and normal genomes to obtain the equivalent numerical genomic sequences for training the selected classifiers. The histogram of oriented gradient (HOG) and local binary pattern (LBP) state-of-the-art feature extraction schemes were applied to extract representative genomic features from the encoded sequences of nucleotides. The ANN ensemble and HOG best fit the training dataset of this study with an accuracy of 95.90% and mean square error of 0.0159. The result of the ANN ensemble and HOG genomic features is promising for automated screening and early detection of lung cancer. This will hopefully assist pathologists in administering targeted molecular therapy and offering counsel to early stage lung cancer patients and persons in at risk populations. PMID:25802891
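
    The Voss DNA encoding referred to in this abstract is usually described as four binary indicator sequences, one per nucleotide, each marking the positions where that nucleotide occurs. A minimal sketch under that description follows; the function name and the treatment of characters other than A, C, G and T (left as zeros in all four tracks) are assumptions of this example.

```python
def voss_encode(sequence):
    """Map a nucleotide string to four binary indicator sequences.

    Returns a dict keyed by base; each value is a list with 1 where the
    base occurs and 0 elsewhere. Characters outside A, C, G, T produce
    0 in every track (an assumption of this sketch).
    """
    seq = sequence.upper()
    return {
        base: [1 if ch == base else 0 for ch in seq]
        for base in "ACGT"
    }


if __name__ == "__main__":
    encoded = voss_encode("ACGTA")
    for base, track in encoded.items():
        print(base, track)
```

    The resulting numeric tracks are what signal- or image-style descriptors such as HOG and LBP can then be applied to.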

  19. Breast cancer in young women: Pathologic features and molecular phenotype.

    PubMed

    Sabiani, Laura; Houvenaeghel, Gilles; Heinemann, Mellie; Reyal, Fabien; Classe, Jean Marc; Cohen, Monique; Garbay, Jean Rémy; Giard, Sylvia; Charitansky, Hélène; Chopin, Nicolas; Rouzier, Roman; Daraï, Emile; Coutant, Charles; Azuar, Pierre; Gimbergues, Pierre; Villet, Richard; Tunon de Lara, Christine; Lambaudie, Eric

    2016-10-01

    Controversy exists about the prognosis of breast cancer in young women. Our objective was to describe clinicopathological and prognostic features to improve adjuvant treatment indications. We conducted a retrospective multicentre study including fifteen French hospitals. Disease-free survival data and clinical and pathological criteria were collected. A total of 5815 patients were included; 15.6% of them were between 35 and 40 years old and 8.7% were below 35. In 94% of the cases, a palpable mass was found in patients ≤35 years old. Triple negative and HER2 tumors were predominantly found in patients ≤35 (22.2% and 22.1%, p < 0.01). A young age ≤40 years (p < 0.001; hazard ratio [HR]: 2.05; 95% confidence limit [CL]: 1.60-2.63) or ≤35 years (p < 0.001; [HR]: 3.86; 95% [CL]: 2.69-5.53) impacted on the indication of chemotherapy. Age ≤35 (p < 0.001; [HR]: 2.01; 95% [CL]: 1.36-2.95) was a significantly negative factor for disease-free survival. Chemotherapy (p < 0.006; [HR]: 0.6; 95% [CL]: 0.40-0.86) and positive hormone receptor status (p < 0.001; [HR]: 0.6; 95% [CL]: 0.54-0.79) appeared to be protective factors. Patients under 36 had a significantly higher rate of local recurrence and distant metastasis compared to patients >35-40 (21.5 vs. 15.4% and 21.8 vs. 12.6%, p < 0.01). Young women present a different distribution of molecular phenotypes, with more luminal B and triple negative tumors, a higher grade and more lymph node involvement. A young age must be taken as an adverse prognostic factor and must play a part in the indication of adjuvant therapy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Exploiting heterogeneous features to improve in silico prediction of peptide status – amyloidogenic or non-amyloidogenic

    PubMed Central

    2011-01-01

    Background Prediction of short stretches in protein sequences capable of forming amyloid-like fibrils is important in understanding the underlying cause of amyloid illnesses thereby aiding in the discovery of sequence-targeted anti-aggregation pharmaceuticals. Due to the constraints of experimental molecular techniques in identifying such motif segments, it is highly desirable to develop computational methods to provide better and affordable in silico predictions. Results Accurate in silico prediction techniques of amyloidogenic peptide regions rely on the cooperation between informative features and classifier design. In this research article, we propose one such efficient fibril prediction implementation exploiting heterogeneous features based on bio-physio-chemical (BPC) properties, auto-correlation function of carefully selected amino acid indices and atomic composition within a protein fragment of amino acids in a window. In an attempt to get an optimal number of BPC features, an evolutionary Support Vector Machine (SVM) integrating a novel implementation of hybrid Genetic Algorithm termed Memetic Algorithm and SVM is utilized. Five prediction modules designed using Artificial Neural Network (ANN) models are trained with independent and integrated features in order to validate the fibril forming motifs. The results provide evidence that incorporating new feature namely auto-correlation function besides BPC, attempt to strengthen the sequence interaction effect in forming the feature vector thereby obtaining better prediction quality in terms of sensitivity, specificity, Mathews Correlation Coefficient and Area under the Receiver Operating Characteristics curve. Conclusion A significant improvement in performance is observed by introducing features like auto-correlation function that maintains sequence order effect, in addition to the conventional BPC properties selected through a novel optimization strategy to predict the peptide status – amyloidogenic or

  1. Radiogenomic analysis of breast cancer: dynamic contrast enhanced - magnetic resonance imaging based features are associated with molecular subtypes

    NASA Astrophysics Data System (ADS)

    Wang, Shijian; Fan, Ming; Zhang, Juan; Zheng, Bin; Wang, Xiaojia; Li, Lihua

    2016-03-01

    Breast cancer is one of the most common malignant tumors in women, and its incidence is rising. The key to decreasing mortality is early diagnosis and appropriate treatment. Molecular classification could provide better insights into patient-directed therapy and prognosis prediction of breast cancer. It is known that different molecular subtypes have different characteristics in magnetic resonance imaging (MRI) examination. Therefore, we assumed that imaging features can reflect molecular information in breast cancer. In this study, we investigated associations between dynamic contrast-enhanced MRI (DCE-MRI) features and molecular subtypes in breast cancer. Sixty patients with breast cancer were enrolled and the MR images were pre-processed for noise reduction, registration and segmentation. Sixty-five-dimensional imaging features including statistical characteristics, morphology, texture and dynamic enhancement in breast lesion and background regions were semiautomatically extracted. The associations between imaging features and molecular subtypes were assessed by using statistical analyses, including univariate logistic regression and multivariate logistic regression. The results of multivariate regression showed that imaging features are significantly associated with the Luminal A (p=0.00473), HER2-enriched (p=0.00277) and Basal-like (p=0.0117) molecular subtypes. The results indicated that three molecular subtypes are correlated with DCE-MRI features in breast cancer. Specifically, patients with a higher level of compactness or a lower level of skewness in the breast lesion are more likely to be of the Luminal A subtype. In addition, a higher value of the dynamic enhancement at T1 time in the normal side reflects a higher likelihood of the HER2-enriched subtype in breast cancer.

  2. Heart failure: molecular, genetic and epigenetic features of the disease.

    PubMed

    D'Alessandro, R; Roselli, T; Valente, F; Iannaccone, M; Capogrosso, C; Petti, G; Alfano, G; Masarone, D; Ziello, B; Fimiani, F; Pacileo, G; Russo, M G; Calabrò, P; Limongelli, G; Maddaloni, V; Calabrò, R

    2012-12-01

    The factors that combine to cause heart failure (HF) are not completely known. In recent years, several technological improvements have allowed the molecular and genetic aspects of this complex syndrome to be studied in depth. This approach to HF, based on new discoveries in molecular biology, shows the pathophysiological bases of the disease more clearly and points to a future in which genetics may be useful in clinical practice, for screening high-risk populations as well as for the diagnosis and therapy of underlying myocardial diseases. The purpose of this review was to analyse the molecular, genetic and epigenetic factors of HF. We described the molecular anatomy of the sarcomere and the pathogenesis of the heart muscle diseases, abandoning the previous monogenic theory for the concept of a polygenic disease. Different actors play a role, causing the illness by themselves or modifying the expression of the disease and, eventually, the prognosis of the patient.

  3. Feature Parameter Optimization for Seizure Detection/Prediction

    DTIC Science & Technology

    2007-11-02

    the window length for the feature under consideration. Figure 4 illustrates the variation of the k-factor for the fractal dimension feature, as...r Figure 4: K-Factor from the Fractal Dimension for Different Window Sizes Typically, the window sizes that maximized the k-factor were...Esteller R., Ph.D dissertation “Detection of seizure onset in epileptic patients from intracranial EEG signals ”, Georgia Institute of Technology

  4. Structure and functional features of olive pollen pectin methylesterase using homology modeling and molecular docking methods.

    PubMed

    Jimenez-Lopez, Jose C; Kotchoni, Simeon O; Rodríguez-García, María I; Alché, Juan D

    2012-12-01

    Pectin methylesterases (PMEs), a multigene family of proteins with multiple differentially regulated isoforms, are key enzymes implicated in the carbohydrate (pectin) metabolism of cell walls. Olive pollen PME has been identified as a new allergen (Ole e 11) of potential relevance in allergy amelioration, since it exhibits high prevalence among atopic patients. In this work, the structural and functional characterization of two olive pollen PME isoforms and their comparison with other plant PMEs were performed using different approaches: (1) characterization of physicochemical properties and functional-regulatory motifs, (2) primary sequence analysis and a study of 2D and 3D comparative structural features, (3) conservation and evolutionary analysis, (4) catalytic activity and regulation based on molecular docking analysis of a homologous PME inhibitor, and (5) B-cell epitope prediction by sequence- and structure-based methods and protein-protein interaction tools, and T-cell epitope prediction by inhibitory concentration and binding score methods. Our results indicate that the structural differences and low conservation of residues, together with differences in physicochemical and posttranslational motifs, might be a mechanism for the generation and regulation of PME isovariants and for the generation of differential surface epitopes. Olive PMEs perform a processive catalytic mechanism and show a differential molecular interaction with a specific PME inhibitor, opening new possibilities for regulating PME activity. Despite the common function of PMEs, the differential features found in this study will lead to a better understanding of the structural and functional characterization of plant PMEs and help to improve the component-resolved diagnosis and immunotherapy of olive pollen allergy through epitope identification.

  5. Molecular Dications and the Auroral Mystery Feature: Measurements on Nitrogen

    NASA Astrophysics Data System (ADS)

    Daw, A. N.; Brewer, S. M.; Estes, C. C.; Kanoy, J. A.; Myer, B. W.; Calamai, A. G.

    2006-05-01

    Experiments in progress at the ASU ion trapping facility will provide atomic and molecular data for N^+, N2^++, and N2, specifically, measurements of: the radiative lifetime of the ^5S metastable level of N^+, the dissociation rate of N2^++, electron capture rates from molecular nitrogen for both these ions, and the cross section for dissociative electron impact ionization of molecular nitrogen into metastable ^5S N^+. Ions are created in a radiofrequency ion trap by electron bombardment on nitrogen gas, and both the number of stored ions and the UV radiation emitted by the stored ion population (from decaying metastable N^+(^5S) ions and N2^++ + N2 reactions) are measured as a function of time. Preliminary data and results will be presented.

  6. MSACompro: improving multiple protein sequence alignment by predicted structural features.

    PubMed

    Deng, Xin; Cheng, Jianlin

    2014-01-01

    Multiple Sequence Alignment (MSA) is an essential tool in protein structure modeling, gene and protein function prediction, DNA motif recognition, phylogenetic analysis, and many other bioinformatics tasks. Therefore, improving the accuracy of multiple sequence alignment is an important long-term objective in bioinformatics. We designed and developed a new method MSACompro to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into the currently most accurate posterior probability-based MSA methods to improve the accuracy of multiple sequence alignments. Different from the multiple sequence alignment methods that use the tertiary structure information of some sequences, our method uses the structural information purely predicted from sequences. In this chapter, we first introduce some background and related techniques in the field of multiple sequence alignment. Then, we describe the detailed algorithm of MSACompro. Finally, we show that integrating predicted protein structural information improved the multiple sequence alignment accuracy.

  7. Clinicopathological features of nonspecific invasive breast cancer according to its molecular subtypes.

    PubMed

    2016-06-01

    The aim of the present study was to investigate the clinical and morphological features of nonspecific invasive breast cancer according to its molecular subtypes. A total of 163 women with nonspecific invasive breast cancer (T1-4N0-3M0) were included. Luminal A breast cancer was detected in 101 women, luminal B in 23 women, HER2/neu overexpression in 14 women and triple-negative cancer in 25 women. The study revealed that the molecular subtypes of breast cancer differ in morphological structure, in the expression characteristics of the primary tumor and in the rate of lymphogenous and hematogenous metastasis. Lymphogenous metastases were detected more frequently in HER2/neu-overexpressing breast cancer (71%) than in luminal A (41%), luminal B (39%) and triple-negative tumors (40%). Hematogenous metastasis did not depend on the morphological structure of the infiltrative component of the carcinoma, the state of the tumor stroma or the proliferative activity in any of the investigated groups. The clinicopathological characteristics revealed for the different molecular subtypes of invasive breast cancer make it possible to predict the likely outcome of the disease and to select a personalized treatment strategy more rationally.

  8. Visual Prediction Error Spreads Across Object Features in Human Visual Cortex.

    PubMed

    Jiang, Jiefeng; Summerfield, Christopher; Egner, Tobias

    2016-12-14

    Visual cognition is thought to rely heavily on contextual expectations. Accordingly, previous studies have revealed distinct neural signatures for expected versus unexpected stimuli in visual cortex. However, it is presently unknown how the brain combines multiple concurrent stimulus expectations such as those we have for different features of a familiar object. To understand how an unexpected object feature affects the simultaneous processing of other expected feature(s), we combined human fMRI with a task that independently manipulated expectations for color and motion features of moving-dot stimuli. Behavioral data and neural signals from visual cortex were then interrogated to adjudicate between three possible ways in which prediction error (surprise) in the processing of one feature might affect the concurrent processing of another, expected feature: (1) feature processing may be independent; (2) surprise might "spread" from the unexpected to the expected feature, rendering the entire object unexpected; or (3) pairing a surprising feature with an expected feature might promote the inference that the two features are not in fact part of the same object. To formalize these rival hypotheses, we implemented them in a simple computational model of multifeature expectations. Across a range of analyses, behavior and visual neural signals consistently supported a model that assumes a mixing of prediction error signals across features: surprise in one object feature spreads to its other feature(s), thus rendering the entire object unexpected. These results reveal neurocomputational principles of multifeature expectations and indicate that objects are the unit of selection for predictive vision.

  9. Predicting new molecular targets for known drugs

    PubMed Central

    Keiser, Michael J.; Setola, Vincent; Irwin, John J.; Laggner, Christian; Abbas, Atheir; Hufeisen, Sandra J.; Jensen, Niels H.; Kuijer, Michael B.; Matos, Roberto C.; Tran, Thuy B.; Whaley, Ryan; Glennon, Richard A.; Hert, Jérôme; Thomas, Kelan L.H.; Edwards, Douglas D.; Shoichet, Brian K.; Roth, Bryan L.

    2009-01-01

    Whereas drugs are intended to be selective, at least some bind to several physiologic targets, explaining both side effects and efficacy. As many drug-target combinations exist, it would be useful to explore possible interactions computationally. Here, we compared 3,665 FDA-approved and investigational drugs against hundreds of targets, defining each target by its ligands. Chemical similarities between drugs and ligand sets predicted thousands of unanticipated associations. Thirty were tested experimentally, including the antagonism of the β1 receptor by the transporter inhibitor Prozac, the inhibition of the 5-HT transporter by the ion channel drug Vadilex, and antagonism of the histamine H4 receptor by the enzyme inhibitor Rescriptor. Overall, 23 new drug-target associations were confirmed, five of which were potent (< 100 nM). The physiological relevance of one such, the drug DMT on serotonergic receptors, was confirmed in a knock-out mouse. The chemical similarity approach is systematic and comprehensive, and may suggest side-effects and new indications for many drugs. PMID:19881490
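
    The idea of scoring a drug against a target "defined by its ligands" can be illustrated with a small sketch. The abstract does not describe the statistical model the authors used, so everything below is an assumption: fingerprints are represented as plain sets of integer bit positions, similarity is the Tanimoto coefficient, and a target's score is simply the mean similarity to its ligand set. The names `tanimoto`, `rank_targets` and the toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_targets(drug_bits, target_ligands):
    """Score each target by the mean similarity of the drug to its ligand set.

    This mean-similarity score is an illustrative stand-in, not the
    scoring scheme of the study.
    """
    scores = {
        target: sum(tanimoto(drug_bits, lig) for lig in ligands) / len(ligands)
        for target, ligands in target_ligands.items()
    }
    # Highest-scoring (most similar) targets first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy fingerprints: integers stand for "on" bits.
drug = {1, 2, 3, 4}
targets = {
    "target_A": [{1, 2, 3, 5}, {2, 3, 4, 6}],
    "target_B": [{7, 8, 9}, {1, 9, 10}],
}
ranking = rank_targets(drug, targets)
```

    In this toy case the drug shares most of its bits with the ligands of `target_A`, so that target ranks first; a real screen would need a significance model to separate meaningful similarity from chance.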

  10. Identification of Prognostic Molecular Features in the Reactive Stroma of Human Breast and Prostate Cancer

    PubMed Central

    Provero, Paolo; Fusco, Carlo; Delorenzi, Mauro; Stehle, Jean-Christophe; Stamenkovic, Ivan

    2011-01-01

    Primary tumor growth induces host tissue responses that are believed to support and promote tumor progression. Identification of the molecular characteristics of the tumor microenvironment and elucidation of its crosstalk with tumor cells may therefore be crucial for improving our understanding of the processes implicated in cancer progression, identifying potential therapeutic targets, and uncovering stromal gene expression signatures that may predict clinical outcome. A key issue to resolve, therefore, is whether the stromal response to tumor growth is largely a generic phenomenon, irrespective of the tumor type or whether the response reflects tumor-specific properties. To address similarity or distinction of stromal gene expression changes during cancer progression, oligonucleotide-based Affymetrix microarray technology was used to compare the transcriptomes of laser-microdissected stromal cells derived from invasive human breast and prostate carcinoma. Invasive breast and prostate cancer-associated stroma was observed to display distinct transcriptomes, with a limited number of shared genes. Interestingly, both breast and prostate tumor-specific dysregulated stromal genes were observed to cluster breast and prostate cancer patients, respectively, into two distinct groups with statistically different clinical outcomes. By contrast, a gene signature that was common to the reactive stroma of both tumor types did not have survival predictive value. Univariate Cox analysis identified genes whose expression level was most strongly associated with patient survival. Taken together, these observations suggest that the tumor microenvironment displays distinct features according to the tumor type that provides survival-predictive value. PMID:21611158

  11. Analysis and prediction of drug-drug interaction by minimum redundancy maximum relevance and incremental feature selection.

    PubMed

    Liu, Lili; Chen, Lei; Zhang, Yu-Hang; Wei, Lai; Cheng, Shiwen; Kong, Xiangyin; Zheng, Mingyue; Huang, Tao; Cai, Yu-Dong

    2017-02-01

    Drug-drug interaction (DDI) defines a situation in which one drug affects the activity of another when both are administered together. DDI is a common cause of adverse drug reactions and sometimes also leads to improved therapeutic effects. Therefore, it is of great interest to discover novel DDIs according to their molecular properties and mechanisms in a robust and rigorous way. This paper attempts to predict effective DDIs using the following properties: (1) chemical interaction between drugs; (2) protein interactions between the targets of drugs; and (3) target enrichment of KEGG pathways. The data consisted of 7323 pairs of DDIs collected from the DrugBank and 36,615 pairs of drugs constructed by randomly combining two drugs. Each drug pair was represented by 465 features derived from the aforementioned three categories of properties. The random forest algorithm was adopted to train the prediction model. Some feature selection techniques, including minimum redundancy maximum relevance and incremental feature selection, were used to extract key features as the optimal input for the prediction model. The extracted key features may help to gain insights into the mechanisms of DDIs and provide some guidelines for the relevant clinical medication developments, and the prediction model can give new clues for identification of novel DDIs.
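
    The minimum redundancy maximum relevance (mRMR) step mentioned above can be sketched as a greedy ranking. This is only an illustration under stated assumptions: absolute Pearson correlation is used here as a simple stand-in for the relevance and redundancy measures (mRMR is commonly formulated with mutual information, and the abstract does not say which variant was used), and the function name `mrmr_rank` and the toy data are hypothetical.

```python
import numpy as np


def mrmr_rank(X, y, n_select):
    """Greedy mRMR-style ranking using |Pearson correlation| as a proxy.

    relevance  = |corr(feature, label)|
    redundancy = mean |corr(feature, already selected features)|
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    redundancy = np.abs(np.corrcoef(X, rowvar=False))

    # Start from the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        remaining = [j for j in range(n_features) if j not in selected]
        scores = [relevance[j] - redundancy[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected


# Toy data: column 1 duplicates column 0; column 2 is less relevant but less redundant.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = np.array([
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]).T
order = mrmr_rank(X, y, 2)
```

    Here the duplicate column is skipped in favor of the less redundant one. An incremental feature selection pass would then train the classifier on the top 1, 2, 3, ... ranked features and keep the prefix with the best cross-validated score.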

  12. Epileptic Seizure Prediction based on Ratio and Differential Linear Univariate Features

    PubMed Central

    Rasekhi, Jalil; Mollaei, Mohammad Reza Karami; Bandarabadi, Mojtaba; Teixeira, César A.; Dourado, António

    2015-01-01

    Bivariate features, obtained from multichannel electroencephalogram recordings, quantify the relation between different brain regions. Studies based on bivariate features have shown promising results for tackling the epileptic seizure prediction problem in patients suffering from refractory epilepsy. A new bivariate approach using univariate features is proposed here. Differences and ratios of 22 linear univariate features were calculated using pairwise combinations of 6 electroencephalogram channels, to create 330 differential and 330 relative features. The feature subsets were classified separately using support vector machines into one of two classes, preictal and nonpreictal. Furthermore, the minimum Redundancy Maximum Relevance feature reduction method is employed to improve the predictions and reduce the number of false alarms. The studies were carried out on features obtained from 10 patients. For a reduced subset of 30 features and using the differential approach, the seizures were on average predicted in 60.9% of the cases (28 out of 46 in 737.9 h of test data), with a low false prediction rate of 0.11 h^-1. Results of the bivariate approaches were compared with those achieved from the original linear univariate features, extracted from 6 channels. The advantage of the proposed bivariate features is the smaller number of false predictions in comparison to the original 22 univariate features. In addition, the reduction in feature dimension could provide a less complex and more cost-effective algorithm. Results indicate that applying machine learning methods on a multidimensional feature space resulting from relative/differential pairwise combination of 22 univariate features could predict seizure onsets with high performance. PMID:25709936
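
    The feature counts quoted above follow from simple combinatorics: 6 channels give 15 unordered channel pairs, and 15 pairs times 22 univariate features give 330 features per approach. The sketch below checks this and shows one way the differential and relative features might be formed; the data layout (`values[c][k]`) and the function name are assumptions, and the ratio assumes nonzero feature values.

```python
from itertools import combinations

N_CHANNELS = 6
N_UNIVARIATE = 22

# All unordered pairs of channels: C(6, 2) = 15.
channel_pairs = list(combinations(range(N_CHANNELS), 2))
n_pairwise = len(channel_pairs) * N_UNIVARIATE


def pairwise_features(values):
    """Build differential and relative features from per-channel features.

    values[c][k] is univariate feature k on channel c (hypothetical layout).
    Ratios assume the denominator feature is nonzero.
    """
    differential, relative = [], []
    for i, j in channel_pairs:
        for k in range(N_UNIVARIATE):
            differential.append(values[i][k] - values[j][k])
            relative.append(values[i][k] / values[j][k])
    return differential, relative


# Toy input: strictly positive values so every ratio is defined.
toy = [[float(c + 1 + k) for k in range(N_UNIVARIATE)] for c in range(N_CHANNELS)]
diff_feats, rel_feats = pairwise_features(toy)
```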

  13. Molecular biological features of strains of Histomonas meleagridis.

    PubMed

    Munsch, Mareike; Mehlhorn, H; Al-Quraishy, Saleh; Lotfi, Abdul-Rahman; Hafez, H M

    2009-04-01

    Berlin strains of Histomonas meleagridis were subcultivated to produce cyst-like stages. These strains were studied for their ITS 1 and 18S rRNA properties and compared with sequences in data banks of other H. meleagridis strains, Dientamoeba fragilis, and some species of the genus Trichomonas and relatives. The Berlin isolates that had previously been shown to be able to develop cyst-like structures (Munsch et al. 2008) represent a significant cluster among the published data of other Histomonas meleagridis isolates and thus the formation of cysts might be a common feature that would open further possibilities of transmission.

  14. Skeletal Muscle Laminopathies: A Review of Clinical and Molecular Features

    PubMed Central

    Maggi, Lorenzo; Carboni, Nicola; Bernasconi, Pia

    2016-01-01

    LMNA-related disorders are caused by mutations in the LMNA gene, which encodes for the nuclear envelope proteins, lamin A and C, via alternative splicing. Laminopathies are associated with a wide range of disease phenotypes, including neuromuscular, cardiac, metabolic disorders and premature aging syndromes. The most frequent diseases associated with mutations in the LMNA gene are characterized by skeletal and cardiac muscle involvement. This review will focus on genetics and clinical features of laminopathies affecting primarily skeletal muscle. Although only symptomatic treatment is available for these patients, many achievements have been made in clarifying the pathogenesis and improving the management of these diseases. PMID:27529282

  15. Molecular characteristics and chromatin texture features in acute promyelocytic leukemia

    PubMed Central

    2012-01-01

    Background: Acute promyelocytic leukemia is a cytogenetically well defined entity. Nevertheless, some features observed at diagnosis are related to a worse outcome of the patients. Methods: In a prospective study, we analyzed peripheral blood (PB) leukocyte count, immunophenotype, methylation status of CDKN2B, CDKN2A and TP73, and FLT3 and NPM1 mutations, as well as nuclear chromatin texture characteristics of the leukemic cells. We also examined the relation of these features with patient outcome. Results: Among 19 cases, 4 had a microgranular morphology, 7 presented PB leukocytes >10 × 10^9/l, 2 had FLT3-ITD and 3 had FLT3-TKD (all three presenting a methylated CDKN2B). NPM1 mutation was not observed. PB leukocyte count showed an inverse relation with standard deviation of gray levels, contrast, cluster prominence, and chromatin fractal dimension (FD). Cases with FLT3-ITD presented a microgranular morphology, PB leukocytosis and expression of HLA-DR, CD34 and CD11b. Concerning nuclear chromatin texture variables, these cases had a lower entropy, contrast, cluster prominence and FD, but higher local homogeneity and R245, in keeping with more homogeneously distributed chromatin. In the univariate Cox analysis, a higher leukocyte count, FLT3-ITD mutation, microgranular morphology and methylation of CDKN2B, as well as a higher local homogeneity of nuclear chromatin and a lower chromatin entropy and FD, were associated with a worse outcome. All these features lost significance when the cases were stratified for FLT3-ITD mutation. Methylation status of CDKN2A and TP73 showed no relation to patient survival. Conclusion: In APL, patients with FLT3-ITD mutation show different clinical characteristics and have blasts with a more homogeneous chromatin texture. Texture analysis demonstrated that FLT3-ITD was accompanied not only by different cytoplasmic features, but also by a change in chromatin structure in routine cytologic preparations. Yet we were not able to detect chromatin changes by

  16. Weighted feature value based Drug Target Protein prediction.

    PubMed

    Hyun, Bo-ra; Jung, Hwiesung; Jang, Woo-Hyuk; Jung, Suk Hoon; Han, Dong-Soo

    2008-01-01

    Drug discovery is a long process in which only a few successful new therapeutic discoveries are made, and identifying drug target candidate proteins requires considerable time and effort. However, the accumulation of information on drugs has made it possible to devise new computational methods for classifying drug target candidates. In this paper, we devise a Drug Target Protein (DT-P) classification method based on the summation of weighted features extracted from known DT-Ps. The method was validated using Bayesian decision theory and SVM, and it achieved a high specificity of 89.5% with 88% accuracy.

  17. Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.

    PubMed

    Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao

    2015-04-01

    Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnosis and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select a good subset of informative genes relevant to the classification. In the proposed algorithm, firstly, the Fisher-Markov selector is used to choose a fixed number of genes. Secondly, to make biogeography based optimization suitable for the feature selection problem, a discrete migration model and a discrete mutation model are proposed to balance the exploration and exploitation abilities. Then, discrete biogeography based optimization, which we call DBBO, is proposed by integrating the discrete migration model and the discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are evaluated with 10-fold cross-validation. To show the effectiveness and efficiency of the algorithm, it is tested on four breast cancer benchmark datasets. In comparison with the genetic algorithm, particle swarm optimization, the differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better than, or at least comparable with, previous methods from the literature when considering the quality of the solutions obtained.

  18. Personalized Cancer Medicine: Molecular Diagnostics, Predictive biomarkers, and Drug Resistance

    PubMed Central

    Gonzalez de Castro, D; Clarke, P A; Al-Lazikani, B; Workman, P

    2013-01-01

    The progressive elucidation of the molecular pathogenesis of cancer has fueled the rational development of targeted drugs for patient populations stratified by genetic characteristics. Here we discuss general challenges relating to molecular diagnostics and describe predictive biomarkers for personalized cancer medicine. We also highlight resistance mechanisms for epidermal growth factor receptor (EGFR) kinase inhibitors in lung cancer. We envisage a future requiring the use of longitudinal genome sequencing and other omics technologies alongside combinatorial treatment to overcome cellular and molecular heterogeneity and prevent resistance caused by clonal evolution. PMID:23361103

  19. Preprocessing effects of 22 linear univariate features on the performance of seizure prediction methods.

    PubMed

    Rasekhi, Jalil; Mollaei, Mohammad Reza Karami; Bandarabadi, Mojtaba; Teixeira, Cesar A; Dourado, Antonio

    2013-07-15

    Combining multiple linear univariate features in one feature space and classifying the feature space using machine learning methods could predict epileptic seizures in patients suffering from refractory epilepsy. For each patient, a set of twenty-two linear univariate features was extracted from 6 electroencephalogram (EEG) signals to make a 132-dimensional feature space. Preprocessing and normalization methods of the features, which affect the output of the seizure prediction algorithm, were studied in terms of alarm sensitivity and false prediction rate (FPR). The problem of choosing an optimal preictal time was tackled using 4 distinct values of 10, 20, 30, and 40 min. The seizure prediction problem has traditionally been considered a two-class classification problem, which is also exercised here. These studies have been conducted on the features obtained from 10 patients. For each patient, 48 different combinations of methods are compared to find the best configuration. Normalization by dividing by the maximum and smoothing are found to be the best configuration in most of the patients. The results also indicate that applying machine learning methods on a multidimensional feature space of 22 univariate features predicted seizure onsets with high performance. On average, the seizures were predicted in 73.9% of the cases (34 out of 46 in 737.9 h of test data), with an FPR of 0.15 h^-1. Copyright © 2013 Elsevier B.V. All rights reserved.
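
    The preprocessing that the abstract reports as the best configuration (normalization by dividing by the maximum, followed by smoothing) can be sketched as follows. The abstract does not specify the details, so the use of the maximum absolute value and of a simple moving average are assumptions, and the function names are hypothetical. The last lines just re-derive the feature-space dimension and the reported sensitivity from the numbers in the abstract.

```python
import numpy as np


def normalize_by_max(feature):
    """Scale a feature time series by its largest absolute value (assumed variant)."""
    feature = np.asarray(feature, dtype=float)
    peak = np.max(np.abs(feature))
    return feature / peak if peak > 0 else feature


def moving_average(feature, window):
    """Smooth a series with an unweighted moving average (assumed smoother)."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits.
    return np.convolve(feature, kernel, mode="valid")


normalized = normalize_by_max([1.0, 2.0, 4.0])
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)

n_dimensions = 22 * 6      # 22 univariate features on each of 6 EEG channels
sensitivity = 34 / 46      # predicted seizures / total seizures
```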

  20. Structural and Molecular Modeling Features of P2X Receptors

    PubMed Central

    Alves, Luiz Anastacio; da Silva, João Herminio Martins; Ferreira, Dinarte Neto Moreira; Fidalgo-Neto, Antonio Augusto; Teixeira, Pedro Celso Nogueira; de Souza, Cristina Alves Magalhães; Caffarena, Ernesto Raúl; de Freitas, Mônica Santos

    2014-01-01

    Currently, adenosine 5′-triphosphate (ATP) is recognized as the extracellular messenger that acts through P2 receptors. P2 receptors are divided into two subtypes: P2Y metabotropic receptors and P2X ionotropic receptors, both of which are found in virtually all mammalian cell types studied. Due to the difficulty in studying membrane protein structures by X-ray crystallography or NMR techniques, there is little information about these structures available in the literature. Two structures of the P2X4 receptor in truncated form have been solved by crystallography. Molecular modeling has proven to be an excellent tool for studying ionotropic receptors. Recently, modeling studies carried out on P2X receptors have advanced our knowledge of the P2X receptor structure-function relationships. This review presents a brief history of ion channel structural studies and shows how modeling approaches can be used to address relevant questions about P2X receptors. PMID:24637936

  1. Molecular features in arsenic-induced lung tumors

    PubMed Central

    2013-01-01

    Arsenic is a well-known human carcinogen, which potentially affects ~160 million people worldwide via exposure to unsafe levels in drinking water. Lungs are one of the main target organs for arsenic-related carcinogenesis. These tumors exhibit particular features, such as squamous cell-type specificity and high incidence among never smokers. Arsenic-induced malignant transformation is mainly related to the biotransformation process intended for the metabolic clearing of the carcinogen, which results in specific genetic and epigenetic alterations that ultimately affect key pathways in lung carcinogenesis. Based on this, lung tumors induced by arsenic exposure could be considered an additional subtype of lung cancer, especially in the case of never-smokers, where arsenic is a known etiological agent. In this article, we review the current knowledge on the various mechanisms of arsenic carcinogenicity and the specific roles of this metalloid in signaling pathways leading to lung cancer. PMID:23510327

  2. Visual Prediction Error Spreads Across Object Features in Human Visual Cortex

    PubMed Central

    Summerfield, Christopher; Egner, Tobias

    2016-01-01

    Visual cognition is thought to rely heavily on contextual expectations. Accordingly, previous studies have revealed distinct neural signatures for expected versus unexpected stimuli in visual cortex. However, it is presently unknown how the brain combines multiple concurrent stimulus expectations such as those we have for different features of a familiar object. To understand how an unexpected object feature affects the simultaneous processing of other expected feature(s), we combined human fMRI with a task that independently manipulated expectations for color and motion features of moving-dot stimuli. Behavioral data and neural signals from visual cortex were then interrogated to adjudicate between three possible ways in which prediction error (surprise) in the processing of one feature might affect the concurrent processing of another, expected feature: (1) feature processing may be independent; (2) surprise might “spread” from the unexpected to the expected feature, rendering the entire object unexpected; or (3) pairing a surprising feature with an expected feature might promote the inference that the two features are not in fact part of the same object. To formalize these rival hypotheses, we implemented them in a simple computational model of multifeature expectations. Across a range of analyses, behavior and visual neural signals consistently supported a model that assumes a mixing of prediction error signals across features: surprise in one object feature spreads to its other feature(s), thus rendering the entire object unexpected. These results reveal neurocomputational principles of multifeature expectations and indicate that objects are the unit of selection for predictive vision. SIGNIFICANCE STATEMENT We address a key question in predictive visual cognition: how does the brain combine multiple concurrent expectations for different features of a single object such as its color and motion trajectory? By combining a behavioral protocol that

  3. Salivary epithelial-myoepithelial carcinoma: clinical, morphological and molecular features.

    PubMed

    De Cecio, R; Cantile, M; Fulciniti, F; Botti, G; Foschini, M P; Losito, N S

    2017-03-01

    Epithelial-myoepithelial carcinoma (EMC) is a rare biphasic tumor accounting for less than 2% of all salivary gland malignancies. It presents as a slowly growing, asymptomatic, small mass, with ulceration of the overlying mucosa in some cases. Microscopically, it is characterized by glands lined by two different cell components, inner epithelial cells and outer myoepithelial cells. Immunohistochemical staining of myoepithelial cells is variably positive for vimentin, Smooth Muscle Actin (SMA), Muscle Specific Actin (MSA), S100, Smooth Muscle Myosin Heavy Chain I (SM-MHC), calponin and p63. Several molecular alterations, mainly point mutations, have been described. Mutations of HRAS, AKT1, CTNNB1 and PIK3CA have been reported in variable percentages of EMC samples. EMC is considered a low-grade malignant tumor with a 5-year survival rate of 94%; it recurs locally after resection in 30-50% of cases. At the moment, adequate resection with negative margins is the minimum recommended and necessary treatment. © Copyright Società Italiana di Anatomia Patologica e Citopatologia Diagnostica, Divisione Italiana della International Academy of Pathology.

  4. Accelerating ab initio molecular dynamics simulations by linear prediction methods

    NASA Astrophysics Data System (ADS)

    Herr, Jonathan D.; Steele, Ryan P.

    2016-09-01

    Acceleration of ab initio molecular dynamics (AIMD) simulations can be reliably achieved by extrapolation of electronic data from previous timesteps. Existing techniques utilize polynomial least-squares regression to fit previous steps' Fock or density matrix elements. In this work, the recursive Burg 'linear prediction' technique is shown to be a viable alternative to polynomial regression, and the extrapolation-predicted Fock matrix elements were three orders of magnitude closer to converged elements. Accelerations of 1.8-3.4× were observed in test systems, and in all cases, linear prediction outperformed polynomial extrapolation. Importantly, these accelerations were achieved without reducing the MD integration timestep.
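    The extrapolation idea can be illustrated on a toy signal. The sketch below is neither the recursive Burg method named in the abstract nor the authors' code: it swaps in an ordinary least-squares fit of a two-term linear predictor to a scalar history (a stand-in for one matrix element over previous timesteps) and extrapolates one step ahead. The names `fit_lp2` and `predict_next` and the cosine test signal are assumptions made for illustration.

```python
import math

def fit_lp2(history):
    """Least-squares fit of x[n] ~ c1*x[n-1] + c2*x[n-2] (plain LS, not Burg)."""
    if len(history) < 4:
        raise ValueError("need at least 4 samples")
    s11 = s12 = s22 = r1 = r2 = 0.0
    for n in range(2, len(history)):
        p1, p2, y = history[n - 1], history[n - 2], history[n]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y
        r2 += p2 * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("history is degenerate for an order-2 predictor")
    c1 = (r1 * s22 - r2 * s12) / det
    c2 = (s11 * r2 - s12 * r1) / det
    return c1, c2

def predict_next(history):
    """Extrapolate one step beyond the end of the history."""
    c1, c2 = fit_lp2(history)
    return c1 * history[-1] + c2 * history[-2]

# Hypothetical smooth history: a sampled cosine obeys an exact two-term recurrence.
history = [math.cos(0.3 * n) for n in range(8)]
guess = predict_next(history)
exact = math.cos(0.3 * 8)
```

    In an AIMD setting the extrapolated value would serve only as the initial guess for the next self-consistent-field iteration; a better guess means fewer iterations to convergence, which is where the reported acceleration comes from.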

  5. Molecular Pathogenesis and Diagnostic, Prognostic and Predictive Molecular Markers in Sarcoma.

    PubMed

    Mariño-Enríquez, Adrián; Bovée, Judith V M G

    2016-09-01

    Sarcomas are infrequent mesenchymal neoplasms characterized by notable morphological and molecular heterogeneity. Molecular studies in sarcoma provide refinements to morphologic classification, and contribute diagnostic information (frequently), prognostic stratification (rarely) and predict therapeutic response (occasionally). Herein, we summarize the major molecular mechanisms underlying sarcoma pathogenesis and present clinically useful diagnostic, prognostic and predictive molecular markers for sarcoma. Five major molecular alterations are discussed, illustrated with representative sarcoma types, including 1. the presence of chimeric transcription factors, in vascular tumors; 2. abnormal kinase signaling, in gastrointestinal stromal tumor; 3. epigenetic deregulation, in chondrosarcoma, chondroblastoma, and other tumors; 4. deregulated cell survival and proliferation, due to focal copy number alterations, in dedifferentiated liposarcoma; 5. extreme genomic instability, in conventional osteosarcoma as a representative example of sarcomas with highly complex karyotype. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Potential drugs and nondrugs: prediction and identification of important structural features

    PubMed

    Wagener; van Geerestein VJ

    2000-03-01

    Using decision trees, a model to discriminate between potential drugs and nondrugs has been developed. Compounds from the Available Chemical Directory and the World Drug Index databases were used as training set; the molecular structures were represented using extended atom types. The error rate on an independent validation data set is 17.4%. The number of false negatives can be reduced by penalizing the misclassification of drugs so that 92 out of 100 potential drugs are correctly recognized. At the same time, 34 out of 100 nondrugs are classified as potential drugs. The predictions of the model can be used to guide the purchase or selection of compounds for biological screening or the design of combinatorial libraries. The visualization of the generated models in the form of colored trees allowed us to identify a few, surprisingly simple features that explain the most significant differences between drugs and nondrugs in the training set: Just by testing the presence of hydroxyl, tertiary or secondary amino, carboxyl, phenol, or enol groups, already three quarters of all drugs could be correctly recognized. The nondrugs, on the other hand, are characterized by their aromatic nature with a low content of functional groups besides halogens. The general applicability of the model is shown by the predictions made for several Organon databases.
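    Penalizing the misclassification of drugs can be read as a cost-sensitive decision at each leaf of the tree. The sketch below shows that idea together with the sensitivity and specificity implied by the per-100 figures quoted above. It is a minimal illustration, not the authors' implementation; the function names, the 30% leaf probability and the cost of 3.0 are hypothetical.

```python
def classify_leaf(p_drug, cost_missed_drug=1.0, cost_false_alarm=1.0):
    """Pick the label with the lower expected misclassification cost."""
    expected_cost_if_nondrug = cost_missed_drug * p_drug
    expected_cost_if_drug = cost_false_alarm * (1.0 - p_drug)
    if expected_cost_if_drug < expected_cost_if_nondrug:
        return "drug"
    return "nondrug"

def rates(true_pos, false_neg, false_pos, true_neg):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# A leaf where 30% of the training compounds are drugs (hypothetical value).
unweighted = classify_leaf(0.30)                      # equal costs
weighted = classify_leaf(0.30, cost_missed_drug=3.0)  # missing a drug costs 3x

# Per-100 figures from the abstract: 92 drugs recognized, 34 nondrugs flagged.
sens, spec = rates(true_pos=92, false_neg=8, false_pos=34, true_neg=66)
```

    Raising the cost of a missed drug moves borderline leaves to the "drug" label, which is why the gain in recognized drugs comes with more nondrugs flagged as potential drugs.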

  7. Improving link prediction in complex networks by adaptively exploiting multiple structural features of networks

    NASA Astrophysics Data System (ADS)

    Ma, Chuang; Bao, Zhong-Kui; Zhang, Hai-Feng

    2017-10-01

    Many network-structure-based link prediction methods have been proposed so far. However, each of these methods highlights only one or two structural features of networks and is then applied to predict missing links in different networks. The performance of these existing methods is not satisfactory in all cases, since each network has its own underlying structural features. In this paper, by analyzing different real networks, we find that the structural features of different networks are remarkably different. In particular, even within the same network, the inner structural features differ markedly. Therefore, more structural features should be considered. However, owing to these remarkable differences, the contributions of the individual features are hard to determine in advance. Inspired by these facts, an adaptive fusion model for link prediction is proposed to incorporate multiple structural features. In the model, a logistic function combining multiple structural features is defined, and the weight of each feature in the logistic function is then adaptively determined by exploiting the known structure information. Finally, we use the "learnt" logistic function to predict the connection probabilities of missing links. According to our experimental results, the performance of our adaptive fusion model is better than that of many similarity indices.
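    The fusion step can be sketched as a logistic function over several structural similarity scores, with weights learned from the known links. The example below is an illustrative reconstruction under stated assumptions, not the authors' model: the 8-node toy graph, the choice of two features (common neighbors and Jaccard similarity), the learning rate and the iteration count are all hypothetical, and the weights are fitted by plain gradient descent on the log-loss.

```python
import math
from itertools import combinations

# Hypothetical toy graph: two dense groups joined by the bridge (3, 4).
edges = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
         (4, 5), (4, 6), (5, 6), (5, 7), (6, 7)}
nodes = range(8)
neighbors = {v: set() for v in nodes}
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def features(u, v):
    """Bias term plus two structural features, each scaled to [0, 1]."""
    common = len(neighbors[u] & neighbors[v])
    union = len(neighbors[u] | neighbors[v])
    jaccard = common / union if union else 0.0
    return [1.0, common / 2.0, jaccard]  # at most 2 common neighbors here

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

pairs = list(combinations(nodes, 2))
X = [features(u, v) for u, v in pairs]
y = [1.0 if (u, v) in edges else 0.0 for u, v in pairs]

def log_loss(w):
    total = 0.0
    for x, t in zip(X, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(X)

# Learn the feature weights from the known structure (full-batch gradient descent).
w = [0.0, 0.0, 0.0]
initial_loss = log_loss(w)
for _ in range(500):
    grad = [0.0, 0.0, 0.0]
    for x, t in zip(X, y):
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - t
        for j in range(3):
            grad[j] += err * x[j] / len(X)
    w = [wi - 0.5 * gj for wi, gj in zip(w, grad)]
final_loss = log_loss(w)

def link_probability(u, v):
    """Connection probability from the learnt logistic function."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(u, v))))

# Rank the unconnected pairs by predicted connection probability.
candidates = sorted((p for p in pairs if p not in edges),
                    key=lambda p: link_probability(*p), reverse=True)
```

    Because the weights are fitted per network, the same code can end up emphasizing different features on different graphs, which is the adaptive behavior the abstract describes.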

  8. Protein-ligand binding region prediction (PLB-SAVE) based on geometric features and CUDA acceleration.

    PubMed

    Lo, Ying-Tsang; Wang, Hsin-Wei; Pai, Tun-Wen; Tzou, Wen-Shoung; Hsu, Hui-Huang; Chang, Hao-Teng

    2013-01-01

    Protein-ligand interactions are key processes in triggering and controlling biological functions within cells. Prediction of protein binding regions on the protein surface assists in understanding the mechanisms and principles of molecular recognition. In silico geometrical shape analysis plays a primary step in analyzing the spatial characteristics of protein binding regions and facilitates applications of bioinformatics in drug discovery and design. Here, we describe the novel software, PLB-SAVE, which uses parallel processing technology and is ideally suited to extract the geometrical construct of solid angles from surface atoms. Representative clusters and corresponding anchors were identified from all surface elements and were assigned according to the ranking of their solid angles. In addition, cavity depth indicators were obtained by proportional transformation of solid angles and cavity volumes were calculated by scanning multiple directional vectors within each selected cavity. Both depth and volume characteristics were combined with various weighting coefficients to rank predicted potential binding regions. Two test datasets from LigASite, each containing 388 bound and unbound structures, were used to predict binding regions using PLB-SAVE and two well-known prediction systems, SiteHound and MetaPocket2.0 (MPK2). PLB-SAVE outperformed the other programs with accuracy rates of 94.3% for unbound proteins and 95.5% for bound proteins via a tenfold cross-validation process. Additionally, because the parallel processing architecture was designed to enhance the computational efficiency, we obtained an average of 160-fold increase in computational time. In silico binding region prediction is considered the initial stage in structure-based drug design. To improve the efficacy of biological experiments for drug development, we developed PLB-SAVE, which uses only geometrical features of proteins and achieves a good overall performance for protein-ligand binding

  9. Protein-ligand binding region prediction (PLB-SAVE) based on geometric features and CUDA acceleration

    PubMed Central

    2013-01-01

    Background Protein-ligand interactions are key processes in triggering and controlling biological functions within cells. Prediction of protein binding regions on the protein surface assists in understanding the mechanisms and principles of molecular recognition. In silico geometrical shape analysis plays a primary step in analyzing the spatial characteristics of protein binding regions and facilitates applications of bioinformatics in drug discovery and design. Here, we describe the novel software, PLB-SAVE, which uses parallel processing technology and is ideally suited to extract the geometrical construct of solid angles from surface atoms. Representative clusters and corresponding anchors were identified from all surface elements and were assigned according to the ranking of their solid angles. In addition, cavity depth indicators were obtained by proportional transformation of solid angles and cavity volumes were calculated by scanning multiple directional vectors within each selected cavity. Both depth and volume characteristics were combined with various weighting coefficients to rank predicted potential binding regions. Results Two test datasets from LigASite, each containing 388 bound and unbound structures, were used to predict binding regions using PLB-SAVE and two well-known prediction systems, SiteHound and MetaPocket2.0 (MPK2). PLB-SAVE outperformed the other programs with accuracy rates of 94.3% for unbound proteins and 95.5% for bound proteins via a tenfold cross-validation process. Additionally, because the parallel processing architecture was designed to enhance the computational efficiency, we obtained an average of 160-fold increase in computational time. Conclusions In silico binding region prediction is considered the initial stage in structure-based drug design. 
    To improve the efficacy of biological experiments for drug development, we developed PLB-SAVE, which uses only geometrical features of proteins and achieves a good overall performance

  10. Predictable convergence in hemoglobin function has unpredictable molecular underpinnings.

    PubMed

    Natarajan, Chandrasekhar; Hoffmann, Federico G; Weber, Roy E; Fago, Angela; Witt, Christopher C; Storz, Jay F

    2016-10-21

    To investigate the predictability of genetic adaptation, we examined the molecular basis of convergence in hemoglobin function in comparisons involving 56 avian taxa that have contrasting altitudinal range limits. Convergent increases in hemoglobin-oxygen affinity were pervasive among high-altitude taxa, but few such changes were attributable to parallel amino acid substitutions at key residues. Thus, predictable changes in biochemical phenotype do not have a predictable molecular basis. Experiments involving resurrected ancestral proteins revealed that historical substitutions have context-dependent effects, indicating that possible adaptive solutions are contingent on prior history. Mutations that produce an adaptive change in one species may represent precluded possibilities in other species because of differences in genetic background. Copyright © 2016, American Association for the Advancement of Science.

  11. Prediction and analysis of quorum sensing peptides based on sequence features.

    PubMed

    Rajput, Akanksha; Gupta, Amit Kumar; Kumar, Manoj

    2015-01-01

    Quorum sensing peptides (QSPs) are the signaling molecules used by Gram-positive bacteria to orchestrate cell-to-cell communication. In spite of their enormous importance in the signaling process, a detailed bioinformatics analysis of them is lacking. In this study, QSPs and non-QSPs were examined according to their amino acid composition, residue positions, motifs and physicochemical properties. Compositional analysis shows that QSPs are enriched in aromatic residues such as Trp, Tyr and Phe. At the N-terminal, Ser was the dominant residue at most positions, namely the first, second, third and fifth, while Phe was the preferred residue at the first, third and fifth positions from the C-terminal. A few motifs from QSPs were also extracted. Physicochemical properties such as aromaticity, molecular weight and secondary structure were found to be distinguishing features of QSPs. Exploiting the above properties, we have developed a Support Vector Machine (SVM) based predictive model. During 10-fold cross-validation, the SVM achieves a maximum accuracy of 93.00%, a Matthews correlation coefficient (MCC) of 0.86 and a Receiver operating characteristic (ROC) of 0.98 on the training/testing dataset (T200p+200n). The developed models performed equally well on the validation dataset (V20p+20n). The server also integrates several useful analysis tools such as "QSMotifScan", "ProtFrag", "MutGen" and "PhysicoProp". Our analysis reveals important characteristics of QSPs and, on the basis of these unique features, we have developed a prediction algorithm, "QSPpred" (freely available at: http://crdd.osdd.net/servers/qsppred).
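    The relation between the reported accuracy and MCC can be checked with a short calculation. The abstract does not give a confusion matrix, so the counts below are an assumption: 14 errors in each class of a 200-positive, 200-negative set, chosen because they reproduce the quoted 93.00% accuracy. The function names are illustrative.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts for 200 positives + 200 negatives (symmetric errors assumed).
tp, fn, tn, fp = 186, 14, 186, 14
acc = accuracy(tp, tn, fp, fn)
coef = mcc(tp, tn, fp, fn)
```

    Under that assumption the accuracy is 0.93 and the MCC is 0.86, consistent with the figures quoted in the abstract; an asymmetric split of the same 28 errors would give a slightly different MCC.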

  12. Prediction and Analysis of Quorum Sensing Peptides Based on Sequence Features

    PubMed Central

    Rajput, Akanksha; Gupta, Amit Kumar; Kumar, Manoj

    2015-01-01

    Quorum sensing peptides (QSPs) are the signaling molecules used by Gram-positive bacteria to orchestrate cell-to-cell communication. In spite of their enormous importance in the signaling process, a detailed bioinformatics analysis of them is lacking. In this study, QSPs and non-QSPs were examined according to their amino acid composition, residue positions, motifs and physicochemical properties. Compositional analysis shows that QSPs are enriched in aromatic residues such as Trp, Tyr and Phe. At the N-terminal, Ser was the dominant residue at most positions, namely the first, second, third and fifth, while Phe was the preferred residue at the first, third and fifth positions from the C-terminal. A few motifs from QSPs were also extracted. Physicochemical properties such as aromaticity, molecular weight and secondary structure were found to be distinguishing features of QSPs. Exploiting the above properties, we have developed a Support Vector Machine (SVM) based predictive model. During 10-fold cross-validation, the SVM achieves a maximum accuracy of 93.00%, a Matthews correlation coefficient (MCC) of 0.86 and a Receiver operating characteristic (ROC) of 0.98 on the training/testing dataset (T200p+200n). The developed models performed equally well on the validation dataset (V20p+20n). The server also integrates several useful analysis tools such as “QSMotifScan”, “ProtFrag”, “MutGen” and “PhysicoProp”. Our analysis reveals important characteristics of QSPs and, on the basis of these unique features, we have developed a prediction algorithm, “QSPpred” (freely available at: http://crdd.osdd.net/servers/qsppred). PMID:25781990

  13. Application of high-dimensional feature selection: evaluation for genomic prediction in man.

    PubMed

    Bermingham, M L; Pong-Wong, R; Spiliopoulou, A; Hayward, C; Rudan, I; Campbell, H; Wright, A F; Wilson, J F; Agakov, F; Navarro, P; Haley, C S

    2015-05-19

    In this study, we investigated the effect of five feature selection approaches on the performance of a mixed model (G-BLUP) and a Bayesian (Bayes C) prediction method. We predicted height, high density lipoprotein cholesterol (HDL) and body mass index (BMI) within 2,186 Croatian and into 810 UK individuals using genome-wide SNP data. Using all SNP information Bayes C and G-BLUP had similar predictive performance across all traits within the Croatian data, and for the highly polygenic traits height and BMI when predicting into the UK data. Bayes C outperformed G-BLUP in the prediction of HDL, which is influenced by loci of moderate size, in the UK data. Supervised feature selection of a SNP subset in the G-BLUP framework provided a flexible, generalisable and computationally efficient alternative to Bayes C; but careful evaluation of predictive performance is required when supervised feature selection has been used.

  14. Clinical, Epidemiologic, Histopathologic and Molecular Features of an Unexplained Dermopathy

    PubMed Central

    Pearson, Michele L.; Selby, Joseph V.; Katz, Kenneth A.; Cantrell, Virginia; Braden, Christopher R.; Parise, Monica E.; Paddock, Christopher D.; Lewin-Smith, Michael R.; Kalasinsky, Victor F.; Goldstein, Felicia C.; Hightower, Allen W.; Papier, Arthur; Lewis, Brian; Motipara, Sarita; Eberhard, Mark L.

    2012-01-01

    Background Morgellons is a poorly characterized constellation of symptoms, with the primary manifestations involving the skin. We conducted an investigation of this unexplained dermopathy to characterize the clinical and epidemiologic features and explore potential etiologies. Methods A descriptive study was conducted among persons at least 13 years of age and enrolled in Kaiser Permanente Northern California (KPNC) during 2006–2008. A case was defined as the self-reported emergence of fibers or materials from the skin accompanied by skin lesions and/or disturbing skin sensations. We collected detailed epidemiologic data, performed clinical evaluations and geospatial analyses and analyzed materials collected from participants' skin. Results We identified 115 case-patients. The prevalence was 3.65 (95% CI = 2.98, 4.40) cases per 100,000 enrollees. There was no clustering of cases within the 13-county KPNC catchment area (p = .113). Case-patients had a median age of 52 years (range: 17–93) and were primarily female (77%) and Caucasian (77%). Multi-system complaints were common; 70% reported chronic fatigue and 54% rated their overall health as fair or poor with mean Physical Component Scores and Mental Component Scores of 36.63 (SD = 12.9) and 35.45 (SD = 12.89), respectively. Cognitive deficits were detected in 59% of case-patients and 63% had evidence of clinically significant somatic complaints; 50% had drugs detected in hair samples and 78% reported exposure to solvents. Solar elastosis was the most common histopathologic abnormality (51% of biopsies); skin lesions were most consistent with arthropod bites or chronic excoriations. No parasites or mycobacteria were detected. Most materials collected from participants' skin were composed of cellulose, likely of cotton origin. Conclusions This unexplained dermopathy was rare among this population of Northern California residents, but associated with significantly reduced health-related quality of

  15. Clinical, epidemiologic, histopathologic and molecular features of an unexplained dermopathy.

    PubMed

    Pearson, Michele L; Selby, Joseph V; Katz, Kenneth A; Cantrell, Virginia; Braden, Christopher R; Parise, Monica E; Paddock, Christopher D; Lewin-Smith, Michael R; Kalasinsky, Victor F; Goldstein, Felicia C; Hightower, Allen W; Papier, Arthur; Lewis, Brian; Motipara, Sarita; Eberhard, Mark L

    2012-01-01

    Morgellons is a poorly characterized constellation of symptoms, with the primary manifestations involving the skin. We conducted an investigation of this unexplained dermopathy to characterize the clinical and epidemiologic features and explore potential etiologies. A descriptive study was conducted among persons at least 13 years of age and enrolled in Kaiser Permanente Northern California (KPNC) during 2006-2008. A case was defined as the self-reported emergence of fibers or materials from the skin accompanied by skin lesions and/or disturbing skin sensations. We collected detailed epidemiologic data, performed clinical evaluations and geospatial analyses and analyzed materials collected from participants' skin. We identified 115 case-patients. The prevalence was 3.65 (95% CI = 2.98, 4.40) cases per 100,000 enrollees. There was no clustering of cases within the 13-county KPNC catchment area (p = .113). Case-patients had a median age of 52 years (range: 17-93) and were primarily female (77%) and Caucasian (77%). Multi-system complaints were common; 70% reported chronic fatigue and 54% rated their overall health as fair or poor with mean Physical Component Scores and Mental Component Scores of 36.63 (SD = 12.9) and 35.45 (SD = 12.89), respectively. Cognitive deficits were detected in 59% of case-patients and 63% had evidence of clinically significant somatic complaints; 50% had drugs detected in hair samples and 78% reported exposure to solvents. Solar elastosis was the most common histopathologic abnormality (51% of biopsies); skin lesions were most consistent with arthropod bites or chronic excoriations. No parasites or mycobacteria were detected. Most materials collected from participants' skin were composed of cellulose, likely of cotton origin. This unexplained dermopathy was rare among this population of Northern California residents, but associated with significantly reduced health-related quality of life. No common underlying medical

  16. Synthesis of a specified, silica molecular sieve by using computationally predicted organic structure-directing agents.

    PubMed

    Schmidt, Joel E; Deem, Michael W; Davis, Mark E

    2014-08-04

    Crystalline molecular sieves are used in numerous applications, where the properties exploited for each technology are the direct consequence of structural features. New materials are typically discovered by trial and error, and in many cases, organic structure-directing agents (OSDAs) are used to direct their formation. Here, we report the first successful synthesis of a specified molecular sieve through the use of an OSDA that was predicted from a recently developed computational method that constructs chemically synthesizable OSDAs. Pentamethylimidazolium is computationally predicted to have the largest stabilization energy in the STW framework, and is experimentally shown to strongly direct the synthesis of pure-silica STW. Other OSDAs with lower stabilization energies did not form STW. The general method demonstrated here to create STW may lead to new, simpler OSDAs for existing frameworks and provide a way to predict OSDAs for desired, theoretical frameworks. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons

    PubMed Central

    2013-01-01

    Background One of the main topics in the development of quantitative structure-property relationship (QSPR) predictive models is the identification of the subset of variables that represent the structure of a molecule and which are predictors for a given property. There are several automated feature selection methods, ranging from backward, forward or stepwise procedures, to further elaborated methodologies such as evolutionary programming. The problem lies in selecting the minimum subset of descriptors that can predict a certain property with a good performance, computationally efficient and in a more robust way, since the presence of irrelevant or redundant features can cause poor generalization capacity. In this paper an alternative selection method, based on Random Forests to determine the variable importance is proposed in the context of QSPR regression problems, with an application to a manually curated dataset for predicting standard enthalpy of formation. The subsequent predictive models are trained with support vector machines introducing the variables sequentially from a ranked list based on the variable importance. Results The model generalizes well even with a high dimensional dataset and in the presence of highly correlated variables. The feature selection step was shown to yield lower prediction errors with RMSE values 23% lower than without feature selection, albeit using only 6% of the total number of variables (89 from the original 1485). The proposed approach further compared favourably with other feature selection methods and dimension reduction of the feature space. The predictive model was selected using a 10-fold cross validation procedure and, after selection, it was validated with an independent set to assess its performance when applied to new data and the results were similar to the ones obtained for the training set, supporting the robustness of the proposed approach. 
    Conclusions The proposed methodology seemingly improves the prediction

  18. Predicting age groups of Twitter users based on language and metadata features.

    PubMed

    Morgan-Lopez, Antonio A; Kim, Annice E; Chew, Robert F; Ruddle, Paul

    2017-01-01

    Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fitted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was the least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for
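    The F1 figures above combine precision and recall as a harmonic mean. The short sketch below applies that formula to the summary percentages quoted in the abstract; the function name is illustrative, and how the study aggregated its metrics across age groups is not stated here, so the comparison is indicative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Summary figures quoted in the abstract.
combined = f1_score(0.74, 0.74)       # language + metadata model
metadata_only = f1_score(0.58, 0.60)  # metadata-only model
```

    The combined model gives 0.74, matching the reported F1. The metadata-only figures give about 0.59 rather than the reported 57%; a gap of this kind can arise when precision, recall and F1 are each averaged over classes before being reported, although the abstract does not say whether that was done.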

  19. Predicting age groups of Twitter users based on language and metadata features

    PubMed Central

    Morgan-Lopez, Antonio A.; Chew, Robert F.; Ruddle, Paul

    2017-01-01

    Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles’ metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fitted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen’s d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was the least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as “school” for youth and “college” for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be

  20. An active biopolymer network controlled by molecular motors (Special Feature: Liquids and Structural Glasses)

    NASA Astrophysics Data System (ADS)

    Koenderink, Gijsje H.; Dogic, Zvonimir; Nakamura, Fumihiko; Bendix, Poul M.; MacKintosh, Frederick C.; Hartwig, John H.; Stossel, Thomas P.; Weitz, David A.

    2009-09-01

    We describe an active polymer network in which processive molecular motors control network elasticity. This system consists of actin filaments cross-linked by filamin A (FLNa) and contracted by bipolar filaments of muscle myosin II. The myosin motors stiffen the network by more than two orders of magnitude by pulling on actin filaments anchored in the network by FLNa cross-links, thereby generating internal stress. The stiffening response closely mimics the effects of external stress applied by mechanical shear. Both internal and external stresses can drive the network into a highly nonlinear, stiffened regime. The active stress reaches values that are equivalent to an external stress of 14 Pa, consistent with a 1-pN force per myosin head. This active network mimics many mechanical properties of cells and suggests that adherent cells exert mechanical control by operating in a nonlinear regime where cell stiffness is sensitive to changes in motor activity. This design principle may be applicable to engineering novel biologically inspired, active materials that adjust their own stiffness by internal catalytic control.

  1. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques.

    PubMed

    Macyszyn, Luke; Akbari, Hamed; Pisapia, Jared M; Da, Xiao; Attiah, Mark; Pigrish, Vadim; Bi, Yingtao; Pal, Sharmistha; Davuluri, Ramana V; Roccograndi, Laura; Dahmane, Nadia; Martinez-Lage, Maria; Biros, George; Wolf, Ronald L; Bilello, Michel; O'Rourke, Donald M; Davatzikos, Christos

    2016-03-01

    MRI characteristics of brain gliomas have been used to predict clinical outcome and molecular tumor characteristics. However, previously reported imaging biomarkers have not been sufficiently accurate or reproducible to enter routine clinical practice and often rely on relatively simple MRI measures. The current study leverages advanced image analysis and machine learning algorithms to identify complex and reproducible imaging patterns predictive of overall survival and molecular subtype in glioblastoma (GB). One hundred five patients with GB were first used to extract approximately 60 diverse features from preoperative multiparametric MRIs. These imaging features were used by a machine learning algorithm to derive imaging predictors of patient survival and molecular subtype. Cross-validation ensured generalizability of these predictors to new patients. Subsequently, the predictors were evaluated in a prospective cohort of 29 new patients. Survival curves yielded a hazard ratio of 10.64 for predicted long versus short survivors. The overall, 3-way (long/medium/short survival) accuracy in the prospective cohort approached 80%. Classification of patients into the 4 molecular subtypes of GB achieved 76% accuracy. By employing machine learning techniques, we were able to demonstrate that imaging patterns are highly predictive of patient survival. Additionally, we found that GB subtypes have distinctive imaging phenotypes. These results reveal that when imaging markers related to infiltration, cell density, microvascularity, and blood-brain barrier compromise are integrated via advanced pattern analysis methods, they form very accurate predictive biomarkers. These predictive markers used solely preoperative images, hence they can significantly augment diagnosis and treatment of GB patients. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. 

  2. Imaging patterns predict patient survival and molecular subtype in glioblastoma via machine learning techniques

    PubMed Central

    Macyszyn, Luke; Akbari, Hamed; Pisapia, Jared M.; Da, Xiao; Attiah, Mark; Pigrish, Vadim; Bi, Yingtao; Pal, Sharmistha; Davuluri, Ramana V.; Roccograndi, Laura; Dahmane, Nadia; Martinez-Lage, Maria; Biros, George; Wolf, Ronald L.; Bilello, Michel; O'Rourke, Donald M.; Davatzikos, Christos

    2016-01-01

    Background: MRI characteristics of brain gliomas have been used to predict clinical outcome and molecular tumor characteristics. However, previously reported imaging biomarkers have not been sufficiently accurate or reproducible to enter routine clinical practice and often rely on relatively simple MRI measures. The current study leverages advanced image analysis and machine learning algorithms to identify complex and reproducible imaging patterns predictive of overall survival and molecular subtype in glioblastoma (GB). Methods: One hundred five patients with GB were first used to extract approximately 60 diverse features from preoperative multiparametric MRIs. These imaging features were used by a machine learning algorithm to derive imaging predictors of patient survival and molecular subtype. Cross-validation ensured generalizability of these predictors to new patients. Subsequently, the predictors were evaluated in a prospective cohort of 29 new patients. Results: Survival curves yielded a hazard ratio of 10.64 for predicted long versus short survivors. The overall, 3-way (long/medium/short survival) accuracy in the prospective cohort approached 80%. Classification of patients into the 4 molecular subtypes of GB achieved 76% accuracy. Conclusions: By employing machine learning techniques, we were able to demonstrate that imaging patterns are highly predictive of patient survival. Additionally, we found that GB subtypes have distinctive imaging phenotypes. These results reveal that when imaging markers related to infiltration, cell density, microvascularity, and blood–brain barrier compromise are integrated via advanced pattern analysis methods, they form very accurate predictive biomarkers. These predictive markers used solely preoperative images, hence they can significantly augment diagnosis and treatment of GB patients. PMID:26188015

  3. Metabolite identification and molecular fingerprint prediction through machine learning.

    PubMed

    Heinonen, Markus; Shen, Huibin; Zamboni, Nicola; Rousu, Juho

    2012-09-15

    Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet, currently this task requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, the computational support for identification of molecules not present in reference databases is lacking. Recent efforts in assembling large public mass spectral databases such as MassBank have opened the door for the development of a new genre of metabolite identification methods. We introduce a novel framework for prediction of molecular characteristics and identification of metabolites from tandem mass spectra using machine learning with the support vector machine. Our approach is to first predict a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and in the second step to use the predicted properties for matching against large molecule databases, such as PubChem. We demonstrate that several molecular properties can be predicted to high accuracy and that they are useful in de novo metabolite identification, where the reference database does not contain any spectra of the same molecule. A Matlab/Python package of the FingerID tool is freely available on the web at http://www.sourceforge.net/p/fingerid. Contact: markus.heinonen@cs.helsinki.fi.

  4. Adaptive modelling of structured molecular representations for toxicity prediction

    NASA Astrophysics Data System (ADS)

    Bertinetto, Carlo; Duce, Celia; Micheli, Alessio; Solaro, Roberto; Tiné, Maria Rosaria

    2012-12-01

    We investigated the possibility of modelling structure-toxicity relationships by direct treatment of the molecular structure (without using descriptors) through an adaptive model able to retain the appropriate structural information. With respect to traditional descriptor-based approaches, this provides a more general and flexible way to tackle prediction problems that is particularly suitable when little or no background knowledge is available. Our method employs a tree-structured molecular representation, which is processed by a recursive neural network (RNN). To explore the realization of RNN modelling in toxicological problems, we employed a data set containing growth impairment concentrations (IGC50) for Tetrahymena pyriformis.

  5. Predictive Value of Morphological Features in Patients with Autism versus Normal Controls

    ERIC Educational Resources Information Center

    Ozgen, H.; Hellemann, G. S.; de Jonge, M. V.; Beemer, F. A.; van Engeland, H.

    2013-01-01

    We investigated the predictive power of morphological features in 224 autistic patients and 224 matched-pairs controls. To assess the relationship between the morphological features and autism, we used the receiver operator curves (ROC). In addition, we used recursive partitioning (RP) to determine a specific pattern of abnormalities that is…

  6. Feature Biases in Early Word Learning: Network Distinctiveness Predicts Age of Acquisition

    ERIC Educational Resources Information Center

    Engelthaler, Tomas; Hills, Thomas T.

    2017-01-01

    Do properties of a word's features influence the order of its acquisition in early word learning? Combining the principles of mutual exclusivity and shape bias, the present work takes a network analysis approach to understanding how feature distinctiveness predicts the order of early word learning. Distance networks were built from nouns with edge…

  9. Structural class prediction of protein using novel feature extraction method from chaos game representation of predicted secondary structure.

    PubMed

    Zhang, Lichao; Kong, Liang; Han, Xiaodong; Lv, Jinfeng

    2016-07-07

    Protein structural class prediction plays an important role in protein structure and function analysis, drug design and many other biological applications. Extracting good representation from protein sequence is fundamental for this prediction task. In recent years, although several secondary structure based feature extraction strategies have been specially proposed for low-similarity protein sequences, the prediction accuracy still remains limited. To explore the potential of secondary structure information, this study proposed a novel feature extraction method from the chaos game representation of predicted secondary structure to mainly capture sequence order information and secondary structure segments distribution information in a given protein sequence. Several kinds of prediction accuracies obtained by the jackknife test are reported on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640). Compared with the state-of-the-art prediction methods, the proposed method achieves the highest overall accuracies on all the three datasets. The experimental results confirm that the proposed feature extraction method is effective for accurate prediction of protein structural class. Moreover, it is anticipated that the proposed method could be extended to other graphical representations of protein sequence and be helpful in future research. Copyright © 2016 Elsevier Ltd. All rights reserved.
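
    The chaos game representation mentioned above can be illustrated with a minimal sketch. The abstract does not give the construction details, so everything specific here is an assumption: three secondary structure states (H, E, C) are placed at the vertices of a triangle, the walk starts at the centroid, and each state moves the current point halfway toward its vertex. The function name `cgr_points` is hypothetical, and this is not the authors' feature extraction method.

```python
# Illustrative chaos game representation (CGR) of a secondary structure string.
# Assumptions (not from the abstract): states H/E/C sit on triangle vertices,
# the walk starts at the centroid, and each step moves halfway to the vertex.
import math

VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_points(states):
    """Return the sequence of CGR points visited for a string over H/E/C."""
    # Start from the centroid of the triangle.
    x = sum(v[0] for v in VERTICES.values()) / 3
    y = sum(v[1] for v in VERTICES.values()) / 3
    points = []
    for state in states:
        vx, vy = VERTICES[state]
        # Move halfway from the current point toward the state's vertex.
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


# Hypothetical predicted secondary structure string.
pts = cgr_points("CHHHHEEC")
```

    Because each point depends on all preceding states, the point cloud encodes sequence order; features such as point counts per sub-region could then be derived from it.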

  10. Adaptive reliance on the most stable sensory predictions enhances perceptual feature extraction of moving stimuli.

    PubMed

    Kumar, Neeraj; Mutha, Pratik K

    2016-03-01

    The prediction of the sensory outcomes of action is thought to be useful for distinguishing self- vs. externally generated sensations, correcting movements when sensory feedback is delayed, and learning predictive models for motor behavior. Here, we show that aspects of another fundamental function, perception, are enhanced when they entail the contribution of predicted sensory outcomes and that this enhancement relies on the adaptive use of the most stable predictions available. We combined a motor-learning paradigm that imposes new sensory predictions with a dynamic visual search task to first show that perceptual feature extraction of a moving stimulus is poorer when it is based on sensory feedback that is misaligned with those predictions. This was possible because our novel experimental design allowed us to override the "natural" sensory predictions present when any action is performed and separately examine the influence of these two sources on perceptual feature extraction. We then show that if the new predictions induced via motor learning are unreliable, rather than just relying on sensory information for perceptual judgments, as is conventionally thought, then subjects adaptively transition to using other stable sensory predictions to maintain greater accuracy in their perceptual judgments. Finally, we show that when sensory predictions are not modified at all, these judgments are sharper when subjects combine their natural predictions with sensory feedback. Collectively, our results highlight the crucial contribution of sensory predictions to perception and also suggest that the brain intelligently integrates the most stable predictions available with sensory information to maintain high fidelity in perceptual decisions.

  11. Molecular predictive markers in tumors of the gastrointestinal tract

    PubMed Central

    Papadopoulou, Eirini; Metaxa-Mariatou, Vasiliki; Tsaousis, Georgios; Tsoulos, Nikolaos; Tsirigoti, Angeliki; Efstathiadou, Chrisoula; Apessos, Angela; Agiannitopoulos, Konstantinos; Pepe, Georgia; Bourkoula, Eugenia; Nasioulas, George

    2016-01-01

    Gastrointestinal malignancies are among the leading causes of cancer-related deaths worldwide. Like all human malignancies they are characterized by accumulation of mutations which lead to inactivation of tumor suppressor genes or activation of oncogenes. Advances in Molecular Biology techniques have allowed for more accurate analysis of tumors’ genetic profiling using new breakthrough technologies such as next generation sequencing (NGS), leading to the development of targeted therapeutic approaches based upon biomarker selection. During the last 10 years tremendous advances in the development of targeted therapies for patients with advanced cancer have been made; thus various targeted agents, associated with predictive biomarkers, have been developed or are in development for the treatment of patients with gastrointestinal cancer. This review summarizes the advances in the field of molecular biomarkers in tumors of the gastrointestinal tract, with focus on the available NGS platforms that enable comprehensive tumor molecular profile analysis. PMID:27895815

  12. Separate and concurrent symbolic predictions of sound features are processed differently

    PubMed Central

    Pieszek, Marika; Schröger, Erich; Widmann, Andreas

    2014-01-01

    The studies investigated the impact of predictive visual information about the pitch and location of a forthcoming sound on the sound processing. In Symbol-to-Sound matching paradigms, symbols induced predictions of particular sounds. The brain's error signals (IR and N2b components of the event-related potential) were measured in response to occasional violations of the prediction, i.e., when a sound was incongruent to the corresponding symbol. IR and N2b index the detection of prediction violations at different levels, IR at a sensory and N2b at a cognitive level. Participants evaluated the congruency between prediction and actual sound by button press. When the prediction referred to only the pitch or only the location feature (Experiment 1), the violation of each feature elicited IR and N2b. The IRs to pitch and location violations revealed differences in time course and topography, suggesting that they were generated in feature-specific sensory areas. When the prediction referred to both features concurrently (Experiment 2), that is, the symbol predicted the sound's pitch and location, either one or both predictions were violated. Unexpectedly, no significant effects in the IR range were obtained. However, N2b was elicited in response to all violations. N2b in response to concurrent violations of pitch and location had a shorter latency. We conclude that associative predictions can be established by arbitrary rule-based symbols and for different sound features, and that concurrent violations are processed in parallel. In complex situations as in Experiment 2, capacity limitations appear to affect processing in a hierarchical manner. While predictions were presumably not reliably established at sensory levels (absence of IR), they were established at more cognitive levels, where sounds are represented categorially (presence of N2b). PMID:25477832

  13. Ultra-low-molecular-weight heparins: precise structural features impacting specific anticoagulant activities.

    PubMed

    Lima, Marcelo A; Viskov, Christian; Herman, Frederic; Gray, Angel L; de Farias, Eduardo H C; Cavalheiro, Renan P; Sassaki, Guilherme L; Hoppensteadt, Debra; Fareed, Jawed; Nader, Helena B

    2013-03-01

    Ultra-low-molecular-weight heparins (ULMWHs) with better efficacy and safety ratios are under development; however, there are few structural data available. The main structural features and molecular weight of ULMWHs were studied and compared to enoxaparin. Their monosaccharide composition and average molecular weights were determined and preparations studied by nuclear magnetic resonance spectroscopy, scanning ultraviolet spectroscopy, circular dichroism and gel permeation chromatography. In general, ULMWHs presented higher 3-O-sulphated glucosamine and unsaturated uronic acid residues, the latter being comparable with their higher degree of depolymerisation. The analysis showed that ULMWHs are structurally related to LMWHs; however, their monosaccharide/oligosaccharide compositions and average molecular weights differed considerably explaining their different anticoagulant activities. The results relate structural features to activity, assisting the development of new and improved therapeutic agents, based on depolymerised heparin, for the prophylaxis and treatment of thrombotic disorders.

  14. Acinar Cell Carcinoma of the Pancreas: Overview of Clinicopathologic Features and Insights into the Molecular Pathology.

    PubMed

    La Rosa, Stefano; Sessa, Fausto; Capella, Carlo

    2015-01-01

    Acinar cell carcinomas (ACCs) of the pancreas are rare pancreatic neoplasms accounting for about 1-2% of pancreatic tumors in adults and about 15% in pediatric subjects. They show different clinical symptoms at presentation, different morphological features, different outcomes, and different molecular alterations. This heterogeneous clinicopathological spectrum may give rise to difficulties in the clinical and pathological diagnosis with consequential therapeutic and prognostic implications. The molecular mechanisms involved in the onset and progression of ACCs are still not completely understood, although in recent years, several attempts have been made to clarify the molecular mechanisms involved in ACC biology. In this paper, we will review the main clinicopathological and molecular features of pancreatic ACCs of both adult and pediatric subjects to give the reader a comprehensive overview of this rare tumor type.

  15. Acinar Cell Carcinoma of the Pancreas: Overview of Clinicopathologic Features and Insights into the Molecular Pathology

    PubMed Central

    La Rosa, Stefano; Sessa, Fausto; Capella, Carlo

    2015-01-01

    Acinar cell carcinomas (ACCs) of the pancreas are rare pancreatic neoplasms accounting for about 1–2% of pancreatic tumors in adults and about 15% in pediatric subjects. They show different clinical symptoms at presentation, different morphological features, different outcomes, and different molecular alterations. This heterogeneous clinicopathological spectrum may give rise to difficulties in the clinical and pathological diagnosis with consequential therapeutic and prognostic implications. The molecular mechanisms involved in the onset and progression of ACCs are still not completely understood, although in recent years, several attempts have been made to clarify the molecular mechanisms involved in ACC biology. In this paper, we will review the main clinicopathological and molecular features of pancreatic ACCs of both adult and pediatric subjects to give the reader a comprehensive overview of this rare tumor type. PMID:26137463

  16. Selecting radiomic features from FDG-PET images for cancer treatment outcome prediction.

    PubMed

    Lian, Chunfeng; Ruan, Su; Denœux, Thierry; Jardin, Fabrice; Vera, Pierre

    2016-08-01

    As a vital task in cancer therapy, accurately predicting the treatment outcome is valuable for tailoring and adapting treatment planning. To this end, multiple sources of information (radiomics, clinical characteristics, genomic expressions, etc.) gathered before and during treatment are potentially profitable. In this paper, we propose such a prediction system primarily using radiomic features (e.g., texture features) extracted from FDG-PET images. The proposed system includes a feature selection method based on Dempster-Shafer theory, a powerful tool to deal with uncertain and imprecise information. It aims to improve the prediction accuracy, and reduce the imprecision and overlaps between different classes (treatment outcomes) in a selected feature subspace. Considering that training samples are often small-sized and imbalanced in our applications, a data balancing procedure and specified prior knowledge are taken into account to improve the reliability of the selected feature subsets. Finally, the Evidential K-NN (EK-NN) classifier is used with selected features to output prediction results. Our prediction system has been evaluated on synthetic and clinical datasets, consistently showing good performance. Copyright © 2016 Elsevier B.V. All rights reserved.
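
    As background for the Dempster-Shafer theory mentioned above, the sketch below shows the standard Dempster rule of combination for two mass functions. It is not the authors' feature selection method or the EK-NN classifier; the function name `dempster_combine`, the two-outcome frame, and the mass values are hypothetical and chosen only for illustration.

```python
# Dempster's rule of combination for two mass functions.
# Focal sets are frozensets over a frame of discernment; all values below
# are made-up illustration data, not results from the study.


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for set_b, mass_b in m1.items():
        for set_c, mass_c in m2.items():
            intersection = set_b & set_c
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
            else:
                # Mass assigned to incompatible hypotheses counts as conflict.
                conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; combination is undefined.")
    # Renormalise by the non-conflicting mass.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


GOOD = frozenset({"good"})
POOR = frozenset({"poor"})
EITHER = frozenset({"good", "poor"})  # ignorance: mass on the whole frame

source_1 = {GOOD: 0.6, EITHER: 0.4}
source_2 = {POOR: 0.3, EITHER: 0.7}
fused = dempster_combine(source_1, source_2)
```

    Mass left on the whole frame (`EITHER`) expresses imprecision rather than forcing a choice between outcomes, which is the property that makes the framework attractive for small, imbalanced samples.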

  17. Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier.

    PubMed

    Paul, Desbordes; Su, Ruan; Romain, Modzelewski; Sébastien, Vauclin; Pierre, Vera; Isabelle, Gardin

    2016-12-28

    Outcome prediction can greatly help to personalize cancer treatment. A large number of quantitative features (clinical exams, imaging, …) are potentially useful to assess the patient outcome. The challenge is to choose the most predictive subset of features. In this paper, we propose a new feature selection strategy called GARF (genetic algorithm based on random forest), applied to features extracted from positron emission tomography (PET) images and clinical data. The most relevant features, either predictive of the therapeutic response or prognostic of patient survival 3 years after the end of treatment, were selected using GARF on a cohort of 65 patients with locally advanced oesophageal cancer eligible for chemo-radiation therapy. The most relevant predictive results were obtained with a subset of 9 features leading to a random forest misclassification rate of 18±4% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.823±0.032. The most relevant prognostic results were obtained with 8 features leading to an error rate of 20±7% and an AUC of 0.750±0.108. Both predictive and prognostic results show better performance using GARF than using 4 other studied methods.

  18. Prediction of biomechanical trabecular bone properties with geometric features using MR imaging

    NASA Astrophysics Data System (ADS)

    Huber, Markus B.; Lancianese, Sarah L.; Ikpot, Imoh; Nagarajan, Mahesh B.; Lerner, Amy L.; Wismüller, Axel

    2010-03-01

    Trabecular bone parameters extracted from magnetic resonance (MR) images are compared in their ability to predict biomechanical properties determined through mechanical testing. Trabecular bone density and structural changes throughout the proximal tibia are indicative of several musculoskeletal disorders of the knee joint involving changes in the bone quality and the surrounding soft tissue. Recent studies have shown that MR imaging, most frequently applied in soft tissue imaging, also allows non-invasive 3-dimensional characterization of bone microstructure. Sophisticated MR image features that estimate local structural and geometric properties of the trabecular bone may improve the ability of MR imaging to determine local bone quality in vivo. The purpose of the current study is to use whole joint MR images to compare the performance of trabecular bone features extracted from the images in predicting biomechanical strength properties measured on the corresponding ex vivo specimens. The regional apparent bone volume fraction (appBVF) and scaling index method (SIM) derived features were calculated; a Multilayer Radial Basis Functions Network was then optimized to calculate the prediction accuracy as measured by the root mean square error (RMSE) for each bone feature. The best prediction result was obtained with a SIM feature with the lowest prediction error (RMSE = 0.246) and the highest coefficient of determination (R² = 0.769). The current study demonstrates that the combination of sophisticated bone structure features and supervised learning techniques can improve MR imaging as an in vivo imaging tool in determining local trabecular bone quality.
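
    The two accuracy measures reported above, root mean square error and coefficient of determination, can be computed as in the following sketch. The data values are invented for illustration and the function names are hypothetical. The coefficient of determination is taken here as 1 minus the ratio of residual to total sum of squares; the abstract does not say which definition the authors used.

```python
# Root mean square error and coefficient of determination for paired
# measured/predicted values. The numbers below are illustrative only.
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


measured = [1.0, 2.0, 3.0, 4.0]    # hypothetical measured strength values
predicted = [1.1, 1.9, 3.2, 3.8]   # hypothetical model predictions

error = rmse(measured, predicted)
fit = r_squared(measured, predicted)
```

    A lower error and a higher coefficient of determination both indicate a feature whose predictions track the mechanical measurements more closely.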

  19. Feature maps driven no-reference image quality prediction of authentically distorted images

    NASA Astrophysics Data System (ADS)

    Ghadiyaram, Deepti; Bovik, Alan C.

    2015-03-01

    Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.

  20. Incorporating higher-order representative features improves prediction in network-based cancer prognosis analysis.

    PubMed

    Ma, Shuangge; Kosorok, Michael R; Huang, Jian; Dai, Ying

    2011-01-12

    In cancer prognosis studies with gene expression measurements, an important goal is to construct gene signatures with predictive power. In this study, we describe the coordination among genes using the weighted coexpression network, where nodes represent genes and nodes are connected if the corresponding genes have similar expression patterns across samples. There are subsets of nodes, called modules, that are tightly connected to each other. In several published studies, it has been suggested that the first principal components of individual modules, also referred to as "eigengenes", may sufficiently represent the corresponding modules. In this article, we refer to principal components and their functions as "representative features". We investigate higher-order representative features, which include the principal components other than the first ones and second order terms (quadratics and interactions). Two gradient thresholding methods are adopted for regularized estimation and feature selection. Analysis of six prognosis studies on lymphoma and breast cancer shows that incorporating higher-order representative features improves prediction performance over using eigengenes only. Simulation study further shows that prediction performance can be less satisfactory if the representative feature set is not properly chosen. This study introduces multiple ways of defining the representative features and effective thresholding regularized estimation approaches. It provides convincing evidence that the higher-order representative features may have important implications for the prediction of cancer prognosis.
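
    The notion of an eigengene and of higher-order representative features can be made concrete with a short sketch. It assumes NumPy is available, uses random numbers in place of real expression data, and treats a single module in isolation; the function name `module_features` and the choice of two components are hypothetical. The gradient thresholding step described in the abstract is not shown.

```python
# Representative features for one gene module: principal component scores
# (the first is the "eigengene"), plus their squares and pairwise products.
# Random data stands in for expression measurements; this is only a sketch.
import numpy as np


def module_features(expr, n_components=2):
    """expr: (samples x genes) array for one module. Returns a feature matrix."""
    centered = expr - expr.mean(axis=0)
    # SVD of the centered matrix; U * S gives the principal component scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]
    squares = pcs ** 2
    interactions = [
        pcs[:, i] * pcs[:, j]
        for i in range(n_components)
        for j in range(i + 1, n_components)
    ]
    blocks = [pcs, squares]
    if interactions:
        blocks.append(np.column_stack(interactions))
    return np.hstack(blocks)


rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 5))   # 20 samples, 5 genes in the module
features = module_features(expr)  # columns: PC1, PC2, PC1^2, PC2^2, PC1*PC2
```

    Column 0 is the eigengene; the remaining columns are the higher-order terms that the study argues can add predictive power once a regularized method selects among them.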

  1. Incorporating higher-order representative features improves prediction in network-based cancer prognosis analysis

    PubMed Central

    2011-01-01

    Background: In cancer prognosis studies with gene expression measurements, an important goal is to construct gene signatures with predictive power. In this study, we describe the coordination among genes using the weighted coexpression network, where nodes represent genes and nodes are connected if the corresponding genes have similar expression patterns across samples. There are subsets of nodes, called modules, that are tightly connected to each other. In several published studies, it has been suggested that the first principal components of individual modules, also referred to as "eigengenes", may sufficiently represent the corresponding modules. Results: In this article, we refer to principal components and their functions as "representative features". We investigate higher-order representative features, which include the principal components other than the first ones and second order terms (quadratics and interactions). Two gradient thresholding methods are adopted for regularized estimation and feature selection. Analysis of six prognosis studies on lymphoma and breast cancer shows that incorporating higher-order representative features improves prediction performance over using eigengenes only. Simulation study further shows that prediction performance can be less satisfactory if the representative feature set is not properly chosen. Conclusions: This study introduces multiple ways of defining the representative features and effective thresholding regularized estimation approaches. It provides convincing evidence that the higher-order representative features may have important implications for the prediction of cancer prognosis. PMID:21226928

  2. Weekly fluctuations in nonjudging predict borderline personality disorder feature expression in women

    PubMed Central

    Peters, Jessica R.; Chamberlain, Kaitlyn D.; Rodriguez, Marcus

    2015-01-01

    Objectives: Borderline personality disorder (BPD) features have been linked to deficits in mindfulness, or nonjudgmental attention to present-moment stimuli. However, no previous work has examined the role of fluctuations in mindfulness over time in predicting BPD features. The present study examines the impact of both between-person differences and within-person changes in mindfulness. Design: 40 women recruited to achieve a flat distribution of BPD features completed 4 weekly assessments of mindfulness (Five Facet Mindfulness Questionnaire; FFMQ) and BPD features. Multilevel models predicted each outcome from both 1) a person’s average levels of each facet and 2) weekly deviations from a person’s average for each facet. Results: Average acting with awareness, nonjudging, and nonreactivity predicted lower BPD features at the between-person level, and weekly deviations above one’s average (i.e., higher-than-usual) nonjudging predicted lower BPD feature expression at the within-person level. Conclusions: Within-person fluctuations in the nonjudging facet of mindfulness may be relevant to the daily expression of BPD features over and above dispositional mindfulness. PMID:27231408
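
    The two predictors used in the multilevel models above, a person's average and the weekly deviation from that average, can be derived from repeated measurements as in this sketch. The scores are invented, the function name `split_between_within` is hypothetical, and fitting the multilevel model itself is not shown.

```python
# Person-mean centering: split each weekly score into a between-person part
# (the person's own average) and a within-person part (deviation from it).
# The records below are made-up illustration data.
from collections import defaultdict


def split_between_within(records):
    """records: list of (person_id, week, score).

    Returns a list of (person_id, week, person_mean, weekly_deviation).
    """
    scores_by_person = defaultdict(list)
    for person_id, _, score in records:
        scores_by_person[person_id].append(score)
    person_means = {
        person_id: sum(scores) / len(scores)
        for person_id, scores in scores_by_person.items()
    }
    return [
        (person_id, week, person_means[person_id], score - person_means[person_id])
        for person_id, week, score in records
    ]


records = [
    ("a", 1, 3.0), ("a", 2, 4.0), ("a", 3, 5.0), ("a", 4, 4.0),
    ("b", 1, 2.0), ("b", 2, 2.0), ("b", 3, 3.0), ("b", 4, 1.0),
]
decomposed = split_between_within(records)
```

    Entering both columns as predictors lets a model separate stable differences between people from week-to-week change within a person, since the deviations sum to zero for each person by construction.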

  3. Harnessing Computational Biology for Exact Linear B-Cell Epitope Prediction: A Novel Amino Acid Composition-Based Feature Descriptor.

    PubMed

    Saravanan, Vijayakumar; Gautham, Namasivayam

    2015-10-01

    Proteins embody epitopes that serve as their antigenic determinants. Epitopes occupy a central place in integrative biology, not to mention as targets for novel vaccine, pharmaceutical, and systems diagnostics development. The presence of T-cell and B-cell epitopes has been extensively studied due to their potential in synthetic vaccine design. However, reliable prediction of linear B-cell epitopes remains a formidable challenge. Earlier studies have reported discrepancy in amino acid composition between the epitopes and non-epitopes. Hence, this study proposed and developed a novel amino acid composition-based feature descriptor, Dipeptide Deviation from Expected Mean (DDE), to distinguish the linear B-cell epitopes from non-epitopes effectively. In this study, for the first time, only exact linear B-cell epitopes and non-epitopes have been utilized for developing the prediction method, unlike the use of epitope-containing regions in earlier reports. To evaluate the performance of the DDE feature vector, models have been developed with two widely used machine-learning techniques: Support Vector Machine and AdaBoost-Random Forest. Five-fold cross-validation performance of the proposed method with an error-free dataset and datasets from other studies achieved an overall accuracy between nearly 61% and 73%, with balance between sensitivity and specificity metrics. Performance of the DDE feature vector was better (with accuracy difference of about 2% to 12%), in comparison to other amino acid-derived features on different datasets. This study reflects the efficiency of the DDE feature vector in enhancing the linear B-cell epitope prediction performance, compared to other feature representations. The proposed method is freely available as a stand-alone tool for researchers, particularly for those interested in vaccine design and novel molecular target development for systems therapeutics and diagnostics: https://github.com/brsaran/LBEEP.

  4. Improving structure-based function prediction using molecular dynamics

    PubMed Central

    Glazer, Dariya S.; Radmer, Randall J.; Altman, Russ B.

    2009-01-01

    Summary The number of molecules with solved three-dimensional structure but unknown function is increasing rapidly. Particularly problematic are novel folds with little detectable similarity to molecules of known function. Experimental assays can determine the functions of such molecules, but are time-consuming and expensive. Computational approaches can identify potential functional sites; however, these approaches generally rely on single static structures and do not use information about dynamics. In fact, structural dynamics can enhance function prediction: we coupled molecular dynamics simulations with structure-based function prediction algorithms that identify Ca2+ binding sites. When applied to 11 challenging proteins, both methods showed substantial improvement in performance, revealing 22 more sites in one case and 12 more in the other, with a modest increase in apparent false positives. Thus, we show that treating molecules as dynamic entities improves the performance of structure-based function prediction methods. PMID:19604472

  5. Toward Fully in Silico Melting Point Prediction Using Molecular Simulations

    SciTech Connect

    Zhang, Y; Maginn, EJ

    2013-03-01

    Melting point is one of the most fundamental and practically important properties of a compound. Molecular simulation methods have been developed for the accurate computation of melting points. However, all of these methods need an experimental crystal structure as input, which means that such calculations are not really predictive since the melting point can be measured easily in experiments once a crystal structure is known. On the other hand, crystal structure prediction (CSP) has become an active field and significant progress has been made, although challenges still exist. One of the main challenges is the existence of many crystal structures (polymorphs) that are very close in energy. Thermal effects and kinetic factors make the situation even more complicated, such that it is still not trivial to predict experimental crystal structures. In this work, we exploit the fact that free energy differences are often small between crystal structures. We show that accurate melting point predictions can be made by using a reasonable crystal structure from CSP as a starting point for a free energy-based melting point calculation. The key is that most crystal structures predicted by CSP have free energies that are close to that of the experimental structure. The proposed method was tested on two rigid molecules and the results suggest that a fully in silico melting point prediction method is possible.

  6. Toward Fully in Silico Melting Point Prediction Using Molecular Simulations.

    PubMed

    Zhang, Yong; Maginn, Edward J

    2013-03-12

    Melting point is one of the most fundamental and practically important properties of a compound. Molecular simulation methods have been developed for the accurate computation of melting points. However, all of these methods need an experimental crystal structure as input, which means that such calculations are not really predictive since the melting point can be measured easily in experiments once a crystal structure is known. On the other hand, crystal structure prediction (CSP) has become an active field and significant progress has been made, although challenges still exist. One of the main challenges is the existence of many crystal structures (polymorphs) that are very close in energy. Thermal effects and kinetic factors make the situation even more complicated, such that it is still not trivial to predict experimental crystal structures. In this work, we exploit the fact that free energy differences are often small between crystal structures. We show that accurate melting point predictions can be made by using a reasonable crystal structure from CSP as a starting point for a free energy-based melting point calculation. The key is that most crystal structures predicted by CSP have free energies that are close to that of the experimental structure. The proposed method was tested on two rigid molecules and the results suggest that a fully in silico melting point prediction method is possible.

  7. Prediction of structural features and application to outer membrane protein identification

    PubMed Central

    Yan, Renxiang; Wang, Xiaofeng; Huang, Lanqing; Yan, Feidi; Xue, Xiaoyu; Cai, Weiwen

    2015-01-01

    Protein three-dimensional (3D) structures provide insightful information in many fields of biology. One-dimensional properties derived from 3D structures such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles are helpful to protein function prediction, fold recognition and ab initio folding. Here, we predict various structural features with the assistance of neural network learning. Based on an independent test dataset, protein secondary structure prediction generates an overall Q3 accuracy of ~80%. Meanwhile, the prediction of relative solvent accessibility obtains the highest mean absolute error of 0.164, and prediction of residue depth achieves the lowest mean absolute error of 0.062. We further improve the outer membrane protein identification by including the predicted structural features in a scoring function using a simple profile-to-profile alignment. The results demonstrate that the accuracy of outer membrane protein identification can be improved by ~3% at a 1% false positive level when structural features are incorporated. Finally, our methods are available as two convenient and easy-to-use programs. One is PSSM-2-Features for predicting secondary structure, relative solvent accessibility, residue depth and backbone torsion angles, the other is PPA-OMP for identifying outer membrane proteins from proteomes. PMID:26104144
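
    The two metrics quoted in this abstract, Q3 accuracy for three-state secondary structure and mean absolute error for real-valued properties, can be sketched as follows. The function names and the H/E/C state alphabet are assumptions made for illustration; the abstract does not describe how the authors' programs compute them.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
```

    For Q3 a higher value is better, whereas for mean absolute error a lower value is better, so the 0.164 and 0.062 figures above are errors rather than accuracies.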

  8. Prediction of structural features and application to outer membrane protein identification

    NASA Astrophysics Data System (ADS)

    Yan, Renxiang; Wang, Xiaofeng; Huang, Lanqing; Yan, Feidi; Xue, Xiaoyu; Cai, Weiwen

    2015-06-01

    Protein three-dimensional (3D) structures provide insightful information in many fields of biology. One-dimensional properties derived from 3D structures such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles are helpful to protein function prediction, fold recognition and ab initio folding. Here, we predict various structural features with the assistance of neural network learning. Based on an independent test dataset, protein secondary structure prediction generates an overall Q3 accuracy of ~80%. Meanwhile, the prediction of relative solvent accessibility obtains the highest mean absolute error of 0.164, and prediction of residue depth achieves the lowest mean absolute error of 0.062. We further improve the outer membrane protein identification by including the predicted structural features in a scoring function using a simple profile-to-profile alignment. The results demonstrate that the accuracy of outer membrane protein identification can be improved by ~3% at a 1% false positive level when structural features are incorporated. Finally, our methods are available as two convenient and easy-to-use programs. One is PSSM-2-Features for predicting secondary structure, relative solvent accessibility, residue depth and backbone torsion angles, the other is PPA-OMP for identifying outer membrane proteins from proteomes.

  9. Structural features based genome-wide characterization and prediction of nucleosome organization

    PubMed Central

    2012-01-01

    Background Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. 
    Conclusions Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene

  10. Widespread convergence in toxin resistance by predictable molecular evolution

    PubMed Central

    Ujvari, Beata; Casewell, Nicholas R.; Sunagar, Kartik; Arbuckle, Kevin; Wüster, Wolfgang; Lo, Nathan; O’Meally, Denis; Beckmann, Christa; King, Glenn F.; Deplazes, Evelyne; Madsen, Thomas

    2015-01-01

    The question about whether evolution is unpredictable and stochastic or intermittently constrained along predictable pathways is the subject of a fundamental debate in biology, in which understanding convergent evolution plays a central role. At the molecular level, documented examples of convergence are rare and limited to occurring within specific taxonomic groups. Here we provide evidence of constrained convergent molecular evolution across the metazoan tree of life. We show that resistance to toxic cardiac glycosides produced by plants and bufonid toads is mediated by similar molecular changes to the sodium-potassium-pump (Na+/K+-ATPase) in insects, amphibians, reptiles, and mammals. In toad-feeding reptiles, resistance is conferred by two point mutations that have evolved convergently on four occasions, whereas evidence of a molecular reversal back to the susceptible state in varanid lizards migrating to toad-free areas suggests that toxin resistance is maladaptive in the absence of selection. Importantly, resistance in all taxa is mediated by replacements of 2 of the 12 amino acids comprising the Na+/K+-ATPase H1–H2 extracellular domain that constitutes a core part of the cardiac glycoside binding site. We provide mechanistic insight into the basis of resistance by showing that these alterations perturb the interaction between the cardiac glycoside bufalin and the Na+/K+-ATPase. Thus, similar selection pressures have resulted in convergent evolution of the same molecular solution across the breadth of the animal kingdom, demonstrating how a scarcity of possible solutions to a selective challenge can lead to highly predictable evolutionary responses. PMID:26372961

  11. Widespread convergence in toxin resistance by predictable molecular evolution.

    PubMed

    Ujvari, Beata; Casewell, Nicholas R; Sunagar, Kartik; Arbuckle, Kevin; Wüster, Wolfgang; Lo, Nathan; O'Meally, Denis; Beckmann, Christa; King, Glenn F; Deplazes, Evelyne; Madsen, Thomas

    2015-09-22

    The question about whether evolution is unpredictable and stochastic or intermittently constrained along predictable pathways is the subject of a fundamental debate in biology, in which understanding convergent evolution plays a central role. At the molecular level, documented examples of convergence are rare and limited to occurring within specific taxonomic groups. Here we provide evidence of constrained convergent molecular evolution across the metazoan tree of life. We show that resistance to toxic cardiac glycosides produced by plants and bufonid toads is mediated by similar molecular changes to the sodium-potassium-pump (Na(+)/K(+)-ATPase) in insects, amphibians, reptiles, and mammals. In toad-feeding reptiles, resistance is conferred by two point mutations that have evolved convergently on four occasions, whereas evidence of a molecular reversal back to the susceptible state in varanid lizards migrating to toad-free areas suggests that toxin resistance is maladaptive in the absence of selection. Importantly, resistance in all taxa is mediated by replacements of 2 of the 12 amino acids comprising the Na(+)/K(+)-ATPase H1-H2 extracellular domain that constitutes a core part of the cardiac glycoside binding site. We provide mechanistic insight into the basis of resistance by showing that these alterations perturb the interaction between the cardiac glycoside bufalin and the Na(+)/K(+)-ATPase. Thus, similar selection pressures have resulted in convergent evolution of the same molecular solution across the breadth of the animal kingdom, demonstrating how a scarcity of possible solutions to a selective challenge can lead to highly predictable evolutionary responses.

  12. Modelling complex features from histone modification signatures using genetic algorithm for the prediction of enhancer region.

    PubMed

    Lee, Nung Kion; Fong, Pui Kwan; Abdullah, Mohd Tajuddin

    2014-01-01

    Using Genetic Algorithm, this paper presents a modelling method to generate novel logical-based features from DNA sequences enriched with H3K4me1 histone signatures. Current histone signature is mostly represented using k-mers content features incapable of representing all the possible complex interactions of various DNA segments. The main contributions are, among others: (a) demonstrating that there are complex interactions among sequence segments in the histone regions; (b) developing a parse tree representation of the logical complex features. The proposed novel feature is compared to the k-mers content features using datasets from the mouse (mm9) genome. Evaluation results show that the new feature improves the prediction performance as shown by f-measure for all datasets tested. Also, it is discovered that tree-based features generated from a single chromosome can be generalized to predict histone marks in other chromosomes not used in the training. These findings have a great impact on feature design considerations for histone signatures as well as other classifier design features.
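
    To make the contrast between k-mer content features and logical, tree-shaped features concrete, the sketch below computes normalised k-mer frequencies and evaluates a small parse tree over motif presence. The nested-tuple tree encoding and the operator names `HAS`, `AND`, `OR` and `NOT` are hypothetical; the abstract does not specify the representation, and the genetic-algorithm search that would generate such trees is not shown.

```python
from collections import Counter


def kmer_content(sequence, k):
    """Normalised frequency of every k-mer observed in the sequence."""
    windows = len(sequence) - k + 1
    if windows < 1:
        raise ValueError("sequence is shorter than k")
    counts = Counter(sequence[i:i + k] for i in range(windows))
    return {kmer: count / windows for kmer, count in counts.items()}


def evaluate(tree, sequence):
    """Evaluate a logical feature tree against a sequence.

    Leaves are ("HAS", motif); internal nodes are ("AND", l, r),
    ("OR", l, r) or ("NOT", child). This encoding is illustrative only.
    """
    op = tree[0]
    if op == "HAS":
        return tree[1] in sequence
    if op == "NOT":
        return not evaluate(tree[1], sequence)
    if op == "AND":
        return evaluate(tree[1], sequence) and evaluate(tree[2], sequence)
    if op == "OR":
        return evaluate(tree[1], sequence) or evaluate(tree[2], sequence)
    raise ValueError(f"unknown operator: {op}")
```

    A k-mer vector records how often each segment occurs independently, while a tree such as `("AND", ("HAS", "ACG"), ("NOT", ("HAS", "TTT")))` expresses an interaction between segments that a frequency vector alone cannot encode.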

  13. Molecular orbital predictions of the vibrational frequencies of some molecular ions

    NASA Technical Reports Server (NTRS)

    Defrees, D. J.; Mclean, A. D.

    1985-01-01

    The initial detections of IR vibration-rotation bands in polyatomic molecular ions by recent spectroscopic advances were guided by ab initio prediction of vibrational frequencies. The present calculations predict the vibrational frequencies of additional ions which are candidates for laboratory analysis. Neutral molecule vibrational frequencies were computed at three levels of theory and then compared with experimental data; the effect of scaling was also investigated, in order to determine how accurately vibrational frequencies could be predicted. For 92 percent of the frequencies examined, the relatively simple HF/6-31G theory's vibrational frequencies were within 100/cm of experimental values, with a mean absolute error of 49/cm. On this basis, the frequencies of 30 molecular ions (many possessing astrophysical significance) were computed.
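
    The comparison described above, computed against experimental frequencies with and without scaling, comes down to two summary numbers: the mean absolute error and the share of frequencies within a tolerance. A minimal sketch follows; the function name, the uniform scale factor and the default tolerance of 100 cm^-1 are illustrative assumptions, and no values from the paper are reproduced.

```python
def evaluate_frequencies(computed, experimental, scale=1.0, tolerance=100.0):
    """Compare (optionally scaled) computed frequencies with experiment.

    All frequencies are in cm^-1. Returns a tuple of
    (mean absolute error, fraction of frequencies within `tolerance`).
    """
    if len(computed) != len(experimental) or not computed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(scale * c - e) for c, e in zip(computed, experimental)]
    mean_abs_error = sum(errors) / len(errors)
    within = sum(err <= tolerance for err in errors) / len(errors)
    return mean_abs_error, within
```

    Running the function once with `scale=1.0` and once with a fitted factor below 1 shows how much of the systematic overestimate typical of Hartree-Fock harmonic frequencies a single multiplier removes.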

  14. Cellular automata with object-oriented features for parallel molecular network modeling.

    PubMed

    Zhu, Hao; Wu, Yinghui; Huang, Sui; Sun, Yan; Dhar, Pawan

    2005-06-01

    Cellular automata are an important modeling paradigm for studying the dynamics of large, parallel systems composed of multiple, interacting components. However, to model biological systems, cellular automata need to be extended beyond the large-scale parallelism and intensive communication in order to capture two fundamental properties characteristic of complex biological systems: hierarchy and heterogeneity. This paper proposes extensions to a cellular automata language, Cellang, to meet this purpose. The extended language, with object-oriented features, can be used to describe the structure and activity of parallel molecular networks within cells. Capabilities of this new programming language include object structure to define molecular programs within a cell, floating-point data type and mathematical functions to perform quantitative computation, message passing capability to describe molecular interactions, as well as new operators, statements, and built-in functions. We discuss relevant programming issues of these features, including the object-oriented description of molecular interactions with molecule encapsulation, message passing, and the description of heterogeneity and anisotropy at the cell and molecule levels. By enabling the integration of modeling at the molecular level with system behavior at cell, tissue, organ, or even organism levels, the program will help improve our understanding of how complex and dynamic biological activities are generated and controlled by parallel functioning of molecular networks. Index Terms-Cellular automata, modeling, molecular network, object-oriented.
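
    The ideas of heterogeneous cells carrying floating-point state and exchanging material with neighbours can be illustrated with a very small automaton. This is a Python sketch, not Cellang syntax; the `Cell` class, the ring topology and the diffusion-style update rule are all assumptions chosen for brevity.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    kind: str     # cell type, standing in for heterogeneity
    level: float  # floating-point concentration of one molecule


def step(cells, rate=0.1):
    """Advance a ring of cells by one synchronous update.

    Each cell exchanges a fraction `rate` of the concentration difference
    with its left and right neighbours, so the total amount is conserved.
    """
    n = len(cells)
    updated = []
    for i, cell in enumerate(cells):
        left = cells[(i - 1) % n].level
        right = cells[(i + 1) % n].level
        new_level = cell.level + rate * (left + right - 2.0 * cell.level)
        updated.append(Cell(cell.kind, new_level))
    return updated
```

    Building the new state in a separate list keeps the update synchronous: every cell reads its neighbours' values from the previous step, which is the defining property of a cellular automaton.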

  15. PREAL: prediction of allergenic protein by maximum Relevance Minimum Redundancy (mRMR) feature selection

    PubMed Central

    2013-01-01

    Background Assessment of the potential allergenicity of proteins is necessary whenever transgenic proteins are introduced into the food chain. Bioinformatics approaches in allergen prediction have evolved appreciably in recent years to increase sophistication and performance. However, the critical features for a protein's allergenicity have not yet been fully investigated. Results We presented a more comprehensive model in a 128-feature space for allergenic protein prediction by integrating various properties of proteins, such as biochemical and physicochemical properties, sequential features and subcellular locations. The overall accuracy in the cross-validation reached 93.42% to 100% with our new method. The Maximum Relevance Minimum Redundancy (mRMR) method and the Incremental Feature Selection (IFS) procedure were applied to determine which features are essential for allergenicity. Results of the performance comparisons showed the superiority of our method over the widely used existing methods. More importantly, it was observed that the features of subcellular locations and amino acid composition played major roles in determining the allergenicity of proteins, particularly extracellular/cell surface and vacuole of the subcellular locations for wheat and soybean. To facilitate allergen prediction, we implemented our computational method in a web application, which is available at http://gmobl.sjtu.edu.cn/PREAL/index.php. Conclusions Our new approach could improve the accuracy of allergen prediction, and the findings may provide novel insights into the mechanism of allergies. PMID:24565053
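
    The mRMR ranking step mentioned above can be sketched for discrete features as a greedy loop that scores each candidate by its mutual information with the class label minus its mean mutual information with the features already chosen. This is a generic difference-form sketch; the function names are hypothetical, the authors' exact variant is not stated in the abstract, and the subsequent IFS step (adding ranked features one at a time and re-evaluating a classifier) is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())


def mrmr_rank(features, labels):
    """Rank features greedily by relevance minus mean redundancy.

    `features` maps a feature name to its list of discrete values.
    Ties are broken by alphabetical order of the feature name.
    """
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

    The redundancy term is what distinguishes mRMR from ranking by relevance alone: an exact copy of an already selected feature is pushed to the end of the list even though its individual relevance is high.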

  16. Identification of informative features for predicting proinflammatory potentials of engine exhausts.

    PubMed

    Wang, Chia-Chi; Lin, Ying-Chi; Lin, Yuan-Chung; Jhang, Syu-Ruei; Tung, Chun-Wei

    2017-08-18

    The immunotoxicity of engine exhausts is of high concern to human health due to the increasing prevalence of immune-related diseases. However, the evaluation of immunotoxicity of engine exhausts is currently based on expensive and time-consuming experiments. It is desirable to develop efficient methods for immunotoxicity assessment. To accelerate the development of safe alternative fuels, this study proposed a computational method for identifying informative features for predicting proinflammatory potentials of engine exhausts. A principal component regression (PCR) algorithm was applied to develop prediction models. The informative features were identified by a sequential backward feature elimination (SBFE) algorithm. A total of 19 informative chemical and biological features were successfully identified by SBFE algorithm. The informative features were utilized to develop a computational method named FS-CBM for predicting proinflammatory potentials of engine exhausts. FS-CBM model achieved a high performance with correlation coefficient values of 0.997 and 0.943 obtained from training and independent test sets, respectively. The FS-CBM model was developed for predicting proinflammatory potentials of engine exhausts with a large improvement on prediction performance compared with our previous CBM model. The proposed method could be further applied to construct models for bioactivities of mixtures.

  17. Effects of imaging modalities, brain atlases and feature selection on prediction of Alzheimer's disease.

    PubMed

    Ota, Kenichi; Oishi, Naoya; Ito, Kengo; Fukuyama, Hidenao

    2015-12-30

    The choice of biomarkers for early detection of Alzheimer's disease (AD) is important for improving the accuracy of imaging-based prediction of conversion from mild cognitive impairment (MCI) to AD. The primary goal of this study was to assess the effects of imaging modalities and brain atlases on prediction. We also investigated the influence of support vector machine recursive feature elimination (SVM-RFE) on predictive performance. Eighty individuals with amnestic MCI [40 developed AD within 3 years] underwent structural magnetic resonance imaging (MRI) and (18)F-fluorodeoxyglucose positron emission tomography (FDG-PET) scans at baseline. Using Automated Anatomical Labeling (AAL) and LONI Probabilistic Brain Atlas (LPBA40), we extracted features representing gray matter density and relative cerebral metabolic rate for glucose in each region of interest from the baseline MRI and FDG-PET data, respectively. We used linear SVM ensemble with bagging and computed the area under the receiver operating characteristic curve (AUC) as a measure of classification performance. We performed multiple SVM-RFE to compute feature ranking. We performed analysis of variance on the mean AUCs for eight feature sets. The interactions between atlas and modality choices were significant. The main effect of SVM-RFE was significant, but the interactions with the other factors were not significant. Multimodal features were found to be better than unimodal features to predict AD. FDG-PET was found to be better than MRI. Imaging modalities and brain atlases interact with each other and affect prediction. SVM-RFE can improve the predictive accuracy when using atlas-based features. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. Prognosis of stage III colorectal carcinomas with FOLFOX adjuvant chemotherapy can be predicted by molecular subtype

    PubMed Central

    Yun, Seongju; Kim, Won Kyu; Kim, Sora; Paik, Soonmyung; Lee, Hyun Jung; Hong, Sungpil; Kim, Tae Il; Min, Byungsoh; Kim, Hoguen

    2017-01-01

    Individualizing adjuvant chemotherapy is important in patients with advanced colorectal cancers (CRCs), and the ability to identify molecular subtypes predictive of good prognosis for stage III CRCs after adjuvant chemotherapy could be highly beneficial. We performed microarray-based gene expression analysis on 101 fresh-frozen primary samples from patients with stage III CRCs treated with FOLFOX adjuvant chemotherapy and 35 matched non-neoplastic mucosal tissues. CRC samples were classified into four molecular subtypes using nonnegative matrix factorization, and for comparison, we also grouped CRC samples using the proposed consensus molecular subtypes (CMSs). Of the 101 cases, 80 were classified into a CMS group, which shows a 79% correlation between the CMS classification and our four molecular subtypes. We found that two of our subtypes showed significantly higher disease-free survival and overall survival than the others. Group 2, in particular, which showed no disease recurrence or death, was characterized by high microsatellite instability (MSI-H, 6/21), abundant mucin production (12/21), and right-sided location (12/21); this group strongly correlated with CMS1 (microsatellite instability immune type). We further identified the molecular characteristics of each group and selected 10 potential biomarker genes from each. When these were compared to the previously reported molecular classifier genes, we found that 31 out of 40 selected genes were matched with those previously reported. Our findings indicate that molecular classification can reveal specific molecular subtypes correlating with clinicopathologic features of CRCs and can have predictive value for the prognosis for stage III CRCs with FOLFOX adjuvant chemotherapy. PMID:28455965

  19. Prognosis of stage III colorectal carcinomas with FOLFOX adjuvant chemotherapy can be predicted by molecular subtype.

    PubMed

    Kwon, Yujin; Park, Minhee; Jang, Mi; Yun, Seongju; Kim, Won Kyu; Kim, Sora; Paik, Soonmyung; Lee, Hyun Jung; Hong, Sungpil; Kim, Tae Il; Min, Byungsoh; Kim, Hoguen

    2017-06-13

    Individualizing adjuvant chemotherapy is important in patients with advanced colorectal cancers (CRCs), and the ability to identify molecular subtypes predictive of good prognosis for stage III CRCs after adjuvant chemotherapy could be highly beneficial. We performed microarray-based gene expression analysis on 101 fresh-frozen primary samples from patients with stage III CRCs treated with FOLFOX adjuvant chemotherapy and 35 matched non-neoplastic mucosal tissues. CRC samples were classified into four molecular subtypes using nonnegative matrix factorization, and for comparison, we also grouped CRC samples using the proposed consensus molecular subtypes (CMSs). Of the 101 cases, 80 were classified into a CMS group, which shows a 79% correlation between the CMS classification and our four molecular subtypes. We found that two of our subtypes showed significantly higher disease-free survival and overall survival than the others. Group 2, in particular, which showed no disease recurrence or death, was characterized by high microsatellite instability (MSI-H, 6/21), abundant mucin production (12/21), and right-sided location (12/21); this group strongly correlated with CMS1 (microsatellite instability immune type). We further identified the molecular characteristics of each group and selected 10 potential biomarker genes from each. When these were compared to the previously reported molecular classifier genes, we found that 31 out of 40 selected genes were matched with those previously reported. Our findings indicate that molecular classification can reveal specific molecular subtypes correlating with clinicopathologic features of CRCs and can have predictive value for the prognosis for stage III CRCs with FOLFOX adjuvant chemotherapy.

  20. Hypomanic symptoms predict an increase in narcissistic and histrionic personality disorder features in suicidal young adults.

    PubMed

    Shahar, Golan; Scotti, Margaret-Ann; Rudd, M David; Joiner, Thomas E

    2008-01-01

    Consistent with the "scar hypothesis", according to which mood depression might impact personality, we examined the effect of unipolar and hypomanic mood disturbances on cluster B (i.e., narcissistic, histrionic, and borderline) personality disorder features. Data from 113 suicidal young adults were utilized, and cross-lagged associations between unipolar and hypomanic mood disturbances and cluster B personality disorder features were examined using manifest-variable structural equation modeling (SEM). Hypomanic symptoms predicted an increase in narcissistic and histrionic personality disorder features over the Time 1-Time 2 period, as well as an increase in narcissistic personality disorder features over the Time 1-Time 3 period. Unipolar depressive symptoms and borderline features were reciprocally and longitudinally associated, albeit at different time periods. The distinct features of the sample restrict generalization of the findings. The exclusive use of self-report measures might have contributed to shared method variance. Results are consistent with the notion that hypomanic symptoms increase narcissistic personality disorder tendencies.

  1. SU-E-T-214: Predicting Plan Quality from Patient Geometry: Feature Selection and Inference Modeling.

    PubMed

    Ruan, D; Shao, W; DeMarco, J; Kupelian, P; Low, D

    2012-06-01

    To investigate and develop methods to infer treatment plan quality from the geometric features of PTV/OAR structures; to discover and identify features of high prognostic values. This study explores the prognostic utility of geometric features of two categories: (1) absolute geometry, characterizing the volumes of single structures (PTV, OARs); and (2) relative geometry, based on the minimal 3D distance and/or overlapping volume between pairs of structures. Using prostate as a pilot site, we developed inference models to 'predict' SBRT plan quality of DVH end points. We developed and assessed (1) a full linear regression model based on both absolute and relative geometric features, (2) a sparsity-penalized linear regression model, (3) a linear regression model based on absolute geometry features only; (4) a learning-based nonparametric model. Cross-validation was used for both selecting the parameter values as well as quantifying the inference performance. The best inference method for each of the DVH end points was identified to reveal the structural and prognostic differences among them. For linear regression, using sparsity-regularization discovered geometric features that were mostly absolute, demonstrating their dominant linear prognostic utility. However, introducing relative geometric features improved the plan quality prediction by 15% for all DVH end points. In contrast, nonparametric models had a heavier dependence on relative geometry features. While linear regression based on both features sets predicted OAR DVH points slightly better, the nonparametric method excelled in predicting PTV coverage and conformality. The inference result from this study provides an 'expectation' for the plan quality before the planning is to be performed, providing reference goals for the planner and a baseline for detecting abnormality. The use of relative geometry complements the absolute geometry with information on spatial configuration of the PTV/OAR structures of

  2. Prediction of occult invasive disease in ductal carcinoma in situ using computer-extracted mammographic features

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2017-03-01

    Predicting the risk of occult invasive disease in ductal carcinoma in situ (DCIS) is an important task to help address the overdiagnosis and overtreatment problems associated with breast cancer. In this work, we investigated the feasibility of using computer-extracted mammographic features to predict occult invasive disease in patients with biopsy proven DCIS. We proposed a computer-vision algorithm based approach to extract mammographic features from magnification views of full field digital mammography (FFDM) for patients with DCIS. After an expert breast radiologist provided a region of interest (ROI) mask for the DCIS lesion, the proposed approach is able to segment individual microcalcifications (MCs), detect the boundary of the MC cluster (MCC), and extract 113 mammographic features from MCs and MCC within the ROI. In this study, we extracted mammographic features from 99 patients with DCIS (74 pure DCIS; 25 DCIS plus invasive disease). The predictive power of the mammographic features was demonstrated through binary classifications between pure DCIS and DCIS with invasive disease using linear discriminant analysis (LDA). Before classification, the minimum redundancy Maximum Relevance (mRMR) feature selection method was first applied to choose subsets of useful features. The generalization performance was assessed using Leave-One-Out Cross-Validation and Receiver Operating Characteristic (ROC) curve analysis. Using the computer-extracted mammographic features, the proposed model was able to distinguish DCIS with invasive disease from pure DCIS, with an average classification performance of AUC = 0.61 +/- 0.05. Overall, the proposed computer-extracted mammographic features are promising for predicting occult invasive disease in DCIS.
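
    The AUC reported above can be read as the probability that a randomly chosen positive case (DCIS with invasive disease) receives a higher classifier score than a randomly chosen negative case (pure DCIS). The sketch below computes it directly from that pairwise definition; the function name is hypothetical, and the LDA classifier and leave-one-out loop that would produce the scores are not shown.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    `labels` holds 1 for positive and 0 for negative cases. Each
    positive/negative pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

    Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for interpreting the reported 0.61.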

  3. Perceptual quality prediction on authentically distorted images using a bag of features approach

    PubMed Central

    Ghadiyaram, Deepti; Bovik, Alan C.

    2017-01-01

    Current top-performing blind perceptual image quality prediction models are generally trained on legacy databases of human quality opinion scores on synthetically distorted images. Therefore, they learn image features that effectively predict human visual quality judgments of inauthentic and usually isolated (single) distortions. However, real-world images usually contain complex composite mixtures of multiple distortions. We study the perceptually relevant natural scene statistics of such authentically distorted images in different color spaces and transform domains. We propose a “bag of feature maps” approach that avoids assumptions about the type of distortion(s) contained in an image and instead focuses on capturing consistencies—or departures therefrom—of the statistics of real-world images. Using a large database of authentically distorted images, human opinions of them, and bags of features computed on them, we train a regressor to conduct image quality prediction. We demonstrate the competence of the features toward improving automatic perceptual quality prediction by testing a learned algorithm using them on a benchmark legacy database as well as on a newly introduced distortion-realistic resource called the LIVE In the Wild Image Quality Challenge Database. We extensively evaluate the perceptual quality prediction model and algorithm and show that it is able to achieve good-quality prediction power that is better than other leading models. PMID:28129417

  4. Exponential repulsion improves structural predictability of molecular docking.

    PubMed

    Bazgier, Václav; Berka, Karel; Otyepka, Michal; Banáš, Pavel

    2016-10-30

    Molecular docking is a powerful tool for theoretical prediction of the preferred conformation and orientation of small molecules within protein active sites. The obtained poses can be used for estimation of binding energies, which indicate the inhibition effect of designed inhibitors, and therefore might be used for in silico drug design. However, the evaluation of ligand binding affinity critically depends on successful prediction of the native binding mode. Contemporary docking methods are often based on scoring functions derived from molecular mechanical potentials. In such potentials, nonbonded interactions are typically represented by electrostatic interactions between atom-centered partial charges and standard 6-12 Lennard-Jones potential. Here, we present implementation and testing of a scoring function based on more physically justified exponential repulsion instead of the standard Lennard-Jones potential. We found that this scoring function significantly improved prediction of the native binding modes in proteins bearing narrow active sites such as serine proteases and kinases. © 2016 Wiley Periodicals, Inc.
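
    To make the contrast between the two repulsion models concrete, the sketch below compares the standard 12-6 Lennard-Jones form with a Buckingham-type exp-6 form. The abstract does not give the functional form or parameters used by the authors, so the exp-6 expression, the steepness `alpha = 12` and the reduced units (`eps = r_min = 1`) are assumptions chosen only for illustration.

```python
from math import exp

def lennard_jones(r, eps=1.0, r_min=1.0):
    """12-6 Lennard-Jones energy with well depth eps at separation r_min."""
    x = r_min / r
    return eps * (x ** 12 - 2.0 * x ** 6)

def exp6(r, eps=1.0, r_min=1.0, alpha=12.0):
    """Buckingham-type exp-6 energy; alpha (assumed value) sets the repulsion steepness."""
    repulsion = 6.0 / (alpha - 6.0) * exp(alpha * (1.0 - r / r_min))
    attraction = alpha / (alpha - 6.0) * (r_min / r) ** 6
    return eps * (repulsion - attraction)

# Both forms share the same minimum (-eps at r_min); they differ on the repulsive wall.
for r in (0.8, 0.9, 1.0, 1.2):
    print(f"r={r:.1f}  LJ={lennard_jones(r):8.3f}  exp-6={exp6(r):8.3f}")
```

    With these assumed parameters the exponential wall is softer than the r^-12 wall at moderately short separations, which is one plausible reason such a term could behave differently in tightly packed binding sites.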

  5. Clinical, pathologic, and molecular features of early-onset colorectal carcinoma.

    PubMed

    Yantiss, Rhonda K; Goodarzi, Mahmoud; Zhou, Xi K; Rennert, Hanna; Pirog, Edyta C; Banner, Barbara F; Chen, Yao-Tseng

    2009-04-01

    The incidence of colorectal carcinoma has increased among patients <40 years of age for unclear reasons. In this study, we describe the clinical, pathologic, and molecular features of colorectal carcinomas that developed in young patients. We compiled a study group of 24 patients <40 years of age with colorectal carcinoma, and 45 patients ≥40 years of age served as controls. Cases were evaluated for clinical risk factors of malignancy and pathologic features predictive of outcome. The tumors were immunohistochemically stained for O6-methylguanine methyltransferase, MLH-1, MSH-2, MSH-6, beta-catenin, chemokine (C-X-C motif) receptor 4, epidermal growth factor receptor, TP53, p16, survivin, and alpha-methylacyl-CoA racemase; assessed for microsatellite instability and mutations in beta-catenin, APC, EGFR, PIK3CA, KRAS, and BRAF; evaluated for micro-RNA expression (miR-21, miR-20a, miR-183, miR-192, miR-145, miR-106a, miR-181b, and miR-203); and examined for evidence of human papillomavirus infection. One study patient each had ulcerative colitis and hereditary nonpolyposis colorectal cancer. Ninety-two percent of tumors from young patients occurred in the distal colon (P=0.006), particularly the rectum (58%, P=0.02), and 75% were stage III or IV. Tumors from young patients showed more frequent lymphovascular (81%, P=0.03) and/or venous (48%, P=0.003) invasion, an infiltrative growth pattern (81%, P=0.03), and alpha-methylacyl-CoA racemase expression (83%, P=0.02) compared with controls. Carcinomas in this group showed significantly increased expression of miR-21, miR-20a, miR-145, miR-181b, and miR-203 (P≤0.005 for all comparisons with controls). These results indicate that early-onset carcinomas commonly show pathologic features associated with aggressive behavior. Posttranslational regulation of mRNA and subsequent protein expression may be particularly important to the development of colorectal carcinomas in young patients.

  6. Adaptive prediction model in prospective molecular-signature-based clinical studies

    PubMed Central

    Xiao, Guanghua; Ma, Shuangge; Minna, John; Xie, Yang

    2014-01-01

    Use of molecular profiles and clinical information can help predict which treatment would give the best outcome and survival for each individual patient, and thus guide optimal therapy, which offers great promise for the future of clinical trials and practice. High prediction accuracy is essential for selecting the best treatment plan. The gold standard for evaluating the prediction models is prospective clinical studies, where patients are enrolled sequentially. However, there is no statistical method utilizing this sequential feature to adapt the prediction model to the current patient cohort. In this paper, we proposed a re-weighted random forest (RWRF) model, which updates the weight of each decision tree whenever additional patient information is available, in order to account for the potential heterogeneity between training and testing data. A simulation study and a lung cancer example were used to show that the proposed method can adapt the prediction model to current patients’ characteristics, and therefore improve prediction accuracy significantly. We also showed that the proposed method can identify important and consistent predictive variables. Compared to rebuilding the prediction model, the RWRF updates a well-tested model gradually, and all of the adaptive procedure/parameters used in the RWRF model are pre-specified before patient recruitment, which are important practical advantages for prospective clinical studies. PMID:24323903
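
    The re-weighting idea (keep the trained trees fixed, but adjust each tree's voting weight as outcomes for newly enrolled patients become available) can be sketched as below. The abstract does not state the update rule, so the multiplicative penalty used here, the class name `ReweightedForest` and the toy stump "trees" are all hypothetical stand-ins rather than the RWRF procedure itself.

```python
class ReweightedForest:
    """Weighted-vote ensemble whose tree weights adapt to newly observed outcomes."""

    def __init__(self, trees, penalty=0.5):
        self.trees = list(trees)          # fixed, pre-trained classifiers (callables)
        self.penalty = penalty            # assumed factor applied to a tree that errs
        self.weights = [1.0 / len(self.trees)] * len(self.trees)

    def predict(self, x):
        votes = {}
        for tree, w in zip(self.trees, self.weights):
            label = tree(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)

    def update(self, x, outcome):
        """Down-weight trees that misclassified the new patient, then renormalize."""
        raw = [w * (1.0 if tree(x) == outcome else self.penalty)
               for tree, w in zip(self.trees, self.weights)]
        total = sum(raw)
        self.weights = [w / total for w in raw]

# Toy stumps (hypothetical): one uses feature 0, two use feature 1.
trees = [lambda x: int(x[0] > 0.5), lambda x: int(x[1] > 0.5), lambda x: int(x[1] > 0.5)]
forest = ReweightedForest(trees)
patient = (0.9, 0.1)
before = forest.predict(patient)

# In the "current cohort" the outcome follows feature 0, unlike the training majority.
for x, outcome in [((0.9, 0.1), 1), ((0.2, 0.8), 0)]:
    forest.update(x, outcome)
after = forest.predict(patient)
print(before, after, [round(w, 3) for w in forest.weights])
```

    Because the trees themselves are never refitted, the only quantities that change during the study are the weights, which matches the abstract's point that the adaptive procedure can be fixed before recruitment.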

  7. Adaptive prediction model in prospective molecular signature-based clinical studies.

    PubMed

    Xiao, Guanghua; Ma, Shuangge; Minna, John; Xie, Yang

    2014-02-01

    Use of molecular profiles and clinical information can help predict which treatment would give the best outcome and survival for each individual patient, and thus guide optimal therapy, which offers great promise for the future of clinical trials and practice. High prediction accuracy is essential for selecting the best treatment plan. The gold standard for evaluating the prediction models is prospective clinical studies, in which patients are enrolled sequentially. However, there is no statistical method using this sequential feature to adapt the prediction model to the current patient cohort. In this article, we propose a reweighted random forest (RWRF) model, which updates the weight of each decision tree whenever additional patient information is available, to account for the potential heterogeneity between training and testing data. A simulation study and a lung cancer example are used to show that the proposed method can adapt the prediction model to current patients' characteristics, and, therefore, can improve prediction accuracy significantly. We also show that the proposed method can identify important and consistent predictive variables. Compared with rebuilding the prediction model, the RWRF updates a well-tested model gradually, and all of the adaptive procedure/parameters used in the RWRF model are prespecified before patient recruitment, which are important practical advantages for prospective clinical studies. ©2013 AACR.

  8. DYNAMICS OF ATOMIC AND MOLECULAR EMISSION FEATURES FROM NANOSECOND, FEMTOSECOND LASER AND FILAMENT PRODUCED PLASMAS

    SciTech Connect

    Harilal, Sivanandan S.; Yeak, J.; Brumfield, Brian E.; Phillips, Mark C.

    2016-08-08

    In this presentation, the persistence of atomic and molecular emission features and their relation to fundamental properties (temperature and density) of ablation plumes generated using various irradiation methods (ns, fs, filaments) will be discussed in detail, along with the implications for remote sensing applications.

  9. Identifying ultrasound and clinical features of breast cancer molecular subtypes by ensemble decision

    PubMed Central

    Zhang, Lei; Li, Jing; Xiao, Yun; Cui, Hao; Du, Guoqing; Wang, Ying; Li, Ziyao; Wu, Tong; Li, Xia; Tian, Jiawei

    2015-01-01

    Breast cancer is molecularly heterogeneous and categorized into four molecular subtypes: Luminal-A, Luminal-B, HER2-amplified and Triple-negative. In this study, we aimed to apply an ensemble decision approach to identify the ultrasound and clinical features related to the molecular subtypes. We collected ultrasound and clinical features from 1,000 breast cancer patients and performed immunohistochemistry on these samples. We used the ensemble decision approach to select unique features and to construct decision models. The decision model for Luminal-A subtype was constructed based on the presence of an echogenic halo and post-acoustic shadowing or indifference. The decision model for Luminal-B subtype was constructed based on the absence of an echogenic halo and vascularity. The decision model for HER2-amplified subtype was constructed based on the presence of post-acoustic enhancement, calcification, vascularity and advanced age. The model for Triple-negative subtype followed two rules. One was based on irregular shape, lobulate margin contour, the absence of calcification and hypovascularity, whereas the other was based on oval shape, hypovascularity and micro-lobulate margin contour. The accuracies of the models were 83.8%, 77.4%, 87.9% and 92.7%, respectively. We identified specific features of each molecular subtype and expanded the scope of ultrasound for making diagnoses using these decision models. PMID:26046791

  10. A molecular topology approach to predicting pesticide pollution of groundwater

    USGS Publications Warehouse

    Worrall, Fred

    2001-01-01

    Various models have proposed methods for the discrimination of polluting and nonpolluting compounds on the basis of simple parameters, typically adsorption and degradation constants. However, such attempts are prone to site variability and measurement error to the extent that compounds cannot be reliably classified nor the chemistry of pollution extrapolated from them. Using observations of pesticide occurrence in U.S. groundwater it is possible to show that polluting compounds can be distinguished from nonpolluting compounds purely on the basis of molecular topology. Topological parameters can be derived without measurement error or site-specific variability. A logistic regression model has been developed which explains 97% of the variation in the data, with 86% of the variation being explained by the rule that a compound will be found in groundwater if 6χp < 0.55, where 6χp is the sixth-order molecular path connectivity. One group of compounds cannot be classified by this rule and prediction requires reference to higher order connectivity parameters. The use of molecular approaches for understanding pollution at the molecular level and their application to agrochemical development and risk assessment is discussed.
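
    As an illustration of how a path connectivity index can be computed from molecular topology alone, the sketch below enumerates simple paths in a hydrogen-suppressed molecular graph and sums the product of inverse square-root vertex degrees along each path (the simple, non-valence Kier-Hall form). Whether the study used this simple form or a valence-corrected variant is not stated in the abstract, so that choice is an assumption; `path_connectivity` and the toy alkane graphs are hypothetical.

```python
from math import sqrt

def path_connectivity(adjacency, order):
    """Sum over simple paths with `order` edges of prod(degree(v) ** -0.5) over path vertices."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    total = 0.0

    def extend(path):
        nonlocal total
        if len(path) == order + 1:
            if path[0] < path[-1]:        # each undirected path is found twice; count once
                term = 1.0
                for v in path:
                    term /= sqrt(degree[v])
                total += term
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:           # simple paths only: no repeated atoms
                extend(path + [nxt])

    for start in adjacency:
        extend([start])
    return total

def chain(n):
    """Hydrogen-suppressed graph of an unbranched chain of n heavy atoms."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

butane, octane = chain(4), chain(8)
print(f"butane 1-chi = {path_connectivity(butane, 1):.4f}")
chi6 = path_connectivity(octane, 6)
# Threshold taken from the abstract's rule; octane is only a toy graph, not a pesticide.
print(f"octane 6-chi = {chi6:.4f}, below 0.55: {chi6 < 0.55}")
```

    The index depends only on the bond graph, which is the property the abstract highlights: it carries no measurement error and no site-specific variability.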

  11. Reliable prediction of adsorption isotherms via genetic algorithm molecular simulation.

    PubMed

    LoftiKatooli, L; Shahsavand, A

    2017-01-01

    Conventional molecular simulation techniques such as grand canonical Monte Carlo (GCMC) strictly rely on purely random search inside the simulation box for predicting the adsorption isotherms. This blind search is usually extremely time demanding for providing a faithful approximation of the real isotherm and in some cases may lead to non-optimal solutions. A novel approach is presented in this article which does not use any of the classical steps of the standard GCMC method, such as displacement, insertion, and removal. The new approach is based on the well-known genetic algorithm to find the optimal configuration for adsorption of any adsorbate on a structured adsorbent under prevailing pressure and temperature. The proposed approach considers the molecular simulation problem as a global optimization challenge. A detailed flow chart of our so-called genetic algorithm molecular simulation (GAMS) method is presented, which is entirely different from traditional molecular simulation approaches. Three real case studies (for adsorption of CO2 and H2 over various zeolites) are borrowed from literature to clearly illustrate the superior performances of the proposed method over the standard GCMC technique. For the present method, the average absolute values of percentage errors are around 11% (RHO-H2), 5% (CHA-CO2), and 16% (BEA-CO2), while they were about 70%, 15%, and 40% for the standard GCMC technique, respectively.

  12. TU-CD-BRB-01: Normal Lung CT Texture Features Improve Predictive Models for Radiation Pneumonitis

    SciTech Connect

    Krafft, S; Briere, T; Court, L; Martel, M

    2015-06-15

    Purpose: Existing normal tissue complication probability (NTCP) models for radiation pneumonitis (RP) traditionally rely on dosimetric and clinical data but are limited in terms of performance and generalizability. Extraction of pre-treatment image features provides a potential new category of data that can improve NTCP models for RP. We consider quantitative measures of total lung CT intensity and texture in a framework for prediction of RP. Methods: Available clinical and dosimetric data was collected for 198 NSCLC patients treated with definitive radiotherapy. Intensity- and texture-based image features were extracted from the T50 phase of the 4D-CT acquired for treatment planning. A total of 3888 features (15 clinical, 175 dosimetric, and 3698 image features) were gathered and considered candidate predictors for modeling of RP grade≥3. A baseline logistic regression model with mean lung dose (MLD) was first considered. Additionally, a least absolute shrinkage and selection operator (LASSO) logistic regression was applied to the set of clinical and dosimetric features, and subsequently to the full set of clinical, dosimetric, and image features. Model performance was assessed by comparing area under the curve (AUC). Results: A simple logistic fit of MLD was an inadequate model of the data (AUC∼0.5). Including clinical and dosimetric parameters within the framework of the LASSO resulted in improved performance (AUC=0.648). Analysis of the full cohort of clinical, dosimetric, and image features provided further and significant improvement in model performance (AUC=0.727). Conclusions: To achieve significant gains in predictive modeling of RP, new categories of data should be considered in addition to clinical and dosimetric features. We have successfully incorporated CT image features into a framework for modeling RP and have demonstrated improved predictive performance. Validation and further investigation of CT image features in the context of RP NTCP

  13. Coarse-grained molecular dynamics simulations linking molecular features of polycations to polycation-polyanion complexation for gene delivery

    NASA Astrophysics Data System (ADS)

    McLeland, Anna; Johnson, Daniel; Jayaraman, Arthi

    2014-03-01

    Gene therapy is a method involving transfection or delivery of therapeutic DNA to target cells for expression of proteins that can cure diseases. Polycations have shown tremendous potential as DNA delivery vectors because the positive charges along the polycation interact with the negatively charged DNA backbone to form a polyplex that protects and transfects the DNA. Past work has shown that the structure and chemistry of the polycation affects DNA transfection efficiency. In this work, we use coarse grained models that are mapped from atomistic simulations, along with molecular dynamics simulations to study the binding of polycations and polyanions into polyplexes. We characterize the structure, surface composition and shape of the polyplex, features that impact DNA delivery, as a function of polycation chemistry, architecture (linear versus grafted), and molecular weight. The results from these simulations serve as valuable guidelines for experimentalists on what molecular characteristics they need to incorporate in the polycations to achieve higher transfection efficiency.

  14. Established and emerging variants of glioblastoma multiforme: review of morphological and molecular features.

    PubMed

    Karsy, Michael; Gelbman, Marshall; Shah, Paarth; Balumbu, Odessa; Moy, Fred; Arslan, Erol

    2012-01-01

    Since the recent publication of the World Health Organization brain tumour classification guidelines in 2007, a significant expansion in the molecular understanding of glioblastoma multiforme (GBM) and its pathological as well as genomic variants has been evident. The purpose of this review article is to evaluate the histopathological, molecular and clinical features surrounding emerging and currently established GBM variants. The tumours discussed include classic glioblastoma multiforme and its four genomic variants, proneural, neural, mesenchymal, classical, as well as gliosarcoma (GS), and giant cell GBM (gcGBM). Furthermore, the emerging variants include fibrillary/epithelial GBM, small cell astrocytoma (SCA), GBM with oligodendroglial component (GBMO), GBM with primitive neuroectodermal features (GBM-PNET), gemistocytic astrocytoma (GA), granular cell astrocytoma (GCA), and paediatric high-grade glioma (HGG) as well as diffuse intrinsic pontine glioma (DIPG). Better understanding of the heterogeneous nature of GBM may provide improved treatment paradigms, prognostic classification, and approaches towards molecularly targeted treatments.

  15. Using molecular equivalence numbers to visually explore structural features that distinguish chemical libraries.

    PubMed

    Xu, Yong-Jin; Johnson, Mark

    2002-01-01

    A molecular equivalence number (meqnum) classifies a molecule with respect to a class of structural features or topological shapes such as its cyclic system or its set of functional groups. Meqnums can be used to organize molecular structures into nonoverlapping, yet highly relatable classes. We illustrate the construction of some different types of meqnums and present via examples some methods of comparing diverse chemical libraries based on meqnums. In the examples we compare a library which is a random sample from the MDL Drug Data Report (MDDR) with a library which is a random sample from the Available Chemical Directory (ACD). In our analyses, we discover some interesting features of the topological shape of a molecule and its set of functional groups that are strongly linked with compounds occurring in the MDDR but not in the ACD. We also illustrate the utility of molecular equivalence indices in delineating the structural domain over which an SAR conclusion is valid.

  16. Predictive testing of early CIN behaviour by molecular biomarkers.

    PubMed

    Baak, Jan P A; Kruse, Arnold-Jan; Janssen, Emiel; van Diermen, Bianca

    2005-01-01

    Each year, 330,000 new Cervical Intraepithelial Neoplasias (CIN) occur in the European Union (EU) of which 120,000 are early CIN where grade (1, 2) indicates the progression-risk to CIN-3 and therefore determines the treatment choice. However, the Positive Predictive Value (PPV) of CIN grade to predict progression is low (10% and 20% for CIN-1 and -2 respectively, 16% on average) resulting in an enormous number of over-treatments indicating worrisome grade reproducibility. Certain molecular biomarkers such as Ki-67 have a higher PPV (30%, an improvement of 14%), which in Europe alone could improve treatment for many thousands of women per year with considerable cost reduction for the health care system. The quantitative Ki-67 prognostic model has been validated in independent retrospective and prospective studies from different laboratories. Moreover, the PPV of Ki-67 alone can be improved by additional molecular biomarkers (retinoblastoma protein = Rb, cytokeratins = CK-14/-13). Combined Ki67-Rb allows a 2-tiered progression-risk subgroup assignment as very low (approximately 0% progression, 71% of all CIN-I/II patients) and high risk (48% progression risk, incidence 32%), leaving a small (7% of all) prognostically undetermined group (17% progression). Additional CK-14 and -13 analysis can sub-classify the high-risk in an intermediate and very high risk subgroup (with 40% and 100% progression risks respectively). Thus, molecular biomarkers are potentially important determinants of early CIN lesion behaviour. Important factors for widespread acceptance of molecular biomarkers are (1) market penetration by user-friendly equipment, (2) (inter)national keeping of GLP conditions (reproducibility, independent validation), requiring customer-driven industrial efforts, governmental measures, and additional PPV improvement to further reduce over-treatment.

  17. Predictive and Prognostic Molecular Biomarkers for Response to Neoadjuvant Chemoradiation in Rectal Cancer

    PubMed Central

    Dayde, Delphine; Tanaka, Ichidai; Jain, Rekha; Tai, Mei Chee; Taguchi, Ayumu

    2017-01-01

    The standard of care in locally advanced rectal cancer is neoadjuvant chemoradiation (nCRT) followed by radical surgery. Response to nCRT varies among patients and pathological complete response is associated with better outcome. However, there is a lack of effective methods to select rectal cancer patients who would or would not benefit from nCRT. The utility of clinicopathological and radiological features is limited due to lack of adequate sensitivity and specificity. Molecular biomarkers have the potential to predict response to nCRT at an early time point, but none have currently reached the clinic. Integration of diverse types of biomarkers including clinicopathological and imaging features, identification of mechanistic link to tumor biology, and rigorous validation using samples which represent disease heterogeneity, will allow the development of a sensitive and cost-effective molecular biomarker panel for precision medicine in rectal cancer. Here, we aim to review recent advances in tissue- and blood-based molecular biomarker research and illustrate their potential in predicting nCRT response in rectal cancer. PMID:28272347

  18. Predictive and Prognostic Molecular Biomarkers for Response to Neoadjuvant Chemoradiation in Rectal Cancer.

    PubMed

    Dayde, Delphine; Tanaka, Ichidai; Jain, Rekha; Tai, Mei Chee; Taguchi, Ayumu

    2017-03-07

    The standard of care in locally advanced rectal cancer is neoadjuvant chemoradiation (nCRT) followed by radical surgery. Response to nCRT varies among patients and pathological complete response is associated with better outcome. However, there is a lack of effective methods to select rectal cancer patients who would or would not benefit from nCRT. The utility of clinicopathological and radiological features is limited due to lack of adequate sensitivity and specificity. Molecular biomarkers have the potential to predict response to nCRT at an early time point, but none have currently reached the clinic. Integration of diverse types of biomarkers including clinicopathological and imaging features, identification of mechanistic link to tumor biology, and rigorous validation using samples which represent disease heterogeneity, will allow the development of a sensitive and cost-effective molecular biomarker panel for precision medicine in rectal cancer. Here, we aim to review recent advances in tissue- and blood-based molecular biomarker research and illustrate their potential in predicting nCRT response in rectal cancer.

  19. The identification of molecular surfaces' feature regions based on spherical mapping

    NASA Astrophysics Data System (ADS)

    Zhang, Meiling; Zhang, Jingqiao

    2017-02-01

    As possible active sites, the concave and convex feature regions of a molecule are the locations where molecular docking is most likely to occur, so finding those regions is a worthwhile problem. In this paper, a new method is proposed for identifying concave and convex regions. Based on the established spherical mapping between the molecular surface and its bounding-sphere surface, the concave and convex vertices of local areas can be determined according to the expansion distance defined by the spherical mapping. Then, through mesh growing, a feature region is formed from a concave or convex point, also called the center point, and its neighboring faces whose normal vectors make an angle within a specified range with the center point. After that, the areas and volumes of the feature regions are calculated. The experimental results indicate that the method identifies the concave and convex characteristics of the molecule well.

  20. Dynamics of Molecular Emission Features from Nanosecond, Femtosecond Laser and Filament Ablation Plasmas

    SciTech Connect

    Harilal, Sivanandan S.; Yeak, J.; Brumfield, Brian E.; Suter, Jonathan D.; Phillips, Mark C.

    2016-06-15

    The evolutionary paths of molecular species and nanoparticles in laser ablation plumes are not well understood due to the complexity of numerous physical processes that occur simultaneously in a transient laser-plasma system. It is well known that the emission features of ions, atoms, molecules and nanoparticles in a laser ablation plume strongly depend on the laser irradiation conditions. In this letter we report the temporal emission features of AlO molecules in plasmas generated using a nanosecond laser, a femtosecond laser and filaments generated from a femtosecond laser. Our results show that, at a fixed laser energy, the persistence of AlO is found to be highest and lowest in ns and filament laser plasmas respectively while molecular species are formed at early times for both ultrashort pulse (fs and filament) generated plasmas. Analysis of the AlO emission band features show that the vibrational temperature of AlO decays rapidly in filament assisted laser ablation plumes.

  1. Prediction of protein secondary structure using probability based features and a hybrid system.

    PubMed

    Ghanty, Pradip; Pal, Nikhil R; Mudi, Rajani K

    2013-10-01

    In this paper, we propose some co-occurrence probability-based features for prediction of protein secondary structure. The features are extracted using occurrence/nonoccurrence of secondary structures in the protein sequences. We explore two types of features: position-specific (based on position of amino acid on fragments of protein sequences) as well as position-independent (independent of amino acid position on fragments of protein sequences). We use a hybrid system, NEUROSVM, consisting of neural networks and support vector machines for classification of secondary structures. We propose two schemes NSVMps and NSVM for protein secondary structure prediction. The NSVMps uses position-specific probability-based features and NEUROSVM classifier whereas NSVM uses the same classifier with position-independent probability-based features. The proposed method falls in the single-sequence category of methods because it does not use any sequence profile information such as position specific scoring matrices (PSSM) derived from PSI-BLAST. Two widely used datasets RS126 and CB513 are used in the experiments. The results obtained using the proposed features and NEUROSVM classifier are better than most of the existing single-sequence prediction methods. Most importantly, the results using NSVMps that are obtained using lower dimensional features, are comparable to those by other existing methods. The NSVMps and NSVM are finally tested on target proteins of the critical assessment of protein structure prediction experiment-9 (CASP9). A larger dataset is used to compare the performance of the proposed methods with that of two recent single-sequence prediction methods. We also investigate the impact of presence of different amino acid residues (in protein sequences) that are responsible for the formation of different secondary structures.

  2. Early prediction of clinical benefit of treating ovarian cancer using quantitative CT image feature analysis.

    PubMed

    Qiu, Yuchen; Tan, Maxine; McMeekin, Scott; Thai, Theresa; Ding, Kai; Moore, Kathleen; Liu, Hong; Zheng, Bin

    2016-09-01

    In current clinical trials of treating ovarian cancer patients, how to accurately predict patients' response to the chemotherapy at an early stage remains an important and unsolved challenge. This study investigated the feasibility of applying a new quantitative image analysis method for predicting early response of ovarian cancer patients to chemotherapy in clinical trials. A dataset of 30 patients was retrospectively selected in this study, among which 12 were responders with 6-month progression-free survival (PFS) and 18 were non-responders. A computer-aided detection scheme was developed to segment tumors depicted on two sets of CT images acquired pre-treatment and 4-6 weeks post treatment. The scheme computed changes of three image features related to the tumor volume, density, and density variance. We analyzed performance of using each image feature and applying a decision tree to predict patients' 6-month PFS. The prediction accuracy of using quantitative image features was also compared with the clinical record based on the Response Evaluation Criteria in Solid Tumors (RECIST) guideline. The areas under receiver operating characteristic curve (AUC) were 0.773 ± 0.086, 0.680 ± 0.109, and 0.668 ± 0.101, when using each of three features, respectively. AUC value increased to 0.831 ± 0.078 when combining these features together. The decision-tree classifier achieved a higher predicting accuracy (76.7%) than using RECIST guideline (60.0%). This study demonstrated the potential of using a quantitative image feature analysis method to improve accuracy of predicting early response of ovarian cancer patients to the chemotherapy in clinical trials. © The Foundation Acta Radiologica 2015.

  3. Scoring multiple features to predict drug disease associations using information fusion and aggregation.

    PubMed

    Moghadam, H; Rahgozar, M; Gharaghani, S

    2016-08-01

    Prediction of drug-disease associations is one of the current fields in drug repositioning that has turned into a challenging topic in pharmaceutical science. Several available computational methods use network-based and machine learning approaches to reposition old drugs for new indications. However, they often ignore features of drugs and diseases as well as the priority and importance of each feature, relation, or interactions between features and the degree of uncertainty. When predicting unknown drug-disease interactions there are diverse data sources and multiple features available that can provide more accurate and reliable results. This information can be collectively mined using data fusion methods and aggregation operators. Therefore, we can use the feature fusion method to make high-level features. We have proposed a computational method named scored mean kernel fusion (SMKF), which uses a new method to score the average aggregation operator called scored mean. To predict novel drug indications, this method systematically combines multiple features related to drugs or diseases at two levels: the drug-drug level and the drug-disease level. The purpose of this study was to investigate the effect of drug and disease features as well as data fusion to predict drug-disease interactions. The method was validated against a well-established drug-disease gold-standard dataset. When compared with the available methods, our proposed method outperformed them and competed well in performance with area under the curve (AUC) of 0.91, F-measure of 84.9% and Matthews correlation coefficient of 70.31%.
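
    For readers unfamiliar with two of the metrics reported above, the sketch below computes the F-measure and the Matthews correlation coefficient from a binary confusion matrix using their standard definitions. The counts are made-up toy values, not results from the study, and the function names are hypothetical.

```python
from math import sqrt

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy confusion matrix (hypothetical counts).
tp, tn, fp, fn = 8, 7, 2, 3
print(f"F-measure = {f_measure(tp, fp, fn):.4f}")
print(f"MCC       = {matthews_corrcoef(tp, tn, fp, fn):.4f}")
```

    Unlike the F-measure, the Matthews coefficient also uses the true-negative count, so the two metrics can diverge on imbalanced datasets.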

  4. Adaptive reliance on the most stable sensory predictions enhances perceptual feature extraction of moving stimuli

    PubMed Central

    Kumar, Neeraj

    2016-01-01

    The prediction of the sensory outcomes of action is thought to be useful for distinguishing self- vs. externally generated sensations, correcting movements when sensory feedback is delayed, and learning predictive models for motor behavior. Here, we show that aspects of another fundamental function—perception—are enhanced when they entail the contribution of predicted sensory outcomes and that this enhancement relies on the adaptive use of the most stable predictions available. We combined a motor-learning paradigm that imposes new sensory predictions with a dynamic visual search task to first show that perceptual feature extraction of a moving stimulus is poorer when it is based on sensory feedback that is misaligned with those predictions. This was possible because our novel experimental design allowed us to override the “natural” sensory predictions present when any action is performed and separately examine the influence of these two sources on perceptual feature extraction. We then show that if the new predictions induced via motor learning are unreliable, rather than just relying on sensory information for perceptual judgments, as is conventionally thought, then subjects adaptively transition to using other stable sensory predictions to maintain greater accuracy in their perceptual judgments. Finally, we show that when sensory predictions are not modified at all, these judgments are sharper when subjects combine their natural predictions with sensory feedback. Collectively, our results highlight the crucial contribution of sensory predictions to perception and also suggest that the brain intelligently integrates the most stable predictions available with sensory information to maintain high fidelity in perceptual decisions. PMID:26823516

  5. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree), and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one feature at random from among many correlated features. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can

  6. The role of affective instability and impulsivity in predicting future BPD features.

    PubMed

    Tragesser, Sarah L; Solhan, Marika; Schwartz-Mette, Rebecca; Trull, Timothy J

    2007-12-01

    Models of borderline personality disorder (BPD) suggest that extreme levels of affective instability/emotional dysregulation, impulsivity, or the combination of these two traits account for the symptoms characteristic of BPD. The present study utilized longitudinal data to evaluate the ability of Personality Assessment Inventory-Borderline Features (PAI-BOR; Morey, 1991) subscale scores to predict BPD features two years later as a test of these models of BPD. Participants were 156 male and 194 female young adults who completed the PAI-BOR at age 18 and again two years later. Three models were compared: (a) Wave 1 affective instability scores predicting Wave 2 BPD features (AI model); (b) Wave 1 self-harm/impulsivity scores predicting Wave 2 BPD features (IMP model); and (c) both Wave 1 affective instability and self-harm/impulsivity scores predicting Wave 2 BPD features (AI-IMP model), all controlling for stabilities and within-time covariances. Results indicated that the AI model provided the best fit to the data, and improved model fit over a baseline stabilities model and the other models tested. These results are consistent with Linehan's theory (1993) that emotional dysregulation drives the other BPD symptoms.

  7. BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection.

    PubMed

    Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Hazrati, Mehrnaz Khodam; Kalies, Kai-Uwe; Martinetz, Thomas

    2011-08-17

    Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is challenging due to their poor sequence similarity. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. BLProt achieves 80% accuracy in training (5-fold cross-validation) and 80.06% accuracy in testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. The BLProt software is available at http://www.inb.uni-luebeck.de/tools-demos/bioluminescent%20protein/BLProt.

  8. Leveraging external knowledge on molecular interactions in classification methods for risk prediction of patients.

    PubMed

    Porzelius, Christine; Johannes, Marc; Binder, Harald; Beissbarth, Tim

    2011-03-01

    Classification of patients based on molecular markers, for example into different risk groups, is a modern field in medical research. The aim of this classification is often a better diagnosis or individualized therapy. The search for molecular markers often utilizes extremely high-dimensional data sets (e.g. gene-expression microarrays). However, in situations where the number of measured markers (genes) is intrinsically higher than the number of available patients, standard methods from statistical learning fail to deal correctly with this so-called "curse of dimensionality". Feature or dimension reduction techniques based on statistical models also promise only limited success. Several recent methods explore ideas of how to quantify and incorporate biological prior knowledge of molecular interactions and known cellular processes into the feature selection process. This article aims to give an overview of such current methods as well as the databases from which this external knowledge can be obtained. For illustration, two recent methods are compared in detail: a feature selection approach for support vector machines and a boosting approach for regression models. As a practical example, data on patients with acute lymphoblastic leukemia are considered, where the binary endpoint "relapse within first year" should be predicted. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Applying Quantitative CT Image Feature Analysis to Predict Response of Ovarian Cancer Patients to Chemotherapy.

    PubMed

    Danala, Gopichandh; Thai, Theresa; Gunderson, Camille C; Moxley, Katherine M; Moore, Kathleen; Mannel, Robert S; Liu, Hong; Zheng, Bin; Qiu, Yuchen

    2017-10-01

    The study aimed to investigate the role of applying quantitative image features computed from computed tomography (CT) images for early prediction of tumor response to chemotherapy in the clinical trials for treating ovarian cancer patients. A dataset involving 91 patients was retrospectively assembled. Each patient had two sets of pre- and post-therapy CT images. A computer-aided detection scheme was applied to segment metastatic tumors previously tracked by radiologists on CT images and computed image features. Two initial feature pools were built using image features computed from pre-therapy CT images only and image feature difference computed from both pre- and post-therapy images. A feature selection method was applied to select optimal features, and an equal-weighted fusion method was used to generate a new quantitative imaging marker from each pool to predict 6-month progression-free survival. The prediction accuracy between quantitative imaging markers and the Response Evaluation Criteria in Solid Tumors (RECIST) criteria was also compared. The highest areas under the receiver operating characteristic curve are 0.684 ± 0.056 and 0.771 ± 0.050 when using a single image feature computed from pre-therapy CT images and feature difference computed from pre- and post-therapy CT images, respectively. Using two corresponding fusion-based image markers, the areas under the receiver operating characteristic curve significantly increased to 0.810 ± 0.045 and 0.829 ± 0.043 (P < 0.05), respectively. Overall prediction accuracy levels are 71.4%, 80.2%, and 74.7% when using two imaging markers and RECIST, respectively. This study demonstrated the feasibility of predicting patients' response to chemotherapy using quantitative imaging markers computed from pre-therapy CT images. However, using image feature difference computed between pre- and post-therapy CT images yielded higher prediction accuracy. Copyright © 2017 The Association of University

  10. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    PubMed Central

    2011-01-01

    Background: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction is increasingly important for understanding the essence of protein interactions and for narrowing down the search space for drug design. Many computational methods have been developed by proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed. Results: In this study, we first comprehensively collect the features that discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion: Experimental results show that support vector machine classifiers are quite

  11. MRI signal and texture features for the prediction of MCI to Alzheimer's disease progression

    NASA Astrophysics Data System (ADS)

    Martínez-Torteya, Antonio; Rodríguez-Rojas, Juan; Celaya-Padilla, José M.; Galván-Tejada, Jorge I.; Treviño, Victor; Tamez-Peña, José G.

    2014-03-01

    An early diagnosis of Alzheimer's disease (AD) confers many benefits. Several biomarkers from different information modalities have been proposed for the prediction of MCI to AD progression, where features extracted from MRI have played an important role. However, studies have focused almost exclusively on the morphological characteristics of the images. This study aims to determine whether features relating to the signal and texture of the image could add predictive power. Baseline clinical, biological and PET information, and MP-RAGE images for 62 subjects from the Alzheimer's Disease Neuroimaging Initiative were used in this study. Images were divided into 83 regions, and 50 features were extracted from each of these. A multimodal database was constructed, and a feature selection algorithm was used to obtain an accurate and small logistic regression model, which achieved a cross-validation accuracy of 0.96. This model included six features, five of them obtained from the MP-RAGE image and one obtained from genotyping. A risk analysis divided the subjects into low-risk and high-risk groups according to a prognostic index, showing that the two groups are statistically different (p-value of 2.04e-11). The results demonstrate that MRI features related to both signal and texture add MCI to AD predictive power, and support the idea that multimodal biomarkers outperform single-modality biomarkers.

  12. Predicting the types of metabolic pathway of compounds using molecular fragments and sequential minimal optimization.

    PubMed

    Chen, Lei; Chu, Chen; Feng, Kaiyan

    2016-01-01

    A metabolic pathway is a series of biological processes that provides the molecules and energy an organism needs and can be essential to its life. Most metabolic pathways involve compounds, and for a given compound it is helpful to know which types of metabolic pathways it participates in. In this study, compounds are first represented by molecular fragments, which are then delivered to a prediction engine called Sequential Minimal Optimization (SMO) for predictions. Maximum relevance and minimum redundancy (mRMR) and incremental feature selection are adopted to extract key features, based on which an optimal prediction engine is established. The proposed method is effective compared with random forest, Dagging, and a popular method that integrates chemical-chemical interactions and chemical-chemical similarities. We also make predictions using some compounds with unknown metabolic pathways and choose 17 compounds for analysis. The results indicate that the proposed method may become a useful tool in predicting and analyzing metabolic pathways.

  13. Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.

    PubMed

    Liu, Zhi-Ping; Chen, Luonan

    2016-01-01

    Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time-consuming and labor-intensive. Thus, it is important to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognition will greatly help to decipher the interaction mechanisms between protein and RNA, as well as to improve RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies for predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from a structural complementarity perspective.

  14. Dopamine Neurons Respond to Errors in the Prediction of Sensory Features of Expected Rewards.

    PubMed

    Takahashi, Yuji K; Batchelor, Hannah M; Liu, Bing; Khanna, Akash; Morales, Marisela; Schoenbaum, Geoffrey

    2017-09-13

    Midbrain dopamine neurons have been proposed to signal prediction errors as defined in model-free reinforcement learning algorithms. While these algorithms have been extremely powerful in interpreting dopamine activity, these models do not register any error unless there is a difference between the value of what is predicted and what is received. Yet learning often occurs in response to changes in the unique features that characterize what is received, sometimes with no change in its value at all. Here, we show that classic error-signaling dopamine neurons also respond to changes in value-neutral sensory features of an expected reward. This suggests that dopamine neurons have access to a wider variety of information than contemplated by the models currently used to interpret their activity and that, while their firing may conform to predictions of these models in some cases, they are not restricted to signaling errors in the prediction of value. Published by Elsevier Inc.

  15. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. Two main aspects of ab initio gene prediction are the computed values used to describe sequence features and the algorithm used to train the discriminant function; different combinations of these are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.

  16. Quantitative prediction of drug side effects based on drug-related features.

    PubMed

    Niu, Yanqing; Zhang, Wen

    2017-09-01

    Unexpected side effects of drugs are a great concern in drug development, and the identification of side effects is an important task. Recently, machine learning methods have been proposed to predict the presence or absence of side effects of interest for drugs, but it is difficult to make accurate predictions for all of them. In this paper, we transform the side effect profiles of drugs into quantitative scores by summing up their side effects with weights. The quantitative scores may measure the dangers of drugs, and thus help to compare the risks of different drugs. Here, we attempt to predict the quantitative scores of drugs, namely the quantitative prediction. Specifically, we explore a variety of drug-related features and evaluate their discriminative power for the quantitative prediction. Then, we consider several feature combination strategies (direct combination, average scoring ensemble combination) to integrate three informative features: chemical substructures, targets, and treatment indications. Finally, the average scoring ensemble model, which produces the better performance, is used as the final quantitative prediction model. Since the weights for side effects are empirical values, we randomly generate different weights in the simulation experiments. The experimental results show that the quantitative method is robust to different weights and produces satisfactory results. Although other state-of-the-art methods cannot make the quantitative prediction directly, their prediction results can be transformed into quantitative scores. By indirect comparison, the proposed method produces much better results than benchmark methods in the quantitative prediction. In conclusion, the proposed method is promising for the quantitative prediction of side effects, and may work cooperatively with existing state-of-the-art methods to reveal the dangers of drugs.
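
    The two ideas in this abstract, a weighted sum over a drug's side effects and an average scoring ensemble over per-feature models, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, side effect names, weights, and predicted values are all hypothetical and are not taken from the paper.

```python
def quantitative_score(side_effects, weights):
    """Weighted sum over the side effects recorded for one drug."""
    return sum(weights[name] for name in side_effects)


def average_scoring_ensemble(predictions_per_feature):
    """Average the scores predicted by separate per-feature models."""
    return sum(predictions_per_feature) / len(predictions_per_feature)


# Hypothetical weights and profile, chosen only for illustration.
weights = {"nausea": 1.0, "rash": 2.0, "liver injury": 5.0}
drug_profile = ["nausea", "liver injury"]
print(quantitative_score(drug_profile, weights))

# Hypothetical predictions from models built on chemical substructures,
# targets, and treatment indications, respectively.
print(average_scoring_ensemble([5.5, 6.5, 6.0]))
```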

  17. Prediction of pesticides chromatographic lipophilicity from the computational molecular descriptors.

    PubMed

    Casoni, Dorina; Petre, Jana; David, Victor; Sârbu, Costel

    2011-02-01

    Quantitative structure-property relationship models were developed to predict the lipophilicity of pesticides and some PAH compounds, based on a wide set of computational molecular descriptors and a set of experimental chromatographic data. The chromatographic lipophilicity of pesticides was evaluated by high-performance liquid chromatography (HPLC) using different chemically bonded stationary phases (C18, C8, CN and Phenyl HPLC columns) and two different organic modifiers (methanol and acetonitrile) in the mobile phase composition. Through a systematic study using classic multivariate analysis, several quantitative structure-property/lipophilicity multi-dimensional models were established. Multiple linear regression and a genetic algorithm for variable subset selection were used. The internal and external statistical evaluation procedures revealed some appropriate models for the chromatographic lipophilicity prediction of pesticides. Moreover, the statistical parameters of regression, together with those obtained by applying a t-test to the intercept (a(0)) and the slope (a(1)) to evaluate the relationship between experimental and predicted octanol-water partition coefficients for the test set compounds, revealed two statistically valid models that can be successfully used in the lipophilicity prediction of pesticides.

  18. Combining multiple ECG features does not improve prediction of defibrillation outcome compared to single features in a large population of out-of-hospital cardiac arrests.

    PubMed

    He, Mi; Gong, Yushun; Li, Yongqin; Mauri, Tommaso; Fumagalli, Francesca; Bozzola, Marcella; Cesana, Giancarlo; Latini, Roberto; Pesenti, Antonio; Ristagno, Giuseppe

    2015-12-10

    Quantitative electrocardiographic (ECG) waveform analysis provides a noninvasive reflection of the metabolic milieu of the myocardium during resuscitation and is a potentially useful tool to optimize the defibrillation strategy. However, whether combining multiple ECG features can improve the capability of defibrillation outcome prediction in comparison to single feature analysis is still uncertain. A total of 3828 defibrillations from 1617 patients who experienced out-of-hospital cardiac arrest were analyzed. A 2.048-s ECG trace prior to each defibrillation without chest compressions was used for the analysis. Sixteen predictive features were optimized on the training dataset, which included 2447 shocks from 1050 patients. Logistic regression, neural network and support vector machine were used to combine multiple features for the prediction of defibrillation outcome. The performance of single and combined predictive features was compared by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and prediction accuracy (PA) on a validation dataset that consisted of 1381 shocks from 567 patients. Among the single features, mean slope (MS) outperformed other methods with an AUC of 0.876. Combination of complementary features using neural network resulted in the highest AUC of 0.874 among the multifeature-based methods. Compared to MS, no statistical difference was observed in AUC, sensitivity, specificity, PPV, NPV and PA when multiple features were considered. In this large dataset, the amplitude-related features achieved better defibrillation outcome prediction capability than other features. Combinations of multiple electrical features did not further improve prediction performance.
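
    The threshold-based measures named in this abstract (sensitivity, specificity, PPV, NPV, and prediction accuracy) all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name and the shock counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic measures from confusion-matrix counts.

    tp/fn: successful shocks predicted as success/failure
    tn/fp: failed shocks predicted as failure/success
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for illustration only.
for name, value in diagnostic_metrics(tp=40, fn=10, tn=30, fp=20).items():
    print(f"{name}: {value:.3f}")
```

    Because PPV and NPV depend on how common successful shocks are in the dataset, they are not directly comparable across cohorts with different success rates.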

  19. Labour induction at term: clinical, biophysical and molecular predictive factors.

    PubMed

    Riboni, Francesca; Garofalo, Greta; Pascoli, Irene; Vitulo, Anna; Dell'avanzo, Marinella; Battagliarin, Giuseppe; Paternoster, Delia

    2012-11-01

    The aim of this multicentric study is to compare clinical, biophysical and molecular parameters in the prediction of the success of labour induction with prostaglandins. We included 115 women who underwent labour induction at term with vaginal prostaglandin gel. We evaluated the diagnostic efficiency of endocervical phosphorylated insulin-like growth factor-binding protein (phIGFBP-1) and cervicovaginal interleukins 6 (IL-6) and 8 (IL-8). We analyzed the transvaginal sonographic measurement of cervical length. A receiver-operating characteristics (ROC) curve was used to determine the most useful cut-off point. A multivariate logistic regression model was used to analyze the combination of significant predictive variables following univariate analysis. We analyzed all the data, searching for the parameters that best predict the beginning of the active phase of labour within 12 h. Of the patients, 36.5% delivered within 12 h. The Bishop score was >4 in 43% of patients with an active phase. The best cut-off values at ROC curves for cervical length, IL-6 and IL-8 were respectively 22 mm, 5 mg/dl and 20,237 mg/dl. At univariate analysis, all predictors of success, with the exception of IL-6, were significantly associated with the beginning of the active phase. Multivariate analysis of the Bishop score (OR 2.3), phIGFBP-1 test (OR 11.2) and IL-8 (OR 6.6) showed that the variables were independent and therefore useful in combination to predict the success of labour induction. The phIGFBP-1 test is a fast and easy test that can be used with the Bishop score and IL-8 to reach a high positive predictive value in the prediction of the success of labour induction with prostaglandins.

  20. Health Communication in Social Media: Message Features Predicting User Engagement on Diabetes-Related Facebook Pages.

    PubMed

    Rus, Holly M; Cameron, Linda D

    2016-10-01

    Social media provides unprecedented opportunities for enhancing health communication and health care, including self-management of chronic conditions such as diabetes. Creating messages that engage users is critical for enhancing message impact and dissemination. This study analyzed health communications within ten diabetes-related Facebook pages to identify message features predictive of user engagement. The Common-Sense Model of Illness Self-Regulation and established health communication techniques guided content analyses of 500 Facebook posts. Each post was coded for message features predicted to engage users and numbers of likes, shares, and comments during the week following posting. Multi-level, negative binomial regressions revealed that specific features predicted different forms of engagement. Imagery emerged as a strong predictor; messages with images had higher rates of liking and sharing relative to messages without images. Diabetes consequence information and positive identity predicted higher sharing while negative affect, social support, and crowdsourcing predicted higher commenting. Negative affect, crowdsourcing, and use of external links predicted lower sharing while positive identity predicted lower commenting. The presence of imagery weakened or reversed the positive relationships of several message features with engagement. Diabetes control information and negative affect predicted more likes in text-only messages, but fewer likes when these messages included illustrative imagery. Similar patterns of imagery's attenuating effects emerged for the positive relationships of consequence information, control information, and positive identity with shares and for positive relationships of negative affect and social support with comments. These findings hold promise for guiding communication design in health-related social media.

  1. Multivariate Feature Selection for Predicting Scour-Related Bridge Damage using a Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Anderson, I.

    2015-12-01

    Scour and hydraulic damage are the most common causes of bridge failure, reported to be responsible for over 60% of bridge failures nationwide. Scour is a complex process, and is likely an epistatic function of both bridge and stream conditions that are both stationary and in dynamic flux. Bridge inspections, conducted regularly on bridges nationwide, rate bridge health assuming a static stream condition, and typically do not include dynamically changing geomorphological adjustments. The Vermont Agency of Natural Resources stream geomorphic assessment data could add value to current bridge inspection and scour design. The 2011 bridge damage from Tropical Storm Irene served as a case study for feature selection to improve bridge scour damage prediction in extreme events. The bridge inspection data (with over 200 features on more than 300 damaged and 2,000 non-damaged bridges) and the stream geomorphic assessment data (with over 300 features on more than 5000 stream reaches) constitute "Big Data", and together have the potential to generate large numbers of combined features ("epistatic relationships") that might better predict scour-related bridge damage. The potential combined features pose significant computational challenges for traditional statistical techniques (e.g., multivariate logistic regression). This study uses a genetic algorithm to search the multivariate feature space for epistatic relationships that are indicative of bridge scour damage. The combined features identified could be used to improve bridge scour design, and to better monitor and rate bridge scour vulnerability.

  2. Survival Prediction and Feature Selection in Patients with Breast Cancer Using Support Vector Regression.

    PubMed

    Goli, Shahrbanoo; Mahjub, Hossein; Faradmal, Javad; Mashayekhi, Hoda; Soltanian, Ali-Reza

    2016-01-01

    The Support Vector Regression (SVR) model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed, and SVR with different kernels and the traditional Cox model are trained. The models are compared based on different performance measures. We also select the best subset of features using three feature selection methods: a combination of SVR and statistical tests, univariate feature selection based on the concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and a Breast Cancer (BC) dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than by nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the features most strongly associated with survival. Also, according to the obtained results, the performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similarly to or slightly better than other models. SVR also performs similarly to or better than Cox when all features are included in the model.
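    The concordance index mentioned above can be sketched as follows. This is a generic version of Harrell's C for right-censored data, not code from the paper; the function name and the four-patient example are hypothetical.

```python
def concordance_index(times, events, predicted):
    """Harrell's C: share of comparable pairs whose predicted survival
    times are ordered the same way as the observed times.

    times     -- observed follow-up times
    events    -- 1 if the event (death) was observed, 0 if censored
    predicted -- predicted survival times (larger means longer survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5      # ties in prediction count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical data: the second patient is censored at time 8.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
predicted = [4, 10, 19, 18]
c_index = concordance_index(times, events, predicted)
print(c_index)
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the index is a natural score for ranking individual features in a univariate screen.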

  3. Survival Prediction and Feature Selection in Patients with Breast Cancer Using Support Vector Regression

    PubMed Central

    Goli, Shahrbanoo; Faradmal, Javad; Mashayekhi, Hoda; Soltanian, Ali-Reza

    2016-01-01

    The Support Vector Regression (SVR) model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed, and SVR with different kernels and the traditional Cox model are trained. The models are compared based on different performance measures. We also select the best subset of features using three feature selection methods: a combination of SVR and statistical tests, univariate feature selection based on the concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and a Breast Cancer (BC) dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than by nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the features most strongly associated with survival. Also, according to the obtained results, the performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similarly to or slightly better than other models. SVR also performs similarly to or better than Cox when all features are included in the model. PMID:27882074

  4. Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

    PubMed Central

    Laimighofer, Michael; Krumsiek, Jan; Theis, Fabian J.

    2016-01-01

    With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN. PMID:26894327
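    The nested cross-validation idea can be sketched in a few lines. This is a generic illustration in Python, not the SurvRank package and not its weighted feature-ranking step; `k_fold_indices`, `nested_cv`, and the toy scoring callback are hypothetical names introduced here.

```python
def k_fold_indices(indices, k):
    """Split a list of sample indices into k interleaved folds and
    yield (train, test) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def nested_cv(n_samples, outer_k, inner_k, fit_and_score, candidates):
    """Select a candidate (e.g. a feature subset size) in the inner loop,
    then score it once on the untouched outer test fold."""
    outer_scores, chosen = [], []
    for outer_train, outer_test in k_fold_indices(list(range(n_samples)), outer_k):
        def inner_score(candidate):
            # The inner folds are drawn from outer_train only.
            scores = [fit_and_score(tr, va, candidate)
                      for tr, va in k_fold_indices(outer_train, inner_k)]
            return sum(scores) / len(scores)
        best = max(candidates, key=inner_score)
        chosen.append(best)
        outer_scores.append(fit_and_score(outer_train, outer_test, best))
    return outer_scores, chosen

def toy_fit_and_score(train, test, candidate):
    """Stand-in for fitting a survival model; the score peaks at candidate 3."""
    assert not set(train) & set(test)   # held-out samples never reach training
    return -abs(candidate - 3)

scores, chosen = nested_cv(20, outer_k=5, inner_k=4,
                           fit_and_score=toy_fit_and_score,
                           candidates=[1, 3, 5])
print(scores, chosen)
```

    Because the outer test fold plays no part in choosing the candidate, the outer scores estimate performance on unseen cases without the optimism that arises when selection and evaluation share data.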

  5. Multi-Center Prediction of Hemorrhagic Transformation in Acute Ischemic Stroke using Permeability Imaging Features

    PubMed Central

    Scalzo, Fabien; Alger, Jeffry R.; Hu, Xiao; Saver, Jeffrey L.; Dani, Krishna A.; Muir, Keith W.; Demchuk, Andrew M.; Coutts, Shelagh B.; Luby, Marie; Warach, Steven; Liebeskind, David S.

    2013-01-01

    Permeability images derived from magnetic resonance (MR) perfusion images are sensitive to blood-brain barrier derangement of the brain tissue and have been shown to correlate with subsequent development of hemorrhagic transformation (HT) in acute ischemic stroke. This paper presents a multi-center retrospective study that evaluates the predictive power in terms of HT of six permeability MRI measures including contrast slope (CS), final contrast (FC), maximum peak bolus concentration (MPB), peak bolus area (PB), relative recirculation (rR), and percentage recovery (%R). Dynamic T2*-weighted perfusion MR images were collected from 263 acute ischemic stroke patients from four medical centers. An essential aspect of this study is to exploit a classifier-based framework to automatically identify predictive patterns in the overall intensity distribution of the permeability maps. The model is based on normalized intensity histograms that are used as input features to the predictive model. Linear and nonlinear predictive models are evaluated using cross-validation to measure generalization power on new patients and a comparative analysis is provided for the different types of parameters. Results demonstrate that perfusion imaging in acute ischemic stroke can predict HT with an average accuracy of more than 85% using a predictive model based on a nonlinear regression model. Results also indicate that the permeability feature based on the percentage of recovery performs significantly better than the other features. This novel model may be used to refine treatment decisions in acute stroke. PMID:23587928
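    A normalized intensity histogram of the kind used as classifier input can be sketched as below. The bin count, the value range, and the function name are assumptions for illustration; the abstract does not state the settings the authors used.

```python
def normalized_histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins on [lo, hi] and divide by
    the total so the bins sum to 1, regardless of lesion size."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to hi is placed in the last bin.
            b = min(int((v - lo) / width), n_bins - 1)
            counts[b] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]

# Hypothetical permeability values from one patient's map.
voxels = [0.1, 0.2, 0.25, 0.6, 0.9, 1.0]
features = normalized_histogram(voxels, n_bins=4, lo=0.0, hi=1.0)
print(features)
```

    Normalizing by the voxel count gives every patient a feature vector of the same length and scale, so maps of different sizes can be fed to the same model.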

  6. Multi-center prediction of hemorrhagic transformation in acute ischemic stroke using permeability imaging features.

    PubMed

    Scalzo, Fabien; Alger, Jeffry R; Hu, Xiao; Saver, Jeffrey L; Dani, Krishna A; Muir, Keith W; Demchuk, Andrew M; Coutts, Shelagh B; Luby, Marie; Warach, Steven; Liebeskind, David S

    2013-07-01

    Permeability images derived from magnetic resonance (MR) perfusion images are sensitive to blood-brain barrier derangement of the brain tissue and have been shown to correlate with subsequent development of hemorrhagic transformation (HT) in acute ischemic stroke. This paper presents a multi-center retrospective study that evaluates the predictive power in terms of HT of six permeability MRI measures including contrast slope (CS), final contrast (FC), maximum peak bolus concentration (MPB), peak bolus area (PB), relative recirculation (rR), and percentage recovery (%R). Dynamic T2*-weighted perfusion MR images were collected from 263 acute ischemic stroke patients from four medical centers. An essential aspect of this study is to exploit a classifier-based framework to automatically identify predictive patterns in the overall intensity distribution of the permeability maps. The model is based on normalized intensity histograms that are used as input features to the predictive model. Linear and nonlinear predictive models are evaluated using a cross-validation to measure generalization power on new patients and a comparative analysis is provided for the different types of parameters. Results demonstrate that perfusion imaging in acute ischemic stroke can predict HT with an average accuracy of more than 85% using a predictive model based on a nonlinear regression model. Results also indicate that the permeability feature based on the percentage of recovery performs significantly better than the other features. This novel model may be used to refine treatment decisions in acute stroke. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Endosonographic features predictive of benign and malignant gastrointestinal stromal cell tumours

    PubMed Central

    Palazzo, L; Landi, B; Cellier, C; Cuillerier, E; Roseau, G; Barbier, J

    2000-01-01

    BACKGROUND/AIM: Some endoscopic ultrasonographic (EUS) features have been reported to be suggestive of malignancy in gastrointestinal stromal cell tumours (SCTs). The aim of this study was to assess the predictive value of these features for malignancy.
    METHODS: A total of 56 histologically proven cases of SCT studied by EUS between 1989 and 1996 were reviewed. There were 42 gastric tumours, 12 oesophageal tumours, and two rectal tumours. The tumours were divided into two groups: (a) benign SCT, comprising benign leiomyoma (n = 34); (b) malignant or borderline SCT (n = 22), comprising leiomyosarcoma (n = 9), leiomyoblastoma (n = 9), and leiomyoma of uncertain malignant potential (n = 4). The main EUS features recorded were tumour size, ulceration, echo pattern, cystic spaces, extraluminal margins, and lymph nodes with a malignant pattern. The two groups were compared by univariate and multivariate analysis.
    RESULTS: Irregular extraluminal margins, cystic spaces, and lymph nodes with a malignant pattern were most predictive of malignant or borderline SCT. Pairwise combinations of the three features had a specificity and positive predictive value of 100% for malignant or borderline SCT, but a sensitivity of only 23%. The presence of at least one of these three criteria had 91% sensitivity, 88% specificity, and 83% positive predictive value. In multivariate analysis, cystic spaces and irregular margins were the only two features independently predictive of malignant potential. The features most predictive of benign SCTs were regular margins, tumour size ⩽30 mm, and a homogeneous echo pattern. When the three features were combined, histology confirmed a benign SCT in all cases.
    CONCLUSIONS: The combined presence of two out of three EUS features (irregular extraluminal margins, cystic spaces, and lymph nodes with a malignant pattern) had a positive predictive value of 100% for malignant or borderline gastrointestinal SCT. Tumours less than 30
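    The reported percentages can be related to patient counts with a short worked example. The counts below are not given in the abstract; they are back-calculated here as the whole-number splits of 22 malignant or borderline and 34 benign tumours that reproduce the rounded figures, and should be read as an assumption.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and positive predictive value
    from the four cells of a 2x2 diagnostic table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

# Assumed counts (not reported): at least one of the three EUS criteria present.
at_least_one = diagnostic_metrics(tp=20, fn=2, fp=4, tn=30)

# Assumed counts (not reported): two of the three criteria present together.
pairwise = diagnostic_metrics(tp=5, fn=17, fp=0, tn=34)

for name, result in (("at least one", at_least_one), ("pairwise", pairwise)):
    print(name, {k: round(100 * v) for k, v in result.items()})
```

    The comparison shows the usual trade-off: requiring two criteria removes all false positives in this reconstruction but misses most malignant or borderline tumours, while requiring only one catches most of them at the cost of a few false positives.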

  8. Can upstaging of ductal carcinoma in situ be predicted at biopsy by histologic and mammographic features?

    NASA Astrophysics Data System (ADS)

    Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

    2017-03-01

    Reducing the overdiagnosis and overtreatment associated with ductal carcinoma in situ (DCIS) requires accurate prediction of the invasive potential at cancer screening. In this work, we investigated the utility of pre-operative histologic and mammographic features to predict upstaging of DCIS. The goal was to provide intentionally conservative baseline performance using readily available data from radiologists and pathologists and only linear models. We conducted a retrospective analysis on 99 patients with DCIS. Of those, 25 were upstaged to invasive cancer at the time of definitive surgery. Pre-operative factors including both the histologic features extracted from stereotactic core needle biopsy (SCNB) reports and the mammographic features annotated by an expert breast radiologist were investigated with statistical analysis. Furthermore, we built classification models based on those features in an attempt to predict the presence of an occult invasive component in DCIS, with generalization performance assessed by receiver operating characteristic (ROC) curve analysis. Histologic features including nuclear grade and DCIS subtype did not show statistically significant differences between cases with pure DCIS and with DCIS plus invasive disease. However, three mammographic features, i.e., the major axis length of the DCIS lesion, the BI-RADS level of suspicion, and the radiologist's assessment, did achieve statistical significance. Using those three statistically significant features as input, a linear discriminant model was able to distinguish patients with DCIS plus invasive disease from those with pure DCIS, with an AUC-ROC of 0.62. Overall, mammograms used for breast screening contain useful information that can be perceived by radiologists and help predict occult invasive components in DCIS.
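    To make the AUC-ROC figure concrete, the area under the ROC curve can be computed as the probability that a randomly chosen upstaged case receives a higher model score than a randomly chosen pure-DCIS case. The sketch below uses that rank-based definition; the scores are hypothetical and are not taken from the study.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve as the fraction of positive/negative
    pairs in which the positive case scores higher (ties count half)."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical classifier outputs.
upstaged = [0.9, 0.6, 0.4]
pure_dcis = [0.7, 0.5, 0.3, 0.2]
area = auc(upstaged, pure_dcis)
print(area)
```

    On this scale 0.5 is chance and 1.0 is perfect separation, so a value of 0.62 indicates a modest but real signal, consistent with the abstract's description of the result as a conservative baseline.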

  9. Patient feature based dosimetric Pareto front prediction in esophageal cancer radiotherapy

    SciTech Connect

    Wang, Jiazhou; Zhao, Kuaike; Peng, Jiayuan; Xie, Jiang; Chen, Junchao; Zhang, Zhen; Hu, Weigang; Jin, Xiance; Studenski, Matthew

    2015-02-15

    Purpose: To investigate the feasibility of dosimetric Pareto front (PF) prediction based on patients’ anatomic and dosimetric parameters for esophageal cancer patients. Methods: Eighty esophagus patients in the authors’ institution were enrolled in this study. A total of 2928 intensity-modulated radiotherapy plans were obtained and used to generate the PF for each patient. On average, each patient had 36.6 plans. The anatomic and dosimetric features were extracted from these plans. The mean lung dose (MLD), mean heart dose (MHD), spinal cord max dose, and PTV homogeneity index were recorded for each plan. Principal component analysis was used to extract overlap volume histogram (OVH) features between the PTV and other organs at risk. The full dataset was separated into two parts: a training dataset and a validation dataset. The prediction outcomes were the MHD and MLD. Spearman’s rank correlation coefficient was used to evaluate the correlation between the anatomical features and dosimetric features. The stepwise multiple regression method was used to fit the PF. The cross-validation method was used to evaluate the model. Results: With 1000 repetitions, the mean prediction error of the MHD was 469 cGy. The most correlated factors were the first principal components of the OVH between heart and PTV and the overlap between heart and PTV in the Z-axis. The mean prediction error of the MLD was 284 cGy. The most correlated factors were the first principal components of the OVH between heart and PTV and the overlap between lung and PTV in the Z-axis. Conclusions: It is feasible to use patients’ anatomic and dosimetric features to generate a predicted Pareto front. Additional samples and further studies are required to improve the prediction model.
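    Spearman's rank correlation, used above to relate anatomical and dosimetric features, is the Pearson correlation of the ranks. A minimal sketch with average ranks for ties follows; the overlap and dose values are hypothetical and only illustrate the calculation.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1       # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: heart/PTV overlap (cm) and mean heart dose (cGy).
overlap = [1.0, 2.5, 4.0, 6.0, 9.0]
mean_heart_dose = [300, 410, 405, 800, 1500]
rho = spearman(overlap, mean_heart_dose)
print(round(rho, 3))
```

    Because only the ordering matters, the coefficient is insensitive to the strongly non-linear growth of dose with overlap in this toy data, which is one reason a rank-based measure suits a first screening of anatomical features.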

  10. The predictability of molecular evolution during functional innovation.

    PubMed

    Blank, Diana; Wolf, Luise; Ackermann, Martin; Silander, Olin K

    2014-02-25

    Determining the molecular changes that give rise to functional innovations is a major unresolved problem in biology. The paucity of examples has served as a significant hindrance in furthering our understanding of this process. Here we used experimental evolution with the bacterium Escherichia coli to quantify the molecular changes underlying functional innovation in 68 independent instances ranging over 22 different metabolic functions. Using whole-genome sequencing, we show that the relative contribution of regulatory and structural mutations depends on the cellular context of the metabolic function. In addition, we find that regulatory mutations affect genes that act in pathways relevant to the novel function, whereas structural mutations affect genes that act in unrelated pathways. Finally, we use population genetic modeling to show that the relative contributions of regulatory and structural mutations during functional innovation may be affected by population size. These results provide a predictive framework for the molecular basis of evolutionary innovation, which is essential for anticipating future evolutionary trajectories in the face of rapid environmental change.

  11. Combined Molecular Dynamics Simulation-Molecular-Thermodynamic Theory Framework for Predicting Surface Tensions.

    PubMed

    Sresht, Vishnu; Lewandowski, Eric P; Blankschtein, Daniel; Jusufi, Arben

    2017-08-22

    A molecular modeling approach is presented with a focus on quantitative predictions of the surface tension of aqueous surfactant solutions. The approach combines classical Molecular Dynamics (MD) simulations with a molecular-thermodynamic theory (MTT) [Y. J. Nikas, S. Puvvada, D. Blankschtein, Langmuir 1992, 8, 2680]. The MD component is used to calculate thermodynamic and molecular parameters that are needed in the MTT model to determine the surface tension isotherm. The MD/MTT approach provides the important link between the surfactant bulk concentration, the experimental control parameter, and the surfactant surface concentration, the MD control parameter. We demonstrate the capability of the MD/MTT modeling approach on nonionic alkyl polyethylene glycol surfactants at the air-water interface and observe reasonable agreement of the predicted surface tensions and the experimental surface tension data over a wide range of surfactant concentrations below the critical micelle concentration. Our modeling approach can be extended to ionic surfactants and their mixtures with both ionic and nonionic surfactants at liquid-liquid interfaces.

  12. Molecular structures of carotenoids as predicted by MNDO-AM1 molecular orbital calculations

    NASA Astrophysics Data System (ADS)

    Hashimoto, Hideki; Yoda, Takeshi; Kobayashi, Takayoshi; Young, Andrew J.

    2002-02-01

    Semi-empirical molecular orbital calculations using AM1 Hamiltonian (MNDO-AM1 method) were performed for a number of biologically important carotenoid molecules, namely all-trans-β-carotene, all-trans-zeaxanthin, and all-trans-violaxanthin (found in higher plants and algae) together with all-trans-canthaxanthin, all-trans-astaxanthin, and all-trans-tunaxanthin in order to predict their stable structures. The molecular structures of all-trans-β-carotene, all-trans-canthaxanthin, and all-trans-astaxanthin predicted based on molecular orbital calculations were compared with those determined by X-ray crystallography. Predicted bond lengths, bond angles, and dihedral angles showed an excellent agreement with those determined experimentally, a fact that validated the present theoretical calculations. Comparison of the bond lengths, bond angles and dihedral angles of the most stable conformer among all the carotenoid molecules showed that the displacements are localized around the substituent groups and hence around the cyclohexene rings. The most stable conformers of all-trans-zeaxanthin and all-trans-violaxanthin gave rise to a torsion angle around the C6-C7 bond to be ±48.7 and -84.8°, respectively. This difference is a key factor in relation to the biological function of these two carotenoids in plants and algae (the xanthophyll cycle). Further analyses by calculating the atomic charges and using enpartment calculations (division of bond energies between component atoms) were performed to ascribe the cause of the different observed torsion angles.

  13. PROSPER: An Integrated Feature-Based Tool for Predicting Protease Substrate Cleavage Sites

    PubMed Central

    Perry, Andrew J.; Akutsu, Tatsuya; Webb, Geoffrey I.; Whisstock, James C.; Pike, Robert N.

    2012-01-01

    The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be utilized in an effective manner in order to efficiently identify protein substrates by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER utilizes established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites by using different, but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for use in identifying protein substrates for the enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features we have included in the tool strongly improve performance in terms of cleavage site prediction, as evidenced by their contribution to performance improvement in terms of identifying known cleavage sites in substrates for these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using

  14. HER2 status in molecular apocrine breast cancer: associations with clinical, pathological, and molecular features.

    PubMed

    Guo, Wenwen; Wang, Wei; Zhu, Yun; Zhu, Xiaojing; Shi, Zhongyuan; Wang, Yan

    2015-01-01

    Molecular apocrine breast cancer (MABC) is a distinct subtype of breast cancer. The purpose of this study was to investigate the relationship between HER2 status and clinicopathologic characteristics of MABCs from a Chinese Han cohort. A cohort of 90 MABC patients was enrolled. Immunohistochemistry was performed to analyze molecular expression, and human epidermal growth factor receptor 2 (HER2) amplification was verified by fluorescence in situ hybridization (FISH). Among these 90 MABC cases, the majority of patients were premenopausal young women (median age 48 yr) with high grade tumors. We also found that MABCs had high positive expression rates of HER2, CK8, CD44, CD166, p53 and BRCA1, an elevated Ki-67 labeling index, and favorable prognosis. There was a significantly higher incidence of lymph node metastasis and a lower CD166 positive rate in HER2-negative patients compared to HER2-positive patients (54.5% vs. 37.0%, P = 0.044 and 72.7% vs. 91.3%, P = 0.021, respectively). The CK5/6 and EGFR expression rates were significantly higher in HER2-negative cases than in HER2-positive cases, suggesting that there is overlap between MABC with a HER2-negative phenotype and basal-like breast cancer. In addition, HER2 positivity was found to be significantly associated with poor overall survival in MABCs. In conclusion, HER2 is highly expressed, and HER2 positivity could be considered a significant biomarker of poor prognosis in MABC. The results also suggest that MABC includes tumor subtypes with distinct patterns of molecular expression depending on HER2 status.

  15. HER2 status in molecular apocrine breast cancer: associations with clinical, pathological, and molecular features

    PubMed Central

    Guo, Wenwen; Wang, Wei; Zhu, Yun; Zhu, Xiaojing; Shi, Zhongyuan; Wang, Yan

    2015-01-01

    Molecular apocrine breast cancer (MABC) is a distinct subtype of breast cancer. The purpose of this study was to investigate the relationship between HER2 status and clinicopathologic characteristics of MABCs from a Chinese Han cohort. A cohort of 90 MABC patients was enrolled. Immunohistochemistry was performed to analyze molecular expression, and human epidermal growth factor receptor 2 (HER2) amplification was verified by fluorescence in situ hybridization (FISH). Among these 90 MABC cases, the majority of patients were premenopausal young women (median age 48 yr) with high grade tumors. We also found that MABCs had high positive expression rates of HER2, CK8, CD44, CD166, p53 and BRCA1, an elevated Ki-67 labeling index, and favorable prognosis. There was a significantly higher incidence of lymph node metastasis and a lower CD166 positive rate in HER2-negative patients compared to HER2-positive patients (54.5% vs. 37.0%, P = 0.044 and 72.7% vs. 91.3%, P = 0.021, respectively). The CK5/6 and EGFR expression rates were significantly higher in HER2-negative cases than in HER2-positive cases, suggesting that there is overlap between MABC with a HER2-negative phenotype and basal-like breast cancer. In addition, HER2 positivity was found to be significantly associated with poor overall survival in MABCs. In conclusion, HER2 is highly expressed, and HER2 positivity could be considered a significant biomarker of poor prognosis in MABC. The results also suggest that MABC includes tumor subtypes with distinct patterns of molecular expression depending on HER2 status. PMID:26339367

  16. Improved Species-Specific Lysine Acetylation Site Prediction Based on a Large Variety of Features Set

    PubMed Central

    Wuyun, Qiqige; Zheng, Wei; Zhang, Yanping; Ruan, Jishou; Hu, Gang

    2016-01-01

    Lysine acetylation is a major post-translational modification. It plays a vital role in numerous essential biological processes, such as gene expression and metabolism, and is related to some human diseases. To fully understand the regulatory mechanism of acetylation, identification of acetylation sites is the first and most important step. However, experimental identification of protein acetylation sites is often time consuming and expensive. Therefore, alternative computational methods are necessary. Here, we developed a novel tool, KA-predictor, to predict species-specific lysine acetylation sites based on a support vector machine (SVM) classifier. We incorporated different types of features and employed an efficient feature selection on each type to form the final optimal feature set for model learning. Our predictor was highly competitive for the majority of species when compared with other methods. Feature contribution analysis indicated that HSE features, which were first introduced here for lysine acetylation prediction, significantly improved the predictive performance. In particular, we constructed a highly accurate structure dataset of H. sapiens from PDB to analyze the structural properties around lysine acetylation sites. Our datasets and a user-friendly local tool of KA-predictor are freely available at http://sourceforge.net/p/ka-predictor. PMID:27183223

  17. Prediction of Protein Structural Class Based on Gapped-Dipeptides and a Recursive Feature Selection Approach.

    PubMed

    Liu, Taigang; Qin, Yufang; Wang, Yongjie; Wang, Chunhua

    2015-12-24

    The prior knowledge of protein structural class may offer useful clues on understanding its functionality as well as its tertiary structure. Though various significant efforts have been made to find a fast and effective computational approach to address this problem, it is still a challenging topic in the field of bioinformatics. The position-specific score matrix (PSSM) profile has been shown to provide a useful source of information for improving the prediction performance of protein structural class. However, this information has not been adequately explored. To this end, in this study, we present a feature extraction technique which is based on gapped-dipeptides composition computed directly from PSSM. Then, a careful feature selection technique is performed based on support vector machine-recursive feature elimination (SVM-RFE). These optimal features are selected to construct a final predictor. The results of jackknife tests on four working datasets show that our method obtains satisfactory prediction accuracies by extracting features solely based on PSSM and could serve as a very promising tool to predict protein structural class.
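    One common way to turn a PSSM into gapped-dipeptide features is to average, over all residue pairs separated by a fixed gap, the product of the score for one amino acid type at the first position and another type at the second. The abstract does not give the authors' exact formula, so the definition below, the function name, and the two-column toy matrix (a real PSSM has 20 columns) are assumptions for illustration.

```python
def gapped_dipeptide_features(pssm, gap):
    """Return an n*n feature vector for one gap size.

    pssm -- list of rows, one per residue, each with n column scores
    gap  -- number of residues skipped between the two positions
    Feature (i, j) is the mean of pssm[k][i] * pssm[k + gap + 1][j] over k.
    """
    length, n = len(pssm), len(pssm[0])
    step = gap + 1
    pairs = length - step
    if pairs <= 0:
        raise ValueError("sequence too short for this gap")
    totals = [[0.0] * n for _ in range(n)]
    for k in range(pairs):
        for i in range(n):
            for j in range(n):
                totals[i][j] += pssm[k][i] * pssm[k + step][j]
    # Flatten row by row so the vector length is fixed at n * n.
    return [value / pairs for row in totals for value in row]

# Toy PSSM with two columns and four residues (hypothetical values).
toy_pssm = [[1, 0], [0, 1], [1, 0], [0, 1]]
adjacent = gapped_dipeptide_features(toy_pssm, gap=0)
one_apart = gapped_dipeptide_features(toy_pssm, gap=1)
print(adjacent, one_apart)
```

    The vector length depends only on the number of PSSM columns, not on sequence length, so proteins of any size map to the same feature space before a selection step such as SVM-RFE is applied.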

  18. Synergistic combination of clinical and imaging features predicts abnormal imaging patterns of pulmonary infections

    PubMed Central

    Bagci, Ulas; Jaster-Miller, Kirsten; Olivier, Kenneth N.; Yao, Jianhua; Mollura, Daniel J.

    2013-01-01

    We designed and tested a novel hybrid statistical model that accepts radiologic image features and clinical variables, and integrates this information in order to automatically predict abnormalities in chest computed-tomography (CT) scans and identify potentially important infectious disease biomarkers. In 200 patients, 160 with various pulmonary infections and 40 healthy controls, we extracted 34 clinical variables from laboratory tests and 25 textural features from CT images. From the CT scans, pleural effusion (PE), linear opacity (or thickening) (LT), tree-in-bud (TIB), pulmonary nodules, ground glass opacity (GGO), and consolidation abnormality patterns were analyzed and predicted through clinical, textural (imaging), or combined attributes. The presence and severity of each abnormality pattern was validated by visual analysis of the CT scans. The proposed biomarker identification system included two important steps: (i) a coarse identification of an abnormal imaging pattern by adaptively selected features (AmRMR), and (ii) a fine selection of the most important features from the previous step, and assigning them as biomarkers, depending on the prediction accuracy. Selected biomarkers were used to classify normal and abnormal patterns by using a boosted decision tree (BDT) classifier. For all abnormal imaging patterns, an average prediction accuracy of 76.15% was obtained. Experimental results demonstrated that our proposed biomarker identification approach is promising and may advance the data processing in clinical pulmonary infection research and diagnostic techniques. PMID:23930819

  19. Critical Features Predicting Sustained Implementation of School-Wide Positive Behavioral Interventions and Supports

    ERIC Educational Resources Information Center

    Mathews, Susanna; McIntosh, Kent; Frank, Jennifer L.; May, Seth L.

    2014-01-01

    The current study explored the extent to which a common measure of perceived implementation of critical features of Positive Behavioral Interventions and Supports (PBIS) predicted fidelity of implementation 3 years later. Respondents included school personnel from 261 schools across the United States implementing PBIS. School teams completed the…

  1. Critical Features Predicting Sustained Implementation of School-Wide Positive Behavior Support

    ERIC Educational Resources Information Center

    Mathews, Susanna; McIntosh, Kent; Frank, Jennifer; May, Seth

    2014-01-01

    The current study explored the extent to which a common measure of perceived implementation of critical features of School-wide Positive Behavior Support (SWPBS) predicted fidelity of implementation 3 years later. Respondents included school personnel from 261 schools across the United States implementing SWPBS. School teams completed the…

  2. Apocrine carcinoma of the breast: A brief update on the molecular features and targetable biomarkers

    PubMed Central

    Vranic, Semir; Feldman, Rebecca; Gatalica, Zoran

    2017-01-01

    Apocrine carcinoma of the breast is a rare, primary breast cancer characterized by the apocrine morphology, estrogen receptor-negative and androgen receptor-positive profile with a frequent overexpression of Her-2/neu protein (~30%). Apart from the Her-2/neu target, advanced and/or metastatic apocrine carcinomas have limited treatment options. In this review, we briefly describe and discuss the molecular features and new theranostic biomarkers for this rare mammary malignancy. The importance of comprehensive profiling is highlighted due to synergistic and potentially antagonistic molecular events in the individual patients. PMID:28027454

  3. Combining PSSM and physicochemical feature for protein structure prediction with support vector machine

    NASA Astrophysics Data System (ADS)

    Kurniawan, I.; Haryanto, T.; Hasibuan, L. S.; Agmalaro, M. A.

    2017-05-01

    Protein is one of the giant biomolecules that act as the main component of the organism. Protein is formed from building blocks, namely amino acids. Hierarchically, the structure of protein is divided into four levels: primary, secondary, tertiary, and quaternary structure. Protein secondary structure is formed by amino acid sequences that would form three-dimensional structures and have information about the tertiary structure and function of proteins. This study used 277,389 protein residue data from enzyme categories. Position-specific scoring matrix (PSSM) profiles and physicochemical properties are used as features. This study developed support vector machine models to predict the protein secondary structure by recognizing patterns of amino acid sequences. The Q3 results showed that the best score obtained was 93.16%, from the dataset with 260 features and the radial kernel. Combining PSSM and physicochemical features can be used for prediction.
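
    The Q3 score reported in this record is conventionally the percentage of residues whose three-state secondary-structure label (helix H, strand E, coil C) is predicted correctly. A minimal sketch of that calculation follows; the function name and the toy sequences are illustrative assumptions and are not taken from the study.

```python
def q3_score(true_states, predicted_states):
    """Return the percentage of residues whose 3-state label matches."""
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences must have equal length")
    if not true_states:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return 100.0 * correct / len(true_states)


# Hypothetical toy data (10 residues, one mismatch at the fourth position).
observed = "HHHHEEEECC"
predicted = "HHHCEEEECC"
print(q3_score(observed, predicted))
```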

  4. Introduction: feature issue on optical molecular probes, imaging, and drug delivery.

    PubMed

    Campagnola, Paul; French, Paul M W; Georgakoudi, Irene; Mycek, Mary-Ann

    2014-02-01

    The editors introduce the Biomedical Optics Express feature issue "Optical Molecular Probes, Imaging, and Drug Delivery," which is associated with a Topical Meeting of the same name held at the 2013 Optical Society of America (OSA) Optics in the Life Sciences Congress in Waikoloa Beach, Hawaii, April 14-18, 2013. The international meeting focused on the convergence of optical physics, photonics technology, nanoscience, and photochemistry with drug discovery and clinical medicine. Papers in this feature issue are representative of meeting topics, including advances in microscopy, nanotechnology, and optics in cancer research.

  5. Specific molecular signatures predict decitabine response in chronic myelomonocytic leukemia.

    PubMed

    Meldi, Kristen; Qin, Tingting; Buchi, Francesca; Droin, Nathalie; Sotzen, Jason; Micol, Jean-Baptiste; Selimoglu-Buet, Dorothée; Masala, Erico; Allione, Bernardino; Gioia, Daniela; Poloni, Antonella; Lunghi, Monia; Solary, Eric; Abdel-Wahab, Omar; Santini, Valeria; Figueroa, Maria E

    2015-05-01

    Myelodysplastic syndromes and chronic myelomonocytic leukemia (CMML) are characterized by mutations in genes encoding epigenetic modifiers and aberrant DNA methylation. DNA methyltransferase inhibitors (DMTis) are used to treat these disorders, but response is highly variable, with few means to predict which patients will benefit. Here, we examined baseline differences in mutations, DNA methylation, and gene expression in 40 CMML patients who were responsive or resistant to decitabine (DAC) in order to develop a molecular means of predicting response at diagnosis. While somatic mutations did not differentiate responders from nonresponders, we identified 167 differentially methylated regions (DMRs) of DNA at baseline that distinguished responders from nonresponders using next-generation sequencing. These DMRs were primarily localized to nonpromoter regions and overlapped with distal regulatory enhancers. Using the methylation profiles, we developed an epigenetic classifier that accurately predicted DAC response at the time of diagnosis. Transcriptional analysis revealed differences in gene expression at diagnosis between responders and nonresponders. In responders, the upregulated genes included those that are associated with the cell cycle, potentially contributing to effective DAC incorporation. Treatment with CXCL4 and CXCL7, which were overexpressed in nonresponders, blocked DAC effects in isolated normal CD34+ and primary CMML cells, suggesting that their upregulation contributes to primary DAC resistance.

  6. Prediction of protein-protein interaction sites by means of ensemble learning and weighted feature descriptor.

    PubMed

    Du, Xiuquan; Sun, Shiwei; Hu, Changlin; Li, Xinrui; Xia, Junfeng

    2016-05-01

    Reliable prediction of protein-protein interaction sites is an important goal in the field of bioinformatics. Many computational methods have been explored for the large-scale prediction of protein-protein interaction sites based on various data types, including protein sequence, structural and genomic data. Although much progress has been achieved in recent years, the problem has not yet been satisfactorily solved. In this work, we presented an efficient approach that uses an ensemble learning algorithm with a weighted feature descriptor (EL-WFD) to predict protein-protein interaction sites. Moreover, the weighted feature descriptor was designed to describe the distance influence of neighboring residues on interaction sites. The results on two datasets (Hetero and Homo) show that the proposed method yields satisfactory accuracy, with 83.8% recall and 96.3% precision on the Hetero dataset and 84.2% recall and 96.3% precision on the Homo dataset. In both datasets, our method tends to obtain a higher Matthews correlation coefficient than the state-of-the-art random forest method. The experimental results show that the EL-WFD method is quite effective in predicting protein-protein interaction sites. The novel weighted feature descriptor proved promising in discovering interaction sites. Overall, the proposed method can be considered a powerful new tool for predicting protein-protein interaction sites with excellent performance.
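
    Recall, precision and the Matthews correlation coefficient quoted in this record are all derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not the study's data.

```python
import math


def site_metrics(tp, fp, tn, fn):
    """Return (precision, recall, mcc) for binary interaction-site labels."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC uses all four cells, so it stays informative when classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts: 8 true sites found, 2 false alarms, 5 sites missed.
precision, recall, mcc = site_metrics(tp=8, fp=2, tn=85, fn=5)
print(f"precision={precision:.3f} recall={recall:.3f} mcc={mcc:.3f}")
```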

  7. Music-induced emotions can be predicted from a combination of brain activity and acoustic features.

    PubMed

    Daly, Ian; Williams, Duncan; Hallowell, James; Hwang, Faustina; Kirke, Alexis; Malik, Asad; Weaver, James; Miranda, Eduardo; Nasuto, Slawomir J

    2015-12-01

    It is widely acknowledged that music can communicate and induce a wide range of emotions in the listener. However, music is a highly-complex audio signal composed of a wide range of complex time- and frequency-varying components. Additionally, music-induced emotions are known to differ greatly between listeners. Therefore, it is not immediately clear what emotions will be induced in a given individual by a piece of music. We attempt to predict the music-induced emotional response in a listener by measuring the activity in the listener's electroencephalogram (EEG). We combine these measures with acoustic descriptors of the music, an approach that allows us to consider music as a complex set of time-varying acoustic features, independently of any specific music theory. Regression models are found which allow us to predict the music-induced emotions of our participants with a correlation between the actual and predicted responses of up to r=0.234, p<0.001. This regression fit suggests that over 20% of the variance of the participants' music-induced emotions can be predicted by their neural activity and the properties of the music. Given the large amount of noise, non-stationarity, and non-linearity in both EEG and music, this is an encouraging result. Additionally, the combination of measures of brain activity and acoustic features describing the music played to our participants allows us to predict music-induced emotions with significantly higher accuracies than either feature type alone (p<0.01).

  8. Robust feature generation for protein subchloroplast location prediction with a weighted GO transfer model.

    PubMed

    Li, Xiaomei; Wu, Xindong; Wu, Gongqing

    2014-04-21

    Chloroplasts are crucial organelles of green plants and eukaryotic algae since they conduct photosynthesis. Predicting the subchloroplast location of a protein can provide important insights for understanding its biological functions. The performance of subchloroplast location prediction algorithms often depends on deriving predictive and succinct features from genomic and proteomic data. In this work, a novel weighted Gene Ontology (GO) transfer model is proposed to generate discriminating features from sequence data and GO Categories. This model contains two components. First, we transfer the GO terms of the homologous protein, and then assign the bit-score as weights to GO features. Second, we employ term-selection methods to determine weights for GO terms. This model is capable of improving prediction accuracy due to the tolerance of the noise derived from homolog knowledge transfer. The proposed weighted GO transfer method based on bit-score and a logarithmic transformation of CHI-square (WS-LCHI) performs better than the baseline models, and also outperforms the four off-the-shelf subchloroplast prediction methods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Prediction of acetylcholinesterase inhibitors and characterization of correlative molecular descriptors by machine learning methods.

    PubMed

    Lv, Wei; Xue, Ying

    2010-03-01

    Acetylcholinesterase (AChE) has become an important drug target and its inhibitors have proved useful in the symptomatic treatment of Alzheimer's disease. This work explores several machine learning methods (support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT)) for predicting AChE inhibitors (AChEIs). A feature selection method is used for improving prediction accuracy and selecting molecular descriptors responsible for distinguishing AChEIs and non-AChEIs. The prediction accuracies are 76.3%-88.0% for AChEIs and 74.3%-79.6% for non-AChEIs across the three machine learning methods. This work suggests that machine learning methods such as SVM are useful for predicting the AChEI potential of unknown sets of compounds and for revealing the molecular descriptors associated with AChEIs. Copyright (c) 2009 Elsevier Masson SAS. All rights reserved.

  10. Relationship of carbohydrate molecular spectroscopic features in combined feeds to carbohydrate utilization and availability in ruminants

    NASA Astrophysics Data System (ADS)

    Zhang, Xuewei; Yu, Peiqiang

    To date, there is no study on the relationship between carbohydrate (CHO) molecular structures and nutrient availability of combined feeds in ruminants. The objective of this study was to use molecular spectroscopy to reveal the relationship between CHO molecular spectral profiles (in terms of functional groups (biomolecular, biopolymer) spectral peak area and height intensity) and CHO chemical profiles, CHO subfractions, energy values, and CHO rumen degradation kinetics of combined feeds of hulless barley with pure wheat dried distillers grains with solubles (DDGS) at five different combination ratios (hulless barley to pure wheat DDGS: 100:0, 75:25, 50:50, 25:75, 0:100). The molecular spectroscopic parameters assessed included: lignin biopolymer molecular spectra profile (peak area and height, region and baseline: ca. 1539-1504 cm⁻¹); structural carbohydrate (STCHO, peaks area region and baseline: ca. 1485-1186 cm⁻¹) mainly associated with hemi- and cellulosic compounds; cellulosic materials peak area (centered at ca. 1240 cm⁻¹ with region and baseline: ca. 1272-1186 cm⁻¹); total carbohydrate (CHO, peaks area region and baseline: ca. 1186-946 cm⁻¹). The results showed that the functional groups (biomolecular, biopolymer) in the combined feeds are sensitive to the changes of carbohydrate chemical and nutrient profiles. The changes of the CHO molecular spectroscopic features in the combined feeds were highly correlated with CHO chemical profiles, CHO subfractions, in situ CHO rumen degradation kinetics and fermentable organic matter supply. Further study is needed to investigate the possibility of using CHO molecular spectral features as a predictor to estimate nutrient availability in combined feeds for animals and quantify their relationship.

  11. Relationship of carbohydrate molecular spectroscopic features in combined feeds to carbohydrate utilization and availability in ruminants.

    PubMed

    Zhang, Xuewei; Yu, Peiqiang

    2012-06-15

    To date, there is no study on the relationship between carbohydrate (CHO) molecular structures and nutrient availability of combined feeds in ruminants. The objective of this study was to use molecular spectroscopy to reveal the relationship between CHO molecular spectral profiles (in terms of functional groups (biomolecular, biopolymer) spectral peak area and height intensity) and CHO chemical profiles, CHO subfractions, energy values, and CHO rumen degradation kinetics of combined feeds of hulless barley with pure wheat dried distillers grains with solubles (DDGS) at five different combination ratios (hulless barley to pure wheat DDGS: 100:0, 75:25, 50:50, 25:75, 0:100). The molecular spectroscopic parameters assessed included: lignin biopolymer molecular spectra profile (peak area and height, region and baseline: ca. 1539-1504 cm⁻¹); structural carbohydrate (STCHO, peaks area region and baseline: ca. 1485-1186 cm⁻¹) mainly associated with hemi- and cellulosic compounds; cellulosic materials peak area (centered at ca. 1240 cm⁻¹ with region and baseline: ca. 1272-1186 cm⁻¹); total carbohydrate (CHO, peaks area region and baseline: ca. 1186-946 cm⁻¹). The results showed that the functional groups (biomolecular, biopolymer) in the combined feeds are sensitive to the changes of carbohydrate chemical and nutrient profiles. The changes of the CHO molecular spectroscopic features in the combined feeds were highly correlated with CHO chemical profiles, CHO subfractions, in situ CHO rumen degradation kinetics and fermentable organic matter supply. Further study is needed to investigate the possibility of using CHO molecular spectral features as a predictor to estimate nutrient availability in combined feeds for animals and quantify their relationship. Copyright © 2012 Elsevier B.V. All rights reserved.

  12. Online prediction of respiratory motion: multidimensional processing with low-dimensional feature learning

    NASA Astrophysics Data System (ADS)

    Ruan, Dan; Keall, Paul

    2010-06-01

    Accurate real-time prediction of respiratory motion is desirable for effective motion management in radiotherapy for lung tumor targets. Recently, nonparametric methods have been developed and their efficacy in predicting one-dimensional respiratory-type motion has been demonstrated. To exploit the correlation among various coordinates of the moving target, it is natural to extend the 1D method to multidimensional processing. However, the amount of learning data required for such extension grows exponentially with the dimensionality of the problem, a phenomenon known as the 'curse of dimensionality'. In this study, we investigate a multidimensional prediction scheme based on kernel density estimation (KDE) in an augmented covariate-response space. To alleviate the 'curse of dimensionality', we explore the intrinsic lower dimensional manifold structure and utilize principal component analysis (PCA) to construct a proper low-dimensional feature space, where kernel density estimation is feasible with the limited training data. Interestingly, the construction of this lower dimensional representation reveals a useful decomposition of the variations in respiratory motion into the contribution from semiperiodic dynamics and that from the random noise, as it is only sensible to perform prediction with respect to the former. The dimension reduction idea proposed in this work is closely related to feature extraction used in machine learning, particularly support vector machines. This work points out a pathway in processing high-dimensional data with limited training instances, and this principle applies well beyond the problem of target-coordinate-based respiratory-based prediction. A natural extension is prediction based on image intensity directly, which we will investigate in the continuation of this work. We used 159 lung target motion traces obtained with a Synchrony respiratory tracking system. Prediction performance of the low-dimensional feature learning
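
    With a Gaussian product kernel, the conditional mean implied by a kernel density estimate over the joint covariate-response space reduces to a kernel-weighted average of past responses (the Nadaraya-Watson estimator). The sketch below illustrates only that prediction step on a synthetic periodic trace. It omits the PCA dimension-reduction step described in the record, and the function names, window length, horizon and bandwidth are assumptions chosen for illustration rather than the authors' settings.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalizer cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def kernel_predict(history, lag, horizon, bandwidth):
    """Predict the sample `horizon` steps after the end of `history`.

    Covariate: a window of `lag` consecutive samples.
    Response: the sample `horizon` steps after the end of that window.
    """
    query = history[-lag:]
    num = 0.0
    den = 0.0
    last_start = len(history) - lag - horizon
    for start in range(last_start + 1):
        window = history[start:start + lag]
        response = history[start + lag + horizon - 1]
        # Euclidean distance between the past window and the current window.
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        weight = gaussian_kernel(dist / bandwidth)
        num += weight * response
        den += weight
    if den == 0.0:
        raise ValueError("no training window is close enough to the query")
    return num / den


# Synthetic breathing-like trace with a period of 20 samples (not patient data).
trace = [math.sin(2 * math.pi * t / 20) for t in range(200)]
predicted = kernel_predict(trace, lag=5, horizon=3, bandwidth=0.05)
actual = math.sin(2 * math.pi * 202 / 20)
print(predicted, actual)
```

    On a perfectly periodic trace the nearest past windows dominate the weights, so the estimate closely matches the true future sample; real traces are noisier, which is where the choice of bandwidth and feature space matters.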

  13. Predicting and explaining the movement of mesoscale oceanographic features using CLIPS

    NASA Technical Reports Server (NTRS)

    Bridges, Susan; Chen, Liang-Chun; Lybanon, Matthew

    1994-01-01

    The Naval Research Laboratory has developed an oceanographic expert system that describes the evolution of mesoscale features in the Gulf Stream region of the northwest Atlantic Ocean. These features include the Gulf Stream current and the warm and cold core eddies associated with the Gulf Stream. An explanation capability was added to the eddy prediction component of the expert system in order to allow the system to justify the reasoning process it uses to make predictions. The eddy prediction and explanation components of the system have recently been redesigned and translated from OPS83 to C and CLIPS and the new system is called WATE (Where Are Those Eddies). The new design has improved the system's readability, understandability and maintainability and will also allow the system to be incorporated into the Semi-Automated Mesoscale Analysis System which will eventually be embedded into the Navy's Tactical Environmental Support System, Third Generation, TESS(3).

  14. EEG background features that predict outcome in term neonates with hypoxic ischaemic encephalopathy: A structured review.

    PubMed

    Awal, Md Abdul; Lai, Melissa M; Azemi, Ghasem; Boashash, Boualem; Colditz, Paul B

    2016-01-01

    Hypoxic ischaemic encephalopathy (HIE) is a significant cause of mortality and morbidity in the term infant. Electroencephalography (EEG) is a useful tool in the assessment of newborns with HIE. This systematic review of published literature identifies those background features of EEG in term neonates with HIE that best predict neurodevelopmental outcome. A literature search was conducted using the PubMed, EMBASE and CINAHL databases from January 1960 to April 2014. Studies included in the review described recorded EEG background features, neurodevelopmental outcomes at a minimum age of 12 months and were published in English. Pooled sensitivities and specificities of EEG background features were calculated and meta-analyses were performed for each background feature. Of the 860 articles generated by the initial search strategy, 52 studies were identified as potentially relevant. Twenty-one studies were excluded as they did not distinguish between different abnormal background features, leaving 31 studies from which data were extracted for the meta-analysis. The most promising neonatal EEG features are: burst suppression (sensitivity 0.87 [95% CI (0.78-0.92)]; specificity 0.82 [95% CI (0.72-0.88)]), low voltage (sensitivity 0.92 [95% CI (0.72-0.97)]; specificity 0.99 [95% CI (0.88-1.0)]), and flat trace (sensitivity 0.78 [95% CI (0.58-0.91)]; specificity 0.99 [95% CI (0.88-1.0)]). Burst suppression, low voltage and flat trace in the EEG of term neonates with HIE most accurately predict long term neurodevelopmental outcome. This structured review and meta-analysis provides quality evidence of the background EEG features that best predict neurodevelopmental outcome. Copyright © 2015 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  15. Genomic Signal Processing: Predicting Basic Molecular Biological Principles

    NASA Astrophysics Data System (ADS)

    Alter, Orly

    2005-03-01

    Advances in high-throughput technologies enable acquisition of different types of molecular biological data, monitoring the flow of biological information as DNA is transcribed to RNA, and RNA is translated to proteins, on a genomic scale. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment and drug development. Recently we described data-driven models for genome-scale molecular biological data, which use singular value decomposition (SVD) and the comparative generalized SVD (GSVD). Now we describe an integrative data-driven model, which uses pseudoinverse projection (1). We also demonstrate the predictive power of these matrix algebra models (2). The integrative pseudoinverse projection model formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. The mathematical variables of this integrative model, the pseudoinverse correlation patterns that are uncovered in the data, represent independent processes and corresponding cellular states (such as observed genome-wide effects of known regulators or transcription factors, the biological components of the cellular machinery that generate the genomic signals, and measured samples in which these regulators or transcription factors are over- or underactive). Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis, and gives a global picture of the correlations and possibly also causal coordination of

  16. Quantitative structure-property relationships for predicting Henry's law constant from molecular structure.

    PubMed

    Dearden, John C; Schüürmann, Gerrit

    2003-08-01

    Various models are available for the prediction of Henry's law constant (H) or the air-water partition coefficient (Kaw), its dimensionless counterpart. Incremental methods are based on structural features such as atom types, bond types, and local structural environments; other regression models employ physicochemical properties, structural descriptors such as connectivity indices, and descriptors reflecting the electronic structure. There are also methods to calculate H from the ratio of vapor pressure (p_v) and water solubility (S_w) that in turn can be estimated from molecular structure, and quantum chemical continuum-solvation models to predict H via the solvation-free energy (ΔG_s). This review is confined to methods that calculate H from molecular structure without experimental information and covers more than 40 methods published in the last 26 years. For a subset of eight incremental methods and four continuum-solvation models, a comparative analysis of their prediction performance is made using a test set of 700 compounds that includes a significant number of more complex and drug-like chemical structures. The results reveal substantial differences in the application range as well as in the prediction capability, a general decrease in prediction performance with decreasing H, and surprisingly large individual prediction errors, which are particularly striking for some quantum chemical schemes. The overall best-performing method appears to be the bond contribution method as implemented in the HENRYWIN software package, yielding a predictive squared correlation coefficient (q²) of 0.87 and a standard error of 1.03 log units for the test set.
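
    The ratio route mentioned in this record can be written out directly: H is the vapor pressure divided by the water solubility, and the dimensionless Kaw follows by dividing H by RT. The sketch below assumes SI units (Pa and mol/m³) and a temperature of 298.15 K; the function names and the example compound values are hypothetical, not data from the review.

```python
R_GAS = 8.314462618  # molar gas constant, J/(mol*K)


def henry_constant(vapor_pressure_pa, solubility_mol_per_m3):
    """Henry's law constant H in Pa*m^3/mol from vapor pressure and water solubility."""
    if solubility_mol_per_m3 <= 0:
        raise ValueError("solubility must be positive")
    return vapor_pressure_pa / solubility_mol_per_m3


def air_water_partition(h_pa_m3_per_mol, temperature_k=298.15):
    """Dimensionless air-water partition coefficient Kaw = H / (R*T)."""
    return h_pa_m3_per_mol / (R_GAS * temperature_k)


# Hypothetical compound: vapor pressure 1000 Pa, water solubility 10 mol/m^3.
h = henry_constant(1000.0, 10.0)
kaw = air_water_partition(h)
print(h, kaw)
```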

  17. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding.

    PubMed

    Chiu, Tsu-Pei; Comoglio, Federico; Zhou, Tianyin; Yang, Lin; Paro, Renato; Rohs, Remo

    2016-04-15

    DNAshapeR predicts DNA shape features in an ultra-fast, high-throughput manner from genomic sequencing data. The package takes either nucleotide sequence or genomic coordinates as input and generates various graphical representations for visualization and further analysis. DNAshapeR further encodes DNA sequence and shape features as user-defined combinations of k-mer and DNA shape features. The resulting feature matrices can be readily used as input of various machine learning software packages for further modeling studies. The DNAshapeR software package was implemented in the statistical programming language R and is freely available through the Bioconductor project at https://www.bioconductor.org/packages/devel/bioc/html/DNAshapeR.html and at the GitHub developer site, http://tsupeichiu.github.io/DNAshapeR/. Contact: rohs@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
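
    As a rough illustration of the k-mer part of such a feature encoding, the sketch below one-hot encodes every overlapping k-mer of a DNA sequence into a flat feature vector. It is written in Python rather than R, it is not DNAshapeR's API, and it does not compute any DNA shape features; the function name and the handling of ambiguous bases are assumptions made for this example.

```python
from itertools import product


def kmer_one_hot(sequence, k=1):
    """One-hot encode each overlapping k-mer of `sequence` into one flat list."""
    alphabet = "ACGT"
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    features = []
    seq = sequence.upper()
    for pos in range(len(seq) - k + 1):
        block = [0] * len(kmers)
        kmer = seq[pos:pos + k]
        # Assumption: k-mers containing ambiguous bases (e.g. N) stay all-zero.
        if kmer in index:
            block[index[kmer]] = 1
        features.extend(block)
    return features


# Each position contributes 4**k columns; a 4-mer sequence with k=1 gives 16.
print(kmer_one_hot("ACGT", k=1))
```

    Vectors built this way have a fixed length for sequences of equal length, which is what lets them be stacked into a feature matrix for a downstream model.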

  18. Feature extraction using molecular planes for fuzzy relational clustering of a flexible dopamine reuptake inhibitor.

    PubMed

    Banerjee, Amit; Misra, Milind; Pai, Deepa; Shih, Liang-Yu; Woodley, Rohan; Lu, Xiang-Jun; Srinivasan, A R; Olson, Wilma K; Davé, Rajesh N; Venanzi, Carol A

    2007-01-01

    Six rigid-body parameters (Shift, Slide, Rise, Tilt, Roll, Twist) are commonly used to describe the relative displacement and orientation of successive base pairs in a nucleic acid structure. The present work adapts this approach to describe the relative displacement and orientation of any two planes in an arbitrary molecule, specifically planes which contain important pharmacophore elements. Relevant code from the 3DNA software package (Nucleic Acids Res. 2003, 31, 5108-5121) was generalized to treat molecular fragments other than DNA bases as input for the calculation of the corresponding rigid-body (or "planes") parameters. These parameters were used to construct feature vectors for a fuzzy relational clustering study of over 700 conformations of a flexible analogue of the dopamine reuptake inhibitor, GBR 12909. Several cluster validity measures were used to determine the optimal number of clusters. Translational (Shift, Slide, Rise) rather than rotational (Tilt, Roll, Twist) features dominate clustering based on planes that are relatively far apart, whereas both types of features are important to clustering when the pair of planes are close by. This approach was able to classify the data set of molecular conformations into groups and to identify representative conformers for use as template conformers in future Comparative Molecular Field Analysis studies of GBR 12909 analogues. The advantage of using the planes parameters, rather than the combination of atomic coordinates and angles between molecular planes used in our previous fuzzy relational clustering of the same data set (J. Chem. Inf. Model. 2005, 45, 610-623), is that the present clustering results are independent of molecular superposition and the technique is able to identify clusters in the molecule considered as a whole. This approach is easily generalizable to any two planes in any molecule.

  19. Phyllodes Tumor of the Breast: Histopathologic Features, Differential Diagnosis, and Molecular/Genetic Updates.

    PubMed

    Zhang, Yanhong; Kleer, Celina G

    2016-07-01

    Context: Phyllodes tumor (PT) of the breast is a rare fibroepithelial neoplasm with risks of local recurrence and uncommon metastases. The classification proposed by the World Health Organization for PTs into benign, borderline, and malignant is based on a combination of several histologic features. The differential diagnosis between PT and fibroadenoma and the histologic grading of PT remain challenging. In addition, the molecular pathogenesis of PT is largely unknown. Objective: To provide an updated overview of pathologic features, diagnostic terminology, and molecular alterations of PT. Data Sources: Current English literature related to PT of the breast. Conclusions: Phyllodes tumor shows a wide spectrum of morphology. There are no clearly distinct boundaries between PT and fibroadenoma. Strict histologic assessment of a combination of histologic features with classification can help to achieve the correct diagnosis and provide useful clinical information. The genomic landscapes of PT generated from genomic sequencing provide insights into the molecular pathogenesis of PT and help to improve diagnostic accuracy and identify potential drug targets in malignant PT.

  20. Computer-aided breast MR image feature analysis for prediction of tumor response to chemotherapy

    SciTech Connect

    Aghaei, Faranak; Tan, Maxine; Liu, Hong; Zheng, Bin; Hollingsworth, Alan B.; Qian, Wei

    2015-11-15

    Purpose: To identify a new clinical marker based on quantitative kinetic image features analysis and assess its feasibility to predict tumor response to neoadjuvant chemotherapy. Methods: The authors assembled a dataset involving breast MR images acquired from 68 cancer patients before undergoing neoadjuvant chemotherapy. Among them, 25 patients had complete response (CR) and 43 had partial and nonresponse (NR) to chemotherapy based on the response evaluation criteria in solid tumors. The authors developed a computer-aided detection scheme to segment breast areas and tumors depicted on the breast MR images and computed a total of 39 kinetic image features from both tumor and background parenchymal enhancement regions. The authors then applied and tested two approaches to classify between CR and NR cases. The first one analyzed each individual feature and applied a simple feature fusion method that combines classification results from multiple features. The second approach tested an attribute selected classifier that integrates an artificial neural network (ANN) with a wrapper subset evaluator, which was optimized using a leave-one-case-out validation method. Results: In the pool of 39 features, 10 yielded relatively higher classification performance with the areas under receiver operating characteristic curves (AUCs) ranging from 0.61 to 0.78 to classify between CR and NR cases. Using a feature fusion method, the maximum AUC = 0.85 ± 0.05. Using the ANN-based classifier, AUC value significantly increased to 0.96 ± 0.03 (p < 0.01). Conclusions: This study demonstrated that quantitative analysis of kinetic image features computed from breast MR images acquired prechemotherapy has potential to generate a useful clinical marker in predicting tumor response to chemotherapy.
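
    The AUC values in this record summarize how well a score separates the two response groups. One standard way to compute it, sketched below, is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the scores are hypothetical and are not the study's classifier outputs.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correctly ranked pair
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for complete responders and for the other cases.
cr_scores = [0.9, 0.8, 0.4]
nr_scores = [0.7, 0.3]
print(auc_from_scores(cr_scores, nr_scores))
```

    The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of this size; larger datasets would normally use a rank-based computation instead.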

  1. Sharp landscape features and their role in predictive hydrology and geomorphology

    NASA Astrophysics Data System (ADS)

    Belmont, P.; Foufoula, E.; Passalacqua, P.

    2012-12-01

    Sharp topographic features often represent critical boundaries, or discontinuities, in hydrologic and geomorphic processes. Many such features are found in the proximity of actively evolving river channels (e.g., small knickpoints, steep channel banks, natural levees, scroll bars, and floodplain microtopography). While these features are often overlooked in hydro-geomorphic modeling, they can be used as indicators of channel dynamics. The increasing availability and quality of high-resolution topography data provides new opportunities to utilize these sharp features to interpret geomorphic processes and identify critical process-boundaries. However, sophisticated and automated techniques are needed for delineation and measurement of these sharp features over spatially extensive areas (i.e., entire channel-floodplain networks). Further, these features occur at scales much smaller than the grid scale of predictive hydrologic and morphodynamic models, raising the need for sub-grid scale parameterizations, or closures. In this work we present such techniques and use the Minnesota River Basin (MRB) as a prototype system to investigate the distinct assemblages of sharp features that exist in different geomorphic environments, connect them to the processes responsible for their formation, and propose ways for incorporating them in hydro-geomorphologic modeling. The MRB is a predominantly agricultural watershed with pervasive human modifications, an accelerating hydrologic cycle, a uniquely dynamic geologic history, and severe impairments for sediment and eutrophication. The MRB channel-floodplain network exhibits an exceptionally broad range of geomorphic environments, including rapidly meandering, incising, and aggrading reaches, making it an ideal location to study the linkages between form and process. 
Specific challenges are discussed in deriving sub-grid scale closures that implicitly account for these sharp features and developments needed for increased prediction

  2. Learning the High-Dimensional Immunogenomic Features That Predict Public and Private Antibody Repertoires.

    PubMed

    Greiff, Victor; Weber, Cédric R; Palme, Johannes; Bodenhofer, Ulrich; Miho, Enkelejda; Menzel, Ulrike; Reddy, Sai T

    2017-09-18

    Recent studies have revealed that immune repertoires contain a substantial fraction of public clones, which may be defined as Ab or TCR clonal sequences shared across individuals. It has remained unclear whether public clones possess predictable sequence features that differentiate them from private clones, which are believed to be generated largely stochastically. This knowledge gap represents a lack of insight into the shaping of immune repertoire diversity. Leveraging a machine learning approach capable of capturing the high-dimensional compositional information of each clonal sequence (defined by CDR3), we detected predictive public clone and private clone-specific immunogenomic differences concentrated in CDR3's N1-D-N2 region, which allowed the prediction of public and private status with 80% accuracy in humans and mice. Our results unexpectedly demonstrate that public, as well as private, clones possess predictable high-dimensional immunogenomic features. Our support vector machine model could be trained effectively on large published datasets (3 million clonal sequences) and was sufficiently robust for public clone prediction across individuals and studies prepared with different library preparation and high-throughput sequencing protocols. In summary, we have uncovered the existence of high-dimensional immunogenomic rules that shape immune repertoire diversity in a predictable fashion. Our approach may pave the way for the construction of a comprehensive atlas of public mouse and human immune repertoires with potential applications in rational vaccine design and immunotherapeutics. Copyright © 2017 by The American Association of Immunologists, Inc.

  3. Systems Medicine: from molecular features and models to the clinic in COPD

    PubMed Central

    2014-01-01

    Background and hypothesis: Chronic Obstructive Pulmonary Disease (COPD) patients are characterized by heterogeneous clinical manifestations and patterns of disease progression. Two major factors that can be used to identify COPD subtypes are muscle dysfunction/wasting and co-morbidity patterns. We hypothesized that COPD heterogeneity is in part the result of complex interactions between several genes and pathways. We explored the possibility of using a Systems Medicine approach to identify such pathways, as well as to generate predictive computational models that may be used in clinical practice. Objective and method: Our overarching goal is to generate clinically applicable predictive models that characterize COPD heterogeneity through a Systems Medicine approach. To this end we have developed a general framework, consisting of three steps/objectives: (1) feature identification, (2) model generation and statistical validation, and (3) application and validation of the predictive models in the clinical scenario. We used muscle dysfunction and co-morbidity as test cases for this framework. Results: In the study of muscle wasting we identified relevant features (genes) by a network analysis and generated predictive models that integrate mechanistic and probabilistic models. This allowed us to characterize muscle wasting as a general de-regulation of pathway interactions. In the co-morbidity analysis we identified relevant features (genes/pathways) by the integration of gene-disease and disease-disease associations. We further present a detailed characterization of co-morbidities in COPD patients that was implemented into a predictive model. In both use cases we were able to achieve predictive modeling but we also identified several key challenges, the most pressing being the validation and implementation into actual clinical practice. 
Conclusions: The results confirm the potential of the Systems Medicine approach to study complex diseases and generate clinically relevant

  4. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features.

    PubMed

    Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua

    2017-02-01

    Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to protein transportation, organelle localization, and function, and therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is urgently needed. In this study, a support vector machine (SVM)-based method was proposed that integrates PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and a Matthews Correlation Coefficient of 0.9773 on a benchmark dataset. The result indicates the efficiency and accuracy of our method in prediction of palmitoylation sites based on protein sequences.
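
    The k-mer amino acid composition mentioned above can be illustrated with a minimal sketch. The value of k is not recoverable from this record, so `k=2` is only an assumption, and the function name `kmer_composition` and the toy sequence are hypothetical; this is not the authors' feature extraction code.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence, k=2):
    """Return the normalized frequency of every possible k-mer in `sequence`.

    Illustrative only: k=2 is an assumed value, not taken from the abstract.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    # Skip windows that contain non-standard residue codes (e.g. X or B).
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(AMINO_ACIDS, repeat=k)
    }


# Toy peptide (hypothetical); it yields six overlapping 2-mers.
features = kmer_composition("MCCLKAC", k=2)
print(len(features))   # one dimension per possible 2-mer
print(features["CC"])  # fraction of windows equal to "CC"
```

    A vector like this would be one block of a larger feature vector; the other blocks named in the abstract (PSI-BLAST profile, physicochemical properties, pseudo AACs) are not sketched here.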

  5. Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features

    PubMed Central

    Stiglic, Gregor; Povalej Brzan, Petra; Fijacko, Nino; Wang, Fei; Delibasic, Boris; Kalousis, Alexandros; Obradovic, Zoran

    2015-01-01

    Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression. PMID:26645087
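
    The final step described above, where an L1 (lasso) penalty drives uninformative coefficients to exactly zero, can be sketched in plain Python. This is a minimal illustration using proximal gradient descent with soft-thresholding; it does not reproduce the group lasso interaction discovery or the authors' software, and the function names, the penalty strength `lam`, and the toy data are all assumptions.

```python
import math


def soft_threshold(value, amount):
    """Shrink `value` toward zero by `amount`; values inside the band become 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, step=0.5, iterations=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    Hypothetical helper for illustration; the intercept is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad_b += err / n
            for j, xj in enumerate(row):
                grad_w[j] += err * xj / n
        b -= step * grad_b
        # Gradient step followed by the L1 proximal (soft-threshold) step.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data (made up): feature 0 is associated with the outcome,
# feature 1 is balanced noise. Features are coded as -1/+1.
X, y = [], []
for x0, label, count in [(1, 1, 4), (1, 0, 2), (-1, 0, 4), (-1, 1, 2)]:
    for i in range(count):
        X.append([x0, 1 if i % 2 == 0 else -1])
        y.append(label)

weights, intercept = fit_lasso_logistic(X, y)
kept = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, kept)
```

    Only features with non-zero coefficients remain in the model, which is the sense in which the lasso step yields a more compact feature set.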

  6. A Combination of Molecular Markers and Clinical Features Improve the Classification of Pancreatic Cysts

    PubMed Central

    Springer, Simeon; Wang, Yuxuan; Molin, Marco Dal; Masica, David L.; Jiao, Yuchen; Kinde, Isaac; Blackford, Amanda; Raman, Siva P.; Wolfgang, Christopher L.; Tomita, Tyler; Niknafs, Noushin; Douville, Christopher; Ptak, Janine; Dobbyn, Lisa; Allen, Peter J.; Klimstra, David S.; Schattner, Mark A.; Schmidt, C. Max; Yip-Schneider, Michele; Cummings, Oscar W.; Brand, Randall E.; Zeh, Herbert J.; Singhi, Aatur D.; Scarpa, Aldo; Salvia, Roberto; Malleo, Giuseppe; Zamboni, Giuseppe; Falconi, Massimo; Jang, Jin-Young; Kim, Sun-Whe; Kwon, Wooil; Hong, Seung-Mo; Song, Ki-Byung; Kim, Song Cheol; Swan, Niall; Murphy, Jean; Geoghegan, Justin; Brugge, William; Fernandez-Del Castillo, Carlos; Mino-Kenudson, Mari; Schulick, Richard; Edil, Barish H.; Adsay, Volkan; Paulino, Jorge; van Hooft, Jeanin; Yachida, Shinichi; Nara, Satoshi; Hiraoka, Nobuyoshi; Yamao, Kenji; Hijioka, Susuma; van der Merwe, Schalk; Goggins, Michael; Canto, Marcia Irene; Ahuja, Nita; Hirose, Kenzo; Makary, Martin; Weiss, Matthew J.; Cameron, John; Pittman, Meredith; Eshleman, James R.; Diaz, Luis A.; Papadopoulos, Nickolas; Kinzler, Kenneth W.; Karchin, Rachel; Hruban, Ralph H.; Vogelstein, Bert; Lennon, Anne Marie

    2016-01-01

    Background & Aims: The management of pancreatic cysts poses challenges to both patients and their physicians. We investigated whether a combination of molecular markers and clinical information could improve the classification of pancreatic cysts and management of patients. Methods: We performed a multi-center, retrospective study of 130 patients with resected pancreatic cystic neoplasms (12 serous cystadenomas, 10 solid-pseudopapillary neoplasms, 12 mucinous cystic neoplasms, and 96 intraductal papillary mucinous neoplasms). Cyst fluid was analyzed to identify subtle mutations in genes known to be mutated in pancreatic cysts (BRAF, CDKN2A, CTNNB1, GNAS, KRAS, NRAS, PIK3CA, RNF43, SMAD4, TP53 and VHL); to identify loss of heterozygosity at CDKN2A, RNF43, SMAD4, TP53, and VHL tumor suppressor loci; and to identify aneuploidy. The analyses were performed using specialized technologies for implementing and interpreting massively parallel sequencing data acquisition. An algorithm was used to select markers that could classify cyst type and grade. The accuracy of the molecular markers was compared with that of clinical markers and with that of a combination of molecular and clinical markers. Results: We identified molecular markers and clinical features that classified cyst type with 90%–100% sensitivity and 92%–98% specificity. The molecular marker panel correctly identified 67 of the 74 patients who did not require surgery, and could therefore reduce the number of unnecessary operations by 91%. Conclusions: We identified a panel of molecular markers and clinical features that show promise for the accurate classification of cystic neoplasms of the pancreas and identification of cysts that require surgery. PMID:26253305
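
    The 91% figure follows directly from the two counts reported in the Results. A quick arithmetic check, with variable names chosen only for illustration:

```python
# Counts taken from the abstract: 67 of the 74 patients who did not
# require surgery were correctly identified by the molecular marker panel.
correctly_identified = 67
did_not_need_surgery = 74

reduction_pct = 100 * correctly_identified / did_not_need_surgery
print(f"{reduction_pct:.1f}%")  # 90.5%, which the abstract rounds to 91%
```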

  7. Molecular Markers for Breast Cancer: Prediction on Tumor Behavior

    PubMed Central

    Banin Hirata, Bruna Karina; Oda, Julie Massayo Maeda; Losi Guembarovski, Roberta; Ariza, Carolina Batista; de Oliveira, Carlos Eduardo Coral; Watanabe, Maria Angelica Ehara

    2014-01-01

    Breast cancer is one of the most common cancers, with more than 1,300,000 cases and 450,000 deaths each year worldwide. The development of breast cancer involves a progression through intermediate stages to invasive carcinoma and finally to metastatic disease. Given the variability in clinical progression, the identification of markers that could predict tumor behavior is particularly important in breast cancer. The determination of tumor markers is a useful tool for clinical management in cancer patients, assisting in diagnosis, staging, evaluation of therapeutic response, detection of recurrence and metastasis, and development of new treatment modalities. In this context, this review aims to discuss the main tumor markers in breast carcinogenesis. The most well-established breast molecular markers with prognostic and/or therapeutic value, such as hormone receptors, HER-2 oncogene, Ki-67, and p53 proteins, and the genes for hereditary breast cancer will be presented. Furthermore, this review presents new molecular targets in breast cancer (CXCR4, caveolin, miRNA, and FOXP3) as promising candidates for future development of effective, targeted therapies with lower toxicity. PMID:24591761

  8. Predicting the Occurrence of Cave-Inhabiting Fauna Based on Features of the Earth Surface Environment.

    PubMed

    Christman, Mary C; Doctor, Daniel H; Niemiller, Matthew L; Weary, David J; Young, John A; Zigler, Kirk S; Culver, David C

    2016-01-01

    One of the most challenging fauna to study in situ is the obligate cave fauna because of the difficulty of sampling. Cave-limited species display patchy and restricted distributions, but it is often unclear whether the observed distribution is a sampling artifact or a true restriction in range. Further, the drivers of the distribution could be local environmental conditions, such as cave humidity, or they could be associated with surface features that are surrogates for cave conditions. If surface features can be used to predict the distribution of important cave taxa, then conservation management is more easily obtained. We examined the hypothesis that the presence of major faunal groups of cave obligate species could be predicted based on features of the earth surface. Georeferenced records of cave obligate amphipods, crayfish, fish, isopods, beetles, millipedes, pseudoscorpions, spiders, and springtails within the area of Appalachian Landscape Conservation Cooperative in the eastern United States (Illinois to Virginia and New York to Alabama) were assigned to 20 x 20 km grid cells. Habitat suitability for these faunal groups was modeled using logistic regression with twenty predictor variables within each grid cell, such as percent karst, soil features, temperature, precipitation, and elevation. Models successfully predicted the presence of a group greater than 65% of the time (mean = 88%) for the presence of single grid cell endemics, and for all faunal groups except pseudoscorpions. The most common predictor variables were latitude, percent karst, and the standard deviation of the Topographic Position Index (TPI), a measure of landscape rugosity within each grid cell. The overall success of these models points to a number of important connections between the surface and cave environments, and some of these, especially soil features and topographic variability, suggest new research directions. These models should prove to be useful tools in predicting the

  9. Predicting the Occurrence of Cave-Inhabiting Fauna Based on Features of the Earth Surface Environment

    PubMed Central

    Doctor, Daniel H.; Niemiller, Matthew L.; Weary, David J.; Young, John A.; Zigler, Kirk S.

    2016-01-01

    One of the most challenging fauna to study in situ is the obligate cave fauna because of the difficulty of sampling. Cave-limited species display patchy and restricted distributions, but it is often unclear whether the observed distribution is a sampling artifact or a true restriction in range. Further, the drivers of the distribution could be local environmental conditions, such as cave humidity, or they could be associated with surface features that are surrogates for cave conditions. If surface features can be used to predict the distribution of important cave taxa, then conservation management is more easily obtained. We examined the hypothesis that the presence of major faunal groups of cave obligate species could be predicted based on features of the earth surface. Georeferenced records of cave obligate amphipods, crayfish, fish, isopods, beetles, millipedes, pseudoscorpions, spiders, and springtails within the area of Appalachian Landscape Conservation Cooperative in the eastern United States (Illinois to Virginia and New York to Alabama) were assigned to 20 x 20 km grid cells. Habitat suitability for these faunal groups was modeled using logistic regression with twenty predictor variables within each grid cell, such as percent karst, soil features, temperature, precipitation, and elevation. Models successfully predicted the presence of a group greater than 65% of the time (mean = 88%) for the presence of single grid cell endemics, and for all faunal groups except pseudoscorpions. The most common predictor variables were latitude, percent karst, and the standard deviation of the Topographic Position Index (TPI), a measure of landscape rugosity within each grid cell. The overall success of these models points to a number of important connections between the surface and cave environments, and some of these, especially soil features and topographic variability, suggest new research directions. These models should prove to be useful tools in predicting the

  10. BLProt: prediction of bioluminescent proteins based on support vector machine and relieff feature selection

    PubMed Central

    2011-01-01

    Background: Bioluminescence is a process in which light is emitted by a living organism. Most creatures that emit light are sea creatures, but some insects, plants, fungi, etc., also emit light. The biotechnological application of bioluminescence has become routine and is considered essential for many medical and general technological advances. Identification of bioluminescent proteins is challenging due to their poor similarity in sequence. So far, no specific method has been reported to identify bioluminescent proteins from primary sequence. Results: In this paper, we propose a novel predictive method that uses a Support Vector Machine (SVM) and physicochemical properties to predict bioluminescent proteins. BLProt was trained using a dataset consisting of 300 bioluminescent proteins and 300 non-bioluminescent proteins, and evaluated on an independent set of 141 bioluminescent proteins and 18202 non-bioluminescent proteins. To identify the most prominent features, we carried out feature selection with three different filter approaches: ReliefF, infogain, and mRMR. We selected five different feature subsets by decreasing the number of features, and the performance of each feature subset was evaluated. Conclusion: BLProt achieves 80% accuracy in training (5-fold cross-validation) and 80.06% accuracy in testing. The performance of BLProt was compared with BLAST and HMM. High prediction accuracy and successful prediction of hypothetical proteins suggest that BLProt can be a useful approach to identify bioluminescent proteins from sequence information, irrespective of their sequence similarity. The BLProt software is available at http://www.inb.uni-luebeck.de/tools-demos/bioluminescent%20protein/BLProt PMID:21849049

  11. Predicting the occurrence of cave-inhabiting fauna based on features of the earth surface environment

    USGS Publications Warehouse

    Christman, Mary C.; Doctor, Daniel H.; Niemiller, Matthew L.; Weary, David J.; Young, John A.; Zigler, Kirk S.; Culver, David C.

    2016-01-01

    One of the most challenging fauna to study in situ is the obligate cave fauna because of the difficulty of sampling. Cave-limited species display patchy and restricted distributions, but it is often unclear whether the observed distribution is a sampling artifact or a true restriction in range. Further, the drivers of the distribution could be local environmental conditions, such as cave humidity, or they could be associated with surface features that are surrogates for cave conditions. If surface features can be used to predict the distribution of important cave taxa, then conservation management is more easily obtained. We examined the hypothesis that the presence of major faunal groups of cave obligate species could be predicted based on features of the earth surface. Georeferenced records of cave obligate amphipods, crayfish, fish, isopods, beetles, millipedes, pseudoscorpions, spiders, and springtails within the area of Appalachian Landscape Conservation Cooperative in the eastern United States (Illinois to Virginia and New York to Alabama) were assigned to 20 x 20 km grid cells. Habitat suitability for these faunal groups was modeled using logistic regression with twenty predictor variables within each grid cell, such as percent karst, soil features, temperature, precipitation, and elevation. Models successfully predicted the presence of a group greater than 65% of the time (mean = 88%) for the presence of single grid cell endemics, and for all faunal groups except pseudoscorpions. The most common predictor variables were latitude, percent karst, and the standard deviation of the Topographic Position Index (TPI), a measure of landscape rugosity within each grid cell. The overall success of these models points to a number of important connections between the surface and cave environments, and some of these, especially soil features and topographic variability, suggest new research directions. These models should prove to be useful tools in predicting the

  12. Morphological features of IFN-γ–stimulated mesenchymal stromal cells predict overall immunosuppressive capacity

    PubMed Central

    Klinker, Matthew W.; Marklein, Ross A.; Lo Surdo, Jessica L.; Wei, Cheng-Hong

    2017-01-01

    Human mesenchymal stromal cell (MSC) lines can vary significantly in their functional characteristics, and the effectiveness of MSC-based therapeutics may be realized by finding predictive features associated with MSC function. To identify features associated with immunosuppressive capacity in MSCs, we developed a robust in vitro assay that uses principal-component analysis to integrate multidimensional flow cytometry data into a single measurement of MSC-mediated inhibition of T-cell activation. We used this assay to correlate single-cell morphological data with overall immunosuppressive capacity in a cohort of MSC lines derived from different donors and manufacturing conditions. MSC morphology after IFN-γ stimulation significantly correlated with immunosuppressive capacity and accurately predicted the immunosuppressive capacity of MSC lines in a validation cohort. IFN-γ enhanced the immunosuppressive capacity of all MSC lines, and morphology predicted the magnitude of IFN-γ–enhanced immunosuppressive activity. Together, these data identify MSC morphology as a predictive feature of MSC immunosuppressive function. PMID:28283659

  13. MRI texture features as biomarkers to predict MGMT methylation status in glioblastomas

    PubMed Central

    Korfiatis, Panagiotis; Kline, Timothy L.; Coufalova, Lucie; Lachance, Daniel H.; Parney, Ian F.; Carter, Rickey E.; Buckner, Jan C.; Erickson, Bradley J.

    2016-01-01

    Purpose: Imaging biomarker research focuses on discovering relationships between radiological features and histological findings. In glioblastoma patients, methylation of the O6-methylguanine methyltransferase (MGMT) gene promoter is positively correlated with an increased effectiveness of current standard of care. In this paper, the authors investigate texture features as potential imaging biomarkers for capturing the MGMT methylation status of glioblastoma multiforme (GBM) tumors when combined with supervised classification schemes. Methods: A retrospective study of 155 GBM patients with known MGMT methylation status was conducted. Co-occurrence and run length texture features were calculated, and both support vector machines (SVMs) and random forest classifiers were used to predict MGMT methylation status. Results: The best classification system (an SVM-based classifier) had a maximum area under the receiver-operating characteristic (ROC) curve of 0.85 (95% CI: 0.78–0.91) using four texture features (correlation, energy, entropy, and local intensity) originating from the T2-weighted images, yielding at the optimal threshold of the ROC curve, a sensitivity of 0.803 and a specificity of 0.813. Conclusions: Results show that supervised machine learning of MRI texture features can predict MGMT methylation status in preoperative GBM tumors, thus providing a new noninvasive imaging biomarker. PMID:27277032
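
    The reported metrics (area under the ROC curve, and sensitivity and specificity at an operating threshold) can be computed as in the sketch below. The abstract does not say how its optimal threshold was chosen; maximizing Youden's J is used here purely as an assumption, and the labels and scores are made-up toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(labels, scores):
    """Threshold maximizing Youden's J = sensitivity + specificity - 1 (assumed criterion)."""
    return max(sorted(set(scores)),
               key=lambda t: sum(sens_spec(labels, scores, t)))


# Toy classifier outputs: 1 = methylated, 0 = unmethylated (hypothetical).
labels = [1, 1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.2]

auc = roc_auc(labels, scores)
threshold = youden_threshold(labels, scores)
sensitivity, specificity = sens_spec(labels, scores, threshold)
print(auc, threshold, sensitivity, specificity)
```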

  14. Biased ART: a neural architecture that shifts attention toward previously disregarded features following an incorrect prediction.

    PubMed

    Carpenter, Gail A; Gaddam, Sai Chaitanya

    2010-04-01

    Memories in Adaptive Resonance Theory (ART) networks are based on matched patterns that focus attention on those portions of bottom-up inputs that match active top-down expectations. While this learning strategy has proved successful for both brain models and applications, computational examples show that attention to early critical features may later distort memory representations during online fast learning. For supervised learning, biased ARTMAP (bARTMAP) solves the problem of over-emphasis on early critical features by directing attention away from previously attended features after the system makes a predictive error. Small-scale, hand-computed analog and binary examples illustrate key model dynamics. Two-dimensional simulation examples demonstrate the evolution of bARTMAP memories as they are learned online. Benchmark simulations show that featural biasing also improves performance on large-scale examples. One example, which predicts movie genres and is based, in part, on the Netflix Prize database, was developed for this project. Both first principles and consistent performance improvements on all simulation studies suggest that featural biasing should be incorporated by default in all ARTMAP systems. Benchmark datasets and bARTMAP code are available from the CNS Technology Lab Website: http://techlab.bu.edu/bART/. Copyright 2009 Elsevier Ltd. All rights reserved.

  15. Prediction of hot spots in protein interfaces using a random forest model with hybrid features.

    PubMed

    Wang, Lin; Liu, Zhi-Ping; Zhang, Xiang-Sun; Chen, Luonan

    2012-03-01

    Prediction of hot spots in protein interfaces provides crucial information for research on protein-protein interaction and drug design. Existing machine learning methods generally judge whether a given residue is likely to be a hot spot by extracting features only from the target residue. However, hot spots usually form a small cluster of residues which are tightly packed together at the center of the protein interface. With this in mind, we present a novel method to extract hybrid features which incorporate a wide range of information on the target residue and its spatially neighboring residues, i.e. the nearest contact residue in the other face (mirror-contact residue) and the nearest contact residue in the same face (intra-contact residue). We provide a novel random forest (RF) model to effectively integrate these hybrid features for predicting hot spots in protein interfaces. Our method achieves an accuracy (ACC) of 82.4% and a Matthews correlation coefficient (MCC) of 0.482 on the Alanine Scanning Energetics Database, and an ACC of 77.6% and an MCC of 0.429 on the Binding Interface Database. In a comparison study, the performance of our RF model exceeds that of other existing methods, such as Robetta, FOLDEF, KFC, KFC2, MINERVA and HotPoint. Of our hybrid features, three physicochemical features of target residues (mass, polarizability and isoelectric point), the relative side-chain accessible surface area and the average depth index of mirror-contact residues are found to be the main discriminative features in hot spot prediction. We also confirm that hot spots tend to form large contact surface areas between two interacting proteins. Source data and code are available at: http://www.aporc.org/doc/wiki/HotSpot.

  16. NetTurnP – Neural Network Prediction of Beta-turns by Use of Evolutionary Information and Predicted Protein Sequence Features

    PubMed Central

    Petersen, Bent; Lundegaard, Claus; Petersen, Thomas Nordahl

    2010-01-01

    β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprising all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC = 0.50, Qtotal = 82.1%, sensitivity = 75.6%, PPV = 68.8% and AUC = 0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17–0.47. For the type-specific β-turn predictions, only types I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. Conclusion: The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences. PMID:21152409
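
    For reference, the two-class measures quoted above can all be derived from a confusion matrix, as in this minimal sketch. It assumes that Qtotal denotes the overall fraction of correctly classified residues; the function name and the counts are hypothetical and are not NetTurnP results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard two-class measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # fraction of true turns recovered
    ppv = tp / (tp + fp)                       # fraction of predicted turns that are real
    q_total = (tp + tn) / (tp + fp + tn + fn)  # overall accuracy (assumed meaning of Qtotal)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "Qtotal": q_total, "sensitivity": sensitivity, "PPV": ppv}


# Made-up residue counts for illustration only.
for name, value in binary_metrics(tp=75, fp=35, tn=265, fn=25).items():
    print(f"{name}: {value:.3f}")
```

    MCC is informative here because β-turn residues are the minority class, so accuracy alone can look high even for a weak predictor.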

  17. Accurate Prediction of One-Dimensional Protein Structure Features Using SPINE-X.

    PubMed

    Faraggi, Eshel; Kloczkowski, Andrzej

    2017-01-01

    Accurate prediction of protein secondary structure and other one-dimensional structure features is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. SPINE-X is a software package to predict secondary structure as well as accessible surface area and dihedral angles ϕ and ψ. For secondary structure SPINE-X achieves an accuracy of between 81 and 84% depending on the dataset and choice of tests. The Pearson correlation coefficient for accessible surface area prediction is 0.75, and the mean absolute errors for the ϕ and ψ dihedral angles are 20° and 33°, respectively. The source code and Linux executables for SPINE-X are available from Research and Information Systems at http://mamiris.com.

  18. Automatic feature template generation for maximum entropy based intonational phrase break prediction

    NASA Astrophysics Data System (ADS)

    Zhou, You

    2013-03-01

    The prediction of intonational phrase (IP) breaks is important for both the naturalness and intelligibility of Text-to-Speech (TTS) systems. In this paper, we propose a maximum entropy (ME) model to predict IP breaks from unrestricted text, and evaluate various keyword selection approaches in different domains. Furthermore, we design a hierarchical clustering algorithm for automatic generation of feature templates, which minimizes the need for human supervision during ME model training. Results of comparative experiments show that, for the task of IP break prediction, the ME model clearly outperforms classification and regression tree (CART); that log-likelihood ratio is the best scoring measure for keyword selection; and that, compared with manual templates, templates automatically generated by our approach greatly improve the F-score of ME-based IP break prediction and significantly reduce the size of the ME model.

  19. Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data

    PubMed Central

    Grissa, Dhouha; Pétéra, Mélanie; Brandolini, Marion; Napoli, Amedeo; Comte, Blandine; Pujos-Guillot, Estelle

    2016-01-01

    Untargeted metabolomics is a powerful phenotyping tool for better understanding biological mechanisms involved in human pathology development and identifying early predictive biomarkers. This approach, based on multiple analytical platforms, such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that need appropriate analyses to extract the biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is the identification of early predictive markers of clinical outcome, a few years before occurrence, it becomes essential to use appropriate algorithms and workflows to be able to discover subtle effects in this large amount of data. In this context, this work studies a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy focused on evaluating a combination of numeric-symbolic approaches for feature selection with the objective of obtaining the best combination of metabolites producing an effective and accurate predictive model. Relying first on numerical approaches, and especially on machine learning methods (SVM-RFE, RF, RF-RFE) and on univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. As a resampling method, LOOCV was applied to minimize the risk of overfitting. The best k features obtained with different importance scores from the combination of these approaches were compared, which allowed the variable stabilities to be determined using Formal Concept Analysis. 
The results revealed the interest of RF-Gini combined with ANOVA for feature selection as these two complementary methods allowed selecting the 48

  20. Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data.

    PubMed

    Grissa, Dhouha; Pétéra, Mélanie; Brandolini, Marion; Napoli, Amedeo; Comte, Blandine; Pujos-Guillot, Estelle

    2016-01-01

    Untargeted metabolomics is a powerful phenotyping tool for better understanding biological mechanisms involved in human pathology development and identifying early predictive biomarkers. This approach, based on multiple analytical platforms, such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that need appropriate analyses to extract the biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is the identification of early predictive markers of clinical outcome, a few years before occurrence, it becomes essential to use appropriate algorithms and workflows to discover subtle effects among this large amount of data. In this context, this work studies a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy focused on evaluating a combination of numeric-symbolic approaches for feature selection with the objective of obtaining the best combination of metabolites producing an effective and accurate predictive model. Relying first on numerical approaches, and especially on machine learning methods (SVM-RFE, RF, RF-RFE) and on univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. As a resampling method, LOOCV was applied to minimize the risk of overfitting. The best k-features obtained with different importance scores from the combination of these different approaches were compared, which allowed the variable stabilities to be determined using Formal Concept Analysis.
The results revealed the interest of RF-Gini combined with ANOVA for feature selection as these two complementary methods allowed selecting the 48

  1. Using the Personality Assessment Inventory Antisocial and Borderline Features Scales to Predict Behavior Change.

    PubMed

    Penson, Brittany N; Ruchensky, Jared R; Morey, Leslie C; Edens, John F

    2016-11-01

    A substantial amount of research has examined the developmental trajectory of antisocial behavior and, in particular, the relationship between antisocial behavior and maladaptive personality traits. However, research typically has not controlled for previous behavior (e.g., past violence) when examining the utility of personality measures, such as self-report scales of antisocial and borderline traits, in predicting future behavior (e.g., subsequent violence). Examination of the potential interactive effects of measures of both antisocial and borderline traits also is relatively rare in longitudinal research predicting adverse outcomes. The current study utilizes a large sample of youthful offenders (N = 1,354) from the Pathways to Desistance project to examine the separate effects of the Personality Assessment Inventory Antisocial Features (ANT) and Borderline Features (BOR) scales in predicting future offending behavior as well as trends in other negative outcomes (e.g., substance abuse, violence, employment difficulties) over a 1-year follow-up period. In addition, an ANT × BOR interaction term was created to explore the predictive effects of secondary psychopathy. ANT and BOR both explained unique variance in the prediction of various negative outcomes even after controlling for past indicators of those same behaviors during the preceding year.

  2. Prediction of the severity of obstructive sleep apnea by anthropometric features via support vector machine

    PubMed Central

    Liu, Wen-Te; Wu, Hau-tieng; Juang, Jer-Nan; Wisniewski, Adam; Lee, Hsin-Chien; Wu, Dean; Lo, Yu-Lun

    2017-01-01

    Developing an applicable prediction model for obstructive sleep apnea (OSA) is still a challenge in clinical practice. We apply a modern machine learning method, the support vector machine, to establish a prediction model for the severity of OSA. The support vector machine was applied to build a prediction model based on three anthropometric features (neck circumference, waist circumference, and body mass index) and age on the first database. The established model was then validated independently on the second database. The anthropometric features and age were combined to generate powerful predictors for OSA. Following common practice, we predict whether a subject has an apnea-hypopnea index (AHI) greater than 15, and likewise whether it is greater than 30. Dividing by gender and age, for the AHI threshold 15 (respectively 30), the cross validation and testing accuracy for the prediction were 85.3% and 76.7% (respectively 83.7% and 75.5%) in young females, while the negative likelihood ratio for the AHI threshold 15 (respectively 30) for the cross validation and testing were 0.2 and 0.32 (respectively 0.06 and 0.1) in young females. The more accurate results with lower negative likelihood ratio in the younger patients, especially the female subgroup, reflect the potential of the proposed model for screening purposes and the importance of accounting for gender differences and the effects of aging. PMID:28472141
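
    The negative likelihood ratio quoted in this record is conventionally defined as (1 - sensitivity) / specificity. The sketch below shows only that arithmetic; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def negative_likelihood_ratio(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return LR- = (1 - sensitivity) / specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true cases flagged positive
    specificity = tn / (tn + fp)  # fraction of non-cases flagged negative
    return (1.0 - sensitivity) / specificity


# Hypothetical counts for one AHI threshold (illustration only).
lr_neg = negative_likelihood_ratio(tp=45, fn=5, tn=40, fp=10)
print(round(lr_neg, 3))
```

    A smaller value means a negative prediction lowers the odds of disease more strongly, which is why a low LR- is read as favourable for screening.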

  3. Preoperative prediction of central lymph node metastasis in thyroid papillary microcarcinoma using clinicopathologic and sonographic features.

    PubMed

    Kim, Kyung-Eun; Kim, Eun-Kyung; Yoon, Jung Hyun; Han, Kyung Hwa; Moon, Hee Jung; Kwak, Jin Young

    2013-02-01

    The purpose of the present study was to evaluate the clinicopathologic factors and ultrasound (US) features predictive of central lymph node metastasis (LNM) in patients diagnosed with papillary thyroid microcarcinoma (PTMC). From March 2008 to August 2008, the clinicopathologic features and preoperative US features of 483 patients who were diagnosed with conventional PTMC were included. Medical records, US features, and pathology reports of all patients were retrospectively reviewed. Univariate and multivariate analysis was performed to identify clinicopathological prognostic factors associated with central LNM. Odds ratios (OR) with relative 95 % confidence intervals (95 % CI) were calculated to determine the relevance of all potential predictors of central LNM. Among the 483 patients with PTMC, 139 (28.8 %) patients had central LNM. The OR of significant independent factors were 2.055 (95 % CI, 1.137-3.716), 2.075 (95 % CI, 1.27-3.39), 1.71 (95 % CI, 1.073-2.724), and 15.897 (95 % CI, 4.173-60.569), respectively, for bilaterality, larger tumor size (>5 mm), extracapsular invasion, and lateral LNM. No significant association was seen among the US features of PTMC with central LNM. Central lymph node metastasis in patients with PTMC was significantly associated with various clinicopathological factors, including larger tumor size (>5 mm), bilaterality, extracapsular invasion, and lateral LNM. When these features are detected on preoperative US, selective central compartment dissection may be helpful in patients diagnosed with PTMC.
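
    As a rough illustration of how an odds ratio and its 95% confidence interval can be obtained, the sketch below uses the unadjusted log (Woolf) method on a single 2×2 table. This is an assumption made for illustration: the record reports ORs from univariate and multivariate analysis, which this simple calculation does not reproduce, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval on the log scale.

    a: factor present, outcome present    b: factor present, outcome absent
    c: factor absent,  outcome present    d: factor absent,  outcome absent
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical 2x2 counts, not data from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_value:.2f} (95% CI, {ci_low:.2f}-{ci_high:.2f})")
```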

  4. Rough set feature selection and rule induction for prediction of malignancy degree in brain glioma.

    PubMed

    Wang, Xiangyang; Yang, Jie; Jensen, Richard; Liu, Xiaojun

    2006-08-01

    The degree of malignancy in brain glioma is assessed based on magnetic resonance imaging (MRI) findings and clinical data before operation. These data contain irrelevant features, while uncertainties and missing values also exist. Rough set theory can deal with vagueness and uncertainty in data analysis, and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. As feature selection can improve the classification accuracy effectively, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a search method based on particle swarm optimization (PSO) is proposed in this paper and compared with other rough set reduction algorithms. Experimental results show that reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by rough set rule induction algorithm can reveal regular and interpretable patterns of the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.
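
    The idea of rough set attribute reduction can be sketched with the dependency degree, i.e. the fraction of cases whose decision is determined unambiguously by a chosen attribute subset. The example below uses a greedy QuickReduct-style search rather than the PSO-based search proposed in the paper, and the attribute names and toy table are hypothetical.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Fraction of rows lying in equivalence classes with a single decision value."""
    decisions = defaultdict(set)
    sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)  # equivalence class under attrs
        decisions[key].add(row[decision])
        sizes[key] += 1
    consistent = sum(sizes[k] for k, seen in decisions.items() if len(seen) == 1)
    return consistent / len(rows)


def quick_reduct(rows, attrs, decision):
    """Greedily add attributes until the full-set dependency is reached."""
    target = dependency(rows, attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        remaining = [a for a in attrs if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Hypothetical toy decision table (not data from the paper).
table = [
    {"enhancement": "strong", "edema": "heavy", "age": "old", "grade": "high"},
    {"enhancement": "strong", "edema": "light", "age": "young", "grade": "high"},
    {"enhancement": "weak", "edema": "heavy", "age": "old", "grade": "low"},
    {"enhancement": "weak", "edema": "light", "age": "young", "grade": "low"},
]
reduct = quick_reduct(table, ["edema", "age", "enhancement"], "grade")
print(reduct)
```

    Decision rules would then be read off the equivalence classes of the reduct, for example "enhancement = strong implies grade = high" in this toy table.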

  5. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    PubMed

    Choi, Ickwon; Chung, Amy W; Suscovich, Todd J; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J; Francis, Donald; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Alter, Galit; Ackerman, Margaret E; Bailey-Kellogg, Chris

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates.

  6. Predicting solubilisation features of ternary phase diagrams of fully dilutable lecithin linker microemulsions.

    PubMed

    Nouraei, Mehdi; Acosta, Edgar J

    2017-06-01

    Fully dilutable microemulsions (μEs), used to design self-microemulsifying delivery system (SMEDS), are formulated as concentrate solutions containing oil and surfactants, without water. As water is added to dilute these systems, various μEs are produced (water-swollen reverse micelles, bicontinuous systems, and oil-swollen micelles), without the onset of phase separation. Currently, the formulation of dilutable μEs follows a trial-and-error approach that has had limited success. The objective of this work is to introduce the use of the hydrophilic-lipophilic-difference (HLD) and net-average-curvature (NAC) frameworks to predict the solubilisation features of ternary phase diagrams of lecithin-linker μEs and the use of these predictions to guide the formulation of dilutable μEs. To this end, the characteristic curvatures (Cc) of soybean lecithin (surfactant), glycerol monooleate (lipophilic linker) and polyglycerol caprylate (hydrophilic linker) and the equivalent alkane carbon number (EACN) of ethyl caprate (oil) were obtained via phase scans with reference surfactant-oil systems. These parameters were then used to calculate the HLD of lecithin-linkers-ethyl caprate microemulsions. The calculated HLDs were able to predict the phase transitions observed in the phase scans. The NAC was then used to fit and predict phase volumes obtained from salinity phase scans, and to predict the solubilisation features of ternary phase diagrams of the lecithin-linker formulations. The HLD-NAC predictions were reasonably accurate, and indicated that the largest region for dilutable μEs was obtained with slightly negative HLD values. The NAC framework also predicted, and explained, the changes in microemulsion properties along dilution lines. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Predicting visual fixations on video based on low-level visual features.

    PubMed

    Le Meur, Olivier; Le Callet, Patrick; Barba, Dominique

    2007-09-01

    To what extent can a computational model of the bottom-up visual attention predict what an observer is looking at? What is the contribution of the low-level visual features in the attention deployment? To answer these questions, a new spatio-temporal computational model is proposed. This model incorporates several visual features; therefore, a fusion algorithm is required to combine the different saliency maps (achromatic, chromatic and temporal). To quantitatively assess the model's performance, eye movements were recorded while naive observers viewed natural dynamic scenes. Four complementary metrics have been used. In addition, predictions from the proposed model are compared to the predictions from a state of the art model [Itti's model (Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), 1254-1259)] and from three non-biologically plausible models (uniform, flicker and centered models). Regardless of the metric used, the proposed model shows significant improvement over the selected benchmarking models (except the centered model). Conclusions are drawn regarding both the influence of low-level visual features over time and the central bias in an eye tracking experiment.

  8. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features

    PubMed Central

    Yu, Kun-Hsing; Zhang, Ce; Berry, Gerald J.; Altman, Russ B.; Ré, Christopher; Rubin, Daniel L.; Snyder, Michael

    2016-01-01

    Lung cancer is the most prevalent cancer worldwide, and histopathological assessment is indispensable for its diagnosis. However, human evaluation of pathology slides cannot accurately predict patients' prognoses. In this study, we obtain 2,186 haematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA), and 294 additional images from Stanford Tissue Microarray (TMA) Database. We extract 9,879 quantitative image features and use regularized machine-learning methods to select the top features and to distinguish shorter-term survivors from longer-term survivors with stage I adenocarcinoma (P<0.003) or squamous cell carcinoma (P=0.023) in the TCGA data set. We validate the survival prediction framework with the TMA cohort (P<0.036 for both tumour types). Our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Our methods are extensible to histopathology images of other organs. PMID:27527408

  9. Prediction of near-term risk of developing breast cancer using computerized features from bilateral mammograms.

    PubMed

    Sun, Wenqing; Zheng, Bin; Lure, Fleming; Wu, Teresa; Zhang, Jianying; Wang, Benjamin Y; Saltzstein, Edward C; Qian, Wei

    2014-07-01

    Asymmetry of bilateral mammographic tissue density and patterns is a potentially strong indicator of having or developing breast abnormalities or early cancers. The purpose of this study is to design and test the global asymmetry features from bilateral mammograms to predict the near-term risk of women developing detectable high risk breast lesions or cancer in the next sequential screening mammography examination. The image dataset includes mammograms acquired from 90 women who underwent routine screening examinations, all interpreted as negative and not recalled by the radiologists during the original screening procedures. A computerized breast cancer risk analysis scheme using four image processing modules, including image preprocessing, suspicious region segmentation, image feature extraction, and classification was designed to detect and compute image feature asymmetry between the left and right breasts imaged on the mammograms. The highest computed area under curve (AUC) is 0.754±0.024 when applying the new computerized aided diagnosis (CAD) scheme to our testing dataset. The positive predictive value and the negative predictive value were 0.58 and 0.80, respectively. Copyright © 2014 Elsevier Ltd. All rights reserved.
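
    The area under the ROC curve reported here can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes that pairwise estimate directly; the scores are hypothetical and the study's own software is not reproduced.

```python
def auc_pairwise(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (illustration only).
auc_value = auc_pairwise([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.4])
print(auc_value)
```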

  10. Efficacy of computed tomography features in predicting stage III thymic tumors

    PubMed Central

    Shen, Yan; Ye, Jianding; Fang, Wentao; Zhang, Yu; Ye, Xiaodan; Ma, Yonghong; Chen, Libo; Li, Minghua

    2017-01-01

    Accurate assessment of the invasion of intrathoracic structures by stage III thymic tumors assists their appropriate management. The present study aimed to evaluate the efficacy of computed tomography (CT) features for the prediction of stage III thymoma invasion. The pre-operative CT images of 66 patients with confirmed stage III thymic tumors were reviewed retrospectively. The CT features of invasion into the mediastinal pleura, lungs, pericardium and great vessels were analyzed, and their sensitivity, specificity, positive predictive value (PPV), negative predictive value and accuracy were calculated. For mediastinal pleural and pericardial invasion, an absence of space between the tumor and the mediastinal pleura/pericardium with mediastinal pleural/pericardial thickening and pleural/pericardial effusion exhibited a specificity and PPV of 100%, respectively. For lung invasion, a multi-lobular tumor convex to the lung with adjacent lung abnormalities exhibited a specificity and PPV of 91.2 and 81.3%, respectively. For vessel invasion, the specificity and PPV were each 100% for tumors abutting ≥50% of the vessel circumference, and for tumor oppression, deformation and occlusion of the vessel. In conclusion, recognition of the appropriate CT features can serve as a guide to invasion by stage III thymic tumors, and can facilitate the selection of appropriate pre-operative treatment. PMID:28123518
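
    The five measures named in this record all follow from the four cells of a 2×2 table comparing the CT finding with the confirmed invasion status. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),  # invaded cases showing the CT feature
        "specificity": tn / (tn + fp),  # non-invaded cases lacking the feature
        "ppv": tp / (tp + fp),          # feature present and truly invaded
        "npv": tn / (tn + fn),          # feature absent and truly not invaded
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts, not data from the study.
metrics = diagnostic_metrics(tp=30, fp=5, fn=10, tn=55)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    A specificity and PPV of 100%, as reported for some features, correspond to the case fp = 0.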

  11. Quantitative Description of a Protein Fitness Landscape Based on Molecular Features

    PubMed Central

    Meini, María-Rocío; Tomatis, Pablo E.; Weinreich, Daniel M.; Vila, Alejandro J.

    2015-01-01

    Understanding the driving forces behind protein evolution requires the ability to correlate the molecular impact of mutations with organismal fitness. To address this issue, we employ here metallo-β-lactamases as a model system, which are Zn(II) dependent enzymes that mediate antibiotic resistance. We present a study of all the possible evolutionary pathways leading to a metallo-β-lactamase variant optimized by directed evolution. By studying the activity, stability and Zn(II) binding capabilities of all mutants in the preferred evolutionary pathways, we show that this local fitness landscape is strongly conditioned by epistatic interactions arising from the pleiotropic effect of mutations in the different molecular features of the enzyme. Activity and stability assays in purified enzymes do not provide explanatory power. Instead, measurement of these molecular features in an environment resembling the native one provides an accurate description of the observed antibiotic resistance profile. We report that optimization of Zn(II) binding abilities of metallo-β-lactamases during evolution is more critical than stabilization of the protein to enhance fitness. A global analysis of these parameters allows us to connect genotype with fitness based on quantitative biochemical and biophysical parameters. PMID:25767204

  12. Quantitative Description of a Protein Fitness Landscape Based on Molecular Features.

    PubMed

    Meini, María-Rocío; Tomatis, Pablo E; Weinreich, Daniel M; Vila, Alejandro J

    2015-07-01

    Understanding the driving forces behind protein evolution requires the ability to correlate the molecular impact of mutations with organismal fitness. To address this issue, we employ here metallo-β-lactamases as a model system, which are Zn(II) dependent enzymes that mediate antibiotic resistance. We present a study of all the possible evolutionary pathways leading to a metallo-β-lactamase variant optimized by directed evolution. By studying the activity, stability and Zn(II) binding capabilities of all mutants in the preferred evolutionary pathways, we show that this local fitness landscape is strongly conditioned by epistatic interactions arising from the pleiotropic effect of mutations in the different molecular features of the enzyme. Activity and stability assays in purified enzymes do not provide explanatory power. Instead, measurement of these molecular features in an environment resembling the native one provides an accurate description of the observed antibiotic resistance profile. We report that optimization of Zn(II) binding abilities of metallo-β-lactamases during evolution is more critical than stabilization of the protein to enhance fitness. A global analysis of these parameters allows us to connect genotype with fitness based on quantitative biochemical and biophysical parameters.

  13. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

    NASA Astrophysics Data System (ADS)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  14. Self-Adaptive MOEA Feature Selection for Classification of Bankruptcy Prediction Data

    PubMed Central

    Gaspar-Cunha, A.; Recio, G.; Costa, L.; Estébanez, C.

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood of bankruptcy. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of using the self-adaptation of the classifier. PMID:24707201
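
    The two objectives named in this record (fewer features, higher classifier quality) are typically compared through Pareto dominance rather than a single score. The sketch below filters a set of candidate feature subsets down to its non-dominated front; it illustrates only that comparison step, not the evolutionary search or the self-adaptation, and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and better on at least one.

    Each candidate is (n_features, accuracy): fewer features and higher
    accuracy are both preferred.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (n_features, accuracy) pairs for evaluated feature subsets.
evaluated = [(3, 0.80), (5, 0.85), (5, 0.82), (8, 0.85), (10, 0.90)]
front = pareto_front(evaluated)
print(front)
```

    The front leaves the final trade-off between model size and accuracy to the analyst instead of fixing it in advance through a weighted sum.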

  15. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    PubMed

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  16. Prediction of subcellular location apoptosis proteins with ensemble classifier and feature selection.

    PubMed

    Gu, Quan; Ding, Yong-Sheng; Jiang, Xiao-Ying; Zhang, Tong-Liang

    2010-04-01

    Apoptosis proteins have a central role in the development and the homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. The function of an apoptosis protein is closely related to its subcellular location. It is crucial to develop powerful tools to predict apoptosis protein locations, given the rapidly increasing gap between the number of known structural proteins and the number of known sequences in the protein databank. In this study, amino acid pair compositions with different spacings are used to construct feature sets for representing protein samples, and a feature selection approach based on binary particle swarm optimization is applied to extract effective features. An ensemble classifier is used as the prediction engine, in which the basic classifier is the fuzzy K-nearest neighbor. Each basic classifier is trained with a different feature set. Two datasets often used in prior works are selected to validate the performance of the proposed approach. The results obtained by the jackknife test are quite encouraging, indicating that the proposed method might become a potentially useful tool for predicting the subcellular location of apoptosis proteins, or at least can play a complementary role to the existing methods in the relevant areas. The supplementary information and software written in Matlab are available by contacting the corresponding author.

  17. Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.

    PubMed

    Gaspar-Cunha, A; Recio, G; Costa, L; Estébanez, C

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance for creditors and investors in evaluating the likelihood of bankruptcy. As companies become complex, they develop sophisticated schemes to hide their real situation. In turn, estimating the credit risks associated with counterparts or predicting bankruptcy becomes harder. Evolutionary algorithms have been shown to be an excellent tool for dealing with complex problems in finance and economics where a large number of irrelevant features are involved. This paper provides a methodology for feature selection in classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology makes use of self-adaptation by applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different sets of data. The obtained results showed the utility of using the self-adaptation of the classifier.

  18. Protein subcellular localization prediction based on compartment-specific features and structure conservation

    PubMed Central

    Su, Emily Chia-Yu; Chiu, Hua-Sheng; Lo, Allan; Hwang, Jenn-Kang; Sung, Ting-Yi; Hsu, Wen-Lian

    2007-01-01

    Background: Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins. Results: We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machines (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracy of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. In the assessment of the evaluation data sets, our method also attains a prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%. Conclusion: Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant

  19. Prediction of near-term breast cancer risk based on bilateral mammographic feature asymmetry.

    PubMed

    Tan, Maxine; Zheng, Bin; Ramalingam, Pandiyarajan; Gur, David

    2013-12-01

    The objective of this study is to investigate the feasibility of predicting near-term risk of breast cancer development in women after a negative mammography screening examination. It is based on a statistical learning model that combines computerized image features related to bilateral mammographic tissue asymmetry and other clinical factors. A database of negative digital mammograms acquired from 994 women was retrospectively collected. In the next sequential screening examination (12 to 36 months later), 283 women were diagnosed positive for cancer, 349 were recalled for additional diagnostic workups and later proved to be benign, and 362 remain negative (not recalled). From an initial pool of 183 features, we applied a Sequential Forward Floating Selection feature selection method to search for effective features. Using 10 selected features, we developed and trained a support vector machine classification model to compute a cancer risk or probability score for each case. The area under the receiver operating characteristic curve and odds ratios (ORs) were used as the two performance assessment indices. The area under the receiver operating characteristic curve = 0.725 ± 0.018 was obtained for positive and negative/benign case classification. The ORs showed an increasing risk trend with increasing model-generated risk scores (from 1.00 to 12.34, between positive and negative/benign case groups). Regression analysis of ORs also indicated a significant increase trend in slope (P = .006). This study demonstrates that the risk scores computed by a new support vector machine model involving bilateral mammographic feature asymmetry have potential to assist the prediction of near-term risk of women for developing breast cancer. Copyright © 2013 AUR. Published by Elsevier Inc. All rights reserved.

  20. Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico

    USGS Publications Warehouse

    Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.

    2003-01-01

    Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All (100%) 37 sample sites (28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. For 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. These partial burnings may later help avoid a fire

  1. Prediction of antimicrobial peptides based on sequence alignment and feature selection methods.

    PubMed

    Wang, Ping; Hu, Lele; Liu, Guiyou; Jiang, Nan; Chen, Xiaoyun; Xu, Jianyong; Zheng, Wen; Li, Li; Tan, Ming; Chen, Zugen; Song, Hui; Cai, Yu-Dong; Chou, Kuo-Chen

    2011-04-13

    Antimicrobial peptides (AMPs) represent a class of natural peptides that form a part of the innate immune system, and this kind of 'nature's antibiotics' is quite promising for solving the problem of increasing antibiotic resistance. In view of this, it is highly desirable to develop an effective computational method for accurately predicting novel AMPs because it can provide us with more candidates and useful insights for drug design. In this study, a new method for predicting AMPs was implemented by integrating the sequence alignment method and the feature selection method. It was observed that the overall jackknife success rate by the new predictor on a newly constructed benchmark dataset was over 80.23%, and the Matthews correlation coefficient was 0.73, indicating a good prediction. Moreover, an in-depth feature analysis indicated that the results are quite consistent with the previously known knowledge that some amino acids are preferential in AMPs and that these amino acids do play an important role in antimicrobial activity. For the convenience of most experimental scientists who want to use the prediction method without following the mathematical details, a user-friendly web-server is provided at http://amp.biosino.org/.
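
    For readers unfamiliar with the two figures quoted above, the following is a minimal sketch of how a success rate and a Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. It uses the standard MCC definition; the function names and the counts are hypothetical, and the abstract does not state how its own confusion matrix was tallied.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from true/false positive/negative counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def success_rate(tp, tn, fp, fn):
    """Overall fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, for illustration only.
mcc = matthews_corrcoef(tp=90, tn=80, fp=20, fn=10)
acc = success_rate(tp=90, tn=80, fp=20, fn=10)
```

    Unlike the success rate, the MCC ranges from -1 to 1 and stays informative when the two classes differ in size.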

  2. Combinatorial modeling of chromatin features quantitatively predicts DNA replication timing in Drosophila.

    PubMed

    Comoglio, Federico; Paro, Renato

    2014-01-01

    In metazoans, each cell type follows a characteristic, spatio-temporally regulated DNA replication program. Histone modifications (HMs) and chromatin binding proteins (CBPs) are fundamental for a faithful progression and completion of this process. However, no individual HM is strictly indispensable for origin function, suggesting that HMs may act combinatorially in analogy to the histone code hypothesis for transcriptional regulation. In contrast to gene expression, however, the relationship between combinations of chromatin features and DNA replication timing has not yet been demonstrated. Here, by exploiting a comprehensive data collection consisting of 95 CBPs and HMs we investigated their combinatorial potential for the prediction of DNA replication timing in Drosophila using quantitative statistical models. We found that while combinations of CBPs exhibit moderate predictive power for replication timing, pairwise interactions between HMs lead to accurate predictions genome-wide that can be locally further improved by CBPs. Independent feature importance and model analyses led us to derive a simplified, biologically interpretable model of the relationship between chromatin landscape and replication timing reaching 80% of the full model accuracy using six model terms. Finally, we show that pairwise combinations of HMs are able to predict differential DNA replication timing across different cell types. All in all, our work provides support for the existence of combinatorial HM patterns for DNA replication and reveals cell-type independent key elements thereof, whose experimental investigation might contribute to elucidating the regulatory mode of this fundamental cellular process.
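
    As a rough illustration of what "pairwise interactions" between features can mean in a statistical model, the sketch below augments a feature vector with the product of every pair of features. This is one common way to encode interaction terms and is an assumption here; the abstract does not specify the model form. The mark names and signal values are hypothetical placeholders.

```python
from itertools import combinations


def with_pairwise_interactions(features):
    """Return the original features plus one product term per feature pair.

    `features` maps a feature name to its signal at one genomic bin.
    Interaction terms are keyed as "nameA:nameB" in sorted name order.
    """
    augmented = dict(features)
    for a, b in combinations(sorted(features), 2):
        augmented[f"{a}:{b}"] = features[a] * features[b]
    return augmented


# Hypothetical signals for three marks at a single genomic bin.
bin_features = {"H3K4me3": 2.0, "H3K27ac": 3.0, "H4K16ac": 0.5}
model_terms = with_pairwise_interactions(bin_features)
```

    With n input features this produces n(n - 1)/2 extra terms, so a later selection step is usually needed to arrive at a small model such as the six-term one described above.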

  3. Combinatorial Modeling of Chromatin Features Quantitatively Predicts DNA Replication Timing in Drosophila

    PubMed Central

    Comoglio, Federico; Paro, Renato

    2014-01-01

    In metazoans, each cell type follows a characteristic, spatio-temporally regulated DNA replication program. Histone modifications (HMs) and chromatin binding proteins (CBPs) are fundamental for a faithful progression and completion of this process. However, no individual HM is strictly indispensable for origin function, suggesting that HMs may act combinatorially in analogy to the histone code hypothesis for transcriptional regulation. In contrast to gene expression, however, the relationship between combinations of chromatin features and DNA replication timing has not yet been demonstrated. Here, by exploiting a comprehensive data collection consisting of 95 CBPs and HMs we investigated their combinatorial potential for the prediction of DNA replication timing in Drosophila using quantitative statistical models. We found that while combinations of CBPs exhibit moderate predictive power for replication timing, pairwise interactions between HMs lead to accurate predictions genome-wide that can be locally further improved by CBPs. Independent feature importance and model analyses led us to derive a simplified, biologically interpretable model of the relationship between chromatin landscape and replication timing reaching 80% of the full model accuracy using six model terms. Finally, we show that pairwise combinations of HMs are able to predict differential DNA replication timing across different cell types. All in all, our work provides support for the existence of combinatorial HM patterns for DNA replication and reveals cell-type independent key elements thereof, whose experimental investigation might contribute to elucidating the regulatory mode of this fundamental cellular process. PMID:24465194

  4. Six additional systematic lateral cores enhance sextant biopsy prediction of pathological features at radical prostatectomy.

    PubMed

    Singh, Herb; Canto, Eduardo I; Shariat, Shahrokh F; Kadmon, Dov; Miles, Brian J; Wheeler, Thomas M; Slawin, Kevin M

    2004-01-01

    We evaluated the contribution of 6 additional systematically obtained, laterally directed biopsy cores to traditional sextant biopsy for the prediction of final pathological findings in the radical prostatectomy specimen. We studied 178 consecutive patients with no history of prostate biopsy in whom prostate cancer was diagnosed during an initial systematic 12 core biopsy and who subsequently underwent radical prostatectomy. Of the systematic 12 cores we compared the subset of the 6 traditional sextant cores (S6C), the set of 6 laterally directed cores (L6C) and the complete 12 core set, which included the 6 traditional sextant and the 6 laterally directed cores. Biopsy Gleason score, number of positive cores, total cancer length and percent of tumor in the biopsy sets were examined for their ability to predict extracapsular extension, total tumor volume and pathological Gleason score. On univariable analyses the biopsy parameters of the complete 12 core set correlated more strongly with extracapsular extension and total tumor volume than the biopsy parameters of S6C or L6C. On multivariable analyses S6C and L6C were independent predictors of pathological features at prostatectomy. The addition of 6 systematically obtained, laterally directed cores to traditional sextant biopsy improved the ability to predict pathological features at prostatectomy by a statistically and prognostically significant margin. Preoperative nomograms that use data from a full complement of 12 systematic cores, specifying sextant and laterally directed biopsy cores, should demonstrate improved performance in predicting prostatectomy pathology.

  5. SuSPect: Enhanced Prediction of Single Amino Acid Variant (SAV) Phenotype Using Network Features

    PubMed Central

    Yates, Christopher M.; Filippis, Ioannis; Kelley, Lawrence A.; Sternberg, Michael J.E.

    2014-01-01

    Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. PMID:24810707

  6. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features.

    PubMed

    Yates, Christopher M; Filippis, Ioannis; Kelley, Lawrence A; Sternberg, Michael J E

    2014-07-15

    Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. Copyright © 2014. Published by Elsevier Ltd.

  7. Molecular Size and Separability Features of Pea Cell Wall Polysaccharides

    PubMed Central

    Talbott, Lawrence D.; Ray, Peter M.

    1992-01-01

    Relative molecular size distributions of pectic and hemicellulosic polysaccharides of pea (Pisum sativum cv Alaska) third internode primary walls were determined by gel filtration chromatography. Pectic polyuronides have a peak molecular mass of about 1100 kilodaltons, relative to dextran standards. This peak may be partly an aggregate of smaller molecular units, because demonstrable aggregation occurred when samples were concentrated by evaporation. About 86% of the neutral sugars (mostly arabinose and galactose) in the pectin cofractionate with polyuronide in gel filtration chromatography and diethylaminoethyl-cellulose chromatography and appear to be attached covalently to polyuronide chains, probably as constituents of rhamnogalacturonans. However, at least 60% of the wall's arabinan/galactan is not linked covalently to the bulk of its rhamnogalacturonan, either glycosidically or by ester links, but occurs in the hemicellulose fraction, accompanied by negligible uronic acid, and has a peak molecular mass of about 1000 kilodaltons. Xyloglucan, the other principal hemicellulosic polymer, has a peak molecular mass of about 30 kilodaltons (with a secondary, usually minor, peak of approximately 300 kilodaltons) and is mostly not linked glycosidically either to pectic polyuronides or to arabinogalactan. The relatively narrow molecular mass distributions of these polymers suggest mechanisms of co- or postsynthetic control of hemicellulose chain length by the cell. Although the macromolecular features of the mentioned polymers individually agree generally with those shown in the widely disseminated sycamore cell primary wall model, the matrix polymers seem to be associated mostly noncovalently rather than in the covalently interlinked meshwork postulated by that model. Xyloglucan and arabinan/galactan may form tightly and more loosely bound layers, respectively, around the cellulose microfibrils, the outer layer interacting with pectic rhamnogalacturonans that occupy

  8. Prediction of molecular mimicry candidates in human pathogenic bacteria.

    PubMed

    Doxey, Andrew C; McConkey, Brendan J

    2013-08-15

    Molecular mimicry of host proteins is a common strategy adopted by bacterial pathogens to interfere with and exploit host processes. Despite the availability of pathogen genomes, few studies have attempted to predict virulence-associated mimicry relationships directly from genomic sequences. Here, we analyzed the proteomes of 62 pathogenic and 66 non-pathogenic bacterial species, and screened for the top pathogen-specific or pathogen-enriched sequence similarities to human proteins. The screen identified approximately 100 potential mimicry relationships including well-characterized examples among the top-scoring hits (e.g., RalF, internalin, yopH, and others), with about 1/3 of predicted relationships supported by existing literature. Examination of homology to virulence factors, statistically enriched functions, and comparison with literature indicated that the detected mimics target key host structures (e.g., extracellular matrix, ECM) and pathways (e.g., cell adhesion, lipid metabolism, and immune signaling). The top-scoring and most widespread mimicry pattern detected among pathogens consisted of elevated sequence similarities to ECM proteins including collagens and leucine-rich repeat proteins. Unexpectedly, analysis of the pathogen counterparts of these proteins revealed that they have evolved independently in different species of bacterial pathogens from separate repeat amplifications. Thus, our analysis provides evidence for two classes of mimics: complex proteins such as enzymes that have been acquired by eukaryote-to-pathogen horizontal transfer, and simpler repeat proteins that have independently evolved to mimic the host ECM. Ultimately, computational detection of pathogen-specific and pathogen-enriched similarities to host proteins provides insights into potentially novel mimicry-mediated virulence mechanisms of pathogenic bacteria.

  9. Prediction of molecular mimicry candidates in human pathogenic bacteria

    PubMed Central

    Doxey, Andrew C; McConkey, Brendan J

    2013-01-01

    Molecular mimicry of host proteins is a common strategy adopted by bacterial pathogens to interfere with and exploit host processes. Despite the availability of pathogen genomes, few studies have attempted to predict virulence-associated mimicry relationships directly from genomic sequences. Here, we analyzed the proteomes of 62 pathogenic and 66 non-pathogenic bacterial species, and screened for the top pathogen-specific or pathogen-enriched sequence similarities to human proteins. The screen identified approximately 100 potential mimicry relationships including well-characterized examples among the top-scoring hits (e.g., RalF, internalin, yopH, and others), with about 1/3 of predicted relationships supported by existing literature. Examination of homology to virulence factors, statistically enriched functions, and comparison with literature indicated that the detected mimics target key host structures (e.g., extracellular matrix, ECM) and pathways (e.g., cell adhesion, lipid metabolism, and immune signaling). The top-scoring and most widespread mimicry pattern detected among pathogens consisted of elevated sequence similarities to ECM proteins including collagens and leucine-rich repeat proteins. Unexpectedly, analysis of the pathogen counterparts of these proteins revealed that they have evolved independently in different species of bacterial pathogens from separate repeat amplifications. Thus, our analysis provides evidence for two classes of mimics: complex proteins such as enzymes that have been acquired by eukaryote-to-pathogen horizontal transfer, and simpler repeat proteins that have independently evolved to mimic the host ECM. Ultimately, computational detection of pathogen-specific and pathogen-enriched similarities to host proteins provides insights into potentially novel mimicry-mediated virulence mechanisms of pathogenic bacteria. PMID:23715053

  10. Energy Minimization of Molecular Features Observed on the (110) Face of Lysozyme Crystals

    NASA Technical Reports Server (NTRS)

    Perozzo, Mary A.; Konnert, John H.; Li, Huayu; Nadarajah, Arunan; Pusey, Marc

    1999-01-01

    Molecular dynamics and energy minimization have been carried out using the program XPLOR to check the plausibility of a model lysozyme crystal surface. The molecular features of the (110) face of lysozyme were observed using atomic force microscopy (AFM). A model of the crystal surface was constructed using the PDB file 193L, and was used to simulate an AFM image. Molecule translations, van der Waals radii, and assumed AFM tip shape were adjusted to maximize the correlation coefficient between the experimental and simulated images. The highest degree of correlation (0.92) was obtained with the molecules displaced over 6 Å from their positions within the bulk of the crystal. The quality of this starting model, the extent of energy minimization, and the correlation coefficient between the final model and the experimental data will be discussed.
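
    The abstract does not say which correlation coefficient was maximized. A minimal sketch, assuming a pixel-wise Pearson correlation between two height maps of equal shape, could look like this; the function name and the tiny 2 × 2 "images" are hypothetical.

```python
import math


def image_correlation(image_a, image_b):
    """Pearson correlation between two images given as nested lists of heights."""
    a = [pixel for row in image_a for pixel in row]
    b = [pixel for row in image_b for pixel in row]
    if len(a) != len(b):
        raise ValueError("images must contain the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical height maps: the simulated image is a scaled copy of the
# experimental one, so the two are perfectly correlated.
experimental = [[1.0, 2.0], [3.0, 4.0]]
simulated = [[2.0, 4.0], [6.0, 8.0]]
r = image_correlation(experimental, simulated)
```

    Because this measure ignores overall scale and offset, it rewards agreement in surface shape rather than in absolute height, which suits a comparison in which tip shape and radii are being tuned.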

  11. Features generated for computational splice-site prediction correspond to functional elements

    PubMed Central

    Dogan, Rezarta Islamaj; Getoor, Lise; Wilbur, W John; Mount, Stephen M

    2007-01-01

    Background: Accurate selection of splice sites during the splicing of precursors to messenger RNA requires both relatively well-characterized signals at the splice sites and auxiliary signals in the adjacent exons and introns. We previously described a feature generation algorithm (FGA) that is capable of achieving high classification accuracy on human 3' splice sites. In this paper, we extend the splice-site prediction to 5' splice sites and explore the generated features for biologically meaningful splicing signals. Results: We present examples from the observed features that correspond to known signals, both core signals (including the branch site and pyrimidine tract) and auxiliary signals (including GGG triplets and exon splicing enhancers). We present evidence that features identified by FGA include splicing signals not found by other methods. Conclusion: Our generated features capture known biological signals in the expected sequence interval flanking splice sites. The method can be easily applied to other species and to similar classification problems, such as tissue-specific regulatory elements, polyadenylation sites, promoters, etc. PMID:17958908

  12. Prediction models for solitary pulmonary nodules based on curvelet textural features and clinical parameters.

    PubMed

    Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua

    2013-01-01

    Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs) which are hard to diagnose using the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistic regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied. The results obtained, respectively, were 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. All in all, it was found that using curvelet-based textural features after dimensionality reduction and using clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.
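
    The two evaluation schemes mentioned above differ in which samples are used for testing. The sketch below shows one simple way to generate 10-fold splits, under the assumption that "back substitution" means evaluating a model on the same samples it was trained on. The function name and the interleaved fold assignment are illustrative choices, not the authors' procedure.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross validation.

    Samples are assigned to folds in an interleaved fashion: sample i
    goes to fold i % k. Each sample is used for testing exactly once.
    """
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Back substitution, by contrast, would use every index for both roles.
all_indices = list(range(25))
splits = list(k_fold_indices(len(all_indices), k=10))
```

    Because back substitution tests on training data, it tends to give the more optimistic figure, which is consistent with the second number in each pair above being the higher one.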

  13. BioCAST/IFCT-1002: epidemiological and molecular features of lung cancer in never-smokers.

    PubMed

    Couraud, Sébastien; Souquet, Pierre-Jean; Paris, Christophe; Dô, Pascal; Doubre, Hélène; Pichon, Eric; Dixmier, Adrien; Monnet, Isabelle; Etienne-Mastroianni, Bénédicte; Vincent, Michel; Trédaniel, Jean; Perrichon, Marielle; Foucher, Pascal; Coudert, Bruno; Moro-Sibilot, Denis; Dansin, Eric; Labonne, Stéphanie; Missy, Pascale; Morin, Franck; Blanché, Hélène; Zalcman, Gérard

    2015-05-01

    Lung cancer in never-smokers (LCINS) (fewer than 100 cigarettes in lifetime) is considered as a distinct entity and harbours an original molecular profile. However, the epidemiological and molecular features of LCINS in Europe remain poorly understood. All consecutive newly diagnosed LCINS patients were included in this prospective observational study by 75 participating centres during a 14-month period. Each patient completed a detailed questionnaire about risk factor exposure. Biomarker and pathological analyses were also collected. We report the main descriptive overall results with a focus on sex differences. 384 patients were included: 65 men and 319 women. 66% had been exposed to passive smoking (significantly higher among women). Definite exposure to main occupational carcinogens was significantly higher in men (35% versus 8% in women). A targetable molecular alteration was found in 73% of patients (without any significant sex difference): EGFR in 51%, ALK in 8%, KRAS in 6%, HER2 in 3%, BRAF in 3%, PI3KCA in less than 1%, and multiple in 2%. We present the largest and most comprehensive LCINS analysis in a European population. Physicians should track occupational exposure in men (35%), and a somatic molecular alteration in both sexes (73%). Copyright ©ERS 2015.

  14. Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features

    PubMed Central

    Peek, Andrew S

    2007-01-01

    Background: RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence utilizing a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities. Results: Eight overall feature mapping methods were compared in their abilities to build SVM regression models that predict published siRNA activities. The primary factors in predictive SVM models are position specific nucleotide compositions. The secondary factors are position independent sequence motifs (N-grams) and guide strand to passenger strand sequence thermodynamics. Finally, the factors that are least contributory but are still predictive of efficacy are measures of intramolecular guide strand secondary structure and target strand secondary structure. Of these, the site of the 5' most base of the guide strand is the most informative. Conclusion: The capacity of specific feature mapping methods and their ability to build predictive models of RNAi activity suggests a relative biological importance of these features. Some feature mapping methods are more informative in building predictive models and overall t-test filtering provides a method to remove some noisy features or make comparisons among datasets. Together, these features can yield predictive SVM regression models with increased predictive accuracy between predicted and observed activities both within datasets by cross validation, and between independently collected RNAi activity datasets. Feature filtering to remove features should be approached carefully in that it is possible to reduce feature set size without substantially reducing predictive models, but the features retained in the candidate models become increasingly distinct. Software to perform feature prediction and SVM training and testing on nucleic acid sequences can be found at
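
    Two of the feature mappings named in this abstract, position-specific nucleotide composition and position-independent N-grams, can be sketched as follows. This is an illustration under simple assumptions (indicator features per position, overlapping N-gram counts) and is not the software referred to above; the function names and the example sequence are hypothetical.

```python
from collections import Counter


def position_specific_features(sequence):
    """One indicator feature per (position, base) pair, with 1-based positions."""
    return {f"pos{i + 1}_{base}": 1 for i, base in enumerate(sequence.upper())}


def ngram_counts(sequence, n=3):
    """Counts of all overlapping N-grams, regardless of where they occur."""
    seq = sequence.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


# Hypothetical short guide-strand fragment, for illustration only.
fragment = "AUGAUG"
positional = position_specific_features(fragment)
trigrams = ngram_counts(fragment, n=3)
```

    The two mappings can be merged into a single feature dictionary per sequence before being passed to a regression model.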

  15. Observations of the interstellar ice grain feature in the Taurus molecular clouds

    SciTech Connect

    Whittet, D.C.B.; Bode, H.F.; Longmore, A.J.; Baines, D.W.T.; Evans, A.

    1983-01-01

    Although water ice was originally proposed as a major constituent of the interstellar grain population (e.g. Oort and van de Hulst, 1946), the advent of infrared astronomy has shown that the expected absorption due to O-H stretching vibrations at 3 μm is elusive. Observations have in fact revealed that the carrier of this feature is apparently restricted to regions deep within dense molecular clouds (Merrill et al., 1976; Willner et al., 1982). However, the exact carrier of this feature is still controversial, and many questions remain as to the conditions required for its appearance. It is also uncertain whether it is restricted to circumstellar shells, rather than the general cloud medium. Detailed discussion of the 3 μm band properties is given elsewhere in this volume. 15 references, 4 figures.

  16. Adult primary pulmonary primitive neuroectodermal tumor: molecular features and translational opportunities.

    PubMed

    Andrei, Mirela; Cramer, Stewart F; Kramer, Zachary B; Zeidan, Amer; Faltas, Bishoy

    2013-02-01

    Primitive neuroectodermal tumors (PNET) arising directly from the lung are very rare but particularly aggressive neoplasms. We report a case of a 31-y-old man with primary pulmonary neuroectodermal tumor. We review the clinical as well as pathological features. As typical for these tumors, the diagnosis was initially delayed in our patient and prognosis was poor despite aggressive surgical resection, postoperative chemotherapy and local irradiation. Recent biological insights have revealed unique chromosomal translocations crucial to the pathogenesis of these tumors, most notably the EWS-FLI-1 translocation. We provide an overview of the molecular features of the Ewing Sarcoma Family of Tumors (ESFT) including PNET and their potential implications for therapeutic targeting.

  17. Prognostic significance and molecular features of signet-ring cell and mucinous components in colorectal carcinoma.

    PubMed

    Inamura, Kentaro; Yamauchi, Mai; Nishihara, Reiko; Kim, Sun A; Mima, Kosuke; Sukawa, Yasutaka; Li, Tingting; Yasunari, Mika; Zhang, Xuehong; Wu, Kana; Meyerhardt, Jeffrey A; Fuchs, Charles S; Harris, Curtis C; Qian, Zhi Rong; Ogino, Shuji

    2015-04-01

    Colorectal carcinoma (CRC) represents a group of histopathologically and molecularly heterogeneous diseases, which may contain signet-ring cell component and/or mucinous component to a varying extent under pathology assessment. However, little is known about the prognostic significance of those components, independent of various tumor molecular features. Utilizing a molecular pathological epidemiology database of 1,336 rectal and colon cancers in the Nurses' Health Study and the Health Professionals Follow-up Study, we examined patient survival according to the proportion of signet-ring cell and mucinous components in CRCs. Cox proportional hazards models were used to compute hazard ratio (HR) for mortality, adjusting for potential confounders including stage, microsatellite instability, CpG island methylator phenotype, LINE-1 methylation, and KRAS, BRAF, and PIK3CA mutations. Compared to CRC without signet-ring cell component, 1-50 % signet-ring cell component was associated with multivariate CRC-specific mortality HR of 1.40 [95 % confidence interval (CI) 1.02-1.93], and >50 % signet-ring cell component was associated with multivariate CRC-specific mortality HR of 4.53 (95 % CI 2.53-8.12) (P trend < 0.0001). Compared to CRC without mucinous component, neither 1-50 % mucinous component (multivariate HR 1.04; 95 % CI 0.81-1.33) nor >50 % mucinous component (multivariate HR 0.82; 95 % CI 0.54-1.23) was significantly associated with CRC-specific mortality (P trend < 0.57). Even a minor (50 % or less) signet-ring cell component in CRC was associated with higher patient mortality, independent of various tumor molecular and other clinicopathological features. In contrast, mucinous component was not associated with mortality in CRC patients.
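
    The hazard ratios and confidence intervals quoted above can be sanity-checked with a short calculation. The sketch assumes the intervals are Wald-type intervals that are symmetric on the log scale, as is usual for Cox models but is not stated in the abstract; under that assumption the point estimate should sit at the geometric mean of the interval bounds. The function names are hypothetical.

```python
import math


def ci_geometric_midpoint(ci_low, ci_high):
    """Point estimate implied by a confidence interval symmetric on the log scale."""
    return math.sqrt(ci_low * ci_high)


def log_hr_standard_error(ci_low, ci_high, z=1.96):
    """Standard error of log(HR) recovered from a 95% Wald interval."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


# Intervals as reported in the abstract for the signet-ring cell component.
mid_minor = ci_geometric_midpoint(1.02, 1.93)   # reported HR 1.40
mid_major = ci_geometric_midpoint(2.53, 8.12)   # reported HR 4.53
se_major = log_hr_standard_error(2.53, 8.12)
```

    Both implied midpoints agree with the reported hazard ratios to within rounding, which supports reading the intervals as log-symmetric.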

  18. Prognostic Significance and Molecular Features of Signet-Ring Cell and Mucinous Components in Colorectal Carcinoma

    PubMed Central

    Mima, Kosuke; Sukawa, Yasutaka; Li, Tingting; Yasunari, Mika; Zhang, Xuehong; Wu, Kana; Meyerhardt, Jeffrey A.; Fuchs, Charles S.

    2014-01-01

    Background: Colorectal carcinoma (CRC) represents a group of histopathologically and molecularly heterogeneous diseases, which may contain signet-ring cell component and/or mucinous component to a varying extent under pathology assessment. However, little is known about the prognostic significance of those components, independent of various tumor molecular features. Methods: Utilizing a molecular pathological epidemiology database of 1,336 rectal and colon cancers in the Nurses’ Health Study and the Health Professionals Follow-up Study, we examined patient survival according to the proportion of signet-ring cell and mucinous components in CRCs. Cox proportional hazards models were used to compute hazard ratio (HR) for mortality, adjusting for potential confounders including stage, microsatellite instability, CpG island methylator phenotype, LINE-1 methylation, and KRAS, BRAF, and PIK3CA mutations. Results: Compared to CRC without signet-ring cell component, 1–50 % signet-ring cell component was associated with multivariate CRC-specific mortality HR of 1.40 [95 % confidence interval (CI) 1.02–1.93], and >50 % signet-ring cell component was associated with multivariate CRC-specific mortality HR of 4.53 (95 % CI 2.53–8.12) (Ptrend < 0.0001). Compared to CRC without mucinous component, neither 1–50 % mucinous component (multivariate HR 1.04; 95 % CI 0.81–1.33) nor >50 % mucinous component (multivariate HR 0.82; 95 % CI 0.54–1.23) was significantly associated with CRC-specific mortality (Ptrend < 0.57). Conclusions: Even a minor (50 % or less) signet-ring cell component in CRC was associated with higher patient mortality, independent of various tumor molecular and other clinicopathological features. In contrast, mucinous component was not associated with mortality in CRC patients. PMID:25326395

  19. Stargardt disease: clinical features, molecular genetics, animal models and therapeutic options

    PubMed Central

    Tanna, Preena; Strauss, Rupert W; Fujinami, Kaoru; Michaelides, Michel

    2017-01-01

    Stargardt disease (STGD1; MIM 248200) is the most prevalent inherited macular dystrophy and is associated with disease-causing sequence variants in the gene ABCA4. Significant advances have been made over the last 10 years in our understanding of both the clinical and molecular features of STGD1, and also the underlying pathophysiology, which has culminated in ongoing and planned human clinical trials of novel therapies. The aims of this review are to describe the detailed phenotypic and genotypic characteristics of the disease, conventional and novel imaging findings, current knowledge of animal models and pathogenesis, and the multiple avenues of intervention being explored. PMID:27491360

  20. Pulmonary ground-glass opacity: computed tomography features, histopathology and molecular pathology

    PubMed Central

    Gao, Jian-Wei; Rizzo, Stefania; Ma, Li-Hong; Qiu, Xiang-Yu; Warth, Arne; Seki, Nobuhiko; Hasegawa, Mizue; Zou, Jia-Wei; Li, Qian; Femia, Marco

    2017-01-01

    The incidence of pulmonary ground-glass opacity (GGO) lesions is increasing as a result of the widespread use of multislice spiral computed tomography (CT) and the low-dose CT screening for lung cancer detection. Besides benign lesions, GGOs can be a specific type of lung adenocarcinomas or their preinvasive lesions. Evaluation of pulmonary GGO and investigation of the correlation between CT imaging features and lung adenocarcinoma subtypes or driver genes can be helpful in confirming the diagnosis and in guiding the clinical management. Our review focuses on the pathologic characteristics of GGO detected at CT, involving histopathology and molecular pathology.

  1. Prediction of microRNAs involved in immune system diseases through network based features.

    PubMed

    Prabahar, Archana; Natarajan, Jeyakumar

    2017-01-01

    MicroRNAs are a class of small non-coding regulatory RNA molecules that modulate the expression of several genes at post-transcriptional level and play a vital role in disease pathogenesis. Recent research shows that a range of miRNAs are involved in the regulation of immunity and its deregulation results in immune mediated diseases such as cancer, inflammation and autoimmune diseases. Computational discovery of these immune miRNAs using a set of specific features is highly desirable. In the current investigation, we present a SVM based classification system which uses a set of novel network based topological and motif features in addition to the baseline sequential and structural features to predict immune specific miRNAs from other non-immune miRNAs. The classifier was trained and tested on a balanced set of equal number of positive and negative examples to show the discriminative power of our network features. Experimental results show that our approach achieves an accuracy of 90.2% and outperforms the classification accuracy of 63.2% reported using the traditional miRNA sequential and structural features. The proposed classifier was further validated with two immune disease sub-class datasets related to multiple sclerosis microarray data and psoriasis RNA-seq data with higher accuracy. These results indicate that our classifier which uses network and motif features along with sequential and structural features will lead to significant improvement in classifying immune miRNAs and hence can be applied to identify other specific classes of miRNAs as an extensible miRNA classification system.

  2. Toll-Like Receptor 7 Agonists: Chemical Feature Based Pharmacophore Identification and Molecular Docking Studies

    PubMed Central

    Sun, Lidan; Zhang, Liangren; Sun, Gang; Wang, Zhanli; Yu, Yongchun

    2013-01-01

    Chemical feature based pharmacophore models were generated for Toll-like receptor 7 (TLR7) agonists using the HypoGen algorithm, which is implemented in the Discovery Studio software. Several methods used to validate the pharmacophore model were presented. The first hypothesis, Hypo1, was considered to be the best pharmacophore model; it consists of four features: one hydrogen bond acceptor, one hydrogen bond donor, and two hydrophobic features. In addition, homology modeling and molecular docking studies were employed to probe the intermolecular interactions between TLR7 and its agonists. The results further confirmed the reliability of the pharmacophore model. The obtained pharmacophore model (Hypo1) was then employed as a query to screen the Traditional Chinese Medicine Database (TCMD) for other potential lead compounds. One hit was identified as a potent TLR7 agonist, which has antiviral activity against hepatitis virus in vitro. Therefore, our current work provides confidence for the utility of the selected chemical feature based pharmacophore model to design novel TLR7 agonists with desired biological activity. PMID:23526932

  3. Cytopathologic features of epithelioid inflammatory myofibroblastic sarcoma with correlation of histopathology, immunohistochemistry, and molecular cytogenetic analysis.

    PubMed

    Lee, Jen-Chieh; Wu, Jiann-Ming; Liau, Jau-Yu; Huang, Hsuan-Ying; Lo, Cheng-Yu; Jan, I-Shiow; Hornick, Jason L; Qian, Xiaohua

    2015-08-01

    Epithelioid inflammatory myofibroblastic sarcoma (E-IMS) is a recently established rare variant of inflammatory myofibroblastic tumor. It is characterized by a distinctive constellation of clinical, pathological, and molecular features, including a nearly exclusive intraabdominal location, strong male predilection, aggressive clinical course, predominance of epithelioid tumor cells, and Ran-binding protein 2 (RANBP2)-anaplastic lymphoma kinase (ALK) fusion in the majority of cases. To the authors' knowledge, the cytologic features of E-IMS have not been described to date. Cases of E-IMS that had corresponding cytology were searched. Six cytology samples (1 fine-needle aspiration sample, 2 imprint samples, and 3 effusion fluids) containing tumor cells were identified in 5 patients with E-IMS. The cytomorphology included large monotonous epithelioid cells arranged in loose aggregates or singly, with admixed myxoid stroma, and an inflammatory background rich in neutrophils. The tumor cells had a large, round, eccentric nucleus with vesicular chromatin, prominent nucleoli, and moderate amounts of pale cytoplasm. Delicate thin-walled branching vessels traversing tumor aggregates were a prominent feature in a fine-needle aspiration sample. Immunohistochemically, ALK was positive in all 5 tumors, with a nuclear membranous staining pattern noted in 3 cases and a cytoplasmic pattern observed in the other 2 cases. ALK rearrangement was confirmed in all 5 tumors by molecular genetic studies. The cytologic features of E-IMS recapitulate its histologic characteristics. E-IMS merits inclusion in the differential diagnosis of any intraabdominal, large epithelioid cell neoplasm. Confirmation of ALK rearrangement is advisable because patients may benefit from targeted therapies. © 2015 American Cancer Society.

  4. Predicting Molecular Crowding Effects in Ion-RNA Interactions.

    PubMed

    Yu, Tao; Zhu, Yuhong; He, Zhaojian; Chen, Shi-Jie

    2016-09-01

    We develop a new statistical mechanical model to predict the molecular crowding effects in ion-RNA interactions. By considering discrete distributions of the crowders, the model can treat the main crowder-induced effects, such as the competition with ions for RNA binding, changes of electrostatic interaction due to crowder-induced changes in the dielectric environment, and changes in the nonpolar hydration state of the crowder-RNA system. To enhance the computational efficiency, we sample the crowder distribution using a hybrid approach: For crowders in the close vicinity of RNA surface, we sample their discrete distributions; for crowders in the bulk solvent away from the RNA surface, we use a continuous mean-field distribution for the crowders. Moreover, using the tightly bound ion (TBI) model, we account for ion fluctuation and correlation effects in the calculation for ion-RNA interactions. Applications of the model to a variety of simple RNA structures such as RNA helices show a crowder-induced increase in free energy and decrease in ion binding. Such crowding effects tend to contribute to the destabilization of RNA structure. Further analysis indicates that these effects are associated with the crowder-ion competition in RNA binding and the effective decrease in the dielectric constant. This simple ion effect model may serve as a useful framework for modeling more realistic crowders with larger, more complex RNA structures.

  5. Predicting the rupture probabilities of molecular bonds in series.

    PubMed

    Neuert, Gregor; Albrecht, Christian H; Gaub, Hermann E

    2007-08-15

    An assembly of two receptor ligand bonds in series will typically break at the weaker complex upon application of an external force. The rupture site depends highly on the binding potentials of both bonds and on the loading rate of the applied force. A model is presented that allows simulations of force-induced rupture of bonds in series at a given force and loading rate based on the natural dissociation rates kR0,S0 and the potential width ΔxR,S of the reference and sample bonds. The model is especially useful for the analysis of differential force assay experiments. This is illustrated by experiments on molecular force balances consisting of two 30-bp oligonucleotide duplexes where kR0,S0 and ΔxR,S have been determined for different single nucleotide mismatches. Furthermore, prediction of the rupture site of two bonds in series is demonstrated for DNA duplexes in combination with streptavidin/biotin and anti-digoxigenin/digoxigenin, respectively.

  6. Predicting the Rupture Probabilities of Molecular Bonds in Series

    PubMed Central

    Neuert, Gregor; Albrecht, Christian H.; Gaub, Hermann E.

    2007-01-01

    An assembly of two receptor ligand bonds in series will typically break at the weaker complex upon application of an external force. The rupture site depends highly on the binding potentials of both bonds and on the loading rate of the applied force. A model is presented that allows simulations of force-induced rupture of bonds in series at a given force and loading rate based on the natural dissociation rates kR0,S0 and the potential width ΔxR,S of the reference and sample bonds. The model is especially useful for the analysis of differential force assay experiments. This is illustrated by experiments on molecular force balances consisting of two 30-bp oligonucleotide duplexes where kR0,S0 and ΔxR,S have been determined for different single nucleotide mismatches. Furthermore, prediction of the rupture site of two bonds in series is demonstrated for DNA duplexes in combination with streptavidin/biotin and anti-digoxigenin/digoxigenin, respectively. PMID:17468164
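
    As a rough illustration of why the rupture site depends on the dissociation rates and potential widths of both bonds, the sketch below assumes a Bell-type force dependence, k(F) = k0·exp(F·Δx/kBT), for each bond and a constant applied force. This is a simplification for illustration only: the function names and the constant-force setting are assumptions, and the model in the abstract additionally accounts for the loading rate.

```python
import math

KBT_PN_NM = 4.11  # thermal energy k_B*T near room temperature, in pN*nm

def bell_rate(k0, dx_nm, force_pn):
    """Bell-type off-rate under force: k(F) = k0 * exp(F * dx / kBT)."""
    return k0 * math.exp(force_pn * dx_nm / KBT_PN_NM)

def rupture_fraction_reference(k0_ref, dx_ref, k0_sample, dx_sample, force_pn):
    """Probability that the reference bond breaks first at constant force.

    For two competing first-order processes, the reference bond is the
    first to rupture with probability k_R / (k_R + k_S).
    """
    k_ref = bell_rate(k0_ref, dx_ref, force_pn)
    k_sample = bell_rate(k0_sample, dx_sample, force_pn)
    return k_ref / (k_ref + k_sample)
```

    With identical bonds the function returns 0.5 at any force; a bond with a larger potential width becomes increasingly likely to be the rupture site as the force grows.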

  7. Predictive Ensemble Decoding of Acoustical Features Explains Context-Dependent Receptive Fields.

    PubMed

    Yildiz, Izzet B; Mesgarani, Nima; Deneve, Sophie

    2016-12-07

    A primary goal of auditory neuroscience is to identify the sound features extracted and represented by auditory neurons. Linear encoding models, which describe neural responses as a function of the stimulus, have been primarily used for this purpose. Here, we provide theoretical arguments and experimental evidence in support of an alternative approach, based on decoding the stimulus from the neural response. We used a Bayesian normative approach to predict the responses of neurons detecting relevant auditory features, despite ambiguities and noise. We compared the model predictions to recordings from the primary auditory cortex of ferrets and found that: (1) the decoding filters of auditory neurons resemble the filters learned from the statistics of speech sounds; (2) the decoding model captures the dynamics of responses better than a linear encoding model of similar complexity; and (3) the decoding model accounts for the accuracy with which the stimulus is represented in neural activity, whereas linear encoding model performs very poorly. Most importantly, our model predicts that neuronal responses are fundamentally shaped by "explaining away," a divisive competition between alternative interpretations of the auditory scene. Neural responses in the auditory cortex are dynamic, nonlinear, and hard to predict. Traditionally, encoding models have been used to describe neural responses as a function of the stimulus. However, in addition to external stimulation, neural activity is strongly modulated by the responses of other neurons in the network. We hypothesized that auditory neurons aim to collectively decode their stimulus. In particular, a stimulus feature that is decoded (or explained away) by one neuron is not explained by another. We demonstrated that this novel Bayesian decoding model is better at capturing the dynamic responses of cortical neurons in ferrets. Whereas the linear encoding model poorly reflects selectivity of neurons, the decoding model can

  8. Predictive Ensemble Decoding of Acoustical Features Explains Context-Dependent Receptive Fields

    PubMed Central

    Mesgarani, Nima; Deneve, Sophie

    2016-01-01

    A primary goal of auditory neuroscience is to identify the sound features extracted and represented by auditory neurons. Linear encoding models, which describe neural responses as a function of the stimulus, have been primarily used for this purpose. Here, we provide theoretical arguments and experimental evidence in support of an alternative approach, based on decoding the stimulus from the neural response. We used a Bayesian normative approach to predict the responses of neurons detecting relevant auditory features, despite ambiguities and noise. We compared the model predictions to recordings from the primary auditory cortex of ferrets and found that: (1) the decoding filters of auditory neurons resemble the filters learned from the statistics of speech sounds; (2) the decoding model captures the dynamics of responses better than a linear encoding model of similar complexity; and (3) the decoding model accounts for the accuracy with which the stimulus is represented in neural activity, whereas linear encoding model performs very poorly. Most importantly, our model predicts that neuronal responses are fundamentally shaped by “explaining away,” a divisive competition between alternative interpretations of the auditory scene. SIGNIFICANCE STATEMENT Neural responses in the auditory cortex are dynamic, nonlinear, and hard to predict. Traditionally, encoding models have been used to describe neural responses as a function of the stimulus. However, in addition to external stimulation, neural activity is strongly modulated by the responses of other neurons in the network. We hypothesized that auditory neurons aim to collectively decode their stimulus. In particular, a stimulus feature that is decoded (or explained away) by one neuron is not explained by another. We demonstrated that this novel Bayesian decoding model is better at capturing the dynamic responses of cortical neurons in ferrets. Whereas the linear encoding model poorly reflects selectivity of neurons

  9. Transmembrane helix prediction using amino acid property features and latent semantic analysis.

    PubMed

    Ganapathiraju, Madhavi; Balakrishnan, N; Reddy, Raj; Klein-Seetharaman, Judith

    2008-01-01

    Prediction of transmembrane (TM) helices by statistical methods suffers from lack of sufficient training data. Current best methods use hundreds or even thousands of free parameters in their models which are tuned to fit the little data available for training. Further, they are often restricted to the generally accepted topology "cytoplasmic-transmembrane-extracellular" and cannot adapt to membrane proteins that do not conform to this topology. Recent crystal structures of channel proteins have revealed novel architectures showing that the above topology may not be as universal as previously believed. Thus, there is a need for methods that can better predict TM helices even in novel topologies and families. Here, we describe a new method "TMpro" to predict TM helices with high accuracy. To avoid overfitting to existing topologies, we have collapsed cytoplasmic and extracellular labels to a single state, non-TM. TMpro is a binary classifier which predicts TM or non-TM using multiple amino acid properties (charge, polarity, aromaticity, size and electronic properties) as features. The features are extracted from sequence information by applying the framework used for latent semantic analysis of text documents and are input to neural networks that learn the distinction between TM and non-TM segments. The model uses only 25 free parameters. In benchmark analysis TMpro achieves 95% segment F-score corresponding to 50% reduction in error rate compared to the best methods not requiring an evolutionary profile of a protein to be known. Performance is also improved when applied to more recent and larger high resolution datasets PDBTM and MPtopo. TMpro predictions in membrane proteins with unusual or disputed TM structure (K+ channel, aquaporin and HIV envelope glycoprotein) are discussed. 
    TMpro uses very few free parameters in modeling TM segments as opposed to the very large number of free parameters used in state-of-the-art membrane prediction methods, yet achieves very

  10. Selection of key sequence-based features for prediction of essential genes in 31 diverse bacterial species

    PubMed Central

    Liu, Xiao; Wang, Bao-Jin; Xu, Luo; Tang, Hong-Ling; Xu, Guo-Qing

    2017-01-01

    Genes that are indispensable for survival are essential genes. Many features have been proposed for computational prediction of essential genes. In this paper, the least absolute shrinkage and selection operator method was used to screen key sequence-based features related to gene essentiality. To assess the effects, the selected features were used to predict the essential genes from 31 bacterial species based on a support vector machine classifier. For all 31 bacterial objects (21 Gram-negative objects and ten Gram-positive objects), the features in the three datasets were reduced from 57, 59, and 58, to 40, 37, and 38, respectively, without loss of prediction accuracy. Results showed that some features were redundant for gene essentiality, so could be eliminated from future analyses. The selected features contained more complex (or key) biological information for gene essentiality, and could be of use in related research projects, such as gene prediction, synthetic biology, and drug design. PMID:28358836

  11. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

    PubMed Central

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information. PMID:26543860
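
    The two performance measures quoted above, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below shows the standard definitions; the example counts are made up for illustration and are not the study's data.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical counts, for illustration only
example = dict(tp=90, tn=80, fp=20, fn=10)
example_accuracy = accuracy(**example)
example_mcc = matthews_corrcoef(**example)
```

    Unlike accuracy, the MCC ranges from -1 to 1 and stays near 0 for a classifier that merely predicts the majority class, which is why the two are often reported together.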

  12. Molecular features of interaction between VEGFA and anti-angiogenic drugs used in retinal diseases: a computational approach

    PubMed Central

    Platania, Chiara B. M.; Di Paola, Luisa; Leggio, Gian M.; Romano, Giovanni L.; Drago, Filippo; Salomone, Salvatore; Bucolo, Claudio

    2015-01-01

    Anti-angiogenic agents are biological drugs used for treatment of retinal neovascular degenerative diseases. In this study, we aimed at in silico analysis of interaction of vascular endothelial growth factor A (VEGFA), the main mediator of angiogenesis, with binding domains of anti-angiogenic agents used for treatment of retinal diseases, such as ranibizumab, bevacizumab and aflibercept. The analysis of anti-VEGF/VEGFA complexes was carried out by means of protein-protein docking and molecular dynamics (MD) coupled to molecular mechanics-Poisson Boltzmann Surface Area (MM-PBSA) calculation. Molecular dynamics simulation was further analyzed by protein contact networks. Rough energetic evaluation with protein-protein docking scores revealed that aflibercept/VEGFA complex was characterized by electrostatic stabilization, whereas ranibizumab and bevacizumab complexes were stabilized by Van der Waals (VdW) energy term; these results were confirmed by MM-PBSA. Comparison of MM-PBSA predicted energy terms with experimental binding parameters reported in literature indicated that the high association rate (Kon) of aflibercept to VEGFA was consistent with high stabilizing electrostatic energy. On the other hand, the relatively low experimental dissociation rate (Koff) of ranibizumab may be attributed to lower conformational fluctuations of the ranibizumab/VEGFA complex, higher number of contacts and hydrogen bonds in comparison to bevacizumab and aflibercept. Thus, the anti-angiogenic agents have been found to be considerably different both in terms of molecular interactions and stabilizing energy. Characterization of such features can improve the design of novel biological drugs potentially useful in clinical practice. PMID:26578958

  13. Remote health monitoring: predicting outcome success based on contextual features for cardiovascular disease.

    PubMed

    Alshurafa, Nabil; Eastwood, Jo-Ann; Pourhomayoun, Mohammad; Liu, Jason J; Sarrafzadeh, Majid

    2014-01-01

    Current studies have produced a plethora of remote health monitoring (RHM) systems designed to enhance the care of patients with chronic diseases. Many RHM systems are designed to improve patient risk factors for cardiovascular disease, including physiological parameters such as body mass index (BMI) and waist circumference, and lipid profiles such as low density lipoprotein (LDL) and high density lipoprotein (HDL). There are several patient characteristics that could be determining factors for a patient's RHM outcome success, but these characteristics have been largely unidentified. In this paper, we analyze results from an RHM system deployed in a six month Women's Heart Health study of 90 patients, and apply advanced feature selection and machine learning algorithms to identify patients' key baseline contextual features and build effective prediction models that help determine RHM outcome success. We introduce Wanda-CVD, a smartphone-based RHM system designed to help participants with cardiovascular disease risk factors by motivating participants through wireless coaching using feedback and prompts as social support. We analyze key contextual features that secure positive patient outcomes in both physiological parameters and lipid profiles. Results from the Women's Heart Health study show that health threat of heart disease, quality of life, family history, stress factors, social support, and anxiety at baseline all help predict patient RHM outcome success.

  14. Sub-resolution assist feature (SRAF) printing prediction using logistic regression

    NASA Astrophysics Data System (ADS)

    Tan, Chin Boon; Koh, Kar Kit; Zhang, Dongqing; Foong, Yee Mei

    2015-03-01

    In optical proximity correction (OPC), the sub-resolution assist feature (SRAF) has been used to enhance the process window of main structures. However, the printing of SRAF on wafer is undesirable as this may adversely degrade the overall process yield if it is transferred into the final pattern. A reasonably accurate prediction model is needed during OPC to ensure that the SRAF placement and size have no risk of SRAF printing. Current common practice in OPC is to use either the main OPC model or a model threshold adjustment (MTA) solution to predict the SRAF printing. This paper studies the feasibility of SRAF printing prediction using logistic regression (LR). Logistic regression is a probabilistic classification model that gives discrete binary outputs after receiving sufficient input variables from SRAF printing conditions. In the application of SRAF printing prediction, the binary outputs can be treated as 1 for SRAF-Printing and 0 for No-SRAF-Printing. The experimental work was performed using a 20nm line/space process layer. The results demonstrate that the accuracy of SRAF printing prediction using the LR approach outperforms the MTA solution. Overall error rates as low as 2% (calibration) and 5% (verification) were achieved by the LR approach, compared to 6% (calibration) and 15% (verification) for the MTA solution. In addition, the performance of the LR approach was found to be relatively independent and consistent across different resist image planes compared to the MTA solution.
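
    The classification step described above can be sketched in a few lines. This is a minimal illustration, assuming plain batch gradient descent and generic numeric input features; the function names, learning rate, and 0.5 threshold are assumptions, and the abstract does not specify which input variables or fitting procedure were used. The fitted model outputs a probability, which is thresholded to give the 1 (SRAF-Printing) or 0 (No-SRAF-Printing) label.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(x) - y
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_printing(w, b, x, threshold=0.5):
    """Return 1 (SRAF-Printing) or 0 (No-SRAF-Printing)."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0
```

    In practice the threshold could be moved away from 0.5 to trade missed printing events against false alarms.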

  15. A Toxicogenomic Approach for the Prediction of Murine Hepatocarcinogenesis Using Ensemble Feature Selection

    PubMed Central

    Eichner, Johannes; Kossler, Nadine; Wrzodek, Clemens; Kalkuhl, Arno; Bach Toft, Dorthe; Ostenfeldt, Nina; Richard, Virgile; Zell, Andreas

    2013-01-01

    The current strategy for identifying the carcinogenicity of drugs involves the 2-year bioassay in male and female rats and mice. As this assay is cost-intensive and time-consuming there is a high interest in developing approaches for the screening and prioritization of drug candidates in preclinical safety evaluations. Predictive models based on toxicogenomics investigations after short-term exposure have shown their potential for assessing the carcinogenic risk. In this study, we investigated a novel method for the evaluation of toxicogenomics data based on ensemble feature selection in conjunction with bootstrapping in order to derive reproducible and characteristic multi-gene signatures. This method was evaluated on a microarray dataset containing global gene expression data from liver samples of both male and female mice. The dataset was generated by the IMI MARCAR consortium and included gene expression profiles of genotoxic and nongenotoxic hepatocarcinogens obtained after treatment of CD-1 mice for 3 or 14 days. We developed predictive models based on gene expression data of both sexes and the models were employed for predicting the carcinogenic class of diverse compounds. Comparing the predictivity of our multi-gene signatures against signatures from literature, we demonstrated that by incorporating our gene sets as features, a representative set of state-of-the-art supervised learning methods achieves slightly higher accuracy on average. The constructed models were also used for the classification of Cyproterone acetate (CPA), Wy-14643 (WY) and Thioacetamide (TAA), whose primary mechanism of carcinogenicity is controversially discussed. Based on the extracted mouse liver gene expression patterns, CPA would be predicted as a nongenotoxic compound. In contrast, both WY and TAA would be classified as genotoxic mouse hepatocarcinogens. PMID:24040119

  16. Ginkgo leaf sign: a highly predictive imaging feature of spinal meningioma.

    PubMed

    Yamaguchi, Satoshi; Takeda, Masaaki; Takahashi, Toshiyuki; Yamahata, Hitoshi; Mitsuhara, Takafumi; Niiro, Tadaaki; Hanakita, Junya; Hida, Kazutoshi; Arita, Kazunori; Kurisu, Kaoru

    2015-07-31

    OBJECT Spinal meningioma and schwannoma are the most common spinal intradural extramedullary tumors, and the differentiation of these 2 tumors by CT and MRI has been a matter of debate. The purpose of this article is to present a case series of spinal meningiomas showing unique imaging features: a combination of a fan-shaped spinal cord and a streak in the tumor. The authors termed the former imaging feature "ginkgo leaf sign" and evaluated its diagnostic value. METHODS The authors present 7 cases of spinal meningioma having the ginkgo leaf sign. Thirty spinal extramedullary tumors arising lateral or ventrolateral to the spinal cord were studied to evaluate the diagnostic value of the ginkgo leaf sign for spinal meningiomas. Among 30 cases, 12 tumors were spinal meningiomas and 18 tumors from the control group were all schwannomas. RESULTS Seven of the 12 spinal meningiomas were positive for the ginkgo leaf sign. The sign was not present in the control group tumors. The overall ability to use the ginkgo leaf sign to detect meningioma indicated a sensitivity of 58%, specificity of 100%, positive predictive value of 100%, and negative predictive value of 78%. CONCLUSIONS The ginkgo leaf sign is highly specific to spinal meningiomas arising lateral or ventrolateral to the spinal cord. In the present series, the ginkgo leaf sign was perfectly predictive for spinal meningioma.
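
    The four diagnostic values quoted above follow directly from the counts in the abstract: 7 of 12 meningiomas were sign-positive and none of the 18 schwannomas were. A short sketch of the arithmetic (the function name is illustrative):

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard 2x2 diagnostic test measures."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# 7 sign-positive and 5 sign-negative meningiomas; 0 sign-positive and 18 sign-negative schwannomas
metrics = diagnostic_metrics(tp=7, fn=5, fp=0, tn=18)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
# sensitivity: 58%
# specificity: 100%
# ppv: 100%
# npv: 78%
```

    The negative predictive value is 18/23, because the 5 sign-negative meningiomas count against it.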

  17. Predicting Drug-Target Interaction Networks Based on Functional Groups and Biological Features

    PubMed Central

    Shi, Xiao-He; Hu, Le-Le; Kong, Xiangyin; Cai, Yu-Dong; Chou, Kuo-Chen

    2010-01-01

    Background: Study of drug-target interaction networks is an important topic for drug development. It is both time-consuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, the in silico prediction methods can provide us with very useful information in a timely manner. Methods/Principal Findings: To realize this, drug compounds are encoded with functional groups and proteins encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, with each to predict the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. Conclusion/Significance: Our results indicate that the network prediction system thus established is quite promising and encouraging. PMID:20300175
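
    The evaluation scheme named above, a nearest-neighbor predictor scored by a jackknife (leave-one-out) test, can be sketched as follows. This is an illustration under stated assumptions: Euclidean distance stands in for whatever similarity measure the study used, and the function names and feature vectors are hypothetical.

```python
import math

def nearest_neighbor_label(query, samples, labels):
    """Return the label of the training sample closest to the query."""
    best = min(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    return labels[best]

def jackknife_success_rate(samples, labels):
    """Leave each sample out in turn and predict it from all the others."""
    hits = 0
    for i in range(len(samples)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(samples[i], rest, rest_labels) == labels[i]:
            hits += 1
    return hits / len(samples)
```

    Because every sample is predicted exactly once from all remaining samples, the jackknife result is deterministic for a given dataset, unlike a random train/test split.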

  18. Identifying predictive features in drug response using machine learning: opportunities and challenges.

    PubMed

    Vidyasagar, Mathukumalli

    2015-01-01

    This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (pattern analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.

  19. Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity.

    PubMed

    Gritsenko, Alexey A; Weingarten-Gabbay, Shira; Elias-Kirma, Shani; Nir, Ronit; de Ridder, Dick; Segal, Eran

    2017-09-18

    Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However, due to the small number of IRESs known, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in humans and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements.
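
    Since the abstract ties activity to the sequence, number and position of k-mers, a small sketch of how such features might be extracted may help. The function names and the example sequence are made up for illustration; the study's actual feature pipeline is not described here.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping k-mer in an RNA sequence."""
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input too
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_positions(sequence, kmer):
    """Return 0-based start positions of `kmer`, overlapping hits included."""
    seq = sequence.upper().replace("T", "U")
    kmer = kmer.upper().replace("T", "U")
    width = len(kmer)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == kmer]

example = "UUUUCCUUUUAGG"  # made-up sequence, not a real IRES
counts = kmer_counts(example, 4)
positions = kmer_positions(example, "UUUU")
```

    The counts give the "number" feature and the positions give the "position" feature for any chosen k-mer; both could then be fed to a predictive model.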

  20. Breast cancer subtype intertumor heterogeneity: MRI-based features predict results of a genomic assay.

    PubMed

    Sutton, Elizabeth J; Oh, Jung Hun; Dashevsky, Brittany Z; Veeraraghavan, Harini; Apte, Aditya P; Thakur, Sunitha B; Deasy, Joseph O; Morris, Elizabeth A

    2015-11-01

    To investigate the association between a validated, gene-expression-based, aggressiveness assay, Oncotype Dx RS, and morphological and texture-based image features extracted from magnetic resonance imaging (MRI). This retrospective study received Institutional Review Board approval and need for informed consent was waived. Between 2006 and 2012, we identified breast cancer patients with: 1) ER+, PR+, and HER2- invasive ductal carcinoma (IDC); 2) preoperative breast MRI; and 3) Oncotype Dx RS test results. Extracted features included morphological, histogram, and gray-level co-occurrence matrix (GLCM)-based texture features computed from tumors contoured on pre- and three postcontrast MR images. Linear regression analysis was performed to investigate the association between Oncotype Dx RS and different clinical, pathologic, and imaging features. P < 0.05 was considered statistically significant. Ninety-five patients with IDC were included with a median Oncotype Dx RS of 16 (range: 0-45). Using stepwise multiple linear regression modeling, two MR-derived image features, kurtosis in the first and third postcontrast images, and histologic nuclear grade were found to be significantly correlated with the Oncotype Dx RS with P = 0.0056, 0.0005, and 0.0105, respectively. The overall model resulted in statistically significant correlation with Oncotype Dx RS with an R-squared value of 0.23 (adjusted R-squared = 0.20; P = 0.0002) and a Spearman's rank correlation coefficient of 0.49 (P < 0.0001). A model for IDC using imaging and pathology information correlates with Oncotype Dx RS scores, suggesting that image-based features could also predict the likelihood of recurrence and magnitude of chemotherapy benefit. © 2015 Wiley Periodicals, Inc.
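
    The relation between the two R-squared figures quoted above can be checked with the usual adjustment formula. The sketch assumes the final model has three predictors (the two kurtosis features and nuclear grade) fitted on all 95 patients; the abstract does not state the predictor count explicitly.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Penalise R-squared for the number of predictors in a linear model."""
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / (n_samples - n_predictors - 1)

# R-squared and sample size from the abstract; three predictors is an assumption.
adj = adjusted_r_squared(0.23, 95, 3)
```

    Under these assumptions the result is about 0.205, in line with the reported adjusted R-squared of 0.20.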

  1. Molecular features of secondary vascular tissue regeneration after bark girdling in Populus.

    PubMed

    Zhang, Jing; Gao, Ge; Chen, Jia-Jia; Taylor, Gail; Cui, Ke-Ming; He, Xin-Qiang

    2011-12-01

    Regeneration is a common strategy for plants to repair damage to their tissue after attacks from other organisms or physical assaults. However, how differentiating cells acquire regenerative competence and rebuild the pattern of new tissues remains largely unknown. Using anatomical observation and microarray analysis, we investigated the morphological process and molecular features of secondary vascular tissue regeneration after bark girdling in trees. After bark girdling, new phloem and cambium regenerate from differentiating xylem cells and rebuild secondary vascular tissue pattern within 1 month. Differentiating xylem cells acquire regenerative competence through epigenetic regulation and cell cycle re-entry. The xylem developmental program was blocked, whereas the phloem or cambium program was activated, resulting in the secondary vascular tissue pattern re-establishment. Phytohormones play important roles in vascular tissue regeneration. We propose a model describing the molecular features of secondary vascular tissue regeneration after bark girdling in trees. It provides information for understanding mechanisms of tissue regeneration and pattern formation of the secondary vascular tissues in plants.

  2. INTEGRATIVE ANALYSIS FOR LUNG ADENOCARCINOMA PREDICTS MORPHOLOGICAL FEATURES ASSOCIATED WITH GENETIC VARIATIONS*

    PubMed Central

    WANG, CHAO; SU, HAI; YANG, LIN; HUANG, KUN

    2016-01-01

    Lung cancer is one of the most deadly cancers and lung adenocarcinoma (LUAD) is the most common histological type of lung cancer. However, LUAD is highly heterogeneous due to genetic differences as well as phenotypic differences such as cellular and tissue morphology. In this paper, we systematically examine the relationships between histological features and gene transcription. Specifically, we calculated 283 morphological features from histology images for 201 LUAD patients from the TCGA project and identified the morphological feature with strong correlation with patient outcome. We then modeled the morphology feature using multiple co-expressed gene clusters using Lasso regression. Many of the gene clusters are highly associated with genetic variations, specifically DNA copy number variations, implying that genetic variations play important roles in the development of cancer morphology. As far as we know, our finding is the first to directly link genetic variations and functional genomics to LUAD histology. These observations will lead to new insight into lung cancer development and potential new integrative biomarkers for predicting patient prognosis and response to treatments. PMID:27896964

  3. Feature selection from short amino acid sequences in phosphorylation prediction problem

    NASA Astrophysics Data System (ADS)

    Wecławski, Jakub; Jankowski, Stanisław; Szymański, Zbigniew

    The paper describes a solution to the feature selection problem for amino acid sequences in phosphorylation prediction. We show that even for short sequences, variable selection leads to better classification performance. Moreover, the simplicity of the final models allows for better understanding of the data, and the models can be used by an expert for further analysis. The feature selection process is divided into two parts: i) a classification tree is used to find the most relevant positions in the amino acid sequences; ii) the contrast pattern kernel is then applied for pattern selection. This work summarizes research on the classification of short amino acid sequences. The results allowed us to propose a general scheme for amino acid sequence analysis.

  4. Infrared images of reflection nebulae and Orion's bar: Fluorescent molecular hydrogen and the 3.3 micron feature

    NASA Technical Reports Server (NTRS)

    Burton, Michael G.; Moorhouse, Alan; Brand, P. W. J. L.; Roche, Patrick F.; Geballe, T. R.

    1989-01-01

    Images were obtained of the (fluorescent) molecular hydrogen 1-0 S(1) line, and of the 3.3 micron emission feature, in Orion's Bar and three reflection nebulae. The emission from these species appears to come from the same spatial locations in all sources observed. This suggests that the 3.3 micron feature is excited by the same energetic UV-photons which cause the molecular hydrogen to fluoresce.

  5. Respiratory trace feature analysis for the prediction of respiratory-gated PET quantification

    NASA Astrophysics Data System (ADS)

    Wang, Shouyi; Bowen, Stephen R.; Chaovalitwongse, W. Art; Sandison, George A.; Grabowski, Thomas J.; Kinahan, Paul E.

    2014-02-01

    The benefits of respiratory gating in quantitative PET/CT vary tremendously between individual patients. Respiratory pattern is among many patient-specific characteristics that are thought to play an important role in gating-induced imaging improvements. However, the quantitative relationship between patient-specific characteristics of respiratory pattern and improvements in quantitative accuracy from respiratory-gated PET/CT has not been well established. If such a relationship could be estimated, then patient-specific respiratory patterns could be used to prospectively select appropriate motion compensation during image acquisition on a per-patient basis. This study was undertaken to develop a novel statistical model that predicts quantitative changes in PET/CT imaging due to respiratory gating. Free-breathing static FDG-PET images without gating and respiratory-gated FDG-PET images were collected from 22 lung and liver cancer patients on a PET/CT scanner. PET imaging quality was quantified with peak standardized uptake value (SUVpeak) over lesions of interest. Relative differences in SUVpeak between static and gated PET images were calculated to indicate quantitative imaging changes due to gating. A comprehensive multidimensional extraction of the morphological and statistical characteristics of respiratory patterns was conducted, resulting in 16 features that characterize representative patterns of a single respiratory trace. The six most informative features were subsequently extracted using a stepwise feature selection approach. The multiple-regression model was trained and tested based on a leave-one-subject-out cross-validation. The predicted quantitative improvements in PET imaging achieved an accuracy higher than 90% using a criterion with a dynamic error-tolerance range for SUVpeak values. The results of this study suggest that our prediction framework could be applied to determine which patients would likely benefit from respiratory motion compensation
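
    Leave-one-subject-out cross-validation, as named in this abstract, holds out all lesions from one patient at a time so that no patient contributes to both training and testing. A minimal sketch follows. The helper names are hypothetical, and normalising the SUVpeak difference by the static value is an assumption, since the abstract does not give the exact formula.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def relative_difference(suv_gated, suv_static):
    """Relative change in SUVpeak from static to gated imaging (assumed definition)."""
    return (suv_gated - suv_static) / suv_static

# Hypothetical lesions: two from patient "A", one each from "B" and "C".
subjects = ["A", "A", "B", "C"]
folds = list(leave_one_subject_out(subjects))
```

    Each fold's training indices would be used to fit the regression model and its test indices to score it, giving one held-out prediction per lesion.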

  6. NSD1 duplication in Silver-Russell syndrome (SRS): molecular karyotyping in patients with SRS features.

    PubMed

    Sachwitz, J; Meyer, R; Fekete, G; Spranger, S; Matulevičienė, A; Kučinskas, V; Bach, A; Luczay, A; Brüchle, N O; Eggermann, K; Zerres, K; Elbracht, M; Eggermann, T

    2017-01-01

    Silver-Russell syndrome (SRS) is a growth retardation syndrome characterized by intrauterine and postnatal growth retardation, relative macrocephaly and protruding forehead, body asymmetry and feeding difficulties. Nearly 50% of cases show hypomethylation in 11p15.5; in 10%, maternal uniparental disomy of chromosome 7 is present. A significant number of patients with SRS features also exhibit chromosomal aberrations. We analyzed 43 individuals referred for SRS genetic testing by molecular karyotyping. Pathogenic variants could be detected in five of them, including an NSD1 duplication in 5q35 and a 14q32 microdeletion. NSD1 deletions are detectable in overgrowth disorders (Sotos syndrome and Beckwith-Wiedemann syndrome), whereas NSD1 duplications are associated with growth retardation. The 14q32 deletion is typically associated with Temple syndrome (TS14), but the identification of a patient in our cohort reflects the clinical overlap between TS14 and SRS. As determination of molecular subtypes is the basis for directed counseling and therapy, the identification of pathogenic variants in >10% of the total cohort of patients referred for SRS testing and in >16% of individuals with the characteristic SRS phenotype confirms the need to apply molecular karyotyping in this cohort. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  7. Importance of Molecular Features of Non–Small Cell Lung Cancer for Choice of Treatment

    PubMed Central

    Moran, Cesar

    2011-01-01

    Lung cancer is the leading cause of cancer-related deaths in the United States. Approximately 85% of lung cancer is categorized as non–small cell lung cancer, and traditionally, non–small cell lung cancer has been treated with surgery, radiation, and chemotherapy. Targeted agents that inhibit the epidermal growth factor receptor pathway have been developed and integrated into the treatment regimens in non–small cell lung cancer. Currently, approved epidermal growth factor receptor inhibitors include the tyrosine kinase inhibitors erlotinib and gefitinib. Molecular determinants, such as epidermal growth factor receptor–activating mutations, have been associated with response to epidermal growth factor receptor tyrosine kinase inhibitors and may be used to guide treatment choices in patients with non–small cell lung cancer. Thus, treatment choice for patients with non–small cell lung cancer depends on molecular features of tumors; however, improved techniques are required to increase the specificity and efficiency of molecular profiling so that these methods can be incorporated into routine clinical practice. This review provides an overview of how genetic analysis is currently used to direct treatment choices in non–small cell lung cancer. PMID:21514411

  8. Clinical and molecular features and therapeutic perspectives of spinal muscular atrophy with respiratory distress type 1.

    PubMed

    Vanoli, Fiammetta; Rinchetti, Paola; Porro, Francesca; Parente, Valeria; Corti, Stefania

    2015-09-01

    Spinal muscular atrophy with respiratory distress (SMARD1) is an autosomal recessive neuromuscular disease caused by mutations in the IGHMBP2 gene, encoding the immunoglobulin μ-binding protein 2, leading to motor neuron degeneration. It is a rare and fatal disease with an early onset in infancy in the majority of cases. The main clinical features are muscular atrophy and diaphragmatic palsy, which requires prompt and permanent supportive ventilation. The human disease is recapitulated in the neuromuscular degeneration (nmd) mouse. No effective treatment is available yet, but novel therapeutic approaches tested on the nmd mouse, such as the use of neurotrophic factors and stem cell therapy, have shown positive effects. Gene therapy has demonstrated effectiveness in SMA and is now at the clinical trial stage in patients; it therefore represents a possible treatment for SMARD1 as well. The significant advances in understanding both the clinical spectrum and the molecular mechanisms of SMARD1 lay the groundwork for rapid translation of pre-clinical therapeutic strategies to humans. © 2015 The Authors. Journal of Cellular and Molecular Medicine published by John Wiley & Sons Ltd and Foundation for Cellular and Molecular Medicine.

  9. “Feature Detection” vs. “Predictive Coding” Models of Plant Behavior

    PubMed Central

    Calvo, Paco; Baluška, František; Sims, Andrew

    2016-01-01

    In this article we consider the possibility that plants exhibit anticipatory behavior, a mark of intelligence. If plants are able to anticipate and respond accordingly to varying states of their surroundings, as opposed to merely responding online to environmental contingencies, then such capacity may be in principle testable, and subject to empirical scrutiny. Our main thesis is that adaptive behavior can only take place by way of a mechanism that predicts the environmental sources of sensory stimulation. We propose to test for anticipation in plants experimentally by contrasting two empirical hypotheses: “feature detection” and “predictive coding.” We spell out what these contrasting hypotheses consist of by way of illustration from the animal literature, and consider how to transfer the rationale involved to the plant literature. PMID:27757094

  10. Prediction of core cancer genes using a hybrid of feature selection and machine learning methods.

    PubMed

    Liu, Y X; Zhang, N N; He, Y; Lun, L J

    2015-08-03

    Machine learning techniques are of great importance in the analysis of microarray expression data, and provide a systematic and promising way to predict core cancer genes. In this study, a hybrid strategy based on machine learning techniques was introduced to select a small set of informative genes and thereby improve classification accuracy. First, feature filtering algorithms were applied to select a set of top-ranked genes, and then hierarchical clustering and collapsing of dense clusters were used to select core cancer genes. Empirical study shows that our approach is capable of selecting relatively few core cancer genes while making high-accuracy predictions. The biological significance of these genes was evaluated using systems biology analysis. Extensive functional pathway and network analyses have confirmed findings in previous studies and can bring new insights into common cancer mechanisms.

  11. Transient protein-protein interface prediction: datasets, features, algorithms, and the RAD-T predictor

    PubMed Central

    2014-01-01

    Background Transient protein-protein interactions (PPIs), which underlie most biological processes, are a prime target for therapeutic development. Immense progress has been made towards computational prediction of PPIs using methods such as protein docking and sequence analysis. However, docking generally requires high resolution structures of both of the binding partners and sequence analysis requires that a significant number of recurrent patterns exist for the identification of a potential binding site. Researchers have turned to machine learning to overcome some of the other methods’ restrictions by generalising interface sites with sets of descriptive features. Best practices for dataset generation, features, and learning algorithms have not yet been identified or agreed upon, and an analysis of the overall efficacy of machine learning based PPI predictors is due, in order to highlight potential areas for improvement. Results The presence of unknown interaction sites as a result of limited knowledge about protein interactions in the testing set dramatically reduces prediction accuracy. Greater accuracy in labelling the data by enforcing higher interface site rates per domain resulted in an average 44% improvement across multiple machine learning algorithms. A set of 10 biologically unrelated proteins that were consistently predicted with high accuracy emerged through our analysis. We identify seven features with the most predictive power over multiple datasets and machine learning algorithms. Through our analysis, we created a new predictor, RAD-T, that outperforms existing non-structurally specializing machine learning protein interface predictors, with an average 59% increase in MCC score on a dataset with a high number of interactions. Conclusion Current methods of evaluating machine-learning based PPI predictors tend to undervalue their performance, which may be artificially decreased by the presence of un-identified interaction sites. Changes to

  12. Prediction of reversible disulfide based on features from local structural signatures.

    PubMed

    Sun, Ming-An; Wang, Yejun; Zhang, Qing; Xia, Yiji; Ge, Wei; Guo, Dianjing

    2017-04-04

    Disulfide bonds are traditionally considered to play only structural roles. In recent years, increasing evidence suggests that the disulfide proteome is made up of structural disulfides and reversible disulfides. Unlike structural disulfides, reversible disulfides usually have important functional roles and may serve as redox switches. Interestingly, only specific disulfide bonds are reversible while others are not. However, whether reversible disulfides can be predicted based on structural information remains largely unknown. In this study, two datasets with both types of disulfides were compiled using independent approaches. By comparison of various features extracted from the local structural signatures, we identified several features that differ significantly between reversible and structural disulfides, including disulfide bond length, along with the number, amino acid composition, secondary structure and physical-chemical properties of surrounding amino acids. An SVM-based classifier was developed for predicting reversible disulfides. By 10-fold cross-validation, the model achieved accuracy of 0.750, sensitivity of 0.352, specificity of 0.953, MCC of 0.405 and AUC of 0.751 using the RevSS_PDB dataset. The robustness was further validated by using RevSS_RedoxDB as an independent testing dataset. This model was applied to proteins with known structures in the PDB database. The results show that one third of the predicted reversible disulfide containing proteins are well-known redox enzymes, while the remaining are non-enzyme proteins. Given that reversible disulfides are frequently reported from functionally important non-enzyme proteins such as transcription factors, the predictions may provide valuable candidates of novel reversible disulfides for further experimental investigation. This study provides the first comparative analysis between the reversible and the structural disulfides. Distinct features remarkably different between these two

  13. Significance of ultrasound features in predicting malignant solid thyroid nodules: need for fine-needle aspiration.

    PubMed

    Yunus, Mahira; Ahmed, Zeba

    2010-10-01

    The purpose of this study was to provide sonographic and colour flow criteria helpful for differentiating between benign and malignant solid thyroid nodules. This prospective study was carried out at the Sindh Institute of Urology and Transplantation (SIUT), Karachi, Pakistan, from 01.05.07 to 31.12.08. Sonographic scans of 78 thyroid nodules in 66 patients were performed. The nodule characteristics studied included microcalcifications, irregular or microlobulated margins, marked hypoechogenicity, a shape that was taller than it was wide, and the color flow pattern on color Doppler ultrasound. Nodules were classified as having positive or negative findings according to the presence or absence of these characteristics. If even one of these sonographic features was present, the nodule was classified as positive (malignant). If a nodule had none of the features described, it was classified as negative (benign). The final diagnosis of a lesion as benign (n = 53) or malignant (n = 25) was confirmed by fine needle aspiration biopsy. Patients with proven benign lesions were followed up for 6 months, and malignant lesions proven on histopathology after FNA were subjected to surgery. The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were then calculated on the basis of our proposed classification method. Among the 78 solid thyroid nodules, 35 lesions were classified as positive on the basis of the sonographic characteristics, and 23 of them proved to be malignant on histopathology. Of the 43 lesions classified as negative, 2 proved to be malignant. The sensitivity, specificity, positive predictive value, negative predictive value and accuracy based on our sonographic classification method were 93.8%, 66%, 56.1%, 95.9%, and 74.8%, respectively. Ultrasound is valuable for identifying many malignant or potentially malignant thyroid nodules. No single ultrasound criterion is reliable in
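
    The five performance measures listed above are all simple ratios over a 2x2 table of classification against final diagnosis. The sketch below applies the textbook definitions to the lesion counts as they are stated in the abstract. The abstract does not say how its own percentages were derived, so this illustration is not expected to reproduce them.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard screening metrics from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),   # malignant lesions called positive
        "specificity": tn / (tn + fp),   # benign lesions called negative
        "ppv": tp / (tp + fp),           # positive calls that are malignant
        "npv": tn / (tn + fn),           # negative calls that are benign
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Counts as read from the abstract: 35 positive calls, 23 of them malignant;
# 43 negative calls, 2 of them malignant.
metrics = diagnostic_metrics(tp=23, fp=35 - 23, fn=2, tn=43 - 2)
```

    The table is internally consistent with the stated totals: 23 + 2 malignant lesions give n = 25, and 12 + 41 benign lesions give n = 53.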

  14. Novel molecular subgroups for clinical classification and outcome prediction in childhood medulloblastoma: a cohort study.

    PubMed

    Schwalbe, Edward C; Lindsey, Janet C; Nakjang, Sirintra; Crosier, Stephen; Smith, Amanda J; Hicks, Debbie; Rafiee, Gholamreza; Hill, Rebecca M; Iliasova, Alice; Stone, Thomas; Pizer, Barry; Michalski, Antony; Joshi, Abhijit; Wharton, Stephen B; Jacques, Thomas S; Bailey, Simon; Williamson, Daniel; Clifford, Steven C

    2017-07-01

    International consensus recognises four medulloblastoma molecular subgroups: WNT (MBWNT), SHH (MBSHH), group 3 (MBGrp3), and group 4 (MBGrp4), each defined by their characteristic genome-wide transcriptomic and DNA methylomic profiles. These subgroups have distinct clinicopathological and molecular features, and underpin current disease subclassification and initial subgroup-directed therapies that are underway in clinical trials. However, substantial biological heterogeneity and differences in survival are apparent within each subgroup, which remain to be resolved. We aimed to investigate whether additional molecular subgroups exist within childhood medulloblastoma and whether these could be used to improve disease subclassification and prognosis predictions. In this retrospective cohort study, we assessed 428 primary medulloblastoma samples collected from UK Children's Cancer and Leukaemia Group (CCLG) treatment centres (UK), collaborating European institutions, and the UKCCSG-SIOP-PNET3 European clinical trial. An independent validation cohort (n=276) of archival tumour samples was also analysed. We analysed samples from patients with childhood medulloblastoma who were aged 0-16 years at diagnosis, and had central review of pathology and comprehensive clinical data. We did comprehensive molecular profiling, including DNA methylation microarray analysis, and did unsupervised class discovery of test and validation cohorts to identify consensus primary molecular subgroups and characterise their clinical and biological significance. We modelled survival of patients aged 3-16 years in patients (n=215) who had craniospinal irradiation and had been treated with a curative intent. Seven robust and reproducible primary molecular subgroups of childhood medulloblastoma were identified. MBWNT remained unchanged and each remaining consensus subgroup was split in two. MBSHH was split into age-dependent subgroups corresponding to infant (<4·3 years; MBSHH-Infant; n=65) and

  15. Larval description of Drusus bosnicus Klapálek 1899 (Trichoptera: Limnephilidae), with distributional, molecular and ecological features

    PubMed Central

    KUČINIĆ, MLADEN; PREVIŠIĆ, ANA; GRAF, WOLFRAM; MIHOCI, IVA; ŠOUFEK, MARIN; STANIĆ-KOŠTROMAN, SVJETLANA; LELO, SUVAD; VITECEK, SIMON; WARINGER, JOHANN

    2016-01-01

    In this study we present morphological, molecular and ecological features of the last instar larvae of Drusus bosnicus, with data on the distribution of this species in Bosnia and Herzegovina. Also included are the most important diagnostic features enabling separation of larvae of D. bosnicus from larvae of the other European Drusinae and Trichoptera species. PMID:26249056

  16. Pre-transplantation minimal residual disease with cytogenetic and molecular diagnostic features improves risk stratification in acute myeloid leukemia

    PubMed Central

    Oran, Betül; Jorgensen, Jeff L.; Marin, David; Wang, Sa; Ahmed, Sairah; Alousi, Amin M.; Andersson, Borje S.; Bashir, Qaiser; Bassett, Roland; Lyons, Genevieve; Chen, Julianne; Rezvani, Katy; Popat, Uday; Kebriaei, Partow; Patel, Keyur; Rondon, Gabriela; Shpall, Elizabeth J.; Champlin, Richard E.

    2017-01-01

    Our aim was to improve outcome prediction after allogeneic hematopoietic stem cell transplantation in acute myeloid leukemia by combining cytogenetic and molecular data at diagnosis with minimal residual disease assessment by multicolor flow-cytometry at transplantation. Patients with acute myeloid leukemia in first complete remission in whom minimal residual disease was assessed at transplantation were included and categorized according to the European LeukemiaNet classification. The primary outcome was 1-year relapse incidence after transplantation. Of 152 patients eligible, 48 had minimal residual disease at the time of their transplant. Minimal residual disease-positive patients were older, required more therapy to achieve first remission, were more likely to have incomplete recovery of blood counts and had more adverse risk features by cytogenetics. Relapse incidence at 1 year was higher in patients with minimal residual disease (32.6% versus 14.4%, P=0.002). Leukemia-free survival (43.6% versus 64%, P=0.007) and overall survival (48.8% versus 66.9%, P=0.008) rates were also inferior in patients with minimal residual disease. In multivariable analysis, minimal residual disease status at transplantation independently predicted 1-year relapse incidence, identifying a subgroup of intermediate-risk patients, according to the European LeukemiaNet classification, with a particularly poor outcome. Assessment of minimal residual disease at transplantation in combination with cytogenetic and molecular findings provides powerful independent prognostic information in acute myeloid leukemia, lending support to the incorporation of minimal residual disease detection to refine risk stratification and develop a more individualized approach during hematopoietic stem cell transplantation. PMID:27540139

  17. Prediction of quantum interference in molecular junctions using a parabolic diagram: Understanding the origin of Fano and anti-resonances

    NASA Astrophysics Data System (ADS)

    Nozaki, Daijiro; Avdoshenko, Stanislav M.; Sevinçli, Hâldun; Gutierrez, Rafael; Cuniberti, Gianaurelio

    2013-03-01

    Recently the interest in quantum interference (QI) phenomena in molecular devices (molecular junctions) has been growing due to the unique features observed in the transmission spectra. In order to design single molecular devices exploiting QI effects as desired, it is necessary to provide simple rules for predicting the appearance of QI effects such as anti-resonances or Fano line shapes and for controlling them. In this study, we derive a transmission function of a generic molecular junction with a side group (T-shaped molecular junction) using a minimal toy model. We developed a simple method to predict the appearance of quantum interference, Fano resonances or anti-resonances, and its position in the conductance spectrum by introducing a simple graphical representation (parabolic model). Using it we can easily visualize the relation between the key electronic parameters and the positions of normal resonant peaks and anti-resonant peaks induced by quantum interference in the conductance spectrum. We also demonstrate Fano and anti-resonance in T-shaped molecular junctions using a simple tight-binding model. This parabolic model enables one to infer on-site energies of T-shaped molecules and the coupling between side group and main conduction channel from transmission spectra.
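
    To make the T-shaped junction concrete, here is a sketch of a textbook two-site tight-binding model: one main site connected to both leads and one side site coupled only to the main site, with wide-band leads. This is an assumed minimal model, not the authors' derivation or their parabolic diagram, and all parameter names are illustrative.

```python
def transmission(energy, eps_main, eps_side, coupling, gamma_left, gamma_right):
    """Transmission through a main site with a side-coupled site (wide-band leads).

    Assumes gamma_left and gamma_right are positive lead broadenings.
    """
    broadening = 0.5 * (gamma_left + gamma_right)
    if coupling == 0:
        # No side group: plain Breit-Wigner peak of the main site.
        return gamma_left * gamma_right / ((energy - eps_main) ** 2 + broadening ** 2)
    d = energy - eps_side
    # Main-site Green's function 1 / (E - eps_main - t^2 / d + i*broadening),
    # multiplied through by d so that E == eps_side needs no division by zero.
    denom = complex((energy - eps_main) * d - coupling ** 2, broadening * d)
    return gamma_left * gamma_right * d * d / abs(denom) ** 2

# Anti-resonance: transmission vanishes when the energy hits the side-site level.
t_anti = transmission(0.5, 0.0, 0.5, 0.5, 0.2, 0.2)
# Resonance: (E - eps_main) * (E - eps_side) == coupling**2, here at E = 1.
t_peak = transmission(1.0, 0.0, 0.0, 1.0, 0.2, 0.2)
```

    In this model the zero of the transmission sits at the side-site energy, and the peaks sit where (E - eps_main)(E - eps_side) equals the squared coupling. This is the kind of relation between electronic parameters and peak positions that the abstract describes reading off from spectra.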

  18. Predictive features of chronic kidney disease in atypical haemolytic uremic syndrome

    PubMed Central

    Jamme, Matthieu; Raimbourg, Quentin; Chauveau, Dominique; Seguin, Amélie; Presne, Claire; Perez, Pierre; Gobert, Pierre; Wynckel, Alain; Provôt, François; Delmas, Yahsou; Mousson, Christiane; Servais, Aude; Vrigneaud, Laurence; Veyradier, Agnès

    2017-01-01

    Chronic kidney disease (CKD) is a frequent and serious complication of atypical haemolytic uremic syndrome (aHUS). We aimed to develop a simple accurate model to predict the risk of renal dysfunction in aHUS based on clinical and biological features available at hospital admission. Renal function at 1-year follow-up, based on an estimated glomerular filtration rate < 60 mL/min/1.73 m2 as assessed by the Modification of Diet in Renal Disease equation, was used as an indicator of significant CKD. Prospectively collected data from a cohort of 156 aHUS patients who did not receive eculizumab were used to identify predictors of CKD. Covariates associated with renal impairment were identified by multivariate analysis. The model performance was assessed and a scoring system for clinical practice was constructed from the regression coefficients. Multivariate analyses identified three predictors of CKD: a high serum creatinine level, a high mean arterial pressure and a mildly decreased platelet count. The prognostic model had a good discriminative ability (area under the curve = 0.84). The scoring system ranged from 0 to 5, with corresponding risks of CKD ranging from 18% to 100%. This model accurately predicts development of 1-year CKD in patients with aHUS using clinical and biological features available on admission. After further validation, this model may assist in clinical decision making. PMID:28542627
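
    The area under the curve quoted as a measure of discriminative ability has a direct interpretation: it is the probability that a randomly chosen patient who developed CKD received a higher score than a randomly chosen patient who did not. A minimal sketch with made-up scores on the abstract's 0 to 5 scale (the scores are hypothetical, not study data):

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical risk scores for patients with and without CKD at one year.
ckd_scores = [5, 4, 3, 3]
no_ckd_scores = [0, 1, 3, 2]
value = auc(ckd_scores, no_ckd_scores)
```

    An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value of 0.84 means the higher score goes to the CKD patient in roughly five of every six such pairs.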

  19. Conventional MRI features for predicting the clinical outcome of patients with invasive placenta

    PubMed Central

    Chen, Ting; Xu, Xiao-Quan; Shi, Hai-Bin; Yang, Zheng-Qiang; Zhou, Xin; Pan, Yi

    2017-01-01

    PURPOSE We aimed to evaluate whether morphologic magnetic resonance imaging (MRI) features could help to predict the maternal outcome after uterine artery embolization (UAE)-assisted cesarean section (CS) in patients with invasive placenta previa. METHODS We retrospectively reviewed the MRI data of 40 pregnant women who have undergone UAE-assisted cesarean section due to suspected high risk of massive hemorrhage caused by invasive placenta previa. Patients were divided into two groups based on the maternal outcome (good-outcome group: minor hemorrhage and uterus preserved; poor-outcome group: significant hemorrhage or emergency hysterectomy). Morphologic MRI features were compared between the two groups. Multivariate logistic regression analysis was used to identify the most valuable variables, and predictive value of the identified risk factor was determined. RESULTS Low signal intensity bands on T2-weighted imaging (P < 0.001), placenta percreta (P = 0.011), and placental cervical protrusion sign (P = 0.002) were more frequently observed in patients with poor outcome. Low signal intensity bands on T2-weighted imaging was the only significant predictor of poor maternal outcome in multivariate analysis (P = 0.020; odds ratio, 14.79), with 81.3% sensitivity and 84.3% specificity. CONCLUSION Low signal intensity bands on T2-weighted imaging might be a predictor of poor maternal outcome after UAE-assisted cesarean section in patients with invasive placenta previa. PMID:28345524

  20. Unsupervised feature learning improves prediction of human brain activity in response to natural images.

    PubMed

    Güçlü, Umut; van Gerven, Marcel A J

    2014-08-01

    Encoding and decoding in functional magnetic resonance imaging has recently emerged as an area of research to noninvasively characterize the relationship between stimulus features and human brain activity. To overcome the challenge of formalizing what stimulus features should modulate single voxel responses, we introduce a general approach for making directly testable predictions of single voxel responses to statistically adapted representations of ecologically valid stimuli. These representations are learned from unlabeled data without supervision. Our approach is validated using a parsimonious computational model of (i) how early visual cortical representations are adapted to statistical regularities in natural images and (ii) how populations of these representations are pooled by single voxels. This computational model is used to predict single voxel responses to natural images and identify natural images from stimulus-evoked multiple voxel responses. We show that statistically adapted low-level sparse and invariant representations of natural images better span the space of early visual cortical representations and can be more effectively exploited in stimulus identification than hand-designed Gabor wavelets. Our results demonstrate the potential of our approach to better probe unknown cortical representations.

  1. Geopositioning with a quadcopter: Extracted feature locations and predicted accuracy without a priori sensor attitude information

    NASA Astrophysics Data System (ADS)

    Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron

    2017-05-01

    This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.

  2. MELANCHOLIC DEPRESSION PREDICTION BY IDENTIFYING REPRESENTATIVE FEATURES IN METABOLIC AND MICROARRAY PROFILES WITH MISSING VALUES

    PubMed Central

    Nie, Zhi; Yang, Tao; Liu, Yashu; Lin, Binbin; Li, Qingyang; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping

    2014-01-01

    Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed

  3. Melancholic depression prediction by identifying representative features in metabolic and microarray profiles with missing values.

    PubMed

    Nie, Zhi; Yang, Tao; Liu, Yashu; Li, Qingyang; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping

    2015-01-01

    Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed

  4. In silico predictive studies of mAHR congener binding using homology modelling and molecular docking.

    PubMed

    Panda, Roshni; Cleave, A Suneetha Susan; Suresh, P K

    2014-09-01

    The aryl hydrocarbon receptor (AHR) is one of the principal xenobiotic nuclear receptors responsible for the early events involved in the transcription of a complex set of genes comprising the CYP450 gene family. In the present computational study, homology modelling and molecular docking were carried out with the objective of predicting the relationship between the binding efficiency and the lipophilicity of different polychlorinated biphenyl (PCB) congeners and the AHR in silico. A homology model of the murine AHR was constructed by several automated servers and assessed by PROCHECK, ERRAT, VERIFY3D and WHAT IF. The resulting model of the AHR by MODWEB was used to carry out molecular docking of 36 PCB congeners using the PatchDock server. The lipophilicity of the congeners was predicted using the XLOGP3 tool. The results suggest that lipophilicity influences binding energy scores and is positively correlated with them. Score and Log P were correlated with r = +0.506 at the p = 0.01 level. In addition, the number of chlorine (Cl) atoms and Log P were highly correlated with r = +0.900 at the p = 0.01 level. The number of Cl atoms and scores also showed a moderate positive correlation of r = +0.481 at the p = 0.01 level. To the best of our knowledge, this is the first study employing PatchDock in the docking of AHR to the environmentally deleterious congeners and attempting to correlate structural features of the AHR with its biochemical properties with regard to PCBs. The results of this study are consistent with those of other computational studies reported in the previous literature, which suggest that a combination of docking, scoring and ranking organic pollutants could be a possible predictive tool for investigating ligand-mediated toxicity, for their subsequent validation using wet lab-based studies.

  5. Computational intelligence models to predict porosity of tablets using minimum features.

    PubMed

    Khalid, Mohammad Hassan; Kazemi, Pezhman; Perez-Gandarillas, Lucia; Michrafy, Abderrahim; Szlęk, Jakub; Jachowicz, Renata; Mendyk, Aleksander

    2017-01-01

    The effects of different formulations and manufacturing process conditions on the physical properties of a solid dosage form are of importance to the pharmaceutical industry. It is vital to have in-depth understanding of the material properties and governing parameters of its processes in response to different formulations. Understanding the mentioned aspects will allow tighter control of the process, leading to implementation of quality-by-design (QbD) practices. Computational intelligence (CI) offers an opportunity to create empirical models that can be used to describe the system and predict future outcomes in silico. CI models can help explore the behavior of input parameters, unlocking deeper understanding of the system. This research endeavor presents CI models to predict the porosity of tablets created by roll-compacted binary mixtures, which were milled and compacted under systematically varying conditions. CI models were created using tree-based methods, artificial neural networks (ANNs), and symbolic regression trained on an experimental data set and screened using root-mean-square error (RMSE) scores. The experimental data were composed of proportion of microcrystalline cellulose (MCC) (in percentage), granule size fraction (in micrometers), and die compaction force (in kilonewtons) as inputs and porosity as an output. The resulting models show impressive generalization ability, with ANNs (normalized root-mean-square error [NRMSE] = 1%) and symbolic regression (NRMSE = 4%) as the best-performing methods, also exhibiting reliable predictive behavior when presented with a challenging external validation data set (best achieved symbolic regression: NRMSE = 3%). Symbolic regression demonstrates the transition from the black box modeling paradigm to more transparent predictive models. Predictive performance and feature selection behavior of CI models hint at the most important variables within this factor space.
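
    The RMSE and NRMSE scores quoted in this abstract can be sketched as follows. The abstract does not say how the error was normalized; dividing by the range of the observed values is one common convention and is an assumption here, as are the function names and sample numbers.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def nrmse_percent(observed, predicted):
    """RMSE divided by the observed range, in percent (assumed normalization)."""
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("observed values have zero range")
    return 100.0 * rmse(observed, predicted) / span

# Hypothetical porosity values (percent), for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [11.0, 19.0, 31.0]
print(rmse(observed, predicted), nrmse_percent(observed, predicted))
```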

  6. Computational intelligence models to predict porosity of tablets using minimum features

    PubMed Central

    Khalid, Mohammad Hassan; Kazemi, Pezhman; Perez-Gandarillas, Lucia; Michrafy, Abderrahim; Szlęk, Jakub; Jachowicz, Renata; Mendyk, Aleksander

    2017-01-01

    The effects of different formulations and manufacturing process conditions on the physical properties of a solid dosage form are of importance to the pharmaceutical industry. It is vital to have in-depth understanding of the material properties and governing parameters of its processes in response to different formulations. Understanding the mentioned aspects will allow tighter control of the process, leading to implementation of quality-by-design (QbD) practices. Computational intelligence (CI) offers an opportunity to create empirical models that can be used to describe the system and predict future outcomes in silico. CI models can help explore the behavior of input parameters, unlocking deeper understanding of the system. This research endeavor presents CI models to predict the porosity of tablets created by roll-compacted binary mixtures, which were milled and compacted under systematically varying conditions. CI models were created using tree-based methods, artificial neural networks (ANNs), and symbolic regression trained on an experimental data set and screened using root-mean-square error (RMSE) scores. The experimental data were composed of proportion of microcrystalline cellulose (MCC) (in percentage), granule size fraction (in micrometers), and die compaction force (in kilonewtons) as inputs and porosity as an output. The resulting models show impressive generalization ability, with ANNs (normalized root-mean-square error [NRMSE] = 1%) and symbolic regression (NRMSE = 4%) as the best-performing methods, also exhibiting reliable predictive behavior when presented with a challenging external validation data set (best achieved symbolic regression: NRMSE = 3%). Symbolic regression demonstrates the transition from the black box modeling paradigm to more transparent predictive models. Predictive performance and feature selection behavior of CI models hint at the most important variables within this factor space. PMID:28138223

  7. Prediction of molecular properties including symmetry from quantum-based molecular structural formulas, VIF.

    PubMed

    Alia, Joseph D; Vlaisavljevich, Bess; Abbot, Matthew; Warneke, Hallie; Mastin, Tyson

    2008-10-09

    Structurally covariant valency interaction formulas, VIF, gain chemical significance by comparison with resonance structures and natural bond orbital, NBO, bonding schemes and at the same time allow for additional prediction such as symmetry of ring systems and destabilization of electron pairs with respect to reference energy of -1/2 Eh. Comparisons are based on three chemical interpretations of Sinanoğlu's theory of structural covariance: (1) sets of structurally covariant quantum structural formulas, VIF, are interpreted as the same quantum operator represented in linearly related basis frames; (2) structurally covariant VIF pictures are interpreted as sets of molecular species with similar energy; and (3) the same VIF picture can be interpreted as different quantum operators, one-electron density or Hamiltonian, for example. According to these three interpretations, bond pair, lone pair, and free radical electrons understood in terms of a localized orbital representation are recognized as having energies above, below, or equal to a predetermined reference, frequently -1/2 Eh. The probable position of electron pairs and radical electrons is predicted. The selectivity of concerted ring closures in allyl anion and cation is described. Symmetries of conjugated ring systems are predicted according to their numbers of pi-electrons and spin-multiplicity. The pi-distortivity of benzene is predicted. The 3c/2e- H-bridging bonds in diborane are derived in a natural way according to the notion that the bridging bonds will have delocalizing interactions between them consistent with results of the NBO method. Key chemical bonding motifs are described using VIF. These include 2c/1e-, 2c/2e-, 2c/3e-, 3c/2e-, 3c/3e-, 3c/4e-, 4n antiaromatic, and 4n+2 aromatic bonding systems. Some common organic functional groups are represented as VIF pictures and because these pictures can be interpreted simultaneously as one-electron density and Hamiltonian operators, the valence shell

  8. Using theory and simulation to link molecular features of nanoscale fillers to morphology in polymer nanocomposites

    NASA Astrophysics Data System (ADS)

    Jayaraman, Arthi; Martin, Tyler

    2014-03-01

    Polymer nanocomposites are a class of materials that consist of a polymer matrix embedded with nanoscale fillers or additives that enhance the inherent properties of the matrix polymer. To engineer polymer nanocomposites for specific applications with target macroscopic properties (e.g. photovoltaics, photonics, automobile parts) it is important to have design rules that relate molecular features to equilibrium morphology of the composite. In the first part of the talk I will present our recent theory and simulation work on composites containing polymer grafted nanoparticles, showing how polydispersity in graft and matrix polymers (physical heterogeneity) can be used to stabilize dispersion of the nanoparticles within a polymer matrix. In the second part of the talk I will present our recent work linking block-copolymer functionalization to the nanoparticle location in a polymer matrix consisting of homopolymer blends.

  9. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues

    PubMed Central

    Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/. PMID:27907159
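
    The four performance figures reported in this abstract (accuracy, sensitivity, specificity and Matthews correlation coefficient) all follow from a binary confusion matrix. A minimal sketch is given below; the counts are hypothetical and are not the study's data.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fn + tn + fp
    # MCC denominator; it is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Hypothetical counts: 100 binding and 100 non-binding proteins.
for name, value in binary_metrics(tp=80, fn=20, tn=90, fp=10).items():
    print(f"{name}: {value:.3f}")
```

    Unlike accuracy, the MCC uses all four cells of the matrix, which is why it is often reported alongside sensitivity and specificity for classifiers of this kind.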

  10. Somatic molecular changes and histo-pathological features of colorectal cancer in Tunisia

    PubMed Central

    Aissi, Sana; Buisine, Marie Pierre; Zerimech, Farid; Kourda, Nadia; Moussa, Amel; Manai, Mohamed; Porchet, Nicole

    2013-01-01

    AIM: To determine correlations between family history, clinical features and mutational status of genes involved in the progression of colorectal cancer (CRC). METHODS: Histo-pathological features and molecular changes [KRAS, BRAF and CTNNB1 gene mutations, microsatellite instability (MSI) phenotype, expression of mismatch repair (MMR) and mucin (MUC) 5AC proteins, mutation and expression analysis of TP53, MLH1 promoter hypermethylation analysis] were examined in a series of 51 unselected Tunisian CRC patients, 10 of whom had a proven or probable hereditary disease, in search of new tumoral markers for CRC susceptibility in Tunisian patients. RESULTS: As expected, MSI and MMR expression loss were associated with the presence of familial CRC (75% vs 9%, P < 0.001). However, no significant associations were detected between personal or familial cancer history and KRAS (codons 12 and 13) or TP53 (exons 4-9) alterations. A significant inverse relationship was observed between the presence of MSI and TP53 accumulation (10.0% vs 48.8%, P = 0.0335) in CRC tumors, suggesting different molecular pathways to CRC that in turn may reflect different environmental exposures. Interestingly, MUC5AC expression was significantly associated with the presence of MSI (46.7% vs 8.3%, P = 0.0039), MMR expression loss (46.7% vs 8.3%, P = 0.0039) and the presence of familial CRC (63% vs 23%, P = 0.039). CONCLUSION: These findings suggest that MUC5AC expression analysis may be useful in the screening of Tunisian patients with high risk of CRC. PMID:23983431

  11. Functional connectivity classification of autism identifies highly predictive brain features but falls short of biomarker standards

    PubMed Central

    Plitt, Mark; Barnes, Kelly Anne; Martin, Alex

    2014-01-01

    Objectives Autism spectrum disorders (ASD) are diagnosed based on early-manifesting clinical symptoms, including markedly impaired social communication. We assessed the viability of resting-state functional MRI (rs-fMRI) connectivity measures as diagnostic biomarkers for ASD and investigated which connectivity features are predictive of a diagnosis. Methods Rs-fMRI scans from 59 high functioning males with ASD and 59 age- and IQ-matched typically developing (TD) males were used to build a series of machine learning classifiers. Classification features were obtained using 3 sets of brain regions. Another set of classifiers was built from participants' scores on behavioral metrics. An additional age and IQ-matched cohort of 178 individuals (89 ASD; 89 TD) from the Autism Brain Imaging Data Exchange (ABIDE) open-access dataset (http://fcon_1000.projects.nitrc.org/indi/abide/) were included for replication. Results High classification accuracy was achieved through several rs-fMRI methods (peak accuracy 76.67%). However, classification via behavioral measures consistently surpassed rs-fMRI classifiers (peak accuracy 95.19%). The class probability estimates, P(ASD|fMRI data), from brain-based classifiers significantly correlated with scores on a measure of social functioning, the Social Responsiveness Scale (SRS), as did the most informative features from 2 of the 3 sets of brain-based features. The most informative connections predominantly originated from regions strongly associated with social functioning. Conclusions While individuals can be classified as having ASD with statistically significant accuracy from their rs-fMRI scans alone, this method falls short of biomarker standards. Classification methods provided further evidence that ASD functional connectivity is characterized by dysfunction of large-scale functional networks, particularly those involved in social information processing. PMID:25685703

  12. Predicting the pathological features of the mesorectum before the laparoscopic approach to rectal cancer.

    PubMed

    Fernández Ananín, Sonia; Targarona, Eduardo M; Martinez, Carmen; Pernas, Juan Carlos; Hernández, Diana; Gich, Ignasi; Sancho, Francesc J; Trias, Manuel

    2014-12-01

    Pelvic anatomy and tumour features play a role in the difficulty of the laparoscopic approach to total mesorectal excision in rectal cancer. The aim of the study was to analyse whether these characteristics also influence the quality of the surgical specimen. We performed a prospective study in consecutive patients with rectal cancer located less than 12 cm from the anal verge who underwent laparoscopic surgery between January 2010 and July 2013. Exclusion criteria were T1 and T4 tumours, abdominoperineal resections, obstructive and perforated tumours, or any major contraindication for laparoscopic surgery. Dependent variables were the circumferential resection margin (CRM) and the quality of the mesorectum. Sixty-four patients underwent laparoscopic sphincter-preserving total mesorectal excision. Resection was complete in 79.1% of specimens and CRM was positive in 9.7%. Univariate analysis showed tumour depth (T status) (P = 0.04) and promontorium-subsacrum angle (P = 0.02) independently predicted CRM positivity. Tumour depth (P < 0.05) and promontorium-subsacrum axis (P < 0.05) independently predicted mesorectum quality. Multivariate analysis identified the promontorium-subsacrum angle (P = 0.012) as the only independent predictor of CRM. Bony pelvis dimensions influenced the quality of the specimen obtained by laparoscopy. These measurements may be useful to predict which patients will benefit most from laparoscopic surgery and also to select patients in accordance with the learning curve of trainee surgeons.

  13. Time Score: A New Feature for Link Prediction in Social Networks

    NASA Astrophysics Data System (ADS)

    Munasinghe, Lankeshwara; Ichise, Ryutaro

    Link prediction in social networks, such as friendship networks and coauthorship networks, has recently attracted a great deal of attention. There have been numerous attempts to address the problem of link prediction through diverse approaches. In the present paper, we focus on the temporal behavior of the link strength, particularly the relationship between the time stamps of interactions or links and the temporal behavior of link strength and how link strength affects future link evolution. Most previous studies have not sufficiently discussed either the impact of time stamps of the interactions or time stamps of the links on link evolution. The gap between the current time and the time stamps of the interactions or links is also important to link evolution. In the present paper, we introduce a new time-aware feature, referred to as time score, that captures the important aspects of time stamps of interactions and the temporality of the link strengths. We also analyze the effectiveness of time score with different parameter settings for different network data sets. The results of the analysis revealed that the time score was sensitive to different networks and different time measures. We applied time score to two social network data sets, namely, Facebook friendship network data set and a coauthorship network data set. The results revealed a significant improvement in predicting future links.

  14. Ribonucleotide reductases reveal novel viral diversity and predict biological and ecological features of unknown marine viruses

    PubMed Central

    Sakowski, Eric G.; Munsell, Erik V.; Hyatt, Mara; Kress, William; Williamson, Shannon J.; Nasko, Daniel J.; Polson, Shawn W.; Wommack, K. Eric

    2014-01-01

    Virioplankton play a crucial role in aquatic ecosystems as top-down regulators of bacterial populations and agents of horizontal gene transfer and nutrient cycling. However, the biology and ecology of virioplankton populations in the environment remain poorly understood. Ribonucleotide reductases (RNRs) are ancient enzymes that reduce ribonucleotides to deoxyribonucleotides and thus prime DNA synthesis. Composed of three classes according to O2 reactivity, RNRs can be predictive of the physiological conditions surrounding DNA synthesis. RNRs are universal among cellular life, common within viral genomes and virioplankton shotgun metagenomes (viromes), and estimated to occur within >90% of the dsDNA virioplankton sampled in this study. RNRs occur across diverse viral groups, including all three morphological families of tailed phages, making these genes attractive for studies of viral diversity. Differing patterns in virioplankton diversity were clear from RNRs sampled across a broad oceanic transect. The most abundant RNRs belonged to novel lineages of podoviruses infecting α-proteobacteria, a bacterial class critical to oceanic carbon cycling. RNR class was predictive of phage morphology among cyanophages and RNR distribution frequencies among cyanophages were largely consistent with the predictions of the “kill the winner–cost of resistance” model. RNRs were also identified for the first time to our knowledge within ssDNA viromes. These data indicate that RNR polymorphism provides a means of connecting the biological and ecological features of virioplankton populations. PMID:25313075

  15. Accurate single-sequence prediction of solvent accessible surface area using local and global features

    PubMed Central

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-01-01

    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both much more efficient than sequence alignment based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de-novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org. PMID:25204636
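
    The two kinds of alignment-free input mentioned in this abstract, a sequential window and global composition features, can be sketched as below. This is an illustration only: the abstract does not give ASAquick's actual encoding, so treating "two-residue composition" as the frequency of adjacent residue pairs, the padding character, and all function names are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_features(seq):
    """20 single-residue frequencies followed by 400 adjacent-pair frequencies."""
    seq = seq.upper()
    single = {a: 0.0 for a in AMINO_ACIDS}
    pair = {a + b: 0.0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for a in seq:
        if a in single:  # non-standard residue codes are ignored
            single[a] += 1.0 / len(seq)
    n_pairs = max(len(seq) - 1, 1)
    for i in range(len(seq) - 1):
        key = seq[i:i + 2]
        if key in pair:
            pair[key] += 1.0 / n_pairs
    return list(single.values()) + list(pair.values())

def window(seq, center, half_width, pad="X"):
    """Residues from center - half_width to center + half_width, padded at chain ends."""
    padded = pad * half_width + seq + pad * half_width
    return padded[center:center + 2 * half_width + 1]

# Hypothetical short chain, for illustration only.
features = composition_features("ACDEFAC")
print(len(features), window("ACDEFAC", 0, 2))
```

    Both feature types need only the chain itself, which is consistent with the efficiency argument made in the abstract: no alignment search is required before prediction.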

  16. Prediction of individual prosthesis size for valve-sparing aortic root reconstruction based on geometric features.

    PubMed

    Hagenah, J; Werrmann, E; Scharfschwerdt, M; Ernst, F; Metzner, C

    2016-08-01

    Valve-sparing aortic root reconstruction is an up-and-coming approach for patients suffering from aortic valve insufficiencies which promises to significantly reduce complications. However, the success of the treatment strongly depends on the challenging task of choosing the correct size of the prosthesis, for which, up to now, surgeons have had to rely solely on their experience. Here, we present a novel machine-learning-based approach, which might make it possible to predict the size of the prosthesis from pre-operatively acquired ultrasound images. We utilize support vector regression to train a prediction model on three geometric features extracted from the ultrasound data. In order to evaluate the accuracy and robustness of our approach, we created a large database of porcine aortic root geometries in a healthy state and an artificially dilated state. Our results indicate that prediction of correct prosthesis sizes is feasible. Furthermore, they suggest that it is crucial that the training data set faithfully represents the diversity of aortic root geometries.

  17. Integrated prediction of one-dimensional structural features and their relationships with conformational flexibility in helical membrane proteins.

    PubMed

    Ahmad, Shandar; Singh, Yumlembam Hemajit; Paudel, Yogesh; Mori, Takaharu; Sugita, Yuji; Mizuguchi, Kenji

    2010-10-27

    Many structural properties such as solvent accessibility, dihedral angles and helix-helix contacts can be assigned to each residue in a membrane protein. Independent studies exist on the analysis and sequence-based prediction of some of these so-called one-dimensional features. However, there is little explanation of why certain residues are predicted in a wrong structural class or with large errors in the absolute values of these features. On the other hand, membrane proteins undergo conformational changes to allow transport as well as ligand binding. These conformational changes often occur via residues that are inherently flexible and hence, predicting fluctuations in residue positions is of great significance. We performed a statistical analysis of common patterns among selected one-dimensional equilibrium structural features (ESFs) and developed a method for simultaneously predicting all of these features using an integrated system. Our results show that the prediction performance can be improved if multiple structural features are trained in an integrated model, compared to the current practice of developing individual models. In particular, the performance of the solvent accessibility and bend-angle prediction improved in this way. The well-performing bend-angle prediction can be used to predict helical positions with severe kinks at a modest success rate. Further, we showed that single-chain conformational dynamics, measured by B-factors derived from normal mode analysis, could be predicted from observed and predicted ESFs with good accuracy. A web server was developed (http://tardis.nibio.go.jp/netasa/htmone/) for predicting the one-dimensional ESFs from sequence information and analyzing the differences between the predicted and observed values of the ESFs. The prediction performance of the integrated model is significantly better than that of the models performing the task separately for each feature for the solvent accessibility and bend

  18. Integrated prediction of one-dimensional structural features and their relationships with conformational flexibility in helical membrane proteins

    PubMed Central

    2010-01-01

    Background: Many structural properties such as solvent accessibility, dihedral angles and helix-helix contacts can be assigned to each residue in a membrane protein. Independent studies exist on the analysis and sequence-based prediction of some of these so-called one-dimensional features. However, there is little explanation of why certain residues are predicted in a wrong structural class or with large errors in the absolute values of these features. On the other hand, membrane proteins undergo conformational changes to allow transport as well as ligand binding. These conformational changes often occur via residues that are inherently flexible and hence, predicting fluctuations in residue positions is of great significance. Results: We performed a statistical analysis of common patterns among selected one-dimensional equilibrium structural features (ESFs) and developed a method for simultaneously predicting all of these features using an integrated system. Our results show that the prediction performance can be improved if multiple structural features are trained in an integrated model, compared to the current practice of developing individual models. In particular, the performance of the solvent accessibility and bend-angle prediction improved in this way. The well-performing bend-angle prediction can be used to predict helical positions with severe kinks at a modest success rate. Further, we showed that single-chain conformational dynamics, measured by B-factors derived from normal mode analysis, could be predicted from observed and predicted ESFs with good accuracy. A web server was developed (http://tardis.nibio.go.jp/netasa/htmone/) for predicting the one-dimensional ESFs from sequence information and analyzing the differences between the predicted and observed values of the ESFs. Conclusions: The prediction performance of the integrated model is significantly better than that of the models performing the task separately for each feature for the solvent

  19. Racial Differences in Esophageal Squamous Cell Carcinoma: Incidence and Molecular Features

    PubMed Central

    Zhou, Kai; Yang, Liguang

    2017-01-01

    The incidence and histological type of esophageal cancer are highly variable depending on geographic location and race/ethnicity. Here we sought to determine whether racial differences exist in the molecular features of esophageal cancer. We first confirmed that the incidence rate of esophageal adenocarcinoma (EA) was higher in Whites than in Asians and Blacks, while the incidence of esophageal squamous cell carcinoma (ESCC) was highest in Asians. We then compared genome-wide somatic mutations, methylation, and gene expression to identify genes that differ by race. The mutation frequencies of some genes in the same pathway showed opposite differences between Asian and White patients, but their functional effects on the pathway may be consistent. The global patterns of methylation and expression were similar, reflecting the common characteristics of ESCC tumors from different populations. A small number of genes differed significantly between Asians and Whites. More interestingly, the racial differences in COL11A1 were consistent across multiple molecular levels, with higher mutation frequency, higher methylation, and lower expression in White patients. This indicates that COL11A1 might play an important role in ESCC, especially in the White population. Additional studies are needed to further explore the functions of these genes in esophageal cancer. PMID:28393072

  20. Programmatic features of aging originating in development: aging mechanisms beyond molecular damage?

    PubMed Central

    de Magalhães, João Pedro

    2012-01-01

    The idea that aging follows a predetermined sequence of events, a program, has been discredited by most contemporary authors. Instead, aging is largely thought to occur due to the accumulation of various forms of molecular damage. Recent work employing functional genomics now suggests that, indeed, certain facets of mammalian aging may follow predetermined patterns encoded in the genome as part of developmental processes. It appears that genetic programs coordinating some aspects of growth and development persist into adulthood and may become detrimental. This link between development and aging may occur due to regulated processes, including through the action of microRNAs and epigenetic mechanisms. Taken together with other results, in particular from worms, these findings provide evidence that some aging changes are not primarily a result of a build-up of stochastic damage but are rather a product of regulated processes. These processes are interpreted as forms of antagonistic pleiotropy, the product of a “shortsighted watchmaker,” and thus do not assume aging evolved for a purpose. Overall, it appears that the genome does, indeed, contain specific instructions that drive aging in animals, a radical shift in our perception of the aging process.—de Magalhães, J. P. Programmatic features of aging originating in development: aging mechanisms beyond molecular damage? PMID:22964300

  1. Molecular crosstalk between tumour and brain parenchyma instructs histopathological features in glioblastoma

    PubMed Central

    Bougnaud, Sébastien; Golebiewska, Anna; Oudin, Anaïs; Keunen, Olivier; Harter, Patrick N.; Mäder, Lisa; Azuaje, Francisco; Fritah, Sabrina; Stieber, Daniel; Kaoma, Tony; Vallar, Laurent; Brons, Nicolaas H.C.; Daubon, Thomas; Miletic, Hrvoje; Sundstrøm, Terje; Herold-Mende, Christel; Mittelbronn, Michel; Bjerkvig, Rolf; Niclou, Simone P.

    2016-01-01

    The histopathological and molecular heterogeneity of glioblastomas represents a major obstacle for effective therapies. Glioblastomas do not develop autonomously, but evolve in a unique environment that adapts to the growing tumour mass and contributes to the malignancy of these neoplasms. Here, we show that patient-derived glioblastoma xenografts generated in the mouse brain from organotypic spheroids reproducibly give rise to three different histological phenotypes: (i) a highly invasive phenotype with an apparent normal brain vasculature, (ii) a highly angiogenic phenotype displaying microvascular proliferation and necrosis and (iii) an intermediate phenotype combining features of invasion and vessel abnormalities. These phenotypic differences were visible during early phases of tumour development suggesting an early instructive role of tumour cells on the brain parenchyma. Conversely, we found that tumour-instructed stromal cells differentially influenced tumour cell proliferation and migration in vitro, indicating a reciprocal crosstalk between neoplastic and non-neoplastic cells. We did not detect any transdifferentiation of tumour cells into endothelial cells. Cell type-specific transcriptomic analysis of tumour and endothelial cells revealed a strong phenotype-specific molecular conversion between the two cell types, suggesting co-evolution of tumour and endothelial cells. Integrative bioinformatic analysis confirmed the reciprocal crosstalk between tumour and microenvironment and suggested a key role for TGFβ1 and extracellular matrix proteins as major interaction modules that shape glioblastoma progression. These data provide novel insight into tumour-host interactions and identify novel stroma-specific targets that may play a role in combinatorial treatment strategies against glioblastoma. PMID:27049916

  2. Spatial Habitat Features Derived from Multiparametric Magnetic Resonance Imaging Data Are Associated with Molecular Subtype and 12-Month Survival Status in Glioblastoma Multiforme

    PubMed Central

    Lee, Joonsang; Narang, Shivali; Martinez, Juan; Rao, Ganesh; Rao, Arvind

    2015-01-01

    Glioblastoma multiforme is one of the most common and aggressive malignant brain tumors. Despite multimodality treatment such as radiation therapy and chemotherapy (temozolomide: TMZ), the median survival of glioblastoma patients is less than 15 months. In this study, we investigated the association between measures of spatial diversity derived from spatial point pattern analysis of multiparametric magnetic resonance imaging (MRI) data and molecular status as well as 12-month survival in glioblastoma. We obtained 27 measures of spatial proximity (diversity) via spatial point pattern analysis of multiparametric T1 post-contrast and T2 fluid-attenuated inversion recovery MRI data. These measures were used to predict 12-month survival status (≤12 or >12 months) in 74 glioblastoma patients. Kaplan-Meier and receiver operating characteristic analyses were used to assess the relationship between derived spatial features and 12-month survival status as well as molecular subtype status in patients with glioblastoma. Kaplan-Meier survival analysis revealed that 14 spatial features were capable of stratifying overall survival in a statistically significant manner. For prediction of 12-month survival status based on these diversity indices, sensitivity and specificity were 0.86 and 0.64, respectively. The area under the receiver operating characteristic curve and the accuracy were 0.76 and 0.75, respectively. For prediction of molecular subtype status, the proneural subtype showed the highest accuracy (0.93) among all molecular subtypes based on receiver operating characteristic analysis. We find that measures of spatial diversity from point pattern analysis of intensity habitats from T1 post-contrast and T2 fluid-attenuated inversion recovery images are associated with both tumor subtype status and 12-month survival status and may therefore be useful indicators of patient prognosis, in addition to providing potential guidance for molecularly-targeted therapies in
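
    The performance measures quoted in this abstract (sensitivity, specificity, accuracy and area under the ROC curve) have standard definitions, sketched below for a binary outcome. The labels and scores in the example are toy values invented for illustration and do not come from the study; the function names are likewise hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy labels: 1 = survived beyond 12 months, 0 = did not (hypothetical).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
    print(auc_from_scores([0.9, 0.8, 0.4], [0.5, 0.3]))
```

    Sensitivity and specificity depend on the decision threshold used to turn scores into predictions, while the AUC summarizes ranking quality across all thresholds, which is why an abstract can report both.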

  3. The associations between mast cell infiltration, clinical features and molecular types of invasive breast cancer

    PubMed Central

    Tang, Xiaoqiao; Zhang, Yifen; Huang, Tao

    2016-01-01

    Associations between mast cell infiltration and the clinical features and known molecular profile of breast cancer remain unclear. Differences in mast cell distribution were evaluated in 219 patients with no special type of invasive carcinoma, grouped by age, maximum tumor diameter, histological type, lymph node metastasis, and the expression of estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER-2) and nuclear protein Ki67. The mast cell density (MCD) in patients younger than 50 years was significantly higher than that in patients aged ≥ 50. The MCD in ER- or PR-positive patients was significantly higher than that in ER- or PR-negative patients. The MCD in patients with Ki67 ≤ 14% was also significantly higher than that in patients with Ki67 > 14%. The MCD of patients with invasive ductal carcinoma was significantly higher than that of patients with invasive lobular carcinoma. No significant difference in MCD was associated with maximum tumor diameter, lymph node metastasis or HER-2. Further analysis found that MCD was significantly higher in patients after neo-adjuvant chemotherapy. Differences in mast cell distribution are widespread among patients with distinct clinical features; clarifying the role of mast cells in breast cancer requires further research with detailed and reasonable classification. PMID:27835573

  4. Clinical, Pathological, and Molecular Features of Lung Adenocarcinomas with AXL Expression

    PubMed Central

    Suda, Kenichi; Shimizu, Shigeki; Sakai, Kazuko; Mizuuchi, Hiroshi; Tomizawa, Kenji; Takemoto, Toshiki; Nishio, Kazuto; Mitsudomi, Tetsuya

    2016-01-01

    The receptor tyrosine kinase AXL is a member of the Tyro3-Axl-Mer receptor tyrosine kinase subfamily. AXL affects several cellular functions, including growth and migration. AXL aberration is reportedly a marker for poor prognosis and treatment resistance in various cancers. In this study, we analyzed clinical, pathological, and molecular features of AXL expression in lung adenocarcinomas (LADs). We examined 161 LAD specimens from patients who underwent pulmonary resections. When AXL protein expression was quantified (0, 1+, 2+, 3+) according to immunohistochemical staining intensity, results were 0: 35%; 1+: 20%; 2+: 37%; and 3+: 7% for the 161 samples. AXL expression status did not correlate with clinical features, including smoking status and pathological stage. However, patients whose specimens showed strong AXL expression (3+) had markedly poorer prognoses than other groups (P = 0.0033). Strong AXL expression was also significantly associated with downregulation of E-cadherin (P = 0.025) and CD44 (P = 0.0010). In addition, 9 of 12 specimens with strong AXL expression had driver gene mutations (6 with EGFR, 2 with KRAS, 1 with ALK). In conclusion, we found that strong AXL expression in surgically resected LADs was a predictor of poor prognosis. LADs with strong AXL expression were characterized by mesenchymal status, higher expression of stem-cell-like markers, and frequent driver gene mutations. PMID:27100677

  5. Ceruloplasmin/Hephaestin Knockout Mice Model Morphologic and Molecular Features of AMD

    PubMed Central

    Hadziahmetovic, Majda; Dentchev, Tzvete; Song, Ying; Haddad, Nadine; He, Xining; Hahn, Paul; Pratico, Domenico; Wen, Rong; Harris, Z. Leah; Lambris, John D.; Beard, John; Dunaief, Joshua L.

    2008-01-01

    Purpose: Iron is an essential element in human metabolism but also is a potent generator of oxidative damage with levels that increase with age. Several studies suggest that iron accumulation may be a factor in age-related macular degeneration (AMD). In prior studies, both iron overload and features of AMD were identified in mice deficient in the ferroxidase ceruloplasmin (Cp) and its homologue hephaestin (Heph) (double knockout, DKO). In this study, the location and timing of iron accumulation, the rate and reproducibility of retinal degeneration, and the roles of oxidative stress and complement activation were determined. Methods: Morphologic analysis and histochemical iron detection by Perls' staining were performed on retina sections from DKO and control mice. Immunofluorescence and immunohistochemistry were performed with antibodies detecting activated complement factor C3, transferrin receptor, L-ferritin, and macrophages. Tissue iron levels were measured by atomic absorption spectrophotometry. Isoprostane F2α-VI, a specific marker of oxidative stress, was quantified in the tissue by gas chromatography/mass spectrometry. Results: DKOs exhibited highly reproducible age-dependent iron overload, which plateaued at 6 months of age, with subsequent progressive retinal degeneration continuing to at least 12 months. The degeneration shared some features of AMD, including RPE hypertrophy and hyperplasia, photoreceptor degeneration, subretinal neovascularization, RPE lipofuscin accumulation, oxidative stress, and complement activation. Conclusions: DKOs have age-dependent iron accumulation followed by retinal degeneration modeling some of the morphologic and molecular features of AMD. Therefore, these mice are a good platform on which to test therapeutic agents for AMD, such as antioxidants, iron chelators, and antiangiogenic agents. PMID:18326691

  6. The Prediction of Biological Activity Using Molecular Connectivity Indices.

    DTIC Science & Technology

    1986-04-23

    toxicities of 15 organotin compounds against Daphnia magna . -18- This study confirmed that molecular topology can be employed to model the behavior...parameter correlation of the toxicity of polycyclic aromatic hydrocarbons in Daphnia Pulex with 0XV: -log LC50 = 0.5346 OXV - 7.004 (r = 0.9972, n...CRC Press, Boca Raton, Florida, * 1983, chap. 4, pp. 105-140. 8. L.B. Kier and L.H. Hall, Molecular Connectivity in Chemistry and Drug * Research

  7. Histopathological features predictive of a clinical diagnosis of ophthalmic granulomatosis with polyangiitis (GPA).

    PubMed

    Isa, Hazlita; Lightman, Sue; Luthert, Philip J; Rose, Geoffrey E; Verity, David H; Taylor, Simon R J

    2012-01-01

    The limited form of Granulomatosis with Polyangiitis (GPA), formerly known as Wegener's Granulomatosis (WG), primarily involves the head and neck region, including the orbit, but is often a diagnostic challenge, particularly as it commonly lacks positive anti-neutrophil cytoplasm antibody (ANCA) titres or classical features on diagnostic orbital biopsies. The purpose of this study was to relate biopsy findings with clinical outcome and to determine which histopathological features are predictive of a clinical diagnosis of GPA. This was a retrospective case series of 234 patients identified from the database of the UCL Institute of Ophthalmology Department of Eye Pathology as having had orbital biopsies for orbital inflammatory disorders performed between 1988 and 2009. Clinical records were obtained for the patients and analysed to determine whether or not patients had GPA, according to a standard set of diagnostic criteria (excluding any histopathological findings). Biopsy features were then correlated with the clinical diagnosis in univariate and multivariate analyses to determine factors predictive of GPA. Of the 234 patients, 36 were diagnosed with GPA and 198 with other orbital pathologies. The largest proportion of biopsies were from orbital masses (47%). Histology showed a range of acute and chronic inflammatory pictures in all biopsies, but neutrophils (P<0.001), vasculitis (P<0.001), necrosis (P<0.001), eosinophils (P<0.02) and macrophages (P=0.05) were each significantly associated with a later clinical diagnosis of GPA. In a multivariate analysis, only tissue neutrophils (OR=3.6, P=0.01) and vasculitis (OR=2.6, P=0.02) were independently associated with GPA, in contrast to previous reports associating eosinophils and necrosis with the diagnosis.
    Neutrophil, eosinophil and macrophage infiltration of orbital tissues, together with vasculitis and necrosis, are all associated with a clinical diagnosis of GPA, but only neutrophil infiltration and vasculitis are independently

  8. Molecular definition of a region of chromosome 21 that causes features of the Down syndrome phenotype

    PubMed Central

    Korenberg, Julie R.; Kawashima, Hiroko; Pulst, Stefan-M.; Ikeuchi, T.; Ogasawara, N.; Yamamoto, K.; Schonberg, Steven A.; West, Ruth; Allen, Leland; Magenis, Ellen; Ikawa, K.; Taniguchi, N.; Epstein, Charles J.

    1990-01-01

    Down syndrome (DS) is a major cause of mental retardation and heart disease. Although it is usually caused by the presence of an extra chromosome 21, a subset of the diagnostic features may be caused by the presence of only band 21q22. We now present evidence that significantly narrows the chromosomal region responsible for several of the phenotypic features of DS. We report a molecular and cytogenetic analysis of a three-generation family containing four individuals with clinical DS as manifested by the characteristic facial appearance, endocardial cushion defect, mental retardation, and probably dermatoglyphic changes. Autoradiograms of quantitative Southern blots of DNAs from two affected sisters, their carrier father, and a normal control were analyzed after hybridization with two to six unique DNA sequences regionally mapped on chromosome 21. These include cDNA probes for the genes for CuZn-superoxide dismutase (SOD1) mapping in 21q22.1 and for the amyloid precursor protein (APP) mapping in 21q11.2-21.05, in addition to six probes for single-copy sequences: D21S46 in 21q11.2-21.05, D21S47 and SF57 in 21q22.1-22.3, and D21S39, D21S42, and D21S43 in 21q22.3. All sequences located in 21q22.3 were present in three copies in the affected individuals, whereas those located proximal to this region were present in only two copies. In the carrier father, all DNA sequences were present in only two copies. Cytogenetic analysis of affected individuals employing R and G banding of prometaphase preparations combined with in situ hybridization revealed a translocation of the region from very distal 21q22.1 to 21qter to chromosome 4q. Except for a possible phenotypic contribution from the deletion of chromosome band 4q35, these data provide a molecular definition of the minimal region of chromosome 21 which, when duplicated, generates the facial features, heart defect, a component of the mental retardation, and probably several of the dermatoglyphic changes of DS. 
    This region

  9. Considerations for standardizing predictive molecular pathology for cancer prognosis.

    PubMed

    Fiorentino, Michelangelo; Scarpelli, Marina; Lopez-Beltran, Antonio; Cheng, Liang; Montironi, Rodolfo

    2017-01-01

    Molecular tests that were once ancillary to the core business of cyto-histopathology are becoming the most relevant workload in pathology departments after histopathology/cytopathology and before autopsies. This has resulted from innovations in molecular biology techniques, which have developed at an incredibly fast pace. Areas covered: Most of the current widely used techniques in molecular pathology such as FISH, direct sequencing, pyrosequencing, and allele-specific PCR will be replaced by massive parallel sequencing that will not be considered next generation, but rather, will be considered to be current generation sequencing. The pre-analytical steps of molecular techniques such as DNA extraction or sample preparation will be largely automated. Moreover, all the molecular pathology instruments will be part of an integrated workflow that traces the sample from extraction to the analytical steps until the results are reported; these steps will be guided by expert laboratory information systems. In situ hybridization and immunohistochemistry for quantification will be largely digitalized as much as histology will be mostly digitalized rather than viewed using microscopy. Expert commentary: This review summarizes the technical and regulatory issues concerning the standardization of molecular tests in pathology. A vision of the future perspectives of technological changes is also provided.

  10. Automatic Recognition of Solar Features for Developing Data Driven Prediction Models of Solar Activity and Space Weather

    DTIC Science & Technology

    2013-05-01

    Aschwanden, M. J. 2005, Physics of the Solar Corona . An Introduction with Problems and Solutions (2nd edition), ed. Aschwanden, M. J. Balasubramaniam, K...AFRL-OSR-VA-TR-2013-0020 Automatic Recognition of Solar Features for Developing Data Driven Prediction Models of Solar Activity...Automatic Recognition of Solar Features for Developing Data Driven Prediction Models of Solar Activity and Space Weather 5a. CONTRACT NUMBER FA9550-09

  11. Delta-radiomics features for the prediction of patient outcomes in non-small cell lung cancer.

    PubMed

    Fave, Xenia; Zhang, Lifei; Yang, Jinzhong; Mackin, Dennis; Balter, Peter; Gomez, Daniel; Followill, David; Jones, Aaron Kyle; Stingo, Francesco; Liao, Zhongxing; Mohan, Radhe; Court, Laurence

    2017-04-03

    Radiomics is the use of quantitative imaging features extracted from medical images to characterize tumor pathology or heterogeneity. Features measured at pretreatment have successfully predicted patient outcomes in numerous cancer sites. This project was designed to determine whether radiomics features measured from non-small cell lung cancer (NSCLC) change during therapy and whether those features (delta-radiomics features) can improve prognostic models. Features were calculated from pretreatment and weekly intra-treatment computed tomography images for 107 patients with stage III NSCLC. Pretreatment images were used to determine feature-specific image preprocessing. Linear mixed-effects models were used to identify features that changed significantly with dose-fraction. Multivariate models were built for overall survival, distant metastases, and local recurrence using only clinical factors, clinical factors and pretreatment radiomics features, and clinical factors, pretreatment radiomics features, and delta-radiomics features. All of the radiomics features changed significantly during radiation therapy. For overall survival and distant metastases, pretreatment compactness improved the c-index. For local recurrence, pretreatment imaging features were not prognostic, while texture-strength measured at the end of treatment significantly stratified high- and low-risk patients. These results suggest radiomics features change due to radiation therapy and their values at the end of treatment may be indicators of tumor response.
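
    The c-index mentioned in this abstract measures how well a model's risk scores order patients by survival time. A minimal sketch of Harrell's concordance index for right-censored data follows; it is a generic textbook formulation, not the authors' implementation, and the patient values in the example are invented for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : model risk score; higher should mean shorter survival

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. It is concordant when
    the model gave patient i the higher risk; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; c-index is undefined")
    return concordant / comparable


if __name__ == "__main__":
    # Toy cohort of four patients (hypothetical values).
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.6, 0.7, 0.1]
    print(concordance_index(times, events, risks))
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so "improving the c-index" by adding a feature means the extended model ranks more comparable patient pairs correctly.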

  12. Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures.

    PubMed

    Darch, Jonathan; Milner, Ben; Vaseghi, Saeed

    2008-12-01

    The aim of this work is to develop methods that enable acoustic speech features to be predicted from mel-frequency cepstral coefficient (MFCC) vectors as may be encountered in distributed speech recognition architectures. The work begins with a detailed analysis of the multiple correlation between acoustic speech features and MFCC vectors. This confirms the existence of correlation, which is found to be higher when measured within specific phonemes rather than globally across all speech sounds. The correlation analysis leads to the development of a statistical method of predicting acoustic speech features from MFCC vectors that utilizes a network of hidden Markov models (HMMs) to localize prediction to specific phonemes. Within each HMM, the joint density of acoustic features and MFCC vectors is modeled and used to make a maximum a posteriori prediction. Experimental results are presented across a range of conditions, such as with speaker-dependent, gender-dependent, and gender-independent constraints, and these show that acoustic speech features can be predicted from MFCC vectors with good accuracy. A comparison is also made against an alternative scheme that substitutes the higher-order MFCCs with acoustic features for transmission. This delivers accurate acoustic features but at the expense of a significant reduction in speech recognition accuracy.

  13. Molecular dissection of colorectal cancer in pre-clinical models identifies biomarkers predicting sensitivity to EGFR inhibitors

    PubMed Central

    Schütte, Moritz; Risch, Thomas; Abdavi-Azar, Nilofar; Boehnke, Karsten; Schumacher, Dirk; Keil, Marlen; Yildiriman, Reha; Jandrasits, Christine; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Worth, Catherine L.; Schweiger, Caroline; Liebs, Sandra; Lange, Martin; Warnatz, Hans-Jörg; Butcher, Lee M.; Barrett, James E.; Sultan, Marc; Wierling, Christoph; Golob-Schwarzl, Nicole; Lax, Sigurd; Uranitsch, Stefan; Becker, Michael; Welte, Yvonne; Regan, Joseph Lewis; Silvestrov, Maxine; Kehler, Inge; Fusi, Alberto; Kessler, Thomas; Herwig, Ralf; Landegren, Ulf; Wienke, Dirk; Nilsson, Mats; Velasco, Juan A.; Garin-Chesa, Pilar; Reinhard, Christoph; Beck, Stephan; Schäfer, Reinhold; Regenbrecht, Christian R. A.; Henderson, David; Lange, Bodo; Haybaeck, Johannes; Keilholz, Ulrich; Hoffmann, Jens; Lehrach, Hans; Yaspo, Marie-Laure

    2017-01-01

    Colorectal carcinoma represents a heterogeneous entity, with only a fraction of the tumours responding to available therapies, requiring a better molecular understanding of the disease in precision oncology. To address this challenge, the OncoTrack consortium recruited 106 CRC patients (stages I–IV) and developed a pre-clinical platform generating a compendium of drug sensitivity data totalling >4,000 assays testing 16 clinical drugs on patient-derived in vivo and in vitro models. This large biobank of 106 tumours, 35 organoids and 59 xenografts, with extensive omics data comparing donor tumours and derived models provides a resource for advancing our understanding of CRC. Models recapitulate many of the genetic and transcriptomic features of the donors, but defined less complex molecular sub-groups because of the loss of human stroma. Linking molecular profiles with drug sensitivity patterns identifies novel biomarkers, including a signature outperforming RAS/RAF mutations in predicting sensitivity to the EGFR inhibitor cetuximab. PMID:28186126

  14. Sequence features accurately predict genome-wide MeCP2 binding in vivo

    PubMed Central

    Rube, H. Tomas; Lee, Wooje; Hejna, Miroslav; Chen, Huaiyang; Yasui, Dag H.; Hess, John F.; LaSalle, Janine M.; Song, Jun S.; Gong, Qizhi

    2016-01-01

    Methyl-CpG binding protein 2 (MeCP2) is critical for proper brain development and expressed at near-histone levels in neurons, but the mechanism of its genomic localization remains poorly understood. Using high-resolution MeCP2-binding data, we show that DNA sequence features alone can predict binding with 88% accuracy. Integrating MeCP2 binding and DNA methylation in a probabilistic graphical model, we demonstrate that previously reported genome-wide association with methylation is in part due to MeCP2's affinity to GC-rich chromatin, a result replicated using published data. Furthermore, MeCP2 co-localizes with nucleosomes. Finally, MeCP2 binding downstream of promoters correlates with increased expression in Mecp2-deficient neurons. PMID:27008915

  15. Predicting the Poaceae pollen season: six month-ahead forecasting and identification of relevant features.

    PubMed

    Navares, Ricardo; Aznarte, José Luis

    2017-04-01

    In this paper, we approach the problem of predicting the concentrations of Poaceae pollen which define the main pollination season in the city of Madrid. A classification-based approach, based on a computational intelligence model (random forests), is applied to forecast the dates on which risk concentration levels are to be observed. Unlike previous works, the proposal extends the range of forecasting horizons up to 6 months ahead. Furthermore, the proposed model makes it possible to determine the most influential factors for each horizon, making no assumptions about the significance of the weather features. The performance of the proposed model shows it to be a successful tool for allergy patients in preventing and minimizing exposure to risky pollen concentrations and for researchers seeking deeper insight into the factors driving the pollination season.

  16. Predicting the Poaceae pollen season: six month-ahead forecasting and identification of relevant features

    NASA Astrophysics Data System (ADS)

    Navares, Ricardo; Aznarte, José Luis

    2016-09-01

    In this paper, we approach the problem of predicting the concentrations of Poaceae pollen which define the main pollination season in the city of Madrid. A classification-based approach, based on a computational intelligence model (random forests), is applied to forecast the dates on which risk concentration levels are to be observed. Unlike previous works, the proposal extends the range of forecasting horizons up to 6 months ahead. Furthermore, the proposed model makes it possible to determine the most influential factors for each horizon, making no assumptions about the significance of the weather features. The performance of the proposed model shows it to be a successful tool for allergy patients in preventing and minimizing exposure to risky pollen concentrations and for researchers seeking deeper insight into the factors driving the pollination season.

  17. Predicting the Poaceae pollen season: six month-ahead forecasting and identification of relevant features

    NASA Astrophysics Data System (ADS)

    Navares, Ricardo; Aznarte, José Luis

    2017-04-01

    In this paper, we approach the problem of predicting the concentrations of Poaceae pollen which define the main pollination season in the city of Madrid. A classification-based approach, based on a computational intelligence model (random forests), is applied to forecast the dates on which risk concentration levels are to be observed. Unlike previous works, the proposal extends the range of forecasting horizons up to 6 months ahead. Furthermore, the proposed model makes it possible to determine the most influential factors for each horizon, making no assumptions about the significance of the weather features. The performance of the proposed model shows it to be a successful tool for allergy patients in preventing and minimizing exposure to risky pollen concentrations and for researchers seeking deeper insight into the factors driving the pollination season.

  18. Microaneurysms in renal angiomyolipomas: Can clinical and computed tomography features predict their presence and size?

    PubMed

    Champagnac, J; Melodelima, C; Martinelli, T; Pagnoux, G; Badet, L; Juillard, L; Rouvière, O

    2016-03-01

    To evaluate clinical and multidetector computed tomography (MDCT) features associated with the presence and size of microaneurysms in renal angiomyolipomas (AMLs). The MDCTs and digital subtraction angiographies (DSAs) of 31 patients who subsequently underwent percutaneous arterial embolization of AMLs were retrospectively reviewed. There were 22 women and 9 men (mean age, 47.7±27.7 years). The medical files of the included patients were reviewed for age, gender and clinical features. MDCT and DSA images were analyzed by two readers working in consensus. Of the 31 patients, 15 had tuberous sclerosis complex (TSC) or lymphangioleiomyomatosis (LAM). In total, the 31 patients had 54 AMLs (5 ruptured). On DSA, 28 clusters of microaneurysms were found in 17 patients (21 AMLs). Four of the five ruptured AMLs had microaneurysms. None of the 12 AMLs ≤40 mm and 21 of the 42 AMLs >40 mm had microaneurysms. Among AMLs >40 mm, history of TSC/LAM (P=0.5), RENAL score (P=0.7) and relative volume of fat (P=0.11) did not significantly predict the presence of microaneurysms. Microaneurysms were significantly larger in ruptured (9.5±5.7 mm) than non-ruptured (3.9±1.9 mm, P=0.02) AMLs. No associations were found between the size of microaneurysms and the size of AMLs. Microaneurysms were found in no AML ≤40 mm and in 50% of AMLs >40 mm. In AMLs >40 mm, history of TSC/LAM, RENAL score and relative volume of fat did not significantly predict the presence of microaneurysms. Copyright © 2016. Published by Elsevier Masson SAS.

  19. Body Composition Features Predict Overall Survival in Patients With Hepatocellular Carcinoma

    PubMed Central

    Singal, Amit G; Zhang, Peng; Waljee, Akbar K; Ananthakrishnan, Lakshmi; Parikh, Neehar D; Sharma, Pratima; Barman, Pranab; Krishnamurthy, Venkataramu; Wang, Lu; Wang, Stewart C; Su, Grace L

    2016-01-01

    Objectives: Existing prognostic models for patients with hepatocellular carcinoma (HCC) have limitations. Analytic morphomics, a novel process to measure body composition using computational image-processing algorithms, may offer further prognostic information. The aim of this study was to develop and validate a prognostic model for HCC patients using body composition features and objective clinical information. Methods: Using computed tomography scans from a cohort of HCC patients at the VA Ann Arbor Healthcare System between January 2006 and December 2013, we developed a prognostic model using analytic morphomics and routine clinical data based on multivariate Cox regression and regularization methods. We assessed model performance using C-statistics and validated predicted survival probabilities. We validated model performance in an external cohort of HCC patients from Parkland Hospital, a safety-net health system in Dallas County. Results: The derivation cohort consisted of 204 HCC patients (20.1% Barcelona Clinic Liver Cancer classification (BCLC) 0/A), and the validation cohort had 225 patients (22.2% BCLC 0/A). The analytic morphomics model had good prognostic accuracy in the derivation cohort (C-statistic 0.80, 95% confidence interval (CI) 0.71–0.89) and external validation cohort (C-statistic 0.75, 95% CI 0.68–0.82). The accuracy of the analytic morphomics model was significantly higher than that of TNM and BCLC staging systems in derivation (P<0.001 for both) and validation (P<0.001 for both) cohorts. For calibration, mean absolute errors in predicted 1-year survival probabilities were 5.3% (90% quantile of 7.5%) and 7.6% (90% quantile of 12.5%) in the derivation and validation cohorts, respectively. Conclusion: Body composition features, combined with readily available clinical data, can provide valuable prognostic information for patients with newly diagnosed HCC. PMID:27228403
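
    The C-statistic reported above measures how often a model ranks patients in the right order of survival. The abstract does not say how it was computed, so the following is only a minimal sketch of a Harrell-style concordance index on toy data; the function name, argument names and the numbers are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C-statistic for right-censored survival data.

    times  -- observed follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- model risk score (higher means shorter predicted survival)
    All names and conventions here are assumptions for illustration.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is usable only if the times differ and the earlier
        # follow-up ended in an observed event rather than censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier death had the higher risk score
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study): four patients, one censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))  # 4 of 5 usable pairs agree
```

    A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 0.80 and 0.75 figures above should be read.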

  20. Assessment of two mammographic density related features in predicting near-term breast cancer risk

    NASA Astrophysics Data System (ADS)

    Zheng, Bin; Sumkin, Jules H.; Zuley, Margarita L.; Wang, Xingwei; Klym, Amy H.; Gur, David

    2012-02-01

    In order to establish a personalized breast cancer screening program, it is important to develop risk models that have high discriminatory power in predicting the likelihood of a woman developing an imaging detectable breast cancer in near-term (e.g., <3 years after a negative examination in question). In epidemiology-based breast cancer risk models, mammographic density is considered the second highest breast cancer risk factor (second to woman's age). In this study we explored a new feature, namely bilateral mammographic density asymmetry, and investigated the feasibility of predicting near-term screening outcome. The database consisted of 343 negative examinations, of which 187 depicted cancers that were detected during the subsequent screening examination and 155 that remained negative. We computed the average pixel value of the segmented breast areas depicted on each cranio-caudal view of the initial negative examinations. We then computed the mean and difference mammographic density for paired bilateral images. Using woman's age, subjectively rated density (BIRADS), and computed mammographic density related features we compared classification performance in estimating the likelihood of detecting cancer during the subsequent examination using areas under the ROC curves (AUC). The AUCs were 0.63+/-0.03, 0.54+/-0.04, 0.57+/-0.03, 0.68+/-0.03 when using woman's age, BIRADS rating, computed mean density and difference in computed bilateral mammographic density, respectively. Performance increased to 0.62+/-0.03 and 0.72+/-0.03 when we fused mean and difference in density with woman's age. The results suggest that, in this study, bilateral mammographic tissue density is a significantly stronger (p<0.01) risk indicator than both woman's age and mean breast density.
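
    The two computed density features described above can be sketched as follows. This is a hedged illustration only: the abstract does not specify whether the bilateral difference is signed or absolute (an absolute difference is assumed here), and the function names and toy pixel values are hypothetical.

```python
def mean_pixel_value(pixels, mask):
    """Average pixel value over the segmented breast area (mask value 1)."""
    selected = [p
                for row_p, row_m in zip(pixels, mask)
                for p, m in zip(row_p, row_m) if m]
    if not selected:
        raise ValueError("empty segmentation mask")
    return sum(selected) / len(selected)


def bilateral_density_features(left_density, right_density):
    """Return (mean density, absolute left/right difference) for one woman.

    Using the absolute difference is an assumption; the abstract only
    says that a mean and a difference were computed for paired images.
    """
    mean_density = (left_density + right_density) / 2.0
    asymmetry = abs(left_density - right_density)
    return mean_density, asymmetry


# Toy 2x2 "images" with segmentation masks (hypothetical values).
left = mean_pixel_value([[10, 20], [30, 40]], [[1, 1], [1, 0]])
right = mean_pixel_value([[12, 18], [24, 99]], [[1, 1], [1, 0]])
print(bilateral_density_features(left, right))
```

    In the study these two numbers, together with age and BIRADS rating, were the inputs whose classification performance was compared by AUC.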

  1. Whole slide image with image analysis of atypical bile duct brushing: Quantitative features predictive of malignancy.

    PubMed

    Collins, Brian T; Weimholt, R Cody

    2015-01-01

    Whole slide images (WSIs) involve digitally capturing glass slides for microscopic computer-based viewing and these are amenable to quantitative image analysis. Bile duct (BD) brushing can show morphologic features that are categorized as indeterminate for malignancy. The study aims to evaluate quantitative morphologic features of atypical categories of BD brushing by WSI analysis for the identification of criteria predictive of malignancy. Over a 3-year period, BD brush specimens with indeterminate diagnostic categorization (atypical to suspicious) were subjected to WSI analysis. Ten well-visualized groups with morphologic atypical features were selected per case and had the quantitative analysis performed for group area, individual nuclear area, the number of nuclei per group, N:C ratio and nuclear size differential. There were 28 cases identified with 17 atypical and 11 suspicious. The average nuclear area was 63.7 µm² for atypical and 80.1 µm² for suspicious (+difference 16.4 µm²; P = 0.002). The nuclear size differential was 69.7 µm² for atypical and 88.4 µm² for suspicious (+difference 18.8 µm²; P = 0.009). An average nuclear area >70 µm² had a 3.2 risk ratio for suspicious categorization. The quantitative criteria findings as measured by image analysis on WSI showed that cases categorized as suspicious had more nuclear size pleomorphism (+18.8 µm²) and larger nuclei (+16.4 µm²) than those categorized as atypical. WSI with morphologic image analysis can demonstrate quantitative statistically significant differences between atypical and suspicious BD brushings and provide objective criteria that support the diagnosis of carcinoma.

  2. Molecular effective coverage surface area of optical clearing agents for predicting optical clearing potential

    NASA Astrophysics Data System (ADS)

    Feng, Wei; Ma, Ning; Zhu, Dan

    2015-03-01

    Improved methods for predicting the performance of optical clearing agents have an important impact on the tissue optical clearing technique. Molecular dynamics simulation is one of the most convincing and simplest approaches for predicting the optical clearing potential (OCP) of agents, by analyzing the hydrogen bonds, hydrogen bridges and hydrogen bridge types that form between agents and collagen. However, these analysis methods still suffer from some problems, such as the analysis of cyclic molecules, owing to molecular conformation. In this study, a molecular effective coverage surface area based on molecular dynamics simulation is proposed to predict the potential of optical clearing agents. Several typical cyclic molecules (fructose and glucose) and chain molecules (sorbitol and xylitol) were analyzed by calculating their molecular effective coverage surface area, hydrogen bonds, hydrogen bridges and hydrogen bridge types, respectively. To verify this analysis method, the optical clearing efficacy of in vitro skin samples was measured, using a 1951 USAF resolution test target, after 25 min of immersion in solutions of fructose, glucose, sorbitol and xylitol at a concentration of 3.5 M. The experimental results agree with the predictions based on molecular effective coverage surface area. Further comparison of molecular effective coverage surface area with the other parameters shows that it performs better in predicting the OCP of agents.

  3. Evaluating stability of histomorphometric features across scanner and staining variations: predicting biochemical recurrence from prostate cancer whole slide images

    NASA Astrophysics Data System (ADS)

    Leo, Patrick; Lee, George; Madabhushi, Anant

    2016-03-01

    Quantitative histomorphometry (QH) is the process of computerized extraction of features from digitized tissue slide images. Typically these features are used in machine learning classifiers to predict disease presence, behavior and outcome. Successful robust classifiers require features that both discriminate between classes of interest and are stable across data from multiple sites. Feature stability may be compromised by variation in slide staining and scanning procedures. These laboratory specific variables include dye batch, slice thickness and the whole slide scanner used to digitize the slide. The key therefore is to be able to identify features that are not only discriminating between the classes of interest (e.g. cancer and non-cancer or biochemical recurrence and non-recurrence) but also features that will not wildly fluctuate on slides representing the same tissue class but from across multiple different labs and sites. While there have been some recent efforts at understanding feature stability in the context of radiomics applications (i.e. feature analysis of radiographic images), relatively few attempts have been made at studying the trade-off between feature stability and discriminability for histomorphometric and digital pathology applications. In this paper we present two new measures, preparation-induced instability score (PI) and latent instability score (LI), to quantify feature instability across and within datasets. Dividing PI by LI yields a ratio for how often a feature for a specific tissue class (e.g. low grade prostate cancer) is different between datasets from different sites versus what would be expected from random chance alone. Using this ratio we seek to quantify feature vulnerability to variations in slide preparation and digitization. Since our goal is to identify stable QH features we evaluate these features for their stability and thus inclusion in machine learning based classifiers in a use case involving prostate cancer

  4. Abstract Conceptual Feature Ratings Predict Gaze within Written Word Arrays: Evidence from a Visual Wor(l)d Paradigm

    ERIC Educational Resources Information Center

    Primativo, Silvia; Reilly, Jamie; Crutch, Sebastian J

    2017-01-01

    The Abstract Conceptual Feature (ACF) framework predicts that word meaning is represented within a high-dimensional semantic space bounded by weighted contributions of perceptual, affective, and encyclopedic information. The ACF, like latent semantic analysis, is amenable to distance metrics between any two words. We applied predictions of the ACF…

  5. Search performance is better predicted by tileability than presence of a unique basic feature

    PubMed Central

    Chang, Honghua; Rosenholtz, Ruth

    2016-01-01

    Traditional models of visual search, such as feature integration theory (FIT; Treisman & Gelade, 1980), have suggested that a key factor determining task difficulty consists of whether or not the search target contains a “basic feature” not found in the other display items (distractors). Here we discriminate between such traditional models and our recent texture tiling model (TTM) of search (Rosenholtz, Huang, Raj, Balas, & Ilie, 2012b), by designing new experiments that directly pit these models against each other. Doing so is nontrivial, for two reasons. First, the visual representation in TTM is fully specified, and makes clear testable predictions, but its complexity makes getting intuitions difficult. Here we elucidate a rule of thumb for TTM, which enables us to easily design new and interesting search experiments. FIT, on the other hand, is somewhat ill-defined and hard to pin down. To get around this, rather than designing totally new search experiments, we start with five classic experiments that FIT already claims to explain: T among Ls, 2 among 5s, Q among Os, O among Qs, and an orientation/luminance-contrast conjunction search. We find that fairly subtle changes in these search tasks lead to significant changes in performance, in a direction predicted by TTM, providing definitive evidence in favor of the texture tiling model as opposed to traditional views of search. PMID:27548090

  6. Beyond intensity: Spectral features effectively predict music-induced subjective arousal.

    PubMed

    Gingras, Bruno; Marin, Manuela M; Fitch, W Tecumseh

    2014-01-01

    Emotions in music are conveyed by a variety of acoustic cues. Notably, the positive association between sound intensity and arousal has particular biological relevance. However, although amplitude normalization is a common procedure used to control for intensity in music psychology research, direct comparisons between emotional ratings of original and amplitude-normalized musical excerpts are lacking. In this study, 30 nonmusicians retrospectively rated the subjective arousal and pleasantness induced by 84 six-second classical music excerpts, and an additional 30 nonmusicians rated the same excerpts normalized for amplitude. Following the cue-redundancy and Brunswik lens models of acoustic communication, we hypothesized that arousal and pleasantness ratings would be similar for both versions of the excerpts, and that arousal could be predicted effectively by other acoustic cues besides intensity. Although the difference in mean arousal and pleasantness ratings between original and amplitude-normalized excerpts correlated significantly with the amplitude adjustment, ratings for both sets of excerpts were highly correlated and shared a similar range of values, thus validating the use of amplitude normalization in music emotion research. Two acoustic parameters, spectral flux and spectral entropy, accounted for 65% of the variance in arousal ratings for both sets, indicating that spectral features can effectively predict arousal. Additionally, we confirmed that amplitude-normalized excerpts were adequately matched for loudness. Overall, the results corroborate our hypotheses and support the cue-redundancy and Brunswik lens models.
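
    For readers unfamiliar with the two acoustic parameters named above, the sketch below uses common textbook definitions: spectral entropy as the normalized Shannon entropy of a power spectrum, and spectral flux as the Euclidean distance between successive magnitude spectra. The abstract does not give the authors' exact formulas or analysis settings, so these definitions, the function names and the toy spectra are assumptions.

```python
import math


def spectral_entropy(power_spectrum):
    """Normalized Shannon entropy of a power spectrum.

    Close to 0 when the energy sits in a single frequency bin and 1 when
    it is spread evenly over all bins (one common definition, assumed here).
    """
    total = sum(power_spectrum)
    if total <= 0 or len(power_spectrum) < 2:
        raise ValueError("need at least two bins with positive total power")
    probs = [p / total for p in power_spectrum]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(len(power_spectrum))


def spectral_flux(prev_magnitudes, curr_magnitudes):
    """Euclidean distance between two successive magnitude spectra."""
    if len(prev_magnitudes) != len(curr_magnitudes):
        raise ValueError("spectra must have the same number of bins")
    return math.sqrt(sum((c - p) ** 2
                         for p, c in zip(prev_magnitudes, curr_magnitudes)))


# Toy spectra (hypothetical): a flat spectrum versus a single active bin.
print(spectral_entropy([1, 1, 1, 1]))
print(spectral_entropy([0, 4, 0, 0]))
print(spectral_flux([0, 3], [4, 0]))
```

    Neither quantity changes its relative ordering across excerpts when the whole signal is scaled by a constant in the entropy case, which is one reason such cues remain informative after amplitude normalization.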

  7. Identification of critical chemical features for Aurora kinase-B inhibitors using Hip-Hop, virtual screening and molecular docking

    NASA Astrophysics Data System (ADS)

    Sakkiah, Sugunadevi; Thangapandian, Sundarapandian; John, Shalini; Lee, Keun Woo

    2011-01-01

    This study was performed to identify the selective chemical features of Aurora kinase-B inhibitors using methods such as Hip-Hop, virtual screening, homology modeling, molecular dynamics and docking. The best hypothesis, Hypo1, was validated against a wide-ranging test set containing selective inhibitors of Aurora kinase-B. Homology modeling and molecular dynamics studies were carried out to enable the molecular docking studies. The best hypothesis, Hypo1, was used as a 3D query to screen chemical databases. The molecules retrieved from the databases were sorted based on ADME and drug-like properties. The selective hit compounds were docked, and their hydrogen bond interactions with the critical amino acids present in Aurora kinase-B were compared with the chemical features present in Hypo1. Finally, we suggest that the chemical features present in Hypo1 are vital for a molecule to inhibit Aurora kinase-B activity.

  8. Wetland features and landscape context predict the risk of wetland habitat loss.

    PubMed

    Gutzwiller, Kevin J; Flather, Curtis H

    2011-04-01

    Wetlands generally provide significant ecosystem services and function as important harbors of biodiversity. To ensure that these habitats are conserved, an efficient means of identifying wetlands at risk of conversion is needed, especially in the southern United States where the rate of wetland loss has been highest in recent decades. We used multivariate adaptive regression splines to develop a model to predict the risk of wetland habitat loss as a function of wetland features and landscape context. Fates of wetland habitats from 1992 to 1997 were obtained from the National Resources Inventory for the U.S. Forest Service's Southern Region, and land-cover data were obtained from the National Land Cover Data. We randomly selected 70% of our 40 617 observations to build the model (n = 28 432), and randomly divided the remaining 30% of the data into five Test data sets (n = 2437 each). The wetland and landscape variables that were important in the model, and their relative contributions to the model's predictive ability (100 = largest, 0 = smallest), were land-cover/land-use of the surrounding landscape (100.0), size and proximity of development patches within 570 m (39.5), land ownership (39.1), road density within 570 m (37.5), percent woody and herbaceous wetland cover within 570 m (27.8), size and proximity of development patches within 5130 m (25.7), percent grasslands/herbaceous plants and pasture/hay cover within 5130 m (21.7), wetland type (21.2), and percent woody and herbaceous wetland cover within 1710 m (16.6). For the five Test data sets, Kappa statistics (0.40, 0.50, 0.52, 0.55, 0.56; P < 0.0001), area-under-the-receiver-operating-curve (AUC) statistics (0.78, 0.82, 0.83, 0.83, 0.84; P < 0.0001), and percent correct prediction of wetland habitat loss (69.1, 80.4, 81.7, 82.3, 83.1) indicated the model generally had substantial predictive ability across the South. Policy analysts and land-use planners can use the model and associated maps to prioritize
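
    The sample sizes quoted in this abstract follow directly from the stated 70/30 split of 40 617 observations. The sketch below reproduces that arithmetic with a random partition of index numbers; rounding the training size to the nearest integer, the function name and the seed are assumptions, since the abstract does not describe the sampling code.

```python
import random


def split_observations(n_total, train_fraction=0.70, n_test_sets=5, seed=0):
    """Randomly partition observation indices into one training set and
    several equally sized test sets (illustrative helper, not the authors')."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_train = round(n_total * train_fraction)   # assumed rounding rule
    train = indices[:n_train]
    held_out = indices[n_train:]
    size = len(held_out) // n_test_sets
    test_sets = [held_out[k * size:(k + 1) * size] for k in range(n_test_sets)]
    return train, test_sets


train, test_sets = split_observations(40617)
print(len(train), [len(t) for t in test_sets])
# prints: 28432 [2437, 2437, 2437, 2437, 2437]
```

    The remaining 12 185 observations divide evenly by five, so no observation is left out of the test sets.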

  9. PREDICTING FIFTEEN-YEAR CANCER-SPECIFIC MORTALITY BASED ON THE PATHOLOGICAL FEATURES OF PROSTATE CANCER

    PubMed Central

    Eggener, Scott E.; Scardino, Peter T.; Walsh, Patrick C.; Han, Misop; Partin, Alan W.; Trock, Bruce J.; Feng, Zhaoyong; Wood, David P.; Eastham, James A.; Yossepowitch, Ofer; Rabah, Danny M.; Kattan, Michael W.; Yu, Changhong; Klein, Eric A.; Stephenson, Andrew J.

    2014-01-01

    Purpose Long-term prostate cancer-specific mortality (PCSM) after radical prostatectomy is poorly defined in the era of widespread screening. An understanding of the treated natural history of screen-detected cancers and the pathological risk factors for PCSM are needed for treatment decision-making. Methods Using Fine and Gray competing risk regression analysis, the clinical and pathological data and follow-up information of 11,521 patients treated by radical prostatectomy at four academic centers from 1987 to 2005 were modeled to predict PCSM. The model was validated on 12,389 patients treated at a separate institution during the same period. Results The overall 15-year PCSM was 7%. Primary and secondary pathological Gleason grade 4–5 (P < 0.001 for both), seminal vesicle invasion (P < 0.001), and year of surgery (P = 0.002) were significant predictors of PCSM. A nomogram predicting 15-year PCSM based on standard pathological parameters was accurate and discriminating with an externally-validated concordance index of 0.92. Stratified by patient age, 15-year PCSM for Gleason score ≤ 6, 3+4, 4+3, and 8–10 ranged from 0.2–1.2%, 4.2–6.5%, 6.6–11%, and 26–37%, respectively. The 15-year PCSM risks ranged from 0.8–1.5%, 2.9–10%, 15–27%, and 22–30% for organ-confined cancer, extraprostatic extension, seminal vesicle invasion, and lymph node metastasis, respectively. Only 3 of 9557 patients with organ-confined, Gleason score ≤ 6 cancers have died from prostate cancer. Conclusions The presence of poorly differentiated cancer and seminal vesicle invasion are the prime determinants of PCSM after radical prostatectomy. The risk of PCSM can be predicted with unprecedented accuracy once the pathological features of prostate cancer are known. PMID:21239008

  10. Computer extracted texture features on T2w MRI to predict biochemical recurrence following radiation therapy for prostate cancer

    NASA Astrophysics Data System (ADS)

    Ginsburg, Shoshana B.; Rusu, Mirabela; Kurhanewicz, John; Madabhushi, Anant

    2014-03-01

    In this study we explore the ability of a novel machine learning approach, in conjunction with computer-extracted features describing prostate cancer morphology on pre-treatment MRI, to predict whether a patient will develop biochemical recurrence within ten years of radiation therapy. Biochemical recurrence, which is characterized by a rise in serum prostate-specific antigen (PSA) of at least 2 ng/mL above the nadir PSA, is associated with increased risk of metastasis and prostate cancer-related mortality. Currently, risk of biochemical recurrence is predicted by the Kattan nomogram, which incorporates several clinical factors to predict the probability of recurrence-free survival following radiation therapy (but has limited prediction accuracy). Semantic attributes on T2w MRI, such as the presence of extracapsular extension and seminal vesicle invasion and surrogate measurements of tumor size, have also been shown to be predictive of biochemical recurrence risk. While the correlation between biochemical recurrence and factors like tumor stage, Gleason grade, and extracapsular spread are well-documented, it is less clear how to predict biochemical recurrence in the absence of extracapsular spread and for small tumors fully contained in the capsule. Computer-extracted texture features, which quantitatively describe tumor micro-architecture and morphology on MRI, have been shown to provide clues about a tumor's aggressiveness. However, while computer-extracted features have been employed for predicting cancer presence and grade, they have not been evaluated in the context of predicting risk of biochemical recurrence. This work seeks to evaluate the role of computer-extracted texture features in predicting risk of biochemical recurrence on a cohort of sixteen patients who underwent pre-treatment 1.5 Tesla (T) T2w MRI. We extract a combination of first-order statistical, gradient, co-occurrence, and Gabor wavelet features from T2w MRI. To identify which of these

  11. Prediction of pathologic femoral fractures in patients with lung cancer using machine learning algorithms: Comparison of computed tomography-based radiological features with clinical features versus without clinical features.

    PubMed

    Oh, Eunsun; Seo, Sung Wook; Yoon, Young Cheol; Kim, Dong Wook; Kwon, Sunyoung; Yoon, Sungroh

    2017-01-01

    The purpose of this article is to compare the predictive power of two machine learning models for pathologic femoral fractures in patients with lung cancer: one trained with computed tomography (CT)-based radiological features only, and one trained with both CT-based radiological and clinical features. Between January 2010 and December 2014, 315 lung cancer patients with metastasis to the femur were included. Among them, 84 patients who underwent CT scan and were followed up for more than 3 months were enrolled. We examined clinical and radiological risk factors affecting pathologic fracture through logistic regression. Predictive analysis was performed using five different supervised learning algorithms. The power of the predictive model trained with CT-based radiological features was compared with that of the model trained with both CT-based radiological and clinical features. In multivariate logistic regression, female sex (odds ratio = 0.25, p = 0.0126), osteolysis (odds ratio = 7.62, p = 0.0239), and absence of radiation therapy (odds ratio = 10.25, p = 0.0258) significantly increased the risk of pathologic fracture in the proximal femur. The predictive model trained with both CT-based radiological and clinical features showed the highest area under the receiver operating characteristic curve (0.80 ± 0.14, p < 0.0001) with the gradient boosting algorithm. We believe that machine learning algorithms may be useful in the prediction of pathologic femoral fracture, which is a multifactorial problem.

  12. Mining SOM expression portraits: feature selection and integrating concepts of molecular function

    PubMed Central

    2012-01-01

    Background Self organizing maps (SOM) enable the straightforward portraying of high-dimensional data of large sample collections in terms of sample-specific images. The analysis of their texture provides so-called spot-clusters of co-expressed genes which require subsequent significance filtering and functional interpretation. We address feature selection in terms of the gene ranking problem and the interpretation of the obtained spot-related lists using concepts of molecular function. Results Different expression scores based either on simple fold change-measures or on regularized Student’s t-statistics are applied to spot-related gene lists and compared with special emphasis on the error characteristics of microarray expression data. The spot-clusters are analyzed using different methods of gene set enrichment analysis with the focus on overexpression and/or overrepresentation of predefined sets of genes. Metagene-related overrepresentation of selected gene sets was mapped into the SOM images to assign gene function to different regions. Alternatively we estimated set-related overexpression profiles over all samples studied using a gene set enrichment score. It was also applied to the spot-clusters to generate lists of enriched gene sets. We used the tissue body index data set, a collection of expression data of human tissues as an illustrative example. We found that tissue related spots typically contain enriched populations of gene sets well corresponding to molecular processes in the respective tissues. In addition, we display special sets of housekeeping and of consistently weak and high expressed genes using SOM data filtering. Conclusions The presented methods allow the comprehensive downstream analysis of SOM-transformed expression data in terms of cluster-related gene lists and enriched gene sets for functional interpretation. SOM clustering implies the ability to define either new gene sets using selected SOM spots or to verify and/or to amend existing

  13. Cone-like morphological, molecular, and electrophysiological features of the photoreceptors of the Nrl knockout mouse.

    PubMed

    Daniele, Lauren L; Lillo, Concepcion; Lyubarsky, Arkady L; Nikonov, Sergei S; Philp, Nancy; Mears, Alan J; Swaroop, Anand; Williams, David S; Pugh, Edward N

    2005-06-01

    To test the hypothesis that Nrl(-/-) photoreceptors are cones, by comparing them with WT rods and cones using morphological, molecular, histochemical, and electrophysiological criteria. The photoreceptor layer of fixed retinal tissue of 4- to 6-week-old mice was examined in plastic sections by electron microscopy, and by confocal microscopy in frozen sections immunolabeled for the mouse UV-cone pigment and colabeled with PNA. Quantitative immunoblot analysis was used to determine the levels of expression of key cone-specific proteins. Single- and paired-flash methods were used to extract the spectral sensitivity, kinetics, and amplification of the a-wave of the ERG. Outer segments of Nrl(-/-) photoreceptors (approximately 7 µm) are shorter than those of wild-type (WT) rods (approximately 25 µm) and cones (approximately 15 µm); but, like WT cones, they have 25 or more basal discs open to the extracellular space, extracellular matrix sheaths stained by PNA, chromatin "clumping" in their nuclei, and mitochondria two times shorter than those of rods. Nrl(-/-) photoreceptors express the mouse UV cone pigment, cone transducin, and cone arrestin in amounts expected, given the relative size and density of cones in the two retinas. The ERG a-wave was used to assay the properties of the photocurrent response. The sensitivity of the Nrl(-/-) a-wave is at its maximum at 360 nm, with a secondary mode at 510 nm having approximately one-tenth the maximum sensitivity. These wavelengths are the lambda(max) of the two mouse cone pigments. The time to peak of the dim-flash photocurrent response was approximately 50 ms, more than two times faster than that of rods. Many morphological, molecular, and electrophysiological features of the Nrl(-/-) photoreceptors are cone-like, and strongly distinguish these cells from rods. This retina provides a model for the investigation of cone function and cone-specific genetic disease.

  14. Features of exciton dynamics in molecular nanoclusters (J-aggregates): Exciton self-trapping (Review Article)

    NASA Astrophysics Data System (ADS)

    Malyukin, Yu. V.; Sorokin, A. V.; Semynozhenko, V. P.

    2016-06-01

    We present thoroughly analyzed experimental results that demonstrate the anomalous manifestation of the exciton self-trapping effect, which is already well-known in bulk crystals, in ordered molecular nanoclusters called J-aggregates. Weakly-coupled one-dimensional (1D) molecular chains are the main structural feature of J-aggregates, wherein the electron excitations are manifested as 1D Frenkel excitons. According to the continuum theory of Rashba-Toyozawa, J-aggregates can have only self-trapped excitons, because 1D excitons must adhere to barrier-free self-trapping at any exciton-phonon coupling constant g = ɛLR/2β, wherein ɛLR is the lattice relaxation energy, and 2β is the half-width of the exciton band. In contrast, very often only the luminescence of free, mobile excitons is observed in experiments involving J-aggregates. Using the Urbach rule to analyze the low-frequency region of the low-temperature exciton absorption spectra has shown that J-aggregates can have both a weak (g < 1) and a strong (g > 1) exciton-phonon coupling. Moreover, it is experimentally demonstrated that under certain conditions, the J-aggregate excited state can have both free and self-trapped excitons, i.e., we establish the existence of a self-trapping barrier for 1D Frenkel excitons. We demonstrate and analyze the reasons behind the anomalous existence of both free and self-trapped excitons in J-aggregates, and demonstrate how exciton self-trapping efficiency can be managed in J-aggregates by varying the values of g, which is fundamentally impossible in bulk crystals. We discuss how the exciton self-trapping phenomenon can be used as an alternate interpretation of the wide band emission of some J-aggregates, which has thus far been explained by the strongly localized exciton model.
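
    For reference, the coupling constant defined inline in this abstract can be typeset as follows. This is only a restatement of the definition and of the two regimes named in the abstract; the symbol names follow the abstract, and reading "ɛLR/2β" as a single fraction is an assumption based on 2β being described as the half-width.

```latex
% Exciton-phonon coupling constant as defined in the abstract:
% \varepsilon_{LR} is the lattice relaxation energy,
% 2\beta is the half-width of the exciton band.
\[
  g = \frac{\varepsilon_{LR}}{2\beta},
  \qquad
  \begin{cases}
    g < 1 & \text{weak exciton-phonon coupling} \\
    g > 1 & \text{strong exciton-phonon coupling}
  \end{cases}
\]
```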

  15. STAT3 Expression, Molecular Features, Inflammation Patterns and Prognosis in a Database of 724 Colorectal Cancers

    PubMed Central

    Morikawa, Teppei; Baba, Yoshifumi; Yamauchi, Mai; Kuchiba, Aya; Nosho, Katsuhiko; Shima, Kaori; Tanaka, Noriko; Huttenhower, Curtis; Frank, David A.; Fuchs, Charles S.; Ogino, Shuji

    2010-01-01

    Purpose STAT3 (signal transducer and activator of transcription 3) is a transcription factor that is constitutively activated in some cancers. STAT3 appears to play crucial roles in cell proliferation and survival, angiogenesis, tumor-promoting inflammation and suppression of anti-tumor host immune response in the tumor microenvironment. Although the STAT3 signaling pathway is a potential drug target, clinical, pathologic, molecular or prognostic features of STAT3-activated colorectal cancer remain uncertain. Experimental Design Utilizing a database of 724 colon and rectal cancer cases, we evaluated phosphorylated STAT3 (p-STAT3) expression by immunohistochemistry. Cox proportional hazards model was used to compute mortality hazard ratio (HR), adjusting for clinical, pathologic and molecular features, including microsatellite instability (MSI), the CpG island methylator phenotype (CIMP), LINE-1 methylation, 18q loss of heterozygosity, TP53 (p53), CTNNB1 (β-catenin), JC virus T-antigen, and KRAS, BRAF, and PIK3CA mutations. Results Among the 724 tumors, 131 (18%) showed high-level p-STAT3 expression (p-STAT3-high), 244 (34%) showed low-level expression (p-STAT3-low), and the remaining 349 (48%) were negative for p-STAT3. p-STAT3 overexpression was associated with significantly higher colorectal cancer-specific mortality [log-rank p = 0.0020; univariate HR (p-STAT3-high vs. p-STAT3-negative) 1.85, 95% confidence interval (CI) 1.30–2.63, Ptrend = 0.0005; multivariate HR, 1.61, 95% CI 1.11–2.34, Ptrend = 0.015]. p-STAT3 expression was positively associated with peritumoral lymphocytic reaction (multivariate odds ratio 3.23; 95% CI, 1.89–5.53; p < 0.0001). p-STAT3 expression was not associated with MSI, CIMP, or LINE-1 hypomethylation. Conclusions STAT3 activation in colorectal cancer is associated with adverse clinical outcome, supporting its potential roles as a prognostic biomarker and a chemoprevention and/or therapeutic target. PMID:21310826

  16. A Multicriteria Approach to Find Predictive and Sparse Models with Stable Feature Selection for High-Dimensional Data

    PubMed Central

    Rahnenführer, Jörg; Lang, Michel

    2017-01-01

    Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but it is also important that this model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the stability assessment behaviour it is most important that a measure contains a correction for chance or large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy. PMID:28835769
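
    The idea of scoring feature-selection stability with the Pearson correlation can be illustrated with a minimal sketch. It assumes that each selected feature set is encoded as a 0/1 indicator vector and that stability is the mean pairwise correlation between those vectors; the abstract does not spell out this formula, and the names `pearson` and `selection_stability` are illustrative, not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Happens when a run selects no features or all features.
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def selection_stability(selections, n_features):
    """Mean pairwise Pearson correlation between selection indicator vectors.

    selections: list of sets of selected feature indices, one per resampling run.
    n_features: total number of candidate features.
    """
    vectors = [[1 if j in chosen else 0 for j in range(n_features)]
               for chosen in selections]
    pairs = list(combinations(vectors, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Three resampling runs over 10 candidate features (made-up data).
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(selection_stability(runs, n_features=10))
```

    Identical selections score 1, and the score drops as the runs disagree; because the correlation is centred on the fraction of selected features, large selections are not rewarded merely for overlapping by chance.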

  17. [Clinical features and molecular diagnosis of three patients with DiGeorge anomaly].

    PubMed

    Sun, Jin-qiao; Wang, Lai-shuan; Qi, Chun-hua; Ying, Wen-jing; Guo, Xiao-hong; Liu, Dan-ru; Hui, Xiao-ying; Liu, Fang; Cao, Yun; Luo, Fei-hong; Wang, Xiao-chuan

    2012-12-01

    To investigate the clinical features and molecular diagnostic methods of three patients with DiGeorge anomaly. The clinical manifestations and immunological features of the three cases with DiGeorge anomaly were analyzed. We detected the chromosome 22q11.2 gene deletion by fluorescence in situ hybridization (FISH). (1) CLINICAL MANIFESTATIONS: All three cases had varying degrees of infection, congenital heart disease and small thymus by imaging; two cases had significant hypocalcemia (1.11 mmol/L and 1.22 mmol/L, respectively), accompanied by convulsions; only 1 case had cleft palate and none had significant facial deformity. (2) IMMUNOLOGICAL CHARACTERISTICS: All three cases had varying degrees of T-cell immune function defects (percentage of T lymphocytes was 24% - 43%, absolute count was 309 - 803/µl), and levels of immunoglobulin G, A, M, and percent of B lymphocytes and absolute count were normal. (3) DETECTION OF THE CHROMOSOME 22q11.2 GENE DELETION: 400 cells from each case were examined. All cells showed two green and one red hybridization signal, indicating the presence of gene deletions in chromosome 22q11.2. (4) OUTCOME: All three cases were treated with thymosin and received appropriate clinical intervention for cardiac malformations and hypocalcemia; they were followed up for 4 - 18 months, and the prognosis was good. DiGeorge anomaly showed diverse clinical manifestations. We should consider the disease if patients have congenital heart disease, thymic hypoplasia, hypocalcemia and/or impaired immune function. FISH for detecting chromosome 22q11.2 gene deletion can be used as an accurate and rapid diagnostic method. Thymosin treatment and other clinical intervention may help to improve the prognosis of patients with partial DiGeorge anomaly.

  18. Dynamic features of carboxy cytoglobin distal mutants investigated by molecular dynamics simulations.

    PubMed

    Zhao, Cong; Du, Weihong

    2016-04-01

    Cytoglobin (Cgb) is a member of the hemoprotein family with roles in NO metabolism, fibrosis, and tumourigenesis. Similar to other hemoproteins, Cgb structure and functions are markedly influenced by distal key residues. The sixth ligand His(81) (E7) is crucial to exogenous ligand binding, heme pocket conformation, and physiological roles of this protein. However, the effects of other key residues on heme pocket and protein biological functions are not well known. In this work, a molecular dynamics (MD) simulation study of two single mutants in CO-ligated Cgb (L46FCgbCO and L46VCgbCO) and two double mutants (L46FH81QCgbCO and L46VH81QCgbCO) was conducted to explore the effects of the key distal residues Leu(46) (B10) and His(81) (E7) on Cgb structure and functions. Results indicated that the distal mutation of B10 and E7 affected CgbCO dynamic properties on loop region fluctuation, internal cavity rearrangement, and heme motion. The distal conformation change was reflected by the distal key residues Gln(62) (CD3) and Arg(84) (E10). The hydrogen bonds between heme propionates and CD3 or E10 residues were evidently influenced by B10/E7 mutation. Furthermore, heme pocket rearrangement was also observed based on the distal pocket volume and occurrence rate of inner cavities. The mutual effects of B10 and E7 residues on protein conformational rearrangement and other dynamic features were expressed in current MD studies of CgbCO and its distal mutants, suggesting their crucial role in heme pocket stabilization, ligand binding, and Cgb biological functions. The mutation of distal B10 and E7 residues affects the dynamic features of carboxy cytoglobin.

  19. Relationship among clinical, pathological and bio-molecular features in low-grade epilepsy-associated neuroepithelial tumors.

    PubMed

    Vornetti, Gianfranco; Marucci, Gianluca; Zenesini, Corrado; de Biase, Dario; Michelucci, Roberto; Tinuper, Paolo; Tallini, Giovanni; Giulioni, Marco

    2017-10-01

    The aim of this study was to evaluate the relationship between molecular markers and clinicopathological features in patients operated on for low-grade epilepsy-associated neuroepithelial tumors. Molecular-genetic signatures are becoming increasingly important in characterizing these lesions, which represent the second most common cause of focal epilepsy in patients undergoing epilepsy surgery. Data from 22 patients operated on for histopathologically confirmed low-grade epilepsy-associated neuroepithelial tumors were retrospectively collected. All specimens were examined for BRAF and IDH mutational status, 1p/19q codeletion and CD34 expression. The relationship between bio-molecular markers and several demographic, clinical and pathological features was analyzed. BRAF mutation was found in 11 (50.0%) patients and CD34 expression in 13 (59.1%). No patients presented IDH mutation or 1p/19q codeletion. Multiple seizure types were present in 5 (45.5%) patients with BRAF mutation and in none of those with BRAF wild type (p=0.035). Moreover, BRAF mutation was predominant in right-sided lesions (p=0.004) and CD34 expression was significantly associated with a longer duration of epilepsy (p=0.027). Several other clinicopathological features, such as association with focal cortical dysplasia and postoperative seizure outcome, showed no significant correlation with molecular markers. Further studies are necessary both to confirm these data in a larger cohort of patients and to investigate possible relationships between molecular markers and other clinicopathological features.
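
    The p-value reported for multiple seizure types can be checked from the counts given in the abstract: 5 of 11 BRAF-mutated patients versus 0 of the remaining 11 wild-type patients (22 total minus 11 mutated). The abstract does not name the statistical test, so the sketch below assumes a two-sided Fisher's exact test; the function name `fisher_exact_two_sided` is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: BRAF mutated, BRAF wild type.
# Columns: multiple seizure types, single seizure type.
p_value = fisher_exact_two_sided(5, 6, 0, 11)
print(round(p_value, 3))  # 0.035
```

    Under this assumption the result agrees with the reported p=0.035, which suggests, but does not prove, that an exact test was used.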

  20. Clinical Features, Outcomes, and Molecular Characteristics of Community- and Health Care-Associated Staphylococcus lugdunensis Infections

    PubMed Central

    Yeh, Chun-Fu; Chang, Shih-Cheng; Cheng, Chun-Wen; Lin, Jung-Fu; Liu, Tsui-Ping

    2016-01-01

    Staphylococcus lugdunensis is a major cause of aggressive endocarditis, but it is also responsible for a broad spectrum of infections. The differences in clinical and molecular characteristics between community-associated (CA) and health care-associated (HA) S. lugdunensis infections have remained unclear. We performed a retrospective study of S. lugdunensis infections between 2003 and 2014 to compare the clinical and molecular characteristics of CA and HA isolates. We collected 129 S. lugdunensis isolates in total: 81 (62.8%) HA isolates and 48 (37.2%) CA isolates. HA infections were more frequent than CA infections in children (16.0% versus 4.2%, respectively; P = 0.041) and the elderly (38.3% versus 14.6%, respectively; P = 0.004). The CA isolates were more likely to cause skin and soft tissue infections (85.4% versus 19.8%, respectively; P < 0.001). HA isolates were more frequently responsible for bacteremia of unknown origin (34.6% versus 4.2%, respectively; P < 0.001) and for catheter-related bacteremia (12.3% versus 0%, respectively; P = 0.011) than CA isolates. Fourteen-day mortality was higher for HA infections than for CA infections (11.1% versus 0%, respectively). A higher proportion of the HA isolates than of the CA isolates were resistant to penicillin (76.5% versus 52.1%, respectively; P = 0.004) and oxacillin (32.1% versus 2.1%, respectively; P < 0.001). Two major clonal complexes (CC1 and CC3) were identified. Sequence type 41 (ST41) was the most common sequence type identified (29.5%). The proportion of ST38 isolates was higher for HA than for CA infections (33.3% versus 12.5%, respectively; P = 0.009). These isolates were of staphylococcal cassette chromosome mec element (SCCmec) type IV, V, or Vt. HA and CA S. lugdunensis infections differ in terms of their clinical features, outcome, antibiotic susceptibilities, and molecular characteristics. PMID:27225402

  1. Attentional Selection Can Be Predicted by Reinforcement Learning of Task-relevant Stimulus Features Weighted by Value-independent Stickiness.

    PubMed

    Balcarras, Matthew; Ardid, Salva; Kaping, Daniel; Everling, Stefan; Womelsdorf, Thilo

    2016-02-01

    Attention includes processes that evaluate stimuli relevance, select the most relevant stimulus against less relevant stimuli, and bias choice behavior toward the selected information. It is not clear how these processes interact. Here, we captured these processes in a reinforcement learning framework applied to a feature-based attention task that required macaques to learn and update the value of stimulus features while ignoring nonrelevant sensory features, locations, and action plans. We found that value-based reinforcement learning mechanisms could account for feature-based attentional selection and choice behavior but required a value-independent stickiness selection process to explain selection errors while at asymptotic behavior. By comparing different reinforcement learning schemes, we found that trial-by-trial selections were best predicted by a model that only represents expected values for the task-relevant feature dimension, with nonrelevant stimulus features and action plans having only a marginal influence on covert selections. These findings show that attentional control subprocesses can be described by (1) the reinforcement learning of feature values within a restricted feature space that excludes irrelevant feature dimensions, (2) a stochastic selection process on feature-specific value representations, and (3) value-independent stickiness toward previous feature selections akin to perseveration in the motor domain. We speculate that these three mechanisms are implemented by distinct but interacting brain circuits and that the proposed formal account of feature-based stimulus selection will be important to understand how attentional subprocesses are implemented in primate brain networks.

  2. Predicting the biomechanical strength of proximal femur specimens with bone mineral density features and support vector regression

    NASA Astrophysics Data System (ADS)

    Huber, Markus B.; Yang, Chien-Chun; Carballido-Gamio, Julio; Bauer, Jan S.; Baum, Thomas; Nagarajan, Mahesh B.; Eckstein, Felix; Lochmüller, Eva; Majumdar, Sharmila; Link, Thomas M.; Wismüller, Axel

    2012-03-01

    To improve the clinical assessment of osteoporotic hip fracture risk, recent computer-aided diagnosis systems explore new approaches to estimate the local trabecular bone quality beyond bone density alone to predict femoral bone strength. In this context, statistical bone mineral density (BMD) features extracted from multi-detector computed tomography (MDCT) images of proximal femur specimens and different function approximation methods were compared in their ability to predict the biomechanical strength. MDCT scans were acquired in 146 proximal femur specimens harvested from human cadavers. The femurs' failure load (FL) was determined through biomechanical testing. An automated volume of interest (VOI)-fitting algorithm was used to define a consistent volume in the femoral head of each specimen. In these VOIs, the trabecular bone was represented by statistical moments of the BMD distribution and by pairwise spatial occurrence of BMD values using the gray-level co-occurrence matrix (GLCM) approach. A linear multi-regression analysis (MultiReg) and a support vector regression algorithm with a linear kernel (SVRlin) were used to predict the FL from the image feature sets. The prediction performance was measured by the root mean square error (RMSE) for each image feature on independent test sets; in addition the coefficient of determination R2 was calculated. The best prediction result was obtained with a GLCM feature set using SVRlin, which had the lowest prediction error (RMSE = 1.040+/-0.143, R2 = 0.544) and which was significantly lower than the standard approach of using BMD.mean and MultiReg (RMSE = 1.093+/-0.133, R2 = 0.490, p<0.0001). The combined sets including BMD.mean and GLCM features had a similar or slightly lower performance than using only GLCM features. The results indicate that the performance of high-dimensional BMD features extracted from MDCT images in predicting the biomechanical strength of proximal femur specimens can be significantly improved by

  3. US-guided percutaneous cholecystostomy: features predicting culture-positive bile and clinical outcome.

    PubMed

    Sosna, Jacob; Kruskal, Jonathan B; Copel, Laurian; Goldberg, S Nahum; Kane, Robert A

    2004-03-01

    To assess sonographic and clinical features that might be used to predict infected bile and/or patient outcome from ultrasonography (US)-guided percutaneous cholecystostomy. Between February 1997 and August 2002 at one institution, 112 patients underwent US-guided percutaneous cholecystostomy (59 men, 53 women; average age, 69.3 years). All US images were scored on a defined semiquantitative scale according to preset parameters: (a) gallbladder distention, (b) sludge and/or stones, (c) wall appearance, (d) pericholecystic fluid, and (e) common bile duct size and/or choledocholithiasis. Separate and total scores were generated. Retrospective evaluation of (a) the bacteriologic growth of aspirated bile and its color and (b) clinical indices (fever, white blood cell count, bilirubin level, liver function test results) was conducted by reviewing medical records. For each patient, the clinical manifestation was classified into four groups: (a) localized right upper quadrant symptoms, (b) generalized abdominal symptoms, (c) unexplained sepsis, or (d) sepsis with other known infection. Logistic regression models, exact Wilcoxon-Mann-Whitney test, and the Kruskal-Wallis test were used. Forty-seven (44%) of 107 patients had infected bile. A logistic regression model showed that wall appearance, distention, bile color, and pericholecystic fluid were not individually significant predictors for culture-positive bile, leaving sludge and/or stones (P = .003, odds ratio = 1.647), common bile duct status (P = .02, odds ratio = 2.214), and total score (P = .007, odds ratio = 1.267). No US covariates or clinical indices predicted clinical outcome. Clinical manifestation was predictive of clinical outcome (P = .001) and aspirating culture-positive bile (P = .008); specifically, 30 (86%) of 35 patients with right upper quadrant symptoms had their condition improve, compared with one (7%) of 15 asymptomatic patients with other known causes of infection.
US variables can be used to predict

  4. Molecular pathway activation features of pediatric acute myeloid leukemia (AML) and acute lymphoblast leukemia (ALL) cells

    PubMed Central

    Petrov, Ivan; Suntsova, Maria; Mutorova, Olga; Sorokin, Maxim; Garazha, Andrew; Ilnitskaya, Elena; Spirin, Pavel; Larin, Sergey; Zhavoronkov, Alex; Kovalchuk, Olga; Prassolov, Vladimir; Roumiantsev, Alexander; Buzdin, Anton

    2016-01-01

    Acute lymphoblast leukemia (ALL) is characterized by overproduction of immature white blood cells in the bone marrow. ALL is most common in childhood and has a high (>80%) cure rate. In contrast, acute myeloid leukemia (AML) has a far greater mortality rate than ALL and most commonly affects older adults. However, AML is a leading cause of childhood cancer mortality. In this study, we compare gene expression and molecular pathway activation patterns in three normal blood, seven pediatric ALL and seven pediatric AML bone marrow samples. We identified 172/94 and 148/31 characteristic gene expression/pathway activation signatures, clearly distinguishing pediatric ALL and AML cells, respectively, from the normal blood. The pediatric AML and ALL cells differed by 139/34 gene expression/pathway activation biomarkers. For the 30 adult AML and 17 normal blood samples, we found 132/33 gene expression/pathway AML-specific features, of which only 7/2 were common to the adult and pediatric AML and, therefore, age-independent. At the pathway level, we found more differences than similarities between the adult and pediatric forms. These findings suggest that the adult and pediatric AMLs may require different treatment strategies. PMID:27870639

  5. Molecular features in complex environment: Cooperative team players during excited state bond cleavage

    PubMed Central

    Thallmair, Sebastian; Roos, Matthias K.; de Vivie-Riedle, Regina

    2016-01-01

    Photoinduced bond cleavage is often employed for the generation of highly reactive carbocations in solution and to study their reactivity. Diphenylmethyl derivatives are prominent precursors in polar and moderately polar solvents like acetonitrile or dichloromethane. Depending on the leaving group, the photoinduced bond cleavage occurs on a femtosecond to picosecond time scale and typically leads to two distinguishable products, the desired diphenylmethyl cations (Ph2CH+) and as competing by-product the diphenylmethyl radicals (Ph2CH•). Conical intersections are the chief suspects for such ultrafast branching processes. We show for two typical examples, the neutral diphenylmethylchloride (Ph2CH–Cl) and the charged diphenylmethyltriphenylphosphonium ions (Ph2CH−PPh3+) that the role of the conical intersections depends not only on the molecular features but also on the interplay with the environment. It turns out to differ significantly for both precursors. Our analysis is based on quantum chemical and quantum dynamical calculations. For comparison, we use ultrafast transient absorption measurements. In case of Ph2CH–Cl, we can directly connect the observed signals to two early three-state and two-state conical intersections, both close to the Franck-Condon region. In case of the Ph2CH−PPh3+, dynamic solvent effects are needed to activate a two-state conical intersection at larger distances along the reaction coordinate. PMID:26958588

  6. Molecular features and toxicological properties of four common pesticides, acetamiprid, deltamethrin, chlorpyriphos and fipronil.

    PubMed

    Taillebois, Emiliane; Alamiddine, Zakaria; Brazier, Christine; Graton, Jérôme; Laurent, Adèle D; Thany, Steeve H; Le Questel, Jean-Yves

    2015-04-01

    Structural features and selected physicochemical properties of four common pesticides: acetamiprid (neonicotinoid), chlorpyriphos (organophosphate insecticide), deltamethrin (pyrethroid) and fipronil (phenylpyrazole) have been investigated by Density Functional Theory quantum chemical calculations. The highly flexible character of these insecticides is revealed by the numerous conformers obtained, located within a 20 kJ mol(-1) range in the gas phase. In line with this trend, a redistribution of the energetic minima is observed in water medium. Molecular electrostatic potential calculations provide a ranking of the potential interaction sites of the four insecticides. The theoretical studies reported in the present work are completed by comparative toxicological assays against three aphid strains. Thus, the same toxicity order for the two susceptible strains Myzus persicae 4106A and Acyrthosiphon pisum LSR1: acetamiprid>fipronil>deltamethrin>chlorpyriphos is revealed. In the resistant strain M. persicae 1300145, the toxicity order is modified: acetamiprid>fipronil>chlorpyriphos>deltamethrin. Interestingly, the strain 1300145, which is known to be resistant to neonicotinoids, is also less sensitive to deltamethrin, chlorpyriphos and fipronil.

  7. Association of Fusobacterium species in pancreatic cancer tissues with molecular features and prognosis.

    PubMed

    Mitsuhashi, Kei; Nosho, Katsuhiko; Sukawa, Yasutaka; Matsunaga, Yasutaka; Ito, Miki; Kurihara, Hiroyoshi; Kanno, Shinichi; Igarashi, Hisayoshi; Naito, Takafumi; Adachi, Yasushi; Tachibana, Mami; Tanuma, Tokuma; Maguchi, Hiroyuki; Shinohara, Toshiya; Hasegawa, Tadashi; Imamura, Masafumi; Kimura, Yasutoshi; Hirata, Koichi; Maruyama, Reo; Suzuki, Hiromu; Imai, Kohzoh; Yamamoto, Hiroyuki; Shinomura, Yasuhisa

    2015-03-30

    Recently, bacterial infection causing periodontal disease has attracted considerable attention as a risk factor for pancreatic cancer. Fusobacterium species is an oral bacterial group of the human microbiome. Some evidence suggests that Fusobacterium species promote colorectal cancer development; however, no previous studies have reported the association between Fusobacterium species and pancreatic cancer. Therefore, we examined whether Fusobacterium species exist in pancreatic cancer tissue. Using a database of 283 patients with pancreatic ductal adenocarcinoma (PDAC), we tested cancer tissue specimens for Fusobacterium species. We also tested the specimens for KRAS, NRAS, BRAF and PIK3CA mutations and measured microRNA-21 and microRNA-31. In addition, we assessed epigenetic alterations, including CpG island methylator phenotype (CIMP). Our data showed an 8.8% detection rate of Fusobacterium species in pancreatic cancers; however, tumor Fusobacterium status was not associated with any clinical and molecular features. In contrast, in multivariate Cox regression analysis, compared with the Fusobacterium species-negative group, we observed significantly higher cancer-specific mortality rates in the positive group (p = 0.023). In conclusion, Fusobacterium species were detected in pancreatic cancer tissue. Tumor Fusobacterium species status is independently associated with a worse prognosis of pancreatic cancer, suggesting that Fusobacterium species may be a prognostic biomarker of pancreatic cancer.

  8. Cryptosporidiosis in HIV/AIDS Patients in Kenya: Clinical Features, Epidemiology, Molecular Characterization and Antibody Responses

    PubMed Central

    Wanyiri, Jane W.; Kanyi, Henry; Maina, Samuel; Wang, David E.; Steen, Aaron; Ngugi, Paul; Kamau, Timothy; Waithera, Tabitha; O'Connor, Roberta; Gachuhi, Kimani; Wamae, Claire N.; Mwamburi, Mkaya; Ward, Honorine D.

    2014-01-01

    We investigated the epidemiological and clinical features of cryptosporidiosis, the molecular characteristics of infecting species and serum antibody responses to three Cryptosporidium-specific antigens in human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) patients in Kenya. Cryptosporidium was the most prevalent enteric pathogen and was identified in 56 of 164 (34%) HIV/AIDS patients, including 25 of 70 (36%) with diarrhea and 31 of 94 (33%) without diarrhea. Diarrhea in patients exclusively infected with Cryptosporidium was significantly associated with the number of children per household, contact with animals, and water treatment. Cryptosporidium hominis was the most prevalent species and the most prevalent subtype family was Ib. Patients without diarrhea had significantly higher serum IgG levels to Chgp15, Chgp40 and Cp23, and higher fecal IgA levels to Chgp15 and Chgp40, than those with diarrhea, suggesting that antibody responses to these antigens may be associated with protection from diarrhea and supporting further investigation of these antigens as vaccine candidates. PMID:24865675

  9. Human Leptospira Isolates Circulating in Mayotte (Indian Ocean) Have Unique Serological and Molecular Features

    PubMed Central

    Bourhy, P.; Collet, L.; Lernout, T.; Zinini, F.; Hartskeerl, R. A.; van der Linden, Hans; Thiberge, J. M.; Diancourt, L.; Brisse, S.; Giry, C.; Pettinelli, F.

    2012-01-01

    Leptospirosis is one of the most widespread zoonoses in the world. However, there is a lack of information on circulating Leptospira strains in remote parts of the world. We describe the serological and molecular features of leptospires isolated from 94 leptospirosis patients in Mayotte, a French department located in the Comoros archipelago, between 2007 and 2010. Multilocus sequence typing identified these isolates as Leptospira interrogans, L. kirschneri, L. borgpetersenii, and members of a previously undefined phylogenetic group. This group, consisting of 15 strains, could represent a novel species. Serological typing revealed that 70% of the isolates belonged to the serogroup complex Mini/Sejroe/Hebdomadis, followed by the serogroups Pyrogenes, Grippotyphosa, and Pomona. However, unambiguous typing at the serovar level was not possible for most of the strains because the isolate could belong to more than one serovar or because serovar and species did not match the original classification. Our results indicate that the serovar and genotype distribution in Mayotte differs from what is observed in other regions, thus suggesting a high degree of diversity of circulating isolates worldwide. These results are essential for the improvement of current diagnostic tools and provide a starting point for a better understanding of the epidemiology of leptospirosis in this area of endemicity. PMID:22162544

  10. Molecular pathway activation features of pediatric acute myeloid leukemia (AML) and acute lymphoblast leukemia (ALL) cells.

    PubMed

    Petrov, Ivan; Suntsova, Maria; Mutorova, Olga; Sorokin, Maxim; Garazha, Andrew; Ilnitskaya, Elena; Spirin, Pavel; Larin, Sergey; Kovalchuk, Olga; Prassolov, Vladimir; Zhavoronkov, Alex; Roumiantsev, Alexander; Buzdin, Anton

    2016-11-19

    Acute lymphoblast leukemia (ALL) is characterized by overproduction of immature white blood cells in the bone marrow. ALL is most common in childhood and has a high (>80%) cure rate. In contrast, acute myeloid leukemia (AML) has a far greater mortality rate than ALL and most commonly affects older adults. However, AML is a leading cause of childhood cancer mortality. In this study, we compare gene expression and molecular pathway activation patterns in three normal blood, seven pediatric ALL and seven pediatric AML bone marrow samples. We identified 172/94 and 148/31 characteristic gene expression/pathway activation signatures, clearly distinguishing pediatric ALL and AML cells, respectively, from the normal blood. The pediatric AML and ALL cells differed by 139/34 gene expression/pathway activation biomarkers. For 30 adult AML and 17 normal blood samples, we found 132/33 gene expression/pathway AML-specific features, of which only 7/2 were common to adult and pediatric AML and, therefore, age-independent. At the pathway level, we found more differences than similarities between the adult and pediatric forms. These findings suggest that adult and pediatric AMLs may require different treatment strategies.

  11. Anion pairs in room temperature ionic liquids predicted by molecular dynamics simulation, verified by spectroscopic characterization

    SciTech Connect

    Schwenzer, Birgit; Kerisit, Sebastien N.; Vijayakumar, M.

    2014-01-01

    Molecular-level spectroscopic analyses of an aprotic and a protic room-temperature ionic liquid, BMIM OTf and BMIM HSO4, respectively, have been carried out with the aim of verifying molecular dynamics simulations that predict anion pair formation in these fluid structures. Fourier-transform infrared spectroscopy, Raman spectroscopy and nuclear magnetic resonance spectroscopy of various nuclei support the theoretically-determined average molecular arrangements.

  12. Mutation spectrum of TP53 gene predicts clinicopathological features and survival of gastric cancer

    PubMed Central

    Tahara, Tomomitsu; Shibata, Tomoyuki; Okamoto, Yasuyuki; Yamazaki, Jumpei; Kawamura, Tomohiko; Horiguchi, Noriyuki; Okubo, Masaaki; Nakano, Naoko; Ishizuka, Takamitsu; Nagasaka, Mitsuo; Nakagawa, Yoshihito; Ohmiya, Naoki

    2016-01-01

    Background and aim: The TP53 gene is frequently mutated in gastric cancer (GC), but evidence on its relationship with clinicopathological features and prognosis is conflicting. Here, we screened the TP53 mutation spectrum of 214 GC patients in relation to their clinicopathological features and prognosis. Results: TP53 nonsilent mutations were detected in 80 cases (37.4%), frequently occurring as C:G to T:A single nucleotide transitions at 5′-CpG-3′ sites. TP53 mutations occurred more frequently in differentiated histologic type than in undifferentiated type in the early stage (48.6% vs. 7%, P=0.0006), while the mutations correlated with venous invasion in the advanced stage (47.7% vs. 20.7%, P=0.04). The subset of GC with TP53 hot spot mutations (R175, G245, R248, R273, R282) presented significantly worse overall survival and recurrence-free survival compared to others (both P=0.001). Methods: Matched biopsies from GC and adjacent tissues from 214 patients were used for the experiment. All coding regions of the TP53 gene (exon 2 to exon 11) were examined using Sanger sequencing. Conclusion: Our data suggest that GC with TP53 mutations seems to develop as differentiated histologic type and show aggressive biological behavior such as venous invasion. Moreover, our data emphasize the importance of discriminating TP53 hot spot mutations (R175, G245, R248, R273, R282) to predict worse overall survival and recurrence-free survival of GC patients. PMID:27323394

  13. Predictive toxicology: benchmarking molecular descriptors and statistical methods.

    PubMed

    Feng, Jun; Lurati, Laura; Ouyang, Haojun; Robinson, Tracy; Wang, Yuanyuan; Yuan, Shenglan; Young, S Stanley

    2003-01-01

    The development of drugs depends on finding compounds that have beneficial effects with a minimum of toxic effects. The measurement of toxic effects is typically time-consuming and expensive, so there is a need to be able to predict toxic effects from the compound structure. Predicting toxic effects is expected to be challenging because there are usually multiple toxic mechanisms involved. In this paper, combinations of different chemical descriptors and popular statistical methods were applied to the problem of predictive toxicology. Four data sets were collected and cleaned, and four different sets of chemical descriptors were calculated for the compounds in each of the four data sets. Three statistical methods (recursive partitioning, neural networks, and partial least squares) were used to attempt to link chemical descriptors to the response. Good predictions were achieved in the two smaller data sets; we found for large data sets that the results were less effective, indicating that new chemical descriptors or statistical methods are needed. All of the methods and descriptors worked to a degree, but our work hints that certain descriptors work better with specific statistical methods so there is a need for better understanding and for continued methods development.

  14. Predicting High-Impact Pharmacological Targets by Integrating Transcriptome and Text-Mining Features.

    PubMed

    Mayburd, Anatoly; Baranova, Ancha

    Novel, "outside of the box" approaches are needed for evaluating candidate molecules, especially in oncology. Throughout the years of 2000-2010, the efficiency of drug development fell to barely acceptable levels, and in the second decade of this century, levels have improved only marginally. This dismal condition continues despite unprecedented progress in the development of a variety of high-throughput tools, computational methods, aggregated databases, drug repurposing programs and innovative chemistries. Here we tested a hypothesis that the economic impact of targeting a particular gene product is predictable a priori by employing a combination of transcriptome profiles and quantitative metrics reflecting existing literature. To extract classification features, the gene expression patterns of a posteriori high-impact and low-impact anti-cancer target sets were compared. To minimize the possible bias of text-mining, the number of manuscripts published prior to the first clinical trial or relevant review paper, as well as its first derivative in this interval, were collected and used as quantitative metrics of public interest. By combining the gene expression and literature mining features, a 4-fold enrichment in high-impact targets was produced, resulting in a favourable ROC curve analysis for the top impact targets. The dataset was enriched by the highest impact anti-cancer targets, while demonstrating drastic differences in economic value between high and low-impact targets. Known anti-cancer products of EGFR, ERBB2, CYP19A1/aromatase, MTOR, PTGS2, tubulin, VEGFA, BRAF, PGR, PDGFRA, SRC, REN, CSF1R, CTLA4 and HSP90AA1 genes received the highest scores for predicted impact, while microsomal steroid sulfatase, anticoagulant protein C, p53, CDKN2A, c-Jun, and TNSFS11 were highlighted as most promising research-stage targets. A significant cost reduction may be achieved by a priori impact assessment of targets and ligands before their development or repurposing

  15. Two-step feature selection for predicting survival time of patients with metastatic castrate resistant prostate cancer

    PubMed Central

    Shiga, Motoki

    2016-01-01

    Metastatic castrate resistant prostate cancer (mCRPC) is the major cause of death in prostate cancer patients. Even though some options for treatment of mCRPC have been developed, the most effective therapies remain unclear. Thus, finding key patient clinical variables related to mCRPC is important both for understanding the disease progression mechanism of mCRPC and for clinical decision making for these patients. The Prostate Cancer DREAM Challenge is a crowd-based competition to tackle this essential challenge using new large clinical datasets. This paper proposes an effective procedure for predicting global risks and survival times of these patients, aimed at sub-challenges 1a and 1b of the Prostate Cancer DREAM Challenge. The procedure uses two-step feature selection: the first step applies sparse feature selection to numerical clinical variables and statistical hypothesis testing of differences between survival curves caused by categorical clinical variables, and the second step applies forward feature selection to narrow the list of informative features. Using Cox’s proportional hazards model with these selected features, this method predicted global risk and survival time of patients using a linear model whose input is a median time computed from the hazard model. The challenge results demonstrated that the proposed procedure outperforms the state-of-the-art model by correctly selecting more informative features on both the global risk prediction and the survival time prediction. PMID:27990267

  16. Is Using Threshold-Crossing Method and Single Type of Features Sufficient to Achieve Realistic Application of Seizure Prediction?

    PubMed

    Zheng, Yang; Wang, Gang; Wang, Jue

    2016-10-01

    Objective: This study aims to verify whether the simple threshold-crossing method can work well enough to achieve the realistic application of seizure prediction on the basis of a large public database, and examines how a more complex classifier can improve prediction performance. It also verifies whether the combination of multiple types of features with a complex classifier can improve prediction performance. Method: Phase synchronization and spectral power features were extracted from electroencephalogram recordings. The threshold-crossing method and a support vector machine (SVM) were used to identify preictal and interictal samples. Based on the type of selected features and the manner of classification, 5 different methods were applied to 19 patients. The performances of these methods were directly compared and tested against a random predictor. In-sample optimization problems were avoided in the feature and parameter selection procedure to obtain credible results. Results: The threshold-crossing method could only obtain satisfactory prediction results for approximately half of the selected patients. The SVM classifier could significantly improve prediction performance compared with the threshold-crossing method for both types of features. Although the average performance was further improved when both types of features were combined with the SVM classifier, the improvement was insignificant. Conclusion: A complex classifier, such as the SVM, is recommended in a realistic prediction device, although it will increase the complexity of the device. Indeed, the simple threshold-crossing method performs well enough for some of the patients. The combination of phase synchronization and spectral power features is unnecessary because of the increased computation complexity. © EEG and Clinical Neuroscience Society (ECNS) 2015.
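
    The threshold-crossing idea named in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: the function name `threshold_alarms`, the `refractory` parameter and the sample trace are all hypothetical, and the abstract does not say how the threshold was chosen or whether alarms were suppressed after firing.

```python
from typing import List, Optional, Sequence


def threshold_alarms(feature: Sequence[float], threshold: float,
                     refractory: int = 0) -> List[int]:
    """Return sample indices where the feature rises above the threshold.

    An alarm fires at index i when feature[i - 1] <= threshold < feature[i].
    After an alarm, new crossings within `refractory` samples are ignored
    (an assumed convention, not taken from the abstract).
    """
    alarms: List[int] = []
    last_alarm: Optional[int] = None
    for i in range(1, len(feature)):
        crossed = feature[i - 1] <= threshold < feature[i]
        suppressed = last_alarm is not None and i - last_alarm <= refractory
        if crossed and not suppressed:
            alarms.append(i)
            last_alarm = i
    return alarms


# Hypothetical feature trace (arbitrary units); not data from the study.
trace = [0.2, 0.3, 0.9, 0.4, 0.95, 0.97, 0.1, 0.8]
print(threshold_alarms(trace, threshold=0.5))
print(threshold_alarms(trace, threshold=0.5, refractory=3))
```

    A detector like this has a single tunable value per feature, which is why it is cheap to run on a device; a classifier such as an SVM instead learns a decision boundary over several features at once.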

  17. Prenatal Features Predictive of Robin Sequence Identified by Fetal Magnetic Resonance Imaging.

    PubMed

    Rogers-Vizena, Carolyn R; Mulliken, John B; Daniels, Kimberly M; Estroff, Judy A

    2016-06-01

    Prenatal magnetic resonance imaging is increasingly used to detect congenital anomalies. The purpose of this study was to determine whether prenatal magnetic resonance imaging accurately characterizes features predictive of postnatal Robin sequence so that possible airway compromise and feeding difficulty at birth can be anticipated. The authors retrospectively identified pregnant women who underwent fetal magnetic resonance imaging between 2002 and 2014 and were found to be carrying a fetus with micrognathia. Micrognathia was subjectively categorized as minor, moderate, or severe. Pregnancy outcome was determined as follows: intrauterine fetal demise, elective termination, early neonatal death, or viable infant. Postnatal findings of micrognathia, Robin sequence, and associated anomalies were compared to prenatal findings. Micrognathia was identified in 123 fetuses. Fifty-two pregnancies (42.3 percent) produced a viable infant. The remainder resulted in termination in the fetal period or death shortly after birth resulting from unrelated causes. For infants who lived, prenatal micrognathia was categorized as minor (55.1 percent), moderate (30.6 percent), or severe (14.3 percent). Forty-two percent of neonates with minor prenatal micrognathia had postnatal micrognathia; however, only 11.1 percent had Robin sequence. All neonates with moderate fetal micrognathia had postnatal micrognathia, and the majority had Robin sequence (86.7 percent). All newborns with severe micrognathia had Robin sequence and all prenatally diagnosed with glossoptosis had Robin sequence. Prenatal findings of moderate or severe micrognathia or glossoptosis are predictive of postnatal Robin sequence, thus expediting appropriate perinatal management of airway and feeding problems. Diagnostic, IV.

  18. Wolfram Syndrome in the Japanese Population; Molecular Analysis of WFS1 Gene and Characterization of Clinical Features

    PubMed Central

    Inoue, Hiroshi; Okuya, Shigeru; Ohta, Yasuharu; Akiyama, Masaru; Taguchi, Akihiko; Kora, Yukari; Okayama, Naoko; Yamada, Yuichiro; Wada, Yasuhiko; Amemiya, Shin; Sugihara, Shigetaka; Nakao, Yuzo; Oka, Yoshitomo; Tanizawa, Yukio

    2014-01-01

    Background: Wolfram syndrome (WFS) is a recessive neurologic and endocrinologic degenerative disorder, and is also known as DIDMOAD (Diabetes Insipidus, early-onset Diabetes Mellitus, progressive Optic Atrophy and Deafness) syndrome. Most affected individuals carry recessive mutations in the Wolfram syndrome 1 gene (WFS1). However, the phenotypic pleiomorphism, rarity and molecular complexity of this disease complicate our efforts to understand WFS. To address this limitation, we aimed to describe complications and to elucidate the contributions of WFS1 mutations to clinical manifestations in Japanese patients with WFS. Methodology: The minimal ascertainment criterion for diagnosing WFS was having both early onset diabetes mellitus and bilateral optic atrophy. Genetic analysis for WFS1 was performed by direct sequencing. Principal Findings: Sixty-seven patients were identified nationally for a prevalence of one per 710,000, with 33 patients (49%) having all 4 components of DIDMOAD. In 40 subjects who agreed to participate in this investigation from 30 unrelated families, the earliest manifestation was DM at a median age of 8.7 years, followed by OA at a median age of 15.8 years. However, either OA or DI was the first diagnosed feature in 6 subjects. In 10, features other than DM predated OA. Twenty-seven patients (67.5%) had a broad spectrum of recessive mutations in WFS1. Two patients had mutations in only one allele. Eleven patients (27.5%) had intact WFS1 alleles. Ages at onset of both DM and OA in patients with recessive WFS1 mutations were indistinguishable from those in patients without WFS1 mutations. In the patients with predicted complete loss-of-function mutations, ages at the onset of both DM and OA were significantly earlier than those in patients with predicted partial loss-of-function mutations. Conclusion/Significance: This study emphasizes the clinical and genetic heterogeneity in patients with WFS. Genotype-phenotype correlations may exist in patients

  19. Predicting Response to Neoadjuvant Chemoradiotherapy in Esophageal Cancer with Textural Features Derived from Pretreatment (18)F-FDG PET/CT Imaging.

    PubMed

    Beukinga, Roelof J; Hulshoff, Jan B; van Dijk, Lisanne V; Muijs, Christina T; Burgerhof, Johannes G M; Kats-Ugurlu, Gursah; Slart, Riemer H J A; Slump, Cornelis H; Mul, Véronique E M; Plukker, John Th M

    2017-05-01

    Adequate prediction of tumor response to neoadjuvant chemoradiotherapy (nCRT) in esophageal cancer (EC) patients is important for more personalized treatment. The current best clinical method to predict pathologic complete response is SUVmax in (18)F-FDG PET/CT imaging. To improve the prediction of response, we constructed a model to predict complete response to nCRT in EC based on pretreatment clinical parameters and (18)F-FDG PET/CT-derived textural features. Methods: From a prospectively maintained single-institution database, we reviewed 97 consecutive patients with locally advanced EC and a pretreatment (18)F-FDG PET/CT scan between 2009 and 2015. All patients were treated with nCRT (carboplatin/paclitaxel/41.4 Gy) followed by esophagectomy. We analyzed clinical, geometric, and pretreatment textural features extracted from both (18)F-FDG PET and CT. The current most accurate prediction model with SUVmax as a predictor variable was compared with 6 different response prediction models constructed using least absolute shrinkage and selection operator regularized logistic regression. Internal validation was performed to estimate the models' performance. Pathologic response was defined as complete versus incomplete response (Mandard tumor regression grade system 1 vs. 2-5). Results: Pathologic examination revealed 19 (19.6%) complete and 78 (80.4%) incomplete responders. Least absolute shrinkage and selection operator regularization selected the clinical parameters histologic type and clinical T stage, the (18)F-FDG PET-derived textural feature long run low gray level emphasis, and the CT-derived textural feature run percentage. Introducing these variables to a logistic regression analysis showed areas under the receiver-operating-characteristic curve (AUCs) of 0.78 compared with 0.58 in the SUVmax model. The discrimination slopes were 0.17 compared with 0.01, respectively. After internal validation, the AUCs decreased to 0.74 and 0.54, respectively. Conclusion
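
    The abstract above reports discrimination slopes without defining them. A minimal sketch follows, assuming the commonly used definition: the mean predicted probability among patients with the outcome minus the mean among patients without it. The function name and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean
from typing import Sequence


def discrimination_slope(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean predicted probability for events minus that for non-events.

    `labels` uses 1 for the event (here: complete response) and 0 otherwise.
    This follows the common definition; the abstract does not state the
    exact formula the authors used.
    """
    events = [p for p, y in zip(probs, labels) if y == 1]
    non_events = [p for p, y in zip(probs, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    return mean(events) - mean(non_events)


# Hypothetical predicted probabilities and outcomes for six patients.
probs = [0.60, 0.40, 0.20, 0.10, 0.30, 0.20]
labels = [1, 1, 0, 0, 0, 0]
slope = discrimination_slope(probs, labels)
print(round(slope, 2))
```

    Under this definition a slope near 0 means the model assigns nearly the same average probability to responders and non-responders, while larger values indicate better separation of the two groups.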

  20. Predicting tooth color from facial features and gender: results from a white elderly cohort.

    PubMed

    Hassel, Alexander J; Nitschke, Ina; Dreyhaupt, Jens; Wegener, Ina; Rammelsberg, Peter; Hassel, Jessica C

    2008-02-01

    Clinicians providing edentulous patients with complete dentures are often confronted with the problem of not knowing the patient's natural tooth color. It would be valuable to be able to determine this from other facial features. The purpose of this study was to assess the possibility of predicting tooth color in the elderly from hair and eye color, facial skin complexion, and gender. The lightness (L*), chroma (C*), and hue (h*) of the color of 541 natural teeth were measured for a white study population (94 subjects, 75 to 77 years old, 55.3% male) by means of a single measurement with a clinically applicable spectrophotometer. Hair and eye color and facial skin complexion were recorded in categories. Mixed-effects regression models were calculated for each L*, C*, and h* value with hair and eye color, facial skin complexion, and gender as independent variables (alpha=.05). Only gender and hair color in univariate analysis and, additionally, eye color in multivariate analysis, were significant predictors of tooth color. Higher L* values (lighter color) were associated with lighter eye color and with female gender. The C* value was lower (less saturated) for women. More yellow/green than yellow/red h* values were associated with hair colors other than black and with female gender. However, the parameter estimates of the variables were rather low. Determination of tooth color from hair and eye color and from gender in the white elderly was only partially possible.
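
    The lightness, chroma and hue values named in the abstract above are the cylindrical form of CIELAB coordinates. The sketch below shows the standard conversion from L*, a*, b*, assuming the instrument reports a* and b* (the abstract does not say so; the spectrophotometer may output C* and h directly). The function name and the sample values are hypothetical.

```python
import math
from typing import Tuple


def lab_to_lch(lightness: float, a: float, b: float) -> Tuple[float, float, float]:
    """Convert CIELAB (L*, a*, b*) to (L*, C*, h) with hue in degrees [0, 360)."""
    chroma = math.hypot(a, b)                      # C* = sqrt(a*^2 + b*^2)
    hue = math.degrees(math.atan2(b, a)) % 360.0   # h = atan2(b*, a*)
    return lightness, chroma, hue


# Hypothetical tooth-shade reading; not a measurement from the study.
L_star, C_star, h_deg = lab_to_lch(70.0, 3.0, 4.0)
print(L_star, C_star, round(h_deg, 2))
```

    In this representation a lower C* means a less saturated color, and the hue angle moves from red (near 0 degrees) through yellow (near 90 degrees) toward green, which matches the yellow/red versus yellow/green wording in the abstract.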

  1. Experimental indication of a naphthalene-base molecular aggregate for the carrier of the 2175 angstroms interstellar extinction feature

    NASA Technical Reports Server (NTRS)

    Beegle, L. W.; Wdowiak, T. J.; Robinson, M. S.; Cronin, J. R.; McGehee, M. D.; Clemett, S. J.; Gillette, S.

    1997-01-01

    Experiments where the simple polycyclic aromatic hydrocarbon (PAH) naphthalene (C10H8) is subjected to the energetic environment of a plasma have resulted in the synthesis of a molecular aggregate that has ultraviolet spectral characteristics that suggest it provides insight into the nature of the carrier of the 2175 angstroms interstellar extinction feature and may be a laboratory analog. Ultraviolet, visible, infrared, and mass spectroscopy, along with gas chromatography, indicate that it is a molecular aggregate in which an aromatic double ring ("naphthalene") structural base serves as the electron "box" chromophore that gives rise to the envelope of the 2175 angstroms feature. This chromophore can also provide the peak of the feature or function as a mantle in concert with another peak provider such as graphite. The molecular base/chromophore manifests itself both as a structural component of an alkyl-aromatic polymer and as a substructure of hydrogenated PAH species. Its spectral and molecular characteristics are consistent with what is generally expected for a complex molecular aggregate that has a role as an interstellar constituent.

  3. The molecular features of uncoupling protein 1 support a conventional mitochondrial carrier-like mechanism

    PubMed Central

    Crichton, Paul G.; Lee, Yang; Kunji, Edmund R.S.

    2017-01-01

    Uncoupling protein 1 (UCP1) is an integral membrane protein found in the mitochondrial inner membrane of brown adipose tissue, and facilitates the process of non-shivering thermogenesis in mammals. Its activation by fatty acids, which overcomes its inhibition by purine nucleotides, leads to an increase in the proton conductance of the inner mitochondrial membrane, short-circuiting the mitochondrion to produce heat rather than ATP. Despite 40 years of intense research, the underlying molecular mechanism of UCP1 is still under debate. The protein belongs to the mitochondrial carrier family of transporters, which have recently been shown to utilise a domain-based alternating-access mechanism, cycling between a cytoplasmic and matrix state to transport metabolites across the inner membrane. Here, we review the protein properties of UCP1 and compare them to those of mitochondrial carriers. UCP1 has the same structural fold as other mitochondrial carriers and, in contrast to past claims, is a monomer, binding one purine nucleotide and three cardiolipin molecules tightly. The protein has a single substrate binding site, which is similar to those of the dicarboxylate and oxoglutarate carriers, but also contains a proton binding site and several hydrophobic residues. As found in other mitochondrial carriers, UCP1 has two conserved salt bridge networks on either side of the central cavity, which regulate access to the substrate binding site in an alternating way. The conserved domain structures and mobile inter-domain interfaces are consistent with an alternating access mechanism too. In conclusion, UCP1 has retained all of the key features of mitochondrial carriers, indicating that it operates by a conventional carrier-like mechanism. PMID:28057583

  4. The molecular features of uncoupling protein 1 support a conventional mitochondrial carrier-like mechanism.

    PubMed

    Crichton, Paul G; Lee, Yang; Kunji, Edmund R S

    2017-03-01

    Uncoupling protein 1 (UCP1) is an integral membrane protein found in the mitochondrial inner membrane of brown adipose tissue, and facilitates the process of non-shivering thermogenesis in mammals. Its activation by fatty acids, which overcomes its inhibition by purine nucleotides, leads to an increase in the proton conductance of the inner mitochondrial membrane, short-circuiting the mitochondrion to produce heat rather than ATP. Despite 40 years of intense research, the underlying molecular mechanism of UCP1 is still under debate. The protein belongs to the mitochondrial carrier family of transporters, which have recently been shown to utilise a domain-based alternating-access mechanism, cycling between a cytoplasmic and matrix state to transport metabolites across the inner membrane. Here, we review the protein properties of UCP1 and compare them to those of mitochondrial carriers. UCP1 has the same structural fold as other mitochondrial carriers and, in contrast to past claims, is a monomer, binding one purine nucleotide and three cardiolipin molecules tightly. The protein has a single substrate binding site, which is similar to those of the dicarboxylate and oxoglutarate carriers, but also contains a proton binding site and several hydrophobic residues. As found in other mitochondrial carriers, UCP1 has two conserved salt bridge networks on either side of the central cavity, which regulate access to the substrate binding site in an alternating way. The conserved domain structures and mobile inter-domain interfaces are consistent with an alternating access mechanism too. In conclusion, UCP1 has retained all of the key features of mitochondrial carriers, indicating that it operates by a conventional carrier-like mechanism. Copyright © 2017 Medical research Council. Published by Elsevier B.V. All rights reserved.

  5. Reactivity of a Molecular Magnesium Hydride Featuring a Terminal Magnesium-Hydrogen Bond.

    PubMed

    Schnitzler, Silvia; Spaniol, Thomas P; Okuda, Jun

    2016-12-19

    The reactivity of the molecular magnesium hydride [Mg(Me3TACD·Al(i)Bu3)H] (1) featuring a terminal magnesium-hydrogen bond and an NNNN-type macrocyclic ligand, Me3TACD ((Me3TACD)H = Me3[12]aneN4 = 1,4,7-trimethyl-1,4,7,10-tetraazacyclododecane), can be grouped into protonolysis, oxidation, hydrometalation, (insertion), and hydride abstraction. Protonolysis of 1 with weak Brønsted acids HX such as terminal acetylenes, amines, silanols, and si