Science.gov

Sample records for molecular features predicting

  1. Genes associated with histopathologic features of triple negative breast tumors predict molecular subtypes.

    PubMed

    Purrington, Kristen S; Visscher, Daniel W; Wang, Chen; Yannoukakos, Drakoulis; Hamann, Ute; Nevanlinna, Heli; Cox, Angela; Giles, Graham G; Eckel-Passow, Jeanette E; Lakis, Sotiris; Kotoula, Vassiliki; Fountzilas, George; Kabisch, Maria; Rüdiger, Thomas; Heikkilä, Päivi; Blomqvist, Carl; Cross, Simon S; Southey, Melissa C; Olson, Janet E; Gilbert, Judy; Deming-Halverson, Sandra; Kosma, Veli-Matti; Clarke, Christine; Scott, Rodney; Jones, J Louise; Zheng, Wei; Mannermaa, Arto; Eccles, Diana M; Vachon, Celine M; Couch, Fergus J

    2016-05-01

    Distinct subtypes of triple negative (TN) breast cancer have been identified by tumor expression profiling. However, little is known about the relationship between the histopathologic features of TN tumors, which reflect aspects of both tumor behavior and the tumor microenvironment, and molecular TN subtypes. The histopathologic features of TN tumors were assessed by central review, and 593 TN tumors were subjected to whole genome expression profiling using the Illumina Whole Genome DASL array. TN molecular subtypes were defined based on gene expression data associated with histopathologic features of TN tumors. Gene expression analysis yielded signatures for four TN subtypes (basal-like, androgen receptor positive, immune, and stromal), consistent with previous studies. Expression analysis also identified genes significantly associated with the 12 histological features of TN tumors. Development of signatures using these markers of histopathological features resulted in six distinct TN subtype signatures, including an additional basal-like and an additional stromal signature. The additional basal-like subtype was distinguished by elevated expression of cell motility and glucose metabolism genes and reduced expression of immune signaling genes, whereas the additional stromal subtype was distinguished by elevated expression of immunomodulatory pathway genes. Histopathologic features that reflect heterogeneity in tumor architecture, cell structure, and tumor microenvironment are related to TN subtype. By accounting for histopathologic features in the development of gene expression signatures, six major subtypes of TN breast cancer were identified. PMID:27083182

  2. Semen molecular and cellular features: these parameters can reliably predict subsequent ART outcome in a goat model

    PubMed Central

    Berlinguer, Fiammetta; Madeddu, Manuela; Pasciu, Valeria; Succu, Sara; Spezzigu, Antonio; Satta, Valentina; Mereu, Paolo; Leoni, Giovanni G; Naitana, Salvatore

    2009-01-01

    Currently, assessment of sperm function in a raw or processed semen sample cannot reliably predict the ability of sperm to withstand freezing and thawing, in vivo fertility, or the outcome of assisted reproductive biotechnologies (ART). The aim of the present study was to determine which parameters among a battery of analyses could predict the subsequent in vitro fertilization ability of spermatozoa, and hence blastocyst output, in a goat model. Ejaculates were collected by artificial vagina from three adult goats (Capra hircus) aged 2 years (A, B and C). Logistic regression analysis was used to assess the predictive value of viability, computer-assisted sperm analyzer (CASA) motility parameters and intracellular ATP concentration (before and after thawing), and of DNA integrity (after thawing), for embryo output in a subsequent in vitro fertility test. Individual differences in semen parameters were evident for semen viability after thawing and for DNA integrity. In the IVF test, spermatozoa collected from A and B led to higher cleavage rates (p < 0.01) and blastocyst output (p < 0.05) than those from C. The logistic regression model explained 72% of the deviance (p < 0.0001) and was directly related to the mean percentage of rapid spermatozoa in fresh semen (p < 0.01), semen viability after thawing (p < 0.01), and two of the three comet parameters considered, i.e., tail DNA percentage and comet length (p < 0.0001). DNA integrity alone had a high predictive value for IVF outcome with frozen/thawed semen (deviance explained: 57%). The model proposed here is one of many possible ways to explain differences in embryo output following IVF with different semen donors and may be a useful tool for selecting the most suitable donors for semen cryopreservation. PMID:19900288
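    The fit statistic quoted above, "deviance explained", is usually computed for a binomial model as D² = 1 − D_model / D_null, where D_null is the deviance of an intercept-only model. The sketch below assumes individual binary outcomes (for example, 1 = blastocyst obtained) and already-fitted probabilities; the data and function names are hypothetical and not taken from the study.

```python
import math

def binomial_deviance(y, p):
    """Deviance of a Bernoulli model: -2 * log-likelihood
    (for individual 0/1 outcomes the saturated model has log-likelihood 0)."""
    eps = 1e-12  # guard against log(0)
    ll = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        ll += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return -2.0 * ll

def deviance_explained(y, p_fitted):
    """Fraction of the null deviance explained by the fitted model (D^2)."""
    p_null = sum(y) / len(y)  # intercept-only model predicts the overall rate
    d_null = binomial_deviance(y, [p_null] * len(y))
    d_model = binomial_deviance(y, p_fitted)
    return 1.0 - d_model / d_null

# Hypothetical outcomes and fitted probabilities from some logistic model
y = [1, 1, 0, 1, 0, 0, 1, 0]
p = [0.9, 0.8, 0.2, 0.7, 0.3, 0.1, 0.6, 0.4]
print(f"deviance explained: {deviance_explained(y, p):.2f}")
```

    A model that only reproduces the overall success rate explains 0% of the deviance; values approach 100% as fitted probabilities approach the observed 0/1 outcomes.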

  3. Features of CD44+/CD24-low phenotypic cell distribution in relation to predictive markers and molecular subtypes of invasive ductal carcinoma of the breast.

    PubMed

    Gudadze, M; Kankava, Q; Mariamidze, A; Burkadze, G

    2014-03-01

    Breast cancer is the most common malignancy among women. Despite current progress in research on and treatment of metastatic breast cancer, mortality from this disease is still high, because therapy is limited by the existence of therapy-resistant cells. Cancer stem cells are the only cells with unlimited proliferative activity and cancerous potential; thus, they participate in the growth, progression and dissemination of cancer. Cancer stem cells are resistant to various forms of therapy, including chemotherapy and radiotherapy. Results of examination showed that 50% of all cases were positive for so-called stem cell markers, whereas 45% of cases were negative. Among invasive ductal carcinomas of the Luminal A molecular subtype, CD44+/CD24-low cases (those showing a stem cell phenotype) were almost as numerous as CD44+/CD24+ and CD44-/CD24+ cancers. In this group, non-stem phenotype cases accounted for 65%, i.e., five times more than stem cell phenotype cancers. In total, 1324 postoperative breast specimens examined during 2008-2012 at the laboratory of "Pathgeo-Union of Pathologists" LTD and the Academician N. Kipshidze Central University Clinic were used as study material, from which specimens from 393 patients with invasive ductal carcinoma were selected. The relationship between CD44/CD24 marker expression in phenotypically different cancers and clinicopathologic parameters and various biological features was assessed by Pearson correlation analysis and the χ² test. Numerical data were analysed using SPSS v19.0. A 95% confidence level was used to define statistical significance. Stem cell phenotype-positive cases were most frequent in the Luminal B and basal-like molecular subgroups, which in our opinion is associated with their aggressive behavior and resistance to chemotherapy. The relatively good prognosis and chemotherapy response of Luminal A molecular subtype cancers may be determined by the lower
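    The χ² test mentioned above compares the observed counts in a contingency table with the counts expected if rows and columns were independent. A minimal Pearson χ² sketch (no continuity correction) follows; the 2×2 table of phenotype by molecular subtype is invented for illustration, and the closed-form p-value used here holds only for one degree of freedom.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = stem-like (CD44+/CD24-low) vs other phenotypes,
# columns = Luminal A vs basal-like tumors
table = [[10, 30],
         [60, 20]]
stat, dof = chi_square_independence(table)
# With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.2g}")
```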

  4. Communication: Finding destructive interference features in molecular transport junctions.

    PubMed

    Reuter, Matthew G; Hansen, Thorsten

    2014-11-14

    Associating molecular structure with quantum interference features in electrode-molecule-electrode transport junctions has been difficult because existing guidelines for understanding interferences only apply to conjugated hydrocarbons. Herein we use linear algebra and the Landauer-Büttiker theory for electron transport to derive a general rule for predicting the existence and locations of interference features. Our analysis illustrates that interferences can be directly determined from the molecular Hamiltonian and the molecule-electrode couplings, and we demonstrate its utility with several examples. PMID:25399124
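    To make the idea concrete, the sketch below evaluates the Landauer transmission T(E) = Γ_L Γ_R |G_LR(E)|² for a Hückel (nearest-neighbour tight-binding) benzene ring, assuming wide-band electrodes that each couple to a single carbon site. It is a textbook illustration rather than the authors' rule: the retarded Green's function G = (E − H − Σ)⁻¹ is computed directly, and a destructive interference feature shows up as a zero of T at E = 0 for meta coupling but not for para coupling. The hopping of −1 and Γ = 1 are arbitrary units chosen for the example.

```python
import numpy as np

def ring_hamiltonian(n_sites, hopping=-1.0):
    """Hückel (nearest-neighbour tight-binding) Hamiltonian of an n-site ring."""
    h = np.zeros((n_sites, n_sites))
    for i in range(n_sites):
        j = (i + 1) % n_sites
        h[i, j] = h[j, i] = hopping
    return h

def transmission(energy, h, left, right, gamma=1.0):
    """Landauer transmission T(E) = Γ_L Γ_R |G_LR(E)|^2 in the wide-band limit,
    with each electrode coupled to one site with the same broadening gamma."""
    n = h.shape[0]
    sigma = np.zeros((n, n), dtype=complex)
    sigma[left, left] = -0.5j * gamma    # self-energy of the left electrode
    sigma[right, right] = -0.5j * gamma  # self-energy of the right electrode
    g = np.linalg.inv(energy * np.eye(n) - h - sigma)  # retarded Green's function
    return gamma * gamma * abs(g[left, right]) ** 2

benzene = ring_hamiltonian(6)
t_para = transmission(0.0, benzene, 0, 3)  # electrodes on opposite carbons
t_meta = transmission(0.0, benzene, 0, 2)  # electrodes two bonds apart
print(f"T_para(0) = {t_para:.4f}, T_meta(0) = {t_meta:.1e}")
```

    With these parameters the para value is 64/289 ≈ 0.22, while the meta value sits at floating-point noise; scanning `energy` over a grid would locate any other transmission zeros.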

  5. Communication: Finding destructive interference features in molecular transport junctions

    SciTech Connect

    Reuter, Matthew G.; Hansen, Thorsten

    2014-11-14

    Associating molecular structure with quantum interference features in electrode-molecule-electrode transport junctions has been difficult because existing guidelines for understanding interferences only apply to conjugated hydrocarbons. Herein we use linear algebra and the Landauer-Büttiker theory for electron transport to derive a general rule for predicting the existence and locations of interference features. Our analysis illustrates that interferences can be directly determined from the molecular Hamiltonian and the molecule–electrode couplings, and we demonstrate its utility with several examples.

  6. Predicting discovery rates of genomic features.

    PubMed

    Gravel, Simon

    2014-06-01

    Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved. Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require ∼15% of the population. We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and subsampled 1000 Genomes Project data. Extrapolating based on the National Heart, Lung, and Blood Institute Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African Americans and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types. PMID:24637199
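    Gravel's estimators extrapolate beyond the sequenced sample. The opposite, interpolation direction (how many variants a smaller subsample would have revealed) can be computed exactly from the sample frequency spectrum with hypergeometric probabilities, and is a natural starting point for discovery curves. The sketch below shows only that rarefaction step, not the jackknife or linear-programming methods of the paper; the spectrum is hypothetical.

```python
from math import comb

def expected_variable_sites(sfs, m):
    """Expected number of sites that stay polymorphic in a random subsample of m
    of the n sequenced haplotypes. sfs[i] is the number of sites whose derived
    allele is carried by exactly i of the n haplotypes (i = 1..n-1)."""
    n = len(sfs) - 1
    total = comb(n, m)
    expected = 0.0
    for i in range(1, n):
        # the site looks monomorphic if the subsample is all-ancestral or all-derived
        p_monomorphic = (comb(n - i, m) + comb(i, m)) / total
        expected += sfs[i] * (1.0 - p_monomorphic)
    return expected

# Hypothetical spectrum for n = 10 haplotypes (roughly proportional to 1/i)
sfs = [0, 100, 50, 33, 25, 20, 17, 14, 12, 11, 0]
for m in (2, 4, 6, 8, 10):
    print(m, round(expected_variable_sites(sfs, m), 1))
```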

  7. Feature Selection for Wheat Yield Prediction

    NASA Astrophysics Data System (ADS)

    Ruß, Georg; Kruse, Rudolf

    Carrying out effective and sustainable agriculture has become an important issue in recent years. Agricultural production has to keep up with an ever-increasing population by taking advantage of a field’s heterogeneity. Nowadays, modern technology such as the global positioning system (GPS) and a multitude of developed sensors enable farmers to better measure their fields’ heterogeneities. For this small-scale, precise treatment the term precision agriculture has been coined. However, the large amounts of data that are (literally) harvested during the growing season have to be analysed. In particular, the farmer is interested in knowing whether a newly developed heterogeneity sensor is potentially advantageous or not. Since the sensor data are readily available, this issue should be seen from an artificial intelligence perspective. There it can be treated as a feature selection problem. The additional task of yield prediction can be treated as a multi-dimensional regression problem. This article aims to present an approach towards solving these two practically important problems using artificial intelligence and data mining ideas and methodologies.

  8. Molecular Dynamics Simulations Of Nanometer-Scale Feature Etch

    SciTech Connect

    Vegh, J. J.; Graves, D. B.

    2008-09-23

    Molecular dynamics (MD) simulations have been carried out to examine fundamental etch limitations. Beams of Ar⁺, Ar⁺/F and CFₓ⁺ (x = 2, 3) with 2 nm diameter cylindrical confinement were utilized to mimic 'perfect' masks for small feature etching in silicon. The holes formed during etch exhibit sidewall damage and passivation as a result of ion-induced mixing. The MD results predict a minimum hole diameter of ~5 nm after post-etch cleaning of the sidewall.

  9. Feature Selection for Neural Network Based Stock Prediction

    NASA Astrophysics Data System (ADS)

    Sugunnasil, Prompong; Somhom, Samerkae

    We propose a new methodology of feature selection for stock movement prediction. The methodology is based on finding the features that minimize the correlation relation function. We first generate all combinations of features and evaluate each of them using our evaluation function. We search through the generated set with a hill-climbing approach. A self-organizing map (SOM)-based stock prediction model is used as the prediction method. We conduct experiments on data sets of Microsoft Corporation, General Electric Co. and Ford Motor Co. The results show that our feature selection method can improve the efficiency of neural network-based stock prediction.
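    A hill-climbing search over feature subsets can be sketched as follows: starting from a random subset, it repeatedly flips the single feature whose inclusion or exclusion most improves a score, and it stops at a local optimum. The abstract does not give the correlation-based evaluation function, so `toy_score` is a hypothetical stand-in that rewards three arbitrarily chosen "relevant" features.

```python
import random

def hill_climb(n_features, score, max_iter=100, seed=0):
    """Bit-flip hill climbing over feature subsets (higher score is better)."""
    rng = random.Random(seed)
    current = frozenset(i for i in range(n_features) if rng.random() < 0.5)
    current_score = score(current)
    for _ in range(max_iter):
        # every subset that differs from the current one by a single feature
        neighbours = [current ^ {i} for i in range(n_features)]
        best = max(neighbours, key=score)
        best_score = score(best)
        if best_score <= current_score:
            break  # no single flip helps: local optimum reached
        current, current_score = best, best_score
    return set(current), current_score

RELEVANT = {0, 2, 5}  # hypothetical informative features

def toy_score(subset):
    # reward relevant features, penalise the rest (stand-in for the paper's function)
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)

best_set, best_score = hill_climb(8, toy_score)
print(sorted(best_set), best_score)  # → [0, 2, 5] 3.0
```

    Unlike exhaustive enumeration of all combinations, hill climbing evaluates only the neighbours of the current subset, so it is cheap but can stop at a local optimum when the score is not as well behaved as this toy one.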

  10. Predicting the molecular complexity of sequencing libraries.

    PubMed

    Daley, Timothy; Smith, Andrew D

    2013-04-01

    Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage, or to predict the benefits of additional sequencing, are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing. PMID:23435259
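    The approach of Daley and Smith (implemented in the preseq software) builds on the classical Good–Toulmin estimator, which predicts how many previously unseen molecules additional sequencing would reveal from the counts n_j of molecules observed exactly j times. The sketch below shows only that classical estimator, assuming reads have already been collapsed to molecule identifiers; it is reliable for modest extrapolation (t ≤ 1), which is the limitation the paper's method addresses. The read IDs are hypothetical.

```python
from collections import Counter

def good_toulmin(read_ids, t):
    """Classical Good-Toulmin estimate of the number of *new* distinct molecules
    expected if sequencing depth were increased by a factor (1 + t).
    The alternating series is unstable for t > 1."""
    counts = Counter(read_ids)               # molecule -> times observed
    freq_of_freq = Counter(counts.values())  # j -> molecules seen exactly j times
    return sum(((-1) ** (j + 1)) * (t ** j) * n_j for j, n_j in freq_of_freq.items())

# Hypothetical molecule identifiers of mapped reads
reads = ["m1", "m1", "m2", "m3", "m3", "m3", "m4"]
print(good_toulmin(reads, 0.5))  # → 0.875
```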

  11. Molecular Markers Predictive of Chemotherapy Response in Colorectal Cancer

    PubMed Central

    Shiovitz, Stacey; Grady, William M.

    2015-01-01

    Recognition of the molecular heterogeneity of colorectal cancer (CRC) has led to the classification of CRC based on a variety of clinical and molecular characteristics. Although the clinical significance of the majority of these molecular alterations is still being ascertained, it is widely anticipated that these characteristics will improve the accuracy with which we can determine the prognosis and therapeutic response of CRC patients. A few of these markers, such as microsatellite instability and the CpG island methylator phenotype (CIMP), show promise as predictive markers for cytotoxic chemotherapy. KRAS is a validated biomarker for EGFR-targeted therapy, while NRAS and PIK3CA are evolving markers for targeted therapies. Multiple new actionable drug targets are being identified on a regular basis, but most are not ready for clinical use at this time. This review focuses on key molecular features of CRCs and the application of these molecular alterations as predictive biomarkers for CRC. PMID:25663616

  12. Learning through Feature Prediction: An Initial Investigation into Teaching Categories to Children with Autism through Predicting Missing Features

    ERIC Educational Resources Information Center

    Sweller, Naomi

    2015-01-01

    Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…

  13. Molecular features of cellular reprogramming and development.

    PubMed

    Smith, Zachary D; Sindhu, Camille; Meissner, Alexander

    2016-03-01

    Differentiating somatic cells are progressively restricted to specialized functions during ontogeny, but they can be experimentally directed to form other cell types, including those with complete embryonic potential. Early nuclear reprogramming methods, such as somatic cell nuclear transfer (SCNT) and cell fusion, posed significant technical hurdles to precise dissection of the regulatory programmes governing cell identity. However, the discovery of reprogramming by ectopic expression of a defined set of transcription factors, known as direct reprogramming, provided a tractable platform to uncover molecular characteristics of cellular specification and differentiation, cell type stability and pluripotency. We discuss the control and maintenance of cellular identity during developmental transitions as they have been studied using direct reprogramming, with an emphasis on transcriptional and epigenetic regulation. PMID:26883001

  14. Outer packet sets and feature prediction of computer virus

    NASA Astrophysics Data System (ADS)

    Zhang, Ling

    2014-10-01

    The packet sets model was proposed by Prof. Shi in 2008. A packet set is a set pair composed of an internal packet set and an outer packet set, and it has dynamic characteristics. Using packet sets theory, this paper presents feature prediction of computer viruses based on outer packet sets. The concept of virus screening-filtering is introduced, and the virus screening-filtering order theorem, the composite virus screening-filtering theorem and the virus screening-filtering rule are presented. Based on these results, a method for predicting computer virus features is given. Outer packet sets are a new tool for research on predicting dynamic virus features.

  15. Predicting Clinical Outcomes Using Molecular Biomarkers

    PubMed Central

    Burke, Harry B.

    2016-01-01

    Over the past 20 years, there has been an exponential increase in the number of biomarkers. At the last count, there were 768,259 papers indexed in PubMed.gov directly related to biomarkers. Although many of these papers claim to report clinically useful molecular biomarkers, embarrassingly few are currently in clinical use. It is suggested that a failure to properly understand, clinically assess, and utilize molecular biomarkers has prevented their widespread adoption in treatment, in comparative benefit analyses, and their integration into individualized patient outcome predictions for clinical decision-making and therapy. A straightforward, general approach to understanding how to predict clinical outcomes using risk, diagnostic, and prognostic molecular biomarkers is presented. In the future, molecular biomarkers will drive advances in risk, diagnosis, and prognosis, they will be the targets of powerful molecular therapies, and they will individualize and optimize therapy. Furthermore, clinical predictions based on molecular biomarkers will be displayed on the clinician’s screen during the physician–patient interaction, they will be an integral part of physician–patient-shared decision-making, and they will improve clinical care and patient outcomes. PMID:27279751

  16. Predicting beef tenderness using color and multispectral image texture features.

    PubMed

    Sun, X; Chen, K J; Maddock-Carlin, K R; Anderson, V L; Lepper, A N; Schwartz, C A; Keller, W L; Ilse, B R; Magolski, J D; Berg, E P

    2012-12-01

    The objective of this study was to investigate the usefulness of raw meat surface characteristics (texture) in predicting cooked beef tenderness. Color and multispectral texture features, including 4 different wavelengths and 217 image texture features, were extracted from 2 laboratory-based multispectral camera imaging systems. Steaks were segregated into tough and tender classification groups based on Warner-Bratzler shear force. The texture features were submitted to STEPWISE multiple regression and support vector machine (SVM) analyses to establish prediction models for beef tenderness. A subsample (80%) of the steaks classified as tender or tough was used to train the models, which were then validated on the remaining 20% (test steaks). For color images, the SVM model identified tender steaks with 100% accuracy, while the STEPWISE equation correctly identified 94.9% of the tender steaks. For multispectral images, the SVM and STEPWISE models achieved average accuracies of 91% and 87%, respectively, in predicting beef tenderness. PMID:22647652

  17. Feature selection for splice site prediction: A new method using EDA-based feature ranking

    PubMed Central

    Saeys, Yvan; Degroeve, Sven; Aeyels, Dirk; Rouzé, Pierre; Van de Peer, Yves

    2004-01-01

    Background The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. Results In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a framework that generalizes genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. This ranking is then used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. Conclusion We show that this technique is more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamic view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques and scales better to datasets described by a large number of features. PMID:15154966
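    An estimation of distribution algorithm (EDA) replaces the crossover and mutation of a genetic algorithm with sampling from an explicitly estimated probability distribution. Its simplest variant, the univariate marginal distribution algorithm (UMDA), keeps one inclusion probability per feature, and those probabilities can be read off as a feature ranking. The sketch below assumes UMDA and a hypothetical fitness function in place of the splice-site classifier's accuracy; the paper's exact EDA and discarding schedule may differ.

```python
import random

def umda_feature_ranking(n_features, fitness, pop_size=50, n_select=15,
                         generations=30, seed=0):
    """Univariate Marginal Distribution Algorithm (UMDA), a simple EDA.
    Returns features ranked by their estimated inclusion probability."""
    rng = random.Random(seed)
    probs = [0.5] * n_features  # marginal probability that each feature is included
    for _ in range(generations):
        # sample candidate feature masks from the current distribution
        population = [[rng.random() < p for p in probs] for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        elite = population[:n_select]
        # re-estimate each marginal from the selected masks, clamped away from 0 and 1
        probs = [min(0.95, max(0.05, sum(ind[i] for ind in elite) / n_select))
                 for i in range(n_features)]
    ranking = sorted(range(n_features), key=lambda i: probs[i], reverse=True)
    return ranking, probs

RELEVANT = {1, 4, 6}  # hypothetical informative features

def toy_fitness(mask):
    # +1 for each relevant feature kept, -0.3 for each irrelevant one
    return sum(1.0 if i in RELEVANT else -0.3 for i, used in enumerate(mask) if used)

ranking, probs = umda_feature_ranking(8, toy_fitness)
print(ranking)
```

    Iteratively discarding the lowest-ranked features and re-running the search would give the kind of dynamic view of feature selection the abstract describes.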

  18. Molecular predictive and prognostic factors in ependymoma.

    PubMed

    Benson, Rony; Mallick, Supriya; Julka, Pramod K; Rath, Goura K

    2016-01-01

    An ependymoma is an uncommon glial tumor, which arises from different parts of the neuroaxis. Considerable variation in presentation and survival in tumors in different locations after an optimum treatment indicates inherent molecular and genetic differences in tumorigenesis between them. A number of genetic aberrations have been identified to distinctly characterize different subgroups of ependymomas that include a posterior fossa tumor, a supratentorial tumor, and a pediatric tumor. These different groups have substantial genetic alterations, and also distinct demography, clinical characteristics, and prognosis. This article is intended to review the diverse molecular and genetic aberrations that may be helpful in prognostication and prediction of survival in patients suffering from an ependymoma. PMID:26954807

  19. Tumors of the Testis: Morphologic Features and Molecular Alterations.

    PubMed

    Howitt, Brooke E; Berney, Daniel M

    2015-12-01

    This article reviews the most frequently encountered tumors of the testis: pure and mixed malignant testicular germ cell tumors (TGCT), with emphasis on adult (postpubertal) TGCTs and their differential diagnoses. We additionally review TGCT in the postchemotherapy setting, and findings to be integrated into the surgical pathology report, including staging of testicular tumors and other problematic issues. The clinical features, gross pathologic findings, key histologic features, common differential diagnoses, the use of immunohistochemistry, and molecular alterations in TGCTs are discussed. PMID:26612222

  20. NSCLC tumor shrinkage prediction using quantitative image features.

    PubMed

    Hunter, Luke A; Chen, Yi Pei; Zhang, Lifei; Matney, Jason E; Choi, Haesun; Kry, Stephen F; Martel, Mary K; Stingo, Francesco; Liao, Zhongxing; Gomez, Daniel; Yang, Jinzhong; Court, Laurence E

    2016-04-01

    The objective of this study was to develop a quantitative image feature model to predict non-small cell lung cancer (NSCLC) volume shrinkage from pre-treatment CT images. Sixty-four stage II-IIIB NSCLC patients who received similar treatments were all imaged using the same CT scanner and protocol. For each patient, the planning gross tumor volume (GTV) was deformed onto the week 6 treatment image, and tumor shrinkage was quantified as the deformed GTV volume divided by the planning GTV volume. Geometric, intensity histogram, absolute gradient image, co-occurrence matrix, and run-length matrix image features were extracted from each planning GTV. Prediction models were generated using principal component regression with simulated annealing subset selection. Performance was quantified using the mean squared error (MSE) between the predicted and observed tumor shrinkages. Permutation tests were used to validate the results. The optimal prediction model gave a strong correlation between the observed and predicted tumor shrinkages, with r = 0.81 and MSE = 8.60 × 10⁻³. Compared with predictions based on the mean population shrinkage, this resulted in a 2.92-fold reduction in MSE. In conclusion, this study indicated that quantitative image features extracted from existing pre-treatment CT images can successfully predict tumor shrinkage and provide additional information for clinical decisions regarding patient risk stratification, treatment, and prognosis. PMID:26878137
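    The two headline numbers, the correlation r and the fold reduction in MSE relative to always predicting the mean shrinkage, can be computed as sketched below. The shrinkage values are invented for illustration, and for simplicity the baseline uses the mean of the same observations; a stricter evaluation would take that mean from training data only.

```python
import math

def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fold_reduction_vs_mean(pred, obs):
    """How many times smaller the model MSE is than always predicting the mean."""
    mean_obs = sum(obs) / len(obs)
    return mse([mean_obs] * len(obs), obs) / mse(pred, obs)

# Hypothetical shrinkage ratios (deformed GTV / planning GTV)
observed = [0.5, 0.7, 0.9, 0.6, 0.8]
predicted = [0.55, 0.65, 0.85, 0.6, 0.85]
print(f"r = {pearson_r(predicted, observed):.2f}, "
      f"MSE = {mse(predicted, observed):.4f}, "
      f"fold reduction = {fold_reduction_vs_mean(predicted, observed):.1f}")
```

    Here the model's MSE is 0.002 against 0.02 for the mean baseline, a 10-fold reduction.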

  21. A protein structural class prediction method based on novel features.

    PubMed

    Zhang, Lichao; Zhao, Xiqiang; Kong, Liang

    2013-09-01

    In this study, a 12-dimensional feature vector is constructed to reflect the general contents and spatial arrangements of the secondary structural elements of a given protein sequence. Among the 12 features, 6 novel features are specifically designed to improve prediction accuracy for the α/β and α + β classes, based on the distributions of α-helices and β-strands and the characteristics of parallel and anti-parallel β-sheets. To evaluate our method, the jackknife cross-validation test is employed on two widely used datasets, 25PDB and 1189, with sequence similarity lower than 25% and 40%, respectively. Our method outperforms recently reported methods in most cases, and the 6 newly designed features have a significant positive effect on prediction accuracy, especially for the α/β and α + β classes. PMID:23770446
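    In the protein structural class literature the "jackknife test" is leave-one-out cross-validation: each protein is classified by a model built from all the others. The sketch below uses a nearest-centroid classifier and two invented features (helix and strand content) purely as placeholders for the paper's 12-dimensional feature vector and its classifier.

```python
import math

def nearest_centroid_predict(train, x):
    """Assign x to the class whose mean feature vector is closest (Euclidean)."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, v in enumerate(features):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))

def jackknife_accuracy(data):
    """Leave-one-out test: predict each sample from a model built on all others."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += nearest_centroid_predict(train, features) == label
    return correct / len(data)

# Hypothetical (helix fraction, strand fraction) vectors with structural classes
data = [([0.80, 0.10], "all-alpha"), ([0.70, 0.05], "all-alpha"),
        ([0.75, 0.15], "all-alpha"), ([0.10, 0.60], "all-beta"),
        ([0.05, 0.70], "all-beta"), ([0.15, 0.55], "all-beta")]
print(jackknife_accuracy(data))  # → 1.0
```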

  22. Stabilizing l1-norm prediction models by supervised feature grouping.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2016-02-01

    Emerging Electronic Medical Records (EMRs) have transformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since much of the information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. However, in the presence of correlated features, these methods select features that change considerably with small changes in the data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. To address this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, and can thus help clinicians make accurate decisions. PMID:26689771
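    The instability described above is a property of the plain Lasso, sketched below via cyclic coordinate descent with soft-thresholding. This is the baseline the paper improves on, not its feature-grouping model; the objective scaling (1/2n) is one common convention, and the synthetic data are hypothetical. With two nearly identical columns, the Lasso tends to concentrate weight on one of them, and which one can change with small perturbations of the data.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent, minimising
    (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1.
    No intercept: centre X and y first; assumes no all-zero columns."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with feature j's current contribution added back
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

# Two near-duplicate predictors: weight may shift between them across resamples
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=200)
print(lasso_cd(X, y, lam=0.1))
```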

  23. BDDCS Class Prediction for New Molecular Entities

    PubMed Central

    Broccatelli, Fabio; Cruciani, Gabriele; Benet, Leslie Z.; Oprea, Tudor I.

    2012-01-01

    The Biopharmaceutics Drug Disposition Classification System (BDDCS) was successfully employed for predicting drug-drug interactions (DDIs) with respect to drug metabolizing enzymes (DMEs), drug transporters and their interplay. The major assumption of BDDCS is that the extent of metabolism (EoM) predicts high versus low intestinal permeability rate, and vice versa, at least when uptake transporters or paracellular transport are not involved. We recently published a collection of over 900 marketed drugs classified for BDDCS. We suggest that a reliable model for predicting BDDCS class, integrated with in vitro assays, could anticipate the disposition and potential DDIs of new molecular entities (NMEs). Here we describe a computational procedure for predicting BDDCS class from molecular structures. The model was trained on a set of 300 oral drugs and validated on an external set of 379 oral drugs, using 17 descriptors calculated or derived from the VolSurf+ software. For each molecule, a probability of BDDCS class membership was given, based on predicted EoM, FDA solubility (FDAS) and their confidence scores. The accuracy in predicting FDAS was 78% in training and 77% in validation, while for EoM prediction the accuracy was 82% in training and 79% in external validation. The actual BDDCS class corresponded to the highest ranked calculated class for 55% of the validation molecules, and it was within the top two ranked classes more than 92% of the time. The unbalanced stratification of the dataset did not affect the prediction, which was most accurate for classes 2 and 3 relative to the most populated class 1. For class 4 drugs, a general lack of predictability was observed. A linear discriminant analysis (LDA) confirmed that the accuracy of prediction for the different BDDCS classes is tied to the structure of the dataset. This model could routinely be used in early drug discovery to prioritize in vitro tests for NMEs (e.g., affinity to transporters
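    The four BDDCS classes are defined by two binary properties: solubility (high or low) and extent of metabolism (extensive or poor). The sketch below shows one hypothetical way to turn predicted probabilities for these two properties into ranked class-membership probabilities by assuming they are independent; the abstract does not state how the authors combined their predictions and confidence scores.

```python
def bddcs_class_probabilities(p_extensive_metabolism, p_high_solubility):
    """Class-membership probabilities under a (hypothetical) independence assumption.
    BDDCS classes:
      1 = high solubility, extensive metabolism
      2 = low solubility, extensive metabolism
      3 = high solubility, poor metabolism
      4 = low solubility, poor metabolism"""
    pm, ps = p_extensive_metabolism, p_high_solubility
    probs = {
        1: pm * ps,
        2: pm * (1 - ps),
        3: (1 - pm) * ps,
        4: (1 - pm) * (1 - ps),
    }
    ranked = sorted(probs, key=probs.get, reverse=True)  # most likely class first
    return probs, ranked

# Hypothetical model outputs: likely extensively metabolized, likely poorly soluble
probs, ranked = bddcs_class_probabilities(0.8, 0.3)
print(ranked)  # → [2, 1, 4, 3]
```

    A "top two" hit, as reported in the abstract, would correspond to the true class appearing in `ranked[:2]`.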

  24. How to Predict Molecular Interactions between Species?

    PubMed Central

    Schulze, Sylvie; Schleicher, Jana; Guthke, Reinhard; Linde, Jörg

    2016-01-01

    Organisms constantly interact with other species through physical contact, which leads to changes at the molecular level, for example in the transcriptome. These changes can be monitored for all genes with the help of high-throughput experiments such as RNA-seq or microarrays. The adaptation of gene expression to environmental changes within cells is mediated through complex gene regulatory networks. Often, our knowledge of these networks is incomplete. Network inference predicts gene regulatory interactions based on transcriptome data. Dual transcriptomics experiments are an emerging application of high-throughput transcriptome studies. Here, the transcriptomes of two or more interacting species are measured simultaneously. Based on a dual RNA-seq data set of murine dendritic cells infected with the fungal pathogen Candida albicans, the software tool NetGenerator was applied to predict an inter-species gene regulatory network. To promote further investigations of molecular inter-species interactions, we recently discussed dual RNA-seq experiments for host-pathogen interactions and extended the applied tool NetGenerator (Schulze et al., 2015). The updated version of NetGenerator makes use of measurement variances in the algorithmic procedure and accepts gene expression time series data with missing values. Additionally, we tested multiple modeling scenarios regarding the stimuli functions of the gene regulatory network. Here, we summarize the work by Schulze et al. (2015) and put it into a broader context. We review various studies making use of the dual transcriptomics approach to investigate the molecular basis of interacting species. Besides host-pathogen interactions, dual transcriptomics data are also used to study mutualistic and commensalistic interactions. Furthermore, we give a short introduction to additional approaches for the prediction of gene regulatory networks and discuss their application to dual transcriptomics data. We

  5. Classification performance prediction using parametric scattering feature models

    NASA Astrophysics Data System (ADS)

    Chiang, Hung-Chih; Moses, Randolph L.; Potter, Lee C.

    2000-08-01

    We consider a method for estimating classification performance of a model-based synthetic aperture radar (SAR) automatic target recognition system. Target classification is performed by comparing an unordered feature set extracted from a measured SAR image chip with an unordered feature set predicted from a hypothesized target class and pose. A Bayes likelihood metric that incorporates uncertainty in both the predicted and extracted feature vectors is used to compute the match score. Evaluation of the match likelihoods requires a correspondence between the unordered predicted and extracted feature sets. This is a bipartite graph matching problem with insertions and deletions; we show that the optimal match can be found in polynomial time. We extend the results in [1] to estimate classification performance for a ten-class SAR ATR problem. We consider a synthetic classification problem to validate the classifier and to address resolution and robustness questions in the likelihood scoring method. Specifically, we consider performance versus SAR resolution, performance degradation due to mismatch between the assumed and actual feature statistics, and performance impact of correlated feature attributes.
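
    The abstract states that the correspondence problem (bipartite matching with insertions and deletions) can be solved optimally in polynomial time, but not how. One standard reduction, sketched here as an assumption rather than the authors' method, pads the cost matrix with "missed" and "spurious" slots and solves the resulting square assignment problem with the O(n^3) Hungarian algorithm. The squared-distance pair cost and the fixed miss/spurious penalties are hypothetical stand-ins for the paper's Bayes likelihood terms (for example, negative log-likelihoods); `match_features` and its parameters are illustrative names.

```python
import math

BIG = 1e12  # finite stand-in for a forbidden (infinite-cost) pairing


def hungarian(cost):
    """Minimum-cost perfect assignment for a square matrix, O(n^3).
    Returns assign, where row i is matched to column assign[i]."""
    n = len(cost)
    u = [0.0] * (n + 1)          # row potentials (1-based)
    v = [0.0] * (n + 1)          # column potentials (1-based)
    p = [0] * (n + 1)            # p[j] = row matched to column j (0 = free)
    way = [0] * (n + 1)          # back-pointers for the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (n + 1)
        used = [False] * (n + 1)
        while True:              # grow a shortest-path tree from row i
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:       # reached a free column
                break
        while j0:                # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def match_features(predicted, extracted, pair_cost, miss_cost, spurious_cost):
    """Match two unordered feature sets, allowing predicted features to go
    unobserved (deletions) and extracted features to be unexplained (insertions)."""
    n, m = len(predicted), len(extracted)
    size = n + m
    C = [[BIG] * size for _ in range(size)]
    for i in range(n):
        for j in range(m):
            C[i][j] = pair_cost(predicted[i], extracted[j])
        C[i][m + i] = miss_cost          # predicted feature i is missed
    for j in range(m):
        C[n + j][j] = spurious_cost      # extracted feature j is spurious
        for k in range(n):
            C[n + j][m + k] = 0.0        # dummy-to-dummy slots cost nothing
    assign = hungarian(C)
    pairs = [(i, assign[i]) for i in range(n) if assign[i] < m]
    missed = [i for i in range(n) if assign[i] >= m]
    spurious = [j for j in range(m) if assign[n + j] == j]
    total = sum(C[r][assign[r]] for r in range(size))
    return pairs, missed, spurious, total


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


pairs, missed, spurious, total = match_features(
    [(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0), (10.0, 10.0)], sq_dist, 4.0, 4.0)
print(pairs, missed, spurious, round(total, 2))  # [(0, 0)] [1] [1] 8.01
```

    Pairing the far-apart features would cost 50, more than the combined miss and spurious penalties (8), so the optimum leaves them unmatched.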

  6. Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging

    PubMed Central

    Fernandes, Armando; Vinga, Susana

    2016-01-01

    The focus of this article is the improvement of machine learning models capable of predicting protein expression levels from their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models, and SVR yields better predictions than PLS. It is shown that the models' predictive ability can be improved by using two additional input features, codon identification number and codon count, besides the already used codon bias and minimum free energy. In addition, applying ensemble averaging to the SVR or PLS models improves the results further. The present work motivates testing different ensembles and features with the aim of improving prediction models whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and the enhancement of protein expression levels in synthetic biology problems. PMID:26934190
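
    Ensemble averaging, as mentioned in the abstract, predicts with the mean of several independently trained models. The sketch below is a dependency-free illustration only: it substitutes one-feature least-squares lines for the paper's SVR and PLS models and trains each member on a bootstrap resample (an assumption; the abstract does not say how ensemble members were produced). `fit_ensemble` and `predict_ensemble` are hypothetical names, and the data are synthetic.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one input feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0        # degenerate resample: flat line
    return my - b * mx, b


def fit_ensemble(xs, ys, n_models=10, seed=0):
    """Fit each ensemble member on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    idx = range(len(xs))
    models = []
    for _ in range(n_models):
        sample = [rng.choice(idx) for _ in idx]
        models.append(fit_line([xs[i] for i in sample], [ys[i] for i in sample]))
    return models


def predict_ensemble(models, x):
    """Ensemble averaging: the mean of the members' predictions."""
    return sum(a + b * x for a, b in models) / len(models)


rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
models = fit_ensemble(xs, ys, n_models=25)
print(predict_ensemble(models, 2.5))  # near 1 + 2 * 2.5 = 6, up to noise
```

    Averaging leaves the bias of the members unchanged but reduces the variance contributed by any single fit, which is the usual reason ensembles help when individual models are noisy.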

  7. Clinical and molecular features of young-onset colorectal cancer

    PubMed Central

    Ballester, Veroushka; Rashtak, Shahrooz; Boardman, Lisa

    2016-01-01

    Colorectal cancer (CRC) is one of the leading causes of cancer-related mortality worldwide. Although young-onset CRC raises the possibility of a hereditary component, hereditary CRC syndromes explain only a minority of young-onset CRC cases. There is evidence to suggest that young-onset CRC has a different molecular profile than late-onset CRC. While the pathogenesis of young-onset CRC is well characterized in individuals with an inherited CRC syndrome, knowledge regarding the molecular features of sporadic young-onset CRC is limited. Understanding the molecular mechanisms of young-onset CRC can help us tailor specific screening and management strategies. While the incidence of late-onset CRC has been decreasing, mainly attributed to an increase in CRC screening, the incidence of young-onset CRC is increasing. Differences in the molecular biology of these tumors and low suspicion of CRC in young symptomatic individuals may be possible explanations. Currently there is no evidence that screening of average-risk individuals younger than 50 years will translate into early detection or increased survival. However, a better understanding of the underlying molecular mechanisms of young-onset CRC could help us tailor specific screening and management strategies. The purpose of this review is to evaluate current knowledge about young-onset CRC, its clinicopathologic features, and the newly recognized molecular alterations involved in tumor progression. PMID:26855533

  8. Sequence-based feature prediction and annotation of proteins

    PubMed Central

    Juncker, Agnieszka S; Jensen, Lars J; Pierleoni, Andrea; Bernsel, Andreas; Tress, Michael L; Bork, Peer; von Heijne, Gunnar; Valencia, Alfonso; Ouzounis, Christos A; Casadio, Rita; Brunak, Søren

    2009-01-01

    A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome. PMID:19226438

  9. Prediction of acoustic feature parameters using myoelectric signals.

    PubMed

    Lee, Ki-Seung

    2010-07-01

    It is well-known that a clear relationship exists between human voices and myoelectric signals (MESs) from the area of the speaker's mouth. In this study, we utilized this information to implement a speech synthesis scheme in which MES alone was used to predict the parameters characterizing the vocal-tract transfer function of specific speech signals. Several feature parameters derived from MES were investigated to find the optimal feature for maximization of the mutual information between the acoustic and the MES features. After the optimal feature was determined, an estimation rule for the acoustic parameters was proposed, based on a minimum mean square error (MMSE) criterion. In a preliminary study, 60 isolated words were used for both objective and subjective evaluations. The results showed that the average Euclidean distance between the original and predicted acoustic parameters was reduced by about 30% compared with the average Euclidean distance of the original parameters. The intelligibility of the synthesized speech signals using the predicted features was also evaluated. A word-level identification ratio of 65.5% and a syllable-level identification ratio of 73% were obtained through a listening test. PMID:20172775
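
    The abstract describes an estimation rule for acoustic parameters based on a minimum mean square error (MMSE) criterion, without giving its form. A common concrete instance, assumed here for illustration, is the linear MMSE estimator x_hat = mu_x + C_xy C_yy^-1 (y - mu_y), where y is the MES-derived feature and x the acoustic parameter. The sketch shows the scalar case fitted from paired training samples; `fit_lmmse` is a hypothetical name and the toy data are not from the study. The paper's actual rule may be nonlinear (the unrestricted MMSE estimator is the conditional mean E[x | y]).

```python
def fit_lmmse(ys, xs):
    """Fit the scalar linear MMSE estimator x_hat = mu_x + g * (y - mu_y),
    with gain g = Cov(x, y) / Var(y), from paired samples.
    ys: observed (e.g. MES-derived) features; xs: target (e.g. acoustic) values."""
    n = len(ys)
    mu_y, mu_x = sum(ys) / n, sum(xs) / n
    cov_xy = sum((y - mu_y) * (x - mu_x) for y, x in zip(ys, xs)) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    gain = cov_xy / var_y
    return lambda y: mu_x + gain * (y - mu_y)


est = fit_lmmse([0, 1, 2, 3], [1, 3, 5, 7])  # toy pairs with x = 2y + 1
print(est(3))  # 7.0
```

    For vector-valued parameters the gain becomes the matrix C_xy C_yy^-1; the scalar version keeps the sketch free of linear-algebra dependencies.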

  10. Molecular classification and prediction in gastric cancer

    PubMed Central

    Lin, Xiandong; Zhao, Yongzhong; Song, Won-min; Zhang, Bin

    2015-01-01

    Gastric cancer, a highly heterogeneous disease, is the second leading cause of cancer death and the fourth most common cancer globally, with East Asia accounting for more than half of cases annually. Alongside TNM staging, clinical practice in gastric cancer relies on two well-recognized classification systems: the Lauren classification, which subdivides gastric adenocarcinoma into intestinal and diffuse types, and the alternative World Health Organization system, which divides gastric cancer into papillary, tubular, mucinous (colloid), and poorly cohesive carcinomas. Both classification systems enable a better understanding of the histogenesis and biology of gastric cancer, yet they have limited clinical utility in guiding patient therapy because of the molecular heterogeneity of gastric cancer. Unprecedented whole-genome-scale data have been catalyzing and advancing the molecular subtyping approach. Here we cataloged and compared published gene expression profiling signatures in gastric cancer. We summarized recent integrated genomic characterizations of gastric cancer based on additional data on somatic mutation, chromosomal instability, Epstein-Barr virus (EBV) infection, and DNA methylation. We identified consensus patterns across these signatures, along with the underlying molecular pathways and biological functions. The molecular subtyping of gastric adenocarcinoma and the development of integrated genomics approaches for clinical applications, such as predicting the response to clinical intervention, emerge as an essential phase toward personalized medicine in treating gastric cancer. PMID:26380657

  11. Volumetric feature extraction and visualization of tomographic molecular imaging.

    PubMed

    Bajaj, Chandrajit; Yu, Zeyun; Auer, Manfred

    2003-01-01

    Electron tomography is useful for studying large macromolecular complexes within their cellular context. The associated problems include crowding and complexity. Data exploration and 3D visualization of complexes require rendering of tomograms as well as extraction of all features of interest. We present algorithms for fully automatic boundary segmentation and skeletonization, and demonstrate their applications in feature extraction and visualization of cell and molecular tomographic imaging. We also introduce an interactive volumetric exploration and visualization tool (Volume Rover), which encapsulates implementations of the above volumetric image processing algorithms, and additionally uses efficient multi-resolution interactive geometry and volume rendering techniques for interactive visualization. PMID:14643216

  12. Exploiting Information Diffusion Feature for Link Prediction in Sina Weibo

    NASA Astrophysics Data System (ADS)

    Li, Dong; Zhang, Yongchao; Xu, Zhiming; Chu, Dianhui; Li, Sheng

    2016-01-01

    The rapid development of online social networks (e.g., Twitter and Facebook) has promoted research related to social networks, in which link prediction is a key problem. Although numerous attempts have been made at link prediction based on network structure, node attributes and so on, few current studies have considered the impact of information diffusion on link creation and prediction. This paper mainly addresses Sina Weibo, which is the largest microblog platform with Chinese characteristics, proposes the hypothesis that information diffusion influences link creation, and verifies the hypothesis based on real data analysis. We also detect an important feature from the information diffusion process, which is used to improve link prediction performance. Finally, experimental results on a Sina Weibo dataset demonstrate the effectiveness of our methods.
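
    The abstract contrasts its diffusion-based feature with link prediction based on network structure. A classic structural baseline of that kind is the common-neighbours score: an unlinked pair is ranked higher the more neighbours it shares. The sketch below shows only that baseline; it is not the paper's diffusion feature, which the abstract does not specify, and the toy graph and `common_neighbor_scores` are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score every currently unlinked node pair by its number of shared neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = common_neighbor_scores(edges)
print(max(scores, key=scores.get))  # ('a', 'd'), tied with ('b', 'c')
```

    A diffusion-derived feature, such as the one the paper proposes, would typically be added alongside scores like this as an extra input to the predictor.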

  13. Nonstationary time series prediction combined with slow feature analysis

    NASA Astrophysics Data System (ADS)

    Wang, G.; Chen, X.

    2015-07-01

    Almost all climate time series have some degree of nonstationarity due to external driving forces perturbing the observed system. Therefore, these external driving forces should be taken into account when modeling climate dynamics. This paper presents a new technique for obtaining the driving forces of a time series with the slow feature analysis (SFA) approach and then introduces them into a predictive model to predict nonstationary time series. The basic idea of the technique is to treat the driving forces as state variables and incorporate them into the predictive model. Experiments using a modified logistic time series and winter ozone data from Arosa, Switzerland, were conducted to test the model. The results showed improved prediction skill.
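
    Slow feature analysis extracts, from a multivariate signal, the linear combination that varies most slowly in time under a zero-mean, unit-variance constraint; in the paper this slow component serves as the estimated driving force. The sketch is a generic 2-D linear SFA (centre, whiten, then take the direction whose first difference has minimal variance). The paper's construction of a multivariate signal from a scalar series and its predictive model are not reproduced, and the mixed sinusoids used as input are a hypothetical test signal.

```python
import math


def eig_sym2(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]], ascending eigenvalues."""
    mean, r = (a + c) / 2, math.hypot((a - c) / 2, b)
    lo, hi = mean - r, mean + r
    v1, v2 = (b, lo - a), (lo - c, b)     # both solve (M - lo*I) v = 0
    vx, vy = v1 if math.hypot(*v1) >= math.hypot(*v2) else v2
    norm = math.hypot(vx, vy)
    if norm == 0.0:                       # M is a multiple of the identity
        vx, vy, norm = 1.0, 0.0, 1.0
    vx, vy = vx / norm, vy / norm
    return (lo, (vx, vy)), (hi, (-vy, vx))  # second eigenvector is orthogonal


def linear_sfa_2d(xs, ys):
    """Slowest linear feature of a 2-D time series (linear SFA)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [v - mx for v in xs]
    y = [v - my for v in ys]
    cxx = sum(v * v for v in x) / n
    cxy = sum(p * q for p, q in zip(x, y)) / n
    cyy = sum(v * v for v in y) / n
    # whitening matrix W = C^(-1/2) = sum_k e_k e_k^T / sqrt(lambda_k)
    W = [[0.0, 0.0], [0.0, 0.0]]
    for lam, (ex, ey) in eig_sym2(cxx, cxy, cyy):
        s = 1.0 / math.sqrt(lam)
        W[0][0] += s * ex * ex
        W[0][1] += s * ex * ey
        W[1][0] += s * ey * ex
        W[1][1] += s * ey * ey
    z1 = [W[0][0] * p + W[0][1] * q for p, q in zip(x, y)]
    z2 = [W[1][0] * p + W[1][1] * q for p, q in zip(x, y)]
    # covariance of the first differences (discrete time derivative)
    d1 = [z1[t + 1] - z1[t] for t in range(n - 1)]
    d2 = [z2[t + 1] - z2[t] for t in range(n - 1)]
    m = n - 1
    a = sum(v * v for v in d1) / m
    b = sum(p * q for p, q in zip(d1, d2)) / m
    c = sum(v * v for v in d2) / m
    (_, (wx, wy)), _ = eig_sym2(a, b, c)  # smallest eigenvalue = slowest direction
    return [wx * p + wy * q for p, q in zip(z1, z2)]


def pearson(u, w):
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    num = sum((p - mu) * (q - mw) for p, q in zip(u, w))
    den = math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mw) ** 2 for q in w))
    return num / den


N = 1000
slow = [math.sin(2 * math.pi * t / 200) for t in range(N)]  # hypothetical driving force
fast = [math.sin(2 * math.pi * t / 7) for t in range(N)]
xs = [s + 0.5 * f for s, f in zip(slow, fast)]
ys = [0.3 * s + f for s, f in zip(slow, fast)]
feature = linear_sfa_2d(xs, ys)
print(abs(pearson(feature, slow)))  # close to 1
```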

  14. Exploiting Information Diffusion Feature for Link Prediction in Sina Weibo

    PubMed Central

    Li, Dong; Zhang, Yongchao; Xu, Zhiming; Chu, Dianhui; Li, Sheng

    2016-01-01

    The rapid development of online social networks (e.g., Twitter and Facebook) has promoted research related to social networks, in which link prediction is a key problem. Although numerous attempts have been made at link prediction based on network structure, node attributes and so on, few current studies have considered the impact of information diffusion on link creation and prediction. This paper mainly addresses Sina Weibo, which is the largest microblog platform with Chinese characteristics, proposes the hypothesis that information diffusion influences link creation, and verifies the hypothesis based on real data analysis. We also detect an important feature from the information diffusion process, which is used to improve link prediction performance. Finally, experimental results on a Sina Weibo dataset demonstrate the effectiveness of our methods. PMID:26817436

  15. Exploiting Information Diffusion Feature for Link Prediction in Sina Weibo.

    PubMed

    Li, Dong; Zhang, Yongchao; Xu, Zhiming; Chu, Dianhui; Li, Sheng

    2016-01-01

    The rapid development of online social networks (e.g., Twitter and Facebook) has promoted research related to social networks, in which link prediction is a key problem. Although numerous attempts have been made at link prediction based on network structure, node attributes and so on, few current studies have considered the impact of information diffusion on link creation and prediction. This paper mainly addresses Sina Weibo, which is the largest microblog platform with Chinese characteristics, proposes the hypothesis that information diffusion influences link creation, and verifies the hypothesis based on real data analysis. We also detect an important feature from the information diffusion process, which is used to improve link prediction performance. Finally, experimental results on a Sina Weibo dataset demonstrate the effectiveness of our methods. PMID:26817436

  16. Common features of microRNA target prediction tools.

    PubMed

    Peterson, Sarah M; Thompson, Jeffrey A; Ufkin, Melanie L; Sathyanarayana, Pradeep; Liaw, Lucy; Congdon, Clare Bates

    2014-01-01

    The human genome encodes over 1800 microRNAs (miRNAs), which are short non-coding RNA molecules that regulate gene expression post-transcriptionally. Because one miRNA can potentially target multiple gene transcripts, miRNAs are recognized as a major mechanism regulating gene expression and mRNA translation. Computational prediction of miRNA targets is a critical initial step in identifying miRNA:mRNA target interactions for experimental validation. The available tools for miRNA target prediction encompass a range of computational approaches, from the modeling of physical interactions to the incorporation of machine learning. This review provides an overview of the major computational approaches to miRNA target prediction. Our discussion highlights three tools for their ease of use, reliance on relatively recent versions of miRBase, and range of capabilities: DIANA-microT-CDS, miRanda-mirSVR, and TargetScan. Across all miRNA target prediction tools, four main aspects of the miRNA:mRNA target interaction emerge as common features on which most target prediction is based: seed match, conservation, free energy, and site accessibility. This review explains these features and identifies how they are incorporated into currently available target prediction tools. MiRNA target prediction is a dynamic field with increasing attention on the development of new analysis tools. This review attempts to provide a comprehensive assessment of these tools in a manner that is accessible across disciplines. Understanding the basis of these prediction methodologies will aid users in selecting appropriate tools and interpreting their output. PMID:24600468
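
    Of the four common features named above, seed match is the simplest to illustrate: the miRNA "seed" is nucleotides 2-8 from its 5' end, and a canonical target site in the mRNA is the Watson-Crick reverse complement of that seed. The sketch below scans a 3' UTR for the site types commonly called 8mer, 7mer-m8 and 7mer-A1 (TargetScan-style nomenclature). It ignores conservation, free energy, accessibility and G:U wobble pairs, so it is a sketch of one feature only, not any tool's full algorithm; `find_seed_sites` is a hypothetical name and the UTR fragment is made up.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Canonical seed-match sites of a miRNA in a 3' UTR (both written 5'->3')."""
    m8 = revcomp(mirna[1:8])            # pairs with miRNA nucleotides 2-8
    a1 = revcomp(mirna[1:7]) + "A"      # pairs with 2-7, plus an A opposite position 1
    sites = []
    for i in range(len(utr) - 6):
        window = utr[i:i + 7]
        if window == m8:
            kind = "8mer" if utr[i + 7:i + 8] == "A" else "7mer-m8"
            sites.append((i, kind))
        elif window == a1 and (i == 0 or utr[i - 1] != m8[0]):
            sites.append((i, "7mer-A1"))  # skip the tail of an 8mer already reported
    return sites


let7a = "UGAGGUAGUAGGUUGUAUAGUU"                     # human let-7a-5p mature sequence
utr = "GGCUACCUCAGG" + "UUUACCUCAUU" + "CUACCUCG"   # hypothetical UTR fragment
print(find_seed_sites(let7a, utr))
# [(2, '8mer'), (14, '7mer-A1'), (23, '7mer-m8')]
```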

  17. Automated Analysis and Classification of Histological Tissue Features by Multi-Dimensional Microscopic Molecular Profiling

    PubMed Central

    Riordan, Daniel P.; Varma, Sushama; West, Robert B.; Brown, Patrick O.

    2015-01-01

    Characterization of the molecular attributes and spatial arrangements of cells and features within complex human tissues provides a critical basis for understanding processes involved in development and disease. Moreover, the ability to automate steps in the analysis and interpretation of histological images that currently require manual inspection by pathologists could revolutionize medical diagnostics. Toward this end, we developed a new imaging approach called multidimensional microscopic molecular profiling (MMMP) that can measure several independent molecular properties in situ at subcellular resolution for the same tissue specimen. MMMP involves repeated cycles of antibody or histochemical staining, imaging, and signal removal, which ultimately can generate information analogous to a multidimensional flow cytometry analysis on intact tissue sections. We performed an MMMP analysis on a tissue microarray containing a diverse set of 102 human tissues using a panel of 15 informative antibody and 5 histochemical stains plus DAPI. Large-scale unsupervised analysis of MMMP data, and visualization of the resulting classifications, identified molecular profiles that were associated with functional tissue features. We then directly annotated H&E images from this MMMP series such that canonical histological features of interest (e.g. blood vessels, epithelium, red blood cells) were individually labeled. By integrating image annotation data, we identified molecular signatures that were associated with specific histological annotations and we developed statistical models for automatically classifying these features. The classification accuracy for automated histology labeling was objectively evaluated using a cross-validation strategy, and significant accuracy (with a median per-pixel rate of 77% per feature from 15 annotated samples) for de novo feature prediction was obtained. These results suggest that high-dimensional profiling may advance the development of computer

  18. Predictive features of breast cancer on Mexican screening mammography patients

    NASA Astrophysics Data System (ADS)

    Rodriguez-Rojas, Juan; Garza-Montemayor, Margarita; Trevino-Alvarado, Victor; Tamez-Pena, José Gerardo

    2013-02-01

    Breast cancer is the most common type of cancer worldwide. In response, breast cancer screening programs are becoming common around the world, and public programs now serve millions of women. These programs are expensive, requiring many specialized radiologists to examine all images. Moreover, many countries, such as Mexico, lack trained radiologists, which is a barrier to decreasing breast cancer mortality and points to the need for a triaging system that prioritizes high-risk cases for prompt interpretation. Therefore, using an image database of Mexican patients, we explored whether high-risk cases can be distinguished using image features. We collected a set of 200 digital screening mammography cases from a hospital in Mexico and assigned low- or high-risk labels according to their BI-RADS scores. Breast tissue segmentation was performed using an automatic procedure. Image features were computed only within the segmented region of each view, and the bilateral differences of these features were compared. Predictive combinations of features were chosen using a genetic-algorithm-based feature selection procedure. The best model found was able to classify low-risk and high-risk cases with an area under the ROC curve of 0.88 in a 150-fold cross-validation test. The selected features were associated with differences in signal distribution and tissue shape between bilateral views. The model can be used to automatically identify high-risk cases and trigger the necessary measures to provide prompt treatment.
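
    The reported area under the ROC curve (0.88) has a direct probabilistic reading: it is the chance that a randomly chosen high-risk case receives a higher model score than a randomly chosen low-risk case, with ties counted as one half (the Mann-Whitney formulation). The sketch computes AUC that way from held-out scores; the scores and labels are toy values, not the study's data, and coding high risk as 1 is an assumption.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2; equals the Mann-Whitney U statistic / (n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75: 3 of 4 pairs ranked correctly
```

    The pairwise loop is quadratic, which is fine for a few hundred cases; for large sets the same value follows from ranking all scores once.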

  19. Quantitative imaging features to predict cancer status in lung nodules

    NASA Astrophysics Data System (ADS)

    Liu, Ying; Balagurunathan, Yoganand; Atwater, Thomas; Antic, Sanja; Li, Qian; Walker, Ronald; Smith, Gary T.; Massion, Pierre P.; Schabath, Matthew B.; Gillies, Robert J.

    2016-03-01

    Background: We propose a systematic methodology to quantify incidentally identified lung nodules by scoring observed radiological traits on a point scale. A classification model built on these quantitative traits was used to predict cancer status. Materials and Methods: We used low-dose computed tomography (LDCT) images from 102 patients for this study; 24 semantic traits were systematically scored from each image. We built a machine learning classifier in a cross-validation setting to find the most predictive imaging features for differentiating malignant from benign lung nodules. Results: The best feature triplet for discriminating malignancy was based on long axis, concavity and lymphadenopathy, with an average AUC of 0.897 (accuracy 76.8%, sensitivity 64.3%, specificity 90%). A similar semantic triplet optimized on sensitivity/specificity (Youden's J index) included long axis, vascular convergence and lymphadenopathy and had an average AUC of 0.875 (accuracy 81.7%, sensitivity 76.2%, specificity 95%). Conclusions: Quantitative radiological image traits can differentiate malignant from benign lung nodules. These semantic features, along with size measurement, enhance prediction accuracy.
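
    Youden's J index, used above as an optimization criterion, is J = sensitivity + specificity - 1, and it ranges from 0 (no better than chance) to 1 (perfect separation). The sketch shows one common use, choosing the score cut-off that maximizes J, together with the accuracy, sensitivity and specificity it implies. This is a generic illustration; the study applied J to select feature triplets, and the scores, labels and function names below are hypothetical.

```python
def confusion_metrics(scores, labels, threshold):
    """Sensitivity, specificity and accuracy when score >= threshold predicts malignant (1)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / len(pairs)
    return sens, spec, acc


def youden_threshold(scores, labels):
    """Return (cut-off, J) maximizing Youden's J = sensitivity + specificity - 1."""
    best = None
    for t in sorted(set(scores)):
        sens, spec, _ = confusion_metrics(scores, labels, t)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best


scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 1, 0, 1]
print(youden_threshold(scores, labels))  # cut-off 0.35, J about 0.67
```

    Accuracy always lies between sensitivity and specificity, because it is their average weighted by the numbers of positive and negative cases.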

  20. Application of optimal prediction to molecular dynamics

    SciTech Connect

    Barber IV, John Letherman

    2004-12-01

    Optimal prediction is a general system reduction technique for large sets of differential equations. In this method, which was devised by Chorin, Hald, Kast, Kupferman, and Levy, a projection operator formalism is used to construct a smaller system of equations governing the dynamics of a subset of the original degrees of freedom. This reduced system consists of an effective Hamiltonian dynamics, augmented by an integral memory term and a random noise term. Molecular dynamics is a method for simulating large systems of interacting fluid particles. In this thesis, I construct a formalism for applying optimal prediction to molecular dynamics, producing reduced systems from which the properties of the original system can be recovered. These reduced systems require significantly less computational time than the original system. I initially consider first-order optimal prediction, in which the memory and noise terms are neglected. I construct a pair approximation to the renormalized potential, and ignore three-particle and higher interactions. This produces a reduced system that correctly reproduces static properties of the original system, such as energy and pressure, at low-to-moderate densities. However, it fails to capture dynamical quantities, such as autocorrelation functions. I next derive a short-memory approximation, in which the memory term is represented as a linear frictional force with configuration-dependent coefficients. This allows the use of a Fokker-Planck equation to show that, in this regime, the noise is δ-correlated in time. This linear friction model reproduces not only the static properties of the original system, but also the autocorrelation functions of dynamical variables.
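
    In the short-memory approximation described above, the reduced dynamics take a Langevin form: a linear frictional force plus δ-correlated (white) noise. The sketch illustrates that structure for a single unit-mass velocity with a constant friction coefficient gamma, a simplifying assumption (the thesis uses configuration-dependent coefficients). The noise strength is set by the standard fluctuation-dissipation relation, also an assumption here, and the exact Ornstein-Uhlenbeck update avoids time-step bias. The equipartition check, mean v^2 close to kT, shows that the friction and noise are balanced.

```python
import math
import random


def langevin_velocities(gamma, kT, dt, n_steps, seed=0):
    """Velocity of a unit-mass particle under friction -gamma*v plus white noise,
    advanced with the exact Ornstein-Uhlenbeck update (fluctuation-dissipation
    fixes the noise amplitude so the stationary variance is kT)."""
    rng = random.Random(seed)
    decay = math.exp(-gamma * dt)
    kick = math.sqrt(kT * (1.0 - decay * decay))
    v, out = 0.0, []
    for _ in range(n_steps):
        v = decay * v + kick * rng.gauss(0.0, 1.0)
        out.append(v)
    return out


vs = langevin_velocities(gamma=1.0, kT=2.0, dt=0.05, n_steps=200_000)
tail = vs[1000:]                        # discard the initial transient
mean_v2 = sum(v * v for v in tail) / len(tail)
print(round(mean_v2, 2))                # close to kT = 2.0 (equipartition, unit mass)
```

    Removing the noise term while keeping the friction would drain kinetic energy, which is one way to see why the memory and noise terms have to be approximated together.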

  1. Proteomic Features Predict Seroreactivity against Leptospiral Antigens in Leptospirosis Patients

    PubMed Central

    2015-01-01

    With increasing efficiency, accuracy, and speed we can access complete genome sequences from thousands of infectious microorganisms; however, the ability to predict antigenic targets of the immune system based on amino acid sequence alone is still needed. Here we use a Leptospira interrogans microarray expressing 91% (3359) of all leptospiral predicted ORFs (3667) and make an empirical accounting of all antibody reactive antigens recognized in sera from naturally infected humans; 191 antigens elicited an IgM or IgG response, representing 5% of the whole proteome. We classified the reactive antigens into 26 annotated COGs (clusters of orthologous groups), 26 JCVI Mainrole annotations, and 11 computationally predicted proteomic features. Altogether, 14 significantly enriched categories were identified, which are associated with immune recognition including mass spectrometry evidence of in vitro expression and in vivo mRNA up-regulation. Together, this group of 14 enriched categories accounts for just 25% of the leptospiral proteome but contains 50% of the immunoreactive antigens. These findings are consistent with our previous studies of other Gram-negative bacteria. This genome-wide approach provides an empirical basis to predict and classify antibody reactive antigens based on structural, physical–chemical, and functional proteomic features and a framework for understanding the breadth and specificity of the immune response to L. interrogans. PMID:25358092
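
    The abstract reports that the 14 enriched categories cover about 25% of the proteome but about 50% of the immunoreactive antigens, roughly a twofold enrichment. A one-sided hypergeometric test is one common way to attach a p-value to such an overrepresentation; the abstract does not say which test the authors used. The counts below are rounded from the quoted percentages and take the 3359 arrayed ORFs as the background, both assumptions, so the output is illustrative only. `hypergeom_sf` and `enrichment` are hypothetical helper names.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K in category, n drawn)."""
    hi = min(K, n)
    num = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return num / comb(N, n)              # exact big-integer ratio, then float


def enrichment(k, N, K, n):
    """Fold enrichment of a category among n hits, and its one-sided p-value."""
    fold = (k / n) / (K / N)
    return fold, hypergeom_sf(k, N, K, n)


# Rounded illustration: ~25% of 3359 arrayed ORFs holding ~50% of 191 antigens
fold, p = enrichment(k=96, N=3359, K=840, n=191)
print(round(fold, 2), p < 1e-6)  # 2.01 True
```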

  2. Identifying predictive morphologic features of malignancy in eyelid lesions

    PubMed Central

    Leung, Christina; Johnson, Davin; Pang, Renee; Kratky, Vladimir

    2015-01-01

    Objective: To determine the features of eyelid lesions most predictive of malignancy, and to design a key to assist general practitioners in triaging such lesions. Design: Prospective observational study. Setting: Department of Ophthalmology at Queen’s University in Kingston, Ont. Participants: A total of 199 consecutive periocular lesions requiring biopsy or excision were included. Main outcome measures: First, potential features suggestive of malignancy in eyelid lesions were identified through a survey sent to Canadian oculoplastic surgeons. The sensitivity, specificity, and odds ratios (ORs) of these features were then determined using 199 consecutive photographed eyelid lesions of patients who presented to the Department of Ophthalmology and underwent biopsy or excision. A triage key was then created based on the features with the highest ORs, and it was pilot-tested by a group of medical students. Results: Of the 199 lesions included, 161 (80.9%) were benign and 38 (19.1%) were malignant. The 3 features with the highest ORs in predicting malignancy were infiltration (OR = 18.2, P < .01), ulceration (OR = 14.7, P < .01), and loss of eyelashes (OR = 6.0, P < .01). The acronym LUI (loss of eyelashes, ulceration, infiltration) was created to assist in memory recall. After watching a video describing the LUI triage key, the mean total score of a group of medical students for correctly identifying malignant lesions increased from 46% to 70% (P < .001). Conclusion: Differentiating benign from malignant eyelid lesions can be difficult even for experienced physicians. The LUI triage key provides physicians with an evidence-based, easy-to-remember system for assisting in the triaging of these lesions. PMID:25756148
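
    Each candidate feature in this study was characterized by its sensitivity, specificity and odds ratio (OR) against the biopsy diagnosis. The sketch computes all three from a 2x2 table, with a 95% confidence interval for the OR by Woolf's log method (an assumption; the abstract does not state how intervals or P values were obtained). The counts are hypothetical, chosen only to match the study's totals of 38 malignant and 161 benign lesions, and are not the paper's data. The Woolf interval is undefined if any cell is zero.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """2x2 table:  feature present -> a malignant, b benign
                   feature absent  -> c malignant, d benign
    Returns OR, its Woolf (log-scale) confidence interval, sensitivity, specificity."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    sens = a / (a + c)   # fraction of malignant lesions showing the feature
    spec = d / (b + d)   # fraction of benign lesions lacking the feature
    return or_, (lo, hi), sens, spec


or_, ci, sens, spec = odds_ratio(a=20, b=10, c=18, d=151)  # hypothetical counts
print(round(or_, 2), round(sens, 3), round(spec, 3))       # 16.78 0.526 0.938
```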

  3. Predicting Malignancy in Thyroid Nodules: Molecular Advances

    PubMed Central

    Melck, Adrienne L.; Yip, Linwah

    2016-01-01

    Over the last several years, a clearer understanding of the genetic alterations underlying thyroid carcinogenesis has developed. This knowledge can be used to tackle one of the greatest challenges facing thyroidologists: management of the indeterminate thyroid nodule. Despite the accuracy of fine-needle aspiration cytology, many patients undergo invasive surgery in order to determine whether a follicular or Hürthle cell neoplasm is malignant, and better diagnostic tools are required. A number of biomarkers have recently been studied and show promise in this setting. In particular, BRAF, RAS, PAX8-PPARγ, microRNAs and loss of heterozygosity have each been demonstrated to be useful molecular tools for predicting malignancy and can thereby guide decisions regarding surgical management of nodular thyroid disease. This review summarizes the current literature surrounding each of these markers and highlights our institution’s prospective analysis of these markers and their subsequent incorporation into our management algorithms for thyroid nodules. PMID:21818817

  4. A Prediction Model for Membrane Proteins Using Moments Based Features.

    PubMed

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The cell is the fundamental unit of the human body. Encapsulated within the cell are many minute entities and molecules, which are protected by the cell membrane. The proteins associated with this lipid bilayer membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exert their effects in cellular activities both inside and outside the cell. According to scientists in pharmaceutical organizations, membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a multilayer neural network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690
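
    The abstract says only that statistical moments were used to extract features. One recipe, assumed here for illustration, writes the residue sequence row by row into the smallest square matrix (zero-padded) and computes raw image moments M_pq = sum i^p j^q B[i][j] and central moments about the centroid. The alphabetical residue encoding, the padding and the maximum order are all assumptions, and `sequence_to_matrix` and `moments` are hypothetical names; the resulting values would form the input vector for a classifier such as the paper's neural network.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # assumed encoding: A=1, C=2, ..., Y=20


def sequence_to_matrix(seq):
    """Place residue codes row by row in the smallest square matrix, padding with 0."""
    codes = [AMINO.index(ch) + 1 for ch in seq]
    k = math.isqrt(len(codes) - 1) + 1 if codes else 0
    codes += [0] * (k * k - len(codes))
    return [codes[r * k:(r + 1) * k] for r in range(k)]


def moments(B, max_order=2):
    """Raw moments M_pq and central moments mu_pq of matrix B for p + q <= max_order."""
    cells = [(i, j, B[i][j]) for i in range(len(B)) for j in range(len(B[i]))]
    orders = [(p, q) for p in range(max_order + 1) for q in range(max_order + 1 - p)]
    raw = {(p, q): sum(i ** p * j ** q * b for i, j, b in cells) for p, q in orders}
    xbar = raw[(1, 0)] / raw[(0, 0)]     # centroid row coordinate
    ybar = raw[(0, 1)] / raw[(0, 0)]     # centroid column coordinate
    central = {(p, q): sum((i - xbar) ** p * (j - ybar) ** q * b for i, j, b in cells)
               for p, q in orders}
    return raw, central


raw, central = moments(sequence_to_matrix("MKVLAAGIV"))
print(sorted(raw.items()))
```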

  5. A Prediction Model for Membrane Proteins Using Moments Based Features

    PubMed Central

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The cell is the fundamental unit of the human body. Encapsulated within the cell are many minute entities and molecules, which are protected by the cell membrane. The proteins associated with this lipid bilayer membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exert their effects in cellular activities both inside and outside the cell. According to scientists in pharmaceutical organizations, membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a multilayer neural network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  6. Characterization of statistical features for plant microRNA prediction

    PubMed Central

    2011-01-01

    Background: Several tools are available to identify miRNAs from deep-sequencing data; however, only a few of them, like miRDeep, can identify novel miRNAs and are also available as standalone applications. Given the differences between plant and animal miRNAs, particularly in the distribution of hairpin length and the nature of complementarity with the duplex partner (miRNA star), the underlying statistical features of miRDeep and of other tools using similar features are likely to be affected. Results: The potential effects on features such as minimum free energy, stability of secondary structures, and excision length were examined, and the parameters of those displaying sizable changes were estimated for plant-specific miRNAs. We found that most of these features acquired a new set of values or distributions for plant-specific miRNAs. While the length of conserved positions (nucleus) in mature miRNAs was relatively longer in plants, the difference in the distribution of minimum free energy between real and background hairpins was marginal. However, the choice of source (species) of background sequences was found to affect both the minimum free energy and miRNA hairpin stability. The new parameters were tested on an Illumina dataset from maize seedlings, and the results were compared with those obtained using default parameters. The newly parameterized model was found to have much improved specificity and sensitivity over its default counterpart. Conclusions: In summary, the present study reports the behavior of a few general and tool-specific statistical features for improving the prediction accuracy of plant miRNAs from deep-sequencing data. PMID:21324149

  7. Universal molecular features of refractory dissolved organic matter in fresh- and seawater

    NASA Astrophysics Data System (ADS)

    Dittmar, T.; Blasius, B.; Steinbrink, C.; Feenders, C.; Stumm, M.; Christoffers, J.; Niggemann, J.; Gerdts, G.; Osterholz, H.; Seibt, M.; Seidel, M.; Vähätalo, A.

    2012-04-01

    Dissolved organic matter (DOM) is among the largest pools of reduced carbon on Earth's surface. Its molecular structure and the reasons behind its stability in the aquatic environment are unknown. We present a mathematical model that predicts essential molecular features of refractory dissolved organic matter in fresh- and seawater. The model has only eight input variables and can accurately reproduce the presence and abundance of up to 10,000 molecular formulae in aquatic systems. The model was established with ultrahigh-resolution mass spectrometry data of North Pacific deep water (obtained on a 15 Tesla Fourier-transform ion cyclotron resonance mass spectrometer, FT-ICR-MS). We determined the molecular formulae of DOM with help of FT-ICR-MS in >1,000 samples from around the globe, covering a wide variety of open ocean, freshwater and coastal systems. The molecular formulae predicted from our North Pacific deep water model were present in all sea- and fresh water samples. In terrigenous DOM, we detected a second group of compounds that could also accurately be predicted with our model, by using a different set of eight input variables. This exclusively terrigenous compound group was more photo-reactive than the universal compound group. During a two-year sampling period at a continental shelf station, the universal DOM compounds were always present at their predicted abundance. During plankton blooms, additional compounds were produced that did not match our model and that did not persist on a longer term. The universal DOM pattern was also not observed in mesocosm experiments where algae and bacteria blooms were artificially induced. Refractory DOM in any aquatic system not only shares the same molecular formulae at the same relative abundance, but compounds with the same molecular formulae most likely have the same molecular structure, independent of the origin of DOM. Fragmentation experiments in the FT-ICR-MS on a wide range of molecular formulae revealed

  8. Assist feature printability prediction by 3-D resist profile reconstruction

    NASA Astrophysics Data System (ADS)

    Zheng, Xin; Huang, Jensheng; Chin, Fook; Kazarian, Aram; Kuo, Chun-Chieh

    2012-06-01

    Sub-resolution assist features (SRAFs) are powerful tools for enhancing the focus margin of drawn patterns. SRAFs are placed and sized so that they do not print on the wafer, but the larger the SRAF, the more effective it is at enhancing through-focus stability. Whether an SRAF of a given size and location will image on the wafer depends strongly on neighboring patterns, and models of SRAF printability are, at present, unreliable. Model-based SRAF placement has been used to enhance resolution at the 20 nm node and below, with the stringent requirement that inserted SRAFs not image on the wafer. However, despite widespread SRAF use and hard data on SRAF effectiveness, it has been very difficult to develop a process model that accurately predicts under which process conditions an SRAF will image on a wafer. More accurate models of SRAF printing should allow the constraints on model-based SRAF placement to be relaxed, resulting in more effective SRAF placement and broader focus margins. One of the first problems with the concept of SRAF printability is defining what it means for an SRAF to print on a wafer. This is not obvious, because two different printing states exist. The first state is when the SRAF leaves a residue on the wafer. This state can be considered printing, in that photoresist remains on the wafer and may even lift off and cause defects. However, it can also be considered non-printing, because the over-etch in the etch process will generally remove the photoresist residue and the material underneath. The second state is when a pattern is formed and etched into the substrate, at which point the pattern has clearly printed on the wafer. Intermediate states may, of course, also be defined. To be applicable, an SRAF printability model must be able to predict both printing states. In addition, the model must be able to extrapolate to configurations beyond those used to develop it. These model

  9. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

    PubMed

    Funk, Christopher S; Kahanda, Indika; Ben-Hur, Asa; Verspoor, Karin M

    2015-01-01

    Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured-output support vector machine model, GOstruct. We find that even simple literature-based features are useful for predicting human protein function (F-max: Molecular Function = 0.408, Biological Process = 0.461, Cellular Component = 0.608). One advantage of literature features is that they make automated predictions easy to verify. Manual inspection of misclassifications shows that some false-positive predictions could be biologically valid, based on support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help speed up the rate at which proteins are curated. PMID:26005564
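
    F-max, the metric quoted above, is commonly computed in protein function prediction as the best harmonic mean of precision and recall over score thresholds. The sketch below is a minimal Python illustration assuming the protein-centric, CAFA-style convention (precision averaged over proteins with at least one prediction at or above the threshold, recall averaged over all annotated proteins); the function name `f_max`, the dictionary input layout, and the threshold grid are hypothetical and are not taken from GOstruct.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric F-max (a CAFA-style sketch, not GOstruct's own code).

    predictions: dict mapping protein -> dict mapping GO term -> score in [0, 1]
    truth: dict mapping protein -> non-empty set of true GO terms
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]  # hypothetical 0.01 grid
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with >= 1 prediction
                precisions.append(tp / len(predicted))
            recalls.append(tp / len(true_terms))  # recall counts every protein
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


preds = {"P1": {"GO:1": 0.9, "GO:2": 0.4}, "P2": {"GO:3": 0.8}}
gold = {"P1": {"GO:1"}, "P2": {"GO:3", "GO:4"}}
print(round(f_max(preds, gold), 3))  # 0.857
```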

  10. The formation of discrete high velocity molecular features

    NASA Astrophysics Data System (ADS)

    Hartquist, T. W.; Dyson, J. E.

    1987-10-01

    Clumps embedded in a flowing diffuse medium will be dissipated before ram pressure accelerates them substantially. Molecular hydrogen can be accelerated to high speeds by passing through a slow shock leading a shell at the edge of a wind-driven bubble, provided the density of the ambient medium drops rapidly enough for the shell to accelerate subsequently. The shell will be subject to the Rayleigh-Taylor instability, which will drive transonic turbulence but will not initiate the formation of fragments with large density contrasts until the shell reaches speeds high enough to become thermally unstable. This acceleration mechanism explains both the existence of discrete high-velocity features in the H2 emission from CRL 618 and the magnitude of its linewidth. High-velocity water masers may be formed in a similar fashion, whereas Herbig-Haro objects are not.

  11. Clinical and molecular genetic features of ARC syndrome.

    PubMed

    Gissen, Paul; Tee, Louise; Johnson, Colin A; Genin, Emmanuelle; Caliebe, Almuth; Chitayat, David; Clericuzio, Carol; Denecke, Jonas; Di Rocco, Maja; Fischler, Björn; FitzPatrick, David; García-Cazorla, Angeles; Guyot, Delphine; Jacquemont, Sebastien; Koletzko, Sibylle; Leheup, Bruno; Mandel, Hanna; Sanseverino, Maria Teresa Vieira; Houwen, Roderick H J; McKiernan, Patrick J; Kelly, Deirdre A; Maher, Eamonn R

    2006-10-01

    Arthrogryposis, renal dysfunction and cholestasis (ARC) syndrome (MIM 208085) is an autosomal recessive multisystem disorder that may be associated with germline VPS33B mutations. VPS33B is involved in the regulation of vesicular membrane fusion through its interaction with SNARE proteins, and evidence of abnormal polarised membrane protein trafficking has been reported in ARC patients. We characterised the clinical and molecular features of ARC syndrome in order to identify potential genotype-phenotype correlations. The clinical phenotype of 62 ARC syndrome patients was analysed. In addition to the classical features described previously, all patients had severe failure to thrive, which was not adequately explained by the degree of liver disease, and 10% had structural cardiac defects. Almost half of the patients who underwent diagnostic organ biopsy (7/16) developed life-threatening haemorrhage. We found that most patients (9/11) who suffered severe haemorrhage (7 post-biopsy and 4 spontaneous) had normal platelet count and morphology. Germline VPS33B mutations were detected in 28/35 families (48/62 individuals) with ARC syndrome. Several mutations were restricted to specific ethnic groups. For example, the p.Arg438X mutation was common in UK Pakistani families, and haplotyping was consistent with a founder mutation with a most recent common ancestor 900-1,000 years ago. Heterozygosity was found at the VPS33B locus in some cases of ARC, providing the first evidence of a possible second ARC syndrome gene. In conclusion, molecular diagnosis is possible for most children in whom ARC syndrome is suspected, and VPS33B mutation analysis should replace organ biopsy as the first-line diagnostic test for ARC syndrome. PMID:16896922

  12. Molecular Features Related to HIV Integrase Inhibition Obtained from Structure- and Ligand-Based Approaches

    PubMed Central

    de Carvalho, Luciana L.; Maltarollo, Vinícius G.; de Lima, Emmanuela Ferreira; Weber, Karen C.; Honorio, Kathia M.; da Silva, Albérico B. F.

    2014-01-01

    Among the several biological targets for treating AIDS, HIV integrase is a promising enzyme for developing new anti-HIV agents. The aim of this work is to propose a mechanistic interpretation of HIV-1 integrase inhibition and to rationalize the molecular features related to the binding affinity of the studied ligands. A set of 79 HIV-1 integrase inhibitors and their relationship with biological activity were investigated using 2D and 3D QSAR models, docking analysis and DFT studies. Analyses of docking poses and frontier molecular orbitals revealed important features of the main ligand-receptor interactions. 2D and 3D models with good internal consistency, predictive power and stability were obtained in all cases. Significant correlation coefficients (r² = 0.908 and q² = 0.643 for the 2D model; r² = 0.904 and q² = 0.719 for the 3D model) were obtained, indicating the potential of these models for predicting untested compounds. The generated holograms and contribution maps revealed important molecular requirements for HIV-1 IN inhibition and suggested several molecular modifications. The final models, along with information from the molecular orbitals, 2D contribution maps and 3D contour maps, should be useful in designing new inhibitors with increased potency and selectivity within the chemical diversity of the data. PMID:24416129

  13. Molecular features of hypothalamic plaques in Alzheimer's disease.

    PubMed Central

    Standaert, D. G.; Lee, V. M.; Greenberg, B. D.; Lowery, D. E.; Trojanowski, J. Q.

    1991-01-01

    The pathology of Alzheimer's disease (AD) involves subcortical as well as cortical structures. The authors have used immunohistochemical methods to study the molecular composition of AD plaques in the hypothalamus. In contrast to previous studies using histochemical methods, the authors observed large numbers of diffuse plaques in the AD hypothalamus labeled with an antiserum to the beta-amyloid, or A4 peptide, of the beta-amyloid precursor proteins (beta APPs), whereas A4-immunoreactive plaques were uncommon in the hypothalamus of patients without AD. Unlike plaques in the cortex and hippocampus of AD patients, hypothalamic plaques did not contain epitopes corresponding to other regions of the beta APPs, nor did they contain tau-, neurofilament-, or microtubule-associated protein-reactive epitopes, and did not disrupt the neuropil or produce astrogliosis. These findings demonstrate that there are substantial molecular and cellular differences in the pathologic features of AD in the hypothalamus compared with those observed in hippocampal and cortical structures, which may provide insight into the pathogenetic mechanisms of AD. PMID:1653521

  14. Predicting the Presence of Large Fish through Benthic Geomorphic Features

    NASA Astrophysics Data System (ADS)

    Knuth, F.; Sautter, L.; Levine, N. S.; Kracker, L.

    2013-12-01

    Marine Protected Areas (MPAs) are critical in sustaining the resilience of fish populations to commercial fishing operations. Using acoustic data to survey these areas promises efficiency, accuracy, and minimal environmental impact. In July 2013, the NOAA Ship Pisces collected bathymetric, backscatter and water column data for 10 proposed MPA sites along the U.S. Southeast Atlantic continental shelf. A total of 205 km² of seafloor was mapped between Mayport, FL and Wilmington, NC, using the SIMRAD ME70 and EK60 echosounder systems. These data were processed in Caris HIPS, QPS FMGT, MATLAB and ArcGIS. The backscatter and bathymetry reveal various benthic geomorphic features, including flat sand, rippled sand, and rugose hard bottom. Water column data directly above highly rugose hard bottom contain the greatest counts of large fish. Using spatial statistics, such as a geographically weighted regression model, we aim to identify features of the benthic profile, including rugosity, curvature and slope, that can predict the presence of large fish. If successful, this approach will greatly expedite fishery surveys, minimize operational costs and aid timely management decisions.

  15. Enhanced Protein Fold Prediction Method Through a Novel Feature Extraction Technique.

    PubMed

    Wei, Leyi; Liao, Minghong; Gao, Xing; Zou, Quan

    2015-09-01

    Information on protein three-dimensional (3D) structures plays an essential role in molecular biology, cell biology, biomedicine, and drug design. Protein fold prediction is considered an immediate step toward deciphering protein 3D structures and is therefore one of the fundamental problems in structural bioinformatics. Numerous taxonomic methods have recently been developed for protein fold prediction. Unfortunately, although much progress has been made, the overall prediction accuracies achieved by existing taxonomic methods are not satisfactory. To address this problem, we propose a novel taxonomic method, called PFPA, which combines a novel feature set with an ensemble classifier. In particular, sequential evolution information from PSI-BLAST profiles and local and global secondary structure information from PSI-PRED profiles are combined to construct a comprehensive feature set. Experimental results demonstrate that PFPA outperforms state-of-the-art predictors. Specifically, when tested on the independent test set of a benchmark dataset, PFPA achieves an overall accuracy of 73.6%, the highest accuracy reported to date. Moreover, PFPA performs well, without significant performance degradation, on three updated large-scale datasets, indicating its robustness and generalization. A webserver that implements PFPA is currently freely available at http://121.192.180.204:8080/PFPA/Index.html. PMID:26335556

  16. Prediction of cell-penetrating peptides with feature selection techniques.

    PubMed

    Tang, Hua; Su, Zhen-Dong; Wei, Huan-Huan; Chen, Wei; Lin, Hao

    2016-08-12

    Cell-penetrating peptides are a group of peptides that can transport various types of cargo molecules, such as drugs, across the plasma membrane and have been applied in the treatment of various diseases. Accurate prediction of cell-penetrating peptides with bioinformatics methods will therefore accelerate the development of drug delivery systems. This study aims to develop a powerful model for accurately identifying cell-penetrating peptides. First, the peptides were encoded as vectors of the same dimension using dipeptide composition. Second, an analysis of variance (ANOVA)-based technique was used to reduce the dimension of the vectors and select optimized features. Finally, a support vector machine was used to discriminate cell-penetrating peptides from non-cell-penetrating peptides. Five-fold cross-validation showed that the proposed method achieves an overall prediction accuracy of 83.6%. Based on the proposed model, we constructed a free webserver called C2Pred (http://lin.uestc.edu.cn/server/C2Pred). PMID:27291150
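
    To make the first two steps concrete, the following is a minimal Python sketch of a 400-dimensional dipeptide composition encoding and a one-way ANOVA F statistic that could be used to rank each dipeptide feature. Normalizing by the number of adjacent residue pairs, skipping pairs with non-standard residues, and ranking features by F are assumptions about how such a pipeline is typically built, not a description of C2Pred's code; the SVM step is omitted, all names are hypothetical, and the example peptide is arbitrary.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Frequency of each ordered residue pair; pairs with non-standard residues are skipped."""
    seq = seq.upper()
    counts = [0.0] * len(DIPEPTIDES)
    valid = [seq[i:i + 2] for i in range(len(seq) - 1) if seq[i:i + 2] in INDEX]
    for pair in valid:
        counts[INDEX[pair]] += 1
    total = len(valid)
    return [c / total for c in counts] if total else counts


def anova_f(groups):
    """One-way ANOVA F statistic for a single feature; groups holds one list of values per class."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


vec = dipeptide_composition("GRKKRRQ")
print(len(vec), vec[INDEX["KR"]])
```

    In a full pipeline, each of the 400 columns would be scored with `anova_f` (one group of values per class) and only the top-ranked columns passed to the classifier.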

  17. Prognostic Significance and Molecular Features of Colorectal Mucinous Adenocarcinomas

    PubMed Central

    Wang, Mo-Jin; Ping, Jie; Li, Yuan; Holmqvist, Annica; Adell, Gunnar; Arbman, Gunnar; Zhang, Hong; Zhou, Zong-Guang; Sun, Xiao-Feng

    2015-01-01

    Mucinous adenocarcinoma (MC) is a special histological subtype of colorectal adenocarcinoma. The survival of MC patients is controversial, and the prognostic biomarkers of MC remain unclear. This study aimed to analyze the prognostic significance and molecular features of colorectal MC. It included 755,682 and 1001 colorectal cancer (CRC) patients from the Surveillance, Epidemiology, and End Results program (SEER, 1973–2011) and the Linköping Cancer (LC, 1972–2009) databases, respectively. We independently investigated the clinicopathological characteristics, survival, and a variety of molecular features in these 2 databases. MC was found in 9.3% and 9.8% of patients in SEER and LC, respectively. MC was more frequently localized in the right colon than nonmucinous adenocarcinoma (NMC) in both SEER (57.7% vs 37.2%, P < 0.001) and LC (46.9% vs 27.7%, P < 0.001). Colorectal MC patients had significantly worse cancer-specific survival (CSS) than NMC patients (SEER, P < 0.001; LC, P = 0.026), most prominently in stage III (SEER, P < 0.001; LC, P = 0.023). Multivariate survival analysis showed that MC was independently related to poor prognosis in rectal cancer patients (SEER, hazard ratio [HR], 1.076; 95% confidence interval [CI], 1.057–1.096; P < 0.001). In LC, the integrated analysis of genetic and epigenetic features showed that strong expression of PINCH (HR, 3.954; 95% CI, 1.493–10.47; P = 0.013) and weak expression of RAD50 (HR 0.348, 95% CI, 0.106–1.192; P = 0.026) were significantly associated with poor CSS in colorectal MC patients. In conclusion, colorectal MC patients had significantly worse CSS than NMC patients, most prominently in stage III. MC was an independent prognostic factor associated with worse survival in rectal cancer patients. PINCH and RAD50 were prognostic biomarkers for colorectal MC patients. PMID:26705231

  18. Prediction of substrate-enzyme-product interaction based on molecular descriptors and physicochemical properties.

    PubMed

    Niu, Bing; Huang, Guohua; Zheng, Linfeng; Wang, Xueyuan; Chen, Fuxue; Zhang, Yuhui; Huang, Tao

    2013-01-01

    It is important to predict substrate-enzyme interactions, and the resulting products, correctly and efficiently in metabolic pathways. In this work, a novel approach was introduced that encodes substrate/product molecules with molecular descriptors and enzyme molecules with physicochemical properties. Based on this encoding, the k-nearest neighbor (KNN) algorithm was adopted to build the substrate-enzyme-product interaction network. After selection of the optimal features, those able to represent the main factors of substrate-enzyme-product interaction in our prediction, a total of 160 of the 290 features were retained; these can be clustered into ten categories: elemental analysis, geometry, chemistry, amino acid composition, predicted secondary structure, hydrophobicity, polarizability, solvent accessibility, normalized van der Waals volume, and polarity. Our prediction model achieved a Matthews correlation coefficient (MCC) of 0.423 and an overall prediction accuracy of 89.1% in a 10-fold cross-validation test. PMID:24455714
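
    The Matthews correlation coefficient reported above is computed from the confusion matrix as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The minimal Python sketch below implements that formula (returning 0 when a marginal total is zero is a common convention, assumed here) and shows, with made-up counts that are not from the study, how a high accuracy can coexist with a modest MCC when negatives greatly outnumber positives.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical, imbalanced counts: 10 positives, 90 negatives.
counts = dict(tp=5, tn=85, fp=5, fn=5)
print(accuracy(**counts), round(matthews_corrcoef(**counts), 3))  # 0.9 0.444
```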

  19. Predictive Features of a Cockpit Traffic Display: A Workload Assessment

    NASA Technical Reports Server (NTRS)

    Wickens, Christopher D.; Morphew, Ephimia

    1997-01-01

    Eighteen pilots flew a series of traffic avoidance maneuvers in an experiment designed to assess the support offered and the workload imposed by different levels of traffic display information in a free flight simulation. Three display prototypes that differed in the traffic information provided were compared. A BASELINE (BL) display provided current and (2nd order) predicted information about ownship and current information about an intruder aircraft, represented on lateral and vertical displays in a coplanar suite. An INTRUDER PREDICTOR (IP) display augmented the baseline display by providing lateral and vertical prediction of the intruder aircraft. A THREAT VECTOR (TV) display added to the IP display a vector indicating the direction from ownship to the intruder at the predicted point of closest contact (POCC). The length of the vector corresponds to the radius of the protected zone, and the point at which the vector intersects the ownship predictor corresponds to the time available until POCC or loss of separation. Pilots time-shared the traffic avoidance task with a secondary task requiring them to monitor the top of the display for faint targets. This task simulated the visual demands of out-of-cockpit scanning and was therefore used to estimate the head-down time required by the different display formats. The results revealed that both display augmentations improved performance (safety) as assessed by predicted and actual loss of separation (i.e., penetration of the protected zone). Both enhancements also reduced workload, as assessed by the NASA TLX scale. The intruder predictor display produced these benefits with no substantial impact on the qualitative nature of the avoidance maneuvers selected. The threat vector produced the safety benefits by inducing a greater degree of (effective) lateral maneuvering, thus partially offsetting the benefits of reduced workload. The three displays did not differ in terms of their effect on performance of

  20. Beyond λmax Part 2: Predicting Molecular Color

    ERIC Educational Resources Information Center

    Williams, Darren L.; Flaherty, Thomas J.; Alnasleh, Bassam K.

    2009-01-01

    A concise roadmap for using computational chemistry programs (i.e., Gaussian 03W) to predict the color of a molecular species is presented. A color-predicting spreadsheet is available with the online material that uses transition wavelengths and peak-shape parameters to predict the visible absorbance spectrum, transmittance spectrum, chromaticity…

  1. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence, it returns multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  2. Delta hepatitis: molecular biology and clinical and epidemiological features.

    PubMed Central

    Polish, L B; Gallagher, M; Fields, H A; Hadler, S C

    1993-01-01

    Hepatitis delta virus, discovered in 1977, requires the help of hepatitis B virus to replicate in hepatocytes and is an important cause of acute, fulminant, and chronic liver disease in many regions of the world. Because hepatitis delta virus depends on this helper function, infection with it occurs either as a coinfection with hepatitis B or as a superinfection of a carrier of hepatitis B surface antigen. Although its mechanisms of transmission are similar to those of hepatitis B virus, the patterns of transmission of delta virus vary widely around the world. In regions where hepatitis delta virus infection is not endemic, the disease is confined to groups at high risk of acquiring hepatitis B infection and to high-risk hepatitis B carriers. Because of the propensity of this viral infection to cause fulminant as well as chronic liver disease, continued incursion of hepatitis delta virus into areas of the world where persistent hepatitis B infection is endemic will have serious implications. Prevention depends on the widespread use of hepatitis B vaccine. This review focuses on the molecular biology and the clinical and epidemiologic features of this important viral infection. PMID:8358704

  3. Molecular Evolution and Structural Features of IRAK Family Members

    PubMed Central

    Gosu, Vijayakumar; Basith, Shaherin; Durai, Prasannavenkatesh; Choi, Sangdun

    2012-01-01

    The interleukin-1 receptor-associated kinase (IRAK) family comprises critical signaling mediators of the TLR/IL-1R signaling pathways. IRAKs are Ser/Thr kinases. There are 4 members in the vertebrate genome (IRAK1, IRAK2, IRAKM, and IRAK4) and an IRAK homolog, Pelle, in insects. IRAK family members are highly conserved in vertebrates, but the evolutionary relationship between IRAKs in vertebrates and insects is not clear. To investigate the evolutionary history and functional divergence of IRAK members, we performed extensive bioinformatics analysis. The phylogenetic relationship between IRAK sequences suggests that gene duplication events occurred in the lineage leading to early vertebrates. A comparative phylogenetic analysis with insect homologs of IRAKs suggests that the Tube protein, rather than the anticipated Pelle, is a homolog of IRAK4. Furthermore, the analysis supports the view that an IRAK4-like kinase is an ancestral protein of the IRAK family in the metazoan lineage. Through functional analysis, several potentially diverged sites were identified in the common death domain and kinase domain. These sites have been constrained during evolution by strong purifying selection, suggesting their functional importance within IRAKs. In summary, our study highlighted the molecular evolution of the IRAK family, predicted the amino acids that contributed to functional divergence, and identified structural variations among the IRAK paralogs that may provide a starting point for further experimental investigations. PMID:23166766

  4. Molecular biology of testicular germ cell tumors: unique features awaiting clinical application.

    PubMed

    Boublikova, Ludmila; Buchler, Tomas; Stary, Jan; Abrahamova, Jitka; Trka, Jan

    2014-03-01

    Testicular germ cell tumors (TGCTs) are the most common solid tumors in young adult men and are characterized by distinct biologic features and clinical behavior. Both genetic predisposition and environmental factors probably play a substantial role in their etiology. TGCTs arise from a malignant transformation of primordial germ cells in a process that starts prenatally, is often associated with a certain degree of gonadal dysgenesis, and involves the acquisition of several specific aberrations, including activation of SCF-CKIT, amplification of 12p with up-regulation of stem cell genes, and subsequent genetic and epigenetic alterations. Their embryonic and germ cell origin determines the unique sensitivity of TGCTs to platinum-based chemotherapy. In contrast to the vast majority of other malignancies, TGCTs have no molecular prognostic or predictive factors, and no targeted therapy is available for patients with these tumors. This review summarizes the principal molecular characteristics of TGCTs that could represent a potential basis for the development of novel diagnostic and treatment approaches. PMID:24182421

  5. Neural Network predictions of Diatomic and Triatomic Molecular Data

    NASA Astrophysics Data System (ADS)

    Blake Laing, W.

    1997-11-01

    The arrangement of molecules in periodic systems offers an enhanced comprehension of trends in molecular properties, a more efficient method of sorting and searching molecular databases, and a basis for the prediction of new data. Neural networks have the ability to "learn" existing data and to forecast a large amount of new data without a smoothing equation (R. Hefferlin, B. Davis, W. B. Laing, "The Learning and Prediction of Triatomic Molecular Data with Neural Networks," International Arctic Seminar 1997, Murmansk, Russia; J. Wohlers, W. B. Laing, R. Hefferlin, and B. Davis, "Least-Squares and Neural-Network Forecasting from Critical Data: Diatomic Molecular Internuclear Separations and Triatomic Heats of Atomization and Ionization Potentials," Advances in Molecular Similarity: JIA book series, in press). This report will present periodic systems of molecules as well as neural network predictions for additional properties of diatomic and triatomic molecules.

  6. Lung Cancer Prediction Using Neural Network Ensemble with Histogram of Oriented Gradient Genomic Features

    PubMed Central

    Adetiba, Emmanuel; Olugbara, Oludayo O.

    2015-01-01

    This paper reports an experimental comparison of artificial neural network (ANN) and support vector machine (SVM) ensembles and their “nonensemble” variants for lung cancer prediction. These machine learning classifiers were trained to predict lung cancer using samples of patient nucleotide sequences with mutations in the epidermal growth factor receptor, Kirsten rat sarcoma viral oncogene, and tumor suppressor p53 genes, collected as biomarkers from the IGDB.NSCLC corpus. The Voss DNA encoding was used to map the nucleotide sequences of mutated and normal genes to equivalent numerical genomic sequences for training the selected classifiers. The state-of-the-art histogram of oriented gradients (HOG) and local binary pattern (LBP) feature extraction schemes were applied to extract representative genomic features from the encoded nucleotide sequences. The combination of the ANN ensemble and HOG best fit the training dataset of this study, with an accuracy of 95.90% and a mean square error of 0.0159. The results obtained with the ANN ensemble and HOG genomic features are promising for automated screening and early detection of lung cancer. This will hopefully assist pathologists in administering targeted molecular therapy and in counseling early-stage lung cancer patients and persons in at-risk populations. PMID:25802891
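
    The Voss representation mentioned above maps a DNA sequence of length N to four binary indicator sequences, one per nucleotide, with a 1 wherever that base occurs. The minimal Python sketch below builds this 4 × N array. Stacking the rows this way lets the encoding be treated as a small two-dimensional array, which is presumably how image descriptors such as HOG and LBP can be applied to it; that layout is an assumption, since the abstract does not specify it. The function name is hypothetical, and ambiguous bases such as N simply produce an all-zero column here.

```python
def voss_encode(seq, alphabet="ACGT"):
    """Voss indicator encoding: one 0/1 row per nucleotide, one column per position."""
    seq = seq.upper()
    return [[1 if base == letter else 0 for base in seq] for letter in alphabet]


for letter, row in zip("ACGT", voss_encode("GATTACA")):
    print(letter, row)
```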

  7. Linking molecular feature space and disease terms for the immunosuppressive drug rapamycin.

    PubMed

    Bernthaler, Andreas; Mönks, Konrad; Mühlberger, Irmgard; Mayer, Bernd; Perco, Paul; Oberbauer, Rainer

    2011-10-01

    In addition to the development of novel drugs, drug repositioning appears promising for addressing unmet clinical needs. Omics data have provided the ground for novel analysis strategies that link drug and disease by integrating profiles at the molecular as well as the clinical data level. We developed a workflow for linking drugs and diseases to identify repositioning options, and exemplify the procedure with the immunosuppressive drug rapamycin. Our strategy rests on delineating a drug-specific molecular profile by combining omics data reflecting the drug's impact on the cellular status with drug-associated molecular features extracted from the scientific literature. For rapamycin, the resulting profile held 905 unique molecular features reflecting defined molecular processes, as identified by molecular pathway and process enrichment analysis. Literature mining identified 419 diseases significantly associated with this rapamycin molecular feature list, and transforming the significance of gene-disease associations into a continuous score allowed us to compute ROC and precision-recall curves comparing this disease list with diseases already undergoing clinical trials with rapamycin. The AUC of this assignment was 0.84, indicating excellent recovery of relevant disease terms based solely on the drug molecular feature profile. We verified relevant indications by comparing the molecular feature sets characteristic of the identified diseases with the drug molecular feature profile, demonstrating highly significant overlaps. The presented workflow allowed positive identification of diseases associated with rapamycin using the drug-specific molecular feature profile, and may well be applicable to other drugs of interest. PMID:21789336
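
    The AUC reported above can be read as the probability that a randomly chosen positive item (here, a disease already in rapamycin trials) receives a higher score than a randomly chosen negative one. The minimal Python sketch below computes the ROC AUC directly from that pairwise (Mann-Whitney) definition, counting ties as one half; the function name and toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy disease scores; label 1 = disease already tested in a clinical trial (hypothetical).
print(roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```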

  8. Radiogenomic analysis of breast cancer: dynamic contrast enhanced - magnetic resonance imaging based features are associated with molecular subtypes

    NASA Astrophysics Data System (ADS)

    Wang, Shijian; Fan, Ming; Zhang, Juan; Zheng, Bin; Wang, Xiaojia; Li, Lihua

    2016-03-01

    Breast cancer is one of the most common malignant tumors in women, and its incidence is increasing. The key to decreasing mortality is early diagnosis and appropriate treatment. Molecular classification could provide better insight into patient-directed therapy and prognosis prediction in breast cancer. Different molecular subtypes are known to have different characteristics on magnetic resonance imaging (MRI). We therefore hypothesized that imaging features can reflect molecular information in breast cancer. In this study, we investigated associations between dynamic contrast-enhanced MRI (DCE-MRI) features and molecular subtypes in breast cancer. Sixty patients with breast cancer were enrolled, and the MR images were pre-processed for noise reduction, registration and segmentation. Sixty-five imaging features, including statistical characteristics, morphology, texture and dynamic enhancement in the breast lesion and background regions, were semiautomatically extracted. The associations between imaging features and molecular subtypes were assessed using statistical analyses, including univariate and multivariate logistic regression. The multivariate regression results showed that imaging features are significantly associated with the Luminal A (p=0.00473), HER2-enriched (p=0.00277) and basal-like (p=0.0117) molecular subtypes. These results indicate that the three molecular subtypes are correlated with DCE-MRI features in breast cancer. Specifically, patients with higher compactness or lower skewness in the breast lesion are more likely to have the Luminal A subtype. In addition, a higher dynamic enhancement value at T1 on the normal side indicates a higher likelihood of the HER2-enriched subtype.

  9. Embedded prediction in feature extraction: application to single-trial EEG discrimination.

    PubMed

    Hsu, Wei-Yen

    2013-01-01

    In this study, an analysis system embedding neuro-fuzzy prediction in feature extraction is proposed for brain-computer interface (BCI) applications. Wavelet-fractal features combined with neuro-fuzzy predictions are applied for feature extraction in motor imagery (MI) discrimination. The features are extracted from electroencephalography (EEG) signals recorded from participants performing left and right MI. Time-series predictions are performed by training 2 adaptive neuro-fuzzy inference systems (ANFIS), one for the left and one for the right MI data. Features are then calculated from the difference in the multi-resolution fractal feature vector (MFFV) between the predicted and actual signals over a window of EEG signals. Finally, a support vector machine is used for classification. The performance of the proposed method is compared with that of the linear adaptive autoregressive (AAR) model and of AAR time-series prediction, using data from 6 participants in 2 data sets. The results indicate that the proposed method is promising for MI classification. PMID:23248335

  10. Clinical Risk Prediction by Exploring High-Order Feature Correlations

    PubMed Central

    Wang, Fei; Zhang, Ping; Wang, Xiang; Hu, Jianying

    2014-01-01

    Clinical risk prediction is an important problem in medical informatics, and logistic regression is one of the most widely used approaches to it. In many cases, the number of potential risk factors is fairly large, while the set of factors that actually contribute to the risk is small. Sparse logistic regression has therefore been proposed; it can not only predict clinical risk but also identify the set of relevant risk factors. The inputs to logistic regression and sparse logistic regression must be in vector form. This limits the applicability of these models to problems in which the data cannot naturally be represented as vectors (e.g., medical images, which are two-dimensional matrices). To handle data in the form of multi-dimensional arrays, we propose HOSLR: High-Order Sparse Logistic Regression, which can be viewed as a high-order extension of sparse logistic regression. Instead of solving for one classification vector as in conventional logistic regression, we solve for K classification vectors in HOSLR (K is the number of modes in the data). A block proximal descent approach is proposed to solve the problem, and its convergence is guaranteed. Finally, we validate the effectiveness of HOSLR in predicting the onset risk of patients with Alzheimer’s disease and heart failure. PMID:25954428
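    For two-mode data (K = 2, e.g., an image stored as a matrix X), one natural reading of "K classification vectors" is a bilinear score u^T X v + b, which equals ordinary logistic regression on vec(X) with the rank-one weight matrix u v^T. The sketch below assumes that reading and shows only the scoring function; the sparsity penalty and the block proximal descent solver described in the abstract are not implemented, and all names and values are hypothetical.

```python
import math

def bilinear_score(X, u, v, b=0.0):
    """u^T X v + b for a matrix X stored as a list of rows (len(u) rows, len(v) columns)."""
    return sum(u[i] * X[i][j] * v[j] for i in range(len(u)) for j in range(len(v))) + b

def vectorized_score(X, u, v, b=0.0):
    """The same score as plain logistic regression on vec(X) with weights vec(u v^T)."""
    weights = [ui * vj for ui in u for vj in v]   # row-major vec(u v^T)
    features = [xij for row in X for xij in row]  # row-major vec(X)
    return sum(w * f for w, f in zip(weights, features)) + b

def predict_risk(X, u, v, b=0.0):
    """Predicted probability under the bilinear logistic model."""
    return 1.0 / (1.0 + math.exp(-bilinear_score(X, u, v, b)))

X = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2x2 input
u = [1.0, 0.5]                # hypothetical row-mode weights
v = [0.5, -1.0]               # hypothetical column-mode weights
print(bilinear_score(X, u, v), vectorized_score(X, u, v))  # -2.75 -2.75
```

    The bilinear form needs m + n weights for an m x n input instead of m·n, which is the practical appeal when X is a large matrix.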

  11. Interval Prediction of Molecular Properties in Parametrized Quantum Chemistry

    NASA Astrophysics Data System (ADS)

    Edwards, David E.; Zubarev, Dmitry Yu.; Packard, Andrew; Lester, William A.; Frenklach, Michael

    2014-06-01

    The accurate evaluation of molecular properties lies at the core of predictive physical models. Most reliable quantum-chemical calculations are limited to smaller molecular systems while purely empirical approaches are limited in accuracy and reliability. A promising approach is to employ a quantum-mechanical formalism with simplifications and to compensate for the latter with parametrization. We propose a strategy of directly predicting the uncertainty interval for a property of interest, based on training-data uncertainties, which sidesteps the need for an optimum set of parameters.

  12. Personalized Cancer Medicine: Molecular Diagnostics, Predictive biomarkers, and Drug Resistance

    PubMed Central

    Gonzalez de Castro, D; Clarke, P A; Al-Lazikani, B; Workman, P

    2013-01-01

    The progressive elucidation of the molecular pathogenesis of cancer has fueled the rational development of targeted drugs for patient populations stratified by genetic characteristics. Here we discuss general challenges relating to molecular diagnostics and describe predictive biomarkers for personalized cancer medicine. We also highlight resistance mechanisms for epidermal growth factor receptor (EGFR) kinase inhibitors in lung cancer. We envisage a future requiring the use of longitudinal genome sequencing and other omics technologies alongside combinatorial treatment to overcome cellular and molecular heterogeneity and prevent resistance caused by clonal evolution. PMID:23361103

  13. Prediction of OCR accuracy using simple image features

    SciTech Connect

    Blando, L.R.; Kanai, Junichi; Nartker, T.A.

    1995-04-01

    A classifier for predicting the character accuracy of a given page achieved by any Optical Character Recognition (OCR) system is presented. This classifier is based on measuring the amount of white speckle, the amount of character fragments, and overall size information in the page. No output from the OCR system is used. The given page is classified as either good quality (i.e., high OCR accuracy expected) or poor (i.e., low OCR accuracy expected). Six OCR systems processed two different sets of test data: a set of 439 pages obtained from technical and scientific documents and a set of 200 pages obtained from magazines. For every system, approximately 85% of the pages in each data set were correctly predicted. The performance of this classifier is also compared with the ideal-case performance of a prediction method based upon the number of reject markers in OCR generated text. In several cases, this method matched or exceeded the performance of the reject based approach.

  14. Epileptic Seizure Prediction based on Ratio and Differential Linear Univariate Features

    PubMed Central

    Rasekhi, Jalil; Mollaei, Mohammad Reza Karami; Bandarabadi, Mojtaba; Teixeira, César A.; Dourado, António

    2015-01-01

    Bivariate features, obtained from multichannel electroencephalogram recordings, quantify the relation between different brain regions. Studies based on bivariate features have shown promising results for seizure prediction in patients with refractory epilepsy. A new bivariate approach using univariate features is proposed here. Differences and ratios of 22 linear univariate features were calculated for each pairwise combination of 6 electroencephalogram channels, creating 330 differential and 330 relative features. The feature subsets were classified separately using support vector machines into one of two classes, preictal or nonpreictal. Furthermore, the minimum Redundancy Maximum Relevance feature reduction method was employed to improve the predictions and reduce the number of false alarms. The studies were carried out on features obtained from 10 patients. For a reduced subset of 30 features using the differential approach, seizures were on average predicted in 60.9% of cases (28 out of 46 in 737.9 h of test data), with a low false prediction rate of 0.11 h−1. Results of the bivariate approaches were compared with those achieved using the original linear univariate features extracted from the 6 channels. The advantage of the proposed bivariate features is a smaller number of false predictions compared with the original 22 univariate features. In addition, the reduction in feature dimension could yield a less complex and more cost-effective algorithm. The results indicate that applying machine learning methods to a multidimensional feature space built from relative/differential pairwise combinations of 22 univariate features could predict seizure onsets with high performance. PMID:25709936
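    The feature counts follow from simple combinatorics: 6 channels give C(6,2) = 15 channel pairs, and 15 pairs × 22 features = 330 differential and 330 ratio features. A minimal sketch of the construction, assuming each channel's univariate features are already computed and nonzero (so the ratios are defined); the channel names and values are hypothetical:

```python
from itertools import combinations

def pairwise_features(univariate):
    """univariate maps channel name -> list of feature values (same order for every channel).
    Returns (differential, relative): one value per (channel pair, feature)."""
    channels = sorted(univariate)
    differential, relative = [], []
    for a, b in combinations(channels, 2):
        for fa, fb in zip(univariate[a], univariate[b]):
            differential.append(fa - fb)  # difference feature
            relative.append(fa / fb)      # ratio feature; assumes fb != 0
    return differential, relative

# Hypothetical values: 6 channels x 22 positive univariate features.
channels = {f"ch{c}": [float(c * 100 + f + 1) for f in range(22)] for c in range(1, 7)}
diff, ratio = pairwise_features(channels)
print(len(diff), len(ratio))  # 330 330
```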

  15. Analysis of motion features for molecular dynamics simulation of proteins

    NASA Astrophysics Data System (ADS)

    Kamada, Mayumi; Toda, Mikito; Sekijima, Masakazu; Takata, Masami; Joe, Kazuki

    2011-01-01

    Recently, Sakurai et al. proposed a new method for time series analysis using the wavelet transformation. We apply it to a molecular dynamics simulation of Thermomyces lanuginosa lipase (TLL). Introducing indexes to characterize the collective motion of the protein, we obtained the following two results. First, the time evolution of the collective motion involves not only dynamics within a single potential well but also wandering among multiple conformations. Second, the correlation of collective motion between secondary structures shows that there is collective motion involving multiple secondary structures. We discuss future prospects of our study involving 'disordered proteins'.

  16. Rational Prediction with Molecular Dynamics for Hit Identification

    PubMed Central

    Nichols, Sara E; Swift, Robert V; Amaro, Rommie E

    2012-01-01

    Although the motions of proteins are fundamental for their function, for pragmatic reasons, the consideration of protein elasticity has traditionally been neglected in drug discovery and design. This review details protein motion, its relevance to biomolecular interactions and how it can be sampled using molecular dynamics simulations. Within this context, two major areas of research in structure-based prediction that can benefit from considering protein flexibility, binding site detection and molecular docking, are discussed. Basic classification metrics and statistical analysis techniques, which can facilitate performance analysis, are also reviewed. With hardware and software advances, molecular dynamics in combination with traditional structure-based prediction methods can potentially reduce the time and costs involved in the hit identification pipeline. PMID:23110535

  17. Skeletal Muscle Laminopathies: A Review of Clinical and Molecular Features.

    PubMed

    Maggi, Lorenzo; Carboni, Nicola; Bernasconi, Pia

    2016-01-01

    LMNA-related disorders are caused by mutations in the LMNA gene, which encodes the nuclear envelope proteins lamin A and C via alternative splicing. Laminopathies are associated with a wide range of disease phenotypes, including neuromuscular, cardiac and metabolic disorders and premature aging syndromes. The most frequent diseases associated with LMNA mutations are characterized by skeletal and cardiac muscle involvement. This review focuses on the genetics and clinical features of laminopathies that primarily affect skeletal muscle. Although only symptomatic treatment is available for these patients, much progress has been made in clarifying the pathogenesis and improving the management of these diseases. PMID:27529282

  18. Molecular Pathogenesis and Diagnostic, Prognostic and Predictive Molecular Markers in Sarcoma.

    PubMed

    Mariño-Enríquez, Adrián; Bovée, Judith V M G

    2016-09-01

    Sarcomas are infrequent mesenchymal neoplasms characterized by notable morphological and molecular heterogeneity. Molecular studies in sarcoma provide refinements to morphologic classification, and contribute diagnostic information (frequently), prognostic stratification (rarely) and predict therapeutic response (occasionally). Herein, we summarize the major molecular mechanisms underlying sarcoma pathogenesis and present clinically useful diagnostic, prognostic and predictive molecular markers for sarcoma. Five major molecular alterations are discussed, illustrated with representative sarcoma types, including 1. the presence of chimeric transcription factors, in vascular tumors; 2. abnormal kinase signaling, in gastrointestinal stromal tumor; 3. epigenetic deregulation, in chondrosarcoma, chondroblastoma, and other tumors; 4. deregulated cell survival and proliferation, due to focal copy number alterations, in dedifferentiated liposarcoma; 5. extreme genomic instability, in conventional osteosarcoma as a representative example of sarcomas with highly complex karyotype. PMID:27523972

  19. Extraction of Molecular Features through Exome to Transcriptome Alignment.

    PubMed

    Mudvari, Prakriti; Kowsari, Kamran; Cole, Charles; Mazumder, Raja; Horvath, Anelia

    2013-08-22

    Integrative Next Generation Sequencing (NGS) DNA and RNA analyses have recently become feasible, and the studies published to date have discovered critical disease-implicated pathways as well as diagnostic and therapeutic targets. Exomes, genomes and transcriptomes from the same individual are accumulating quickly, providing unique opportunities for the analysis of mechanistic and regulatory features and, at the same time, requiring new exploration strategies. In this study, we integrated variation and expression information from four NGS datasets from the same individual: normal and tumor breast exomes and transcriptomes. Focusing on SNP-centered variant allelic prevalence, we illustrate analytical algorithms that can be applied to extract or validate potential regulatory elements, such as expression or growth advantage, imprinting, loss of heterozygosity (LOH), somatic changes, and RNA editing. In addition, we point to some critical elements that might bias the output and recommend alternative measures to maximize the confidence of findings. The need for such strategies is especially recognized given the growing appreciation of systems biology: integrative exploration of genome and transcriptome features reveals mechanistic and regulatory insights that reach far beyond the linear addition of the individual datasets. PMID:24791251

  20. Extraction of Molecular Features through Exome to Transcriptome Alignment

    PubMed Central

    Mudvari, Prakriti; Kowsari, Kamran; Cole, Charles; Mazumder, Raja; Horvath, Anelia

    2014-01-01

    Integrative Next Generation Sequencing (NGS) DNA and RNA analyses have recently become feasible, and the studies published to date have discovered critical disease-implicated pathways as well as diagnostic and therapeutic targets. Exomes, genomes and transcriptomes from the same individual are accumulating quickly, providing unique opportunities for the analysis of mechanistic and regulatory features and, at the same time, requiring new exploration strategies. In this study, we integrated variation and expression information from four NGS datasets from the same individual: normal and tumor breast exomes and transcriptomes. Focusing on SNP-centered variant allelic prevalence, we illustrate analytical algorithms that can be applied to extract or validate potential regulatory elements, such as expression or growth advantage, imprinting, loss of heterozygosity (LOH), somatic changes, and RNA editing. In addition, we point to some critical elements that might bias the output and recommend alternative measures to maximize the confidence of findings. The need for such strategies is especially recognized given the growing appreciation of systems biology: integrative exploration of genome and transcriptome features reveals mechanistic and regulatory insights that reach far beyond the linear addition of the individual datasets. PMID:24791251

  1. Prediction of reactive hazards based on molecular structure.

    PubMed

    Saraf, S R; Rogers, W J; Mannan, M S

    2003-03-17

    There is considerable interest in predicting reactive hazards from chemical structure. Calorimetric measurements to determine reactivity can be resource-intensive, so computational methods to predict reactivity hazards are an attractive option. This paper reviews some commonly employed theoretical hazard evaluation techniques, including the oxygen-balance method, ASTM CHETAH, and calculated adiabatic reaction temperature (CART). It also discusses the development of a study table to correlate and predict calorimetric properties of pure compounds. Quantitative structure-property relationships (QSPR) based on quantum mechanical calculations can be employed to correlate calorimetrically measured onset temperatures, T(o), and energies of reaction, -ΔH, with molecular properties. To test the feasibility of this approach, the QSPR technique is used to correlate differential scanning calorimetry (DSC) data, T(o) and -ΔH, with molecular properties for 19 nitro compounds. PMID:12628775
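    The oxygen-balance method mentioned above is commonly stated, for a compound CxHyNwOz, as OB% = -1600/MW × (2x + y/2 - z), where nitrogen is assumed to leave as N2 and so does not enter the balance (a metal term, omitted here, is added for metal-containing compounds). The sketch below applies this textbook form to TNT (C7H5N3O6); it is an illustration, not the screening procedure from the paper, and the atomic masses are standard rounded values.

```python
# Standard atomic masses (g/mol), rounded; only C, H, N, O are handled in this sketch.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def molar_mass(counts):
    """Molar mass from an element -> atom count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in counts.items())

def oxygen_balance(counts):
    """Oxygen balance in percent: OB% = -1600 / MW * (2C + H/2 - O)."""
    c, h, o = counts.get("C", 0), counts.get("H", 0), counts.get("O", 0)
    return -1600.0 / molar_mass(counts) * (2 * c + h / 2 - o)

tnt = {"C": 7, "H": 5, "N": 3, "O": 6}
print(round(oxygen_balance(tnt), 1))  # -74.0
```

    A strongly negative value means the molecule lacks the oxygen needed to burn its carbon and hydrogen completely to CO2 and H2O.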

  2. Structural and Molecular Modeling Features of P2X Receptors

    PubMed Central

    Alves, Luiz Anastacio; da Silva, João Herminio Martins; Ferreira, Dinarte Neto Moreira; Fidalgo-Neto, Antonio Augusto; Teixeira, Pedro Celso Nogueira; de Souza, Cristina Alves Magalhães; Caffarena, Ernesto Raúl; de Freitas, Mônica Santos

    2014-01-01

    Adenosine 5′-triphosphate (ATP) is currently recognized as an extracellular messenger that acts through P2 receptors. P2 receptors are divided into two subtypes, P2Y metabotropic receptors and P2X ionotropic receptors, both of which are found in virtually all mammalian cell types studied. Because membrane protein structures are difficult to study by X-ray crystallography or NMR, little structural information on these receptors is available in the literature. Two structures of the P2X4 receptor in truncated form have been solved by crystallography. Molecular modeling has proven to be an excellent tool for studying ionotropic receptors, and recent modeling studies of P2X receptors have advanced our knowledge of their structure-function relationships. This review presents a brief history of ion channel structural studies and shows how modeling approaches can be used to address relevant questions about P2X receptors. PMID:24637936

  3. Clinical and Molecular Features of POLG-Related Mitochondrial Disease

    PubMed Central

    Stumpf, Jeffrey D.; Saneto, Russell P.; Copeland, William C.

    2013-01-01

    Failure of the mitochondrial DNA polymerase (pol γ) to replicate mitochondrial genomes (mtDNA) leads to a subset of mitochondrial diseases. Many mutations in POLG, the gene that encodes pol γ, have been associated with mitochondrial diseases such as myocerebrohepatopathy spectrum (MCHS) disorders, Alpers-Huttenlocher syndrome, myoclonic epilepsy myopathy sensory ataxia (MEMSA), ataxia neuropathy spectrum (ANS), and progressive external ophthalmoplegia (PEO). This chapter explores five important topics in POLG-related disease: (1) clinical symptoms that identify and distinguish POLG-related diseases, (2) molecular characterization of defects in polymerase activity by POLG disease variants, (3) the importance of holoenzyme formation in disease presentation, (4) the role of pol γ exonuclease activity and mutagenesis in disease and aging, and (5) novel approaches to therapy and avoidance of toxicity based on primary research in pol γ replication. PMID:23545419

  4. Prediction using step-wise L1, L2 regularization and feature selection for small data sets with large number of features

    PubMed Central

    2011-01-01

    Background: Machine learning methods are now used for many biological prediction problems involving drugs, ligands or polypeptide segments of a protein. Building a prediction model requires a so-called training data set of molecules with measured target properties. For many such problems the training data set is small, because measurements have to be performed in a wet lab. Furthermore, the problems are often complex, so it is not clear which molecular descriptors (features) are suitable for establishing a strong correlation with the target property. In many applications all available descriptors are used. This can lead to difficult machine learning problems when thousands of descriptors are considered and only a few (e.g., fewer than a hundred) molecules are available for training. Results: The CoEPrA contest provides four data sets that are typical of biological regression problems (few molecules in the training data set and thousands of descriptors). We applied the same two-step training procedure to all four regression tasks. In the first stage, we used optimized L1 regularization to select the most relevant features, reducing the initial set of more than 6,000 features to about 50. In the second stage, we used only the features selected in the first stage and applied milder L2 regularization, which generally yielded a further improvement in prediction performance. Our linear model employed a soft loss function that minimizes the influence of outliers. Conclusions: The proposed two-step method showed good results on all four CoEPrA regression tasks. It may therefore be useful for many other biological prediction problems in which only a small number of training molecules, described by thousands of descriptors, are available. PMID:22026913
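    The two-stage idea (an L1 pass to select features, then a milder L2 fit on the survivors) can be sketched with cyclic coordinate descent. This is a minimal sketch under stated assumptions: it uses plain squared loss rather than the authors' outlier-robust soft loss, assumes centered data with no intercept and no all-zero columns, and fixes the penalty strengths by hand instead of optimizing them. The tiny orthogonal design in the example is synthetic.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|w|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def coordinate_descent(X, y, l1=0.0, l2=0.0, sweeps=200, tol=1e-10):
    """Minimize (1/2n)||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||^2 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw for w = 0
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new = soft_threshold(rho, l1) / (z + l2)
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync with w
                w[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w

def two_step_fit(X, y, l1, l2):
    """Stage 1: L1 selects features. Stage 2: L2-only refit on the selected columns."""
    w1 = coordinate_descent(X, y, l1=l1)
    selected = [j for j, wj in enumerate(w1) if wj != 0.0]
    X_sel = [[row[j] for j in selected] for row in X]
    return selected, coordinate_descent(X_sel, y, l2=l2)

# Synthetic orthogonal design: 8 samples, 5 +/-1 columns; y depends on columns 0 and 2 only.
cols = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, -1, 1, -1, 1],
]
X = [[float(col[i]) for col in cols] for i in range(8)]
y = [3.0 * cols[0][i] - 2.0 * cols[2][i] for i in range(8)]
selected, weights = two_step_fit(X, y, l1=0.5, l2=0.1)
print(selected, [round(w, 3) for w in weights])  # [0, 2] [2.727, -1.818]
```

    With orthogonal unit-scale columns, the L1 stage simply soft-thresholds each coefficient, so only the two informative columns survive; the L2 stage then shrinks them by the factor 1/(1 + l2).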

  5. Clinical impact of molecular features in diffuse large B-cell lymphoma and follicular lymphoma.

    PubMed

    Pon, Julia R; Marra, Marco A

    2016-01-14

    Our understanding of the pathogenesis and heterogeneity of diffuse large B-cell lymphoma (DLBCL) and follicular lymphoma (FL) has been dramatically enhanced by recent attempts to profile molecular features of these lymphomas. In this article, we discuss ways in which testing for molecular features may impact DLBCL and FL management if clinical trials are designed to incorporate such tests. Specifically, we discuss how distinguishing lymphomas on the basis of cell-of-origin subtypes or the presence of other molecular features is prognostically and therapeutically significant. Conversely, we discuss how the molecular similarities of DLBCL and FL have provided insight into the potential of both DLBCL and FL cases to respond to agents targeting alterations they have in common. Through these examples, we demonstrate how the translation of our understanding of cancer biology into improvements in patient outcomes depends on analyzing the molecular correlates of treatment outcomes in clinical trials and in routinely treated patients. PMID:26447189

  6. Molecular features in arsenic-induced lung tumors

    PubMed Central

    2013-01-01

    Arsenic is a well-known human carcinogen that potentially affects ~160 million people worldwide through unsafe levels in drinking water. The lungs are one of the main target organs of arsenic-related carcinogenesis. These tumors exhibit particular features, such as squamous cell-type specificity and high incidence among never-smokers. Arsenic-induced malignant transformation is mainly related to the biotransformation process intended for the metabolic clearance of the carcinogen, which results in specific genetic and epigenetic alterations that ultimately affect key pathways in lung carcinogenesis. On this basis, lung tumors induced by arsenic exposure could be considered an additional subtype of lung cancer, especially in never-smokers, in whom arsenic is a known etiological agent. In this article, we review current knowledge of the various mechanisms of arsenic carcinogenicity and the specific roles of this metalloid in signaling pathways leading to lung cancer. PMID:23510327

  7. Synthesis of a specified, silica molecular sieve by using computationally predicted organic structure-directing agents.

    PubMed

    Schmidt, Joel E; Deem, Michael W; Davis, Mark E

    2014-08-01

    Crystalline molecular sieves are used in numerous applications, where the properties exploited for each technology are the direct consequence of structural features. New materials are typically discovered by trial and error, and in many cases, organic structure-directing agents (OSDAs) are used to direct their formation. Here, we report the first successful synthesis of a specified molecular sieve through the use of an OSDA that was predicted from a recently developed computational method that constructs chemically synthesizable OSDAs. Pentamethylimidazolium is computationally predicted to have the largest stabilization energy in the STW framework, and is experimentally shown to strongly direct the synthesis of pure-silica STW. Other OSDAs with lower stabilization energies did not form STW. The general method demonstrated here to create STW may lead to new, simpler OSDAs for existing frameworks and provide a way to predict OSDAs for desired, theoretical frameworks. PMID:24961789

  8. Clinical Relevance of Prognostic and Predictive Molecular Markers in Gliomas.

    PubMed

    Siegal, Tali

    2016-01-01

    Sorting and grading of glial tumors by the WHO classification provide clinicians with guidance as to the predicted course of the disease and choice of treatment. Nonetheless, histologically identical tumors may have very different outcomes and responses to treatment. Molecular markers that carry both diagnostic and prognostic information add useful tools to traditional classification by redefining tumor subtypes within each WHO category. Therefore, molecular markers have become an integral part of tumor assessment in modern neuro-oncology, and biomarker status now guides clinical decisions in some subtypes of gliomas. The routine assessment of IDH status improves histological diagnostic accuracy by differentiating diffuse glioma from reactive gliosis. It carries a favorable prognostic implication for all glial tumors, and it is predictive of chemotherapeutic response in anaplastic oligodendrogliomas with codeletion of chromosomes 1p/19q. Glial tumors that contain the 1p/19q chromosomal codeletion are defined as tumors of oligodendroglial lineage and have a favorable prognosis. MGMT promoter methylation is a favorable prognostic marker in astrocytic high-grade gliomas, and it is predictive of chemotherapeutic response in anaplastic gliomas with wild-type IDH1/2 and in glioblastoma of the elderly. The clinical implications of other molecular markers of gliomas, such as mutations of the EGFR and ATRX genes and BRAF fusion or point mutation, are highlighted. The potential of molecular biomarker-based classification to guide future therapeutic approaches is discussed and accentuated. PMID:26508407

  9. Prediction and Analysis of Quorum Sensing Peptides Based on Sequence Features

    PubMed Central

    Rajput, Akanksha; Gupta, Amit Kumar; Kumar, Manoj

    2015-01-01

    Quorum sensing peptides (QSPs) are signaling molecules used by Gram-positive bacteria to orchestrate cell-to-cell communication. Despite their importance in signaling, a detailed bioinformatics analysis of QSPs has been lacking. In this study, QSPs and non-QSPs were examined according to their amino acid composition, residue positions, motifs and physicochemical properties. Compositional analysis shows that QSPs are enriched in aromatic residues such as Trp, Tyr and Phe. At the N-terminus, Ser was the dominant residue at most positions, namely the first, second, third and fifth, while Phe was the preferred residue at the first, third and fifth positions from the C-terminus. A few motifs were also extracted from QSPs. Physicochemical properties such as aromaticity, molecular weight and secondary structure were found to be distinguishing features of QSPs. Exploiting these properties, we developed a Support Vector Machine (SVM)-based predictive model. In 10-fold cross-validation, the SVM achieved a maximum accuracy of 93.00%, a Matthews correlation coefficient (MCC) of 0.86 and an area under the receiver operating characteristic (ROC) curve of 0.98 on the training/testing dataset (T200p+200n). The developed models performed equally well on the validation dataset (V20p+20n). The server also integrates several useful analysis tools, such as “QSMotifScan”, “ProtFrag”, “MutGen” and “PhysicoProp”. Our analysis reveals important characteristics of QSPs, and on the basis of these unique features we have developed a prediction algorithm, “QSPpred” (freely available at: http://crdd.osdd.net/servers/qsppred). PMID:25781990
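    Two quantities from this abstract are easy to make concrete: amino acid composition (the percentage of each residue in a peptide) and the Matthews correlation coefficient, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is illustrative only, and the peptide and confusion counts are hypothetical. For instance, 14 errors in each class of a balanced 200 + 200 set would give accuracy 0.93 and MCC 0.86, but the paper does not report its confusion matrix.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aa_composition(seq):
    """Percentage of each of the 20 standard amino acids in a peptide sequence."""
    seq = seq.upper()
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS}

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

composition = aa_composition("SSWYFKF")  # hypothetical peptide
tp, tn, fp, fn = 186, 186, 14, 14        # hypothetical confusion counts
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(accuracy, 2), round(matthews_cc(tp, tn, fp, fn), 2))  # 0.93 0.86
```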

  10. Wiring and Molecular Features of Prefrontal Ensembles Representing Distinct Experiences.

    PubMed

    Ye, Li; Allen, William E; Thompson, Kimberly R; Tian, Qiyuan; Hsueh, Brian; Ramakrishnan, Charu; Wang, Ai-Chi; Jennings, Joshua H; Adhikari, Avishek; Halpern, Casey H; Witten, Ilana B; Barth, Alison L; Luo, Liqun; McNab, Jennifer A; Deisseroth, Karl

    2016-06-16

    A major challenge in understanding the cellular diversity of the brain has been linking activity during behavior with standard cellular typology. For example, it has not been possible to determine whether principal neurons in prefrontal cortex active during distinct experiences represent separable cell types, and it is not known whether these differentially active cells exert distinct causal influences on behavior. Here, we develop quantitative hydrogel-based technologies to connect activity in cells reporting on behavioral experience with measures for both brain-wide wiring and molecular phenotype. We find that positive- and negative-valence experiences in prefrontal cortex are represented by cell populations that differ in their causal impact on behavior, long-range wiring, and gene expression profiles, with the major discriminant being expression of the adaptation-linked gene NPAS4. These findings illuminate the cellular logic of prefrontal cortex information processing and natural adaptive behavior and may point the way to cell-type-specific understanding and treatment of disease-associated states. PMID:27238022

  11. Clinical and molecular features of Joubert syndrome and related disorders

    PubMed Central

    Parisi, Melissa A.

    2009-01-01

    Joubert syndrome (JBTS; OMIM 213300) is a rare, autosomal recessive disorder characterized by a specific congenital malformation of the hindbrain and a broad spectrum of other phenotypic findings; it is now known to be caused by defects in the structure and/or function of the primary cilium. The complex hindbrain malformation that is characteristic of JBTS can be identified on axial magnetic resonance imaging and is known as the molar tooth sign (MTS); other diagnostic criteria include intellectual disability, hypotonia, and often an abnormal respiratory pattern and/or abnormal eye movements. In addition, a broad spectrum of other anomalies characterizes Joubert syndrome and related disorders (JSRD), which may include retinal dystrophy, ocular coloboma, oral frenulae and tongue tumors, polydactyly, cystic renal disease (including cystic dysplasia or juvenile nephronophthisis), and congenital hepatic fibrosis. The clinical course can be variable, but most children with this condition survive infancy to reach adulthood. At least 8 genes cause JSRD, and some genotype-phenotype correlations are emerging, including the association between mutations in the MKS3 gene and the hepatic fibrosis characteristic of the JSRD subtype known as COACH syndrome. Several of the causative genes for JSRD are implicated in other ciliary disorders, such as juvenile nephronophthisis and Meckel syndrome, illustrating the close association between these conditions and their overlapping clinical features, which reflect a shared etiology involving the primary cilium. PMID:19876931

  12. Adaptive modelling of structured molecular representations for toxicity prediction

    NASA Astrophysics Data System (ADS)

    Bertinetto, Carlo; Duce, Celia; Micheli, Alessio; Solaro, Roberto; Tiné, Maria Rosaria

    2012-12-01

    We investigated the possibility of modelling structure-toxicity relationships by direct treatment of the molecular structure (without using descriptors) through an adaptive model able to retain the appropriate structural information. With respect to traditional descriptor-based approaches, this provides a more general and flexible way to tackle prediction problems that is particularly suitable when little or no background knowledge is available. Our method employs a tree-structured molecular representation, which is processed by a recursive neural network (RNN). To explore the realization of RNN modelling in toxicological problems, we employed a data set containing growth impairment concentrations (IGC50) for Tetrahymena pyriformis.

  13. Interaction of proteases with legume seed inhibitors. Molecular features.

    PubMed

    de Seidl, D S

    1996-12-01

    In the sixties, after finding that raw black beans (Phaseolus vulgaris) were toxic while cooked ones constitute the basic diet of the underdeveloped peoples of the world, our research, directed by Dr. Jaffé, concentrated mainly on the detection and identification of the heat-labile toxic factors in legume seeds. A micromethod for detecting protease inhibitors (PI) in individual seeds was developed to establish that the multiple trypsin inhibitors (TI) found in the Cubagua variety were expressed by single seeds and did not reflect a mixture from a non-homogeneous bean lot. Six isoinhibitors were isolated and purified; all were "double-headed" and interacted with trypsin (T) and chymotrypsin (CHT) independently and simultaneously, as shown by electrophoresis of their binary and ternary complexes with each enzyme and with both. However, their affinity for the enzymes, including elastases, was rather variable, as was their amino acid composition, which ranged from 51 residues for inhibitor V, the smallest, to 83 residues for inhibitor I, the largest. A low-molecular-weight protein fraction that inhibited subtilisin (S) but recognized neither T, CHT nor pancreatic elastase was detected in 63 varieties of Phaseolus vulgaris as well as in broad beans (Vicia faba), chick peas (Cicer arietinum), jack beans (Canavalia ensiformis), kidney beans (Vigna aureus), etc. It was absent, though, in soybeans (Glycine max), lentils (Lens culinaris), green peas (Pisum sativum), cowpea (Vigna sinensis) and lupine seeds (Lupinus sp.). Subtilisin inhibitors (SI) were isolated from black beans, broad beans, chick peas and jack beans. Their Mr is 8-9 kDa, and they are rather stable in the presence of denaturing agents. They are specific toward microbial proteases: in addition to subtilisins Carlsberg and BPN', they inhibit the alkaline protease from Tritirachium album (Protease K), from Aspergillus oryzae and one isolated from

  14. Circular features with predictable size on Xanadu region of Titan

    NASA Astrophysics Data System (ADS)

    Kochemasov, G. G.

    2008-09-01

    The satellites of planets in the Solar System (rocky and icy) share one fundamental property: all of them move simultaneously in two orbits, around the Sun and around their planets (planets have only one orbit in the Solar System). As shown by wave planetology [1-6], "orbits make structures." This means that movement in elliptical Keplerian orbits implies periodically changing, increasing and decreasing accelerations. Multiplied by the mass of a celestial body, these produce inertia-gravity forces (Newton: F = m·a). These forces warp celestial bodies in the form of standing waves propagating in rotating bodies in four interfering directions, orthogonal and diagonal. This interference gives three kinds of regularly disposed tectonic blocks: uprising (+), subsiding (-), and neutral (0) (Fig. 1). Their size depends on the warping wavelengths. The fundamental wave1 and its first overtone wave2 (and weaker ones) are responsible for the ubiquitous tectonic dichotomy (two hemispheres), segments and sectoring. These superimposed global tectonic features are adorned by tectonic granulations whose size is inversely proportional to orbital frequency: higher frequency, smaller granule; lower frequency, larger granule. The sequence of planetary granulations is as follows: Mercury πR/16, Venus πR/6, Earth πR/4, Mars πR/2, asteroids πR/1, Jupiter 3πR, Saturn 7.5πR, Uranus 21πR, Neptune 41πR, Pluto 62πR (a granule size is half a wavelength; the scale is Earth, whose πR/4 granule corresponds to an orbital frequency of 1/1 year; R is the radius). So, orbits make structures. These structures are simpler for planets but much more complicated for moons, whose surfaces are saturated with granules related to two main frequencies and at least two modulated side frequencies. Two orbits imply wave modulation: the lower circum-Sun frequency modulates the higher circum-planet frequency by dividing and multiplying it, thus producing two side frequencies with corresponding waves and granules. In case of Titan for the

  15. Protein location prediction using atomic composition and global features of the amino acid sequence

    SciTech Connect

    Cherian, Betsy Sheena; Nair, Achuthsankar S.

    2010-01-22

    The subcellular location of a protein is useful information for determining its function, screening drug candidates, designing vaccines, annotating gene products and selecting relevant proteins for further study. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. To be more accurate, a computational localization prediction method should exploit all relevant biological features that contribute to subcellular localization. In this work, we extracted biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used together with multiple physicochemical properties, amino acid composition, three-part amino acid composition and sequence similarity to predict the subcellular location of the protein. Support Vector Machines are designed for four modules, and the final prediction is made by a weighted voting system. Our system achieves accuracies of 100%, 82.47% and 88.81% on the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence is more accurate than prediction based on features derived from the N-terminus alone. Treating the features as a distribution over the entire sequence brings out the underlying property distribution in greater detail and enhances prediction accuracy.
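
    The abstract says the four SVM modules are combined by a weighted voting system but does not give the weights. The following is a minimal sketch of weighted voting under the assumption that each module emits one location label and carries one fixed, non-negative weight (for example its validation accuracy); the labels, weights and the function name weighted_vote are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-module class predictions by weighted voting.

    predictions: one predicted location label per module.
    weights: one non-negative weight per module (hypothetical here, e.g.
        each module's validation accuracy).
    Returns the label with the largest summed weight; ties go to the
    alphabetically first label so the result is deterministic.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per module")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Iterate labels in sorted order so max() breaks ties deterministically.
    return max(sorted(totals), key=lambda label: totals[label])


# Four hypothetical module outputs for one protein.
print(weighted_vote(["nucleus", "cytoplasm", "nucleus", "mitochondrion"],
                    [0.80, 0.85, 0.70, 0.75]))  # prints nucleus
```

    Here two weaker modules agreeing on "nucleus" (0.80 + 0.70) outvote the single strongest module, which is the behaviour that distinguishes weighted voting from simply trusting the best module.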

  16. Clinical, Epidemiologic, Histopathologic and Molecular Features of an Unexplained Dermopathy

    PubMed Central

    Pearson, Michele L.; Selby, Joseph V.; Katz, Kenneth A.; Cantrell, Virginia; Braden, Christopher R.; Parise, Monica E.; Paddock, Christopher D.; Lewin-Smith, Michael R.; Kalasinsky, Victor F.; Goldstein, Felicia C.; Hightower, Allen W.; Papier, Arthur; Lewis, Brian; Motipara, Sarita; Eberhard, Mark L.

    2012-01-01

    Background Morgellons is a poorly characterized constellation of symptoms, with the primary manifestations involving the skin. We conducted an investigation of this unexplained dermopathy to characterize the clinical and epidemiologic features and explore potential etiologies. Methods A descriptive study was conducted among persons at least 13 years of age and enrolled in Kaiser Permanente Northern California (KPNC) during 2006–2008. A case was defined as the self-reported emergence of fibers or materials from the skin accompanied by skin lesions and/or disturbing skin sensations. We collected detailed epidemiologic data, performed clinical evaluations and geospatial analyses and analyzed materials collected from participants' skin. Results We identified 115 case-patients. The prevalence was 3.65 (95% CI = 2.98, 4.40) cases per 100,000 enrollees. There was no clustering of cases within the 13-county KPNC catchment area (p = .113). Case-patients had a median age of 52 years (range: 17–93) and were primarily female (77%) and Caucasian (77%). Multi-system complaints were common; 70% reported chronic fatigue and 54% rated their overall health as fair or poor with mean Physical Component Scores and Mental Component Scores of 36.63 (SD = 12.9) and 35.45 (SD = 12.89), respectively. Cognitive deficits were detected in 59% of case-patients and 63% had evidence of clinically significant somatic complaints; 50% had drugs detected in hair samples and 78% reported exposure to solvents. Solar elastosis was the most common histopathologic abnormality (51% of biopsies); skin lesions were most consistent with arthropod bites or chronic excoriations. No parasites or mycobacteria were detected. Most materials collected from participants' skin were composed of cellulose, likely of cotton origin. Conclusions This unexplained dermopathy was rare among this population of Northern California residents, but associated with significantly reduced health-related quality of

  17. Bankruptcy prediction using SVM models with a new approach to combine features selection and parameter optimisation

    NASA Astrophysics Data System (ADS)

    Zhou, Ligang; Keung Lai, Kin; Yen, Jerome

    2014-03-01

    Due to the economic significance of bankruptcy prediction for financial institutions, investors and governments, many quantitative methods have been used to develop effective prediction models. Support vector machine (SVM), a powerful classification method, has been used for this task; however, the performance of SVM is sensitive to model form, parameter settings and feature selection. In this study, a new approach based on direct search and feature ranking is proposed to optimise feature selection and parameter settings for 1-norm and least-squares SVM models for bankruptcy prediction. This approach is also compared with SVM models whose parameters and features are optimised by the popular genetic algorithm technique. Experimental results on a data set with 2010 instances show that the proposed models are good alternatives for bankruptcy prediction.

  18. Structural class prediction of protein using novel feature extraction method from chaos game representation of predicted secondary structure.

    PubMed

    Zhang, Lichao; Kong, Liang; Han, Xiaodong; Lv, Jinfeng

    2016-07-01

    Protein structural class prediction plays an important role in protein structure and function analysis, drug design and many other biological applications. Extracting a good representation from the protein sequence is fundamental for this prediction task. In recent years, although several secondary-structure-based feature extraction strategies have been proposed specifically for low-similarity protein sequences, prediction accuracy remains limited. To explore the potential of secondary structure information, this study proposes a novel feature extraction method based on the chaos game representation of predicted secondary structure, which mainly captures sequence-order information and the distribution of secondary structure segments in a given protein sequence. Several kinds of prediction accuracy obtained by the jackknife test are reported on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640). Compared with state-of-the-art prediction methods, the proposed method achieves the highest overall accuracy on all three datasets. The experimental results confirm that the proposed feature extraction method is effective for accurate prediction of protein structural class. Moreover, the proposed method could be extended to other graphical representations of protein sequences and may be helpful in future research. PMID:27084358
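
    A chaos game representation (CGR) maps a symbol sequence to a sequence of points by repeatedly moving halfway toward the vertex assigned to the current symbol. The sketch below applies that generic rule to a predicted three-state secondary-structure string (H, E, C). The triangle layout and the function name are assumptions, and the authors' actual feature extraction from the CGR points is not described in the abstract, so it is not reproduced here.

```python
import math

# Hypothetical layout: one corner of an equilateral triangle per
# three-state secondary-structure symbol (H = helix, E = strand, C = coil).
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def chaos_game_points(ss):
    """Return one CGR point per symbol of a secondary-structure string.

    Starting from the triangle's centroid, each symbol moves the current
    point halfway toward that symbol's vertex, so every point encodes the
    preceding symbols (recent ones most strongly).
    """
    x, y = 0.5, math.sqrt(3) / 6  # centroid of the triangle
    points = []
    for symbol in ss:
        vx, vy = VERTICES[symbol]
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points


points = chaos_game_points("CCHHHHHCCEEEECC")
print(len(points))  # one point per residue: 15
```

    Because each point depends on the whole preceding path, runs of the same state (helix or strand segments) cluster near the corresponding corner, which is how a CGR can expose sequence order and segment distribution.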

  19. Predictive Value of Morphological Features in Patients with Autism versus Normal Controls

    ERIC Educational Resources Information Center

    Ozgen, H.; Hellemann, G. S.; de Jonge, M. V.; Beemer, F. A.; van Engeland, H.

    2013-01-01

    We investigated the predictive power of morphological features in 224 autistic patients and 224 matched-pair controls. To assess the relationship between the morphological features and autism, we used receiver operating characteristic (ROC) curves. In addition, we used recursive partitioning (RP) to determine a specific pattern of abnormalities that is…

  20. Using random forest to classify linear B-cell epitopes based on amino acid properties and molecular features.

    PubMed

    Huang, Jian-Hua; Wen, Ming; Tang, Li-Juan; Xie, Hua-Lin; Fu, Liang; Liang, Yi-Zeng; Lu, Hong-Mei

    2014-08-01

    Identification and characterization of B-cell epitopes in target antigens is one of the key steps in epitope-driven vaccine design, immunodiagnostic tests and antibody production. Experimental determination of epitopes is labor-intensive and expensive. Therefore, there is an urgent need for computational methods that reliably identify B-cell epitopes. In the current study, we proposed a novel peptide feature description method that combines peptide amino acid properties with chemical molecular features. Based on these combined features, a random forest (RF) classifier was adopted to classify B-cell epitopes and non-epitopes. RF is an ensemble method that uses recursive partitioning to generate many trees and aggregates their results; it generally produces highly competitive models. The classification accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC) and area under the curve (AUC) values for the current method were 78.31%, 80.05%, 72.23%, 0.5836 and 0.8800, respectively. These results show that an appropriate combination of peptide amino acid features and chemical molecular features with an RF model can enhance the prediction performance for linear B-cell epitopes. Finally, a free online service is available at http://sysbio.yznu.cn/Research/Epitopesprediction.aspx. PMID:24721579
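
    Four of the five reported figures (accuracy, sensitivity, specificity and MCC) are standard functions of the confusion-matrix counts; AUC additionally needs ranked classifier scores and is omitted. This is a minimal sketch of those definitions; the counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: epitopes predicted as epitope / non-epitope;
    tn/fp: non-epitopes predicted as non-epitope / epitope.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical counts, not taken from the study.
print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
```

    MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the epitope and non-epitope classes are imbalanced.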

  1. Toward Fully in Silico Melting Point Prediction Using Molecular Simulations

    SciTech Connect

    Zhang, Y; Maginn, EJ

    2013-03-01

    Melting point is one of the most fundamental and practically important properties of a compound. Molecular simulation methods have been developed for the accurate computation of melting points. However, all of these methods need an experimental crystal structure as input, which means that such calculations are not truly predictive, since the melting point can be measured easily in experiments once a crystal structure is known. On the other hand, crystal structure prediction (CSP) has become an active field and significant progress has been made, although challenges remain. One of the main challenges is the existence of many crystal structures (polymorphs) that are very close in energy. Thermal effects and kinetic factors complicate the situation further, so that predicting experimental crystal structures is still not trivial. In this work, we exploit the fact that free energy differences between crystal structures are often small. We show that accurate melting point predictions can be made by using a reasonable crystal structure from CSP as the starting point for a free-energy-based melting point calculation. The key is that most crystal structures predicted by CSP have free energies close to that of the experimental structure. The proposed method was tested on two rigid molecules, and the results suggest that a fully in silico melting point prediction method is possible.

  2. Widespread convergence in toxin resistance by predictable molecular evolution

    PubMed Central

    Ujvari, Beata; Casewell, Nicholas R.; Sunagar, Kartik; Arbuckle, Kevin; Wüster, Wolfgang; Lo, Nathan; O’Meally, Denis; Beckmann, Christa; King, Glenn F.; Deplazes, Evelyne; Madsen, Thomas

    2015-01-01

    The question of whether evolution is unpredictable and stochastic or intermittently constrained along predictable pathways is the subject of a fundamental debate in biology, in which understanding convergent evolution plays a central role. At the molecular level, documented examples of convergence are rare and limited to specific taxonomic groups. Here we provide evidence of constrained convergent molecular evolution across the metazoan tree of life. We show that resistance to toxic cardiac glycosides produced by plants and bufonid toads is mediated by similar molecular changes to the sodium-potassium pump (Na+/K+-ATPase) in insects, amphibians, reptiles, and mammals. In toad-feeding reptiles, resistance is conferred by two point mutations that have evolved convergently on four occasions, whereas evidence of a molecular reversal back to the susceptible state in varanid lizards migrating to toad-free areas suggests that toxin resistance is maladaptive in the absence of selection. Importantly, resistance in all taxa is mediated by replacements of 2 of the 12 amino acids comprising the Na+/K+-ATPase H1–H2 extracellular domain that constitutes a core part of the cardiac glycoside binding site. We provide mechanistic insight into the basis of resistance by showing that these alterations perturb the interaction between the cardiac glycoside bufalin and the Na+/K+-ATPase. Thus, similar selection pressures have resulted in convergent evolution of the same molecular solution across the breadth of the animal kingdom, demonstrating how a scarcity of possible solutions to a selective challenge can lead to highly predictable evolutionary responses. PMID:26372961

  3. Selecting radiomic features from FDG-PET images for cancer treatment outcome prediction.

    PubMed

    Lian, Chunfeng; Ruan, Su; Denœux, Thierry; Jardin, Fabrice; Vera, Pierre

    2016-08-01

    Accurately predicting treatment outcome is a vital task in cancer therapy and is valuable for tailoring and adapting treatment planning. To this end, multiple sources of information (radiomics, clinical characteristics, genomic expression, etc.) gathered before and during treatment are potentially useful. In this paper, we propose such a prediction system, primarily using radiomic features (e.g., texture features) extracted from FDG-PET images. The proposed system includes a feature selection method based on Dempster-Shafer theory, a powerful tool for dealing with uncertain and imprecise information. It aims to improve prediction accuracy and to reduce the imprecision and overlap between different classes (treatment outcomes) in the selected feature subspace. Because training samples are often small and imbalanced in our applications, a data balancing procedure and specified prior knowledge are taken into account to improve the reliability of the selected feature subsets. Finally, the Evidential K-NN (EK-NN) classifier is applied to the selected features to output prediction results. Our prediction system has been evaluated on synthetic and clinical datasets, consistently showing good performance. PMID:27236221
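
    The abstract names the EK-NN classifier but not its settings. The following is a simplified sketch of the evidential k-NN rule as it is usually formulated: each of the k nearest neighbours supports its own class with a mass that decays with squared distance, the neighbours' mass functions are pooled with Dempster's rule, and the class with the highest pignistic probability is chosen. The parameters alpha and gamma, the toy data and all function names are illustrative assumptions; the authors' Dempster-Shafer feature selection and data balancing steps are not reproduced.

```python
import math
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule for mass functions given as {frozenset: mass}."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def ek_nn(train, query, k=3, alpha=0.95, gamma=1.0):
    """Evidential k-NN: each neighbour is a simple support for its class.

    train: list of (feature_vector, label) with at least two classes;
    alpha and gamma are illustrative values, not tuned parameters.
    Returns (mass function, frame of discernment).
    """
    frame = frozenset(label for _, label in train)
    if len(frame) < 2:
        raise ValueError("need at least two classes")
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    mass = {frame: 1.0}  # vacuous mass function: total ignorance
    for x, label in neighbours:
        support = alpha * math.exp(-gamma * math.dist(x, query) ** 2)
        evidence = {frozenset([label]): support, frame: 1.0 - support}
        mass = dempster_combine(mass, evidence)
    return mass, frame


def decide(mass, frame):
    """Pick the class with the highest pignistic probability."""
    # Focal sets here are singletons and the whole frame, so BetP is simple.
    betp = {c: mass.get(frozenset([c]), 0.0) + mass.get(frame, 0.0) / len(frame)
            for c in frame}
    return max(sorted(betp), key=betp.get)


train = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((5.0, 5.0), "B"), ((5.1, 5.0), "B")]
print(decide(*ek_nn(train, (0.05, 0.0))))
```

    Unlike a plain majority vote, the mass left on the whole frame records how much the neighbours leave undecided, so a query far from every training sample ends up close to total ignorance rather than being forced into a confident class.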

  4. Feature maps driven no-reference image quality prediction of authentically distorted images

    NASA Astrophysics Data System (ADS)

    Ghadiyaram, Deepti; Bovik, Alan C.

    2015-03-01

    Current blind image quality prediction models rely on benchmark databases composed of singly and synthetically distorted images, thereby learning image features that are adequate only for predicting human-perceived visual quality on such inauthentic distortions. However, real-world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures on an image's perceptual quality and considering only the dominant distortion, or b) using features that have been proven efficient only for singly distorted images, we study in depth the natural scene statistics of authentically distorted images in different color spaces and transform domains. We propose a feature-maps-driven statistical approach that avoids any latent assumptions about the type of distortion(s) contained in an image and focuses instead on modeling the remarkable consistencies in the scene statistics of real-world images in the absence of distortions. We design a deep belief network that takes as input model-based statistical image features derived from a very large database of authentically distorted images and discovers good feature representations by generalizing over different distortion types, mixtures, and severities; these representations are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database, and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state of the art.

  5. Prediction of Conversion from Mild Cognitive Impairment to Alzheimer's Disease Using MRI and Structural Network Features.

    PubMed

    Wei, Rizhen; Li, Chuhan; Fogelson, Noa; Li, Ling

    2016-01-01

    Optimized magnetic resonance imaging (MRI) features and abnormalities of brain network architectures may allow earlier detection and accurate prediction of the progression from mild cognitive impairment (MCI) to Alzheimer's disease (AD). In this study, we proposed a classification framework to distinguish MCI converters (MCIc) from MCI non-converters (MCInc) by using a combination of FreeSurfer-derived MRI features and nodal features derived from the thickness network. At the feature selection step, we first employed sparse linear regression with stability selection, for the selection of discriminative features in the iterative combinations of MRI and network measures. Subsequently the top K features of available combinations were selected as optimal features for classification. To obtain unbiased results, support vector machine (SVM) classifiers with nested cross validation were used for classification. The combination of 10 features including those from MRI and network measures attained accuracies of 66.04, 76.39, 74.66, and 73.91% for mixed conversion time, 6, 12, and 18 months before diagnosis of probable AD, respectively. Analysis of the diagnostic power of different time periods before diagnosis of probable AD showed that short-term prediction (6 and 12 months) achieved more stable and higher AUC scores compared with long-term prediction (18 months), with K-values from 1 to 30. The present results suggest that meaningful predictors composed of MRI and network measures may offer the possibility for early detection of progression from MCI to AD. PMID:27148045
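
    The abstract reports SVM classifiers evaluated with nested cross validation while choosing the top K features. The sketch below shows only the nesting logic: an inner loop picks a candidate (for instance a value of K) using the outer training indices alone, and the outer loop scores that choice on held-out data. The score callback, candidate list and function names are hypothetical, the SVM itself is not included, and folds are contiguous without shuffling or stratification for simplicity.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_cv(n, candidates, score, outer_k=5, inner_k=3):
    """Nested cross-validation over sample indices 0..n-1.

    score(train_idx, test_idx, candidate) -> float (higher is better) is a
    hypothetical callback that would fit a model (e.g. an SVM on the top-K
    features, with K = candidate) on train_idx and evaluate it on test_idx.
    Returns (chosen candidate, outer-fold score) for each outer fold.
    """
    results = []
    for test_idx in kfold_indices(n, outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]

        def inner_mean(candidate):
            # The inner loop only ever sees the outer training indices.
            scores = []
            for fold in kfold_indices(len(train_idx), inner_k):
                val_idx = [train_idx[j] for j in fold]
                val_set = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val_set]
                scores.append(score(fit_idx, val_idx, candidate))
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_mean)
        results.append((best, score(train_idx, test_idx, best)))
    return results
```

    Keeping the choice of K inside the inner loop is what makes the outer scores unbiased: the held-out fold never influences which candidate is selected.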

  6. Prediction of Conversion from Mild Cognitive Impairment to Alzheimer's Disease Using MRI and Structural Network Features

    PubMed Central

    Wei, Rizhen; Li, Chuhan; Fogelson, Noa; Li, Ling

    2016-01-01

    Optimized magnetic resonance imaging (MRI) features and abnormalities of brain network architectures may allow earlier detection and accurate prediction of the progression from mild cognitive impairment (MCI) to Alzheimer's disease (AD). In this study, we proposed a classification framework to distinguish MCI converters (MCIc) from MCI non-converters (MCInc) by using a combination of FreeSurfer-derived MRI features and nodal features derived from the thickness network. At the feature selection step, we first employed sparse linear regression with stability selection, for the selection of discriminative features in the iterative combinations of MRI and network measures. Subsequently the top K features of available combinations were selected as optimal features for classification. To obtain unbiased results, support vector machine (SVM) classifiers with nested cross validation were used for classification. The combination of 10 features including those from MRI and network measures attained accuracies of 66.04, 76.39, 74.66, and 73.91% for mixed conversion time, 6, 12, and 18 months before diagnosis of probable AD, respectively. Analysis of the diagnostic power of different time periods before diagnosis of probable AD showed that short-term prediction (6 and 12 months) achieved more stable and higher AUC scores compared with long-term prediction (18 months), with K-values from 1 to 30. The present results suggest that meaningful predictors composed of MRI and network measures may offer the possibility for early detection of progression from MCI to AD. PMID:27148045

  7. Prediction of structural features and application to outer membrane protein identification

    NASA Astrophysics Data System (ADS)

    Yan, Renxiang; Wang, Xiaofeng; Huang, Lanqing; Yan, Feidi; Xue, Xiaoyu; Cai, Weiwen

    2015-06-01

    Protein three-dimensional (3D) structures provide insightful information in many fields of biology. One-dimensional properties derived from 3D structures, such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles, are helpful for protein function prediction, fold recognition and ab initio folding. Here, we predict various structural features using neural network learning. On an independent test dataset, protein secondary structure prediction achieves an overall Q3 accuracy of ~80%. Meanwhile, relative solvent accessibility prediction achieves a mean absolute error of 0.164, and residue depth prediction a mean absolute error of 0.062. We further improve outer membrane protein identification by including the predicted structural features in a scoring function using a simple profile-to-profile alignment. The results demonstrate that the accuracy of outer membrane protein identification can be improved by ~3% at a 1% false positive level when structural features are incorporated. Finally, our methods are available as two convenient, easy-to-use programs: PSSM-2-Features, for predicting secondary structure, relative solvent accessibility, residue depth and backbone torsion angles, and PPA-OMP, for identifying outer membrane proteins from proteomes.
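
    Q3 and mean absolute error are the two kinds of figures quoted above. This is a minimal sketch of both, assuming secondary structure has already been reduced to three states (H, E, C) and that Q3 is pooled over residues; the abstract does not state the reduction scheme or the pooling, and the example strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label (H/E/C) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def mean_absolute_error(predicted, observed):
    """Mean absolute error, e.g. for relative solvent accessibility in [0, 1]."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


print(q3_accuracy("HHHECC", "HHEECC"))  # 5 of 6 residues correct
```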

  8. Clinicopathological and molecular features of malignant optic pathway glioma in an adult.

    PubMed

    Nagaishi, Masaya; Sugiura, Yoshiki; Takano, Issei; Tanaka, Yoshihiro; Suzuki, Kensuke; Yokoo, Hideaki; Hyodo, Akio

    2015-01-01

    Malignant gliomas of the optic pathway are rare, and their genetic alterations are poorly understood. We describe a 64-year-old woman with anaplastic astrocytoma originating from the optic pathway and report its molecular features. She presented with progressive visual field loss, and a biopsy sample was obtained from the lesion in the optic chiasm. She underwent radiosurgery with concomitant temozolomide chemotherapy and subsequently remained stable for 10 months after initial presentation. Molecular analysis indicated that the mass may share common molecular genetic features with conventional primary astrocytic gliomas but not with pilocytic gliomas, which supported the morphologic diagnosis of anaplastic astrocytoma. Molecular analysis of malignant optic pathway gliomas in adults is useful for distinguishing between high-grade gliomas and anaplastic pilocytic astrocytomas, and for determining further therapy. PMID:25150758

  9. Acinar Cell Carcinoma of the Pancreas: Overview of Clinicopathologic Features and Insights into the Molecular Pathology

    PubMed Central

    La Rosa, Stefano; Sessa, Fausto; Capella, Carlo

    2015-01-01

    Acinar cell carcinomas (ACCs) of the pancreas are rare pancreatic neoplasms accounting for about 1–2% of pancreatic tumors in adults and about 15% in pediatric subjects. They show different clinical symptoms at presentation, different morphological features, different outcomes, and different molecular alterations. This heterogeneous clinicopathological spectrum may give rise to difficulties in the clinical and pathological diagnosis with consequential therapeutic and prognostic implications. The molecular mechanisms involved in the onset and progression of ACCs are still not completely understood, although in recent years, several attempts have been made to clarify the molecular mechanisms involved in ACC biology. In this paper, we will review the main clinicopathological and molecular features of pancreatic ACCs of both adult and pediatric subjects to give the reader a comprehensive overview of this rare tumor type. PMID:26137463

  10. In silico prediction of major drug clearance pathways by support vector machines with feature-selected descriptors.

    PubMed

    Toshimoto, Kouta; Wakayama, Naomi; Kusama, Makiko; Maeda, Kazuya; Sugiyama, Yuichi; Akiyama, Yutaka

    2014-11-01

    We have previously established an in silico classification method ("CPathPred") to predict the major clearance pathways of drugs based on an empirical decision with only four physicochemical descriptors (charge, molecular weight, octanol-water distribution coefficient, and protein unbound fraction in plasma) using a rectangular method. In this study, we attempted to improve the prediction performance of the method by introducing a support vector machine (SVM) and increasing the number of descriptors. The data set consisted of 141 approved drugs whose major clearance pathways were classified into metabolism by CYP3A4, CYP2C9, or CYP2D6; organic anion transporting polypeptide-mediated hepatic uptake; or renal excretion. With the same four default descriptors as used in CPathPred, the SVM-based predictor (named "default descriptor SVM") showed higher prediction performance than the rectangular-based predictor, as judged by 10-fold cross-validation. Two further SVM-based predictors were established by adding descriptors as follows: 1) 881 descriptors predicted in silico from the chemical structures of drugs in addition to the 4 default descriptors ("885 descriptor SVM"); and 2) descriptors chosen by a feature selection based on a greedy algorithm together with the default descriptors ("feature selection SVM"). The prediction accuracies of the rectangular-based predictor, default descriptor SVM, 885 descriptor SVM, and feature selection SVM were 0.49, 0.60, 0.72, and 0.91, respectively, and the overall precision values for these four methods were 0.72, 0.77, 0.86, and 0.98, respectively. In conclusion, we successfully constructed SVM-based predictors with limited numbers of descriptors to classify the major clearance pathways of drugs in humans with high prediction performance. PMID:25128502
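
    The abstract describes the "feature selection SVM" only as greedy selection on top of the default descriptors. The sketch below shows one common greedy scheme, forward selection that keeps the default descriptors fixed and adds one descriptor at a time while the score improves; that this is the authors' exact scheme is an assumption, and score stands in for a hypothetical evaluation such as cross-validated SVM accuracy.

```python
def greedy_forward_selection(base, pool, score):
    """Greedy forward selection starting from a fixed base descriptor set.

    score(list_of_descriptors) -> float (higher is better) is a hypothetical
    callback, e.g. cross-validated SVM accuracy. At each step the single
    descriptor that most improves the score is added; selection stops when
    no remaining descriptor improves it.
    """
    selected = list(base)
    best = score(selected)
    remaining = [d for d in pool if d not in selected]
    while remaining:
        trial = {d: score(selected + [d]) for d in remaining}
        candidate = max(remaining, key=trial.get)
        if trial[candidate] <= best:
            break
        selected.append(candidate)
        best = trial[candidate]
        remaining.remove(candidate)
    return selected, best
```

    Because the base set is never removed, the result always contains the four default descriptors, which matches the abstract's "with default descriptors"; each step costs one score evaluation per remaining descriptor, which is why greedy search scales to pools of hundreds of descriptors.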

  11. Clinical and Molecular Cytogenetic Characterisation of Children with Developmental Delay and Dysmorphic Features

    PubMed Central

    Bertok, Sara; Žerjav Tanšek, Mojca; Kotnik, Primož; Battelino, Tadej; Volk, Marija; Pecile, Vanna; Cleva, Lisa; Gasparini, Paolo; Kovač, Jernej; Hovnik, Tinka

    2015-01-01

    Introduction Developmental delay and dysmorphic features affect 1-3% of the paediatric population. In the last few years, high-resolution molecular cytogenetic techniques (comparative genomic hybridization arrays and single-nucleotide polymorphism arrays) have proven to be a first-tier choice for the clinical diagnosis of developmental delay and dysmorphic features. Methods and results In the present article we describe the clinical advantages of the molecular cytogenetic approach (comparative genomic hybridization arrays and single nucleotide polymorphism arrays) in the diagnostic procedure for two children with developmental delay, dysmorphic features and additional morphological phenotypes. Additionally, we demonstrate the need for fluorescence in situ hybridization to identify the localisation and underlying mechanism of the detected chromosomal rearrangement. Conclusions Two types of chromosomal abnormalities were identified and confirmed using different molecular genetic approaches. Comparative genomic hybridization arrays and single nucleotide polymorphism arrays are presented here as important methods for identifying chromosomal imbalances in patients with developmental delay and dysmorphic features. We emphasize the importance of molecular genetic testing of the patients' parents to demonstrate the origin and clinical importance of the aberrations previously identified in the patients. Results obtained using high-resolution molecular cytogenetic techniques are the cornerstone of proper genetic counselling for the affected families.

  12. Modified Logistic Regression Models Using Gene Coexpression and Clinical Features to Predict Prostate Cancer Progression

    PubMed Central

    Zhao, Hongya; Logothetis, Christopher J.; Gorlov, Ivan P.; Zeng, Jia; Dai, Jianguo

    2013-01-01

    Predicting disease progression is one of the most challenging problems in prostate cancer research. Adding gene expression data to prediction models based on clinical features has been proposed as a way to improve accuracy. In the current study, we applied a logistic regression (LR) model combining clinical features and gene co-expression data to improve the accuracy of predicting prostate cancer progression. The top-scoring pair (TSP) method was used to select genes for the model. The proposed models not only preserved the basic properties of the TSP algorithm but also incorporated clinical features into the prognostic models. Based on statistical inference with iterative cross-validation, we demonstrated that LR prediction models including genes selected by the TSP method predicted prostate cancer progression better than models using clinical variables only and/or models including genes selected by the one-gene-at-a-time approach. Thus, we conclude that TSP selection is a useful tool for feature (and/or gene) selection in prognostic models, and our model provides an alternative for predicting prostate cancer progression. PMID:24367394
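
A minimal sketch of the classic top-scoring-pair score, which ranks a gene pair (i, j) by how differently the ordering X_i < X_j behaves in the two outcome classes. This is an illustration, not the authors' code: the function names, the 0/1 label coding, and the idea of appending the pair indicator to clinical covariates for the LR model are assumptions.

```python
from itertools import combinations


def tsp_score(X, y, i, j):
    """Delta_ij = |P(X_i < X_j | y = 0) - P(X_i < X_j | y = 1)|.

    X is a list of samples (each a list of gene expression values);
    y holds 0/1 outcome labels (e.g. no progression / progression).
    """
    less = {0: 0, 1: 0}   # samples per class in which gene i is below gene j
    total = {0: 0, 1: 0}  # samples per class
    for row, label in zip(X, y):
        total[label] += 1
        if row[i] < row[j]:
            less[label] += 1
    return abs(less[0] / total[0] - less[1] / total[1])


def top_scoring_pair(X, y):
    """Return the gene pair with the largest TSP score (the first pair wins ties)."""
    n_genes = len(X[0])
    best = max(combinations(range(n_genes), 2), key=lambda ij: tsp_score(X, y, *ij))
    return best, tsp_score(X, y, *best)


def pair_indicator(row, pair):
    """Binary TSP feature (1 if X_i < X_j) that could sit beside clinical covariates."""
    i, j = pair
    return 1 if row[i] < row[j] else 0
```

For example, with `X = [[1, 5, 3], [2, 6, 1], [7, 1, 2], [8, 2, 9]]` and `y = [0, 0, 1, 1]`, `top_scoring_pair(X, y)` returns `((0, 1), 1.0)`. A hypothetical LR design row would then be the clinical covariates plus `pair_indicator(row, pair)`.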

  13. Cellular automata with object-oriented features for parallel molecular network modeling.

    PubMed

    Zhu, Hao; Wu, Yinghui; Huang, Sui; Sun, Yan; Dhar, Pawan

    2005-06-01

    Cellular automata are an important modeling paradigm for studying the dynamics of large, parallel systems composed of multiple, interacting components. However, to model biological systems, cellular automata need to be extended beyond large-scale parallelism and intensive communication in order to capture two fundamental properties of complex biological systems: hierarchy and heterogeneity. This paper proposes extensions to a cellular automata language, Cellang, to meet this purpose. The extended language, with object-oriented features, can be used to describe the structure and activity of parallel molecular networks within cells. Capabilities of this new programming language include an object structure to define molecular programs within a cell, a floating-point data type and mathematical functions for quantitative computation, message passing to describe molecular interactions, and new operators, statements, and built-in functions. We discuss relevant programming issues of these features, including the object-oriented description of molecular interactions with molecule encapsulation, message passing, and the description of heterogeneity and anisotropy at the cell and molecule levels. By enabling the integration of modeling at the molecular level with system behavior at the cell, tissue, organ, or even organism level, the program will help improve our understanding of how complex and dynamic biological activities are generated and controlled by the parallel functioning of molecular networks. Index Terms: cellular automata, modeling, molecular network, object-oriented. PMID:16117022
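
The abstract describes cells as objects that encapsulate floating-point molecule levels and interact through message passing. The Python sketch below imitates that idea on a one-dimensional ring of cells. It is not Cellang syntax; the `Cell` class, the synchronous `step` update and the fixed exchange rate are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A cell object encapsulating named molecule levels (floats)."""
    molecules: dict
    inbox: list = field(default_factory=list)

    def send(self, neighbor, name, amount):
        # Message passing: move part of a molecule pool into a neighbor's inbox.
        self.molecules[name] -= amount
        neighbor.inbox.append((name, amount))

    def receive(self):
        # Apply all pending messages, then clear the inbox.
        for name, amount in self.inbox:
            self.molecules[name] = self.molecules.get(name, 0.0) + amount
        self.inbox.clear()


def step(cells, name, rate):
    """One synchronous update on a ring: each cell sends `rate` of its pool to each neighbor."""
    n = len(cells)
    # Outgoing amounts are computed from the pre-step state of every cell.
    outgoing = [cell.molecules[name] * rate for cell in cells]
    for k, cell in enumerate(cells):
        for neighbor in (cells[(k - 1) % n], cells[(k + 1) % n]):
            cell.send(neighbor, name, outgoing[k])
    for cell in cells:
        cell.receive()
```

Heterogeneity could be modeled by giving each cell its own rate or molecule set; the total amount of each molecule is conserved by this update.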

  14. MINT: Mutual Information Based Transductive Feature Selection for Genetic Trait Prediction.

    PubMed

    He, Dan; Rish, Irina; Haws, David; Parida, Laxmi

    2016-01-01

    Whole-genome prediction of complex phenotypic traits using high-density genotyping arrays has attracted much attention, as it is relevant to plant and animal breeding and to genetic epidemiology. Because the number of genotype features is generally much larger than the number of samples, predictive models suffer from the curse of dimensionality. The curse of dimensionality not only affects the computational efficiency of a given genomic selection method but can also lead to poor performance, mainly due to possible overfitting or uninformative features. In this work, we propose a novel transductive feature selection method, called MINT, which is based on the MRMR (Max-Relevance and Min-Redundancy) criterion. We apply MINT to genetic trait prediction problems and show that, in general, MINT is a better feature selection method than the state-of-the-art inductive method MRMR. PMID:27295642
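
MINT builds on the MRMR criterion, which greedily adds the feature that maximizes its relevance to the trait, I(f; y), minus its mean redundancy with the features already selected. The Python sketch below shows only that standard inductive MRMR step on discrete (e.g. genotype-coded) features. The transductive part that distinguishes MINT is not reproduced, and all function names are illustrative.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # Sum of p(x, y) * log(p(x, y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def mrmr_score(j, features, y, selected):
    """Relevance to the trait minus mean redundancy with already selected features."""
    relevance = mutual_information(features[j], y)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
    return relevance - redundancy / len(selected)


def mrmr(features, y, k):
    """Greedy MRMR: return k feature indices (exact ties go to the lowest index)."""
    remaining = list(range(len(features)))
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: mrmr_score(j, features, y, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy case where feature 1 is an exact copy of feature 0, MRMR skips the redundant copy and picks a complementary feature instead.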

  15. Adaptive reliance on the most stable sensory predictions enhances perceptual feature extraction of moving stimuli

    PubMed Central

    Kumar, Neeraj

    2016-01-01

    The prediction of the sensory outcomes of action is thought to be useful for distinguishing self- vs. externally generated sensations, correcting movements when sensory feedback is delayed, and learning predictive models for motor behavior. Here, we show that aspects of another fundamental function—perception—are enhanced when they entail the contribution of predicted sensory outcomes and that this enhancement relies on the adaptive use of the most stable predictions available. We combined a motor-learning paradigm that imposes new sensory predictions with a dynamic visual search task to first show that perceptual feature extraction of a moving stimulus is poorer when it is based on sensory feedback that is misaligned with those predictions. This was possible because our novel experimental design allowed us to override the “natural” sensory predictions present when any action is performed and separately examine the influence of these two sources on perceptual feature extraction. We then show that if the new predictions induced via motor learning are unreliable, rather than just relying on sensory information for perceptual judgments, as is conventionally thought, then subjects adaptively transition to using other stable sensory predictions to maintain greater accuracy in their perceptual judgments. Finally, we show that when sensory predictions are not modified at all, these judgments are sharper when subjects combine their natural predictions with sensory feedback. Collectively, our results highlight the crucial contribution of sensory predictions to perception and also suggest that the brain intelligently integrates the most stable predictions available with sensory information to maintain high fidelity in perceptual decisions. PMID:26823516

  16. Adaptive reliance on the most stable sensory predictions enhances perceptual feature extraction of moving stimuli.

    PubMed

    Kumar, Neeraj; Mutha, Pratik K

    2016-03-01

    The prediction of the sensory outcomes of action is thought to be useful for distinguishing self- vs. externally generated sensations, correcting movements when sensory feedback is delayed, and learning predictive models for motor behavior. Here, we show that aspects of another fundamental function, perception, are enhanced when they entail the contribution of predicted sensory outcomes and that this enhancement relies on the adaptive use of the most stable predictions available. We combined a motor-learning paradigm that imposes new sensory predictions with a dynamic visual search task to first show that perceptual feature extraction of a moving stimulus is poorer when it is based on sensory feedback that is misaligned with those predictions. This was possible because our novel experimental design allowed us to override the "natural" sensory predictions present when any action is performed and separately examine the influence of these two sources on perceptual feature extraction. We then show that if the new predictions induced via motor learning are unreliable, rather than just relying on sensory information for perceptual judgments, as is conventionally thought, then subjects adaptively transition to using other stable sensory predictions to maintain greater accuracy in their perceptual judgments. Finally, we show that when sensory predictions are not modified at all, these judgments are sharper when subjects combine their natural predictions with sensory feedback. Collectively, our results highlight the crucial contribution of sensory predictions to perception and also suggest that the brain intelligently integrates the most stable predictions available with sensory information to maintain high fidelity in perceptual decisions. PMID:26823516

  17. Scoring multiple features to predict drug disease associations using information fusion and aggregation.

    PubMed

    Moghadam, H; Rahgozar, M; Gharaghani, S

    2016-08-01

    Prediction of drug-disease associations is a current area of drug repositioning research that has become a challenging topic in pharmaceutical science. Several available computational methods use network-based and machine learning approaches to reposition old drugs for new indications. However, they often ignore features of drugs and diseases, as well as the priority and importance of each feature, the relations or interactions between features, and the degree of uncertainty. When predicting unknown drug-disease interactions, diverse data sources and multiple features are available that can provide more accurate and reliable results. This information can be collectively mined using data fusion methods and aggregation operators, so feature fusion can be used to build high-level features. We have proposed a computational method named scored mean kernel fusion (SMKF), which uses a new method to score the average aggregation operator, called the scored mean. To predict novel drug indications, this method systematically combines multiple features related to drugs or diseases at two levels: the drug-drug level and the drug-disease level. The purpose of this study was to investigate the effect of drug and disease features, as well as data fusion, on predicting drug-disease interactions. The method was validated against a well-established drug-disease gold-standard dataset. Compared with the available methods, our proposed method performed better, achieving an area under the curve (AUC) of 0.91, an F-measure of 84.9% and a Matthews correlation coefficient of 70.31%. PMID:27455069
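
The abstract does not say how SMKF computes its scores, so the following Python sketch only illustrates the general idea of fusing several drug-drug (or disease-disease) similarity kernels by a score-weighted mean. The per-kernel scores are assumed to be given nonnegative numbers, and the function name `scored_mean_fusion` is hypothetical.

```python
def scored_mean_fusion(kernels, scores):
    """Fuse square similarity matrices by a score-weighted mean.

    kernels: list of n x n similarity matrices (lists of lists) over the same items.
    scores:  one nonnegative score per kernel (how scores are derived is assumed).
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    weights = [s / total for s in scores]  # normalize scores into mean weights
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]
```

With an identity kernel scored 3 and an all-ones kernel scored 1, the fused off-diagonal similarity is 0.25 and the diagonal stays 1.0.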

  18. MRI signal and texture features for the prediction of MCI to Alzheimer's disease progression

    NASA Astrophysics Data System (ADS)

    Martínez-Torteya, Antonio; Rodríguez-Rojas, Juan; Celaya-Padilla, José M.; Galván-Tejada, Jorge I.; Treviño, Victor; Tamez-Peña, José G.

    2014-03-01

    An early diagnosis of Alzheimer's disease (AD) confers many benefits. Several biomarkers from different information modalities have been proposed for predicting progression from mild cognitive impairment (MCI) to AD, and features extracted from MRI have played an important role. However, studies have focused almost exclusively on the morphological characteristics of the images. This study aims to determine whether features relating to the signal and texture of the image could add predictive power. Baseline clinical, biological and PET information, and MP-RAGE images for 62 subjects from the Alzheimer's Disease Neuroimaging Initiative were used in this study. Images were divided into 83 regions, and 50 features were extracted from each region. A multimodal database was constructed, and a feature selection algorithm was used to obtain an accurate and small logistic regression model, which achieved a cross-validation accuracy of 0.96. This model included six features: five obtained from the MP-RAGE image and one obtained from genotyping. A risk analysis divided the subjects into low-risk and high-risk groups according to a prognostic index, showing that the two groups are statistically different (p-value of 2.04e-11). The results demonstrate that MRI features related to both signal and texture add predictive power for MCI-to-AD progression, and support the idea that multimodal biomarkers outperform single-modality biomarkers.
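
One common way to derive a prognostic index from a logistic regression model is to take its linear predictor and split subjects at a cutoff. The abstract states neither the cutoff nor the exact index, so this Python sketch assumes the linear predictor and a median split; the coefficients would come from the fitted six-feature model and are not given here.

```python
import math
import statistics


def prognostic_index(x, coefs, intercept):
    """Linear predictor (log-odds) of a fitted logistic regression model."""
    return intercept + sum(b * v for b, v in zip(coefs, x))


def progression_probability(x, coefs, intercept):
    """Logistic transform of the prognostic index."""
    return 1.0 / (1.0 + math.exp(-prognostic_index(x, coefs, intercept)))


def risk_groups(subjects, coefs, intercept):
    """Label each subject 'high' or 'low' risk by a median split (an assumed cutoff)."""
    indices = [prognostic_index(x, coefs, intercept) for x in subjects]
    cutoff = statistics.median(indices)
    return ["high" if pi > cutoff else "low" for pi in indices]
```

Subjects whose index equals the median are placed in the low-risk group under this convention.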

  19. Mem-PHybrid: hybrid features-based prediction system for classifying membrane protein types.

    PubMed

    Hayat, Maqsood; Khan, Asifullah

    2012-05-01

    Membrane proteins are a major class of proteins and are encoded by approximately 20% to 30% of genes in most organisms. In this work, a novel two-layer membrane protein prediction system, called Mem-PHybrid, is proposed. At the first level, it identifies a query protein as a membrane or non-membrane protein; at the second level, it identifies the type of membrane protein. The proposed Mem-PHybrid prediction system is based on hybrid features, formed by fusing physicochemical and split amino acid composition-based features. This enables Mem-PHybrid to exploit the discriminative capabilities of both feature extraction strategies. In addition, minimum redundancy and maximum relevance has been applied to reduce the dimensionality of the feature vector. We employ random forest, evidence-theoretic K-nearest neighbor, and support vector machine (SVM) classifiers and analyze their performance on two datasets. SVM using hybrid features yields the highest accuracy: 89.6% and 97.3% on dataset1 and 91.5% and 95.5% on dataset2 for the jackknife and independent dataset tests, respectively. The enhanced prediction performance of Mem-PHybrid is largely attributed to the discriminative power of the hybrid features and the learning capability of SVM. Mem-PHybrid is accessible at http://www.111.68.99.218/Mem-PHybrid. PMID:22342883
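
Split amino acid composition computes the usual 20-residue composition separately for the N-terminal segment, the middle and the C-terminal segment of a sequence, then concatenates the three vectors. The Python sketch below assumes that three-way split. The segment lengths are illustrative defaults, not Mem-PHybrid's settings, and the physicochemical half of the hybrid features is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each of the 20 standard amino acids in a segment."""
    n = len(segment)
    return [segment.count(aa) / n for aa in AMINO_ACIDS]


def split_aac(seq, n_term=25, c_term=25):
    """Concatenate N-terminal, middle and C-terminal compositions (60 values).

    The default segment lengths are illustrative assumptions.
    """
    seq = seq.upper()
    if len(seq) <= n_term + c_term:
        raise ValueError("sequence too short for the requested split")
    n_part = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    c_part = seq[len(seq) - c_term:]
    return composition(n_part) + composition(middle) + composition(c_part)
```

The resulting 60-value vector is the kind of input that a feature-reduction step such as MRMR and a classifier such as an SVM would then consume.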

  20. Identifying ultrasound and clinical features of breast cancer molecular subtypes by ensemble decision.

    PubMed

    Zhang, Lei; Li, Jing; Xiao, Yun; Cui, Hao; Du, Guoqing; Wang, Ying; Li, Ziyao; Wu, Tong; Li, Xia; Tian, Jiawei

    2015-01-01

    Breast cancer is molecularly heterogeneous and categorized into four molecular subtypes: Luminal-A, Luminal-B, HER2-amplified and Triple-negative. In this study, we aimed to apply an ensemble decision approach to identify the ultrasound and clinical features related to the molecular subtypes. We collected ultrasound and clinical features from 1,000 breast cancer patients and performed immunohistochemistry on these samples. We used the ensemble decision approach to select unique features and to construct decision models. The decision model for Luminal-A subtype was constructed based on the presence of an echogenic halo and post-acoustic shadowing or indifference. The decision model for Luminal-B subtype was constructed based on the absence of an echogenic halo and vascularity. The decision model for HER2-amplified subtype was constructed based on the presence of post-acoustic enhancement, calcification, vascularity and advanced age. The model for Triple-negative subtype followed two rules. One was based on irregular shape, lobulate margin contour, the absence of calcification and hypovascularity, whereas the other was based on oval shape, hypovascularity and micro-lobulate margin contour. The accuracies of the models were 83.8%, 77.4%, 87.9% and 92.7%, respectively. We identified specific features of each molecular subtype and expanded the scope of ultrasound for making diagnoses using these decision models. PMID:26046791
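
The two triple-negative rules listed above can be written as a small predicate. This Python sketch assumes that a lesion matching either rule is assigned to the triple-negative subtype and uses hypothetical string codes for shape and margin. It restates the published rules and is not the authors' ensemble decision software.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    shape: str            # hypothetical codes, e.g. "irregular" or "oval"
    margin: str           # e.g. "lobulate" or "micro-lobulate"
    calcification: bool
    hypovascular: bool


def matches_triple_negative(lesion: Lesion) -> bool:
    """True if either published triple-negative rule holds (OR-combination assumed)."""
    rule_1 = (lesion.shape == "irregular" and lesion.margin == "lobulate"
              and not lesion.calcification and lesion.hypovascular)
    rule_2 = (lesion.shape == "oval" and lesion.margin == "micro-lobulate"
              and lesion.hypovascular)
    return rule_1 or rule_2
```

Note that the second rule does not mention calcification, so an oval, micro-lobulated, hypovascular lesion matches whether or not it is calcified.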

  1. A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction.

    PubMed

    Du, Xiuquan; Cheng, Jiaxing; Zheng, Tingting; Duan, Zheng; Qian, Fulan

    2014-01-01

    Protein-protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing thanks to the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search for the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, it achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy of the DXEC-RF method reaches 80%. When the experiment was extended to larger, more realistic datasets, the method maintained 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp. PMID:25046746
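
The precision, recall and accuracy figures above are ratios of confusion-matrix counts. A minimal Python helper makes the definitions explicit; the counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # share of predicted PPIs that are real
    recall = tp / (tp + fn)                     # share of real PPIs that are recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all pairs classified correctly
    return precision, recall, accuracy
```

For instance, 8 true positives, 2 false positives, 5 true negatives and 5 false negatives give a precision of 0.8, a recall of about 0.615 and an accuracy of 0.65.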

  2. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

    PubMed Central

    Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.

    2016-01-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. Using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors caused antibody diversity to be overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification to tag transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, improving the accuracy of full-length antibody diversity measurements and achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion (the intraclonal diversity index), which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index, along with antibody frequencies and somatic hypermutation, to build a logistic regression model for predicting the immunological status of clones. The model was able to predict clonal status with high confidence, but only when using MAF error- and bias-corrected Ig-seq data. The improved accuracy provided by MAF has the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518

  3. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting.

    PubMed

    Khan, Tarik A; Friedensohn, Simon; Gorter de Vries, Arthur R; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T

    2016-03-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. Using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors caused antibody diversity to be overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification to tag transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, improving the accuracy of full-length antibody diversity measurements and achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion (the intraclonal diversity index), which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index, along with antibody frequencies and somatic hypermutation, to build a logistic regression model for predicting the immunological status of clones. The model was able to predict clonal status with high confidence, but only when using MAF error- and bias-corrected Ig-seq data. The improved accuracy provided by MAF has the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518
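
The core idea behind UID-based correction can be sketched as follows: reads that share a UID are assumed to derive from the same original transcript, so a per-position majority vote across them removes most random PCR and sequencing errors, and counting each UID once removes amplification bias from frequency estimates. This Python sketch assumes equal-length reads within a UID group and ignores the clustering, alignment and PCR-efficiency modelling of the actual MAF pipeline; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_uid(reads):
    """Collapse reads sharing a UID into one per-position majority-vote sequence.

    reads: iterable of (uid, sequence) pairs; reads with the same UID are
    assumed to have equal length (no alignment is performed here).
    """
    groups = defaultdict(list)
    for uid, seq in reads:
        groups[uid].append(seq)
    return {uid: "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))
            for uid, seqs in groups.items()}


def uid_frequencies(consensus):
    """Frequency of each corrected sequence, counting every UID once."""
    counts = Counter(consensus.values())
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()}
```

Because frequencies are computed per UID rather than per read, a transcript that happened to amplify ten times more efficiently still counts once.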

  4. Molecular structures of carotenoids as predicted by MNDO-AM1 molecular orbital calculations

    NASA Astrophysics Data System (ADS)

    Hashimoto, Hideki; Yoda, Takeshi; Kobayashi, Takayoshi; Young, Andrew J.

    2002-02-01

    Semi-empirical molecular orbital calculations using the AM1 Hamiltonian (MNDO-AM1 method) were performed for a number of biologically important carotenoid molecules, namely all-trans-β-carotene, all-trans-zeaxanthin, and all-trans-violaxanthin (found in higher plants and algae), together with all-trans-canthaxanthin, all-trans-astaxanthin, and all-trans-tunaxanthin, in order to predict their stable structures. The molecular structures of all-trans-β-carotene, all-trans-canthaxanthin, and all-trans-astaxanthin predicted by the molecular orbital calculations were compared with those determined by X-ray crystallography. Predicted bond lengths, bond angles, and dihedral angles showed excellent agreement with those determined experimentally, which validated the present theoretical calculations. Comparison of the bond lengths, bond angles and dihedral angles of the most stable conformers among all the carotenoid molecules showed that the displacements are localized around the substituent groups and hence around the cyclohexene rings. In the most stable conformers of all-trans-zeaxanthin and all-trans-violaxanthin, the torsion angle around the C6-C7 bond was ±48.7° and -84.8°, respectively. This difference is a key factor in relation to the biological function of these two carotenoids in plants and algae (the xanthophyll cycle). Further analyses, calculating atomic charges and using enpartment calculations (division of bond energies between component atoms), were performed to ascribe the cause of the different torsion angles observed.
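
The torsion angles quoted above (e.g. around the C6-C7 bond) are dihedral angles defined by four consecutive atoms. Given Cartesian coordinates from an optimized structure, they can be computed with the standard atan2 formula. The Python sketch below is a generic implementation, not part of the MNDO-AM1 software, and the coordinates in the tests are made up.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

An anti (trans) arrangement gives 180°, and a perpendicular arrangement gives ±90° depending on the direction of rotation.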

  5. Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression

    PubMed Central

    Laimighofer, Michael; Krumsiek, Jan; Theis, Fabian J.

    2016-01-01

    With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN. PMID:26894327

  6. Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression.

    PubMed

    Laimighofer, Michael; Krumsiek, Jan; Buettner, Florian; Theis, Fabian J

    2016-04-01

    With widespread availability of omics profiling techniques, the analysis and interpretation of high-dimensional omics data, for example, for biomarkers, is becoming an increasingly important part of clinical medicine because such datasets constitute a promising resource for predicting survival outcomes. However, early experience has shown that biomarkers often generalize poorly. Thus, it is crucial that models are not overfitted and give accurate results with new data. In addition, reliable detection of multivariate biomarkers with high predictive power (feature selection) is of particular interest in clinical settings. We present an approach that addresses both aspects in high-dimensional survival models. Within a nested cross-validation (CV), we fit a survival model, evaluate a dataset in an unbiased fashion, and select features with the best predictive power by applying a weighted combination of CV runs. We evaluate our approach using simulated toy data, as well as three breast cancer datasets, to predict the survival of breast cancer patients after treatment. In all datasets, we achieve more reliable estimation of predictive power for unseen cases and better predictive performance compared to the standard CoxLasso model. Taken together, we present a comprehensive and flexible framework for survival models, including performance estimation, final feature selection, and final model construction. The proposed algorithm is implemented in an open source R package (SurvRank) available on CRAN. PMID:26894327
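
Nested cross-validation keeps model and feature selection inside an inner loop so that the outer test folds stay untouched, which is what makes the outer performance estimate unbiased. The Python sketch below only generates the index splits. The survival model, the feature selection and the weighted combination of CV runs used by SurvRank (an R package) are not reproduced, and the contiguous, unshuffled folds are a simplification.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nested_splits(n, k_outer, k_inner):
    """Yield (outer_train, outer_test, inner_splits); inner splits use only outer_train."""
    for test in kfold_indices(n, k_outer):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        inner = []
        for positions in kfold_indices(len(train), k_inner):
            validation = [train[p] for p in positions]
            validation_set = set(validation)
            inner_train = [i for i in train if i not in validation_set]
            inner.append((inner_train, validation))
        yield train, test, inner
```

In use, hyperparameters and features would be chosen on each `inner` list, the chosen model refit on `train`, and its performance recorded on `test` only.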

  7. Multivariate Feature Selection for Predicting Scour-Related Bridge Damage using a Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Anderson, I.

    2015-12-01

    Scour and hydraulic damage are the most common causes of bridge failure, reportedly responsible for over 60% of bridge failures nationwide. Scour is a complex process and is likely an epistatic function of both bridge and stream conditions, some stationary and some in dynamic flux. Bridge inspections, conducted regularly on bridges nationwide, rate bridge health assuming a static stream condition and typically do not account for dynamically changing geomorphological adjustments. The Vermont Agency of Natural Resources stream geomorphic assessment data could add value to the current bridge inspection and scour design. The 2011 bridge damage from Tropical Storm Irene served as a case study for feature selection to improve bridge scour damage prediction in extreme events. The bridge inspection data (over 200 features on more than 300 damaged and 2,000 non-damaged bridges) and the stream geomorphic assessment data (over 300 features on more than 5,000 stream reaches) constitute "Big Data" and together have the potential to generate large numbers of combined features ("epistatic relationships") that might better predict scour-related bridge damage. The potential combined features pose significant computational challenges for traditional statistical techniques (e.g., multivariate logistic regression). This study uses a genetic algorithm to search the multivariate feature space for epistatic relationships that are indicative of bridge scour damage. The combined features identified could be used to improve bridge scour design and to better monitor and rate bridge scour vulnerability.
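
A genetic algorithm for feature selection typically encodes each candidate feature subset as a bit mask and evolves a population of masks by selection, crossover and mutation. This Python sketch uses truncation selection with elitism, one-point crossover and bit-flip mutation. The parameter values and the toy fitness function, which rewards one hypothetical interacting pair of features, are assumptions for illustration and are not taken from the study.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=30,
                           mutation_rate=0.05, seed=0):
    """Evolve 0/1 feature masks; return the best mask and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elite = ranked[:max(2, pop_size // 5)]        # truncation selection
        children = [mask[:] for mask in elite]         # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_fitness(mask):
    """Hypothetical fitness: reward features 1 and 4 together, penalize subset size."""
    return 3 * (mask[1] & mask[4]) - 0.1 * sum(mask)
```

Because the elite masks are copied unchanged into each new generation, the best fitness never decreases from one generation to the next. In a real application, the fitness would be a cross-validated model score on the bridge and stream features.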

  8. Patient feature based dosimetric Pareto front prediction in esophageal cancer radiotherapy

    SciTech Connect

    Wang, Jiazhou; Zhao, Kuaike; Peng, Jiayuan; Xie, Jiang; Chen, Junchao; Zhang, Zhen; Hu, Weigang; Jin, Xiance; Studenski, Matthew

    2015-02-15

    Purpose: To investigate the feasibility of predicting the dosimetric Pareto front (PF) from patients’ anatomic and dosimetric parameters for esophageal cancer patients. Methods: Eighty patients with esophageal cancer at the authors’ institution were enrolled in this study. A total of 2928 intensity-modulated radiotherapy plans were obtained and used to generate the PF for each patient; on average, each patient had 36.6 plans. Anatomic and dosimetric features were extracted from these plans. The mean lung dose (MLD), mean heart dose (MHD), spinal cord maximum dose, and PTV homogeneity index were recorded for each plan. Principal component analysis was used to extract overlap volume histogram (OVH) features between the PTV and other organs at risk. The full dataset was separated into two parts: a training dataset and a validation dataset. The prediction outcomes were the MHD and MLD. Spearman’s rank correlation coefficient was used to evaluate the correlation between the anatomical and dosimetric features. The stepwise multiple regression method was used to fit the PF, and cross-validation was used to evaluate the model. Results: With 1000 repetitions, the mean prediction error of the MHD was 469 cGy. The most correlated factors were the first principal component of the OVH between heart and PTV and the overlap between heart and PTV along the Z-axis. The mean prediction error of the MLD was 284 cGy. The most correlated factors were the first principal component of the OVH between heart and PTV and the overlap between lung and PTV along the Z-axis. Conclusions: It is feasible to use patients’ anatomic and dosimetric features to generate a predicted Pareto front. Additional samples and further studies are required to improve the prediction model.
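
A dosimetric Pareto front is the set of plans that cannot lower one objective (here MLD or MHD) without raising the other. The following Python sketch extracts the non-dominated plans from (MLD, MHD) pairs, assuming both doses are to be minimized; the dose values in the tests are invented.

```python
def dominates(a, b):
    """True if plan a is no worse than plan b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(plans):
    """Keep the non-dominated (MLD, MHD) pairs, preserving input order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]
```

A plan never dominates itself, so no special case is needed when comparing a plan with the rest of the list.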

  9. PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites.

    PubMed

    Song, Jiangning; Tan, Hao; Perry, Andrew J; Akutsu, Tatsuya; Webb, Geoffrey I; Whisstock, James C; Pike, Robert N

    2012-01-01

    The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of a protease's substrate specificity can dramatically improve our ability to predict its target protein substrates, but this information must be used effectively to identify protein substrates efficiently by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER combines established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites using different but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for identifying protein substrates of these enzymes. Systematic prediction analysis for the twenty-four proteases currently included in the database revealed that the features included in the tool strongly improve cleavage site prediction, as evidenced by their contribution to identifying known cleavage sites in substrates of these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using
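
Local sequence-profile features for cleavage-site prediction are usually taken from a fixed window around the scissile bond, labelled P4...P1 and P1'...P4' in Schechter-Berger notation. This Python sketch assumes a symmetric 4+4 window, gap padding at the termini and one-hot encoding; PROSPER's actual window size and encoding may differ, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"


def cleavage_window(seq, site, left=4, right=4):
    """Residues P4..P1 and P1'..P4' around the bond between seq[site - 1] and seq[site].

    Positions beyond either terminus are padded with GAP.
    """
    padded = GAP * left + seq + GAP * right
    return padded[site:site + left + right]


def one_hot(window):
    """Encode each window position as 21 indicators (20 amino acids plus GAP)."""
    alphabet = AMINO_ACIDS + GAP
    vector = []
    for residue in window:
        vector.extend(1 if residue == a else 0 for a in alphabet)
    return vector
```

The resulting vector would be combined with the structural features listed in the abstract (secondary structure, solvent accessibility, disorder) before training a classifier.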

  10. Genomic Signal Processing: Predicting Basic Molecular Biological Principles

    NASA Astrophysics Data System (ADS)

    Alter, Orly

    2005-03-01

    Advances in high-throughput technologies enable acquisition of different types of molecular biological data, monitoring the flow of biological information as DNA is transcribed to RNA, and RNA is translated to proteins, on a genomic scale. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment and drug development. Recently we described data-driven models for genome-scale molecular biological data, which use singular value decomposition (SVD) and the comparative generalized SVD (GSVD). Now we describe an integrative data-driven model, which uses pseudoinverse projection (1). We also demonstrate the predictive power of these matrix algebra models (2). The integrative pseudoinverse projection model formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. The mathematical variables of this integrative model, the pseudoinverse correlation patterns that are uncovered in the data, represent independent processes and corresponding cellular states (such as observed genome-wide effects of known regulators or transcription factors, the biological components of the cellular machinery that generate the genomic signals, and measured samples in which these regulators or transcription factors are over- or underactive). Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis, and gives a global picture of the correlations and possibly also causal coordination of
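
    The reconstruction step described above can be illustrated with NumPy, assuming a data matrix D (genes x samples) and a basis matrix B (genes x basis profiles): the correlation patterns are C = pinv(B) D, and the reconstruction of the data in the basis is B C. This is a simplified sketch under those assumptions; Alter's model includes normalization and interpretation steps not shown here, and the function name is hypothetical.

```python
import numpy as np

def pseudoinverse_projection(data: np.ndarray, basis: np.ndarray):
    """Project data (genes x samples) onto basis profiles (genes x k).

    patterns = pinv(basis) @ data gives each sample's coordinates in the basis;
    reconstruction = basis @ patterns keeps only what the basis can express.
    """
    patterns = np.linalg.pinv(basis) @ data
    reconstruction = basis @ patterns
    return patterns, reconstruction

# Usage with random illustrative matrices (not real expression data).
rng = np.random.default_rng(0)
basis = rng.normal(size=(50, 3))   # 50 genes, 3 basis profiles
coords = rng.normal(size=(3, 4))   # 4 samples built entirely from the basis
data = basis @ coords
patterns, recon = pseudoinverse_projection(data, basis)
print(np.allclose(patterns, coords), np.allclose(recon, data))  # True True
```

    Any component of a sample that is orthogonal to the basis is discarded by the reconstruction, which is what lets classification "in the basis" ignore cellular states the basis does not represent.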

  11. Critical Features Predicting Sustained Implementation of School-Wide Positive Behavioral Interventions and Supports

    ERIC Educational Resources Information Center

    Mathews, Susanna; McIntosh, Kent; Frank, Jennifer L.; May, Seth L.

    2014-01-01

    The current study explored the extent to which a common measure of perceived implementation of critical features of Positive Behavioral Interventions and Supports (PBIS) predicted fidelity of implementation 3 years later. Respondents included school personnel from 261 schools across the United States implementing PBIS. School teams completed the…

  12. Critical Features Predicting Sustained Implementation of School-Wide Positive Behavior Support

    ERIC Educational Resources Information Center

    Mathews, Susanna; McIntosh, Kent; Frank, Jennifer; May, Seth

    2014-01-01

    The current study explored the extent to which a common measure of perceived implementation of critical features of School-wide Positive Behavior Support (SWPBS) predicted fidelity of implementation 3 years later. Respondents included school personnel from 261 schools across the United States implementing SWPBS. School teams completed the…

  13. Improved Species-Specific Lysine Acetylation Site Prediction Based on a Large Variety of Features Set

    PubMed Central

    Wuyun, Qiqige; Zheng, Wei; Zhang, Yanping; Ruan, Jishou; Hu, Gang

    2016-01-01

    Lysine acetylation is a major post-translational modification. It plays a vital role in numerous essential biological processes, such as gene expression and metabolism, and is related to some human diseases. To fully understand the regulatory mechanism of acetylation, identifying acetylation sites is the first and most important step. However, experimental identification of protein acetylation sites is often time-consuming and expensive, so alternative computational methods are necessary. Here, we developed a novel tool, KA-predictor, to predict species-specific lysine acetylation sites based on a support vector machine (SVM) classifier. We incorporated different types of features and applied efficient feature selection to each type to form the final optimal feature set for model learning. Our predictor was highly competitive for the majority of species when compared with other methods. Feature contribution analysis indicated that HSE features, which were introduced for lysine acetylation prediction for the first time, significantly improved predictive performance. In particular, we constructed a high-accuracy structural dataset of H. sapiens from the PDB to analyze the structural properties around lysine acetylation sites. Our datasets and a user-friendly local version of KA-predictor are freely available at http://sourceforge.net/p/ka-predictor. PMID:27183223

  14. Improved Species-Specific Lysine Acetylation Site Prediction Based on a Large Variety of Features Set.

    PubMed

    Wuyun, Qiqige; Zheng, Wei; Zhang, Yanping; Ruan, Jishou; Hu, Gang

    2016-01-01

    Lysine acetylation is a major post-translational modification. It plays a vital role in numerous essential biological processes, such as gene expression and metabolism, and is related to some human diseases. To fully understand the regulatory mechanism of acetylation, identifying acetylation sites is the first and most important step. However, experimental identification of protein acetylation sites is often time-consuming and expensive, so alternative computational methods are necessary. Here, we developed a novel tool, KA-predictor, to predict species-specific lysine acetylation sites based on a support vector machine (SVM) classifier. We incorporated different types of features and applied efficient feature selection to each type to form the final optimal feature set for model learning. Our predictor was highly competitive for the majority of species when compared with other methods. Feature contribution analysis indicated that HSE features, which were introduced for lysine acetylation prediction for the first time, significantly improved predictive performance. In particular, we constructed a high-accuracy structural dataset of H. sapiens from the PDB to analyze the structural properties around lysine acetylation sites. Our datasets and a user-friendly local version of KA-predictor are freely available at http://sourceforge.net/p/ka-predictor. PMID:27183223

  15. Comparison of the predictive power of beef surface wavelet texture features at high and low magnification.

    PubMed

    Jackman, Patrick; Sun, Da-Wen; Allen, Paul

    2009-07-01

    Beef longissimus dorsi surface texture is an indicator used by expert graders to predict beef palatability. Computer vision systems have previously used imaging at normal view to develop surface texture features with some success. Good models of beef overall acceptability using imaging at high magnification have recently been developed. For comparison, the same surface texture features were computed from the corresponding images at normal view and used to model overall acceptability. Both sets of texture features were also combined with muscle colour and marbling features and used to model overall acceptability. Models using texture features alone were more successful at the normal modality. However, colour and marbling features combined much better with texture features at the high modality to yield the most accurate model of overall acceptability (r² = 0.93). Accurate Partial Least Squares Regression (PLSR) models were computed at both modalities with and without inclusion of colour and marbling features. Addition of squared terms to the models failed to improve accuracy. PMID:20416713

  16. Non-linear feature extraction from HRV signal for mortality prediction of ICU cardiovascular patient.

    PubMed

    Karimi Moridani, Mohammad; Setarehdan, Seyed Kamaledin; Motie Nasrabadi, Ali; Hajinasrollah, Esmaeil

    2016-04-01

    Intensive care unit (ICU) patients are at risk of in-ICU morbidities and mortality, making specific systems for identifying at-risk patients a necessity for improving clinical care. This study presents a new method for predicting in-hospital mortality using heart rate variability (HRV) recorded during a patient's ICU stay. In this paper, an HRV time-series processing method is proposed for mortality prediction in ICU cardiovascular patients. HRV signals were obtained by measuring R-R time intervals. A novel method, named the return map, is then developed to reveal useful information from the HRV time series. This study also proposes several features that can be extracted from the return map, including the angle between two vectors, the area of triangles formed by successive points, the shortest distance to the 45° line and their various combinations. Finally, a thresholding technique is proposed to extract the risk period and to predict mortality. The data used to evaluate the proposed algorithm were obtained from 80 cardiovascular ICU patients (40 males and 40 females), from the first 48 h of their first ICU stay. This study showed that the angle feature has on average a sensitivity of 87.5% (with 12 false alarms), the area feature has on average a sensitivity of 89.58% (with 10 false alarms), the shortest distance feature has on average a sensitivity of 85.42% (with 14 false alarms) and, finally, the combined feature has on average a sensitivity of 92.71% (with seven false alarms). The results showed that the last half hour before the patient's death is very informative for diagnosing the patient's condition and potentially saving the patient's life. These results confirm that it is possible to predict mortality based on the features introduced in this paper, relying on the variations of the HRV dynamic characteristics. PMID:27028609
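
    The three return-map features can be sketched in plain Python, assuming the return map plots each R-R interval against the next one, (RR_n, RR_n+1), and that "the angle between two vectors" means the angle between consecutive displacement vectors of the map. Both are assumptions, since the abstract does not define them precisely, and the thresholding used to extract the risk period is not shown.

```python
import math

def return_map(rr):
    """Pair successive R-R intervals into points (RR_n, RR_n+1)."""
    return list(zip(rr[:-1], rr[1:]))

def distance_to_identity(point):
    """Perpendicular distance from (x, y) to the 45-degree line y = x."""
    x, y = point
    return abs(y - x) / math.sqrt(2)

def triangle_area(p1, p2, p3):
    """Area of the triangle formed by three successive return-map points."""
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1])
    return abs(cross) / 2

def angle_between(p1, p2, p3):
    """Angle in radians between the vectors p1->p2 and p2->p3 (assumed definition)."""
    v = (p2[0] - p1[0], p2[1] - p1[1])
    w = (p3[0] - p2[0], p3[1] - p2[1])
    norm = math.hypot(*v) * math.hypot(*w)
    if norm == 0:
        return 0.0  # convention for repeated points: no direction change
    cos_angle = (v[0] * w[0] + v[1] * w[1]) / norm
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp rounding error
```

    A constant heart rate collapses every point onto the 45-degree line (distance 0, area 0), so larger values of these features reflect more irregular beat-to-beat dynamics.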

  17. Predicting and explaining the movement of mesoscale oceanographic features using CLIPS

    NASA Technical Reports Server (NTRS)

    Bridges, Susan; Chen, Liang-Chun; Lybanon, Matthew

    1994-01-01

    The Naval Research Laboratory has developed an oceanographic expert system that describes the evolution of mesoscale features in the Gulf Stream region of the northwest Atlantic Ocean. These features include the Gulf Stream current and the warm and cold core eddies associated with the Gulf Stream. An explanation capability was added to the eddy prediction component of the expert system in order to allow the system to justify the reasoning process it uses to make predictions. The eddy prediction and explanation components of the system have recently been redesigned and translated from OPS83 to C and CLIPS, and the new system is called WATE (Where Are Those Eddies). The new design has improved the system's readability, understandability and maintainability, and will also allow the system to be incorporated into the Semi-Automated Mesoscale Analysis System, which will eventually be embedded into the Navy's Tactical Environmental Support System, Third Generation, TESS(3).

  18. The prognostic impact of clinical and molecular features in hairy cell leukaemia variant and splenic marginal zone lymphoma.

    PubMed

    Hockley, Sarah L; Else, Monica; Morilla, Alison; Wotherspoon, Andrew; Dearden, Claire; Catovsky, Daniel; Gonzalez, David; Matutes, Estella

    2012-08-01

    Hairy cell leukaemia variant (HCL-variant) and splenic marginal zone lymphoma (SMZL) are disorders with overlapping features. We investigated the prognostic impact in these disorders of clinical and molecular features including IGH VDJ rearrangements, IGHV gene usage and TP53 mutations. Clinical and laboratory data were collected before therapy from 35 HCL-variant and 68 SMZL cases. End-points were the need for treatment and overall survival. 97% of HCL-variant and 77% of SMZL cases required treatment (P = 0·009). Survival at 5 years was significantly worse in HCL-variant [57% (95% confidence interval 38-73%)] compared with SMZL [84% (71-91%); Hazard Ratio 2·25 (1·20-4·25), P = 0·01]. In HCL-variant, adverse prognostic factors for survival were older age (P = 0·04), anaemia (P = 0·01) and TP53 mutations (P = 0·02). In SMZL, splenomegaly, anaemia and IGHV genes with >98% homology to the germline predicted the need for treatment; older age, anaemia and IGHV unmutated genes (100% homology) predicted shorter survival. IGHV gene usage had no impact on clinical outcome in either disease. The combination of unfavourable factors allowed patients to be stratified into risk groups with significant differences in survival. Although HCL-variant and SMZL share some features, they have different outcomes, influenced by clinical and biological factors. PMID:22594855

  19. Use of the molecular connectivity index to predict chemical biotransfer

    SciTech Connect

    Dowdy, D.L.; McKone, T.E.; Hsieh, D.P.H.

    1994-12-31

    Chemicals released into the environment can pose a danger to organisms if exposure occurs. In order to assess the level of risk, it is necessary to first determine whether a chemical is capable of biotransfer from a given environmental medium into a particular biological system. Experimental determination of biotransfer factors (BTF), defined as the ratio of the concentration of a chemical in an organism or tissue to that in the exposure medium, is usually difficult, expensive, and time-consuming. Since an accurate measurement of BTF is crucial to exposure and risk assessment, it would be advantageous if BTF could be estimated from a chemical property that is quantifiable with high precision. The molecular connectivity index (MCI) is such a chemical property, which in theory encodes information about molecular size, branching, cyclization, saturation, and heteroatom content. MCIs are readily obtainable from chemical structure and the periodic table, requiring no experimental measurement. The results indicate a strong correlation between the MCI and BTF values for animal tissue, milk, and vegetation. Using MCI to estimate BTF could provide a faster, more cost-effective, and more accurate method for predicting chemical biotransfer.
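
    To show how an MCI is derived from structure alone, here is a sketch of the first-order (Randić) connectivity index on a hydrogen-suppressed graph: the sum over bonds of 1/sqrt(d_i d_j), where d is the number of heavy-atom neighbours. The abstract does not say which order or variant (simple or valence) was used, so this choice is an assumption, and the function name is hypothetical.

```python
import math

def randic_index(edges):
    """First-order molecular connectivity index of a hydrogen-suppressed graph.

    edges: list of (atom_a, atom_b) bonds between heavy atoms.
    Each bond contributes 1 / sqrt(degree_a * degree_b).
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1 / math.sqrt(degree[a] * degree[b]) for a, b in edges)

n_butane = [(0, 1), (1, 2), (2, 3)]   # C-C-C-C chain
isobutane = [(0, 1), (0, 2), (0, 3)]  # central carbon with three methyls
```

    For these two C4 isomers the index is about 1.914 for n-butane and about 1.732 for isobutane, illustrating how branching, at constant size, lowers the index.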

  20. Computer-aided breast MR image feature analysis for prediction of tumor response to chemotherapy

    SciTech Connect

    Aghaei, Faranak; Tan, Maxine; Liu, Hong; Zheng, Bin; Hollingsworth, Alan B.; Qian, Wei

    2015-11-15

    Purpose: To identify a new clinical marker based on quantitative kinetic image feature analysis and assess its feasibility to predict tumor response to neoadjuvant chemotherapy. Methods: The authors assembled a dataset involving breast MR images acquired from 68 cancer patients before undergoing neoadjuvant chemotherapy. Among them, 25 patients had a complete response (CR) and 43 had a partial or no response (NR) to chemotherapy based on the Response Evaluation Criteria in Solid Tumors (RECIST). The authors developed a computer-aided detection scheme to segment breast areas and tumors depicted on the breast MR images and computed a total of 39 kinetic image features from both tumor and background parenchymal enhancement regions. The authors then applied and tested two approaches to classify between CR and NR cases. The first analyzed each individual feature and applied a simple feature fusion method that combines classification results from multiple features. The second approach tested an attribute selected classifier that integrates an artificial neural network (ANN) with a wrapper subset evaluator, which was optimized using a leave-one-case-out validation method. Results: In the pool of 39 features, 10 yielded relatively higher classification performance, with areas under the receiver operating characteristic curve (AUCs) ranging from 0.61 to 0.78 for classifying between CR and NR cases. Using a feature fusion method, the maximum AUC was 0.85 ± 0.05. Using the ANN-based classifier, the AUC increased significantly to 0.96 ± 0.03 (p < 0.01). Conclusions: This study demonstrated that quantitative analysis of kinetic image features computed from breast MR images acquired prechemotherapy has the potential to generate a useful clinical marker for predicting tumor response to chemotherapy.
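
    The AUC values reported above can be read as the probability that a randomly chosen CR case receives a higher classifier score than a randomly chosen NR case (the Mann-Whitney formulation of the ROC area). A minimal sketch, assuming CR is the positive class; the scores in the usage example are hypothetical, not study data.

```python
def auc(scores_pos, scores_neg):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical scores for two CR (positive) and two NR (negative) cases.
print(auc([0.9, 0.3], [0.5, 0.1]))  # 0.75: three of four pairs ranked correctly
```

    This pairwise version is O(n_pos x n_neg), which is fine for a 68-case dataset; a rank-based implementation would scale better for larger ones.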

  1. Relationship of carbohydrate molecular spectroscopic features in combined feeds to carbohydrate utilization and availability in ruminants

    NASA Astrophysics Data System (ADS)

    Zhang, Xuewei; Yu, Peiqiang

    To date, there have been no studies on the relationship between carbohydrate (CHO) molecular structures and nutrient availability of combined feeds in ruminants. The objective of this study was to use molecular spectroscopy to reveal the relationship between CHO molecular spectral profiles (in terms of functional group (biomolecular, biopolymer) spectral peak area and height intensity) and CHO chemical profiles, CHO subfractions, energy values, and CHO rumen degradation kinetics of combined feeds of hulless barley with pure wheat dried distillers grains with solubles (DDGS) at five different combination ratios (hulless barley to pure wheat DDGS: 100:0, 75:25, 50:50, 25:75, 0:100). The molecular spectroscopic parameters assessed included: lignin biopolymer molecular spectral profile (peak area and height, region and baseline: ca. 1539-1504 cm-1); structural carbohydrate (STCHO, peak area region and baseline: ca. 1485-1186 cm-1), mainly associated with hemicellulosic and cellulosic compounds; cellulosic materials peak area (centered at ca. 1240 cm-1, with region and baseline: ca. 1272-1186 cm-1); and total carbohydrate (CHO, peak area region and baseline: ca. 1186-946 cm-1). The results showed that the functional groups (biomolecular, biopolymer) in the combined feeds are sensitive to changes in carbohydrate chemical and nutrient profiles. The changes in the CHO molecular spectroscopic features in the combined feeds were highly correlated with CHO chemical profiles, CHO subfractions, in situ CHO rumen degradation kinetics and fermentable organic matter supply. Further study is needed to investigate the possibility of using CHO molecular spectral features as a predictor to estimate nutrient availability in combined feeds for animals and to quantify their relationship.
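
    Each parameter above is a peak area measured over a wavenumber region against a baseline. A minimal sketch, assuming a straight baseline drawn between the spectrum's values at the two region endpoints and trapezoidal integration (common choices, though the abstract does not state the exact baseline method); it also assumes distinct wavenumbers, and the function name is hypothetical.

```python
def peak_area(wavenumbers, absorbance, low, high):
    """Baseline-corrected area over [low, high] cm-1 (trapezoidal rule).

    The baseline is the straight line joining the first and last spectral
    points inside the region; spectra may be given in either wavenumber order.
    """
    pts = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if low <= w <= high)
    if len(pts) < 2:
        raise ValueError("need at least two points inside the region")
    (w0, a0), (w1, a1) = pts[0], pts[-1]
    slope = (a1 - a0) / (w1 - w0)
    corrected = [(w, a - (a0 + slope * (w - w0))) for w, a in pts]
    area = 0.0
    for (wa, ya), (wb, yb) in zip(corrected, corrected[1:]):
        area += (wb - wa) * (ya + yb) / 2
    return area
```

    For instance, a total CHO area would be `peak_area(wn, spectrum, 946, 1186)`; the peak height could likewise be taken as the maximum of the baseline-corrected values.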

  2. Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features

    PubMed Central

    Stiglic, Gregor; Povalej Brzan, Petra; Fijacko, Nino; Wang, Fei; Delibasic, Boris; Kalousis, Alexandros; Obradovic, Zoran

    2015-01-01

    Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improving the interpretability of predictive models using simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression. PMID:26645087

  3. General morphological and biological features of neoplasms: integration of molecular findings.

    PubMed

    Diaz-Cano, S J

    2008-07-01

    This review highlights the importance of morphology-molecular correlations for a proper implementation of new markers. It covers both general aspects of tumorigenesis (which are normally omitted in papers analysing molecular pathways) and the general mechanisms for the acquired capabilities of neoplasms. The mechanisms are also supported by appropriate diagrams for each acquired capability that include overlooked features such as mobilization of cellular resources and changes in chromatin, transcription and epigenetics; fully accepted oncogenes and tumour suppressor genes are highlighted, while the pathways are also presented as activating or inactivating with appropriate colour coding. Finally, the concepts and mechanisms presented enable us to understand the basic requirements for the appropriate implementation of molecular tests in clinical practice. In summary, the basic findings are presented to serve as a bridge to clinical applications. The current definition of neoplasm is descriptive and difficult to apply routinely. Biologically, neoplasms develop through acquisition of capabilities that involve tumour cell aspects and modified microenvironment interactions, resulting in unrestricted growth due to a stepwise accumulation of cooperative genetic alterations that affect key molecular pathways. The correlation of these molecular aspects with morphological changes is essential for better understanding of essential concepts such as early neoplasms/precancerous lesions, progression/dedifferentiation, and intratumour heterogeneity. The acquired capabilities include self-maintained replication (cell cycle dysregulation), extended cell survival (cell cycle arrest, apoptosis dysregulation, and replicative lifespan), genetic instability (chromosomal and microsatellite), changes of chromatin, transcription and epigenetics, mobilization of cellular resources, and modified microenvironment interactions (tumour cells, stromal cells, extracellular, endothelium). The acquired

  4. Feature genes predicting the FLT3/ITD mutation in acute myeloid leukemia

    PubMed Central

    LI, CHENGLONG; ZHU, BIAO; CHEN, JIAO; HUANG, XIAOBING

    2016-01-01

    In the present study, gene expression profiles of acute myeloid leukemia (AML) samples were analyzed to identify feature genes with the capacity to predict the mutation status of FLT3/ITD. Two machine learning models, namely the support vector machine (SVM) and random forest (RF) methods, were used for classification. Four datasets were downloaded from the European Bioinformatics Institute, two of which (containing 371 samples, including 281 FLT3/ITD mutation-negative and 90 mutation-positive samples) were randomly defined as the training group, while the other two datasets (containing 488 samples, including 350 FLT3/ITD mutation-negative and 138 mutation-positive samples) were defined as the test group. Differentially expressed genes (DEGs) were identified by significance analysis of the microarray data using the training samples. The classification efficiency of the SVM and RF methods was evaluated using the following parameters: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the area under the receiver operating characteristic curve. Functional enrichment analysis was performed for the feature genes with DAVID. A total of 585 DEGs were identified in the training group, of which 580 were upregulated and five were downregulated. The classification accuracy rates of the two methods for the training group, the test group and the combined group using the 585 feature genes were >90%. For the SVM and RF methods, the rates of correct determination, specificity and PPV were >90%, while the sensitivity and NPV were >80%. The SVM method produced a slightly better classification effect than the RF method. A total of 13 biological pathways were overrepresented by the feature genes, mainly involving energy metabolism, chromatin organization and translation. The feature genes identified in the present study may be used to predict the mutation status of FLT3/ITD in patients with AML. PMID:27177049
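
    The evaluation parameters listed above all derive from a 2x2 confusion matrix. A minimal sketch, assuming FLT3/ITD mutation-positive is the positive class; the counts in the usage note are hypothetical and not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics (positive = mutation-positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # fraction of positives detected
        "specificity": tn / (tn + fp),  # fraction of negatives correctly rejected
        "ppv": tp / (tp + fp),          # precision of a positive call
        "npv": tn / (tn + fn),          # reliability of a negative call
    }
```

    For example, `classification_metrics(tp=8, fp=2, tn=18, fn=2)` gives a sensitivity of 0.8 and a specificity of 0.9. Because mutation-positive samples are the minority in both groups, PPV and NPV depend on that class balance, whereas sensitivity and specificity do not.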

  5. Predicting the Occurrence of Cave-Inhabiting Fauna Based on Features of the Earth Surface Environment

    PubMed Central

    Doctor, Daniel H.; Niemiller, Matthew L.; Weary, David J.; Young, John A.; Zigler, Kirk S.

    2016-01-01

    One of the most challenging faunas to study in situ is the obligate cave fauna, because of the difficulty of sampling. Cave-limited species display patchy and restricted distributions, but it is often unclear whether the observed distribution is a sampling artifact or a true restriction in range. Further, the drivers of the distribution could be local environmental conditions, such as cave humidity, or they could be associated with surface features that are surrogates for cave conditions. If surface features can be used to predict the distribution of important cave taxa, then conservation management becomes easier to achieve. We examined the hypothesis that the presence of major faunal groups of cave obligate species could be predicted based on features of the earth surface. Georeferenced records of cave obligate amphipods, crayfish, fish, isopods, beetles, millipedes, pseudoscorpions, spiders, and springtails within the area of the Appalachian Landscape Conservation Cooperative in the eastern United States (Illinois to Virginia and New York to Alabama) were assigned to 20 x 20 km grid cells. Habitat suitability for these faunal groups was modeled using logistic regression with twenty predictor variables within each grid cell, such as percent karst, soil features, temperature, precipitation, and elevation. Models successfully predicted the presence of a group more than 65% of the time (mean = 88%) for the presence of single grid cell endemics, and for all faunal groups except pseudoscorpions. The most common predictor variables were latitude, percent karst, and the standard deviation of the Topographic Position Index (TPI), a measure of landscape rugosity within each grid cell. The overall success of these models points to a number of important connections between the surface and cave environments, and some of these, especially soil features and topographic variability, suggest new research directions. These models should prove to be useful tools in predicting the
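
    The rugosity predictor, the standard deviation of the Topographic Position Index within a grid cell, can be sketched as follows. This assumes an elevation raster given as a nested list, TPI defined as a cell's elevation minus the mean of its eight immediate neighbours, and border cells skipped; the study's actual neighbourhood size and elevation-model resolution are not given in the abstract, and the function names are hypothetical.

```python
import statistics

def tpi_grid(elev):
    """TPI of each interior cell: elevation minus the mean of its 8 neighbours."""
    rows, cols = len(elev), len(elev[0])
    out = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            neighbours = [elev[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            row.append(elev[r][c] - sum(neighbours) / 8)
        out.append(row)
    return out

def tpi_sd(elev):
    """Population standard deviation of TPI values: higher means more rugged terrain."""
    values = [v for row in tpi_grid(elev) for v in row]
    return statistics.pstdev(values)
```

    On a flat or uniformly sloping surface every TPI value is zero, so the standard deviation is zero; ridges and hollows push it upward.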

  6. Predicting the Occurrence of Cave-Inhabiting Fauna Based on Features of the Earth Surface Environment.

    PubMed

    Christman, Mary C; Doctor, Daniel H; Niemiller, Matthew L; Weary, David J; Young, John A; Zigler, Kirk S; Culver, David C

    2016-01-01

    One of the most challenging faunas to study in situ is the obligate cave fauna, because of the difficulty of sampling. Cave-limited species display patchy and restricted distributions, but it is often unclear whether the observed distribution is a sampling artifact or a true restriction in range. Further, the drivers of the distribution could be local environmental conditions, such as cave humidity, or they could be associated with surface features that are surrogates for cave conditions. If surface features can be used to predict the distribution of important cave taxa, then conservation management becomes easier to achieve. We examined the hypothesis that the presence of major faunal groups of cave obligate species could be predicted based on features of the earth surface. Georeferenced records of cave obligate amphipods, crayfish, fish, isopods, beetles, millipedes, pseudoscorpions, spiders, and springtails within the area of the Appalachian Landscape Conservation Cooperative in the eastern United States (Illinois to Virginia and New York to Alabama) were assigned to 20 x 20 km grid cells. Habitat suitability for these faunal groups was modeled using logistic regression with twenty predictor variables within each grid cell, such as percent karst, soil features, temperature, precipitation, and elevation. Models successfully predicted the presence of a group more than 65% of the time (mean = 88%) for the presence of single grid cell endemics, and for all faunal groups except pseudoscorpions. The most common predictor variables were latitude, percent karst, and the standard deviation of the Topographic Position Index (TPI), a measure of landscape rugosity within each grid cell. The overall success of these models points to a number of important connections between the surface and cave environments, and some of these, especially soil features and topographic variability, suggest new research directions. These models should prove to be useful tools in predicting the

  7. Feature genes predicting the FLT3/ITD mutation in acute myeloid leukemia.

    PubMed

    Li, Chenglong; Zhu, Biao; Chen, Jiao; Huang, Xiaobing

    2016-07-01

    In the present study, gene expression profiles of acute myeloid leukemia (AML) samples were analyzed to identify feature genes with the capacity to predict the mutation status of FLT3/ITD. Two machine learning models, namely the support vector machine (SVM) and random forest (RF) methods, were used for classification. Four datasets were downloaded from the European Bioinformatics Institute, two of which (containing 371 samples, including 281 FLT3/ITD mutation-negative and 90 mutation-positive samples) were randomly defined as the training group, while the other two datasets (containing 488 samples, including 350 FLT3/ITD mutation-negative and 138 mutation-positive samples) were defined as the test group. Differentially expressed genes (DEGs) were identified by significance analysis of the microarray data using the training samples. The classification efficiency of the SVM and RF methods was evaluated using the following parameters: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and the area under the receiver operating characteristic curve. Functional enrichment analysis was performed for the feature genes with DAVID. A total of 585 DEGs were identified in the training group, of which 580 were upregulated and five were downregulated. The classification accuracy rates of the two methods for the training group, the test group and the combined group using the 585 feature genes were >90%. For the SVM and RF methods, the rates of correct determination, specificity and PPV were >90%, while the sensitivity and NPV were >80%. The SVM method produced a slightly better classification effect than the RF method. A total of 13 biological pathways were overrepresented by the feature genes, mainly involving energy metabolism, chromatin organization and translation. The feature genes identified in the present study may be used to predict the mutation status of FLT3/ITD in patients with AML. PMID:27177049

  8. Biased ART: a neural architecture that shifts attention toward previously disregarded features following an incorrect prediction.

    PubMed

    Carpenter, Gail A; Gaddam, Sai Chaitanya

    2010-04-01

    Memories in Adaptive Resonance Theory (ART) networks are based on matched patterns that focus attention on those portions of bottom-up inputs that match active top-down expectations. While this learning strategy has proved successful for both brain models and applications, computational examples show that attention to early critical features may later distort memory representations during online fast learning. For supervised learning, biased ARTMAP (bARTMAP) solves the problem of over-emphasis on early critical features by directing attention away from previously attended features after the system makes a predictive error. Small-scale, hand-computed analog and binary examples illustrate key model dynamics. Two-dimensional simulation examples demonstrate the evolution of bARTMAP memories as they are learned online. Benchmark simulations show that featural biasing also improves performance on large-scale examples. One example, which predicts movie genres and is based, in part, on the Netflix Prize database, was developed for this project. Both first principles and consistent performance improvements on all simulation studies suggest that featural biasing should be incorporated by default in all ARTMAP systems. Benchmark datasets and bARTMAP code are available from the CNS Technology Lab Website: http://techlab.bu.edu/bART/. PMID:19811892

  9. MRI texture features as biomarkers to predict MGMT methylation status in glioblastomas

    PubMed Central

    Korfiatis, Panagiotis; Kline, Timothy L.; Coufalova, Lucie; Lachance, Daniel H.; Parney, Ian F.; Carter, Rickey E.; Buckner, Jan C.; Erickson, Bradley J.

    2016-01-01

    Purpose: Imaging biomarker research focuses on discovering relationships between radiological features and histological findings. In glioblastoma patients, methylation of the O6-methylguanine-DNA methyltransferase (MGMT) gene promoter is positively correlated with increased effectiveness of the current standard of care. In this paper, the authors investigate texture features as potential imaging biomarkers for capturing the MGMT methylation status of glioblastoma multiforme (GBM) tumors when combined with supervised classification schemes. Methods: A retrospective study of 155 GBM patients with known MGMT methylation status was conducted. Co-occurrence and run-length texture features were calculated, and both support vector machines (SVMs) and random forest classifiers were used to predict MGMT methylation status. Results: The best classification system (an SVM-based classifier) achieved a maximum area under the receiver operating characteristic (ROC) curve of 0.85 (95% CI: 0.78–0.91) using four texture features (correlation, energy, entropy, and local intensity) derived from the T2-weighted images, yielding, at the optimal threshold of the ROC curve, a sensitivity of 0.803 and a specificity of 0.813. Conclusions: The results show that supervised machine learning of MRI texture features can predict MGMT methylation status in preoperative GBM tumors, thus providing a new noninvasive imaging biomarker. PMID:27277032
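
    Correlation, energy and entropy are classically computed from a gray-level co-occurrence matrix (GLCM). This is a minimal sketch that assumes the image has already been quantized to a few integer gray levels. It uses a single horizontal offset with symmetric counting and is not the authors' pipeline. Run-length features and "local intensity" are not covered. Some libraries define energy as the square root of the angular second moment computed here.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def glcm(image: List[List[int]], dx: int = 1, dy: int = 0) -> Dict[Tuple[int, int], float]:
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts: Counter = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both directions so the matrix is symmetric
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p: Dict[Tuple[int, int], float]) -> Dict[str, float]:
    """Energy (angular second moment), entropy (bits) and correlation of a GLCM."""
    energy = sum(v * v for v in p.values())
    entropy = -sum(v * math.log2(v) for v in p.values() if v > 0)
    mu_i = sum(i * v for (i, _), v in p.items())
    mu_j = sum(j * v for (_, j), v in p.items())
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for (i, _), v in p.items()))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for (_, j), v in p.items()))
    cov = sum((i - mu_i) * (j - mu_j) * v for (i, j), v in p.items())
    # Convention for a constant image, where correlation is otherwise undefined.
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 1.0
    return {"energy": energy, "entropy": entropy, "correlation": correlation}


checkerboard = [[0, 1], [1, 0]]
print(texture_features(glcm(checkerboard)))
```

    A checkerboard gives perfectly anti-correlated horizontal neighbors (correlation -1), while a uniform region gives maximal energy and zero entropy.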

  10. Prediction of hot spots in protein interfaces using a random forest model with hybrid features.

    PubMed

    Wang, Lin; Liu, Zhi-Ping; Zhang, Xiang-Sun; Chen, Luonan

    2012-03-01

    Prediction of hot spots in protein interfaces provides crucial information for research on protein-protein interactions and drug design. Existing machine learning methods generally judge whether a given residue is likely to be a hot spot by extracting features only from the target residue. However, hot spots usually form a small cluster of tightly packed residues at the center of the protein interface. With this in mind, we present a novel method to extract hybrid features that incorporate a wide range of information about the target residue and its spatially neighboring residues, i.e. the nearest contact residue on the other face (mirror-contact residue) and the nearest contact residue on the same face (intra-contact residue). We provide a novel random forest (RF) model to effectively integrate these hybrid features for predicting hot spots in protein interfaces. Our method achieves an accuracy (ACC) of 82.4% and a Matthews correlation coefficient (MCC) of 0.482 on the Alanine Scanning Energetics Database, and an ACC of 77.6% and an MCC of 0.429 on the Binding Interface Database. In a comparison study, the performance of our RF model exceeds that of other existing methods, such as Robetta, FOLDEF, KFC, KFC2, MINERVA and HotPoint. Among our hybrid features, three physicochemical features of target residues (mass, polarizability and isoelectric point), the relative side-chain accessible surface area and the average depth index of mirror-contact residues are found to be the main discriminative features in hot spot prediction. We also confirm that hot spots tend to form large contact surface areas between two interacting proteins. Source data and code are available at: http://www.aporc.org/doc/wiki/HotSpot. PMID:22258275
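
    For reference, the MCC reported above combines all four cells of the confusion matrix. That is why it can be modest even when accuracy is high. This is a minimal sketch; the counts in the usage line are hypothetical and are not taken from the paper's databases.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced set: few hot spots, many non-hot-spot residues.
print(accuracy(5, 85, 5, 5), round(matthews_corrcoef(5, 85, 5, 5), 3))  # 0.9 0.444
```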

  11. Automatic feature template generation for maximum entropy based intonational phrase break prediction

    NASA Astrophysics Data System (ADS)

    Zhou, You

    2013-03-01

    The prediction of intonational phrase (IP) breaks is important for both the naturalness and intelligibility of Text-to-Speech (TTS) systems. In this paper, we propose a maximum entropy (ME) model to predict IP breaks from unrestricted text, and evaluate various keyword selection approaches in different domains. Furthermore, we design a hierarchical clustering algorithm for automatic generation of feature templates, which minimizes the need for human supervision during ME model training. Results of comparative experiments show that, for the task of IP break prediction, the ME model clearly outperforms classification and regression trees (CART); that log-likelihood ratio is the best scoring measure for keyword selection; and that, compared with manual templates, the templates automatically generated by our approach greatly improve the F-score of ME-based IP break prediction and significantly reduce the size of the ME model.
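
    The abstract names log-likelihood ratio as the best keyword-scoring measure but does not give its exact form. One common choice is Dunning's G² statistic on a 2×2 table of keyword occurrence versus break/no break. The sketch below assumes that table layout and is not the author's implementation.

```python
import math


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 statistic for a 2x2 contingency table.

    Assumed layout: rows = keyword present / absent,
    columns = IP break follows / no break.
    """
    table = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    row_totals = [k11 + k12, k21 + k22]
    col_totals = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # 0 * log(0) is taken as 0
                expected = row_totals[i] * col_totals[j] / n
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


print(log_likelihood_ratio(10, 10, 10, 10))  # independent: 0.0
print(log_likelihood_ratio(20, 0, 0, 20))    # strongly associated: 80 * ln 2
```

    Candidate keywords would then be ranked by this score, and the top-scoring ones kept as ME features.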

  12. Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data

    PubMed Central

    Grissa, Dhouha; Pétéra, Mélanie; Brandolini, Marion; Napoli, Amedeo; Comte, Blandine; Pujos-Guillot, Estelle

    2016-01-01

    Untargeted metabolomics is a powerful phenotyping tool for better understanding the biological mechanisms involved in the development of human pathologies and for identifying early predictive biomarkers. This approach, based on multiple analytical platforms such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that require appropriate analyses to extract biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is to identify early predictive markers of clinical outcome a few years before its occurrence, it becomes essential to use appropriate algorithms and workflows to discover subtle effects in this large amount of data. In this context, this work studies a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy focused on evaluating a combination of numeric-symbolic approaches for feature selection, with the objective of obtaining the best combination of metabolites for an effective and accurate predictive model. Relying first on numerical approaches, especially machine learning methods (SVM-RFE, RF, RF-RFE) and univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. Leave-one-out cross-validation (LOOCV) was applied as the resampling method to minimize the risk of overfitting. The best k features obtained with different importance scores from the combination of these approaches were compared, and their stability was determined using Formal Concept Analysis. The results revealed the interest of RF-Gini combined with ANOVA for feature selection as these two complementary methods allowed selecting the 48
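
    The LOOCV resampling mentioned above can be sketched in a few lines. This minimal illustration assumes toy data and uses a nearest-centroid classifier as a hypothetical stand-in for SVM or RF. The key point, noted in the comments, is that any feature selection must be repeated inside each fold. Otherwise the procedure suffers the very overfitting the workflow is designed to prevent.

```python
from typing import Dict, List


def fit_centroids(X: List[List[float]], y: List[str]) -> Dict[str, List[float]]:
    """Per-class mean vectors (a simple stand-in classifier)."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[str, List[float]], x: List[float]) -> str:
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def loocv_accuracy(X: List[List[float]], y: List[str]) -> float:
    """Leave-one-out CV: train on n - 1 samples, test on the held-out one, n times."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Any feature selection (e.g. ANOVA ranking) must be refit here on X_train only;
        # selecting features on all samples first would leak the held-out sample.
        model = fit_centroids(X_train, y_train)
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["control", "control", "control", "case", "case", "case"]
print(loocv_accuracy(X, y))
```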

  13. Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data.

    PubMed

    Grissa, Dhouha; Pétéra, Mélanie; Brandolini, Marion; Napoli, Amedeo; Comte, Blandine; Pujos-Guillot, Estelle

    2016-01-01

    Untargeted metabolomics is a powerful phenotyping tool for better understanding the biological mechanisms involved in the development of human pathologies and for identifying early predictive biomarkers. This approach, based on multiple analytical platforms such as mass spectrometry (MS), chemometrics and bioinformatics, generates massive and complex data that require appropriate analyses to extract biologically meaningful information. Despite the various tools available, it is still a challenge to handle such large and noisy datasets with a limited number of individuals without risking overfitting. Moreover, when the objective is to identify early predictive markers of clinical outcome a few years before its occurrence, it becomes essential to use appropriate algorithms and workflows to discover subtle effects in this large amount of data. In this context, this work studies a workflow describing the general feature selection process, using knowledge discovery and data mining methodologies to propose advanced solutions for predictive biomarker discovery. The strategy focused on evaluating a combination of numeric-symbolic approaches for feature selection, with the objective of obtaining the best combination of metabolites for an effective and accurate predictive model. Relying first on numerical approaches, especially machine learning methods (SVM-RFE, RF, RF-RFE) and univariate statistical analyses (ANOVA), a comparative study was performed on an original metabolomic dataset and reduced subsets. Leave-one-out cross-validation (LOOCV) was applied as the resampling method to minimize the risk of overfitting. The best k features obtained with different importance scores from the combination of these approaches were compared, and their stability was determined using Formal Concept Analysis. The results revealed the interest of RF-Gini combined with ANOVA for feature selection as these two complementary methods allowed selecting the 48

  14. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is being reshaped by the growing adoption of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree), and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one feature arbitrarily from a group of highly correlated features. This prevents clinicians from arriving at a stable feature set, which is crucial for clinical decision making. In this paper, we address this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with that of other feature selection methods. Using a synthetic dataset and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can
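
    The abstract does not name the stability measure used. One common choice, assumed here, is the average pairwise Jaccard similarity between feature sets selected on different resamples of the data. The ICD-10 codes in the example are hypothetical. They mimic Lasso alternating between two correlated codes from the same branch.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|; two empty selections count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(selected_sets: List[Set[str]]) -> float:
    """Average pairwise Jaccard similarity of feature sets from different resamples."""
    pairs = list(combinations(selected_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical runs: an unstable selector swaps between two correlated sibling codes.
unstable = [{"I21.0", "E11"}, {"I21.1", "E11"}, {"I21.0", "E11"}]
stable = [{"I21", "E11"}, {"I21", "E11"}, {"I21", "E11"}]
print(round(selection_stability(unstable), 3), selection_stability(stable))
```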

  15. Prediction of Golgi-resident protein types using general form of Chou's pseudo-amino acid compositions: Approaches with minimal redundancy maximal relevance feature selection.

    PubMed

    Jiao, Ya-Sen; Du, Pu-Feng

    2016-08-01

    Recently, several efforts have been made to predict Golgi-resident proteins. However, identifying the type of a Golgi-resident protein remains challenging. Precise prediction of the type of a Golgi-resident protein plays a key role in understanding its molecular functions in various biological processes. In this paper, we propose a mutual information based feature selection scheme combined with the general form of Chou's pseudo-amino acid compositions to predict Golgi-resident protein types. Position-specific physicochemical properties were used in the Chou's pseudo-amino acid compositions. We achieved 91.24% prediction accuracy in a jackknife test with 49 selected features, the best performance among existing predictors. This result indicates that our computational model can be useful in identifying Golgi-resident protein types. PMID:27155042
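
    The following is a minimal sketch of greedy minimal-redundancy maximal-relevance (mRMR) selection in its difference form, assuming the features have already been discretized. The pseudo-amino acid composition features themselves are not computed here. The toy features `f1`, `f1_copy` and `f2` are hypothetical: `f1_copy` is exactly as relevant as `f1` but entirely redundant with it, so mRMR prefers `f2` as the second feature.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Mutual information (nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features: Dict[str, Sequence], labels: Sequence, k: int) -> List[str]:
    """Greedy mRMR: relevance I(f; c) minus mean redundancy with already selected features."""
    selected: List[str] = []
    remaining = sorted(features)  # sorted so ties are broken deterministically by name

    def score(name: str) -> float:
        relevance = mutual_information(features[name], labels)
        if not selected:
            return relevance
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


f1 = [0, 0, 1, 1, 0, 0, 1, 1]
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * a + b for a, b in zip(f1, f2)]  # four classes determined by f1 and f2
feats = {"f1": f1, "f1_copy": list(f1), "f2": f2}
print(mrmr(feats, labels, 2))
```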

  16. miRNAfe: A comprehensive tool for feature extraction in microRNA prediction.

    PubMed

    Yones, Cristian A; Stegmayer, Georgina; Kamenetzky, Laura; Milone, Diego H

    2015-12-01

    miRNAfe is a comprehensive tool to extract features from RNA sequences. It is freely available as a web service, providing a single access point to almost all state-of-the-art feature extraction methods used today in a variety of works by different authors. It has a very simple user interface, where the user only needs to load a file containing the input sequences and select the features to extract. As a result, the user obtains a text file with the extracted features, which can be used to analyze the sequences or as input to miRNA prediction software. The tool can calculate up to 80 features, many of which are multidimensional arrays. To simplify the web interface, the features have been divided into six pre-defined groups, each providing information about: primary sequence, secondary structure, thermodynamic stability, statistical stability, conservation between genomes of different species and substring analysis of the sequences. Additionally, pre-trained classifiers are provided for prediction in different species. All feature extraction algorithms have been validated by comparing their results with those obtained from the original authors' software. The source code is freely available for academic use under the GPL license at http://sourceforge.net/projects/sourcesinc/files/mirnafe/0.90/. User-friendly access is provided through a web interface at http://fich.unl.edu.ar/sinc/web-demo/mirnafe/. A more configurable web interface can be accessed at http://fich.unl.edu.ar/sinc/web-demo/mirnafe-full/. PMID:26499212
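
    Only the primary-sequence group can be computed without external tools; secondary-structure and thermodynamic features require an RNA folding program. The sketch below is a hypothetical illustration of a few primary-sequence features (length, GC content, dinucleotide frequencies). It is not miRNAfe's code or its exact feature definitions.

```python
from itertools import product
from typing import Dict


def sequence_features(seq: str) -> Dict[str, float]:
    """Length, GC content and 16 overlapping dinucleotide frequencies of an RNA sequence."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    features = {"length": float(n), "gc_content": (seq.count("G") + seq.count("C")) / n}
    windows = [seq[i:i + 2] for i in range(n - 1)]  # overlapping windows, unlike str.count
    for a, b in product("ACGU", repeat=2):
        features["freq_" + a + b] = windows.count(a + b) / (n - 1)
    return features


f = sequence_features("GGCAUU")
print(f["length"], f["gc_content"], f["freq_GG"])
```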

  17. Systems Medicine: from molecular features and models to the clinic in COPD

    PubMed Central

    2014-01-01

    Background and hypothesis Chronic Obstructive Pulmonary Disease (COPD) patients are characterized by heterogeneous clinical manifestations and patterns of disease progression. Two major factors that can be used to identify COPD subtypes are muscle dysfunction/wasting and co-morbidity patterns. We hypothesized that COPD heterogeneity is in part the result of complex interactions between several genes and pathways. We explored the possibility of using a Systems Medicine approach to identify such pathways, as well as to generate predictive computational models that may be used in clinical practice. Objective and method Our overarching goal is to generate clinically applicable predictive models that characterize COPD heterogeneity through a Systems Medicine approach. To this end we have developed a general framework consisting of three steps/objectives: (1) feature identification, (2) model generation and statistical validation, and (3) application and validation of the predictive models in the clinical scenario. We used muscle dysfunction and co-morbidity as test cases for this framework. Results In the study of muscle wasting, we identified relevant features (genes) by network analysis and generated predictive models that integrate mechanistic and probabilistic models. This allowed us to characterize muscle wasting as a general de-regulation of pathway interactions. In the co-morbidity analysis, we identified relevant features (genes/pathways) by integrating gene-disease and disease-disease associations. We further present a detailed characterization of co-morbidities in COPD patients that was implemented into a predictive model. In both use cases we were able to achieve predictive modeling, but we also identified several key challenges, the most pressing being validation and implementation in actual clinical practice. Conclusions The results confirm the potential of the Systems Medicine approach to study complex diseases and generate clinically relevant

  18. Dimensionality reduced cortical features and their use in predicting longitudinal changes in Alzheimer's disease.

    PubMed

    Park, Hyunjin; Yang, Jin-ju; Seo, Jongbum; Lee, Jong-min

    2013-08-29

    Neuroimaging features derived from the cortical surface provide important information for detecting changes related to the progression of Alzheimer's disease (AD). The recent widespread adoption of neuroimaging has allowed researchers to study longitudinal data in AD. We adopted cortical thickness and sulcal depth, parameterized by three-dimensional meshes, from magnetic resonance imaging as the surface features. The cortical feature is high-dimensional and difficult to use directly with a classifier because of the "small sample size" problem. We applied manifold learning to reduce the dimensionality of the feature and then tested the dimensionality-reduced feature with a support vector machine classifier. Principal component analysis (PCA) was chosen as the manifold learning method and was applied to a region of interest within the cortical surface. We used 30 normal, 30 mild cognitive impairment (MCI) and 12 conversion cases taken from the ADNI database. The classifier was trained using the cortical features extracted from normal and MCI patients, and was tested on the 12 conversion patients using only imaging data acquired before the actual conversion. Conversion was predicted early with an accuracy of 83%. PMID:23827219
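
    The PCA step can be illustrated with a dependency-free power-iteration sketch. It finds the first principal axis and projects samples onto it, reducing each sample to one score. This is an illustration on toy data under stated assumptions, not the study's pipeline. In practice one would keep several components and use a numerical library.

```python
import math
from typing import List, Tuple


def top_principal_component(X: List[List[float]], iters: int = 200) -> Tuple[List[float], List[float]]:
    """Return (feature means, unit-length first principal axis) via power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data: no variance along the current direction
        v = [x / norm for x in w]
    return means, v


def project(X: List[List[float]], means: List[float], axis: List[float]) -> List[float]:
    """Reduced 1-D feature: the score of each centered sample on the principal axis."""
    return [sum((x - m) * a for x, m, a in zip(row, means, axis)) for row in X]


X = [[-2.0, 0.0], [2.0, 0.0], [0.0, -1.0], [0.0, 1.0]]  # most variance along the first axis
means, axis = top_principal_component(X)
print(axis, project(X, means, axis))
```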

  19. Hyper-Echoic Rim in Thyroid Nodules: A New Ultrasonographic Feature for Malignancy Prediction.

    PubMed

    Dong, YiJie; Zhan, WeiWei; Zhou, JianQiao; Song, LinLin; Ni, XiaoFeng; Zhang, BenYan

    2016-09-01

    The goal of this study was to verify the ultrasound features of hyper-echoic rims in thyroid nodules and to evaluate their diagnostic value in predicting thyroid malignancy. We retrospectively analyzed 228 pathologically proven thyroid nodules (137 malignant and 91 benign). Forty-eight thyroid nodules had a hyper-echoic rim. All 137 malignant nodules were papillary carcinomas, and these were studied to identify the correlation between the hyper-echoic rim (detected by ultrasound) and other histologic features. The presence of a hyper-echoic rim had high specificity (94.51%) but low sensitivity (31.39%) in predicting malignancy (p < 0.05). Under microscopic examination, 37 of 43 malignant nodules had boundary zones of mixed structure (apparent fibrous stroma bands or a dense collagenous border with a mixed population of cancerous cells). In conclusion, the hyper-echoic rim could be one additional ultrasound parameter in the diagnosis of thyroid lesions. PMID:27339761
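
    The reported percentages are consistent with a simple 2×2 table that can be reconstructed from the stated counts. The worked check below derives the four cells, assuming the 48 rimmed nodules split into true and false positives as implied by the sensitivity. PPV and NPV are computed here; the abstract does not report them.

```python
# Counts stated in the abstract.
malignant, benign, rim_total = 137, 91, 48

tp = round(0.3139 * malignant)  # malignant nodules with a rim (sensitivity 31.39%)
fp = rim_total - tp             # benign nodules with a rim
fn = malignant - tp             # malignant nodules without a rim
tn = benign - fp                # benign nodules without a rim

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)  # derived, not reported
npv = tn / (tn + fn)  # derived, not reported
print(tp, fp, fn, tn)
print(f"sensitivity={sensitivity:.2%} specificity={specificity:.2%} PPV={ppv:.2%} NPV={npv:.2%}")
```

    The derived count of 43 true positives matches the 43 malignant nodules mentioned in the histologic comparison. The derived specificity (86/91) reproduces the reported 94.51%.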

  20. Prediction of banana quality indices from color features using support vector regression.

    PubMed

    Sanaeifar, Alireza; Bakhshipour, Adel; de la Guardia, Miguel

    2016-02-01

    Bananas undergo significant changes in quality indices and color during their shelf life, which in turn affect chemical and physical characteristics important for the organoleptic quality of the fruit. A computer vision system was implemented to evaluate banana color in the RGB, L*a*b* and HSV color spaces, and changes in the color features of banana during shelf life were used for the quantitative prediction of quality indices. The radial basis function (RBF) was used as the kernel function of support vector regression (SVR). The color features in the different color spaces were selected as the model inputs, with total soluble solids, pH, titratable acidity and firmness as the outputs. The experimental results showed an improvement in predictive accuracy compared with an artificial neural network (ANN). PMID:26653423
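
    This sketch covers two ingredients named in the abstract: color features in RGB and HSV, and the RBF kernel used by the SVR. It assumes the features are the mean color of a region of banana pixels given as 8-bit RGB tuples. The `gamma` value is an arbitrary illustration. L*a*b* conversion is omitted because it is not in the Python standard library. Fitting the SVR itself would typically use a library, for example scikit-learn's `SVR(kernel="rbf")`.

```python
import colorsys
import math
from typing import List, Sequence, Tuple


def mean_color_features(pixels: Sequence[Tuple[int, int, int]]) -> List[float]:
    """Mean RGB (scaled to [0, 1]) plus the HSV of that mean color."""
    n = len(pixels)
    r, g, b = (sum(p[i] for p in pixels) / (255.0 * n) for i in range(3))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return [r, g, b, h, s, v]


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """k(x, y) = exp(-gamma * ||x - y||^2), the kernel an RBF-SVR uses to compare samples."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))


yellow = mean_color_features([(255, 255, 0), (255, 255, 0)])
browner = mean_color_features([(150, 110, 40), (140, 100, 30)])
print(yellow, rbf_kernel(yellow, browner, gamma=0.5))
```

    RGB values are averaged before conversion to HSV because hue is an angle. Averaging hues directly can go wrong near the red wrap-around point.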

  1. Machine learning methods enable predictive modeling of antibody feature:function relationships in RV144 vaccinees.

    PubMed

    Choi, Ickwon; Chung, Amy W; Suscovich, Todd J; Rerks-Ngarm, Supachai; Pitisuttithum, Punnee; Nitayaphan, Sorachai; Kaewkungwal, Jaranit; O'Connell, Robert J; Francis, Donald; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Alter, Galit; Ackerman, Margaret E; Bailey-Kellogg, Chris

    2015-04-01

    The adaptive immune response to vaccination or infection can lead to the production of specific antibodies to neutralize the pathogen or recruit innate immune effector cells for help. The non-neutralizing role of antibodies in stimulating effector cell responses may have been a key mechanism of the protection observed in the RV144 HIV vaccine trial. In an extensive investigation of a rich set of data collected from RV144 vaccine recipients, we here employ machine learning methods to identify and model associations between antibody features (IgG subclass and antigen specificity) and effector function activities (antibody dependent cellular phagocytosis, cellular cytotoxicity, and cytokine release). We demonstrate via cross-validation that classification and regression approaches can effectively use the antibody features to robustly predict qualitative and quantitative functional outcomes. This integration of antibody feature and function data within a machine learning framework provides a new, objective approach to discovering and assessing multivariate immune correlates. PMID:25874406

  2. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features.

    PubMed

    Yu, Kun-Hsing; Zhang, Ce; Berry, Gerald J; Altman, Russ B; Ré, Christopher; Rubin, Daniel L; Snyder, Michael

    2016-01-01

    Lung cancer is the most prevalent cancer worldwide, and histopathological assessment is indispensable for its diagnosis. However, human evaluation of pathology slides cannot accurately predict patients' prognoses. In this study, we obtain 2,186 haematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA), and 294 additional images from Stanford Tissue Microarray (TMA) Database. We extract 9,879 quantitative image features and use regularized machine-learning methods to select the top features and to distinguish shorter-term survivors from longer-term survivors with stage I adenocarcinoma (P<0.003) or squamous cell carcinoma (P=0.023) in the TCGA data set. We validate the survival prediction framework with the TMA cohort (P<0.036 for both tumour types). Our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Our methods are extensible to histopathology images of other organs. PMID:27527408
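
    The abstract does not name the test behind the survival P-values. The log-rank test is a common choice for comparing two survival curves and is assumed here. This minimal sketch uses hypothetical survival times. Each sample is `(time, event)`, where `event` is 1 for death and 0 for censoring.

```python
import math
from typing import List, Tuple


def logrank_test(group_a: List[Tuple[float, int]], group_b: List[Tuple[float, int]]) -> Tuple[float, float]:
    """Two-sample log-rank test; returns (chi-square statistic with 1 df, p-value)."""
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n  # expected deaths in group A under the null hypothesis
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


short_term = [(1, 1), (2, 1), (3, 1)]  # hypothetical months to death
long_term = [(4, 1), (5, 1), (6, 1)]
print(logrank_test(short_term, long_term))
```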

  3. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features

    PubMed Central

    Yu, Kun-Hsing; Zhang, Ce; Berry, Gerald J.; Altman, Russ B.; Ré, Christopher; Rubin, Daniel L.; Snyder, Michael

    2016-01-01

    Lung cancer is the most prevalent cancer worldwide, and histopathological assessment is indispensable for its diagnosis. However, human evaluation of pathology slides cannot accurately predict patients' prognoses. In this study, we obtain 2,186 haematoxylin and eosin stained histopathology whole-slide images of lung adenocarcinoma and squamous cell carcinoma patients from The Cancer Genome Atlas (TCGA), and 294 additional images from Stanford Tissue Microarray (TMA) Database. We extract 9,879 quantitative image features and use regularized machine-learning methods to select the top features and to distinguish shorter-term survivors from longer-term survivors with stage I adenocarcinoma (P<0.003) or squamous cell carcinoma (P=0.023) in the TCGA data set. We validate the survival prediction framework with the TMA cohort (P<0.036 for both tumour types). Our results suggest that automatically derived image features can predict the prognosis of lung cancer patients and thereby contribute to precision oncology. Our methods are extensible to histopathology images of other organs. PMID:27527408

  4. Prediction of near-term risk of developing breast cancer using computerized features from bilateral mammograms.

    PubMed

    Sun, Wenqing; Zheng, Bin; Lure, Fleming; Wu, Teresa; Zhang, Jianying; Wang, Benjamin Y; Saltzstein, Edward C; Qian, Wei

    2014-07-01

    Asymmetry of bilateral mammographic tissue density and patterns is a potentially strong indicator of having or developing breast abnormalities or early cancers. The purpose of this study is to design and test global asymmetry features from bilateral mammograms to predict the near-term risk of women developing detectable high-risk breast lesions or cancer by the next sequential screening mammography examination. The image dataset includes mammograms acquired from 90 women who underwent routine screening examinations, all interpreted as negative and not recalled by the radiologists during the original screening procedures. A computerized breast cancer risk analysis scheme with four image processing modules (image preprocessing, suspicious region segmentation, image feature extraction, and classification) was designed to detect and compute image feature asymmetry between the left and right breasts imaged on the mammograms. The highest computed area under the ROC curve (AUC) is 0.754±0.024 when applying the new computer-aided diagnosis (CAD) scheme to our testing dataset. The positive and negative predictive values were 0.58 and 0.80, respectively. PMID:24725671
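
    Two pieces of the scheme can be sketched under stated assumptions. The first is a left-right asymmetry feature, defined here simply as the absolute difference of per-breast feature values (a hypothetical definition). The second is the AUC, computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. All values are illustrative only.

```python
from typing import List, Sequence


def asymmetry_features(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Hypothetical bilateral asymmetry: |left - right| for each image feature."""
    return [abs(l - r) for l, r in zip(left, right)]


def auc_mann_whitney(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


print(asymmetry_features([1.0, 2.0], [0.5, 2.5]))
print(auc_mann_whitney([0.3, 0.7], [0.5]))
```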

  5. Protein subcellular localization prediction based on compartment-specific features and structure conservation

    PubMed Central

    Su, Emily Chia-Yu; Chiu, Hua-Sheng; Lo, Allan; Hwang, Jenn-Kang; Sung, Ting-Yi; Hsu, Wen-Lian

    2007-01-01

    Background Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches are highly desirable. Extensive studies of localization prediction have led to the development of several methods, including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins. Results We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracies of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. On the evaluation data sets, our method also attains a prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%. Conclusion Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant
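
    A one-versus-one scheme trains one binary classifier per pair of classes and predicts by majority vote. In the sketch below a perceptron stands in for the SVM, and the class labels `A`, `B`, `C` and the 2-D points are hypothetical. The structural homology component is not covered.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of one linear binary classifier


def train_perceptron(X: List[List[float]], t: List[int], epochs: int = 100) -> Model:
    """Perceptron on labels +1/-1; a stand-in for a linear SVM in this sketch."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(X, t):
            if target * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def train_ovo(X: List[List[float]], y: List[str]) -> Dict[Tuple[str, str], Model]:
    """One binary classifier per pair of classes, trained only on samples of those two classes."""
    models = {}
    for a, c in combinations(sorted(set(y)), 2):
        pairs = [(x, 1 if label == a else -1) for x, label in zip(X, y) if label in (a, c)]
        models[(a, c)] = train_perceptron([p[0] for p in pairs], [p[1] for p in pairs])
    return models


def predict_ovo(models: Dict[Tuple[str, str], Model], x: List[float]) -> str:
    """Each pairwise classifier casts one vote; the class with the most votes wins."""
    votes: Counter = Counter()
    for (a, c), (w, b) in models.items():
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        votes[a if score > 0 else c] += 1
    return votes.most_common(1)[0][0]


X = [[0, 0], [0.5, 0.2], [5, 0], [5.5, 0.3], [0, 5], [0.2, 5.5]]
y = ["A", "A", "B", "B", "C", "C"]
models = train_ovo(X, y)
print([predict_ovo(models, p) for p in [[0.1, 0.1], [5.2, 0.1], [0.1, 5.2]]])
```

    With k classes this trains k(k-1)/2 classifiers, each on a smaller two-class subset of the data.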

  6. Self-adaptive MOEA feature selection for classification of bankruptcy prediction data.

    PubMed

    Gaspar-Cunha, A; Recio, G; Costa, L; Estébanez, C

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting, important because it helps creditors and investors evaluate the likelihood that a company will go bankrupt. As companies become more complex, they develop sophisticated schemes to hide their real situation, which makes estimating the credit risks associated with counterparties, or predicting bankruptcy, harder. Evolutionary algorithms have been shown to be an excellent tool for complex problems in finance and economics in which a large number of irrelevant features are involved. This paper provides a methodology for feature selection in the classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises the classifier quality measure (e.g., accuracy). The proposed methodology uses self-adaptation, applying the feature selection algorithm while simultaneously optimising the parameters of the classifier used. The methodology was applied to four different data sets. The results showed the utility of the self-adaptation of the classifier. PMID:24707201
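
    The two objectives (fewer features, higher accuracy) are compared through Pareto dominance, which multiobjective evolutionary algorithms use to rank candidate solutions. This is a minimal sketch of that relation and of extracting the non-dominated set. The candidate values are hypothetical, and the self-adaptive evolution of classifier parameters is not reproduced.

```python
from typing import List, Tuple

Candidate = Tuple[int, float]  # (number of selected features, classifier accuracy)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b: no worse on both objectives and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Candidates not dominated by any other candidate (original order kept)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


population = [(3, 0.80), (5, 0.85), (5, 0.82), (10, 0.85), (2, 0.70)]
print(pareto_front(population))
```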

  7. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

    PubMed Central

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/. PMID:26797014
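
    The abstract does not specify how the methylation states of neighboring regions are encoded. The sketch below shows one hypothetical encoding. It locates CpG dinucleotides and, for each one, averages the known states (1 = methylated, 0 = unmethylated) of up to `k` flanking CpGs on each side. It returns `None` when no neighbor is known. Hi-C topological features and the stacked denoising autoencoder are not covered.

```python
from typing import Dict, List, Optional


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def neighbor_methylation(positions: List[int], known: Dict[int, int], k: int = 2) -> Dict[int, Optional[float]]:
    """Mean known state of up to k CpGs on each side of every CpG, or None if none is known."""
    features: Dict[int, Optional[float]] = {}
    for idx, pos in enumerate(positions):
        neighbors = positions[max(0, idx - k):idx] + positions[idx + 1:idx + 1 + k]
        states = [known[p] for p in neighbors if p in known]
        features[pos] = sum(states) / len(states) if states else None
    return features


seq = "ACGTTCGACGAACG"
sites = cpg_positions(seq)
print(sites, neighbor_methylation(sites, {1: 1, 5: 0, 12: 1}, k=1))
```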

  8. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

    NASA Astrophysics Data System (ADS)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  9. Self-Adaptive MOEA Feature Selection for Classification of Bankruptcy Prediction Data

    PubMed Central

    Gaspar-Cunha, A.; Recio, G.; Costa, L.; Estébanez, C.

    2014-01-01

    Bankruptcy prediction is a vast area of finance and accounting whose importance lies in its relevance to creditors and investors evaluating the likelihood of a company going bankrupt. As companies become more complex, they develop sophisticated schemes to hide their real situation, which makes estimating the credit risks associated with counterparts, or predicting bankruptcy, harder. Evolutionary algorithms have been shown to be an excellent tool for complex problems in finance and economics that involve a large number of irrelevant features. This paper provides a methodology for feature selection in the classification of bankruptcy data sets using an evolutionary multiobjective approach that simultaneously minimises the number of features and maximises a classifier quality measure (e.g., accuracy). The proposed methodology uses self-adaptation: the feature selection algorithm is applied while the parameters of the classifier are optimised at the same time. The methodology was applied to four different data sets. The results showed the utility of self-adapting the classifier. PMID:24707201
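    The core comparison in a multiobjective feature-selection search is Pareto dominance between candidate feature subsets. The sketch below filters a population down to its non-dominated (Pareto) front for the two objectives named in the abstract, fewer features and higher accuracy. It is not the authors' self-adaptive MOEA, and the candidate values are made up.

```python
from typing import List, Tuple

# A candidate feature subset summarised as (number_of_features, accuracy):
# fewer features and higher accuracy are both better.
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (features, accuracy) pairs for five feature subsets.
subsets = [(3, 0.80), (5, 0.85), (4, 0.78), (6, 0.85), (2, 0.70)]
print(pareto_front(subsets))  # [(3, 0.8), (5, 0.85), (2, 0.7)]
```

    An evolutionary multiobjective algorithm keeps such a front across generations rather than collapsing both objectives into a single weighted score.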

  10. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    PubMed

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/. PMID:26797014
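    A denoising autoencoder, the building block of an SdA, is trained to reconstruct a clean input from a corrupted copy of it. One common corruption is masking noise, which zeroes a random subset of input components. The abstract does not state which corruption DeepMethyl used, so masking noise and the `rate` value below are assumptions.

```python
import random
from typing import List


def mask_corrupt(x: List[float], rate: float, rng: random.Random) -> List[float]:
    """Masking noise: each component of x is independently set to 0.0 with probability `rate`.

    A denoising autoencoder receives the corrupted vector as input but is trained
    to reconstruct the original, uncorrupted x.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return [0.0 if rng.random() < rate else v for v in x]


rng = random.Random(42)
clean = [0.9, 0.1, 0.8, 0.7, 0.2]
noisy = mask_corrupt(clean, rate=0.3, rng=rng)
print(noisy)  # same length as clean; each entry zeroed with probability 0.3
```

    Stacking layers that were each pre-trained this way, then fine-tuning the whole network on labelled data, is what makes the autoencoders "stacked".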

  11. Music-induced emotions can be predicted from a combination of brain activity and acoustic features.

    PubMed

    Daly, Ian; Williams, Duncan; Hallowell, James; Hwang, Faustina; Kirke, Alexis; Malik, Asad; Weaver, James; Miranda, Eduardo; Nasuto, Slawomir J

    2015-12-01

    It is widely acknowledged that music can communicate and induce a wide range of emotions in the listener. However, music is a highly complex audio signal composed of a wide range of complex time- and frequency-varying components. Additionally, music-induced emotions are known to differ greatly between listeners. Therefore, it is not immediately clear what emotions a piece of music will induce in a given individual. We attempt to predict the music-induced emotional response in a listener by measuring the activity in the listener's electroencephalogram (EEG). We combine these measures with acoustic descriptors of the music, an approach that allows us to consider music as a complex set of time-varying acoustic features, independently of any specific music theory. Regression models are found that allow us to predict the music-induced emotions of our participants with a correlation between the actual and predicted responses of up to r = 0.234, p < 0.001. This regression fit suggests that over 20% of the variance of the participants' music-induced emotions can be predicted by their neural activity and the properties of the music. Given the large amount of noise, non-stationarity, and non-linearity in both EEG and music, this is an encouraging result. Additionally, the combination of measures of brain activity and acoustic features describing the music played to our participants allows us to predict music-induced emotions with significantly higher accuracies than either feature type alone (p < 0.01). PMID:26544602
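    For a single Pearson correlation between actual and predicted responses, the proportion of variance explained is r squared. The Python sketch below computes r from paired samples and squares the reported value. For reference, r = 0.234 on its own corresponds to roughly 5.5% of variance; the abstract does not say how its "over 20%" figure was derived from the fit.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = 0.234                  # correlation reported in the abstract
print(round(r ** 2, 3))    # 0.055, i.e. about 5.5% of variance for this r alone
```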

  12. Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico

    USGS Publications Warehouse

    Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.

    2003-01-01

    Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All 37 sample sites (100%; 28 predicted suitable, 9 predicted unsuitable) were correctly classified by the GIS model with respect to elevation and vegetation. For the 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. These partial burnings may later help avoid a fire

  13. Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features

    PubMed Central

    Luo, Longqiang; Li, Dingfang; Zhang, Wen; Tu, Shikui; Zhu, Xiaopeng; Tian, Gang

    2016-01-01

    Background: Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. Predicting transposon-derived piRNAs can enrich research on small ncRNAs and help further our understanding of the mechanism of gamete generation. Methods: In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features using machine learning methods. We explore six sequence-derived features, i.e. spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performance for transposon-derived piRNA prediction. Finally, we consider two approaches, direct combination and ensemble learning, to integrate useful features and build high-accuracy prediction models. Results: We construct three datasets covering three species (human, mouse and Drosophila) and evaluate the prediction models by 10-fold cross-validation. In the computational experiments, direct combination models achieve AUCs of 0.917, 0.922 and 0.992 on human, mouse and Drosophila, respectively; ensemble learning models achieve AUCs of 0.922, 0.926 and 0.994 on the three datasets. Conclusions: Our methods perform better than other state-of-the-art methods and are promising for transposon-derived piRNA prediction. The source code and datasets are available in S1 File. PMID:27074043
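    The models above are compared by AUC. One standard way to compute AUC from classifier scores is the pairwise (Mann-Whitney) formulation: the fraction of (piRNA, non-piRNA) pairs in which the piRNA receives the higher score, with ties counted as one half. A minimal Python sketch with made-up scores:

```python
from typing import Sequence


def pairwise_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


pirna_scores = [0.9, 0.8, 0.4]   # hypothetical model scores for true piRNAs
non_pirna_scores = [0.5, 0.3]    # hypothetical scores for non-piRNAs
print(round(pairwise_auc(pirna_scores, non_pirna_scores), 3))  # 0.833
```

    The double loop is quadratic in the number of samples; sorting-based rank formulas give the same value faster on large test folds.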

  14. Long Hydrocarbon Chains Serve as Unique Molecular Features Recognized by Ventral Glomeruli of the Rat Olfactory Bulb

    PubMed Central

    Ho, Sabrina L.; Johnson, Brett A.; Leon, Michael

    2008-01-01

    In an effort to understand mammalian olfactory processing, we have been describing the responses to systematically different odorants in the glomerular layer of the main olfactory bulb of rats. To understand the processing of pure hydrocarbon structures in this system, we used the [14C]2-deoxyglucose method to determine glomerular responses to a homologous series of alkanes (from six to sixteen carbons) that are straight-chained hydrocarbons without functional groups. We found two rostral regions of activity evoked by these odorants, one lateral and one medial, that were observed to shift ventrally with increasing alkane carbon chain length. Furthermore, we successfully predicted that the longest alkanes with carbon chain length greater than our previous odorant selections would stimulate extremely ventral glomerular regions where no activation had been observed with the hundreds of odorants that we had previously studied. Overlaps in response profiles were observed in the patterns evoked by alkanes and by other aliphatic odorants of corresponding carbon chain length despite possessing different oxygen-containing functional groups, which demonstrated that hydrocarbon chains could serve as molecular features in the combinatorial coding of odorant information. We found a close and predictable relationship among the molecular properties of odorants, their induced neural activity, and their perceptual similarities. PMID:16856178

  15. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features.

    PubMed

    Yates, Christopher M; Filippis, Ioannis; Kelley, Lawrence A; Sternberg, Michael J E

    2014-07-15

    Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. PMID:24810707
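    Balanced accuracy, the metric quoted for SuSPect, is the mean of sensitivity and specificity, so a classifier cannot score well merely by favouring the majority class. A minimal sketch with hypothetical confusion-matrix counts (not the VariBench results):

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts: 100 disease-associated and 500 neutral variants.
print(round(balanced_accuracy(tp=80, fn=20, tn=420, fp=80), 2))  # 0.82
```

    Plain accuracy on the same counts would be 500/600, about 0.83, slightly flattered by the larger neutral class.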

  16. Prediction models for solitary pulmonary nodules based on curvelet textural features and clinical parameters.

    PubMed

    Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua

    2013-01-01

    Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs), which are hard to diagnose with the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, unconditional logistic regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross-validation and back substitution were applied. The accuracy rates obtained were, respectively, 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. Overall, using curvelet-based textural features after dimensionality reduction together with clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images. PMID:24289618
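    The models are evaluated both by 10-fold cross-validation and by back substitution (training and testing on the same samples); back substitution reports higher accuracies for every model, as expected when the test samples were seen during training. A minimal sketch of how 10-fold train/test index splits can be generated follows. The round-robin assignment without shuffling is a simplification for clarity, not the authors' procedure.

```python
from typing import List, Tuple


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices 0..n-1 into k folds; return (train, test) index lists per fold.

    Indices are assigned round-robin without shuffling, for clarity only; in practice
    samples are usually shuffled (and often stratified by class) first.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


for train, test in k_fold_splits(n=23, k=10)[:2]:
    print(len(train), test)
# 20 [0, 10, 20]
# 20 [1, 11, 21]
```

    Each sample appears in exactly one test fold, so every reported prediction comes from a model that never saw that sample.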

  17. Role of Side-Chain Molecular Features in Tuning Lower Critical Solution Temperatures (LCSTs) of Oligoethylene Glycol Modified Polypeptides.

    PubMed

    Gharakhanian, Eric G; Deming, Timothy J

    2016-07-01

    A series of thermoresponsive polypeptides has been synthesized using a methodology that allowed facile adjustment of side-chain functional groups. The lower critical solution temperature (LCST) properties of these polymers in water were then evaluated relative to systematic molecular modifications in their side-chains. It was found that in addition to the number of ethylene glycol repeats in the side-chains, terminal and linker groups also have substantial and predictable effects on cloud point temperatures (Tcp). In particular, we found that the structure of these polypeptides allowed for inclusion of polar hydroxyl groups, which significantly increased their hydrophilicity and decreased the need to use long oligoethylene glycol repeats to obtain LCSTs. The thioether linkages in these polypeptides were found to provide an additional structural feature for reversible switching of both polypeptide conformation and thermoresponsive properties. PMID:27102972

  18. Molecular Features of Subtype-Specific Progression from Ductal Carcinoma In Situ to Invasive Breast Cancer.

    PubMed

    Lesurf, Robert; Aure, Miriam Ragle; Mørk, Hanne Håberg; Vitelli, Valeria; Lundgren, Steinar; Børresen-Dale, Anne-Lise; Kristensen, Vessela; Wärnberg, Fredrik; Hallett, Michael; Sørlie, Therese

    2016-07-26

    Breast cancer consists of at least five main molecular "intrinsic" subtypes that are reflected in both pre-invasive and invasive disease. Although previous studies have suggested that many of the molecular features of invasive breast cancer are established early, it is unclear what mechanisms drive progression and whether the mechanisms of progression are dependent or independent of subtype. We have generated mRNA, miRNA, and DNA copy-number profiles from a total of 59 in situ lesions and 85 invasive tumors in order to comprehensively identify those genes, signaling pathways, processes, and cell types that are involved in breast cancer progression. Our work provides evidence that there are molecular features associated with disease progression that are unique to the intrinsic subtypes. We additionally establish subtype-specific signatures that are able to identify a small proportion of pre-invasive tumors with expression profiles that resemble invasive carcinoma, indicating a higher likelihood of future disease progression. PMID:27396337

  19. Quantitative Description of a Protein Fitness Landscape Based on Molecular Features.

    PubMed

    Meini, María-Rocío; Tomatis, Pablo E; Weinreich, Daniel M; Vila, Alejandro J

    2015-07-01

    Understanding the driving forces behind protein evolution requires the ability to correlate the molecular impact of mutations with organismal fitness. To address this issue, we use metallo-β-lactamases, Zn(II)-dependent enzymes that mediate antibiotic resistance, as a model system. We present a study of all the possible evolutionary pathways leading to a metallo-β-lactamase variant optimized by directed evolution. By studying the activity, stability and Zn(II)-binding capabilities of all mutants in the preferred evolutionary pathways, we show that this local fitness landscape is strongly conditioned by epistatic interactions arising from the pleiotropic effects of mutations on the different molecular features of the enzyme. Activity and stability assays in purified enzymes do not provide explanatory power. Instead, measuring these molecular features in an environment resembling the native one provides an accurate description of the observed antibiotic resistance profile. We report that optimization of the Zn(II)-binding abilities of metallo-β-lactamases during evolution is more critical than stabilization of the protein for enhancing fitness. A global analysis of these parameters allows us to connect genotype with fitness based on quantitative biochemical and biophysical parameters. PMID:25767204
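    If mutations are assumed to be acquired one at a time, the possible evolutionary pathways to a variant carrying n mutations are the n! orderings of those mutations, passing through 2^n distinct genotypes. A small Python sketch of that enumeration follows; the mutation labels are placeholders, not the actual substitutions studied.

```python
from itertools import combinations, permutations
from typing import FrozenSet, List, Sequence, Tuple


def pathways(mutations: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every order in which the mutations could be acquired, one at a time."""
    return list(permutations(mutations))


def genotypes(mutations: Sequence[str]) -> List[FrozenSet[str]]:
    """Every genotype on those pathways, from the starting enzyme (no mutations) to the full variant."""
    return [frozenset(c) for k in range(len(mutations) + 1) for c in combinations(mutations, k)]


muts = ["M1", "M2", "M3", "M4"]  # placeholder labels for four substitutions
print(len(pathways(muts)), len(genotypes(muts)))  # 24 16
```

    Measuring a phenotype for each of the 2^n genotypes is what lets every pathway be scored step by step.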

  20. Molecular Size and Separability Features of Pea Cell Wall Polysaccharides

    PubMed Central

    Talbott, Lawrence D.; Ray, Peter M.

    1992-01-01

    Relative molecular size distributions of pectic and hemicellulosic polysaccharides of pea (Pisum sativum cv Alaska) third internode primary walls were determined by gel filtration chromatography. Pectic polyuronides have a peak molecular mass of about 1100 kilodaltons, relative to dextran standards. This peak may be partly an aggregate of smaller molecular units, because demonstrable aggregation occurred when samples were concentrated by evaporation. About 86% of the neutral sugars (mostly arabinose and galactose) in the pectin cofractionate with polyuronide in gel filtration chromatography and diethylaminoethyl-cellulose chromatography and appear to be attached covalently to polyuronide chains, probably as constituents of rhamnogalacturonans. However, at least 60% of the wall's arabinan/galactan is not linked covalently to the bulk of its rhamnogalacturonan, either glycosidically or by ester links, but occurs in the hemicellulose fraction, accompanied by negligible uronic acid, and has a peak molecular mass of about 1000 kilodaltons. Xyloglucan, the other principal hemicellulosic polymer, has a peak molecular mass of about 30 kilodaltons (with a secondary, usually minor, peak of approximately 300 kilodaltons) and is mostly not linked glycosidically either to pectic polyuronides or to arabinogalactan. The relatively narrow molecular mass distributions of these polymers suggest mechanisms of co- or postsynthetic control of hemicellulose chain length by the cell. Although the macromolecular features of the mentioned polymers individually agree generally with those shown in the widely disseminated sycamore cell primary wall model, the matrix polymers seem to be associated mostly noncovalently rather than in the covalently interlinked meshwork postulated by that model. Xyloglucan and arabinan/galactan may form tightly and more loosely bound layers, respectively, around the cellulose microfibrils, the outer layer interacting with pectic rhamnogalacturonans that occupy

  1. Remote health monitoring: predicting outcome success based on contextual features for cardiovascular disease.

    PubMed

    Alshurafa, Nabil; Eastwood, Jo-Ann; Pourhomayoun, Mohammad; Liu, Jason J; Sarrafzadeh, Majid

    2014-01-01

    Current studies have produced a plethora of remote health monitoring (RHM) systems designed to enhance the care of patients with chronic diseases. Many RHM systems are designed to improve patient risk factors for cardiovascular disease, including physiological parameters such as body mass index (BMI) and waist circumference, and lipid profiles such as low-density lipoprotein (LDL) and high-density lipoprotein (HDL). Several patient characteristics could be determining factors for a patient's RHM outcome success, but these characteristics have been largely unidentified. In this paper, we analyze results from an RHM system deployed in a six-month Women's Heart Health study of 90 patients, and apply advanced feature selection and machine learning algorithms to identify patients' key baseline contextual features and build effective prediction models that help determine RHM outcome success. We introduce Wanda-CVD, a smartphone-based RHM system designed to help participants with cardiovascular disease risk factors by motivating them through wireless coaching, using feedback and prompts as social support. We analyze key contextual features that secure positive patient outcomes in both physiological parameters and lipid profiles. Results from the Women's Heart Health study show that health threat of heart disease, quality of life, family history, stress factors, social support, and anxiety at baseline all help predict patient RHM outcome success. PMID:25570321

  2. BioCAST/IFCT-1002: epidemiological and molecular features of lung cancer in never-smokers.

    PubMed

    Couraud, Sébastien; Souquet, Pierre-Jean; Paris, Christophe; Dô, Pascal; Doubre, Hélène; Pichon, Eric; Dixmier, Adrien; Monnet, Isabelle; Etienne-Mastroianni, Bénédicte; Vincent, Michel; Trédaniel, Jean; Perrichon, Marielle; Foucher, Pascal; Coudert, Bruno; Moro-Sibilot, Denis; Dansin, Eric; Labonne, Stéphanie; Missy, Pascale; Morin, Franck; Blanché, Hélène; Zalcman, Gérard

    2015-05-01

    Lung cancer in never-smokers (LCINS) (fewer than 100 cigarettes in lifetime) is considered a distinct entity and harbours a distinctive molecular profile. However, the epidemiological and molecular features of LCINS in Europe remain poorly understood. All consecutive newly diagnosed LCINS patients were included in this prospective observational study by 75 participating centres during a 14-month period. Each patient completed a detailed questionnaire about risk factor exposure. Biomarker and pathological analyses were also collected. We report the main descriptive overall results with a focus on sex differences. 384 patients were included: 65 men and 319 women. 66% had been exposed to passive smoking (significantly more often among women). Definite exposure to main occupational carcinogens was significantly higher in men (35% versus 8% in women). A targetable molecular alteration was found in 73% of patients (without any significant sex difference): EGFR in 51%, ALK in 8%, KRAS in 6%, HER2 in 3%, BRAF in 3%, PIK3CA in less than 1%, and multiple alterations in 2%. We present the largest and most comprehensive LCINS analysis in a European population. Physicians should track occupational exposure in men (35%), and a somatic molecular alteration in both sexes (73%). PMID:25657019

  3. Energy Minimization of Molecular Features Observed on the (110) Face of Lysozyme Crystals

    NASA Technical Reports Server (NTRS)

    Perozzo, Mary A.; Konnert, John H.; Li, Huayu; Nadarajah, Arunan; Pusey, Marc

    1999-01-01

    Molecular dynamics and energy minimization have been carried out using the program XPLOR to check the plausibility of a model lysozyme crystal surface. The molecular features of the (110) face of lysozyme were observed using atomic force microscopy (AFM). A model of the crystal surface was constructed using the PDB file 193L and was used to simulate an AFM image. Molecule translations, van der Waals radii, and the assumed AFM tip shape were adjusted to maximize the correlation coefficient between the experimental and simulated images. The highest correlation (0.92) was obtained with the molecules displaced by over 6 Å from their positions within the bulk of the crystal. The quality of this starting model, the extent of energy minimization, and the correlation coefficient between the final model and the experimental data will be discussed.
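    The fitting step described here, adjusting model parameters to maximize the correlation coefficient between experimental and simulated AFM images, can be illustrated in one dimension. The sketch below searches integer translations of a simulated height profile for the one that best correlates with an "experimental" profile. The 1-D profiles, the integer-shift search and all values are hypothetical simplifications; the fit described above also varied van der Waals radii and tip shape.

```python
import math
from typing import Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def best_shift(experimental: Sequence[float], simulated: Sequence[float],
               max_shift: int) -> Tuple[int, float]:
    """Return (shift, r) maximizing r between experimental[i] and simulated[i - shift]."""
    best: Tuple[int, float] = (0, -math.inf)
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(experimental[i], simulated[i - shift])
                 for i in range(len(experimental)) if 0 <= i - shift < len(simulated)]
        if len(pairs) < 2:
            continue
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if r is not None and r > best[1]:
            best = (shift, r)
    return best


simulated = [0, 1, 3, 1, 0, 0, 0]
experimental = [0, 0, 0, 1, 3, 1, 0]  # the same bump, translated by two pixels
print(best_shift(experimental, simulated, max_shift=3))  # (2, 1.0)
```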

  4. Sub-resolution assist feature (SRAF) printing prediction using logistic regression

    NASA Astrophysics Data System (ADS)

    Tan, Chin Boon; Koh, Kar Kit; Zhang, Dongqing; Foong, Yee Mei

    2015-03-01

    In optical proximity correction (OPC), sub-resolution assist features (SRAFs) are used to enhance the process window of main structures. However, SRAF printing on the wafer is undesirable, as it may degrade the overall process yield if transferred into the final pattern. A reasonably accurate prediction model is therefore needed during OPC to ensure that SRAF placement and size carry no risk of SRAF printing. Current common practice in OPC is to use either the main OPC model or a model threshold adjustment (MTA) solution to predict SRAF printing. This paper studies the feasibility of predicting SRAF printing using logistic regression (LR). Logistic regression is a probabilistic classification model that, given sufficient input variables describing SRAF printing conditions, yields a probability that is turned into a discrete binary output. For SRAF printing prediction, the binary outputs can be treated as 1 for SRAF-Printing and 0 for No-SRAF-Printing. The experimental work was performed on a 20 nm line/space process layer. The results demonstrate that the LR approach predicts SRAF printing more accurately than the MTA solution: overall error rates as low as 2% (calibration) and 5% (verification) were achieved with LR, compared with 6% (calibration) and 15% (verification) for MTA. In addition, the performance of the LR approach was relatively independent of, and consistent across, different resist image planes compared with the MTA solution.
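    Logistic regression turns input variables into a probability, which is then thresholded (here at 0.5) into the binary SRAF-Printing / No-SRAF-Printing label described above. The Python sketch below fits such a model by plain batch gradient descent and reports an error rate. The single standardized input variable, its values and the training settings are hypothetical and do not reproduce the paper's calibration or verification data.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float], threshold: float = 0.5) -> int:
    """1 = SRAF-Printing, 0 = No-SRAF-Printing."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


def error_rate(w: Sequence[float], b: float,
               X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(predict(w, b, xi) != yi for xi, yi in zip(X, y)) / len(y)


# Hypothetical standardized input variable; label 1 means the SRAF printed on wafer.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(X, y)
print(error_rate(w, b, X, y))  # 0.0 on this separable toy set
```

    In practice the error rate would be reported separately on a calibration set used for fitting and a held-out verification set, as in the abstract.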

  5. Prediction of bacterial type IV secreted effectors by C-terminal features

    PubMed Central

    2014-01-01

    Background: Many bacteria can deliver pathogenic proteins (effectors) through type IV secretion systems (T4SSs) into the eukaryotic cytoplasm, causing host diseases. Inherent properties of these effectors, such as sequence diversity and scattering throughout the whole genome, make it a big challenge to identify the full set of T4SS effectors effectively. An effective inter-species T4SS effector prediction tool is therefore urgently needed to help discover new effectors in a variety of bacterial species, especially those with few known effectors, e.g., Helicobacter pylori. Results: In this research, we first manually annotated a full list of validated T4SS effectors from different bacteria and then carefully compared their C-terminal sequential and position-specific amino acid compositions, possible motifs and structural features. Based on the observed features, we set up several models to automatically recognize T4SS effectors. Three of the models performed strikingly better than the others, and T4SEpre_Joint had the best performance, distinguishing T4SS effectors from non-effectors with a 5-fold cross-validation sensitivity of 89% at a specificity of 97% on the training datasets. An inter-species cross-prediction showed that T4SEpre_Joint could recall most known effectors from a variety of species. The inter-species prediction tool package, T4SEpre, was further used to predict new T4SS effectors from H. pylori, an important human pathogen associated with gastritis, ulcers and cancer. In total, 24 new highly probable H. pylori T4S effector genes were computationally identified. Conclusions: We conclude that T4SEpre, as an effective inter-species T4SS effector prediction software package, will help find new pathogenic T4SS effectors efficiently in a variety of pathogenic bacteria. PMID:24447430
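    The T4SEpre models are built on C-terminal features such as amino acid composition. A minimal sketch of one such feature, the amino acid composition of a protein's C-terminal tail, follows. The 100-residue default tail length is an assumption for illustration, not a parameter taken from the paper.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in alphabetical one-letter order


def c_terminal_composition(protein: str, tail_length: int = 100) -> List[float]:
    """Relative frequency of each standard amino acid in the last `tail_length` residues."""
    if tail_length <= 0:
        raise ValueError("tail_length must be positive")
    tail = protein.upper()[-tail_length:]
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in tail:
        if residue in counts:  # skip ambiguous or non-standard codes such as X
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


vector = c_terminal_composition("MKKLLEE", tail_length=4)  # tail is "LLEE"
print(vector[AMINO_ACIDS.index("E")], vector[AMINO_ACIDS.index("L")])  # 0.5 0.5
```

    A 20-element vector like this can be concatenated with position-specific or motif features before training a classifier.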

  6. Predicting spectral features in galaxy spectra from broad-band photometry

    NASA Astrophysics Data System (ADS)

    Abdalla, F. B.; Mateus, A.; Santos, W. A.; Sodrè, L., Jr.; Ferreras, I.; Lahav, O.

    2008-07-01

    We explore the prospects of predicting emission-line features present in galaxy spectra given broad-band photometry alone. There is a general consensus that colours and spectral features, most notably the 4000 Å break, can predict many properties of galaxies, including star formation rates, and hence could be used to infer some of the line properties. We argue that these techniques have great prospects for helping us understand line emission in extragalactic objects and might speed up future galaxy redshift surveys if they are to target emission-line objects only. We use two independent methods, Artificial Neural Networks (based on the ANNz code) and Locally Weighted Regression (LWR), to retrieve correlations present in the N-dimensional colour space and to predict the equivalent widths present in the corresponding spectra. We also investigate how well it is possible to separate galaxies with and without lines from broad-band photometry only. We find, unsurprisingly, that recombination lines can be well predicted by galaxy colours. However, among collisional lines, some can and some cannot be predicted well from galaxy colours alone, without any further redshift information. We also use our techniques to estimate how much of the information contained in spectral diagnostic diagrams can be recovered from broad-band photometry alone. We find that it is possible to classify active galactic nuclei and star-forming objects relatively well using colours only. We suggest that this technique could be used to considerably improve redshift surveys such as the upcoming Fibre Multi Object Spectrograph (FMOS) survey and the planned Wide Field Multi Object Spectrograph (WFMOS) survey.
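    Locally weighted regression fits a separate weighted least-squares model around each query point, with weights that decay with distance from it. A one-dimensional Python sketch with a Gaussian kernel follows. The paper works in an N-dimensional colour space; the kernel, the bandwidth `tau` and the toy data here are assumptions.

```python
import math
from typing import Sequence


def lwr_predict(xs: Sequence[float], ys: Sequence[float], x0: float, tau: float) -> float:
    """Predict y at x0 with a weighted straight-line fit; weights are Gaussian in |x - x0|."""
    weights = [math.exp(-((x - x0) ** 2) / (2 * tau ** 2)) for x in xs]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to the weighted mean
    return mean_y + slope * (x0 - mean_x)


# Toy data: a 1-D "colour" and an "equivalent width" lying on a straight line.
colour = [0.0, 1.0, 2.0, 3.0, 4.0]
width = [1.0, 3.0, 5.0, 7.0, 9.0]
print(round(lwr_predict(colour, width, x0=2.5, tau=1.0), 6))  # 6.0
```

    A small `tau` lets the fit follow local structure in colour space; a large `tau` approaches a single global linear regression.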

  7. Prognostic Significance and Molecular Features of Signet-Ring Cell and Mucinous Components in Colorectal Carcinoma

    PubMed Central

    Mima, Kosuke; Sukawa, Yasutaka; Li, Tingting; Yasunari, Mika; Zhang, Xuehong; Wu, Kana; Meyerhardt, Jeffrey A.; Fuchs, Charles S.

    2014-01-01

    Background: Colorectal carcinoma (CRC) represents a group of histopathologically and molecularly heterogeneous diseases, which may contain a signet-ring cell component and/or a mucinous component to a varying extent on pathological assessment. However, little is known about the prognostic significance of those components, independent of various tumor molecular features. Methods: Utilizing a molecular pathological epidemiology database of 1,336 rectal and colon cancers in the Nurses’ Health Study and the Health Professionals Follow-up Study, we examined patient survival according to the proportion of signet-ring cell and mucinous components in CRCs. Cox proportional hazards models were used to compute hazard ratios (HRs) for mortality, adjusting for potential confounders including stage, microsatellite instability, CpG island methylator phenotype, LINE-1 methylation, and KRAS, BRAF, and PIK3CA mutations. Results: Compared to CRC without a signet-ring cell component, a 1–50% signet-ring cell component was associated with a multivariate CRC-specific mortality HR of 1.40 [95% confidence interval (CI) 1.02–1.93], and a >50% signet-ring cell component was associated with a multivariate CRC-specific mortality HR of 4.53 (95% CI 2.53–8.12) (Ptrend < 0.0001). Compared to CRC without a mucinous component, neither a 1–50% mucinous component (multivariate HR 1.04; 95% CI 0.81–1.33) nor a >50% mucinous component (multivariate HR 0.82; 95% CI 0.54–1.23) was significantly associated with CRC-specific mortality (Ptrend = 0.57). Conclusions: Even a minor (50% or less) signet-ring cell component in CRC was associated with higher patient mortality, independent of various tumor molecular and other clinicopathological features. In contrast, a mucinous component was not associated with mortality in CRC patients. PMID:25326395

  8. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    PubMed Central

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
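
    The core idea of kmer sequence features can be sketched in a few lines: each sequence becomes a vector of k-mer counts, and an SVM compares sequences through a kernel on those vectors. The sketch below assumes k = 6 and merges each k-mer with its reverse complement (a common choice for double-stranded DNA); the exact feature set and normalization used by the kmer-SVM server may differ, and the SVM training step itself is not shown.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer: str) -> str:
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(kmer, reverse_complement(kmer))

def kmer_features(seq: str, k: int = 6) -> Counter:
    """Count canonical k-mers, skipping windows that contain non-ACGT bases (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[canonical(window)] += 1
    return counts

def spectrum_kernel(a: Counter, b: Counter) -> float:
    """Normalized dot product (cosine) between two k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0

x = kmer_features("ACGTTGCA", k=3)
print(x, spectrum_kernel(x, x))
```

    A kernel of this form can be handed to any SVM implementation that accepts precomputed kernels; the learned weights on individual k-mers are what point to candidate TF-binding sites.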

  9. Observations of the interstellar ice grain feature in the Taurus molecular clouds

    SciTech Connect

    Whittet, D.C.B.; Bode, H.F.; Longmore, A.J.; Baines, D.W.T.; Evans, A.

    1983-01-01

    Although water ice was originally proposed as a major constituent of the interstellar grain population (e.g. Oort and van de Hulst, 1946), the advent of infrared astronomy has shown that the expected absorption due to O-H stretching vibrations at 3 μm is elusive. Observations have in fact revealed that the carrier of this feature is apparently restricted to regions deep within dense molecular clouds (Merrill et al., 1976; Willner et al., 1982). However, the exact carrier of this feature is still controversial, and many questions remain as to the conditions required for its appearance. It is also uncertain whether it is restricted to circumstellar shells rather than the general cloud medium. Detailed discussion of the 3 μm band properties is given elsewhere in this volume. 15 references, 4 figures.

  10. Docking Studies and Molecular Dynamic Simulations Reveal Different Features of IDO1 Structure.

    PubMed

    Greco, Francesco Antonio; Bournique, Answald; Coletti, Alice; Custodi, Chiara; Dolciami, Daniela; Carotti, Andrea; Macchiarulo, Antonio

    2016-09-01

    In the last decade, indoleamine 2,3-dioxygenase 1 (IDO1) has attracted a great deal of attention, having been recognized as a key regulator of immunosuppressive pathways in the tumor immuno-editing process. Several classes of inhibitors have been developed as potential anticancer agents, but only a few of them have advanced to clinical trials. Hence, the quest for novel, potent and selective inhibitors of the enzyme is still active and is mostly pursued through structure-based drug design strategies based on early and more recent crystal structures of IDO1. Combining docking studies and molecular dynamics simulations, in this work we comparatively investigated the structural features of each crystal structure of IDO1. The results pinpoint different features in specific crystal structures of the enzyme that may benefit medicinal chemistry by aiding the design of novel potent and selective inhibitors of IDO1. PMID:27546049

  11. Toll-like receptor 7 agonists: chemical feature based pharmacophore identification and molecular docking studies.

    PubMed

    Yu, Hui; Jin, Hongwei; Sun, Lidan; Zhang, Liangren; Sun, Gang; Wang, Zhanli; Yu, Yongchun

    2013-01-01

    Chemical feature based pharmacophore models were generated for Toll-like receptor 7 (TLR7) agonists using the HypoGen algorithm, which is implemented in the Discovery Studio software. Several methods used to validate the pharmacophore model were presented. The first hypothesis, Hypo1, was considered to be the best pharmacophore model; it consists of four features: one hydrogen bond acceptor, one hydrogen bond donor, and two hydrophobic features. In addition, homology modeling and molecular docking studies were employed to probe the intermolecular interactions between TLR7 and its agonists. The results further confirmed the reliability of the pharmacophore model. The obtained pharmacophore model (Hypo1) was then employed as a query to screen the Traditional Chinese Medicine Database (TCMD) for other potential lead compounds. One hit was identified as a potent TLR7 agonist with antiviral activity against hepatitis virus in vitro. Therefore, our current work provides confidence in the utility of the selected chemical feature based pharmacophore model for designing novel TLR7 agonists with the desired biological activity. PMID:23526932

  12. ERα-Negative and Triple Negative Breast Cancer: Molecular Features and Potential Therapeutic Approaches

    PubMed Central

    Chen, Jin-Qiang; Russo, Jose

    2010-01-01

    Triple negative breast cancer (TNBC) is a type of aggressive breast cancer lacking the expression of estrogen receptors (ER), progesterone receptors (PR) and human epidermal growth factor receptor-2 (HER-2). TNBC patients account for approximately 15% of all breast cancer patients, and TNBC is more prevalent among young African, African-American and Latino women. The currently available ER-targeted and HER-2-based therapies are not effective for treating TNBC. Recent studies have revealed a number of novel features of TNBC. In the present work, we comprehensively addressed these features and discussed potential therapeutic approaches based on these features for TNBC, with particular focus on: 1) the pathological features of TNBC/basal-like breast cancer; 2) E2/ERβ-mediated signaling pathways; 3) the G-protein-coupled receptor 30/epidermal growth factor receptor (GPCR-30/EGFR) signaling pathway; 4) interactions of ERβ with breast cancer 1/2 (BRCA1/2); 5) chemokine CXCL8 and related chemokines; 6) altered microRNA signatures and suppression of ERα expression/ERα-signaling by microRNAs; 7) altered expression of several pro-oncogenic and tumor suppressor proteins; and 8) genotoxic effects caused by oxidative estrogen metabolites. Gaining better insights into these molecular pathways in TNBC may lead to the identification of novel biomarkers and targets for the development of diagnostic and therapeutic approaches for the prevention and treatment of TNBC. PMID:19527773

  13. Systems Biological Approach of Molecular Descriptors Connectivity: Optimal Descriptors for Oral Bioavailability Prediction

    PubMed Central

    Ahmed, Shiek S. S. J.; Ramakrishnan, V.

    2012-01-01

    Background Poor oral bioavailability is an important cause of the failure of drug candidates. Approximately 50% of drugs in development fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is therefore highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low, with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. Results The models were developed from 247 computationally derived physicochemical descriptors of 2279 molecules, of which 969, 605 and 705 molecules correspond to the oral bioavailability, human intestinal absorption (HIA) and caco-2 permeability data sets, respectively. Partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying molecules into groups. Of these descriptors, 47 were commonly associated with HIA and caco-2, suggesting that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties, with the functional relevance labeled as +bioavailability/−bioavailability to indicate good-/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation-based feature selection (CFS) algorithm was also implemented, which confirmed that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction. Conclusion The logistic algorithm with 47 selected descriptors correctly predicted the oral
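
    Correlation-based feature selection scores a descriptor subset by how strongly its members correlate with the class label while penalizing correlation among the members themselves. A minimal sketch of Hall's CFS merit function is shown below; how the correlations are computed (the original CFS uses symmetrical uncertainty on discretized features) and which search strategy was used in this study are not stated in the abstract and are left out here.

```python
import math

def cfs_merit(k: int, mean_feature_class_corr: float, mean_feature_feature_corr: float) -> float:
    """Hall's CFS merit for a subset of k features.

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the mean
    feature-class correlation and r_ff the mean feature-feature correlation.
    """
    if k < 1:
        raise ValueError("a subset must contain at least one feature")
    return (k * mean_feature_class_corr) / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)

# Four equally informative descriptors: independent versus fully redundant.
print(cfs_merit(4, 0.5, 0.0), cfs_merit(4, 0.5, 1.0))
```

    With independent descriptors the merit grows with subset size, whereas fully redundant descriptors add nothing beyond a single one; this is why CFS can confirm that a compact set (here, 47 descriptors) already carries the discriminative information.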

  14. Symbolic features and classification via support vector machine for predicting death in patients with Chagas disease.

    PubMed

    Sady, Cristina C R; Ribeiro, Antonio Luiz P

    2016-03-01

    This paper introduces a technique for predicting death in patients with Chagas disease using features extracted from symbolic series and time-frequency indices of heart rate variability (HRV). The study included 150 patients: 15 patients who died and 135 who did not. The HRV series were obtained from 24-h Holter monitoring. Sequences of symbols from 5-min epochs from series of RR intervals were generated using symbolic dynamics and ordinal pattern statistics. Fourteen features were extracted from symbolic series and four derived from clinical aspects of patients. For classification, the 18 features from each epoch were used as inputs in a support vector machine (SVM) with a radial basis function (RBF) kernel. The results showed that it is possible to distinguish between the two classes, patients with Chagas disease who did or did not die, with a 95% accuracy rate. Therefore, we suggest that the use of new features based on symbolic series, coupled with classic time-frequency and clinical indices, proves to be a good predictor of death in patients with Chagas disease. PMID:26851730
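
    Ordinal pattern statistics turn a numeric series into symbols by replacing each short window with the rank order of its values. A minimal sketch for RR intervals is shown below, assuming a pattern order of 3 and using normalized permutation entropy as one example of a feature computed from the symbol distribution; the fourteen features actually used in the study are not listed in the abstract, and the RR values are hypothetical.

```python
import math
from collections import Counter

def ordinal_patterns(series, order=3):
    """Symbolize a series: each window of `order` values becomes its rank permutation.

    Each tuple lists window positions from smallest to largest value; ties keep
    their original order because Python's sort is stable.
    """
    if len(series) < order:
        raise ValueError("series is shorter than the pattern order")
    patterns = []
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        patterns.append(tuple(sorted(range(order), key=window.__getitem__)))
    return patterns

def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy (bits) of the ordinal-pattern distribution, optionally scaled to [0, 1]."""
    counts = Counter(ordinal_patterns(series, order))
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    if normalize:
        h /= math.log2(math.factorial(order))
    return h

# Hypothetical RR intervals (ms) from one 5-min epoch, shortened for display.
rr = [812, 790, 805, 830, 818, 799, 801, 826]
print(ordinal_patterns(rr)[:3], round(permutation_entropy(rr), 3))
```

    Features of this kind, computed per epoch and stacked with time-frequency and clinical indices, would form the input vector for the RBF-kernel SVM described above.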

  15. A switchable bis-branched [1]rotaxane featuring dual-mode molecular motions and tunable molecular aggregation.

    PubMed

    Li, Hong; Li, Xin; Cao, Zhan-Qi; Qu, Da-Hui; Ågren, Hans; Tian, He

    2014-01-01

    A multifunctional bis-branched [1]rotaxane containing a perylene bisimide (PBI) core and two identical bistable [1]rotaxane arms terminated with ferrocene units was prepared and characterized by (1)H NMR, (13)C NMR, and 2D ROESY NMR spectroscopies and by HR-ESI mass spectrometry. The system is shown to possess several key features: (1) In acetone solution, external acid-base stimuli can result in relative mechanical movements of its ring and thread, which can induce extension and contraction movements of the whole system accompanied by a rotational movement of the ferrocene units, thus realizing dual-mode molecular motions; the optimized conformations in the different states were obtained through molecular dynamics simulations employing the general Amber force field. (2) The introduction of PBI enables fluorescence encoding of the system through a distance-dependent photoinduced electron transfer process from the ferrocene units to the PBI fluorophore. (3) The addition of Zn(2+) can increase the degree of aggregation of the system, while adding base hinders aggregation because of the movement of the macrocycle. The tunable aggregated nanostructural morphologies of the [1]rotaxane were examined by scanning electron microscopy. These results can pave the way to achieving precise control of integrated and coupled nanomechanical motions at the single-molecule level and provide more insight into controlling the aggregation behavior of switchable mechanically interlocked molecules. PMID:25302680

  16. Molecular features of interaction between VEGFA and anti-angiogenic drugs used in retinal diseases: a computational approach.

    PubMed

    Platania, Chiara B M; Di Paola, Luisa; Leggio, Gian M; Romano, Giovanni L; Drago, Filippo; Salomone, Salvatore; Bucolo, Claudio

    2015-01-01

    Anti-angiogenic agents are biological drugs used for treatment of retinal neovascular degenerative diseases. In this study, we aimed at in silico analysis of interaction of vascular endothelial growth factor A (VEGFA), the main mediator of angiogenesis, with binding domains of anti-angiogenic agents used for treatment of retinal diseases, such as ranibizumab, bevacizumab and aflibercept. The analysis of anti-VEGF/VEGFA complexes was carried out by means of protein-protein docking and molecular dynamics (MD) coupled to molecular mechanics-Poisson Boltzmann Surface Area (MM-PBSA) calculation. Molecular dynamics simulation was further analyzed by protein contact networks. Rough energetic evaluation with protein-protein docking scores revealed that aflibercept/VEGFA complex was characterized by electrostatic stabilization, whereas ranibizumab and bevacizumab complexes were stabilized by Van der Waals (VdW) energy term; these results were confirmed by MM-PBSA. Comparison of MM-PBSA predicted energy terms with experimental binding parameters reported in literature indicated that the high association rate (Kon) of aflibercept to VEGFA was consistent with high stabilizing electrostatic energy. On the other hand, the relatively low experimental dissociation rate (Koff) of ranibizumab may be attributed to lower conformational fluctuations of the ranibizumab/VEGFA complex, higher number of contacts and hydrogen bonds in comparison to bevacizumab and aflibercept. Thus, the anti-angiogenic agents have been found to be considerably different both in terms of molecular interactions and stabilizing energy. Characterization of such features can improve the design of novel biological drugs potentially useful in clinical practice. PMID:26578958
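
    The comparison of electrostatic and van der Waals contributions rests on the way MM-PBSA decomposes a binding free energy. The standard form is written out below for the anti-VEGF/VEGFA case, together with the kinetic relation that links the experimental Kon and Koff discussed above. Whether the study included the entropy term or averaged over a single trajectory of the complex is not stated in the abstract, so those details are assumptions of this generic form.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &= \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{VEGFA}} \rangle - \langle G_{\mathrm{drug}} \rangle \\
G &= E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - TS \\
E_{\mathrm{MM}} &= E_{\mathrm{bonded}} + E_{\mathrm{elec}} + E_{\mathrm{vdW}} \\
K_D &= \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
\end{aligned}
```

    In this notation, "electrostatic stabilization" of the aflibercept complex means that the $E_{\mathrm{elec}}$ contribution to $\Delta G_{\mathrm{bind}}$ dominates, while the ranibizumab and bevacizumab complexes are dominated by $E_{\mathrm{vdW}}$. The last line shows why both a high association rate and a low dissociation rate lower the equilibrium dissociation constant.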

  17. Molecular features of interaction between VEGFA and anti-angiogenic drugs used in retinal diseases: a computational approach

    PubMed Central

    Platania, Chiara B. M.; Di Paola, Luisa; Leggio, Gian M.; Romano, Giovanni L.; Drago, Filippo; Salomone, Salvatore; Bucolo, Claudio

    2015-01-01

    Anti-angiogenic agents are biological drugs used for treatment of retinal neovascular degenerative diseases. In this study, we aimed at in silico analysis of interaction of vascular endothelial growth factor A (VEGFA), the main mediator of angiogenesis, with binding domains of anti-angiogenic agents used for treatment of retinal diseases, such as ranibizumab, bevacizumab and aflibercept. The analysis of anti-VEGF/VEGFA complexes was carried out by means of protein-protein docking and molecular dynamics (MD) coupled to molecular mechanics-Poisson Boltzmann Surface Area (MM-PBSA) calculation. Molecular dynamics simulation was further analyzed by protein contact networks. Rough energetic evaluation with protein-protein docking scores revealed that aflibercept/VEGFA complex was characterized by electrostatic stabilization, whereas ranibizumab and bevacizumab complexes were stabilized by Van der Waals (VdW) energy term; these results were confirmed by MM-PBSA. Comparison of MM-PBSA predicted energy terms with experimental binding parameters reported in literature indicated that the high association rate (Kon) of aflibercept to VEGFA was consistent with high stabilizing electrostatic energy. On the other hand, the relatively low experimental dissociation rate (Koff) of ranibizumab may be attributed to lower conformational fluctuations of the ranibizumab/VEGFA complex, higher number of contacts and hydrogen bonds in comparison to bevacizumab and aflibercept. Thus, the anti-angiogenic agents have been found to be considerably different both in terms of molecular interactions and stabilizing energy. Characterization of such features can improve the design of novel biological drugs potentially useful in clinical practice. PMID:26578958

  18. Respiratory trace feature analysis for the prediction of respiratory-gated PET quantification

    NASA Astrophysics Data System (ADS)

    Wang, Shouyi; Bowen, Stephen R.; Chaovalitwongse, W. Art; Sandison, George A.; Grabowski, Thomas J.; Kinahan, Paul E.

    2014-02-01

    The benefits of respiratory gating in quantitative PET/CT vary tremendously between individual patients. Respiratory pattern is among many patient-specific characteristics that are thought to play an important role in gating-induced imaging improvements. However, the quantitative relationship between patient-specific characteristics of respiratory pattern and improvements in quantitative accuracy from respiratory-gated PET/CT has not been well established. If such a relationship could be estimated, then patient-specific respiratory patterns could be used to prospectively select appropriate motion compensation during image acquisition on a per-patient basis. This study was undertaken to develop a novel statistical model that predicts quantitative changes in PET/CT imaging due to respiratory gating. Free-breathing static FDG-PET images without gating and respiratory-gated FDG-PET images were collected from 22 lung and liver cancer patients on a PET/CT scanner. PET imaging quality was quantified with peak standardized uptake value (SUVpeak) over lesions of interest. Relative differences in SUVpeak between static and gated PET images were calculated to indicate quantitative imaging changes due to gating. A comprehensive multidimensional extraction of the morphological and statistical characteristics of respiratory patterns was conducted, resulting in 16 features that characterize representative patterns of a single respiratory trace. The six most informative features were subsequently extracted using a stepwise feature selection approach. The multiple-regression model was trained and tested based on a leave-one-subject-out cross-validation. The predicted quantitative improvements in PET imaging achieved an accuracy higher than 90% using a criterion with a dynamic error-tolerance range for SUVpeak values. The results of this study suggest that our prediction framework could be applied to determine which patients would likely benefit from respiratory motion compensation.
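
    Leave-one-subject-out cross-validation holds out every lesion of one patient at a time, so that lesions from the same patient never appear in both the training and the test fold. A minimal sketch follows, assuming a single respiratory feature and an ordinary least-squares line in place of the six-feature multiple-regression model; the sign convention of the relative SUVpeak difference (gated relative to static) and the toy inputs are also assumptions.

```python
from statistics import mean

def relative_difference(static_suv, gated_suv):
    """Percent change in SUVpeak from the static to the gated image (assumed direction)."""
    return 100.0 * (gated_suv - static_suv) / static_suv

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def leave_one_subject_out(subjects, xs, ys):
    """Predict every lesion of each patient from a model fitted on the other patients only."""
    predictions = [0.0] * len(ys)
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i, s in enumerate(subjects):
            if s == held_out:
                predictions[i] = a + b * xs[i]
    return predictions

# Hypothetical data: patient p1 contributes two lesions.
subjects = ["p1", "p1", "p2", "p3"]
feature = [1.0, 2.0, 3.0, 4.0]
change = [3.0, 5.0, 7.0, 9.0]
print(leave_one_subject_out(subjects, feature, change))
```

    Splitting by patient rather than by lesion keeps correlated lesions from the same respiratory trace from leaking into the training fold, which would otherwise inflate the reported accuracy.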

  19. Respiratory trace feature analysis for the prediction of respiratory-gated PET quantification.

    PubMed

    Wang, Shouyi; Bowen, Stephen R; Chaovalitwongse, W Art; Sandison, George A; Grabowski, Thomas J; Kinahan, Paul E

    2014-02-21

    The benefits of respiratory gating in quantitative PET/CT vary tremendously between individual patients. Respiratory pattern is among many patient-specific characteristics that are thought to play an important role in gating-induced imaging improvements. However, the quantitative relationship between patient-specific characteristics of respiratory pattern and improvements in quantitative accuracy from respiratory-gated PET/CT has not been well established. If such a relationship could be estimated, then patient-specific respiratory patterns could be used to prospectively select appropriate motion compensation during image acquisition on a per-patient basis. This study was undertaken to develop a novel statistical model that predicts quantitative changes in PET/CT imaging due to respiratory gating. Free-breathing static FDG-PET images without gating and respiratory-gated FDG-PET images were collected from 22 lung and liver cancer patients on a PET/CT scanner. PET imaging quality was quantified with peak standardized uptake value (SUV(peak)) over lesions of interest. Relative differences in SUV(peak) between static and gated PET images were calculated to indicate quantitative imaging changes due to gating. A comprehensive multidimensional extraction of the morphological and statistical characteristics of respiratory patterns was conducted, resulting in 16 features that characterize representative patterns of a single respiratory trace. The six most informative features were subsequently extracted using a stepwise feature selection approach. The multiple-regression model was trained and tested based on a leave-one-subject-out cross-validation. The predicted quantitative improvements in PET imaging achieved an accuracy higher than 90% using a criterion with a dynamic error-tolerance range for SUV(peak) values. The results of this study suggest that our prediction framework could be applied to determine which patients would likely benefit from respiratory motion

  20. Application of Molecular Dynamics Simulations in Molecular Property Prediction I: Density and Heat of Vaporization

    PubMed Central

    Wang, Junmei; Hou, Tingjun

    2011-01-01

    Molecular mechanical force field (FF) methods are useful in studying condensed-phase properties. They are complementary to experiment and can often go beyond experiment in atomic detail. Even if a FF is designed for studying the structures, dynamics and functions of biomolecules, it is still important for the FF to accurately reproduce the experimental liquid properties of small molecules that represent the chemical moieties of biomolecules. Otherwise, the force field may not properly describe the structures and energies of macromolecules in aqueous solution. In this work, we have carried out a systematic study to evaluate the General AMBER Force Field (GAFF) in studying densities and heats of vaporization for a large set of organic molecules that covers the most common chemical functional groups. The latest techniques, such as the particle mesh Ewald (PME) method for calculating electrostatic energies and Langevin dynamics for scaling temperatures, have been applied in the molecular dynamics (MD) simulations. For density, the average percent error (APE) for 71 organic compounds is 4.43% when compared to the experimental values. More encouragingly, the APE drops to 3.43% after the exclusion of two outliers and four other compounds whose experimental densities were measured at pressures higher than 1.0 atm. For heat of vaporization, several protocols have been investigated, and the best one, P4/ntt0, achieves an average unsigned error (AUE) and a root-mean-square error (RMSE) of 0.93 and 1.20 kcal/mol, respectively. How to reduce the prediction errors through proper van der Waals (vdW) parameterization is discussed. An encouraging finding in vdW parameterization is that both densities and heats of vaporization approach their “ideal” values in a synchronous fashion when vdW parameters are tuned. A subsequent hydration free energy calculation using thermodynamic integration further justifies the vdW refinement. We conclude that simple vdW parameterization
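
    The three error measures quoted above summarize the agreement between simulated and experimental values in different ways. A minimal sketch of their usual definitions follows, assuming that the APE averages unsigned relative errors; the abstract does not give the exact formulas, and the example values are hypothetical.

```python
import math

def average_percent_error(pred, exp):
    """Mean of |pred - exp| / |exp|, in percent (assumed unsigned)."""
    return 100.0 * sum(abs(p - e) / abs(e) for p, e in zip(pred, exp)) / len(exp)

def average_unsigned_error(pred, exp):
    """Mean absolute deviation, in the units of the data (e.g. kcal/mol)."""
    return sum(abs(p - e) for p, e in zip(pred, exp)) / len(exp)

def rmse(pred, exp):
    """Root-mean-square error, in the units of the data."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))

# Hypothetical heats of vaporization (kcal/mol): simulated versus experimental.
simulated = [8.1, 10.4, 7.2, 12.9]
experimental = [8.5, 9.8, 7.0, 11.3]
print(average_unsigned_error(simulated, experimental), rmse(simulated, experimental))
```

    The RMSE can never be smaller than the AUE, and the gap between them grows when a few compounds have large errors; an RMSE of 1.20 kcal/mol against an AUE of 0.93 kcal/mol therefore indicates a moderate spread of errors rather than a handful of extreme outliers.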

  1. Prediction of core cancer genes using a hybrid of feature selection and machine learning methods.

    PubMed

    Liu, Y X; Zhang, N N; He, Y; Lun, L J

    2015-01-01

    Machine learning techniques are of great importance in the analysis of microarray expression data and provide a systematic and promising way to predict core cancer genes. In this study, a hybrid strategy based on machine learning techniques was introduced to select a small set of informative genes, with the aim of improving classification accuracy. First, feature filtering algorithms were applied to select a set of top-ranked genes; then, hierarchical clustering and collapsing of dense clusters were used to select core cancer genes. In an empirical study, our approach selected relatively few core cancer genes while making high-accuracy predictions. The biological significance of these genes was evaluated using systems biology analysis. Extensive functional pathway and network analyses confirmed findings of previous studies and can bring new insights into common cancer mechanisms. PMID:26345818
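
    The second stage, collapsing dense clusters among already filtered genes, can be sketched with single-linkage clustering on a correlation distance. Cutting a single-linkage tree at distance 1 − r_min is equivalent to taking connected components of the graph that links genes with correlation of at least r_min, which the sketch computes with a union-find structure before keeping the best-scoring gene of each cluster. The linkage rule, the threshold, and the use of Pearson correlation are assumptions; the abstract does not specify them.

```python
import math
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def collapse_correlated(genes, profiles, scores, min_corr=0.9):
    """Single-linkage clusters at distance 1 - min_corr; keep the top-scoring gene of each.

    genes: names already selected by the filtering step.
    profiles: dict gene -> expression values across samples.
    scores: dict gene -> filter score (higher is better).
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= min_corr:
                parent[find(g)] = find(h)  # merge the two clusters

    best = {}
    for g in genes:
        root = find(g)
        if root not in best or scores[g] > scores[best[root]]:
            best[root] = g
    return sorted(best.values(), key=lambda g: -scores[g])

# Hypothetical profiles: A and B are perfectly correlated, C is anti-correlated.
profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
scores = {"A": 0.9, "B": 0.8, "C": 0.7}
print(collapse_correlated(["A", "B", "C"], profiles, scores))
```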

  2. Spiking neurons can discover predictive features by aggregate-label learning.

    PubMed

    Gütig, Robert

    2016-03-01

    The brain routinely discovers sensory clues that predict opportunities or dangers. However, it is unclear how neural learning processes can bridge the typically long delays between sensory clues and behavioral outcomes. Here, I introduce a learning concept, aggregate-label learning, that enables biologically plausible model neurons to solve this temporal credit assignment problem. Aggregate-label learning matches a neuron's number of output spikes to a feedback signal that is proportional to the number of clues but carries no information about their timing. Aggregate-label learning outperforms stochastic reinforcement learning at identifying predictive clues and is able to solve unsegmented speech-recognition tasks. Furthermore, it allows unsupervised neural networks to discover recurring constellations of sensory features even when they are widely dispersed across space and time. PMID:26941324

  3. Infrared images of reflection nebulae and Orion's bar: Fluorescent molecular hydrogen and the 3.3 micron feature

    NASA Technical Reports Server (NTRS)

    Burton, Michael G.; Moorhouse, Alan; Brand, P. W. J. L.; Roche, Patrick F.; Geballe, T. R.

    1989-01-01

    Images were obtained of the (fluorescent) molecular hydrogen 1-0 S(1) line, and of the 3.3 micron emission feature, in Orion's Bar and three reflection nebulae. The emission from these species appears to come from the same spatial locations in all sources observed. This suggests that the 3.3 micron feature is excited by the same energetic UV-photons which cause the molecular hydrogen to fluoresce.

  4. MERRF: Clinical features, muscle biopsy and molecular genetics in Brazilian patients.

    PubMed

    Lorenzoni, Paulo José; Scola, Rosana H; Kay, Cláudia S Kamoi; Arndt, Raquel C; Silvado, Carlos E; Werneck, Lineu C

    2011-05-01

    Myoclonic epilepsy with ragged red fibers (MERRF) is a mitochondrial disease that is characterized by myoclonic epilepsy with ragged red fibers (RRF) in muscle biopsies. The aim of this study was to analyze Brazilian patients with MERRF. Six patients with MERRF were studied and correlations between clinical findings, laboratory data, electrophysiology, histology and molecular features were examined. We found that blood lactate was increased in four patients. Electroencephalogram studies revealed generalized epileptiform discharges in five patients and generalized photoparoxysmal responses during intermittent photic stimulation in two patients. Muscle biopsies showed RRF in all patients using modified Gomori-trichrome and succinate dehydrogenase stains. Cytochrome c oxidase (COX) stain analysis indicated deficient activity in five patients and subsarcolemmal accumulation in one patient. Molecular analysis of the tRNA(Lys) gene with PCR/RFLP and direct sequencing showed the A8344G mutation of mtDNA in five patients. The presence of RRFs and COX deficiencies in muscle biopsies often confirmed the MERRF diagnosis. We conclude that molecular analysis of the tRNA(Lys) gene is an important criterion to help confirm the MERRF diagnosis. Furthermore, based on the findings of this study, we suggest a revision of the main characteristics of this disease. PMID:21303704

  5. FFPred 3: feature-based function prediction for all Gene Ontology domains

    PubMed Central

    Cozzetto, Domenico; Minneci, Federico; Currant, Hannah; Jones, David T.

    2016-01-01

    Predicting protein function has been a major goal of bioinformatics for several decades, and it has gained fresh momentum thanks to recent community-wide blind tests aimed at benchmarking available tools on a genomic scale. Sequence-based predictors, especially those performing homology-based transfers, remain the most popular but increasing understanding of their limitations has stimulated the development of complementary approaches, which mostly exploit machine learning. Here we present FFPred 3, which is intended for assigning Gene Ontology terms to human protein chains, when homology with characterized proteins can provide little aid. Predictions are made by scanning the input sequences against an array of Support Vector Machines (SVMs), each examining the relationship between protein function and biophysical attributes describing secondary structure, transmembrane helices, intrinsically disordered regions, signal peptides and other motifs. This update features a larger SVM library that extends its coverage to the cellular component sub-ontology for the first time, prompted by the establishment of a dedicated evaluation category within the Critical Assessment of Functional Annotation. The effectiveness of this approach is demonstrated through benchmarking experiments, and its usefulness is illustrated by analysing the potential functional consequences of alternative splicing in human and their relationship to patterns of biological features. PMID:27561554
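
    Scanning a sequence against an array of SVMs means computing one feature vector per protein and evaluating it with a separate trained classifier for each GO term. The schematic below uses linear decision functions with made-up weights and placeholder term names as stand-ins; the real FFPred 3 models, their kernels, features and score calibration are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class TermClassifier:
    """Stand-in for one trained SVM: decision value f(x) = w . x + b (weights hypothetical)."""
    term: str
    weights: list
    bias: float

    def decision(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

def scan(features, library, threshold=0.0):
    """Evaluate one protein's feature vector against every term classifier.

    Returns (term, decision value) pairs above the threshold, strongest first.
    """
    hits = [(clf.term, clf.decision(features)) for clf in library]
    return sorted((h for h in hits if h[1] > threshold), key=lambda h: -h[1])

# Hypothetical features, e.g. fractions of helix, transmembrane and disordered residues.
library = [
    TermClassifier("term_A", [1.0, 0.0, 0.0], -0.5),
    TermClassifier("term_B", [0.0, 1.0, 0.0], -0.2),
]
print(scan([1.0, 0.0, 0.5], library))
```

    Because every term has its own classifier, extending coverage to a new sub-ontology such as cellular component amounts to training and adding more entries to the library, without changing the scanning step.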

  6. Unsupervised Feature Learning Improves Prediction of Human Brain Activity in Response to Natural Images

    PubMed Central

    Güçlü, Umut; van Gerven, Marcel A. J.

    2014-01-01

    Encoding and decoding in functional magnetic resonance imaging has recently emerged as an area of research to noninvasively characterize the relationship between stimulus features and human brain activity. To overcome the challenge of formalizing what stimulus features should modulate single voxel responses, we introduce a general approach for making directly testable predictions of single voxel responses to statistically adapted representations of ecologically valid stimuli. These representations are learned from unlabeled data without supervision. Our approach is validated using a parsimonious computational model of (i) how early visual cortical representations are adapted to statistical regularities in natural images and (ii) how populations of these representations are pooled by single voxels. This computational model is used to predict single voxel responses to natural images and identify natural images from stimulus-evoked multiple voxel responses. We show that statistically adapted low-level sparse and invariant representations of natural images better span the space of early visual cortical representations and can be more effectively exploited in stimulus identification than hand-designed Gabor wavelets. Our results demonstrate the potential of our approach to better probe unknown cortical representations. PMID:25101625
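
    Identification with an encoding model works by predicting the multi-voxel response to every candidate image and picking the candidate whose prediction best matches the measured response. A minimal sketch follows, assuming Pearson correlation as the match score (a common choice in encoding studies, not confirmed by this abstract); fitting the voxel-wise encoding model that produces the predictions is not shown, and the toy patterns are hypothetical.

```python
import math
from statistics import mean

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either pattern has zero variance."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

def identify(observed, predicted_by_image):
    """Return the candidate image whose predicted voxel pattern best matches the observed one.

    observed: measured response of each voxel to the unknown image.
    predicted_by_image: dict image_id -> predicted response of the same voxels.
    """
    return max(predicted_by_image, key=lambda img: pearson(observed, predicted_by_image[img]))

observed = [1.0, 2.0, 3.0]
candidates = {"img1": [3.0, 2.0, 1.0], "img2": [1.1, 2.0, 2.9]}
print(identify(observed, candidates))
```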

  7. FFPred 3: feature-based function prediction for all Gene Ontology domains.

    PubMed

    Cozzetto, Domenico; Minneci, Federico; Currant, Hannah; Jones, David T

    2016-01-01

    Predicting protein function has been a major goal of bioinformatics for several decades, and it has gained fresh momentum thanks to recent community-wide blind tests aimed at benchmarking available tools on a genomic scale. Sequence-based predictors, especially those performing homology-based transfers, remain the most popular but increasing understanding of their limitations has stimulated the development of complementary approaches, which mostly exploit machine learning. Here we present FFPred 3, which is intended for assigning Gene Ontology terms to human protein chains, when homology with characterized proteins can provide little aid. Predictions are made by scanning the input sequences against an array of Support Vector Machines (SVMs), each examining the relationship between protein function and biophysical attributes describing secondary structure, transmembrane helices, intrinsically disordered regions, signal peptides and other motifs. This update features a larger SVM library that extends its coverage to the cellular component sub-ontology for the first time, prompted by the establishment of a dedicated evaluation category within the Critical Assessment of Functional Annotation. The effectiveness of this approach is demonstrated through benchmarking experiments, and its usefulness is illustrated by analysing the potential functional consequences of alternative splicing in human and their relationship to patterns of biological features. PMID:27561554

  8. Prediction of bacterial protein subcellular localization by incorporating various features into Chou's PseAAC and a backward feature selection approach.

    PubMed

    Li, Liqi; Yu, Sanjiu; Xiao, Weidong; Li, Yongsheng; Li, Maolin; Huang, Lan; Zheng, Xiaoqi; Zhou, Shiwen; Yang, Hua

    2014-09-01

    Information on the subcellular localization of bacterial proteins is essential for protein function prediction, genome annotation and drug design. Here we propose a novel approach to predict the subcellular localization of bacterial proteins by fusing features from the position-specific scoring matrix (PSSM), Gene Ontology (GO) and PROFEAT. A backward feature selection approach based on a linear-kernel SVM was then used to rank the integrated feature vectors and extract optimal features. Finally, an SVM was applied to predict protein subcellular locations from these optimal features. To validate the performance of our method, we employed jackknife cross-validation tests on three low-similarity datasets, i.e., M638, Gneg1456 and Gpos523. Overall accuracies of 94.98%, 93.21%, and 94.57% were achieved on these three datasets, which are 1.8% to 10.9% higher than those of state-of-the-art tools. The comparison suggests that our method could serve as a very useful vehicle for expediting the prediction of bacterial protein subcellular localization. PMID:24929100
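    Jackknife testing here means leave-one-out cross-validation: each protein is held out once, the classifier is trained on the remaining proteins, and accuracy is the fraction of held-out proteins predicted correctly. A minimal sketch follows. To stay dependency-free it swaps in a simple nearest-centroid classifier for the SVM used in the paper; the two-dimensional feature vectors and location labels are hypothetical.

```python
import math


def nearest_centroid_predict(train, x):
    """Assign x to the class whose mean feature vector is closest (Euclidean distance)."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def jackknife_accuracy(samples):
    """Leave-one-out: train on all but one sample, test on the held-out one, for every sample."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical feature vectors with known locations.
samples = [
    ([0.1, 0.2], "cytoplasm"), ([0.2, 0.1], "cytoplasm"), ([0.15, 0.25], "cytoplasm"),
    ([0.9, 0.8], "membrane"), ([0.8, 0.9], "membrane"), ([0.85, 0.75], "membrane"),
]
print(jackknife_accuracy(samples))  # 1.0
```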

  9. Pathology Features in Bethesda Guidelines Predict Colorectal Cancer Microsatellite Instability: A Population-Based Study

    PubMed Central

    Jenkins, Mark A.; Hayashi, Shinichi; O’shea, Anne-Marie; Burgart, Lawrence J.; Smyrk, Tom C.; Shimizu, David; Waring, Paul M.; Ruszkiewicz, Andrew R.; Pollett, Aaron F.; Redston, Mark; Barker, Melissa A.; Baron, John A.; Casey, Graham R.; Dowty, James G.; Giles, Graham G.; Limburg, Paul; Newcomb, Polly; Young, Joanne P.; Walsh, Michael D.; Thibodeau, Stephen N.; Lindor, Noralane M.; Lemarchand, Loïc; Gallinger, Steven; Haile, Robert W.; Potter, John D.; Hopper, John L.; Jass, Jeremy R.

    2010-01-01

    Background & Aims The revised Bethesda guidelines for Lynch syndrome recommend microsatellite instability (MSI) testing of all colorectal cancers in patients diagnosed before age 50 years and of colorectal cancers diagnosed in patients between ages 50 and 59 years with particular pathology features. Our aim was to identify pathology and other features that independently predict high MSI (MSI-H). Methods Archival tissue from 1098 population-based colorectal cancers diagnosed before age 60 years was tested for MSI. Pathology features, site, and age at diagnosis were obtained. Multiple logistic regression was performed to determine the predictive value of each feature, as measured by an odds ratio (OR), from which a scoring system (MsPath) was developed to estimate the probability that a colorectal cancer is MSI-H. Results Fifteen percent of tumors (162) were MSI-H. Independent predictors were tumor-infiltrating lymphocytes (OR, 9.1; 95% confidence interval [CI], 5.9–14.1), proximal subsite (OR, 4.7; 95% CI, 3.1–7.3), mucinous histology (OR, 2.8; 95% CI, 1.7–4.8), poor differentiation (OR, 1.9; 95% CI, 1.2–3.1), Crohn’s-like reaction (OR, 1.9; 95% CI, 1.2–2.9), and diagnosis before age 50 years (OR, 1.9; 95% CI, 1.3–2.9). An MsPath score ≥ 1.0 had a sensitivity of 93% and a specificity of 55% for MSI-H. Conclusions The probability that an individual colorectal cancer is MSI-H is predicted well by the MsPath score. There is little value in testing for DNA mismatch repair loss in tumors, or for germline mismatch repair mutations, for colorectal cancers diagnosed in patients before age 60 years with an MsPath score < 1 (approximately 50%). Pathology can identify almost all MSI-H colorectal cancers diagnosed before age 60 years. PMID:17631130
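    Sensitivity and specificity at a score cut-off such as MsPath ≥ 1.0 can be computed as in the sketch below. The MsPath weights are not given in the abstract, so the scores and MSI-H labels here are hypothetical; only the thresholding and counting logic is illustrated.

```python
def sensitivity_specificity(scores, is_msi_h, threshold=1.0):
    """Call score >= threshold test-positive and compare with true MSI-H status."""
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, is_msi_h):
        predicted = score >= threshold
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores for three MSI-H and five microsatellite-stable tumors.
scores = [2.5, 1.2, 0.4, 1.5, 0.3, 0.8, 1.1, 0.2]
is_msi_h = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(scores, is_msi_h)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.67, specificity=0.60
```

Lowering the threshold generally raises sensitivity at the cost of specificity.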

  10. In silico predictive studies of mAHR congener binding using homology modelling and molecular docking.

    PubMed

    Panda, Roshni; Cleave, A Suneetha Susan; Suresh, P K

    2014-09-01

    The aryl hydrocarbon receptor (AHR) is one of the principal xenobiotic nuclear receptors and is responsible for the early events involved in the transcription of a complex set of genes comprising the CYP450 gene family. In the present computational study, homology modelling and molecular docking were carried out to predict, in silico, the relationship between the lipophilicity of different polychlorinated biphenyl (PCB) congeners and their binding efficiency to the AHR. Homology models of the murine AHR were constructed with several automated servers and assessed with PROCHECK, ERRAT, VERIFY3D and WHAT IF. The model produced by MODWEB was used for molecular docking of 36 PCB congeners with the PatchDock server. The lipophilicity of the congeners was predicted using the XLOGP3 tool. The results suggest that lipophilicity influences the binding energy scores and is positively correlated with them. Score and Log P were correlated with r = +0.506 at the p = 0.01 level. In addition, the number of chlorine (Cl) atoms and Log P were highly correlated with r = +0.900 at the p = 0.01 level. The number of Cl atoms and the scores also showed a moderate positive correlation of r = +0.481 at the p = 0.01 level. To the best of our knowledge, this is the first study to employ PatchDock to dock these environmentally deleterious congeners to the AHR and to attempt to correlate structural features of the AHR with its biochemical properties with regard to PCBs. The results of this study are consistent with previous computational studies suggesting that a combination of docking, scoring and ranking of organic pollutants could be a predictive tool for investigating ligand-mediated toxicity, with subsequent validation in wet-lab studies. PMID:23081860
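    The reported correlations are Pearson coefficients. The sketch below shows how r and the associated t statistic (n - 2 degrees of freedom) are computed. The five Log P/score pairs are hypothetical; the final line only checks the arithmetic for r = 0.506 with n = 36 congeners, giving t ≈ 3.42, which exceeds the two-tailed 1% critical value of roughly 2.73 for 34 degrees of freedom.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical Log P values and docking scores for five congeners.
log_p = [5.1, 5.6, 6.2, 6.8, 7.3]
score = [4200, 4350, 4300, 4600, 4700]
print(round(pearson(log_p, score), 3))

# Significance check for the reported r = +0.506 with 36 congeners.
print(round(t_statistic(0.506, 36), 2))  # 3.42
```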

  11. Melancholic depression prediction by identifying representative features in metabolic and microarray profiles with missing values.

    PubMed

    Nie, Zhi; Yang, Tao; Liu, Yashu; Li, Qingyang; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping

    2015-01-01

    Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed

  12. MELANCHOLIC DEPRESSION PREDICTION BY IDENTIFYING REPRESENTATIVE FEATURES IN METABOLIC AND MICROARRAY PROFILES WITH MISSING VALUES

    PubMed Central

    Nie, Zhi; Yang, Tao; Liu, Yashu; Lin, Binbin; Li, Qingyang; Narayan, Vaibhav A; Wittenberg, Gayle; Ye, Jieping

    2014-01-01

    Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed

  13. Molecular features assisting in diagnosis, surgery, and treatment decision making in low-grade gliomas.

    PubMed

    Chen, Ricky; Ravindra, Vijay M; Cohen, Adam L; Jensen, Randy L; Salzman, Karen L; Prescot, Andrew P; Colman, Howard

    2015-03-01

    The preferred management of suspected low-grade gliomas (LGGs) has been disputed, and the implications of molecular changes for the medical and surgical management of LGGs are important to consider. Current strategies that use molecular markers, imaging techniques, and therapeutic considerations offer additional options for managing LGGs. Mutations in the isocitrate dehydrogenase 1 and 2 (IDH1 and IDH2) genes suggest a role for this abnormal metabolic pathway in the pathogenesis and progression of these primary brain tumors. Magnetic resonance spectroscopy can provide preoperative detection of IDH-mutated gliomas and affect surgical planning. In addition, IDH1 and IDH2 mutation status may affect the surgical resectability of gliomas. IDH-mutated tumors carry a better prognosis across every grade of glioma, and the mutation may be an early genetic event, preceding the lineage-specific secondary and tertiary alterations that transform LGGs into secondary glioblastomas. O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation and 1p/19q codeletion status can predict sensitivity to chemotherapy and radiation in low- and intermediate-grade gliomas. Thus, these recent advances, which have led to a better understanding of how molecular, genetic, and epigenetic alterations influence the pathogenicity of the different histological grades of gliomas, can lead to better prognostication and may lead to specific targeted surgical interventions and medical therapies. PMID:25727224

  14. Larval description of Drusus bosnicus Klapálek 1899 (Trichoptera: Limnephilidae), with distributional, molecular and ecological features

    PubMed Central

    Kučinić, Mladen; Previšić, Ana; Graf, Wolfram; Mihoci, Iva; Šoufek, Marin; Stanić-Koštroman, Svjetlana; Lelo, Suvad; Vitecek, Simon; Waringer, Johann

    2016-01-01

    In this study we present morphological, molecular and ecological features of the last instar larvae of Drusus bosnicus, with data on the distribution of this species in Bosnia and Herzegovina. We also include the most important diagnostic features enabling separation of larvae of D. bosnicus from larvae of the other European Drusinae and Trichoptera species. PMID:26249056

  15. Larval description of Drusus bosnicus Klapálek 1899 (Trichoptera: Limnephilidae), with distributional, molecular and ecological features.

    PubMed

    Kučinić, Mladen; Previšić, Ana; Graf, Wolfram; Mihoci, Iva; Šoufek, Marin; Stanić-Koštroman, Svjetlana; Lelo, Suvad; Vitecek, Simon; Waringer, Johann

    2015-01-01

    In this study we present morphological, molecular and ecological features of the last instar larvae of Drusus bosnicus, with data on the distribution of this species in Bosnia and Herzegovina. We also include the most important diagnostic features enabling separation of larvae of D. bosnicus from larvae of the other European Drusinae and Trichoptera species. PMID:26249056

  16. Ribonucleotide reductases reveal novel viral diversity and predict biological and ecological features of unknown marine viruses.

    PubMed

    Sakowski, Eric G; Munsell, Erik V; Hyatt, Mara; Kress, William; Williamson, Shannon J; Nasko, Daniel J; Polson, Shawn W; Wommack, K Eric

    2014-11-01

    Virioplankton play a crucial role in aquatic ecosystems as top-down regulators of bacterial populations and agents of horizontal gene transfer and nutrient cycling. However, the biology and ecology of virioplankton populations in the environment remain poorly understood. Ribonucleotide reductases (RNRs) are ancient enzymes that reduce ribonucleotides to deoxyribonucleotides and thus prime DNA synthesis. Composed of three classes according to O2 reactivity, RNRs can be predictive of the physiological conditions surrounding DNA synthesis. RNRs are universal among cellular life, common within viral genomes and virioplankton shotgun metagenomes (viromes), and estimated to occur within >90% of the dsDNA virioplankton sampled in this study. RNRs occur across diverse viral groups, including all three morphological families of tailed phages, making these genes attractive for studies of viral diversity. Differing patterns in virioplankton diversity were clear from RNRs sampled across a broad oceanic transect. The most abundant RNRs belonged to novel lineages of podoviruses infecting α-proteobacteria, a bacterial class critical to oceanic carbon cycling. RNR class was predictive of phage morphology among cyanophages and RNR distribution frequencies among cyanophages were largely consistent with the predictions of the "kill the winner-cost of resistance" model. RNRs were also identified for the first time to our knowledge within ssDNA viromes. These data indicate that RNR polymorphism provides a means of connecting the biological and ecological features of virioplankton populations. PMID:25313075

  17. Ribonucleotide reductases reveal novel viral diversity and predict biological and ecological features of unknown marine viruses

    PubMed Central

    Sakowski, Eric G.; Munsell, Erik V.; Hyatt, Mara; Kress, William; Williamson, Shannon J.; Nasko, Daniel J.; Polson, Shawn W.; Wommack, K. Eric

    2014-01-01

    Virioplankton play a crucial role in aquatic ecosystems as top-down regulators of bacterial populations and agents of horizontal gene transfer and nutrient cycling. However, the biology and ecology of virioplankton populations in the environment remain poorly understood. Ribonucleotide reductases (RNRs) are ancient enzymes that reduce ribonucleotides to deoxyribonucleotides and thus prime DNA synthesis. Composed of three classes according to O2 reactivity, RNRs can be predictive of the physiological conditions surrounding DNA synthesis. RNRs are universal among cellular life, common within viral genomes and virioplankton shotgun metagenomes (viromes), and estimated to occur within >90% of the dsDNA virioplankton sampled in this study. RNRs occur across diverse viral groups, including all three morphological families of tailed phages, making these genes attractive for studies of viral diversity. Differing patterns in virioplankton diversity were clear from RNRs sampled across a broad oceanic transect. The most abundant RNRs belonged to novel lineages of podoviruses infecting α-proteobacteria, a bacterial class critical to oceanic carbon cycling. RNR class was predictive of phage morphology among cyanophages and RNR distribution frequencies among cyanophages were largely consistent with the predictions of the “kill the winner–cost of resistance” model. RNRs were also identified for the first time to our knowledge within ssDNA viromes. These data indicate that RNR polymorphism provides a means of connecting the biological and ecological features of virioplankton populations. PMID:25313075

  18. Time Score: A New Feature for Link Prediction in Social Networks

    NASA Astrophysics Data System (ADS)

    Munasinghe, Lankeshwara; Ichise, Ryutaro

    Link prediction in social networks, such as friendship networks and coauthorship networks, has recently attracted a great deal of attention. There have been numerous attempts to address the problem of link prediction through diverse approaches. In the present paper, we focus on the temporal behavior of link strength, particularly the relationship between the time stamps of interactions or links and the temporal behavior of link strength, and on how link strength affects future link evolution. Most previous studies have not sufficiently discussed the impact of the time stamps of either interactions or links on link evolution. The gap between the current time and the time stamps of the interactions or links is also important to link evolution. In the present paper, we introduce a new time-aware feature, referred to as time score, that captures the important aspects of the time stamps of interactions and the temporality of link strengths. We also analyze the effectiveness of time score with different parameter settings on different network data sets. The analysis revealed that time score was sensitive to the choice of network and time measure. We applied time score to two social network data sets, namely a Facebook friendship network data set and a coauthorship network data set. The results revealed a significant improvement in predicting future links.
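    The abstract does not give the definition of time score, so the following is only a hypothetical illustration of the general idea of a time-aware link feature: each past interaction between two users contributes a weight that decays exponentially with its age, so recent interactions count more. The function name, the half-life parameter, and the exponential form are all assumptions, not the authors' formula.

```python
import math


def decayed_link_strength(interaction_times, now, half_life):
    """Hypothetical time-aware strength: each interaction loses half its weight every half_life."""
    rate = math.log(2) / half_life
    return sum(math.exp(-rate * (now - t)) for t in interaction_times)


# Hypothetical interaction years for two candidate pairs, scored in 2010.
pairs = {
    ("alice", "bob"): [2009, 2010],            # few but recent interactions
    ("alice", "carol"): [2001, 2002, 2003],    # more interactions, but older
}
ranked = sorted(pairs,
                key=lambda p: decayed_link_strength(pairs[p], 2010, half_life=2),
                reverse=True)
print(ranked[0])  # ('alice', 'bob')
```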

  19. Accurate single-sequence prediction of solvent accessible surface area using local and global features

    PubMed Central

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-01-01

    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is far more efficient than sequence alignment-based predictors while being of comparable accuracy. Introducing the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, was tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org. PMID:25204636
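    The global inputs mentioned above (single-residue and two-residue compositions) and a local sequence window can be computed as in this sketch. The window half-width, the padding symbol X, and the restriction to the 20 standard residues are assumptions for illustration; the abstract does not specify ASAquick's exact input encoding.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of the chain made up of each standard residue (global feature)."""
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of adjacent residue pairs for each of the 400 ordered pairs (global feature)."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    for pair in pairs:
        if pair in counts:
            counts[pair] += 1
    total = max(len(pairs), 1)  # avoid division by zero for single-residue chains
    return {pair: n / total for pair, n in counts.items()}


def window(seq, i, half_width=2, pad="X"):
    """Residues within half_width of position i (local feature), padded at the chain ends."""
    padded = pad * half_width + seq + pad * half_width
    return padded[i:i + 2 * half_width + 1]


seq = "ACDAA"
print(composition(seq)["A"])             # 0.6
print(dipeptide_composition(seq)["AA"])  # 0.25
print(window(seq, 0))                    # XXACD
```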

  20. PGlcS: Prediction of protein O-GlcNAcylation sites with multiple features and analysis.

    PubMed

    Zhao, Xiaowei; Ning, Qiao; Chai, Haiting; Ai, Meiyue; Ma, Zhiqiang

    2015-09-01

    As a widespread type of protein post-translational modification, O-GlcNAcylation plays crucial regulatory roles in almost all cellular processes and is related to some diseases. To understand O-GlcNAcylation mechanisms in depth, identification of substrates and specific O-GlcNAcylated sites is crucial. Experimental identification is expensive and time-consuming, so computational prediction of O-GlcNAcylated sites has considerable value. In this work, we developed a novel O-GlcNAcylated site predictor called PGlcS (Prediction of O-GlcNAcylated Sites), which uses k-means clustering to obtain informative and reliable negative samples and a support vector machine classifier combined with two-step feature selection. The performance of PGlcS was evaluated on an independent testing dataset, yielding a sensitivity of 64.62%, a specificity of 68.4%, an accuracy of 68.37%, and a Matthews correlation coefficient of 0.0697, which demonstrated that PGlcS was very promising for predicting O-GlcNAcylated sites. The datasets and source code are available in the Supplementary information. PMID:26116363
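    The Matthews correlation coefficient combines all four cells of the confusion matrix, so it can be close to zero even when sensitivity and specificity are both well above 50%: when negatives vastly outnumber positives, false positives swamp the true positives. The counts in the usage example are hypothetical and are not the PGlcS test set.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced test set: 10 true sites, 1000 non-sites.
tp, fn = 6, 4      # sensitivity 0.60
tn, fp = 700, 300  # specificity 0.70
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(accuracy, 3), round(mcc(tp, tn, fp, fn), 3))  # 0.699 0.065
```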

  1. Accurate single-sequence prediction of solvent accessible surface area using local and global features.

    PubMed

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-11-01

    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is far more efficient than sequence alignment-based predictors while being of comparable accuracy. Introducing the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, was tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org. PMID:25204636

  2. Functional connectivity classification of autism identifies highly predictive brain features but falls short of biomarker standards

    PubMed Central

    Plitt, Mark; Barnes, Kelly Anne; Martin, Alex

    2014-01-01

    Objectives Autism spectrum disorders (ASD) are diagnosed based on early-manifesting clinical symptoms, including markedly impaired social communication. We assessed the viability of resting-state functional MRI (rs-fMRI) connectivity measures as diagnostic biomarkers for ASD and investigated which connectivity features are predictive of a diagnosis. Methods Rs-fMRI scans from 59 high-functioning males with ASD and 59 age- and IQ-matched typically developing (TD) males were used to build a series of machine learning classifiers. Classification features were obtained using 3 sets of brain regions. Another set of classifiers was built from participants' scores on behavioral metrics. An additional age- and IQ-matched cohort of 178 individuals (89 ASD; 89 TD) from the Autism Brain Imaging Data Exchange (ABIDE) open-access dataset (http://fcon_1000.projects.nitrc.org/indi/abide/) was included for replication. Results High classification accuracy was achieved through several rs-fMRI methods (peak accuracy 76.67%). However, classification via behavioral measures consistently surpassed rs-fMRI classifiers (peak accuracy 95.19%). The class probability estimates, P(ASD|fMRI data), from brain-based classifiers significantly correlated with scores on a measure of social functioning, the Social Responsiveness Scale (SRS), as did the most informative features from 2 of the 3 sets of brain-based features. The most informative connections predominantly originated from regions strongly associated with social functioning. Conclusions While individuals can be classified as having ASD with statistically significant accuracy from their rs-fMRI scans alone, this method falls short of biomarker standards. Classification methods provided further evidence that ASD functional connectivity is characterized by dysfunction of large-scale functional networks, particularly those involved in social information processing. PMID:25685703

  3. An approach to predict Sudden Cardiac Death (SCD) using time domain and bispectrum features from HRV signal.

    PubMed

    Houshyarifar, Vahid; Chehel Amirani, Mehdi

    2016-08-12

    In this paper we present a method to predict Sudden Cardiac Arrest (SCA) using higher order spectral (HOS) and linear (time-domain) features extracted from the heart rate variability (HRV) signal. Predicting the occurrence of SCA is important for reducing the risk of Sudden Cardiac Death (SCD). The challenge addressed in this work is to predict SCA five minutes before its onset. The method consists of four steps: pre-processing, feature extraction, feature reduction, and classification. In the first step, the QRS complexes are detected in the electrocardiogram (ECG) signal and the HRV signal is extracted. In the second step, bispectrum features and time-domain features of the HRV signal are obtained: six features from the bispectrum and two from the time domain. In the next step, these features are reduced to a single feature by linear discriminant analysis (LDA). Finally, k-nearest neighbor (KNN) and support vector machine-based classifiers are used to classify the HRV signals. We used two databases: the MIT/BIH Sudden Cardiac Death (SCD) Database and the PhysioBank Normal Sinus Rhythm (NSR) database. In this work we achieved prediction of SCD occurrence six minutes before SCA with an accuracy of over 91%. PMID:27567781
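    The abstract does not name its two time-domain features, so this sketch uses two standard time-domain HRV measures, SDNN (standard deviation of RR intervals) and RMSSD (root mean square of successive RR differences), as an assumed example. It starts from R-peak times, as a QRS detector would produce, and derives the RR-interval (HRV) series; the peak times are hypothetical.

```python
import math
from statistics import mean, stdev


def rr_intervals(r_peak_times_ms):
    """RR intervals (the HRV series) from successive R-peak times in milliseconds."""
    return [b - a for a, b in zip(r_peak_times_ms, r_peak_times_ms[1:])]


def time_domain_features(rr):
    """Mean RR, SDNN and RMSSD of an RR-interval series."""
    successive_diffs = [b - a for a, b in zip(rr, rr[1:])]
    return {
        "mean_rr": mean(rr),
        "sdnn": stdev(rr),
        "rmssd": math.sqrt(mean([d * d for d in successive_diffs])),
    }


peaks = [0, 800, 1600, 2500, 3300]  # hypothetical R-peak times (ms)
features = time_domain_features(rr_intervals(peaks))
print(features["sdnn"])  # 50.0
```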

  4. A data-driven feature extraction framework for predicting the severity of condition of congestive heart failure patients.

    PubMed

    Sideris, Costas; Alshurafa, Nabil; Pourhomayoun, Mohammad; Shahmohammadi, Farhad; Samy, Lauren; Sarrafzadeh, Majid

    2015-08-01

    In this paper, we propose a novel methodology for using disease diagnostic information to predict the severity of condition of Congestive Heart Failure (CHF) patients. Our methodology relies on a novel, clustering-based feature extraction framework built on disease diagnostic information. To reduce dimensionality, we identify disease clusters using co-occurrence frequencies. We then use these clusters as features to predict patient severity of condition. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS) of the Healthcare Cost and Utilization Project (HCUP), which contains 7 million discharge records with ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 patients. We compare our cluster-based feature set with one that incorporates the Charlson comorbidity score as a feature and demonstrate an accuracy improvement of up to 14% in predicting severity of condition. PMID:26736808
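    The clustering step can be illustrated with a deliberately simple stand-in: count how often each pair of diagnosis codes appears in the same discharge record, link pairs that co-occur at least a chosen number of times, and take the connected groups as clusters. A patient's feature vector is then the number of their codes falling in each cluster. Both the threshold-and-connected-components rule and this feature encoding are assumptions, not the authors' algorithm, and the records and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count, for each unordered pair of codes, the records in which both appear."""
    counts = Counter()
    for codes in records:
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts


def clusters_from_counts(counts, min_count):
    """Group codes connected by pairs that co-occur at least min_count times (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        root_a, root_b = find(a), find(b)
        if n >= min_count:
            parent[root_a] = root_b
    groups = {}
    for code in list(parent):
        groups.setdefault(find(code), set()).add(code)
    return sorted(sorted(g) for g in groups.values())


def cluster_features(patient_codes, clusters):
    """Number of the patient's codes falling in each cluster."""
    return [sum(1 for code in cluster if code in patient_codes) for cluster in clusters]


# Hypothetical discharge records, each a list of ICD-9-CM codes.
records = [
    ["428.0", "584.9", "585.9"],
    ["428.0", "584.9"],
    ["250.00", "401.9"],
    ["250.00", "401.9", "428.0"],
    ["584.9", "585.9"],
]
clusters = clusters_from_counts(cooccurrence_counts(records), min_count=2)
print(clusters)  # [['250.00', '401.9'], ['428.0', '584.9', '585.9']]
print(cluster_features({"428.0", "585.9"}, clusters))  # [0, 2]
```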

  5. Unveiling atomic-scale features of inherent heterogeneity in metallic glass by molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Hu, Y. C.; Guan, P. F.; Li, M. Z.; Liu, C. T.; Yang, Y.; Bai, H. Y.; Wang, W. H.

    2016-06-01

    Heterogeneity is commonly believed to be intrinsic to metallic glasses (MGs). Nevertheless, how to distinguish and characterize this heterogeneity at the atomic level is still debated. Using extensive molecular dynamics simulations that combine isoconfigurational ensemble and atomic pinning methods, we directly reveal that MGs contain flow units and an elastic matrix, which can be clearly distinguished by their distinct atomic-level responsiveness and mechanical performance. The microscopic features of the flow units, such as their shape, spatial distribution, dimensionality, and correlation length, are characterized from analyses of atomic positions. Furthermore, the correlation between the flow units and the energy-state landscape, free volume, atomic-level stress, and especially the local bond orientational order parameter is discussed.

  6. Predicting cytotoxicity of PAMAM dendrimers using molecular descriptors

    PubMed Central

    Jones, David E; Ghandehari, Hamidreza

    2015-01-01

    Summary The use of data mining techniques in the field of nanomedicine has been very limited. In this paper we demonstrate that data mining techniques can be used for the development of predictive models of the cytotoxicity of poly(amido amine) (PAMAM) dendrimers using their chemical and structural properties. We present predictive models developed using 103 PAMAM dendrimer cytotoxicity values that were extracted from twelve cancer nanomedicine journal articles. The results indicate that data mining and machine learning can be effectively used to predict the cytotoxicity of PAMAM dendrimers on Caco-2 cells. PMID:26665059

  7. Sequence features accurately predict genome-wide MeCP2 binding in vivo

    PubMed Central

    Rube, H. Tomas; Lee, Wooje; Hejna, Miroslav; Chen, Huaiyang; Yasui, Dag H.; Hess, John F.; LaSalle, Janine M.; Song, Jun S.; Gong, Qizhi

    2016-01-01

    Methyl-CpG binding protein 2 (MeCP2) is critical for proper brain development and expressed at near-histone levels in neurons, but the mechanism of its genomic localization remains poorly understood. Using high-resolution MeCP2-binding data, we show that DNA sequence features alone can predict binding with 88% accuracy. Integrating MeCP2 binding and DNA methylation in a probabilistic graphical model, we demonstrate that previously reported genome-wide association with methylation is in part due to MeCP2's affinity to GC-rich chromatin, a result replicated using published data. Furthermore, MeCP2 co-localizes with nucleosomes. Finally, MeCP2 binding downstream of promoters correlates with increased expression in Mecp2-deficient neurons. PMID:27008915
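    The abstract does not list the sequence features used, so the sketch below computes three generic candidates for a DNA window: GC fraction (relevant to the reported affinity for GC-rich chromatin), CpG dinucleotide count, and the CpG observed/expected ratio (CpG count × length / (C count × G count)). The feature set and example sequence are assumptions for illustration.

```python
def sequence_features(seq):
    """GC fraction, CpG count and CpG observed/expected ratio of a DNA window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return {
        "gc_fraction": (c + g) / n,
        "cpg_count": cpg,
        # Observed CpG count relative to the count expected from C and G frequencies.
        "cpg_obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }


print(sequence_features("acgtcgga"))
```

Features like these could then be fed to any classifier trained on bound versus unbound windows.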

  8. Sequence features accurately predict genome-wide MeCP2 binding in vivo.

    PubMed

    Rube, H Tomas; Lee, Wooje; Hejna, Miroslav; Chen, Huaiyang; Yasui, Dag H; Hess, John F; LaSalle, Janine M; Song, Jun S; Gong, Qizhi

    2016-01-01

    Methyl-CpG binding protein 2 (MeCP2) is critical for proper brain development and expressed at near-histone levels in neurons, but the mechanism of its genomic localization remains poorly understood. Using high-resolution MeCP2-binding data, we show that DNA sequence features alone can predict binding with 88% accuracy. Integrating MeCP2 binding and DNA methylation in a probabilistic graphical model, we demonstrate that previously reported genome-wide association with methylation is in part due to MeCP2's affinity to GC-rich chromatin, a result replicated using published data. Furthermore, MeCP2 co-localizes with nucleosomes. Finally, MeCP2 binding downstream of promoters correlates with increased expression in Mecp2-deficient neurons. PMID:27008915

  9. Molecular effective coverage surface area of optical clearing agents for predicting optical clearing potential

    NASA Astrophysics Data System (ADS)

    Feng, Wei; Ma, Ning; Zhu, Dan

    2015-03-01

    Improved methods for predicting the performance of optical clearing agents have an important impact on tissue optical clearing techniques. Molecular dynamics simulation is one of the most convincing and simplest approaches for predicting the optical clearing potential (OCP) of agents, by analyzing the hydrogen bonds, hydrogen bridges, and hydrogen bridge types formed between the agents and collagen. However, these analysis methods still have problems, for example in analyzing cyclic molecules because of their molecular conformation. In this study, a molecular effective coverage surface area based on molecular dynamics simulation was proposed to predict the potential of optical clearing agents. Several typical cyclic molecules (fructose and glucose) and chain molecules (sorbitol and xylitol) were analyzed by calculating their molecular effective coverage surface area, hydrogen bonds, hydrogen bridges, and hydrogen bridge types. To verify this analysis method, the optical clearing efficacy of in vitro skin samples was measured with a 1951 USAF resolution test target after 25 min of immersion in 3.5 M solutions of fructose, glucose, sorbitol, and xylitol. The experimental results agree with the predictions from molecular effective coverage surface area. A further comparison with the other parameters shows that molecular effective coverage surface area performs better in predicting the OCP of agents.

  10. Assessment of two mammographic density related features in predicting near-term breast cancer risk

    NASA Astrophysics Data System (ADS)

    Zheng, Bin; Sumkin, Jules H.; Zuley, Margarita L.; Wang, Xingwei; Klym, Amy H.; Gur, David

    2012-02-01

    To establish a personalized breast cancer screening program, it is important to develop risk models with high discriminatory power in predicting the likelihood that a woman will develop an imaging-detectable breast cancer in the near term (e.g., <3 years after the negative examination in question). In epidemiology-based breast cancer risk models, mammographic density is considered the second-strongest breast cancer risk factor (second only to a woman's age). In this study we explored a new feature, bilateral mammographic density asymmetry, and investigated the feasibility of using it to predict near-term screening outcome. The database consisted of 343 negative examinations, of which 187 were followed by cancers detected during the subsequent screening examination and 155 remained negative. We computed the average pixel value of the segmented breast area depicted on each cranio-caudal view of the initial negative examinations. We then computed the mean and the difference in mammographic density for the paired bilateral images. Using woman's age, subjectively rated density (BI-RADS), and the computed mammographic density features, we compared classification performance in estimating the likelihood of detecting cancer during the subsequent examination, measured by the area under the ROC curve (AUC). The AUCs were 0.63 ± 0.03, 0.54 ± 0.04, 0.57 ± 0.03, and 0.68 ± 0.03 when using woman's age, BI-RADS rating, computed mean density, and difference in computed bilateral mammographic density, respectively. Performance increased to 0.62 ± 0.03 and 0.72 ± 0.03 when mean density and density difference, respectively, were fused with woman's age. The results suggest that, in this study, bilateral mammographic tissue density asymmetry is a significantly stronger (p<0.01) risk indicator than both woman's age and mean breast density.
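
The two computed density features and the AUC can be sketched as follows, assuming each view's segmented breast pixels are already available as a list of values and that "difference" means the absolute left/right difference (both assumptions). AUC is computed with the Mann-Whitney formulation, i.e. the probability that a randomly chosen positive case scores higher than a randomly chosen negative one; the function names are hypothetical.

```python
from statistics import mean


def bilateral_density_features(left_pixels, right_pixels):
    """Return (mean density, absolute bilateral difference) for one exam.

    left_pixels / right_pixels: pixel values inside the segmented breast
    area of the left and right cranio-caudal views (assumed pre-segmented).
    """
    left_avg = mean(left_pixels)
    right_avg = mean(right_pixels)
    return (left_avg + right_avg) / 2, abs(left_avg - right_avg)


def auc(positive_scores, negative_scores):
    """ROC AUC as the Mann-Whitney probability P(pos > neg); ties count 1/2."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

The pairwise loop is quadratic, which is fine for a few hundred examinations; a rank-based formula gives the same value faster on large data.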

  11. Body Composition Features Predict Overall Survival in Patients With Hepatocellular Carcinoma

    PubMed Central

    Singal, Amit G; Zhang, Peng; Waljee, Akbar K; Ananthakrishnan, Lakshmi; Parikh, Neehar D; Sharma, Pratima; Barman, Pranab; Krishnamurthy, Venkataramu; Wang, Lu; Wang, Stewart C; Su, Grace L

    2016-01-01

    Objectives: Existing prognostic models for patients with hepatocellular carcinoma (HCC) have limitations. Analytic morphomics, a novel process to measure body composition using computational image-processing algorithms, may offer further prognostic information. The aim of this study was to develop and validate a prognostic model for HCC patients using body composition features and objective clinical information. Methods: Using computed tomography scans from a cohort of HCC patients at the VA Ann Arbor Healthcare System between January 2006 and December 2013, we developed a prognostic model using analytic morphomics and routine clinical data based on multivariate Cox regression and regularization methods. We assessed model performance using C-statistics and validated predicted survival probabilities. We validated model performance in an external cohort of HCC patients from Parkland Hospital, a safety-net health system in Dallas County. Results: The derivation cohort consisted of 204 HCC patients (20.1% Barcelona Clinic Liver Cancer classification (BCLC) 0/A), and the validation cohort had 225 patients (22.2% BCLC 0/A). The analytic morphomics model had good prognostic accuracy in the derivation cohort (C-statistic 0.80, 95% confidence interval (CI) 0.71–0.89) and external validation cohort (C-statistic 0.75, 95% CI 0.68–0.82). The accuracy of the analytic morphomics model was significantly higher than that of TNM and BCLC staging systems in derivation (P<0.001 for both) and validation (P<0.001 for both) cohorts. For calibration, mean absolute errors in predicted 1-year survival probabilities were 5.3% (90% quantile of 7.5%) and 7.6% (90% quantile of 12.5%) in the derivation and validation cohorts, respectively. Conclusion: Body composition features, combined with readily available clinical data, can provide valuable prognostic information for patients with newly diagnosed HCC. PMID:27228403
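
The C-statistic reported above is a concordance probability. A minimal sketch of Harrell's C for right-censored data follows; pairs with tied survival times are simply treated as non-comparable here, which is one common simplification and an assumption of this sketch, not necessarily what the authors' software does.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    and times[i] < times[j]. It is concordant when subject i also has the
    higher predicted risk. Tied risk scores count as 1/2.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.80 and 0.75 above should be read.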

  12. Spatial Habitat Features Derived from Multiparametric Magnetic Resonance Imaging Data Are Associated with Molecular Subtype and 12-Month Survival Status in Glioblastoma Multiforme

    PubMed Central

    Lee, Joonsang; Narang, Shivali; Martinez, Juan; Rao, Ganesh; Rao, Arvind

    2015-01-01

    Glioblastoma multiforme is one of the most common and aggressive malignant brain tumors. Despite multimodality treatment such as radiation therapy and chemotherapy (temozolomide, TMZ), the median survival of glioblastoma patients is less than 15 months. In this study, we investigated the association of measures of spatial diversity, derived from spatial point pattern analysis of multiparametric magnetic resonance imaging (MRI) data, with molecular status and 12-month survival in glioblastoma. We obtained 27 measures of spatial proximity (diversity) via spatial point pattern analysis of multiparametric T1 post-contrast and T2 fluid-attenuated inversion recovery MRI data. These measures were used to predict 12-month survival status (≤12 or >12 months) in 74 glioblastoma patients. Kaplan-Meier and receiver operating characteristic analyses were used to assess the relationship of the derived spatial features with 12-month survival status and molecular subtype status in patients with glioblastoma. Kaplan-Meier survival analysis revealed that 14 spatial features stratified overall survival in a statistically significant manner. For prediction of 12-month survival status based on these diversity indices, sensitivity and specificity were 0.86 and 0.64, respectively. The area under the receiver operating characteristic curve and the accuracy were 0.76 and 0.75, respectively. For prediction of molecular subtype status, the proneural subtype showed the highest accuracy (0.93) among all molecular subtypes based on receiver operating characteristic analysis. We find that measures of spatial diversity from point pattern analysis of intensity habitats from T1 post-contrast and T2 fluid-attenuated inversion recovery images are associated with both tumor subtype status and 12-month survival status and may therefore be useful indicators of patient prognosis, in addition to providing potential guidance for molecularly targeted therapies in
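
The Kaplan-Meier (product-limit) estimator used for survival stratification can be sketched in a few lines. This is a generic implementation, not the authors' code; splitting patients by a spatial feature (for example at its median) and comparing the resulting curves would be a separate, assumed step.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times: follow-up times; events: 1 if the event (death) was observed,
    0 if the observation was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time (both events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        n_at_risk -= removed
    return curve
```

Censored subjects do not lower the curve directly; they only shrink the risk set for later event times.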

  13. Clinical, Pathological, and Molecular Features of Lung Adenocarcinomas with AXL Expression.

    PubMed

    Sato, Katsuaki; Suda, Kenichi; Shimizu, Shigeki; Sakai, Kazuko; Mizuuchi, Hiroshi; Tomizawa, Kenji; Takemoto, Toshiki; Nishio, Kazuto; Mitsudomi, Tetsuya

    2016-01-01

    The receptor tyrosine kinase AXL is a member of the Tyro3-Axl-Mer receptor tyrosine kinase subfamily. AXL affects several cellular functions, including growth and migration. AXL aberration is reportedly a marker for poor prognosis and treatment resistance in various cancers. In this study, we analyzed clinical, pathological, and molecular features of AXL expression in lung adenocarcinomas (LADs). We examined 161 LAD specimens from patients who underwent pulmonary resections. When AXL protein expression was quantified (0, 1+, 2+, 3+) according to immunohistochemical staining intensity, results were 0: 35%; 1+: 20%; 2+: 37%; and 3+: 7% for the 161 samples. AXL expression status did not correlate with clinical features, including smoking status and pathological stage. However, patients whose specimens showed strong AXL expression (3+) had markedly poorer prognoses than other groups (P = 0.0033). Strong AXL expression was also significantly associated with downregulation of E-cadherin (P = 0.025) and CD44 (P = 0.0010). In addition, 9 of 12 specimens with strong AXL expression had driver gene mutations (6 with EGFR, 2 with KRAS, 1 with ALK). In conclusion, we found that strong AXL expression in surgically resected LADs was a predictor of poor prognosis. LADs with strong AXL expression were characterized by mesenchymal status, higher expression of stem-cell-like markers, and frequent driver gene mutations. PMID:27100677

  14. Clinical, Pathological, and Molecular Features of Lung Adenocarcinomas with AXL Expression

    PubMed Central

    Suda, Kenichi; Shimizu, Shigeki; Sakai, Kazuko; Mizuuchi, Hiroshi; Tomizawa, Kenji; Takemoto, Toshiki; Nishio, Kazuto; Mitsudomi, Tetsuya

    2016-01-01

  15. Predicting DNA binding proteins using support vector machine with hybrid fractal features.

    PubMed

    Niu, Xiao-Hui; Hu, Xue-Hai; Shi, Feng; Xia, Jing-Bo

    2014-02-21

    DNA-binding proteins play a vital role in many biological processes. Prediction of DNA-binding proteins from amino acid sequence is a significant but not yet fully resolved scientific problem. Chaos game representation (CGR) captures patterns hidden in protein sequences and visually reveals previously unknown structure. Fractal dimensions (FD) are good tools for measuring the size of complex, highly irregular geometric objects. To extract the intrinsic correlation with DNA-binding properties from protein sequences, the CGR algorithm, fractal dimension, and amino acid composition are applied in this paper to formulate numerical features of protein samples. Seven groups of features, all computable directly from the primary sequence, are extracted, and each group is evaluated by 10-fold cross-validation and the jackknife test. Comparing the results of the numerical experiments, the group combining amino acid composition and fractal dimension (a 21-dimensional vector) gives the best result, with an average accuracy of 81.82% and an average Matthews correlation coefficient (MCC) of 0.6017. The resulting predictor was also compared with the existing method DNA-Prot and showed better performance. PMID:24189096
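
A sketch of the 21-dimensional feature vector (20 amino acid composition values plus one fractal dimension) and of the MCC follows. The CGR and fractal-dimension computation is not reproduced; `fractal_dimension` is assumed to be supplied by a separate step, and all names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(sequence):
    """Relative frequency of each standard amino acid (non-standard letters ignored)."""
    counts = [sequence.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)


def feature_vector(sequence, fractal_dimension):
    """20 composition values + 1 fractal dimension = 21-dimensional vector.

    fractal_dimension is assumed to come from a separate CGR-based step.
    """
    return aa_composition(sequence.upper()) + [fractal_dimension]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

MCC ranges from -1 to 1, so the reported 0.6017 indicates substantially better than chance agreement even if the classes are imbalanced.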

  16. The extent of whole-genome copy number alterations predicts aggressive features in primary melanomas.

    PubMed

    Gandolfi, Greta; Longo, Caterina; Moscarella, Elvira; Zalaudek, Iris; Sancisi, Valentina; Raucci, Margherita; Manzotti, Gloria; Gugnoni, Mila; Piana, Simonetta; Argenziano, Giuseppe; Ciarrocchi, Alessia

    2016-03-01

    Recent evidence indicates that melanoma comprises distinct types of tumors and suggests that specific morphological features may help predict its clinical behavior. Using a SNP-array approach, we quantified chromosomal copy number alterations (CNA) across the whole genome in 41 primary melanomas and found a high degree of heterogeneity in their genomic makeup. Association analysis correlating the number and relative length of CNA with clinical, morphological, and dermoscopic attributes of melanoma revealed that features of aggressiveness were strongly linked to the overall amount of genomic damage. Furthermore, we observed that melanoma progression and survival were mainly affected by a low number of large chromosome losses and a high number of small gains. We identified the alterations most frequently associated with aggressive melanoma and, by integrating our data with publicly available gene expression profiles, identified five genes whose expression was found to be necessary for melanoma cell proliferation. In conclusion, this work provides new evidence that the phenotypic heterogeneity of melanoma reflects a parallel genetic diversity and lays the basis for novel strategies for more precise prognostic stratification of patients. PMID:26575206
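
The "number and relative length of CNA" can be summarized as sketched below, assuming non-overlapping segments already called as gains or losses and a hypothetical 1%-of-genome cutoff separating large from small alterations (the abstract gives no cutoff). The `Segment` type and `summarize_cna` function are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # half-open base-pair interval [start, end)
    end: int
    kind: str   # "gain" or "loss"


def summarize_cna(segments, genome_length, large_fraction=0.01):
    """Count gains/losses split into large/small by relative length.

    large_fraction is a hypothetical cutoff; segments are assumed
    non-overlapping so their lengths can be summed.
    """
    summary = {"gain_large": 0, "gain_small": 0, "loss_large": 0, "loss_small": 0}
    altered_bp = 0
    for seg in segments:
        length = seg.end - seg.start
        altered_bp += length
        size = "large" if length / genome_length >= large_fraction else "small"
        summary[f"{seg.kind}_{size}"] += 1
    # Overall "amount of genomic damage" as fraction of the genome altered.
    summary["fraction_altered"] = altered_bp / genome_length
    return summary
```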

  17. What catches a radiologist's eye? A comprehensive comparison of feature types for saliency prediction

    NASA Astrophysics Data System (ADS)

    Alzubaidi, Mohammad; Balasubramanian, Vineeth; Patel, Ameet; Panchanathan, Sethuraman; Black, John A., Jr.

    2010-03-01

    Experienced radiologists are in short supply and are sometimes called upon to read many images in a short time. This limits the time available for each image and can lead to fatigue and stress, which are sources of error: radiologists may overlook subtle abnormalities that they otherwise would not miss. Another factor in error rates is satisfaction of search, in which a radiologist misses a second (typically subtle) abnormality after finding the first. These types of errors are due primarily to a lack of attention to an important region of the image during the search. In this paper we discuss the use of eye-tracker technology, in combination with image analysis and machine learning techniques, to learn what types of features catch the eye of experienced radiologists when reading chest x-rays for diagnostic purposes, and then to use that information to produce saliency maps that predict which regions of each image might be most interesting to radiologists. We found that, out of 13 popular feature types that are widely extracted to characterize images, 4 are particularly useful for this task: (1) Localized Edge Orientation Histograms, (2) Haar Wavelets, (3) Gabor Filters, and (4) Steerable Filters.
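
As an illustration of the first feature type, a localized edge orientation histogram can be built by dividing the image into a grid and computing a gradient-magnitude-weighted histogram of edge orientations per cell. Grid size, bin count and normalization here are assumptions, not the paper's settings.

```python
import numpy as np


def edge_orientation_histogram(patch, n_bins=8, magnitude_threshold=0.0):
    """Magnitude-weighted histogram of gradient orientations in one patch."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))  # axis 0 = rows (y)
    magnitude = np.hypot(gx, gy)
    # Fold orientations into [0, pi): an edge and its reverse count the same.
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    mask = magnitude > magnitude_threshold
    hist, _ = np.histogram(orientation[mask], bins=n_bins, range=(0.0, np.pi),
                           weights=magnitude[mask])
    total = hist.sum()
    return hist / total if total > 0 else hist


def localized_eoh(image, cells=4, n_bins=8):
    """Concatenate per-cell histograms over a cells x cells grid."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    feats = []
    for r in range(cells):
        for c in range(cells):
            patch = image[r * h // cells:(r + 1) * h // cells,
                          c * w // cells:(c + 1) * w // cells]
            feats.append(edge_orientation_histogram(patch, n_bins))
    return np.concatenate(feats)
```

The per-cell layout keeps spatial information, which is what makes the histogram "localized" rather than a single global descriptor.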

  18. Ceruloplasmin/Hephaestin Knockout Mice Model Morphologic and Molecular Features of AMD

    PubMed Central

    Hadziahmetovic, Majda; Dentchev, Tzvete; Song, Ying; Haddad, Nadine; He, Xining; Hahn, Paul; Pratico, Domenico; Wen, Rong; Harris, Z. Leah; Lambris, John D.; Beard, John; Dunaief, Joshua L.

    2008-01-01

    Purpose: Iron is an essential element in human metabolism but is also a potent generator of oxidative damage, and its levels increase with age. Several studies suggest that iron accumulation may be a factor in age-related macular degeneration (AMD). In prior studies, both iron overload and features of AMD were identified in mice deficient in the ferroxidase ceruloplasmin (Cp) and its homologue hephaestin (Heph) (double knockout, DKO). In this study, the location and timing of iron accumulation, the rate and reproducibility of retinal degeneration, and the roles of oxidative stress and complement activation were determined. Methods: Morphologic analysis and histochemical iron detection by Perls' staining were performed on retina sections from DKO and control mice. Immunofluorescence and immunohistochemistry were performed with antibodies detecting activated complement factor C3, transferrin receptor, L-ferritin, and macrophages. Tissue iron levels were measured by atomic absorption spectrophotometry. Isoprostane F2α-VI, a specific marker of oxidative stress, was quantified in the tissue by gas chromatography/mass spectrometry. Results: DKOs exhibited highly reproducible age-dependent iron overload, which plateaued at 6 months of age, followed by progressive retinal degeneration continuing to at least 12 months. The degeneration shared some features of AMD, including retinal pigment epithelium (RPE) hypertrophy and hyperplasia, photoreceptor degeneration, subretinal neovascularization, RPE lipofuscin accumulation, oxidative stress, and complement activation. Conclusions: DKOs have age-dependent iron accumulation followed by retinal degeneration that models some of the morphologic and molecular features of AMD. Therefore, these mice are a good platform on which to test therapeutic agents for AMD, such as antioxidants, iron chelators, and antiangiogenic agents. PMID:18326691

  19. Using molecular structure for reliable predicting enthalpy of melting of nitroaromatic energetic compounds.

    PubMed

    Semnani, Abolfazl; Keshavarz, Mohammad Hossein

    2010-06-15

    In this work, a simple, reliable method is introduced for predicting the enthalpy of melting of nitroaromatic energetic compounds from their molecular structures. The method can be used for a wide range of nitroaromatics, including halogenated nitroaromatic compounds. Predictions based on the numbers of carbon, nitrogen, and oxygen atoms can be improved by adding contributions from hydrogen bonding, polar groups, and other structural parameters. The results show that this method gives reliable predictions of the standard enthalpy of melting, compared with the best available methods, for different nitroaromatic compounds, including high explosives with complex molecular structures. PMID:20117881

  20. When can we expect statistical mechanics to help predict large scale atmospheric and oceanic features?

    NASA Astrophysics Data System (ADS)

    Nadiga, B. T.; Bouchet, F.

    2010-12-01

    While theoretical prediction of the large scales of turbulent geophysical flows is difficult, statistical mechanics has succeeded in describing various tropospheric features of Jupiter, the polar vortex, and oceanic jets and vortices. Nevertheless, the applicability of such statistical mechanical theories to non-equilibrium situations is unclear. Based on numerical studies of non-equilibrium two-dimensional and geostrophic turbulence and some recent experiments, we propose a criterion based on the relative importance of the forcing-dissipation time scale on the one hand and an inertial relaxation time scale on the other. Across these studies, we find that when the inertial relaxation time scale is much smaller than the forcing-dissipation time scale, statistical mechanics gives good predictions of the large-scale mean flow, including possible transitions. We further discuss the extension of such a criterion to other situations.

  1. Beyond intensity: Spectral features effectively predict music-induced subjective arousal.

    PubMed

    Gingras, Bruno; Marin, Manuela M; Fitch, W Tecumseh

    2014-01-01

    Emotions in music are conveyed by a variety of acoustic cues. Notably, the positive association between sound intensity and arousal has particular biological relevance. However, although amplitude normalization is a common procedure used to control for intensity in music psychology research, direct comparisons between emotional ratings of original and amplitude-normalized musical excerpts are lacking. In this study, 30 nonmusicians retrospectively rated the subjective arousal and pleasantness induced by 84 six-second classical music excerpts, and an additional 30 nonmusicians rated the same excerpts normalized for amplitude. Following the cue-redundancy and Brunswik lens models of acoustic communication, we hypothesized that arousal and pleasantness ratings would be similar for both versions of the excerpts, and that arousal could be predicted effectively by other acoustic cues besides intensity. Although the difference in mean arousal and pleasantness ratings between original and amplitude-normalized excerpts correlated significantly with the amplitude adjustment, ratings for both sets of excerpts were highly correlated and shared a similar range of values, thus validating the use of amplitude normalization in music emotion research. Two acoustic parameters, spectral flux and spectral entropy, accounted for 65% of the variance in arousal ratings for both sets, indicating that spectral features can effectively predict arousal. Additionally, we confirmed that amplitude-normalized excerpts were adequately matched for loudness. Overall, the results corroborate our hypotheses and support the cue-redundancy and Brunswik lens models. PMID:24215647
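
Spectral flux and spectral entropy have several definitions in the literature. The sketch below uses one common pair, assumed here rather than taken from the study: the entropy (in bits) of each frame's normalized power spectrum, and the Euclidean distance between consecutive normalized magnitude spectra. Frame and hop sizes are arbitrary choices.

```python
import numpy as np


def stft_magnitude(signal, frame_len=1024, hop=512):
    """Magnitude spectra of Hann-windowed frames (one row per frame)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_entropy(mag):
    """Shannon entropy (bits) of each frame's normalized power spectrum."""
    power = mag ** 2
    p = power / np.maximum(power.sum(axis=1, keepdims=True), 1e-12)
    # Clip before the log so that 0 * log(0) contributes 0.
    return -np.sum(p * np.log2(np.clip(p, 1e-300, None)), axis=1)


def spectral_flux(mag):
    """Euclidean distance between consecutive L2-normalized magnitude spectra."""
    norm = mag / np.maximum(np.linalg.norm(mag, axis=1, keepdims=True), 1e-12)
    return np.sqrt(np.sum(np.diff(norm, axis=0) ** 2, axis=1))
```

Because both measures use normalized spectra, they are insensitive to overall gain, which is consistent with the abstract's point that arousal can be predicted from cues other than intensity. A per-excerpt value would typically be the mean over frames.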

  2. Search performance is better predicted by tileability than presence of a unique basic feature

    PubMed Central

    Chang, Honghua; Rosenholtz, Ruth

    2016-01-01

    Traditional models of visual search, such as feature integration theory (FIT; Treisman & Gelade, 1980), have suggested that a key factor determining task difficulty is whether or not the search target contains a “basic feature” not found in the other display items (distractors). Here we discriminate between such traditional models and our recent texture tiling model (TTM) of search (Rosenholtz, Huang, Raj, Balas, & Ilie, 2012b) by designing new experiments that directly pit these models against each other. Doing so is nontrivial, for two reasons. First, the visual representation in TTM is fully specified and makes clear testable predictions, but its complexity makes it difficult to build intuitions. Here we elucidate a rule of thumb for TTM, which enables us to easily design new and interesting search experiments. Second, FIT is somewhat ill-defined and hard to pin down. To get around this, rather than designing totally new search experiments, we start with five classic experiments that FIT already claims to explain: T among Ls, 2 among 5s, Q among Os, O among Qs, and an orientation/luminance-contrast conjunction search. We find that fairly subtle changes in these search tasks lead to significant changes in performance, in the direction predicted by TTM, providing definitive evidence in favor of the texture tiling model as opposed to traditional views of search. PMID:27548090

  3. High Resolution Prediction of Calcium-Binding Sites in 3D Protein Structures Using FEATURE

    PubMed Central

    2015-01-01

    Metal-binding proteins are ubiquitous in biological systems, ranging from enzymes to cell surface receptors. Among the various biologically active metal ions, calcium plays a large role in regulating cellular and physiological changes. With the increasing number of high-quality crystal structures of proteins associated with their metal ion ligands, many groups have built models to identify Ca²⁺ sites in proteins, using information such as structure, geometry, or homology for the inference. We present a FEATURE-based approach to building such a model and show that our model discriminates between nonsites and calcium-binding sites with very high precision (more than 98%). We demonstrate the high specificity of our model by applying it to test sets constructed from other ions. We also introduce an algorithm to convert high-scoring regions into specific site predictions and demonstrate its use by scanning a test set of 91 calcium-binding protein structures (190 calcium sites). The algorithm has a recall of more than 93% on the test set, with predictions found within 3 Å of the actual sites. PMID:26226489
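
The "within 3 Å" criterion for site-level recall can be sketched as a distance match between predicted and actual coordinates. This is an assumed evaluation procedure illustrating the stated cutoff, not the FEATURE software itself; coordinates are (x, y, z) tuples in Å, and the function names are hypothetical.

```python
import math


def site_recall(predicted, actual, cutoff=3.0):
    """Fraction of actual sites with at least one prediction within cutoff Å."""
    if not actual:
        return 0.0
    hits = sum(1 for a in actual
               if any(math.dist(a, p) <= cutoff for p in predicted))
    return hits / len(actual)


def site_precision(predicted, actual, cutoff=3.0):
    """Fraction of predictions lying within cutoff Å of some actual site."""
    if not predicted:
        return 0.0
    hits = sum(1 for p in predicted
               if any(math.dist(p, a) <= cutoff for a in actual))
    return hits / len(predicted)
```

`math.dist` (Python 3.8+) returns the Euclidean distance between two points of equal dimension.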

  4. Neuroendocrine Tumors of the Large Intestine: Clinicopathological Features and Predictive Factors of Lymph Node Metastasis

    PubMed Central

    Kojima, Motohiro; Ikeda, Koji; Saito, Norio; Sakuyama, Naoki; Koushi, Kenichi; Kawano, Shingo; Watanabe, Toshiaki; Sugihara, Kenichi; Ito, Masaaki; Ochiai, Atsushi

    2016-01-01

    A new histological classification of neuroendocrine tumors (NETs) was established in WHO 2010, and ENET and NCCN have proposed treatment algorithms for colorectal NETs. A retrospective study of NETs of the large intestine (colorectal and appendiceal NETs) was performed among institutions affiliated with the Japanese Society for Cancer of the Colon and Rectum: 760 neuroendocrine tumors from 2001 to 2011 were re-assessed using WHO 2010 criteria to elucidate the clinicopathological features of NETs in the large intestine. Next, the relationship of clinicopathological features with lymph node metastasis was analyzed in order to predict lymph node metastasis in locally resected rectal NETs. The primary site was the rectum in 718/760 cases (94.5%), the colon in 30/760 cases (3.9%), and the appendix in 12/760 cases (1.6%). Patients were predominantly men (61.6%), with a mean age of 58.7 years. Tumor size was <10 mm in 65.4% of cases. Proportions of NET G1, G2, G3, and mixed adeno-neuroendocrine carcinoma (MANEC) were 88.4, 6.3, 3.9, and 1.3%, respectively. Of the 760 tumors, 468 were locally resected and 292 were surgically resected with lymph node dissection. Rectal NETs showed a higher proportion of NET G1, whereas colonic and appendiceal NETs were more commonly G3 or MANEC. Of the 292 surgically resected cases, 233 NET G1 and G2 tumors located in the rectum were used for the prediction of lymph node metastasis. Lymphatic and blood vessel invasion were independent predictive factors of lymph node metastasis. NET G2 cases showed lymph node metastasis more frequently than NET G1 cases, but G2 grade was not an independent predictor of lymph node metastasis. Of the 98 surgically resected cases <10 mm in size, we found 9 cases with lymph node metastasis (9.2%). All nine were NET G1, and eight of the nine were positive for either lymphatic invasion or blood vessel invasion. Using the WHO classification, we found NET in the large intestine showed a tumor-site-dependent variety of histological and clinicopathological

  5. Neuroendocrine Tumors of the Large Intestine: Clinicopathological Features and Predictive Factors of Lymph Node Metastasis.

    PubMed

    Kojima, Motohiro; Ikeda, Koji; Saito, Norio; Sakuyama, Naoki; Koushi, Kenichi; Kawano, Shingo; Watanabe, Toshiaki; Sugihara, Kenichi; Ito, Masaaki; Ochiai, Atsushi

    2016-01-01

  6. Computer extracted texture features on T2w MRI to predict biochemical recurrence following radiation therapy for prostate cancer

    NASA Astrophysics Data System (ADS)

    Ginsburg, Shoshana B.; Rusu, Mirabela; Kurhanewicz, John; Madabhushi, Anant

    2014-03-01

    In this study we explore the ability of a novel machine learning approach, in conjunction with computer-extracted features describing prostate cancer morphology on pre-treatment MRI, to predict whether a patient will develop biochemical recurrence within ten years of radiation therapy. Biochemical recurrence is characterized by a rise in serum prostate-specific antigen (PSA) of at least 2 ng/mL above the nadir PSA, and it is associated with increased risk of metastasis and prostate cancer-related mortality. Currently, the risk of biochemical recurrence is predicted by the Kattan nomogram. This nomogram incorporates several clinical factors to predict the probability of recurrence-free survival following radiation therapy, but its prediction accuracy is limited. Semantic attributes on T2-weighted (T2w) MRI have also been shown to be predictive of biochemical recurrence risk. These attributes include the presence of extracapsular extension and seminal vesicle invasion, and surrogate measurements of tumor size. The correlation between biochemical recurrence and factors like tumor stage, Gleason grade, and extracapsular spread is well documented. However, it is less clear how to predict biochemical recurrence in the absence of extracapsular spread and for small tumors fully contained in the capsule. Computer-extracted texture features, which quantitatively describe tumor micro-architecture and morphology on MRI, have been shown to provide clues about a tumor's aggressiveness. Computer-extracted features have been employed for predicting cancer presence and grade, but they have not been evaluated in the context of predicting risk of biochemical recurrence. This work seeks to evaluate the role of computer-extracted texture features in predicting risk of biochemical recurrence on a cohort of sixteen patients who underwent pre-treatment 1.5 Tesla (T) T2w MRI. We extract a combination of first-order statistical, gradient, co-occurrence, and Gabor wavelet features from T2w MRI. To identify which of these
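
The recurrence criterion quoted in this abstract is simple enough to state as code: a PSA rise of at least 2 ng/mL above the nadir. The Python sketch below is illustrative only and is not part of the authors' pipeline. Several details are assumptions: the function name `first_recurrence_index`, the premise that readings arrive in chronological order, and the treatment of the nadir as the running minimum of earlier readings.

```python
from typing import Optional, Sequence

# Threshold from the abstract: a rise of at least 2 ng/mL above the nadir PSA.
RISE_THRESHOLD_NG_ML = 2.0


def first_recurrence_index(psa_ng_ml: Sequence[float],
                           threshold: float = RISE_THRESHOLD_NG_ML) -> Optional[int]:
    """Return the index of the first PSA reading that is at least `threshold`
    ng/mL above the lowest earlier reading (the running nadir), or None.

    Assumption (not stated in the source): readings are in chronological
    order, and the nadir is the running minimum of all earlier readings.
    """
    nadir = float("inf")
    for i, value in enumerate(psa_ng_ml):
        if value - nadir >= threshold:
            return i
        nadir = min(nadir, value)  # update the running nadir after the check
    return None


if __name__ == "__main__":
    readings = [8.0, 2.5, 0.6, 0.4, 1.1, 2.6]  # hypothetical follow-up series (ng/mL)
    print(first_recurrence_index(readings))  # 5: 2.6 is 2.2 ng/mL above the 0.4 nadir
```

Using a running minimum means a later, lower reading resets the reference point. Other clinical definitions of the nadir would need a different rule.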

  7. Anion pairs in room temperature ionic liquids predicted by molecular dynamics simulation, verified by spectroscopic characterization

    SciTech Connect

    Schwenzer, Birgit; Kerisit, Sebastien N.; Vijayakumar, M.

    2014-01-01

    Molecular-level spectroscopic analyses of an aprotic and a protic room-temperature ionic liquid (BMIM OTf and BMIM HSO4, respectively) have been carried out to verify molecular dynamics simulations that predict anion pair formation in these fluid structures. Fourier-transform infrared spectroscopy, Raman spectroscopy, and nuclear magnetic resonance spectroscopy of various nuclei support the theoretically determined average molecular arrangements.

  8. Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features

    PubMed Central

    Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou

    2016-01-01

    Hot spots are the small subset of protein interface residues that accounts for the majority of the binding free energy. Identifying them is becoming increasingly important in research on drug design and cancer development. Building on our previous methods (APIS and KFC2), we propose a novel hot spot prediction method. For each hot spot residue, we first constructed 108 sequence, structural, and neighborhood features to characterize potential hot spot residues. These include conventional features and one new feature (pseudo hydrophobicity) exploited in this study. We then used a two-step feature selection process to select the three top-ranking features that contribute most to classification. The first step applies the minimal-redundancy-maximal-relevance (mRMR) algorithm, and the second applies an exhaustive search method. We used support vector machines to build our final prediction model. On an independent test set, our method achieved the highest F1-score (0.70) and MCC (0.46) compared with existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously. They also suggest that combining our features with traditional ones may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
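The first stage of the feature selection described above is minimal-redundancy-maximal-relevance (mRMR). It can be sketched as a greedy loop that scores each candidate by two quantities: its mutual information with the class label, minus its average mutual information with the features already chosen. This is the commonly used difference form. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes discrete (already binned) features, and the names `mutual_information` and `mrmr_select` are hypothetical. The exhaustive-search stage and the SVM classifier are not shown.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) * log(p(a,b) / (p(a) * p(b)))
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features: Dict[str, Sequence[Hashable]],
                labels: Sequence[Hashable], k: int) -> List[str]:
    """Greedy mRMR (difference form): at each step pick the feature maximizing
    I(f; labels) minus the mean of I(f; s) over already-selected features s.
    Ties go to the alphabetically first feature name."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected: List[str] = []
    remaining = sorted(features)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)  # max() keeps the first of equal scores
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical discretized data: labels = 2*a + b, and a_dup duplicates a.
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    labels = [2 * ai + bi for ai, bi in zip(a, b)]
    feats = {"a": a, "a_dup": list(a), "b": b}
    print(mrmr_select(feats, labels, k=2))  # ['a', 'b']
```

In the toy example, `a_dup` is exactly as relevant as `a`, but it is skipped because it adds only redundancy. Continuous residue features would first need to be discretized, for example by binning.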

  9. Prediction of biomechanical properties of trabecular bone in MR images with geometric features and support vector regression.

    PubMed

    Huber, Markus B; Lancianese, Sarah L; Nagarajan, Mahesh B; Ikpot, Imoh Z; Lerner, Amy L; Wismuller, Axel

    2011-06-01

    Whole knee joint MR image datasets were used to compare the performance of geometric trabecular bone features and advanced machine learning techniques in predicting biomechanical strength properties measured on the corresponding ex vivo specimens. Changes in trabecular bone structure throughout the proximal tibia are indicative of several musculoskeletal disorders involving changes in bone quality and the surrounding soft tissue. Recent studies have shown that MR imaging also allows non-invasive 3-D characterization of bone microstructure. Sophisticated features, such as those derived with the scaling index method (SIM), can estimate local structural and geometric properties of the trabecular bone. They may therefore improve the ability of MR imaging to determine local bone quality in vivo. A set of 67 bone cubes was extracted from knee specimens. Their biomechanical strength, estimated by the yield stress (YS, in MPa), was determined through mechanical testing. The regional apparent bone volume fraction (BVF) and SIM-derived features were calculated for each bone cube. A linear multiregression analysis (MultiReg) and an optimized support vector regression (SVR) algorithm were used to predict the YS from the image features. The prediction accuracy was measured by the root mean square error (RMSE) for each image feature on independent test sets. The best prediction result, with the lowest prediction error (RMSE = 1.021 MPa), was obtained by combining BVF and SIM features with SVR. The prediction accuracy with only SIM features and SVR (RMSE = 1.023 MPa) was still significantly better than that obtained with BVF alone and MultiReg (RMSE = 1.073 MPa). The current study demonstrates that the combination of sophisticated bone structure features and supervised learning techniques can improve MR-based determination of trabecular bone quality. PMID:21356612
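This study scores its models by the RMSE on independent test sets. The sketch below shows that evaluation pattern in Python: a model is fitted on training data only, and the RMSE is computed on held-out data. A one-feature least-squares line stands in for the multiregression (MultiReg) purely for illustration. The SVR models are not reproduced, and the helper names `fit_line` and `rmse` and all data values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ slope * x + intercept for a single feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Hypothetical data: one image feature per bone cube and its yield stress (MPa).
    x_train, ys_train = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
    x_test, ys_test = [5.0, 6.0], [11.0, 12.0]
    slope, intercept = fit_line(x_train, ys_train)  # fitted on training data only
    predicted = [slope * xi + intercept for xi in x_test]
    print(rmse(ys_test, predicted))  # about 0.707 MPa
```

The RMSE keeps the units of the target (MPa here). This is why the abstract can report prediction errors directly in MPa.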

  10. Predicting the biomechanical strength of proximal femur specimens with bone mineral density features and support vector regression

    NASA Astrophysics Data System (ADS)

    Huber, Markus B.; Yang, Chien-Chun; Carballido-Gamio, Julio; Bauer, Jan S.; Baum, Thomas; Nagarajan, Mahesh B.; Eckstein, Felix; Lochmüller, Eva; Majumdar, Sharmila; Link, Thomas M.; Wismüller, Axel

    2012-03-01

    To improve the clinical assessment of osteoporotic hip fracture risk, recent computer-aided diagnosis systems explore new approaches that predict femoral bone strength by estimating local trabecular bone quality beyond bone density alone. In this context, the study compared statistical bone mineral density (BMD) features and different function approximation methods in their ability to predict biomechanical strength. The BMD features were extracted from multi-detector computed tomography (MDCT) images of proximal femur specimens. MDCT scans were acquired of 146 proximal femur specimens harvested from human cadavers. The femurs' failure load (FL) was determined through biomechanical testing. An automated volume of interest (VOI)-fitting algorithm was used to define a consistent volume in the femoral head of each specimen. In these VOIs, the trabecular bone was represented in two ways: by statistical moments of the BMD distribution, and by the pairwise spatial occurrence of BMD values using the gray-level co-occurrence matrix (GLCM) approach. A linear multi-regression analysis (MultiReg) and a support vector regression algorithm with a linear kernel (SVRlin) were used to predict the FL from the image feature sets. The prediction performance was measured by the root mean square error (RMSE) for each image feature on independent test sets; in addition, the coefficient of determination R² was calculated. The best prediction result was obtained with a GLCM feature set using SVRlin, which had the lowest prediction error (RMSE = 1.040 ± 0.143, R² = 0.544). This error was significantly lower than that of the standard approach using BMD.mean and MultiReg (RMSE = 1.093 ± 0.133, R² = 0.490, p < 0.0001). The combined sets including BMD.mean and GLCM features performed similarly to, or slightly worse than, GLCM features alone. The results indicate that the performance of high-dimensional BMD features extracted from MDCT images in predicting the biomechanical strength of proximal femur specimens can be significantly improved by
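
The GLCM representation mentioned in this abstract counts how often pairs of gray levels (here, quantized BMD values) occur at a fixed spatial offset. The following Python sketch builds a normalized 2D GLCM for a single offset and derives three common Haralick-style statistics from it. It is an illustration under assumptions, not the authors' method. The study works on 3D femoral-head VOIs, and the abstract does not state its offsets, quantization, or exact feature set. The names `glcm` and `glcm_features` are hypothetical.

```python
from typing import Dict, List, Sequence


def glcm(image: Sequence[Sequence[int]], levels: int, dx: int = 1, dy: int = 0,
         symmetric: bool = True) -> List[List[float]]:
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs in which gray level i is followed
    by gray level j at the given offset. Assumes values are integers in
    range(levels), i.e. the image has already been quantized.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows = len(image)
    cols = len(image[0]) if rows else 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # also count the pair in the reverse direction
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[v / total for v in row] for row in counts]


def glcm_features(p: Sequence[Sequence[float]]) -> Dict[str, float]:
    """A few Haralick-style statistics of a normalized GLCM."""
    contrast = asm = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v           # weights pairs by gray-level difference
            asm += v * v                           # angular second moment (uniformity)
            homogeneity += v / (1 + (i - j) ** 2)  # large when mass is near the diagonal
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}


if __name__ == "__main__":
    # Hypothetical 4x4 patch already quantized to 3 gray levels.
    patch = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 2, 2, 2],
             [2, 2, 2, 2]]
    print(glcm_features(glcm(patch, levels=3)))
```

In practice, several offsets (directions and distances) are usually computed and their features averaged or concatenated, which yields the high-dimensional feature sets the abstract refers to.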