Kim, Hyungjin; Park, Sang Joon; Kim, Miso; Kim, Tae Min; Kim, Dong-Wan; Heo, Dae Seog; Goo, Jin Mo
2017-01-01
Purpose To determine if the radiomic features on CT can predict progression-free survival (PFS) in epidermal growth factor receptor (EGFR) mutant adenocarcinoma patients treated with first-line EGFR tyrosine kinase inhibitors (TKIs) and to identify the incremental value of radiomic features over conventional clinical factors in PFS prediction. Methods In this institutional review board–approved retrospective study, pretreatment contrast-enhanced CT and first follow-up CT after initiation of TKIs were analyzed in 48 patients (M:F = 23:25; median age: 61 years). Radiomic features at baseline, at first follow-up, and the percentage change between the two were determined. A Cox regression model was used to predict PFS with nonredundant radiomic features and clinical factors, respectively. The incremental value of radiomic features over the clinical factors in PFS prediction was also assessed by way of a concordance index. Results Roundness (HR: 3.91; 95% CI: 1.72, 8.90; P = 0.001) and grey-level nonuniformity (HR: 3.60; 95% CI: 1.80, 7.18; P<0.001) were independent predictors of PFS. For clinical factors, patient age (HR: 2.11; 95% CI: 1.01, 4.39; P = 0.046), baseline tumor diameter (HR: 1.03; 95% CI: 1.01, 1.05; P = 0.002), and treatment response (HR: 0.46; 95% CI: 0.24, 0.87; P = 0.017) were independent predictors. The addition of radiomic features to clinical factors significantly improved predictive performance (concordance index; combined model = 0.77, clinical-only model = 0.69, P<0.001). Conclusions Radiomic features enable PFS estimation in EGFR mutant adenocarcinoma patients treated with first-line EGFR TKIs. Radiomic features combined with clinical factors provide significant improvement in prognostic performance compared with using only clinical factors. PMID:29099855
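The concordance index used above to compare the combined and clinical-only models can be illustrated with a minimal sketch. It assumes Harrell's pairwise definition; the function name `concordance_index` and the toy values are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk progressed earlier
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): months to progression, event flag
# (1 = progressed, 0 = censored), and a model risk score.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.69 and 0.77 are read.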
Andrabi, Munazah; Hutchins, Andrew Paul; Miranda-Saavedra, Diego; Kono, Hidetoshi; Nussinov, Ruth; Mizuguchi, Kenji; Ahmad, Shandar
2017-06-22
DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large scale DNA shape estimates, DNAshape was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features, Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features e.g. Shift, Slide and Opening cannot be evaluated using DNAshape. Here, we report a novel method DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS) followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.
Radiomics-based Prognosis Analysis for Non-Small Cell Lung Cancer
NASA Astrophysics Data System (ADS)
Zhang, Yucheng; Oikonomou, Anastasia; Wong, Alexander; Haider, Masoom A.; Khalvati, Farzad
2017-04-01
Radiomics characterizes tumor phenotypes by extracting large numbers of quantitative features from radiological images. Radiomic features have been shown to provide prognostic value in predicting clinical outcomes in several studies. However, several challenges including feature redundancy, unbalanced data, and small sample sizes have led to relatively low predictive accuracy. In this study, we explore different strategies for overcoming these challenges and improving predictive performance of radiomics-based prognosis for non-small cell lung cancer (NSCLC). CT images of 112 patients (mean age 75 years) with NSCLC who underwent stereotactic body radiotherapy were used to predict recurrence, death, and recurrence-free survival using a comprehensive radiomics analysis. Different feature selection and predictive modeling techniques were used to determine the optimal configuration of prognosis analysis. To address feature redundancy, comprehensive analysis indicated that Random Forest models and Principal Component Analysis were optimum predictive modeling and feature selection methods, respectively, for achieving high prognosis performance. To address unbalanced data, Synthetic Minority Over-sampling technique was found to significantly increase predictive accuracy. A full analysis of variance showed that data endpoints, feature selection techniques, and classifiers were significant factors in affecting predictive accuracy, suggesting that these factors must be investigated when building radiomics-based predictive models for cancer prognosis.
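The idea behind the Synthetic Minority Over-sampling technique mentioned above can be sketched as follows. This is a simplified illustration, assuming Euclidean nearest neighbours and linear interpolation; the function `smote` and the toy feature vectors are hypothetical and do not reproduce the authors' pipeline.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


# Toy minority-class feature vectors (hypothetical values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.6), (1.2, 2.2)]
new_points = smote(minority, n_new=6)
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without simply duplicating records.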
Predictive analysis effectiveness in determining the epidemic disease infected area
NASA Astrophysics Data System (ADS)
Ibrahim, Najihah; Akhir, Nur Shazwani Md.; Hassan, Fadratul Hafinaz
2017-10-01
Epidemic disease outbreaks have raised community concern over methods for controlling, preventing, and handling infectious disease in order to reduce the rate of disease dissemination and the size of the infected area. The backpropagation method was used for countermeasure and prediction analysis of epidemic disease. Predictive analysis based on the backpropagation method can be carried out through a machine learning process that applies artificial intelligence to pattern recognition, statistics, and feature selection. This computational learning process will be integrated with data mining by measuring the score output as the classifier for a given set of input features through a classification technique. The classification technique is the feature selection of the disease dissemination factors that are likely to be strongly interconnected in causing infectious disease outbreaks. Predictive analysis of epidemic disease for determining the infected area is introduced in this preliminary study using the backpropagation method, drawing on the findings of others. This study will classify the epidemic disease dissemination factors as the features for weight adjustment in the prediction of epidemic disease outbreaks. In this preliminary study, the predictive analysis is found to be an effective method for determining the epidemic disease infected area by minimizing the error value through feature classification.
Gao, JianZhao; Tao, Xue-Wen; Zhao, Jia; Feng, Yuan-Ming; Cai, Yu-Dong; Zhang, Ning
2017-01-01
Lysine acetylation, as one type of post-translational modifications (PTM), plays key roles in cellular regulations and can be involved in a variety of human diseases. However, it is often high-cost and time-consuming to use traditional experimental approaches to identify the lysine acetylation sites. Therefore, effective computational methods should be developed to predict the acetylation sites. In this study, we developed a position-specific method for epsilon lysine acetylation site prediction. Sequences of acetylated proteins were retrieved from the UniProt database. Various kinds of features such as position specific scoring matrix (PSSM), amino acid factors (AAF), and disorders were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed. Finally, 319 optimal features were selected from total 541 features. Using the 319 optimal features to encode peptides, a predictor was constructed based on dagging. As a result, an accuracy of 69.56% with MCC of 0.2792 was achieved. We analyzed the optimal features, which suggested some important factors determining the lysine acetylation sites. We developed a position-specific method for epsilon lysine acetylation site prediction. A set of optimal features was selected. Analysis of the optimal features provided insights into the mechanism of lysine acetylation sites, providing guidance of experimental validation. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
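For readers unfamiliar with the two metrics reported above, accuracy and the Matthews correlation coefficient (MCC) can both be computed from a confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not the study's results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when any marginal total is zero."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical confusion-matrix counts for a site predictor.
tp, tn, fp, fn = 40, 30, 20, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

MCC uses all four cells of the confusion matrix, so it can stay modest even when accuracy looks reasonable.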
Prediction and Informative Risk Factor Selection of Bone Diseases.
Li, Hui; Li, Xiaoyi; Ramanathan, Murali; Zhang, Aidong
2015-01-01
With the booming of healthcare industry and the overwhelming amount of electronic health records (EHRs) shared by healthcare institutions and practitioners, we take advantage of EHR data to develop an effective disease risk management model that not only models the progression of the disease, but also predicts the risk of the disease for early disease control or prevention. Existing models for answering these questions usually fall into two categories: the expert knowledge based model or the handcrafted feature set based model. To fully utilize the whole EHR data, we will build a framework to construct an integrated representation of features from all available risk factors in the EHR data and use these integrated features to effectively predict osteoporosis and bone fractures. We will also develop a framework for informative risk factor selection of bone diseases. A pair of models for two contrast cohorts (e.g., diseased patients versus non-diseased patients) will be established to discriminate their characteristics and find the most informative risk factors. Several empirical results on a real bone disease data set show that the proposed framework can successfully predict bone diseases and select informative risk factors that are beneficial and useful to guide clinical decisions.
Jackson, Todd; Chen, Hong
2015-10-01
Body surveillance and body shame are features of objectified body consciousness (OBC) that have been linked to disordered eating, yet the evidence base is largely cross-sectional and limited to samples in certain Western countries. Furthermore, it is not clear whether these factors contribute to the prediction of eating disturbances independent of conceptually related risk factors emphasized within other sociocultural accounts. In this prospective study, body surveillance, body shame, and features of complementary sociocultural models (i.e., perceived appearance pressure from mass media and close interpersonal networks, appearance social comparisons, negative affect, body dissatisfaction) were assessed as risk factors for and concomitants of eating disturbances over time. University-age, mainland Chinese women (n = 2144) and men (n = 1017) completed validated measures of eating-disorder pathology and hypothesized risk factors at baseline (T1) and 1-year follow-up (T2). Among women, elevations on T1 measures of sociocultural-model features predicted more T2 eating disturbances, independent of T1 disturbances. After controlling for other T1 predictors, body surveillance and shame made modest unique contributions to the model. Finally, heightened T2 body dissatisfaction, media, and interpersonal appearance pressure, negative affect, and body shame predicted concomitant increases in T2 eating concerns. For men, T1 features of sociocultural accounts (negative affect, body dissatisfaction) but not OBC predicted T2 eating disturbances, along with attendant elevations in T2 negative affect, interpersonal appearance pressure, and body shame. Implications are discussed for theory and intervention that target disordered eating. (c) 2015 APA, all rights reserved.
Convergent and Discriminant Validity of Psychopathy Factors Assessed Via Self-Report
Benning, Stephen D.; Patrick, Christopher J.; Salekin, Randall T.; Leistico, Anne-Marie R.
2008-01-01
Psychopathy has been conceptualized as a personality disorder with distinctive interpersonal-affective and behavioral deviance features. The authors examine correlates of the factors of the Psychopathic Personality Inventory (PPI), Self-Report Psychopathy–II (SRP-II) scale, and Antisocial Process Screening Device (APSD) to understand similarities and differences among the constructs embodied in these instruments. PPI Fearless Dominance and SRP-II Factor 1 were negatively related to most personality disorder symptoms and were both predicted by high Dominance and low Neuroticism. In addition, PPI Fearless Dominance correlated positively with antisocial personality features, although SRP-II Factor 1 did not. In contrast, PPI Impulsive Antisociality, SRP-II Factor 2, and both APSD factors correlated with antisocial personality features and symptoms of nearly all personality disorders, and were predicted by low Love. Results suggest ways in which the measurement of the constructs in each instrument may be improved. PMID:16123248
Graph regularized nonnegative matrix factorization for temporal link prediction in dynamic networks
NASA Astrophysics Data System (ADS)
Ma, Xiaoke; Sun, Penggang; Wang, Yu
2018-04-01
Many networks derived from society and nature are temporal and incomplete. The temporal link prediction problem in networks is to predict links at time T + 1 based on a given temporal network from time 1 to T, which is essential to important applications. The current algorithms either predict the temporal links by collapsing the dynamic networks or collapsing features derived from each network, which are criticized for ignoring the connection among slices. To overcome the issue, we propose a novel graph regularized nonnegative matrix factorization algorithm (GrNMF) for the temporal link prediction problem without collapsing the dynamic networks. To obtain the feature for each network from 1 to t, GrNMF factorizes the matrix associated with networks by setting the remaining networks as regularization, which provides a better way to characterize the topological information of temporal links. Then, the GrNMF algorithm collapses the feature matrices to predict temporal links. Compared with state-of-the-art methods, the proposed algorithm exhibits significantly improved accuracy by avoiding the collapse of temporal networks. Experimental results of a number of artificial and real temporal networks illustrate that the proposed method is not only more accurate but also more robust than state-of-the-art approaches.
Dogan, Nergiz; Wu, Weisheng; Morrissey, Christapher S.; ...
2015-04-23
Regulated gene expression controls organismal development, and variation in regulatory patterns has been implicated in complex traits. Thus accurate prediction of enhancers is important for further understanding of these processes. Genome-wide measurement of epigenetic features, such as histone modifications and occupancy by transcription factors, is improving enhancer predictions, but the contribution of these features to prediction accuracy is not known. Given the importance of the hematopoietic transcription factor TAL1 for erythroid gene activation, we predicted candidate enhancers based on genomic occupancy by TAL1 and measured their activity. Contributions of multiple features to enhancer prediction were evaluated based on the results of these and other studies. Results: TAL1-bound DNA segments were active enhancers at a high rate both in transient transfections of cultured cells (39 of 79, or 56%) and transgenic mice (43 of 66, or 65%). The level of binding signal for TAL1 or GATA1 did not help distinguish TAL1-bound DNA segments as active versus inactive enhancers, nor did the density of regulation-related histone modifications. A meta-analysis of results from this and other studies (273 tested predicted enhancers) showed that the presence of TAL1, GATA1, EP300, SMAD1, H3K4 methylation, H3K27ac, and CAGE tags at DNase hypersensitive sites gave the most accurate predictors of enhancer activity, with a success rate over 80% and a median threefold increase in activity. Chromatin accessibility assays and the histone modifications H3K4me1 and H3K27ac were sensitive for finding enhancers, but they have high false positive rates unless transcription factor occupancy is also included. Conclusions: Occupancy by key transcription factors such as TAL1, GATA1, SMAD1, and EP300, along with evidence of transcription, improves the accuracy of enhancer predictions based on epigenetic features.
Prediction of near-term breast cancer risk using a Bayesian belief network
NASA Astrophysics Data System (ADS)
Zheng, Bin; Ramalingam, Pandiyarajan; Hariharan, Harishwaran; Leader, Joseph K.; Gur, David
2013-03-01
Accurately predicting near-term breast cancer risk is an important prerequisite for establishing an optimal personalized breast cancer screening paradigm. In previous studies, we investigated and tested the feasibility of developing a unique near-term breast cancer risk prediction model based on a new risk factor associated with bilateral mammographic density asymmetry between the left and right breasts of a woman using a single feature. In this study we developed a multi-feature based Bayesian belief network (BBN) that combines bilateral mammographic density asymmetry with three other popular risk factors, namely (1) age, (2) family history, and (3) average breast density, to further increase the discriminatory power of our cancer risk model. A dataset involving "prior" negative mammography examinations of 348 women was used in the study. Among these women, 174 had breast cancer detected and verified in the next sequential screening examinations, and 174 remained negative (cancer-free). A BBN was applied to predict the risk of each woman having cancer detected six to 18 months later following the negative screening mammography. The prediction results were compared with those using single features. The prediction accuracy was significantly increased when using the BBN. The area under the ROC curve increased from an AUC=0.70 to 0.84 (p<0.01), while the positive predictive value (PPV) and negative predictive value (NPV) also increased from a PPV=0.61 to 0.78 and an NPV=0.65 to 0.75, respectively. This study demonstrates that a multi-feature based BBN can more accurately predict the near-term breast cancer risk than with a single feature.
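The positive and negative predictive values quoted above follow directly from the counts of true and false calls. A minimal sketch with the standard definitions is given below; the counts are hypothetical (chosen only so that each group totals 174 women) and are not the study's confusion matrix.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts."""
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("no positive or no negative predictions")
    ppv = tp / (tp + fp)  # share of positive calls that were cancers
    npv = tn / (tn + fn)  # share of negative calls that stayed cancer-free
    return ppv, npv


# Hypothetical counts: 174 cancer cases (tp + fn) and 174 negatives (tn + fp).
tp, fn = 120, 54
tn, fp = 134, 40
ppv, npv = predictive_values(tp, fp, tn, fn)
```

Both values depend on the case mix, so figures from a balanced 174/174 dataset would not carry over unchanged to a screening population with lower prevalence.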
Dong, Fei; Zeng, Qiang; Jiang, Biao; Yu, Xinfeng; Wang, Weiwei; Xu, Jingjing; Yu, Jinna; Li, Qian; Zhang, Minming
2018-05-01
To study whether some of the quantitative enhancement and necrosis features in preoperative conventional MRI (cMRI) had a predictive value for epidermal growth factor receptor (EGFR) gene amplification status in glioblastoma multiforme (GBM). Fifty-five patients with pathologically determined GBMs who underwent cMRI were retrospectively reviewed. The following cMRI features were quantitatively measured and recorded: long and short diameters of the enhanced portion (LDE and SDE), maximum and minimum thickness of the enhanced portion (MaxTE and MinTE), and long and short diameters of the necrotic portion (LDN and SDN). Univariate analysis of each feature and a decision tree model fed with all the features were performed. Area under the receiver operating characteristic (ROC) curve (AUC) was used to assess the performance of features, and predictive accuracy was used to assess the performance of the model. Among single features, MinTE showed the best performance in differentiating EGFR gene amplification negative (wild-type) (nEGFR) GBM from EGFR gene amplification positive (pEGFR) GBM, with an AUC of 0.68 at a cut-off value of 2.6 mm. The decision tree model included 2 features, MinTE and SDN, and achieved an accuracy of 0.83 in the validation dataset. Our results suggest that quantitative measurement of the features MinTE and SDN in preoperative cMRI had a high accuracy for predicting EGFR gene amplification status in GBM.
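The AUC used above to rank single features can be computed without drawing the ROC curve, as the probability that a randomly chosen case from one group scores higher than a randomly chosen case from the other. The sketch below uses that pairwise definition; the measurements are hypothetical, and which group has the larger values is an assumption of the toy data, since the abstract does not state the direction.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical thickness measurements in mm for two groups.
group_a = [3.1, 2.8, 4.0]
group_b = [2.0, 2.9, 2.5]
auc = roc_auc(group_a, group_b)
```

An AUC of 0.5 means the feature does not separate the groups at all, which puts the reported 0.68 for a single feature in context.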
Compton, Michael T; Berez, Chantal; Walker, Elaine F
Family history of psychosis, gender, mode of onset, and age at onset are considered prognostic factors important to clinicians evaluating first-episode psychosis; yet, clinicians have little guidance as to how these four factors differentially predict early-course substance abuse, symptomatology, and functioning. We conducted a "head-to-head comparison" of these four factors regarding their associations with key clinical features at initial hospitalization. We also assessed potential interactions between gender and family history with regard to age at onset of psychosis and symptom severity. Consecutively admitted first-episode patients (n=334) were evaluated in two studies that rigorously assessed a number of early-course variables. Associations among variables of interest were examined using Pearson correlations, χ² tests, Student's t-tests, and 2×2 factorial analyses of variance. Substance (nicotine, alcohol, and cannabis) abuse and positive symptom severity were predicted only by male gender. Negative symptom severity and global functioning impairments were predicted by earlier age at onset of psychosis. General psychopathology symptom severity was predicted by both mode of onset and age at onset. Interaction effects were not observed with regard to gender and family history in predicting age at onset or symptom severity. The four prognostic features have differential associations with substance abuse, domains of symptom severity, and global functioning. Gender and age at onset of psychosis appear to be more predictive of clinical features at the time of initial evaluation (and thus presumably longer term outcomes) than the presence of a family history of psychosis and a more gradual mode of onset.
Prediction of interface residue based on the features of residue interaction network.
Jiao, Xiong; Ranganathan, Shoba
2017-11-07
Protein-protein interaction plays a crucial role in cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize interface residues based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction, and 16 network parameters and the B-factor serve as the elements of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features is also analyzed to identify the key features for the prediction. Some biological meaning of the important features is explained. The results of this work can be used for related work on structure-function relationship analysis via a residue interaction network model. Copyright © 2017 Elsevier Ltd. All rights reserved.
Alshurafa, Nabil; Eastwood, Jo-Ann; Pourhomayoun, Mohammad; Liu, Jason J; Sarrafzadeh, Majid
2014-01-01
Current studies have produced a plethora of remote health monitoring (RHM) systems designed to enhance the care of patients with chronic diseases. Many RHM systems are designed to improve patient risk factors for cardiovascular disease, including physiological parameters such as body mass index (BMI) and waist circumference, and lipid profiles such as low density lipoprotein (LDL) and high density lipoprotein (HDL). There are several patient characteristics that could be determining factors for a patient's RHM outcome success, but these characteristics have been largely unidentified. In this paper, we analyze results from an RHM system deployed in a six month Women's Heart Health study of 90 patients, and apply advanced feature selection and machine learning algorithms to identify patients' key baseline contextual features and build effective prediction models that help determine RHM outcome success. We introduce Wanda-CVD, a smartphone-based RHM system designed to help participants with cardiovascular disease risk factors by motivating participants through wireless coaching using feedback and prompts as social support. We analyze key contextual features that secure positive patient outcomes in both physiological parameters and lipid profiles. Results from the Women's Heart Health study show that health threat of heart disease, quality of life, family history, stress factors, social support, and anxiety at baseline all help predict patient RHM outcome success.
Zheng, Lu-Lu; Niu, Shen; Hao, Pei; Feng, KaiYan; Cai, Yu-Dong; Li, Yixue
2011-01-01
Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum of MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations. PMID:22174779
Patient feature based dosimetric Pareto front prediction in esophageal cancer radiotherapy.
Wang, Jiazhou; Jin, Xiance; Zhao, Kuaike; Peng, Jiayuan; Xie, Jiang; Chen, Junchao; Zhang, Zhen; Studenski, Matthew; Hu, Weigang
2015-02-01
To investigate the feasibility of the dosimetric Pareto front (PF) prediction based on patient's anatomic and dosimetric parameters for esophageal cancer patients. Eighty esophagus patients in the authors' institution were enrolled in this study. A total of 2928 intensity-modulated radiotherapy plans were obtained and used to generate PF for each patient. On average, each patient had 36.6 plans. The anatomic and dosimetric features were extracted from these plans. The mean lung dose (MLD), mean heart dose (MHD), spinal cord max dose, and PTV homogeneity index were recorded for each plan. Principal component analysis was used to extract overlap volume histogram (OVH) features between PTV and other organs at risk. The full dataset was separated into two parts: a training dataset and a validation dataset. The prediction outcomes were the MHD and MLD. Spearman's rank correlation coefficient was used to evaluate the correlation between the anatomical features and dosimetric features. The stepwise multiple regression method was used to fit the PF. The cross validation method was used to evaluate the model. With 1000 repetitions, the mean prediction error of the MHD was 469 cGy. The most correlated factor was the first principal components of the OVH between heart and PTV and the overlap between heart and PTV in Z-axis. The mean prediction error of the MLD was 284 cGy. The most correlated factors were the first principal components of the OVH between heart and PTV and the overlap between lung and PTV in Z-axis. It is feasible to use patients' anatomic and dosimetric features to generate a predicted Pareto front. Additional samples and further studies are required to improve the prediction model.
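Spearman's rank correlation, used above to relate anatomical and dosimetric features, is the Pearson correlation of the ranks of the two variables. The sketch below assumes average ranks for ties; the function names and the toy overlap and dose values are hypothetical, not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: heart-PTV overlap (cm) and mean heart dose (cGy).
overlap = [1.0, 2.5, 4.0, 7.0]
mean_heart_dose = [300, 450, 440, 900]
rho = spearman_rho(overlap, mean_heart_dose)
```

Because only ranks enter the calculation, the coefficient captures monotonic association and is insensitive to the units of either variable.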
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, J; Zhao, K; Peng, J
2014-06-15
Purpose: To study the feasibility of dosimetric Pareto front (PF) prediction based on patient anatomic and dosimetric parameters for esophagus cancer patients. Methods: Sixty esophagus patients in our institution were enrolled in this study. A total of 2920 IMRT plans were created to generate the PF for each patient. On average, each patient had 48 plans. The anatomic and dosimetric features were extracted from those plans. The mean lung dose (MLD), mean heart dose (MHD), spinal cord max dose and PTV homogeneous index (PTVHI) were recorded for each plan. Principal component analysis (PCA) was used to extract overlap volume histogram (OVH) features between PTV and other critical organs. The full dataset was separated into two parts: the training dataset and the validation dataset. The prediction outcomes were the MHD and MLD for the current study. The Spearman rank correlation coefficient was used to evaluate the correlation between the anatomical features and dosimetric features. The PF was fit by the stepwise multiple regression method. The cross-validation method was used to evaluate the model. Results: The mean prediction error of the MHD was 465 cGy with 100 repetitions. The most correlated factors were the first principal components of the OVH between heart and PTV, and the overlap between heart and PTV in Z-axis. The mean prediction error of the MLD was 195 cGy. The most correlated factors were the first principal components of the OVH between lung and PTV, and the overlap between lung and PTV in Z-axis. Conclusion: It is feasible to use patients' anatomic and dosimetric features to generate a predicted PF. Additional samples and further studies are required to get a better prediction model.
Patient feature based dosimetric Pareto front prediction in esophageal cancer radiotherapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Jiazhou; Zhao, Kuaike; Peng, Jiayuan
2015-02-15
Purpose: To investigate the feasibility of the dosimetric Pareto front (PF) prediction based on patient’s anatomic and dosimetric parameters for esophageal cancer patients. Methods: Eighty esophagus patients in the authors’ institution were enrolled in this study. A total of 2928 intensity-modulated radiotherapy plans were obtained and used to generate PF for each patient. On average, each patient had 36.6 plans. The anatomic and dosimetric features were extracted from these plans. The mean lung dose (MLD), mean heart dose (MHD), spinal cord max dose, and PTV homogeneity index were recorded for each plan. Principal component analysis was used to extract overlap volume histogram (OVH) features between PTV and other organs at risk. The full dataset was separated into two parts: a training dataset and a validation dataset. The prediction outcomes were the MHD and MLD. Spearman’s rank correlation coefficient was used to evaluate the correlation between the anatomical features and dosimetric features. The stepwise multiple regression method was used to fit the PF. The cross validation method was used to evaluate the model. Results: With 1000 repetitions, the mean prediction error of the MHD was 469 cGy. The most correlated factor was the first principal components of the OVH between heart and PTV and the overlap between heart and PTV in Z-axis. The mean prediction error of the MLD was 284 cGy. The most correlated factors were the first principal components of the OVH between heart and PTV and the overlap between lung and PTV in Z-axis. Conclusions: It is feasible to use patients’ anatomic and dosimetric features to generate a predicted Pareto front. Additional samples and further studies are required to improve the prediction model.
Structural features that predict real-value fluctuations of globular proteins.
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-05-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
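The two figures of merit reported above, Pearson's correlation coefficient and the root mean square error, measure different things, which is why a method that predicts the real value of fluctuation is scored on both. A minimal sketch with the standard definitions follows; the per-residue values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical per-residue fluctuations in angstroms.
observed = [0.5, 1.0, 1.5, 2.0]
predicted = [1.0, 1.5, 2.0, 2.5]  # right trend, constant 0.5 A offset
r = pearson_r(observed, predicted)
error = rmse(observed, predicted)
```

In this toy case the correlation is perfect while the error is still 0.5 Å, showing that correlation alone says nothing about whether the predicted magnitudes are right.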
Structural features that predict real-value fluctuations of globular proteins
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-01-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193
Spashett, Renee; Fernie, Gordon; Reid, Ian C; Cameron, Isobel M
2014-09-01
This study aimed to explore the relationship of Montgomery-Åsberg Depression Rating Scale (MADRS) symptom subtypes with response to electroconvulsive therapy (ECT) and subsequent ECT treatment within 12 months. A consecutive sample of 414 patients with depression receiving ECT in the North East of Scotland was assessed by retrospective chart review. Response rate was defined as greater than or equal to 50% decrease in pretreatment total MADRS score or a posttreatment total MADRS less than or equal to 10. Principal component analyses were conducted on a sample with psychotic features (n = 124) and a sample without psychotic features (n = 290). Scores on extracted factor subscales, clinical and demographic characteristics were assessed for association with response and subsequent ECT treatment within 12 months. Where more than 1 variable was associated with response or subsequent ECT, logistic regression analysis was applied. MADRS symptom subtypes formed 3 separate factors in both samples. Logistic regression revealed older age and high "Despondency" subscale score predicted response in the nonpsychotic group. Older age alone predicted response in the group with psychotic features. Nonpsychotic patients subsequently re-treated with ECT were older than those not prescribed subsequent ECT. No association of variables emerged with subsequent ECT treatment in the group with psychotic features. Being of older age and the presence of psychotic features predicted response. Presence of psychotic features alone predicted subsequent retreatment. Subscale scores of the MADRS are of limited use in predicting which patients with depression will respond to ECT, with the exception of "Despondency" subscale scores in patients without psychotic features.
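The response criterion stated in this abstract (a decrease of at least 50% from the pretreatment total MADRS score, or a posttreatment total of 10 or less) can be written as a small predicate. This is a sketch only; the function name `is_responder` and the use of integer totals are assumptions made for illustration.

```python
def is_responder(pre_total, post_total):
    """Return True if the abstract's response criterion is met.

    Criterion: >= 50% decrease from the pretreatment total MADRS score,
    or a posttreatment total MADRS score <= 10.
    """
    if pre_total <= 0:
        raise ValueError("pretreatment score must be positive")
    # 2 * (pre - post) >= pre is the same as (pre - post) / pre >= 0.5,
    # written this way to avoid floating-point division.
    halved = 2 * (pre_total - post_total) >= pre_total
    return halved or post_total <= 10


print(is_responder(30, 15))  # exactly a 50% decrease
print(is_responder(30, 16))  # less than 50% and above 10
```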
Return to Work After Lumbar Microdiscectomy - Personalizing Approach Through Predictive Modeling.
Papić, Monika; Brdar, Sanja; Papić, Vladimir; Lončar-Turukalo, Tatjana
2016-01-01
Lumbar disc herniation (LDH) is the most common disease among the working population requiring surgical intervention. This study aims to predict the return to work after operative treatment of LDH based on an observational study including 153 patients. The classification problem was approached using decision trees (DT), support vector machines (SVM) and multilayer perceptron (MLP) combined with the RELIEF algorithm for feature selection. MLP provided the best recall of 0.86 for the class of patients not returning to work, which combined with the selected features enables early identification and personalized targeted interventions towards subjects at risk of prolonged disability. The predictive modeling pointed to the most decisive risk factors in prolongation of work absence: psychosocial factors, mobility of the spine and structural changes of facet joints, and professional factors including standing, sitting and microclimate.
Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.
2013-09-01
This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.
Using Conversation Topics for Predicting Therapy Outcomes in Schizophrenia
Howes, Christine; Purver, Matthew; McCabe, Rose
2013-01-01
Previous research shows that aspects of doctor-patient communication in therapy can predict patient symptoms, satisfaction and future adherence to treatment (a significant problem with conditions such as schizophrenia). However, automatic prediction has so far shown success only when based on low-level lexical features, and it is unclear how well these can generalize to new data, or whether their effectiveness is due to their capturing aspects of style, structure or content. Here, we examine the use of topic as a higher-level measure of content, more likely to generalize and to have more explanatory power. Investigations show that while topics predict some important factors such as patient satisfaction and ratings of therapy quality, they lack the full predictive power of lower-level features. For some factors, unsupervised methods produce models comparable to manual annotation. PMID:23943658
Expectation and Surprise Determine Neural Population Responses in the Ventral Visual Stream
Egner, Tobias; Monti, Jim M.; Summerfield, Christopher
2014-01-01
Visual cortex is traditionally viewed as a hierarchy of neural feature detectors, with neural population responses being driven by bottom-up stimulus features. Conversely, “predictive coding” models propose that each stage of the visual hierarchy harbors two computationally distinct classes of processing unit: representational units that encode the conditional probability of a stimulus and provide predictions to the next lower level; and error units that encode the mismatch between predictions and bottom-up evidence, and forward prediction error to the next higher level. Predictive coding therefore suggests that neural population responses in category-selective visual regions, like the fusiform face area (FFA), reflect a summation of activity related to prediction (“face expectation”) and prediction error (“face surprise”), rather than a homogenous feature detection response. We tested the rival hypotheses of the feature detection and predictive coding models by collecting functional magnetic resonance imaging data from the FFA while independently varying both stimulus features (faces vs houses) and subjects’ perceptual expectations regarding those features (low vs medium vs high face expectation). The effects of stimulus and expectation factors interacted, whereby FFA activity elicited by face and house stimuli was indistinguishable under high face expectation and maximally differentiated under low face expectation. Using computational modeling, we show that these data can be explained by predictive coding but not by feature detection models, even when the latter are augmented with attentional mechanisms. Thus, population responses in the ventral visual stream appear to be determined by feature expectation and surprise rather than by stimulus features per se. PMID:21147999
Real estate value prediction using multivariate regression models
NASA Astrophysics Data System (ADS)
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly based on many factors; hence it is one of the prime fields in which to apply machine learning to optimize and predict prices with high accuracy. In this paper, we therefore present various important features to use while predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower Residual Sum of Squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers of the features) is used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
NASA Astrophysics Data System (ADS)
Ginsburg, Shoshana B.; Rusu, Mirabela; Kurhanewicz, John; Madabhushi, Anant
2014-03-01
In this study we explore the ability of a novel machine learning approach, in conjunction with computer-extracted features describing prostate cancer morphology on pre-treatment MRI, to predict whether a patient will develop biochemical recurrence within ten years of radiation therapy. Biochemical recurrence, which is characterized by a rise in serum prostate-specific antigen (PSA) of at least 2 ng/mL above the nadir PSA, is associated with increased risk of metastasis and prostate cancer-related mortality. Currently, risk of biochemical recurrence is predicted by the Kattan nomogram, which incorporates several clinical factors to predict the probability of recurrence-free survival following radiation therapy (but has limited prediction accuracy). Semantic attributes on T2w MRI, such as the presence of extracapsular extension and seminal vesicle invasion and surrogate measurements of tumor size, have also been shown to be predictive of biochemical recurrence risk. While the correlation between biochemical recurrence and factors like tumor stage, Gleason grade, and extracapsular spread are well-documented, it is less clear how to predict biochemical recurrence in the absence of extracapsular spread and for small tumors fully contained in the capsule. Computer-extracted texture features, which quantitatively describe tumor micro-architecture and morphology on MRI, have been shown to provide clues about a tumor's aggressiveness. However, while computer-extracted features have been employed for predicting cancer presence and grade, they have not been evaluated in the context of predicting risk of biochemical recurrence. This work seeks to evaluate the role of computer-extracted texture features in predicting risk of biochemical recurrence on a cohort of sixteen patients who underwent pre-treatment 1.5 Tesla (T) T2w MRI. We extract a combination of first-order statistical, gradient, co-occurrence, and Gabor wavelet features from T2w MRI. 
To identify which of these T2w MRI texture features are potential independent prognostic markers of PSA failure, we implement a partial least squares (PLS) method to embed the data in a low-dimensional space and then use the variable importance in projections (VIP) method to quantify the contributions of individual features to classification on the PLS embedding. In spite of the poor resolution of the 1.5 T MRI data, we are able to identify three Gabor wavelet features that, in conjunction with a logistic regression classifier, yield an area under the receiver operating characteristic curve of 0.83 for predicting the probability of biochemical recurrence following radiation therapy. In comparison to both the Kattan nomogram and semantic MRI attributes, the ability of these three computer-extracted features to predict biochemical recurrence risk is demonstrated.
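The definition of biochemical recurrence given in this abstract (a rise in serum PSA of at least 2 ng/mL above the nadir PSA) can be sketched as a check over a chronological series of post-treatment PSA values. Treating the nadir as the running minimum of the series is an assumption made here for illustration, and the function name `has_biochemical_recurrence` is hypothetical.

```python
def has_biochemical_recurrence(psa_values, rise_ng_ml=2.0):
    """Check a chronological list of post-treatment PSA values (ng/mL).

    Assumption: the nadir is the lowest value seen so far in the series.
    Returns True once any later value is at least `rise_ng_ml` above it.
    """
    nadir = None
    for value in psa_values:
        if nadir is None or value < nadir:
            nadir = value  # new lowest value becomes the nadir
        elif value - nadir >= rise_ng_ml:
            return True
    return False


# Toy series only; not patient data from the study.
print(has_biochemical_recurrence([4.0, 1.0, 0.5, 1.2, 3.0]))
```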
AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku
2014-05-27
The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as being strong, regular, or weak. The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. 
The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier which averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.
Better Forecasting for Better Planning: A Systems Approach.
ERIC Educational Resources Information Center
Austin, W. Burnet
Predictions and forecasts are the most critical features of rational planning as well as the most vulnerable to inaccuracy. Because plans are only as good as their forecasts, current planning procedures could be improved by greater forecasting accuracy. Economic factors explain and predict more than any other set of factors, making economic…
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets
Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.
2013-01-01
Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
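The idea of k-mer sequence features mentioned in this abstract can be illustrated by counting every length-k substring of a DNA sequence. This is a generic sketch, not the kmer-SVM implementation: the function name `kmer_counts` is hypothetical, and details such as reverse-complement handling or normalization used by the actual tool are not described in the abstract and are not modeled here.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count all overlapping length-k substrings of a sequence (illustrative)."""
    sequence = sequence.upper()
    # Slide a window of width k across the sequence; an empty Counter results
    # when the sequence is shorter than k.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = kmer_counts("ACGTACGT", 3)
print(counts)
```

A count vector of this kind, computed over a fixed k-mer vocabulary, is the sort of input a support vector machine could be trained on to separate bound from unbound regions.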
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.
Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A
2013-07-01
Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
Characterizing mammographic images by using generic texture features
2012-01-01
Introduction Although mammographic density is an established risk factor for breast cancer, its use is limited in clinical practice because of a lack of automated and standardized measurement methods. The aims of this study were to evaluate a variety of automated texture features in mammograms as risk factors for breast cancer and to compare them with the percentage mammographic density (PMD) by using a case-control study design. Methods A case-control study including 864 cases and 418 controls was analyzed automatically. Four hundred seventy features were explored as possible risk factors for breast cancer. These included statistical features, moment-based features, spectral-energy features, and form-based features. An elaborate variable selection process using logistic regression analyses was performed to identify those features that were associated with case-control status. In addition, PMD was assessed and included in the regression model. Results Of the 470 image-analysis features explored, 46 remained in the final logistic regression model. An area under the curve of 0.79, with an odds ratio per standard deviation change of 2.88 (95% CI, 2.28 to 3.65), was obtained with validation data. Adding the PMD did not improve the final model. Conclusions Using texture features to predict the risk of breast cancer appears feasible. PMD did not show any additional value in this study. With regard to the features assessed, most of the analysis tools appeared to reflect mammographic density, although some features did not correlate with PMD. It remains to be investigated in larger case-control studies whether these features can contribute to increased prediction accuracy. PMID:22490545
Nelson, David A; Coyne, Sarah M; Swanson, Savannah M; Hart, Craig H; Olsen, Joseph A
2014-08-01
Crick, Murray-Close, and Woods (2005) encouraged the study of relational aggression as a developmental precursor to borderline personality features in children and adolescents. A longitudinal study is needed to more fully explore this association, to contrast potential associations with physical aggression, and to assess generalizability across various cultural contexts. In addition, parenting is of particular interest in the prediction of aggression or borderline personality disorder. Early aggression and parenting experiences may differ in their long-term prediction of aggression or borderline features, which may have important implications for early intervention. The current study incorporated a longitudinal sample of preschool children (84 boys, 84 girls) living in intact, two-parent biological households in Voronezh, Russia. Teachers provided ratings of children's relational and physical aggression in preschool. Mothers and fathers also self-reported their engagement in authoritative, authoritarian, permissive, and psychologically controlling forms of parenting with their preschooler. A decade later, 70.8% of the original child participants consented to a follow-up study in which they completed self-reports of relational and physical aggression and borderline personality features. The multivariate results of this study showed that preschool relational aggression in girls predicted adolescent relational aggression. Preschool aversive parenting (i.e., authoritarian, permissive, and psychologically controlling forms) significantly predicted aggression and borderline features in adolescent females. For adolescent males, preschool authoritative parenting served as a protective factor against aggression and borderline features, whereas authoritarian parenting was a risk factor for later aggression.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lafata, K; Ren, L; Wu, Q
Purpose: To develop a data-mining methodology based on quantum clustering and machine learning to predict expected dosimetric endpoints for lung SBRT applications based on patient-specific anatomic features. Methods: Ninety-three patients who received lung SBRT at our clinic from 2011–2013 were retrospectively identified. Planning information was acquired for each patient, from which various features were extracted using in-house semi-automatic software. Anatomic features included tumor-to-OAR distances, tumor location, total-lung-volume, GTV and ITV. Dosimetric endpoints were adopted from RTOG-0195 recommendations, and consisted of various OAR-specific partial-volume doses and maximum point-doses. First, PCA analysis and unsupervised quantum-clustering were used to explore the feature-space to identify potentially strong classifiers. Secondly, a multi-class logistic regression algorithm was developed and trained to predict dose-volume endpoints based on patient-specific anatomic features. Classes were defined by discretizing the dose-volume data, and the feature-space was zero-mean normalized. Fitting parameters were determined by minimizing a regularized cost function, and optimization was performed via gradient descent. As a pilot study, the model was tested on two esophageal dosimetric planning endpoints (maximum point-dose, dose-to-5cc), and its generalizability was evaluated with leave-one-out cross-validation. Results: Quantum-Clustering demonstrated a strong separation of feature-space at 15Gy across the first-and-second Principal Components of the data when the dosimetric endpoints were retrospectively identified. Maximum point dose prediction to the esophagus demonstrated a cross-validation accuracy of 87%, and the maximum dose to 5cc demonstrated a respective value of 79%. 
The largest optimized weighting factor was placed on GTV-to-esophagus distance (a factor of 10 greater than the second largest weighting factor), indicating an intuitively strong correlation between this feature and both endpoints. Conclusion: This pilot study shows that it is feasible to predict dose-volume endpoints based on patient-specific anatomic features. The developed methodology can potentially help to identify patients at risk for higher OAR doses, thus improving the efficiency of treatment planning. R01-184173.
Hecht, Kathryn F.; Cicchetti, Dante; Rogosch, Fred A.; Crick, Nicki
2014-01-01
Child maltreatment has been established as a risk factor for borderline personality disorder (BPD), yet few studies consider how maltreatment influences the development of BPD features through childhood and adolescence. Subtype, developmental timing and chronicity of child maltreatment were examined as factors in the development of borderline personality features in childhood. Children (M age = 11.30, SD = 0.94), including 314 maltreated and 285 nonmaltreated children from comparable low socioeconomic backgrounds, provided self-reports of developmentally salient borderline personality traits. Maltreated children had higher overall borderline feature scores, higher scores on each individual subscale and were more likely to be identified as at high risk for development of BPD through raised scores on all 4 subscales. Chronicity of maltreatment predicted higher overall borderline feature scores and patterns of onset and recency of maltreatment significantly predicted whether a participant would meet criteria for the high-risk group. Implications of findings and recommendations for intervention are discussed. PMID:25047300
Hecht, Kathryn F; Cicchetti, Dante; Rogosch, Fred A; Crick, Nicki R
2014-08-01
Child maltreatment has been established as a risk factor for borderline personality disorder (BPD), yet few studies consider how maltreatment influences the development of BPD features through childhood and adolescence. Subtype, developmental timing, and chronicity of child maltreatment were examined as factors in the development of borderline personality features in childhood. Children (M age = 11.30, SD = 0.94), including 314 maltreated and 285 nonmaltreated children from comparable low socioeconomic backgrounds, provided self-reports of developmentally salient borderline personality traits. Maltreated children had higher overall borderline feature scores, had higher scores on each individual subscale, and were more likely to be identified as at high risk for development of BPD through raised scores on all four subscales. Chronicity of maltreatment predicted higher overall borderline feature scores, and patterns of onset and recency of maltreatment significantly predicted whether a participant would meet criteria for the high-risk group. Implications of findings and recommendations for intervention are discussed.
Chan, J; Chan, H Y F
2011-08-01
The study aims to evaluate the diagnostic utility of thyrogastric immune features in the identification of intrinsic factor antibody negative (IFA -ve) pernicious anaemia (PA) patients. Clinico-pathological features of 'intrinsic factor antibody positive (IFA +ve) PA' and 'IFA -ve presumed PA' Chinese patients in a single hospital (2001-2009) were studied. Coefficients of independent variables identified were used as weighted scores. The result was validated by patients (1994-2000) with Schilling test done. Comparison of 127 'IFA +ve PA' and 130 'IFA -ve presumed PA' patients showed four independent variables, namely (+) gastric parietal cell (GPC) antibody (OR, 2.907; 95% CI, 2.346-3.468; P < 0.001), (+) antithyroid antibodies (OR, 3.098; 95% CI, 2.496-3.70; P < 0.001), (+) gastric atrophy (OR, 3.827; 95% CI, 3.041-4.64; P = 0.001), and (-) Helicobacter pylori (HP) organisms (OR, 0.134; 95% CI, -1.60-1.869; P = 0.023). The respective scores were 1.067, 1.131, 1.342 and -2.012. Total scores for each patient ranged from 3.54 to -2.012. When the cut-off score 1.528 was applied to the validation sample (n = 75), the specificity of identifying IFA -ve PA was 100%, sensitivity 53%, positive predictive value 100%, and negative predictive value 36%. Patients with two out of three features, GPC, antithyroid antibodies, gastric atrophy, but without HP organisms; or three features with HP organisms, can be predicted to have PA. © 2011 Blackwell Publishing Ltd.
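The weighted scoring rule in this abstract can be worked through in a short sketch. The four scores and the cut-off are taken from the abstract; two points are assumptions inferred from its reported score range (3.54 to -2.012) and its closing rule: that the negative score is applied when HP organisms are present, and that a total equal to the cut-off counts as positive. The names below are hypothetical, and scores are held as integer thousandths to avoid floating-point rounding at the cut-off.

```python
# Scores from the abstract, in thousandths (1.067 -> 1067) to keep sums exact.
SCORES_MILLI = {
    "gpc_antibody": 1067,
    "antithyroid_antibodies": 1131,
    "gastric_atrophy": 1342,
    "hp_organisms": -2012,  # assumption: subtracted when HP organisms are present
}
CUTOFF_MILLI = 1528  # cut-off 1.528; treated as inclusive (assumption)


def total_score_milli(findings):
    """Sum the scores of the findings that are present (findings: set of keys)."""
    return sum(SCORES_MILLI[name] for name in findings)


def predicts_pa(findings):
    """Apply the cut-off to the total score."""
    return total_score_milli(findings) >= CUTOFF_MILLI


all_three_with_hp = {"gpc_antibody", "antithyroid_antibodies",
                     "gastric_atrophy", "hp_organisms"}
print(total_score_milli(all_three_with_hp), predicts_pa(all_three_with_hp))
```

Under these assumptions, any two of the three positive features without HP organisms total at least 2.198, and all three with HP organisms total exactly 1.528, which matches the rule stated in the abstract's final sentence.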
Rosenkrantz, Andrew B; Doshi, Ankur M; Ginocchio, Luke A; Aphinyanaphongs, Yindalon
2016-12-01
This study aimed to assess the performance of a text classification machine-learning model in predicting highly cited articles within the recent radiological literature and to identify the model's most influential article features. We downloaded from PubMed the title, abstract, and medical subject heading terms for 10,065 articles published in 25 general radiology journals in 2012 and 2013. Three machine-learning models were applied to predict the top 10% of included articles in terms of the number of citations to the article in 2014 (reflecting the 2-year time window in conventional impact factor calculations). The model having the highest area under the curve was selected to derive a list of article features (words) predicting high citation volume, which was iteratively reduced to identify the smallest possible core feature list maintaining predictive power. Overall themes were qualitatively assigned to the core features. The regularized logistic regression (Bayesian binary regression) model had highest performance, achieving an area under the curve of 0.814 in predicting articles in the top 10% of citation volume. We reduced the initial 14,083 features to 210 features that maintain predictivity. These features corresponded with topics relating to various imaging techniques (eg, diffusion-weighted magnetic resonance imaging, hyperpolarized magnetic resonance imaging, dual-energy computed tomography, computed tomography reconstruction algorithms, tomosynthesis, elastography, and computer-aided diagnosis), particular pathologies (prostate cancer; thyroid nodules; hepatic adenoma, hepatocellular carcinoma, non-alcoholic fatty liver disease), and other topics (radiation dose, electroporation, education, general oncology, gadolinium, statistics). Machine learning can be successfully applied to create specific feature-based models for predicting articles likely to achieve high influence within the radiological literature. 
Copyright © 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
Mining the key predictors for event outbreaks in social networks
NASA Astrophysics Data System (ADS)
Yi, Chengqi; Bao, Yuanyuan; Xue, Yibo
2016-04-01
It will be beneficial to devise a method to predict a so-called event outbreak. Existing works mainly focus on exploring effective methods for improving the accuracy of predictions, while ignoring the underlying causes: What makes an event go viral? What factors significantly influence the prediction of an event outbreak in social networks? In this paper, we proposed a novel definition for an event outbreak, taking into account the structural changes to a network during the propagation of content. In addition, we investigated features that were sensitive to predicting an event outbreak. In order to investigate the universality of these features at different stages of an event, we split the entire lifecycle of an event into 20 equal segments according to the proportion of the propagation time. We extracted 44 features, including features related to content, users, structure, and time, from each segment of the event. Based on these features, we proposed a prediction method using supervised classification algorithms to predict event outbreaks. Experimental results indicate that, as time goes by, our method is highly accurate, with a precision rate ranging from 79% to 97% and a recall rate ranging from 74% to 97%. In addition, after applying a feature-selection algorithm, the top five selected features can considerably improve the accuracy of the prediction. Data-driven experimental results show that the entropy of the eigenvector centrality, the entropy of the PageRank, the standard deviation of the betweenness centrality, the proportion of re-shares without content, and the average path length are the key predictors for an event outbreak. Our findings are especially useful for further exploring the intrinsic characteristics of outbreak prediction.
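Two steps described in this abstract lend themselves to a brief sketch: assigning a timestamp to one of 20 equal lifecycle segments, and summarizing a set of per-node centrality scores by an entropy value. The abstract does not give the exact formulas, so both functions are assumptions for illustration: `segment_index` uses equal-width time bins, and `shannon_entropy` normalizes the scores to a probability distribution before applying the standard Shannon formula.

```python
import math


def segment_index(t, t_start, t_end, n_segments=20):
    """Map a timestamp to a 0-based segment of an event's lifecycle (assumed equal-width bins)."""
    if t_end <= t_start:
        raise ValueError("t_end must be later than t_start")
    fraction = (t - t_start) / (t_end - t_start)
    # The final instant is folded into the last segment.
    return min(int(fraction * n_segments), n_segments - 1)


def shannon_entropy(scores):
    """Shannon entropy (bits) of non-negative scores normalized to sum to 1 (assumed definition)."""
    total = sum(scores)
    if total <= 0:
        return 0.0
    probabilities = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probabilities)


# Toy centrality scores only; not data from the study.
print(segment_index(50, 0, 100), shannon_entropy([0.4, 0.3, 0.2, 0.1]))
```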
Stabilizing l1-norm prediction models by supervised feature grouping.
Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha
2016-02-01
Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. But, in the presence of correlated features, these methods select features that change considerably with small changes in data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. Copyright © 2015 Elsevier Inc. All rights reserved.
Landscape features influence postrelease predation on endangered black-footed ferrets
Poessel, S.A.; Breck, S.W.; Biggins, D.E.; Livieri, T.M.; Crooks, K.R.; Angeloni, L.
2011-01-01
Predation can be a critical factor influencing recovery of endangered species. In most recovery efforts lethal and nonlethal influences of predators are not sufficiently understood to allow prediction of predation risk, despite its importance. We investigated whether landscape features could be used to model predation risk from coyotes (Canis latrans) and great horned owls (Bubo virginianus) on the endangered black-footed ferret (Mustela nigripes). We used location data of reintroduced ferrets from 3 sites in South Dakota to determine whether exposure to landscape features typically associated with predators affected survival of ferrets, and whether ferrets considered predation risk when choosing habitat near perches potentially used by owls or near linear features predicted to be used by coyotes. Exposure to areas near likely owl perches reduced ferret survival, but landscape features potentially associated with coyote movements had no appreciable effect on survival. Ferrets were located within 90 m of perches more than expected in 2 study sites that also had higher ferret mortality due to owl predation. Densities of potential coyote travel routes near ferret locations were no different than expected in all 3 sites. Repatriated ferrets might have selected resources based on factors other than predator avoidance. Considering an easily quantified landscape feature (i.e., owl perches) can enhance success of reintroduction efforts for ferrets. Nonetheless, development of predictive models of predation risk and management strategies to mitigate that risk is not necessarily straightforward for more generalist predators such as coyotes. © 2011 American Society of Mammalogists.
Tensor-driven extraction of developmental features from varying paediatric EEG datasets.
Kinney-Lang, Eli; Spyrou, Loukianos; Ebied, Ahmed; Chin, Richard Fm; Escudero, Javier
2018-05-21
Constant changes in developing children's brains can pose a challenge in EEG dependent technologies. Advancing signal processing methods to identify developmental differences in paediatric populations could help improve function and usability of such technologies. Taking advantage of the multi-dimensional structure of EEG data through tensor analysis may offer a framework for extracting relevant developmental features of paediatric datasets. A proof of concept is demonstrated through identifying latent developmental features in resting-state EEG. Approach. Three paediatric datasets (n = 50, 17, 44) were analyzed using a two-step constrained parallel factor (PARAFAC) tensor decomposition. Subject age was used as a proxy measure of development. Classification used support vector machines (SVM) to test if PARAFAC identified features could predict subject age. The results were cross-validated within each dataset. Classification analysis was complemented by visualization of the high-dimensional feature structures using t-distributed Stochastic Neighbour Embedding (t-SNE) maps. Main Results. Development-related features were successfully identified for the developmental conditions of each dataset. SVM classification showed the identified features could accurately predict subject age at a significant level above chance for both healthy and impaired populations. t-SNE maps revealed suitable tensor factorization was key in extracting the developmental features. Significance. The described methods are a promising tool for identifying latent developmental features occurring throughout childhood EEG. © 2018 IOP Publishing Ltd.
Examining Overgeneral Autobiographical Memory as a Risk Factor for Adolescent Depression
ERIC Educational Resources Information Center
Rawal, Adhip; Rice, Frances
2012-01-01
Objective: Identifying risk factors for adolescent depression is an important research aim. Overgeneral autobiographical memory (OGM) is a feature of adolescent depression and a candidate cognitive risk factor for future depression. However, no study has ascertained whether OGM predicts the onset of adolescent depressive disorder. OGM was…
Classifying transcription factor targets and discovering relevant biological features
Holloway, Dustin T; Kon, Mark; DeLisi, Charles
2008-01-01
Background An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties. Principal Findings (1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1(4), Ino2(2.6), Yaf1(2.4), and Yap6(2.4). (2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression. (3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties. (4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. 
Cell-cycle and growth related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter. Conclusion Postprocessing of regulatory-classifier results can provide high quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web-server, including the full transcriptional network, which can be analyzed using VisAnt network analysis suite. Reviewers This article was reviewed by Igor Jouline, Todd Mockler(nominated by Valerian Dolja), and Sandor Pongor. PMID:18513408
Jaber, Mohammed; Wölfer, Johannes; Ewelt, Christian; Holling, Markus; Hasselblatt, Martin; Niederstadt, Thomas; Zoubi, Tarek; Weckesser, Matthias; Stummer, Walter
2016-03-01
Approximately 20% of grade II and most grade III gliomas fluoresce after 5-aminolevulinic acid (5-ALA) application. Conversely, approximately 30% of nonenhancing gliomas are actually high grade. The aim of this study was to identify preoperative factors (ie, age, enhancement, 18F-fluoroethyl tyrosine positron emission tomography [F-FET PET] uptake ratios) for predicting fluorescence in gliomas without typical glioblastoma imaging features and to determine whether fluorescence will allow prediction of tumor grade or molecular characteristics. Patients harboring gliomas without typical glioblastoma imaging features were given 5-ALA. Fluorescence was recorded intraoperatively, and biopsy specimens were collected from fluorescing tissue. World Health Organization (WHO) grade, Ki-67/MIB-1 index, IDH1 (R132H) mutation status, O-methylguanine DNA methyltransferase (MGMT) promoter methylation status, and 1p/19q co-deletion status were assessed. Predictive factors for fluorescence were derived from preoperative magnetic resonance imaging and F-FET PET. Classification and regression tree analysis and receiver-operating-characteristic curves were generated for defining predictors. Of 166 tumors, 82 were diagnosed as WHO grade II, 76 as grade III, and 8 as grade IV glioblastomas. Contrast enhancement, tumor volume, and F-FET PET uptake ratio >1.85 predicted fluorescence. Fluorescence correlated with WHO grade (P < .001) and Ki-67/MIB-1 index (P < .001), but not with MGMT promoter methylation status, IDH1 mutation status, or 1p/19q co-deletion status. The Ki-67/MIB-1 index in fluorescing grade III gliomas was higher than in nonfluorescing tumors, whereas in fluorescing and nonfluorescing grade II tumors, no differences were noted. Age, tumor volume, and F-FET PET uptake are factors predicting 5-ALA-induced fluorescence in gliomas without typical glioblastoma imaging features. Fluorescence was associated with an increased Ki-67/MIB-1 index and high-grade pathology.
Whether fluorescence in grade II gliomas identifies a subtype with worse prognosis remains to be determined.
Using input feature information to improve ultraviolet retrieval in neural networks
NASA Astrophysics Data System (ADS)
Sun, Zhibin; Chang, Ni-Bin; Gao, Wei; Chen, Maosi; Zempila, Melina
2017-09-01
In neural networks, the training/predicting accuracy and algorithm efficiency can be improved significantly via accurate input feature extraction. In this study, some spatial features of several important factors in retrieving surface ultraviolet (UV) are extracted. An extreme learning machine (ELM) is used to retrieve the surface UV of 2014 in the continental United States, using the extracted features. The results conclude that more input weights can improve the learning capacities of neural networks.
NASA Astrophysics Data System (ADS)
Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.
2017-03-01
Reducing the overdiagnosis and overtreatment associated with ductal carcinoma in situ (DCIS) requires accurate prediction of the invasive potential at cancer screening. In this work, we investigated the utility of pre-operative histologic and mammographic features to predict upstaging of DCIS. The goal was to provide intentionally conservative baseline performance using readily available data from radiologists and pathologists and only linear models. We conducted a retrospective analysis on 99 patients with DCIS. Of those, 25 were upstaged to invasive cancer at the time of definitive surgery. Pre-operative factors including both the histologic features extracted from stereotactic core needle biopsy (SCNB) reports and the mammographic features annotated by an expert breast radiologist were investigated with statistical analysis. Furthermore, we built classification models based on those features in an attempt to predict the presence of an occult invasive component in DCIS, with generalization performance assessed by receiver operating characteristic (ROC) curve analysis. Histologic features including nuclear grade and DCIS subtype did not show statistically significant differences between cases with pure DCIS and with DCIS plus invasive disease. However, three mammographic features, i.e., the major axis length of the DCIS lesion, the BI-RADS level of suspicion, and the radiologist's assessment, did achieve statistical significance. Using those three statistically significant features as input, a linear discriminant model was able to distinguish patients with DCIS plus invasive disease from those with pure DCIS, with AUC-ROC equal to 0.62. Overall, mammograms used for breast screening contain useful information that can be perceived by radiologists and help predict occult invasive components in DCIS.
NASA Astrophysics Data System (ADS)
Chaudhury, Baishali; Zhou, Mu; Farhidzadeh, Hamidreza; Goldgof, Dmitry B.; Hall, Lawrence O.; Gatenby, Robert A.; Gillies, Robert J.; Weinfurtner, Robert J.; Drukteinis, Jennifer S.
2016-03-01
The use of Ki67% expression, a cell proliferation marker, as a predictive and prognostic factor has been widely studied in the literature. Yet its usefulness is limited due to inconsistent cut off scores for Ki67% expression, subjective differences in its assessment in various studies, and spatial variation in expression, which makes it difficult to reproduce as a reliable independent prognostic factor. Previous studies have shown that there are significant spatial variations in Ki67% expression, which may limit its clinical prognostic utility after core biopsy. These variations are most evident when examining the periphery of the tumor vs. the core. To date, prediction of Ki67% expression from quantitative image analysis of DCE-MRI is very limited. This work presents a novel computer aided diagnosis framework to use textural kinetics to (i) predict the ratio of periphery Ki67% expression to core Ki67% expression, and (ii) predict Ki67% expression from individual tumor habitats. The pilot cohort consists of T1 weighted fat saturated DCE-MR images from 17 patients. Support vector regression with a radial basis function was used for predicting the Ki67% expression and ratios. The initial results show that texture features from individual tumor habitats are more predictive of the Ki67% expression ratio and spatial Ki67% expression than features from the whole tumor. The Ki67% expression ratio could be predicted with a root mean square error (RMSE) of 1.67%. Quantitative image analysis of DCE-MRI using textural kinetic habitats has the potential to be used as a non-invasive method for predicting Ki67 percentage and ratio, thus more accurately reporting high Ki67 expression for patient prognosis.
Establishing the situated features associated with perceived stress
Lebois, Lauren A.M.; Hertzog, Christopher; Slavich, George M.; Barrett, Lisa Feldman; Barsalou, Lawrence W.
2016-01-01
We propose that the domain general process of categorization contributes to the perception of stress. When a situation contains features associated with stressful experiences, it is categorized as stressful. From the perspective of situated cognition, the features used to categorize experiences as stressful are the features typically true of stressful situations. To test this hypothesis, we asked participants to evaluate the perceived stress of 572 imagined situations, and to also evaluate each situation for how much it possessed 19 features potentially associated with stressful situations and their processing (e.g., self-threat, familiarity, visual imagery, outcome certainty). Following variable reduction through factor analysis, a core set of 8 features associated with stressful situations—expectation violation, self-threat, coping efficacy, bodily experience, arousal, negative valence, positive valence, and perseveration—all loaded on a single Core Stress Features factor. In a multilevel model, this factor and an Imagery factor explained 88% of the variance in judgments of perceived stress, with significant random effects reflecting differences in how individual participants categorized stress. These results support the hypothesis that people categorize situations as stressful to the extent that typical features of stressful situations are present. To our knowledge, this is the first attempt to establish a comprehensive set of features that predicts perceived stress. PMID:27288834
Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.
2013-01-01
Cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite the wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle against non-cell cycle genes. Our results show that both the trans- TF features and the cis- motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175
Effects of metric hierarchy and rhyme predictability on word duration in The Cat in the Hat.
Breen, Mara
2018-05-01
Word durations convey many types of linguistic information, including intrinsic lexical features like length and frequency and contextual features like syntactic and semantic structure. The current study was designed to investigate whether hierarchical metric structure and rhyme predictability account for durational variation over and above other features in productions of a rhyming, metrically-regular children's book: The Cat in the Hat (Dr. Seuss, 1957). One-syllable word durations and inter-onset intervals were modeled as functions of segment number, lexical frequency, word class, syntactic structure, repetition, and font emphasis. Consistent with prior work, factors predicting longer word durations and inter-onset intervals included more phonemes, lower frequency, first mention, alignment with a syntactic boundary, and capitalization. A model parameter corresponding to metric grid height improved model fit of word durations and inter-onset intervals. Specifically, speakers realized five levels of metric hierarchy with inter-onset intervals such that interval duration increased linearly with increased height in the metric hierarchy. Conversely, speakers realized only three levels of metric hierarchy with word duration, demonstrating that they shortened the highly predictable rhyme resolutions. These results further understanding of the factors that affect spoken word duration, and demonstrate the myriad cues that children receive about linguistic structure from nursery rhymes. Copyright © 2018 Elsevier B.V. All rights reserved.
Systematic Characterization and Prediction of Human Hypertension Genes.
Li, Yan-Hui; Zhang, Gai-Gai; Wang, Nanping
2017-02-01
Hypertension is a major cardiovascular risk factor and accounts for a large part of cardiovascular mortality. In this work, we analyzed the properties of hypertension genes and found that when compared with genes not yet known to be involved in hypertension regulation, known hypertension genes display distinguishing features: (1) hypertension genes tend to be located at network center; (2) hypertension genes tend to interact with each other; and (3) hypertension genes tend to enrich in certain biological processes and show certain phenotypes. Based on these features, we developed a machine-learning algorithm to predict new hypertension genes. One hundred and seventy-seven candidates were predicted with a posterior probability >0.9. Evidence supporting 17 of the predictions has been found. © 2016 American Heart Association, Inc.
Prediction of lysine glutarylation sites by maximum relevance minimum redundancy feature selection.
Ju, Zhe; He, Jian-Jun
2018-06-01
Lysine glutarylation is a new type of protein acylation modification in both prokaryotes and eukaryotes. To better understand the molecular mechanism of glutarylation, it is important to identify glutarylated substrates and their corresponding glutarylation sites accurately. In this study, a novel bioinformatics tool named GlutPred is developed to predict glutarylation sites by using multiple feature extraction and maximum relevance minimum redundancy feature selection. On the one hand, amino acid factors, binary encoding, and the composition of k-spaced amino acid pairs features are incorporated to encode glutarylation sites, and the maximum relevance minimum redundancy method and the incremental feature selection algorithm are adopted to remove the redundant features. On the other hand, a biased support vector machine algorithm is used to handle the class imbalance in the glutarylation site training dataset. As illustrated by 10-fold cross-validation, GlutPred achieves satisfactory performance, with a Sensitivity of 64.80%, a Specificity of 76.60%, an Accuracy of 74.90% and a Matthews correlation coefficient of 0.3194. Feature analysis shows that some k-spaced amino acid pair features play the most important roles in the prediction of glutarylation sites. The conclusions derived from this study might provide some clues for understanding the molecular mechanisms of glutarylation. Copyright © 2018 Elsevier Inc. All rights reserved.
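The four performance measures reported in the abstract above all derive from a binary confusion matrix. The sketch below shows the standard definitions; the confusion-matrix counts are hypothetical values chosen for illustration and are not taken from the GlutPred study.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of correct calls
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts: 50 positive sites and 100 negative sites.
sens, spec, acc, mcc = confusion_metrics(tp=40, fn=10, tn=90, fp=10)
```

On an imbalanced dataset, accuracy is dominated by the majority class, which is why the Matthews correlation coefficient is commonly reported alongside it.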
Use of a twin dataset to identify AMD-related visual patterns controlled by genetic factors
NASA Astrophysics Data System (ADS)
Quellec, Gwénolé; Abràmoff, Michael D.; Russell, Stephen R.
2010-03-01
The mapping of genotype to the phenotype of age-related macular degeneration (AMD) is expected to improve the diagnosis and treatment of the disease in the near future. In this study, we focused on the first step to discover this mapping: we identified visual patterns related to AMD which seem to be controlled by genetic factors, without explicitly relating them to the genes. For this purpose, we used a dataset of eye fundus photographs from 74 twin pairs, either monozygotic twins, who have the same genotype, or dizygotic twins, whose genes responsible for AMD are less likely to be identical. If we are able to differentiate monozygotic twins from dizygotic twins, based on a given visual pattern, then this pattern is likely to be controlled by genetic factors. The main visible consequence of AMD is the appearance of drusen between the retinal pigment epithelium and Bruch's membrane. We developed two automated drusen detectors based on the wavelet transform: a shape-based detector for hard drusen, and a texture- and color-based detector for soft drusen. Forty visual features were evaluated at the location of the automatically detected drusen. These features characterize the texture, the shape, the color, the spatial distribution, or the amount of drusen. A distance measure between twin pairs was defined for each visual feature; a smaller distance should be measured between monozygotic twins for visual features controlled by genetic factors. The predictions of several visual features (75.7% accuracy) are comparable to or better than the predictions of human experts.
Creasy, John M; Midya, Abhishek; Chakraborty, Jayasree; Adams, Lauryn B; Gomes, Camilla; Gonen, Mithat; Seastedt, Kenneth P; Sutton, Elizabeth J; Cercek, Andrea; Kemeny, Nancy E; Shia, Jinru; Balachandran, Vinod P; Kingham, T Peter; Allen, Peter J; DeMatteo, Ronald P; Jarnagin, William R; D'Angelica, Michael I; Do, Richard K G; Simpson, Amber L
2018-06-19
This study investigates whether quantitative image analysis of pretreatment CT scans can predict volumetric response to chemotherapy for patients with colorectal liver metastases (CRLM). Patients treated with chemotherapy for CRLM (hepatic artery infusion (HAI) combined with systemic or systemic alone) were included in the study. Patients were imaged at baseline and approximately 8 weeks after treatment. Response was measured as the percentage change in tumour volume from baseline. Quantitative imaging features were derived from the index hepatic tumour on pretreatment CT, and features statistically significant on univariate analysis were included in a linear regression model to predict volumetric response. The regression model was constructed from 70% of data, while 30% were reserved for testing. Test data were input into the trained model. Model performance was evaluated with mean absolute prediction error (MAPE) and R². Clinicopathologic factors were assessed for correlation with response. 157 patients were included, split into training (n = 110) and validation (n = 47) sets. MAPE from the multivariate linear regression model was 16.5% (R² = 0.774) and 21.5% in the training and validation sets, respectively. Stratified by HAI utilisation, MAPE in the validation set was 19.6% for HAI and 25.1% for systemic chemotherapy alone. Clinical factors associated with differences in median tumour response were treatment strategy, systemic chemotherapy regimen, age and KRAS mutation status (p < 0.05). Quantitative imaging features extracted from pretreatment CT are promising predictors of volumetric response to chemotherapy in patients with CRLM. Pretreatment predictors of response have the potential to better select patients for specific therapies. • Colorectal liver metastases (CRLM) are downsized with chemotherapy but predicting the patients that will respond to chemotherapy is currently not possible.
• Heterogeneity and enhancement patterns of CRLM can be measured with quantitative imaging. • Prediction model constructed that predicts volumetric response with 20% error suggesting that quantitative imaging holds promise to better select patients for specific treatments.
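The two evaluation measures named in the abstract above can be sketched as follows. This assumes that mean absolute prediction error is the mean absolute difference between predicted and observed volumetric response, expressed in percentage points, and that R² is the usual coefficient of determination; the abstract does not spell out either formula, and the response values below are hypothetical.

```python
def mean_absolute_prediction_error(actual, predicted):
    """Mean absolute difference between observed and predicted responses."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical percentage changes in tumour volume from baseline.
actual = [-60.0, -30.0, 0.0, 30.0]
predicted = [-50.0, -40.0, 10.0, 20.0]

mape = mean_absolute_prediction_error(actual, predicted)
r2 = r_squared(actual, predicted)
```

In this toy case every prediction is off by 10 percentage points, so the error is 10 regardless of the direction of each miss.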
Chen, Jing; Tang, Yuan Yan; Chen, C L Philip; Fang, Bin; Lin, Yuewei; Shang, Zhaowei
2014-12-01
Protein subcellular location prediction aims to predict the location where a protein resides within a cell using computational methods. Considering the main limitations of the existing methods, we propose a hierarchical multi-label learning model FHML for both single-location proteins and multi-location proteins. The latent concepts are extracted through feature space decomposition and label space decomposition under the nonnegative data factorization framework. The extracted latent concepts are used as the codebook to indirectly connect the protein features to their annotations. We construct dual fuzzy hypergraphs to capture the intrinsic high-order relations embedded in not only feature space, but also label space. Finally, the subcellular location annotation information is propagated from the labeled proteins to the unlabeled proteins by performing dual fuzzy hypergraph Laplacian regularization. The experimental results on the six protein benchmark datasets demonstrate the superiority of our proposed method by comparing it with the state-of-the-art methods, and illustrate the benefit of exploiting both feature correlations and label correlations.
Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin
2017-11-15
An accurate characterization of transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Burnside, Elizabeth S.; Liu, Jie; Wu, Yirong; Onitilo, Adedayo A.; McCarty, Catherine; Page, C. David; Peissig, Peggy; Trentham-Dietz, Amy; Kitchner, Terrie; Fan, Jun; Yuan, Ming
2015-01-01
Rationale and Objectives The discovery of germline genetic variants associated with breast cancer has engendered interest in risk stratification for improved, targeted detection and diagnosis. However, there has yet to be a comparison of the predictive ability of these genetic variants with mammography abnormality descriptors. Materials and Methods Our IRB-approved, HIPAA-compliant study utilized a personalized medicine registry in which participants consented to provide a DNA sample and participate in longitudinal follow-up. In our retrospective, age-matched, case-controlled study of 373 cases and 395 controls who underwent breast biopsy, we collected risk factors selected a priori based on the literature including: demographic variables based on the Gail model, common germline genetic variants, and diagnostic mammography findings according to BI-RADS. We developed predictive models using logistic regression to determine the predictive ability of: 1) demographic variables, 2) 10 selected genetic variants, or 3) mammography BI-RADS features. We evaluated each model in turn by calculating a risk score for each patient using 10-fold cross validation; used this risk estimate to construct ROC curves; and compared the AUC of each using the DeLong method. Results The performance of the regression model using demographic risk factors was not statistically different from the model using genetic variants (p=0.9). The model using mammography features (AUC = 0.689) was superior to both the demographic model (AUC = .598; p<0.001) and the genetic model (AUC = .601; p<0.001). Conclusion BI-RADS features exceeded the ability of demographic and 10 selected germline genetic variants to predict breast cancer in women recommended for biopsy. PMID:26514439
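The AUC values compared in the abstract above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes AUC directly from that pairwise definition; it does not implement the DeLong comparison used in the study, and the risk scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    Ties count as half a correct ranking (the Mann-Whitney convention).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model risk scores for biopsied patients.
cases = [0.9, 0.8, 0.4]      # patients with breast cancer
controls = [0.7, 0.3, 0.2]   # patients without breast cancer

area = auc(cases, controls)  # 8 of 9 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of patients, which is fine for a sketch; rank-based formulations give the same value more efficiently.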
Predicting DNA hybridization kinetics from sequence
NASA Astrophysics Data System (ADS)
Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu
2018-01-01
Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on its similarity to reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
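As a rough illustration of the weighted neighbour voting idea in the abstract above, the following Python sketch predicts a value (for example a log10 rate constant) for a new sequence as a similarity-weighted average over reactions with known values. The feature vectors, the feature weights, and the exponential weighting kernel are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def wnv_predict(query, known, feature_weights, scale=1.0):
    """Predict a value for `query` from (features, value) pairs in `known`.

    Each known reaction votes with weight exp(-d / scale), where d is the
    weighted Euclidean distance in feature space (an assumed kernel).
    """
    num = 0.0
    den = 0.0
    for features, value in known:
        d = math.sqrt(sum(w * (q - f) ** 2
                          for w, q, f in zip(feature_weights, query, features)))
        vote = math.exp(-d / scale)
        num += vote * value
        den += vote
    return num / den

# Hypothetical data: two features per sequence, log10 rate constants as values.
known_reactions = [
    ((0.0, 0.0), 6.0),
    ((1.0, 0.0), 5.0),
    ((0.0, 1.0), 7.0),
]
prediction = wnv_predict((0.0, 0.0), known_reactions, (1.0, 1.0))
```

Leave-one-out cross-validation, as used in the study, would call such a function once per reaction with that reaction removed from `known`.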
Dynamic Socialized Gaussian Process Models for Human Behavior Prediction in a Health Social Network
Shen, Yelong; Phan, NhatHai; Xiao, Xiao; Jin, Ruoming; Sun, Junfeng; Piniewski, Brigitte; Kil, David; Dou, Dejing
2016-01-01
Modeling and predicting human behaviors, such as the level and intensity of physical activity, is a key to preventing the cascade of obesity and helping spread healthy behaviors in a social network. In our conference paper, we developed a social influence model, named Socialized Gaussian Process (SGP), for socialized human behavior modeling. Instead of explicitly modeling social influence as individuals' behaviors influenced by their friends' previous behaviors, SGP models the dynamic social correlation as the result of social influence. The SGP model naturally incorporates the personal behavior factor and the social correlation factor (i.e., the homophily principle: friends tend to perform similar behaviors) into a unified model, and it models the social influence factor (i.e., an individual's behavior can be affected by his/her friends) implicitly in dynamic social correlation schemes. The detailed experimental evaluation has shown that the SGP model achieves better prediction accuracy than most of the baseline methods. However, a Socialized Random Forest model may perform better at the beginning compared with the SGP model. One of the main reasons is that the dynamic social correlation function is based purely on the users' sequential behaviors without considering other physical activity-related features. To address this issue, we further propose a novel “multi-feature SGP model” (mfSGP) which improves the SGP model by using multiple physical activity-related features in the dynamic social correlation learning. Extensive experimental results illustrate that the mfSGP model clearly outperforms all other models in terms of prediction accuracy and running time. PMID:27746515
DOE Office of Scientific and Technical Information (OSTI.GOV)
Magome, T; Haga, A; Igaki, H
Purpose: Although many outcome prediction models based on dose-volume information have been proposed, it is well known that the prognosis may also be affected by multiple clinical factors. The purpose of this study is to predict the survival time after radiotherapy for high-grade glioma patients based on features including clinical and dose-volume histogram (DVH) information. Methods: A total of 35 patients with high-grade glioma (oligodendroglioma: 2, anaplastic astrocytoma: 3, glioblastoma: 30) were selected in this study. All patients were treated with a prescribed dose of 30–80 Gy after surgical resection or biopsy from 2006 to 2013 at The University of Tokyo Hospital. All cases were randomly separated into a training dataset (30 cases) and a test dataset (5 cases). The survival time after radiotherapy was predicted based on a multiple linear regression analysis and an artificial neural network (ANN) by using 204 candidate features. The candidate features included the 12 clinical features (tumor location, extent of surgical resection, treatment duration of radiotherapy, etc.), and the 192 DVH features (maximum dose, minimum dose, D95, V60, etc.). The effective features for the prediction were selected according to a step-wise method by using 30 training cases. The prediction accuracy was evaluated by a coefficient of determination (R²) between the predicted and actual survival time for the training and test dataset. Results: In the multiple regression analysis, the value of R² between the predicted and actual survival time was 0.460 for the training dataset and 0.375 for the test dataset. On the other hand, in the ANN analysis, the value of R² was 0.806 for the training dataset and 0.811 for the test dataset. Conclusion: Although a large number of patients would be needed for more accurate and robust prediction, our preliminary result showed the potential to predict the outcome in the patients with high-grade glioma.
This work was partly supported by the JSPS Core-to-Core Program (No. 23003) and Grant-in-aid from the JSPS Fellows.
Value-Driven Attentional Capture is Modulated by Spatial Context
Anderson, Brian A.
2014-01-01
When stimuli are associated with reward outcome, their visual features acquire high attentional priority such that stimuli possessing those features involuntarily capture attention. Whether a particular feature is predictive of reward, however, will vary with a number of contextual factors. One such factor is spatial location: for example, red berries are likely to be found in low-lying bushes, whereas yellow bananas are likely to be found on treetops. In the present study, I explore whether the attentional priority afforded to reward-associated features is modulated by such location-based contingencies. The results demonstrate that when a stimulus feature is associated with a reward outcome in one spatial location but not another, attentional capture by that feature is selective to when it appears in the rewarded location. This finding provides insight into how reward learning effectively modulates attention in an environment with complex stimulus–reward contingencies, thereby supporting efficient foraging. PMID:26069450
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klement, Rainer J., E-mail: rainer_klement@gmx.de; Department of Radiotherapy and Radiation Oncology, Leopoldina Hospital, Schweinfurt; Allgäuer, Michael
2014-03-01
Background: Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. Methods and Materials: We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performance by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Results: Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED_ISO) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED_ISO and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED_ISO, age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. Conclusions: These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT.
Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning.
Klement, Rainer J; Allgäuer, Michael; Appold, Steffen; Dieckmann, Karin; Ernst, Iris; Ganswindt, Ute; Holy, Richard; Nestle, Ursula; Nevinny-Stickel, Meinhard; Semrau, Sabine; Sterzing, Florian; Wittig, Andrea; Andratschke, Nicolaus; Guckenberger, Matthias
2014-03-01
Several prognostic factors for local tumor control probability (TCP) after stereotactic body radiation therapy (SBRT) for early stage non-small cell lung cancer (NSCLC) have been described, but no attempts have been undertaken to explore whether a nonlinear combination of potential factors might synergistically improve the prediction of local control. We investigated a support vector machine (SVM) for predicting TCP in a cohort of 399 patients treated at 13 German and Austrian institutions. Among 7 potential input features for the SVM we selected those most important on the basis of forward feature selection, thereby evaluating classifier performance by using 10-fold cross-validation and computing the area under the ROC curve (AUC). The final SVM classifier was built by repeating the feature selection 10 times with different splitting of the data for cross-validation and finally choosing only those features that were selected at least 5 out of 10 times. It was compared with a multivariate logistic model that was built by forward feature selection. Local failure occurred in 12% of patients. Biologically effective dose (BED) at the isocenter (BED(ISO)) was the strongest predictor of TCP in the logistic model and also the most frequently selected input feature for the SVM. A bivariate logistic function of BED(ISO) and the pulmonary function indicator forced expiratory volume in 1 second (FEV1) yielded the best description of the data but resulted in a significantly smaller AUC than the final SVM classifier with the input features BED(ISO), age, baseline Karnofsky index, and FEV1 (0.696 ± 0.040 vs 0.789 ± 0.001, P<.03). The final SVM resulted in sensitivity and specificity of 67.0% ± 0.5% and 78.7% ± 0.3%, respectively. These results confirm that machine learning techniques like SVMs can be successfully applied to predict treatment outcome after SBRT. 
Improvements over traditional TCP modeling are expected through a nonlinear combination of multiple features, eventually helping in the task of personalized treatment planning. Copyright © 2014 Elsevier Inc. All rights reserved.
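The selection rule described in the abstract above (repeat feature selection 10 times on different data splits, then keep features chosen at least 5 times) can be sketched in a few lines of Python. The run outcomes below are invented for illustration, and "T_stage" is a hypothetical feature name, so the result is not meant to reproduce the study's final feature set.

```python
from collections import Counter

def frequently_selected(runs, min_count):
    """Return features chosen in at least `min_count` of the selection runs."""
    counts = Counter(feature for run in runs for feature in set(run))
    return sorted(name for name, n in counts.items() if n >= min_count)

# Hypothetical outcomes of 10 forward-selection runs on different data splits.
runs = (
    [["BED_ISO", "age", "FEV1"]] * 4
    + [["BED_ISO", "Karnofsky"]] * 3
    + [["BED_ISO", "age", "T_stage"]] * 3
)
kept = frequently_selected(runs, min_count=5)
```

Voting across repeated splits is one simple way to reduce the dependence of the chosen feature set on a single partition of the data.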
ERIC Educational Resources Information Center
Kimonis, Eva R.; Frick, Paul J.; Boris, Neil W.; Smyke, Anna T.; Cornell, Amy H.; Farrell, Jamie M.; Zeanah, Charles H.
2006-01-01
A behaviorally-uninhibited temperament, callous-unemotional (CU) features, and harsh parenting have been associated with specific patterns of aggressive behavior in older children and adolescents. We tested the additive and interactive effects of these factors in predicting different types of aggressive behavior in a high-risk preschool sample.…
Are Sensory Processing Features Associated with Depressive Symptoms in Boys with an ASD?
ERIC Educational Resources Information Center
Bitsika, Vicki; Sharpley, Christopher F.; Mills, Richard
2016-01-01
The association between Sensory Processing Features (SPF) and depressive symptoms was investigated at two levels in 150 young males (6-18 years) with an ASD. First, a significant correlation was found between SPF and total depressive symptom scores. Second, different aspects of SPF significantly predicted different depressive symptom factors, with…
Growth/reflectance model interface for wheat and corresponding model
NASA Technical Reports Server (NTRS)
Suits, G. H.; Sieron, R.; Odenweller, J.
1984-01-01
The use of modeling to explore the possibility of discovering new and useful crop condition indicators which might be available from the Thematic Mapper and to connect these symptoms to the biological causes in the crop is discussed. A crop growth model was used to predict the day to day growth features of the crop as it responds biologically to the various environmental factors. A reflectance model was used to predict the character of the interaction of daylight with the predicted growth features. An atmospheric path radiance was added to the reflected daylight to simulate the radiance appearing at the sensor. Finally, the digitized data sent to a ground station were calculated. The crop under investigation is wheat.
NASA Astrophysics Data System (ADS)
Klomp, Sander; van der Sommen, Fons; Swager, Anne-Fré; Zinger, Svitlana; Schoon, Erik J.; Curvers, Wouter L.; Bergman, Jacques J.; de With, Peter H. N.
2017-03-01
Volumetric Laser Endomicroscopy (VLE) is a promising technique for the detection of early neoplasia in Barrett's Esophagus (BE). VLE generates hundreds of high resolution, grayscale, cross-sectional images of the esophagus. However, at present, classifying these images is a time consuming and cumbersome effort performed by an expert using a clinical prediction model. This paper explores the feasibility of using computer vision techniques to accurately predict the presence of dysplastic tissue in VLE BE images. Our contribution is threefold. First, a benchmarking is performed for widely applied machine learning techniques and feature extraction methods. Second, three new features based on the clinical detection model are proposed, having superior classification accuracy and speed, compared to earlier work. Third, we evaluate automated parameter tuning by applying simple grid search and feature selection methods. The results are evaluated on a clinically validated dataset of 30 dysplastic and 30 non-dysplastic VLE images. Optimal classification accuracy is obtained by applying a support vector machine and using our modified Haralick features and optimal image cropping, obtaining an area under the receiver operating characteristic curve of 0.95 compared to the clinical prediction model at 0.81. Optimal execution time is achieved using a proposed mean and median feature, which is extracted at least a factor of 2.5 faster than alternative features with comparable performance.
Factor Analysis of Drawings: Application to college student models of the greenhouse effect
NASA Astrophysics Data System (ADS)
Libarkin, Julie C.; Thomas, Stephen R.; Ording, Gabriel
2015-09-01
Exploratory factor analysis was used to identify models underlying drawings of the greenhouse effect made by over 200 entering university freshmen. Initial content analysis allowed deconstruction of drawings into salient features, with grouping of these features via factor analysis. A resulting 4-factor solution explains 62% of the data variance, suggesting that 4 archetype models of the greenhouse effect dominate thinking within this population. Factor scores, indicating the extent to which each student's drawing aligned with representative models, were compared to performance on conceptual understanding and attitudes measures, demographics, and non-cognitive features of drawings. Student drawings were also compared to drawings made by scientists to ascertain the extent to which models reflect more sophisticated and accurate models. Results indicate that student and scientist drawings share some similarities, most notably the presence of some features of the most sophisticated non-scientific model held among the study population. Prior knowledge, prior attitudes, gender, and non-cognitive components are also predictive of an individual student's model. This work presents a new technique for analyzing drawings, with general implications for the use of drawings in investigating student conceptions.
MRI textures as outcome predictor for Gamma Knife radiosurgery on vestibular schwannoma
NASA Astrophysics Data System (ADS)
Langenhuizen, P. P. J. H.; Legters, M. J. W.; Zinger, S.; Verheul, H. B.; Leenstra, S.; de With, P. H. N.
2018-02-01
Vestibular schwannomas (VS) are benign brain tumors that can be treated with high-precision focused radiation with the Gamma Knife in order to stop tumor growth. Outcome prediction of Gamma Knife radiosurgery (GKRS) treatment can help in determining whether GKRS will be effective on an individual patient basis. However, at present, prognostic factors of tumor control after GKRS for VS are largely unknown, and only clinical factors, such as size of the tumor at treatment and pre-treatment growth rate of the tumor, have been considered thus far. This research aims at outcome prediction of GKRS by means of quantitative texture feature analysis on conventional MRI scans. We compute first-order statistics and features based on gray-level co-occurrence matrices (GLCM) and run-length matrices (RLM), and employ support vector machines and decision trees for classification. In a clinical dataset, consisting of 20 tumors showing treatment failure and 20 tumors exhibiting treatment success, we have discovered that the second-order statistical metrics distilled from GLCM and RLM are suitable for describing texture, but are slightly outperformed by simple first-order statistics, like mean, standard deviation and median. The obtained prediction accuracy is about 85%, but a final choice of the best feature can only be made after performing more extensive analyses on larger datasets. In any case, this work provides suitable texture measures for successful prediction of GKRS treatment outcome for VS.
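For readers unfamiliar with the gray-level co-occurrence matrix mentioned in the abstract above, the following Python sketch builds one for a single pixel offset and derives the standard contrast measure from it. The 3x3 patch, the number of gray levels, and the choice of a horizontal offset are assumptions for illustration; the abstract does not state which offsets or GLCM features were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def contrast(matrix):
    """GLCM contrast: sum of (i - j)^2 weighted by normalized co-occurrence."""
    total = sum(sum(row) for row in matrix)
    return sum((i - j) ** 2 * count / total
               for i, row in enumerate(matrix)
               for j, count in enumerate(row))

# Hypothetical 3x3 image patch quantized to 3 gray levels.
patch = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
m = glcm(patch, levels=3)
```

First-order statistics such as the mean or median, by contrast, depend only on the histogram of gray levels and ignore how pixels are arranged relative to each other.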
Developing a clinical utility framework to evaluate prediction models in radiogenomics
NASA Astrophysics Data System (ADS)
Wu, Yirong; Liu, Jie; Munoz del Rio, Alejandro; Page, David C.; Alagoz, Oguzhan; Peissig, Peggy; Onitilo, Adedayo A.; Burnside, Elizabeth S.
2015-03-01
Combining imaging and genetic information to predict disease presence and behavior is being codified into an emerging discipline called "radiogenomics." Optimal evaluation methodologies for radiogenomics techniques have not been established. We aim to develop a clinical decision framework based on utility analysis to assess prediction models for breast cancer. Our data comes from a retrospective case-control study, collecting Gail model risk factors, genetic variants (single nucleotide polymorphisms-SNPs), and mammographic features in Breast Imaging Reporting and Data System (BI-RADS) lexicon. We first constructed three logistic regression models built on different sets of predictive features: (1) Gail, (2) Gail+SNP, and (3) Gail+SNP+BI-RADS. Then, we generated ROC curves for three models. After we assigned utility values for each category of findings (true negative, false positive, false negative and true positive), we pursued optimal operating points on ROC curves to achieve maximum expected utility (MEU) of breast cancer diagnosis. We used McNemar's test to compare the predictive performance of the three models. We found that SNPs and BI-RADS features augmented the baseline Gail model in terms of the area under ROC curve (AUC) and MEU. SNPs improved sensitivity of the Gail model (0.276 vs. 0.147) and reduced specificity (0.855 vs. 0.912). When additional mammographic features were added, sensitivity increased to 0.457 and specificity to 0.872. SNPs and mammographic features played a significant role in breast cancer risk estimation (p-value < 0.001). Our decision framework comprising utility analysis and McNemar's test provides a novel framework to evaluate prediction models in the realm of radiogenomics.
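The step of choosing an operating point by maximum expected utility, as described in the abstract above, can be sketched as follows in Python. The ROC points, the prevalence, and the four utility values are hypothetical placeholders; the abstract does not report the utilities that were actually assigned.

```python
def expected_utility(tpr, fpr, prevalence, utilities):
    """Expected utility of an operating point given the four outcome utilities."""
    u_tp, u_fn, u_fp, u_tn = utilities
    return (prevalence * (tpr * u_tp + (1 - tpr) * u_fn)
            + (1 - prevalence) * (fpr * u_fp + (1 - fpr) * u_tn))

def best_operating_point(roc_points, prevalence, utilities):
    """Return the (fpr, tpr) point on the ROC curve with maximum expected utility."""
    return max(roc_points,
               key=lambda pt: expected_utility(pt[1], pt[0], prevalence, utilities))

# Hypothetical ROC points (fpr, tpr) and utilities ordered as (TP, FN, FP, TN).
roc = [(0.0, 0.0), (0.1, 0.5), (0.3, 0.8), (1.0, 1.0)]
utils = (1.0, 0.0, 0.5, 1.0)
best = best_operating_point(roc, prevalence=0.5, utilities=utils)
```

With these placeholder numbers the intermediate point (0.3, 0.8) wins; raising the penalty for false positives would shift the optimum toward lower false positive rates.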
Sparse feature selection for classification and prediction of metastasis in endometrial cancer.
Ahsen, Mehmet Eren; Boren, Todd P; Singh, Nitin K; Misganaw, Burook; Mutch, David G; Moore, Kathleen N; Backes, Floor J; McCourt, Carolyn K; Lea, Jayanthi S; Miller, David S; White, Michael A; Vidyasagar, Mathukumalli
2017-03-27
Metastasis via pelvic and/or para-aortic lymph nodes is a major risk factor for endometrial cancer. Lymph-node resection ameliorates risk but is associated with significant co-morbidities. Incidence in patients with stage I disease is 4-22% but no mechanism exists to accurately predict it. Therefore, national guidelines for primary staging surgery include pelvic and para-aortic lymph node dissection for all patients whose tumor exceeds 2cm in diameter. We sought to identify a robust molecular signature that can accurately classify risk of lymph node metastasis in endometrial cancer patients. 86 tumors matched for age and race, and evenly distributed between lymph node-positive and lymph node-negative cases, were selected as a training cohort. Genomic micro-RNA expression was profiled for each sample to serve as the predictive feature matrix. An independent set of 28 tumor samples was collected and similarly characterized to serve as a test cohort. A feature selection algorithm was designed for applications where the number of samples is far smaller than the number of measured features per sample. A predictive miRNA expression signature was developed using this algorithm, which was then used to predict the metastatic status of the independent test cohort. A weighted classifier, using 18 micro-RNAs, achieved 100% accuracy on the training cohort. When applied to the testing cohort, the classifier correctly predicted 90% of node-positive cases, and 80% of node-negative cases (FDR = 6.25%). Results indicate that the evaluation of the quantitative sparse-feature classifier proposed here in clinical trials may lead to significant improvement in the prediction of lymphatic metastases in endometrial cancer patients.
NASA Astrophysics Data System (ADS)
Navares, Ricardo; Aznarte, José Luis
2017-04-01
In this paper, we approach the problem of predicting the concentrations of Poaceae pollen which define the main pollination season in the city of Madrid. A classification-based approach, based on a computational intelligence model (random forests), is applied to forecast the dates on which risk concentration levels are to be observed. Unlike previous works, the proposal extends the range of forecasting horizons up to 6 months ahead. Furthermore, the proposed model makes it possible to determine the most influential factors for each horizon, making no assumptions about the significance of the weather features. The performance of the proposed model proves it to be a successful tool for allergy patients in preventing and minimizing the exposure to risky pollen concentrations and for researchers to gain a deeper insight into the factors driving the pollination season.
Gonsalves, Valerie M; McLawsen, Julia E; Huss, Matthew T; Scalora, Mario J
2013-01-01
A wealth of research has underscored the strong relationship between PCL-R scores and recidivism. However, mounting criticism cites the PCL-R's cumbersome administration procedures and failure to adequately measure core features associated with the construct of psychopathy (Skeem, Polaschek, Patrick, & Lilienfeld, 2011). In light of these concerns, this study examined the PPI and the PPI-R, which were designed to measure core personality features associated with psychopathy (Lilienfeld & Andrews, 1996; Lilienfeld & Widows, 2005). Study one examined the PPI relative to the PCL-R and examined its factor structure. The instruments shared few significant correlations and neither the PCL-R nor the PPI significantly predicted recidivism. Study two examined the PPI-R relative to the PCL-R, the PPI, both history of violence and future criminal activity and measure of related constructs. The PPI-R was significantly correlated with measures of empathy and criminal thinking and the factors were related to a history of violence and predicted future violent criminal behavior. Copyright © 2013 Elsevier Ltd. All rights reserved.
Haltigan, John D; Vaillancourt, Tracy
2016-03-01
To examine trajectories of adolescent borderline personality (BP) features in a normative-risk cohort (n = 566) of Canadian children assessed at ages 13, 14, 15, and 16 and childhood predictors of trajectory group membership assessed at ages 8, 10, 11, and 12. Data were drawn from the McMaster Teen Study, an on-going study examining relations among bullying, mental health, and academic achievement. Participants and their parents completed a battery of mental health and peer relations questionnaires at each wave of the study. Academic competence was assessed at age 8 (Grade 3). Latent class growth analysis, analysis of variance, and logistic regression were used to analyze the data. Three distinct BP features trajectory groups were identified: elevated or rising, intermediate or stable, and low or stable. Parent- and child-reported mental health symptoms, peer relations risk factors, and intra-individual risk factors were significant predictors of elevated or rising and intermediate or stable trajectory groups. Child-reported attention-deficit hyperactivity disorder (ADHD) and somatization symptoms uniquely predicted elevated or rising trajectory group membership, whereas parent-reported anxiety and child-reported ADHD symptoms uniquely predicted intermediate or stable trajectory group membership. Child-reported somatization symptoms was the only predictor to differentiate the intermediate or stable and elevated or rising trajectory groups (OR 1.15, 95% CI 1.04 to 1.28). Associations between child-reported reactive temperament and elevated BP features trajectory group membership were 10.23 times higher among children who were bullied, supporting a diathesis-stress pathway in the development of BP features for these youth. Findings demonstrate the heterogeneous course of BP features in early adolescence and shed light on the potential prodromal course of later borderline personality disorder. © The Author(s) 2015.
Statistical Analysis of Complexity Generators for Cost Estimation
NASA Technical Reports Server (NTRS)
Rowell, Ginger Holmes
1999-01-01
Predicting the cost of cutting edge new technologies involved with spacecraft hardware can be quite complicated. A new feature of the NASA Air Force Cost Model (NAFCOM), called the Complexity Generator, is being developed to model the complexity factors that drive the cost of space hardware. This parametric approach is also designed to account for the differences in cost, based on factors that are unique to each system and subsystem. The cost driver categories included in this model are weight, inheritance from previous missions, technical complexity, and management factors. This paper explains the Complexity Generator framework, the statistical methods used to select the best model within this framework, and the procedures used to find the region of predictability and the prediction intervals for the cost of a mission.
Mwangi, Benson; Ebmeier, Klaus P; Matthews, Keith; Steele, J Douglas
2012-05-01
Quantitative abnormalities of brain structure in patients with major depressive disorder have been reported at a group level for decades. However, these structural differences appear subtle in comparison with conventional radiologically defined abnormalities, with considerable inter-subject variability. Consequently, it has not been possible to readily identify scans from patients with major depressive disorder at an individual level. Recently, machine learning techniques such as relevance vector machines and support vector machines have been applied to predictive classification of individual scans with variable success. Here we describe a novel hybrid method, which combines machine learning with feature selection and characterization, with the latter aimed at maximizing the accuracy of machine learning prediction. The method was tested using a multi-centre dataset of T1-weighted 'structural' scans. A total of 62 patients with major depressive disorder and matched controls were recruited from referred secondary care clinical populations in Aberdeen and Edinburgh, UK. The generalization ability and predictive accuracy of the classifiers was tested using data left out of the training process. High prediction accuracy was achieved (~90%). While feature selection was important for maximizing high predictive accuracy with machine learning, feature characterization contributed only a modest improvement to relevance vector machine-based prediction (~5%). Notably, while the only information provided for training the classifiers was T1-weighted scans plus a categorical label (major depressive disorder versus controls), both relevance vector machine and support vector machine 'weighting factors' (used for making predictions) correlated strongly with subjective ratings of illness severity. 
These results indicate that machine learning techniques have the potential to inform clinical practice and research, as they can make accurate predictions about brain scan data from individual subjects. Furthermore, machine learning weighting factors may reflect an objective biomarker of major depressive disorder illness severity, based on abnormalities of brain structure.
2009-01-01
Background The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential genes discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes. Results We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes that was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality. Conclusion We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality. PMID:19758426
Stolt, S; Korja, R; Matomäki, J; Lapinleimu, H; Haataja, L; Lehtonen, L
2014-05-01
It is not clearly understood how the quality of early mother-child interaction influences language development in very-low-birth-weight (VLBW) children. We aimed to analyze associations between early language and the quality of mother-child interaction, and the predictive value of the features of early mother-child interaction for language development at 24 months of corrected age in VLBW children. A longitudinal prospective follow-up study design was used. The participants were 28 VLBW children and 34 full-term controls. Language development was measured using different methods at 6, 12 and 24 months of age. The quality of mother-child interaction was assessed using the PC-ERA method at 6 and 12 months of age. Associations between the features of early interaction and language development differed between the VLBW and full-term groups. There were no significant correlations between the features of mother-child interaction and language skills when measured at the same age in the VLBW group. Significant longitudinal correlations were detected in the VLBW group, especially when the quality of early interaction was measured at six months and language skills at 2 years of age. However, when the predictive value of the features of early interaction for later poor language performance was analyzed separately, the features of early interaction predicted language skills in the VLBW group only weakly. Biological factors may influence language development more in VLBW children than in full-term children. The results also underline the role of maternal and dyadic factors in early interactions.
Predictive factors of thyroid cancer in patients with Graves' disease.
Ren, Meng; Wu, Mu Chao; Shang, Chang Zhen; Wang, Xiao Yi; Zhang, Jing Lu; Cheng, Hua; Xu, Ming Tong; Yan, Li
2014-01-01
The best preoperative examination in Graves' disease with thyroid cancer still remains uncertain. The objectives of the present study were to investigate the prevalence of thyroid cancer in Graves' disease patients, and to identify the predictive factors and ultrasonographic features of thyroid cancer that may aid the preoperative diagnosis in Graves' disease. This retrospective study included 423 patients with Graves' disease who underwent surgical treatment from 2002 to 2012 at our institution. The clinical features and ultrasonographic findings of thyroid nodules were recorded. The diagnosis of thyroid cancer was determined according to the pathological results. Thyroid cancer was discovered in 58 of the 423 (13.7 %) surgically treated Graves' disease patients; 46 of those 58 patients had thyroid nodules, and the other 12 patients were diagnosed with incidentally discovered thyroid carcinomas without thyroid nodules. Among the 58 patients with thyroid cancer, papillary microcarcinomas were discovered in 50 patients, and multifocality and lymph node involvement were detected in the other 8 patients. Multivariate regression analysis showed younger age was the only significant factor predictive of metastatic thyroid cancer. Ultrasonographic findings of calcification and intranodular blood flow in thyroid nodules indicate that they are more likely to harbor thyroid cancers. Because the influencing factor of metastatic thyroid cancers in Graves' disease is young age, every suspicious nodule in Graves' disease patients should be evaluated and treated carefully, especially in younger patients because of the potential for metastasis.
Khodayari-Rostamabad, Ahmad; Reilly, James P; Hasey, Gary M; de Bruin, Hubert; Maccrimmon, Duncan J
2013-10-01
The problem of identifying, in advance, the most effective treatment agent for various psychiatric conditions remains an elusive goal. To address this challenge, we investigate the performance of the proposed machine learning (ML) methodology (based on the pre-treatment electroencephalogram (EEG)) for prediction of response to treatment with a selective serotonin reuptake inhibitor (SSRI) medication in subjects suffering from major depressive disorder (MDD). A relatively small number of most discriminating features are selected from a large group of candidate features extracted from the subject's pre-treatment EEG, using a machine learning procedure for feature selection. The selected features are fed into a classifier, which was realized as a mixture of factor analysis (MFA) model, whose output is the predicted response in the form of a likelihood value. This likelihood indicates the extent to which the subject belongs to the responder vs. non-responder classes. The overall method was evaluated using a "leave-n-out" randomized permutation cross-validation procedure. A list of discriminating EEG biomarkers (features) was found. The specificity of the proposed method is 80.9% while sensitivity is 94.9%, for an overall prediction accuracy of 87.9%. There is a 98.76% confidence that the estimated prediction rate is within the interval [75%, 100%]. These results indicate that the proposed ML method holds considerable promise in predicting the efficacy of SSRI antidepressant therapy for MDD, based on a simple and cost-effective pre-treatment EEG. The proposed approach offers the potential to improve the treatment of major depression and to reduce health care costs.
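A worked note on the figures in this abstract: overall accuracy is a prevalence-weighted mean of sensitivity and specificity. The sketch below assumes, which the abstract does not state, that responders and non-responders were equally represented; under that assumption the reported 94.9% and 80.9% reproduce the reported 87.9%. The function name and the `prevalence` parameter are illustrative only.

```python
def overall_accuracy(sensitivity, specificity, prevalence):
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Assumption: balanced classes (prevalence 0.5); the two rates come from the abstract.
accuracy = overall_accuracy(0.949, 0.809, 0.5)
print(round(accuracy, 3))
```

With unequal class sizes the same sensitivity and specificity would give a different overall accuracy, so the match only suggests, rather than proves, a balanced evaluation.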
Shouval, Roni; Labopin, Myriam; Unger, Ron; Giebel, Sebastian; Ciceri, Fabio; Schmid, Christoph; Esteve, Jordi; Baron, Frederic; Gorin, Norbert Claude; Savani, Bipin; Shimoni, Avichai; Mohty, Mohamad; Nagler, Arnon
2016-01-01
Models for prediction of allogeneic hematopoietic stem cell transplantation (HSCT) related mortality only partially account for transplant risk. Improving predictive accuracy requires an understanding of the factors that limit prediction, such as the statistical methodology used, the number and quality of features collected, or simply the population size. Using an in-silico approach (i.e., iterative computerized simulations) based on machine learning (ML) algorithms, we set out to analyze these factors. A cohort of 25,923 adult acute leukemia patients from the European Society for Blood and Marrow Transplantation (EBMT) registry was analyzed. The predictive objective was non-relapse mortality (NRM) 100 days following HSCT. Thousands of prediction models were developed under varying conditions: increasing sample size, specific subpopulations and an increasing number of variables, which were selected and ranked by separate feature selection algorithms. Depending on the algorithm, predictive performance plateaued at a population size of 6,611-8,814 patients, reaching a maximal area under the receiver operator characteristic curve (AUC) of 0.67. AUCs of models developed on specific subpopulations ranged from 0.59 to 0.67 for patients in second complete remission and receiving reduced intensity conditioning, respectively. Only 3-5 variables were necessary to achieve near maximal AUCs. The top 3 ranking variables, shared by all algorithms, were disease stage, donor type, and conditioning regimen. Our findings empirically demonstrate that, with regard to NRM prediction, few variables "carry the weight" and that traditional HSCT data has been "worn out". "Breaking through" the predictive boundaries will likely require additional types of inputs.
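For readers less familiar with the AUC reported above, the following is a minimal sketch of how it can be computed from binary outcomes and model scores, using the rank-based interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). The function name and the toy data are assumptions for illustration; they are not from the study.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = event within 100 days, 0 = no event.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, which puts the reported ceiling of 0.67 in context.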
You, Dokyoung S; Meagher, Mary W
2017-01-01
Individuals with greater borderline personality features may be vulnerable to chronic pain. Because pain is an unpleasant sensory and emotional experience, affect dysregulation as the core personality feature may be linked to pain hypersensitivity. Studies have found that greater borderline features are associated with increased intensity in clinical and experimental pain, and that depression mediates this increase. The current study further examined the association between borderline features and heat pain sensitivity, the contribution of affect dysregulation and the other borderline personality factors (identity problems, negative relationships, self-harming/impulsivity) to the association, and depression as a mediator. Additionally, we examined whether blunted sympathetic responses mediate the association between borderline features and temporal summation of second pain (TSSP). Thermal pain threshold, thermal TSSP and aftersensations pain were assessed in 79 healthy individuals with varying degrees of borderline features. TSSP is a proxy measure for central sensitization and refers to the gradual increase in pain to repeated nociceptive stimuli. A regression analysis showed that greater borderline features predicted greater TSSP (β = .22, p = .050, R² = .05). Borderline features were unrelated to pain threshold and TSSP decay. A stepwise regression showed greater TSSP in individuals with greater borderline features was accounted for by the negative relationships factor rather than the affect dysregulation factor. The results of mediational analyses showed depression and blunted sympathetic skin conductance responses mediated the positive association between TSSP and borderline features.
Fatigue Analyses Under Constant- and Variable-Amplitude Loading Using Small-Crack Theory
NASA Technical Reports Server (NTRS)
Newman, J. C., Jr.; Phillips, E. P.; Everett, R. A., Jr.
1999-01-01
Studies on the growth of small cracks have led to the observation that fatigue life of many engineering materials is primarily "crack growth" from micro-structural features, such as inclusion particles, voids, slip-bands or from manufacturing defects. This paper reviews the capabilities of a plasticity-induced crack-closure model to predict fatigue lives of metallic materials using "small-crack theory" under various loading conditions. Constraint factors, to account for three-dimensional effects, were selected to correlate large-crack growth rate data as a function of the effective stress-intensity factor range (delta-Keff) under constant-amplitude loading. Modifications to the delta-Keff-rate relations in the near-threshold regime were needed to fit measured small-crack growth rate behavior. The model was then used to calculate small-and large-crack growth rates, and to predict total fatigue lives, for notched and un-notched specimens under constant-amplitude and spectrum loading. Fatigue lives were predicted using crack-growth relations and micro-structural features like those that initiated cracks in the fatigue specimens for most of the materials analyzed. Results from the tests and analyses agreed well.
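As background for the effective stress-intensity factor range mentioned above, the relations below give the standard crack-closure form in LaTeX. This is a sketch under stated assumptions: the symbols $S_o$ (crack-opening stress), $S_{\max}$ (maximum applied stress), $R$ (stress ratio), and the fitted constants $C$ and $n$ are notation introduced here, not taken from the report, and the simple power law ignores the near-threshold modifications the abstract says were needed.

```latex
\Delta K_{\mathrm{eff}} = U \, \Delta K,
\qquad
U = \frac{1 - S_o / S_{\max}}{1 - R},
\qquad
\frac{dc}{dN} = C \left( \Delta K_{\mathrm{eff}} \right)^{n}
```

Here $c$ is crack length and $N$ is the number of load cycles; integrating the rate relation from an initial flaw size to failure gives a predicted fatigue life.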
Seminal quality prediction using data mining methods.
Sahoo, Anoop J; Kumar, Yugal
2014-01-01
Recently, new classes of diseases known as lifestyle diseases have emerged. The main causes of these diseases are changes in lifestyle, such as alcohol drinking, smoking and food habits. Among the various lifestyle diseases, it has been found that fertility rates (sperm quantity) in men have decreased considerably in the last two decades. Lifestyle factors as well as environmental factors are mainly responsible for the change in semen quality. The objective of this paper is to identify the lifestyle and environmental features that affect seminal quality and fertility rate in men using data mining methods. Five artificial intelligence techniques, namely multilayer perceptron (MLP), decision tree (DT), Naive Bayes (Kernel), support vector machine plus particle swarm optimization (SVM+PSO) and support vector machine (SVM), were applied to a fertility dataset to evaluate seminal quality and to predict whether a person is normal or has an altered fertility rate. Eight feature selection techniques, namely support vector machine (SVM), neural network (NN), evolutionary logistic regression (LR), SVM+PSO, principal component analysis (PCA), chi-square test, correlation and t-test, were used to identify the more relevant features affecting seminal quality. These techniques were applied to a fertility dataset containing 100 instances with nine attributes and two classes. The experimental results show that SVM+PSO provides higher accuracy and area under the curve (AUC) (94% & 0.932) than MLP (92% & 0.728), SVM (91% & 0.758), Naive Bayes (Kernel) (89% & 0.850) and decision tree (89% & 0.735) for some of the seminal parameters. This paper also focuses on the feature selection process, i.e., how to select the features that are more important for prediction of fertility rate. Eight feature selection methods are applied to the fertility dataset to find a set of good features. The results show that the childhood diseases (0.079) and high fever (0.057) features have less impact on fertility rate, while the age (0.8685), season (0.843), surgical intervention (0.7683), alcohol consumption (0.5992), smoking habit (0.575), number of hours spent sitting (0.4366) and accident (0.5973) features have more impact. Feature selection also increases the accuracy of the above-mentioned techniques (multilayer perceptron 92%, support vector machine 91%, SVM+PSO 94%, Naive Bayes (Kernel) 89% and decision tree 89%) compared with no feature selection (multilayer perceptron 86%, support vector machine 86%, SVM+PSO 85%, Naive Bayes (Kernel) 83% and decision tree 84%), which shows the applicability of feature selection methods in prediction. This paper highlights the application of artificial intelligence techniques in the medical domain. It can be concluded that data mining methods can be used to predict whether a person has a disease based on environmental and lifestyle parameters/features rather than by undergoing various medical tests. Of the five data mining techniques used to predict the fertility rate, SVM+PSO provides more accurate results than support vector machine and decision tree.
Sturtz, Timothy M; Schichtel, Bret A; Larson, Timothy V
2014-10-07
Source contributions to total fine particle carbon predicted by a chemical transport model (CTM) were incorporated into the positive matrix factorization (PMF) receptor model to form a receptor-oriented hybrid model. The level of influence of the CTM versus traditional PMF was varied using a weighting parameter applied to an object function as implemented in the Multilinear Engine (ME-2). The methodology provides the ability to separate features that would not be identified using PMF alone, without sacrificing fit to observations. The hybrid model was applied to IMPROVE data taken from 2006 through 2008 at Monture and Sula Peak, Montana. It was able to separately identify major contributions of total carbon (TC) from wildfires and minor contributions from biogenic sources. The predictions of TC had a lower cross-validated RMSE than those from either PMF or CTM alone. Two unconstrained, minor features were identified at each site, a soil derived feature with elevated summer impacts and a feature enriched in sulfate and nitrate with significant, but sporadic contributions across the sampling period. The respective mean TC contributions from wildfires, biogenic emissions, and other sources were 1.18, 0.12, and 0.12 ugC/m(3) at Monture and 1.60, 0.44, and 0.06 ugC/m(3) at Sula Peak.
Occlusal factors are not related to self-reported bruxism.
Manfredini, Daniele; Visscher, Corine M; Guarda-Nardini, Luca; Lobbezoo, Frank
2012-01-01
To estimate the contribution of various occlusal features of the natural dentition that may identify self-reported bruxers compared to nonbruxers. Two age- and sex-matched groups of self-reported bruxers (n = 67) and self-reported nonbruxers (n = 75) took part in the study. For each patient, the following occlusal features were clinically assessed: retruded contact position (RCP) to intercuspal contact position (ICP) slide length (< 2 mm was considered normal), vertical overlap (< 0 mm was considered an anterior open bite; > 4 mm, a deep bite), horizontal overlap (> 4 mm was considered a large horizontal overlap), incisor dental midline discrepancy (< 2 mm was considered normal), and the presence of a unilateral posterior crossbite, mediotrusive interferences, and laterotrusive interferences. A multiple logistic regression model was used to identify the significant associations between the assessed occlusal features (independent variables) and self-reported bruxism (dependent variable). Accuracy values to predict self-reported bruxism were unacceptable for all occlusal variables. The only variable remaining in the final regression model was laterotrusive interferences (P = .030). The percentage of explained variance for bruxism by the final multiple regression model was 4.6%. This model including only one occlusal factor showed low positive (58.1%) and negative predictive values (59.7%), thus showing a poor accuracy to predict the presence of self-reported bruxism (59.2%). This investigation suggested that the contribution of occlusion to the differentiation between bruxers and nonbruxers is negligible. This finding supports theories that advocate a much diminished role for peripheral anatomical-structural factors in the pathogenesis of bruxism.
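The positive and negative predictive values and the accuracy quoted above all derive from a 2x2 table of predicted versus actual status. The sketch below shows the arithmetic; the counts in the example are hypothetical and are not the study's data, and the function name is an assumption.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV, accuracy) from the four cells of a 2x2 confusion table."""
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive and one negative prediction")
    ppv = tp / (tp + fp)                        # share of predicted positives that are true
    npv = tn / (tn + fn)                        # share of predicted negatives that are true
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
    return ppv, npv, accuracy


# Hypothetical counts for illustration only.
ppv, npv, accuracy = predictive_values(tp=30, fp=20, fn=10, tn=40)
print(ppv, npv, accuracy)  # 0.6 0.8 0.7
```

With two groups of roughly equal size, as in this study, values near 50% indicate that the model is performing close to chance.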
Li, Mingzhong; Xue, Jianquan; Li, Yanchao; Tang, Shukai
2014-01-01
Considering the influence of particle shape and the rheological properties of the fluid, two artificial intelligence methods (Artificial Neural Network and Support Vector Machine) were used to predict the wall factor, which is widely used to deduce the net hydrodynamic drag force of confining boundaries on settling particles. A total of 513 data points were culled from the experimental data of previous studies and divided into a training set and a test set. Particles with various shapes were divided into three kinds: sphere, cylinder, and rectangular prism; feature parameters of each kind of particle were extracted; prediction models for spheres and cylinders were established using an artificial neural network. Because of the small number of rectangular prism samples, a support vector machine, which is better suited to small-sample problems, was used to predict the wall factor for that shape. The characteristic dimension was introduced to describe the shape and size of the diverse particles, and a comprehensive prediction model for particles with arbitrary shapes was established to cover all types of conditions. Comparisons were conducted between the predicted values and the experimental results. PMID:24772024
Li, Mingzhong; Zhang, Guodong; Xue, Jianquan; Li, Yanchao; Tang, Shukai
2014-01-01
Considering the influence of particle shape and the rheological properties of the fluid, two artificial intelligence methods (Artificial Neural Network and Support Vector Machine) were used to predict the wall factor, which is widely used to deduce the net hydrodynamic drag force of confining boundaries on settling particles. A total of 513 data points were culled from the experimental data of previous studies and divided into a training set and a test set. Particles with various shapes were divided into three kinds: sphere, cylinder, and rectangular prism; feature parameters of each kind of particle were extracted; prediction models for spheres and cylinders were established using an artificial neural network. Because of the small number of rectangular prism samples, a support vector machine, which is better suited to small-sample problems, was used to predict the wall factor for that shape. The characteristic dimension was introduced to describe the shape and size of the diverse particles, and a comprehensive prediction model for particles with arbitrary shapes was established to cover all types of conditions. Comparisons were conducted between the predicted values and the experimental results.
Jaber, Mohammed; Wölfer, Johannes; Ewelt, Christian; Holling, Markus; Hasselblatt, Martin; Niederstadt, Thomas; Zoubi, Tarek; Weckesser, Matthias
2015-01-01
BACKGROUND: Approximately 20% of grade II and most grade III gliomas fluoresce after 5-aminolevulinic acid (5-ALA) application. Conversely, approximately 30% of nonenhancing gliomas are actually high grade. OBJECTIVE: The aim of this study was to identify preoperative factors (ie, age, enhancement, 18F-fluoroethyl tyrosine positron emission tomography [18F-FET PET] uptake ratios) for predicting fluorescence in gliomas without typical glioblastoma imaging features and to determine whether fluorescence will allow prediction of tumor grade or molecular characteristics. METHODS: Patients harboring gliomas without typical glioblastoma imaging features were given 5-ALA. Fluorescence was recorded intraoperatively, and biopsy specimens were collected from fluorescing tissue. World Health Organization (WHO) grade, Ki-67/MIB-1 index, IDH1 (R132H) mutation status, O6-methylguanine DNA methyltransferase (MGMT) promoter methylation status, and 1p/19q co-deletion status were assessed. Predictive factors for fluorescence were derived from preoperative magnetic resonance imaging and 18F-FET PET. Classification and regression tree analysis and receiver-operating-characteristic curves were generated for defining predictors. RESULTS: Of 166 tumors, 82 were diagnosed as WHO grade II, 76 as grade III, and 8 as grade IV glioblastomas. Contrast enhancement, tumor volume, and 18F-FET PET uptake ratio >1.85 predicted fluorescence. Fluorescence correlated with WHO grade (P < .001) and Ki-67/MIB-1 index (P < .001), but not with MGMT promoter methylation status, IDH1 mutation status, or 1p/19q co-deletion status. The Ki-67/MIB-1 index in fluorescing grade III gliomas was higher than in nonfluorescing tumors, whereas in fluorescing and nonfluorescing grade II tumors, no differences were noted. CONCLUSION: Age, tumor volume, and 18F-FET PET uptake are factors predicting 5-ALA-induced fluorescence in gliomas without typical glioblastoma imaging features. Fluorescence was associated with an increased Ki-67/MIB-1 index and high-grade pathology. Whether fluorescence in grade II gliomas identifies a subtype with worse prognosis remains to be determined. ABBREVIATIONS: 5-ALA, 5-aminolevulinic acid; CRT, classification and regression tree; 18F-FET PET, 18F-fluoroethyl tyrosine positron emission tomography; FLAIR, fluid-attenuated inversion recovery; GBM, glioblastoma multiforme; O6-MGMT, methylguanine DNA methyltransferase; ROC, receiver-operating characteristic; SUV, standardized uptake value; WHO, World Health Organization. PMID:26366972
Optimal designs for prediction studies of whiplash.
Kamper, Steven J; Hancock, Mark J; Maher, Christopher G
2011-12-01
Commentary. To provide guidance for the design and interpretation of predictive studies of whiplash associated disorders (WAD). Numerous studies have sought to define and explain the clinical course and response to treatment of people with WAD. Design of these studies is often suboptimal, which can lead to biased findings and issues with interpreting the results. Literature review and commentary. Predictive studies can be grouped into four broad categories; studies of symptomatic course, studies that aim to identify factors that predict outcome, studies that aim to isolate variables that are causally responsible for outcome, and studies that aim to identify patients who respond best to particular treatments. Although the specific research question will determine the optimal methods, there are a number of generic features that should be incorporated into design of such studies. The aim of these features is to minimize bias, generate adequately precise prognostic estimates, and ensure generalizability of the findings. This paper provides a summary of important considerations in the design, conduct, and reporting of prediction studies in the field of whiplash.
NASA Astrophysics Data System (ADS)
Song, Jiangdian; Zang, Yali; Li, Weimin; Zhong, Wenzhao; Shi, Jingyun; Dong, Di; Fang, Mengjie; Liu, Zaiyi; Tian, Jie
2017-03-01
Accurately predicting the risk of disease progression and the benefit of tyrosine kinase inhibitor (TKI) therapy for stage IV non-small cell lung cancer (NSCLC) patients with activating epidermal growth factor receptor (EGFR) mutations is challenging with current staging methods. We postulated that integrating a classifier consisting of multiple computed tomography (CT) phenotypic features and other clinicopathological risk factors into a single model could improve risk stratification and prediction of progression-free survival (PFS) on EGFR TKIs for these patients. Patients with confirmed stage IV EGFR-mutant NSCLC who received EGFR TKIs without resection and who underwent pretreatment contrast-enhanced CT approximately 2 weeks before treatment were enrolled. A six-CT-phenotypic-feature-based classifier constructed with the LASSO Cox regression model and three clinicopathological factors (pathologic N category, performance status (PS) score, and intrapulmonary metastasis status) were used to construct a nomogram in a training set of 115 patients. The prognostic and predictive accuracy of this nomogram was then assessed in an external independent validation set of 107 patients. PFS did not differ significantly between the training and independent validation sets by the Mann-Whitney U test (P = 0.2670). Predicted PFS showed good consistency with actual survival. The C-index of the proposed individualized nomogram in the training set (0.707, 95% CI: 0.643, 0.771) and in the independent validation set (0.715, 95% CI: 0.650, 0.780) showed its potential for predicting PFS on EGFR TKIs in stage IV EGFR-mutant NSCLC. The individualized nomogram might facilitate patient counselling and individualise management of patients with this disease.
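The C-index used to evaluate the nomogram can be illustrated with a minimal sketch of Harrell's concordance index for right-censored survival data. This is a generic textbook formulation, not the authors' code; the function name, the handling of ties, and the toy data are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs where the higher risk fails first.

    times  -- observed follow-up times
    events -- 1 if progression was observed, 0 if censored
    risks  -- model risk scores (higher means earlier expected progression)
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when subject i had an observed event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, one censored at 6 months.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

A value of 0.5 means the risk score orders patients no better than chance, and 1.0 means perfect ordering.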
Patient Stratification Using Electronic Health Records from a Chronic Disease Management Program.
Chen, Robert; Sun, Jimeng; Dittus, Robert S; Fabbri, Daniel; Kirby, Jacqueline; Laffer, Cheryl L; McNaughton, Candace D; Malin, Bradley
2016-01-04
The goal of this study is to devise a machine learning framework to assist care coordination programs in prognostic stratification to design and deliver personalized care plans and to allocate financial and medical resources effectively. This study is based on a de-identified cohort of 2,521 hypertension patients from a chronic care coordination program at the Vanderbilt University Medical Center. Patients were modeled as vectors of features derived from electronic health records (EHRs) over a six-year period. We applied a stepwise regression to identify risk factors associated with a decrease in mean arterial pressure of at least 2 mmHg after program enrollment. The resulting features were subsequently validated via a logistic regression classifier. Finally, risk factors were applied to group the patients through model-based clustering. We identified a set of predictive features that consisted of a mix of demographic, medication, and diagnostic concepts. Logistic regression over these features yielded an area under the ROC curve (AUC) of 0.71 (95% CI: [0.67, 0.76]). Based on these features, four clinically meaningful groups are identified through clustering - two of which represented patients with more severe disease profiles, while the remaining represented patients with mild disease profiles. Patients with hypertension can exhibit significant variation in their blood pressure control status and responsiveness to therapy. Yet this work shows that a clustering analysis can generate more homogeneous patient groups, which may aid clinicians in designing and implementing customized care programs. The study shows that predictive modeling and clustering using EHR data can be beneficial for providing a systematic, generalized approach for care providers to tailor their management approach based upon patient-level factors.
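The outcome described above, a drop in mean arterial pressure (MAP) of at least 2 mmHg, can be sketched as a simple labeling function. The abstract does not say how MAP was derived, so the common clinical approximation (diastolic pressure plus one third of the pulse pressure) is an assumption here, as are the function names and the example readings.

```python
def mean_arterial_pressure(systolic, diastolic):
    """Approximate MAP (mmHg) as diastolic plus one third of the pulse pressure."""
    if systolic < diastolic:
        raise ValueError("systolic pressure should not be below diastolic")
    return diastolic + (systolic - diastolic) / 3.0


def map_improved(before, after, threshold=2.0):
    """True when MAP fell by at least `threshold` mmHg; arguments are (SBP, DBP) pairs."""
    drop = mean_arterial_pressure(*before) - mean_arterial_pressure(*after)
    return drop >= threshold


# Hypothetical readings before and after enrollment.
print(map_improved((150, 95), (140, 90)))  # True
print(map_improved((130, 85), (129, 85)))  # False
```

A binary label of this kind is what a logistic regression classifier, as mentioned in the abstract, would be trained to predict.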
Annoyance from industrial noise: indicators for a wide variety of industrial sources.
Alayrac, M; Marquis-Favre, C; Viollon, S; Morel, J; Le Nost, G
2010-09-01
In the study of noises generated by industrial sources, one issue is the variety of industrial noise sources and consequently the complexity of noises generated. Therefore, characterizing the environmental impact of an industrial plant requires better understanding of the noise annoyance caused by industrial noise sources. To deal with the variety of industrial sources, the proposed approach is set up by type of spectral features and based on a perceptive typology of steady and permanent industrial noises comprising six categories. For each perceptive category, listening tests based on acoustical factors are performed on noise annoyance. Various indicators are necessary to predict noise annoyance due to various industrial noise sources. Depending on the spectral features of the industrial noise sources, noise annoyance indicators are thus assessed. In case of industrial noise sources without main spectral features such as broadband noise, noise annoyance is predicted by the A-weighted sound pressure level L(Aeq) or the loudness level L(N). For industrial noises with spectral components such as low-frequency noises with a main component at 100 Hz or noises with spectral components in middle frequencies, indicators are proposed here that allow good prediction of noise annoyance by taking into account spectral features.
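The equivalent continuous level L(Aeq) mentioned above is an energy average of sound pressure levels over time. The sketch below assumes the input samples are already A-weighted levels in decibels taken at equal time intervals; the function name and sample values are illustrative.

```python
import math


def equivalent_level(levels_db):
    """Energy-average a sequence of equally spaced sound levels (dB) into one Leq value."""
    if not levels_db:
        raise ValueError("need at least one level")
    # Convert each level to relative energy, average, then convert back to decibels.
    mean_energy = sum(10 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


# Hypothetical A-weighted samples in dB.
print(round(equivalent_level([50.0, 60.0]), 2))  # 57.4
```

Because the average is taken over energy rather than over decibel values, louder intervals dominate: the result for 50 dB and 60 dB sits well above their arithmetic mean of 55 dB.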
Latent feature decompositions for integrative analysis of multi-platform genomic data
Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran
2015-01-01
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features. PMID:26146492
A new approach to modeling the influence of image features on fixation selection in scenes
Nuthmann, Antje; Einhäuser, Wolfgang
2015-01-01
Which image characteristics predict where people fixate when memorizing natural images? To answer this question, we introduce a new analysis approach that combines a novel scene-patch analysis with generalized linear mixed models (GLMMs). Our method allows for (1) directly describing the relationship between continuous feature value and fixation probability, and (2) assessing each feature's unique contribution to fixation selection. To demonstrate this method, we estimated the relative contribution of various image features to fixation selection: luminance and luminance contrast (low-level features); edge density (a mid-level feature); visual clutter and image segmentation to approximate local object density in the scene (higher-level features). An additional predictor captured the central bias of fixation. The GLMM results revealed that edge density, clutter, and the number of homogenous segments in a patch can independently predict whether image patches are fixated or not. Importantly, neither luminance nor contrast had an independent effect above and beyond what could be accounted for by the other predictors. Since the parcellation of the scene and the selection of features can be tailored to the specific research question, our approach allows for assessing the interplay of various factors relevant for fixation selection in scenes in a powerful and flexible manner. PMID:25752239
NASA Astrophysics Data System (ADS)
Tan, Maxine; Leader, Joseph K.; Liu, Hong; Zheng, Bin
2015-03-01
We recently investigated a new mammographic image feature based risk factor to predict near-term breast cancer risk after a woman has a negative mammographic screening. We hypothesized that unlike the conventional epidemiology-based long-term (or lifetime) risk factors, the mammographic image feature based risk factor value will increase as the time lag between the negative and positive mammography screening decreases. The purpose of this study is to test this hypothesis. From a large and diverse full-field digital mammography (FFDM) image database with 1278 cases, we collected all available sequential FFDM examinations for each case including the "current" and the 1 to 3 most recent "prior" examinations. All "prior" examinations were interpreted as negative, and "current" ones were either malignant or recalled negative/benign. We computed 92 global mammographic texture and density based features, and included three clinical risk factors (woman's age, family history and subjective breast density BIRADS ratings). On this initial feature set, we applied a fast and accurate Sequential Forward Floating Selection (SFFS) feature selection algorithm to reduce feature dimensionality. The features computed on the two mammographic views were trained separately using two artificial neural network (ANN) classifiers. The classification scores of the two ANNs were then merged with a sequential ANN. The results show that the maximum adjusted odds ratios were 5.59, 7.98, and 15.77 for using the 3rd, 2nd, and 1st "prior" FFDM examinations, respectively, which demonstrates a higher association of mammographic image feature change and an increasing risk trend of developing breast cancer in the near-term after a negative screening.
Jiang, Xiaoyu; Fuchs, Mathias
2017-01-01
As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility. PMID:28546826
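The penalty-factor idea in the abstract above can be illustrated with a small sketch. It is not the ipflasso package's code: it is a plain coordinate-descent LASSO in pure Python in which each feature's L1 penalty is scaled by the factor of the modality it belongs to, assuming centered data and no intercept. The function names (expand_penalty_factors, lasso_with_penalty_factors) and the toy numbers are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def expand_penalty_factors(block_sizes, factors):
    """Repeat one penalty factor per modality across that modality's features."""
    expanded = []
    for size, factor in zip(block_sizes, factors):
        expanded.extend([factor] * size)
    return expanded


def lasso_with_penalty_factors(X, y, lam, penalty_factors, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam * sum_j pf_j * |b_j|.

    Assumes X and y are already centered (no intercept is fitted).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam * penalty_factors[j]) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data (hypothetical): modality A has one feature, modality B has one feature.
X = [[1.0, 1.0], [-1.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
y = [2.0, -2.0, 4.0, -4.0]
pf = expand_penalty_factors([1, 1], [0.0, 10.0])  # modality A unpenalised, B heavily penalised
print(lasso_with_penalty_factors(X, y, lam=0.5, penalty_factors=pf))
```

In this sketch a larger factor for a modality shrinks its coefficients harder, so a modality believed to be less informative contributes fewer selected variables; choosing the factors by cross-validation, as the abstract describes, would wrap this function in an outer loop.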
Kudumija Slijepcevic, Marija; Jukic, Vlado; Novalic, Darko; Zarkovic-Palijan, Tija; Milosevic, Milan; Rosenzweig, Ivana
2014-04-01
To determine predictive risk factors for violent offending in patients with paranoid schizophrenia in Croatia. This cross-sectional study of male in-patients with paranoid schizophrenia with (N=104) and without (N=102) a history of physical violence and violent offending was conducted simultaneously in several hospitals in Croatia during a one-year period (2010-2011). Data on their sociodemographic characteristics, duration of untreated illness phase (DUP), alcohol abuse, suicidal behavior, personality features, and insight into illness were collected and compared between groups. A binary logistic regression model was used to determine the predictors of violent offending. Predictors of violent offending were older age, DUP before first contact with psychiatric services, and alcohol abuse. The regression model showed that the strongest positive predictive factor was harmful alcohol use, as determined by the AUDIT test (odds ratio 37.01; 95% confidence interval 5.20-263.24). Psychopathy, emotional stability, and conscientiousness were significant positive predictive factors, while extroversion, pleasantness, and intellect were significant negative predictive factors for violent offending. This study found an association between alcohol abuse and the risk for violent offending in paranoid schizophrenia. We hope that this finding will help improve public and mental health prevention strategies in this vulnerable patient group.
Wang, Xin; Wang, Ying; Sun, Hongbin
2016-01-01
In social media, trust and distrust among users are important factors in helping users make decisions, dissect information, and receive recommendations. However, the sparsity and imbalance of social relations bring great difficulties and challenges in predicting trust and distrust. Meanwhile, there are numerous inducing factors to determine trust and distrust relations. The relationship among inducing factors may be dependency, independence, and conflicting. Dempster-Shafer theory and neural network are effective and efficient strategies to deal with these difficulties and challenges. In this paper, we study trust and distrust prediction based on the combination of Dempster-Shafer theory and neural network. We firstly analyze the inducing factors about trust and distrust, namely, homophily, status theory, and emotion tendency. Then, we quantify inducing factors of trust and distrust, take these features as evidences, and construct evidence prototype as input nodes of multilayer neural network. Finally, we propose a framework of predicting trust and distrust which uses multilayer neural network to model the implementing process of Dempster-Shafer theory in different hidden layers, aiming to overcome the disadvantage of Dempster-Shafer theory without optimization method. Experimental results on a real-world dataset demonstrate the effectiveness of the proposed framework. PMID:27034651
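For readers unfamiliar with Dempster-Shafer theory, the following minimal sketch shows Dempster's rule of combination on the two-element frame {trust, distrust}. It only illustrates the standard combination rule; it is not the neural-network framework proposed in the abstract, and the mass values are hypothetical.

```python
def combine_dempster(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Renormalise by the non-conflicting mass.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


TRUST = frozenset({"trust"})
DISTRUST = frozenset({"distrust"})
UNKNOWN = TRUST | DISTRUST  # mass left on the whole frame (ignorance)

# Hypothetical evidence from two inducing factors, e.g. homophily and status.
evidence_homophily = {TRUST: 0.6, DISTRUST: 0.1, UNKNOWN: 0.3}
evidence_status = {TRUST: 0.5, DISTRUST: 0.2, UNKNOWN: 0.3}

fused = combine_dempster(evidence_homophily, evidence_status)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

The rule has no trainable parameters, which is the limitation the abstract refers to when it mentions the lack of an optimization method.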
Cai, Kefu; Xu, Tian; Shen, Lihua; Ni, Yaohui; Ji, Qiuhong
2016-04-01
To investigate risk factors to predict postoperative fever after endovascular treatment of ruptured intracranial aneurysms. Patients undergoing endovascular coiling to treat subarachnoid hemorrhage in Nantong University between November 2011 and September 2014 were retrospectively reviewed. Postoperative temperature and patient demographic data, admission status, characteristic features of aneurysms, and endovascular coiling procedure were documented and analyzed. There were 336 consecutive patients included in this study, and 111 were classified as febrile (tympanic temperature >38.3°C for at least 2 consecutive days). Univariate analysis demonstrated that age, interval from onset of subarachnoid hemorrhage to operation, history of hypertension and smoking, Hunt and Hess grade, Fisher grade, temperature before coiling, leukocyte count on admission, and infectious complications were correlated with postoperative fever. Five variables were independent risk factors to predict fever by multivariate logistic regression: age >70 years (odds ratio [OR] = 2.6, 95% confidence interval [CI] = 1.2-5.6), Fisher grade 3 or 4 (OR = 2.2, 95% CI = 1.1-4.3), leukocyte count >10,000/mm(3) on admission (OR = 2.3, 95% CI = 1.3-4.0), temperature >37.5°C before coiling (OR = 4.6, 95% CI = 2.0-10.7), and infectious complications (OR = 4.4, 95% CI = 2.2-8.6). Postoperative fever after coil embolization was predicted by changeable and unchangeable risk factors in subarachnoid hemorrhage. However, characteristic features of aneurysms and the coiling procedure had no impact on development of postoperative fever. Preventing any infectious complications, lowering temperature before embolization, and draining bloody cerebrospinal fluid may assist in the prevention of subsequent fever. Copyright © 2016 Elsevier Inc. All rights reserved.
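Several abstracts in this listing report odds ratios with 95% confidence intervals. As a reading aid, the sketch below computes an unadjusted odds ratio and its Wald interval from a 2x2 table. The counts are hypothetical, and the ORs in the abstract above come from multivariate logistic regression, so they are adjusted estimates that this simple calculation does not reproduce.

```python
import math


def odds_ratio_wald_ci(exposed_cases, exposed_controls,
                       unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 for 95%).

    All four cell counts must be positive.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: febrile vs afebrile patients, with vs without a risk factor.
or_, lo, hi = odds_ratio_wald_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, as in the five independent risk factors listed above, is what makes a factor statistically significant at the 5% level.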
Social cognition and functional capacity in bipolar disorder and schizophrenia.
Thaler, Nicholas S; Sutton, Griffin P; Allen, Daniel N
2014-12-15
Social cognition is a functionally relevant predictor of capacity in schizophrenia (SZ), though research concerning its value for bipolar disorder (BD) is limited. The current investigation examined the relationship between two social cognitive factors and functional capacity in bipolar disorder. This study included 48 individuals with bipolar disorder (24 with psychotic features) and 30 patients with schizophrenia. Multiple regression controlling for estimated IQ scores was used to assess the predictive value of social cognitive factors on the UCSD Performance-Based Functional Skills Assessment (UPSA). Results found that for the bipolar with psychosis and schizophrenia groups, the social/emotion processing factor predicted the UPSA. The theory of mind factor only predicted the UPSA for the schizophrenia group. Findings support the clinical utility of evaluating emotion processing in individuals with a history of psychosis. For BD, theory of mind may be better explained by a generalized cognitive deficit. In contrast, social/emotion processing may be linked to distinct neurobiological processes associated with psychosis. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
A Real-time Breakdown Prediction Method for Urban Expressway On-ramp Bottlenecks
NASA Astrophysics Data System (ADS)
Ye, Yingjun; Qin, Guoyang; Sun, Jian; Liu, Qiyuan
2018-01-01
Breakdown occurrence on expressways is considered to be related to various factors. Therefore, to investigate the association between breakdowns and these factors, a Bayesian network (BN) model is adopted in this paper. Based on the breakdown events identified at 10 urban expressway on-ramps in Shanghai, China, 23 parameters before breakdowns are extracted, including dynamic environment conditions aggregated at 5-minute intervals and static geometry features. Data from different time periods are used to predict breakdown. Results indicate that the models using data from 5-10 min prior to breakdown perform best, with prediction accuracies higher than 73%. Moreover, one unified model for all bottlenecks is also built and shows reasonably good prediction performance, with a classification accuracy for breakdowns of about 75% at best. Additionally, to simplify the model parameter input, a random forests (RF) model is adopted to identify the key variables. Using the 7 selected parameters, the refined BN model can predict breakdown with adequate accuracy.
Feature genes in metastatic breast cancer identified by MetaDE and SVM classifier methods.
Tuo, Youlin; An, Ning; Zhang, Ming
2018-03-01
The aim of the present study was to investigate the feature genes in metastatic breast cancer samples. A total of 5 expression profiles of metastatic breast cancer samples were downloaded from the Gene Expression Omnibus database, which were then analyzed using the MetaQC and MetaDE packages in R language. The feature genes between metastasis and non‑metastasis samples were screened under the threshold of P<0.05. Based on the protein‑protein interactions (PPIs) in the Biological General Repository for Interaction Datasets, Human Protein Reference Database and Biomolecular Interaction Network Database, the PPI network of the feature genes was constructed. The feature genes identified by topological characteristics were then used for support vector machine (SVM) classifier training and verification. The accuracy of the SVM classifier was then evaluated using another independent dataset from The Cancer Genome Atlas database. Finally, function and pathway enrichment analyses for genes in the SVM classifier were performed. A total of 541 feature genes were identified between metastatic and non‑metastatic samples. The top 10 genes with the highest betweenness centrality values in the PPI network of feature genes were Nuclear RNA Export Factor 1, cyclin‑dependent kinase 2 (CDK2), myelocytomatosis proto‑oncogene protein (MYC), Cullin 5, SHC Adaptor Protein 1, Clathrin heavy chain, Nucleolin, WD repeat domain 1, proteasome 26S subunit non‑ATPase 2 and telomeric repeat binding factor 2. The cyclin‑dependent kinase inhibitor 1A (CDKN1A), E2F transcription factor 1 (E2F1), and MYC interacted with CDK2. The SVM classifier constructed by the top 30 feature genes was able to distinguish metastatic samples from non‑metastatic samples [correct rate, specificity, positive predictive value and negative predictive value >0.89; sensitivity >0.84; area under the receiver operating characteristic curve (AUROC) >0.96]. 
The verification of the SVM classifier in an independent dataset (35 metastatic samples and 143 non‑metastatic samples) revealed an accuracy of 94.38% and AUROC of 0.958. Cell cycle associated functions and pathways were the most significant terms of the 30 feature genes. A SVM classifier was constructed to assess the possibility of breast cancer metastasis, which presented high accuracy in several independent datasets. CDK2, CDKN1A, E2F1 and MYC were indicated as the potential feature genes in metastatic breast cancer.
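The abstract above ranks feature genes by betweenness centrality in a PPI network. The sketch below shows one standard way to compute that quantity, Brandes' algorithm for unweighted, undirected graphs, in pure Python. The four-gene star network is only a toy built from the interactions the abstract names (CDKN1A, E2F1 and MYC each interacting with CDK2); it is not the study's network, and the function name is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph.

    adjacency: dict mapping each node to an iterable of its neighbours
    (every edge must be listed in both directions).
    """
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search counting shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adjacency}
        n_paths = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        dependency = {v: 0.0 for v in adjacency}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in centrality.items()}


toy_ppi = {
    "CDK2": ["MYC", "E2F1", "CDKN1A"],
    "MYC": ["CDK2"],
    "E2F1": ["CDK2"],
    "CDKN1A": ["CDK2"],
}
print(betweenness_centrality(toy_ppi))
```

In this toy network every shortest path between two of the outer genes passes through CDK2, so CDK2 receives all of the betweenness, which is the sense in which a high-betweenness gene acts as a hub.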
Spelt, Lidewij; Sasor, Agata; Ansari, Daniel; Andersson, Roland
2016-10-01
To identify significant predictive factors for overall survival (OS) and disease-free survival (DFS) after liver resection for colon cancer metastases, with special focus on features of the primary colon cancer, such as lymph node ratio (LNR), vascular invasion, and perineural invasion. Patients operated for colonic cancer liver metastases between 2006 and 2014 were included. Details on patient characteristics, the primary colon cancer operation and metastatic disease were collected. Multivariate analysis was performed to select predictive variables for OS and DFS. Median OS and DFS were 67 and 20 months, respectively. 1-, 3- and 5-year OS were 97, 76, and 52%. 1-, 3- and 5-year DFS were 65, 42, and 37%. Multivariate analysis showed LNR to be an independent predictive factor for DFS but not for OS. Other identified predictive factors were vascular and perineural invasion of the primary colon cancer, size of the largest metastasis and severe complications after liver surgery for OS, and perineural invasion, number of liver metastases and preoperative CEA-level for DFS. Traditional N-stage was also considered to be an independent predictive factor for DFS in a separate multivariate analysis. LNR and perineural invasion of the primary colon cancer can be used as a prognostic variable for DFS after a concomitant liver resection for colon cancer metastases. Vascular and perineural invasion of the primary colon cancer are predictive for OS.
Analyses of Fatigue and Fatigue-Crack Growth under Constant- and Variable-Amplitude Loading
NASA Technical Reports Server (NTRS)
Newman, J. C., Jr.
1999-01-01
Studies on the growth of small cracks have led to the observation that fatigue life of many engineering materials is primarily crack growth from micro-structural features, such as inclusion particles, voids, slip-bands or from manufacturing defects. This paper reviews the capabilities of a plasticity-induced crack-closure model to predict fatigue lives of metallic materials using small-crack theory under various loading conditions. Constraint factors, to account for three-dimensional effects, were selected to correlate large-crack growth rate data as a function of the effective stress-intensity factor range (delta K(sub eff)) under constant-amplitude loading. Modifications to the delta K(sub eff)-rate relations in the near-threshold regime were needed to fit measured small-crack growth rate behavior. The model was then used to calculate small- and large-crack growth rates, and to predict total fatigue lives, for notched and un-notched specimens under constant-amplitude and spectrum loading. Fatigue lives were predicted using crack-growth relations and micro-structural features like those that initiated cracks in the fatigue specimens for most of the materials analyzed. Results from the tests and analyses agreed well.
Childhood Precursors of Adult Borderline Personality Disorder Features: A Longitudinal Study.
Cramer, Phebe
2016-07-01
This study identifies childhood personality traits that are precursors of adult Borderline Personality Disorder (BPD) features. In a longitudinal study, childhood personality traits were assessed at age 11 (N = 100) using the California Child Q-set (CCQ: Block and Block, 1980). A number of these Q-items were found to be significantly correlated (p < 0.001) with a prototype-based measure of BPD features at age 23. Factor analysis of these Q-items suggested that they could be characterized by two underlying personality dimensions: Impulsivity and Nonconformity/Aggression. The findings thus provide evidence that childhood personality traits predict adult BPD features. Identifying such childhood precursors provides an opportunity for early intervention.
NASA Astrophysics Data System (ADS)
Peng, Chong; Wang, Lun; Liao, T. Warren
2015-10-01
Currently, chatter has become the critical factor in hindering machining quality and productivity in machining processes. To avoid cutting chatter, a new method based on dynamic cutting force simulation model and support vector machine (SVM) is presented for the prediction of chatter stability lobes. The cutting force is selected as the monitoring signal, and the wavelet energy entropy theory is used to extract the feature vectors. A support vector machine is constructed using the MATLAB LIBSVM toolbox for pattern classification based on the feature vectors derived from the experimental cutting data. Then combining with the dynamic cutting force simulation model, the stability lobes diagram (SLD) can be estimated. Finally, the predicted results are compared with existing methods such as zero-order analytical (ZOA) and semi-discretization (SD) method as well as actual cutting experimental results to confirm the validity of this new method.
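The feature-extraction step mentioned above, wavelet energy entropy of the cutting-force signal, can be sketched as follows. The abstract does not state which wavelet or how many decomposition levels were used, so the Haar wavelet, the level count, and the toy signal here are assumptions chosen only to keep the example short; the signal length must be divisible by 2**levels.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_energy_entropy(signal, levels):
    """Shannon entropy of the energy distribution across wavelet sub-bands."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))  # final approximation band
    total = sum(energies)
    if total == 0.0:
        return 0.0
    shares = [e / total for e in energies]
    return sum(-p * math.log(p) for p in shares if p > 0.0)


# Hypothetical force samples: a fast oscillation followed by a steady load.
force = [1.0, -1.0, 1.0, -1.0, 2.0, 2.0, 2.0, 2.0]
print(wavelet_energy_entropy(force, levels=2))
```

A signal whose energy sits in a single band gives an entropy of zero, while energy spread across bands raises it; a classifier such as the SVM in the abstract would take this value, or the per-band energy shares, as input features.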
Fan, Jun; Yang, Jing; Jiang, Zhenran
2018-04-01
Drug side effects are a public health concern. Using powerful machine-learning methods to predict potential side effects before drugs reach the clinical stages is of great importance for reducing time consumption and protecting patient safety. Recently, researchers have shown that the central nervous system (CNS) side effects of a drug are closely related to its permeability to the blood-brain barrier (BBB). Inspired by this, we proposed an extended neighborhood-based recommendation method to predict CNS side effects using drug permeability to the BBB and other known drug features. To the best of our knowledge, this is the first attempt to predict CNS side effects considering drug permeability to the BBB. Computational experiments demonstrated that drug permeability to the BBB is an important factor in CNS side effects prediction. Moreover, we built an ensemble recommendation model and obtained higher AUC (area under the receiver operating characteristic curve) and AUPR (area under the precision-recall curve) scores on the CNS side effects data set by integrating various drug features.
Modeling first impressions from highly variable facial images.
Vernon, Richard J W; Sutherland, Clare A M; Young, Andrew W; Hartley, Tom
2014-08-12
First impressions of social traits, such as trustworthiness or dominance, are reliably perceived in faces, and despite their questionable validity they can have considerable real-world consequences. We sought to uncover the information driving such judgments, using an attribute-based approach. Attributes (physical facial features) were objectively measured from feature positions and colors in a database of highly variable "ambient" face photographs, and then used as input for a neural network to model factor dimensions (approachability, youthful-attractiveness, and dominance) thought to underlie social attributions. A linear model based on this approach was able to account for 58% of the variance in raters' impressions of previously unseen faces, and factor-attribute correlations could be used to rank attributes by their importance to each factor. Reversing this process, neural networks were then used to predict facial attributes and corresponding image properties from specific combinations of factor scores. In this way, the factors driving social trait impressions could be visualized as a series of computer-generated cartoon face-like images, depicting how attributes change along each dimension. This study shows that despite enormous variation in ambient images of faces, a substantial proportion of the variance in first impressions can be accounted for through linear changes in objectively defined features.
2010-09-01
versus nosocomial Klebsiella pneumoniae bacteremia: clinical features, treatment outcomes, and clinical implication of antimicrobial resistance . J...antibiotic resistance , strain clonality, and other host factors on morbidity and mortality. All patients with thermal burns infected with K pneumoniae between...revealed that an infection with ESBL-producing K pneumoniae during the hospital stay was the factor most predictive of death, with a nearly 4-fold increased
Using patient data similarities to predict radiation pneumonitis via a self-organizing map
NASA Astrophysics Data System (ADS)
Chen, Shifeng; Zhou, Sumin; Yin, Fang-Fang; Marks, Lawrence B.; Das, Shiva K.
2008-01-01
This work investigates the use of the self-organizing map (SOM) technique for predicting lung radiation pneumonitis (RP) risk. SOM is an effective method for projecting and visualizing high-dimensional data in a low-dimensional space (map). By projecting patients with similar data (dose and non-dose factors) onto the same region of the map, commonalities in their outcomes can be visualized and categorized. Once built, the SOM may be used to predict pneumonitis risk by identifying the region of the map that is most similar to a patient's characteristics. Two SOM models were developed from a database of 219 lung cancer patients treated with radiation therapy (34 clinically diagnosed with Grade 2+ pneumonitis). The models were: SOMall built from all dose and non-dose factors and, for comparison, SOMdose built from dose factors alone. Both models were tested using ten-fold cross validation and Receiver Operating Characteristics (ROC) analysis. Models SOMall and SOMdose yielded ten-fold cross-validated ROC areas of 0.73 (sensitivity/specificity = 71%/68%) and 0.67 (sensitivity/specificity = 63%/66%), respectively. The significant difference between the cross-validated ROC areas of these two models (p < 0.05) implies that non-dose features add important information toward predicting RP risk. Among the input features selected by model SOMall, the two with highest impact for increasing RP risk were: (a) higher mean lung dose and (b) chemotherapy prior to radiation therapy. The SOM model developed here may not be extrapolated to treatment techniques outside that used in our database, such as several-field lung intensity modulated radiation therapy or gated radiation therapy.
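The prediction step described above (find the map region most similar to a patient, then read off the outcome rate observed there) can be sketched with a very small self-organizing map. This is a toy illustration, not the authors' SOMall or SOMdose model: the two-node one-dimensional map, the two features (a normalized mean lung dose and a prior-chemotherapy flag), and all patient values and outcomes are hypothetical.

```python
import math


def best_matching_unit(nodes, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(nodes)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(nodes[k], x)))


def train_som_1d(data, init_nodes, epochs=100, lr0=0.5, radius0=0.5):
    """Train a 1-D SOM with linearly decaying learning rate and neighbourhood."""
    nodes = [list(w) for w in init_nodes]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 1e-3)
        for x in data:
            winner = best_matching_unit(nodes, x)
            for k, weights in enumerate(nodes):
                # Gaussian neighbourhood on the map grid.
                h = math.exp(-((k - winner) ** 2) / (2.0 * radius ** 2))
                for d in range(len(weights)):
                    weights[d] += lr * h * (x[d] - weights[d])
    return nodes


def node_event_rates(nodes, data, outcomes):
    """Observed event rate among training patients mapped to each node."""
    hits = [0] * len(nodes)
    events = [0] * len(nodes)
    for x, y in zip(data, outcomes):
        k = best_matching_unit(nodes, x)
        hits[k] += 1
        events[k] += y
    return [e / h if h else None for e, h in zip(events, hits)]


def predict_risk(nodes, rates, x):
    """Risk estimate for a new patient: event rate of the best-matching node."""
    return rates[best_matching_unit(nodes, x)]


# Hypothetical patients: [normalized mean lung dose, prior chemotherapy (0/1)].
patients = [[0.10, 0.0], [0.20, 0.0], [0.15, 0.0], [0.90, 1.0], [0.80, 1.0], [0.85, 1.0]]
pneumonitis = [0, 0, 0, 1, 1, 0]  # hypothetical outcomes

som = train_som_1d(patients, init_nodes=[[0.0, 0.0], [1.0, 1.0]])
rates = node_event_rates(som, patients, pneumonitis)
print(predict_risk(som, rates, [0.12, 0.0]), predict_risk(som, rates, [0.88, 1.0]))
```

A realistic model would use a two-dimensional map with many nodes and, as in the abstract, estimate performance by cross-validation rather than on the training patients.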
Fate of abstracts presented at the 2008 European Congress of Physical and Rehabilitation Medicine.
Allart, E; Beaucamp, F; Tiffreau, V; Thevenon, A
2015-08-01
The subsequent full-text publication of abstracts presented at a scientific congress reflects the latter's scientific quality. The aim of this paper was to evaluate the publication rate for abstracts presented at the 2008 European Congress of Physical and Rehabilitation Medicine (ECPRM), characterize the publications and identify factors that were predictive of publication. It is a bibliography search. We used the PubMed database to search for subsequent publication of abstracts. We screened the abstracts' characteristics for features that were predictive of publication, such as the status of the authors, the topic and the type of work. We performed univariate analyses and a logistic regression analysis. Of 779 abstracts presented at ECPRM 2008, 169 (21.2%) were subsequently published. The mean time to publication was 12±15.7 months and the mean impact factor of the publishing journals was 2.05±2.1. In a univariate analysis, university status (P<10⁻⁶), geographic origin (P=10⁻³), oral presentation (P<10⁻⁶), and original research (P<10⁻⁶) (and particularly multicentre trials [P<0.01] and randomized controlled trials [P=10⁻³]) were predictive of publication. In a logistic regression analysis, oral presentation (odds ratio [OR]=0.37) and university status (OR=0.36) were significant, independent predictors of publication. The ECPRM 2008 publication rate and impact factor were relatively low when compared with most other national and international conferences in this field. University status, the type of abstract and oral presentation were predictive of subsequent publication.
Pre-Test Analysis Predictions for the Shell Buckling Knockdown Factor Checkout Tests - TA01 and TA02
NASA Technical Reports Server (NTRS)
Thornburgh, Robert P.; Hilburger, Mark W.
2011-01-01
This report summarizes the pre-test analysis predictions for the SBKF-P2-CYL-TA01 and SBKF-P2-CYL-TA02 shell buckling tests conducted at the Marshall Space Flight Center (MSFC) in support of the Shell Buckling Knockdown Factor (SBKF) Project, NASA Engineering and Safety Center (NESC) Assessment. The test article (TA) is an 8-foot-diameter aluminum-lithium (Al-Li) orthogrid cylindrical shell with similar design features as that of the proposed Ares-I and Ares-V barrel structures. In support of the testing effort, detailed structural analyses were conducted and the results were used to monitor the behavior of the TA during the testing. A summary of predicted results for each of the five load sequences is presented herein.
Child and Adolescent Clinical Features Preceding Adult Suicide Attempts.
Serra, Giulia; Koukopoulos, Athanasios; De Chiara, Lavinia; Napoletano, Flavia; Koukopoulos, Alexia; Sani, Gabriele; Faedda, Gianni L; Girardi, Paolo; Reginaldi, Daniela; Baldessarini, Ross J
2017-07-03
The objective of this study was to identify the predictive value of juvenile factors for adult suicidal behavior. We reviewed clinical records to compare factors identified in childhood and adolescence between adult suicidal versus nonsuicidal major affective disorder subjects. Suicide attempts occurred in 23.1% of subjects. Age-at-first-symptom was 14.2 vs. 20.2 years among suicidal versus nonsuicidal subjects (p < 0.0001). More prevalent in suicidal versus non-suicidal subjects by multivariate analysis were: depressive symptoms, hyper-emotionality, younger-at-first-affective-episode, family suicide history, childhood mood-swings, and adolescence low self-esteem. Presence of one factor yielded a Bayesian sensitivity of 64%, specificity of 50%, and negative predictive power of 86%. Several juvenile factors were associated with adult suicidal behavior; their absence was strongly associated with a lack of adult suicidal behavior.
NASA Technical Reports Server (NTRS)
Meisel, D. D.
1976-01-01
Preliminary data required to extrapolate available meteor physics information (obtained in the photographic, visual and near ultraviolet spectral regions) into the middle and far ultraviolet are presented. Wavelength tables, telluric attenuation factors, meteor rates, and telluric airglow data are summarized in the context of near-earth observation vehicle parameters using moderate to low spectral resolution instrumentation. Considerable attention is given to the problem of meteor excitation temperatures since these are required to predict the strength of UV features. Relative line intensities are computed for an assumed chondritic composition. Features of greatest predicted intensities, the major problems in meteor physics, detectability of UV meteor events, complications of spacecraft motion, and UV instrumentation options are summarized.
Hendry, Melissa C; Douglas, Kevin S; Winter, Elizabeth A; Edens, John F
2013-01-01
Much of the risk assessment literature has focused on the predictive validity of risk assessment tools. However, these tools often comprise a list of risk factors that are themselves complex constructs, and focusing on the quality of measurement of individual risk factors may improve the predictive validity of the tools. The present study illustrates this concern using the Antisocial Features and Aggression scales of the Personality Assessment Inventory (Morey, 1991). In a sample of 1,545 prison inmates and offenders undergoing treatment for substance abuse (85% male), we evaluated (a) the factorial validity of the ANT and AGG scales, (b) the utility of original ANT and AGG scales and newly derived ANT and AGG scales for predicting antisocial outcomes (recidivism and institutional infractions), and (c) whether items with a stronger relationship to the underlying constructs (higher factor loadings) were in turn more strongly related to antisocial outcomes. Confirmatory factor analyses (CFAs) indicated that ANT and AGG items were not structured optimally in these data in terms of correspondence to the subscale structure identified in the PAI manual. Exploratory factor analyses were conducted on a random split-half of the sample to derive optimized alternative factor structures, and cross-validated in the second split-half using CFA. Four-factor models emerged for both the ANT and AGG scales, and, as predicted, the size of item factor loadings was associated with the strength with which items were associated with institutional infractions and community recidivism. This suggests that the quality by which a construct is measured is associated with its predictive strength. Implications for risk assessment are discussed. Copyright © 2013 John Wiley & Sons, Ltd.
Multifactorial disease risk calculator: Risk prediction for multifactorial disease pedigrees.
Campbell, Desmond D; Li, Yiming; Sham, Pak C
2018-03-01
Construction of multifactorial disease models from epidemiological findings and their application to disease pedigrees for risk prediction is nontrivial for all but the simplest of cases. Multifactorial Disease Risk Calculator is a web tool facilitating this. It provides a user-friendly interface, extending a reported methodology based on a liability-threshold model. Multifactorial disease models incorporating all the following features in combination are handled: quantitative risk factors (including polygenic scores), categorical risk factors (including major genetic risk loci), stratified age of onset curves, and the partition of the population variance in disease liability into genetic, shared, and unique environment effects. It allows the application of such models to disease pedigrees. Pedigree-related outputs are (i) individual disease risk for pedigree members, (ii) n year risk for unaffected pedigree members, and (iii) the disease pedigree's joint liability distribution. Risk prediction for each pedigree member is based on using the constructed disease model to appropriately weigh evidence on disease risk available from personal attributes and family history. Evidence is used to construct the disease pedigree's joint liability distribution. From this, lifetime and n year risk can be predicted. Example disease models and pedigrees are provided at the website and are used in accompanying tutorials to illustrate the features available. The website is built on an R package which provides the functionality for pedigree validation, disease model construction, and risk prediction. Website: http://grass.cgs.hku.hk:3838/mdrc/current. © 2017 WILEY PERIODICALS, INC.
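The liability-threshold model that the tool above builds on can be illustrated in a few lines. This is a deliberately simplified sketch, not the calculator's R package: it assumes a standard normal liability, a single measured quantitative risk factor (for example a polygenic score) expressed on the liability scale, and no family-history information. The function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # liability is scaled to mean 0, variance 1


def liability_threshold(prevalence):
    """Threshold T such that P(liability > T) equals the lifetime prevalence."""
    return STANDARD_NORMAL.inv_cdf(1.0 - prevalence)


def risk_given_score(prevalence, score, variance_explained):
    """Disease risk for a person whose measured factor contributes `score`.

    `score` is on the liability scale (mean 0, variance `variance_explained`);
    the unmeasured remainder of liability is normal with variance
    1 - variance_explained, which must be positive.
    """
    threshold = liability_threshold(prevalence)
    residual_sd = math.sqrt(1.0 - variance_explained)
    return 1.0 - STANDARD_NORMAL.cdf((threshold - score) / residual_sd)


# Hypothetical disease: 1% prevalence, score explains 10% of liability variance.
average_person = risk_given_score(0.01, 0.0, 0.10)
high_scorer = risk_given_score(0.01, 2.0 * math.sqrt(0.10), 0.10)  # +2 SD score
print(round(average_person, 4), round(high_scorer, 4))
```

The full tool described in the abstract goes further by conditioning on relatives' disease status through the pedigree's joint liability distribution, which requires multivariate normal calculations beyond this sketch.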
Identifying Trajectories of Borderline Personality Features in Adolescence
Haltigan, John D.
2016-01-01
Objective: To examine trajectories of adolescent borderline personality (BP) features in a normative-risk cohort (n = 566) of Canadian children assessed at ages 13, 14, 15, and 16 and childhood predictors of trajectory group membership assessed at ages 8, 10, 11, and 12. Method: Data were drawn from the McMaster Teen Study, an ongoing study examining relations among bullying, mental health, and academic achievement. Participants and their parents completed a battery of mental health and peer relations questionnaires at each wave of the study. Academic competence was assessed at age 8 (Grade 3). Latent class growth analysis, analysis of variance, and logistic regression were used to analyze the data. Results: Three distinct BP features trajectory groups were identified: elevated or rising, intermediate or stable, and low or stable. Parent- and child-reported mental health symptoms, peer relations risk factors, and intra-individual risk factors were significant predictors of elevated or rising and intermediate or stable trajectory groups. Child-reported attention-deficit hyperactivity disorder (ADHD) and somatization symptoms uniquely predicted elevated or rising trajectory group membership, whereas parent-reported anxiety and child-reported ADHD symptoms uniquely predicted intermediate or stable trajectory group membership. Child-reported somatization symptoms were the only predictor to differentiate the intermediate or stable and elevated or rising trajectory groups (OR 1.15, 95% CI 1.04 to 1.28). Associations between child-reported reactive temperament and elevated BP features trajectory group membership were 10.23 times higher among children who were bullied, supporting a diathesis–stress pathway in the development of BP features for these youth. Conclusions: Findings demonstrate the heterogeneous course of BP features in early adolescence and shed light on the potential prodromal course of later borderline personality disorder. PMID:27254092
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fave, X; Court, L; UT Health Science Center, Graduate School of Biomedical Sciences, Houston, TX
Purpose: To determine how radiomics features change during radiation therapy and whether those changes (delta-radiomics features) can improve prognostic models built with clinical factors. Methods: 62 radiomics features, including histogram, co-occurrence, run-length, gray-tone difference, and shape features, were calculated from pretreatment and weekly intra-treatment CTs for 107 stage III NSCLC patients (5–9 images per patient). Image preprocessing for each feature was determined using the set of pretreatment images: bit-depth resample and/or a smoothing filter were tested for their impact on volume-correlation and significance of each feature in univariate Cox regression models to maximize their information content. Next, the optimized features were calculated from the intra-treatment images and tested in linear mixed-effects models to determine which features changed significantly with dose-fraction. The slopes in these significant features were defined as delta-radiomics features. To test their prognostic potential, multivariate Cox regression models were fitted, first using only clinical features and then clinical+delta-radiomics features for overall-survival, local-recurrence, and distant-metastases. Leave-one-out cross validation was used for model-fitting and patient predictions. Concordance indices (c-index) and p-values for the log-rank test with patients stratified at the median were calculated. Results: Approximately one-half of the 62 optimized features required no preprocessing, one-fourth required smoothing, and one-fourth required smoothing and resampling. From these, 54 changed significantly during treatment. For overall-survival, the c-index improved from 0.52 for clinical factors alone to 0.62 for clinical+delta-radiomics features. For distant-metastases, the c-index improved from 0.53 to 0.58, while for local-recurrence it did not improve. Patient stratification significantly improved (p-value<0.05) for overall-survival and distant-metastases when delta-radiomics features were included. The delta-radiomics versions of autocorrelation, kurtosis, and compactness were selected most frequently in leave-one-out iterations. Conclusion: Weekly changes in radiomics features can potentially be used to evaluate treatment response and predict patient outcomes. High-risk patients could be recommended for dose escalation or consolidation chemotherapy. This project was funded in part by grants from the National Cancer Institute (NCI) and the Cancer Prevention Research Institute of Texas (CPRIT).
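For readers unfamiliar with the concordance index reported in this abstract, a minimal sketch of Harrell's c-index for right-censored data follows. It is a generic illustration and not the authors' implementation; the function name and argument layout are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had the event strictly
            # before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.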
Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity
Elias-Kirma, Shani; Nir, Ronit; Segal, Eran
2017-01-01
Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However due to the small number of IRESs known, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in human and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements. PMID:28922394
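The k-mer features described in this abstract can be pictured with a small sketch that records how often and where each k-mer occurs in an RNA sequence. This is an illustrative assumption about the feature form, not the authors' pipeline, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a sequence (DNA 'T' is read as RNA 'U')."""
    seq = sequence.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_positions(sequence, k):
    """Map each k-mer to the list of 0-based start positions where it occurs."""
    seq = sequence.upper().replace("T", "U")
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return dict(positions)
```

Counts capture the "number" aspect and start positions the "position" aspect that the abstract reports as relevant to expression.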
CT findings of persistent pure ground glass opacity: can we predict the invasiveness?
Liu, Li-Heng; Liu, Ming; Wei, Ran; Jin, Er-Hu; Liu, Yu-Hui; Xu, Liang; Li, Wen-Wu; Huang, Yong
2015-01-01
To investigate whether CT findings can predict the invasiveness of persistent cancerous pure ground glass opacity (pGGO) by correlating the CT imaging features of persistent pGGO with pathological changes. Ninety-five patients with persistent pGGOs were included. Three radiologists evaluated the morphologic features of these pGGOs at high resolution CT (HRCT). Binary logistic regression was used to assess the association between CT findings and histopathological classification (pre-invasive and invasive groups). Receiver operating characteristic (ROC) curve analysis was performed to evaluate the diagnostic performance of diameters. A total of 105 pGGOs were identified. Between the pre-invasive group (atypical adenomatous hyperplasia, AAH, and adenocarcinoma in situ, AIS) and the invasive group (minimally invasive adenocarcinoma, MIA, and invasive lung adenocarcinomas, ILA), there were significant differences in diameter, spiculation and vessel dilatation (p<0.05). No difference was found in air-bronchogram, bubble lucency, lobulated margin, pleural indentation or vascular convergence (p>0.05). The optimal threshold value of the diameters to predict the invasiveness of pGGO was 12.50 mm. HRCT features can predict the invasiveness of persistent pGGO. A diameter of more than 12.50 mm and the presence of spiculation and vessel dilatation are important factors for differentiating invasive adenocarcinoma from pre-invasive cancerous lesions.
Changes in quantitative 3D shape features of the optic nerve head associated with age
NASA Astrophysics Data System (ADS)
Christopher, Mark; Tang, Li; Fingert, John H.; Scheetz, Todd E.; Abramoff, Michael D.
2013-02-01
Optic nerve head (ONH) structure is an important biological feature of the eye used by clinicians to diagnose and monitor progression of diseases such as glaucoma. ONH structure is commonly examined using stereo fundus imaging or optical coherence tomography. Stereo fundus imaging provides stereo views of the ONH that retain 3D information useful for characterizing structure. In order to quantify 3D ONH structure, we applied a stereo correspondence algorithm to a set of stereo fundus images. Using these quantitative 3D ONH structure measurements, eigen structures were derived using principal component analysis from stereo images of 565 subjects from the Ocular Hypertension Treatment Study (OHTS). To evaluate the usefulness of the eigen structures, we explored associations with the demographic variables age, gender, and race. Using regression analysis, the eigen structures were found to have significant (p < 0.05) associations with both age and race after Bonferroni correction. In addition, classifiers were constructed to predict the demographic variables based solely on the eigen structures. These classifiers achieved an area under receiver operating characteristic curve of 0.62 in predicting a binary age variable, 0.52 in predicting gender, and 0.67 in predicting race. The use of objective, quantitative features or eigen structures can reveal hidden relationships between ONH structure and demographics. The use of these features could similarly allow specific aspects of ONH structure to be isolated and associated with the diagnosis of glaucoma, disease progression and outcomes, and genetic factors.
Electronic health record analysis via deep poisson factor models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Henao, Ricardo; Lu, James T.; Lucas, Joseph E.
Electronic Health Record (EHR) phenotyping utilizes patient data captured through normal medical practice, to identify features that may represent computational medical phenotypes. These features may be used to identify at-risk patients and improve prediction of patient morbidity and mortality. We present a novel deep multi-modality architecture for EHR analysis (applicable to joint analysis of multiple forms of EHR data), based on Poisson Factor Analysis (PFA) modules. Each modality, composed of observed counts, is represented as a Poisson distribution, parameterized in terms of hidden binary units. Information from different modalities is shared via a deep hierarchy of common hidden units. Activation of these binary units occurs with probability characterized as Bernoulli-Poisson link functions, instead of more traditional logistic link functions. In addition, we demonstrate that PFA modules can be adapted to discriminative modalities. To compute model parameters, we derive efficient Markov Chain Monte Carlo (MCMC) inference that scales efficiently, with significant computational gains when compared to related models based on logistic link functions. To explore the utility of these models, we apply them to a subset of patients from the Duke-Durham patient cohort. We identified a cohort of over 12,000 patients with Type 2 Diabetes Mellitus (T2DM) based on diagnosis codes and laboratory tests out of our patient population of over 240,000. Examining the common hidden units uniting the PFA modules, we identify patient features that represent medical concepts. Experiments indicate that our learned features are better able to predict mortality and morbidity than clinical features identified previously in a large-scale clinical trial.
Electronic health record analysis via deep poisson factor models
Henao, Ricardo; Lu, James T.; Lucas, Joseph E.; ...
2016-01-01
Electronic Health Record (EHR) phenotyping utilizes patient data captured through normal medical practice, to identify features that may represent computational medical phenotypes. These features may be used to identify at-risk patients and improve prediction of patient morbidity and mortality. We present a novel deep multi-modality architecture for EHR analysis (applicable to joint analysis of multiple forms of EHR data), based on Poisson Factor Analysis (PFA) modules. Each modality, composed of observed counts, is represented as a Poisson distribution, parameterized in terms of hidden binary units. Information from different modalities is shared via a deep hierarchy of common hidden units. Activation of these binary units occurs with probability characterized as Bernoulli-Poisson link functions, instead of more traditional logistic link functions. In addition, we demonstrate that PFA modules can be adapted to discriminative modalities. To compute model parameters, we derive efficient Markov Chain Monte Carlo (MCMC) inference that scales efficiently, with significant computational gains when compared to related models based on logistic link functions. To explore the utility of these models, we apply them to a subset of patients from the Duke-Durham patient cohort. We identified a cohort of over 12,000 patients with Type 2 Diabetes Mellitus (T2DM) based on diagnosis codes and laboratory tests out of our patient population of over 240,000. Examining the common hidden units uniting the PFA modules, we identify patient features that represent medical concepts. Experiments indicate that our learned features are better able to predict mortality and morbidity than clinical features identified previously in a large-scale clinical trial.
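The Bernoulli-Poisson link named in this abstract can be sketched in a few lines. The sketch assumes the usual construction in which a binary unit is the indicator that a latent Poisson count with rate `rate` is at least one; the function names are hypothetical and the logistic function is shown only for comparison.

```python
from math import exp, expm1


def bernoulli_poisson_prob(rate):
    """P(b = 1) when b = 1 if n >= 1 and n ~ Poisson(rate), i.e. 1 - exp(-rate)."""
    if rate < 0:
        raise ValueError("rate must be non-negative")
    return -expm1(-rate)  # numerically stable form of 1 - exp(-rate)


def logistic_prob(x):
    """Traditional logistic link, for comparison."""
    return 1.0 / (1.0 + exp(-x))
```

Unlike the logistic link, the Bernoulli-Poisson link gives probability zero at a rate of zero and takes only non-negative inputs.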
Feature selection and classification model construction on type 2 diabetic patients' data.
Huang, Yue; McCullagh, Paul; Black, Norman; Harper, Roy
2007-11-01
Diabetes affects between 2% and 4% of the global population (up to 10% in the over 65 age group), and its avoidance and effective treatment are undoubtedly crucial public health and health economics issues in the 21st century. The aim of this research was to identify significant factors influencing diabetes control, by applying feature selection to a working patient management system to assist with ranking, classification and knowledge discovery. The classification models can be used to determine individuals in the population with poor diabetes control status based on physiological and examination factors. The diabetic patients' information was collected by Ulster Community and Hospitals Trust (UCHT) from year 2000 to 2004 as part of clinical management. In order to discover key predictors and latent knowledge, data mining techniques were applied. To improve computational efficiency, a feature selection technique, feature selection via supervised model construction (FSSMC), an optimisation of ReliefF, was used to rank the important attributes affecting diabetic control. After selecting suitable features, three complementary classification techniques (Naïve Bayes, IB1 and C4.5) were applied to the data to predict how well the patients' condition was controlled. FSSMC identified patients' 'age', 'diagnosis duration', the need for 'insulin treatment', 'random blood glucose' measurement and 'diet treatment' as the most important factors influencing blood glucose control. Using the reduced features, a best predictive accuracy of 95% and sensitivity of 98% was achieved. The influence of factors, such as 'type of care' delivered, the use of 'home monitoring', and the importance of 'smoking' on outcome can contribute to domain knowledge in diabetes control. In the care of patients with diabetes, the more important factors identified: patients' 'age', 'diagnosis duration' and 'family history', are beyond the control of physicians. 
Treatment methods such as 'insulin', 'diet' and 'tablets' (a variety of oral medicines) may be controlled. However lifestyle indicators such as 'body mass index' and 'smoking status' are also important and may be controlled by the patient. This further underlines the need for public health education to aid awareness and prevention. More subtle data interactions need to be better understood and data mining can contribute to the clinical evidence base. The research confirms and to a lesser extent challenges current thinking. Whilst fully appreciating the requirement for clinical verification and interpretation, this work supports the use of data mining as an exploratory tool, particularly as the domain is suffering from a data explosion due to enhanced monitoring and the (potential) storage of this data in the electronic health record. FSSMC has proved a useful feature estimator for large data sets, where processing efficiency is an important factor.
Impact of experimental design on PET radiomics in predicting somatic mutation status.
Yip, Stephen S F; Parmar, Chintan; Kim, John; Huynh, Elizabeth; Mak, Raymond H; Aerts, Hugo J W L
2017-12-01
PET-based radiomic features have demonstrated great promise in predicting genetic data. However, various experimental parameters can influence the feature extraction pipeline. Here, we investigated how experimental settings affect the performance of radiomic features in predicting somatic mutation status in non-small cell lung cancer (NSCLC) patients. 348 NSCLC patients with somatic mutation testing and diagnostic PET images were included in our analysis. Radiomic feature extractions were analyzed for varying voxel sizes, filters and bin widths. 66 radiomic features were evaluated. The performance of features in predicting mutation status was assessed using the area under the receiver-operating-characteristic curve (AUC). The influence of experimental parameters on feature predictability was quantified as the relative difference between the minimum and maximum AUC (δ). The large majority of features (n=56, 85%) were significantly predictive for EGFR mutation status (AUC≥0.61). 29 radiomic features significantly predicted EGFR mutations and were robust to experimental settings with δ_Overall <5%. The overall influence (δ_Overall) of the voxel size, filter and bin width for all features ranged from 5% to 15%. For all features, none of the experimental designs could discriminate KRAS+ from KRAS- (AUC≤0.56). The predictability of 29 radiomic features was robust to the choice of experimental settings; however, these settings need to be carefully chosen for all other features. The combined effect of the investigated processing methods could be substantial and must be considered. Optimized settings that will maximize the predictive performance of individual radiomic features should be investigated in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
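The two quantities used in this abstract, AUC and the relative spread δ of AUC across settings, can be sketched as follows. The abstract does not state the denominator of δ, so dividing by the maximum AUC is an assumption made here for illustration; the function names are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_auc_spread(aucs):
    """Relative difference between the largest and smallest AUC of one feature.

    Assumption: normalized by the maximum AUC.
    """
    return (max(aucs) - min(aucs)) / max(aucs)
```

A feature whose AUC stays between 0.60 and 0.63 across settings would have a spread just under 5% under this assumed definition.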
Mirroshandel, Seyed Abolghasem; Ghasemian, Fatemeh; Monji-Azad, Sara
2016-12-01
Aspiration of a good-quality sperm during intracytoplasmic sperm injection (ICSI) is one of the main concerns. Understanding the influence of individual sperm morphology on fertilization, embryo quality, and pregnancy probability is one of the most important subjects in male factor infertility. Embryologists need to decide the best sperm for injection in real time during ICSI cycle. Our objective is to predict the quality of zygote, embryo, and implantation outcome before injection of each sperm in an ICSI cycle for male factor infertility with the aim of providing a decision support system on the sperm selection. The information was collected from 219 patients with male factor infertility at the infertility therapy center of Alzahra hospital in Rasht from 2012 through 2014. The prepared dataset included the quality of zygote, embryo, and implantation outcome of 1544 injected sperms into the related oocytes. In our study, embryo transfer was performed at day 3. Each sperm was represented with thirteen clinical features. Data preprocessing was the first step in the proposed data mining algorithm. After applying more than 30 classifiers, 9 successful classifiers were selected and evaluated by 10-fold cross validation technique using precision, recall, F1, and AUC measures. Another important experiment was measuring the effect of each feature in prediction process. In zygote and embryo quality prediction, IBK and RandomCommittee models provided 79.2% and 83.8% F1, respectively. In implantation outcome prediction, KStar model achieved 95.9% F1, which is even better than prediction of human experts. All these predictions can be done in real time. A machine learning-based decision support system would be helpful in sperm selection phase of ICSI cycle to improve the success rate of ICSI treatment. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
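The precision, recall, and F1 measures used to evaluate the classifiers in this abstract can be computed as in the generic sketch below. It is not the authors' code, and the function name and default positive label are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class treated as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```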
Reproducibility and Prognosis of Quantitative Features Extracted from CT Images
Balagurunathan, Yoganand; Gu, Yuhua; Wang, Hua; Kumar, Virendra; Grove, Olya; Hawkins, Sam; Kim, Jongphil; Goldgof, Dmitry B; Hall, Lawrence O; Gatenby, Robert A; Gillies, Robert J
2014-01-01
We study the reproducibility of quantitative imaging features that are used to describe tumor shape, size, and texture from computed tomography (CT) scans of non-small cell lung cancer (NSCLC). CT images are dependent on various scanning factors. We focus on characterizing image features that are reproducible in the presence of variations due to patient factors and segmentation methods. Thirty-two NSCLC nonenhanced lung CT scans were obtained from the Reference Image Database to Evaluate Response data set. The tumors were segmented using both manual (radiologist expert) and ensemble (software-automated) methods. A set of features (219 three-dimensional and 110 two-dimensional) was computed, and quantitative image features were statistically filtered to identify a subset of reproducible and nonredundant features. The variability in the repeated experiment was measured by the test-retest concordance correlation coefficient (CCC_TreT). The natural range in the features, normalized to variance, was measured by the dynamic range (DR). In this study, there were 29 features across segmentation methods found with CCC_TreT and DR ≥ 0.9 and R²_Bet ≥ 0.95. These reproducible features were tested for predicting radiologist prognostic score; some texture features (run-length and Laws kernels) had an area under the curve of 0.9. The representative features were tested for their prognostic capabilities using an independent NSCLC data set (59 lung adenocarcinomas), where one of the texture features, run-length gray-level nonuniformity, was statistically significant in separating the samples into survival groups (P ≤ .046). PMID:24772210
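The test-retest concordance correlation coefficient mentioned in this abstract is commonly computed with Lin's formula. The sketch below assumes that standard definition with population (divide-by-n) moments; it is not the authors' code and the function name is hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two repeated measurements.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) ** 2 for a in x])
    var_y = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    denom = var_x + var_y + (mx - my) ** 2
    if denom == 0:
        raise ValueError("both inputs are constant and identical")
    return 2.0 * cov / denom
```

Unlike Pearson correlation, this coefficient penalizes a systematic shift between test and retest: a feature that is perfectly correlated but offset by a constant scores below 1.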
Density and nest survival of golden-cheeked warblers: Spatial scale matters
Jennifer L. Reidy; Frank R., III Thompson; Lisa O'Donnell
2017-01-01
Conservation and management plans often rely on indicators such as species occupancy or density to define habitat quality, ignoring factors that influence reproductive success, and potentially limiting conservation achievements. We examined relationships between predicted density and nest survival with environmental features at multiple spatial scales for the golden-...
Predicting axillary lymph node metastasis from kinetic statistics of DCE-MRI breast images
NASA Astrophysics Data System (ADS)
Ashraf, Ahmed B.; Lin, Lilie; Gavenonis, Sara C.; Mies, Carolyn; Xanthopoulos, Eric; Kontos, Despina
2012-03-01
The presence of axillary lymph node metastases is the most important prognostic factor in breast cancer and can influence the selection of adjuvant therapy, both chemotherapy and radiotherapy. In this work we present a set of kinetic statistics derived from DCE-MRI for predicting axillary node status. Breast DCE-MRI images from 69 women with known nodal status were analyzed retrospectively under HIPAA and IRB approval. Axillary lymph nodes were positive in 12 patients while 57 patients had no axillary lymph node involvement. Kinetic curves for each pixel were computed and a pixel-wise map of time-to-peak (TTP) was obtained. Pixels were first partitioned according to the similarity of their kinetic behavior, based on TTP values. For every kinetic curve, the following pixel-wise features were computed: peak enhancement (PE), wash-in-slope (WIS), wash-out-slope (WOS). Partition-wise statistics for every feature map were calculated, resulting in a total of 21 kinetic statistic features. ANOVA analysis was done to select features that differ significantly between node positive and node negative women. Using the computed kinetic statistic features a leave-one-out SVM classifier was learned that performs with AUC=0.77 under the ROC curve, outperforming the conventional kinetic measures, including maximum peak enhancement (MPE) and signal enhancement ratio (SER), (AUCs of 0.61 and 0.57 respectively). These findings suggest that our DCE-MRI kinetic statistic features can be used to improve the prediction of axillary node status in breast cancer patients. Such features could ultimately be used as imaging biomarkers to guide personalized treatment choices for women diagnosed with breast cancer.
Tamayo, Pablo; Cho, Yoon-Jae; Tsherniak, Aviad; Greulich, Heidi; Ambrogio, Lauren; Schouten-van Meeteren, Netteke; Zhou, Tianni; Buxton, Allen; Kool, Marcel; Meyerson, Matthew; Pomeroy, Scott L.; Mesirov, Jill P.
2011-01-01
Purpose Despite significant progress in the molecular understanding of medulloblastoma, stratification of risk in patients remains a challenge. Focus has shifted from clinical parameters to molecular markers, such as expression of specific genes and selected genomic abnormalities, to improve accuracy of treatment outcome prediction. Here, we show how integration of high-level clinical and genomic features or risk factors, including disease subtype, can yield more comprehensive, accurate, and biologically interpretable prediction models for relapse versus no-relapse classification. We also introduce a novel Bayesian nomogram indicating the amount of evidence that each feature contributes on a patient-by-patient basis. Patients and Methods A Bayesian cumulative log-odds model of outcome was developed from a training cohort of 96 children treated for medulloblastoma, starting with the evidence provided by clinical features of metastasis and histology (model A) and incrementally adding the evidence from gene-expression–derived features representing disease subtype–independent (model B) and disease subtype–dependent (model C) pathways, and finally high-level copy-number genomic abnormalities (model D). The models were validated on an independent test cohort (n = 78). Results On an independent multi-institutional test data set, models A to D attain an area under receiver operating characteristic (au-ROC) curve of 0.73 (95% CI, 0.60 to 0.84), 0.75 (95% CI, 0.64 to 0.86), 0.80 (95% CI, 0.70 to 0.90), and 0.78 (95% CI, 0.68 to 0.88), respectively, for predicting relapse versus no relapse. Conclusion The proposed models C and D outperform the current clinical classification schema (au-ROC, 0.68), our previously published eight-gene outcome signature (au-ROC, 0.71), and several new schemas recently proposed in the literature for medulloblastoma risk stratification. PMID:21357789
NASA Astrophysics Data System (ADS)
Vasu, Nikhil N.; Lee, Seung-Rae
2016-06-01
An ever-increasing trend of extreme rainfall events in South Korea owing to climate change is causing shallow landslides and debris flows in mountains that cover 70% of the total land area of the nation. These catastrophic, gravity-driven processes cost the government several billion KRW (South Korean Won) in losses in addition to fatalities every year. The most common type of landslide observed is the shallow landslide, which occurs at 1-3 m depth, and may mobilize into more catastrophic flow-type landslides. Hence, to predict potential landslide areas, susceptibility maps are developed in a geographical information system (GIS) environment utilizing available morphological, hydrological, geotechnical, and geological data. Landslide susceptibility models were developed using 163 landslide points and an equal number of nonlandslide points in Mt. Woomyeon, Seoul, and 23 landslide conditioning factors. However, because not all of the factors contribute to the determination of the spatial probability for landslide initiation, and a simple filter or wrapper-based approach is not efficient in identifying all of the relevant features, a feedback-loop-based hybrid algorithm was implemented in conjunction with a learning scheme called an extreme learning machine, which is based on a single-layer, feed-forward network. Validation of the constructed susceptibility model was conducted using a testing set of landslide inventory data through a prediction rate curve. The model selected 13 relevant conditioning factors out of the initial 23; and the resulting susceptibility map shows a success rate of 85% and a prediction rate of 89.45%, indicating a good performance, in contrast to the low success and prediction rate of 69.19% and 56.19%, respectively, as obtained using a wrapper technique.
Iwata, Hiroaki; Sawada, Ryusuke; Mizutani, Sayaka; Yamanishi, Yoshihiro
2015-02-23
Drug repositioning, or the application of known drugs to new indications, is a challenging issue in pharmaceutical science. In this study, we developed a new computational method to predict unknown drug indications for systematic drug repositioning in a framework of supervised network inference. We defined a descriptor for each drug-disease pair based on the phenotypic features of drugs (e.g., medicinal effects and side effects) and various molecular features of diseases (e.g., disease-causing genes, diagnostic markers, disease-related pathways, and environmental factors) and constructed a statistical model to predict new drug-disease associations for a wide range of diseases in the International Classification of Diseases. Our results show that the proposed method outperforms previous methods in terms of accuracy and applicability, and its performance does not depend on drug chemical structure similarity. Finally, we performed a comprehensive prediction of a drug-disease association network consisting of 2349 drugs and 858 diseases and described biologically meaningful examples of newly predicted drug indications for several types of cancers and nonhereditary diseases.
Similarity-based Regularized Latent Feature Model for Link Prediction in Bipartite Networks.
Wang, Wenjun; Chen, Xue; Jiao, Pengfei; Jin, Di
2017-12-05
Link prediction is an attractive research topic in the field of data mining and has significant applications in improving the performance of recommendation systems and exploring the evolving mechanisms of complex networks. A variety of complex systems in the real world can be abstractly represented as bipartite networks, in which there are two types of nodes and no links connect nodes of the same type. In this paper, we propose a framework for link prediction in bipartite networks by combining the similarity-based structure and the latent feature model from a new perspective. The framework is called Similarity Regularized Nonnegative Matrix Factorization (SRNMF), which explicitly takes the local characteristics into consideration and encodes the geometrical information of the networks by constructing a similarity-based matrix. We also develop an iterative scheme to solve the objective function based on gradient descent. Extensive experiments on a variety of real-world bipartite networks show that the proposed link prediction framework performs more competitively and more stably than state-of-the-art methods.
Modeling first impressions from highly variable facial images
Vernon, Richard J. W.; Sutherland, Clare A. M.; Young, Andrew W.; Hartley, Tom
2014-01-01
First impressions of social traits, such as trustworthiness or dominance, are reliably perceived in faces, and despite their questionable validity they can have considerable real-world consequences. We sought to uncover the information driving such judgments, using an attribute-based approach. Attributes (physical facial features) were objectively measured from feature positions and colors in a database of highly variable “ambient” face photographs, and then used as input for a neural network to model factor dimensions (approachability, youthful-attractiveness, and dominance) thought to underlie social attributions. A linear model based on this approach was able to account for 58% of the variance in raters’ impressions of previously unseen faces, and factor-attribute correlations could be used to rank attributes by their importance to each factor. Reversing this process, neural networks were then used to predict facial attributes and corresponding image properties from specific combinations of factor scores. In this way, the factors driving social trait impressions could be visualized as a series of computer-generated cartoon face-like images, depicting how attributes change along each dimension. This study shows that despite enormous variation in ambient images of faces, a substantial proportion of the variance in first impressions can be accounted for through linear changes in objectively defined features. PMID:25071197
Bi, Qiu; Xiao, Zhibo; Lv, Fajin; Liu, Yao; Zou, Chunxia; Shen, Yiqing
2018-02-05
The objective of this study was to find clinical parameters and qualitative and quantitative magnetic resonance imaging (MRI) features for differentiating uterine sarcoma from atypical leiomyoma (ALM) preoperatively and to calculate predictive values for uterine sarcoma. Data from 60 patients with uterine sarcoma and 88 patients with ALM confirmed by surgery and pathology were collected. Clinical parameters, qualitative MRI features, diffusion-weighted imaging with apparent diffusion coefficient values, and quantitative parameters of dynamic contrast-enhanced MRI of these two tumor types were compared. Predictive values for uterine sarcoma were calculated using multivariable logistic regression. Patient clinical manifestations, tumor locations, margins, T2-weighted imaging signals, mean apparent diffusion coefficient values, minimum apparent diffusion coefficient values, and time-signal intensity curves of solid tumor components were significant parameters for distinguishing between uterine sarcoma and ALM (all P <.001). Abnormal vaginal bleeding, tumors located mainly in the uterine cavity, ill-defined tumor margins, and mean apparent diffusion coefficient values of <1.272 × 10⁻³ mm²/s were significant preoperative predictors of uterine sarcoma. When the overall scores of these four predictors were greater than or equal to 7 points, the sensitivity, the specificity, the accuracy, and the positive and negative predictive values were 88.9%, 99.9%, 95.7%, 97.0%, and 95.1%, respectively. The use of clinical parameters and multiparametric MRI as predictive factors was beneficial for diagnosing uterine sarcoma preoperatively. These findings could be helpful for guiding treatment decisions. Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.
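The sensitivity, specificity, accuracy, and predictive values quoted above all derive from the four cells of a 2×2 confusion matrix. The following is a minimal sketch of those definitions; the function name and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic-accuracy measures from confusion-matrix counts.

    tp/fn: diseased cases scored positive/negative;
    fp/tn: non-diseased cases scored positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),              # true-positive rate
        "specificity": tn / (tn + fp),              # true-negative rate
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
    }


# Hypothetical counts for illustration only (NOT taken from the abstract).
metrics = diagnostic_metrics(tp=40, fp=2, fn=10, tn=48)
```

Note that the predictive values, unlike sensitivity and specificity, depend on the proportion of diseased cases in the sample.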
Runoff as a factor in USLE/RUSLE technology
NASA Astrophysics Data System (ADS)
Kinnell, Peter
2014-05-01
Modelling erosion for prediction purposes started with the development of the Universal Soil Loss Equation (USLE), the focus of which was the prediction of long-term (~20 year) average annual soil loss from field-sized areas. That purpose has been maintained in the subsequent revision, RUSLE, the most widely used erosion prediction model in the world. The inability to predict short-term soil loss led to the development of so-called process-based models like WEPP and EUROSEM, which focussed on predicting event erosion but failed to improve the prediction of long-term erosion, where the RUSLE worked well. One of the features of erosion recognised in the so-called process-based models is that runoff is a primary factor in rainfall erosion, and some proposed modifications of the USLE/RUSLE model have included runoff as an independent factor in determining event erosivity. However, these models have ignored fundamental mathematical rules. The USLE-M, which replaces the EI30 index by the product of the runoff ratio and EI30, was developed from the concept that soil loss is the product of runoff and sediment concentration, and it operates in a way that obeys the mathematical rules upon which the USLE/RUSLE model was based. It accounts for event soil loss better than the EI30 index where runoff values are known or predicted adequately. RUSLE2 now includes a capacity to model runoff-driven erosion.
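The USLE-M event erosivity index described above (runoff ratio multiplied by EI30) can be sketched as follows. This is an illustration under stated assumptions: the runoff ratio is taken to be event runoff depth divided by event rainfall depth, and the function names, units, and example values are hypothetical.

```python
def runoff_ratio(runoff_mm, rainfall_mm):
    """Fraction of event rainfall that leaves the plot as runoff (assumed definition)."""
    if rainfall_mm <= 0:
        raise ValueError("rainfall_mm must be positive")
    return runoff_mm / rainfall_mm


def usle_m_erosivity(runoff_mm, rainfall_mm, ei30):
    """Event erosivity index of the USLE-M form: runoff ratio times EI30."""
    return runoff_ratio(runoff_mm, rainfall_mm) * ei30


# Hypothetical event: 40 mm of rain, 10 mm of runoff, EI30 of 200 (illustrative units).
event_index = usle_m_erosivity(runoff_mm=10.0, rainfall_mm=40.0, ei30=200.0)
```

With no runoff the index is zero regardless of EI30, which reflects the abstract's point that runoff is a primary factor in rainfall erosion.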
ERIC Educational Resources Information Center
Haskett, Mary E.; Allaire, Jason C.; Kreig, Shawn; Hart, Kendrea C.
2008-01-01
Objective: Although social maladjustment appears to be common among abused children, negative outcomes are not inevitable. This investigation was designed to determine whether ethnicity and features of the parenting context predicted children's social adjustment, and whether the strength and direction of these relations differed for abused and…
Limb-Enhancer Genie: An accessible resource of accurate enhancer predictions in the developing limb
Monti, Remo; Barozzi, Iros; Osterwalder, Marco; ...
2017-08-21
Epigenomic mapping of enhancer-associated chromatin modifications facilitates the genome-wide discovery of tissue-specific enhancers in vivo. However, reliance on single chromatin marks leads to high rates of false-positive predictions. More sophisticated, integrative methods have been described, but commonly suffer from limited accessibility to the resulting predictions and reduced biological interpretability. Here we present the Limb-Enhancer Genie (LEG), a collection of highly accurate, genome-wide predictions of enhancers in the developing limb, available through a user-friendly online interface. We predict limb enhancers using a combination of > 50 published limb-specific datasets and clusters of evolutionarily conserved transcription factor binding sites, taking advantage of the patterns observed at previously in vivo validated elements. By combining different statistical models, our approach outperforms current state-of-the-art methods and provides interpretable measures of feature importance. Our results indicate that including a previously unappreciated score that quantifies tissue-specific nuclease accessibility significantly improves prediction performance. We demonstrate the utility of our approach through in vivo validation of newly predicted elements. Moreover, we describe general features that can guide the type of datasets to include when predicting tissue-specific enhancers genome-wide, while providing an accessible resource to the general biological community and facilitating the functional interpretation of genetic studies of limb malformations.
Dimensions and categories: the "big five" factors and the DSM personality disorders.
Morey, L C; Gunderson, J; Quigley, B D; Lyons, M
2000-09-01
The five-factor model of personality, which has been widely studied in personality psychology, has been hypothesized to have specific relevance for DSM-defined personality disorders. To evaluate hypothesized relationships of the five-factor model of personality to personality disorders, 144 patients with personality disorders (diagnosed via a structured interview) completed an inventory to assess the five-factor model. Results indicated that the majority of the personality disorders can be differentiated in theoretically predictable ways using the five-factor model of personality. However, while the personality disorders as a whole appear to be differentiable from normal personality functioning on the five factors, the patterns are quite similar across the disorders, a finding that may provide some insight into the general nature of personality pathology but may also suggest problems with discriminant validity. Third, it does not appear that considering disorders as special combinations of features (as might be expected in some categorical models) is more informative than considering them as the sum of certain features (as might be expected in a dimensional model).
[Asymmetric confusability effect in recognition memory of cats pictures].
Ando, M; Hakoda, Y
1999-06-01
Performance superiority of the addition of features in the stimuli over the deletion on recognition (asymmetric confusability effect) has been shown in previous studies (Pezdek, Maki, Valencia-Laver, Whetstone, Stoeckert, & Dougherty, 1988; Ando & Hakoda, 1998). We investigated the same effect by using a familiar living thing (cat) as a stimulus. Ten subjects were given a recognition task using pictures of cats with feature changes (additions, deletions, or no change). Results showed that the pictures with deletions were easier to recognize than those with additions, which was the opposite of the previous studies. Then, we examined the possibility that performance superiority of the deletions over the additions was mediated by the factor of impression. Another group of 18 subjects was asked to rate the impression scales consisting of a "typicality-reality factor", a "stability-balance factor", and a "grotesque-disgust factor". Results showed that there was a significant difference in impression ratings for each factor between the additions and the deletions, and that impression ratings predicted recognition performance well. It was concluded that performance superiority of the deletions over the additions was mediated by the factor of impression.
Lai, Hongmin; Su, Chiu-Wen; Yen, Amy Ming-Fang; Chiu, Sherry Yueh-Hsia; Fann, Jean Ching-Yuan; Wu, Wendy Yi-Ying; Chuang, Shu-Lin; Liu, Hsing-Chih; Chen, Hsiu-Hsi; Chen, Li-Sheng
2015-05-01
The aim of this study was to predict periodontal disease (PD) with demographical features, oral health behaviour, and clinical correlates based on a national survey of periodontal disease in Taiwan. A total of 4061 subjects who were enrolled in a cross-sectional nationwide survey on periodontal conditions of residents aged 18 years or older in Taiwan between 2007 and 2008 were included. The community periodontal index (CPI) was used to measure the periodontal status at the subject and sextant levels. Information on demographical features and other relevant predictive factors for PD was collected using a questionnaire. In our study population, 56.2% of subjects had CPI grades ≥3. Periodontitis, as defined by CPI ≥3, was best predicted by a model including age, gender, education, brushing frequency, mobile teeth, gingival bleeding, smoking, and BMI. The area under the curve (AUC) for the final prediction model was 0.712 (0.690-0.734). The AUC was 0.702 (0.665-0.740) according to cross-validation. A prediction model for PD using information obtained from questionnaires was developed. The feasibility of its application to risk stratification of PD should be considered with regard to community-based screening for asymptomatic PD. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Ni, Qianwu; Chen, Lei
2017-01-01
Correct prediction of protein structural class is beneficial to the investigation of protein functions, regulation and interactions. In recent years, several computational methods have been proposed in this regard. However, given the variety of available features, it remains a great challenge to select a proper classification algorithm and to extract the essential features for classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. Amino acid compositions and physicochemical features were adopted to represent proteins, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features from the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The classes predicted by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. Algorithms were then taken from this list one by one to build ensemble prediction models. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to models that adopt only the feature selection procedure or only the algorithm selection procedure. Both the feature selection procedure and the algorithm selection procedure are helpful for building an ensemble prediction model with better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
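The step of building feature sets by adding ranked features one at a time, then keeping the best-scoring set, can be sketched as below. This is only an illustration of the incremental idea: the ranking is assumed to come from some external method such as mRMR, and the function names and the toy scoring function are hypothetical stand-ins for a real classifier evaluation.

```python
def nested_feature_sets(ranked_features):
    """Return the prefixes of a ranked feature list: top-1, top-2, ..., top-n."""
    return [ranked_features[:k] for k in range(1, len(ranked_features) + 1)]


def select_best_feature_set(ranked_features, score):
    """Evaluate every prefix with `score` and return the highest-scoring one.

    `score` is any callable mapping a feature list to a number, for example
    the cross-validated accuracy of a classifier trained on those features.
    On ties, the smallest prefix wins because max() keeps the first maximum.
    """
    return max(nested_feature_sets(ranked_features), key=score)


# Toy example: a made-up score that rewards two useful features and
# slightly penalises every additional feature.
ranked = ["f_a", "f_b", "f_c", "f_d"]
useful = {"f_a", "f_b"}


def toy_score(features):
    return len(useful.intersection(features)) - 0.1 * len(features)


best = select_best_feature_set(ranked, toy_score)
```

The same loop applies unchanged to a ranked algorithm list when building the ensemble, with `score` measuring ensemble performance instead.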
ERIC Educational Resources Information Center
Sweller, Naomi
2015-01-01
Individuals with autism have difficulty generalising information from one situation to another, a process that requires the learning of categories and concepts. Category information may be learned through: (1) classifying items into categories, or (2) predicting missing features of category items. Predicting missing features has to this point been…
Little, Paul; Hobbs, FD Richard; Mant, David; McNulty, Cliodna AM; Mullee, Mark
2012-01-01
Background Management of pharyngitis is commonly based on features which are thought to be associated with Lancefield group A beta-haemolytic streptococci (GABHS) but it is debatable which features best predict GABHS. Non-group A strains share major virulence factors with group A, but it is unclear how commonly they present and whether their presentation differs. Aim To assess the incidence and clinical variables associated with streptococcal infections. Design and setting Prospective diagnostic cohort study in UK primary care. Method The presence of pathogenic streptococci from throat swabs was assessed among patients aged ≥5 years presenting with acute sore throat. Results Pathogenic streptococci were found in 204/597 patients (34%, 95% CI = 31 to 38%): 33% (68/204) were non-group A streptococci, mostly C (n = 29), G (n = 18) and B (n = 17); rarely D (n = 3) and Streptococcus pneumoniae (n = 1). Patients presented with similar features whether the streptococci were group A or non-group A. The features best predicting A, C or G beta-haemolytic streptococci were patient’s assessment of severity (odds ratio [OR] for a bad sore throat 3.31, 95% CI = 1.24 to 8.83); doctors’ assessment of severity (severely inflamed tonsils OR 2.28, 95% CI = 1.39 to 3.74); absence of a bad cough (OR 2.73, 95% CI = 1.56 to 4.76), absence of a coryza (OR 1.54, 95% CI = 0.99 to 2.41); and moderately bad or worse muscle aches (OR 2.20, 95% CI = 1.41 to 3.42). Conclusion Non-group A strains commonly cause streptococcal sore throats, and present with similar symptomatic clinical features to group A streptococci. The best features to predict streptococcal sore throat presenting in primary care deserve revisiting. PMID:23211183
Little, Paul; Hobbs, F D Richard; Mant, David; McNulty, Cliodna A M; Mullee, Mark
2012-11-01
Management of pharyngitis is commonly based on features which are thought to be associated with Lancefield group A beta-haemolytic streptococci (GABHS) but it is debatable which features best predict GABHS. Non-group A strains share major virulence factors with group A, but it is unclear how commonly they present and whether their presentation differs. To assess the incidence and clinical variables associated with streptococcal infections. Prospective diagnostic cohort study in UK primary care. The presence of pathogenic streptococci from throat swabs was assessed among patients aged ≥5 years presenting with acute sore throat. Pathogenic streptococci were found in 204/597 patients (34%, 95% CI = 31 to 38%): 33% (68/204) were non-group A streptococci, mostly C (n = 29), G (n = 18) and B (n = 17); rarely D (n = 3) and Streptococcus pneumoniae (n = 1). Patients presented with similar features whether the streptococci were group A or non-group A. The features best predicting A, C or G beta-haemolytic streptococci were patient's assessment of severity (odds ratio [OR] for a bad sore throat 3.31, 95% CI = 1.24 to 8.83); doctors' assessment of severity (severely inflamed tonsils OR 2.28, 95% CI = 1.39 to 3.74); absence of a bad cough (OR 2.73, 95% CI = 1.56 to 4.76), absence of a coryza (OR 1.54, 95% CI = 0.99 to 2.41); and moderately bad or worse muscle aches (OR 2.20, 95% CI = 1.41 to 3.42). Non-group A strains commonly cause streptococcal sore throats, and present with similar symptomatic clinical features to group A streptococci. The best features to predict streptococcal sore throat presenting in primary care deserve revisiting.
Shen, Wei-Chih; Chen, Shang-Wen; Liang, Ji-An; Hsieh, Te-Chun; Yen, Kuo-Yang; Kao, Chia-Hung
2017-09-01
In this study, we investigated the correlation between the lymph node (LN) status or histological types and textural features of cervical cancers on 18F-fluorodeoxyglucose positron emission tomography/computed tomography. We retrospectively reviewed the imaging records of 170 patients with International Federation of Gynecology and Obstetrics stage IB-IVA cervical cancer. Four groups of textural features were studied in addition to the maximum standardized uptake value (SUVmax), metabolic tumor volume, and total lesion glycolysis (TLG). Moreover, we studied the associations between the indices and clinical parameters, including the LN status, clinical stage, and histology. Receiver operating characteristic curves were constructed to evaluate the optimal predictive performance among the various textural indices. Quantitative differences were determined using the Mann-Whitney U test. Multivariate logistic regression analysis was performed to determine the independent factors, among all the variables, for predicting LN metastasis. Among all the significant indices related to pelvic LN metastasis, homogeneity derived from the gray-level co-occurrence matrix (GLCM) was the sole independent predictor. By combining SUVmax, the risk of pelvic LN metastasis can be scored accordingly. The TLG mean was the independent feature of positive para-aortic LNs. Quantitative differences between squamous and nonsquamous histology can be determined using short-zone emphasis (SZE) from the gray-level size zone matrix (GLSZM). This study revealed that in patients with cervical cancer, pelvic or para-aortic LN metastases can be predicted by using textural feature of homogeneity from the GLCM and TLG mean, respectively. SZE from the GLSZM is the sole feature associated with quantitative differences between squamous and nonsquamous histology.
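For readers unfamiliar with GLCM homogeneity, the sketch below computes it for a tiny 2D image of integer gray levels. It rests on assumptions not stated in the abstract: a single pixel offset, a symmetric normalised matrix, and the 1 / (1 + |i - j|) weighting, which is one common definition (other software weights by the squared difference). The study's actual 3D implementation and discretisation are not described here.

```python
def glcm(image, dx=1, dy=0, levels=None):
    """Normalised, symmetric gray-level co-occurrence matrix for one offset."""
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1   # count the pair in both directions
                counts[j][i] += 1   # so that the matrix is symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def homogeneity(p):
    """GLCM homogeneity: co-occurrence mass weighted towards the diagonal."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))


uniform = [[1, 1], [1, 1]]         # every neighbour pair has equal gray levels
checkerboard = [[0, 1], [1, 0]]    # every horizontal pair differs by one level
h_uniform = homogeneity(glcm(uniform))
h_checker = homogeneity(glcm(checkerboard))
```

A perfectly uniform patch gives the maximum value of 1, and the value falls as neighbouring gray levels diverge.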
Associations between park features and adolescent park use for physical activity.
Edwards, Nicole; Hooper, Paula; Knuiman, Matthew; Foster, Sarah; Giles-Corti, Billie
2015-02-18
Eighty per cent of adolescents globally do insufficient physical activity. Parks are a popular place for adolescents to be active. However, little is known about which park features are associated with higher levels of park use by adolescents. This study aimed to examine which environmental park features, and combinations of features, were correlated with higher levels of park use for physical activity among adolescents. By examining park features in parks used by adolescents for physical activity, this study also aimed to create a park 'attractiveness' score predictive of adolescent park use, and to identify factors that might predict use of their closest park. Adolescents (n = 1304) living in Geraldton, a large rural centre of Western Australia, completed a survey that measured physical activity behaviour, perceptions of park availability and the main park used for physical activity. All parks in the study area (n = 58) were digitized using a Geographic Information System (GIS) and features audited using the Public Open Space Desktop Auditing Tool (POSDAT). Only 27% of participants reported using their closest park for physical activity. Park use was associated with seven features: presence of a skate park, walking paths, barbeques, picnic tables, public access toilets, lighting around courts and equipment, and number of trees >25. When these were combined to create an overall attractiveness score, every additional 'attractive' feature present resulted in a park being nearly three times more likely to be in the high use category. To increase park use for physical activity, urban planners and designers should incorporate park features attractive to adolescents.
Finding New Perovskite Halides via Machine learning
NASA Astrophysics Data System (ADS)
Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab
2016-04-01
Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.
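The two geometric descriptors named above have standard closed forms: the Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) (r_B + r_X)) and the octahedral factor mu = r_B / r_X. A minimal sketch follows; the function names are hypothetical and the radii in the example are arbitrary illustrative numbers, not values from the paper's dataset.

```python
import math


def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor for an ABX3 composition (radii in any common unit)."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def octahedral_factor(r_b, r_x):
    """Ratio of B-cation to X-anion radius, describing the fit of B in the BX6 octahedron."""
    return r_b / r_x


# Arbitrary illustrative radii: choose r_a so that the ideal condition
# r_a + r_x = sqrt(2) * (r_b + r_x) holds exactly, giving t = 1.
r_b, r_x = 1.0, 2.0
r_a_ideal = math.sqrt(2) * (r_b + r_x) - r_x
t_ideal = tolerance_factor(r_a_ideal, r_b, r_x)
mu = octahedral_factor(r_b, r_x)
```

In a classifier such as the SVM described above, these two numbers would be computed per composition from tabulated ionic radii and supplied as input features alongside the radii themselves.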
Nie, Zhi; Vairavan, Srinivasan; Narayan, Vaibhav A; Ye, Jieping; Li, Qingqin S
2018-01-01
Identification of risk factors of treatment resistance may be useful to guide treatment selection, avoid inefficient trial-and-error, and improve major depressive disorder (MDD) care. We extended the work in predictive modeling of treatment resistant depression (TRD) via partition of the data from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) cohort into a training and a testing dataset. We also included data from a small yet completely independent cohort, RIS-INT-93, as an external test dataset. We used features from enrollment and level 1 treatment (up to week 2 response only) of STAR*D to explore the feature space comprehensively and applied machine learning methods to model TRD outcome at level 2. For TRD defined using QIDS-C16 remission criteria, multiple machine learning models were internally cross-validated in the STAR*D training dataset and externally validated in both the STAR*D testing dataset and the RIS-INT-93 independent dataset, with an area under the receiver operating characteristic curve (AUC) of 0.70-0.78 and 0.72-0.77, respectively. The upper bound for the AUC achievable with the full set of features could be as high as 0.78 in the STAR*D testing dataset. A model developed using the top 30 features identified by a feature selection technique (k-means clustering followed by χ2 test) achieved an AUC of 0.77 in the STAR*D testing dataset. In addition, the model developed using overlapping features between STAR*D and RIS-INT-93 achieved an AUC of > 0.70 in both the STAR*D testing and RIS-INT-93 datasets. Among all the features explored in the STAR*D and RIS-INT-93 datasets, the most important feature was early or initial treatment response or symptom severity at week 2. These results indicate that prediction of TRD prior to undergoing a second round of antidepressant treatment could be feasible even in the absence of biomarker data.
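The AUC values reported above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes the AUC directly from that pairwise definition; the function name and the toy labels and scores are hypothetical, and a production analysis would normally use an established statistics library.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes; scores: matching model outputs.
    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch but slow for large cohorts.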
Fatigue life and crack growth prediction methodology
NASA Technical Reports Server (NTRS)
Newman, J. C., Jr.; Phillips, E. P.; Everett, R. A., Jr.
1993-01-01
The capabilities of a plasticity-induced crack-closure model and life-prediction code to predict fatigue crack growth and fatigue lives of metallic materials are reviewed. Crack-tip constraint factors, to account for three-dimensional effects, were selected to correlate large-crack growth rate data as a function of the effective-stress-intensity factor range (delta(K(sub eff))) under constant-amplitude loading. Some modifications to the delta(K(sub eff))-rate relations were needed in the near threshold regime to fit small-crack growth rate behavior and endurance limits. The model was then used to calculate small- and large-crack growth rates, and in some cases total fatigue lives, for several aluminum and titanium alloys under constant-amplitude, variable-amplitude, and spectrum loading. Fatigue lives were calculated using the crack growth relations and microstructural features like those that initiated cracks. Results from the tests and analyses agreed well.
A framework for feature extraction from hospital medical data with applications in risk prediction.
Tran, Truyen; Luo, Wei; Phung, Dinh; Gupta, Sunil; Rana, Santu; Kennedy, Richard Lee; Larkins, Ann; Venkatesh, Svetha
2014-12-30
Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform to automatically extract features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type or risk prediction task. We contrast auto-extracted features with baselines generated from the Elixhauser comorbidities. Hospital medical records were transformed to event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task, comparing with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6, and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods. For unplanned readmissions, the auto-extracted feature set using socio-demographic information and medical records outperformed baselines derived from the socio-demographic information and Elixhauser comorbidities over 20 settings (5 prediction horizons over 4 diseases). In particular, for 30-day prediction, the AUCs are: COPD-baseline: 0.60 (95% CI: 0.57, 0.63), auto-extracted: 0.67 (0.64, 0.70); diabetes-baseline: 0.60 (0.58, 0.63), auto-extracted: 0.67 (0.64, 0.69); mental disorders-baseline: 0.57 (0.54, 0.60), auto-extracted: 0.69 (0.64, 0.70); pneumonia-baseline: 0.61 (0.59, 0.63), auto-extracted: 0.70 (0.67, 0.72). The advantages of auto-extracted standard features from complex medical records, in a disease- and task-agnostic manner, were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have the potential to form the foundation of complex automated analytic tasks.
A digital prediction algorithm for a single-phase boost PFC
NASA Astrophysics Data System (ADS)
Qing, Wang; Ning, Chen; Weifeng, Sun; Shengli, Lu; Longxing, Shi
2012-12-01
A novel digital control algorithm for power factor correction (PFC), called the prediction algorithm, is presented. It offers a higher power factor (PF) with lower total harmonic distortion, and a faster dynamic response to changes in the input voltage or load current. For a given system, based on the current system state parameters, the prediction algorithm can estimate the track of the output voltage and the inductor current at the next switching cycle and obtain a set of optimized control sequences to perfectly track the trajectory of the input voltage. The proposed prediction algorithm is verified under different conditions, and computer simulation and experimental results in multiple situations confirm its effectiveness. When the input voltage is in the range of 90-265 V and the load current is in the range of 20%-100%, the PF value is larger than 0.998. The startup and recovery times are about 0.1 s and 0.02 s, respectively, without overshoot. The experimental results also verify the validity of the proposed method.
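The link between power factor and total harmonic distortion mentioned above can be made concrete with the textbook relation PF = cos(phi) / sqrt(1 + THD^2), which holds when the supply voltage is sinusoidal and only the current is distorted. The sketch below applies it; the function name and the example THD value are illustrative assumptions, since the abstract does not report a THD figure.

```python
import math


def power_factor(displacement_angle_rad, current_thd):
    """True power factor for a sinusoidal voltage and a distorted current.

    displacement_angle_rad: phase angle between voltage and the fundamental
        component of the current.
    current_thd: total harmonic distortion of the current as a fraction
        (0.05 means 5%).
    """
    displacement_factor = math.cos(displacement_angle_rad)
    distortion_factor = 1.0 / math.sqrt(1.0 + current_thd ** 2)
    return displacement_factor * distortion_factor


# Illustrative case: in-phase fundamental current with 5% THD.
pf_example = power_factor(0.0, 0.05)
```

Under these assumptions a PF above 0.998 requires both a near-zero displacement angle and a current THD of only a few percent.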
Prediction of road traffic death rate using neural networks optimised by genetic algorithm.
Jafari, Seyed Ali; Jahandideh, Sepideh; Jahandideh, Mina; Asadabadi, Ebrahim Barzegari
2015-01-01
Road traffic injuries (RTIs) are recognised as a main cause of public health problems at global, regional and national levels. Therefore, prediction of the road traffic death rate will be helpful in its management. On this basis, we used an artificial neural network model optimised through a genetic algorithm to predict mortality. In this study, a five-fold cross-validation procedure on a data set containing a total of 178 countries was used to verify the performance of the models. The best-fit model was selected according to the root mean square error (RMSE). The genetic algorithm, a powerful technique which has not been applied to the prediction of mortality to this extent in previous studies, showed high performance. The lowest RMSE obtained was 0.0808. Such satisfactory results could be attributed to the use of the genetic algorithm as a powerful optimiser which selects the best input feature set to be fed into the neural networks. Seven factors were identified with high accuracy as the most influential on the road traffic mortality rate. The results showed that our model is very promising and may play a useful role in developing a better method for assessing the influence of road traffic mortality risk factors.
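The evaluation protocol described above, five-fold cross-validation scored by RMSE, can be sketched in a few lines. This covers only the splitting and the error metric, not the neural network or the genetic algorithm; the function names are hypothetical and the interleaved fold assignment is an assumption, since the abstract does not say how countries were allocated to folds.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are assigned to folds in an interleaved fashion (0, k, 2k, ...),
    so fold sizes differ by at most one.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# 178 samples, matching the number of countries in the abstract.
splits = list(k_fold_indices(178, k=5))
```

In practice a model would be fitted on each training split, `rmse` computed on the matching test split, and the five values averaged to compare candidate models.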
Cui, Zaixu; Gong, Gaolang
2018-06-02
Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. 
The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.
Liu, Liang; Cai, Yudong; Lu, Wencong; Feng, Kaiyan; Peng, Chunrong; Niu, Bing
2009-03-06
Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection framework, this paper presents a computational system to predict protein-protein interactions (PPIs) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poorly performing and/or redundant features, resulting in 103 remaining features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall prediction accuracy of 76.18%, evaluated by a 10-fold cross-validation test, which is 1.46% higher than using the initial 114 features and 6.51% higher than using the 20 features coded by amino acid compositions. The PPI predictor developed for this research is available for public use at http://chemdata.shu.edu.cn/ppi.
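The k-nearest neighbors classifier at the core of the system above can be illustrated with a short, generic sketch. It assumes Euclidean distance and majority voting, which the abstract does not specify, and the function name and two-dimensional toy points are hypothetical stand-ins for the 103-dimensional PseAA feature vectors.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: label 1 stands for "interacting pair", 0 for "non-interacting".
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
near_origin = knn_predict(train_x, train_y, (0.2, 0.1), k=3)
near_cluster = knn_predict(train_x, train_y, (5.5, 5.5), k=3)
```

A wrapper-style feature selection would call such a classifier repeatedly inside cross-validation, keeping a candidate feature only if accuracy does not drop.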
ERIC Educational Resources Information Center
Bülbül, Ayse Eliüsük; Arslan, Coskun
2017-01-01
The main objective of this study was to determine the relationship between self-determination, self-compassion and the five-factor personality traits of university students. Moreover it was aimed to determine whether self-compassion, self-determination and personality traits predict patience levels at a meaningful level. The sample population of…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fried, David V.; Graduate School of Biomedical Sciences, The University of Texas Health Science Center at Houston, Houston, Texas; Tucker, Susan L.
2014-11-15
Purpose: To determine whether pretreatment CT texture features can improve patient risk stratification beyond conventional prognostic factors (CPFs) in stage III non-small cell lung cancer (NSCLC). Methods and Materials: We retrospectively reviewed 91 cases with stage III NSCLC treated with definitive chemoradiation therapy. All patients underwent pretreatment diagnostic contrast enhanced computed tomography (CE-CT) followed by 4-dimensional CT (4D-CT) for treatment simulation. We used the average-CT and expiratory (T50-CT) images from the 4D-CT along with the CE-CT for texture extraction. Histogram, gradient, co-occurrence, gray tone difference, and filtration-based techniques were used for texture feature extraction. Penalized Cox regression implementing cross-validation was used for covariate selection and modeling. Models incorporating texture features from the 33 image types and CPFs were compared to those with models incorporating CPFs alone for overall survival (OS), local-regional control (LRC), and freedom from distant metastases (FFDM). Predictive Kaplan-Meier curves were generated using leave-one-out cross-validation. Patients were stratified based on whether their predicted outcome was above or below the median. Reproducibility of texture features was evaluated using test-retest scans from independent patients and quantified using concordance correlation coefficients (CCC). We compared models incorporating the reproducibility seen on test-retest scans to our original models and determined the classification reproducibility. Results: Models incorporating both texture features and CPFs demonstrated a significant improvement in risk stratification compared to models using CPFs alone for OS (P=.046), LRC (P=.01), and FFDM (P=.005). The average CCCs were 0.89, 0.91, and 0.67 for texture features extracted from the average-CT, T50-CT, and CE-CT, respectively.
Incorporating reproducibility within our models yielded 80.4% (±3.7% SD), 78.3% (±4.0% SD), and 78.8% (±3.9% SD) classification reproducibility in terms of OS, LRC, and FFDM, respectively. Conclusions: Pretreatment tumor texture may provide prognostic information beyond that obtained from CPFs. Models incorporating feature reproducibility achieved classification rates of ∼80%. External validation would be required to establish texture as a prognostic factor.
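The concordance correlation coefficient used above to quantify test-retest reproducibility can be computed as in the sketch below. It assumes Lin's standard definition with population (1/n) moments; the abstract does not say which variant the authors used, and the function name and sample values are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired sequences of length >= 2")
    mx, my = fmean(x), fmean(y)
    # Population (1/n) variances and covariance, an assumed convention.
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    # The squared mean difference penalizes systematic shifts between scans.
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical texture-feature values from a test scan and a retest scan.
test_scan = [1.0, 2.0, 3.0]
retest_scan = [2.0, 3.0, 4.0]
ccc = concordance_correlation(test_scan, retest_scan)
print(f"CCC = {ccc:.4f}")
```

Unlike the Pearson correlation, which would be 1 for these two perfectly parallel series, the CCC drops below 1 because the retest values are uniformly shifted. The function raises `ZeroDivisionError` if both series are constant and equal.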
Ozkiris, Ayse; Essizoglu, Altan; Gulec, Gulcan; Aksaray, Gokay
2015-04-01
The aim of this study is firstly to compare the obsessive-compulsive disorder (OCD) patients with good insight and OCD patients with poor insight in terms of socio-demographic and clinical features; to investigate the relation between insight and the level of the expressed emotion (EE) in the patients; and lastly to specify the factors that predict level of insight. OCD patients with good insight and patients with poor insight were compared in terms of clinical features and the perceived EE level of the patients and the individuals that they live with in order to specify the factors that predict the insight level, and to investigate the relationship between insight level and EE. It was found that the total Expressed Emotion Scale, total Level of Expressed Emotion (LEE), LEE-Emotional Response and LEE-Tolerance/Expectation subscale scores of the group comprised of patients with poor insight are higher than the other group. The results also show that the duration of illness and Yale-Brown Obsessive Compulsive Scale (Y-BOCS) total score predict insight level. This study shows that the level of EE perceived by the patients with poor insight and the person that he/she lives with, is higher than the group with good insight. The studies that investigate the relationship between the factors of insight level and EE level, which are indicated to determine the level of the illness severity and its chronicity, will enable the researchers to understand the importance of the role of the family on the treatment processes of OCD.
Critical Features of Fragment Libraries for Protein Structure Prediction
dos Santos, Karina Baptista
2017-01-01
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928
Critical Features of Fragment Libraries for Protein Structure Prediction.
Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel
2017-01-01
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.
Hettige, Nuwan C; Nguyen, Thai Binh; Yuan, Chen; Rajakulendran, Thanara; Baddour, Jermeen; Bhagwat, Nikhil; Bani-Fatemi, Ali; Voineskos, Aristotle N; Mallar Chakravarty, M; De Luca, Vincenzo
2017-07-01
Suicide is a major concern for those afflicted by schizophrenia. Identifying patients at the highest risk for future suicide attempts remains a complex problem for psychiatric interventions. Machine learning models allow for the integration of many risk factors in order to build an algorithm that predicts which patients are likely to attempt suicide. Currently it is unclear how to integrate previously identified risk factors into a clinically relevant predictive tool to estimate the probability of a patient with schizophrenia attempting suicide. We conducted a cross-sectional assessment on a sample of 345 participants diagnosed with schizophrenia spectrum disorders. Suicide attempters and non-attempters were clearly identified using the Columbia Suicide Severity Rating Scale (C-SSRS) and the Beck Suicide Ideation Scale (BSS). We developed four classification algorithms using regularized regression, random forest, elastic net, and support vector machine models with sociocultural and clinical variables as features to train the models. All classification models performed similarly in identifying suicide attempters and non-attempters. Our regularized logistic regression model demonstrated an accuracy of 67% and an area under the curve (AUC) of 0.71, while the random forest model demonstrated 66% accuracy and an AUC of 0.67. The support vector classifier (SVC) model demonstrated an accuracy of 67% and an AUC of 0.70, and the elastic net model demonstrated an accuracy of 65% and an AUC of 0.71. Machine learning algorithms offer a relatively successful method for incorporating many clinical features to predict individuals at risk for future suicide attempts. Increased performance of these models using clinically relevant variables offers the potential to facilitate early treatment and intervention to prevent future suicide attempts. Copyright © 2017 Elsevier Inc. All rights reserved.
The value of nodal information in predicting lung cancer relapse using 4DPET/4DCT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Heyse, E-mail: heyse.li@mail.utoronto.ca; Becker, Nathan; Raman, Srinivas
2015-08-15
Purpose: There is evidence that computed tomography (CT) and positron emission tomography (PET) imaging metrics are prognostic and predictive in nonsmall cell lung cancer (NSCLC) treatment outcomes. However, few studies have explored the use of standardized uptake value (SUV)-based image features of nodal regions as predictive features. The authors investigated and compared the use of tumor and node image features extracted from the radiotherapy target volumes to predict relapse in a cohort of NSCLC patients undergoing chemoradiation treatment. Methods: A prospective cohort of 25 patients with locally advanced NSCLC underwent 4DPET/4DCT imaging for radiation planning. Thirty-seven image features were derived from the CT-defined volumes and SUVs of the PET image from both the tumor and nodal target regions. The machine learning methods of logistic regression and repeated stratified five-fold cross-validation (CV) were used to predict local and overall relapses in 2 yr. The authors used well-known feature selection methods (Spearman’s rank correlation, recursive feature elimination) within each fold of CV. Classifiers were ranked on their Matthews correlation coefficient (MCC) after CV. Area under the curve, sensitivity, and specificity values are also presented. Results: For predicting local relapse, the best classifier found had a mean MCC of 0.07 and was composed of eight tumor features. For predicting overall relapse, the best classifier found had a mean MCC of 0.29 and was composed of a single feature: the volume greater than 0.5 times the maximum SUV (N). Conclusions: The best classifier for predicting local relapse had only tumor features. In contrast, the best classifier for predicting overall relapse included a node feature. Overall, the methods showed that nodes add value in predicting overall relapse but not local relapse.
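For readers unfamiliar with the ranking metric, the Matthews correlation coefficient for a binary classifier can be computed from the confusion matrix as sketched below. The function name and the example labels are hypothetical, and returning 0 when the denominator vanishes is a common convention assumed here, not something stated in the abstract.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels, where truthy values mark the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical relapse labels (1 = relapse) and classifier predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
mcc = matthews_corrcoef(y_true, y_pred)
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why a mean MCC of 0.07 indicates almost no predictive value while 0.29 indicates a weak positive association.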
Hafiz, Pegah; Nematollahi, Mohtaram; Boostani, Reza; Namavar Jahromi, Bahia
2017-10-01
In vitro fertilization (IVF) and intracytoplasmic sperm injection (ICSI) are two important subsets of the assisted reproductive techniques, used for the treatment of infertility. Predicting implantation outcome of IVF/ICSI or the chance of pregnancy is essential for infertile couples, since these treatments are complex and expensive with a low probability of conception. In this cross-sectional study, the data of 486 patients were collected using the census method. The IVF/ICSI dataset contains 29 variables along with an identifier for each patient that is either negative or positive. Mean accuracy and mean area under the receiver operating characteristic (ROC) curve are calculated for the classifiers. Sensitivity, specificity, positive and negative predictive values, and likelihood ratios of classifiers are employed as indicators of performance. The state-of-the-art classifiers which are candidates for this study include support vector machines, recursive partitioning (RPART), random forest (RF), adaptive boosting, and one-nearest neighbor. RF and RPART outperform the other comparable methods. The results revealed the areas under the ROC curve (AUC) as 84.23 and 82.05%, respectively. The importance of IVF/ICSI features was extracted from the output of RPART. Our findings demonstrate that the probability of pregnancy is low for women aged above 38. Classifiers RF and RPART are better at predicting IVF/ICSI cases compared to other decision makers that were tested in our study. Elicited decision rules of RPART determine useful predictive features of IVF/ICSI. Out of 20 factors, the age of woman, number of developed embryos, and serum estradiol level on the day of human chorionic gonadotropin administration are the three best features for such prediction. Copyright© by Royan Institute. All rights reserved.
Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2015-01-01
The blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR study. For a successful SVM model, the kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been proven that they could affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to the BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both SVM parameters and feature subset simultaneously with a genetic algorithm is a better approach than other methods that treat the two problems separately. Analysis of our log BB model suggests that carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among those properties relevant to BBB penetration, lipophilicity could enhance the BBB penetration while all the others are negatively correlated with BBB penetration. PMID:26504797
Trull, T J; Widiger, T A; Burr, R
2001-04-01
The Structured Interview for the Five-Factor Model (SIFFM; Trull & Widiger, 1997) is a 120-item semistructured interview that assesses both adaptive and maladaptive features of the personality traits included in the five-factor model of personality, or "Big Five." In this article, we evaluate the ability of SIFFM scores to predict personality disorder symptomatology in a sample of 232 adults (46 outpatients and 186 nonclinical college students). Personality disorder symptoms were assessed using the Personality Diagnostic Questionnaire-Revised (PDQ-R; Hyler & Rider, 1987). Results indicated that many of the predicted associations between lower-order personality traits and personality disorders were supported. Further, many of these associations held even after controlling for comorbid personality disorder symptoms. These findings may help inform conceptualizations of the personality disorders, as well as etiological theories and treatment.
Effect of horizontal curves on urban arterial crashes.
Banihashemi, Mohamadreza
2016-10-01
The crash prediction models of the Highway Safety Manual (HSM), 2010 estimate the expected number of crashes for different facility types. Models in Part C Chapter 12 of the first edition of the HSM include crash prediction models for divided and undivided urban arterials. Each of the HSM crash prediction models for highway segments is comprised of a "Safety Performance Function," a function of AADT and segment length, plus, a series of "Crash Modification Factors" (CMFs). The SPF estimates the expected number of crashes for the site if the site features are of base condition. The effects of the other features of the site, if their values are different from base condition, are carried out through use of CMFs. The existing models for urban arterials do not have any CMF for horizontal curvature. The goal of this research is to investigate if the horizontal alignment has any significant effect on crashes on any of these types of facilities and if so, to develop a CMF for this feature. Washington State cross sectional data from the Highway Safety Information System (HSIS), 2014 was used in this research. Data from 2007 to 2009 was used to conduct the investigation. The 2010 data was used to validate the results. As the results showed, the horizontal curvature has significant safety effect on two-lane undivided urban arterials with speed limits of 35 mph and higher and using a CMF for horizontal curvature in the crash prediction model of this type of facility improves the prediction of crashes significantly, for both tangent and curve segments. Copyright © 2016 Elsevier Ltd. All rights reserved.
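The model structure described in this abstract (a Safety Performance Function scaled by Crash Modification Factors) can be sketched as below. The HSM applies CMFs as multipliers on the SPF estimate; the log-linear SPF form is a common HSM pattern assumed here, and the coefficients `a` and `b`, the curve CMF of 1.15, and the function names are hypothetical values for illustration, not figures from the paper or the manual.

```python
from math import exp, log, prod


def spf_crashes(aadt, length_mi, a, b):
    """Base-condition crashes per year from an assumed log-linear SPF."""
    return exp(a + b * log(aadt) + log(length_mi))


def predicted_crashes(aadt, length_mi, a, b, cmfs=(), calibration=1.0):
    """Scale the SPF estimate by each CMF and a local calibration factor."""
    return spf_crashes(aadt, length_mi, a, b) * prod(cmfs) * calibration


# Hypothetical inputs: a, b, and the horizontal-curve CMF are made up.
tangent = predicted_crashes(12000, 0.5, a=-8.0, b=0.9)
curve = predicted_crashes(12000, 0.5, a=-8.0, b=0.9, cmfs=[1.15])
print(f"tangent: {tangent:.3f} crashes/yr, curve: {curve:.3f} crashes/yr")
```

A CMF of 1.0 leaves the base-condition estimate unchanged, so a segment matching base conditions contributes no adjustment. A curvature CMF above 1.0, as in this made-up example, would raise the expected crash count for curved segments.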
Bronchoalveolar carcinoma: clinical, radiologic, and pathologic factors and survival.
Okubo, K; Mark, E J; Flieder, D; Wain, J C; Wright, C D; Moncure, A C; Grillo, H C; Mathisen, D J
1999-10-01
The principal feature of bronchoalveolar carcinoma is that it spreads along airways or aerogenously with multifocality, but many issues are unresolved. We studied 119 patients with pathologically confirmed bronchoalveolar carcinoma. Symptoms, smoking status, radiologic findings, the size of tumor, operative procedures, and complications were reviewed. We studied the pathologic features: presence or absence of aerogenous spread, patterns of growth, cell type, nuclear grade, mitosis, rate of bronchoalveolar carcinoma in adenocarcinoma, and lymphocyte infiltration. The correlation among clinical, radiologic, and pathologic findings was examined, and the factors affecting survival were analyzed. Symptomatic patients had more infiltrative radiographic features, and asymptomatic patients tended to have more mass-like features (P <.0001). Tumors with radiographically infiltrating lesions tended to have mucinous histologic features (P =.006). Tumors with mass lesions by radiograph tended to have nonmucinous and sclerosing histologic features (P =.003). Aerogenous spread was seen in 94% of specimens. The presence of a variety of cell types suggested multiple clonal origin. The overall survival in those patients undergoing resection was 69.1% at 5 years and 56.5% at 10 years. The significant factors affecting survival were radiologic presence of a mass or infiltrate, pathologic findings of the presence of sclerosis, association with a scar, the rate of bronchoalveolar carcinoma in adenocarcinoma, lymphocyte infiltration grade, nodal involvement, and status of complete resection. Mitosis or nuclear grade of tumor cells did not correlate with survival. Bronchoalveolar carcinoma showed good overall survival with appropriate surgical procedures. Certain radiologic or pathologic findings correlated with survival. These findings may enhance the ability to predict long-term survival.
Coates, Peter S.; Howe, Kristy B.; Casazza, Michael L.; Delehanty, David J.
2014-01-01
A goal in avian ecology is to understand factors that influence differences in nesting habitat and distribution among species, especially within changing landscapes. Over the past 2 decades, humans have altered sagebrush ecosystems as a result of expansion in energy production and transmission. Our primary study objective was to identify differences in the use of landscape characteristics and natural and anthropogenic features by nesting Common Ravens (Corvus corax) and 3 species of buteo (Swainson's Hawk [Buteo swainsoni], Red-tailed Hawk [B. jamaicensis], and Ferruginous Hawk [B. regalis]) within a sagebrush ecosystem in southeastern Idaho. During 2007–2009, we measured multiple environmental factors associated with 212 nest sites using data collected remotely and in the field. We then developed multinomial models to predict nesting probabilities by each species and predictive response curves based on model-averaged estimates. We found differences among species related to nesting substrate (natural vs. anthropogenic), agriculture, native grassland, and edge (interface of 2 cover types). Most important, ravens had a higher probability of nesting on anthropogenic features (0.80) than the other 3 species (Artemisia spp.), favoring increased numbers of nesting ravens and fewer nesting Ferruginous Hawks. Our results indicate that habitat alterations, fragmentation, and forthcoming disturbances anticipated with continued energy development in sagebrush steppe ecosystems can lead to predictable changes in raptor and raven communities.
Hestand, Matthew S; van Galen, Michiel; Villerius, Michel P; van Ommen, Gert-Jan B; den Dunnen, Johan T; 't Hoen, Peter AC
2008-01-01
Background The identification of transcription factor binding sites is difficult since they are only a small number of nucleotides in size, resulting in large numbers of false positives and false negatives in current approaches. Computational methods to reduce false positives are to look for over-representation of transcription factor binding sites in a set of similarly regulated promoters or to look for conservation in orthologous promoter alignments. Results We have developed a novel tool, "CORE_TF" (Conserved and Over-REpresented Transcription Factor binding sites) that identifies common transcription factor binding sites in promoters of co-regulated genes. To improve upon existing binding site predictions, the tool searches for position weight matrices from the TRANSFAC® database that are over-represented in an experimental set compared to a random set of promoters and identifies cross-species conservation of the predicted transcription factor binding sites. The algorithm has been evaluated with expression and chromatin-immunoprecipitation on microarray data. We also implement and demonstrate the importance of matching the random set of promoters to the experimental promoters by GC content, which is a unique feature of our tool. Conclusion The program CORE_TF is accessible in a user-friendly web interface at . It provides a table of over-represented transcription factor binding sites in the user's input genes' promoters and a graphical view of evolutionarily conserved transcription factor binding sites. In our test data sets it successfully predicts target transcription factors and their binding sites. PMID:19036135
Asymmetric bagging and feature selection for activities prediction of drug molecules.
Li, Guo-Zheng; Meng, Hao-Hua; Lu, Wen-Cong; Yang, Jack Y; Yang, Mary Qu
2008-05-28
Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcome the high cost and long cycle of the traditional experimental method. Because drug molecules with positive activity are far fewer than those with negative activity, it is important to predict molecular activities with this imbalance in mind. Here, asymmetric bagging and feature selection are introduced into the problem, and asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities to address the imbalance. At the same time, the features extracted from the structures of drug molecules affect prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experimental results on a data set of molecular activities show that asBagging improves the AUC and sensitivity values of molecular activities and PRIFEAB with feature selection further helps to improve the prediction ability. Asymmetric bagging can help to improve prediction accuracy of activities of drug molecules, which can be further improved by performing feature selection to select relevant features from the drug molecule data sets.
Using machine learning to identify factors that govern amorphization of irradiated pyrochlores
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pilania, Ghanshyam; Whittle, Karl R.; Jiang, Chao
Structure–property relationships are a key materials science concept that enables the design of new materials. In the case of materials for application in radiation environments, correlating radiation tolerance with fundamental structural features of a material enables materials discovery. Here, we use a machine learning model to examine the factors that govern amorphization resistance in the complex oxide pyrochlore (A2B2O7) in a regime in which amorphization occurs as a consequence of defect accumulation. We examine the fidelity of predictions based on cation radii and electronegativities, the oxygen positional parameter, and the energetics of disordering and amorphizing the material. No one factor alone adequately predicts amorphization resistance. We find that when multiple families of pyrochlores (with different B cations) are considered, radii and electronegativities provide the best prediction, but when the machine learning model is restricted to only the B = Ti pyrochlores, the energetics of disordering and amorphization are critical factors. We discuss how these static quantities provide insight into an inherently kinetic property such as amorphization resistance at finite temperature. Lastly, this work provides new insight into the factors that govern the amorphization susceptibility and highlights the ability of machine learning approaches to generate that insight.
Using machine learning to identify factors that govern amorphization of irradiated pyrochlores
Pilania, Ghanshyam; Whittle, Karl R.; Jiang, Chao; ...
2017-02-10
Structure–property relationships are a key materials science concept that enables the design of new materials. In the case of materials for application in radiation environments, correlating radiation tolerance with fundamental structural features of a material enables materials discovery. Here, we use a machine learning model to examine the factors that govern amorphization resistance in the complex oxide pyrochlore (A2B2O7) in a regime in which amorphization occurs as a consequence of defect accumulation. We examine the fidelity of predictions based on cation radii and electronegativities, the oxygen positional parameter, and the energetics of disordering and amorphizing the material. No one factor alone adequately predicts amorphization resistance. We find that when multiple families of pyrochlores (with different B cations) are considered, radii and electronegativities provide the best prediction, but when the machine learning model is restricted to only the B = Ti pyrochlores, the energetics of disordering and amorphization are critical factors. We discuss how these static quantities provide insight into an inherently kinetic property such as amorphization resistance at finite temperature. Lastly, this work provides new insight into the factors that govern the amorphization susceptibility and highlights the ability of machine learning approaches to generate that insight.
Visual Prediction Error Spreads Across Object Features in Human Visual Cortex
Summerfield, Christopher; Egner, Tobias
2016-01-01
Visual cognition is thought to rely heavily on contextual expectations. Accordingly, previous studies have revealed distinct neural signatures for expected versus unexpected stimuli in visual cortex. However, it is presently unknown how the brain combines multiple concurrent stimulus expectations such as those we have for different features of a familiar object. To understand how an unexpected object feature affects the simultaneous processing of other expected feature(s), we combined human fMRI with a task that independently manipulated expectations for color and motion features of moving-dot stimuli. Behavioral data and neural signals from visual cortex were then interrogated to adjudicate between three possible ways in which prediction error (surprise) in the processing of one feature might affect the concurrent processing of another, expected feature: (1) feature processing may be independent; (2) surprise might “spread” from the unexpected to the expected feature, rendering the entire object unexpected; or (3) pairing a surprising feature with an expected feature might promote the inference that the two features are not in fact part of the same object. To formalize these rival hypotheses, we implemented them in a simple computational model of multifeature expectations. Across a range of analyses, behavior and visual neural signals consistently supported a model that assumes a mixing of prediction error signals across features: surprise in one object feature spreads to its other feature(s), thus rendering the entire object unexpected. These results reveal neurocomputational principles of multifeature expectations and indicate that objects are the unit of selection for predictive vision. SIGNIFICANCE STATEMENT We address a key question in predictive visual cognition: how does the brain combine multiple concurrent expectations for different features of a single object such as its color and motion trajectory? 
By combining a behavioral protocol that independently varies expectation of (and attention to) multiple object features with computational modeling and fMRI, we demonstrate that behavior and fMRI activity patterns in visual cortex are best accounted for by a model in which prediction error in one object feature spreads to other object features. These results demonstrate how predictive vision forms object-level expectations out of multiple independent features. PMID:27810936
Relevance popularity: A term event model based feature selection scheme for text classification.
Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao
2017-01-01
Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e. the document frequency) is often used. However, the frequency of a given term appearing in each document has not been fully investigated, even though it is a promising feature to produce accurate classifications. In this paper, we propose a new feature selection scheme based on a term event multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing inner parameters by their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), our numerical experiment results obtained using two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperformed the representative feature selection methods.
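The distinction this abstract draws between document frequency and within-document term frequency can be made concrete with a small sketch. It only illustrates the two counting schemes; it does not implement the paper's selection measure, and the function names and toy documents are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Number of documents that contain each term at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts a term once
    return df


def term_frequency(docs):
    """Total number of occurrences of each term across all documents."""
    tf = Counter()
    for tokens in docs:
        tf.update(tokens)  # every occurrence counts
    return tf


# Hypothetical tokenized documents from one class.
docs = [["ball", "ball", "game"], ["game"], ["ball"]]
df = document_frequency(docs)
tf = term_frequency(docs)
print(df["ball"], tf["ball"])
```

Here "ball" and "game" have the same document frequency, so a measure built only on document counts cannot tell them apart, while the occurrence counts differ. A term event (multinomial) model works with the occurrence counts.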
Problems of psychological monitoring in astronaut training.
Morgun, V V
1997-10-01
Monitoring goal-oriented psychological changes in a trainee during professional training is necessary. The level of development of an astronaut's psychological features is checked by psychological testing, with the final aim of evaluating each professionally important psychological quality as well as the trainee overall. The list of psychological features to be evaluated is determined, and empirically selected weighting factors based on wide statistical sampling are introduced. Accumulated psychological test results can predict an astronaut's ability to solve complicated problems during a flight mission. They can also help to correct the training process and reveal weaknesses.
Disfani, Fatemeh Miri; Hsu, Wei-Lun; Mizianty, Marcin J.; Oldfield, Christopher J.; Xue, Bin; Dunker, A. Keith; Uversky, Vladimir N.; Kurgan, Lukasz
2012-01-01
Motivation: Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered regions that bind to protein partners via disorder-to-order transitions. MoRFs are implicated in important processes including signaling and regulation. However, only a limited number of experimentally validated MoRFs is known, which motivates development of computational methods that predict MoRFs from protein chains. Results: We introduce a new MoRF predictor, MoRFpred, which identifies all MoRF types (α, β, coil and complex). We develop a comprehensive dataset of annotated MoRFs to build and empirically compare our method. MoRFpred utilizes a novel design in which annotations generated by sequence alignment are fused with predictions generated by a Support Vector Machine (SVM), which uses a custom designed set of sequence-derived features. The features provide information about evolutionary profiles, selected physiochemical properties of amino acids, and predicted disorder, solvent accessibility and B-factors. Empirical evaluation on several datasets shows that MoRFpred outperforms related methods: α-MoRF-Pred that predicts α-MoRFs and ANCHOR which finds disordered regions that become ordered when bound to a globular partner. We show that our predicted (new) MoRF regions have non-random sequence similarity with native MoRFs. We use this observation along with the fact that predictions with higher probability are more accurate to identify putative MoRF regions. We also identify a few sequence-derived hallmarks of MoRFs. They are characterized by dips in the disorder predictions and higher hydrophobicity and stability when compared to adjacent (in the chain) residues. Availability: http://biomine.ece.ualberta.ca/MoRFpred/; http://biomine.ece.ualberta.ca/MoRFpred/Supplement.pdf Contact: lkurgan@ece.ualberta.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22689782
What Was Learned in Predicting Slender Airframe Aerodynamics with the F-16XL Aircraft
NASA Technical Reports Server (NTRS)
Rizzi, Arthur; Lucking, James M.
2014-01-01
The CAWAPI-2 coordinated project has been underway to improve CFD predictions of slender airframe aerodynamics. The work is focused on two flow conditions and leverages a unique flight data set obtained with the F-16XL aircraft for comparison and verification. These conditions, a low-speed high angle-of-attack case and a transonic low angle-of-attack case, were selected from a prior prediction campaign wherein the CFD failed to provide acceptable results. In re-visiting these two cases, approaches for improved results include better, denser grids using more grid adaptation to local flow features as well as unsteady higher-fidelity physical modeling like hybrid RANS/URANS-LES methods. The work embodies predictions from multiple numerical formulations contributed by multiple organizations; some authors also investigate other possible factors that could explain the discrepancies in agreement, e.g. effects due to deflected control surfaces during the flight tests, as well as static aeroelastic deflection of the outer wing. This paper presents the synthesis of all the results and findings, draws some conclusions that lead to an improved understanding of the underlying flow physics, and finally makes the connections between the physics and aircraft features.
Understanding the biological underpinnings of ecohydrological processes
NASA Astrophysics Data System (ADS)
Huxman, T. E.; Scott, R. L.; Barron-Gafford, G. A.; Hamerlynck, E. P.; Jenerette, D.; Tissue, D. T.; Breshears, D. D.; Saleska, S. R.
2012-12-01
Climate change presents a challenge for predicting ecosystem response, as multiple factors drive both the physical and life processes happening on the land surface and their interactions result in a complex, evolving coupled system. For example, changes in surface temperature and precipitation influence near-surface hydrology through impacts on system energy balance, affecting a range of physical processes. These changes in the salient features of the environment affect biological processes and elicit responses along the hierarchy of life (biochemistry to community composition). Many of these structural or process changes can alter patterns of soil water-use and influence land surface characteristics that affect local climate. Of the many features that affect our ability to predict the future dynamics of ecosystems, it is this hierarchical response of life that creates substantial complexity. Advances in the ability to predict or understand aspects of demography help describe thresholds in coupled ecohydrological systems. Disentangling the physical and biological features that underlie land surface dynamics following disturbance is allowing a better understanding of the partitioning of water in the time-course of recovery. Better predicting the timing of phenology and key seasonal events allows for a more accurate description of the full functional response of the land surface to climate. In addition, explicitly considering the hierarchical structural features of life is helping to describe complex time-dependent behavior in ecosystems. However, despite this progress, we have yet to build an ability to fully account for the generalization of the main features of living systems into models that can describe ecohydrological processes, especially acclimation, assembly and adaptation. This is unfortunate, given that many key ecosystem services are functions of these coupled co-evolutionary processes.
To date, the lack of both controlled measurements and experimentation has precluded sufficient theoretical development. Understanding the land-surface response and feedback to climate change requires a mechanistic understanding of the coupling of ecological and hydrological processes and an expansion of theory from the life sciences to appropriately contribute to the broader Earth system science goal.
NASA Astrophysics Data System (ADS)
Folkert, Michael R.; Setton, Jeremy; Apte, Aditya P.; Grkovski, Milan; Young, Robert J.; Schöder, Heiko; Thorstad, Wade L.; Lee, Nancy Y.; Deasy, Joseph O.; Oh, Jung Hun
2017-07-01
In this study, we investigate the use of imaging feature-based outcomes research (‘radiomics’) combined with machine learning techniques to develop robust predictive models for the risk of all-cause mortality (ACM), local failure (LF), and distant metastasis (DM) following definitive chemoradiation therapy (CRT). One hundred seventy four patients with stage III-IV oropharyngeal cancer (OC) treated at our institution with CRT with retrievable pre- and post-treatment 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) scans were identified. From pre-treatment PET scans, 24 representative imaging features of FDG-avid disease regions were extracted. Using machine learning-based feature selection methods, multiparameter logistic regression models were built incorporating clinical factors and imaging features. All model building methods were tested by cross validation to avoid overfitting, and final outcome models were validated on an independent dataset from a collaborating institution. Multiparameter models were statistically significant on 5 fold cross validation with the area under the receiver operating characteristic curve (AUC) = 0.65 (p = 0.004), 0.73 (p = 0.026), and 0.66 (p = 0.015) for ACM, LF, and DM, respectively. The model for LF retained significance on the independent validation cohort with AUC = 0.68 (p = 0.029) whereas the models for ACM and DM did not reach statistical significance, but resulted in comparable predictive power to the 5 fold cross validation with AUC = 0.60 (p = 0.092) and 0.65 (p = 0.062), respectively. In the largest study of its kind to date, predictive features including increasing metabolic tumor volume, increasing image heterogeneity, and increasing tumor surface irregularity significantly correlated to mortality, LF, and DM on 5 fold cross validation in a relatively uniform single-institution cohort. The LF model also retained significance in an independent population.
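The AUC values quoted above summarize how well a model's scores rank patients who had the event above those who did not. As a hedged illustration only (this is not the authors' pipeline, and the function name and input format are assumptions), the AUC can be computed directly as the fraction of positive/negative pairs that are ranked correctly, with ties counted as half:

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores : predicted risks, one per patient (assumed input)
    labels : 1 if the event occurred, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking, which is why values such as 0.60 on the validation cohort indicate only modest discrimination.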
Classification of ABO3 perovskite solids: a machine learning study
Pilania, G.; Balachandran, P. V.; Gubernatis, J. E.; ...
2015-07-23
Here we explored the use of machine learning methods for classifying whether a particular ABO3 chemistry forms a perovskite or non-perovskite structured solid. Starting with three sets of feature pairs (the tolerance and octahedral factors, the A and B ionic radii relative to the radius of O, and the bond valence distances between the A and B ions from the O atoms), we used machine learning to create a hyper-dimensional partial dependency structure plot using all three feature pairs or any two of them. Doing so increased the accuracy of our predictions by 2–3 percentage points over using any one pair. We also added the Mendeleev numbers of the A and B atoms to this set of feature pairs. Moreover, doing this and using the capabilities of our machine learning algorithm, the gradient tree boosting classifier, enabled us to generate a new type of structure plot that has the simplicity of one based on using just the Mendeleev numbers, but with the added advantages of having a higher accuracy and providing a measure of likelihood of the predicted structure.
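The first feature pair named above can be written down directly. The sketch below uses the standard Goldschmidt tolerance factor, t = (r_A + r_O) / (sqrt(2) (r_B + r_O)), and the octahedral factor, mu = r_B / r_O. The function names are hypothetical, and any radii passed in are the caller's assumption; no values from the paper are reproduced here.

```python
import math


def tolerance_factor(r_a, r_b, r_o):
    """Goldschmidt tolerance factor t = (r_A + r_O) / (sqrt(2) * (r_B + r_O))."""
    return (r_a + r_o) / (math.sqrt(2) * (r_b + r_o))


def octahedral_factor(r_b, r_o):
    """Octahedral factor mu = r_B / r_O."""
    return r_b / r_o


def feature_pair(r_a, r_b, r_o):
    """Return the (tolerance, octahedral) pair for one ABO3 chemistry."""
    return tolerance_factor(r_a, r_b, r_o), octahedral_factor(r_b, r_o)
```

A pair such as this would be one two-dimensional input to a classifier; the abstract reports that combining two or three such pairs improved accuracy over any single pair.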
Does the concept of borderline personality features have clinical utility in childhood?
Hawes, David J
2014-01-01
Phenotypic features of borderline personality disorder may first emerge during childhood, alongside symptoms of common externalizing and internalizing disorders. Children with these borderline personality features (BPF) are, therefore, likely to come into contact with clinical services prior to adolescence. This raises the question of whether BPF may be clinically informative with respect to the formulation and treatment of childhood psychopathology. BPF in late childhood appear to be highly heritable, while also predicted by environmental risk factors that overlap with those related to both externalizing and internalizing disorders. These risk factors include hostile parenting, maternal insensitivity to infant attachment cues, and early peer victimization, thereby implicating both family and peer processes that play out across early development. Children with BPF appear to be further characterized by social-cognitive factors including social perspective coordination deficits, a shame-prone self-concept, and hypermentalizing, which may represent potential therapeutic targets. Clinical research into the implications of BPF for the treatment of childhood psychopathology is a current priority. It is proposed that the research designs that have contributed to recent evidence for the clinical utility of childhood psychopathic traits may likewise aid in understanding the potential clinical utility of BPF in children.
Clinicopathological Features to Predict Progression of IgA Nephropathy with Mild Proteinuria.
Chen, Ding; Liu, Jian; Duan, Shuwei; Chen, Pu; Tang, Li; Zhang, Li; Feng, Zhe; Cai, Guangyan; Wu, Jie; Chen, Xiangmei
2018-03-06
In the past, little attention has been paid to patients with IgA nephropathy (IgAN) who had minimal proteinuria upon the onset. The aim of this study was to analyze the clinicopathological features and the prognostic factors in patients with IgA nephropathy. Data of patients that had their first renal biopsy in our hospital and were diagnosed with primary IgAN with proteinuria <1 g/d from January 1995 to December 2014 were retrospectively examined. Clinical records of the clinicopathological features, renal function, and proteinuria were collected and investigated. The factors affecting the renal function and proteinuria were analyzed by Cox regression. The predictive efficiencies of clinical and pathological models were evaluated by Harrell concordance index (C-index). A total of 506 patients with IgA nephropathy were included in this study. (1) Baseline proteinuria greater than 0.5 g/d was positively associated with Oxford M, S, and T lesions. eGFR less than 90 mL/min/1.73 m2 were positively associated with Oxford T. (2) In the follow-up with a median of 50 months, 82 patients (16.2%) achieved complete clinical remission (CCR), whereas 54 patients (10.6%) showed an increase in creatinine by more than 50% (not progressing to end-stage renal disease). The cumulative proportion of creatinine increased >50%, and the values obtained by life-table analysis in 10, 15, and 20 years were 15%, 21%, and 22%, respectively. Significant differences were found in baseline age, proteinuria, and Oxford T between the group of creatinine increase >50% and the CCR group. (4) Multivariate COX regression showed that baseline age and proteinuria > 0.5 g/d were independent risk factors of adverse outcome. C-index suggested that the clinical model was more effective than the pathological models in predicting endpoint events. 
(5) Effect of the mean value during the follow-up on adverse endpoint events: Multivariate COX regression found that the mean proteinuria during follow-up was an independent influencing factor for the increase of creatinine by more than 50%. (1) Proteinuria > 0.5g/d and eGFR < 90 mL/min/1.73 m2 may predict more severe pathological changes; (2) With the increase in age and baseline proteinuria, the risks of adverse endpoint events would increase significantly; (3) Pathology could roughly predict the adverse endpoint events but is less efficient than the clinical indicators; (4) Data during follow-up suggested that the patients should regularly test their renal function and proactively control their proteinuria. © 2018 The Author(s). Published by S. Karger AG, Basel.
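The Harrell concordance index used above to compare the clinical and pathological models can be sketched as follows. This is a simplified illustration, not the authors' code. It assumes right-censored follow-up data, skips pairs with tied follow-up times, and uses hypothetical argument names.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times       : follow-up time per patient
    events      : 1 if the endpoint was observed, 0 if censored
    risk_scores : model output, higher meaning higher predicted risk
    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A model with a higher C-index orders patients' outcomes more consistently, which is the sense in which the clinical model is described as more effective than the pathological models.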
PhyloGibbs-MP: Module Prediction and Discriminative Motif-Finding by Gibbs Sampling
Siddharthan, Rahul
2008-01-01
PhyloGibbs, our recent Gibbs-sampling motif-finder, takes phylogeny into account in detecting binding sites for transcription factors in DNA and assigns posterior probabilities to its predictions obtained by sampling the entire configuration space. Here, in an extension called PhyloGibbs-MP, we widen the scope of the program, addressing two major problems in computational regulatory genomics. First, PhyloGibbs-MP can localise predictions to small, undetermined regions of a large input sequence, thus effectively predicting cis-regulatory modules (CRMs) ab initio while simultaneously predicting binding sites in those modules—tasks that are usually done by two separate programs. PhyloGibbs-MP's performance at such ab initio CRM prediction is comparable with or superior to dedicated module-prediction software that use prior knowledge of previously characterised transcription factors. Second, PhyloGibbs-MP can predict motifs that differentiate between two (or more) different groups of regulatory regions, that is, motifs that occur preferentially in one group over the others. While other “discriminative motif-finders” have been published in the literature, PhyloGibbs-MP's implementation has some unique features and flexibility. Benchmarks on synthetic and actual genomic data show that this algorithm is successful at enhancing predictions of differentiating sites and suppressing predictions of common sites and compares with or outperforms other discriminative motif-finders on actual genomic data. Additional enhancements include significant performance and speed improvements, the ability to use “informative priors” on known transcription factors, and the ability to output annotations in a format that can be visualised with the Generic Genome Browser. In stand-alone motif-finding, PhyloGibbs-MP remains competitive, outperforming PhyloGibbs-1.0 and other programs on benchmark data. PMID:18769735
Know your data: understanding implicit usage versus explicit action in video content classification
NASA Astrophysics Data System (ADS)
Yew, Jude; Shamma, David A.
2011-02-01
In this paper, we present a method for video category classification using only social metadata from websites like YouTube. In place of content analysis, we utilize communicative and social contexts surrounding videos as a means to determine a categorical genre, e.g. Comedy, Music. We hypothesize that video clips belonging to different genre categories would have distinct signatures and patterns that are reflected in their collected metadata. In particular, we define and describe social metadata as usage or action to aid in classification. We trained a Naive Bayes classifier to predict categories from a sample of 1,740 YouTube videos representing the top five genre categories. Using just a small number of the available metadata features, we compare the classifications produced by our Naive Bayes classifier with those provided by the uploader of that particular video. Compared to random predictions with the YouTube data (21% accurate), our classifier attained a mediocre 33% accuracy in predicting video genres. However, we found that the accuracy of our classifier significantly improves by nominal factoring of the explicit data features. By factoring the ratings of the videos in the dataset, the classifier was able to accurately predict the genres of 75% of the videos. We argue that the patterns of social activity found in the metadata are not just meaningful in their own right, but are indicative of the meaning of the shared video content. The results presented by this project represent a first step in investigating the potential meaning and significance of social metadata and its relation to the media experience.
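To illustrate the kind of classifier described, here is a minimal sketch of a categorical Naive Bayes model with Laplace smoothing, applied to nominal (bucketed) metadata. The feature layout, bucket names, and function names are hypothetical; the authors' actual features and implementation are not given in the abstract.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class priors and per-feature value frequencies.

    rows   : tuples of nominal feature values (assumed layout)
    labels : one genre label per row
    """
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen per feature
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[y][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict_nb(model, row):
    """Return the label with the highest smoothed log-posterior."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for y, n_y in class_counts.items():
        logp = math.log(n_y / total)
        for i, v in enumerate(row):
            # Laplace (add-one) smoothing over the values seen in training
            logp += math.log((feature_counts[y][i][v] + 1) / (n_y + len(values[i])))
        if logp > best_logp:
            best_label, best_logp = y, logp
    return best_label
```

Converting a numeric field such as a rating into a small set of nominal buckets before training is one plausible reading of the "nominal factoring" the abstract credits for the accuracy gain.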
Bossi, Flavia; Fan, Jue; Xiao, Jun; Chandra, Lilyana; Shen, Max; Dorone, Yanniv; Wagner, Doris; Rhee, Seung Y
2017-06-26
The molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. To identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.
Life on the boundary: Environmental factors as drivers of habitat distribution in the littoral zone
NASA Astrophysics Data System (ADS)
Cefalì, Maria Elena; Cebrian, Emma; Chappuis, Eglantine; Pinedo, Susana; Terradas, Marc; Mariani, Simone; Ballesteros, Enric
2016-04-01
The boundary between land and sea, i.e. the littoral zone, is home to a large number of habitats whose distribution is primarily driven by the distance to sea level but also by other environmental factors such as the geomorphological features of the littoral, wave exposure, water temperature or orientation. Here we explore the relative importance of the major environmental factors that drive the presence of littoral rocky habitats along 1100 km of Catalonia's shoreline (Spain, NW Mediterranean) by using Geographic Information Systems and Generalized Linear Models. The distribution of mediolittoral and upper infralittoral habitats responded to different environmental factors. Mediolittoral habitats showed regional differences drawn by sea-water temperature and substrate type. Wave exposure (hydrodynamism), slope and geological features were only relevant to those mediolittoral habitats with specific environmental needs. We did not find any regional pattern of distribution in upper infralittoral habitats, and selected factors only played a moderate role in habitat distribution at the local scale. This study shows for the first time that environmental factors determining habitat distribution differ within the mediolittoral and the upper infralittoral zones and provides the basis for further development of models oriented at predicting the distribution of littoral marine habitats.
Bayesian network ensemble as a multivariate strategy to predict radiation pneumonitis risk
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Sangkyu, E-mail: sangkyu.lee@mail.mcgill.ca; Ybarra, Norma; Jeyaseelan, Krishinima
2015-05-15
Purpose: Prediction of radiation pneumonitis (RP) has been shown to be challenging due to the involvement of a variety of factors including dose–volume metrics and radiosensitivity biomarkers. Some of these factors are highly correlated and might affect prediction results when combined. Bayesian network (BN) provides a probabilistic framework to represent variable dependencies in a directed acyclic graph. The aim of this study is to integrate the BN framework and a systems biology approach to detect possible interactions among RP risk factors and exploit these relationships to enhance both the understanding and prediction of RP. Methods: The authors studied 54 non-small-cell lung cancer patients who received curative 3D-conformal radiotherapy. Nineteen RP events were observed (common toxicity criteria for adverse events grade 2 or higher). Serum concentrations of the following four candidate biomarkers were measured at baseline and midtreatment: alpha-2-macroglobulin, angiotensin converting enzyme (ACE), transforming growth factor, interleukin-6. Dose-volumetric and clinical parameters were also included as covariates. Feature selection was performed using a Markov blanket approach based on the Koller–Sahami filter. The Markov chain Monte Carlo technique estimated the posterior distribution of BN graphs built from the observed data of the selected variables and causality constraints. RP probability was estimated using a limited number of high posterior graphs (ensemble) and was averaged for the final RP estimate using Bayes’ rule. A resampling method based on bootstrapping was applied to model training and validation in order to control under- and overfit pitfalls. Results: RP prediction power of the BN ensemble approach reached its optimum at a size of 200.
The optimized performance of the BN model recorded an area under the receiver operating characteristic curve (AUC) of 0.83, which was significantly higher than multivariate logistic regression (0.77), mean heart dose (0.69), and a pre-to-midtreatment change in ACE (0.66). When RP prediction was made only with pretreatment information, the AUC ranged from 0.76 to 0.81 depending on the ensemble size. Bootstrap validation of graph features in the ensemble quantified confidence of association between variables in the graphs where ten interactions were statistically significant. Conclusions: The presented BN methodology provides the flexibility to model hierarchical interactions between RP covariates, which is applied to probabilistic inference on RP. The authors’ preliminary results demonstrate that such framework combined with an ensemble method can possibly improve prediction of RP under real-life clinical circumstances such as missing data or treatment plan adaptation.
ERIC Educational Resources Information Center
Alci, Bulent
2015-01-01
This study aims to determine the predictive and explanatory model in terms of university students' academic performance in "General Chemistry" course and their motivational features. The participants were 169 university students in the 1st grade at university. Of the participants, 132 were female and 37 were male students. Regarding…
ERIC Educational Resources Information Center
Lawford, Heather L.; Ramey, Heather L.; Rose-Krasnor, Linda; Proctor, Andrea S.
2012-01-01
The purpose of this study is to examine the factors involved in predicting successful development after an intensive exchange experience in adolescence. Specifically, we considered the eight positive features, as conceptualized by Eccles and Gootman (2002), as well as the amount of input youth had into their exchange experience as predictors of…
Saliency predicts change detection in pictures of natural scenes.
Wright, Michael J
2005-01-01
It has been proposed that the visual system encodes the salience of objects in the visual field in an explicit two-dimensional map that guides visual selective attention. Experiments were conducted to determine whether salience measurements applied to regions of pictures of outdoor scenes could predict the detection of changes in those regions. To obtain a quantitative measure of change detection, observers located changes in pairs of colour pictures presented across an interstimulus interval (ISI). Salience measurements were then obtained from different observers for image change regions using three independent methods, and all were positively correlated with change detection. Factor analysis extracted a single saliency factor that accounted for 62% of the variance contained in the four measures. Finally, estimates of the magnitude of the image change in each picture pair were obtained, using nine separate visual filters representing low-level vision features (luminance, colour, spatial frequency, orientation, edge density). None of the feature outputs was significantly associated with change detection or saliency. On the other hand it was shown that high-level (structural) properties of the changed region were related to saliency and to change detection: objects were more salient than shadows and more detectable when changed.
Burger, Birgitta; Thompson, Marc R.; Luck, Geoff; Saarikallio, Suvi; Toiviainen, Petri
2013-01-01
Music makes us move. Several factors can affect the characteristics of such movements, including individual factors or musical features. For this study, we investigated the effect of rhythm- and timbre-related musical features as well as tempo on movement characteristics. Sixty participants were presented with 30 musical stimuli representing different styles of popular music, and instructed to move along with the music. Optical motion capture was used to record participants’ movements. Subsequently, eight movement features and four rhythm- and timbre-related musical features were computationally extracted from the data, while the tempo was assessed in a perceptual experiment. A subsequent correlational analysis revealed that, for instance, clear pulses seemed to be embodied with the whole body, i.e., by using various movement types of different body parts, whereas spectral flux and percussiveness were found to be more distinctly related to certain body parts, such as head and hand movement. A series of ANOVAs with the stimuli being divided into three groups of five stimuli each based on the tempo revealed no significant differences between the groups, suggesting that the tempo of our stimuli set failed to have an effect on the movement features. In general, the results can be linked to the framework of embodied music cognition, as they show that body movements are used to reflect, imitate, and predict musical characteristics. PMID:23641220
[Geographical distribution of the Serum creatinine reference values of healthy adults].
Wei, De-Zhi; Ge, Miao; Wang, Cong-Xia; Lin, Qian-Yi; Li, Meng-Jiao; Li, Peng
2016-11-20
The aim was to explore the relationship between serum creatinine (Scr) reference values in healthy adults and geographic factors, and to provide evidence for establishing Scr reference values in different regions. We collected 29 697 Scr reference values from healthy adults measured by 347 medical facilities from 23 provinces, 4 municipalities and 5 autonomous regions. We chose 23 geographical factors and analyzed their correlation with Scr reference values to identify the factors correlated significantly with Scr reference values. Two predictive models were constructed using principal component analysis and ridge regression analysis, and the optimal model was chosen by comparing how well each model's predicted results fit the measured results. The distribution map of Scr reference values was drawn using the Kriging interpolation method. Seven geographic factors, including latitude, annual sunshine duration, annual average temperature, annual average relative humidity, annual precipitation, annual temperature range and topsoil (silt) cation exchange capacity, were found to correlate significantly with Scr reference values. The overall distribution of Scr reference values was high in the south and low in the north, varying consistently with latitude. The data of the geographic factors in a given region allow the prediction of the Scr values in healthy adults. Analysis of these geographical factors can facilitate the determination of the reference values specific to a region to improve the accuracy of clinical diagnoses.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features
Mohammad-Noori, Morteza; Beer, Michael A.
2014-01-01
Abstract Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408
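As a rough illustration of the gapped k-mer idea described in the abstract above, the following sketch counts patterns in which only k positions of each length-l window are kept and the rest are masked. The function name `gapped_kmer_counts`, the parameter names `l` and `k`, and the `-` gap symbol are choices made for this example; it does not reproduce the authors' tree data structure, kernel computation, or gkm-SVM classifier.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=2, gap="-"):
    """Count gapped k-mers: every length-l window with only k positions kept.

    Illustrative sketch only; names and defaults are assumptions.
    """
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        # Choose which k of the l positions stay informative.
        for keep in combinations(range(l), k):
            pattern = "".join(
                window[i] if i in keep else gap for i in range(l)
            )
            counts[pattern] += 1
    return counts


# Toy sequence, not taken from any dataset in the paper.
counts = gapped_kmer_counts("ACGTACGT", l=4, k=2)
```

Because each gapped pattern matches many full-length windows, its count is spread over far fewer distinct features than ungapped l-mers, which is the intuition behind the more robust frequency estimates the abstract mentions.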
Enhanced regulatory sequence prediction using gapped k-mer features.
Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A
2014-07-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.
2011-01-01
Background Bioinformatics data analysis often uses a linear mixture model that represents samples as additive mixtures of components. Properly constrained blind matrix factorization methods extract those components using mixture samples only. However, automatic selection of extracted components to be retained for classification analysis remains an open issue. Results The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations. Conclusions We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness constrained factorization on a sample-by-sample basis. By contrast, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of decomposition, the strength of the expression of each feature across the samples can vary. Yet, they will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification. That is not the case with standard factorization methods.
Moreover, the component selected by proposed method as disease specific can be interpreted as a sub-mode and retained for further analysis to identify potential biomarkers. As opposed to standard matrix factorization methods this can be achieved on a sample (experiment)-by-sample basis. Postulating one or more components with indifferent features enables their removal from disease and control specific components on a sample-by-sample basis. This yields selected components with reduced complexity and generally, it increases prediction accuracy. PMID:22208882
Predictive information processing in music cognition. A critical review.
Rohrmeier, Martin A; Koelsch, Stefan
2012-02-01
Expectation and prediction constitute central mechanisms in the perception and cognition of music, which have been explored in theoretical and empirical accounts. We review the scope and limits of theoretical accounts of musical prediction with respect to feature-based and temporal prediction. While the concept of prediction is unproblematic for basic single-stream features such as melody, it is not straight-forward for polyphonic structures or higher-order features such as formal predictions. Behavioural results based on explicit and implicit (priming) paradigms provide evidence of priming in various domains that may reflect predictive behaviour. Computational learning models, including symbolic (fragment-based), probabilistic/graphical, or connectionist approaches, provide well-specified predictive models of specific features and feature combinations. While models match some experimental results, full-fledged music prediction cannot yet be modelled. Neuroscientific results regarding the early right-anterior negativity (ERAN) and mismatch negativity (MMN) reflect expectancy violations on different levels of processing complexity, and provide some neural evidence for different predictive mechanisms. At present, the combinations of neural and computational modelling methodologies are at early stages and require further research. Copyright © 2012 Elsevier B.V. All rights reserved.
Reliability of vascular geometry factors derived from clinical MRA
NASA Astrophysics Data System (ADS)
Bijari, Payam B.; Antiga, Luca; Steinman, David A.
2009-02-01
Recent work from our group has demonstrated that the amount of disturbed flow at the carotid bifurcation, believed to be a local risk factor for carotid atherosclerosis, can be predicted from luminal geometric factors. The next step along the way to a large-scale retrospective or prospective imaging study of such local risk factors for atherosclerosis is to investigate whether these geometric features are reproducible and accurate from routine 3D contrast-enhanced magnetic resonance angiography (CEMRA) using a fast and practical method of extraction. Motivated by this fact, we examined the reproducibility of multiple geometric features that are believed important in atherosclerosis risk assessment. We reconstructed three-dimensional carotid bifurcations from 15 clinical study participants who had previously undergone baseline and repeat CEMRA acquisitions. Certain geometric factors were extracted and compared between the baseline and the repeat scan. As the spatial resolution of the CEMRA data was noticeably coarse and anisotropic, we also investigated whether this might affect the measurement of the same geometric risk factors by simulating the CEMRA acquisition for 15 normal carotid bifurcations previously acquired at high resolution. Our results show that the extracted geometric factors are reproducible and faithful, with intra-subject uncertainties well below inter-subject variabilities. More importantly, these geometric risk factors can be extracted consistently and quickly for potential use as disturbed flow predictors.
Finding new perovskite halides via machine learning
Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; ...
2016-04-26
Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach toward rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning, henceforth referred to as ML) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br, or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 185 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor, and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. As a result, the trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.
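The two geometric descriptors named in the abstract above have standard textbook definitions: the Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) (r_B + r_X)) and the octahedral factor mu = r_B / r_X. The sketch below computes both. The function names are invented for this example, and the ionic radii are approximate values chosen only for illustration; they are not taken from the paper's dataset.

```python
import math


def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor for an ABX3 composition."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def octahedral_factor(r_b, r_x):
    """Ratio of B-cation radius to X-anion radius."""
    return r_b / r_x


# Approximate ionic radii in angstroms, used purely as illustrative
# inputs (assumed values, not data from the study).
r_cs, r_pb, r_i = 1.88, 1.19, 2.20
t = tolerance_factor(r_cs, r_pb, r_i)
mu = octahedral_factor(r_pb, r_i)
```

In a classifier like the one described, values such as `t` and `mu` would be columns of the feature matrix alongside the raw radii; how the authors scaled or combined them is not stated in the abstract.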
Finding new perovskite halides via machine learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho
Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach toward rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning, henceforth referred to as ML) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br, or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 185 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor, and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. As a result, the trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.
MacLean, Mary H; Giesbrecht, Barry
2015-07-01
Task-relevant and physically salient features influence visual selective attention. In the present study, we investigated the influence of task-irrelevant and physically nonsalient reward-associated features on visual selective attention. Two hypotheses were tested: One predicts that the effects of target-defining task-relevant and task-irrelevant features interact to modulate visual selection; the other predicts that visual selection is determined by the independent combination of relevant and irrelevant feature effects. These alternatives were tested using a visual search task that contained multiple targets, placing a high demand on the need for selectivity, and that was data-limited and required unspeeded responses, emphasizing early perceptual selection processes. One week prior to the visual search task, participants completed a training task in which they learned to associate particular colors with a specific reward value. In the search task, the reward-associated colors were presented surrounding targets and distractors, but were neither physically salient nor task-relevant. In two experiments, the irrelevant reward-associated features influenced performance, but only when they were presented in a task-relevant location. The costs induced by the irrelevant reward-associated features were greater when they oriented attention to a target than to a distractor. In a third experiment, we examined the effects of selection history in the absence of reward history and found that the interaction between task relevance and selection history differed, relative to when the features had previously been associated with reward. The results indicate that under conditions that demand highly efficient perceptual selection, physically nonsalient task-irrelevant and task-relevant factors interact to influence visual selective attention.
NASA Astrophysics Data System (ADS)
Cao, Kunlin; Bhagalia, Roshni; Sood, Anup; Brogi, Edi; Mellinghoff, Ingo K.; Larson, Steven M.
2015-03-01
Positron emission tomography (PET) using fluorodeoxyglucose (18F-FDG) is commonly used in the assessment of breast lesions by computing voxel-wise standardized uptake value (SUV) maps. Simple metrics derived from ensemble properties of SUVs within each identified breast lesion are routinely used for disease diagnosis. The maximum SUV within the lesion (SUVmax) is the most popular of these metrics. However, these simple metrics are known to be error-prone and are susceptible to image noise. Finding reliable SUV map-based features that correlate to established molecular phenotypes of breast cancer (viz. estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) expression) will enable non-invasive disease management. This study investigated 36 SUV features based on first and second order statistics, local histograms and texture of segmented lesions to predict ER and PR expression in 51 breast cancer patients. True ER and PR expression was obtained via immunohistochemistry (IHC) of tissue samples from each lesion. A supervised learning framework, adaptive boosting-support vector machine (AdaBoost-SVM), was used to select a subset of features to classify breast lesions into distinct phenotypes. Performance of the trained multi-feature classifier was compared against the baseline single-feature SUVmax classifier using receiver operating characteristic (ROC) curves. Results show that texture features encoding local lesion homogeneity extracted from gray-level co-occurrence matrices are the strongest discriminator of lesion ER expression. In particular, classifiers including these features increased prediction accuracy from 0.75 (baseline) to 0.82 and the area under the ROC curve from 0.64 (baseline) to 0.75.
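To make the gray-level co-occurrence matrix (GLCM) homogeneity feature mentioned above concrete, here is a minimal sketch for a small 2D image that has already been quantized to integer gray levels. The function names, the single horizontal offset, and the use of 1 / (1 + (i - j)^2) as the homogeneity weight are assumptions for this example; the abstract does not say which offsets, quantization, or homogeneity variant the study used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for one pixel offset (dx, dy).

    `image` is a list of rows of integers in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def homogeneity(p):
    """Weights co-occurrences by how close the two gray levels are."""
    n = len(p)
    return sum(
        p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n)
    )


# Toy 2x2 images (hypothetical, not SUV data).
flat = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]
h_flat = homogeneity(glcm(flat, levels=2))
h_checker = homogeneity(glcm(checker, levels=2))
```

A perfectly uniform patch gives the maximum value, while rapidly alternating gray levels pull the value down, which is why the feature is read as a measure of local lesion homogeneity.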
Technology adoption and prediction tools for everyday technologies aimed at people with dementia.
Chaurasia, Priyanka; McClean, Sally I; Nugent, Chris D; Cleland, Ian; Shuai Zhang; Donnelly, Mark P; Scotney, Bryan W; Sanders, Chelsea; Smith, Ken; Norton, Maria C; Tschanz, JoAnn
2016-08-01
A wide range of assistive technologies have been developed to support the elderly population with the goal of promoting independent living. The adoption of these technology-based solutions is, however, critical to their overarching success. In our previous research we addressed the significance of modelling user adoption of reminding technologies based on a range of physical, environmental and social factors. In our current work we build upon our initial modelling by considering a wider range of computational approaches and identify a reduced set of relevant features that can help medical professionals make an informed choice of whether to recommend the technology or not. The adoption models produced were evaluated on a multi-criterion basis: in terms of prediction performance, robustness and bias in relation to two types of errors. The effects of data imbalance on prediction performance were also considered. After handling the imbalance in the dataset, a 16-feature subset was evaluated on 173 instances, differentiating between adopters and non-adopters with an overall accuracy of 99.42 %.
Gurarie, David; King, Charles H; Yoon, Nara; Li, Emily
2016-08-04
Schistosoma parasites sustain a complex transmission process that cycles between a definitive human host, two free-swimming larval stages, and an intermediate snail host. Multiple factors modify their transmission and affect their control, including heterogeneity in host populations and environment, the aggregated distribution of human worm burdens, and features of parasite reproduction and host snail biology. Because these factors serve to enhance local transmission, their inclusion is important in attempting accurate quantitative prediction of the outcomes of schistosomiasis control programs. However, their inclusion raises many mathematical and computational challenges. To address these, we have recently developed a tractable stratified worm burden (SWB) model that occupies an intermediate place between simpler deterministic mean worm burden models and the very computationally-intensive, autonomous agent models. To refine the accuracy of model predictions, we modified an earlier version of the SWB by incorporating factors representing essential in-host biology (parasite mating, aggregation, density-dependent fecundity, and random egg-release) into demographically structured host communities. We also revised the snail component of the transmission model to reflect a saturable form of human-to-snail transmission. The new model allowed us to realistically simulate overdispersed egg-test results observed in individual-level field data. We further developed a Bayesian-type calibration methodology that accounted for model and data uncertainties. The new model methodology was applied to multi-year, individual-level field data on S. haematobium infections in coastal Kenya. We successfully derived age-specific estimates of worm burden distributions and worm fecundity and crowding functions for children and adults. Estimates from the new SWB model were compared with those from the older, simpler SWB with some substantial differences noted. 
We validated our new SWB estimates in prediction of drug treatment-based control outcomes for a typical Kenyan community. The new version of the SWB model provides a better tool to predict the outcomes of ongoing schistosomiasis control programs. It reflects parasite features that augment and perpetuate transmission, while it also readily incorporates differences in diagnostic testing and human sub-population differences in treatment coverage. Once extended to other Schistosoma species and transmission environments, it will provide a useful and efficient tool for planning control and elimination strategies.
Desbordes, Paul; Ruan, Su; Modzelewski, Romain; Pineau, Pascal; Vauclin, Sébastien; Gouel, Pierrick; Michel, Pierre; Di Fiore, Frédéric; Vera, Pierre; Gardin, Isabelle
2017-01-01
In oncology, texture features extracted from positron emission tomography with 18-fluorodeoxyglucose images (FDG-PET) are of increasing interest for predictive and prognostic studies, leading to several tens of features per tumor. To select the best features, the use of a random forest (RF) classifier was investigated. Sixty-five patients with an esophageal cancer treated with a combined chemo-radiation therapy were retrospectively included. All patients underwent a pretreatment whole-body FDG-PET. The patients were followed for 3 years after the end of the treatment. The response assessment was performed 1 month after the end of the therapy. Patients were classified as complete responders and non-complete responders. Sixty-one features were extracted from medical records and PET images. First, Spearman's analysis was performed to eliminate correlated features. Then, the best predictive and prognostic subsets of features were selected using a RF algorithm. These results were compared to those obtained by a Mann-Whitney U test (predictive study) and a univariate Kaplan-Meier analysis (prognostic study). Among the 61 initial features, 28 were not correlated. From these 28 features, the best subset of complementary features found using the RF classifier to predict response was composed of 2 features: metabolic tumor volume (MTV) and homogeneity from the co-occurrence matrix. The corresponding predictive value (AUC = 0.836 ± 0.105, Se = 82 ± 9%, Sp = 91 ± 12%) was higher than the best predictive results found using the Mann-Whitney test: busyness from the gray level difference matrix (P < 0.0001, AUC = 0.810, Se = 66%, Sp = 88%). The best prognostic subset found using RF was composed of 3 features: MTV and 2 clinical features (WHO status and nutritional risk index) (AUC = 0.822 ± 0.059, Se = 79 ± 9%, Sp = 95 ± 6%), while no feature was significantly prognostic according to the Kaplan-Meier analysis. 
The RF classifier can improve predictive and prognostic values compared to the Mann-Whitney U test and the univariate Kaplan-Meier survival analysis when applied to several tens of features in a limited patient database.
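The first step described above, removing features that are strongly rank-correlated with one already kept, can be sketched as follows. The greedy order, the 0.8 threshold, and the function names are assumptions for illustration; the abstract does not report the cut-off or procedure the authors used, and the random forest selection step is not shown.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is 0).
    """
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.8):
    """Greedily keep features whose |rho| with all kept ones is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table: feature name -> one value per patient.
features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],   # monotone in "a", so rho = 1
    "c": [3, 1, 4, 1, 5],
}
kept = drop_correlated(features)
```

The result depends on the order in which features are visited, so in practice one would fix that order deliberately, for example by putting clinically preferred features first.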
Prediction of essential proteins based on gene expression programming.
Zhong, Jiancheng; Wang, Jianxin; Peng, Wei; Zhang, Zhen; Pan, Yi
2013-01-01
Essential proteins are indispensable for cell survival. Identifying essential proteins is very important for improving our understanding of how a cell works. Various types of features are related to the essentiality of proteins, and many methods have been proposed that combine some of them to predict essential proteins. However, it remains a major challenge to design an effective method that predicts them by integrating different features and explains how the selected features determine the essentiality of a protein. Gene expression programming (GEP) is a learning algorithm that learns relationships between variables in sets of data and then builds models to explain these relationships. In this work, we propose a GEP-based method to predict essential proteins by combining biological features and topological features. We carry out experiments on S. cerevisiae data. The experimental results show that our method achieves better prediction performance than methods using individual features. Moreover, our method outperforms some machine learning methods and performs as well as a method obtained by combining the outputs of eight machine learning methods. The accuracy of predicting essential proteins can be improved by using the GEP method to combine topological features and biological features.
Houshyarifar, Vahid; Chehel Amirani, Mehdi
2016-08-12
In this paper we present a method to predict Sudden Cardiac Arrest (SCA) using higher order spectral (HOS) and linear (time-domain) features extracted from the heart rate variability (HRV) signal. Predicting the occurrence of SCA is important for reducing the risk of Sudden Cardiac Death (SCD). The challenge addressed in this work is to predict SCA five minutes before onset. The method consists of four steps: pre-processing, feature extraction, feature reduction, and classification. In the first step, the QRS complexes are detected from the electrocardiogram (ECG) signal and then the HRV signal is extracted. In the second step, bispectrum features of the HRV signal and time-domain features are obtained. Six features are extracted from the bispectrum and two features from the time domain. In the next step, these features are reduced to one feature by the linear discriminant analysis (LDA) technique. Finally, KNN and support vector machine-based classifiers are used to classify the HRV signals. We used two databases, the MIT/BIH Sudden Cardiac Death (SCD) Database and the Physiobank Normal Sinus Rhythm (NSR) database. In this work we achieved prediction of SCD occurrence six minutes before the SCA with an accuracy of over 91%.
Moon, Hee Jung; Kwak, Jin Young; Choi, Yoon Seong; Kim, Eun-Kyung
2012-03-01
The aim of this study was to investigate the factors for considering surgery on thyroid nodules that had non-diagnostic results on two consecutive cytology examinations. A total of 104 thyroid nodules with two consecutive non-diagnostic cytology examinations in 104 patients were investigated. Nodules with one or more suspicious ultrasonography (US) features of marked hypoechogenicity, a not well defined margin, microcalcifications, or a taller-than-wide shape were assessed as sonographically suspicious. Those without any suspicious features were assessed as sonographically benign. The clinicopathologic characteristics of patients and US features of the nodules were compared according to malignancy and benignity. The odds ratio for predicting malignancy was calculated. Altogether, 12 nodules were malignant, and 92 were benign. Age, sex, nodule size, and solidness were not associated with malignancy (P = 0.73, 0.92, 0.48, and 0.73, respectively). The malignancy rate of sonographically suspicious nodules was 25.7%, higher than the 4.3% of sonographically benign nodules (P = 0.002). The odds ratio of sonographically suspicious nodules for predicting malignancy was 16.01 (95% confidence interval 2.36-108.54, P = 0.005). Based on sonographic features, surgery can be performed selectively on nodules with two consecutive non-diagnostic cytology results.
Crisan, Dana; Grigorescu, Mircea Dan; Radu, Corina; Suciu, Alina; Grigorescu, Mircea
2017-04-01
One of the multiple factors contributing to virological response in chronic hepatitis C (CHC) is interferon-gamma-inducible protein-10 (IP-10). Its level reflects the status of interferon-stimulated genes, which in turn is associated with virological response to antiviral therapy. The aim of this study was to evaluate the role of serum IP-10 levels on sustained virological response (SVR) and the association of this parameter with insulin resistance (IR) and liver histology. Two hundred and three consecutive biopsy proven CHC patients were included in the study. Serum levels of IP-10 were determined using ELISA method. IR was evaluated by homeostasis model assessment-IR (HOMA-IR). Histological features were assessed invasively by liver biopsy and noninvasively using FibroTest, ActiTest and SteatoTest. Predictive factors for SVR and their interrelations were assessed. A cut-off value for IP-10 of 392 pg/ml was obtained to discriminate between responders and non-responders. SVR was obtained in 107 patients (52.70%). Area under the receiver operating characteristic curve for SVR was 0.875 with a sensitivity of 91.6 per cent, specificity 74.7 per cent, positive predictive value 80.3 per cent and negative predictive value 88.7 per cent. Higher values of IP-10 were associated with increasing stages of fibrosis (P<0.01) and higher grades of inflammation (P=0.02, P=0.07) assessed morphologically and noninvasively through FibroTest and ActiTest. Significant steatosis and IR were also associated with increased levels of IP-10 (P=0.01 and P=0.02). In multivariate analysis, IP-10 levels and fibrosis stages were independently associated with SVR. Our findings showed that the assessment of serum IP-10 level could be a predictive factor for SVR and it was associated with fibrosis, necroinflammatory activity, significant steatosis and IR in patients with chronic HCV infection.
Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T
2017-10-01
Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.
Zhu, Xiang; Srivastava, Diane S.; Martin, Kathy
2012-01-01
Lewis's Woodpecker (Melanerpes lewis) has experienced population declines in both Canada and the United States and in 2010 was assigned a national listing of threatened in Canada. We conducted a two-year study (2004–2005) of this species at its northern range limit, the South Okanagan Valley in British Columbia, Canada. Our main objective was to determine whether the habitat features that influenced nest-site selection also predicted nest success, or whether other factors (e.g. cavity dimensions, clutch initiation date or time of season) were more important. Nest tree decay class, density of suitable cavities and total basal area of large trees were the best predictors of nest-site selection, but these factors were unrelated to nesting success. Estimates of demographic parameters (mean ± SE) included daily nest survival rate (0.988±0.003, years combined), nest success (0.52±0.08), clutch size (5.00±0.14 eggs), female fledglings per successful nest (1.31±0.11), and annual productivity (0.68±0.12 female fledglings per nest per year). Although higher nest survival was associated with both early and late initiated clutches, early-initiated clutches allowed birds to gain the highest annual productivity as early clutches were larger. Nests in deep cavities with small entrances experienced lower predation risk especially during the peak period of nest predation. We concluded that nest-site selection can be predicted by a number of easily measured habitat variables, whereas nest success depended on complicated ecological interactions among nest predators, breeding behaviors, and cavity features. Thus, habitat-based conservation strategies should also consider ecological factors that may not be well predicted by habitat. PMID:23028525
Ma, Xin; Guo, Jing; Sun, Xiao
2015-01-01
The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). The high prediction accuracy suggests that our method can be a useful approach to identify RNA-binding proteins from sequence information.
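The abstract above reports both accuracy and the Matthews correlation coefficient (MCC). As a reminder of how the two relate to a binary confusion matrix, here is a small sketch using the standard definitions. The function names and the counts in the example are hypothetical; they are not the study's results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 if undefined."""
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for illustration only.
acc = accuracy(tp=90, tn=80, fp=20, fn=10)
mcc = matthews_corrcoef(tp=90, tn=80, fp=20, fn=10)
```

Unlike accuracy, MCC stays near zero for a classifier that ignores the minority class, which is one reason it is often reported alongside accuracy for imbalanced binding versus non-binding datasets.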
Image Feature Types and Their Predictions of Aesthetic Preference and Naturalness
Ibarra, Frank F.; Kardan, Omid; Hunter, MaryCarol R.; Kotabe, Hiroki P.; Meyer, Francisco A. C.; Berman, Marc G.
2017-01-01
Previous research has investigated ways to quantify visual information of a scene in terms of a visual processing hierarchy, i.e., making sense of visual environment by segmentation and integration of elementary sensory input. Guided by this research, studies have developed categories for low-level visual features (e.g., edges, colors), high-level visual features (scene-level entities that convey semantic information such as objects), and how models of those features predict aesthetic preference and naturalness. For example, in Kardan et al. (2015a), 52 participants provided aesthetic preference and naturalness ratings, which are used in the current study, for 307 images of mixed natural and urban content. Kardan et al. (2015a) then developed a model using low-level features to predict aesthetic preference and naturalness and could do so with high accuracy. What has yet to be explored is the ability of higher-level visual features (e.g., horizon line position relative to viewer, geometry of building distribution relative to visual access) to predict aesthetic preference and naturalness of scenes, and whether higher-level features mediate some of the association between the low-level features and aesthetic preference or naturalness. In this study we investigated these relationships and found that low- and high-level features explain 68.4% of the variance in aesthetic preference ratings and 88.7% of the variance in naturalness ratings. Additionally, several high-level features mediated the relationship between the low-level visual features and aesthetic preference. In a multiple mediation analysis, the high-level feature mediators accounted for over 50% of the variance in predicting aesthetic preference. These results show that high-level visual features play a prominent role in predicting aesthetic preference, but do not completely eliminate the predictive power of the low-level visual features.
These strong predictors provide powerful insights for future research relating to landscape and urban design with the aim of maximizing subjective well-being, which could lead to improved health outcomes on a larger scale. PMID:28503158
Genotypic and phenotypic predictors of inflammation in patients with chronic kidney disease.
Luttropp, Karin; Debowska, Malgorzata; Lukaszuk, Tomasz; Bobrowski, Leon; Carrero, Juan Jesus; Qureshi, Abdul Rashid; Stenvinkel, Peter; Lindholm, Bengt; Waniewski, Jacek; Nordfors, Louise
2016-12-01
In complex diseases such as chronic kidney disease (CKD), the risk of clinical complications is determined by interactions between phenotypic and genotypic factors. However, clinical epidemiological studies rarely attempt to analyse the combined effect of large numbers of phenotype and genotype features. We have recently shown that the relaxed linear separability (RLS) model of feature selection can address such complex issues. Here, it is applied to identify risk factors for inflammation in CKD. The RLS model was applied in 225 CKD stage 5 patients sampled in conjunction with dialysis initiation. Fifty-seven anthropometric or biochemical measurements and 79 genetic polymorphisms were entered into the model. The model was asked to identify phenotypes and genotypes that, when combined, could separate inflamed from non-inflamed patients. Inflammation was defined as a high-sensitivity C-reactive protein concentration above the median (5 mg/L). Among the 60 genotypic and phenotypic features predicting inflammation, 31 were genetic. Among the 10 strongest predictors of inflammation, 8 were single nucleotide polymorphisms located in the NAMPT, CIITA, BMP2 and PIK3CB genes, whereas fibrinogen and bone mineral density were the only phenotypic biomarkers. These results indicate a larger involvement of hereditary factors in inflammation than might have been expected and suggest that inclusion of genotype features in risk assessment studies is critical. The RLS model demonstrates that inflammation in CKD is determined by an extensive panel of factors and may prove to be a suitable tool that could enable a much-needed multifactorial approach as opposed to the commonly utilized single-factor analysis. © The Author 2016. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
NASA Astrophysics Data System (ADS)
Ma, Chuang; Bao, Zhong-Kui; Zhang, Hai-Feng
2017-10-01
So far, many network-structure-based link prediction methods have been proposed. However, these methods highlight only one or two structural features of networks and are then used to predict missing links in different networks. The performance of these existing methods is not always satisfactory, since each network has its own underlying structural features. In this paper, by analyzing different real networks, we find that the structural features of different networks are remarkably different. In particular, even within the same network, the inner structural features are utterly different. Therefore, more structural features should be considered. However, owing to the remarkably different structural features, the contributions of different features are hard to determine in advance. Inspired by these facts, an adaptive fusion model for link prediction is proposed to incorporate multiple structural features. In the model, a logistic function combining multiple structural features is defined, and the weight of each feature in the logistic function is then adaptively determined by exploiting the known structure information. Finally, we use the "learnt" logistic function to predict the connection probabilities of missing links. According to our experimental results, the performance of our adaptive fusion model is better than that of many similarity indices.
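The idea of a logistic function whose feature weights are learned from the known network structure can be sketched as follows. This is only an illustration under stated assumptions: the two features (common-neighbor count and Jaccard index), the toy graph `EDGES`, the hand-picked `NEGATIVE_PAIRS`, and the plain stochastic-gradient fitting are all hypothetical choices, not the authors' actual feature set or optimization procedure.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def structural_features(adj, u, v):
    """Bias term plus two illustrative similarity features for a node pair."""
    common = len(adj[u] & adj[v])
    union = len(adj[u] | adj[v])
    jaccard = common / union if union else 0.0
    return [1.0, float(common), jaccard]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weights(samples, labels, lr=0.1, epochs=200):
    """Fit logistic-regression weights by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi
    return w


def link_probability(w, adj, u, v):
    """Connection probability of a candidate link under the learned weights."""
    x = structural_features(adj, u, v)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


# Toy graph (hypothetical): two dense groups; the pair (a, b) is left unconnected.
EDGES = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h")]
# Hand-picked non-adjacent pairs used as negative examples (an assumption).
NEGATIVE_PAIRS = [("a", "e"), ("b", "f"), ("c", "g"), ("d", "h"), ("a", "h"), ("c", "e")]

adj = build_adjacency(EDGES)
samples = [structural_features(adj, u, v) for u, v in EDGES + NEGATIVE_PAIRS]
labels = [1] * len(EDGES) + [0] * len(NEGATIVE_PAIRS)
weights = fit_weights(samples, labels)
```

With the learned `weights`, `link_probability(weights, adj, "a", "b")` scores the held-out pair that shares two neighbors higher than a cross-group pair such as `("a", "e")`, which shares none.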
Yu, Ya-Hui; Xia, Wei-Xiong; Shi, Jun-Li; Ma, Wen-Juan; Li, Yong; Ye, Yan-Fang; Liang, Hu; Ke, Liang-Ru; Lv, Xing; Yang, Jing; Xiang, Yan-Qun; Guo, Xiang
2016-06-29
For patients with nasopharyngeal carcinoma (NPC) who undergo re-irradiation with intensity-modulated radiotherapy (IMRT), lethal nasopharyngeal necrosis (LNN) is a severe late adverse event. The purpose of this study was to identify risk factors for LNN and develop a model to predict LNN after radical re-irradiation with IMRT in patients with recurrent NPC. Patients who underwent radical re-irradiation with IMRT for locally recurrent NPC between March 2001 and December 2011 and who had no evidence of distant metastasis were included in this study. Clinical characteristics, including recurrent carcinoma conditions and dosimetric features, were evaluated as candidate risk factors for LNN. Logistic regression analysis was used to identify independent risk factors and construct the predictive scoring model. Among 228 patients enrolled in this study, 204 were at risk of developing LNN based on risk analysis. Of the 204 patients treated, 31 (15.2%) developed LNN. Logistic regression analysis showed that female sex (P = 0.008), necrosis before re-irradiation (P = 0.008), accumulated total prescription dose to the gross tumor volume (GTV) ≥145.5 Gy (P = 0.043), and recurrent tumor volume ≥25.38 cm³ (P = 0.009) were independent risk factors for LNN. A model to predict LNN was then constructed that included these four independent risk factors. A model that includes sex, necrosis before re-irradiation, accumulated total prescription dose to GTV, and recurrent tumor volume can effectively predict the risk of developing LNN in NPC patients who undergo radical re-irradiation with IMRT.
Gao, Yu-Fei; Li, Bi-Qing; Cai, Yu-Dong; Feng, Kai-Yan; Li, Zhan-Dong; Jiang, Yang
2013-01-27
Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, the prediction accuracy remains relatively low with high false positives. In this work, we developed a novel predictor based on the Random Forest algorithm (RF) aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict active sites of enzymes and achieved an overall accuracy of 0.885687 and MCC of 0.689226 on an independent test dataset. Feature analysis showed that every category of the features except disorder contributed to the identification of active sites. It was also shown via the site-specific feature analysis that the features derived from the active site itself contributed most to the active site determination. Our prediction method may become a useful tool for identifying the active sites and the key features identified by the paper may provide valuable insights into the mechanism of catalysis.
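Incremental feature selection (IFS), as mentioned in the abstract above, is commonly described as evaluating a classifier on growing prefixes of a ranked feature list and keeping the best-scoring prefix. A minimal sketch follows; the feature names, the `toy_scores` table, and `toy_score` are hypothetical stand-ins for an mRMR ranking and a cross-validated Random Forest score, neither of which is reproduced here.

```python
def incremental_feature_selection(ranked_features, score_fn):
    """Evaluate the top-k features for k = 1..N and return the best prefix and its score."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = score_fn(subset)
        if score > best_score:  # strict comparison keeps the smallest best subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical ranking (most relevant first), standing in for an mRMR ordering.
ranked = ["conservation", "solvent_accessibility", "secondary_structure",
          "physicochemical", "disorder"]

# Hypothetical scores by subset size, standing in for cross-validated accuracy.
toy_scores = {1: 0.70, 2: 0.78, 3: 0.85, 4: 0.84, 5: 0.83}


def toy_score(subset):
    return toy_scores[len(subset)]


selected, selected_score = incremental_feature_selection(ranked, toy_score)
```

In a real pipeline `score_fn` would train and cross-validate the classifier on the given feature subset, so the loop costs one model evaluation per candidate subset size.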
NASA Astrophysics Data System (ADS)
Valdes, Gilmer; Solberg, Timothy D.; Heskel, Marina; Ungar, Lyle; Simone, Charles B., II
2016-08-01
To develop a patient-specific ‘big data’ clinical decision tool to predict pneumonitis in stage I non-small cell lung cancer (NSCLC) patients after stereotactic body radiation therapy (SBRT). 61 features were recorded for 201 consecutive patients with stage I NSCLC treated with SBRT, in whom 8 (4.0%) developed radiation pneumonitis. Pneumonitis thresholds were found for each feature individually using decision stumps. The performance of three different algorithms (Decision Trees, Random Forests, RUSBoost) was evaluated. Learning curves were developed and the training error analyzed and compared to the testing error in order to evaluate the factors needed to obtain a cross-validated error smaller than 0.1. These included the addition of new features, increasing the complexity of the algorithm and enlarging the sample size and number of events. In the univariate analysis, the most important feature selected was the diffusion capacity of the lung for carbon monoxide (DLCO adj%). On multivariate analysis, the three most important features selected were the dose to 15 cc of the heart, dose to 4 cc of the trachea or bronchus, and race. Higher accuracy could be achieved if the RUSBoost algorithm was used with regularization. To predict radiation pneumonitis within an error smaller than 10%, we estimate that a sample size of 800 patients is required. Clinically relevant thresholds that put patients at risk of developing radiation pneumonitis were determined in a cohort of 201 stage I NSCLC patients treated with SBRT. The consistency of these thresholds can provide radiation oncologists with an estimate of their reliability and may inform treatment planning and patient counseling. The accuracy of the classification is limited by the number of patients in the study and not by the features gathered or the complexity of the algorithm.
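A decision stump, used above to find a per-feature threshold, is a one-split classifier on a single feature. The sketch below searches midpoints between sorted feature values for the split with the lowest misclassification rate; the feature values and labels are hypothetical, and the study's actual stump implementation and error criterion are not specified in the abstract.

```python
def best_stump(values, labels):
    """Return (threshold, direction, error_rate) for the best single-feature split.

    direction = +1 predicts an event when value > threshold,
    direction = -1 predicts an event when value < threshold.
    Returns None if all values are identical (no split possible).
    """
    distinct = sorted(set(values))
    candidates = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        for direction in (1, -1):
            errors = 0
            for x, y in zip(values, labels):
                predicted = 1 if direction * (x - threshold) > 0 else 0
                errors += predicted != y
            error_rate = errors / len(values)
            if best is None or error_rate < best[2]:
                best = (threshold, direction, error_rate)
    return best


# Hypothetical single-feature data (e.g., a dose metric) with binary outcomes.
feature_values = [10, 12, 15, 20, 22, 30]
outcomes = [0, 0, 0, 1, 1, 1]
stump = best_stump(feature_values, outcomes)
```

With only 8 events in 201 patients, as reported above, a plain error-rate criterion would favor always predicting "no event"; a class-weighted criterion is a common alternative in such settings.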
Search performance is better predicted by tileability than presence of a unique basic feature.
Chang, Honghua; Rosenholtz, Ruth
2016-08-01
Traditional models of visual search such as feature integration theory (FIT; Treisman & Gelade, 1980), have suggested that a key factor determining task difficulty consists of whether or not the search target contains a "basic feature" not found in the other display items (distractors). Here we discriminate between such traditional models and our recent texture tiling model (TTM) of search (Rosenholtz, Huang, Raj, Balas, & Ilie, 2012b), by designing new experiments that directly pit these models against each other. Doing so is nontrivial, for two reasons. First, the visual representation in TTM is fully specified, and makes clear testable predictions, but its complexity makes getting intuitions difficult. Here we elucidate a rule of thumb for TTM, which enables us to easily design new and interesting search experiments. FIT, on the other hand, is somewhat ill-defined and hard to pin down. To get around this, rather than designing totally new search experiments, we start with five classic experiments that FIT already claims to explain: T among Ls, 2 among 5s, Q among Os, O among Qs, and an orientation/luminance-contrast conjunction search. We find that fairly subtle changes in these search tasks lead to significant changes in performance, in a direction predicted by TTM, providing definitive evidence in favor of the texture tiling model as opposed to traditional views of search.
Analyzing cross-college course enrollments via contextual graph mining
Liu, Xiaozhong; Chen, Yan
2017-01-01
The ability to predict which courses a student may enroll in during the coming semester plays a pivotal role in the allocation of learning resources, which is a hot topic in the domain of educational data mining. In this study, we propose an innovative approach to characterize students’ cross-college course enrollments by leveraging a novel contextual graph. Specifically, different kinds of variables, such as students, courses, colleges and diplomas, as well as various types of variable relations, are utilized to depict the context of each variable, and then a representation learning algorithm, node2vec, is applied to extract sophisticated graph-based features for the enrollment analysis. In this manner, the relations between any pair of variables can be measured quantitatively, which enables the variable type to transform from nominal to ratio. These graph-based features are examined by the random forest algorithm, and experiments on 24,663 students, 1,674 courses and 417,590 enrollment records demonstrate that the contextual graph can successfully improve the analysis of cross-college course enrollments, where three of the graph-based features have significantly stronger impacts on prediction accuracy than the others. In addition, the empirical results indicate that the student’s course preference is the most important factor in predicting future course enrollments, which is consistent with previous studies that identify course interest as a key factor in course recommendations. PMID:29186171
Conventional MRI features for predicting the clinical outcome of patients with invasive placenta
Chen, Ting; Xu, Xiao-Quan; Shi, Hai-Bin; Yang, Zheng-Qiang; Zhou, Xin; Pan, Yi
2017-01-01
PURPOSE We aimed to evaluate whether morphologic magnetic resonance imaging (MRI) features could help to predict the maternal outcome after uterine artery embolization (UAE)-assisted cesarean section (CS) in patients with invasive placenta previa. METHODS We retrospectively reviewed the MRI data of 40 pregnant women who have undergone UAE-assisted cesarean section due to suspected high risk of massive hemorrhage caused by invasive placenta previa. Patients were divided into two groups based on the maternal outcome (good-outcome group: minor hemorrhage and uterus preserved; poor-outcome group: significant hemorrhage or emergency hysterectomy). Morphologic MRI features were compared between the two groups. Multivariate logistic regression analysis was used to identify the most valuable variables, and predictive value of the identified risk factor was determined. RESULTS Low signal intensity bands on T2-weighted imaging (P < 0.001), placenta percreta (P = 0.011), and placental cervical protrusion sign (P = 0.002) were more frequently observed in patients with poor outcome. Low signal intensity bands on T2-weighted imaging was the only significant predictor of poor maternal outcome in multivariate analysis (P = 0.020; odds ratio, 14.79), with 81.3% sensitivity and 84.3% specificity. CONCLUSION Low signal intensity bands on T2-weighted imaging might be a predictor of poor maternal outcome after UAE-assisted cesarean section in patients with invasive placenta previa. PMID:28345524
Caldieraro, Marco Antonio; Walsh, Samantha; Deckersbach, Thilo; Bobo, William V; Gao, Keming; Ketter, Terence A; Shelton, Richard C; Reilly-Harrington, Noreen A; Tohen, Mauricio; Calabrese, Joseph R; Thase, Michael E; Kocsis, James H; Sylvia, Louisa G; Nierenberg, Andrew A
2017-11-01
Activation encompasses energy and activity and is a central feature of bipolar disorder. However, the impact of activation on treatment response of bipolar depression requires further exploration. The aims of this study were to assess the association of decreased activation and sustained remission in bipolar depression and test for factors that could affect this association. We assessed participants with Diagnostic and Statistical Manual of Mental Disorders (4th ed) bipolar depression (n = 303) included in a comparative effectiveness study of lithium- and quetiapine-based treatments (the Bipolar CHOICE study). Activation was evaluated using items from the Bipolar Inventory of Symptoms Scale. The selection of these items was based on a dimension of energy and interest symptoms associated with poorer treatment response in major depression. Decreased activation was associated with lower remission rates in the raw analyses and in a logistic regression model adjusted for baseline severity and subsyndromal manic symptoms (odds ratio = 0.899; p = 0.015). The manic features also predicted lower remission (odds ratio = 0.934; p < 0.001). Remission rates were similar in the two treatment groups. Decreased activation and subsyndromal manic symptoms predict lower remission rates in bipolar depression. Patients with these features may require specific treatment approaches, but new studies are necessary to identify treatments that could improve outcomes in this population.
Analyzing cross-college course enrollments via contextual graph mining.
Wang, Yongzhen; Liu, Xiaozhong; Chen, Yan
2017-01-01
The ability to predict which courses a student may enroll in during the coming semester plays a pivotal role in the allocation of learning resources, which is a hot topic in the domain of educational data mining. In this study, we propose an innovative approach to characterize students' cross-college course enrollments by leveraging a novel contextual graph. Specifically, different kinds of variables, such as students, courses, colleges and diplomas, as well as various types of variable relations, are utilized to depict the context of each variable, and then a representation learning algorithm, node2vec, is applied to extract sophisticated graph-based features for the enrollment analysis. In this manner, the relations between any pair of variables can be measured quantitatively, which enables the variable type to transform from nominal to ratio. These graph-based features are examined by the random forest algorithm, and experiments on 24,663 students, 1,674 courses and 417,590 enrollment records demonstrate that the contextual graph can successfully improve the analysis of cross-college course enrollments, where three of the graph-based features have significantly stronger impacts on prediction accuracy than the others. In addition, the empirical results indicate that the student's course preference is the most important factor in predicting future course enrollments, which is consistent with previous studies that identify course interest as a key factor in course recommendations.
Thin-slice vision: inference of confidence measure from perceptual video quality
NASA Astrophysics Data System (ADS)
Hameed, Abdul; Balas, Benjamin; Dai, Rui
2016-11-01
There has been considerable research on thin-slice judgments, but no study has demonstrated the predictive validity of confidence measures when assessors watch videos acquired from communication systems, in which the perceptual quality of videos could be degraded by limited bandwidth and unreliable network conditions. This paper studies the relationship between high-level thin-slice judgments of human behavior and factors that contribute to perceptual video quality. Based on a large number of subjective test results, it has been found that the confidence of a single individual present in all the videos, called speaker's confidence (SC), could be predicted by a list of features that contribute to perceptual video quality. Two prediction models, one based on artificial neural network and the other based on a decision tree, were built to predict SC. Experimental results have shown that both prediction models can result in high correlation measures.
Assessing Predictive Properties of Genome-Wide Selection in Soybeans
Xavier, Alencar; Muir, William M.; Rainey, Katy Martin
2016-01-01
Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max (L.) Merr.). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with greatest improvement observed in training sets up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although alternative models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set. PMID:27317786
Predicting age groups of Twitter users based on language and metadata features
Morgan-Lopez, Antonio A.; Chew, Robert F.; Ruddle, Paul
2017-01-01
Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles’ metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fitted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen’s d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as “school” for youth and “college” for young adults. Overall, it was more challenging to predict older adults accurately.
These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for informing public health surveillance and evaluation research. PMID:28850620
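The precision, recall, and F1 figures quoted above are per-class metrics that can be computed one age group at a time by treating that group as the positive class. A minimal sketch with hypothetical labels follows (`y_true` and `y_pred` are illustrative and are not the study's data):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical true and predicted age groups for six accounts.
y_true = ["youth", "youth", "young_adult", "adult", "adult", "young_adult"]
y_pred = ["youth", "young_adult", "young_adult", "adult", "youth", "young_adult"]

youth_metrics = precision_recall_f1(y_true, y_pred, "youth")
young_adult_metrics = precision_recall_f1(y_true, y_pred, "young_adult")
```

Averaging the per-class values (macro-averaging) is one common way to obtain a single figure per model; the abstract does not state which averaging the authors used.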
Predicting age groups of Twitter users based on language and metadata features.
Morgan-Lopez, Antonio A; Kim, Annice E; Chew, Robert F; Ruddle, Paul
2017-01-01
Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fitted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately.
These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for informing public health surveillance and evaluation research.
Adhikari, Badri; Hou, Jie; Cheng, Jianlin
2018-03-01
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
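The "top L/5 long-range" precision quoted above ranks predicted contacts by score, keeps the L/5 best among long-range pairs (L being the sequence length), and reports the fraction that are true contacts. The sketch below assumes a minimum sequence separation of 24 residues for "long-range", which is the usual CASP convention but is not stated in the abstract; the toy predictions are generated programmatically and are purely hypothetical.

```python
def top_l5_long_range_precision(predicted, true_contacts, seq_len, min_sep=24):
    """Precision of the top L/5 long-range predicted contacts.

    predicted: iterable of (i, j, score) residue-pair predictions.
    true_contacts: set of (i, j) pairs with i < j.
    min_sep: minimum |i - j| for a pair to count as long-range (assumed 24).
    """
    long_range = [(i, j, s) for i, j, s in predicted if abs(i - j) >= min_sep]
    long_range.sort(key=lambda contact: contact[2], reverse=True)
    top = long_range[: max(1, seq_len // 5)]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in true_contacts for i, j, _ in top)
    return hits / len(top)


# Hypothetical 50-residue target: 12 long-range predictions with decreasing scores,
# plus one confident short-range prediction that must be filtered out.
SEQ_LEN = 50
toy_predictions = [(i, i + 24 + (i % 3), 1.0 - 0.05 * i) for i in range(12)]
toy_predictions.append((0, 3, 0.99))
# Only the even-indexed long-range pairs (and the short-range pair) are real contacts.
toy_true = {(i, i + 24 + (i % 3)) for i in range(0, 12, 2)} | {(0, 3)}

toy_precision = top_l5_long_range_precision(toy_predictions, toy_true, SEQ_LEN)
```

For this toy input, L/5 = 10 and half of the ten top-ranked long-range pairs are true contacts.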
[Relational database for urinary stone ambulatory consultation. Assessment of initial outcomes].
Sáenz Medina, J; Páez Borda, A; Crespo Martinez, L; Gómez Dos Santos, V; Barrado, C; Durán Poveda, M
2010-05-01
To create a relational database for monitoring lithiasic patients. We describe the architectural details and the initial results of the statistical analysis. Microsoft Access 2002 was used as template. Four different tables were constructed to gather demographic data (table 1), clinical and laboratory findings (table 2), stone features (table 3) and therapeutic approach (table 4). For a reliability analysis of the database the number of correctly stored data items was gathered. To evaluate the performance of the database, a prospective analysis was conducted, from May 2004 to August 2009, on 171 stone-free patients after treatment (ESWL, surgery or medical) from a total of 511 patients stored in the database. Lithiasic status (stone free or stone relapse) was used as primary end point, while demographic factors (age, gender), lithiasic history, upper urinary tract alterations and characteristics of the stone (side, location, composition and size) were considered as predictive factors. A univariate analysis was conducted initially by chi-square test and supplemented by Kaplan-Meier estimates for time to stone recurrence. A multiple Cox proportional hazards regression model was generated to jointly assess the prognostic value of the demographic factors and the predictive value of stone characteristics. For the reliability analysis 22,084 data items were available, corresponding to 702 consultations on 511 patients. Analysis of data showed a recurrence rate of 85.4% (146/171, median time to recurrence 608 days, range 70-1758). In the univariate and multivariate analysis, none of the factors under consideration had a significant effect on recurrence rate (p=ns). The relational database is useful for monitoring patients with urolithiasis. It allows easy control and update, as well as data storage for later use. The analysis conducted for its evaluation showed no influence of demographic factors and stone features on stone recurrence.
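The Kaplan-Meier estimate of time to stone recurrence mentioned above multiplies, at each observed recurrence time, the fraction of at-risk patients who remain recurrence-free. A minimal sketch follows; the follow-up times and event flags are hypothetical and are not drawn from the database described in the abstract.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, recurrence-free probability).

    times: follow-up duration per patient (e.g., days).
    events: 1 if recurrence was observed at that time, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            leaving += 1
            i += 1
        if recurrences:
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Hypothetical follow-up data for five patients (days, recurrence flag).
follow_up_days = [100, 200, 200, 300, 400]
recurred = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(follow_up_days, recurred)
```

Censored patients contribute to the at-risk count until their last follow-up, which is what distinguishes this estimate from a simple proportion of recurrences.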
A feature-based approach to modeling protein-protein interaction hot spots.
Cho, Kyu-il; Kim, Dongsup; Lee, Doheon
2009-05-01
Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to pi-related interactions, especially pi . . . pi interactions.
Evaluation of non-negative matrix factorization of grey matter in age prediction.
Varikuti, Deepthi P; Genon, Sarah; Sotiras, Aristeidis; Schwender, Holger; Hoffstaedter, Felix; Patil, Kaustubh R; Jockwitz, Christiane; Caspers, Svenja; Moebus, Susanne; Amunts, Katrin; Davatzikos, Christos; Eickhoff, Simon B
2018-06-01
The relationship between grey matter volume (GMV) patterns and age can be captured by multivariate pattern analysis, allowing prediction of individuals' age based on structural imaging. Raw data, voxel-wise GMV and non-sparse factorization (with Principal Component Analysis, PCA) show good performance but do not promote relatively localized brain components for post-hoc examinations. Here we evaluated a non-negative matrix factorization (NNMF) approach to provide a reduced, but also interpretable representation of GMV data in age prediction frameworks in healthy and clinical populations. This examination was performed using three datasets: a multi-site cohort of life-span healthy adults, a single site cohort of older adults and clinical samples from the ADNI dataset with healthy subjects, participants with Mild Cognitive Impairment and patients with Alzheimer's disease (AD) subsamples. T1-weighted images were preprocessed with VBM8 standard settings to compute GMV values after normalization, segmentation and modulation for non-linear transformations only. Non-negative matrix factorization was computed on the GM voxel-wise values for a range of granularities (50-690 components) and LASSO (Least Absolute Shrinkage and Selection Operator) regression was used for age prediction. First, we compared the performance of our data compression procedure (i.e., NNMF) to various other approaches (i.e., uncompressed VBM data, PCA-based factorization and parcellation-based compression). We then investigated the impact of the granularity on the accuracy of age prediction, as well as the transferability of the factorization and model generalization across datasets. We finally validated our framework by examining age prediction in ADNI samples. Our results showed that our framework favorably compares with other approaches. They also demonstrated that the NNMF based factorization derived from one dataset could be efficiently applied to compress VBM data of another dataset and that granularities between 300 and 500 components give an optimal representation for age prediction. In addition to the good performance in healthy subjects, our framework provided relatively localized brain regions as the features contributing to the prediction, thereby offering further insights into structural changes due to brain aging. Finally, our validation in clinical populations showed that our framework is sensitive to deviance from normal structural variations in pathological aging. Copyright © 2018 Elsevier Inc. All rights reserved.
Gastric biomarkers: a global review.
Baniak, Nick; Senger, Jenna-Lynn; Ahmed, Shahid; Kanthan, S C; Kanthan, Rani
2016-08-11
Gastric cancer is an aggressive disease with a poor 5-year survival and large global burden of disease. The disease is biologically and genetically heterogeneous with a poorly understood carcinogenesis at the molecular level. Despite the many prognostic, predictive, and therapeutic biomarkers investigated to date, gastric cancer continues to be detected at an advanced stage with resultant poor clinical outcomes. This is a global review of gastric biomarkers with an emphasis on HER2, E-cadherin, fibroblast growth factor receptor, mammalian target of rapamycin, and hepatocyte growth factor receptor as well as sections on microRNAs, long noncoding RNAs, matrix metalloproteinases, PD-L1, TP53, and microsatellite instability. A deeper understanding of the pathogenesis and biological features of gastric cancer, including the identification and characterization of diagnostic, prognostic, predictive, and therapeutic biomarkers, hopefully will provide improved clinical outcomes.
Bommert, Andrea; Rahnenführer, Jörg; Lang, Michel
2017-01-01
Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy, but it is also important that this model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, the models are used not only for prediction but also for drawing biological conclusions which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating the stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. Also, we find that for the stability assessment behaviour it is most important that a measure contains a correction for chance or large numbers of chosen features. Then, we analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
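One way to read the Pearson-correlation stability measure is as the average pairwise correlation between binary selection vectors from repeated model fits. The sketch below assumes that reading; the function names and the example selections are hypothetical, and it is undefined when a fit selects none or all of the features (zero variance).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def selection_stability(selected_sets, n_features):
    """Mean pairwise Pearson correlation between 0/1 selection vectors."""
    vectors = [
        [1 if j in chosen else 0 for j in range(n_features)]
        for chosen in selected_sets
    ]
    pairs = list(combinations(vectors, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices chosen in three resampling runs (10 features).
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
print(round(selection_stability(runs, 10), 3))
```

Because the correlation is computed against each vector's own mean, agreement that would arise by chance when many features are selected is discounted, which matches the abstract's point about needing a correction for chance.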
A predictive control framework for optimal energy extraction of wind farms
NASA Astrophysics Data System (ADS)
Vali, M.; van Wingerden, J. W.; Boersma, S.; Petrović, V.; Kühn, M.
2016-09-01
This paper proposes an adjoint-based model predictive control for optimal energy extraction of wind farms. It employs the axial induction factor of wind turbines to influence their aerodynamic interactions through the wake. The performance index is defined here as the total power production of the wind farm over a finite prediction horizon. A medium-fidelity wind farm model is utilized to predict the inflow propagation in advance. The adjoint method is employed to solve the formulated optimization problem in a cost effective way and the first part of the optimal solution is implemented over the control horizon. This procedure is repeated at the next controller sample time providing the feedback into the optimization. The effectiveness and some key features of the proposed approach are studied for a two turbine test case through simulations.
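The receding-horizon procedure described above (optimize over a finite prediction horizon, apply only the first part of the solution, repeat at the next sample time) can be illustrated with a toy loop. Everything here is an assumption for illustration: the scalar plant `x[k+1] = x[k] + u[k]`, the tracking cost, and the brute-force search stand in for the wind farm model, the power objective, and the adjoint-based optimizer of the paper.

```python
from itertools import product


def plan(x0, target, horizon, inputs=(-1, 0, 1)):
    """Brute-force the input sequence with the lowest cost over the horizon."""
    def cost(seq):
        x, total = x0, 0
        for u in seq:
            x += u  # toy plant model: x[k+1] = x[k] + u[k]
            total += (x - target) ** 2
        return total

    return min(product(inputs, repeat=horizon), key=cost)


def run_mpc(x0, target, steps, horizon=3):
    """Re-plan at every step and apply only the first planned input."""
    x, trajectory = x0, [x0]
    for _ in range(steps):
        u = plan(x, target, horizon)[0]
        x += u  # the new state feeds back into the next optimization
        trajectory.append(x)
    return trajectory


print(run_mpc(0, 5, 8))
```

Re-planning from the measured state at each step is what provides feedback; the remainder of each planned sequence is discarded.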
Adolescent precursors of adult borderline personality pathology in a high-risk community sample.
Conway, Christopher C; Hammen, Constance; Brennan, Patricia A
2015-06-01
Longitudinal studies of the exact environmental conditions and personal attributes contributing to the development of borderline personality disorder (BPD) are rare. Furthermore, existing research typically examines risk factors in isolation, limiting our knowledge of the relative effect sizes of different risk factors and how they act in concert to bring about borderline personality pathology. The present study investigated the prospective effects of diverse acute and chronic stressors, proband psychopathology, and maternal psychopathology on BPD features in a high-risk community sample (N = 700) of youth followed from mid-adolescence to young adulthood. Multivariate analyses revealed significant effects of maternal externalizing disorder history, offspring internalizing disorder history, family stressors, and school-related stressors on BPD risk. Contrary to expectations, no interactions between chronically stressful environmental conditions and personal characteristics in predicting borderline personality features were detected. Implications of these findings for etiological theories of BPD and early screening efforts are discussed.
Sensory features and repetitive behaviors in children with autism and developmental delays.
Boyd, Brian A; Baranek, Grace T; Sideris, John; Poe, Michele D; Watson, Linda R; Patten, Elena; Miller, Heather
2010-04-01
This study combined parent and observational measures to examine the association between aberrant sensory features and restricted, repetitive behaviors in children with autism (N=67) and those with developmental delays (N=42). Confirmatory factor analysis was used to empirically validate three sensory constructs of interest: hyperresponsiveness, hyporesponsiveness, and sensory seeking. Examining the association between the three derived sensory factor scores and scores on the Repetitive Behavior Scales--Revised revealed the co-occurrence of these behaviors in both clinical groups. Specifically, high levels of hyperresponsive behaviors predicted high levels of repetitive behaviors, and the relationship between these variables remained the same controlling for mental age. We primarily found non-significant associations between hyporesponsiveness or sensory seeking and repetitive behaviors, with the exception that sensory seeking was associated with ritualistic/sameness behaviors. These findings suggest that shared neurobiological mechanisms may underlie hyperresponsive sensory symptoms and repetitive behaviors and have implications for diagnostic classification as well as intervention.
What Was Learned in Predicting Slender Airframe Aerodynamics with the F-16XL Aircraft
NASA Technical Reports Server (NTRS)
Rizzi, Arthur; Luckring, James M.
2016-01-01
The second Cranked-Arrow Wing Aerodynamics Project, International, coordinated project has been underway to improve high-fidelity computational-fluid-dynamics predictions of slender airframe aerodynamics. The work is focused on two flow conditions and leverages a unique flight data set obtained with the F-16XL aircraft for comparison and validation. These conditions, a low-speed high-angle-of-attack case and a transonic low-angle-of-attack case, were selected from a prior prediction campaign wherein the computational fluid dynamics failed to provide acceptable results. In revisiting these two cases, approaches for improved results include better, denser grids using more grid adaptation to local flow features as well as unsteady higher-fidelity physical modeling like hybrid Reynolds-averaged Navier-Stokes/unsteady Reynolds-averaged Navier-Stokes/large-eddy simulation methods. The work embodies predictions from multiple numerical formulations that are contributed from multiple organizations where some authors investigate other possible factors that could explain the discrepancies in agreement (e.g., effects due to deflected control surfaces during the flight tests as well as static aeroelastic deflection of the outer wing). This paper presents the synthesis of all the results and findings and draws some conclusions that lead to an improved understanding of the underlying flow physics, finally making the connections between the physics and aircraft features.
Predicting discovery rates of genomic features.
Gravel, Simon
2014-06-01
Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved. Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require ∼15% of the population. We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and subsampled 1000 Genomes Project data. Extrapolating based on the National Heart, Lung, and Blood Institute Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African Americans and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types. Copyright © 2014 by the Genetics Society of America.
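As a rough illustration of the jackknife idea, the sketch below implements the classical first-order jackknife estimator of the total number of distinct items, S_obs + f1 (n - 1) / n, where f1 is the number of variants seen in exactly one sample. This is an assumption about the general family of estimators, not the specific jackknife or linear programming methods of the paper, and the sample data are hypothetical.

```python
from collections import Counter


def jackknife1(samples):
    """First-order jackknife estimate of the total number of variants.

    samples -- one collection of observed variant identifiers per individual
    """
    n = len(samples)
    # Count in how many samples each variant appears.
    counts = Counter(v for s in samples for v in set(s))
    observed = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return observed + singletons * (n - 1) / n


# Hypothetical pilot data: variant labels observed in four individuals.
pilot = [{"a", "b", "c"}, {"a", "d"}, {"a", "b", "e"}, {"a", "f"}]
print(jackknife1(pilot))
```

The correction term grows with the number of singletons: many variants seen only once suggest that many more remain unseen.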
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bossi, Flavia; Fan, Jue; Xiao, Jun; ...
2017-06-26
Here, the molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. As a result, to identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.
Which ante mortem clinical features predict progressive supranuclear palsy pathology?
Respondek, Gesine; Kurz, Carolin; Arzberger, Thomas; Compta, Yaroslau; Englund, Elisabet; Ferguson, Leslie W; Gelpi, Ellen; Giese, Armin; Irwin, David J; Meissner, Wassilios G; Nilsson, Christer; Pantelyat, Alexander; Rajput, Alex; van Swieten, John C; Troakes, Claire; Josephs, Keith A; Lang, Anthony E; Mollenhauer, Brit; Müller, Ulrich; Whitwell, Jennifer L; Antonini, Angelo; Bhatia, Kailash P; Bordelon, Yvette; Corvol, Jean-Christophe; Colosimo, Carlo; Dodel, Richard; Grossman, Murray; Kassubek, Jan; Krismer, Florian; Levin, Johannes; Lorenzl, Stefan; Morris, Huw; Nestor, Peter; Oertel, Wolfgang H; Rabinovici, Gil D; Rowe, James B; van Eimeren, Thilo; Wenning, Gregor K; Boxer, Adam; Golbe, Lawrence I; Litvan, Irene; Stamelou, Maria; Höglinger, Günter U
2017-07-01
Progressive supranuclear palsy (PSP) is a neuropathologically defined disease presenting with a broad spectrum of clinical phenotypes. We aimed to identify clinical features and investigations that predict or exclude PSP pathology during life, with the goal of optimizing the clinical diagnostic criteria for PSP. We performed a systematic review of the literature published since 1996 to identify clinical features and investigations that may predict or exclude PSP pathology. We then extracted standardized data from clinical charts of patients with pathologically diagnosed PSP and relevant disease controls and calculated the sensitivity, specificity, and positive predictive value of key clinical features for PSP in this cohort. Of 4166 articles identified by the database inquiry, 269 met predefined standards. The literature review identified clinical features predictive of PSP, including features of the following 4 functional domains: ocular motor dysfunction, postural instability, akinesia, and cognitive dysfunction. No biomarker or genetic feature was found to be reliably validated to predict definite PSP. High-quality original natural history data were available from 206 patients with pathologically diagnosed PSP and from 231 pathologically diagnosed disease controls (54 corticobasal degeneration, 51 multiple system atrophy with predominant parkinsonism, 53 Parkinson's disease, 73 behavioral variant frontotemporal dementia). We identified clinical features that predicted PSP pathology, including phenotypes other than Richardson's syndrome, with varying sensitivity and specificity. Our results highlight the clinical variability of PSP and the high prevalence of phenotypes other than Richardson's syndrome. The features of variant phenotypes with high specificity and sensitivity should serve to optimize clinical diagnosis of PSP. © 2017 International Parkinson and Movement Disorder Society.
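The sensitivity, specificity, and positive predictive value of a clinical feature follow directly from a 2x2 table of feature presence against pathological diagnosis. The sketch below shows the arithmetic; the cohort sizes (206 PSP, 231 controls) come from the abstract, but the split of each cohort into feature-present and feature-absent counts is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and PPV from 2x2 confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # feature present among true PSP
        "specificity": tn / (tn + fp),  # feature absent among controls
        "ppv": tp / (tp + fp),          # true PSP among feature-positive
    }


# Hypothetical counts: feature present in 150 of 206 PSP cases
# and in 20 of 231 disease controls.
metrics = diagnostic_metrics(tp=150, fp=20, fn=56, tn=211)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Unlike sensitivity and specificity, the positive predictive value depends on the ratio of cases to controls in the cohort, so it does not transfer directly to populations with a different prevalence.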
Associative cueing of attention through implicit feature-location binding.
Girardi, Giovanna; Nico, Daniele
2017-09-01
In order to assess associative learning between two task-irrelevant features in cueing spatial attention, we devised a task in which participants have to make an identity comparison between two sequential visual stimuli. Unbeknownst to them, the location of the second stimulus could be predicted by the colour of the first or a concurrent sound. Although unnecessary for the identity-matching judgment, the predictive features thus provided an arbitrary association favouring the spatial anticipation of the second stimulus. A significant advantage was found, with faster responses at predicted compared to non-predicted locations. Results clearly demonstrated an associative cueing of attention via a second-order arbitrary feature/location association, but with a substantial discrepancy depending on the sensory modality of the predictive feature. With colour as the predictive feature, significant advantages emerged only after the completion of three blocks of trials. In contrast, sound affected responses from the first block of trials and significant advantages were manifest from the beginning of the second. The possible mechanisms underlying the associative cueing of attention in both conditions are discussed. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Christopher, Mark; Tang, Li; Fingert, John H.; Scheetz, Todd E.; Abramoff, Michael D.
2014-03-01
Evaluation of optic nerve head (ONH) structure is a commonly used clinical technique for both diagnosis and monitoring of glaucoma. Glaucoma is associated with characteristic changes in the structure of the ONH. We present a method for computationally identifying ONH structural features using both imaging and genetic data from a large cohort of participants at risk for primary open angle glaucoma (POAG). Using 1054 participants from the Ocular Hypertension Treatment Study, ONH structure was measured by application of a stereo correspondence algorithm to stereo fundus images. In addition, the genotypes of several known POAG genetic risk factors were considered for each participant. ONH structural features were discovered using both a principal component analysis approach to identify the major modes of variance within structural measurements and a linear discriminant analysis approach to capture the relationship between genetic risk factors and ONH structure. The identified ONH structural features were evaluated based on the strength of their associations with genotype and development of POAG by the end of the OHTS study. ONH structural features with strong associations with genotype were identified for each of the genetic loci considered. Several identified ONH structural features were significantly associated (p < 0.05) with the development of POAG after Bonferroni correction. Further, incorporation of genetic risk status was found to substantially increase performance of early POAG prediction. These results suggest incorporating both imaging and genetic data into ONH structural modeling significantly improves the ability to explain POAG-related changes to ONH structure.
Protein location prediction using atomic composition and global features of the amino acid sequence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.
2010-01-22
The subcellular location of a protein is useful information for determining its function, screening for drug candidates, vaccine design, annotation of gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition, is effectively used with multiple physicochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100%, 82.47% and 88.81% for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence will bring out the underlying property distribution in greater detail to enhance the prediction accuracy.
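The weighted voting step that combines the four module predictions can be sketched as below. The abstract does not say how the weights are chosen, so the weights, the location labels, and the function name `weighted_vote` are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across modules."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical outputs of four classifier modules and their weights.
module_predictions = ["nucleus", "cytoplasm", "nucleus", "mitochondrion"]
module_weights = [0.3, 0.4, 0.2, 0.1]
print(weighted_vote(module_predictions, module_weights))
```

In this example two weaker modules agreeing on "nucleus" (total weight 0.5) outvote the single strongest module (0.4), which is the behaviour a weighted scheme adds over picking the best single classifier.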
Accurate prediction of personalized olfactory perception from large-scale chemoinformatic features.
Li, Hongyang; Panwar, Bharat; Omenn, Gilbert S; Guan, Yuanfang
2018-02-01
The olfactory stimulus-percept problem has been studied for more than a century, yet it is still hard to precisely predict the odor given the large-scale chemoinformatic features of an odorant molecule. A major challenge is that the perceived qualities vary greatly among individuals due to different genetic and cultural backgrounds. Moreover, the combinatorial interactions between multiple odorant receptors and diverse molecules significantly complicate the olfaction prediction. Many attempts have been made to establish structure-odor relationships for intensity and pleasantness, but no models are available to predict the personalized multi-odor attributes of molecules. In this study, we describe our winning algorithm for predicting individual and population perceptual responses to various odorants in the DREAM Olfaction Prediction Challenge. We find that a random forest model consisting of multiple decision trees is well suited to this prediction problem, given the large feature spaces and high variability of perceptual ratings among individuals. Integrating both population and individual perceptions into our model effectively reduces the influence of noise and outliers. By analyzing the importance of each chemical feature, we find that a small set of low- and nondegenerative features is sufficient for accurate prediction. Our random forest model successfully predicts personalized odor attributes of structurally diverse molecules. This model, together with the top discriminative features, has the potential to extend our understanding of olfactory perception mechanisms and provide an alternative for rational odorant design.
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.
Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E
2018-04-25
Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
SU-F-R-04: Radiomics for Survival Prediction in Glioblastoma (GBM)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, H; Molitoris, J; Bhooshan, N
Purpose: To develop a quantitative radiomics approach for survival prediction of glioblastoma (GBM) patients treated with chemoradiotherapy (CRT). Methods: 28 GBM patients who received CRT at our institution were retrospectively studied. 255 radiomic features were extracted from 3 gadolinium-enhanced T1 weighted MRIs for 2 regions of interest (ROIs) (the surgical cavity and its surrounding enhancement rim). The 3 MRIs were at pre-treatment, 1-month and 3-month post-CRT. The imaging features comprehensively quantified the intensity, spatial variation (texture), geometric property and their spatial-temporal changes for the 2 ROIs. 3 demographic features (age, race, gender) and 12 clinical parameters (KPS, extent of resection, whether concurrent temozolomide was adjusted/stopped and radiotherapy related information) were also included. Four machine learning models (logistic regression (LR), support vector machine (SVM), decision tree (DT), neural network (NN)) were applied to predict overall survival (OS) and progression-free survival (PFS). The number of cases and percentage of cases predicted correctly were collected and the AUC (area under the receiver operating characteristic (ROC) curve) was determined after leave-one-out cross-validation. Results: From univariate analysis, 27 features (1 demographic, 1 clinical and 25 imaging) were statistically significant (p<0.05) for both OS and PFS. Two sets of features (each containing 24 features) were algorithmically selected from all features to predict OS and PFS. High prediction accuracy of OS was achieved by using NN (96%, 27 of 28 cases were correctly predicted, AUC = 0.99), LR (93%, 26 of 28 cases were correctly predicted, AUC = 0.95) and SVM (93%, 26 of 28 cases were correctly predicted, AUC = 0.90). When predicting PFS, NN obtained the highest prediction accuracy (89%, 25 of 28 cases were correctly predicted, AUC = 0.92). Conclusion: A radiomics approach combined with patients' demographics and clinical parameters can accurately predict survival in GBM patients treated with CRT.
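Leave-one-out cross-validation, as used to obtain the accuracy figures above, holds out each case in turn, trains on the rest, and counts how many held-out cases are predicted correctly. The sketch below uses a one-dimensional nearest-neighbour classifier purely as a stand-in; the study's models (LR, SVM, DT, NN), features, and data are not reproduced, and all names and values here are hypothetical.

```python
def nearest_neighbour(train, x):
    """Predict the label of the training point closest to x (1-NN)."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def leave_one_out_accuracy(data):
    """Hold out each case once; return (number correct, accuracy)."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every case except the held-out one
        if nearest_neighbour(train, x) == label:
            correct += 1
    return correct, correct / len(data)


# Hypothetical (feature value, outcome label) pairs.
cases = [(1.0, 0), (1.3, 0), (2.4, 0), (2.6, 1), (3.5, 1), (3.8, 1)]
n_correct, accuracy = leave_one_out_accuracy(cases)
print(n_correct, "of", len(cases), "correct;", round(accuracy, 2))
```

With only 28 cases, as in the abstract, each misclassified case shifts the reported accuracy by about 3.6 percentage points, which is worth keeping in mind when comparing models.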
Differential Predictability of Four Dimensions of Affect Intensity
Rubin, David C.; Hoyle, Rick H.; Leary, Mark R.
2013-01-01
Individual differences in affect intensity are typically assessed with the Affect Intensity Measure (AIM). Previous factor analyses suggest that the AIM is comprised of four weakly correlated factors: Positive Affectivity, Negative Reactivity, Negative Intensity and Positive Intensity or Serenity. However, little data exist to show whether its four factors relate to other measures differently enough to preclude use of the total scale score. The present study replicated the four-factor solution and found that subscales derived from the four factors correlated differently with criterion variables that assess personality domains, affective dispositions, and cognitive patterns that are associated with emotional reactions. The results show that use of the total AIM score can obscure relationships between specific features of affect intensity and other variables and suggest that researchers should examine the individual AIM subscales. PMID:21707262
Ma, Xin; Guo, Jing; Sun, Xiao
2016-01-01
DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
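Accuracy and the Matthews correlation coefficient (MCC) reported above are both functions of the binary confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical and are not the DNABP evaluation data.

```python
from math import sqrt


def accuracy(tp, fp, fn, tn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is taken as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Hypothetical counts: 100 binding and 100 non-binding proteins.
counts = dict(tp=84, fp=10, fn=16, tn=90)
print(round(accuracy(**counts), 3), round(matthews_corrcoef(**counts), 3))
```

The MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of unequal size, where accuracy alone can be misleading.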
Latent Variable Modeling of Brain Gray Matter Volume and Psychopathy in Incarcerated Offenders
Baskin-Sommers, Arielle R.; Neumann, Craig S.; Cope, Lora M.; Kiehl, Kent A.
2016-01-01
Advanced statistical modeling has become a prominent feature in psychological science and can be a useful approach for representing the neural architecture linked to psychopathology. Psychopathy, a disorder characterized by dysfunction in interpersonal-affective and impulsive-antisocial domains, is associated with widespread neural abnormalities. Several imaging studies suggest that underlying structural deficits in paralimbic regions are associated with psychopathy. While these studies are useful, they make assumptions about the organization of the brain and its relevance to individuals displaying psychopathic features. Capitalizing on statistical modeling, the present study (N=254) used latent variable methods to examine the structure of gray matter volume in male offenders, and assessed the latent relations between psychopathy and gray matter factors reflecting paralimbic and non-paralimbic regions. Results revealed good fit for a four-factor gray matter paralimbic model and these first-order factors were accounted for by a super-ordinate paralimbic ‘system’ factor. Moreover, a super-ordinate psychopathy factor significantly predicted the paralimbic, but not the non-paralimbic factor. The latent variable paralimbic model, specifically linked with psychopathy, goes beyond understanding of single brain regions within the system and provides evidence for psychopathy-related gray matter volume reductions in the paralimbic system as a whole. PMID:27269123
Summary of the key features of seven biomathematical models of human fatigue and performance.
Mallis, Melissa M; Mejdal, Sig; Nguyen, Tammy T; Dinges, David F
2004-03-01
Biomathematical models that quantify the effects of circadian and sleep/wake processes on the regulation of alertness and performance have been developed in an effort to predict the magnitude and timing of fatigue-related responses in a variety of contexts (e.g., transmeridian travel, sustained operations, shift work). This paper summarizes key features of seven biomathematical models reviewed as part of the Fatigue and Performance Modeling Workshop held in Seattle, WA, on June 13-14, 2002. The Workshop was jointly sponsored by the National Aeronautics and Space Administration, U.S. Department of Defense, U.S. Army Medical Research and Materiel Command, Office of Naval Research, Air Force Office of Scientific Research, and U.S. Department of Transportation. An invitation was sent to developers of seven biomathematical models that were commonly cited in scientific literature and/or supported by government funding. On acceptance of the invitation to attend the Workshop, developers were asked to complete a survey of the goals, capabilities, inputs, and outputs of their biomathematical models of alertness and performance. Data from the completed surveys were summarized and juxtaposed to provide a framework for comparing features of the seven models. Survey responses revealed that models varied greatly relative to their reported goals and capabilities. While all modelers reported that circadian factors were key components of their capabilities, they differed markedly with regard to the roles of sleep and work times as input factors for prediction: four of the seven models had work time as their sole input variable(s), while the other three models relied on various aspects of sleep timing for model input. Models also differed relative to outputs: five sought to predict results from laboratory experiments, field, and operational data, while two models were developed without regard to predicting laboratory experimental results. 
All modelers provided published papers describing their models, with three of the models being proprietary. Although all models appear to have been fundamentally influenced by the two-process model of sleep regulation by Borbély, there is considerable diversity among them in the number and type of input and output variables, and their stated goals and capabilities.
Summary of the key features of seven biomathematical models of human fatigue and performance
NASA Technical Reports Server (NTRS)
Mallis, Melissa M.; Mejdal, Sig; Nguyen, Tammy T.; Dinges, David F.
2004-01-01
BACKGROUND: Biomathematical models that quantify the effects of circadian and sleep/wake processes on the regulation of alertness and performance have been developed in an effort to predict the magnitude and timing of fatigue-related responses in a variety of contexts (e.g., transmeridian travel, sustained operations, shift work). This paper summarizes key features of seven biomathematical models reviewed as part of the Fatigue and Performance Modeling Workshop held in Seattle, WA, on June 13-14, 2002. The Workshop was jointly sponsored by the National Aeronautics and Space Administration, U.S. Department of Defense, U.S. Army Medical Research and Materiel Command, Office of Naval Research, Air Force Office of Scientific Research, and U.S. Department of Transportation. METHODS: An invitation was sent to developers of seven biomathematical models that were commonly cited in scientific literature and/or supported by government funding. On acceptance of the invitation to attend the Workshop, developers were asked to complete a survey of the goals, capabilities, inputs, and outputs of their biomathematical models of alertness and performance. Data from the completed surveys were summarized and juxtaposed to provide a framework for comparing features of the seven models. RESULTS: Survey responses revealed that models varied greatly relative to their reported goals and capabilities. While all modelers reported that circadian factors were key components of their capabilities, they differed markedly with regard to the roles of sleep and work times as input factors for prediction: four of the seven models had work time as their sole input variable(s), while the other three models relied on various aspects of sleep timing for model input. Models also differed relative to outputs: five sought to predict results from laboratory experiments, field, and operational data, while two models were developed without regard to predicting laboratory experimental results. 
All modelers provided published papers describing their models, with three of the models being proprietary. CONCLUSIONS: Although all models appear to have been fundamentally influenced by the two-process model of sleep regulation by Borbély, there is considerable diversity among them in the number and type of input and output variables, and their stated goals and capabilities.
Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping
2017-12-12
In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.
NASA Astrophysics Data System (ADS)
Mu, Wei; Qi, Jin; Lu, Hong; Schabath, Matthew; Balagurunathan, Yoganand; Tunali, Ilke; Gillies, Robert James
2018-02-01
Purpose: To investigate whether complementary information provided by the fusion of PET/CT images can predict immunotherapy response in non-small cell lung cancer (NSCLC) patients. Materials and methods: We collected data from 64 patients diagnosed with primary NSCLC treated with anti PD-1 checkpoint blockade. Using PET/CT images, fused images were created following multiple methodologies, resulting in up to 7 different images for the tumor region. Quantitative image features were extracted from the primary image (PET/CT) and the fused images, which included 195 features from the primary images and 1235 features from the fusion images. Three clinical characteristics were also analyzed. We then used support vector machine (SVM) classification models to identify discriminant features that predict immunotherapy response at baseline. Results: On the validation dataset, an SVM built with 87 fusion features and 13 primary PET/CT features had an accuracy and area under the ROC curve (AUROC) of 87.5% and 0.82, respectively, compared with 78.12% and 0.68 for a model built with 113 original PET/CT features. Conclusion: The fusion features show a better ability to predict immunotherapy response than individual image features.
Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS
Li, Bi-Qing; Feng, Kai-Yan; Chen, Lei; Huang, Tao; Cai, Yu-Dong
2012-01-01
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction. PMID:22937126
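The incremental feature selection step can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the features have already been ranked (for example by mRMR), and `evaluate` is a hypothetical callback standing in for the cross-validated performance of a Random Forest trained on a given feature subset. The feature names in the toy run are placeholders.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time and keep the best prefix.

    ranked_features: feature names ordered from most to least relevant.
    evaluate: callable taking a list of features and returning a score
              (higher is better); hypothetical stand-in for model evaluation.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


def toy_evaluate(subset):
    """Toy score: reward two 'useful' features, penalize subset size."""
    useful = {"depth_index", "surface_curvature"}
    return len(useful & set(subset)) - 0.1 * len(subset)


ranked = ["depth_index", "surface_curvature", "noise_a", "noise_b"]
selected, score = incremental_feature_selection(ranked, toy_evaluate)
```

Because each candidate subset is a prefix of the ranking, only as many models are evaluated as there are features, rather than one per possible subset.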
NASA Astrophysics Data System (ADS)
Watari, Chinatsu; Matsuhiro, Mikio; Näppi, Janne J.; Nasirudin, Radin A.; Hironaka, Toru; Kawata, Yoshiki; Niki, Noboru; Yoshida, Hiroyuki
2018-03-01
We investigated the effect of radiomic texture-curvature (RTC) features of lung CT images in the prediction of the overall survival of patients with rheumatoid arthritis-associated interstitial lung disease (RA-ILD). We retrospectively collected 70 RA-ILD patients who underwent thin-section lung CT and serial pulmonary function tests. After the extraction of the lung region, we computed hyper-curvature features that included the principal curvatures, curvedness, bright/dark sheets, cylinders, blobs, and curvature scales for the bronchi and the aerated lungs. We also computed gray-level co-occurrence matrix (GLCM) texture features on the segmented lungs. An elastic-net penalty method was used to select and combine these features with a Cox proportional hazards model for predicting the survival of the patient. Evaluation was performed using the concordance index (C-index) as a measure of prediction performance. The C-index values of the texture features, hyper-curvature features, and the combination thereof (RTC features) in predicting patient survival were estimated by bootstrapping with 2,000 replications, and they were compared with an established clinical prognostic biomarker known as the gender, age, and physiology (GAP) index by means of a two-sided t-test. Bootstrap evaluation yielded the following C-index values for the clinical and radiomic features: (a) GAP index: 78.3%; (b) GLCM texture features: 79.6%; (c) hyper-curvature features: 80.8%; and (d) RTC features: 86.8%. The RTC features significantly outperformed any of the other predictors (P < 0.001). The Kaplan-Meier survival curves of patients stratified into low- and high-risk groups based on the RTC features showed a statistically significant difference (P < 0.0001). Thus, the RTC features can provide an effective imaging biomarker for predicting the overall survival of patients with RA-ILD.
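The concordance index used for evaluation can be illustrated with a short sketch. This assumes the common Harrell definition (the fraction of comparable patient pairs whose predicted risks are ordered consistently with their survival times); the abstract does not state which variant was used, and the data below are invented for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell-style C-index.

    times:  observed follow-up times.
    events: 1 if the event (death) was observed, 0 if censored.
    risks:  predicted risk scores; higher means shorter expected survival.
    A pair (i, j) is comparable when i had an observed event before time j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the percentages reported above can be read on that scale.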
Huang, Zhengxing; Dong, Wei; Duan, Huilong; Liu, Jiquan
2018-05-01
Acute coronary syndrome (ACS), a common and severe cardiovascular disease, is a leading cause of death and the principal cause of serious long-term disability globally. Clinical risk prediction of ACS is important for early intervention and treatment. Existing ACS risk scoring models are based mainly on a small set of hand-picked risk factors and often dichotomize predictive variables to simplify the score calculation. This study develops a regularized stacked denoising autoencoder (SDAE) model to stratify clinical risks of ACS patients from a large volume of electronic health records (EHR). To capture characteristics of patients at similar risk levels, and preserve the discriminating information across different risk levels, two constraints are added to the SDAE so that the reconstructed feature representations contain more risk information about patients, which contributes to a better clinical risk prediction result. We validate our approach on a real clinical dataset consisting of 3464 ACS patient samples. The performance of our approach for predicting ACS risk remains robust and reaches 0.868 and 0.73 in terms of AUC and accuracy, respectively. The obtained results show that the proposed approach achieves a competitive performance compared to state-of-the-art models in dealing with the clinical risk prediction problem. In addition, our approach can extract informative risk factors of ACS via a reconstructive learning strategy. Some of these extracted risk factors are not only consistent with existing medical domain knowledge, but also contain suggestive hypotheses that could be validated by further investigations in the medical domain.
A feature-based approach to modeling protein–protein interaction hot spots
Cho, Kyu-il; Kim, Dongsup; Lee, Doheon
2009-01-01
Identifying features that effectively represent the energetic contribution of an individual interface residue to the interactions between proteins remains problematic. Here, we present several new features and show that they are more effective than conventional features. By combining the proposed features with conventional features, we develop a predictive model for interaction hot spots. Initially, 54 multifaceted features, composed of different levels of information including structure, sequence and molecular interaction information, are quantified. Then, to identify the best subset of features for predicting hot spots, feature selection is performed using a decision tree. Based on the selected features, a predictive model for hot spots is created using support vector machine (SVM) and tested on an independent test set. Our model shows better overall predictive accuracy than previous methods such as the alanine scanning methods Robetta and FOLDEF, and the knowledge-based method KFC. Subsequent analysis yields several findings about hot spots. As expected, hot spots have a larger relative surface area burial and are more hydrophobic than other residues. Unexpectedly, however, residue conservation displays a rather complicated tendency depending on the types of protein complexes, indicating that this feature is not good for identifying hot spots. Of the selected features, the weighted atomic packing density, relative surface area burial and weighted hydrophobicity are the top 3, with the weighted atomic packing density proving to be the most effective feature for predicting hot spots. Notably, we find that hot spots are closely related to π–related interactions, especially π · · · π interactions. PMID:19273533
Jin, Mingwu; Deng, Weishu
2018-05-15
There is a spectrum of progression from healthy control (HC) to mild cognitive impairment (MCI) without conversion to Alzheimer's disease (AD), to MCI with conversion to AD (cMCI), and to AD. This study aims to predict the different disease stages using brain structural information provided by magnetic resonance imaging (MRI) data. Neighborhood component analysis (NCA) is applied to select the most powerful features for prediction. An ensemble decision tree classifier is built to predict which group the subject belongs to. The best features and model parameters are determined by cross validation of the training data. Our results show that 16 out of a total of 429 features were selected by NCA using 240 training subjects, including MMSE score and structural measures in memory-related regions. The boosting tree model with NCA features can achieve a prediction accuracy of 56.25% on 160 test subjects. For comparison, principal component analysis (PCA) and sequential feature selection (SFS) are also used for feature selection, and support vector machine (SVM) is also used for classification. The boosting tree model with NCA features outperforms all other combinations of feature selection and classification methods. The results suggest that NCA is a better feature selection strategy than PCA and SFS for the data used in this study. The ensemble tree classifier with boosting is more powerful than SVM in predicting the subject group. However, more advanced feature selection and classification methods or additional measures besides structural MRI may be needed to improve the prediction performance. Copyright © 2018 Elsevier B.V. All rights reserved.
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sabbaghi, Mostafa, E-mail: mostafas@buffalo.edu; Esmaeilian, Behzad, E-mail: b.esmaeilian@neu.edu; Raihanian Mashhadi, Ardeshir, E-mail: ardeshir@buffalo.edu
Highlights: • We analyzed a data set of HDDs returned back to an e-waste collection site. • We studied factors that affect the storage behavior. • Consumer type, brand and size are among factors which affect the storage behavior. • Commercial consumers have stored computers more than household consumers. • Machine learning models were used to predict the storage behavior. - Abstract: Consumers often have a tendency to store their used, old or un-functional electronics for a period of time before they discard them and return them back to the waste stream. This behavior increases the obsolescence rate of used still-functional products, leading to lower profitability from End-of-Use (EOU) treatments such as reuse, upgrade, and refurbishment. These types of behaviors are influenced by several product and consumer-related factors such as consumers’ traits and lifestyles, technology evolution, product design features, product market value, and pro-environmental stimuli. Better understanding of different groups of consumers, their utilization and storage behavior and the connection of these behaviors with product design features helps Original Equipment Manufacturers (OEMs) and the recycling and recovery industry to better overcome the challenges resulting from the undesirable storage of used products. This paper aims at providing insightful statistical analysis of the dynamic nature of Electronic Waste (e-waste) by studying the effects of design characteristics, brand and consumer type on the electronics usage time and end of use time-in-storage. A database consisting of 10,063 Hard Disk Drives (HDD) of used personal computers returned back to a remanufacturing facility located in Chicago, IL, USA during 2011–2013 has been selected as the base for this study. The results show that commercial consumers have stored computers more than household consumers regardless of brand and capacity factors. 
Moreover, a heterogeneous storage behavior is observed for different brands of HDDs regardless of capacity and consumer type factors. Finally, the storage behavior trends are projected for short-time forecasting and the storage times are precisely predicted by applying machine learning methods.
Current from a nano-gap hyperbolic diode using shape-factors: Theory
NASA Astrophysics Data System (ADS)
Jensen, Kevin L.; Shiffler, Donald A.; Peckerar, Martin; Harris, John R.; Petillo, John J.
2017-08-01
Quantum tunneling by field emission from nanoscale features or sharp field emission structures for which the anode-cathode gap is nanometers in scale ("nano diodes") experiences strong deviations from the planar image charge lowered tunneling barrier used in the Murphy and Good formulation of the Fowler-Nordheim equation. These deviations alter the prediction of total current from a curved surface. Modifications to the emission barrier are modeled using a hyperbolic (prolate spheroidal) geometry to determine the trajectories along which the Gamow factor in a WKB-like treatment is evaluated; a quadratic equivalent potential is determined, and a method of shape factors is used to evaluate the corrected total current from a protrusion or wedge geometry.
Turner, Jonathan W; Moazzez, Rebecca; Banerjee, Avijit
2012-09-01
The art and craft of recording intra-oral anatomy successfully with dental impressions relies on the interaction of three critical factors--the 'golden triangle of impression-taking': an appreciation of the anatomical features to be recorded, the material used to take the impression and the clinical handling/operative technique applied. This paper aims to discuss the three factors and their inter-relationships, detailing clinical tips for successful, reproducible and consistent outcomes. Obtaining accurate dental impressions is the key to success in a wide range of clinical restorative procedures. This paper offers clinical advice to practitioners to plan and then take predictable, good quality impressions for their restorative cases.
Combining Feature Extraction Methods to Assist the Diagnosis of Alzheimer's Disease.
Segovia, F; Górriz, J M; Ramírez, J; Phillips, C
2016-01-01
Neuroimaging data such as (18)F-FDG PET are widely used to assist the diagnosis of Alzheimer's disease (AD). By looking for regions with hypoperfusion/hypometabolism, clinicians may predict or corroborate the diagnosis of the patients. Modern computer aided diagnosis (CAD) systems based on the statistical analysis of whole neuroimages are more accurate than classical systems based on quantifying the uptake of some predefined regions of interest (ROIs). In addition, these new systems allow new ROIs to be determined and take advantage of the huge amount of information contained in neuroimaging data. A major branch of modern CAD systems for AD is based on multivariate techniques, which analyse a neuroimage as a whole, considering not only the voxel intensities but also the relations among them. In order to deal with the vast dimensionality of the data, a number of feature extraction methods have been successfully applied. In this work, we propose a CAD system based on the combination of several feature extraction techniques. First, some commonly used feature extraction methods based on the analysis of the variance (such as principal component analysis), on the factorization of the data (such as non-negative matrix factorization) and on classical magnitudes (such as Haralick features) were simultaneously applied to the original data. These feature sets were then combined by means of two different combination approaches: i) using a single classifier and a multiple kernel learning approach and ii) using an ensemble of classifiers and selecting the final decision by majority voting. The proposed approach was evaluated using a labelled neuroimaging database along with a cross validation scheme. In conclusion, the proposed CAD system performed better than approaches using only one feature extraction technique. We also provide a fair comparison (using the same database) of the selected feature extraction methods.
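The second combination approach, an ensemble whose final decision is taken by majority voting, can be sketched in a few lines. This is an illustrative sketch only: it assumes one classifier per feature set, and the variable names and predicted labels below are invented rather than taken from the study.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine per-classifier label lists into one label per sample.

    predictions_per_classifier: list of equal-length label lists,
    one list per classifier. With an odd number of classifiers and
    two classes, ties cannot occur.
    """
    n_samples = len(predictions_per_classifier[0])
    final = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_classifier)
        final.append(votes.most_common(1)[0][0])
    return final


# Invented predictions from three hypothetical classifiers, each trained
# on a different feature set (PCA, NMF, Haralick) for four subjects.
pca_preds = ["AD", "NC", "AD", "NC"]
nmf_preds = ["AD", "AD", "NC", "NC"]
haralick_preds = ["NC", "AD", "AD", "NC"]
decisions = majority_vote([pca_preds, nmf_preds, haralick_preds])
```

Voting lets each feature set correct the errors of the others as long as the classifiers do not all fail on the same subjects.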
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Z; MD Anderson Cancer Center, Houston, TX; Ho, A
Purpose: To develop and validate a prediction model using radiomics features extracted from MR images to distinguish radiation necrosis from tumor progression for brain metastases treated with Gamma knife radiosurgery. Methods: The images used to develop the model were T1 post-contrast MR scans from 71 patients who had had pathologic confirmation of necrosis or progression; 1 lesion was identified per patient (17 necrosis and 54 progression). Radiomics features were extracted from 2 images at 2 time points per patient, both obtained prior to resection. Each lesion was manually contoured on each image, and 282 radiomics features were calculated for each lesion. The correlation for each radiomics feature between two time points was calculated within each group to identify a subset of features with distinct values between two groups. The delta of this subset of radiomics features, characterizing changes from the earlier time to the later one, was included as a covariate to build a prediction model using support vector machines with a cubic polynomial kernel function. The model was evaluated with a 10-fold cross-validation. Results: Forty radiomics features were selected based on consistent correlation values of approximately 0 for the necrosis group and >0.2 for the progression group. In performing the 10-fold cross-validation, we narrowed this number down to 11 delta radiomics features for the model. This 11-delta-feature model showed an overall prediction accuracy of 83.1%, with a true positive rate of 58.8% in predicting necrosis and 90.7% for predicting tumor progression. The area under the curve for the prediction model was 0.79. Conclusion: These delta radiomics features extracted from MR scans showed potential for distinguishing radiation necrosis from tumor progression. 
This tool may be a useful, noninvasive means of determining the status of an enlarging lesion after radiosurgery, aiding decision-making regarding surgical resection versus conservative medical management.
Sasaki, Motoko; Sato, Yasunori
2017-04-01
Biliary tumors showing intraductal papillary growth (Pap-BTs) include intraductal papillary neoplasm of the bile duct (IPNB) and papillary cholangiocarcinoma (CC). A differential diagnosis between IPNB and papillary CC currently remains challenging. The aim of the present study is to identify histological features and immunohistochemical markers of malignant potential such as tumor invasion in Pap-BTs. Subjects comprised 37 patients with Pap-BT (intrahepatic and perihilar [proximal], 27: 17 noninvasive and 10 invasive; distal, 10: all invasive). We examined histological features and the expression of p53, enhancer of zeste homolog 2, insulin-like growth factor II mRNA-binding protein 3 (IMP3), and DNA methyltransferase-1 in the intraductal area in Pap-BTs. Noninvasive Pap-BT was characterized by the presence of a low-grade dysplastic area, edematous stroma, and the absence of necrosis. The expression of p53, enhancer of zeste homolog 2, IMP3, and DNA methyltransferase-1 was significantly weaker in noninvasive Pap-BTs than in invasive Pap-BTs (P<.01). Diffuse cytoplasmic IMP3 expression was absent in noninvasive Pap-BTs. IMP3 showed the greatest specificity to predict a presence of invasion. A heatmap demonstrated that proximal noninvasive Pap-BTs and distal Pap-BTs may be completely different. In bile duct biopsies, the expression of IMP3 was the most precise predictor of invasion in Pap-BTs. In conclusion, Pap-BTs may be separated into 3 subgroups: (1) proximal noninvasive Pap-BT, corresponding to IPNB; (2) distal invasive Pap-BT, corresponding to papillary CC; and (3) the remaining Pap-BT including IPNB with associated adenocarcinomas, based on histological and immunohistochemical features. IMP3 may be a useful marker for predicting invasion in Pap-BT. Copyright © 2017 Elsevier Inc. All rights reserved.
Tong, Tong; Gao, Qinquan; Guerrero, Ricardo; Ledig, Christian; Chen, Liang; Rueckert, Daniel; Initiative, Alzheimer's Disease Neuroimaging
2017-01-01
Identifying mild cognitive impairment (MCI) subjects who will progress to Alzheimer's disease (AD) is not only crucial in clinical practice, but also has a significant potential to enrich clinical trials. The purpose of this study is to develop an effective biomarker for an accurate prediction of MCI-to-AD conversion from magnetic resonance images. We propose a novel grading biomarker for the prediction of MCI-to-AD conversion. First, we comprehensively study the effects of several important factors on the performance in the prediction task including registration accuracy, age correction, feature selection, and the selection of training data. Based on the studies of these factors, a grading biomarker is then calculated for each MCI subject using sparse representation techniques. Finally, the grading biomarker is combined with age and cognitive measures to provide a more accurate prediction of MCI-to-AD conversion. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the proposed global grading biomarker achieved an area under the receiver operating characteristic curve (AUC) in the range of 79-81% for the prediction of MCI-to-AD conversion within three years in tenfold cross validations. The classification AUC further increases to 84-92% when age and cognitive measures are combined with the proposed grading biomarker. The obtained accuracy of the proposed biomarker benefits from the contributions of different factors: a tradeoff registration level to align images to the template space, the removal of the normal aging effect, selection of discriminative voxels, the calculation of the grading biomarker using AD and normal control groups, and the integration of sparse representation technique and the combination of cognitive measures. The evaluation on the ADNI dataset shows the efficacy of the proposed biomarker and demonstrates a significant contribution in accurate prediction of MCI-to-AD conversion.
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Tien Bui, Dieu; Pourghasemi, Hamid Reza; Indra, Prakash; Dholakia, M. B.
2017-04-01
The objective of this study is to compare the prediction performance of three techniques, Functional Trees (FT), Multilayer Perceptron Neural Networks (MLP Neural Nets), and Naïve Bayes (NB), for landslide susceptibility assessment in the Uttarakhand Area (India). Firstly, a landslide inventory map with 430 landslide locations in the study area was constructed from various sources. Landslide locations were then randomly split into two parts: (i) 70% of the landslide locations were used for training the models and (ii) 30% of the landslide locations were used for the validation process. Secondly, a total of eleven landslide conditioning factors including slope angle, slope aspect, elevation, curvature, lithology, soil, land cover, distance to roads, distance to lineaments, distance to rivers, and rainfall were used in the analysis to elucidate the spatial relationship between these factors and landslide occurrences. Feature selection with the Linear Support Vector Machine (LSVM) algorithm was employed to assess the prediction capability of these conditioning factors on landslide models. Subsequently, the NB, MLP Neural Nets, and FT models were constructed using the training dataset. Finally, success rate and predictive rate curves were employed to validate and compare the predictive capability of the three models. Overall, all three models performed very well for landslide susceptibility assessment. Of these models, the MLP Neural Nets and the FT models had almost the same predictive capability, with the MLP Neural Nets (AUC = 0.850) slightly better than the FT model (AUC = 0.849). The NB model (AUC = 0.838) had the lowest predictive capability compared to the other models. Landslide susceptibility maps were finally developed using these three models. These maps would be helpful to planners and engineers for development activities and land-use planning.
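The random 70/30 split and an AUC-style comparison can be illustrated with a short sketch. This is not the authors' procedure: the seed, the function names, and the toy scores are assumptions, and the AUC here is the rank-based area under the ROC curve, which may differ in detail from the area under the success rate and predictive rate curves used in the study.

```python
import random


def split_70_30(items, seed=0):
    """Randomly split items into 70% training and 30% validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 430 landslide locations, represented here simply by their indices.
train, validation = split_70_30(range(430))

# Invented toy labels (1 = landslide) and susceptibility scores.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

With 430 locations the split yields 301 training and 129 validation locations; differences as small as 0.850 versus 0.849 are well within what a different random split could produce.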
How the environment shapes animal signals: a test of the acoustic adaptation hypothesis in frogs.
Goutte, S; Dubois, A; Howard, S D; Márquez, R; Rowley, J J L; Dehling, J M; Grandcolas, P; Xiong, R C; Legendre, F
2018-01-01
Long-distance acoustic signals are widely used in animal communication systems and, in many cases, are essential for reproduction. The acoustic adaptation hypothesis (AAH) implies that acoustic signals should be selected for further transmission and better content integrity under the acoustic constraints of the habitat in which they are produced. In this study, we test predictions derived from the AAH in frogs. Specifically, we focus on the difference between torrent frogs and frogs calling in less noisy habitats. Torrents produce sounds that can mask frog vocalizations and constitute a major acoustic constraint on call evolution. We combine data collected in the field, material from scientific collections and the literature for a total of 79 primarily Asian species, of the families Ranidae, Rhacophoridae, Dicroglossidae and Microhylidae. Using phylogenetic comparative methods and including morphological and environmental potential confounding factors, we investigate putatively adaptive call features in torrent frogs. We use broad habitat categories as well as fine-scale habitat measurements and test their correlation with six call characteristics. We find mixed support for the AAH. Spectral features of torrent frog calls are different from those of frogs calling in other habitats and are related to ambient noise levels, as predicted by the AAH. However, temporal call features do not seem to be shaped by the frogs' calling habitats. Our results underline both the complexity of call evolution and the need to consider multiple factors when investigating this issue. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.
Jalali, Ali; Licht, Daniel J.; Nataraj, C.
2013-01-01
This paper is concerned with the prediction of the occurrence of Periventricular Leukomalacia (PVL) in neonates after heart surgery. The data, collected over a period of 12 hours after the cardiac surgery, contain vital measurements as well as blood gas measurements with different resolutions. The decision tree classification technique has been selected as a tool for prediction of PVL because of its capacity for discovering rules and novel associations in the data. Vital data measured using near-infrared spectroscopy (NIRS) at a sampling rate of 0.25 Hz and blood gas measurements taken up to 12 times at irregular time intervals for 35 patients, collected from Children's Hospital of Philadelphia (CHOP), are used for this study. Vital data contain heart rate (HR), mean arterial pressure (MAP), right atrium pressure (RAP), blood hemoglobin (Hb), hemoglobin oxygen content (HbO2), oxygen saturation (SpO2) and relative cerebral blood flow (rCBF). Features derived from the data include statistical moments (mean, variance, skewness and kurtosis), trend, and min and max of the vital data, and rate of change, time-weighted mean and a custom-defined out-of-range index (ORI) for the blood gas data. A decision tree is developed for the vital data in order to identify the most important vital measurements. In addition, a decision tree is developed for blood gas data to find important factors for the prediction of PVL occurrence. Results show that in blood gas data, maximum rate of change in the concentration of bicarbonate ions in blood (HCO3) and minimum rate of change in the partial pressure of dissolved CO2 in the blood (PaCO2) are the most important factors for prediction of PVL. Among vital features, the kurtosis of HR and Hb are the most important parameters. PMID:23367279
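The statistical-moment features named in this abstract (mean, variance, skewness, kurtosis) can be sketched as below. This is an illustration under stated assumptions: population (divide-by-n) estimators and non-excess kurtosis are chosen here, the abstract does not say which variants were used, and the heart-rate samples are hypothetical.

```python
import math

def moments(samples):
    """Return mean, population variance, skewness and (non-excess) kurtosis."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    std = math.sqrt(var)
    # Standardized third and fourth central moments.
    skew = sum(((x - mean) / std) ** 3 for x in samples) / n
    kurt = sum(((x - mean) / std) ** 4 for x in samples) / n
    return mean, var, skew, kurt

# Hypothetical heart-rate samples (beats per minute).
# At a 0.25 Hz sampling rate, consecutive samples are 4 s apart.
hr = [120, 122, 125, 121, 119]
hr_mean, hr_var, hr_skew, hr_kurt = moments(hr)
```

The function assumes a non-constant input; a constant signal has zero standard deviation and would need separate handling.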
Kolacz, Jacek; Raspa, Melissa; Heilman, Keri J; Porges, Stephen W
2018-06-01
Individuals with fragile X syndrome (FXS), especially those co-diagnosed with autism spectrum disorder (ASD), face many sensory processing challenges. However, sensory processing measures informed by neurophysiology are lacking. This paper describes the development and psychometric properties of a parent/caregiver report, the Brain-Body Center Sensory Scales (BBCSS), based on Polyvagal Theory. Parents/guardians reported on 333 individuals with FXS, 41% with ASD features. Factor structure using a split-sample exploratory-confirmatory design conformed to neurophysiological predictions. Internal consistency, test-retest, and inter-rater reliability were good to excellent. BBCSS subscales converged with the Sensory Profile and Sensory Experiences Questionnaire. However, data also suggest that BBCSS subscales reflect unique features related to sensory processing. Individuals with FXS and ASD features displayed more sensory challenges on most subscales.
Alminhana, Letícia O; Farias, Miguel; Claridge, Gordon; Cloninger, Claude R; Moreira-Almeida, Alexander
2017-01-01
It is unclear why some individuals reporting psychotic experiences have balanced lives while others go on to develop mental health problems. The objective of this study was to test if the personality traits of harm avoidance, self-directedness, and self-transcendence can be used as criteria to differentiate healthy from unhealthy schizotypal individuals. We interviewed 115 participants who reported a high frequency of psychotic experiences. The instruments used were the Temperament and Character Inventory (140), Structured Clinical Interview for DSM-IV, and the Oxford-Liverpool Inventory of Feelings and Experiences. Harm avoidance predicted cognitive disorganization (β = 0.319; t = 2.94), while novelty seeking predicted bipolar disorder (β = 0.136, Exp [β] = 1.146) and impulsive non-conformity (β = 0.322; t = 3.55). Self-directedness predicted an overall decrease in schizotypy, most of all in cognitive disorganization (β = -0.356; t = -2.95) and in impulsive non-conformity (β = -0.313; t = -2.83). Finally, self-transcendence predicted unusual experiences (β = 0.256; t = 2.32). Personality features are important criteria to distinguish between pathology and mental health in individuals presenting high levels of anomalous experiences (AEs). While self-directedness is a protective factor, both harm avoidance and novelty seeking were predictors of negative mental health outcomes. We suggest that the impact of AEs on mental health is moderated by personality factors.
How important is vehicle safety for older consumers in the vehicle purchase process?
Koppel, Sjaan; Clark, Belinda; Hoareau, Effie; Charlton, Judith L; Newstead, Stuart V
2013-01-01
This study aimed to investigate the importance of vehicle safety to older consumers in the vehicle purchase process. Older (n = 102), middle-aged (n = 791), and younger (n = 109) participants throughout the eastern Australian states of Victoria, New South Wales, and Queensland who had recently purchased a new or used vehicle completed an online questionnaire about their vehicle purchase process. When asked to list the 3 most important considerations in the vehicle purchase process (in an open-ended format), older consumers were most likely to list price as their most important consideration (43%). Similarly, when presented with a list of vehicle factors (such as price, design, Australasian New Car Assessment Program [ANCAP] rating), older consumers were most likely to identify price as the most important vehicle factor (36%). When presented with a list of vehicle features (such as automatic transmission, braking, air bags), older consumers in the current study were most likely to identify an antilock braking system (41%) as the most important vehicle feature, and half of older consumers identified a safety-related vehicle feature as the highest priority vehicle feature (50%). When asked to list up to 3 factors that make a vehicle safe, older consumers in the current study were most likely to list braking systems (35%), air bags (22%), and the driver's behavior or skill (11%). When asked about the influence of safety in the new vehicle purchase process, one third of older consumers reported that all new vehicles are safe (33%) and almost half of the older consumers rated their vehicle as safer than average (49%). A logistic regression model was developed to predict the profile of older consumers more likely to assign a higher priority to safety features in the vehicle purchasing process.
The model predicted that the importance of safety-related features was influenced by several variables, including older consumers' beliefs that they could protect themselves and their family from a crash, their traffic infringement history, and whether they had children. These findings are consistent with previous research that suggests that, though older consumers highlight the importance of safety features (i.e., seat belts, air bags, braking), they often downplay the role of safety in their vehicle purchasing process and are more likely to equate vehicle safety with the presence of specific vehicle safety features or technologies rather than the vehicle's crash safety/test results or crashworthiness. The findings from this study provide a foundation to support further research in this area that can be used by policy makers, manufacturers, and other stakeholders to better target the promotion and publicity of vehicle safety features to particular consumer groups (such as older consumers). Better targeted campaigns may help to emphasize the value of safety features and their role in reducing the risk of injury/death. If older consumers are better informed of the benefits of safety features when purchasing a vehicle, a further reduction in injuries and deaths related to motor vehicle crashes may be realized.
Customer Churn Prediction for Broadband Internet Services
NASA Astrophysics Data System (ADS)
Huang, B. Q.; Kechadi, M.-T.; Buckley, B.
Although churn prediction has been an area of research in the voice branch of telecommunications services, more focused studies on the huge growth area of Broadband Internet services are limited. Therefore, this paper presents a new set of features for broadband Internet customer churn prediction, based on Henley segments, the broadband usage, dial types, the spend of dial-up, line-information, bill and payment information, account information. Then the four prediction techniques (Logistic Regressions, Decision Trees, Multilayer Perceptron Neural Networks and Support Vector Machines) are applied in customer churn, based on the new features. Finally, the evaluation of new features and a comparative analysis of the predictors are made for broadband customer churn prediction. The experimental results show that the new features with these four modelling techniques are efficient for customer churn prediction in the broadband service field.
A new method for enhancer prediction based on deep belief network.
Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong
2017-10-16
Studies have shown that enhancers are significant regulatory elements that play crucial roles in gene expression regulation. Since enhancers are unrelated to the orientation and distance to their target genes, it is a challenging task for researchers to accurately predict distal enhancers. In the past years, with the development of high-throughput ChIP-seq technologies, several computational techniques have emerged to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, which is called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.
Splicing predictions reliably classify different types of alternative splicing
Busch, Anke; Hertel, Klemens J.
2015-01-01
Alternative splicing is a key player in the creation of complex mammalian transcriptomes and its misregulation is associated with many human diseases. Multiple mRNA isoforms are generated from most human genes, a process mediated by the interplay of various RNA signature elements and trans-acting factors that guide spliceosomal assembly and intron removal. Here, we introduce a splicing predictor that evaluates hundreds of RNA features simultaneously to successfully differentiate between exons that are constitutively spliced, exons that undergo alternative 5′ or 3′ splice-site selection, and alternative cassette-type exons. Surprisingly, the splicing predictor did not feature strong discriminatory contributions from binding sites for known splicing regulators. Rather, the ability of an exon to be involved in one or multiple types of alternative splicing is dictated by its immediate sequence context, mainly driven by the identity of the exon's splice sites, the conservation around them, and its exon/intron architecture. Thus, the splicing behavior of human exons can be reliably predicted based on basic RNA sequence elements. PMID:25805853
Wang, Jie-sheng; Han, Shuang; Shen, Na-na
2014-01-01
For predicting the key technology indicators (concentrate grade and tailings recovery rate) of the flotation process, an echo state network (ESN) based fusion soft-sensor model optimized by the improved glowworm swarm optimization (GSO) algorithm is proposed. Firstly, color features (saturation and brightness) and texture features (angular second moment, sum entropy, inertia moment, etc.) based on the grey-level co-occurrence matrix (GLCM) are adopted to describe the visual characteristics of the flotation froth image. Then the kernel principal component analysis (KPCA) method is used to reduce the dimensionality of the high-dimensional input vector composed of the flotation froth image characteristics and process data, and to extract the nonlinear principal components, in order to reduce the ESN dimension and network complexity. The ESN soft-sensor model of the flotation process is optimized by the GSO algorithm with a congestion factor. Simulation results show that the model has better generalization and prediction accuracy, meeting the online soft-sensor requirements of real-time control in the flotation process. PMID:24982935
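Two of the GLCM texture features named in this abstract, angular second moment and inertia, can be illustrated with a small sketch. The single horizontal offset, the non-symmetric matrix, and the 3x3 toy image are assumptions made for illustration; the abstract does not specify the offsets or grey-level quantization used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized (non-symmetric) grey-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def angular_second_moment(p):
    """Sum of squared GLCM entries; high for uniform textures."""
    return sum(v * v for row in p for v in row)

def inertia(p):
    """Inertia (contrast): sum of (i - j)^2 * p[i][j]."""
    return sum((i - j) ** 2 * v
               for i, row in enumerate(p)
               for j, v in enumerate(row))

# Hypothetical 3x3 image quantized to three grey levels (0, 1, 2).
image = [[0, 0, 1],
         [0, 0, 1],
         [2, 2, 2]]
p = glcm(image, levels=3)
asm_value = angular_second_moment(p)
inertia_value = inertia(p)
```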
Szagun, Gisela; Schramm, Satyam A
2016-05-01
The aim of the present study was to analyze the relative influence of age at implantation, parental expansions, and child language internal factors on grammatical progress in children with cochlear implants (CI). Data analyses used two longitudinal corpora of spontaneous speech samples, one with twenty-two and one with twenty-six children, implanted between 0;6 and 3;10. Analyses were performed on the combined and separate samples. Regression analyses indicate that early child MLU is the strongest predictor of child MLU two and two-and-a-half years later, followed by parental expansions and age at implantation. Associations between earliest MLU gains and MLU two years later point to stability of individual differences. Early type and token frequencies of determiners predict MLU two years later more strongly than early frequency of lexical words. We conclude that features of CI children's very early language have considerable predictive value for later language outcomes.
Lin, Lung-Chang; Chen, Sharon Chia-Ju; Chiang, Ching-Tai; Wu, Hui-Chuan; Yang, Rei-Cheng; Ouyang, Chen-Sen
2017-03-01
The life quality of patients with refractory epilepsy is extremely affected by abrupt and unpredictable seizures. A reliable method for predicting seizures is important in the management of refractory epilepsy. A critical factor in seizure prediction involves the classification of the preictal and interictal stages. This study aimed to develop an efficient, automatic, quantitative, and individualized approach for preictal/interictal stage identification. Five epileptic children, who had experienced at least 2 episodes of seizures during a 24-hour video EEG recording, were included. Artifact-free preictal and interictal EEG epochs were acquired, respectively, and characterized with 216 global feature descriptors. The best subset of 5 discriminative descriptors was identified. The best subsets showed differences among the patients. Statistical analysis revealed most of the 5 descriptors in each subset were significantly different between the preictal and interictal stages for each patient. The proposed approach yielded weighted averages of 97.50% correctness, 96.92% sensitivity, 97.78% specificity, and 95.45% precision on classifying test epochs. Although the case number was limited, this study successfully integrated a new EEG analytical method to classify preictal and interictal EEG segments and might be used further in predicting the occurrence of seizures.
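The four performance figures reported in this abstract (correctness, sensitivity, specificity, precision) all derive from a confusion matrix. A minimal sketch follows, treating preictal epochs as the positive class; the counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # "correctness" in the abstract
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # positive predictive value
    }

# Hypothetical epoch counts: positive = preictal, negative = interictal.
metrics = classification_metrics(tp=45, fp=5, tn=95, fn=5)
```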
Promotion and resignation in employee networks
NASA Astrophysics Data System (ADS)
Yuan, Jia; Zhang, Qian-Ming; Gao, Jian; Zhang, Linyan; Wan, Xue-Song; Yu, Xiao-Jun; Zhou, Tao
2016-02-01
Enterprises have put more and more emphasis on data analysis so as to obtain effective management advice. Managers and researchers are trying to identify the major factors that lead to employees' promotion and resignation. Most previous analyses are based on questionnaire surveys, which usually cover a small fraction of samples and contain biases caused by psychological defense. In this paper, we collect a data set consisting of all the employees' work-related interactions (action network, AN for short) and online social connections (social network, SN for short) of a company, which allows us to reveal the correlations between structural features and employees' career development, namely promotion and resignation. Through statistical analysis, we show that the structural features of both AN and SN are correlated with and predictive of employees' promotion and resignation, and the AN has higher correlation and predictability. More specifically, the in-degree in AN is the most relevant indicator for promotion, while the k-shell index in AN and in-degree in SN are both very predictive of resignation. Our results provide a novel and actionable understanding of enterprise management and suggest that enhancing the interactions among employees, whether work-related or social, can help reduce staff turnover risk.
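The two structural features highlighted in this abstract, in-degree and k-shell index, can be computed as sketched below. The edge list is a hypothetical toy network, and treating edges as undirected for the k-shell decomposition is an assumption; the abstract does not state how direction was handled.

```python
from collections import defaultdict

def in_degree(edges):
    """Count incoming edges per node for directed (source, target) pairs."""
    counts = defaultdict(int)
    for source, target in edges:
        counts[target] += 1
        counts[source] += 0  # ensure nodes with no incoming edges appear
    return dict(counts)

def k_shell(edges):
    """k-shell index per node via iterative pruning, ignoring edge direction."""
    remaining = defaultdict(set)
    for a, b in edges:
        if a != b:  # self-loops are ignored
            remaining[a].add(b)
            remaining[b].add(a)
    shell = {}
    k = 0
    while remaining:
        candidates = [n for n, adj in remaining.items() if len(adj) <= k]
        if not candidates:
            k += 1  # nothing left to prune at this level
            continue
        for node in candidates:
            shell[node] = k
            for neighbor in remaining[node]:
                remaining[neighbor].discard(node)
            del remaining[node]
    return shell

# Hypothetical network: a triangle (a, b, c) plus one peripheral node d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
degrees = in_degree(edges)
shells = k_shell(edges)
```

In this toy network the peripheral node falls in the 1-shell while the triangle forms the 2-shell, which is the sense in which the k-shell index measures how deeply a node sits in the network core.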
Liu, Chunming; Dong, Zhengchao; Xu, Liang; Khursheed, Aiman; Dong, Longchun; Liu, Zhenxing; Yang, Jun; Liu, Jun
2015-11-01
The aims of this study were to observe magnetic resonance imaging (MRI) features and the frequency of hemorrhagic transformation (HT) in patients with acute cerebral infarction and to identify the risk factors of HT. We first performed multimodal MRI (anatomical, diffusion weighted, and susceptibility weighted) scans on 87 patients with acute cerebral infarction within 24 hours after symptom onset and documented the image findings. We then performed follow-up examinations 3 days to 2 weeks after the onset or whenever the conditions of the patients worsened within 3 days. We utilized univariate statistics to identify the correlations between HT and image features and used multivariate logistic regression to correct for confounding factors to determine relevant independent image features of HT. HT was observed in 17 of the 87 patients (19.5 %). The infarct size (p = 0.021), cerebral microbleeds (CMBs) (p = 0.004), relative apparent diffusion coefficient (rADC) (p = 0.023), and venous anomalies (p = 0.000) were significantly related to HT in the univariate statistics. Multivariate analysis demonstrated that CMBs (odds ratio (OR) = 0.082; 95 % confidence interval (CI) = 0.011-0.597; p = 0.014), rADC (OR = 0.000; 95 % CI = 0.000-0.692; p = 0.041), and venous anomalies (OR = 0.066; 95 % CI = 0.011-0.403; p = 0.003) were independent risk factors for HT. The frequency of HT was 19.5 % in this study. CMBs, rADC, and venous anomalies are independent risk factors for HT of acute cerebral infarction.
Pracht, M; Mogha, A; Lespagnol, A; Fautrel, A; Mouchet, N; Le Gall, F; Paumier, V; Lefeuvre-Plesse, C; Rioux-Leclerc, N; Mosser, J; Oger, E; Adamski, H; Galibert, M-D; Lesimple, T
2015-08-01
Mutations of BRAF, NRAS and c-KIT oncogenes are preferentially described in certain histological subtypes of melanoma and linked to specific histopathological features. BRAF-, MEK- and KIT-inhibitors led to improvement in overall survival of patients harbouring mutated metastatic melanoma. The objectives were to assess the prevalence and types of BRAF, NRAS, c-KIT and MITF mutations in cutaneous and mucous melanoma and to correlate mutation status with clinicopathological features and outcome. Clinicopathological features and mutation status of 108 samples and of 98 consecutive patients were, respectively, assessed in one retrospective and one prospective study. Clinicopathological features were correlated with mutation status and the predictive value of these mutations was studied. This work identified significant correlations between BRAF mutations and melanoma occurring on non-chronic sun-damaged skin and superficial spreading melanoma (P < 0.05) on one hand, and between NRAS mutations and nodular melanoma (P < 0.05) on the other hand. Younger age (P < 0.05), microscopic (P < 0.05) and macroscopic (P < 0.05) lymphatic involvement at diagnosis of primary melanoma were significantly linked to BRAF mutations. A mutated status was a positive predictive factor of a response to BRAF inhibitors (OR = 3.44). Mutated melanoma showed a significantly (P = 0.038) higher objective response rate to cytotoxic chemotherapy (26.3%) than wild-type tumours (6.7%). Clinical and pathological characteristics of the primary melanoma differed between wild-type and BRAF- or NRAS-mutated tumours. Patients with BRAF-mutated tumours were younger at diagnosis of primary melanoma. Patients carrying mutations showed better responses to specific kinase inhibitors and, interestingly, also to systemic cytotoxic chemotherapy. © 2015 European Academy of Dermatology and Venereology.
Identification of informative features for predicting proinflammatory potentials of engine exhausts.
Wang, Chia-Chi; Lin, Ying-Chi; Lin, Yuan-Chung; Jhang, Syu-Ruei; Tung, Chun-Wei
2017-08-18
The immunotoxicity of engine exhausts is of high concern to human health due to the increasing prevalence of immune-related diseases. However, the evaluation of immunotoxicity of engine exhausts is currently based on expensive and time-consuming experiments. It is desirable to develop efficient methods for immunotoxicity assessment. To accelerate the development of safe alternative fuels, this study proposed a computational method for identifying informative features for predicting proinflammatory potentials of engine exhausts. A principal component regression (PCR) algorithm was applied to develop prediction models. The informative features were identified by a sequential backward feature elimination (SBFE) algorithm. A total of 19 informative chemical and biological features were successfully identified by SBFE algorithm. The informative features were utilized to develop a computational method named FS-CBM for predicting proinflammatory potentials of engine exhausts. FS-CBM model achieved a high performance with correlation coefficient values of 0.997 and 0.943 obtained from training and independent test sets, respectively. The FS-CBM model was developed for predicting proinflammatory potentials of engine exhausts with a large improvement on prediction performance compared with our previous CBM model. The proposed method could be further applied to construct models for bioactivities of mixtures.
Sideris, Costas; Alshurafa, Nabil; Pourhomayoun, Mohammad; Shahmohammadi, Farhad; Samy, Lauren; Sarrafzadeh, Majid
2015-01-01
In this paper, we propose a novel methodology for utilizing disease diagnostic information to predict severity of condition for Congestive Heart Failure (CHF) patients. Our methodology relies on a novel, clustering-based, feature extraction framework using disease diagnostic information. To reduce the dimensionality we identify disease clusters using cooccurence frequencies. We then utilize these clusters as features to predict patient severity of condition. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP) which contains 7 million discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 patients. We compare our cluster-based feature set with another that incorporates the Charlson comorbidity score as a feature and demonstrate an accuracy improvement of up to 14% in the predictability of the severity of condition.
Sun, X; Chen, K J; Berg, E P; Newman, D J; Schwartz, C A; Keller, W L; Maddock Carlin, K R
2014-02-01
The objective was to use digital color image texture features to predict troponin-T degradation in beef. Image texture features, including 88 gray level co-occurrence texture features, 81 two-dimension fast Fourier transformation texture features, and 48 Gabor wavelet filter texture features, were extracted from color images of beef strip steaks (longissimus dorsi, n = 102) aged for 10d obtained using a digital camera and additional lighting. Steaks were designated degraded or not-degraded based on troponin-T degradation determined on d 3 and d 10 postmortem by immunoblotting. Statistical analysis (STEPWISE regression model) and artificial neural network (support vector machine model, SVM) methods were designed to classify protein degradation. The d 3 and d 10 STEPWISE models were 94% and 86% accurate, respectively, while the d 3 and d 10 SVM models were 63% and 71%, respectively, in predicting protein degradation in aged meat. STEPWISE and SVM models based on image texture features show potential to predict troponin-T degradation in meat. © 2013.
Fish swarm intelligent to optimize real time monitoring of chips drying using machine vision
NASA Astrophysics Data System (ADS)
Hendrawan, Y.; Hawa, L. C.; Damayanti, R.
2018-03-01
This study attempted to apply a machine vision-based chips drying monitoring system able to optimise the drying process of cassava chips. The objective of this study is to propose fish swarm intelligent (FSI) optimization algorithms to find the most significant set of image features suitable for predicting the water content of cassava chips during the drying process using an artificial neural network (ANN) model. Feature selection entails choosing the feature subset that maximizes the prediction accuracy of the ANN. Multi-Objective Optimization (MOO) was used in this study, consisting of prediction accuracy maximization and feature-subset size minimization. The results showed that the best feature subset was grey mean, L(Lab) mean, a(Lab) energy, red entropy, hue contrast, and grey homogeneity. The best feature subset was tested successfully in the ANN model to describe the relationship between image features and water content of cassava chips during the drying process, with an R2 between real and predicted data of 0.9.
PrAS: Prediction of amidation sites using multiple feature extraction.
Wang, Tong; Zheng, Wei; Wuyun, Qiqige; Wu, Zhenfeng; Ruan, Jishou; Hu, Gang; Gao, Jianzhao
2017-02-01
Amidation plays an important role in a variety of pathological processes and serious diseases like neural dysfunction and hypertension. However, identification of protein amidation sites through traditional experimental methods is time consuming and expensive. In this paper, we proposed a novel predictor for Prediction of Amidation Sites (PrAS), which is the first software package for academic users. The method incorporated four representative feature types, which are position-based features, physicochemical and biochemical properties features, predicted structure-based features and evolutionary information features. A novel feature selection method, positive contribution feature selection was proposed to optimize features. PrAS achieved AUC of 0.96, accuracy of 92.1%, sensitivity of 81.2%, specificity of 94.9% and MCC of 0.76 on the independent test set. PrAS is freely available at https://sourceforge.net/p/praspkg. Copyright © 2016 Elsevier Ltd. All rights reserved.
Essentialist beliefs about homosexuality: structure and implications for prejudice.
Haslam, Nick; Levy, Sheri R
2006-04-01
The structure of beliefs about the nature of homosexuality, and their association with antigay attitudes, were examined in three studies (Ns = 309, 487, and 216). Contrary to previous research, three dimensions were obtained: the belief that homosexuality is biologically based, immutable, and fixed early in life; the belief that it is cross-culturally and historically universal; and the belief that it constitutes a discrete, entitative type with defining features. Study 1 supported a three-factor structure for essentialist beliefs about male homosexuality. Study 2 replicated this structure with confirmatory factor analysis, extended it to beliefs about lesbianism, showed that all three dimensions predicted antigay attitudes, and demonstrated that essentialist beliefs mediate associations between prejudice and gender, ethnicity, and religiosity. Study 3 replicated the belief structure and mediation effects in a community sample and showed that essentialist beliefs predict antigay prejudice independently of right-wing authoritarianism, social dominance orientation, and political conservatism.
Detecting the Presence of a Personality Disorder Using Interpersonal and Self-Dysfunction.
Beeney, Joseph E; Lazarus, Sophie A; Hallquist, Michael N; Stepp, Stephanie D; Wright, Aidan G C; Scott, Lori N; Giertych, Rachel A; Pilkonis, Paul A
2018-03-05
Calls have increased to place interpersonal and self-disturbance as defining features of personality disorders (PDs). Findings from a methodologically diverse set of studies suggest that a common factor undergirds all PDs. The nature of this core of PDs, however, is not clear. In the current study, interviews were completed for DSM-IV PD diagnosis and interpersonal dysfunction independently with 272 individuals (PD = 191, no-PD = 91). Specifically, we evaluated interpersonal dysfunction across social domains. In addition, we empirically assessed the structure of self-dysfunction in PDs. We found dysfunction in work and romantic domains, and unstable identity uniquely predicted variance in the presence of a PD. Using receiver operating characteristic analysis, we found that the interpersonal dysfunction and self-dysfunction scales each predicted PDs with high accuracy. In combination, the scales resulted in excellent sensitivity (.90) and specificity (.88). The results support interpersonal and self-dysfunction as general factors of PD.
Hierarchy of stability factors in reverse shoulder arthroplasty.
Gutiérrez, Sergio; Keller, Tony S; Levy, Jonathan C; Lee, William E; Luo, Zong-Ping
2008-03-01
Reverse shoulder arthroplasty is being used more frequently to treat irreparable rotator cuff tears in the presence of glenohumeral arthritis and instability. To date, however, design features and functions of reverse shoulder arthroplasty, which may be associated with subluxation and dislocation of these implants, have been poorly understood. We asked: (1) what is the hierarchy of importance of joint compressive force, prosthetic socket depth, and glenosphere size in relation to stability, and (2) is this hierarchy defined by underlying and theoretically predictable joint contact characteristics? We examined the intrinsic stability in terms of the force required to dislocate the humerosocket from the glenosphere of eight commercially available reverse shoulder arthroplasty devices. The hierarchy of factors was led by compressive force followed by socket depth; glenosphere size played a much lesser role in stability of the reverse shoulder arthroplasty device. Similar results were predicted by a mathematical model, suggesting the stability was determined primarily by compressive forces generated by muscles.
Verification of a SEU model for advanced 1-micron CMOS structures using heavy ions
NASA Technical Reports Server (NTRS)
Cable, J. S.; Carter, J. R.; Witteles, A. A.
1986-01-01
Modeling and test results are reported for 1 micron CMOS circuits. Analytical predictions are correlated with experimental data, and sensitivities to process and design variations are discussed. Unique features involved in predicting the SEU performance of these devices are described. The results show that the critical charge for upset exhibits a strong dependence on pulse width for very fast devices, and upset predictions must factor in the pulse shape. Acceptable SEU error rates can be achieved for a 1 micron bulk CMOS process. A thin retrograde well provides complete SEU immunity for N channel hits at normal incidence angle. Source interconnect resistance can be an important parameter in determining upset rates, and Cf-252 testing can be a valuable tool for cost-effective SEU testing.
Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou
2016-01-01
The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming more important for the research of drug design and cancer development. Based on our previous methods (APIS and KFC2), here we propose a novel hot spot prediction method. For each hot spot residue, we first constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional ones and a new one (pseudo hydrophobicity) exploited in this study. We then selected the 3 top-ranking features that contribute the most to the classification by a two-step feature selection process consisting of a minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When tested on an independent test set, our method showed the highest F1-score (0.70) and MCC (0.46) compared with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646
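The two evaluation metrics reported above can be sketched from confusion-matrix counts as follows. This is a minimal illustration using the standard definitions of the F1-score and the Matthews correlation coefficient (MCC). The function name and the counts are hypothetical and do not come from the study.

```python
import math


def f1_and_mcc(tp, fp, tn, fn):
    """Return (F1-score, Matthews correlation coefficient) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Hypothetical counts: hot spots are the positive class.
f1, mcc = f1_and_mcc(tp=30, fp=10, tn=50, fn=10)
print(f"F1={f1:.2f}, MCC={mcc:.2f}")
```

MCC uses all four cells of the confusion matrix, which makes it a useful complement to F1 when hot spots are much rarer than non-hot-spot residues.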
Feature Selection Methods for Zero-Shot Learning of Neural Activity.
Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.
[Review and prospect of analysis on UHMWPE wear debris in artificial hip joints].
Wu, Jingping; Yuan, Chengqing; Yan, Xinping
2010-02-01
This paper briefly reviews recent progress in the analysis of technologies for artificial hip joints and in research on the features of UHMWPE debris obtained under various experimental conditions, on the wear process and wear mechanism, and on the factors that influence the wear mechanism. Furthermore, the significance of a debris atlas is illustrated. Finally, future research directions are considered. It is suggested that emphasis be placed on the relationship between UHMWPE debris features and the wear mechanism, and on the synergistic effects of the biochemical and loading environments, so as to establish predictive wear models of artificial hip joints.
NASA Astrophysics Data System (ADS)
Checefsky, Walter A.; Abidin, Anas Z.; Nagarajan, Mahesh B.; Bauer, Jan S.; Baum, Thomas; Wismüller, Axel
2016-03-01
The current clinical standard for measuring Bone Mineral Density (BMD) is dual X-ray absorptiometry; however, more recently, BMD derived from volumetric quantitative computed tomography has been shown to demonstrate a high association with spinal fracture susceptibility. In this study, we propose a method of fracture risk assessment using structural properties of trabecular bone in spinal vertebrae. Experimental data was acquired via axial multi-detector CT (MDCT) from 12 spinal vertebrae specimens using a whole-body 256-row CT scanner with a dedicated calibration phantom. Common image processing methods were used to annotate the trabecular compartment in the vertebral slices, creating a circular region of interest (ROI) that excluded cortical bone for each slice. The pixels inside the ROI were converted to values indicative of BMD. High dimensional geometrical features were derived using the scaling index method (SIM) at different radii and scaling factors (SF). The mean BMD values within the ROI were then extracted and used in conjunction with a support vector machine to predict the failure load of the specimens. Prediction performance was measured using the root-mean-square error (RMSE), which showed that SIM combined with mean BMD features (RMSE = 0.82 +/- 0.37) outperformed MDCT-measured mean BMD (RMSE = 1.11 +/- 0.33) (p < 10^-4). These results demonstrate that biomechanical strength prediction in vertebrae can be significantly improved through the use of SIM-derived texture features from trabecular bone.
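The RMSE metric used above to compare the two feature sets can be sketched as follows. This is a minimal illustration of the standard definition. The function name and the failure-load values are hypothetical and are not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical failure loads (arbitrary units) for three specimens.
measured = [3.0, 4.0, 5.0]
predicted = [2.0, 4.0, 7.0]
error = rmse(measured, predicted)
print(f"RMSE = {error:.3f}")
```

A lower RMSE means the predicted failure loads lie closer to the measured ones, which is the sense in which one feature set is said to outperform the other.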
DemQSAR: predicting human volume of distribution and clearance of drugs
NASA Astrophysics Data System (ADS)
Demir-Kavuk, Ozgur; Bentzien, Jörg; Muegge, Ingo; Knapp, Ernst-Walter
2011-12-01
In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure-activity/-property relationships (QSAR/QSPR) correlate structure and physico-chemical properties of molecular compounds with a specific functional activity/property under study. Typically a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e. the method adjusts to very specific features of the training data, which are not characteristic of the considered property. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach, which combines feature generation, feature selection, model building and control of overtraining into a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VDss) and human clearance (CL). To control overtraining, quadratic and linear regularization terms were employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power. 
A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VDss and CL is available on the webpage of DemPRED: http://agknapp.chemie.fu-berlin.de/dempred/.
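The abstract does not give the exact objective function, so the following is only a generic sketch of what a linear model with both linear (L1) and quadratic (L2) regularization terms might minimize. The function name, the weights `lam1` and `lam2`, and the toy data are assumptions made for illustration. This is not the DemQSAR implementation.

```python
def regularized_loss(X, y, w, lam1, lam2):
    """Mean squared error of a linear model plus L1 and L2 penalties on the weights."""
    n = len(y)
    residuals = [yi - sum(wj * xj for wj, xj in zip(w, row)) for row, yi in zip(X, y)]
    mse = sum(r * r for r in residuals) / n
    l1_penalty = sum(abs(wj) for wj in w)   # linear term: pushes weights to exactly zero
    l2_penalty = sum(wj * wj for wj in w)   # quadratic term: shrinks large weights
    return mse + lam1 * l1_penalty + lam2 * l2_penalty


# Hypothetical toy data: two compounds, two descriptors.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
w = [1.0, 2.0]
loss = regularized_loss(X, y, w, lam1=0.1, lam2=0.01)
print(f"loss = {loss:.3f}")
```

In a setting with more descriptors than compounds, the penalties discourage the model from fitting compound-specific noise, and descriptors whose weights fall to zero become natural candidates for removal in a recursive selection loop.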
Tsuboi, Hiroto; Sumida, Takayuki; Noma, Hisashi; Yamagishi, Kazumasa; Anami, Ai; Fukushima, Kotaro; Horigome, Hitoshi; Maeno, Yasuki; Kishimoto, Mitsumasa; Takasaki, Yoshinari; Nakayama, Masahiro; Waguri, Masako; Sago, Haruhiko; Murashima, Atsuko
2016-07-01
To determine the maternal predictive factors for fetal congenital heart block (CHB) in pregnancies of mothers positive for anti-SS-A antibodies. The Research Team for Surveillance of Autoantibody-Exposed Fetuses and Treatment of Neonatal Lupus Erythematosus, the Research Program of the Japan Ministry of Health, Labor and Welfare, performed a national survey on pregnancy of mothers positive for anti-SS-A antibodies. We analyzed 635 pregnant mothers who tested positive for anti-SS-A antibodies before conception but had no previous history of fetal CHB. We performed univariate and multivariate analyses (models 1, 2, and 3, each using a different set of independent variables) to investigate the relation between risk of fetal CHB and maternal clinical features. Of the 635 pregnant mothers, fetal CHB was detected in 16. Univariate analysis showed that fetal CHB was associated positively with use of corticosteroids before conception (OR 3.72, p = 0.04), and negatively with use of corticosteroids (equivalent doses of prednisolone (PSL), at ≥10 mg/day) after conception before 16-week gestation (OR 0.17, p = 0.03). In multivariate analysis, model 1 identified the use of corticosteroids before conception (OR 4.28, p = 0.04) and high titer of anti-SS-A antibodies (OR 3.58, p = 0.02) as independent and significant risk factors, and model 3 identified use of corticosteroids (equivalent doses of PSL, at ≥10 mg/day) after conception before 16-week gestation as an independent protective factor against the development of fetal CHB (OR 0.16, p = 0.03). Other maternal clinical features did not influence the development of fetal CHB. The results identified high titers of anti-SS-A antibodies and use of corticosteroids before conception as independent risk factors, and use of corticosteroids (equivalent doses of PSL, at ≥10 mg/day) after conception before 16-week gestation as an independent protective factor for fetal CHB.
Jacobs, Richard H A H; Haak, Koen V; Thumfart, Stefan; Renken, Remco; Henson, Brian; Cornelissen, Frans W
2016-01-01
Our world is filled with texture. For the human visual system, this is an important source of information for assessing environmental and material properties. Indeed, and presumably for this reason, the human visual system has regions dedicated to processing textures. Despite their abundance and apparent relevance, only recently have the relationships between texture features and high-level judgments captured the interest of mainstream science, despite long-standing indications for such relationships. In this study, we explore such relationships, as these might be used to predict perceived texture qualities. This is relevant, not only from a psychological/neuroscience perspective, but also for more applied fields such as design, architecture, and the visual arts. In two separate experiments, observers judged various qualities of visual textures such as beauty, roughness, naturalness, elegance, and complexity. Based on factor analysis, we find that in both experiments, ~75% of the variability in the judgments could be explained by a two-dimensional space, with axes that are closely aligned to the beauty and roughness judgments. That a two-dimensional judgment space suffices to capture most of the variability in the perceived texture qualities suggests that observers use a relatively limited set of internal scales on which to base various judgments, including aesthetic ones. Finally, for both of these judgments, we determined the relationship with a large number of texture features computed for each of the texture stimuli. We find that the presence of lower spatial frequencies, oblique orientations, higher intensity variation, higher saturation, and redness correlates with higher beauty ratings. Features that captured image intensity and uniformity correlated with roughness ratings. Therefore, a number of computational texture features are predictive of these judgments. 
This suggests that perceived texture qualities, including aesthetic appreciation, are sufficiently universal to be predicted, with reasonable accuracy, based on the computed feature content of the textures.
Li I AND K I SCATTER IN COOL PLEIADES DWARFS
DOE Office of Scientific and Technical Information (OSTI.GOV)
King, Jeremy R.; Schuler, Simon C.; Hobbs, L. M.
2010-02-20
We utilize high-resolution (R ≈ 60,000), high signal-to-noise ratio (≈100) spectroscopy of 17 cool Pleiades dwarfs to examine the confounding star-to-star scatter in the λ6707 Li I line strengths in this young cluster. Our Pleiades, selected for their small projected rotational velocity and modest chromospheric emission, evince substantial scatter in the line strengths of the λ6707 Li I feature that is absent in the λ7699 K I resonance line. The Li I scatter is not correlated with that in the high-excitation λ7774 O I feature, and the magnitude of the former is greater than the latter despite the larger temperature sensitivity of the O I feature. These results suggest that systematic errors in line strength measurements due to blending, color (or color-based T_eff) errors, or line formation effects related to an overlying chromosphere are not the principal source of Li I scatter in our stars. There do exist analytic spot models that can produce, via line formation effects, the observed Li scatter without introducing scatter in the K I line strengths or the color-magnitude diagram. However, these models predict factor of ≥3 differences in abundances derived from the subordinate λ6104 and resonance λ6707 Li I features; we find no difference in the abundances determined from these two features. These analytic spot models also predict CN line strengths significantly larger than we observe in our spectra. The simplest explanation of the Li, K, CN, and photometric data is that there must be a real abundance component to the Pleiades Li dispersion. We suggest that this real abundance component is the manifestation of relic differences in erstwhile pre-main-sequence Li burning caused by effects of surface activity on stellar structure. We discuss observational predictions of these effects, which may be related to other anomalous stellar phenomena.
Blackmore, Emma Robertson; Gustafsson, Hanna; Gilchrist, Michelle; Wyman, Claire; O’Connor, Thomas G
2016-01-01
Background Pregnancy-related anxiety (PrA) has attracted considerable research attention, but questions remain about its distinctiveness from conventional constructs and measures. In a high psychosocial risk, ethnically diverse sample, we examine the degree to which PrA is distinct from continuous and diagnostic measures of anxiety and worry in terms of longitudinal course, associations with psychosocial and perinatal risk, and prediction of postnatal mood disturbance. Methods 345 women oversampled for prenatal anxiety and depression were selected from an urban obstetrics clinic serving a predominantly low-income, ethnically diverse population. PrA was assessed at 20 and 32 weeks gestation; anxiety and depression symptoms were assessed from questionnaire and from clinical interview at 20 and 32 weeks gestation and again at 2 and 6 months postnatally. Data relevant to psychosocial and obstetric risks were ascertained from interview, medical exam, and chart review. Results Two distinct factors of PrA were identified, indexing specific concerns about the child’s health and about the birth; these two PrA factors showed distinct longitudinal patterns in the prenatal period, and modest associations with general measures of anxiety and depression from questionnaire and clinical interview. PrA was also distinguished from conventional symptom measures in its associated features and prediction of birth weight and postnatal mood. Limitations The sample was at high psychosocial risk and ethnically diverse; findings may not generalize to other samples. Conclusions PrA can be distinguished from general measures of anxiety in pregnancy in terms of longitudinal course, associated features, and prediction to postnatal mood disturbance, and may warrant specific clinical attention. PMID:26999549
Galatzer-Levy, I R; Ma, S; Statnikov, A; Yehuda, R; Shalev, A Y
2017-01-01
To date, studies of biological risk factors have revealed inconsistent relationships with subsequent post-traumatic stress disorder (PTSD). The inconsistent signal may reflect the use of data analytic tools that are ill equipped for modeling the complex interactions between biological and environmental factors that underlie post-traumatic psychopathology. Further, using symptom-based diagnostic status as the group outcome overlooks the inherent heterogeneity of PTSD, potentially contributing to failures to replicate. To examine the potential yield of novel analytic tools, we reanalyzed data from a large longitudinal study of individuals identified following trauma in the general emergency room (ER) that failed to find a linear association between cortisol response to traumatic events and subsequent PTSD. First, latent growth mixture modeling empirically identified trajectories of post-traumatic symptoms, which then were used as the study outcome. Next, support vector machines with feature selection identified sets of features with stable predictive accuracy and built robust classifiers of trajectory membership (area under the receiver operator characteristic curve (AUC)=0.82 (95% confidence interval (CI)=0.80–0.85)) that combined clinical, neuroendocrine, psychophysiological and demographic information. Finally, graph induction algorithms revealed a unique path from childhood trauma via lower cortisol during ER admission, to non-remitting PTSD. Traditional general linear modeling methods then confirmed the newly revealed association, thereby delineating a specific target population for early endocrine interventions. Advanced computational approaches offer innovative ways for uncovering clinically significant, non-shared biological signals in heterogeneous samples. PMID:28323285
Galatzer-Levy, I R; Ma, S; Statnikov, A; Yehuda, R; Shalev, A Y
2017-03-21
To date, studies of biological risk factors have revealed inconsistent relationships with subsequent post-traumatic stress disorder (PTSD). The inconsistent signal may reflect the use of data analytic tools that are ill equipped for modeling the complex interactions between biological and environmental factors that underlie post-traumatic psychopathology. Further, using symptom-based diagnostic status as the group outcome overlooks the inherent heterogeneity of PTSD, potentially contributing to failures to replicate. To examine the potential yield of novel analytic tools, we reanalyzed data from a large longitudinal study of individuals identified following trauma in the general emergency room (ER) that failed to find a linear association between cortisol response to traumatic events and subsequent PTSD. First, latent growth mixture modeling empirically identified trajectories of post-traumatic symptoms, which then were used as the study outcome. Next, support vector machines with feature selection identified sets of features with stable predictive accuracy and built robust classifiers of trajectory membership (area under the receiver operator characteristic curve (AUC)=0.82 (95% confidence interval (CI)=0.80-0.85)) that combined clinical, neuroendocrine, psychophysiological and demographic information. Finally, graph induction algorithms revealed a unique path from childhood trauma via lower cortisol during ER admission, to non-remitting PTSD. Traditional general linear modeling methods then confirmed the newly revealed association, thereby delineating a specific target population for early endocrine interventions. Advanced computational approaches offer innovative ways for uncovering clinically significant, non-shared biological signals in heterogeneous samples.
Brewer, S.K.; Rabeni, C.F.; Sowa, S.P.; Annis, G.
2007-01-01
Protecting and restoring fish populations on a regional basis are most effective if the multiscale factors responsible for the relative quality of a fishery are known. We spatially linked Missouri's statewide historical fish collections to environmental features in a geographic information system, which was used as a basis for modeling the importance of landscape and stream segment features in supporting a population of smallmouth bass Micropterus dolomieu. Decision tree analyses were used to develop probability-based models to predict statewide occurrence and within-range relative abundances. We were able to identify the range of smallmouth bass throughout Missouri and the probability of occurrence within that range by using a few broad landscape variables: the percentage of coarse-textured soils in the watershed, watershed relief, and the percentage of soils with low permeability in the watershed. The within-range relative abundance model included both landscape and stream segment variables. As with the statewide probability of occurrence model, soil permeability was particularly significant. The predicted relative abundance of smallmouth bass in stream segments containing low percentages of permeable soils was further influenced by channel gradient, stream size, spring-flow volume, and local slope. Assessment of model accuracy with an independent data set showed good concordance. A conceptual framework involving naturally occurring factors that affect smallmouth bass potential is presented as a comparative model for assessing transferability to other geographic areas and for studying potential land use and biotic effects. We also identify the benefits, caveats, and data requirements necessary to improve predictions and promote ecological understanding. © Copyright by the American Fisheries Society 2007.
Guo, Yanyong; Li, Zhibin; Wu, Yao; Xu, Chengcheng
2018-06-01
Bicyclists running the red light at crossing facilities increase the potential of colliding with motor vehicles. Exploring the contributing factors could improve the prediction of red-light running probability and help develop countermeasures to reduce such behaviors. However, individuals could have unobserved heterogeneities in running a red light, which make accurate prediction more challenging. Traditional models assume that factor parameters are fixed and cannot capture the varying impacts on red-light running behaviors. In this study, we employed the full Bayesian random parameters logistic regression approach to account for the unobserved heterogeneous effects. Two types of crossing facilities were considered: signalized intersection crosswalks and road segment crosswalks. Electric and conventional bikes were distinguished in the modeling. Data were collected from 16 crosswalks in the urban area of Nanjing, China. Factors such as individual characteristics, road geometric design, environmental features, and traffic variables were examined. Model comparison indicates that the full Bayesian random parameters logistic regression approach is statistically superior to the standard logistic regression model. More red-light runners are predicted at signalized intersection crosswalks than at road segment crosswalks. Factors affecting red-light running behaviors are gender, age, bike type, road width, presence of raised median, separation width, signal type, green ratio, bike and vehicle volume, and average vehicle speed. Factors associated with the unobserved heterogeneity are gender, bike type, signal type, separation width, and bike volume. Copyright © 2018 Elsevier Ltd. All rights reserved.
A real-time method to predict social media popularity
NASA Astrophysics Data System (ADS)
Chen, Xiao; Lu, Zhe-Ming
How to predict the future popularity of a message or video on online social media (OSM) has long been an attractive problem for researchers. Although many difficulties are still ahead, recent studies suggest that temporal and topological features of early adopters generally play a very important role. However, as the number of adopters increases, the feature space grows explosively. How to select the most effective features is still an open issue. In this work, we investigate several feature extraction methods over the Twitter platform and find that most predictive power concentrates on the second half of the propagation period. We also find that, in addition to a model trained on one platform generalizing well to others, as previous works observed, a model trained on one dataset performs well in predicting popularity for other datasets with different numbers of observed early adopters. According to these findings, at least for the best features so far, the data used to extract features can be halved without evident loss of accuracy, and we provide a way to roughly predict the growth trend of a social-media item in real time.
Eskiizmir, G; Ozgur, E; Karaca, G; Temiz, P; Yanar, N Hacioglu; Ozyurt, B Cengiz
2017-10-01
To determine the locoregional control and survival rates (in terms of risk factors) of patients who underwent surgical resection of early-stage lip cancer and for whom a 'wait and see' policy in terms of neck status had been implemented. The sociodemographic data, tumour stage, tumour characteristics and histopathological features of 41 patients with early-stage lip cancer were evaluated. Factors predictive of survival and locoregional recurrence were analysed. The five-year overall survival and disease-free survival rates were determined, and the prognostic risk factors were compared. The mean follow-up period was 60.5 months (range, 4-92 months). Age, sex, tumour stage, tumour thickness and volume, and perineural involvement were not predictive of locoregional recurrence or survival. Pathological tumour stage (T1 vs T2) was a prognostic factor for both five-year overall survival (87.3 vs 65.6 per cent, p = 0.042) and disease-free survival (88.6 vs 65.6 per cent, p = 0.037). Tumour stage was clearly a major factor affecting the prognosis of surgically treated patients with early-stage lip cancer for whom a 'wait and see' policy in terms of neck status had been implemented.
Qi, Miao; Wang, Ting; Yi, Yugen; Gao, Na; Kong, Jun; Wang, Jianzhong
2017-04-01
Feature selection has been regarded as an effective tool to help researchers understand the generating process of data. For mining the synthesis mechanism of microporous AlPOs, this paper proposes a novel feature selection method based on joint ℓ2,1-norm and Fisher discrimination constraints (JNFDC). In order to obtain a more effective feature subset, the proposed method proceeds in two steps. The first step is to rank the features according to sparse and discriminative constraints. The second step is to establish a predictive model with the ranked features, and select the most significant features according to their contribution to improving the predictive accuracy. To the best of our knowledge, JNFDC is the first work which employs the sparse representation theory to explore the synthesis mechanism of six kinds of pore rings. Numerical simulations demonstrate that our proposed method can select significant features affecting the specified structural property and improve the predictive accuracy. Moreover, comparison results show that JNFDC can obtain better predictive performances than some other state-of-the-art feature selection methods. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
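The abstract does not spell out the JNFDC objective, so the following is only a generic sketch of the ℓ2,1-norm that such sparse feature-ranking methods typically penalize, and of how row norms of a learned weight matrix can be used to rank features. The function names and the toy weight matrix are hypothetical. This is not the authors' algorithm.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of a weight matrix (one row per feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return sum(row_norms(W))


def rank_features(W):
    """Feature indices ordered from largest to smallest row norm."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: norms[i], reverse=True)


# Hypothetical weight matrix: 3 features (rows) by 2 classes (columns).
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
print(l21_norm(W), rank_features(W))
```

Penalizing this norm drives entire rows of the weight matrix toward zero, so a feature whose row vanishes contributes to no class and can be dropped.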
Chen, Cong; Zhang, Guohui; Huang, Helai; Wang, Jiangfeng; Tarefder, Rafiqul A
2016-11-01
Rural non-interstate crashes induce a significant amount of severe injuries and fatalities. Examination of such injury patterns and the associated contributing factors is of practical importance. Taking into account the ordinal nature of injury severity levels and the hierarchical feature of crash data, this study employs a hierarchical ordered logit model to examine the significant factors in predicting driver injury severities in rural non-interstate crashes based on two-year New Mexico crash records. Bayesian inference is utilized in the model estimation procedure and the 95% Bayesian Credible Interval (BCI) is applied to testing variable significance. An ordinary ordered logit model omitting the between-crash variance effect is evaluated as well for model performance comparison. Results indicate that the model employed in this study outperforms the ordinary ordered logit model in model fit and parameter estimation. Variables regarding crash features, environment conditions, and driver and vehicle characteristics are found to have significant influence on the predictions of driver injury severities in rural non-interstate crashes. Factors such as road segments far from intersections, wet road surface condition, collision with animals, heavy vehicle drivers, male drivers and driver seatbelt use tend to induce less severe driver injury outcomes than factors such as multiple-vehicle crashes, severe vehicle damage in a crash, motorcyclists, females, senior drivers, drivers with alcohol or drug impairment, and other major collision types. Research limitations regarding crash data and model assumptions are also discussed. Overall, this research provides reasonable results and insight in developing effective road safety measures for crash injury severity reduction and prevention. Copyright © 2016 Elsevier Ltd. All rights reserved.
Learning Semantic Tags from Big Data for Clinical Text Representation.
Li, Yanpeng; Liu, Hongfang
2015-01-01
In clinical text mining, one of the biggest challenges is representing medical terminologies and n-gram terms in sparse medical reports using either supervised or unsupervised methods. To address this issue, we propose a novel method for word and n-gram representation at the semantic level. We first represent each word by its distance to a set of reference features, calculated by a reference distance estimator (RDE) learned from labeled and unlabeled data, and then generate new features using simple techniques of discretization, random sampling and merging. The new features are a set of binary rules that can be interpreted as semantic tags derived from words and n-grams. We show that the new features significantly outperform classical bag-of-words and n-grams in the task of heart disease risk factor extraction in the i2b2 2014 challenge. It is promising that semantic tags can replace the original text entirely with even better prediction performance, and can derive new rules beyond the lexical level.
Learning templates for artistic portrait lighting analysis.
Chen, Xiaowu; Jin, Xin; Wu, Hongyu; Zhao, Qinping
2015-02-01
Lighting is a key factor in creating impressive artistic portraits. In this paper, we propose to analyze portrait lighting by learning templates of lighting styles. Inspired by the experience of artists, we first define several novel features that describe the local contrasts in various face regions. The most informative features are then selected with a stepwise feature pursuit algorithm to derive the templates of various lighting styles. After that, the matching scores that measure the similarity between a test portrait and those templates are calculated for lighting style classification. Furthermore, we train a regression model on the subjective scores and the feature responses of a template to predict the lighting quality score of a portrait. Based on the templates, a novel face illumination descriptor is defined to measure the difference between two portrait lightings. Experimental results show that the learned templates describe the lighting styles well, and that the proposed approach can assess the lighting quality of artistic portraits as a human does.
Geochemical Constraints on Core Formation in the Earth
NASA Technical Reports Server (NTRS)
Jones, John H.; Drake, Michael J.
1986-01-01
New experimental data on the partitioning of siderophile and chalcophile elements among metallic and silicate phases may be used to constrain hypotheses of core formation in the Earth. Three current hypotheses can explain gross features of mantle geochemistry, but none predicts siderophile and chalcophile element abundances to within a factor of two of observed values. Either our understanding of metal-silicate interactions and/or our understanding of the early Earth requires revision.
NASA Astrophysics Data System (ADS)
Zargari, Abolfazl; Du, Yue; Thai, Theresa C.; Gunderson, Camille C.; Moore, Kathleen; Mannel, Robert S.; Liu, Hong; Zheng, Bin; Qiu, Yuchen
2018-02-01
The objective of this study is to investigate the performance of global and local features in estimating the characteristics of highly heterogeneous metastatic tumours, in order to accurately predict treatment effectiveness in patients with advanced-stage ovarian cancer. To achieve this, a quantitative image analysis scheme was developed to estimate a total of 103 features from three groups: shape and density, wavelet, and Gray Level Difference Method (GLDM) features. Shape and density features are global features, which are computed directly on the entire target image; wavelet and GLDM features are local features, which are computed on divided blocks of the target image. To assess performance, the new scheme was applied to a retrospective dataset of 120 patients with recurrent, high-grade ovarian cancer. The results indicate that the three best-performing features are the skewness, root-mean-square (rms) and mean of the local GLDM texture, indicating the importance of integrating local features. In addition, the average predictive performance is comparable among the three feature categories. This investigation concluded that local features contain at least as much tumour heterogeneity information as global features, which may be meaningful for improving the predictive performance of quantitative image markers for the diagnosis and prognosis of ovarian cancer patients.
[Predictors of remission from major depressive disorder in secondary care].
Salvo, Lilian; Saldivia, Sandra; Parra, Carlos; Cifuentes, Manuel; Bustos, Claudio; Acevedo, Paola; Díaz, Marcela; Ormazabal, Mitza; Guerra, Ivonne; Navarrete, Nicol; Bravo, Verónica; Castro, Andrea
2017-12-01
Background Knowledge of predictive factors in depression should help in managing the disease. Aim To assess potential predictors of remission of major depressive disorder (MDD) in secondary care and to propose a predictive model. Material and Methods A 12-month follow-up study was conducted in a sample of 112 outpatients at three psychiatric care centers in Chile, with baseline and quarterly assessments. Demographic, psychosocial, clinical and treatment factors were assessed as potential predictors. A clinical interview with the checklist of DSM-IV diagnostic criteria, the Hamilton Depression Scale, the List of Threatening Experiences and the Multidimensional Scale of Perceived Social Support were applied. Results The number of stressful events, perceived social support, baseline depression scores, melancholic features, time prior to beginning treatment at the secondary level and psychotherapeutic sessions were included in the model as predictors of remission. Sex, age, number of previous depressive episodes, psychiatric comorbidity and medical comorbidity were not significantly related to remission. Conclusions This model predicts the depression score at six months with 70% accuracy and the score at 12 months with 72% accuracy.
The movement ecology and dynamics of plant communities in fragmented landscapes.
Damschen, Ellen I; Brudvig, Lars A; Haddad, Nick M; Levey, Douglas J; Orrock, John L; Tewksbury, Joshua J
2008-12-09
A conceptual model of movement ecology has recently been advanced to explain all movement by considering the interaction of four elements: internal state, motion capacity, navigation capacities, and external factors. We modified this framework to generate predictions for species richness dynamics of fragmented plant communities and tested them in experimental landscapes across a 7-year time series. We found that two external factors, dispersal vectors and habitat features, affected species colonization and recolonization in habitat fragments and their effects varied and depended on motion capacity. Bird-dispersed species richness showed connectivity effects that reached an asymptote over time, but no edge effects, whereas wind-dispersed species richness showed steadily accumulating edge and connectivity effects, with no indication of an asymptote. Unassisted species also showed increasing differences caused by connectivity over time, whereas edges had no effect. Our limited use of proxies for movement ecology (e.g., dispersal mode as a proxy for motion capacity) resulted in moderate predictive power for communities and, in some cases, highlighted the importance of a more complete understanding of movement ecology for predicting how landscape conservation actions affect plant community dynamics.
Fatigue-Life Prediction Methodology Using Small-Crack Theory
NASA Technical Reports Server (NTRS)
Newmann, James C., Jr.; Phillips, Edward P.; Swain, M. H.
1997-01-01
This paper reviews the capabilities of a plasticity-induced crack-closure model to predict fatigue lives of metallic materials using 'small-crack theory' for various materials and loading conditions. Crack-tip constraint factors, to account for three-dimensional state-of-stress effects, were selected to correlate large-crack growth rate data as a function of the effective-stress-intensity factor range (delta K(eff)) under constant-amplitude loading. Some modifications to the delta K(eff)-rate relations were needed in the near-threshold regime to fit measured small-crack growth rate behavior and fatigue endurance limits. The model was then used to calculate small- and large-crack growth rates, and to predict total fatigue lives, for notched and un-notched specimens made of two aluminum alloys and a steel under constant-amplitude and spectrum loading. Fatigue lives were calculated using the crack-growth relations and microstructural features like those that initiated cracks for the aluminum alloys and steel for edge-notched specimens. An equivalent-initial-flaw-size concept was used to calculate fatigue lives in other cases. Results from the tests and analyses agreed well.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Damschen, Ellen I.; Brudvig, Lars A.; Haddad, Nick M.
A conceptual model of movement ecology has recently been advanced to explain all movement by considering the interaction of four elements: internal state, motion capacity, navigation capacities, and external factors. We modified this framework to generate predictions for species richness dynamics of fragmented plant communities and tested them in experimental landscapes across a 7-year time series. We found that two external factors, dispersal vectors and habitat features, affected species colonization and recolonization in habitat fragments and their effects varied and depended on motion capacity. Bird-dispersed species richness showed connectivity effects that reached an asymptote over time, but no edge effects, whereas wind-dispersed species richness showed steadily accumulating edge and connectivity effects, with no indication of an asymptote. Unassisted species also showed increasing differences caused by connectivity over time, whereas edges had no effect. Our limited use of proxies for movement ecology (e.g., dispersal mode as a proxy for motion capacity) resulted in moderate predictive power for communities and, in some cases, highlighted the importance of a more complete understanding of movement ecology for predicting how landscape conservation actions affect plant community dynamics.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. Conclusion SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Jacob, Gitta A; Ower, Nicole; Buchholz, Angela
2013-03-01
Experiential avoidance (EA) is an important factor in maintaining different forms of psychopathology including borderline personality pathology (BPD). So far little is known about the functions of EA, BPD features and general psychopathology for positive emotions. In this study we investigated three different anticipated pathways of their influence on positive emotions. A total of 334 subjects varying in general psychopathology and/or BPD features completed an online survey including self-ratings of BPD features, psychopathology, negative and positive emotions, and EA. Measures of positive emotions included both a general self-rating (PANAS) and emotional changes induced by two positive movie clips. Data were analyzed by means of path analysis. In comparing the three path models, one model was found to be clearly superior: in this model, EA acts as a mediator of the influence of psychopathology, BPD features, and negative emotions in the prediction of both measures of positive emotions. EA plays a central role in maintaining a lack of positive emotions. Therapeutic implications and study limitations are discussed. Copyright © 2012 Elsevier Ltd. All rights reserved.
Zhang, Huiping; Shi, Qiusheng; Gu, Jiying; Jiang, Luying; Bai, Min; Liu, Long; Wu, Ying; Du, Lianfang
2014-02-01
This study aimed to investigate the value of sonographic features including Virtual Touch tissue quantification (VTQ; Siemens Medical Solutions, Mountain View, CA) for differentiating benign and malignant thyroid nodules smaller than 10 mm. Seventy-one thyroid nodules smaller than 10 mm with pathologic diagnoses were included in this study. The conventional sonographic features and quantitative elasticity features (VTQ) were observed and compared between benign and malignant nodules. There were 39 benign and 32 malignant nodules according to histopathologic examination. When compared with benign nodules, malignant nodules were more frequently taller than wide, poorly defined, and markedly hypoechoic (P < .05). Color Doppler sonographic features were not significantly different between benign and malignant nodules. The VTQ value for malignant nodules (mean ± SD 3.260 ± 0.725 m/s) was significantly higher than that of benign ones (2.108 ± 0.455 m/s; P < .001). The cutoff point for the differential diagnosis was 2.910 m/s, with sensitivity, specificity, a positive predictive value, a negative predictive value, and diagnostic accuracy of 71.9%, 100%, 100%, 81.2%, and 87.3% respectively. Logistic regression analysis showed that a taller-than-wide shape, a poorly defined boundary, marked hypoechogenicity, and a VTQ value greater than 2.910 m/s were independent risk factors for malignancy, with odds ratios of 69.366, 41.864, 5.945, and 64.991. The combination of VTQ with a taller-than-wide shape had the highest sensitivity and specificity of 90.6% and 97.4%. The shape, margin, echogenicity, and VTQ value are useful sonographic criteria for differentiating benign and malignant thyroid nodules smaller than 10 mm. When VTQ was combined with B-mode sonographic features, the sensitivity was improved significantly.
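The diagnostic figures reported for the 2.910 m/s cutoff can be reproduced from a 2x2 confusion table. The sketch below is a minimal Python illustration. The abstract gives only the class sizes (39 benign, 32 malignant) and the percentages, so the cell counts used here (23 true positives, 9 false negatives, 39 true negatives, 0 false positives) are an inference from those figures, not values stated by the authors.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic test metrics from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),               # detected share of malignant nodules
        "specificity": tn / (tn + fp),               # correctly cleared share of benign nodules
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Inferred counts (assumption): 32 malignant = 23 TP + 9 FN, 39 benign = 39 TN + 0 FP.
m = diagnostic_metrics(tp=23, fn=9, tn=39, fp=0)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these counts the sensitivity is 23/32, the negative predictive value 39/48 and the accuracy 62/71, which agree with the reported percentages to within rounding.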
Modelling assistive technology adoption for people with dementia.
Chaurasia, Priyanka; McClean, Sally I; Nugent, Chris D; Cleland, Ian; Zhang, Shuai; Donnelly, Mark P; Scotney, Bryan W; Sanders, Chelsea; Smith, Ken; Norton, Maria C; Tschanz, JoAnn
2016-10-01
Assistive technologies have been identified as a potential solution for the provision of elderly care. Such technologies have in general the capacity to enhance the quality of life and increase the level of independence among their users. Nevertheless, the acceptance of these technologies is crucial to their success. Generally speaking, the elderly are not well-disposed to technologies and have limited experience; these factors contribute towards limiting the widespread acceptance of technology. It is therefore important to evaluate the potential success of technologies prior to their deployment. The research described in this paper builds upon our previous work on modelling adoption of assistive technology, in the form of cognitive prosthetics such as reminder apps, and aims at identifying a refined subset of features which offer improved accuracy in predicting technology adoption. Consequently, in this paper, an adoption model is built using a set of features extracted from a user's background to minimise the likelihood of non-adoption. The work is based on analysis of data from the Cache County Study on Memory and Aging (CCSMA) with 31 features covering age, gender, education and details of health condition. In the process of modelling adoption, feature selection and feature reduction are carried out, followed by identification of the best classification models. With the reduced set of labelled features, the technology adoption model achieved an average prediction accuracy of 92.48% when tested on 173 participants. We conclude that modelling user adoption from a range of parameters such as physical, environmental and social perspectives is beneficial in recommending a technology to a particular user based on their profile. Copyright © 2016 Elsevier Inc. All rights reserved.
Sugimoto, Masahiro; Takada, Masahiro; Toi, Masakazu
2014-12-09
Nomograms are a standard computational tool to predict the likelihood of an outcome using multiple available patient features. We have developed a more powerful data mining methodology, to predict axillary lymph node (AxLN) metastasis and response to neoadjuvant chemotherapy (NAC) in primary breast cancer patients. We developed websites to use these tools. The tools calculate the probability of AxLN metastasis (AxLN model) and pathological complete response to NAC (NAC model). As a calculation algorithm, we employed a decision tree-based prediction model known as the alternative decision tree (ADTree), which is an analog development of if-then type decision trees. An ensemble technique was used to combine multiple ADTree predictions, resulting in higher generalization abilities and robustness against missing values. The AxLN model was developed with training datasets (n=148) and test datasets (n=143), and validated using an independent cohort (n=174), yielding an area under the receiver operating characteristic curve (AUC) of 0.768. The NAC model was developed and validated with n=150 and n=173 datasets from a randomized controlled trial, yielding an AUC of 0.787. AxLN and NAC models require users to input up to 17 and 16 variables, respectively. These include pathological features, including human epidermal growth factor receptor 2 (HER2) status and imaging findings. Each input variable has an option of "unknown," to facilitate prediction for cases with missing values. The websites developed facilitate the use of these tools, and serve as a database for accumulating new datasets.
Imaoka, Hiroshi; Shimizu, Yasuhiro; Mizuno, Nobumasa; Hara, Kazuo; Hijioka, Susumu; Tajika, Masahiro; Tanaka, Tsutomu; Ishihara, Makoto; Ogura, Takeshi; Obayashi, Tomohiko; Shinagawa, Akihide; Sakaguchi, Masafumi; Yamaura, Hidekazu; Kato, Mina; Niwa, Yasumasa; Yamao, Kenji
2014-01-01
Adenosquamous carcinoma of the pancreas (ASC) is a rare malignant neoplasm of the pancreas, exhibiting both glandular and squamous differentiation. However, little is known about its imaging features. This study examined the imaging features of pancreatic ASC. We evaluated images of contrast-enhanced computed tomography (CT) and endoscopic ultrasonography (EUS). As controls, solid pancreatic neoplasms matched in a 2:1 ratio to ASC cases for age, sex and tumor location were also evaluated. Twenty-three ASC cases were examined, and 46 solid pancreatic neoplasms (43 pancreatic ductal adenocarcinomas, two pancreatic neuroendocrine tumors and one acinar cell carcinoma) were matched as controls. Univariate analysis demonstrated significant differences in the outline and vascularity of tumors on contrast-enhanced CT in the ASC and control groups (P < 0.001 and P < 0.001, respectively). A smooth outline, cystic changes, and the ring-enhancement pattern on contrast-enhanced CT were seen to have significant predictive powers by stepwise forward logistic regression analysis (P = 0.044, P = 0.010, and P = 0.001, respectively). Of the three, the ring-enhancement pattern was the most useful, and its predictive diagnostic sensitivity, specificity, positive predictive value and negative predictive value for diagnosis of ASC were 65.2%, 89.6%, 75.0% and 84.3%, respectively. These results demonstrate that presence of the ring-enhancement pattern on contrast-enhanced CT is the most useful predictive factor for ASC. Copyright © 2014 IAP and EPC. Published by Elsevier B.V. All rights reserved.
Thyroiditis de Quervain. Are there predictive factors for long-term hormone-replacement?
Schenke, S; Klett, R; Braun, S; Zimny, M
2013-01-01
Subacute thyroiditis is a usually self-limiting disease of the thyroid. However, approximately 0.5-15% of the patients require permanent thyroxine substitution. Aim was to determine predictive factors for the necessity of long-term hormone-replacement (LTH). We retrospectively reviewed the records of 72 patients with subacute thyroiditis. Morphological and serological parameters as well as type of therapy were tested as predictive factors of consecutive hypothyroidism. Mean age was 49 ± 11 years, f/m-ratio was 4.5 : 1. Thyroid pain and signs of hyperthyroidism were leading symptoms. Initial subclinical or overt hyperthyroidism was found in 20% and 37%, respectively. Within six months after onset 15% and 1.3% of the patients developed subclinical or overt hypothyroidism, respectively. At latest follow-up 26% were classified as liable to LTH. At onset the thyroid was enlarged in 64%, and at latest follow-up in 8.3%, with a significant reduction of the thyroid volume after three months. At the endpoint the thyroid volume was less in patients in the LTH group compared with the non-LTH group (41.7% vs. 57.2% of sex-adjusted upper norm, p = 0.041). Characteristic ultrasonographic features occurred in 74% of the patients in both lobes. Serological and morphological parameters as well as type of therapy were not related with the need of LTH. In this study the proportion of patients who received LTH was 26%. At the endpoint these patients had a lower thyroid volume compared with euthyroid patients. No predictive factors for LTH were found.
Kim, Byungjun; Jeon, Pyoung; Kim, Keonha; Kim, Sungtae; Kim, Hyungjin; Byun, Hong Sik; Jo, Kyung-Il
2016-04-01
Endovascular treatment using Onyx has been increasingly used to treat intracranial dural arteriovenous fistulas (DAVFs). This study evaluated predictive factors for favorable treatment outcome in patients with intracranial noncavernous DAVFs treated by transarterial Onyx embolization. Between August 2008 and August 2014, 55 patients who underwent transarterial Onyx embolization for noncavernous DAVFs were retrospectively reviewed. Patients' demographic, clinical, and procedural data were analyzed to find statistically significant predictive factors for favorable treatment outcomes after Onyx embolization. Fistulas were classified angiographically according to the relationship between fistulas and dural venous sinuses and the presence of leptomeningeal venous reflux. Sixty-eight Onyx embolizations were performed in 55 patients. Immediate angiographic cure was achieved in 28 patients, and 14 of 27 patients with residual shunts showed progressive occlusion at follow-up imaging studies. Therefore, the overall favorable treatment outcome was 76.4% (42/55). The remaining 13 patients (23.6%) showed persistent residual shunts, and 3 (5.5%) of them showed aggravation of residual lesion on follow-up studies. Of 25 patients with non-sinus fistulas, 23 patients (92%) showed favorable treatment outcomes, and 19 of 30 patients (63.3%) with sinus fistulas showed favorable outcomes. Among the evaluated variables, non-sinus DAVFs was a statistically significant predictive factor for favorable response to transarterial Onyx embolization (P < 0.05). Transarterial Onyx embolization is a highly effective treatment method for non-sinus DAVFs. Careful consideration of angiographic features and multimodal embolization strategies are required for treatment of sinus DAVFs. Copyright © 2016 Elsevier Inc. All rights reserved.
Examining overgeneral autobiographical memory as a risk factor for adolescent depression.
Rawal, Adhip; Rice, Frances
2012-05-01
Identifying risk factors for adolescent depression is an important research aim. Overgeneral autobiographical memory (OGM) is a feature of adolescent depression and a candidate cognitive risk factor for future depression. However, no study has ascertained whether OGM predicts the onset of adolescent depressive disorder. OGM was investigated as a predictor of depressive disorder and symptoms in a longitudinal study of high-risk adolescents. In addition, cross-sectional associations between OGM and current depression and OGM differences between depressed adolescents with different clinical outcomes were examined over time. A 1-year longitudinal study of adolescents at familial risk for depression (n = 277, 10-18 years old) was conducted. Autobiographical memory was assessed at baseline. Clinical interviews assessed diagnostic status at baseline and follow-up. Currently depressed adolescents showed an OGM bias compared with adolescents with no disorder and those with anxiety or externalizing disorders. OGM to negative cues predicted the onset of depressive disorder and depressive symptoms at follow-up in adolescents free from depressive disorder at baseline. This effect was independent of the contribution of age, IQ, and baseline depressive symptoms. OGM did not predict onset of anxiety or externalizing disorders. Adolescents with depressive disorder at both assessments were not more overgeneral than adolescents who recovered from depressive disorder over the follow-up period. OGM to negative cues predicted the onset of depressive disorder (but not other disorders) and depressive symptoms over time in adolescents at familial risk for depression. Results are consistent with OGM as a risk factor for depression. Copyright © 2012 American Academy of Child and Adolescent Psychiatry. Published by Elsevier Inc. All rights reserved.
Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming
2014-12-01
Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods including genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF) were utilized for selecting an important subset of features from a total of 16 clinical features. These feature selection methods were combined with support vector machine (SVM) for developing predictive models with better performance. Five-fold cross-validation was used to train and test SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation had averages of the sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve as 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM derived predictive model can provide suggestive high-risk recurrent patients, who should be closely followed up after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
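The five-fold cross-validation mentioned in the preceding abstract can be sketched as an index split in which every patient appears in exactly one test fold. This is a minimal, standard-library Python sketch. The function name, the shuffling seed and the round-robin fold assignment are assumptions for illustration; the abstract does not describe the authors' actual splitting procedure or whether feature selection was repeated inside each training fold, which is a common precaution against optimistic estimates.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 83 is the cohort size reported in the abstract.
splits = list(k_fold_indices(83, k=5))
for train_idx, test_idx in splits:
    # A model (for example an SVM) would be fitted on train_idx and scored on test_idx.
    print(len(train_idx), len(test_idx))
```

Averaging a metric over the five test folds gives the kind of cross-validated summary the abstract reports.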
Feature selection using probabilistic prediction of support vector regression.
Yang, Jian-Bo; Ong, Chong-Jin
2011-06-01
This paper presents a new wrapper-based feature selection method for support vector regression (SVR) using its probabilistic predictions. The method computes the importance of a feature by aggregating the difference, over the feature space, of the conditional density functions of the SVR prediction with and without the feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure using these approximations, in comparison to several other existing feature selection methods for SVR, is evaluated on both artificial and real-world problems. The experimental results show that the proposed method generally performs better than, or at least as well as, the existing methods, with a notable advantage when the dataset is sparse.
Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection
NASA Astrophysics Data System (ADS)
Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu
Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection is valuable in medical data mining because it allows a disease to be diagnosed with a minimum number of significant features. The objective of this work is to show that selecting the more significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on the reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.
SU-F-R-51: Radiomics in CT Perfusion Maps of Head and Neck Cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nesteruk, M; Riesterer, O; Veit-Haibach, P
2016-06-15
Purpose: The aim of this study was to test the predictive value of radiomics features of CT perfusion (CTP) for tumor control, based on a preselection of radiomics features in a robustness study. Methods: 11 patients with head and neck cancer (HNC) and 11 patients with lung cancer were included in the robustness study to preselect stable radiomics parameters. Data from 36 HNC patients treated with definitive radiochemotherapy (median follow-up 30 months) was used to build a predictive model based on these parameters. All patients underwent pre-treatment CTP. 315 texture parameters were computed for three perfusion maps: blood volume, blood flow and mean transit time. The variability of texture parameters was tested with respect to non-standardizable perfusion computation factors (noise level and artery contouring) using intraclass correlation coefficients (ICC). The parameter with the highest ICC in the correlated group of parameters (inter-parameter Spearman correlations) was tested for its predictive value. The final model to predict tumor control was built using multivariate Cox regression analysis with backward selection of the variables. For comparison, a predictive model based on tumor volume was created. Results: Ten parameters were found to be stable in both HNC and lung cancer regarding potentially non-standardizable factors after the correction for inter-parameter correlations. In the multivariate backward selection of the variables, blood flow entropy showed a highly significant impact on tumor control (p=0.03) with concordance index (CI) of 0.76. Blood flow entropy was significantly lower in the patient group with controlled tumors at 18 months (p<0.1). The new model showed a higher concordance index compared to the tumor volume model (CI=0.68). Conclusion: The preselection of variables in the robustness study allowed building a predictive radiomics-based model of tumor control in HNC despite a small patient cohort. This model was found to be superior to the volume-based model. The project was supported by the KFSP Tumor Oxygenation of the University of Zurich, by a grant of the Center for Clinical Research, University and University Hospital Zurich and by a research grant from Merck (Schweiz) AG.
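The concordance index used above to compare the radiomics and volume-based models can be illustrated with a small pairwise computation. This is a sketch of Harrell's C under the usual convention (a pair is comparable when the subject with the shorter time had an observed event; tied risk scores count one half). The function name and the toy data are assumptions for illustration and are not the authors' implementation or data.

```python
def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores (higher = earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable when i had an observed event
            # strictly before j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third subject is censored at t=15.
c = concordance_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.6, 0.7, 0.1],
)
print(c)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for the 0.76 versus 0.68 comparison reported in the abstract.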
A Feature Fusion Based Forecasting Model for Financial Time Series
Guo, Zhiqiang; Wang, Huaiqing; Liu, Quan; Yang, Jie
2014-01-01
Predicting the stock market has become an increasingly interesting research area for both researchers and investors, and many prediction models have been proposed. In these models, feature selection techniques are used to pre-process the raw data and remove noise. In this paper, a prediction model is constructed to forecast stock market behavior with the aid of independent component analysis, canonical correlation analysis, and a support vector machine. First, two types of features are extracted from the historical closing prices and 39 technical variables obtained by independent component analysis. Second, a canonical correlation analysis method is utilized to combine the two types of features and extract intrinsic features to improve the performance of the prediction model. Finally, a support vector machine is applied to forecast the next day's closing price. The proposed model is applied to the Shanghai stock market index and the Dow Jones index, and experimental results show that the proposed model predicts better than two other similar models. PMID:24971455
Zhang, Hua; Zhang, Tuo; Gao, Jianzhao; Ruan, Jishou; Shen, Shiyi; Kurgan, Lukasz
2012-01-01
Proteins fold through either a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features which hybridize information concerning amino acid composition and predicted secondary structure and solvent accessibility. FOKIT provides predictions with average Matthews correlation coefficient (MCC) between 0.58 and 0.91 measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that inclusion of solvent accessibility helps in discrimination of the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure/burial of certain hydrophobic residues may play an important role in the formation of the folding intermediates. Our conclusions are supported by two case studies.
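The Matthews correlation coefficient quoted above summarizes a binary confusion matrix in a single value between -1 and 1. A minimal sketch of the standard formula follows; the function name `matthews_cc`, the zero-denominator convention, and the counts are assumptions for illustration and do not come from the FOKIT evaluation.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here), since the coefficient is otherwise undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only.
print(round(matthews_cc(tp=40, tn=45, fp=5, fn=10), 3))
```

Unlike plain accuracy, the coefficient stays informative when the two classes are unbalanced, which is a plausible reason for reporting it on small benchmark sets.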
Hardwick, David R.; Cutmore, Timothy R. H.; Hine, Trevor J.
2014-01-01
Saccadic latency is reduced by a temporal gap between fixation point and target, by identification of a target feature, and by movement in a new direction (inhibition of saccadic return, ISR). A simple additive model was compared with a shared resources model that predicts a three-way interaction. Twenty naïve participants made horizontal saccades to targets left and right of fixation in a randomised block design. There was a significant three-way interaction among the factors on saccade latency. This was revealed in a two-way interaction between feature identification and the gap versus no gap factor which was only apparent when the saccade was in the same direction as the previous saccade. No interaction was apparent when the saccade was in the opposite direction. This result supports an attentional inhibitory effect that is present during ISR to a previous location which is only partly released by the facilitative effect of feature identification and gap. Together, anticipatory error data and saccade latency interactions suggest a source of ISR at a higher level of attention, possibly localised in the dorsolateral prefrontal cortex and involving tonic activation. PMID:24719754
2011-01-01
Background Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction is increasingly important for understanding the essence of protein interactions and for narrowing down the search space for drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) these two features differ significantly between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes.
Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots. PMID:21798070
Velankar, Sameer; Kryshtafovych, Andriy; Huang, Shen‐You; Schneidman‐Duhovny, Dina; Sali, Andrej; Segura, Joan; Fernandez‐Fuentes, Narcis; Viswanath, Shruthi; Elber, Ron; Grudinin, Sergei; Popov, Petr; Neveu, Emilie; Lee, Hasup; Baek, Minkyung; Park, Sangwoo; Heo, Lim; Rie Lee, Gyu; Seok, Chaok; Qin, Sanbo; Zhou, Huan‐Xiang; Ritchie, David W.; Maigret, Bernard; Devignes, Marie‐Dominique; Ghoorah, Anisah; Torchala, Mieczyslaw; Chaleil, Raphaël A.G.; Bates, Paul A.; Ben‐Zeev, Efrat; Eisenstein, Miriam; Negi, Surendra S.; Weng, Zhiping; Vreven, Thom; Pierce, Brian G.; Borrman, Tyler M.; Yu, Jinchao; Ochsenbein, Françoise; Guerois, Raphaël; Vangone, Anna; Rodrigues, João P.G.L.M.; van Zundert, Gydo; Nellen, Mehdi; Xue, Li; Karaca, Ezgi; Melquiond, Adrien S.J.; Visscher, Koen; Kastritis, Panagiotis L.; Bonvin, Alexandre M.J.J.; Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Li, Jilong; Ma, Zhiwei; Cheng, Jianlin; Zou, Xiaoqin; Shen, Yang; Peterson, Lenna X.; Kim, Hyung‐Rae; Roy, Amit; Han, Xusi; Esquivel‐Rodriguez, Juan; Kihara, Daisuke; Yu, Xiaofeng; Bruce, Neil J.; Fuller, Jonathan C.; Wade, Rebecca C.; Anishchenko, Ivan; Kundrotas, Petras J.; Vakser, Ilya A.; Imai, Kenichiro; Yamada, Kazunori; Oda, Toshiyuki; Nakamura, Tsukasa; Tomii, Kentaro; Pallara, Chiara; Romero‐Durana, Miguel; Jiménez‐García, Brian; Moal, Iain H.; Férnandez‐Recio, Juan; Joung, Jong Young; Kim, Jong Yun; Joo, Keehyoung; Lee, Jooyoung; Kozakov, Dima; Vajda, Sandor; Mottarella, Scott; Hall, David R.; Beglov, Dmitri; Mamonov, Artem; Xia, Bing; Bohnuud, Tanggis; Del Carpio, Carlos A.; Ichiishi, Eichiro; Marze, Nicholas; Kuroda, Daisuke; Roy Burman, Shourya S.; Gray, Jeffrey J.; Chermak, Edrisse; Cavallo, Luigi; Oliva, Romina; Tovchigrechko, Andrey
2016-01-01
ABSTRACT We present the results for CAPRI Round 30, the first joint CASP‐CAPRI experiment, which brought together experts from the protein structure prediction and protein–protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact‐sized interfaces, represented a confounding factor. For those, a much poorer prediction performance was achieved, while nonetheless often providing helpful clues on the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology‐built subunit models and the smaller pair‐wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323–348. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:27122118
Yasuda, Akihito; Onuki, Yoshinori; Kikuchi, Shingo; Takayama, Kozo
2010-11-01
The quality by design concept in pharmaceutical formulation development requires establishment of a science-based rationale and a design space. We integrated thin-plate spline (TPS) interpolation and Kohonen's self-organizing map (SOM) to visualize the latent structure underlying causal factors and pharmaceutical responses. As a model pharmaceutical product, theophylline powders were prepared based on the standard formulation. The angle of repose, compressibility, cohesion, and dispersibility were measured as the response variables. These responses were predicted quantitatively on the basis of a nonlinear TPS. A large amount of data on these powders was generated and classified into several clusters using an SOM. The experimental values of the responses were predicted with high accuracy, and the data generated for the powders could be classified into several distinctive clusters. The SOM feature map allowed us to analyze the global and local correlations between causal factors and powder characteristics. For instance, the quantities of microcrystalline cellulose (MCC) and magnesium stearate (Mg-St) were classified distinctly into each cluster, indicating that the quantities of MCC and Mg-St were crucial for determining the powder characteristics. This technique provides a better understanding of the relationships between causal factors and pharmaceutical responses in theophylline powder formulations. © 2010 Wiley-Liss, Inc. and the American Pharmacists Association
[Lack of assertiveness in patients with eating disorders].
Behar A, Rosa; Manzo G, Rodrigo; Casanova Z, Dunny
2006-03-01
Low self-assertion has been noted as an important feature among patients with eating disorders. The aim was to verify, in a female population, whether assertiveness is related to or has a predictive capacity for the development of eating disorders. A structured clinical interview, the Eating Attitudes Test (EAT-40) and the Rathus Assertiveness Scale (RAS) were administered to 62 patients that fulfilled the DSM-IV diagnostic criteria for eating disorders and to 120 female students without eating problems. Patients with eating disorders ranked significantly higher on the EAT-40 and its factors (p <0.001) and showed a lower level of assertiveness on the RAS (p <0.001). Assertiveness measured by RAS and its factors was inversely related to EAT-40 and its items (r= -0.21). The predictive capability of the lack of self-assertion in the development of an eating disorder reached 53%, when patients with eating disorders and subjects at risk were considered together and compared to students without such disorder. Lack of assertiveness is a significant trait in patients with eating disorders; it may worsen its outcome and even perpetuate symptoms. Low self-assertion may be considered a predictive factor in the development of an eating disorder and must be managed from a preventive or therapeutic point of view.
Da Fonseca, D; Cury, F; Rufo, M; Poinso, F
2007-10-01
The aim of this study was to complete the identification of predictive factors of depression during adolescence. For some authors, depression is characterized by a style of attribution, which consists essentially in attributing most of the negative outcomes to internal, stable, and uncontrollable factors. It seems that these attributions depend essentially on the type of their beliefs and in particular, those concerning the nature of intelligence. These beliefs called "implicit theories of intelligence", are the entity theory of intelligence and the incremental theory of intelligence. The entity theory of intelligence corresponds to the belief according to which intelligence is the expression of a relatively stable, fixed, and noncontrollable feature, and which we cannot change. In contrast, the incremental theory corresponds to the belief according to which intelligence is a controllable quality, which we can develop through effort and work. Several studies have demonstrated that the adolescents who consider intelligence as a malleable quality explain their bad results by internal, unstable, and controllable factors. Conversely, students who consider intelligence as a fixed capacity tend to strongly attribute their failure to internal, stable, and uncontrollable factors. We have consequently formulated the hypothesis according to which the entity theory should be a predictive factor of depression. We have also tested the fact that anxiety should be a mediating factor within the relation between the entity theory and depression. The sample was composed of 424 adolescents. Using different questionnaires, we measured implicit theories of the intelligence (TIDI), self-esteem (EES), anxiety (STAI-Form Y-B) and depression (CDI). Multiple regression analyses demonstrated that the entity theory of intelligence positively predicts depression. Self-esteem negatively predicts anxiety and depression. 
Moreover, anxiety is a mediator of the relation between self-esteem and depression, on one hand, and the relation between the entity theory of intelligence and depression, on the other. Finally, the effect of the entity theory of intelligence appears to be modulated by the level of self-esteem. This study explains the mechanisms by which the implicit theories of intelligence engender anxiety and depression. Furthermore, this approach provides interesting perspectives in the prevention and management of adolescents presenting depression.
Ahern, Tomás; Swiecicka, Agnieszka; Eendebak, Robert J A H; Carter, Emma L; Finn, Joseph D; Pye, Stephen R; O'Neill, Terence W; Antonio, Leen; Keevil, Brian; Bartfai, György; Casanueva, Felipe F; Forti, Gianni; Giwercman, Aleksander; Han, Thang S; Kula, Krzysztof; Lean, Michael E J; Pendleton, Neil; Punab, Margus; Rastrelli, Giulia; Rutter, Martin K; Vanderschueren, Dirk; Huhtaniemi, Ilpo T; Wu, Frederick C W
2016-12-01
In ageing men, the incidence and clinical significance of testosterone (T) decline accompanied by elevated luteinizing hormone (LH) are unclear. We describe the natural history, risk factors and clinical features associated with the development of biochemical primary hypogonadism (PHG, T < 10·5 nmol/l and LH>9·4U/l) in ageing men. A prospective observational cohort survey of 3,369 community-dwelling men aged 40-79 years, followed up for 4·3 years. Men were classified as incident (i) PHG (eugonadal [EUG, T ≥ 10·5 nmol/l] at baseline, PHG at follow-up), persistent (p) PHG (PHG at baseline and follow-up), pEUG (EUG at baseline and follow-up) and reversed (r) PHG (PHG at baseline, EUG at follow-up). Predictors and changes in clinical features associated with the development of PHG were analysed by regression models. Of 1,991 men comprising the analytical sample, 97·5% had pEUG, 1·1% iPHG, 1·1% pPHG and 0·3% rPHG. The incidence of PHG was 0·2%/year. Higher age (>70 years) [OR 12·48 (1·27-122·13), P = 0·030] and chronic illnesses [OR 4·24 (1·08-16·56); P = 0·038] predicted iPHG. Upon transition from EUG to PHG, erectile function, physical vigour and haemoglobin worsened significantly. Men with pPHG had decreased morning erections, sexual thoughts and haemoglobin with increased insulin resistance. Primary testicular failure in men is uncommon and predicted by old age and chronic illness. Some clinical features attributable to androgen deficiency, but not others, accompanied the T decline in men who developed biochemical PHG. Whether androgen replacement can improve sexual and/or physical function in elderly men with PHG merits further study. © 2016 John Wiley & Sons Ltd.
Matheson, Louise S.; Bolland, Daniel J.; Chovanec, Peter; Krueger, Felix; Andrews, Simon; Koohy, Hashem; Corcoran, Anne E.
2017-01-01
V(D)J recombination is essential for the generation of diverse antigen receptor (AgR) repertoires. In B cells, immunoglobulin kappa (Igκ) light chain recombination follows immunoglobulin heavy chain (Igh) recombination. We recently developed the DNA-based VDJ-seq assay for the unbiased quantitation of Igh VH and DH repertoires. Integration of VDJ-seq data with genome-wide datasets revealed that two chromatin states at the recombination signal sequence (RSS) of VH genes are highly predictive of recombination in mouse pro-B cells. It is unknown whether local chromatin states contribute to Vκ gene choice during Igκ recombination. Here we adapt VDJ-seq to profile the Igκ VκJκ repertoire and present a comprehensive readout in mouse pre-B cells, revealing highly variable Vκ gene usage. Integration with genome-wide datasets for histone modifications, DNase hypersensitivity, transcription factor binding and germline transcription identified PU.1 binding at the RSS, which was unimportant for Igh, as highly predictive of whether a Vκ gene will recombine or not, suggesting that it plays a binary, all-or-nothing role, priming genes for recombination. Thereafter, the frequency with which these genes recombine was shaped both by the presence and level of enrichment of several other chromatin features, including H3K4 methylation and IKAROS binding. Moreover, in contrast to the Igh locus, the chromatin landscape of the promoter, as well as of the RSS, contributes to Vκ gene recombination. Thus, multiple facets of local chromatin features explain much of the variation in Vκ gene usage. Together, these findings reveal shared and divergent roles for epigenetic features and transcription factors in AgR V(D)J recombination and provide avenues for further investigation of chromatin signatures that may underpin V(D)J-mediated chromosomal translocations. PMID:29204143
Lee, Jeong Sub; Kim, Se Hyung; Im, Seock-Ah; Kim, Min A; Han, Joon Koo
2017-01-01
To retrospectively analyze the qualitative CT features that correlate with human epidermal growth factor receptor 2 (HER2)-expression in pathologically-proven gastric cancers. A total of 181 patients with pathologically-proven unresectable gastric cancers with HER2-expression (HER2-positive [n = 32] and negative [n = 149]) were included. CT features of primary gastric and metastatic tumors were reviewed. The prevalence of each CT finding was compared in both groups. Thereafter, binary logistic regression determined the most significant differential CT features. Clinical outcomes were compared using the Kaplan-Meier method. HER2-positive cancers showed lower clinical T stage (21.9% vs. 8.1%; p = 0.015) and hyperattenuation on portal phase (62.5% vs. 30.9%; p = 0.003), and more frequently metastasized to the liver (62.5% vs. 32.2%; p = 0.001), than HER2-negative cancers. On binary regression analysis, hyperattenuation of the tumor (odds ratio [OR], 4.68; p < 0.001) and hepatic metastasis (OR, 4.43; p = 0.001) were significant independent factors that predict HER2-positive cancers. Median survival of HER2-positive cancers (13.7 months) was significantly longer than that of HER2-negative cancers (9.6 months) (p = 0.035). HER2-positive gastric cancers show less-advanced T stage, hyperattenuation on the portal phase, and frequently metastasize to the liver, as compared to HER2-negative cancers.
Feature Selection Methods for Zero-Shot Learning of Neural Activity
Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513
NECAP 4.1: NASA's Energy-Cost Analysis Program input manual
NASA Technical Reports Server (NTRS)
Jensen, R. N.
1982-01-01
The computer program NECAP (NASA's Energy Cost Analysis Program) is described. The program is a versatile building design and energy analysis tool which has embodied within it state of the art techniques for performing thermal load calculations and energy use predictions. With the program, comparisons of building designs and operational alternatives for new or existing buildings can be made. The major feature of the program is the response factor technique for calculating the heat transfer through the building surfaces which accounts for the building's mass. The program expands the response factor technique into a space response factor to account for internal building temperature swings; this is extremely important in determining true building loads and energy consumption when internal temperatures are allowed to swing.
NASA Astrophysics Data System (ADS)
Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.
2017-03-01
Predicting the risk of occult invasive disease in ductal carcinoma in situ (DCIS) is an important task to help address the overdiagnosis and overtreatment problems associated with breast cancer. In this work, we investigated the feasibility of using computer-extracted mammographic features to predict occult invasive disease in patients with biopsy proven DCIS. We proposed a computer-vision algorithm based approach to extract mammographic features from magnification views of full field digital mammography (FFDM) for patients with DCIS. After an expert breast radiologist provided a region of interest (ROI) mask for the DCIS lesion, the proposed approach is able to segment individual microcalcifications (MCs), detect the boundary of the MC cluster (MCC), and extract 113 mammographic features from MCs and MCC within the ROI. In this study, we extracted mammographic features from 99 patients with DCIS (74 pure DCIS; 25 DCIS plus invasive disease). The predictive power of the mammographic features was demonstrated through binary classifications between pure DCIS and DCIS with invasive disease using linear discriminant analysis (LDA). Before classification, the minimum redundancy Maximum Relevance (mRMR) feature selection method was first applied to choose subsets of useful features. The generalization performance was assessed using Leave-One-Out Cross-Validation and Receiver Operating Characteristic (ROC) curve analysis. Using the computer-extracted mammographic features, the proposed model was able to distinguish DCIS with invasive disease from pure DCIS, with an average classification performance of AUC = 0.61 +/- 0.05. Overall, the proposed computer-extracted mammographic features are promising for predicting occult invasive disease in DCIS.
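The evaluation protocol described above (leave-one-out cross-validation followed by ROC analysis) can be sketched in a few lines. This is an illustrative sketch only: it swaps in a one-dimensional nearest-class-mean scorer as a stand-in for the LDA classifier and mRMR feature selection used in the study, and the function names and toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def loocv_scores(features, labels):
    """Leave-one-out scores from a stand-in nearest-class-mean model.

    Each sample is scored by a model fit on all other samples, so no
    sample contributes to its own prediction. Assumes every training
    fold still contains both classes.
    """
    scores = []
    for i, x in enumerate(features):
        train = [(f, y) for k, (f, y) in enumerate(zip(features, labels)) if k != i]
        pos = [f for f, y in train if y == 1]
        neg = [f for f, y in train if y == 0]
        mean_pos = sum(pos) / len(pos)
        mean_neg = sum(neg) / len(neg)
        # Higher score = closer to the positive-class mean than the negative one.
        scores.append(abs(x - mean_neg) - abs(x - mean_pos))
    return scores


# Hypothetical single-feature data, well separated by construction.
features = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]
print(roc_auc(labels, loocv_scores(features, labels)))
```

With a small cohort, leave-one-out uses nearly all cases for training in each round, which is one reason it is a common choice for studies of this size.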
Ning, Kaida; Chen, Bo; Sun, Fengzhu; Hobel, Zachary; Zhao, Lu; Matloff, Will; Toga, Arthur W
2018-08-01
A long-standing question is how to best use brain morphometric and genetic data to distinguish Alzheimer's disease (AD) patients from cognitively normal (CN) subjects and to predict those who will progress from mild cognitive impairment (MCI) to AD. Here, we use a neural network (NN) framework on both magnetic resonance imaging-derived quantitative structural brain measures and genetic data to address this question. We tested the effectiveness of NN models in classifying and predicting AD. We further performed a novel analysis of the NN model to gain insight into the most predictive imaging and genetics features and to identify possible interactions between features that affect AD risk. Data were obtained from the AD Neuroimaging Initiative cohort and included baseline structural MRI data and single nucleotide polymorphism (SNP) data for 138 AD patients, 225 CN subjects, and 358 MCI patients. We found that NN models with both brain and SNP features as predictors perform significantly better than models with either alone in classifying AD and CN subjects, with an area under the receiver operating characteristic curve (AUC) of 0.992, and in predicting the progression from MCI to AD (AUC=0.835). The most important predictors in the NN model were the left middle temporal gyrus volume, the left hippocampus volume, the right entorhinal cortex volume, and the APOE (a gene that encodes apolipoprotein E) ɛ4 risk allele. Furthermore, we identified interactions between the right parahippocampal gyrus and the right lateral occipital gyrus, the right banks of the superior temporal sulcus and the left posterior cingulate, and SNP rs10838725 and the left lateral occipital gyrus. Our work shows the ability of NN models to not only classify and predict AD occurrence but also to identify important AD risk factors and interactions among them. Copyright © 2018 Elsevier Inc. All rights reserved.
Favre, Mônica R.; La Mendola, Deborah; Meystre, Julie; Christodoulou, Dimitri; Cochrane, Melissa J.; Markram, Henry; Markram, Kamila
2015-01-01
Understanding the effects of environmental stimulation in autism can improve therapeutic interventions against debilitating sensory overload, social withdrawal, fear and anxiety. Here, we evaluate the role of environmental predictability on behavior and protein expression, and inter-individual differences, in the valproic acid (VPA) model of autism. Male rats embryonically exposed (E11.5) either to VPA, a known autism risk factor in humans, or to saline, were housed from weaning into adulthood in a standard laboratory environment, an unpredictably enriched environment, or a predictably enriched environment. Animals were tested for sociability, nociception, stereotypy, fear conditioning and anxiety, and for tissue content of glutamate signaling proteins in the primary somatosensory cortex, hippocampus and amygdala, and of corticosterone in plasma, amygdala and hippocampus. Standard group analyses on separate measures were complemented with a composite emotionality score, using Cronbach's Alpha analysis, and with multivariate profiling of individual animals, using Hierarchical Cluster Analysis. We found that predictable environmental enrichment prevented the development of hyper-emotionality in the VPA-exposed group, while unpredictable enrichment did not. Individual variation in the severity of the autistic-like symptoms (fear, anxiety, social withdrawal and sensory abnormalities) correlated with neurochemical profiles, and predicted their responsiveness to predictability in the environment. In controls, the association between socio-affective behaviors, neurochemical profiles and environmental predictability was negligible. This study suggests that rearing in a predictable environment prevents the development of hyper-emotional features in animals exposed to an autism risk factor, and demonstrates that unpredictable environments can lead to negative outcomes, even in the presence of environmental enrichment. PMID:26089770
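The composite emotionality score above relies on Cronbach's alpha to check that the separate behavioral measures hang together. A minimal sketch of the standard alpha formula follows; the function name and the layout of `scores` (one row per animal, one column per measure) are assumptions for illustration, not the authors' code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (subjects) of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(row totals))
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across subjects.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the per-subject total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Items that rise and fall together across subjects push alpha toward 1, which is the usual justification for summing them into one composite score.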
Talih, Soha; Balhas, Zainab; Eissenberg, Thomas; Salman, Rola; Karaoghlanian, Nareg; El Hellani, Ahmad; Baalbaki, Rima; Saliba, Najat; Shihadeh, Alan
2015-02-01
Some electronic cigarette (ECIG) users attain tobacco cigarette-like plasma nicotine concentrations while others do not. Understanding the factors that influence ECIG aerosol nicotine delivery is relevant to regulation, including product labeling and abuse liability. These factors may include user puff topography, ECIG liquid composition, and ECIG design features. This study addresses how these factors can influence ECIG nicotine yield. Aerosols were machine generated with 1 type of ECIG cartridge (V4L CoolCart) using 5 distinct puff profiles representing a tobacco cigarette smoker (2-s puff duration, 33-ml/s puff velocity), a slow average ECIG user (4 s, 17 ml/s), a fast average user (4 s, 33 ml/s), a slow extreme user (8 s, 17 ml/s), and a fast extreme user (8 s, 33 ml/s). Output voltage (3.3-5.2 V or 3.0-7.5 W) and e-liquid nicotine concentration (18-36 mg/ml labeled concentration) were varied. A theoretical model was also developed to simulate the ECIG aerosol production process and to provide insight into the empirical observations. Nicotine yields from 15 puffs varied by more than 50-fold across conditions. Experienced ECIG user profiles (longer puffs) resulted in higher nicotine yields relative to the tobacco smoker (shorter puffs). Puff velocity had no effect on nicotine yield. Higher nicotine concentration and higher voltages resulted in higher nicotine yields. These results were predicted well by the theoretical model (R² = 0.99). Depending on puff conditions and product features, 15 puffs from an ECIG can provide far less or far more nicotine than a single tobacco cigarette. ECIG emissions can be predicted using physical principles, with knowledge of puff topography and a few ECIG device design parameters.
Fédrigo, Olivier; Haygood, Ralph; Mukherjee, Sayan; Wray, Gregory A.
2009-01-01
Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. 
These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary constraint. PMID:19506001
Talih, Soha; Balhas, Zainab; Eissenberg, Thomas; Salman, Rola; Karaoghlanian, Nareg; El Hellani, Ahmad; Baalbaki, Rima; Saliba, Najat
2015-01-01
Introduction: Some electronic cigarette (ECIG) users attain tobacco cigarette–like plasma nicotine concentrations while others do not. Understanding the factors that influence ECIG aerosol nicotine delivery is relevant to regulation, including product labeling and abuse liability. These factors may include user puff topography, ECIG liquid composition, and ECIG design features. This study addresses how these factors can influence ECIG nicotine yield. Methods: Aerosols were machine generated with 1 type of ECIG cartridge (V4L CoolCart) using 5 distinct puff profiles representing a tobacco cigarette smoker (2-s puff duration, 33-ml/s puff velocity), a slow average ECIG user (4 s, 17 ml/s), a fast average user (4 s, 33 ml/s), a slow extreme user (8 s, 17 ml/s), and a fast extreme user (8 s, 33 ml/s). Output voltage (3.3–5.2 V or 3.0–7.5 W) and e-liquid nicotine concentration (18–36 mg/ml labeled concentration) were varied. A theoretical model was also developed to simulate the ECIG aerosol production process and to provide insight into the empirical observations. Results: Nicotine yields from 15 puffs varied by more than 50-fold across conditions. Experienced ECIG user profiles (longer puffs) resulted in higher nicotine yields relative to the tobacco smoker (shorter puffs). Puff velocity had no effect on nicotine yield. Higher nicotine concentration and higher voltages resulted in higher nicotine yields. These results were predicted well by the theoretical model (R² = 0.99). Conclusions: Depending on puff conditions and product features, 15 puffs from an ECIG can provide far less or far more nicotine than a single tobacco cigarette. ECIG emissions can be predicted using physical principles, with knowledge of puff topography and a few ECIG device design parameters. PMID:25187061
NASA Astrophysics Data System (ADS)
Riest, Jonas; Nägele, Gerhard; Liu, Yun; Wagner, Norman J.; Godfrin, P. Douglas
2018-02-01
Recently, atypical static features of microstructural ordering in low-salinity lysozyme protein solutions have been extensively explored experimentally and explained theoretically based on a short-range attractive plus long-range repulsive (SALR) interaction potential. However, the protein dynamics and the relationship to the atypical SALR structure remain to be demonstrated. Here, the applicability of semi-analytic theoretical methods predicting diffusion properties and viscosity in isotropic particle suspensions to low-salinity lysozyme protein solutions is tested. Using the interaction potential parameters previously obtained from static structure factor measurements, our results of Monte Carlo simulations representing seven experimental lysozyme samples indicate that they exist either in dispersed fluid or random percolated states. The self-consistent Zerah-Hansen scheme is used to describe the static structure factor, S(q), which is the input to our calculation schemes for the short-time hydrodynamic function, H(q), and the zero-frequency viscosity η. The schemes account for hydrodynamic interactions included on an approximate level. Theoretical predictions for H(q) as a function of the wavenumber q quantitatively agree with experimental results at small protein concentrations obtained using neutron spin echo measurements. At higher concentrations, qualitative agreement is preserved although the calculated hydrodynamic functions are overestimated. We attribute the differences for higher concentrations and lower temperatures to translational-rotational diffusion coupling induced by the shape and interaction anisotropy of particles and clusters, patchiness of the lysozyme particle surfaces, and the intra-cluster dynamics, features not included in our simple globular particle model. The theoretical results for the solution viscosity, η, are in qualitative agreement with our experimental data even at higher concentrations.
We demonstrate that semi-quantitative predictions of diffusion properties and viscosity of solutions of globular proteins are possible given only the equilibrium structure factor of proteins. Furthermore, we explore the effects of changing the attraction strength on H(q) and η.
Universal LD50 predictions using deep learning
NICEATM Predictive Models for Acute Oral Systemic Toxicity LD50 entry Risa R. Sayre (sayre.risa@epa.gov) & Christopher M. Grulke Our approach uses an ensemble of multilayer perceptron regressions to predict rat acute oral LD50 values from chemical features. Features were genera...
Local-search based prediction of medical image registration error
NASA Astrophysics Data System (ADS)
Saygili, Görkem
2018-03-01
Medical image registration is a crucial task in many different medical imaging applications. Hence, a considerable amount of work has been published recently that aims to predict the error in a registration without any human effort. If provided, these error predictions can be used as feedback to the registration algorithm to further improve its performance. Recent methods generally start by extracting image-based and deformation-based features, then apply feature pooling, and finally train a Random Forest (RF) regressor to predict the real registration error. Image-based features can be calculated after applying a single registration but provide limited accuracy, whereas deformation-based features such as the variation of the deformation vector field may require up to 20 registrations, which is highly time-consuming. This paper proposes using features extracted from a local search algorithm as image-based features to estimate the error of a registration. The proposed method comprises a local search algorithm that finds corresponding voxels between registered image pairs and, based on the amount of shift and stereo confidence measures, densely predicts the registration error in millimetres using an RF regressor. Compared to other algorithms in the literature, the proposed algorithm does not require multiple registrations, can be efficiently implemented on a Graphics Processing Unit (GPU), and can still provide highly accurate error predictions in the presence of large registration errors. Experimental results with real registrations on a public dataset indicate that high accuracy is achieved by using features from the local search algorithm.
Wen, Ping-Ping; Shi, Shao-Ping; Xu, Hao-Dong; Wang, Li-Na; Qiu, Jian-Ding
2016-10-15
As one of the most important reversible types of post-translational modification, protein methylation catalyzed by methyltransferases carries out many pivotal biological functions and takes part in many essential biological processes. Identification of methylation sites is a prerequisite for decoding methylation regulatory networks in living cells and understanding their physiological roles. Experimental methods are labor-intensive and time-consuming. In silico approaches offer a cost-effective, high-throughput way to predict potential methylation sites, but previous predictors rely on a single mixed model and their prediction performance is not fully satisfactory. Recently, with the increasing availability of quantitative methylation datasets in diverse species (especially in eukaryotes), there is a growing need to develop a species-specific predictor. Here, we designed a tool named PSSMe based on an information gain (IG) feature optimization method for species-specific methylation site prediction. The IG method was adopted to analyze the importance and contribution of each feature and then to select the valuable feature dimensions to reconstitute a new ordered feature vector, which was used to build the final prediction model. Our method improves prediction accuracy by about 15% compared with single features. Furthermore, our species-specific model significantly improves predictive performance compared with other general methylation prediction tools. Hence, our prediction results serve as useful resources to elucidate the mechanism of arginine or lysine methylation and facilitate hypothesis-driven experimental design and validation. The online service is implemented in C# and is freely available at http://bioinfo.ncu.edu.cn/PSSMe.aspx. Contact: jdqiu@ncu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
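The information gain ranking described in this abstract can be illustrated with a small sketch. It assumes discrete feature values and class labels, and the function names (`information_gain`, `rank_features`) are hypothetical; it is not the PSSMe implementation, whose feature encoding is not given here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names ordered from highest to lowest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)
```

Keeping only the top-ranked names from `rank_features` gives a reduced, ordered feature vector of the kind the abstract describes.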
Perceptual quality prediction on authentically distorted images using a bag of features approach
Ghadiyaram, Deepti; Bovik, Alan C.
2017-01-01
Current top-performing blind perceptual image quality prediction models are generally trained on legacy databases of human quality opinion scores on synthetically distorted images. Therefore, they learn image features that effectively predict human visual quality judgments of inauthentic and usually isolated (single) distortions. However, real-world images usually contain complex composite mixtures of multiple distortions. We study the perceptually relevant natural scene statistics of such authentically distorted images in different color spaces and transform domains. We propose a “bag of feature maps” approach that avoids assumptions about the type of distortion(s) contained in an image and instead focuses on capturing consistencies—or departures therefrom—of the statistics of real-world images. Using a large database of authentically distorted images, human opinions of them, and bags of features computed on them, we train a regressor to conduct image quality prediction. We demonstrate the competence of the features toward improving automatic perceptual quality prediction by testing a learned algorithm using them on a benchmark legacy database as well as on a newly introduced distortion-realistic resource called the LIVE In the Wild Image Quality Challenge Database. We extensively evaluate the perceptual quality prediction model and algorithm and show that it is able to achieve good-quality prediction power that is better than other leading models. PMID:28129417
Biomechanics of Atherosclerotic Coronary Plaque: Site, Stability and In Vivo Elasticity Modeling
Ohayon, Jacques; Finet, Gérard; Le Floc’h, Simon; Cloutier, Guy; Gharib, Ahmed M.; Heroux, Julie; Pettigrew, Roderic I.
2016-01-01
Coronary atheroma develop in local sites that are widely variable among patients and are considerably variable in their vulnerability for rupture. This article summarizes studies conducted by our collaborative laboratories on predictive biomechanical modeling of coronary plaques. It aims to give insights into the role of biomechanics in the development and localization of atherosclerosis, the morphologic features that determine vulnerable plaque stability, and emerging in vivo imaging techniques that may detect and characterize vulnerable plaque. Composite biomechanical and hemodynamic factors that influence the actual site of development of plaques have been studied. Plaque vulnerability, in vivo, is more challenging to assess. Important steps have been made in defining the biomechanical factors that are predictive of plaque rupture and the likelihood of this occurring if characteristic features are known. A critical key in defining plaque vulnerability is the accurate quantification of both the morphology and the mechanical properties of the diseased arteries. Recently, an early IVUS based palpography technique developed to assess local strain, elasticity and mechanical instabilities has been successfully revisited and improved to account for complex plaque geometries. This is based on an initial best estimation of the plaque components’ contours, allowing subsequent iteration for elastic modulus assessment as a basis for plaque stability determination. The improved method has also been preliminarily evaluated in patients with successful histologic correlation. Further clinical evaluation and refinement are on the horizon. PMID:24043605
Texture analysis of medical images for radiotherapy applications
Rizzo, Giovanna
2017-01-01
The high-throughput extraction of quantitative information from medical images, known as radiomics, has grown in interest due to the current necessity to quantitatively characterize tumour heterogeneity. In this context, texture analysis, consisting of a variety of mathematical techniques that can describe the grey-level patterns of an image, plays an important role in assessing the spatial organization of different tissues and organs. For these reasons, the potential of texture analysis in the context of radiotherapy has been widely investigated in several studies, especially for the prediction of the treatment response of tumour and normal tissues. Nonetheless, many different factors can affect the robustness, reproducibility and reliability of textural features, thus limiting the impact of this technique. In this review, an overview of the most recent works that have applied texture analysis in the context of radiotherapy is presented, with particular focus on the assessment of tumour and tissue response to radiation. First, the main factors that influence feature estimation are discussed, highlighting the need for more standardized image acquisition and reconstruction protocols and more accurate methods for region of interest identification. Despite all these limitations, texture analysis is increasingly demonstrating its ability to improve the characterization of intratumour heterogeneity and the prediction of clinical outcome, although prospective studies and clinical trials are required to draw a more complete picture of the full potential of this technique. PMID:27885836
Baldwin, Keith D; Brusalis, Christopher M; Nduaguba, Afamefuna M; Sankar, Wudbhav N
2016-05-04
Differentiating between septic arthritis and Lyme disease of the knee in endemic areas can be challenging and has major implications for patient management. The purpose of this study was to identify a prediction rule to differentiate septic arthritis from Lyme disease in children presenting with knee pain and effusion. We retrospectively reviewed the records of patients younger than 18 years of age with knee effusions who underwent arthrocentesis at our institution from 2005 to 2013. Patients with either septic arthritis (positive joint fluid culture or synovial white blood-cell count of >60,000 white blood cells/mm³ with negative Lyme titer) or Lyme disease (positive Lyme immunoglobulin G on Western blot analysis) were included. To avoid misclassification bias, undiagnosed knee effusions and joints with both a positive culture and positive Lyme titers were excluded. Historical, clinical, and laboratory data were compared between groups to identify variables for comparison. Binary logistic regression analysis was used to identify independent predictive variables. One hundred and eighty-nine patients were studied: 23 with culture-positive septic arthritis, 26 with culture-negative septic arthritis, and 140 with Lyme disease. Multivariate binary logistic regression identified pain with short arc motion, history of fever reported by the patient or a family member, C-reactive protein of >4 mg/L, and age younger than 2 years as independent predictive factors for septic arthritis. A simpler model was developed that showed that the risk of septic arthritis with none of these factors was 2%, with 1 of these factors was 18%, with 2 of these factors was 45%, with 3 of these factors was 84%, or with all 4 of these factors was 100%.
Although septic arthritis of the knee and Lyme monoarthritis share common features that can make them difficult to distinguish clinically, the presence of pain with short arc motion, C-reactive protein of >4.0 mg/L, patient-reported history of fever, and age younger than 2 years were independent predictive factors of septic arthritis in pediatric patients. The more factors that are present, the higher the risk of having septic arthritis. Diagnostic Level III.
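The simplified prediction rule reported above is a count of four binary factors mapped to an observed risk. A minimal sketch of that lookup follows; the function and parameter names are hypothetical, the thresholds and percentages are the ones stated in the abstract, and the code is an illustration of the arithmetic only.

```python
# Risk of septic arthritis by number of factors present, as reported in the abstract.
RISK_BY_FACTOR_COUNT = {0: 0.02, 1: 0.18, 2: 0.45, 3: 0.84, 4: 1.00}


def septic_arthritis_risk(pain_short_arc, fever_history, crp_mg_per_l, age_years):
    """Return (number of factors present, reported risk for that count)."""
    factors = [
        bool(pain_short_arc),   # pain with short arc motion
        bool(fever_history),    # fever reported by patient or family member
        crp_mg_per_l > 4.0,     # C-reactive protein above 4 mg/L
        age_years < 2,          # age younger than 2 years
    ]
    count = sum(factors)
    return count, RISK_BY_FACTOR_COUNT[count]
```

For example, a child with short-arc pain and a C-reactive protein of 5.0 mg/L, but no fever history and aged 8 years, has two factors, which corresponds to the reported 45% risk.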
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yip, S; Coroller, T; Rios Velazquez, E
Purpose: Although PET-based radiomic features have been proposed to quantify tumor heterogeneity and shown promise in outcome prediction, little is known about their relationship with tumor genetics. This study assessed the association of [18F]fluorodeoxyglucose (FDG)-PET-based radiomic features with non-small cell lung cancer (NSCLC) mutations. Methods: 348 NSCLC patients underwent FDG-PET/CT scans before treatment and were tested for genetic mutations. 13% (44/348) and 28% (96/348) of patients were found to harbor EGFR (EGFR+) and KRAS (KRAS+) mutations, respectively. We evaluated nineteen PET-based radiomic features quantifying phenotypic traits, and compared them with conventional PET features (metabolic tumor volume (MTV) and maximum-SUV). The association between the feature values and mutation status was evaluated using the Wilcoxon rank-sum test. The ability of each measure to predict mutations was assessed by the area under the receiver operating characteristic curve (AUC). Noether's test was used to determine if the AUCs were significantly different from random (AUC=0.50). All p-values were corrected for multiple testing by controlling the false discovery rate (FDR_Wilcoxon and FDR_Noether) at 10%. Results: Eight radiomic features, MTV, and maximum-SUV were significantly associated with the EGFR mutation (FDR_Wilcoxon=0.01–0.10). However, KRAS+ demonstrated no significantly distinctive imaging features compared to KRAS− (FDR_Wilcoxon≥0.92). EGFR+ and EGFR− were significantly discriminated by conventional PET features (AUC=0.61, FDR_Noether=0.04 for MTV and AUC=0.64, FDR_Noether=0.01 for maximum-SUV). Eight radiomic features were significantly predictive for EGFR+ compared to EGFR− (AUC=0.59–0.67, FDR_Noether=0.0032–0.09). Normalized-inverse-difference-moment outperformed all features in predicting EGFR mutation (AUC=0.67, FDR_Noether=0.0032).
Moreover, only the radiomic feature normalized-inverse-difference-moment could significantly predict KRAS+ from EGFR+ (AUC=0.65, FDR_Noether=0.05). All measures failed to predict KRAS+ from KRAS− (AUC=0.50–0.54, FDR_Noether≥0.92). Conclusion: PET imaging features were strongly associated with EGFR mutations in NSCLC. Radiomic features have great potential in predicting EGFR mutations. Our study may help develop a non-invasive imaging biomarker for EGFR mutation. R.M. has consulting interests with Amgen.
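The two statistics used in this abstract are closely linked: the AUC of a single feature equals the Mann-Whitney (Wilcoxon rank-sum) U statistic divided by the number of positive-negative pairs. A minimal sketch of that relationship follows; the function name and the example values are hypothetical, and the FDR correction and Noether's test are not reproduced.

```python
def auc_mann_whitney(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, which matches U / (n_pos * n_neg).
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical feature values for mutation-positive and mutation-negative tumours.
example_auc = auc_mann_whitney([0.9, 0.7, 0.6], [0.8, 0.4, 0.3])
```

An AUC of 0.50 means the feature orders positive and negative cases no better than chance, which is the null value the abstract tests against.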
Ramírez-Venegas, Alejandra; Sansores, Raul H; Quintana-Carrillo, Roger H; Velázquez-Uncal, Monica; Hernandez-Zenteno, Rafael J; Sánchez-Romero, Candelaria; Velazquez-Montero, Alejandra; Flores-Trujillo, Fernando
2014-11-01
Biomass exposure is an important risk factor for chronic obstructive pulmonary disease (COPD). However, the time-course behavior of FEV1 in subjects exposed to biomass is unknown. We undertook this study to determine the FEV1 rate decline in subjects exposed to biomass. Pulmonary function was assessed every year in a Mexican cohort of patients with COPD associated with biomass or tobacco during a 15-year follow-up period. The mean rate of decline was significantly lower for the biomass exposure COPD group (BE-COPD) than for the tobacco smoke COPD group (TS-COPD) (23 vs. 42 ml, respectively; P < 0.01). Of the TS-COPD group, 11% were rapid decliners, whereas only one rapid decliner was found in the BE-COPD group; 69 and 21% of smokers versus 17 and 83% of the BE-COPD group were slow decliners and sustainers, respectively. A higher FEV1 both as % predicted and milliliters was a predictive factor for decline for BE-COPD and TS-COPD, whereas reversibility to bronchodilator was a predictive factor for both groups when adjusted by FEV1% predicted and only for the TS-COPD group when adjusted by milliliters. In the biomass exposure COPD group the rate of FEV1 decline is slower and shows a more homogeneous rate of decline over time in comparison with smokers. The rapid rate of FEV1 decline is a rare feature of biomass-induced airflow limitation.
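A per-patient rate of FEV1 decline can be estimated as the slope of yearly measurements against time. The abstract does not state how the rates were estimated, so the ordinary least-squares fit below is an assumption, and the function name and example values are hypothetical.

```python
def decline_rate_ml_per_year(years, fev1_ml):
    """Least-squares slope of FEV1 (ml) over time, returned as a positive decline rate."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(fev1_ml) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, fev1_ml))
    # A falling FEV1 gives a negative slope; negate it to report a decline.
    return -sxy / sxx


# Hypothetical patient losing 23 ml per year over four annual visits.
rate = decline_rate_ml_per_year([0, 1, 2, 3], [2000, 1977, 1954, 1931])
```

Averaging such slopes within each exposure group would give group-level figures comparable in form to the 23 and 42 ml values quoted above.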
Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi
2014-01-01
Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.
20180411 - Universal LD50 predictions using deep learning (ICCVAM)
NICEATM Predictive Models for Acute Oral Systemic Toxicity LD50 entry Risa R. Sayre (sayre.risa@epa.gov) & Christopher M. Grulke Our approach uses an ensemble of multilayer perceptron regressions to predict rat acute oral LD50 values from chemical features. Features were gene...
Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.
Funk, Christopher S; Kahanda, Indika; Ben-Hur, Asa; Verspoor, Karin M
2015-01-01
Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature based features are useful for predicting human protein function (F-max: Molecular Function =0.408, Biological Process =0.461, Cellular Component =0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated.
Tadayyon, Hadi; Sannachi, Lakshmanan; Gangeh, Mehrdad J.; Kim, Christina; Ghandi, Sonal; Trudeau, Maureen; Pritchard, Kathleen; Tran, William T.; Slodkowska, Elzbieta; Sadeghi-Naini, Ali; Czarnota, Gregory J.
2017-01-01
Quantitative ultrasound (QUS) can probe tissue structure and analyze tumour characteristics. Using a 6-MHz ultrasound system, radiofrequency data were acquired from 56 locally advanced breast cancer patients prior to their neoadjuvant chemotherapy (NAC) and QUS texture features were computed from regions of interest in tumour cores and their margins as potential predictive and prognostic indicators. Breast tumour molecular features were also collected and used for analysis. A multiparametric QUS model was constructed, which demonstrated a response prediction accuracy of 88% and ability to predict patient 5-year survival rates (p = 0.01). QUS features demonstrated superior performance in comparison to molecular markers and the combination of QUS and molecular markers did not improve response prediction. This study demonstrates, for the first time, that non-invasive QUS features in the core and margin of breast tumours can indicate breast cancer response to neoadjuvant chemotherapy (NAC) and predict five-year recurrence-free survival. PMID:28401902
Tadayyon, Hadi; Sannachi, Lakshmanan; Gangeh, Mehrdad J; Kim, Christina; Ghandi, Sonal; Trudeau, Maureen; Pritchard, Kathleen; Tran, William T; Slodkowska, Elzbieta; Sadeghi-Naini, Ali; Czarnota, Gregory J
2017-04-12
Quantitative ultrasound (QUS) can probe tissue structure and analyze tumour characteristics. Using a 6-MHz ultrasound system, radiofrequency data were acquired from 56 locally advanced breast cancer patients prior to their neoadjuvant chemotherapy (NAC) and QUS texture features were computed from regions of interest in tumour cores and their margins as potential predictive and prognostic indicators. Breast tumour molecular features were also collected and used for analysis. A multiparametric QUS model was constructed, which demonstrated a response prediction accuracy of 88% and ability to predict patient 5-year survival rates (p = 0.01). QUS features demonstrated superior performance in comparison to molecular markers and the combination of QUS and molecular markers did not improve response prediction. This study demonstrates, for the first time, that non-invasive QUS features in the core and margin of breast tumours can indicate breast cancer response to neoadjuvant chemotherapy (NAC) and predict five-year recurrence-free survival.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, R; Aguilera, T; Shultz, D
2014-06-15
Purpose: This study aims to develop predictive models of patient outcome by extracting advanced imaging features (i.e., Radiomics) from FDG-PET images. Methods: We acquired pre-treatment PET scans for 51 stage I NSCLC patients treated with SABR. We calculated 139 quantitative features from each patient PET image, including 5 morphological features, 8 statistical features, 27 texture features, and 100 features from the intensity-volume histogram. Based on the imaging features, we aim to distinguish between 2 risk groups of patients: those with regional failure or distant metastasis versus those without. We investigated 3 pattern classification algorithms: linear discriminant analysis (LDA), naive Bayes (NB), and logistic regression (LR). To avoid the curse of dimensionality, we performed feature selection by first removing redundant features and then applying sequential forward selection using the wrapper approach. To evaluate the predictive performance, we performed 10-fold cross validation with 1000 random splits of the data and calculated the area under the ROC curve (AUC). Results: Feature selection identified 2 texture features (homogeneity and/or wavelet decompositions) for NB and LR, while for LDA SUVmax and one texture feature (correlation) were identified. All 3 classifiers achieved statistically significant improvements over conventional PET imaging metrics such as tumor volume (AUC = 0.668) and SUVmax (AUC = 0.737). Overall, NB achieved the best predictive performance (AUC = 0.806). This also compares favorably with MTV using the best threshold at an SUV of 11.6 (AUC = 0.746). At a sensitivity of 80%, NB achieved 69% specificity, while SUVmax and tumor volume only had 36% and 47% specificity. Conclusion: Through a systematic analysis of advanced PET imaging features, we are able to build models with improved predictive value over conventional imaging metrics. If validated in a large independent cohort, the proposed techniques could potentially aid in identifying patients who might benefit from adjuvant therapy.
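The AUC and the "specificity at a fixed sensitivity" figures quoted in this abstract can be illustrated with a minimal sketch. The scores below are invented toy values, not data from the study, and the function names are hypothetical; the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than whatever software the authors used.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive case outscores a random negative case.

    Ties count as half a win (the usual Mann-Whitney convention).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sens_spec(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when 'score >= threshold' is called positive."""
    sens = sum(s >= threshold for s in scores_pos) / len(scores_pos)
    spec = sum(s < threshold for s in scores_neg) / len(scores_neg)
    return sens, spec


# Toy classifier scores (assumed values, for illustration only).
pos = [0.9, 0.8, 0.7, 0.4, 0.6]   # patients with failure or metastasis
neg = [0.5, 0.3, 0.2, 0.6, 0.1]   # patients without

print(auc(pos, neg))
print(sens_spec(pos, neg, threshold=0.6))
```

Lowering the threshold raises sensitivity at the cost of specificity, which is why the abstract fixes sensitivity at 80% before comparing specificities across predictors.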
[The changes in spectral features of the staple-food bamboos of giant panda after flowering].
Liu, Xue-Hua; Wu, Yan
2012-12-01
Large-area flowering of the giant pandas' staple food is an important factor that can influence their survival. Therefore, it is necessary to predict bamboo flowering. Foping Nature Reserve was taken as the study area. The giant pandas' staple-food bamboos Bashania fargesii, Fargesia qinlingensis and Fargesia dracocephala were selected in different flowering situations (i.e., flowering, potentially flowering, and non-flowering at a far distance) to measure the spectral reflectance of bamboo leaves. We studied the influence of bamboo flowering on the spectral features of the three bamboo species by analyzing the original spectral reflectance and the red edge parameters. The results showed that (1) flowering changed the spectral features of the bamboo species. The spectral reflectance of B. fargesii shows the pattern: flowering bamboo < potentially flowering bamboo < non-flowering bamboo at a far distance, while F. qinlingensis and F. dracocephala show a different pattern: flowering bamboo ≥ potentially flowering bamboo > non-flowering bamboo at a far distance. Among the three bamboo species, F. dracocephala showed the greatest change, followed by F. qinlingensis. (2) After bamboo flowering, the red edge of B. fargesii shows no obvious shift, while that of the other two bamboos shifts distinctly towards shorter wavelengths. The study found that the original spectral features and the red edge both changed under the various flowering states, which can provide an experimental basis and theoretical support for the future prediction of bamboo flowering through remote sensing.
NASA Astrophysics Data System (ADS)
Arendt, C. A.; Heikoop, J. M.; Newman, B. D.; Wales, N. A.; McCaully, R. E.; Wilson, C. J.; Wullschleger, S.
2017-12-01
The geochemical evolution of Arctic regions as permafrost degrades significantly impacts nutrient availability. The release of nitrogen compounds from permafrost degradation fertilizes both microbial decomposition and plant productivity. Arctic warming promotes permafrost degradation, causing geomorphic and hydrologic transitions that have the potential to convert saturated zones to unsaturated zones and subsequently alter the nitrate production capacity of permafrost regions. Changes in nitrate (NO3-) content associated with shifting moisture regimes are a primary factor determining Arctic fertilization and subsequent primary productivity, and have direct feedbacks to carbon cycling. We have documented a broad survey of co-located soil moisture and nitrate concentration measurements in shallow active layer regions across a variety of topographic features in the expansive continuous permafrost region encompassing the Barrow Peninsula of Alaska. The topographic features of interest are slightly elevated relative to the surrounding landscape, with drier soils and elevated nitrate; they include the rims of low-centered polygons, the centers of flat- and high-centered polygons, the rims of young, old and ancient drained thaw lake basins, and the drainage slopes that exist across the landscape. With this information, we model the nitrate inventory of the Barrow Peninsula using multiple geospatial approaches to estimate the total area covered by unsaturated features of interest, and further predict how various drying scenarios increase the magnitude of nitrate produced in degrading permafrost regions across the Arctic. This work is supported by the US Department of Energy Next Generation Ecosystem Experiment, NGEE-Arctic.
Fuzzy-Trace Theory and Lifespan Cognitive Development
Brainerd, C J.; Reyna, Valerie F.
2015-01-01
Fuzzy-trace theory (FTT) emphasizes the use of core theoretical principles, such as the verbatim-gist distinction, to predict new findings about cognitive development that are counterintuitive from the perspective of other theories or of common sense. To the extent that such predictions are confirmed, the range of phenomena that are explained expands without increasing the complexity of the theory's assumptions. We examine research on recent examples of such predictions during four epochs of cognitive development: childhood, adolescence, young adulthood, and late adulthood. During the first two, the featured predictions are surprising developmental reversals in false memory (childhood) and in risky decision making (adolescence). During young adulthood, FTT predicts that a retrieval operation that figures centrally in dual-process theories of memory, recollection, is bivariate rather than univariate. During late adulthood, FTT identifies a retrieval operation, reconstruction, that has been omitted from current theories of normal memory declines in aging and pathological declines in dementia. The theory predicts that reconstruction is a major factor in such declines and that it is able to forecast future dementia. PMID:26644632
Fuzzy-Trace Theory and Lifespan Cognitive Development.
Brainerd, C J; Reyna, Valerie F
2015-12-01
Fuzzy-trace theory (FTT) emphasizes the use of core theoretical principles, such as the verbatim-gist distinction, to predict new findings about cognitive development that are counterintuitive from the perspective of other theories or of common sense. To the extent that such predictions are confirmed, the range of phenomena that are explained expands without increasing the complexity of the theory's assumptions. We examine research on recent examples of such predictions during four epochs of cognitive development: childhood, adolescence, young adulthood, and late adulthood. During the first two, the featured predictions are surprising developmental reversals in false memory (childhood) and in risky decision making (adolescence). During young adulthood, FTT predicts that a retrieval operation that figures centrally in dual-process theories of memory, recollection, is bivariate rather than univariate. During late adulthood, FTT identifies a retrieval operation, reconstruction, that has been omitted from current theories of normal memory declines in aging and pathological declines in dementia. The theory predicts that reconstruction is a major factor in such declines and that it is able to forecast future dementia.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhong, H; Wang, J; Shen, L
Purpose: The purpose of this study is to investigate the relationship between computed tomographic (CT) texture features of primary lesions and metastasis-free survival for rectal cancer patients, and to develop a data-mining prediction model using texture features. Methods: A total of 220 rectal cancer patients treated with neoadjuvant chemo-radiotherapy (CRT) were enrolled in this study. All patients underwent CT scans before CRT. The primary lesions on the CT images were delineated by two experienced oncologists. The CT images were filtered by Laplacian of Gaussian (LoG) filters with different filter values (1.0–2.5: from fine to coarse). Both filtered and unfiltered images were analyzed using Gray-level Co-occurrence Matrix (GLCM) texture analysis with different directions (transversal, sagittal, and coronal). In total, 270 texture features with different species, directions and filter values were extracted. Texture features were examined with Student’s t-test for selecting predictive features. Principal Component Analysis (PCA) was performed upon the selected features to reduce the feature collinearity. Artificial neural network (ANN) and logistic regression were applied to establish metastasis prediction models. Results: Forty-six of 220 patients developed metastasis with a follow-up time of more than 2 years. Sixty-seven texture features differed significantly in the t-test (p<0.05) between patients with and without metastasis, and 12 of them were extremely significant (p<0.001). The area under the curve (AUC) of ANN was 0.72, and the concordance index (CI) of logistic regression was 0.71. The predictability of ANN was slightly better than that of logistic regression. Conclusion: CT texture features of primary lesions are related to metastasis-free survival of rectal cancer patients. Both ANN and logistic regression based models can be developed for prediction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mathavan, T., E-mail: tjmathavan@gmail.com; Divya, A.; Benial, A. Milton Franklin
2016-05-23
Polyaniline (PANI) and its composites PANI-ZnO (zinc oxide) and PANI-ZnO-GO (graphene oxide) were successfully constructed. These materials were characterized by the electron spin resonance (ESR) technique and ultraviolet-visible spectrometry. Parameters such as line width, g-factor and spin concentration were deduced from the ESR spectra; from these results, the radical cation stabilization of PANI, PANI-ZnO and PANI-ZnO-GO composites was compared in terms of polaron and bipolaron formation. The absorption features obtained in the UV absorption spectra reveal the band gap of these modified PANI composites and also predict the increasing and decreasing trends in signal intensity and spin concentration.
NASA Astrophysics Data System (ADS)
Mathavan, T.; Divya, A.; Archana, J.; Ramasubbu, A.; Benial, A. Milton Franklin; Jothirajan, M. A.
2016-05-01
Polyaniline (PANI) and its composites PANI-ZnO (zinc oxide) and PANI-ZnO-GO (graphene oxide) were successfully constructed. These materials were characterized by the electron spin resonance (ESR) technique and ultraviolet-visible spectrometry. Parameters such as line width, g-factor and spin concentration were deduced from the ESR spectra; from these results, the radical cation stabilization of PANI, PANI-ZnO and PANI-ZnO-GO composites was compared in terms of polaron and bipolaron formation. The absorption features obtained in the UV absorption spectra reveal the band gap of these modified PANI composites and also predict the increasing and decreasing trends in signal intensity and spin concentration.
Schrodi, Steven J.; Mukherjee, Shubhabrata; Shan, Ying; Tromp, Gerard; Sninsky, John J.; Callear, Amy P.; Carter, Tonia C.; Ye, Zhan; Haines, Jonathan L.; Brilliant, Murray H.; Crane, Paul K.; Smelser, Diane T.; Elston, Robert C.; Weeks, Daniel E.
2014-01-01
Translation of results from genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically-relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well-understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued in this area with an ongoing accumulation of studies that identify disease predisposing genotypes. Several statistical approaches with the aim of predicting disease have been published. Here we summarize the current state of disease susceptibility mapping and pharmacogenetics efforts for risk prediction, describe methods used to construct and evaluate genetic-based predictive models, and discuss applications. PMID:24917882
Breast dosimetry in clinical mammography
NASA Astrophysics Data System (ADS)
Benevides, Luis Alberto Do Rego
The objective of this study was to show that a clinical dosimetry protocol that utilizes a dosimetric breast phantom series based on population anthropometric measurements can reliably predict the average glandular dose (AGD) imparted to the patient during a routine screening mammogram. In the study, AGD was calculated using entrance skin exposure and dose conversion factors based on fibroglandular content, compressed breast thickness, mammography unit parameters and modifying parameters for homogeneous phantom (phantom factor), compressed breast lateral dimensions (volume factor) and anatomical features (anatomical factor). The protocol proposes the use of a fiber-optic coupled dosimeter (FOCD) or Metal Oxide Semiconductor Field Effect Transistor (MOSFET) dosimeter to measure the entrance skin exposure at the time of the mammogram without interfering with diagnostic information of the mammogram. The study showed that the FOCD had sensitivity with less than 7% energy dependence, was linear in all tube current-time product stations, and was reproducible within 2%. The FOCD was superior to the MOSFET dosimeter in sensitivity, reusability, and reproducibility. The patient fibroglandular content was evaluated using a calibrated modified breast tissue equivalent homogeneous phantom series (BRTES-MOD) designed from anthropomorphic measurements of a screening mammography population and whose elemental composition was referenced to International Commission on Radiation Units and Measurements Report 44 tissues. The patient fibroglandular content and compressed breast thickness, along with unit parameters and spectrum half-value layer, were used to derive the currently used dose conversion factor (DgN). The study showed that the use of a homogeneous phantom, patient compressed breast lateral dimensions and patient anatomical features can affect AGD by as much as 12%, 3% and 1%, respectively. The protocol was found to be superior to existing methodologies. 
In addition, the study population anthropometric measurements enabled the development of analytical equations to calculate the whole breast area, estimate for the skin layer thickness and optimal location for automatic exposure control ionization chamber. The clinical dosimetry protocol developed in this study can reliably predict the AGD imparted to an individual patient during a routine screening mammogram.
Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu
2012-12-01
Membrane proteins are encoded by ~30% of the genome and play important functional roles in living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desirable to predict a membrane protein's subcellular location from the primary sequence, considering the extreme difficulties of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features simultaneously increases the information redundancy, which can in turn degrade the final prediction accuracy. That is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. All the experimental results obtained with different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems. 
The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
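The contrast between serial and parallel fusion described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract says samples are represented "in complex spaces", and the sketch assumes the common construction in which one feature view becomes the real part and the other the imaginary part, with zero-padding when the views differ in length. The function names are hypothetical and the GPCA step is not shown.

```python
def serial_fusion(view_a, view_b):
    """Serial strategy: concatenate the two views into one longer real vector."""
    return list(view_a) + list(view_b)


def parallel_fusion(view_a, view_b):
    """Parallel strategy (assumed form): combine the views as a + i*b.

    The shorter view is zero-padded so both have the same length; the fused
    vector has max(len(a), len(b)) complex entries instead of len(a) + len(b)
    real ones.
    """
    n = max(len(view_a), len(view_b))
    a = list(view_a) + [0.0] * (n - len(view_a))
    b = list(view_b) + [0.0] * (n - len(view_b))
    return [complex(x, y) for x, y in zip(a, b)]


# Two toy feature views of the same protein (invented values).
composition_view = [0.2, 0.5, 0.3]
profile_view = [1.0, 4.0]

print(serial_fusion(composition_view, profile_view))    # 5 real entries
print(parallel_fusion(composition_view, profile_view))  # 3 complex entries
```

The design point is dimensionality: the parallel form does not grow with the sum of the view lengths, which is one way to limit the redundancy the abstract attributes to serial concatenation.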
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Shiju; Qian, Wei; Guan, Yubao
2016-06-15
Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLC patients by integrating oversampling, feature selection, and score fusion techniques and to develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061, when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled RBFN based classifier to yield improved prediction accuracy.
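The oversampling step named in this abstract (SMOTE) addresses the imbalance between the 74 recurrence-free and 20 recurrence cases. A minimal sketch of the core idea follows: each synthetic minority sample is a random interpolation between a minority sample and one of its nearest minority neighbours. The function name, the choice of `k`, and the toy points are assumptions for illustration; this is not the authors' implementation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    minority: list of feature vectors (lists of floats), at least 2 samples.
    k: number of nearest minority neighbours considered for each base sample.
    """
    rng = random.Random(seed)

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base sample (excluding itself).
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features per case (invented values).
recurrence_cases = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
new_cases = smote(recurrence_cases, n_new=5)
print(len(new_cases))
```

Because the synthetic points lie on segments between real minority samples, they stay inside the region the minority class already occupies. Synthetic samples should be generated from training folds only, so that held-out cases are never interpolated into the training data.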
Osteosarcoma: Diagnostic dilemmas in histopathology and prognostic factors
Wadhwa, Neelam
2014-01-01
Osteosarcoma (OS), the commonest malignancy of osteoarticular origin, is a very aggressive neoplasm. Divergent histologic differentiation is common in OS; hence a triple diagnostic approach is essential in all cases. About 20% of cases are atypical owing to lack of concurrence among clinicoradiologic and pathologic features, necessitating resampling. Recognition of specific anatomic and histologic variants is essential in view of better outcome. Traditional prognostic factors of OS do stratify patients for short-term outcome, but often fail to predict their long-term outcome. Considering the negligible improvement in patient outcome during the last 20 years, the search for novel prognostic factors is in progress; ezrin, vascular endothelial growth factor, chemokine receptors, and dysregulation of various micro ribonucleic acids are potentially promising. Their utility needs to be validated by long-term follow-up studies before they are incorporated into routine clinical practice. PMID:24932029
Factors influencing protein tyrosine nitration – structure-based predictive models
Bayden, Alexander S.; Yakovlev, Vasily A.; Graves, Paul R.; Mikkelsen, Ross B.; Kellogg, Glen E.
2010-01-01
Models for exploring tyrosine nitration in proteins have been created based on 3D structural features of 20 proteins for which high resolution X-ray crystallographic or NMR data are available and for which nitration of 35 total tyrosines has been experimentally proven under oxidative stress. Factors suggested in previous work to enhance nitration were examined with quantitative structural descriptors. The role of neighboring acidic and basic residues is complex: for the majority of tyrosines that are nitrated the distance to the heteroatom of the closest charged sidechain corresponds to the distance needed for suspected nitrating species to form hydrogen bond bridges between the tyrosine and that charged amino acid. This suggests that such bridges play a very important role in tyrosine nitration. Nitration is generally hindered for tyrosines that are buried and for those tyrosines where there is insufficient space for the nitro group. For in vitro nitration, closed environments with nearby heteroatoms or unsaturated centers that can stabilize radicals are somewhat favored. Four quantitative structure-based models, depending on the conditions of nitration, have been developed for predicting site-specific tyrosine nitration. The best model, relevant for both in vitro and in vivo cases predicts 30 of 35 tyrosine nitrations (positive predictive value) and has a sensitivity of 60/71 (11 false positives). PMID:21172423
Okubo, Hidenori; Ohori, Makoto; Ohno, Yoshio; Nakashima, Jun; Inoue, Rie; Nagao, Toshitaka; Tachibana, Masaaki
2014-05-01
To develop a nomogram based on postoperative factors and prostate-specific antigen levels to predict the non-biochemical recurrence rate after radical prostatectomy in a Japanese cohort. A total of 606 Japanese patients with T1-3N0M0 prostate cancer who underwent radical prostatectomy and pelvic lymph node dissection at Tokyo Medical University hospital from 2000 to 2010 were studied. A nomogram was constructed based on Cox hazard regression analysis evaluating the prognostic significance of serum prostate-specific antigen and pathological factors in the radical prostatectomy specimens. The discriminating ability of the nomogram was assessed by the concordance index (C-index), and the predicted and actual outcomes were compared with a bootstrapped calibration plot. With a mean follow-up of 60.0 months, a total of 187 patients (30.9%) experienced biochemical recurrence, with a 5-year non-biochemical recurrence rate of 72.3%. Based on a Cox hazard regression model, a nomogram was constructed to predict non-biochemical recurrence using serum prostate-specific antigen level and pathological features in radical prostatectomy specimens. The concordance index was 0.77, and the calibration plots appeared to be accurate. The postoperative nomogram described here can provide valuable information regarding the need for adjuvant/salvage radiation or hormonal therapy in patients after radical prostatectomy.
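The concordance index used here to assess the nomogram can be sketched with Harrell's pairwise definition: among comparable patient pairs, the fraction in which the patient with the higher predicted risk recurs earlier. The data below are invented, the function name is hypothetical, and this simplified version ignores tied event times; it is not the authors' code.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified sketch).

    times:  follow-up time per patient
    events: 1 if recurrence was observed, 0 if censored
    risks:  predicted risk score (higher = earlier recurrence expected)
    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Toy cohort (invented): months to recurrence or censoring.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

print(concordance_index(times, events, [0.9, 0.7, 0.5, 0.1]))  # perfectly ordered risks
print(concordance_index(times, events, [0.9, 0.1, 0.5, 0.7]))  # partly misordered risks
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.77 in context.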
Using multiscale texture and density features for near-term breast cancer risk analysis
Sun, Wenqing; Tseng, Tzu-Liang (Bill); Qian, Wei; Zhang, Jianying; Saltzstein, Edward C.; Zheng, Bin; Lure, Fleming; Yu, Hui; Zhou, Shi
2015-01-01
Purpose: To help improve efficacy of screening mammography by eventually establishing a new optimal personalized screening paradigm, the authors investigated the potential of using the quantitative multiscale texture and density feature analysis of digital mammograms to predict near-term breast cancer risk. Methods: The authors’ dataset includes digital mammograms acquired from 340 women. Among them, 141 were positive and 199 were negative/benign cases. The negative digital mammograms acquired from the “prior” screening examinations were used in the study. Based on the intensity value distributions, five subregions at different scales were extracted from each mammogram. Five groups of features, including density and texture features, were developed and calculated on every one of the subregions. Sequential forward floating selection was used to search for the effective combinations. Using the selected features, a support vector machine (SVM) was optimized using a tenfold validation method to predict the risk of each woman having image-detectable cancer in the next sequential mammography screening. The area under the receiver operating characteristic curve (AUC) was used as the performance assessment index. Results: From a total number of 765 features computed from multiscale subregions, an optimal feature set of 12 features was selected. Applying this feature set, a SVM classifier yielded performance of AUC = 0.729 ± 0.021. The positive predictive value was 0.657 (92 of 140) and the negative predictive value was 0.755 (151 of 200). Conclusions: The study results demonstrated a moderately high positive association between risk prediction scores generated by the quantitative multiscale mammographic image feature analysis and the actual risk of a woman having an image-detectable breast cancer in the next subsequent examinations. PMID:26127038
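The predictive values in this abstract follow directly from the reported counts, and the arithmetic can be checked with a short sketch. The confusion-matrix cells below are derived from the fractions quoted in the abstract (92 of 140 and 151 of 200); the sensitivity and specificity are derived here for illustration and are not figures reported in the abstract. The function name is hypothetical.

```python
def diagnostic_summary(tp, fp, tn, fn):
    """Return PPV, NPV, sensitivity and specificity from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # correct among positive predictions
        "npv": tn / (tn + fn),          # correct among negative predictions
        "sensitivity": tp / (tp + fn),  # detected among true positives
        "specificity": tn / (tn + fp),  # cleared among true negatives
    }


# Counts implied by the abstract: 92 of 140 positive predictions and
# 151 of 200 negative predictions were correct.
tp, fp = 92, 140 - 92
tn, fn = 151, 200 - 151

summary = diagnostic_summary(tp, fp, tn, fn)
print({name: round(value, 3) for name, value in summary.items()})

# Row totals recover the stated case mix: 141 positive, 199 negative.
print(tp + fn, tn + fp)
```

The derived totals (141 positive and 199 negative cases) match the dataset description, so the reported PPV of 0.657 and NPV of 0.755 are internally consistent.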
Feature maps driven no-reference image quality prediction of authentically distorted images
NASA Astrophysics Data System (ADS)
Ghadiyaram, Deepti; Bovik, Alan C.
2015-03-01
Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.
Ryan, J E; Warrier, S K; Lynch, A C; Ramsay, R G; Phillips, W A; Heriot, A G
2016-03-01
Approximately 20% of patients treated with neoadjuvant chemoradiotherapy (nCRT) for locally advanced rectal cancer achieve a pathological complete response (pCR) while the remainder derive the benefit of improved local control and downstaging and a small proportion show a minimal response. The ability to predict which patients will benefit would allow for improved patient stratification directing therapy to those who are likely to achieve a good response, thereby avoiding ineffective treatment in those unlikely to benefit. A systematic review of the English language literature was conducted to identify pathological factors, imaging modalities and molecular factors that predict pCR following chemoradiotherapy. PubMed, MEDLINE and Cochrane Database searches were conducted with the following keywords and MeSH search terms: 'rectal neoplasm', 'response', 'neoadjuvant', 'preoperative chemoradiation', 'tumor response'. After review of title and abstracts, 85 articles addressing the prediction of pCR were selected. Clear methods to predict pCR before chemoradiotherapy have not been defined. Clinical and radiological features of the primary cancer have limited ability to predict response. Molecular profiling holds the greatest potential to predict pCR but adoption of this technology will require greater concordance between cohorts for the biomarkers currently under investigation. At present no robust markers of the prediction of pCR have been identified and the topic remains an area for future research. This review critically evaluates existing literature providing an overview of the methods currently available to predict pCR to nCRT for locally advanced rectal cancer. The review also provides a comprehensive comparison of the accuracy of each modality. Colorectal Disease © 2015 The Association of Coloproctology of Great Britain and Ireland.
A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
Bollegala, Danushka; Kontonatsios, Georgios; Ananiadou, Sophia
2015-01-01
Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)—a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. 
Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks. PMID:26030738
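The intrinsic features described in this abstract (character n-grams extracted from the term itself) can be illustrated with a minimal sketch. It is not the authors' method: it omits the extrinsic context features, the PVP projection, and the PLSR mapping, and only compares surface n-gram counts with cosine similarity. The function names, the trigram default, and the boundary markers are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n=3):
    """Count the character n-grams of a term (its 'intrinsic' features)."""
    # The ^ and $ boundary markers are an assumption of this sketch.
    padded = f"^{term.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    source = char_ngrams("insulin")
    for candidate in ("insuline", "hépatite"):
        print(candidate, round(cosine(source, char_ngrams(candidate)), 3))
```

Surface similarity of this kind only helps for language pairs that share a script and cognates, which is presumably one reason the abstract pairs it with context-based features and a learned cross-lingual mapping.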
Güssregen, Stefan; Matter, Hans; Hessler, Gerhard; Müller, Marco; Schmidt, Friedemann; Clark, Timothy
2012-09-24
Current 3D-QSAR methods such as CoMFA or CoMSIA make use of classical force-field approaches for calculating molecular fields. Thus, they cannot adequately account for noncovalent interactions involving halogen atoms like halogen bonds or halogen-π interactions. These deficiencies in the underlying force fields result from the lack of treatment of the anisotropy of the electron density distribution of those atoms, known as the "σ-hole", although recent developments have begun to take specific interactions such as halogen bonding into account. We have now replaced classical force field derived molecular fields by local properties such as the local ionization energy, local electron affinity, or local polarizability, calculated using quantum-mechanical (QM) techniques that do not suffer from the above limitation for 3D-QSAR. We first investigate the characteristics of QM-based local property fields to show that they are suitable for statistical analyses after suitable pretreatment. We then analyze these property fields with partial least-squares (PLS) regression to predict biological affinities of two data sets comprising factor Xa and GABA-A/benzodiazepine receptor ligands. While the resulting models perform equally well or even slightly better in terms of consistency and predictivity than the classical CoMFA fields, the most important aspect of these augmented field-types is that the chemical interpretation of resulting QM-based property field models reveals unique SAR trends driven by electrostatic and polarizability effects, which cannot be extracted directly from CoMFA electrostatic maps. Within the factor Xa set, the interaction of chlorine and bromine atoms with a tyrosine side chain in the protease S1 pocket is correctly predicted. 
Within the GABA-A/benzodiazepine ligand data set, PLS models of high predictivity resulted for our QM-based property fields, providing novel insights into key features of the SAR for two receptor subtypes and cross-receptor selectivity of the ligands. The detailed interpretation of regression models derived using improved QM-derived property fields thus provides a significant advantage by revealing chemically meaningful correlations with biological activity and helps in understanding novel structure-activity relationship features. This will allow such knowledge to be used to design novel molecules on the basis of interactions additional to steric and hydrogen-bonding features.
Furlanello, Cesare; Serafini, Maria; Merler, Stefano; Jurman, Giuseppe
2003-11-06
We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
2014-01-01
The linear algebraic concept of a subspace plays a significant role in recent techniques of spectrum estimation. In this article, the authors utilize the noise subspace concept to find hidden periodicities in DNA sequences. With the vast growth of genomic sequences, the demand to accurately identify the protein-coding regions in DNA is rising. Several DNA feature extraction techniques spanning various fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this unique feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions, completely eliminating background noise. The proposed method is compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method. PMID:24386895
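The 3-base periodicity mentioned in this abstract can be made concrete with a small sketch of the baseline approach, a sliding DFT evaluated at frequency 1/3. This is not the least-norm subspace estimator the paper develops. The binary indicator sequences per base, the function names, and the default window and step sizes are assumptions chosen for illustration.

```python
import cmath


def period3_power(seq):
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of each base's indicator sequence."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the 0/1 indicator sequence of this base at f = 1/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=3):
    """Return (start, power) pairs; peaks hint at 3-base periodicity in that window."""
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    print(period3_power("ATG" * 30))  # strongly periodic toy sequence
    print(period3_power("A" * 90))    # no period-3 structure
```

A window length that is a multiple of three keeps the frequency 1/3 on the DFT grid; real coding regions give far noisier peaks than the toy sequence above, which is the background noise the subspace method is reported to suppress.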
Zafar, Raheel; Dass, Sarat C; Malik, Aamir Saeed
2017-01-01
Electroencephalogram (EEG)-based decoding of human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain-computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, a convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features, and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with currently recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is currently the most popular feature extraction and prediction method. This method showed an accuracy of 65.7%. However, the proposed method predicts novel data with an improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method.
Quantifying female bodily attractiveness by a statistical analysis of body measurements.
Gründl, Martin; Eisenmann-Klein, Marita; Prantl, Lukas
2009-03-01
To investigate what makes a female figure attractive, an extensive experiment was conducted using high-quality photographic stimulus material and several systematically varied figure parameters. The objective was to predict female bodily attractiveness by using figure measurements. For generating stimulus material, a frontal-view photograph of a woman with normal body proportions was taken. Using morphing software, 243 variations of this photograph were produced by systematically manipulating the following features: weight, hip width, waist width, bust size, and leg length. More than 34,000 people participated in the web-based experiment and judged the attractiveness of the figures. All of the altered figures were measured (e.g., bust width, underbust width, waist width, hip width, and so on). Based on these measurements, ratios were calculated (e.g., waist-to-hip ratio). A multiple regression analysis was designed to predict the attractiveness rank of a figure by using figure measurements. The results show that the attractiveness of a woman's figure may be predicted by using her body measurements. The regression analysis explains a variance of 80 percent. Important predictors are bust-to-underbust ratio, bust-to-waist ratio, waist-to-hip ratio, and an androgyny index (an indicator of a typical female body). The study shows that the attractiveness of a female figure is the result of complex interactions of numerous factors. It affirms the importance of viewing the appearance of a bodily feature in the context of other bodily features when performing preoperative analysis. Based on the standardized beta-weights of the regression model, the relative importance of figure parameters in context of preoperative analysis is discussed.
2013-01-01
Background Assessment of the potential allergenicity of a protein is necessary whenever transgenic proteins are introduced into the food chain. Bioinformatics approaches to allergen prediction have evolved appreciably in recent years, increasing in sophistication and performance. However, the features critical to a protein's allergenicity have not yet been fully investigated. Results We present a more comprehensive model in a 128-feature space for allergenic protein prediction by integrating various properties of proteins, such as biochemical and physicochemical properties, sequential features and subcellular locations. The overall accuracy in the cross-validation reached 93.42% to 100% with our new method. The Maximum Relevance Minimum Redundancy (mRMR) method and the Incremental Feature Selection (IFS) procedure were applied to determine which features are essential for allergenicity. Performance comparisons showed the superiority of our method over widely used existing methods. More importantly, it was observed that the features of subcellular locations and amino acid composition played major roles in determining the allergenicity of proteins, particularly the extracellular/cell surface and vacuole subcellular locations for wheat and soybean. To facilitate allergen prediction, we implemented our computational method in a web application, which is available at http://gmobl.sjtu.edu.cn/PREAL/index.php. Conclusions Our new approach could improve the accuracy of allergen prediction, and the findings may provide novel insights into the mechanism of allergies. PMID:24565053
Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter
2017-05-12
A better understanding of the genetic architecture of complex traits can contribute to improve genomic prediction. We hypothesized that genomic variants associated with mastitis and milk production traits in dairy cattle are enriched in hepatic transcriptomic regions that are responsive to intra-mammary infection (IMI). Genomic markers [e.g. single nucleotide polymorphisms (SNPs)] from those regions, if included, may improve the predictive ability of a genomic model. We applied a genomic feature best linear unbiased prediction model (GFBLUP) to implement the above strategy by considering the hepatic transcriptomic regions responsive to IMI as genomic features. GFBLUP, an extension of GBLUP, includes a separate genomic effect of SNPs within a genomic feature, and allows differential weighting of the individual marker relationships in the prediction equation. Since GFBLUP is computationally intensive, we investigated whether a SNP set test could be a computationally fast way to preselect predictive genomic features. The SNP set test assesses the association between a genomic feature and a trait based on single-SNP genome-wide association studies. We applied these two approaches to mastitis and milk production traits (milk, fat and protein yield) in Holstein (HOL, n = 5056) and Jersey (JER, n = 1231) cattle. We observed that a majority of genomic features were enriched in genomic variants that were associated with mastitis and milk production traits. Compared to GBLUP, the accuracy of genomic prediction with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction. The highest increase (164.4%) in prediction accuracy was observed in across-breed prediction. The significance of genomic features based on the SNP set test were correlated with changes in prediction accuracy of GFBLUP (P < 0.05). 
GFBLUP provides a framework for integrating multiple layers of biological knowledge to provide novel insights into the biological basis of complex traits, and to improve the accuracy of genomic prediction. The SNP set test might be used as a first-step to improve GFBLUP models. Approaches like GFBLUP and SNP set test will become increasingly useful, as the functional annotations of genomes keep accumulating for a range of species and traits.
SU-F-R-46: Predicting Distant Failure in Lung SBRT Using Multi-Objective Radiomics Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Z; Folkert, M; Iyengar, P
2016-06-15
Purpose: To predict distant failure in lung stereotactic body radiation therapy (SBRT) in early stage non-small cell lung cancer (NSCLC) by using a new multi-objective radiomics model. Methods: Currently, most available radiomics models use the overall accuracy as the objective function. However, due to data imbalance, a single objective may not reflect the performance of a predictive model. Therefore, we developed a multi-objective radiomics model which considers both sensitivity and specificity as the objective functions simultaneously. The new model is used to predict distant failure in lung SBRT using 52 patients treated at our institute. Quantitative imaging features of PET and CT as well as clinical parameters are utilized to build the predictive model. Image features include intensity features (9), textural features (12) and geometric features (8). Clinical parameters for each patient include demographic parameters (4), tumor characteristics (8), treatment fraction schemes (4) and pretreatment medicines (6). The modelling procedure consists of two steps: extracting features from segmented tumors in PET and CT; and selecting features and training model parameters based on the multiple objectives. Support Vector Machine (SVM) is used as the predictive model, while a nondominated sorting-based multi-objective evolutionary computation algorithm II (NSGA-II) is used for solving the multi-objective optimization. Results: The accuracy for PET, clinical, CT, PET+clinical, PET+CT, CT+clinical, PET+CT+clinical are 71.15%, 84.62%, 84.62%, 85.54%, 82.69%, 84.62%, 86.54%, respectively. The sensitivities for the above seven combinations are 41.76%, 58.33%, 50.00%, 50.00%, 41.67%, 41.67%, 58.33%, while the specificities are 80.00%, 92.50%, 90.00%, 97.50%, 92.50%, 97.50%, 97.50%. Conclusion: A new multi-objective radiomics model for predicting distant failure in NSCLC treated with SBRT was developed. 
The experimental results show that the best performance can be obtained by combining all features.
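The argument that accuracy alone can mislead on imbalanced data, and that sensitivity and specificity should be treated as joint objectives, can be sketched as follows. This is a hedged illustration only: it shows the two objectives and the Pareto-dominance test that nondominated sorting relies on, not NSGA-II itself and not the authors' pipeline. The function names and the toy class counts are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 = event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the objective vectors that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    # Hypothetical imbalanced cohort: a model that never predicts the event
    # is mostly "accurate" yet has zero sensitivity.
    y_true = [1] * 12 + [0] * 40
    y_pred = [0] * 52
    print(sensitivity_specificity(y_true, y_pred))
    print(pareto_front([(0.50, 0.90), (0.58, 0.92), (0.40, 0.97)]))
```

A multi-objective optimizer returns a set of such nondominated trade-offs rather than a single best model, so a final operating point still has to be chosen from the front.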
Prediction of survival with multi-scale radiomic analysis in glioblastoma patients.
Chaddad, Ahmad; Sabri, Siham; Niazi, Tamim; Abdulkarim, Bassam
2018-06-19
We propose multiscale texture features based on a Laplacian-of-Gaussian (LoG) filter to predict progression-free survival (PFS) and overall survival (OS) in patients newly diagnosed with glioblastoma (GBM). Experiments use features derived from 40 patients with GBM whose T1-weighted imaging (T1-WI) and fluid-attenuated inversion recovery (FLAIR) images were segmented manually into areas of active tumor, necrosis, and edema. Multiscale texture features were extracted locally from each of these areas of interest using a LoG filter, and the relation of the features to OS and PFS was investigated using univariate (i.e., Spearman's rank correlation coefficient, log-rank test and Kaplan-Meier estimator) and multivariate analyses (i.e., Random Forest classifier). Three and seven features were statistically correlated with PFS and OS, respectively, with absolute correlation values between 0.32 and 0.36 and p < 0.05. Three features derived from active tumor regions only were associated with OS (p < 0.05) with hazard ratios (HR) of 2.9, 3, and 3.24, respectively. Combined features showed AUC values of 85.37% and 85.54% for predicting the PFS and OS of GBM patients, respectively, using the random forest (RF) classifier. We presented multiscale texture features to characterize the GBM regions and predict the PFS and OS. The efficiency achievable suggests that this technique can be developed into a GBM MR analysis system suitable for clinical use after a thorough validation involving more patients. Graphical abstract Scheme of the proposed model for characterizing the heterogeneity of GBM regions and predicting the overall survival and progression-free survival of GBM patients. 
(1) Acquisition of pretreatment MRI images; (2) Affine registration of the T1-WI image with its corresponding FLAIR images, and GBM subtype (phenotype) labelling; (3) Extraction of nine texture features from the three texture scales (fine, medium, and coarse) derived from each of the GBM regions; (4) Comparing heterogeneity between GBM regions by ANOVA test; Survival analysis using univariate methods (Spearman rank correlation between features and survival (i.e., PFS and OS) based on each of the GBM regions, Kaplan-Meier estimator and log-rank test to predict the PFS and OS of patient groups that were grouped based on the median of each feature) and a multivariate method (random forest model) for predicting the PFS and OS of patient groups that were grouped based on the median of PFS and OS.
Combination of lateral and PA view radiographs to study development of knee OA and associated pain
NASA Astrophysics Data System (ADS)
Minciullo, Luca; Thomson, Jessie; Cootes, Timothy F.
2017-03-01
Knee Osteoarthritis (OA) is the most common form of arthritis, affecting millions of people around the world. The effects of the disease have been studied using the shape and texture features of bones in PosteriorAnterior (PA) and Lateral radiographs separately. In this work we compare the utility of features from each view, and evaluate whether combining features from both is advantageous. We built a fully automated system to independently locate landmark points in both radiographic images using Random Forest Constrained Local Models. We extracted discriminative features from the two bony outlines using Appearance Models. The features were used to train Random Forest classifiers to solve three specific tasks: (i) OA classification, distinguishing patients with structural signs of OA from the others; (ii) predicting future onset of the disease and (iii) predicting which patients with no current pain will have a positive pain score later in a follow-up visit. Using a subset of the MOST dataset we show that the PA view has more discriminative features to classify and predict OA, while the lateral view contains features that achieve better performance in predicting pain, and that combining the features from both views gives a small improvement in accuracy of the classification compared to the individual views.
NASA Astrophysics Data System (ADS)
Pandremmenou, K.; Shahid, M.; Kondi, L. P.; Lövström, B.
2015-03-01
In this work, we propose a No-Reference (NR) bitstream-based model for predicting the quality of H.264/AVC video sequences, affected by both compression artifacts and transmission impairments. The proposed model is based on a feature extraction procedure, where a large number of features are calculated from the packet-loss impaired bitstream. Many of the features are firstly proposed in this work, and the specific set of the features as a whole is applied for the first time for making NR video quality predictions. All feature observations are taken as input to the Least Absolute Shrinkage and Selection Operator (LASSO) regression method. LASSO indicates the most important features, and using only them, it is possible to estimate the Mean Opinion Score (MOS) with high accuracy. Indicatively, we point out that only 13 features are able to produce a Pearson Correlation Coefficient of 0.92 with the MOS. Interestingly, the performance statistics we computed in order to assess our method for predicting the Structural Similarity Index and the Video Quality Metric are equally good. Thus, the obtained experimental results verified the suitability of the features selected by LASSO as well as the ability of LASSO in making accurate predictions through sparse modeling.
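How LASSO "indicates the most important features" can be illustrated with a small coordinate-descent sketch. It is a generic textbook formulation, not the authors' implementation: it assumes centered data with no intercept and minimizes (1/2n)·||y − Xw||² + alpha·||w||₁. The function names, the toy data, and the value of alpha are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Fit LASSO weights by cyclic coordinate descent (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / scale if scale else 0.0
    return w


if __name__ == "__main__":
    # Toy data: the target depends on the first feature only.
    X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
    y = [2, -2, 2, -2]
    print(lasso_coordinate_descent(X, y, alpha=0.5))
```

The irrelevant feature receives a weight of exactly zero while the relevant one is shrunk, which is the sparse-selection behavior the abstract relies on when it keeps only a small subset of the bitstream features.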
Geospatial Analytics in Retail Site Selection and Sales Prediction.
Ting, Choo-Yee; Ho, Chiung Ching; Yee, Hui Jia; Matsah, Wan Razali
2018-03-01
Studies have shown that certain features from geography, demography, trade area, and environment can play a vital role in retail site selection, largely due to the impact they exert on retail performance. Although the relevant features could be elicited by domain experts, determining the optimal feature set can be an intractable and labor-intensive exercise. The challenges center around (1) how to determine features that are important to a particular retail business and (2) how to estimate retail sales performance given a new location. The challenges become apparent when the features vary across time. In this light, this study proposed a nonintervening approach that employs feature selection algorithms and subsequently predicts sales through similarity-based methods. The prediction results were validated by domain experts. In this study, data sets from different sources were transformed and aggregated to obtain an analytics data set ready for analysis. The data sets included data about feature location, population count, property type, education status, and monthly sales from 96 branches of a telecommunication company in Malaysia. The findings suggested that (1) optimal retail performance can only be achieved through fulfillment of specific location features together with the surrounding trade area characteristics and (2) a similarity-based method can provide a solution to retail sales prediction.
Prediction of type III secretion signals in genomes of gram-negative bacteria.
Löwer, Martin; Schneider, Gisbert
2009-06-15
Pathogenic bacteria infecting both animals as well as plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system represents such a mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in literature its exact characteristics remain unknown. In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases, and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. Classification accuracy yielded a cross-validated Matthews correlation of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% predicted candidates protein), their chromosomes (11%) and plasmids (13%), as well as 213 Firmicute genomes (7%). We present a signal prediction method together with comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).
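The Matthews correlation reported in this abstract summarizes a binary confusion matrix in a single value between -1 and 1. A minimal sketch of the standard formula follows; the function name and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


if __name__ == "__main__":
    print(matthews_correlation(tp=10, tn=10, fp=0, fn=0))  # perfect prediction
    print(matthews_correlation(tp=5, tn=5, fp=5, fn=5))    # chance-level prediction
```

Unlike plain accuracy, the coefficient stays near zero for a classifier that simply predicts the majority class, which makes it a common choice when effectors are rare relative to non-effectors.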
Computational intelligence models to predict porosity of tablets using minimum features
Khalid, Mohammad Hassan; Kazemi, Pezhman; Perez-Gandarillas, Lucia; Michrafy, Abderrahim; Szlęk, Jakub; Jachowicz, Renata; Mendyk, Aleksander
2017-01-01
The effects of different formulations and manufacturing process conditions on the physical properties of a solid dosage form are of importance to the pharmaceutical industry. It is vital to have in-depth understanding of the material properties and governing parameters of its processes in response to different formulations. Understanding the mentioned aspects will allow tighter control of the process, leading to implementation of quality-by-design (QbD) practices. Computational intelligence (CI) offers an opportunity to create empirical models that can be used to describe the system and predict future outcomes in silico. CI models can help explore the behavior of input parameters, unlocking deeper understanding of the system. This research endeavor presents CI models to predict the porosity of tablets created by roll-compacted binary mixtures, which were milled and compacted under systematically varying conditions. CI models were created using tree-based methods, artificial neural networks (ANNs), and symbolic regression trained on an experimental data set and screened using root-mean-square error (RMSE) scores. The experimental data were composed of proportion of microcrystalline cellulose (MCC) (in percentage), granule size fraction (in micrometers), and die compaction force (in kilonewtons) as inputs and porosity as an output. The resulting models show impressive generalization ability, with ANNs (normalized root-mean-square error [NRMSE] =1%) and symbolic regression (NRMSE =4%) as the best-performing methods, also exhibiting reliable predictive behavior when presented with a challenging external validation data set (best achieved symbolic regression: NRMSE =3%). Symbolic regression demonstrates the transition from the black box modeling paradigm to more transparent predictive models. Predictive performance and feature selection behavior of CI models hints at the most important variables within this factor space. PMID:28138223
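The RMSE and NRMSE scores used to screen the models can be sketched as below. The abstract does not state how the error is normalized; dividing by the range of the observed values is an assumption of this sketch, as are the function names and the toy porosity values.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def nrmse_percent(observed, predicted):
    """RMSE as a percentage of the observed range (assumed normalization)."""
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values must not all be identical")
    return 100.0 * rmse(observed, predicted) / value_range


if __name__ == "__main__":
    observed_porosity = [0.0, 10.0]   # hypothetical values
    predicted_porosity = [1.0, 9.0]
    print(nrmse_percent(observed_porosity, predicted_porosity))
```

Other normalizations (by the mean or the standard deviation of the observations) are also in use, so NRMSE figures are only comparable when the same convention is applied.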
Drug-target interaction prediction using ensemble learning and dimensionality reduction.
Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong
2017-10-01
Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction. Copyright © 2017 Elsevier Inc. All rights reserved.
Evaluation of Voice Acoustics as Predictors of Clinical Depression Scores.
Hashim, Nik Wahidah; Wilkes, Mitch; Salomon, Ronald; Meggs, Jared; France, Daniel J
2017-03-01
The aim of the present study was to determine if acoustic measures of voice, characterizing specific spectral and timing properties, predict clinical ratings of depression severity measured in a sample of patients using the Hamilton Depression Rating Scale (HAMD) and Beck Depression Inventory (BDI-II). This is a prospective study. Voice samples and clinical depression scores were collected prospectively from consenting adult patients who were referred to psychiatry from the adult emergency department or primary care clinics. The patients were audio-recorded as they read a standardized passage in a nearly closed-room environment. Mean Absolute Error (MAE) between actual and predicted depression scores was used as the primary outcome measure. The average MAE between predicted and actual HAMD scores was approximately two scores for both men and women, and the MAE for the BDI-II scores was approximately one score for men and eight scores for women. Timing features were predictive of HAMD scores in female patients while a combination of timing features and spectral features was predictive of scores in male patients. Timing features were predictive of BDI-II scores in male patients. Voice acoustic features extracted from read speech demonstrated variable effectiveness in predicting clinical depression scores in men and women. Voice features were highly predictive of HAMD scores in men and women, and BDI-II scores in men, respectively. The methodology is feasible for diagnostic applications in diverse clinical settings as it can be implemented during a standard clinical interview in a normal closed room and without strict control on the recording environment. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
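The primary outcome measure named above, Mean Absolute Error, is the average absolute difference between actual and predicted scores. A minimal sketch follows; the function name and the example scores are hypothetical and are not taken from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted scores."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical depression scores for three patients (not from the study).
actual_scores = [10, 20, 15]
predicted_scores = [12, 17, 15]
mae = mean_absolute_error(actual_scores, predicted_scores)  # (2 + 3 + 0) / 3
```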
Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions
Roy, Sushmita; Martinez, Diego; Platero, Harriett; Lane, Terran; Werner-Washburne, Margaret
2009-01-01
Background Computational prediction of protein interactions typically uses protein domains as classifier features because they capture conserved information of interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) for protein interaction prediction. This simple feature, which is based on normalized counts of single amino acids or amino acid pairs, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. Results AAC performed on par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. Conclusion AAC is a simple, yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new and validate existing interactions. More importantly, AAC alone performs on par with existing, but more complex, features, indicating the presence of sequence-level information that is predictive of interaction, but which is not necessarily restricted to domains. PMID:19936254
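As a rough illustration of the AAC feature described above, the sketch below computes normalized counts of single amino acids and of adjacent amino acid pairs for one sequence. How the two proteins of a candidate pair are combined into one feature vector is not stated in the abstract; the concatenation in `interaction_features` is an assumption, and all names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_composition(seq):
    """Normalized counts of each single amino acid (20 values)."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:  # skip non-standard symbols such as 'X'
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def pair_composition(seq):
    """Normalized counts of each adjacent amino acid pair (400 values)."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        dipeptide = seq[i:i + 2]
        if dipeptide in counts:
            counts[dipeptide] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in pairs]


def interaction_features(seq_a, seq_b):
    """Feature vector for a protein pair (concatenation is an assumption)."""
    return single_composition(seq_a) + single_composition(seq_b)
```

Because the features depend only on the sequence, they can be computed for any protein, including those with no annotated domains.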
Shoji, Fumihiro; Haratake, Naoki; Akamine, Takaki; Takamori, Shinkichi; Katsura, Masakazu; Takada, Kazuki; Toyokawa, Gouji; Okamoto, Tatsuro; Maehara, Yoshihiko
2017-02-01
The prognostic Controlling Nutritional Status (CONUT) score is used to evaluate immuno-nutritional conditions and is a predictive factor of postoperative survival in patients with digestive tract cancer. We retrospectively analyzed clinicopathological features of patients with pathological stage I non-small cell lung cancer (NSCLC) to identify predictors or prognostic factors of postoperative survival and to investigate the role of preoperative CONUT score in predicting survival. We selected 138 consecutive patients with pathological stage I NSCLC treated from August 2005 to August 2010. We measured their preoperative CONUT score in uni- and multivariate Cox regression analyses of postoperative survival. A high CONUT score was positively associated with preoperative serum carcinoembryonic antigen level (p=0.0100) and postoperative recurrence (p=0.0767). In multivariate analysis, the preoperative CONUT score [relative risk (RR)=6.058; 95% confidence interval (CI)=1.068-113.941; p=0.0407], increasing age (RR=7.858; 95% CI=2.034-36.185; p=0.0029), and pleural invasion (RR=36.615; 95% CI=5.900-362.620; p<0.0001) were independent prognostic factors. In Kaplan-Meier analysis of recurrence-free survival (RFS), cancer-specific survival (CS), and overall survival (OS), the group with high CONUT score had a significantly shorter RFS, CS, and OS than did the low-CONUT score group by log-rank test (p=0.0458, p=0.0104 and p=0.0096, respectively). The preoperative CONUT score is both a predictive and prognostic factor in patients with pathological stage I NSCLC. This immuno-nutritional score can indicate patients at high risk of postoperative recurrence and death. Copyright© 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
Wang, Ting-Yao; Chen, Wei-Ming; Yang, Lan-Yan; Chen, Chao-Yu; Chou, Wen-Chi; Chen, Yi-Yang; Chen, Chih-Cheng; Lee, Kuan-Der; Lu, Chang-Hsien
2016-11-01
Induction chemotherapy with docetaxel improved outcome in advanced head and neck squamous cell carcinoma (HNSCC) patients, but docetaxel was not recommended in patients with liver dysfunction because of treatment toxicities. Severe neutropenic events (SNE), including severe neutropenia (SN) and febrile neutropenia (FN), still developed in patients with normal serum transaminases. The ultrasonography (US) fibrotic score represents the degree of hepatic parenchymal damage and correlates well with histological fibrotic changes. This study aims to evaluate the association of US fibrotic score with docetaxel treatment-related SNE in advanced HNSCC patients with normal serum transaminases. Between 1 January 2011 and 31 December 2013, a total of 47 advanced HNSCC patients treated with induction docetaxel were enrolled. The clinical features were collected to assess predictive factors for SNE. The patients were divided into two groups by the US fibrotic score with a cutoff value of 7. The Mann-Whitney U test and logistic regression method were used for the risk factor analysis. The background, treatment, and response were similar in both groups except for lower lymphocyte and platelet counts in patients with higher US score. Twenty-seven patients (51 %) developed grade 3/4 neutropenia, and more SNE developed in patients with US score ≥7. In multivariate analysis, only US score ≥7 was an independent predictive factor for developing SN (hazard ratio 7.71, p = 0.043) and FN (hazard ratio 20.95, p = 0.008). US score ≥7 is an independent risk factor for SNE in advanced HNSCC patients treated with induction docetaxel. US score could be used for risk prediction of docetaxel-related SNE.
Grouping like catchments: A novel means to compare 40+ watersheds in the Northeastern U.S.
NASA Astrophysics Data System (ADS)
Shaw, S. B.; Walter, M.; Marjerison, R. D.
2008-12-01
One difficulty in understanding the effect of multi-scale patterns in watersheds comes from finding a concise way to identify and compare features across many basins. Comparing raw data (i.e. discharge time series) requires one to account for highly variable climate drivers while extracting meaningful metrics from the data series. Comparing model parameters imposes model assumptions that may obscure fundamental differences, potentially making it an exercise in comparing calibration factors. As a possible middle ground, we have found that the probability of a given basin-wide runoff response can be predicted by combining rainfall frequency with 1. a curve establishing a relationship between basin storage and base flow and 2. the baseflow flow-duration curve. In addition to providing a means to predict runoff, these curves succinctly present empirical runoff-response information, allowing ready graphical comparison of multiple watersheds. From 40+ watersheds throughout the Northeastern U.S., we demonstrate the potential to group watersheds and identify critical hydrologic features, providing particular insight into the influence of land use patterns as well as basin scale.
Ground motion in the presence of complex Topography II: Earthquake sources and 3D simulations
Hartzell, Stephen; Ramirez-Guzman, Leonardo; Meremonte, Mark; Leeds, Alena L.
2017-01-01
Eight seismic stations were placed in a linear array with a topographic relief of 222 m over Mission Peak in the east San Francisco Bay region for a period of one year to study topographic effects. Seventy‐two well‐recorded local earthquakes are used to calculate spectral amplitude ratios relative to a reference site. A well‐defined fundamental resonance peak is observed with individual station amplitudes following the theoretically predicted progression of larger amplitudes in the upslope direction. Favored directions of vibration are also seen that are related to the trapping of shear waves within the primary ridge dimensions. Spectral peaks above the fundamental one are also related to topographic effects but follow a more complex pattern. Theoretical predictions using a 3D velocity model and accurate topography reproduce many of the general frequency and time‐domain features of the data. Shifts in spectral frequencies and amplitude differences, however, are related to deficiencies of the model and point out the importance of contributing factors, including the shear‐wave velocity under the topographic feature, near‐surface velocity gradients, and source parameters.
Inertial measurements of free-living activities: assessing mobility to predict falls.
Wang, Kejia; Lovell, Nigel H; Del Rosario, Michael B; Liu, Ying; Wang, Jingjing; Narayanan, Michael R; Brodie, Matthew A D; Delbaere, Kim; Menant, Jasmine; Lord, Stephen R; Redmond, Stephen J
2014-01-01
An exploratory analysis was conducted into how simple features, from acceleration at the lower back and ankle during simulated free-living walking, stair ascent and descent, correlate with age, the overall fall risk from a clinically validated Physiological Profile Assessment (PPA), and its sub-components. Inertial data were captured from 92 older adults aged 78-95 (42 female, mean age 84.1, standard deviation 3.9 years). The dominant frequency, peak width from Welch's power spectral density estimate, and signal variance along each axis, from each sensor location and for each activity were calculated. Several correlations were found between these features and the physiological risk factors. The strongest correlations were from the dominant frequency at the ankle along the mediolateral direction during stair ascent (Spearman's correlation coefficient ρ = -0.45) with anteroposterior sway, and signal variance of the anteroposterior acceleration at the lower back during stair descent (ρ = -0.45) with age. These findings should aid future attempts to classify activities and predict falls in older adults, based on true free-living data from a range of activities.
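Spearman's correlation coefficient, used above to relate the inertial features to age and the PPA components, is the Pearson correlation computed on ranks. The following is a minimal pure-Python sketch with average ranks for ties; the function names are hypothetical and no study data are reproduced.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Working on ranks makes the coefficient insensitive to monotonic rescaling of a feature, which suits exploratory comparisons across features measured in different units.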
Features and prognostic factors for elderly with acute poisoning in the emergency department.
Hu, Yu-Hui; Chou, Hsiu-Ling; Lu, Wen-Hua; Huang, Hsien-Hao; Yang, Cheng-Chang; Yen, David H T; Kao, Wei-Fong; Deng, Jou-Fan; Huang, Chun-I
2010-02-01
Elderly persons with acute poisoning in the emergency department (ED) and prognostic factors of outcomes have not been well addressed in previous research. This study aimed to investigate the characteristics of elderly patients with acute poisoning visiting the ED, and to identify the possible predictive factors of mortality. Patients aged ≥ 65 years with acute poisoning who visited the ED in Taipei Veterans General Hospital from January 1, 2006 through to September 30, 2008 were enrolled in the study. We collected demographic information on underlying diseases, initial presentations, causes and toxic substances, complications, dispositions, and outcomes. Analyses were conducted among different groups categorized according to age, suicide attempt, and outcome. Multiple logistic regression was applied to identify possible predictive clinical factors influencing mortality in the elderly with acute poisoning. A total of 250 patients were enrolled in the study, with a mean age of 77 years and male predominance. The most common cause of intoxication was unintentional poisoning. Medication accounted for 57.6% of poisonous substances, of which benzodiazepine was the most common drug, followed by warfarin. The overall mortality rate was 9.6%. The average length of stay in the ED increased significantly in the old (65-74 years), very old (75-84 years) and extremely old (≥ 85 years) groups. Suicide attempt patients experienced more complications including respiratory failure, aspiration pneumonia, hypotension and mortality. Three clinical predictive factors of mortality were identified: herbicide poisoning, hypotension and respiratory failure upon presentation. Our results demonstrated that elderly patients with acute poisoning had a mortality rate of 9.6%. Suicide attempts resulted in more serious complications. The risk factors for mortality were herbicide intoxication, hypotension and respiratory failure. Copyright 2010 Elsevier. Published by Elsevier B.V. All rights reserved.
Predictive Value of Morphological Features in Patients with Autism versus Normal Controls
ERIC Educational Resources Information Center
Ozgen, H.; Hellemann, G. S.; de Jonge, M. V.; Beemer, F. A.; van Engeland, H.
2013-01-01
We investigated the predictive power of morphological features in 224 autistic patients and 224 matched-pairs controls. To assess the relationship between the morphological features and autism, we used the receiver operator curves (ROC). In addition, we used recursive partitioning (RP) to determine a specific pattern of abnormalities that is…
Forced to remember: when memory is biased by salient information.
Santangelo, Valerio
2015-04-15
The last decades have seen rapid growth in attempts to understand the key factors involved in the internal memory representation of the external world. Visual salience has been found to provide a major contribution in predicting the probability that an item/object embedded in a complex setting (i.e., a natural scene) is encoded and then remembered later on. Here I review the existing literature highlighting the impact of perceptual- (based on low-level sensory features) and semantics-related salience (based on high-level knowledge) on short-term memory representation, along with the neural mechanisms underpinning the interplay between these factors. The available evidence reveals that both perceptual- and semantics-related factors affect attention selection mechanisms during the encoding of natural scenes. By biasing internal memory representation, both perceptual and semantic factors increase the probability of remembering high-saliency items to the detriment of low-saliency items. The available evidence also highlights an interplay between these factors, with a reduced impact of perceptual-related salience in biasing memory representation as a function of the increasing availability of semantics-related salient information. The neural mechanisms underpinning this interplay involve the activation of different portions of the frontoparietal attention control network. Ventral regions support the assignment of selection/encoding priorities based on high-level semantics, while the involvement of dorsal regions reflects priority assignment based on low-level sensory features. Copyright © 2015 Elsevier B.V. All rights reserved.
Predicting protein amidation sites by orchestrating amino acid sequence features
NASA Astrophysics Data System (ADS)
Zhao, Shuqiu; Yu, Hua; Gong, Xiujun
2017-08-01
Amidation is the fourth major category of post-translational modifications, which plays an important role in physiological and pathological processes. Identifying amidation sites can help us understand amidation and recognize the underlying causes of many kinds of diseases. However, the traditional experimental methods for identifying amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector that captures not only the physicochemical properties but also position-related information of the amino acids. An extremely randomized trees algorithm is applied to choose the optimal features, removing redundancy and dependence among components of the feature vector in a supervised fashion. Finally, the support vector machine classifier is used to label the amidation sites. When tested on an independent data set, the proposed method performs better than all the previous ones, with a prediction accuracy of 0.962, a Matthews correlation coefficient of 0.89, and an area under the curve of 0.964.
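The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. A minimal sketch follows; the function name and the example counts are hypothetical and are not the study's results.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 45 true positives, 45 true negatives, 5 of each error.
mcc = matthews_corrcoef(tp=45, tn=45, fp=5, fn=5)
```

Unlike plain accuracy, the coefficient stays informative when amidation sites are much rarer than non-sites, because all four cells enter the formula.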
Prediction of acoustic feature parameters using myoelectric signals.
Lee, Ki-Seung
2010-07-01
It is well-known that a clear relationship exists between human voices and myoelectric signals (MESs) from the area of the speaker's mouth. In this study, we utilized this information to implement a speech synthesis scheme in which MES alone was used to predict the parameters characterizing the vocal-tract transfer function of specific speech signals. Several feature parameters derived from MES were investigated to find the optimal feature for maximization of the mutual information between the acoustic and the MES features. After the optimal feature was determined, an estimation rule for the acoustic parameters was proposed, based on a minimum mean square error (MMSE) criterion. In a preliminary study, 60 isolated words were used for both objective and subjective evaluations. The results showed that the average Euclidean distance between the original and predicted acoustic parameters was reduced by about 30% compared with the average Euclidean distance of the original parameters. The intelligibility of the synthesized speech signals using the predicted features was also evaluated. A word-level identification ratio of 65.5% and a syllable-level identification ratio of 73% were obtained through a listening test.
Zheng, Ce; Kurgan, Lukasz
2008-10-10
beta-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of beta-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential beta-turns, while the remaining four amino acids are useful to predict non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Q total, Q predicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Q total barrier and achieves Q total = 80.9%, MCC = 0.47, and Q predicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. 
Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between beta-turns and non-beta-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at http://biomine.ece.ualberta.ca/BTNpred/BTNpred.html.
Type 2 Diabetes Mellitus Screening and Risk Factors Using Decision Tree: Results of Data Mining.
Habibi, Shafi; Ahmadi, Maryam; Alizadeh, Somayeh
2015-03-18
The aim of this study was to examine a predictive model using features related to the diabetes type 2 risk factors. The data were obtained from a database in a diabetes control system in Tabriz, Iran. The data included all people referred for diabetes screening between 2009 and 2011. The features considered as "Inputs" were: age, sex, systolic and diastolic blood pressure, family history of diabetes, and body mass index (BMI). Moreover, we used diagnosis as "Class". We applied the "Decision Tree" technique and "J48" algorithm in the WEKA (3.6.10 version) software to develop the model. After data preprocessing and preparation, we used 22,398 records for data mining. The model precision to identify patients was 0.717. The age factor was placed in the root node of the tree as a result of higher information gain. The ROC curve indicates the model function in identification of patients and those individuals who are healthy. The curve indicates high capability of the model, especially in identification of the healthy persons. We developed a model using the decision tree for screening T2DM which did not require laboratory tests for T2DM diagnosis.
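The statement above that age was placed at the root "as a result of higher information gain" refers to the entropy reduction obtained by splitting on an attribute. The sketch below computes plain information gain for an already-discretized attribute; it is not the WEKA J48 implementation, and the attribute values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting `labels` by a categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: an age band separates the classes, sex does not.
diagnosis = [1, 1, 0, 0]
age_band = ["old", "old", "young", "young"]
sex = ["f", "m", "f", "m"]
best = max([("age", age_band), ("sex", sex)],
           key=lambda item: information_gain(item[1], diagnosis))[0]
```

The attribute with the largest gain is the one a tree learner of this kind would test first, which is why a highly informative factor ends up in the root node.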
Interictal epileptiform discharge characteristics underlying expert interrater agreement.
Bagheri, Elham; Dauwels, Justin; Dean, Brian C; Waters, Chad G; Westover, M Brandon; Halford, Jonathan J
2017-10-01
The presence of interictal epileptiform discharges (IED) in the electroencephalogram (EEG) is a key finding in the medical workup of a patient with suspected epilepsy. However, inter-rater agreement (IRA) regarding the presence of IED is imperfect, leading to incorrect and delayed diagnoses. An improved understanding of which IED attributes mediate expert IRA might help in developing automatic methods for IED detection able to emulate the abilities of experts. Therefore, using a set of IED scored by a large number of experts, we set out to determine which attributes of IED predict expert agreement regarding the presence of IED. IED were annotated on a 5-point scale by 18 clinical neurophysiologists within 200 30-s EEG segments from recordings of 200 patients. 5538 signal analysis features were extracted from the waveforms, including wavelet coefficients, morphological features, signal energy, nonlinear energy operator response, electrode location, and spectrogram features. Feature selection was performed by applying elastic net regression and support vector regression (SVR) was applied to predict expert opinion, with and without the feature selection procedure and with and without several types of signal normalization. Multiple types of features were useful for predicting expert annotations, but particular types of wavelet features performed best. Local EEG normalization also enhanced best model performance. As the size of the group of EEGers used to train the models was increased, the performance of the models leveled off at a group size of around 11. The features that best predict inter-rater agreement among experts regarding the presence of IED are wavelet features, using locally standardized EEG. Our models for predicting expert opinion based on EEGer's scores perform best with a large group of EEGers (more than 10). 
By examining a large group of EEG signal analysis features we found that wavelet features with certain wavelet basis functions performed best to identify IEDs. Local normalization also improves predictability, suggesting the importance of IED morphology over amplitude-based features. Although most IED detection studies in the past have used opinion from three or fewer experts, our study suggests a "wisdom of the crowd" effect, such that pooling over a larger number of expert opinions produces a better correlation between expert opinion and objectively quantifiable features of the EEG. Copyright © 2017 International Federation of Clinical Neurophysiology. Published by Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Voisin, Sophie; Pinto, Frank M; Morin-Ducote, Garnetta
2013-01-01
Purpose: The primary aim of the present study was to test the feasibility of predicting diagnostic errors in mammography by merging radiologists' gaze behavior and image characteristics. A secondary aim was to investigate group-based and personalized predictive models for radiologists of variable experience levels. Methods: The study was performed for the clinical task of assessing the likelihood of malignancy of mammographic masses. Eye-tracking data and diagnostic decisions for 40 cases were acquired from 4 Radiology residents and 2 breast imaging experts as part of an IRB-approved pilot study. Gaze behavior features were extracted from the eye-tracking data. Computer-generated and BIRADS image features were extracted from the images. Finally, machine learning algorithms were used to merge gaze and image features for predicting human error. Feature selection was thoroughly explored to determine the relative contribution of the various features. Group-based and personalized user modeling was also investigated. Results: Diagnostic error can be predicted reliably by merging gaze behavior characteristics from the radiologist and textural characteristics from the image under review. Leveraging data collected from multiple readers produced a reasonable group model (AUC=0.79). Personalized user modeling was far more accurate for the more experienced readers (average AUC of 0.837 ± 0.029) than for the less experienced ones (average AUC of 0.667 ± 0.099). The best performing group-based and personalized predictive models involved combinations of both gaze and image features. Conclusions: Diagnostic errors in mammography can be predicted reliably by leveraging the radiologists' gaze behavior and image content.
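The AUC values quoted above can be read as the probability that a randomly chosen error case receives a higher predicted score than a randomly chosen non-error case. A minimal pairwise sketch of that definition follows; the function name and the example labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions: label 1 marks a diagnostic error.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of cases, which is acceptable for a pilot study of this size; a rank-based computation would be preferable for large data sets.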
Development of the Advance Warning Airborne System (AWAS)
NASA Technical Reports Server (NTRS)
Adamson, H. Patrick
1992-01-01
The thermal characteristics of microbursts are utilized by the AWAS IR and OAT features to provide predictive warning of hazardous microbursts ahead of the aircraft during landing or take off. The AWAS was evaluated satisfactorily in 1990 on a Cessna Citation that was intentionally flown into a number of wind shear events. The events were detected, and both the IR and OAT thermal features were shown to be effective. In 1991, AWAS units were flown on three American Airline MD-80's and three Northwest Airlines DC-9's to study and to decrease the nuisance alert response of the system. The AWAS was also flown on the NASA B737 during the summer of 1991. The results of these flights were inconclusive and disappointing. The results were not as promising as before because NASA conducted research flights which were outside of the normal operating envelope for which the AWAS is designed to operate. In an attempt to compensate for these differences in airspeed and mounting location, the automatic features of the system were sometimes overridden by NASA personnel during the flight. Each of these critical factors is discussed in detail. The effect of rain on the OAT signals is presented as a function of the air speed. Use of a 4 pole 1/20 Hertz filter is demonstrated by both the IR and thermal data. Participation in the NASA 1992 program was discussed. FAA direction in the continuing Certification program requires the addition of a reactive feature to the AWAS predictive system. This combined system will not require flight guidance on newer aircraft. The features of AWAS-IV, with the NASA algorithm included, were presented. Expected completion of the FAA Certification plan was also described.
Predicting Key Events in the Popularity Evolution of Online Information.
Hu, Ying; Hu, Changjun; Fu, Shushen; Fang, Mingzhe; Xu, Wenwen
2017-01-01
The popularity of online information generally experiences a rising and falling evolution. This paper considers the "burst", "peak", and "fade" key events together as a representative summary of popularity evolution. We propose a novel prediction task: predicting when popularity undergoes these key events. It is of great importance to know when these three key events occur, because doing so helps recommendation systems, online marketing, and containment of rumors. However, it is very challenging to solve this new prediction task due to two issues. First, popularity evolution has high variation and can follow various patterns, so how can we identify "burst", "peak", and "fade" in different patterns of popularity evolution? Second, these events usually occur in a very short time, so how can we accurately yet promptly predict them? In this paper we address these two issues. To handle the first one, we use a simple moving average to smooth variation, and then a universal method is presented for different patterns to identify the key events in popularity evolution. To deal with the second one, we extract different types of features that may have an impact on the key events, and then a correlation analysis is conducted in the feature selection step to remove irrelevant and redundant features. The remaining features are used to train a machine learning model. The feature selection step improves prediction accuracy, and in order to emphasize prediction promptness, we design a new evaluation metric which considers both accuracy and promptness to evaluate our prediction task. Experimental and comparative results show the superiority of our prediction solution. PMID:28046121
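The smoothing step described above can be sketched as follows. The simple moving average is as stated in the abstract; the rule used here to mark "burst" and "fade" (first and last crossings of a fraction of the peak value) is only an assumption for illustration, since the abstract does not specify the authors' identification method. All names and thresholds are hypothetical.

```python
def simple_moving_average(series, window):
    """Smooth a series by averaging each run of `window` consecutive points."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def find_key_events(series, window=3, burst_frac=0.5, fade_frac=0.5):
    """Locate burst, peak and fade as indices into the smoothed series.

    Assumes non-negative popularity counts. The fractional thresholds are
    hypothetical and stand in for the paper's unspecified method.
    """
    smooth = simple_moving_average(series, window)
    peak = max(range(len(smooth)), key=smooth.__getitem__)
    peak_value = smooth[peak]
    # Burst: first point at or before the peak that reaches the threshold.
    burst = next(i for i in range(peak + 1)
                 if smooth[i] >= burst_frac * peak_value)
    # Fade: first point after the peak that drops to the threshold, if any.
    fade = next((i for i in range(peak + 1, len(smooth))
                 if smooth[i] <= fade_frac * peak_value), None)
    return {"burst": burst, "peak": peak, "fade": fade}


# Hypothetical hourly view counts for one item.
views = [1, 1, 2, 10, 30, 40, 30, 10, 3, 1, 1]
events = find_key_events(views)
```

Smoothing first keeps a single noisy spike from being mistaken for the peak, at the cost of delaying detection by up to one window length, which is the accuracy-versus-promptness trade-off the abstract highlights.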
Rus, Holly M; Cameron, Linda D
2016-10-01
Social media provides unprecedented opportunities for enhancing health communication and health care, including self-management of chronic conditions such as diabetes. Creating messages that engage users is critical for enhancing message impact and dissemination. This study analyzed health communications within ten diabetes-related Facebook pages to identify message features predictive of user engagement. The Common-Sense Model of Illness Self-Regulation and established health communication techniques guided content analyses of 500 Facebook posts. Each post was coded for message features predicted to engage users and numbers of likes, shares, and comments during the week following posting. Multi-level, negative binomial regressions revealed that specific features predicted different forms of engagement. Imagery emerged as a strong predictor; messages with images had higher rates of liking and sharing relative to messages without images. Diabetes consequence information and positive identity predicted higher sharing while negative affect, social support, and crowdsourcing predicted higher commenting. Negative affect, crowdsourcing, and use of external links predicted lower sharing while positive identity predicted lower commenting. The presence of imagery weakened or reversed the positive relationships of several message features with engagement. Diabetes control information and negative affect predicted more likes in text-only messages, but fewer likes when these messages included illustrative imagery. Similar patterns of imagery's attenuating effects emerged for the positive relationships of consequence information, control information, and positive identity with shares and for positive relationships of negative affect and social support with comments. These findings hold promise for guiding communication design in health-related social media.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, J; Pollom, E; Durkee, B
2015-06-15
Purpose: To predict response to radiation treatment using computational FDG-PET and CT images in locally advanced head and neck cancer (HNC). Methods: 68 patients with Stage III-IVB HNC treated with chemoradiation were included in this retrospective study. For each patient, we analyzed primary tumor and lymph nodes on PET and CT scans acquired both prior to and during radiation treatment, which led to 8 combinations of image datasets. From each image set, we extracted high-throughput radiomic features of the following types: statistical, morphological, textural, histogram, and wavelet, resulting in a total of 437 features. We then performed unsupervised redundancy removal and a stability test on these features. To avoid over-fitting, we trained a logistic regression model with simultaneous feature selection based on the least absolute shrinkage and selection operator (LASSO). To objectively evaluate the prediction ability, we performed 5-fold cross validation (CV) with 50 random repeats of stratified bootstrapping. Feature selection and model training were conducted solely on the training set and independently validated on the holdout test set. The receiver operating characteristic (ROC) curve of the pooled results was computed, and the area under the ROC curve (AUC) was calculated as the figure of merit. Results: For predicting local-regional recurrence, our model built on pre-treatment PET of lymph nodes achieved the best performance (AUC=0.762) on 5-fold CV, which compared favorably with node volume and SUVmax (AUC=0.704 and 0.449, p<0.001). Wavelet coefficients turned out to be the most predictive features. Prediction of distant recurrence showed a similar trend, in which pre-treatment PET features of lymph nodes had the highest AUC of 0.705. Conclusion: The radiomics approach identified novel imaging features that are predictive of radiation treatment response. If prospectively validated in larger cohorts, they could aid in risk-adaptive treatment of HNC.
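The figure of merit used in this abstract, the area under the ROC curve, can be illustrated with a short sketch. It uses the standard pairwise (Mann-Whitney) formulation of AUC; the function name and the toy labels and scores are assumptions for illustration and are not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    The O(n_pos * n_neg) loop is fine for small cohorts; a rank-based
    version would scale better.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out predictions pooled across folds before scoring.
pooled_labels = [0, 0, 1, 1]
pooled_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(pooled_labels, pooled_scores))
```

Pooling the held-out predictions from all folds and computing one AUC, as sketched in the last lines, is one common reading of "pooled results"; averaging per-fold AUCs is another and can give a different number.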
Zhou, Hang; Yang, Yang; Shen, Hong-Bin
2017-03-15
Protein subcellular localization prediction has been an important research topic in computational biology over the last decade. Various automatic methods have been proposed to predict locations for large scale protein datasets, where statistical machine learning algorithms are widely used for model construction. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domains, such as gene ontology and functional domains, can be very useful for improving the prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In this paper, we propose a new amino acid sequence-based human protein subcellular location prediction approach, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e. context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted as HCM (Hidden Correlation Modeling), which creates more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. Experimental results on four benchmark datasets show that HCM improves prediction accuracy by 5-11% and F1 by 8-19% compared with conventional GO-based methods. A large-scale application of Hum-mPLoc 3.0 on the whole human proteome reveals protein co-localization preferences in the cell. www.csbio.sjtu.edu.cn/bioinf/Hum-mPLoc3/. hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Comprehensive Computational Pathological Image Analysis Predicts Lung Cancer Prognosis.
Luo, Xin; Zang, Xiao; Yang, Lin; Huang, Junzhou; Liang, Faming; Rodriguez-Canales, Jaime; Wistuba, Ignacio I; Gazdar, Adi; Xie, Yang; Xiao, Guanghua
2017-03-01
Pathological examination of histopathological slides is a routine clinical procedure for lung cancer diagnosis and prognosis. Although the classification of lung cancer has been updated to become more specific, only a small subset of the total morphological features are taken into consideration. The vast majority of the detailed morphological features of tumor tissues, particularly tumor cells' surrounding microenvironment, are not fully analyzed. The heterogeneity of tumor cells and close interactions between tumor cells and their microenvironments are closely related to tumor development and progression. The goal of this study is to develop morphological feature-based prediction models for the prognosis of patients with lung cancer. We developed objective and quantitative computational approaches to analyze the morphological features of pathological images for patients with NSCLC. Tissue pathological images were analyzed for 523 patients with adenocarcinoma (ADC) and 511 patients with squamous cell carcinoma (SCC) from The Cancer Genome Atlas lung cancer cohorts. The features extracted from the pathological images were used to develop statistical models that predict patients' survival outcomes in ADC and SCC, respectively. We extracted 943 morphological features from pathological images of hematoxylin and eosin-stained tissue and identified morphological features that are significantly associated with prognosis in ADC and SCC, respectively. Statistical models based on these extracted features stratified NSCLC patients into high-risk and low-risk groups. The models were developed from training sets and validated in independent testing sets: a predicted high-risk group versus a predicted low-risk group (for patients with ADC: hazard ratio = 2.34, 95% confidence interval: 1.12-4.91, p = 0.024; for patients with SCC: hazard ratio = 2.22, 95% confidence interval: 1.15-4.27, p = 0.017) after adjustment for age, sex, smoking status, and pathologic tumor stage. 
The results suggest that the quantitative morphological features of tumor pathological images predict prognosis in patients with lung cancer. Copyright © 2016 International Association for the Study of Lung Cancer. Published by Elsevier Inc. All rights reserved.
Rispo, Antonio; Imperatore, Nicola; Testa, Anna; Bucci, Luigi; Luglio, Gaetano; De Palma, Giovanni Domenico; Rea, Matilde; Nardone, Olga Maria; Caporaso, Nicola; Castiglione, Fabiana
2018-03-08
In the management of Crohn's Disease (CD) patients, having a simple score combining clinical, endoscopic and imaging features to predict the risk of surgery could help to tailor treatment more effectively. AIMS: to prospectively evaluate the one-year risk factors for surgery in refractory/severe CD and to generate a risk matrix for predicting the probability of surgery at one year. CD patients needing a disease re-assessment at our tertiary IBD centre underwent clinical, laboratory, endoscopy and bowel sonography (BS) examinations within one week. The optimal cut-off values in predicting surgery were identified using ROC curves for Simple Endoscopic Score for CD (SES-CD), bowel wall thickness (BWT) at BS, and small bowel CD extension at BS. Binary logistic regression and Cox's regression were then carried out. Finally, the probabilities of surgery were calculated for selected baseline levels of covariates and results were arranged in a prediction matrix. Of 100 CD patients, 30 underwent surgery within one year. SES-CD >9 (OR 15.3; p<0.001), BWT >7 mm (OR 15.8; p<0.001), small bowel CD extension at BS >33 cm (OR 8.23; p<0.001) and stricturing/penetrating behavior (OR 4.3; p<0.001) were the only independent factors predictive of surgery at one year based on binary logistic and Cox's regressions. Our matrix model combined these risk factors and the probability of surgery ranged from 0.48% to 87.5% (sixteen combinations). Our risk matrix combining clinical, endoscopic and ultrasonographic findings can accurately predict the one-year risk of surgery in patients with severe/refractory CD requiring a disease re-evaluation. This tool could be of value in clinical practice, serving as the basis for a tailored management of CD patients.
Understanding Confounding Effects in Linguistic Coordination: An Information-Theoretic Approach
Gao, Shuyang; Ver Steeg, Greg; Galstyan, Aram
2015-01-01
We suggest an information-theoretic approach for measuring stylistic coordination in dialogues. The proposed measure has a simple predictive interpretation and can account for various confounding factors through proper conditioning. We revisit some of the previous studies that reported strong signatures of stylistic accommodation, and find that a significant part of the observed coordination can be attributed to a simple confounding effect—length coordination. Specifically, longer utterances tend to be followed by longer responses, which gives rise to spurious correlations in the other stylistic features. We propose a test to distinguish correlations in length due to contextual factors (topic of conversation, user verbosity, etc.) and turn-by-turn coordination. We also suggest a test to identify whether stylistic coordination persists even after accounting for length coordination and contextual factors. PMID:26115446
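The idea of "proper conditioning" in this abstract can be sketched with a plug-in estimate of conditional mutual information for discrete variables. This is a generic textbook estimator offered as an illustration under assumptions, not necessarily the exact measure used by the authors; the variable roles (x: style marker in an utterance, y: the same marker in the reply, z: a confounder such as a length bin) are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, Iterable, Tuple


def conditional_mutual_information(
    samples: Iterable[Tuple[Hashable, Hashable, Hashable]]
) -> float:
    """Plug-in estimate of I(X; Y | Z) in bits from (x, y, z) samples.

    Uses I = sum p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ),
    with all probabilities replaced by empirical frequencies.
    """
    samples = list(samples)
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    c_xyz = Counter(samples)
    c_xz = Counter((x, z) for x, _, z in samples)
    c_yz = Counter((y, z) for _, y, z in samples)
    c_z = Counter(z for _, _, z in samples)
    total = 0.0
    for (x, y, z), count in c_xyz.items():
        ratio = count * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        total += (count / n) * log2(ratio)
    return total


# Toy case: x and y are both fully determined by the confounder z.
confounded = [(0, 0, 0)] * 50 + [(1, 1, 1)] * 50
# Ignoring z (constant placeholder) shows 1 bit of apparent coordination...
print(conditional_mutual_information([(x, y, 0) for x, y, _ in confounded]))
# ...which vanishes once z is conditioned on.
print(conditional_mutual_information(confounded))
```

Plug-in estimates are biased upward on small samples, so in practice a comparison against shuffled data or a bias correction would likely be needed before interpreting small values.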
Desrosiers, Christian; Hassan, Lama; Tanougast, Camel
2016-01-01
Objective: Predicting the survival outcome of patients with glioblastoma multiforme (GBM) is of key importance to clinicians for selecting the optimal course of treatment. The goal of this study was to evaluate the usefulness of geometric shape features, extracted from MR images, as a potential non-invasive way to characterize GBM tumours and predict the overall survival times of patients with GBM. Methods: The data of 40 patients with GBM were obtained from the Cancer Genome Atlas and Cancer Imaging Archive. The T1 weighted post-contrast and fluid-attenuated inversion-recovery volumes of patients were co-registered and segmented into delineate regions corresponding to three GBM phenotypes: necrosis, active tumour and oedema/invasion. A set of two-dimensional shape features were then extracted slicewise from each phenotype region and combined over slices to describe the three-dimensional shape of these phenotypes. Thereafter, a Kruskal–Wallis test was employed to identify shape features with significantly different distributions across phenotypes. Moreover, a Kaplan–Meier analysis was performed to find features strongly associated with GBM survival. Finally, a multivariate analysis based on the random forest model was used for predicting the survival group of patients with GBM. Results: Our analysis using the Kruskal–Wallis test showed that all but one shape feature had statistically significant differences across phenotypes, with p-value < 0.05, following Holm–Bonferroni correction, justifying the analysis of GBM tumour shapes on a per-phenotype basis. Furthermore, the survival analysis based on the Kaplan–Meier estimator identified three features derived from necrotic regions (i.e. Eccentricity, Extent and Solidity) that were significantly correlated with overall survival (corrected p-value < 0.05; hazard ratios between 1.68 and 1.87). 
In the multivariate analysis, features from necrotic regions gave the highest accuracy in predicting the survival group of patients, with a mean area under the receiver-operating characteristic curve (AUC) of 63.85%. Combining the features of all three phenotypes increased the mean AUC to 66.99%, suggesting that shape features from different phenotypes can be used in a synergic manner to predict GBM survival. Conclusion: Results show that shape features, in particular those extracted from necrotic regions, can be used effectively to characterize GBM tumours and predict the overall survival of patients with GBM. Advances in knowledge: Simple volumetric features have been largely used to characterize the different phenotypes of a GBM tumour (i.e. active tumour, oedema and necrosis). This study extends previous work by considering a wide range of shape features, extracted in different phenotypes, for the prediction of survival in patients with GBM. PMID:27781499
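The Holm-Bonferroni correction mentioned in this abstract is a standard step-down procedure, sketched below. The function name and the example p-values are illustrative assumptions and are not values from the study.

```python
from typing import List, Sequence, Tuple


def holm_bonferroni(p_values: Sequence[float],
                    alpha: float = 0.05) -> Tuple[List[float], List[bool]]:
    """Return Holm-adjusted p-values and rejection decisions.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k);
    adjusted values are capped at 1 and made monotone non-decreasing
    in the sorted order, then compared with alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    reject = [a <= alpha for a in adjusted]
    return adjusted, reject


# Hypothetical raw p-values for four shape features.
adjusted, reject = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(adjusted, reject)
```

Holm's procedure controls the family-wise error rate like plain Bonferroni but is never less powerful, which is presumably why it is a common choice when many features are tested at once.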
A flexible data-driven comorbidity feature extraction framework.
Sideris, Costas; Pourhomayoun, Mohammad; Kalantarian, Haik; Sarrafzadeh, Majid
2016-06-01
Disease and symptom diagnostic codes are a valuable resource for classifying and predicting patient outcomes. In this paper, we propose a novel methodology for utilizing disease diagnostic information in a predictive machine learning framework. Our methodology relies on a novel, clustering-based feature extraction framework using disease diagnostic information. To reduce the data dimensionality, we identify disease clusters using co-occurrence statistics. We optimize the number of generated clusters in the training set and then utilize these clusters as features to predict patient severity of condition and patient readmission risk. We build our clustering and feature extraction algorithm using the 2012 National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP) which contains 7 million hospital discharge records and ICD-9-CM codes. The proposed framework is tested on Ronald Reagan UCLA Medical Center Electronic Health Records (EHR) from 3041 Congestive Heart Failure (CHF) patients and the UCI 130-US diabetes dataset that includes admissions from 69,980 diabetic patients. We compare our cluster-based feature set with the commonly used comorbidity frameworks including Charlson's index, Elixhauser's comorbidities and their variations. The proposed approach was shown to have significant gains between 10.7-22.1% in predictive accuracy for CHF severity of condition prediction and 4.65-5.75% in diabetes readmission prediction. Copyright © 2016 Elsevier Ltd. All rights reserved.
Application of Exactly Linearized Error Transport Equations to AIAA CFD Prediction Workshops
NASA Technical Reports Server (NTRS)
Derlaga, Joseph M.; Park, Michael A.; Rallabhandi, Sriram
2017-01-01
The computational fluid dynamics (CFD) prediction workshops sponsored by the AIAA have created invaluable opportunities in which to discuss the predictive capabilities of CFD in areas in which it has struggled, e.g., cruise drag, high-lift, and sonic boom prediction. While there are many factors that contribute to disagreement between simulated and experimental results, such as modeling or discretization error, quantifying the errors contained in a simulation is important for those who make decisions based on the computational results. The linearized error transport equations (ETE) combined with a truncation error estimate are a method to quantify one source of errors. The ETE are implemented with a complex-step method to provide an exact linearization with minimal source code modifications to CFD and multidisciplinary analysis methods. The equivalency of adjoint and linearized ETE functional error correction is demonstrated. Uniformly refined grids from a series of AIAA prediction workshops demonstrate the utility of ETE for multidisciplinary analysis with a connection between estimated discretization error and (resolved or under-resolved) flow features.
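The complex-step method named in this abstract can be illustrated on a scalar function. This is a minimal sketch of the general technique, assuming the function is analytic and written so that it accepts complex arguments; it says nothing about how the ETE solver itself is implemented, and the test function is a hypothetical example.

```python
import cmath
import math
from typing import Callable


def complex_step_derivative(f: Callable[[complex], complex],
                            x: float, h: float = 1e-30) -> float:
    """Approximate f'(x) as Im(f(x + i*h)) / h.

    Unlike a finite difference there is no subtraction of nearly equal
    numbers, so h can be made tiny and the result is accurate to roughly
    machine precision for analytic f.
    """
    return f(complex(x, h)).imag / h


def example_function(z: complex) -> complex:
    # Hypothetical test function: f(z) = z^3 + exp(z), so f'(x) = 3x^2 + exp(x).
    return z * z * z + cmath.exp(z)


x0 = 1.5
approx = complex_step_derivative(example_function, x0)
exact = 3 * x0 ** 2 + math.exp(x0)
print(approx, exact)
```

The appeal for existing codes is that the same source can usually be reused by switching real variables to complex ones, which matches the abstract's remark about minimal source code modifications; operations such as absolute value or comparisons need care because they are not analytic.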
Vize, Colin E.; Lynam, Donald R.; Lamkin, Joanna; Miller, Joshua D; Pardini, Dustin
2015-01-01
Despite years of research, and inclusion of psychopathy in DSM-5, there remains debate over the fundamental components of psychopathy. Although there is agreement about traits related to Agreeableness and Conscientiousness, there is less agreement about traits related to Fearless Dominance (FD) or Boldness. The present paper uses proxies of FD and Self-centered Impulsivity (SCI) to examine the contribution of FD-related traits to the predictive utility of psychopathy in a large longitudinal sample of boys to test four possibilities: FD 1. assessed earlier is a risk factor, 2. interacts with other risk-related variables to predict later psychopathy, 3. interacts with SCI to predict outcomes, and 4. bears curvilinear relations to outcomes. SCI received excellent support as a measure of psychopathy in adolescence; however, FD was unrelated to criteria in all tests. It is suggested that FD be dropped from psychopathy and that future research focus on Agreeableness and Conscientiousness. PMID:27347448
Nissley, Daniel A.; Sharma, Ajeet K.; Ahmed, Nabeel; Friedrich, Ulrike A.; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P.
2016-01-01
The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally—a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process. PMID:26887592
NASA Astrophysics Data System (ADS)
Sturtz, Timothy M.
Source apportionment models attempt to untangle the relationship between pollution sources and the impacts at downwind receptors. Two frameworks of source apportionment models exist: source-oriented and receptor-oriented. Source based apportionment models use presumed emissions and atmospheric processes to estimate the downwind source contributions. Conversely, receptor based models leverage speciated concentration data from downwind receptors and apply statistical methods to predict source contributions. Integration of both source-oriented and receptor-oriented models could lead to a better understanding of the implications sources have on the environment and society. The research presented here investigated three different types of constraints applied to the Positive Matrix Factorization (PMF) receptor model within the framework of the Multilinear Engine (ME-2): element ratio constraints, spatial separation constraints, and chemical transport model (CTM) source attribution constraints. PM10-2.5 mass and trace element concentrations were measured in Winston-Salem, Chicago, and St. Paul at up to 60 sites per city during two different seasons in 2010. PMF was used to explore the underlying sources of variability. Information on previously reported PM10-2.5 tire and brake wear profiles were used to constrain these features in PMF by prior specification of selected species ratios. We also modified PMF to allow for combining the measurements from all three cities into a single model while preserving city-specific soil features. Relatively minor differences were observed between model predictions with and without the prior ratio constraints, increasing confidence in our ability to identify separate brake wear and tire wear features. Using separate data, source contributions to total fine particle carbon predicted by a CTM were incorporated into the PMF receptor model to form a receptor-oriented hybrid model. 
The level of influence of the CTM versus traditional PMF was varied using a weighting parameter applied to an object function as implemented in ME-2. The resulting hybrid model was used to quantify the contributions of total carbon from both wildfires and biogenic sources at two Interagency Monitoring of Protected Visual Environment monitoring sites, Monture and Sula Peak, Montana, from 2006 through 2008.
Recent results from the first polar direct drive plastic capsule implosions on NIF
NASA Astrophysics Data System (ADS)
Schmitt, Mark J.
2012-10-01
Polar direct drive (PDD) offers a simplified platform for conducting strongly driven implosions on NIF to investigate mix, hydro-burn and ignition-relevant physics. Its successful use necessitates a firm understanding and predictive capability of its implosion characteristics including hydro performance, symmetry and yield. To assess this capability, the first two PDD implosions of deuterium filled CH capsules were recently conducted at NIF. The P2 Legendre mode symmetry seen in these implosions agreed with pre-shot predictions even though the 700 kJ drive energy produced intensities that far exceeded thresholds for both Raman and Brillouin stimulated scattering. These shots were also the first to employ image backlighting driven by two laser quads. Preliminary results indicate that the yield from the uncoated 2.25 mm diameter, 42 μm thick, CH shells was reduced by about a factor of two owing to as-shot laser drive asymmetries. Similarly, a small (~50 μm) centroid offset between the upper and lower shell hemispheres seen in the first shot appears to be indicative of the laser quad energies. Overall, the implosion trajectories agreed with pre-shot predictions of bangtime. The second shot incorporated an 80 μm wide, 10 μm deep depression encircling the equator of the capsule. This engineered feature was imposed to test our capability to predict the effect of high-mode features on yield and mix. A predicted yield reduction factor of 3 was not observed. In collaboration with P. A. Bradley, J. A. Cobble, P. Hakel, S. C. Hsu, N. S. Krasheninnikova, G. A. Kyrala, G. R. Magelssen, T. J. Murphy, K. A. Obrey, R. C. Shah, I. L. Tregillis and F. J. Wysocki of Los Alamos National Laboratory; M. Marinak, R. Wallace, T. Parham, M. Cowan, S. Glenn, R. Benedetti and the NIF Operations Team of Lawrence Livermore National Laboratory; R. S. Craxton and P. W. McKenty of the Univ. Rochester; P. Fitzsimmons and A. Nikroo of General Atomics; H. Rinderknecht, M. Rosenberg, and M. G. Johnson of MIT. Work supported by US DOE/NNSA, performed at LANL, operated by LANS LLC under contract DE-AC52-06NA25396.
Generic decoding of seen and imagined objects using hierarchical visual features.
Horikawa, Tomoyasu; Kamitani, Yukiyasu
2017-05-22
Object recognition is a key function in both human and machine vision. While brain decoding of seen and imagined objects has been achieved, the prediction is limited to training examples. We present a decoding approach for arbitrary objects using the machine vision principle that an object category is represented by a set of features rendered invariant through hierarchical processing. We show that visual features, including those derived from a deep convolutional neural network, can be predicted from fMRI patterns, and that greater accuracy is achieved for low-/high-level features with lower-/higher-level visual areas, respectively. Predicted features are used to identify seen/imagined object categories (extending beyond decoder training) from a set of computed features for numerous object images. Furthermore, decoding of imagined objects reveals progressive recruitment of higher-to-lower visual representations. Our results demonstrate a homology between human and machine vision and its utility for brain-based information retrieval.
HIV-1 protease cleavage site prediction based on two-stage feature selection method.
Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong
2013-03-01
Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. An accurate, robust, and rapid method for predicting the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a genetic algorithm. Thirty important biochemical features were found based on a jackknife test from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features, the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, with increased accuracy over the original dataset by 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.
OnTheFly: a database of Drosophila melanogaster transcription factors and their binding sites.
Shazman, Shula; Lee, Hunjoong; Socol, Yakov; Mann, Richard S; Honig, Barry
2014-01-01
We present OnTheFly (http://bhapp.c2b2.columbia.edu/OnTheFly/index.php), a database comprising a systematic collection of transcription factors (TFs) of Drosophila melanogaster and their DNA-binding sites. TFs predicted in the Drosophila melanogaster genome are annotated and classified and their structures, obtained via experiment or homology models, are provided. All known preferred TF DNA-binding sites obtained from the B1H, DNase I and SELEX methodologies are presented. DNA shape parameters predicted for these sites are obtained from a high throughput server or from crystal structures of protein-DNA complexes where available. An important feature of the database is that all DNA-binding domains and their binding sites are fully annotated in a eukaryote using structural criteria and evolutionary homology. OnTheFly thus provides a comprehensive view of TFs and their binding sites that will be a valuable resource for deciphering non-coding regulatory DNA.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tan, Shan; Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan; Kligerman, Seth
2013-04-01
Purpose: To extract and study comprehensive spatial-temporal 18F-labeled fluorodeoxyglucose ([18F]FDG) positron emission tomography (PET) features for the prediction of pathologic tumor response to neoadjuvant chemoradiation therapy (CRT) in esophageal cancer. Methods and Materials: Twenty patients with esophageal cancer were treated with trimodal therapy (CRT plus surgery) and underwent [18F]FDG-PET/CT scans both before (pre-CRT) and after (post-CRT) CRT. The 2 scans were rigidly registered. A tumor volume was semiautomatically delineated using a threshold standardized uptake value (SUV) of ≥2.5, followed by manual editing. Comprehensive features were extracted to characterize SUV intensity distribution, spatial patterns (texture), tumor geometry, and associated changes resulting from CRT. The usefulness of each feature in predicting pathologic tumor response to CRT was evaluated using the area under the receiver operating characteristic curve (AUC) value. Results: The best traditional response measure was decline in maximum SUV (SUVmax; AUC, 0.76). Two new intensity features, decline in mean SUV (SUVmean) and skewness, and 3 texture features (inertia, correlation, and cluster prominence) were found to be significant predictors with AUC values ≥0.76. According to these features, a tumor was more likely to be a responder when the SUVmean decline was larger, when there were relatively fewer voxels with higher SUV values pre-CRT, or when [18F]FDG uptake post-CRT was relatively homogeneous. All of the most accurate predictive features were extracted from the entire tumor rather than from the most active part of the tumor. For SUV intensity features and tumor size features, changes were more predictive than pre- or post-CRT assessment alone. Conclusion: Spatial-temporal [18F]FDG-PET features were found to be useful predictors of pathologic tumor response to neoadjuvant CRT in esophageal cancer.
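The intensity features named in this abstract (decline in SUVmax, decline in SUVmean, and skewness) can be sketched from lists of voxel SUVs. This is an illustrative computation under assumptions: the abstract does not give the exact formulas, so the fractional-decline definition, the population skewness formula, the function names, and the toy voxel values are all hypothetical, and tumor delineation is taken as already done.

```python
from statistics import fmean
from typing import Dict, Sequence


def skewness(values: Sequence[float]) -> float:
    """Population skewness: third central moment over variance^(3/2).

    Undefined (division by zero) when all values are identical.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def fractional_decline(pre: float, post: float) -> float:
    """Decline from pre to post as a fraction of the pre-treatment value."""
    return (pre - post) / pre


def suv_change_features(pre_voxels: Sequence[float],
                        post_voxels: Sequence[float]) -> Dict[str, float]:
    """Intensity features from SUVs of voxels inside the delineated tumor."""
    return {
        "suv_max_decline": fractional_decline(max(pre_voxels), max(post_voxels)),
        "suv_mean_decline": fractional_decline(fmean(pre_voxels), fmean(post_voxels)),
        "pre_skewness": skewness(pre_voxels),
    }


# Hypothetical voxel SUVs before and after CRT for one tumor.
print(suv_change_features([3.0, 4.0, 5.0, 8.0], [2.0, 2.0, 3.0, 4.0]))
```

A positive pre-treatment skewness here corresponds to a distribution with a few high-uptake voxels in the tail, which is one way to read the abstract's "relatively fewer voxels with higher SUV values".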
Structural features based genome-wide characterization and prediction of nucleosome organization
2012-01-01
Background Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. 
Conclusions Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online. PMID:22449207
Simple predictive model for Early Childhood Caries of Chilean children.
Fierro Monti, Claudia; Pérez Flores, M; Brunotto, M
2014-01-01
Early Childhood Caries (ECC), in both industrialized and developing countries, is the most prevalent chronic disease in childhood and, despite being preventable, it is still a public health problem that mainly affects populations considered vulnerable. The purpose of this study was to obtain a simple predictive model based on risk factors for improving public health strategies for ECC prevention for 3-5 year-old children. Clinical, environmental and psycho-socio-cultural data of children (n=250) aged 3-5 years, of both genders, from the Health Centers, were recorded in a Clinical History and Behavioral Survey. 24% of children presented behavioral problems (bizarre behavior was the main feature observed as behavioral problems). The variables associated with dmf ?4 were: difficult child temperament (OR=2.43 [1.34, 4.40]) and home stress (OR=3.14 [1.54, 6.41]). It was observed that the model for male gender has higher accuracy for ECC (AUC= 78%, p-value=0.000) than others. Based on the results, we proposed a model where oral hygiene, sugar intake, male gender, and difficult temperament are main factors for predicting ECC. This model could be a promising tool for cost-effective early childhood caries control.
The movement ecology and dynamics of plant communities in fragmented landscapes
Damschen, Ellen I.; Brudvig, Lars A.; Haddad, Nick M.; ...
2008-12-05
A conceptual model of movement ecology has recently been advanced to explain all movement by considering the interaction of four elements: internal state, motion capacity, navigation capacities, and external factors. We modified this framework to generate predictions for species richness dynamics of fragmented plant communities and tested them in experimental landscapes across a 7-year time series. We found that two external factors, dispersal vectors and habitat features, affected species colonization and recolonization in habitat fragments and their effects varied and depended on motion capacity. Bird-dispersed species richness showed connectivity effects that reached an asymptote over time, but no edge effects, whereas wind-dispersed species richness showed steadily accumulating edge and connectivity effects, with no indication of an asymptote. Unassisted species also showed increasing differences caused by connectivity over time, whereas edges had no effect. Our limited use of proxies for movement ecology (e.g., dispersal mode as a proxy for motion capacity) resulted in moderate predictive power for communities and, in some cases, highlighted the importance of a more complete understanding of movement ecology for predicting how landscape conservation actions affect plant community dynamics.
Kamışlı, Songül; Dil, Satı; Daştan, Leyla; Eni, Nurhayat
2016-01-01
In this study, we investigated whether liberty-restricting and other factors can predict internalized stigma among psychiatric inpatients and outpatients. The study sample comprised 129 inpatients, admitted at least once to a psychiatry ward, and 100 outpatients receiving psychiatric treatment who had never been hospitalized. In addition to demographic and clinical features, patients were evaluated for perceived deprivation of liberty and internalized stigma levels. Patients stated that their liberty was restrained mostly due to involuntary treatment, communication problems, side effects of medical treatment and inability to choose their treatment team. Regression analysis showed that internalized stigma was predicted by perceived deprivation of liberty, marital status and number of admissions to ward. Stigma was related to marital status and admissions to the psychiatry ward. Perceived deprivation of liberty predicts stigma regardless of the disease severity. Conclusion: Perception of stigma leads to self-isolation, behavioral avoidance and refusal of aid-seeking. Our study indicated that perceived deprivation of liberty is one of the most important factors that lead to increased stigma. Based on our findings, we can say that as patients experience less perceived deprivation of liberty, they would have less stigma and thus, their compliance would increase.
How Children Use Examples to Make Conditional Predictions
ERIC Educational Resources Information Center
Kalish, Charles W.
2010-01-01
Two experiments explored children's and adults' use of examples to make conditional predictions. In Experiment 1 adults (N = 20) but not 4-year-olds (N = 21) or 8-year-olds (N = 18) distinguished predictable from unpredictable features when features were partially correlated (e.g., necessary but not sufficient). Children did make reliable…
Novel method to predict body weight in children based on age and morphological facial features.
Huang, Ziyin; Barrett, Jeffrey S; Barrett, Kyle; Barrett, Ryan; Ng, Chee M
2015-04-01
A novel approach to predicting the body weight of children based on age and morphological facial features using a three-layer feed-forward artificial neural network (ANN) model is reported. The model takes in four parameters, including age-based CDC-inferred median body weight and three facial feature distances measured from digital facial images. In this study, thirty-nine volunteer subjects with age ranging from 6-18 years old and BW ranging from 18.6-96.4 kg were used for model development and validation. The final model has a mean prediction error of 0.48, a mean squared error of 18.43, and a coefficient of correlation of 0.94. The model shows significant improvement in prediction accuracy over several age-based body weight prediction methods. Combined with a facial recognition algorithm that can detect, extract and measure the facial features used in this study, mobile applications that incorporate this body weight prediction method may be developed for clinical investigations where access to scales is limited. © 2014, The American College of Clinical Pharmacology.
Adeniyi, D A; Wei, Z; Yang, Y
2018-01-30
A wealth of data is available within the health care system; however, effective analysis tools for exploring the hidden patterns in these datasets are lacking. To alleviate this limitation, this paper proposes a simple but promising hybrid predictive model by suitably combining the Chi-square distance measurement with the case-based reasoning technique. The study presents the realization of an automated risk calculator and death prediction in some life-threatening ailments using the Chi-square case-based reasoning (χ² CBR) model. The proposed predictive engine is capable of reducing runtime and speeding up execution through the use of the critical χ² distribution value. This work also showcases the development of a novel feature selection method referred to as the frequent item based rule (FIBR) method. This FIBR method is used for selecting the best features for the proposed χ² CBR model at the preprocessing stage of the predictive procedures. The implementation of the proposed risk calculator is achieved through the use of an in-house developed PHP program experimented with XAMP/Apache HTTP server as hosting server. The process of data acquisition and case-based development is implemented using the MySQL application. Performance comparison between our system, the NBY, the ED-KNN, the ANN, the SVM, the Random Forest and the traditional CBR techniques shows that the quality of predictions produced by our system outperformed the baseline methods studied. The result of our experiment shows that the precision rate and predictive quality of our system in most cases are equal to or greater than 70%. Our result also shows that the proposed system executes faster than the baseline methods studied. Therefore, the proposed risk calculator is capable of providing useful, consistent, faster, accurate and efficient risk level prediction to both the patients and the physicians at any time, online and on a real-time basis.
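The combination of a Chi-square distance with case retrieval can be illustrated with a small sketch. The abstract does not give the exact formula or say how the critical value is applied, so everything below is an assumption: the symmetric Chi-square distance between non-negative feature vectors, a nearest-case lookup, and a cut-off that rejects cases farther away than a chosen critical value. The names `chi_square_distance` and `retrieve`, the toy case base, and the use of 7.815 (the 0.95 quantile of the χ² distribution with 3 degrees of freedom) are illustrative only.

```python
def chi_square_distance(x, y):
    """Symmetric Chi-square distance between two non-negative feature vectors."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:                      # skip features that are zero in both cases
            total += (a - b) ** 2 / (a + b)
    return total


def retrieve(query, case_base, threshold):
    """Return (outcome, distance) of the nearest stored case, or None.

    case_base: list of (feature_vector, outcome) pairs.
    threshold: hypothetical cut-off; cases farther than this are ignored.
    """
    best = None
    for features, outcome in case_base:
        d = chi_square_distance(query, features)
        if d <= threshold and (best is None or d < best[1]):
            best = (outcome, d)
    return best


# Hypothetical case base with three coded risk factors per case.
cases = [([1, 0, 3], "high"), ([4, 2, 0], "low")]
print(retrieve([1, 1, 3], cases, threshold=7.815))
```

A cut-off of this kind lets the retrieval step discard dissimilar cases early, which is one plausible reading of how a critical value could shorten runtime.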
Mohebbi, Maryam; Ghassemian, Hassan; Asl, Babak Mohammadzadeh
2011-05-01
This paper aims to propose an effective paroxysmal atrial fibrillation (PAF) predictor which is based on the analysis of the heart rate variability (HRV) signal. Predicting the onset of PAF, based on non-invasive techniques, is clinically important and can be invaluable in order to avoid useless therapeutic interventions and to minimize the risks for the patients. This method consists of four steps: preprocessing, feature extraction, feature reduction, and classification. In the first step, the QRS complexes are detected from the electrocardiogram (ECG) signal and then the HRV signal is extracted. In the next step, the recurrence plot (RP) of the HRV signal is obtained and six features are extracted to characterize the basic patterns of the RP. These features consist of length of longest diagonal segments, average length of the diagonal lines, entropy, trapping time, length of longest vertical line, and recurrence trend. In the third step, these features are reduced to three features by the linear discriminant analysis (LDA) technique. Using LDA not only reduces the number of the input features, but also increases the classification accuracy by selecting the most discriminating features. Finally, a support vector machine-based classifier is used to classify the HRV signals. The performance of the proposed method in prediction of PAF episodes was evaluated using the Atrial Fibrillation Prediction Database, which consists of both 30-minute ECG recordings that end just prior to the onset of PAF and segments at least 45 min distant from any PAF events. The obtained sensitivity, specificity, and positive predictivity were 96.55%, 100%, and 100%, respectively.
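The feature-extraction step can be sketched for two of the six recurrence-plot features named above, the longest diagonal line and the average diagonal line length. This is a simplified illustration, not the authors' implementation: it assumes no time-delay embedding (each sample is compared directly with every other sample), a fixed distance threshold `eps`, and a minimum line length of 2 for the average. The function names and the sample RR intervals are hypothetical.

```python
def recurrence_matrix(signal, eps):
    """rp[i][j] is 1 when samples i and j lie within eps of each other."""
    n = len(signal)
    return [[1 if abs(signal[i] - signal[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def diagonal_line_lengths(rp):
    """Lengths of runs of 1s on each diagonal above the main diagonal."""
    n = len(rp)
    lengths = []
    for k in range(1, n):                  # k-th diagonal above the main one
        run = 0
        for i in range(n - k):
            if rp[i][i + k]:
                run += 1
            else:
                if run:
                    lengths.append(run)
                run = 0
        if run:                            # close a run that reaches the border
            lengths.append(run)
    return lengths


def diagonal_features(signal, eps, min_len=2):
    """Return (longest diagonal line, mean length of lines >= min_len)."""
    lines = [l for l in diagonal_line_lengths(recurrence_matrix(signal, eps))
             if l >= min_len]
    if not lines:
        return 0, 0.0
    return max(lines), sum(lines) / len(lines)


# Hypothetical RR intervals in seconds.
rr = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.82, 0.79]
print(diagonal_features(rr, eps=0.015))
```

Long diagonal lines indicate stretches where the signal repeats an earlier trajectory, which is why these lengths are commonly used as measures of determinism.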
Prior probability and feature predictability interactively bias perceptual decisions
Dunovan, Kyle E.; Tremel, Joshua J.; Wheeler, Mark E.
2014-01-01
Anticipating a forthcoming sensory experience facilitates perception for expected stimuli but also hinders perception for less likely alternatives. Recent neuroimaging studies suggest that expectation biases arise from feature-level predictions that enhance early sensory representations and facilitate evidence accumulation for contextually probable stimuli while suppressing alternatives. Reasonably then, the extent to which prior knowledge biases subsequent sensory processing should depend on the precision of expectations at the feature level as well as the degree to which expected features match those of an observed stimulus. In the present study we investigated how these two sources of uncertainty modulated pre- and post-stimulus bias mechanisms in the drift-diffusion model during a probabilistic face/house discrimination task. We tested several plausible models of choice bias, concluding that predictive cues led to a bias in both the starting-point and rate of evidence accumulation favoring the more probable stimulus category. We further tested the hypotheses that prior bias in the starting-point was conditional on the feature-level uncertainty of category expectations and that dynamic bias in the drift-rate was modulated by the match between expected and observed stimulus features. Starting-point estimates suggested that subjects formed a constant prior bias in favor of the face category, which exhibits less feature-level variability, that was strengthened or weakened by trial-wise predictive cues. Furthermore, we found that the gain on face/house evidence was increased for stimuli with less ambiguous features and that this relationship was enhanced by valid category expectations. These findings offer new evidence that bridges psychological models of decision-making with recent predictive coding theories of perception. PMID:24978303
Extracting physicochemical features to predict protein secondary structure.
Huang, Yin-Fu; Chen, Shu-Ying
2013-01-01
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure would exhibit better performances.
Extracting Physicochemical Features to Predict Protein Secondary Structure
Chen, Shu-Ying
2013-01-01
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features including conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For all the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features in predicting protein secondary structure would exhibit better performances. PMID:23766688
Automated analysis of free speech predicts psychosis onset in high-risk youths
Bedi, Gillinder; Carrillo, Facundo; Cecchi, Guillermo A; Slezak, Diego Fernández; Sigman, Mariano; Mota, Natália B; Ribeiro, Sidarta; Javitt, Daniel C; Copelli, Mauro; Corcoran, Cheryl M
2015-01-01
Background/Objectives: Psychiatry lacks the objective clinical tests routinely used in other specializations. Novel computerized methods to characterize complex behaviors such as speech could be used to identify and predict psychiatric illness in individuals. AIMS: In this proof-of-principle study, our aim was to test automated speech analyses combined with Machine Learning to predict later psychosis onset in youths at clinical high-risk (CHR) for psychosis. Methods: Thirty-four CHR youths (11 females) had baseline interviews and were assessed quarterly for up to 2.5 years; five transitioned to psychosis. Using automated analysis, transcripts of interviews were evaluated for semantic and syntactic features predicting later psychosis onset. Speech features were fed into a convex hull classification algorithm with leave-one-subject-out cross-validation to assess their predictive value for psychosis outcome. The canonical correlation between the speech features and prodromal symptom ratings was computed. Results: Derived speech features included a Latent Semantic Analysis measure of semantic coherence and two syntactic markers of speech complexity: maximum phrase length and use of determiners (e.g., which). These speech features predicted later psychosis development with 100% accuracy, outperforming classification from clinical interviews. Speech features were significantly correlated with prodromal symptoms. Conclusions: Findings support the utility of automated speech analysis to measure subtle, clinically relevant mental state changes in emergent psychosis. Recent developments in computer science, including natural language processing, could provide the foundation for future development of objective clinical tests for psychiatry. PMID:27336038
Consciousness isn't all-or-none: Evidence for partial awareness during the attentional blink.
Elliott, James C; Baird, Benjamin; Giesbrecht, Barry
2016-02-01
Alternative views of the nature of consciousness posit that awareness of an object is either an all-or-none phenomenon or that awareness can be partial, occurring independently for different levels of representation. The all-or-none hypothesis predicts that when one feature of an object is identified, all other features should be consciously accessible. The partial awareness hypothesis predicts that one feature may reach consciousness while others do not. These competing predictions were tested in two experiments that presented two targets within a central stream of letters. We used the attentional blink evoked by the first target to assess consciousness for two different features of the second target. The results provide evidence that there can be a severe impairment in conscious access to one feature even when another feature is accurately reported. This behavioral evidence supports the partial awareness hypothesis, showing that consciousness of different features of the same object can be dissociated. Copyright © 2015 Elsevier Inc. All rights reserved.
Classification of speech dysfluencies using LPC based parameterization techniques.
Hariharan, M; Chee, Lim Sin; Ai, Ooi Chia; Yaacob, Sazali
2012-06-01
The goal of this paper is to discuss and compare three feature extraction methods: Linear Predictive Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Weighted Linear Prediction Cepstral Coefficients (WLPCC) for recognizing stuttered events. Speech samples from the University College London Archive of Stuttered Speech (UCLASS) were used for our analysis. The stuttered events were identified through manual segmentation and were used for feature extraction. Two simple classifiers, namely k-nearest neighbour (kNN) and Linear Discriminant Analysis (LDA), were employed for speech dysfluency classification. A conventional validation method was used for testing the reliability of the classifier results. The effects of frame length, percentage of overlap, the value of ã in a first-order pre-emphasizer, and the order p are discussed. The classification accuracy for speech dysfluencies was found to improve when statistical normalization was applied before feature extraction. The experimental investigation showed that LPC, LPCC and WLPCC features can be used for identifying stuttered events and that WLPCC features slightly outperform LPCC and LPC features.
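Two of the parameters studied in this abstract, the first-order pre-emphasis coefficient and the prediction order p, appear directly in the standard LPC computation. The sketch below shows a textbook autocorrelation-method LPC with the Levinson-Durbin recursion, offered as an illustration rather than the authors' code. The pre-emphasis filter is y[n] = x[n] − α·x[n−1], and the predictor is x̂[n] = Σ a_k·x[n−k] for k = 1..p. The names `pre_emphasis`, `autocorr` and `lpc`, the value 0.95, and the sample frame are assumptions.

```python
def pre_emphasis(x, alpha=0.95):
    """First-order pre-emphasizer: y[n] = x[n] - alpha * x[n-1]."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """Return (predictor coefficients a[1..order], residual energy)."""
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)                # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:                     # silent frame: nothing to predict
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k                       # reflection coefficient
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                 # update prediction error energy
    return a[1:], err


# Hypothetical short frame of speech samples.
frame = pre_emphasis([0.0, 0.4, 0.9, 0.7, 0.1, -0.5, -0.8, -0.4, 0.2, 0.6])
coeffs, residual = lpc(frame, order=2)
print(coeffs, residual)
```

In practice each frame is usually windowed (for example with a Hamming window) before the autocorrelation is computed; that step is omitted here for brevity.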
Baert, Isabel A C; Meeus, Mira; Mahmoudian, Armaghan; Luyten, Frank P; Nijs, Jo; Verschueren, Sabine M P
2017-09-01
The aim of this study was to examine the relationship of psychosocial factors, namely, pain catastrophizing, kinesiophobia, and maladaptive coping strategies, with muscle strength, pain, and physical performance in patients with knee osteoarthritis (OA)-related symptoms. A total of 109 women (64 with knee OA-related symptoms) with a mean age of 65.4 years (49-81 years) were recruited for this study. Psychosocial factors were quantified by the Pain Catastrophizing Scale, Tampa Scale for Kinesiophobia, and Pain Coping Inventory. Clinical features were assessed using isometric and isokinetic knee muscle strength measurements, visual analog scale, Western Ontario and McMaster Universities Osteoarthritis Index, and functional tests. Associations were examined using correlation and regression analysis. In knee OA patients, pain catastrophizing, kinesiophobia, and coping strategy explained a significant proportion of the variability in isometric knee extension and flexion strength (6.3%-9.2%), accounting for more overall variability than some demographic and medical status variables combined. Psychosocial factors were not significant independent predictors of isokinetic strength, knee pain, or physical performance. In understanding clinical features related to knee OA, such as muscle weakness, pain catastrophizing, kinesiophobia, and coping strategy might offer something additional beyond what might be explained by traditional factors, underscoring the importance of a biopsychosocial approach in knee OA management. Further research on individual patient characteristics that mediate the effects of psychosocial factors is, however, required in order to create opportunities for more targeted, personalized treatment for knee OA.
Responses to single photons in visual cells of Limulus
Borsellino, A.; Fuortes, M. G. F.
1968-01-01
1. A system proposed in a previous article as a model of responses of visual cells has been analysed with the purpose of predicting the features of responses to single absorbed photons. 2. As a result of this analysis, the stochastic variability of responses has been expressed as a function of the amplification of the system. 3. The theoretical predictions have been compared to the results obtained by recording electrical responses of visual cells of Limulus to flashes delivering only few photons. 4. Experimental responses to single photons have been tentatively identified and it was shown that the stochastic variability of these responses is similar to that predicted for a model with a multiplication factor of at least twenty-five. 5. These results lead to the conclusion that the processes responsible for visual responses incorporate some form of amplification. This conclusion may prove useful for identifying the physical mechanisms underlying the transducer action of visual cells. PMID:5664231
Modelling biological invasions: species traits, species interactions, and habitat heterogeneity.
Cannas, Sergio A; Marco, Diana E; Páez, Sergio A
2003-05-01
In this paper we explore the integration of different factors to understand, predict and control ecological invasions, through a purpose-built general cellular automaton model. The model includes life history traits of several species in a modular structure of multiple interacting cellular automata. We performed simulations using field values corresponding to the exotic Gleditsia triacanthos and native co-dominant trees in a montane area. The presence of a G. triacanthos juvenile bank was a determinant condition for invasion success. The main parameters influencing invasion velocity were mean seed dispersal distance and minimum reproductive age. Seed production had a small influence on the invasion velocity. Velocities predicted by the model agreed well with estimations from field data. Predicted values of population density matched field values closely. The modular structure of the model, the explicit interaction between the invader and the native species, and the simplicity of parameters and transition rules are novel features of the model.
A Canonical Ensemble Correlation Prediction Model for Seasonal Precipitation Anomaly
NASA Technical Reports Server (NTRS)
Shen, Samuel S. P.; Lau, William K. M.; Kim, Kyu-Myong; Li, Guilong
2001-01-01
This report describes an optimal ensemble forecasting model for seasonal precipitation and its error estimation. Each individual forecast is based on the canonical correlation analysis (CCA) in the spectral spaces whose bases are empirical orthogonal functions (EOF). The optimal weights in the ensemble forecasting crucially depend on the mean square error of each individual forecast. An estimate of the mean square error of a CCA prediction is made also using the spectral method. The error is decomposed onto EOFs of the predictand and decreases linearly according to the correlation between the predictor and predictand. This new CCA model includes the following features: (1) the use of area-factor, (2) the estimation of prediction error, and (3) the optimal ensemble of multiple forecasts. The new CCA model is applied to the seasonal forecasting of the United States precipitation field. The predictor is the sea surface temperature.
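The statement that the optimal ensemble weights depend on each forecast's mean square error can be illustrated with the simplest case. The sketch below assumes unbiased forecasts with mutually uncorrelated errors, for which weights proportional to 1/MSE minimize the MSE of the weighted average; the report's actual weighting scheme may differ. The function names and numbers are hypothetical.

```python
def inverse_mse_weights(mse):
    """Weights proportional to 1/MSE, normalized to sum to 1."""
    inv = [1.0 / m for m in mse]
    total = sum(inv)
    return [v / total for v in inv]


def ensemble_forecast(forecasts, mse):
    """Weighted average of individual forecasts."""
    weights = inverse_mse_weights(mse)
    return sum(w * f for w, f in zip(weights, forecasts))


def ensemble_mse(mse):
    """MSE of the weighted average under the uncorrelated-error assumption."""
    return 1.0 / sum(1.0 / m for m in mse)


# Hypothetical precipitation anomaly forecasts (mm/day) and their error estimates.
forecasts = [10.0, 20.0]
mse = [1.0, 4.0]
print(inverse_mse_weights(mse), ensemble_forecast(forecasts, mse), ensemble_mse(mse))
```

Under these assumptions the ensemble error is never larger than the smallest individual error, which is the motivation for estimating each forecast's MSE before combining them.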
Communication: Theoretical prediction of free-energy landscapes for complex self-assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacobs, William M.; Reinhardt, Aleks; Frenkel, Daan
2015-01-14
We present a technique for calculating free-energy profiles for the nucleation of multicomponent structures that contain as many species as building blocks. We find that a key factor is the topology of the graph describing the connectivity of the target assembly. By considering the designed interactions separately from weaker, incidental interactions, our approach yields predictions for the equilibrium yield and nucleation barriers. These predictions are in good agreement with corresponding Monte Carlo simulations. We show that a few fundamental properties of the connectivity graph determine the most prominent features of the assembly thermodynamics. Surprisingly, we find that polydispersity in the strengths of the designed interactions stabilizes intermediate structures and can be used to sculpt the free-energy landscape for self-assembly. Finally, we demonstrate that weak incidental interactions can preclude assembly at equilibrium due to the combinatorial possibilities for incorrect association.
Standard Clock in primordial density perturbations and cosmic microwave background
NASA Astrophysics Data System (ADS)
Chen, Xingang; Namjoo, Mohammad Hossein
2014-12-01
Standard Clocks in the primordial epoch leave a special type of feature in the primordial perturbations, which can be used to directly measure the scale factor of the primordial universe as a function of time, a(t), thus discriminating between inflation and alternatives. We have started to search for such signals in the Planck 2013 data using the key predictions of the Standard Clock. In this Letter, we summarize the key predictions of the Standard Clock and present an interesting candidate example in Planck 2013 data. Motivated by this candidate, we construct and compute full Standard Clock models and use the more complete prediction to make a more extensive comparison with data. Although this candidate is not yet statistically significant, we use it to illustrate how Standard Clocks appear in the Cosmic Microwave Background (CMB) and how they can be further tested by future data. We also use it to motivate more detailed theoretical model building.
Feature Biases in Early Word Learning: Network Distinctiveness Predicts Age of Acquisition
ERIC Educational Resources Information Center
Engelthaler, Tomas; Hills, Thomas T.
2017-01-01
Do properties of a word's features influence the order of its acquisition in early word learning? Combining the principles of mutual exclusivity and shape bias, the present work takes a network analysis approach to understanding how feature distinctiveness predicts the order of early word learning. Distance networks were built from nouns with edge…
ERIC Educational Resources Information Center
Cree, George S.; McNorgan, Chris; McRae, Ken
2006-01-01
The authors present data from 2 feature verification experiments designed to determine whether distinctive features have a privileged status in the computation of word meaning. They use an attractor-based connectionist model of semantic memory to derive predictions for the experiments. Contrary to central predictions of the conceptual structure…
Yasuda, Akihito; Onuki, Yoshinori; Obata, Yasuko; Yamamoto, Rie; Takayama, Kozo
2013-01-01
The "quality by design" concept in pharmaceutical formulation development requires the establishment of a science-based rationale and a design space. We integrated thin-plate spline (TPS) interpolation and Kohonen's self-organizing map (SOM) to visualize the latent structure underlying causal factors and pharmaceutical responses. As a model pharmaceutical product, theophylline tablets were prepared based on a standard formulation. The tensile strength, disintegration time, and stability of these variables were measured as response variables. These responses were predicted quantitatively based on nonlinear TPS. A large amount of data on these tablets was generated and classified into several clusters using an SOM. The experimental values of the responses were predicted with high accuracy, and the data generated for the tablets were classified into several distinct clusters. The SOM feature map allowed us to analyze the global and local correlations between causal factors and tablet characteristics. The results of this study suggest that increasing the proportion of microcrystalline cellulose (MCC) improved the tensile strength and the stability of tensile strength of these theophylline tablets. In addition, the proportion of MCC has an optimum value for disintegration time and stability of disintegration. Increasing the proportion of magnesium stearate extended disintegration time. Increasing the compression force improved tensile strength, but degraded the stability of disintegration. This technique provides a better understanding of the relationships between causal factors and pharmaceutical responses in theophylline tablet formulations.
Frank, Till D.; Carmody, Aimée M.; Kholodenko, Boris N.
2012-01-01
We derive a statistical model of transcriptional activation using equilibrium thermodynamics of chemical reactions. We examine to what extent this statistical model predicts synergy effects of cooperative activation of gene expression. We determine parameter domains in which greater-than-additive and less-than-additive effects are predicted for cooperative regulation by two activators. We show that the statistical approach can be used to identify different causes of synergistic greater-than-additive effects: nonlinearities of the thermostatistical transcriptional machinery and three-body interactions between RNA polymerase and two activators. In particular, our model-based analysis suggests that at low transcription factor concentrations cooperative activation cannot yield synergistic greater-than-additive effects, i.e., DNA transcription can only exhibit less-than-additive effects. Accordingly, transcriptional activity turns from synergistic greater-than-additive responses at relatively high transcription factor concentrations into less-than-additive responses at relatively low concentrations. In addition, two types of re-entrant phenomena are predicted. First, our analysis predicts that under particular circumstances transcriptional activity will feature a sequence of less-than-additive, greater-than-additive, and eventually less-than-additive effects when for fixed activator concentrations the regulatory impact of activators on the binding of RNA polymerase to the promoter increases from weak, to moderate, to strong. Second, for appropriate promoter conditions when activator concentrations are increased then the aforementioned re-entrant sequence of less-than-additive, greater-than-additive, and less-than-additive effects is predicted as well. 
Finally, our model-based analysis suggests that even for weak activators that individually induce only negligible increases in promoter activity, promoter activity can exhibit greater-than-additive responses when transcription factors and RNA polymerase interact by means of three-body interactions. Overall, we show that versatility of transcriptional activation is brought about by nonlinearities of transcriptional response functions and interactions between transcription factors, RNA polymerase and DNA. PMID:22506020
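The kind of equilibrium model described in this abstract can be sketched with a generic two-activator promoter. This is a standard thermodynamic occupancy model written under stated assumptions, not the authors' exact formulation: concentrations are expressed relative to their dissociation constants (`p`, `a1`, `a2`), each activator multiplies the statistical weight of polymerase-bound states by an interaction factor (`w1`, `w2`), an extra factor `w12` represents a three-body interaction, and the two activators do not interact with each other in the absence of polymerase. All names and numbers are hypothetical.

```python
def promoter_occupancy(p, a1=0.0, a2=0.0, w1=1.0, w2=1.0, w12=1.0):
    """Equilibrium probability that RNA polymerase is bound to the promoter."""
    # Statistical weight of all states without polymerase.
    z_off = (1.0 + a1) * (1.0 + a2)
    # Statistical weight of all states with polymerase bound.
    z_on = p * (1.0 + a1 * w1 + a2 * w2 + a1 * a2 * w1 * w2 * w12)
    return z_on / (z_on + z_off)


def synergy(p, a1, a2, w1, w2, w12=1.0):
    """Joint increase in occupancy minus the sum of the individual increases.

    Positive values indicate a greater-than-additive effect,
    negative values a less-than-additive effect.
    """
    base = promoter_occupancy(p)
    d1 = promoter_occupancy(p, a1=a1, w1=w1) - base
    d2 = promoter_occupancy(p, a2=a2, w2=w2) - base
    d12 = promoter_occupancy(p, a1, a2, w1, w2, w12) - base
    return d12 - (d1 + d2)


# Two activators that do nothing individually (w1 = w2 = 1) but
# recruit polymerase through a three-body interaction (w12 > 1).
print(synergy(p=0.01, a1=1.0, a2=1.0, w1=1.0, w2=1.0, w12=50.0))
```

Scanning `a1`, `a2` and the interaction factors with `synergy` is one way to map out parameter regions with greater-than-additive or less-than-additive responses in a toy model of this type.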
Zheng, Ce; Kurgan, Lukasz
2008-01-01
Background β-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences have been developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the use of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. 
Conclusion Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between β-turns and non-β-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at . PMID:18847492
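The three scores reported above can all be computed from a 2x2 confusion table. The following is a minimal Python sketch, assuming the definitions commonly used in the β-turn prediction literature: Qtotal as overall per-residue accuracy, Qpredicted as the share of predicted β-turn residues that are correct, and MCC as the Matthews correlation coefficient. The function names and the example counts are illustrative and are not taken from the paper.

```python
import math


def q_total(tp, tn, fp, fn):
    """Percentage of all residues that were classified correctly."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def q_predicted(tp, tn, fp, fn):
    """Percentage of residues predicted as beta-turn that really are beta-turn."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return 0.0  # nothing was predicted as beta-turn
    return 100.0 * tp / predicted_positive


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a marginal total is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the study).
    counts = dict(tp=8, tn=72, fp=12, fn=8)
    print(q_total(**counts), q_predicted(**counts), mcc(**counts))
```

Because fewer false positives raise both Qpredicted and MCC while leaving the true-positive count untouched, these two scores are the ones most sensitive to the false-positive reduction the conclusion describes.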
Klinker, Matthew W.; Marklein, Ross A.; Lo Surdo, Jessica L.; Wei, Cheng-Hong
2017-01-01
Human mesenchymal stromal cell (MSC) lines can vary significantly in their functional characteristics, and the effectiveness of MSC-based therapeutics may be realized by finding predictive features associated with MSC function. To identify features associated with immunosuppressive capacity in MSCs, we developed a robust in vitro assay that uses principal-component analysis to integrate multidimensional flow cytometry data into a single measurement of MSC-mediated inhibition of T-cell activation. We used this assay to correlate single-cell morphological data with overall immunosuppressive capacity in a cohort of MSC lines derived from different donors and manufacturing conditions. MSC morphology after IFN-γ stimulation significantly correlated with immunosuppressive capacity and accurately predicted the immunosuppressive capacity of MSC lines in a validation cohort. IFN-γ enhanced the immunosuppressive capacity of all MSC lines, and morphology predicted the magnitude of IFN-γ–enhanced immunosuppressive activity. Together, these data identify MSC morphology as a predictive feature of MSC immunosuppressive function. PMID:28283659
Sun, Xin; Young, Jennifer; Liu, Jeng-Hung; Newman, David
2018-06-01
The objective of this project was to develop a computer vision system (CVS) for objective measurement of pork loin under industry speed requirements. Color images of pork loin samples were acquired using a CVS. Subjective color and marbling scores were determined according to the National Pork Board standards by a trained evaluator. Instrument color measurement and crude fat percentage were used as control measurements. Image features (18 color features; 1 marbling feature; 88 texture features) were extracted from whole pork loin color images. An artificial intelligence prediction model (support vector machine) was established for pork color and marbling quality grades. The results showed that CVS with support vector machine modeling reached the highest prediction accuracy of 92.5% for measured pork color score and 75.0% for measured pork marbling score. This research shows that the proposed artificial intelligence prediction model with CVS can provide an effective tool for predicting color and marbling in the pork industry at online speeds. Copyright © 2018 Elsevier Ltd. All rights reserved.
Wiebe, Nicholas J P; Meyer, Irmtraud M
2010-06-24
The prediction of functional RNA structures has attracted increased interest, as it allows us to study the potential functional roles of many genes. RNA structure prediction methods, however, assume that there is a unique functional RNA structure and also do not predict functional features required for in vivo folding. In order to understand how functional RNA structures form in vivo, we require sophisticated experiments or reliable prediction methods. So far, there exist only a few experimentally validated transient RNA structures. On the computational side, there exist several computer programs which aim to predict the co-transcriptional folding pathway in vivo, but these make a range of simplifying assumptions and do not capture all features known to influence RNA folding in vivo. We want to investigate if evolutionarily related RNA genes fold in a similar way in vivo. To this end, we have developed a new computational method, Transat, which detects conserved helices of high statistical significance. We introduce the method, present a comprehensive performance evaluation and show that Transat is able to predict the structural features of known reference structures including pseudo-knotted ones as well as those of known alternative structural configurations. Transat can also identify unstructured sub-sequences bound by other molecules and provides evidence for new helices which may define folding pathways, supporting the notion that homologous RNA sequences not only assume a similar reference RNA structure, but also fold similarly. Finally, we show that the structural features predicted by Transat differ from those assuming thermodynamic equilibrium. Unlike the existing methods for predicting folding pathways, our method works in a comparative way.
This has the disadvantage of not being able to predict features as a function of time, but has the considerable advantage of highlighting conserved features and of not requiring a detailed knowledge of the cellular environment.
Niederkrotenthaler, Thomas; Arendt, Florian; Till, Benedikt
2015-01-01
Research on factors that influence the intention to read suicide awareness material is lacking. To identify how social and state similarities between the featured protagonist of a suicide awareness story and the audience impact on the intent to read similar stories. Laboratory experiment with n = 104 students. Participants were randomly assigned to study groups. In the first group, the role model provided his personal story of crisis and was a student. In the second group, the content was identical but the model was socially dissimilar. The third group read about a topic unrelated to suicide. Depression, identification, and exposure intent were measured after the experiment. Conditional process analysis was carried out. In the group featuring a once-suicidal role model with high social similarity, depression in the audience increased the intention to read similar material in the future via identification with the role model; 82% of individuals wanted to read similar material in the future, but only 50% wanted to do so in the group featuring a dissimilar person. Exposure intention increases via identification when role model and audience characteristics align regarding social traits and the experience of depression. These factors are relevant when developing campaigns targeting individuals with stories of recovery.
The pathology of the foreign body reaction against biomaterials.
Klopfleisch, R; Jung, F
2017-03-01
The healing process after implantation of biomaterials involves the interaction of many contributing factors. Besides their in vivo functionality, biomaterials also require characteristics that allow their integration into the designated tissue without eliciting an overshooting foreign body reaction (FBR). The targeted design of biomaterials with these features, thus, needs understanding of the molecular mechanisms of the FBR. Much effort has been put into research on the interaction of engineered materials and the host tissue. This elucidated many aspects of the five FBR phases, that is protein adsorption, acute inflammation, chronic inflammation, foreign body giant cell formation, and fibrous capsule formation. However, in practice, it is still difficult to predict the response against a newly designed biomaterial purely based on the knowledge of its physical-chemical surface features. This insufficient knowledge leads to a high number of factors potentially influencing the FBR, which have to be analyzed in complex animal experiments including appropriate data-based sample sizes. This review is focused on the current knowledge on the general mechanisms of the FBR against biomaterials and the influence of biomaterial surface topography and chemical and physical features on the quality and quantity of the reaction. © 2016 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 105A: 927-940, 2017.
High-order graph matching based feature selection for Alzheimer's disease identification.
Liu, Feng; Suk, Heung-Il; Wee, Chong-Yaw; Chen, Huafu; Shen, Dinggang
2013-01-01
One of the main limitations of l1-norm feature selection is that it focuses on estimating the target vector for each sample individually without considering relations with other samples. However, it is believed that the geometrical relation among target vectors in the training set may provide useful information, and it would be natural to expect that the predicted vectors have similar geometric relations as the target vectors. To overcome these limitations, we formulate this as a graph-matching feature selection problem between a predicted graph and a target graph. In the predicted graph a node is represented by a predicted vector that may describe regional gray matter volume or cortical thickness features, and in the target graph a node is represented by a target vector that includes the class label and clinical scores. In particular, we devise new regularization terms in sparse representation to impose high-order graph matching between the target vectors and the predicted ones. Finally, the selected regional gray matter volume and cortical thickness features are fused in kernel space for classification. Using the ADNI dataset, we evaluate the effectiveness of the proposed method and obtain accuracies of 92.17% and 81.57% in AD and MCI classification, respectively.
2017-01-01
Electroencephalogram (EEG)-based decoding of human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain–computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, a convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with currently recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is the most popular feature extraction and prediction method currently in use. This method showed an accuracy of 65.7%. However, the proposed method predicts the novel data with improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method. PMID:28558002
Zhou, Jingyu; Tian, Shulin; Yang, Chenglin
2014-01-01
Few studies address prediction for analog circuits. The few existing methods extract and calculate features without reference to circuit analysis, so the fault indicator (FI) calculation often lacks a sound basis, which degrades prognostic performance. To solve this problem, this paper proposes a novel prediction method for single components of analog circuits based on complex field modeling. Because single-component faults are the most numerous faults in analog circuits, the method starts from the circuit structure, analyzes the transfer function of the circuit, and implements complex field modeling. Then, using an established parameter scanning model related to the complex field, it analyzes the relationship between parameter variation and the degeneration of single components in the model in order to calculate a more reasonable FI feature set. From the obtained FI feature set, it establishes a novel model of the degeneration trend of single components in analog circuits. Next, it uses a particle filter (PF) to update the model parameters and predicts the remaining useful performance (RUP) of single components in analog circuits. Because the FI feature set is calculated on a more reasonable basis, prediction accuracy is improved to some extent. Finally, the foregoing conclusions are verified by experiments.
NASA Astrophysics Data System (ADS)
Vallières, M.; Freeman, C. R.; Skamene, S. R.; El Naqa, I.
2015-07-01
This study aims at developing a joint FDG-PET and MRI texture-based model for the early evaluation of lung metastasis risk in soft-tissue sarcomas (STSs). We investigate if the creation of new composite textures from the combination of FDG-PET and MR imaging information could better identify aggressive tumours. Towards this goal, a cohort of 51 patients with histologically proven STSs of the extremities was retrospectively evaluated. All patients had pre-treatment FDG-PET and MRI scans comprised of T1-weighted and T2-weighted fat-suppression sequences (T2FS). Nine non-texture features (SUV metrics and shape features) and forty-one texture features were extracted from the tumour region of separate (FDG-PET, T1 and T2FS) and fused (FDG-PET/T1 and FDG-PET/T2FS) scans. Volume fusion of the FDG-PET and MRI scans was implemented using the wavelet transform. The influence of six different extraction parameters on the predictive value of textures was investigated. The incorporation of features into multivariable models was performed using logistic regression. The multivariable modeling strategy involved imbalance-adjusted bootstrap resampling in the following four steps leading to final prediction model construction: (1) feature set reduction; (2) feature selection; (3) prediction performance estimation; and (4) computation of model coefficients. Univariate analysis showed that the isotropic voxel size at which texture features were extracted had the most impact on predictive value. In multivariable analysis, texture features extracted from fused scans significantly outperformed those from separate scans in terms of lung metastases prediction estimates. The best performance was obtained using a combination of four texture features extracted from FDG-PET/T1 and FDG-PET/T2FS scans. This model reached an area under the receiver-operating characteristic curve of 0.984 ± 0.002, a sensitivity of 0.955 ± 0.006, and a specificity of 0.926 ± 0.004 in bootstrapping evaluations. 
Ultimately, lung metastasis risk assessment at diagnosis of STSs could improve patient outcomes by allowing better treatment adaptation.
MO-AB-BRA-10: Cancer Therapy Outcome Prediction Based On Dempster-Shafer Theory and PET Imaging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lian, C; University of Rouen, QuantIF - EA 4108 LITIS, 76000 Rouen; Li, H
2015-06-15
Purpose: In cancer therapy, utilizing FDG-18 PET image-based features for accurate outcome prediction is challenging because of 1) limited discriminative information within a small number of PET image sets, and 2) fluctuant feature characteristics caused by the inferior spatial resolution and system noise of PET imaging. In this study, we proposed a new Dempster-Shafer theory (DST) based approach, evidential low-dimensional transformation with feature selection (ELT-FS), to accurately predict cancer therapy outcome with both PET imaging features and clinical characteristics. Methods: First, a specific loss function with sparse penalty was developed to learn an adaptive low-rank distance metric for representing the dissimilarity between different patients’ feature vectors. By minimizing this loss function, a linear low-dimensional transformation of input features was achieved. Also, imprecise features were excluded simultaneously by applying an l2,1-norm regularization of the learnt dissimilarity metric in the loss function. Finally, the learnt dissimilarity metric was applied in an evidential K-nearest-neighbor (EK-NN) classifier to predict treatment outcome. Results: Twenty-five patients with stage II–III non-small-cell lung cancer and thirty-six patients with esophageal squamous cell carcinomas treated with chemo-radiotherapy were collected. For the two groups of patients, 52 and 29 features, respectively, were utilized. The leave-one-out cross-validation (LOOCV) protocol was used for evaluation. Compared to three existing linear transformation methods (PCA, LDA, NCA), the proposed ELT-FS leads to higher prediction accuracy for the training and testing sets both for lung-cancer patients (100+/−0.0, 88.0+/−33.17) and for esophageal-cancer patients (97.46+/−1.64, 83.33+/−37.8). The ELT-FS also provides superior class separation in both test data sets.
Conclusion: A novel DST-based approach has been proposed to predict cancer treatment outcome using PET image features and clinical characteristics. A specific loss function has been designed for robust accommodation of feature set incertitude and imprecision, facilitating adaptive learning of the dissimilarity metric for the EK-NN classifier.
Yen, Michael; Chen, Jenny; Ausayakhun, Somsanguan; Kunavisarut, Paradee; Vichitvejpaisal, Pornpattana; Ausayakhun, Sakarin; Jirawison, Choeng; Shantha, Jessica; Holland, Gary N; Heiden, David; Margolis, Todd P; Keenan, Jeremy D
2014-01-01
Purpose To determine risk factors predictive of retinal detachment in patients with cytomegalovirus (CMV) retinitis in a setting with limited access to ophthalmic care. Design Case-control study. Methods Sixty-four patients with CMV retinitis and retinal detachment were identified from the Ocular Infectious Diseases and Retina Clinics at Chiang Mai University. Three control patients with CMV retinitis but no retinal detachment were selected for each case, matched by calendar date. The medical records of each patient were reviewed, with patient-level and eye-level features recorded for the clinic visit used to match cases and controls, and also for the initial clinic visit at which CMV retinitis was diagnosed. Risk factors for retinal detachment were assessed separately for each of these time points using multivariate conditional logistic regression models that included 1 eye from each patient. Results Patients with a retinal detachment were more likely than controls to have low visual acuity (OR, 1.24 per line of worse vision on the logMAR scale; 95%CI, 1.16-1.33) and bilateral disease (OR, 2.12; 95%CI, 0.92-4.90). Features present at the time of the initial diagnosis of CMV retinitis that predicted subsequent retinal detachment included bilateral disease (OR, 2.68; 95%CI, 1.18-6.08) and lesion size (OR, 2.64 per 10% increase in lesion size; 95%CI, 1.41-4.94). Conclusion Bilateral CMV retinitis and larger lesion sizes, each of which is a marker of advanced disease, were associated with subsequent retinal detachment. Earlier detection and treatment may reduce the likelihood that patients with CMV retinitis develop a retinal detachment. PMID:25448999
Biomarkers of Progression after HIV Acute/Early Infection: Nothing Compares to CD4+ T-cell Count?
Ghiglione, Yanina; Hormanstorfer, Macarena; Coloccini, Romina; Salido, Jimena; Trifone, César; Ruiz, María Julia; Falivene, Juliana; Caruso, María Paula; Figueroa, María Inés; Salomón, Horacio; Giavedoni, Luis D.; Pando, María de los Ángeles; Gherardi, María Magdalena; Rabinovich, Roberto Daniel; Sued, Omar
2018-01-01
Progression of HIV infection is variable among individuals, and definition of disease progression biomarkers is still needed. Here, we aimed to categorize the predictive potential of several variables using feature selection methods and decision trees. A total of seventy-five treatment-naïve subjects were enrolled during acute/early HIV infection. CD4+ T-cell counts (CD4TC) and viral load (VL) levels were determined at enrollment and for one year. Immune activation, HIV-specific immune response, Human Leukocyte Antigen (HLA) and C-C chemokine receptor type 5 (CCR5) genotypes, and plasma levels of 39 cytokines were determined. Data were analyzed by machine learning and non-parametric methods. Variable hierarchization was performed by Weka correlation-based feature selection and J48 decision tree. Plasma interleukin (IL)-10, interferon gamma-induced protein (IP)-10, soluble IL-2 receptor alpha (sIL-2Rα) and tumor necrosis factor alpha (TNF-α) levels correlated directly with baseline VL, whereas IL-2, TNF-α, fibroblast growth factor (FGF)-2 and macrophage inflammatory protein (MIP)-1β correlated directly with CD4+ T-cell activation (p < 0.05). However, none of these cytokines had good predictive values to distinguish “progressors” from “non-progressors”. Similarly, immune activation, HIV-specific immune responses and HLA/CCR5 genotypes had low discrimination power. Baseline CD4TC was the most potent discerning variable with a cut-off of 438 cells/μL (accuracy = 0.93, κ-Cohen = 0.85). Limited discerning power of the other factors might be related to frequency, variability and/or sampling time. Future studies based on decision trees to identify biomarkers of post-treatment control are warranted. PMID:29342870
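A single-threshold rule of the kind a one-split decision tree produces, together with the two agreement scores quoted above (accuracy and Cohen's κ), can be sketched in Python as follows. This is an illustrative sketch only: the direction of the rule (a baseline CD4TC below the cut-off flags a progressor), the function names, and the toy values are assumptions, and it does not reproduce the study's data or its Weka/J48 workflow.

```python
CUTOFF = 438  # cells/uL, the cut-off reported in the abstract


def predict_progressor(cd4_count, cutoff=CUTOFF):
    """Assumed rule: a baseline count below the cut-off flags a progressor."""
    return cd4_count < cutoff


def accuracy(y_true, y_pred):
    """Fraction of cases where prediction and outcome agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for the agreement expected by chance."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    p_obs = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed.
    p_exp = sum((y_true.count(lab) / n) * (y_pred.count(lab) / n)
                for lab in labels)
    if p_exp == 1.0:
        return 1.0  # only one label present on both sides
    return (p_obs - p_exp) / (1.0 - p_exp)


# Made-up toy values for illustration; these are NOT patient data.
baseline_cd4 = [250, 300, 410, 520, 610, 700]
is_progressor = [True, True, False, False, False, False]
predicted = [predict_progressor(c) for c in baseline_cd4]

if __name__ == "__main__":
    print(accuracy(is_progressor, predicted),
          cohen_kappa(is_progressor, predicted))
```

Reporting κ next to accuracy guards against a rule that looks accurate only because one class dominates the cohort.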
Yen, Michael; Chen, Jenny; Ausayakhun, Somsanguan; Kunavisarut, Paradee; Vichitvejpaisal, Pornpattana; Ausayakhun, Sakarin; Jirawison, Choeng; Shantha, Jessica; Holland, Gary N; Heiden, David; Margolis, Todd P; Keenan, Jeremy D
2015-01-01
To determine risk factors predictive of retinal detachment in patients with cytomegalovirus (CMV) retinitis in a setting with limited access to ophthalmic care. Case-control study. Sixty-four patients with CMV retinitis and retinal detachment were identified from the Ocular Infectious Diseases and Retina Clinics at Chiang Mai University. Three control patients with CMV retinitis but no retinal detachment were selected for each case, matched by calendar date. The medical records of each patient were reviewed, with patient-level and eye-level features recorded for the clinic visit used to match cases and controls, and also for the initial clinic visit at which CMV retinitis was diagnosed. Risk factors for retinal detachment were assessed separately for each of these time points using multivariate conditional logistic regression models that included 1 eye from each patient. Patients with a retinal detachment were more likely than controls to have low visual acuity (odds ratio [OR], 1.24 per line of worse vision on the logMAR scale; 95% confidence interval [CI], 1.16-1.33) and bilateral disease (OR, 2.12; 95% CI, 0.92-4.90). Features present at the time of the initial diagnosis of CMV retinitis that predicted subsequent retinal detachment included bilateral disease (OR, 2.68; 95% CI, 1.18-6.08) and lesion size (OR, 2.64 per 10% increase in lesion size; 95% CI, 1.41-4.94). Bilateral CMV retinitis and larger lesion sizes, each of which is a marker of advanced disease, were associated with subsequent retinal detachment. Earlier detection and treatment may reduce the likelihood that patients with CMV retinitis develop a retinal detachment. Copyright © 2015 Elsevier Inc. All rights reserved.
Sabbaghi, Mostafa; Esmaeilian, Behzad; Raihanian Mashhadi, Ardeshir; Behdad, Sara; Cade, Willie
2015-02-01
Consumers often have a tendency to store their used, old or nonfunctional electronics for a period of time before they discard them and return them to the waste stream. This behavior increases the obsolescence rate of used, still-functional products, lowering the profitability that could result from End-of-Use (EOU) treatments such as reuse, upgrade, and refurbishment. These types of behaviors are influenced by several product- and consumer-related factors such as consumers' traits and lifestyles, technology evolution, product design features, product market value, and pro-environmental stimuli. Better understanding of different groups of consumers, their utilization and storage behavior and the connection of these behaviors with product design features helps Original Equipment Manufacturers (OEMs) and the recycling and recovery industry to better overcome the challenges resulting from the undesirable storage of used products. This paper aims at providing insightful statistical analysis of the dynamic nature of Electronic Waste (e-waste) by studying the effects of design characteristics, brand and consumer type on the electronics usage time and end-of-use time-in-storage. A database consisting of 10,063 Hard Disk Drives (HDD) of used personal computers returned to a remanufacturing facility located in Chicago, IL, USA during 2011-2013 has been selected as the base for this study. The results show that commercial consumers have stored computers more than household consumers regardless of brand and capacity factors. Moreover, a heterogeneous storage behavior is observed for different brands of HDDs regardless of capacity and consumer type factors. Finally, the storage behavior trends are projected for short-time forecasting and the storage times are precisely predicted by applying machine learning methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
Biomarkers of Progression after HIV Acute/Early Infection: Nothing Compares to CD4⁺ T-cell Count?
Turk, Gabriela; Ghiglione, Yanina; Hormanstorfer, Macarena; Laufer, Natalia; Coloccini, Romina; Salido, Jimena; Trifone, César; Ruiz, María Julia; Falivene, Juliana; Holgado, María Pía; Caruso, María Paula; Figueroa, María Inés; Salomón, Horacio; Giavedoni, Luis D; Pando, María de Los Ángeles; Gherardi, María Magdalena; Rabinovich, Roberto Daniel; Pury, Pedro A; Sued, Omar
2018-01-13
Progression of HIV infection is variable among individuals, and definition of disease progression biomarkers is still needed. Here, we aimed to categorize the predictive potential of several variables using feature selection methods and decision trees. A total of seventy-five treatment-naïve subjects were enrolled during acute/early HIV infection. CD4⁺ T-cell counts (CD4TC) and viral load (VL) levels were determined at enrollment and for one year. Immune activation, HIV-specific immune response, Human Leukocyte Antigen (HLA) and C-C chemokine receptor type 5 (CCR5) genotypes, and plasma levels of 39 cytokines were determined. Data were analyzed by machine learning and non-parametric methods. Variable hierarchization was performed by Weka correlation-based feature selection and J48 decision tree. Plasma interleukin (IL)-10, interferon gamma-induced protein (IP)-10, soluble IL-2 receptor alpha (sIL-2Rα) and tumor necrosis factor alpha (TNF-α) levels correlated directly with baseline VL, whereas IL-2, TNF-α, fibroblast growth factor (FGF)-2 and macrophage inflammatory protein (MIP)-1β correlated directly with CD4⁺ T-cell activation (p < 0.05). However, none of these cytokines had good predictive values to distinguish "progressors" from "non-progressors". Similarly, immune activation, HIV-specific immune responses and HLA/CCR5 genotypes had low discrimination power. Baseline CD4TC was the most potent discerning variable with a cut-off of 438 cells/μL (accuracy = 0.93, κ-Cohen = 0.85). Limited discerning power of the other factors might be related to frequency, variability and/or sampling time. Future studies based on decision trees to identify biomarkers of post-treatment control are warranted.
Identification of sequence motifs significantly associated with antisense activity.
McQuisten, Kyle A; Peek, Andrew S
2007-06-07
Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, contain several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs.
The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic mediators to speed the process along like the RNA Induced Silencing Complex (RISC) in RNAi. The independence of motif position and antisense activity also allows us to bypass consideration of this feature in the modelling process, promoting model efficiency and reducing the chance of overfitting when predicting antisense activity. The increase in SVR correlation with significant features compared to nearest-neighbour features indicates that thermodynamics alone is likely not the only factor in determining antisense efficiency.
A link prediction approach to cancer drug sensitivity prediction.
Turki, Turki; Wei, Zhi
2017-10-03
Predicting the response to a drug for cancer disease patients based on genomic information is an important problem in modern clinical oncology. This problem occurs in part because many available drug sensitivity prediction algorithms do not consider better quality cancer cell lines and the adoption of new feature representations; both lead to the accurate prediction of drug responses. By predicting accurate drug responses to cancer, oncologists gain a more complete understanding of the effective treatments for each patient, which is a core goal in precision medicine. In this paper, we model cancer drug sensitivity as a link prediction, which is shown to be an effective technique. We evaluate our proposed link prediction algorithms and compare them with an existing drug sensitivity prediction approach based on clinical trial data. The experimental results based on the clinical trial data show the stability of our link prediction algorithms, which yield the highest area under the ROC curve (AUC) and are statistically significant. We propose a link prediction approach to obtain new feature representation. Compared with an existing approach, the results show that incorporating the new feature representation to the link prediction algorithms has significantly improved the performance.
NASA Astrophysics Data System (ADS)
Sun, Wenqing; Tseng, Tzu-Liang B.; Zheng, Bin; Zhang, Jianying; Qian, Wei
2015-03-01
A novel breast cancer risk analysis approach is proposed for enhancing performance of computerized breast cancer risk analysis using bilateral mammograms. Based on the intensity of breast area, five different sub-regions were acquired from one mammogram, and bilateral features were extracted from every sub-region. Our dataset includes 180 bilateral mammograms from 180 women who underwent routine screening examinations, all interpreted as negative and not recalled by the radiologists during the original screening procedures. A computerized breast cancer risk analysis scheme using four image processing modules, including sub-region segmentation, bilateral feature extraction, feature selection, and classification was designed to detect and compute image feature asymmetry between the left and right breasts imaged on the mammograms. The highest computed area under the curve (AUC) is 0.763 ± 0.021 when applying the multiple sub-region features to our testing dataset. The positive predictive value and the negative predictive value were 0.60 and 0.73, respectively. The study demonstrates that (1) features extracted from multiple sub-regions can improve the performance of our scheme compared to using features from whole breast area only; (2) a classifier using asymmetry bilateral features can effectively predict breast cancer risk; (3) incorporating texture and morphological features with density features can boost the classification accuracy.
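The performance figures quoted above (AUC, positive predictive value, negative predictive value) can be computed as in the following minimal Python sketch. It assumes the rank-based (Mann-Whitney) definition of AUC, in which AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function names and the example inputs are hypothetical and are not the study's data.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC over every (positive, negative) score pair."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(scores_pos) * len(scores_neg))


def ppv(tp, fp):
    """Positive predictive value: fraction of positive calls that are correct."""
    return tp / (tp + fp)


def npv(tn, fn):
    """Negative predictive value: fraction of negative calls that are correct."""
    return tn / (tn + fn)


if __name__ == "__main__":
    # Hypothetical classifier scores for illustration only.
    print(auc([0.9, 0.4], [0.5, 0.1]))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred mammograms; a sort-based ranking would be the usual choice for larger datasets.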
Reuzé, Sylvain; Orlhac, Fanny; Chargari, Cyrus; Nioche, Christophe; Limkin, Elaine; Riet, François; Escande, Alexandre; Haie-Meder, Christine; Dercle, Laurent; Gouy, Sébastien; Buvat, Irène; Deutsch, Eric; Robert, Charlotte
2017-06-27
To identify an imaging signature predicting local recurrence for locally advanced cervical cancer (LACC) treated by chemoradiation and brachytherapy from baseline 18F-FDG PET images, and to evaluate the possibility of gathering images from two different PET scanners in a radiomic study. 118 patients were included retrospectively. Two groups (G1, G2) were defined according to the PET scanner used for image acquisition. Eleven radiomic features were extracted from delineated cervical tumors to evaluate: (i) the predictive value of features for local recurrence of LACC, (ii) their reproducibility as a function of the scanner within a hepatic reference volume, (iii) the impact of voxel size on feature values. Eight features were statistically significant predictors of local recurrence in G1 (p < 0.05). The multivariate signature trained in G2 was validated in G1 (AUC=0.76, p<0.001) and identified local recurrence more accurately than SUVmax (p=0.022). Four features were significantly different between G1 and G2 in the liver. Spatial resampling was not sufficient to explain the stratification effect. This study showed that radiomic features could predict local recurrence of LACC better than SUVmax. Further investigation is needed before applying a model designed using data from one PET scanner to another.
Comparison of multiobjective evolutionary algorithms: empirical results.
Zitzler, E; Deb, K; Thiele, L
2000-01-01
In this paper, we provide a systematic comparison of various evolutionary approaches to multiobjective optimization using six carefully chosen test functions. Each test function involves a particular feature that is known to cause difficulty in the evolutionary optimization process, mainly in converging to the Pareto-optimal front (e.g., multimodality and deception). By investigating these different problem features separately, it is possible to predict the kind of problems to which a certain technique is or is not well suited. However, in contrast to what was suspected beforehand, the experimental results indicate a hierarchy of the algorithms under consideration. Furthermore, the emerging effects are evidence that the suggested test functions provide sufficient complexity to compare multiobjective optimizers. Finally, elitism is shown to be an important factor for improving evolutionary multiobjective search.
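Convergence to the Pareto-optimal front, the criterion discussed above, rests on the notion of dominance between objective vectors. The sketch below shows one way to extract the nondominated set from a list of candidate solutions. It assumes every objective is minimized, and the helper names `dominates` and `pareto_front` are hypothetical; it is not one of the algorithms compared in the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among `(1, 5)`, `(2, 2)`, `(3, 3)`, `(5, 1)` and `(4, 4)`, the points `(3, 3)` and `(4, 4)` are dominated by `(2, 2)` and are removed.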
DOE Office of Scientific and Technical Information (OSTI.GOV)
Voisin, Sophie; Tourassi, Georgia D.; Pinto, Frank
2013-10-15
Purpose: The primary aim of the present study was to test the feasibility of predicting diagnostic errors in mammography by merging radiologists’ gaze behavior and image characteristics. A secondary aim was to investigate group-based and personalized predictive models for radiologists of variable experience levels. Methods: The study was performed for the clinical task of assessing the likelihood of malignancy of mammographic masses. Eye-tracking data and diagnostic decisions for 40 cases were acquired from four Radiology residents and two breast imaging experts as part of an IRB-approved pilot study. Gaze behavior features were extracted from the eye-tracking data. Computer-generated and BIRADS image features were extracted from the images. Finally, machine learning algorithms were used to merge gaze and image features for predicting human error. Feature selection was thoroughly explored to determine the relative contribution of the various features. Group-based and personalized user modeling was also investigated. Results: Machine learning can be used to predict diagnostic error by merging gaze behavior characteristics from the radiologist and textural characteristics from the image under review. Leveraging data collected from multiple readers produced a reasonable group model [area under the ROC curve (AUC) = 0.792 ± 0.030]. Personalized user modeling was far more accurate for the more experienced readers (AUC = 0.837 ± 0.029) than for the less experienced ones (AUC = 0.667 ± 0.099). The best performing group-based and personalized predictive models involved combinations of both gaze and image features. Conclusions: Diagnostic errors in mammography can be predicted to a good extent by leveraging the radiologists’ gaze behavior and image content.
Kelly, John F.; Hoeppner, Bettina B.; Urbanoski, Karen A.; Slaymaker, Valerie
2011-01-01
Objective Failure to maintain abstinence despite incurring severe harm is perhaps the key defining feature of addiction. Relapse prevention strategies have been developed to attenuate this propensity to relapse, but predicting who will, and who will not, relapse has stymied attempts to more efficiently tailor treatments according to relapse risk profile. Here we examine the psychometric properties of a promising relapse risk measure, the Advance WArning of RElapse (AWARE) scale (Miller and Harris, 2000), in an understudied but clinically important sample of young adults. Method Inpatient youth (N=303; Age 18-24; 26% female) completed the AWARE scale and the Brief Symptom Inventory-18 (BSI) at the end of residential treatment, and at 1-, 3-, and 6-months following discharge. Internal and convergent validity was tested for each of these four timepoints using confirmatory factor analysis and correlations (with BSI scores). Predictive validity was tested for relapse 1, 3, and 6 months following discharge, as was incremental utility, where AWARE scores were used as predictors of any substance use while controlling for treatment entry substance use severity and having spent time in a controlled environment following treatment. Results Confirmatory factor analysis revealed a single, internally consistent, 25-item factor that demonstrated convergent validity and predicted subsequent relapse alone and when controlling for other important relapse risk predictors. Conclusions The AWARE scale may be a useful and efficient clinical tool for assessing short-term relapse risk among young people and, thus, could serve to enhance the effectiveness of relapse prevention efforts. PMID:21700396
Kelly, John F; Hoeppner, Bettina B; Urbanoski, Karen A; Slaymaker, Valerie
2011-10-01
Failure to maintain abstinence despite incurring severe harm is perhaps the key defining feature of addiction. Relapse prevention strategies have been developed to attenuate this propensity to relapse, but predicting who will, and who will not, relapse has stymied attempts to more efficiently tailor treatments according to relapse risk profile. Here we examine the psychometric properties of a promising relapse risk measure, the Advance WArning of RElapse (AWARE) scale (Miller & Harris, 2000), in an understudied but clinically important sample of young adults. Inpatient youth (N=303; Ages 18-24; 26% female) completed the AWARE scale and the Brief Symptom Inventory-18 (BSI) at the end of residential treatment, and at 1-, 3-, and 6-months following discharge. Internal and convergent validity was tested for each of these four timepoints using confirmatory factor analysis and correlations (with BSI scores). Predictive validity was tested for relapse 1, 3, and 6 months following discharge, as was incremental utility, where AWARE scores were used as predictors of any substance use while controlling for treatment entry substance use severity and having spent time in a controlled environment following treatment. Confirmatory factor analysis revealed a single, internally consistent, 25-item factor that demonstrated convergent validity and predicted subsequent relapse alone and when controlling for other important relapse risk predictors. The AWARE scale may be a useful and efficient clinical tool for assessing short-term relapse risk among young people and, thus, could serve to enhance the effectiveness of relapse prevention efforts. Copyright © 2011 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.
2004-08-06
The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function--such as binding-site clustering--makes better use of comparative sequence data than commonly used methods that examine only sequence identity.
What variables are important in predicting bovine viral diarrhea virus? A random forest approach.
Machado, Gustavo; Mendoza, Mariana Recamonde; Corbellini, Luis Gustavo
2015-07-24
Bovine viral diarrhea virus (BVDV) causes one of the most economically important diseases in cattle, and the virus is found worldwide. A better understanding of the disease-associated factors is a crucial step towards the definition of strategies for control and eradication. In this study we trained a random forest (RF) prediction model and performed variable importance analysis to identify factors associated with BVDV occurrence. In addition, we assessed the influence of feature selection on RF performance and evaluated its predictive power relative to other popular classifiers and to logistic regression. We found that the RF classification model resulted in an average error rate of 32.03% for the negative class (negative for BVDV) and 36.78% for the positive class (positive for BVDV). The RF model presented an area under the ROC curve equal to 0.702. Variable importance analysis revealed that important predictors of BVDV occurrence were: a) who inseminates the animals, b) number of neighboring farms that have cattle and c) rectal palpation performed routinely. Our results suggest that the use of machine learning algorithms, especially RF, is a promising methodology for the analysis of cross-sectional studies, presenting a satisfactory predictive power and the ability to identify predictors that represent potential risk factors for BVDV investigation. We examined classical predictors and found some new and hard to control practices that may lead to the spread of this disease within and among farms, mainly regarding poor or neglected reproduction management, which should be considered for disease control and eradication.
Hemorrhagic transformation after ischemic stroke in animals and humans
Jickling, Glen C; Liu, DaZhi; Stamova, Boryana; Ander, Bradley P; Zhan, Xinhua; Lu, Aigang; Sharp, Frank R
2014-01-01
Hemorrhagic transformation (HT) is a common complication of ischemic stroke that is exacerbated by thrombolytic therapy. Methods to better prevent, predict, and treat HT are needed. In this review, we summarize studies of HT in both animals and humans. We propose that early HT (<18 to 24 hours after stroke onset) relates to leukocyte-derived matrix metalloproteinase-9 (MMP-9) and brain-derived MMP-2 that damage the neurovascular unit and promote blood–brain barrier (BBB) disruption. This contrasts with delayed HT (>18 to 24 hours after stroke) that relates to ischemia activation of brain proteases (MMP-2, MMP-3, MMP-9, and endogenous tissue plasminogen activator), neuroinflammation, and factors that promote vascular remodeling (vascular endothelial growth factor and high-mobility-group-box-1). Processes that mediate BBB repair and reduce HT risk are discussed, including transforming growth factor beta signaling in monocytes, Src kinase signaling, MMP inhibitors, and inhibitors of reactive oxygen species. Finally, clinical features associated with HT in patients with stroke are reviewed, including approaches to predict HT by clinical factors, brain imaging, and blood biomarkers. Though remarkable advances in our understanding of HT have been made, additional efforts are needed to translate these discoveries to the clinic and reduce the impact of HT on patients with ischemic stroke. PMID:24281743
Chen, Lei; Zhang, Yu-Hang; Zheng, Mingyue; Huang, Tao; Cai, Yu-Dong
2016-12-01
Compound-protein interactions play important roles in every cell via the recognition and regulation of specific functional proteins. The correct identification of compound-protein interactions can lead to a good comprehension of this complicated system and provide useful input for the investigation of various attributes of compounds and proteins. In this study, we attempted to understand this system by extracting properties from both proteins and compounds, in which proteins were represented by gene ontology and KEGG pathway enrichment scores and compounds were represented by molecular fragments. Advanced feature selection methods, including minimum redundancy maximum relevance, incremental feature selection, and the basic machine learning algorithm random forest, were used to analyze these properties and extract core factors for the determination of actual compound-protein interactions. Compound-protein interactions reported in The Binding Databases were used as positive samples. To improve the reliability of the results, the analytic procedure was executed five times using different negative samples. Simultaneously, five optimal prediction methods based on a random forest and yielding maximum MCCs of approximately 77.55 % were constructed and may be useful tools for the prediction of compound-protein interactions. This work provides new clues to understanding the system of compound-protein interactions by analyzing extracted core features. Our results indicate that compound-protein interactions are related to biological processes involving immune, developmental and hormone-associated pathways.
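The Matthews correlation coefficient (MCC) used above to score the random forest predictors is computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula; the function name `matthews_corrcoef_from_counts` is hypothetical, the example counts are invented for illustration, and returning 0.0 when a marginal total is zero is a common convention assumed here rather than something stated in the abstract.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

With illustrative counts of 45 true positives, 45 true negatives, 5 false positives and 5 false negatives, the numerator is 2000 and the denominator 2500, giving an MCC of 0.8.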
Lin, Lung-Chang; Ouyang, Chen-Sen; Chiang, Ching-Tai; Yang, Rei-Cheng; Wu, Rong-Ching; Wu, Hui-Chuan
2014-11-01
Refractory epilepsy often has deleterious effects on an individual's health and quality of life. Early identification of patients whose seizures are refractory to antiepileptic drugs is important in considering the use of alternative treatments. Although idiopathic epilepsy is regarded as having a significantly lower risk factor of developing refractory epilepsy, still a subset of patients with idiopathic epilepsy might be refractory to medical treatment. In this study, we developed an effective method to predict the refractoriness of idiopathic epilepsy. Sixteen EEG segments from 12 well-controlled patients and 14 EEG segments from 11 refractory patients were analyzed at the time of first EEG recordings before antiepileptic drug treatment. Ten crucial EEG feature descriptors were selected for classification. Three of 10 were related to decorrelation time, and four of 10 were related to relative power of delta/gamma. There were significantly higher values in these seven feature descriptors in the well-controlled group as compared to the refractory group. On the contrary, the remaining three feature descriptors related to spectral edge frequency, kurtosis, and energy of wavelet coefficients demonstrated significantly lower values in the well-controlled group as compared to the refractory group. The analyses yielded a weighted precision rate of 94.2%, and a 93.3% recall rate. Therefore, the developed method is a useful tool in identifying the possibility of developing refractory epilepsy in patients with idiopathic epilepsy.
Predicting missing biomarker data in a longitudinal study of Alzheimer disease.
Lo, Raymond Y; Jagust, William J
2012-05-01
To investigate predictors of missing data in a longitudinal study of Alzheimer disease (AD). The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a clinic-based, multicenter, longitudinal study with blood, CSF, PET, and MRI scans repeatedly measured in 229 participants with normal cognition (NC), 397 with mild cognitive impairment (MCI), and 193 with mild AD during 2005-2007. We used univariate and multivariable logistic regression models to examine the associations between baseline demographic/clinical features and loss of biomarker follow-ups in ADNI. CSF studies tended to recruit and retain patients with MCI with more AD-like features, including lower levels of baseline CSF Aβ(42). Depression was the major predictor for MCI dropouts, while family history of AD kept more patients with AD enrolled in PET and MRI studies. Poor cognitive performance was associated with loss of follow-up in most biomarker studies, even among NC participants. The presence of vascular risk factors seemed more critical than cognitive function for predicting dropouts in AD. The missing data are not missing completely at random in ADNI and likely conditional on certain features in addition to cognitive function. Missing data predictors vary across biomarkers and even MCI and AD groups do not share the same missing data pattern. Understanding the missing data structure may help in the design of future longitudinal studies and clinical trials in AD.
Predicting missing biomarker data in a longitudinal study of Alzheimer disease
Jagust, William J.; Aisen, Paul; Jack, Clifford R.; Toga, Arthur W.; Beckett, Laurel; Gamst, Anthony; Soares, Holly; C. Green, Robert; Montine, Tom; Thomas, Ronald G.; Donohue, Michael; Walter, Sarah; Dale, Anders; Bernstein, Matthew; Felmlee, Joel; Fox, Nick; Thompson, Paul; Schuff, Norbert; Alexander, Gene; DeCarli, Charles; Bandy, Dan; Chen, Kewei; Morris, John; Lee, Virginia M.-Y.; Korecka, Magdalena; Crawford, Karen; Neu, Scott; Harvey, Danielle; Kornak, John; Saykin, Andrew J.; Foroud, Tatiana M.; Potkin, Steven; Shen, Li; Buckholtz, Neil; Kaye, Jeffrey; Dolen, Sara; Quinn, Joseph; Schneider, Lon; Pawluczyk, Sonia; Spann, Bryan M.; Brewer, James; Vanderswag, Helen; Heidebrink, Judith L.; Lord, Joanne L.; Petersen, Ronald; Johnson, Kris; Doody, Rachelle S.; Villanueva-Meyer, Javier; Chowdhury, Munir; Stern, Yaakov; Honig, Lawrence S.; Bell, Karen L.; Morris, John C.; Mintun, Mark A.; Schneider, Stacy; Marson, Daniel; Griffith, Randall; Clark, David; Grossman, Hillel; Tang, Cheuk; Marzloff, George; Toledo-Morrell, Leylade; Shah, Raj C.; Duara, Ranjan; Varon, Daniel; Roberts, Peggy; Albert, Marilyn S.; Pedroso, Julia; Toroney, Jaimie; Rusinek, Henry; de Leon, Mony J; De Santi, Susan M; Doraiswamy, P. Murali; Petrella, Jeffrey R.; Aiello, Marilyn; Clark, Christopher M.; Pham, Cassie; Nunez, Jessica; Smith, Charles D.; Given, Curtis A.; Hardy, Peter; Lopez, Oscar L.; Oakley, MaryAnn; Simpson, Donna M.; Ismail, M. Saleem; Brand, Connie; Richard, Jennifer; Mulnard, Ruth A.; Thai, Gaby; Mc-Adams-Ortiz, Catherine; Diaz-Arrastia, Ramon; Martin-Cook, Kristen; DeVous, Michael; Levey, Allan I.; Lah, James J.; Cellar, Janet S.; Burns, Jeffrey M.; Anderson, Heather S.; Laubinger, Mary M.; Bartzokis, George; Silverman, Daniel H.S.; Lu, Po H.; Graff-Radford MBBCH, Neill R; Parfitt, Francine; Johnson, Heather; Farlow, Martin; Herring, Scott; Hake, Ann M.; van Dyck, Christopher H.; MacAvoy, Martha G.; Benincasa, Amanda L.; Chertkow, Howard; Bergman, Howard; Hosein, Chris; Black, Sandra; Graham, Simon; Caldwell, Curtis; Hsiung, Ging-Yuek Robin; Feldman, Howard; Assaly, Michele; Kertesz, Andrew; Rogers, John; Trost, Dick; Bernick, Charles; Munic, Donna; Wu, Chuang-Kuo; Johnson, Nancy; Mesulam, Marsel; Sadowsky, Carl; Martinez, Walter; Villena, Teresa; Turner, Scott; Johnson, Kathleen B.; Behan, Kelly E.; Sperling, Reisa A.; Rentz, Dorene M.; Johnson, Keith A.; Rosen, Allyson; Tinklenberg, Jared; Ashford, Wes; Sabbagh, Marwan; Connor, Donald; Jacobson, Sandra; Killiany, Ronald; Norbash, Alexander; Nair, Anil; Obisesan, Thomas O.; Jayam-Trouth, Annapurni; Wang, Paul; Lerner, Alan; Hudson, Leon; Ogrocki, Paula; DeCarli, Charles; Fletcher, Evan; Carmichael, Owen; Kittur, Smita; Mirje, Seema; Borrie, Michael; Lee, T-Y; Bartha, Dr Rob; Johnson, Sterling; Asthana, Sanjay; Carlsson, Cynthia M.; Potkin, Steven G.; Preda, Adrian; Nguyen, Dana; Tariot, Pierre; Fleisher, Adam; Reeder, Stephanie; Bates, Vernice; Capote, Horacio; Rainka, Michelle; Hendin, Barry A.; Scharre, Douglas W.; Kataki, Maria; Zimmerman, Earl A.; Celmins, Dzintra; Brown, Alice D.; Gandy, Sam; Marenberg, Marjorie E.; Rovner, Barry W.; Pearlson, Godfrey; Anderson, Karen; Saykin, Andrew J.; Santulli, Robert B.; Englert, Jessica; Williamson, Jeff D.; Sink, Kaycee M.; Watkins, Franklin; Ott, Brian R.; Wu, Chuang-Kuo; Cohen, Ronald; Salloway, Stephen; Malloy, Paul; Correia, Stephen; Rosen, Howard J.; Miller, Bruce L.; Mintzer, Jacobo
2012-01-01
Objective: To investigate predictors of missing data in a longitudinal study of Alzheimer disease (AD). Methods: The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a clinic-based, multicenter, longitudinal study with blood, CSF, PET, and MRI scans repeatedly measured in 229 participants with normal cognition (NC), 397 with mild cognitive impairment (MCI), and 193 with mild AD during 2005–2007. We used univariate and multivariable logistic regression models to examine the associations between baseline demographic/clinical features and loss of biomarker follow-ups in ADNI. Results: CSF studies tended to recruit and retain patients with MCI with more AD-like features, including lower levels of baseline CSF Aβ42. Depression was the major predictor for MCI dropouts, while family history of AD kept more patients with AD enrolled in PET and MRI studies. Poor cognitive performance was associated with loss of follow-up in most biomarker studies, even among NC participants. The presence of vascular risk factors seemed more critical than cognitive function for predicting dropouts in AD. Conclusion: The missing data are not missing completely at random in ADNI and likely conditional on certain features in addition to cognitive function. Missing data predictors vary across biomarkers and even MCI and AD groups do not share the same missing data pattern. Understanding the missing data structure may help in the design of future longitudinal studies and clinical trials in AD. PMID:22491869
Edwards, Stefan M.; Sørensen, Izel F.; Sarup, Pernille; Mackay, Trudy F. C.; Sørensen, Peter
2016-01-01
Predicting individual quantitative trait phenotypes from high-resolution genomic polymorphism data is important for personalized medicine in humans, plant and animal breeding, and adaptive evolution. However, this is difficult for populations of unrelated individuals when the number of causal variants is low relative to the total number of polymorphisms and causal variants individually have small effects on the traits. We hypothesized that mapping molecular polymorphisms to genomic features such as genes and their gene ontology categories could increase the accuracy of genomic prediction models. We developed a genomic feature best linear unbiased prediction (GFBLUP) model that implements this strategy and applied it to three quantitative traits (startle response, starvation resistance, and chill coma recovery) in the unrelated, sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel. Our results indicate that subsetting markers based on genomic features increases the predictive ability relative to the standard genomic best linear unbiased prediction (GBLUP) model. Both models use all markers, but GFBLUP allows differential weighting of the individual genetic marker relationships, whereas GBLUP weighs the genetic marker relationships equally. Simulation studies show that it is possible to further increase the accuracy of genomic prediction for complex traits using this model, provided the genomic features are enriched for causal variants. Our GFBLUP model using prior information on genomic features enriched for causal variants can increase the accuracy of genomic predictions in populations of unrelated individuals and provides a formal statistical framework for leveraging and evaluating information across multiple experimental studies to provide novel insights into the genetic architecture of complex traits. PMID:27235308
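The contrast between GBLUP and GFBLUP described above can be written as two linear mixed models. The notation below is an assumed, conventional one and is not taken from the abstract: $\mathbf{y}$ is the phenotype vector, $\mathbf{X}\mathbf{b}$ the fixed effects, $\mathbf{G}$ the genomic relationship matrix built from all markers, $\mathbf{G}_f$ the relationship matrix built from markers inside the genomic feature, and $\mathbf{G}_r$ the matrix built from the remaining markers.

```latex
% GBLUP: a single genomic effect; all marker relationships weighted equally
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{g} + \mathbf{e},
\qquad \mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2),
\qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)

% GFBLUP: separate effects for feature markers (f) and remaining markers (r)
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{f} + \mathbf{r} + \mathbf{e},
\qquad \mathbf{f} \sim N(\mathbf{0}, \mathbf{G}_f\sigma_f^2),
\qquad \mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\sigma_r^2),
\qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)
```

Under this reading, both models use every marker, but GFBLUP estimates separate variance components $\sigma_f^2$ and $\sigma_r^2$, which is what allows the feature markers to be weighted differently from the rest.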
ERIC Educational Resources Information Center
Primativo, Silvia; Reilly, Jamie; Crutch, Sebastian J
2017-01-01
The Abstract Conceptual Feature (ACF) framework predicts that word meaning is represented within a high-dimensional semantic space bounded by weighted contributions of perceptual, affective, and encyclopedic information. The ACF, like latent semantic analysis, is amenable to distance metrics between any two words. We applied predictions of the ACF…
Christensen, Daniel; Zubrick, Stephen R; Lawrence, David; Mitrou, Francis; Taylor, Catherine L
2014-01-01
Receptive vocabulary development is a component of the human language system that emerges in the first year of life and is characterised by onward expansion throughout life. Beginning in infancy, children's receptive vocabulary knowledge builds the foundation for oral language and reading skills. The foundations for success at school are built early, hence the public health policy focus on reducing developmental inequalities before children start formal school. The underlying assumption is that children's development is stable, and therefore predictable, over time. This study investigated this assumption in relation to children's receptive vocabulary ability. We investigated the extent to which low receptive vocabulary ability at 4 years was associated with low receptive vocabulary ability at 8 years, and the predictive utility of a multivariate model that included child, maternal and family risk factors measured at 4 years. The study sample comprised 3,847 children from the first nationally representative Longitudinal Study of Australian Children (LSAC). Multivariate logistic regression was used to investigate risks for low receptive vocabulary ability from 4-8 years and sensitivity-specificity analysis was used to examine the predictive utility of the multivariate model. In the multivariate model, substantial risk factors for receptive vocabulary delay from 4-8 years, in order of descending magnitude, were low receptive vocabulary ability at 4 years, low maternal education, and low school readiness. Moderate risk factors, in order of descending magnitude, were low maternal parenting consistency, socio-economic area disadvantage, low temperamental persistence, and NESB status. The following risk factors were not significant: One or more siblings, low family income, not reading to the child, high maternal work hours, and Aboriginal or Torres Strait Islander ethnicity. 
The results of the sensitivity-specificity analysis showed that a well-fitted multivariate model featuring risks of substantive magnitude does not do particularly well in predicting low receptive vocabulary ability from 4-8 years.
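A sensitivity-specificity analysis of the kind mentioned above compares binary predictions with observed outcomes. The sketch below is a minimal illustration under assumed inputs (1 = low receptive vocabulary ability, 0 = not low); the function name `sensitivity_specificity` and the toy data in the usage note are hypothetical and are not drawn from the LSAC results.

```python
def sensitivity_specificity(observed, predicted):
    """Return (sensitivity, specificity) for binary outcomes.

    sensitivity = TP / (TP + FN): share of true cases that were flagged.
    specificity = TN / (TN + FP): share of non-cases correctly cleared.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)
```

A model can be well fitted and still flag few of the true cases: with four observed cases of which one is predicted, and six non-cases of which five are correctly cleared, sensitivity is 0.25 while specificity is about 0.83.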