Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.
Martinez, Josue G; Carroll, Raymond J; Müller, Samuel; Sampson, Joshua N; Chatterjee, Nilanjan
2011-11-01
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
PMID:22347720
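The instability described in the abstract above is easy to reproduce. The sketch below is an illustration using scikit-learn's plain Lasso rather than the paper's SCAD or Adaptive Lasso, on synthetic weak-signal data: it repeats 10-fold cross-validated tuning on the same data set and records how many variables each run selects.

```python
# Hedged illustration (not the paper's code): repeat 10-fold CV tuning of
# a Lasso on fixed weak-signal data; the size of the selected model varies
# with nothing but the random fold assignment.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

# Sparse truth with modest signals, loosely mimicking the SNP setting.
X, y = make_regression(n_samples=100, n_features=200, n_informative=10,
                       noise=10.0, random_state=0)

counts = []
for seed in range(10):
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    model = LassoCV(n_alphas=50, cv=cv).fit(X, y)
    counts.append(int(np.sum(model.coef_ != 0)))

print(min(counts), max(counts))  # selected-model size across identical reruns
```

The spread between the minimum and maximum counts is the run-to-run variation that motivates repeating cross-validation rather than trusting a single run.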
Cross-validation pitfalls when selecting and assessing regression and classification models.
Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon
2014-03-29
We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
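The repeated grid-search V-fold cross-validation the authors describe can be sketched with scikit-learn (a minimal stand-in for their cloud implementation; the SVM, the parameter grid, and the data are illustrative assumptions, not the paper's QSAR setup).

```python
# Minimal sketch of repeated grid-search V-fold cross-validation for
# parameter tuning: 10 repeats of 5-fold CV, so each candidate C is
# scored on 50 different splits before one value is chosen.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=1)
search = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1, 10, 100]}, cv=cv)
search.fit(X, y)
print(search.best_params_["C"], round(search.best_score_, 3))
```

Because each candidate is scored across many splits, the chosen parameter reflects the split-to-split variation the abstract warns about instead of one arbitrary partition.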
Development of estrogen receptor beta binding prediction model using large sets of chemicals.
Sakkiah, Sugunadevi; Selvaraj, Chandrabose; Gong, Ping; Zhang, Chaoyang; Tong, Weida; Hong, Huixiao
2017-11-03
We developed an ERβ binding prediction model to facilitate identification of chemicals that specifically bind ERβ or ERα, complementing our previously developed ERα binding model. Decision Forest was used to train the ERβ binding prediction model on a large set of compounds obtained from EADB. Model performance was estimated through 1000 iterations of 5-fold cross-validation. Prediction confidence was analyzed using predictions from the cross-validations. Informative chemical features for ERβ binding were identified by analyzing the frequency of chemical descriptors used in the models across the 5-fold cross-validations. 1000 permutations were conducted to assess chance correlation. The average accuracy of the 5-fold cross-validations was 93.14%, with a standard deviation of 0.64%. Prediction confidence analysis indicated that the higher the prediction confidence, the more accurate the predictions. Permutation testing revealed that the prediction model is unlikely to have been generated by chance. Eighteen informative descriptors were identified as important to ERβ binding prediction. Application of the prediction model to data from the ToxCast project yielded a very high sensitivity of 90-92%. Our results demonstrated that ERβ binding of chemicals can be accurately predicted using the developed model. Coupled with our previously developed ERα prediction model, this model is expected to facilitate drug development through identification of chemicals that specifically bind ERβ or ERα.
Multivariate Adaptive Regression Splines (Preprint)
1990-08-01
fold cross-validation would take about ten times as long, and MARS is not all that fast to begin with. Friedman has a number of examples showing...standardized mean squared error of prediction (MSEP), the generalized cross-validation (GCV), and the number of selected terms (TERMS). In accordance with...and mi = 10 case were almost exclusively spurious cross-product terms and terms involving the nuisance variables x6 through x10. This large number of
K-Fold Crossvalidation in Canonical Analysis.
ERIC Educational Resources Information Center
Liang, Kun-Hsia; And Others
1995-01-01
A computer-assisted, K-fold cross-validation technique is discussed in the framework of canonical correlation analysis of randomly generated data sets. Analysis results suggest that this technique can effectively reduce the contamination of canonical variates and canonical correlations by sample-specific variance components. (Author/SLD)
Design and Implementation of a Smart Home System Using Multisensor Data Fusion Technology.
Hsu, Yu-Liang; Chou, Po-Huan; Chang, Hsing-Cheng; Lin, Shyan-Lung; Yang, Shih-Chin; Su, Heng-Yi; Chang, Chih-Chien; Cheng, Yuan-Sheng; Kuo, Yu-Chen
2017-07-15
This paper aims to develop a multisensor data fusion technology-based smart home system by integrating wearable intelligent technology, artificial intelligence, and sensor fusion technology. We have developed the following three systems to create an intelligent smart home environment: (1) a wearable motion sensing device to be placed on residents' wrists and its corresponding 3D gesture recognition algorithm to implement a convenient automated household appliance control system; (2) a wearable motion sensing device mounted on a resident's feet and its indoor positioning algorithm to realize an effective indoor pedestrian navigation system for smart energy management; (3) a multisensor circuit module and an intelligent fire detection and alarm algorithm to realize a home safety and fire detection system. In addition, an intelligent monitoring interface is developed to provide real-time information about the smart home system, such as environmental temperatures, CO concentrations, communicative environmental alarms, household appliance status, human motion signals, and the results of gesture recognition and indoor positioning. Furthermore, an experimental testbed for validating the effectiveness and feasibility of the smart home system was built and verified experimentally. The results showed that the 3D gesture recognition algorithm achieved recognition rates of 92.0%, 94.8%, 95.3%, and 87.7% for automated household appliance control under the 2-fold, 5-fold, 10-fold, and leave-one-subject-out cross-validation strategies, respectively. For indoor positioning and smart energy management, the distance accuracy and positioning accuracy were around 0.22% and 3.36% of the total traveled distance in the indoor environment. For home safety and fire detection, the classification rate achieved 98.81% accuracy in determining the conditions of the indoor living environment.
PMID:28714884
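The four validation strategies quoted above differ mainly in how folds are formed. A minimal sketch, assuming a generic classifier and synthetic per-subject data (not the authors' gesture-recognition system), contrasts 10-fold cross-validation with the leave-one-subject-out strategy:

```python
# Hedged sketch: k-fold vs. leave-one-subject-out (LOSO) accuracy. A k-NN
# classifier and randomly grouped synthetic data stand in for the real
# wearable-sensor features and subjects.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=120, n_features=6, random_state=0)
subjects = np.repeat(np.arange(12), 10)  # 12 hypothetical subjects, 10 samples each

clf = KNeighborsClassifier()
kfold_acc = cross_val_score(clf, X, y,
                            cv=KFold(10, shuffle=True, random_state=0)).mean()
loso_acc = cross_val_score(clf, X, y, groups=subjects,
                           cv=LeaveOneGroupOut()).mean()
print(round(kfold_acc, 3), round(loso_acc, 3))
```

LOSO keeps each subject's samples entirely out of training, which is why it typically reports lower (and more honest) rates for person-dependent signals, as in the abstract's 87.7% versus the k-fold figures.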
NASA Astrophysics Data System (ADS)
Folkert, Michael R.; Setton, Jeremy; Apte, Aditya P.; Grkovski, Milan; Young, Robert J.; Schöder, Heiko; Thorstad, Wade L.; Lee, Nancy Y.; Deasy, Joseph O.; Oh, Jung Hun
2017-07-01
In this study, we investigate the use of imaging feature-based outcomes research (‘radiomics’) combined with machine learning techniques to develop robust predictive models for the risk of all-cause mortality (ACM), local failure (LF), and distant metastasis (DM) following definitive chemoradiation therapy (CRT). One hundred seventy-four patients with stage III-IV oropharyngeal cancer (OC) treated at our institution with CRT with retrievable pre- and post-treatment 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) scans were identified. From pre-treatment PET scans, 24 representative imaging features of FDG-avid disease regions were extracted. Using machine learning-based feature selection methods, multiparameter logistic regression models were built incorporating clinical factors and imaging features. All model building methods were tested by cross-validation to avoid overfitting, and final outcome models were validated on an independent dataset from a collaborating institution. Multiparameter models were statistically significant on 5-fold cross-validation with the area under the receiver operating characteristic curve (AUC) = 0.65 (p = 0.004), 0.73 (p = 0.026), and 0.66 (p = 0.015) for ACM, LF, and DM, respectively. The model for LF retained significance on the independent validation cohort with AUC = 0.68 (p = 0.029), whereas the models for ACM and DM did not reach statistical significance but showed predictive power comparable to the 5-fold cross-validation, with AUC = 0.60 (p = 0.092) and 0.65 (p = 0.062), respectively. In the largest study of its kind to date, predictive features including increasing metabolic tumor volume, increasing image heterogeneity, and increasing tumor surface irregularity significantly correlated with mortality, LF, and DM on 5-fold cross-validation in a relatively uniform single-institution cohort. The LF model also retained significance in an independent population.
A Diagnostic Model for Impending Death in Cancer Patients: Preliminary Report
Hui, David; Hess, Kenneth; dos Santos, Renata; Chisholm, Gary; Bruera, Eduardo
2015-01-01
Background We recently identified several highly specific bedside physical signs associated with impending death within 3 days among patients with advanced cancer. In this study, we developed and assessed a diagnostic model for impending death based on these physical signs. Methods We systematically documented 62 physical signs every 12 hours from admission to death or discharge in 357 patients with advanced cancer admitted to acute palliative care units (APCUs) at two tertiary care cancer centers. We used recursive partitioning analysis (RPA) to develop a prediction model for impending death in 3 days using admission data. We validated the model with 5 iterations of 10-fold cross-validation, and also applied the model to APCU days 2/3/4/5/6. Results Among 322/357 (90%) patients with complete data for all signs, the 3-day mortality was 24% on admission. The final model was based on 2 variables (palliative performance scale [PPS] and drooping of nasolabial fold) and had 4 terminal leaves: PPS≤20% and drooping of nasolabial fold present, PPS≤20% and drooping of nasolabial fold absent, PPS 30–60% and PPS ≥ 70%, with 3-day mortality of 94%, 42%, 16% and 3%, respectively. The diagnostic accuracy was 81% for the original tree, 80% for cross-validation, and 79%–84% for subsequent APCU days. Conclusion(s) We developed a diagnostic model for impending death within 3 days based on 2 objective bedside physical signs. This model was applicable to both APCU admission and subsequent days. Upon further external validation, this model may help clinicians to formulate the diagnosis of impending death. PMID:26218612
Gupta, Meenal; Moily, Nagaraj S; Kaur, Harpreet; Jajodia, Ajay; Jain, Sanjeev; Kukreti, Ritushree
2013-08-01
Atypical antipsychotic (AAP) drugs are the preferred choice of treatment for schizophrenia patients. Patients who do not show a favorable response to AAP monotherapy are subjected to prolonged trial-and-error treatment with AAP multitherapy, typical antipsychotics, or a combination of both. Therefore, prior identification of patients' response to drugs can be an important step in providing efficacious and safe therapeutic treatment. We thus attempted to elucidate a genetic signature that could predict patients' response to AAP monotherapy. Our logistic regression analyses indicated a 76% probability that patients carrying a combination of four SNPs will not show a favorable response to AAP therapy. The robustness of this prediction model was assessed using a repeated 10-fold cross-validation method, and the results across the n-fold cross-validations (mean accuracy = 71.91%; 95% CI = 71.47-72.35) suggest high accuracy and reliability of the prediction model. Further validation of these results in large sample sets is likely to establish their clinical applicability. Copyright © 2013 Elsevier Inc. All rights reserved.
Wang, Wenyi; Kim, Marlene T.; Sedykh, Alexander
2015-01-01
Purpose Experimental Blood–Brain Barrier (BBB) permeability models for drug molecules are expensive and time-consuming. As alternative methods, several traditional Quantitative Structure-Activity Relationship (QSAR) models have been developed previously. In this study, we aimed to improve the predictivity of traditional QSAR BBB permeability models by employing relevant public bio-assay data in the modeling process. Methods We compiled a BBB permeability database consisting of 439 unique compounds from various resources. The database was split into a modeling set of 341 compounds and a validation set of 98 compounds. A consensus QSAR modeling workflow was employed on the modeling set to develop various QSAR models. A five-fold cross-validation approach was used to validate the developed models, and the resulting models were used to predict the external validation set compounds. Furthermore, we used previously published membrane transporter models to generate relevant transporter profiles for target compounds. The transporter profiles were used as additional biological descriptors to develop hybrid QSAR BBB models. Results The consensus QSAR models have R2=0.638 for five-fold cross-validation and R2=0.504 for external validation. The consensus model developed by pooling chemical and transporter descriptors showed better predictivity (R2=0.646 for five-fold cross-validation and R2=0.526 for external validation). Moreover, several external bio-assays that correlate with BBB permeability were identified using our automatic profiling tool. Conclusions The BBB permeability models developed in this study can be useful for early evaluation of new compounds (e.g., new drug candidates). The combination of chemical and biological descriptors shows a promising direction to improve the current traditional QSAR models. PMID:25862462
A diagnostic model for impending death in cancer patients: Preliminary report.
Hui, David; Hess, Kenneth; dos Santos, Renata; Chisholm, Gary; Bruera, Eduardo
2015-11-01
Several highly specific bedside physical signs associated with impending death within 3 days for patients with advanced cancer were recently identified. A diagnostic model for impending death based on these physical signs was developed and assessed. Sixty-two physical signs were systematically documented every 12 hours from admission to death or discharge for 357 patients with advanced cancer who were admitted to acute palliative care units (APCUs) at 2 tertiary care cancer centers. Recursive partitioning analysis was used to develop a prediction model for impending death within 3 days with admission data. The model was validated with 5 iterations of 10-fold cross-validation, and the model was also applied to APCU days 2 to 6. For the 322 of 357 patients (90%) with complete data for all signs, the 3-day mortality rate was 24% on admission. The final model was based on 2 variables (Palliative Performance Scale [PPS] and drooping of nasolabial folds) and had 4 terminal leaves: PPS score ≤ 20% and drooping of nasolabial folds present, PPS score ≤ 20% and drooping of nasolabial folds absent, PPS score of 30% to 60%, and PPS score ≥ 70%. The 3-day mortality rates were 94%, 42%, 16%, and 3%, respectively. The diagnostic accuracy was 81% for the original tree, 80% for cross-validation, and 79% to 84% for subsequent APCU days. Based on 2 objective bedside physical signs, a diagnostic model was developed for impending death within 3 days. This model was applicable to both APCU admission and subsequent days. Upon further external validation, this model may help clinicians to formulate the diagnosis of impending death. © 2015 American Cancer Society.
Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.
Zhou, Yao; Vales, M Isabel; Wang, Aoxue; Zhang, Zhiwu
2017-09-01
Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates the correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
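One mechanism consistent with the downward bias reported above can be sketched directly (an assumption-laden illustration, not the paper's genomic-prediction analysis): if phenotypes are centered on the full data before cross-validation, even a null predictor that outputs the training-fold mean attains negative Hold accuracy, and the bias grows with the number of folds.

```python
# Sketch of Hold-accuracy bias on pure-noise phenotypes. Globally centering
# y before CV makes each training-fold mean anti-correlated with the
# held-out observations, so the pooled correlation is systematically negative.
import numpy as np

rng = np.random.default_rng(42)

def hold_accuracy(n=100, k=10):
    y = rng.normal(size=n)
    y = y - y.mean()                   # global centering BEFORE cross-validation
    folds = np.array_split(rng.permutation(n), k)
    pred = np.empty(n)
    for test in folds:
        train = np.ones(n, dtype=bool)
        train[test] = False
        pred[test] = y[train].mean()   # null predictor: training-fold mean
    return np.corrcoef(pred, y)[0, 1]

b10 = np.mean([hold_accuracy(k=10) for _ in range(200)])
b20 = np.mean([hold_accuracy(k=20) for _ in range(200)])
print(round(b10, 2), round(b20, 2))  # both negative; 20-fold more so than 10-fold
```

On pure noise the printed means come out clearly below zero, and the 20-fold value is more negative than the 10-fold value, matching the fold-count effect described in the abstract.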
Predicting protein-binding regions in RNA using nucleotide profiles and compositions.
Choi, Daesik; Park, Byungkyu; Chae, Hanju; Lee, Wook; Han, Kyungsook
2017-03-14
Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use. We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained in a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remains high (87.6% accuracy and 0.752 MCC). In testing the model on independent datasets, it achieved an accuracy of 82.2% and an MCC of 0.656. Testing of our model and other state-of-the-art methods on a same dataset showed that our model is better than the others. 
Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions in finding protein-binding regions in RNA sequences, but a slight performance gain was obtained when using the sequence profiles along with nucleotide compositions. These are preliminary results of ongoing research, but they demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding .
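The performance measures quoted in this abstract all derive from a single binary confusion matrix. A generic helper (not the authors' code; the counts below are hypothetical, chosen only to give rates close to those reported) makes the definitions explicit:

```python
# Standard binary-classification metrics from confusion-matrix counts.
import math

def binary_metrics(tp, fp, tn, fn):
    sens = tp / (tp + fn)                     # sensitivity (recall)
    spec = tn / (tn + fp)                     # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)     # accuracy
    ppv = tp / (tp + fp)                      # positive predictive value
    npv = tn / (tn + fn)                      # negative predictive value
    mcc = (tp * tn - fp * fn) / math.sqrt(    # Matthews correlation coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, acc, ppv, npv, mcc

# Hypothetical counts for a balanced 1000-instance test set.
print([round(m, 3) for m in binary_metrics(458, 38, 462, 42)])
# → [0.916, 0.924, 0.92, 0.923, 0.917, 0.84]
```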
Development of a Bayesian model to estimate health care outcomes in the severely wounded
Stojadinovic, Alexander; Eberhardt, John; Brown, Trevor S; Hawksworth, Jason S; Gage, Frederick; Tadaki, Douglas K; Forsberg, Jonathan A; Davis, Thomas A; Potter, Benjamin K; Dunne, James R; Elster, E A
2010-01-01
Background: Graphical probabilistic models have the ability to provide insights as to how clinical factors are conditionally related. These models can be used to help us understand factors influencing health care outcomes and resource utilization, and to estimate morbidity and clinical outcomes in trauma patient populations. Study design: Thirty-two combat casualties with severe extremity injuries enrolled in a prospective observational study were analyzed using a step-wise machine-learned Bayesian belief network (BBN) and step-wise logistic regression (LR). Models were evaluated using 10-fold cross-validation to calculate the area under the curve (AUC) of receiver operating characteristic (ROC) curves. Results: Our BBN showed important associations between various factors in our data set that could not be developed using standard regression methods. Cross-validated ROC curve analysis showed that our BBN model was a robust representation of our data domain and that LR models trained on these findings were also robust: hospital-acquired infection (AUC: LR, 0.81; BBN, 0.79), intensive care unit length of stay (AUC: LR, 0.97; BBN, 0.81), and wound healing (AUC: LR, 0.91; BBN, 0.72) all showed strong AUCs. Conclusions: A BBN model can effectively represent clinical outcomes and biomarkers in patients hospitalized after severe wounding, as confirmed by 10-fold cross-validation and further by logistic regression modeling. The method warrants further development and independent validation in other, more diverse patient populations. PMID:21197361
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, H; Liu, T; Xu, X
Purpose: There are clinical decision challenges in selecting optimal treatment positions for left-sided breast cancer patients: supine free breathing (FB), supine Deep Inspiration Breath Hold (DIBH), and prone free breathing (prone). Physicians often make the decision based on experience and trials, which might not always result in optimal OAR doses. We herein propose a mathematical model to predict the lowest OAR doses among these three positions, providing a quantitative tool for the corresponding clinical decision. Methods: Patients were scanned in the FB, DIBH, and prone positions under an IRB-approved protocol. Tangential beam plans were generated for each position, and OAR doses were calculated. The position with the least OAR doses is defined as the optimal position. The following features were extracted from each scan to build the model: heart, ipsilateral lung, and breast volume; in-field heart and ipsilateral lung volume; distance between heart and target; laterality of heart; and dose to heart and ipsilateral lung. Principal Components Analysis (PCA) was applied to remove the collinearity of the input data and to lower its dimensionality. Feature selection, another method to reduce dimensionality, was applied as a comparison. A Support Vector Machine (SVM) was then used for classification. Thirty-seven patients' data were acquired; so far, five patient plans were available. K-fold cross-validation was used to validate the accuracy of the classifier model with a small training size. Results: The classification results and K-fold cross-validation demonstrated that the model is capable of predicting the optimal position for patients. The accuracy of K-fold cross-validation reached 80%. Compared to PCA, feature selection allows causal features of dose to be determined, which provides more clinical insight. Conclusion: The proposed classification system appeared to be feasible. We are generating plans for the rest of the 37 patient images, and more statistically significant results are to be presented.
Zhang, Hui; Ren, Ji-Xia; Kang, Yan-Li; Bo, Peng; Liang, Jun-Yu; Ding, Lan; Kong, Wei-Bao; Zhang, Ji
2017-08-01
Toxicological testing associated with developmental toxicity endpoints is very expensive, time consuming and labor intensive. Thus, developing alternative approaches for developmental toxicity testing is an important and urgent task in the drug development field. In this investigation, the naïve Bayes classifier was applied to develop a novel prediction model for developmental toxicity. The established prediction model was evaluated by internal 5-fold cross-validation and an external test set. The overall prediction accuracies for the internal 5-fold cross-validation of the training set and the external test set were 96.6% and 82.8%, respectively. In addition, four simple descriptors and some representative substructures of developmental toxicants were identified. Thus, we hope the established in silico prediction model can be used as an alternative method for toxicological assessment, and that the molecular information obtained affords a deeper understanding of developmental toxicants and provides guidance for medicinal chemists working in drug discovery and lead optimization. Copyright © 2017 Elsevier Inc. All rights reserved.
Cheng, Feixiong; Shen, Jie; Yu, Yue; Li, Weihua; Liu, Guixia; Lee, Philip W; Tang, Yun
2011-03-01
There is an increasing need for the rapid safety assessment of chemicals by both industries and regulatory agencies throughout the world. In silico techniques are practical alternatives in environmental hazard assessment, especially for addressing the persistence, bioaccumulation and toxicity potentials of organic chemicals. Tetrahymena pyriformis toxicity is often used as a toxic endpoint. In this study, 1571 diverse unique chemicals were collected from the literature, composing the largest diverse data set for T. pyriformis toxicity. Classification predictive models of T. pyriformis toxicity were developed by substructure pattern recognition and different machine learning methods, including support vector machine (SVM), C4.5 decision tree, k-nearest neighbors and random forest. The results of a 5-fold cross-validation showed that the SVM method performed better than the other algorithms. The overall predictive accuracy of the SVM classification model with a radial basis function kernel was 92.2% for the 5-fold cross-validation and 92.6% for the external validation set. Furthermore, several representative substructure patterns for characterizing T. pyriformis toxicity were also identified via information gain analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.
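The best-performing setup above, an RBF-kernel SVM scored by 5-fold cross-validation, can be sketched in a few lines (scikit-learn and synthetic features stand in for the substructure-pattern fingerprints; the regularization value is an assumption):

```python
# Hedged sketch: RBF-kernel SVM evaluated by 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=5)
print(round(scores.mean(), 3))  # mean accuracy over the 5 held-out folds
```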
Automated measurement of vocal fold vibratory asymmetry from high-speed videoendoscopy recordings.
Mehta, Daryush D; Deliyski, Dimitar D; Quatieri, Thomas F; Hillman, Robert E
2011-02-01
In prior work, a manually derived measure of vocal fold vibratory phase asymmetry correlated to varying degrees with visual judgments made from laryngeal high-speed videoendoscopy (HSV) recordings. This investigation extended this work by establishing an automated HSV-based framework to quantify 3 categories of vocal fold vibratory asymmetry. HSV-based analysis provided for cycle-to-cycle estimates of left-right phase asymmetry, left-right amplitude asymmetry, and axis shift during glottal closure for 52 speakers with no vocal pathology producing comfortable and pressed phonation. An initial cross-validation of the automated left-right phase asymmetry measure was performed by correlating the measure with other objective and subjective assessments of phase asymmetry. Vocal fold vibratory asymmetry was exhibited to a similar extent in both comfortable and pressed phonations. The automated measure of left-right phase asymmetry strongly correlated with manually derived measures and moderately correlated with visual-perceptual ratings. Correlations with the visual-perceptual ratings remained relatively consistent as the automated measure was derived from kymograms taken at different glottal locations. An automated HSV-based framework for the quantification of vocal fold vibratory asymmetry was developed and initially validated. This framework serves as a platform for investigating relationships between vocal fold tissue motion and acoustic measures of voice function.
Choi, Kwanghun; Spohn, Marie; Park, Soo Jin; Huwe, Bernd; Ließ, Mareike
2017-01-01
Nitrogen (N) and phosphorus (P) in topsoils are critical for plant nutrition. Relatively little is known about the spatial patterns of N and P in the organic layer of mountainous landscapes. Therefore, the spatial distributions of N and P in both the organic layer and the A horizon were analyzed using a light detection and ranging (LiDAR) digital elevation model and vegetation metrics. The objective of the study was to analyze the effect of vegetation and topography on the spatial patterns of N and P in a small watershed covered by forest in South Korea. Soil samples were collected using the conditioned Latin hypercube method. LiDAR vegetation metrics, the normalized difference vegetation index (NDVI), and terrain parameters were derived as predictors. Spatially explicit predictions of N/P ratios were obtained using a random forest with uncertainty analysis. We tested different strategies of model validation (repeated 2-fold to 20-fold and leave-one-out cross validation). Repeated 10-fold cross validation was selected for model validation due to its comparatively high accuracy and low variance of prediction. Surface curvature was the best predictor of P contents in the organic layer and in the A horizon, while LiDAR vegetation metrics and NDVI were important predictors of N in the organic layer. N/P ratios increased with surface curvature and were higher on the convex upper slope than on the concave lower slope. This was due to P enrichment of the soil on the lower slope and a more even spatial distribution of N. Our digital soil maps showed that the topsoils on the upper slopes contained relatively little P. These findings are critical for understanding N and P dynamics in mountainous ecosystems. PMID:28837590
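The choice of repeated 10-fold cross validation above rests on a trade-off between the accuracy of the error estimate and its run-to-run variance. A stdlib-only sketch makes that spread visible; a trivial mean-only "model" stands in for the study's random forest (an assumption), and the data are synthetic. Note that the leave-one-out estimate is deterministic, so its spread collapses to zero:

```python
# Sketch: spread of the cross-validation error estimate across repeats,
# for 2-fold, 10-fold and leave-one-out CV. The "model" is a trivial
# mean-only predictor standing in for the study's random forest.
import random, statistics

def cv_mse(y, k, seed):
    """One k-fold CV estimate of mean-squared error for a mean-only predictor."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errs = []
    for f in range(k):
        train = [y[i] for g in range(k) if g != f for i in folds[g]]
        mu = sum(train) / len(train)
        errs += [(y[i] - mu) ** 2 for i in folds[f]]
    return sum(errs) / len(errs)

random.seed(1)
y = [random.gauss(10, 2) for _ in range(60)]

# Mean and spread of the CV estimate over 20 repeated shuffles.
spread = {}
for k in (2, 10, len(y)):          # 2-fold, 10-fold, leave-one-out
    reps = [cv_mse(y, k, s) for s in range(20)]
    spread[k] = (statistics.mean(reps), statistics.pstdev(reps))
```

Repeating the k-fold runs and averaging, as the study does, is precisely a way of shrinking the shuffle-induced spread without paying the cost of leave-one-out.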
Non-destructive Techniques for Classifying Aircraft Coating Degradation
2015-03-26
model is the bidirectional reflectance distribution function (BRDF), which describes how much radiation is reflected for each solid angle and each...incident angle. An intermediate model between ideal reflectors and BRDF is to assume all reflectance is a combination of diffuse and specular reflectance...K-Fold Cross Validation
Schleier, Jerome J.; Peterson, Robert K.D.; Irvine, Kathryn M.; Marshall, Lucy M.; Weaver, David K.; Preftakes, Collin J.
2012-01-01
One of the more effective ways of managing high densities of adult mosquitoes that vector human and animal pathogens is ultra-low-volume (ULV) aerosol applications of insecticides. The U.S. Environmental Protection Agency uses models that are not validated for ULV insecticide applications and exposure assumptions to perform its human and ecological risk assessments. Currently, there is no validated model that can accurately predict deposition of insecticides applied using ULV technology for adult mosquito management. In addition, little is known about the deposition and drift of small droplets like those used under conditions encountered during ULV applications. The objective of this study was to perform field studies to measure environmental concentrations of insecticides and to develop a validated model to predict the deposition of ULV insecticides. The final regression model was selected by minimizing the Bayesian Information Criterion and its prediction performance was evaluated using k-fold cross validation. The coefficients for formulation density and for the interaction of density with droplet count median diameter (CMD) were the largest in the model. The results showed that as density of the formulation decreases, deposition increases. The interaction of density and CMD showed that higher-density formulations and larger droplets resulted in greater deposition. These results are supported by the aerosol physics literature. A k-fold cross validation demonstrated that the mean square error of the selected regression model is not biased, and the mean square error and mean square prediction error indicated good predictive ability.
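The model-selection step above, minimizing BIC over candidate regressions, can be sketched with a toy nested comparison. The data are invented (a single-slope predictor, not the study's density and droplet-size terms), and BIC is written in its Gaussian form up to an additive constant:

```python
# Sketch: choosing between nested linear models (intercept-only vs.
# intercept + slope) by minimizing BIC. Toy data; the real study's model
# contained formulation-density and droplet-size terms.
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def bic(rss, n, n_params):
    """Gaussian-likelihood BIC, up to an additive constant."""
    return n * math.log(rss / n) + n_params * math.log(n)

# Toy data: a true slope of 2 plus small alternating "noise".
xs = list(range(10))
ys = [2 * x + 1 + 0.1 * (-1) ** x for x in xs]

a, b = fit_line(xs, ys)
rss1 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ybar = sum(ys) / len(ys)
rss0 = sum((y - ybar) ** 2 for y in ys)
bic_slope, bic_const = bic(rss1, 10, 2), bic(rss0, 10, 1)
```

The slope model wins the BIC comparison because its large drop in residual sum of squares outweighs the log(n) penalty for its extra parameter; the study then checks the winner's predictive ability separately by k-fold cross validation.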
Novel Screening Tool for Stroke Using Artificial Neural Network.
Abedi, Vida; Goyal, Nitin; Tsivgoulis, Georgios; Hosseinichimeh, Niyousha; Hontecillas, Raquel; Bassaganya-Riera, Josep; Elijovich, Lucas; Metter, Jeffrey E; Alexandrov, Anne W; Liebeskind, David S; Alexandrov, Andrei V; Zand, Ramin
2017-06-01
The timely diagnosis of stroke at the initial examination is extremely important given the disease morbidity and narrow time window for intervention. The goal of this study was to develop a supervised learning method to recognize acute cerebral ischemia (ACI) and differentiate that from stroke mimics in an emergency setting. Consecutive patients presenting to the emergency department with stroke-like symptoms, within 4.5 hours of symptoms onset, in 2 tertiary care stroke centers were randomized for inclusion in the model. We developed an artificial neural network (ANN) model. The learning algorithm was based on backpropagation. To validate the model, we used a 10-fold cross-validation method. A total of 260 patients (equal number of stroke mimics and ACIs) were enrolled for the development and validation of our ANN model. Our analysis indicated that the average sensitivity and specificity of ANN for the diagnosis of ACI based on the 10-fold cross-validation analysis was 80.0% (95% confidence interval, 71.8-86.3) and 86.2% (95% confidence interval, 78.7-91.4), respectively. The median precision of ANN for the diagnosis of ACI was 92% (95% confidence interval, 88.7-95.3). Our results show that ANN can be an effective tool for the recognition of ACI and differentiation of ACI from stroke mimics at the initial examination. © 2017 American Heart Association, Inc.
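The sensitivity and specificity figures above come from scoring held-out predictions across the 10 folds. A minimal sketch of that bookkeeping, with invented labels and predictions rather than study data, and a simple threshold rule imagined in place of the paper's backpropagation network:

```python
# Sketch: reading sensitivity and specificity off pooled out-of-fold
# predictions, as in a 10-fold cross-validation. Toy values, not study data.
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical pooled out-of-fold labels (1 = ACI, 0 = stroke mimic).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity answers "what fraction of true ACI cases did the model catch?", while specificity answers "what fraction of stroke mimics did it correctly rule out?", the two numbers reported above with their confidence intervals.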
Watch-Dog: Detecting Self-Harming Activities From Wrist Worn Accelerometers.
Bharti, Pratool; Panwar, Anurag; Gopalakrishna, Ganesh; Chellappan, Sriram
2018-05-01
In a 2012 survey, in the United States alone, there were more than 35 000 reported suicides, approximately 1800 of them psychiatric inpatients. Recent Centers for Disease Control and Prevention (CDC) reports indicate an upward trend in these numbers. In psychiatric facilities, staff perform intermittent or continuous observation of patients manually in order to prevent such tragedies, but studies show that these measures are insufficient and also consume staff time and resources. In this paper, we present the Watch-Dog system, which addresses the problem of detecting self-harming activities attempted by inpatients in clinical settings. Watch-Dog comprises three key components: data sensed by tiny accelerometer sensors worn on the wrists of subjects; an efficient algorithm to classify whether a user is active or dormant (i.e., performing a physical activity or not); and a novel decision-selection algorithm based on random forests and continuity indices for fine-grained activity classification. With data acquired from 11 subjects performing a series of activities (both self-harming and otherwise), Watch-Dog achieves a classification accuracy of , , and for same-user 10-fold cross-validation, cross-user 10-fold cross-validation, and cross-user leave-one-out evaluation, respectively. We believe that the problem addressed in this paper is practical, important, and timely. We also believe that our proposed system is practically deployable, and related discussions are provided in this paper.
Parkinson's disease detection based on dysphonia measurements
NASA Astrophysics Data System (ADS)
Lahmiri, Salim
2017-04-01
Assessing dysphonic symptoms is a noninvasive and effective approach to detect Parkinson's disease (PD) in patients. The main purpose of this study is to investigate the effect of different dysphonia measurements on PD detection by support vector machine (SVM). Seven categories of dysphonia measurements are considered. Experimental results from the ten-fold cross-validation technique demonstrate that vocal fundamental frequency statistics yield the highest accuracy of 88% ± 0.04. When all dysphonia measurements are employed, the SVM classifier achieves 94% ± 0.03 accuracy. A refinement of the original pattern space by removing dysphonia measurements with similar variation across healthy and PD subjects allows achieving 97.03% ± 0.03 accuracy. The latter performance exceeds what has been reported in the literature on the same dataset with the ten-fold cross-validation technique. Finally, it was found that measures of the ratio of noise to tonal components in the voice are the most suitable dysphonic symptoms for detecting PD subjects, as they achieve 99.64% ± 0.01 specificity. This finding is highly promising for understanding PD symptoms.
A Computational Model for Predicting RNase H Domain of Retrovirus.
Wu, Sijia; Zhang, Xinman; Han, Jiuqiang
2016-01-01
RNase H (RNH) is a pivotal domain in retroviruses, cleaving the DNA-RNA hybrid to continue retroviral replication. This crucial role makes RNH a promising drug target for therapeutic intervention. However, the RNHs annotated in the UniProtKB database are still insufficient for a good understanding of their statistical characteristics. In this work, a computational RNH model was proposed to annotate new putative RNHs (np-RNHs) in retroviruses. It predicts RNH domains by recognizing their start and end sites separately with the SVM method. The classification accuracy rates are 100%, 99.01% and 97.52%, corresponding to the jack-knife, 10-fold cross-validation and 5-fold cross-validation tests, respectively. Subsequently, this model discovered 14,033 np-RNHs after scanning sequences without RNH annotations. All these predicted np-RNHs and annotated RNHs were employed to analyze the length, hydrophobicity and evolutionary relationships of RNH domains. These are all related to retroviral genera, which validates the classification of retroviruses to a certain degree. In the end, a software tool was designed for the application of our prediction model. The software, together with the datasets involved in this paper, is available for free download at https://sourceforge.net/projects/rhtool/files/?source=navbar.
Mapping the Transmission Risk of Zika Virus using Machine Learning Models.
Jiang, Dong; Hao, Mengmeng; Ding, Fangyu; Fu, Jingying; Li, Meng
2018-06-19
Zika virus, which has been linked to severe congenital abnormalities, is exacerbating global public health problems with its rapid transnational expansion fueled by increased global travel and trade. Suitability mapping of the transmission risk of Zika virus is essential for drafting public health plans and disease control strategies, which are especially important in areas where medical resources are relatively scarce. Predicting the risk of Zika virus outbreaks has been studied in recent years, but the published literature rarely includes multiple model comparisons or predictive uncertainty analysis. Here, three relatively popular machine learning models, backward propagation neural network (BPNN), gradient boosting machine (GBM) and random forest (RF), were adopted to map the probability of Zika epidemic outbreak at the global level, pairing high-dimensional multidisciplinary covariate layers with comprehensive location data on recorded Zika virus infections in humans. The results show that the predicted high-risk areas for Zika transmission are concentrated in four regions: Southeastern North America, Eastern South America, Central Africa and Eastern Asia. To evaluate the performance of the machine learning models, 50 modeling runs were conducted on a training dataset. The BPNN model obtained the highest predictive accuracy with a 10-fold cross-validation area under the curve (AUC) of 0.966 [95% confidence interval (CI) 0.965-0.967], followed by the GBM model (10-fold cross-validation AUC = 0.964 [0.963-0.965]) and the RF model (10-fold cross-validation AUC = 0.963 [0.962-0.964]). On the training samples, significant differences relative to the BPNN-based model (p = 0.0258* and p = 0.0001***, respectively) are observed for the prediction accuracies achieved by the GBM and RF models. 
Importantly, the prediction uncertainty introduced by the selection of absence data was quantified and could provide more accurate fundamental and scientific information for further study on disease transmission prediction and risk assessment. Copyright © 2018. Published by Elsevier B.V.
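The cross-validated AUCs compared above can be computed without tracing an explicit ROC curve, via the rank (Mann-Whitney) identity: AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative. A stdlib sketch with invented scores:

```python
# Sketch: AUC via the Mann-Whitney identity, the statistic used above
# to compare the BPNN, GBM and RF models. Scores below are toy values.
def auc(scores_pos, scores_neg):
    """P(score of a positive > score of a negative), ties counted as 1/2."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

pos = [0.9, 0.8, 0.7, 0.4]   # model scores on true presence points
neg = [0.6, 0.5, 0.3, 0.2]   # model scores on (pseudo-)absence points
print(auc(pos, neg))         # one misordered pair pulls AUC below 1
```

In the cross-validated setting, the positive and negative score lists are the pooled out-of-fold predictions, so the resulting AUC reflects held-out rather than training performance.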
Using patient data similarities to predict radiation pneumonitis via a self-organizing map
NASA Astrophysics Data System (ADS)
Chen, Shifeng; Zhou, Sumin; Yin, Fang-Fang; Marks, Lawrence B.; Das, Shiva K.
2008-01-01
This work investigates the use of the self-organizing map (SOM) technique for predicting lung radiation pneumonitis (RP) risk. SOM is an effective method for projecting and visualizing high-dimensional data in a low-dimensional space (map). By projecting patients with similar data (dose and non-dose factors) onto the same region of the map, commonalities in their outcomes can be visualized and categorized. Once built, the SOM may be used to predict pneumonitis risk by identifying the region of the map that is most similar to a patient's characteristics. Two SOM models were developed from a database of 219 lung cancer patients treated with radiation therapy (34 clinically diagnosed with Grade 2+ pneumonitis). The models were: SOMall built from all dose and non-dose factors and, for comparison, SOMdose built from dose factors alone. Both models were tested using ten-fold cross validation and Receiver Operating Characteristics (ROC) analysis. Models SOMall and SOMdose yielded ten-fold cross-validated ROC areas of 0.73 (sensitivity/specificity = 71%/68%) and 0.67 (sensitivity/specificity = 63%/66%), respectively. The significant difference between the cross-validated ROC areas of these two models (p < 0.05) implies that non-dose features add important information toward predicting RP risk. Among the input features selected by model SOMall, the two with highest impact for increasing RP risk were: (a) higher mean lung dose and (b) chemotherapy prior to radiation therapy. The SOM model developed here may not be extrapolated to treatment techniques outside that used in our database, such as several-field lung intensity modulated radiation therapy or gated radiation therapy.
Support vector machines and generalisation in HEP
NASA Astrophysics Data System (ADS)
Bevan, Adrian; Gamboa Goñi, Rodrigo; Hays, Jon; Stevenson, Tom
2017-10-01
We review the concept of Support Vector Machines (SVMs) and discuss examples of their use in a number of scenarios. Several SVM implementations have been used in HEP and we exemplify this algorithm using the Toolkit for Multivariate Analysis (TMVA) implementation. We discuss examples relevant to HEP, including background suppression for H → τ+τ- at the LHC, with several different kernel functions. Performance benchmarking leads to the issue of generalisation of hyper-parameter selection. The avoidance of fine tuning (overtraining or overfitting) in MVA hyper-parameter optimisation, i.e. the ability to ensure generalised performance of an MVA that is independent of the training, validation and test samples, is of utmost importance. We discuss this issue and compare and contrast the performance of hold-out and k-fold cross-validation. We have extended the SVM functionality and introduced tools to facilitate cross validation in TMVA, and present results based on these improvements.
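The generalisation point above, choosing a hyper-parameter by k-fold cross-validation rather than a single hold-out split, can be sketched in stdlib Python. A 1-D k-NN regressor and its neighbourhood size stand in for the SVM and its kernel hyper-parameters (assumptions for illustration), and the data are synthetic:

```python
# Sketch: hyper-parameter selection by 5-fold CV. The hyper-parameter is
# the neighbourhood size of a toy 1-D k-NN regressor, standing in for an
# SVM's cost/width parameters.
import random

def knn_predict(train, x, k):
    """Mean target of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(t for _, t in nearest) / k

def cv_error(data, k_neigh, n_folds=5, seed=0):
    """k-fold CV mean-squared error for one hyper-parameter setting."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errs = []
    for f in range(n_folds):
        train = [data[i] for g in range(n_folds) if g != f for i in folds[g]]
        errs += [(knn_predict(train, data[i][0], k_neigh) - data[i][1]) ** 2
                 for i in folds[f]]
    return sum(errs) / len(errs)

rng = random.Random(1)
data = [(x, x * x + rng.gauss(0, 2)) for x in [i / 5 for i in range(50)]]
scores = {k: cv_error(data, k) for k in (1, 3, 5, 9, 15)}
best_k = min(scores, key=scores.get)
```

Because every candidate setting is scored on data it never trained on, and each point contributes to the test score of exactly one fold, the selected setting is less prone to the fine-tuning pathology described above than one picked on a single lucky hold-out split.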
Li, Haiquan; Dai, Xinbin; Zhao, Xuechun
2008-05-01
Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homologous search and topological analysis into a machine-learning framework. Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaean to mammalian species. A preliminary literature-based validation has cross-validated 65.8% of our predictions on the 11 organisms, including 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.
Classification of echolocation clicks from odontocetes in the Southern California Bight.
Roch, Marie A; Klinck, Holger; Baumann-Pickering, Simone; Mellinger, David K; Qui, Simon; Soldevilla, Melissa S; Hildebrand, John A
2011-01-01
This study presents a system for classifying echolocation clicks of six species of odontocetes in the Southern California Bight: Visually confirmed bottlenose dolphins, short- and long-beaked common dolphins, Pacific white-sided dolphins, Risso's dolphins, and presumed Cuvier's beaked whales. Echolocation clicks are represented by cepstral feature vectors that are classified by Gaussian mixture models. A randomized cross-validation experiment is designed to provide conditions similar to those found in a field-deployed system. To prevent matched conditions from inappropriately lowering the error rate, echolocation clicks associated with a single sighting are never split across the training and test data. Sightings are randomly permuted before assignment to folds in the experiment. This allows different combinations of the training and test data to be used while keeping data from each sighting entirely in the training or test set. The system achieves a mean error rate of 22% across 100 randomized three-fold cross-validation experiments. Four of the six species had mean error rates lower than the overall mean, with the presumed Cuvier's beaked whale clicks showing the best performance (<2% error rate). Long-beaked common and bottlenose dolphins proved the most difficult to classify, with mean error rates of 53% and 68%, respectively.
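The sighting-level constraint above, never splitting one sighting's clicks across train and test, is a group-aware fold assignment. A stdlib sketch with invented sighting labels:

```python
# Sketch: group-aware k-fold assignment, so all samples from one sighting
# land entirely in a single fold. Sighting labels below are hypothetical.
import random

def group_k_fold(groups, k, seed=0):
    """Assign whole groups to folds; return k lists of sample indices."""
    uniq = sorted(set(groups))
    random.Random(seed).shuffle(uniq)       # randomize group-to-fold mapping
    fold_of = {g: i % k for i, g in enumerate(uniq)}
    folds = [[] for _ in range(k)]
    for i, g in enumerate(groups):
        folds[fold_of[g]].append(i)
    return folds

# Hypothetical sighting labels for 12 echolocation clicks.
groups = ["s1", "s1", "s2", "s2", "s2", "s3",
          "s3", "s4", "s4", "s5", "s5", "s5"]
folds = group_k_fold(groups, 3)
```

Shuffling the group list and re-running, as the study does over 100 randomized experiments, varies which sightings share a fold while preserving the guarantee that matched recording conditions never leak between training and test.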
Zhang, Yingtao; Wang, Tao; Liu, Kangkang; Xia, Yao; Lu, Yi; Jing, Qinlong; Yang, Zhicong; Hu, Wenbiao; Lu, Jiahai
2016-02-01
Dengue is a re-emerging infectious disease of humans, rapidly growing from endemic areas to dengue-free regions due to favorable conditions. In recent decades, Guangzhou has again suffered several large outbreaks of dengue, as have its neighboring cities. This study aims to examine the impact of dengue epidemics in Guangzhou, China, and to develop a predictive model for Zhongshan based on local weather conditions and Guangzhou dengue surveillance information. We obtained weekly dengue case data from 1st January, 2005 to 31st December, 2014 for Guangzhou and Zhongshan city from the Chinese National Disease Surveillance Reporting System. Meteorological data was collected from the Zhongshan Weather Bureau and demographic data was collected from the Zhongshan Statistical Bureau. A negative binomial regression model with a log link function was used to analyze the relationship between weekly dengue cases in Guangzhou and Zhongshan, controlling for meteorological factors. Cross-correlation functions were applied to identify the time lags of the effect of each weather factor on weekly dengue cases. Models were validated using receiver operating characteristic (ROC) curves and k-fold cross-validation. Our results showed that weekly dengue cases in Zhongshan were significantly associated with dengue cases in Guangzhou after applying a moving average over the prior 5 weeks (Relative Risk (RR) = 2.016, 95% Confidence Interval (CI): 1.845-2.203), controlling for weather factors including minimum temperature, relative humidity, and rainfall. ROC curve analysis indicated our forecasting model performed well at different prediction thresholds, with 0.969 area under the receiver operating characteristic curve (AUC) for a threshold of 3 cases per week, 0.957 AUC for a threshold of 2 cases per week, and 0.938 AUC for a threshold of 1 case per week. Models established during k-fold cross-validation also had considerable AUC (average 0.938-0.967). 
The sensitivity and specificity obtained from k-fold cross-validation was 78.83% and 92.48% respectively, with a forecasting threshold of 3 cases per week; 91.17% and 91.39%, with a threshold of 2 cases; and 85.16% and 87.25% with a threshold of 1 case. The out-of-sample prediction for the epidemics in 2014 also showed satisfactory performance. Our study findings suggest that the occurrence of dengue outbreaks in Guangzhou could impact dengue outbreaks in Zhongshan under suitable weather conditions. Future studies should focus on developing integrated early warning systems for dengue transmission including local weather and human movement.
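The cross-correlation step above, identifying the lag at which a weather series best tracks weekly case counts, can be sketched with stdlib Python. The two series here are toy signals constructed with a known 5-week delay, not the Zhongshan data:

```python
# Sketch: picking the lag that maximizes the Pearson correlation between
# a predictor series x and a response series y. Toy data with a built-in
# 5-step delay, not the study's surveillance series.
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def best_lag(x, y, max_lag):
    """Lag (in weeks) at which x, shifted forward, best correlates with y."""
    corrs = {lag: pearson(x[:len(x) - lag], y[lag:])
             for lag in range(max_lag + 1)}
    return max(corrs, key=corrs.get), corrs

# Toy series: the response follows the predictor with a 5-week delay.
x = [math.sin(t / 3) + 0.1 * t for t in range(60)]
y = [0.0] * 5 + x[:-5]
lag, corrs = best_lag(x, y, 10)
```

The recovered lag is then the amount by which the weather covariate (or, above, the Guangzhou case series) is shifted before entering the regression model.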
Genomic selection across multiple breeding cycles in applied bread wheat breeding.
Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann
2016-06-01
We evaluated genomic selection across five breeding cycles of bread wheat breeding. The bias of within-cycle cross-validation and methods for improving the prediction accuracy were assessed. The prospect of genomic selection has frequently been shown by cross-validation studies using the same genetic material across multiple environments, but studies investigating genomic selection across multiple breeding cycles in applied bread wheat breeding are lacking. We estimated the prediction accuracy of grain yield, protein content and protein yield of 659 inbred lines across five independent breeding cycles and assessed the bias of within-cycle cross-validation. We investigated the influence of outliers on the prediction accuracy and predicted protein yield by its component traits. A high average heritability was estimated for protein content, followed by grain yield and protein yield. The bias of the prediction accuracy estimated by fivefold cross-validation on populations from individual cycles was accordingly substantial for protein yield (17-712 %) and less pronounced for protein content (8-86 %). Cross-validation using the cycles as folds aimed to avoid this bias and reached a maximum prediction accuracy of [Formula: see text] = 0.51 for protein content, [Formula: see text] = 0.38 for grain yield and [Formula: see text] = 0.16 for protein yield. Dropping outlier cycles increased the prediction accuracy of grain yield to [Formula: see text] = 0.41 as estimated by cross-validation, while dropping outlier environments did not have a significant effect on the prediction accuracy. Independent validation suggests, on the other hand, that careful consideration is necessary before an outlier correction is undertaken which removes lines from the training population. Predicting protein yield by multiplying genomic estimated breeding values of grain yield and protein content raised the prediction accuracy to [Formula: see text] = 0.19 for this derived trait.
Geometry and Kinematics of Fault-Propagation Folds with Variable Interlimb Angles
NASA Astrophysics Data System (ADS)
Dhont, D.; Jabbour, M.; Hervouet, Y.; Deroin, J.
2009-12-01
Fault-propagation folds are common features in foreland basins and fold-and-thrust belts. Several conceptual models have been proposed to account for their geometry and kinematics. It is generally accepted that the shape of fault-propagation folds depends directly on both the amount of displacement along the basal decollement level and the dip angle of the ramp. Among these models, the variable interlimb angle model proposed by Mitra (1990) is based on a folding kinematics that is able to explain open and close natural folds. However, the application of this model is limited because the geometric evolution and thickness variation of the fold directly depend on imposed parameters such as the maximal value of the ramp height. Here, we use the ramp and interlimb angles as input data to develop a forward fold model accounting for thickness variations in the forelimb. The relationship between fold amplitude and fold wavelength is subsequently applied to build balanced geologic cross-sections from surface parameters only, and to propose a kinematic restoration of the folding through time. We considered three natural examples to validate the variable interlimb angle model. Observed thickness variations in the forelimb of the Turner Valley anticline in the Alberta foothills of Canada precisely correspond to the theoretical values proposed by our model. Deep reconstruction of the Alima anticline in the southern Tunisian Atlas implies that the decollement level is localized in the Triassic-Liassic series, as highlighted by seismic imaging. Our kinematic reconstruction of the Ucero anticline in the Spanish Castilian mountains is also in agreement with the anticline geometry derived from two cross-sections. 
The variable interlimb angle model implies that the fault-propagation fold can be symmetric, normal asymmetric (with a greater dip value in the forelimb than in the backlimb), or reversely asymmetric (with greater dip in the backlimb), depending on the shortening amount. This model also allows one: (i) to easily explain folds with a wide variety of geometries; (ii) to understand the deep architecture of anticlines; and (iii) to deduce the kinematic evolution of folding with time. Mitra, S., 1990, Fault-propagation folds: geometry, kinematic evolution, and hydrocarbon traps. AAPG Bulletin, v. 74, no. 6, p. 921-945.
Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image
NASA Astrophysics Data System (ADS)
Pirotti, F.; Sunar, F.; Piragnolo, M.
2016-06-01
Thanks to mainly ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi layered perceptron, multi layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area about 60 km2, obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban (ii) sowable areas (iii) water (iv) tree plantations (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds. 
Results from validation of predictions on the whole dataset (full) show the random forests method with the highest values, with a kappa index ranging from 0.55 to 0.42 with the largest and smallest numbers of training pixels, respectively. The two neural networks (multi layered perceptron and its ensemble) and the support vector machines method (with default radial basis function kernel) follow closely with comparable performance.
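The kappa index used above measures classification agreement beyond what class frequencies alone would produce by chance. A stdlib sketch with toy label vectors, not the Sentinel-2 data:

```python
# Sketch: Cohen's kappa from predicted and reference labels. Kappa
# discounts the agreement expected by chance from the class frequencies.
def kappa(y_true, y_pred):
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pe = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (po - pe) / (1 - pe)

print(kappa([0, 0, 1, 1], [0, 0, 1, 0]))  # 75% raw accuracy, kappa only 0.5
```

This is why a kappa of 0.55 is a stronger statement than the same raw accuracy would be on a dataset where one land-cover class dominates.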
Brodie, Nicholas I.; Popov, Konstantin I.; Petrotchenko, Evgeniy V.; Dokholyan, Nikolay V.; Borchers, Christoph H.
2017-01-01
We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein—models for α helix–rich and β sheet–rich proteins, respectively—and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures. PMID:28695211
Brodie, Nicholas I; Popov, Konstantin I; Petrotchenko, Evgeniy V; Dokholyan, Nikolay V; Borchers, Christoph H
2017-07-01
We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein-models for α helix-rich and β sheet-rich proteins, respectively-and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures.
Farkas, Viktor; Jákli, Imre; Tóth, Gábor K; Perczel, András
2016-09-19
Both far- and near-UV electronic circular dichroism (ECD) spectra have bands sensitive to the thermal unfolding of proteins containing Trp and Tyr residues. Besides spectral changes at 222 nm reporting secondary structural variations (far-UV range), Lb bands (near-UV range) are applicable as 3D-fold sensors of a protein's core structure. In this study we show that both Lb(Tyr) and Lb(Trp) ECD bands can be used as sensors of fold compactness. ECD is a relative method and thus requires NMR referencing and cross-validation, also provided here. The ensemble of 204 ECD spectra of Trp-cage miniproteins is analysed as a training set for "calibrating" Trp↔Tyr folded systems of known NMR structure. While changes in the far-UV ECD spectra are linear as a function of temperature, near-UV ECD data indicate a non-linear and thus cooperative unfolding mechanism for these proteins. Deconvolution of the ensemble of ECD spectra gives both conformational weights and insight into the protein folding↔unfolding mechanism. We found that the Lb 293 band reports on the compactness of the 3D structure. In addition, the pure near-UV ECD spectrum of the unfolded state is described here for the first time. Thus, the now-validated ECD folding information can be applied with confidence over a large thermal window (5 ≤ T ≤ 85 °C), compared to NMR, for studying the unfolding of Trp↔Tyr residue pairs. In conclusion, the folding propensities of important proteins (RNA polymerase II, ubiquitin protein ligase, tryptase-inhibitor, etc.) can now be analysed with higher confidence. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Novel naïve Bayes classification models for predicting the chemical Ames mutagenicity.
Zhang, Hui; Kang, Yan-Li; Zhu, Yuan-Yuan; Zhao, Kai-Xia; Liang, Jun-Yu; Ding, Lan; Zhang, Teng-Guo; Zhang, Ji
2017-06-01
Prediction of drug candidates for mutagenicity is a regulatory requirement since mutagenic compounds could pose a toxic risk to humans. The aim of this investigation was to develop a novel prediction model of mutagenicity by using a naïve Bayes classifier. The established model was validated by internal 5-fold cross validation and external test sets. For comparison, a recursive partitioning classifier prediction model was also established, and various other reported prediction models of mutagenicity were collected. Among these methods, the prediction performance of the naïve Bayes classifier established here was very good and stable, yielding average overall prediction accuracies of 89.1±0.4% for the internal 5-fold cross validation of the training set and 77.3±1.5% for external test set I. The concordance with external test set II of 446 marketed drugs was 90.9±0.3%. In addition, four simple molecular descriptors (e.g., Apol, No. of H donors, Num-Rings and Wiener) related to mutagenicity and five representative substructures of mutagens (e.g., aromatic nitro, hydroxyl amine, nitroso, aromatic amine and N-methyl-N-methylenemethanaminium) produced by ECFP_14 fingerprints were identified. We hope the established naïve Bayes prediction model can be applied to risk assessment processes, and that the important information obtained on mutagenic chemicals can guide the design of chemical libraries for hit and lead optimization. Copyright © 2017 Elsevier B.V. All rights reserved.
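A Bernoulli naïve Bayes over binary substructure fingerprints, the classifier family used above, can be sketched from scratch with Laplace smoothing. The fingerprint bits and labels below are invented toy data, not ECFP fingerprints of real mutagens:

```python
# Sketch: Bernoulli naive Bayes over binary substructure-fingerprint
# bits, with add-one (Laplace) smoothing. Toy bits and labels only.
import math

def fit_nb(X, y):
    """Per-class log-prior and smoothed per-bit on-probabilities."""
    model = {}
    for c in set(y):
        rows = [X[i] for i in range(len(y)) if y[i] == c]
        prior = math.log(len(rows) / len(y))
        p_on = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                for j in range(len(X[0]))]
        model[c] = (prior, p_on)
    return model

def predict_nb(model, x):
    """Class with the highest posterior log-score for fingerprint x."""
    def score(c):
        prior, p_on = model[c]
        return prior + sum(math.log(p if bit else 1 - p)
                           for bit, p in zip(x, p_on))
    return max(model, key=score)

# Toy training set: bit 0 marks class 1, bit 2 marks class 0.
X = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1)]
y = [1, 1, 0, 0]
m = fit_nb(X, y)
```

The per-bit likelihood ratios this model learns are also what make naïve Bayes interpretable: strongly class-skewed bits correspond directly to alerting substructures like the aromatic nitro and nitroso groups listed above.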
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, in similarity within a group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates of how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database, has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced-size model datasets, suitable for algorithm comparison. Over 3,000 new classification tasks were added to our recently established protein classification benchmark collection, which currently includes protein sequence (both protein domains and entire proteins), protein structure and reading-frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms (BLAST, Smith-Waterman, Needleman-Wunsch) as well as the 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic, estimates of classifier performance than do random cross-validation schemes.
Pareek, Gyan; Acharya, U Rajendra; Sree, S Vinitha; Swapna, G; Yantri, Ratna; Martis, Roshan Joy; Saba, Luca; Krishnamurthi, Ganapathy; Mallarini, Giorgio; El-Baz, Ayman; Al Ekish, Shadi; Beland, Michael; Suri, Jasjit S
2013-12-01
In this work, we have proposed an on-line computer-aided diagnostic system called "UroImage" that classifies a Transrectal Ultrasound (TRUS) image into cancerous or non-cancerous with the help of non-linear Higher Order Spectra (HOS) features and Discrete Wavelet Transform (DWT) coefficients. The UroImage system consists of an on-line system where five significant features (one DWT-based feature and four HOS-based features) are extracted from the test image. These on-line features are transformed by the classifier parameters obtained using the training dataset to determine the class. We trained and tested six classifiers. The dataset used for evaluation had 144 TRUS images which were split into training and testing sets. Three-fold and ten-fold cross-validation protocols were adopted for training and estimating the accuracy of the classifiers. The ground truth used for training was obtained using the biopsy results. Among the six classifiers, using 10-fold cross-validation technique, Support Vector Machine and Fuzzy Sugeno classifiers presented the best classification accuracy of 97.9% with equally high values for sensitivity, specificity and positive predictive value. Our proposed automated system, which achieved more than 95% values for all the performance measures, can be an adjunct tool to provide an initial diagnosis for the identification of patients with prostate cancer. The technique, however, is limited by the limitations of 2D ultrasound guided biopsy, and we intend to improve our technique by using 3D TRUS images in the future.
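The accuracy, sensitivity, specificity and positive predictive value quoted above all derive from the four confusion-matrix counts. A minimal sketch of how such metrics are computed from binary labels (purely illustrative, not the UroImage code; label 1 is taken to mean "cancerous"):

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and accuracy from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "accuracy": (tp + tn) / len(y_true),
    }
```

The same counts would be accumulated over the held-out fold of each cross-validation round and then averaged.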
Breast cancer detection via Hu moment invariant and feedforward neural network
NASA Astrophysics Data System (ADS)
Zhang, Xiaowei; Yang, Jiquan; Nguyen, Elijah
2018-04-01
One in eight women will develop breast cancer during her lifetime. This study used Hu moment invariants and a feedforward neural network to diagnose breast cancer. With the help of K-fold cross-validation, we tested the out-of-sample accuracy of our method. We found that our method can improve the accuracy of detecting breast cancer and reduce the difficulty of judging.
Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.
Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi
2017-09-22
DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely K-Skip-N-Grams, information theory, and sequential and structural features (SSF), is used to represent the protein sequences and improve the feature representation ability. A support vector machine is used as the classifier. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors obtained from the combination of the three feature extractions show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed-feature method performs better than the non-reduced mixed-feature technique. The feature vectors combining SSF and K-Skip-N-Grams show the best performance on the test set. Among these methods, mixed features exhibit superiority over single features.
Porrata, Luis F; Inwards, David J; Ansell, Stephen M; Micallef, Ivana N; Johnston, Patrick B; Hogan, William J; Markovic, Svetomir N
2015-07-03
The infused autograft lymphocyte-to-monocyte ratio (A-LMR) is a prognostic factor for survival in B-cell lymphomas post-autologous peripheral hematopoietic stem cell transplantation (APHSCT). Thus, we set out to investigate whether the A-LMR is also a prognostic factor for survival post-APHSCT in T-cell lymphomas. From 1998 to 2014, 109 T-cell lymphoma patients who underwent APHSCT were studied. Receiver operating characteristic (ROC) curve analysis and the area under the curve (AUC) were used to identify the optimal cut-off value of the A-LMR for survival analysis, and a k-fold cross-validation model was used to validate the A-LMR cut-off value. Univariate and multivariate Cox proportional hazards models were used to assess the prognostic discriminatory power of the A-LMR. ROC and AUC analysis identified an A-LMR ≥ 1 as the best cut-off value, which was validated by k-fold cross-validation. Multivariate analysis showed the A-LMR to be an independent prognostic factor for overall survival (OS) and progression-free survival (PFS). Patients with an A-LMR ≥ 1.0 experienced superior OS and PFS versus patients with an A-LMR < 1.0 [median OS was not reached vs 17.9 months, 5-year OS rates of 87% (95% confidence interval (CI), 75-94%) vs 26% (95% CI, 13-42%), p < 0.0001; median PFS was not reached vs 11.9 months, 5-year PFS rates of 72% (95% CI, 58-83%) vs 16% (95% CI, 6-32%), p < 0.0001]. The A-LMR is thus also a prognostic factor for clinical outcomes in patients with T-cell lymphomas undergoing APHSCT.
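The abstract above uses ROC/AUC analysis to pick an optimal cut-off for a continuous marker. A minimal sketch of the two ingredients, assuming (since the abstract does not state the criterion) that the cut-off maximizes Youden's J = sensitivity + specificity − 1:

```python
def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5   # ties count half
    return wins / (len(scores_pos) * len(scores_neg))

def best_cutoff(scores, labels):
    """Cut-off (predict positive when score >= c) maximizing Youden's J."""
    best_c, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= c)
        fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < c)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < c)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= c)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

Here `scores` would be the per-patient A-LMR values and `labels` the survival outcome being discriminated; both names are illustrative.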
Gu, Xiang; Liu, Cong-Jian; Wei, Jian-Jie
2017-11-13
Given that the pathogenesis of ankylosing spondylitis (AS) remains unclear, the aim of this study was to detect potentially functional pathway cross-talk in AS to further reveal the pathogenesis of this disease. Using a microarray profile of AS and biological pathways as study objects, a Monte Carlo cross-validation method was used to identify significant pathway cross-talks. In the Monte Carlo cross-validation process, all steps were iterated 50 times. For each run, differentially expressed genes (DEGs) between the two groups were detected, and the potentially disrupted pathways enriched by the DEGs were then extracted. Subsequently, we established a discriminating score (DS) for each pathway pair according to the distribution of gene expression levels. After that, we utilized a random forest (RF) classification model to screen out the top 10 paired pathways with the highest area under the curve (AUC) values, computed using a 10-fold cross-validation approach. After 50 bootstraps, the best pairs of pathways were identified. According to their AUC values, the pair of pathways comprising the antigen presentation pathway and fMLP signaling in neutrophils achieved the best AUC value of 1.000, which indicated that this pathway cross-talk could distinguish AS patients from normal subjects. Moreover, the paired pathways of SAPK/JNK signaling and mitochondrial dysfunction were involved in 5 bootstraps. Two paired pathways (the antigen presentation pathway with fMLP signaling in neutrophils, and SAPK/JNK signaling with mitochondrial dysfunction) can accurately distinguish AS and control samples. These paired pathways may be helpful for identifying patients with AS for early intervention.
A new fold-cross metal mesh filter for suppressing side lobe leakage in terahertz region
NASA Astrophysics Data System (ADS)
Lu, Changgui; Qi, Zhengqing; Guo, Wengao; Cui, Yiping
2018-04-01
In this paper we propose a new type of fold-cross metal mesh band-pass filter, which keeps the diffraction side lobe far away from the main transmission peak and shows much better side lobe suppression. Both experimental and theoretical studies are carried out to analyze the mechanism of the side lobe. Compared to the traditional cross filter, the fold-cross filter has a much lower side lobe with almost the same central frequency, bandwidth and a highest transmission of about 98%. Using photolithography and electroplating techniques, we experimentally extend the distance between the main peak and the diffraction side lobe to more than 1 THz for the fold-cross filter, two times larger than for the cross filter, while maintaining a main peak transmission of 89% at 1.25 THz for both structures. This type of single-layer, substrate-free fold-cross metal structure shows better design flexibility and structural reliability, owing to the introduction of fold arms, for metal mesh band-pass filters.
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical properties of proteins, and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five-fold cross-validation. The accuracy rose to 77% when we used a 20-dimensional feature vector. On recent data, these methods achieved 54.2% accuracy. Source codes and the dataset, together with a web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
NASA Astrophysics Data System (ADS)
Pizzati, Mattia; Cavozzi, Cristian; Magistroni, Corrado; Storti, Fabrizio
2016-04-01
Predicting fracture density patterns with low uncertainty is a fundamental issue for constraining fluid flow pathways in thrust-related anticlines in the frontal parts of thrust-and-fold belts and accretionary prisms, which can also provide plays for hydrocarbon exploration and development. Among the drivers that concur to determine the distribution of fractures in fold-and-thrust belts, the complex kinematic pathways of folded structures play a key role. In areas with scarce and unreliable underground information, analogue modelling can provide effective support for developing and validating reliable hypotheses on structural architectures and their evolution. In this contribution, we propose a working method that combines analogue and numerical modelling. We deformed a sand-silicone multilayer to produce a non-cylindrical thrust-related anticline at the wedge toe, which was our test geological structure at the reservoir scale. We cut 60 serial cross-sections through the central part of the deformed model to analyze fault and fold geometry using dedicated software (3D Move). The cross-sections were also used to reconstruct the 3D geometry of the reference surfaces composing the mechanical stratigraphy, using the software GoCad. From the 3D model of the experimental anticline, 3D Move was used to calculate the cumulative stress and strain undergone by the deformed reference layers at the end of deformation, and also at incremental steps of fold growth. Based on these model outputs it was also possible to predict the orientation of three main fracture sets (joints and conjugate shear fractures) and their occurrence and density on the model surfaces. The next step was upscaling of the fracture network to the entire digital model volume, to create discrete fracture networks (DFNs).
Ang, Rebecca P; Chong, Wan Har; Huan, Vivien S; Yeo, Lay See
2007-01-01
This article reports the development and initial validation of scores obtained from the Adolescent Concerns Measure (ACM), a scale which assesses concerns of Asian adolescent students. In Study 1, findings from exploratory factor analysis using 619 adolescents suggested a 24-item scale with four correlated factors--Family Concerns (9 items), Peer Concerns (5 items), Personal Concerns (6 items), and School Concerns (4 items). Initial estimates of convergent validity for ACM scores were also reported. The four-factor structure of ACM scores derived from Study 1 was confirmed via confirmatory factor analysis in Study 2 using a two-fold cross-validation procedure with a separate sample of 811 adolescents. Support was found for both the multidimensional and hierarchical models of adolescent concerns using the ACM. Internal consistency and test-retest reliability estimates were adequate for research purposes. ACM scores show promise as a reliable and potentially valid measure of Asian adolescents' concerns.
Athas, Jasmin C; Nguyen, Catherine P; Kummar, Shailaa; Raghavan, Srinivasa R
2018-04-04
The spontaneous folding of flat gel films into tubes is an interesting example of self-assembly. Typically, a rectangular film folds along its short axis when forming a tube; folding along the long axis has been seen only in rare instances when the film is constrained. Here, we report a case where the same free-swelling gel film folds along either its long or short axis depending on the concentration of a solute. Our gels are sandwiches (bilayers) of two layers: a passive layer of cross-linked N,N'-dimethylacrylamide (DMAA) and an active layer of cross-linked DMAA that also contains chains of the biopolymer alginate. Multivalent cations like Ca2+ and Cu2+ induce these bilayer gels to fold into tubes. The folding occurs instantly when a flat film of the gel is introduced into a solution of these cations. The likely cause for folding is that the active layer stiffens and shrinks (because the alginate chains in it get cross-linked by the cations) whereas the passive layer is unaffected. The resulting mismatch in swelling degree between the two layers creates internal stresses that drive folding. Cations that are incapable of cross-linking alginate, such as Na+ and Mg2+, do not induce gel folding. Moreover, the striking aspect is the direction of folding. When the Ca2+ concentration is high (100 mM or higher), the gels fold along their long axis, whereas when the Ca2+ concentration is low (40 to 80 mM), the gels fold along their short axis. We hypothesize that the folding axis is dictated by the inhomogeneous nature of alginate-cation cross-linking, i.e., that the edges get cross-linked before the faces of the gel. At high Ca2+ concentration, the stiffer edges constrain the folding; in turn, the gel folds such that the longer edges are deformed less, which explains the folding along the long axis. At low Ca2+ concentration, the edges and the faces of the gel are more similar in their degree of cross-linking; therefore, the gel folds along its short axis.
An analogy can be made to natural structures (such as leaves and seed pods) where stiff elements provide the directionality for folding.
Mirage: a visible signature evaluation tool
NASA Astrophysics Data System (ADS)
Culpepper, Joanne B.; Meehan, Alaster J.; Shao, Q. T.; Richards, Noel
2017-10-01
This paper presents the Mirage visible signature evaluation tool, designed to provide a visible signature evaluation capability that will appropriately reflect the effect of scene content on the detectability of targets, providing a capability to assess visible signatures in the context of the environment. Mirage is based on a parametric evaluation of input images, assessing the value of a range of image metrics and combining them using the boosted decision tree machine learning method to produce target detectability estimates. It has been developed using experimental data from photosimulation experiments, where human observers search for vehicle targets in a variety of digital images. The images used for tool development are synthetic (computer generated) images, showing vehicles in many different scenes and exhibiting a wide variation in scene content. A preliminary validation has been performed using k-fold cross validation, where 90% of the image data set was used for training and 10% of the image data set was used for testing. The results of the k-fold validation from 200 independent tests show a prediction accuracy between Mirage predictions of detection probability and observed probability of detection of r(262) = 0.63, p < 0.0001 (Pearson correlation) and a MAE = 0.21 (mean absolute error).
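The validation above compares predicted and observed detection probabilities via a Pearson correlation and a mean absolute error. A minimal sketch of both statistics (illustrative only; `predicted`/`observed` are hypothetical names for the two probability series):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mae(predicted, observed):
    """Mean absolute error between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
```

In the paper's setting each element would be one image's detection probability, pooled across the 200 held-out test runs.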
Jovov, Biljana; Araujo-Perez, Felix; Sigel, Carlie S; Stratford, Jeran K; McCoy, Amber N; Yeh, Jen Jen; Keku, Temitope
2012-01-01
The incidence and mortality of colorectal cancer (CRC) is higher in African Americans (AAs) than other ethnic groups in the U. S., but reasons for the disparities are unknown. We performed gene expression profiling of sporadic CRCs from AAs vs. European Americans (EAs) to assess the contribution to CRC disparities. We evaluated the gene expression of 43 AA and 43 EA CRC tumors matched by stage and 40 matching normal colorectal tissues using the Agilent human whole genome 4x44K cDNA arrays. Gene and pathway analyses were performed using Significance Analysis of Microarrays (SAM), Ten-fold cross validation, and Ingenuity Pathway Analysis (IPA). SAM revealed that 95 genes were differentially expressed between AA and EA patients at a false discovery rate of ≤5%. Using IPA we determined that most prominent disease and pathway associations of differentially expressed genes were related to inflammation and immune response. Ten-fold cross validation demonstrated that following 10 genes can predict ethnicity with an accuracy of 94%: CRYBB2, PSPH, ADAL, VSIG10L, C17orf81, ANKRD36B, ZNF835, ARHGAP6, TRNT1 and WDR8. Expression of these 10 genes was validated by qRT-PCR in an independent test set of 28 patients (10 AA, 18 EA). Our results are the first to implicate differential gene expression in CRC racial disparities and indicate prominent difference in CRC inflammation between AA and EA patients. Differences in susceptibility to inflammation support the existence of distinct tumor microenvironments in these two patient populations.
PMID:22276153
GWAS-based machine learning approach to predict duloxetine response in major depressive disorder.
Maciukiewicz, Malgorzata; Marshe, Victoria S; Hauschild, Anne-Christin; Foster, Jane A; Rotzinger, Susan; Kennedy, James L; Kennedy, Sidney H; Müller, Daniel J; Geraci, Joseph
2018-04-01
Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant drugs. However, large variability is observed in terms of response to antidepressants. Machine learning (ML) models may be useful to predict treatment outcomes. A sample of 186 MDD patients who received treatment with duloxetine for up to 8 weeks was categorized as "responders" based on a MADRS change >50% from baseline, or "remitters" based on a MADRS score ≤10 at endpoint. The initial dataset (N = 186) was randomly divided into training and test sets in a nested 5-fold cross-validation, where 80% was used as a training set and 20% made up five independent test sets. We performed genome-wide logistic regression to identify potentially significant variants related to duloxetine response/remission and extracted the most promising predictors using LASSO regression. Subsequently, classification-regression trees (CRT) and support vector machines (SVM) were applied to construct models, using ten-fold cross-validation. With regard to response, none of the models performed significantly better than chance (accuracy p > .1). For remission, SVM achieved moderate performance with an accuracy = 0.52, a sensitivity = 0.58, and a specificity = 0.46, and 0.51 for all coefficients for CRT. The best-performing SVM fold was characterized by an accuracy = 0.66 (p = .071), a sensitivity = 0.70 and a specificity = 0.61. In this study, the potential of using GWAS data to predict duloxetine outcomes was examined using ML models. The models were characterized by a promising sensitivity, but specificity remained moderate at best. The inclusion of additional non-genetic variables to create integrated models may improve prediction. Copyright © 2017. Published by Elsevier Ltd.
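The outer loop described above (five random folds, each serving once as a 20% hold-out while the remaining 80% trains the model) can be sketched as a plain index generator; this illustrates only the splitting scheme, not the GWAS or LASSO steps:

```python
import random

def five_fold_splits(n, seed=0):
    """Yield (train_idx, test_idx) pairs: each fold is once the 20% hold-out."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]  # 5 disjoint folds covering all samples
    for i, test in enumerate(folds):
        train = [j for k, f in enumerate(folds) if k != i for j in f]
        yield train, test
```

For a nested scheme, any hyperparameter tuning (here, the inner ten-fold cross-validation) would be run on `train` only, keeping `test` untouched until final evaluation.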
Hierarchical Recognition Scheme for Human Facial Expression Recognition Systems
Siddiqi, Muhammad Hameed; Lee, Sungyoung; Lee, Young-Koo; Khan, Adil Mehmood; Truc, Phan Tran Ho
2013-01-01
Over the last decade, human facial expressions recognition (FER) has emerged as an important research area. Several factors make FER a challenging research problem. These include varying light conditions in training and test images; need for automatic and accurate face detection before feature extraction; and high similarity among different expressions that makes it difficult to distinguish these expressions with a high accuracy. This work implements a hierarchical linear discriminant analysis-based facial expressions recognition (HL-FER) system to tackle these problems. Unlike the previous systems, the HL-FER uses a pre-processing step to eliminate light effects, incorporates a new automatic face detection scheme, employs methods to extract both global and local features, and utilizes a HL-FER to overcome the problem of high similarity among different expressions. Unlike most of the previous works that were evaluated using a single dataset, the performance of the HL-FER is assessed using three publicly available datasets under three different experimental settings: n-fold cross validation based on subjects for each dataset separately; n-fold cross validation rule based on datasets; and, finally, a last set of experiments to assess the effectiveness of each module of the HL-FER separately. Weighted average recognition accuracy of 98.7% across three different datasets, using three classifiers, indicates the success of employing the HL-FER for human FER. PMID:24316568
Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming
2014-12-01
Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods, including genetic algorithm (GA), simulated annealing (SA), random forests (RF) and hybrid methods (GA+RF and SA+RF), were utilized to select an important subset from a total of 16 clinical features. These feature selection methods were combined with a support vector machine (SVM) to develop predictive models with better performance. Five-fold cross-validation was used to train and test the SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation achieved average sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve of 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM-derived predictive model can suggest high-risk recurrence patients, who should be closely followed up after complete RFA treatment. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Using support vector machine to predict beta- and gamma-turns in proteins.
Hu, Xiuzhen; Li, Qianzhong
2008-09-01
By using a composite vector with increment of diversity, a position conservation scoring function, and predicted secondary structures to express the sequence information, a support vector machine (SVM) algorithm for predicting beta- and gamma-turns in proteins is proposed. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar, J Biosci 2000, 25, 143) are used for training and testing the predictive models of the beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for the beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for the gamma-turns. These results are significantly higher than those of other algorithms for the prediction of beta- and gamma-turns on the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix, Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive models of the beta- and gamma-turns, and better results are obtained. This algorithm may help to improve the performance of protein turn prediction. To demonstrate the ability of the SVM method to correctly classify beta-turn and non-beta-turn (gamma-turn and non-gamma-turn) residues, threshold-independent receiver operating characteristic curves are provided. (c) 2008 Wiley Periodicals, Inc.
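The Matthews correlation coefficient reported above is the standard summary for imbalanced two-class problems such as turn vs non-turn prediction. A minimal sketch of its computation from binary labels (illustrative only; 1 denotes "turn"):

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC from binary labels: +1 perfect, 0 random, -1 inverse prediction."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike raw accuracy, the MCC stays near zero when a classifier simply predicts the majority (non-turn) class, which is why it accompanies the accuracy figures above.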
Stocco, G; Cipolat-Gotet, C; Bonfatti, V; Schiavon, S; Bittante, G; Cecchinato, A
2016-11-01
The aims of this study were (1) to assess variability in the major mineral components of buffalo milk, (2) to estimate the effect of certain environmental sources of variation on the major minerals during lactation, and (3) to investigate the possibility of using Fourier-transform infrared (FTIR) spectroscopy as an indirect, noninvasive tool for routine prediction of the mineral content of buffalo milk. A total of 173 buffaloes reared in 5 herds were sampled once during the morning milking. Milk samples were analyzed for Ca, P, K, and Mg contents within 3 h of sample collection using inductively coupled plasma optical emission spectrometry. A Milkoscan FT2 (Foss, Hillerød, Denmark) was used to acquire milk spectra over the spectral range from 5,000 to 900 wavenumber/cm. Prediction models were built using a partial least squares approach, and cross-validation was used to assess the prediction accuracy of FTIR. Prediction models were validated using a 4-fold random cross-validation, dividing the calibration-test set into 4 folds, using one of them to check the results (prediction models) and the remaining 3 to develop the calibration models. Buffalo milk minerals averaged 162, 117, 86, and 14.4 mg/dL of milk for Ca, P, K, and Mg, respectively. Herd and days in milk were the most important sources of variation in the traits investigated. Parity slightly affected only Ca content. Coefficients of determination of cross-validation between the FTIR-predicted and measured values were 0.71, 0.70, and 0.72 for Ca, Mg, and P, respectively, whereas prediction accuracy was lower for K (0.55). Our findings reveal FTIR to be an unsuitable tool when milk mineral content needs to be predicted with high accuracy. Predictions may nevertheless play a role as indicator traits in selective breeding (if the additive genetic correlation between FTIR predictions and measured milk minerals is high enough) or in monitoring the milk of buffalo populations for dairy industry purposes.
Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
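The coefficients of determination quoted above (0.71 for Ca, etc.) measure how much of the variance in the measured mineral content the cross-validated FTIR predictions explain. A minimal sketch of the statistic (illustrative only; the paper's partial least squares modelling itself is not reproduced here):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

In a 4-fold cross-validation, `predicted` would collect each sample's prediction from the fold in which it was held out, so the statistic reflects out-of-sample accuracy.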
Exploring Mouse Protein Function via Multiple Approaches.
Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong
2016-01-01
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although this accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined against known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification.
Therefore, the accuracy of the presented method may be much higher in reality.
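The sequential combination described above (similarity first, composition-based fallback for proteins without homologues) can be sketched as follows. This is a minimal illustration, not the authors' implementation: both predictors are hypothetical stand-ins.

```python
# Hypothetical sketch of the sequential fallback strategy: use the
# similarity-based prediction when a homologue exists, otherwise fall
# back to the composition-based model. Both predictors are toy stubs.

def similarity_predict(protein, annotated):
    """Return the function of a 'homologue', or None if none exists.
    Stand-in: exact sequence match plays the role of homology here."""
    return annotated.get(protein)

def composition_predict(protein):
    """Stand-in for the pseudo amino acid composition (PseAAC) model:
    a trivial rule keyed on sequence length, for illustration only."""
    return "binding" if len(protein) % 2 == 0 else "catalysis"

def combined_predict(protein, annotated):
    hit = similarity_predict(protein, annotated)
    return hit if hit is not None else composition_predict(protein)

annotated = {"MKV": "transport"}
print(combined_predict("MKV", annotated))   # homologue found
print(combined_predict("MKVA", annotated))  # falls back to composition model
```

The combiner covers all proteins: those with homologues get the more accurate similarity-based call, and the rest still receive a prediction.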
Exploring Mouse Protein Function via Multiple Approaches
Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning
2016-01-01
PMID:27846315
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niedzielski, Joshua S., E-mail: jsniedzielski@mdanderson.org; University of Texas Houston Graduate School of Biomedical Science, Houston, Texas; Yang, Jinzhong
Purpose: We sought to investigate the ability of mid-treatment 18F-fluorodeoxyglucose positron emission tomography (PET) studies to objectively and spatially quantify esophageal injury in vivo from radiation therapy for non-small cell lung cancer. Methods and Materials: This retrospective study was approved by the local institutional review board, with written informed consent obtained before enrollment. We normalized 18F-fluorodeoxyglucose PET uptake to each patient's low-irradiated region (<5 Gy) of the esophagus, as a radiation response measure. Spatially localized metrics of normalized uptake (normalized standard uptake value [nSUV]) were derived for 79 patients undergoing concurrent chemoradiation therapy for non-small cell lung cancer. We used nSUV metrics to classify esophagitis grade at the time of the PET study, as well as maximum severity by treatment completion, according to National Cancer Institute Common Terminology Criteria for Adverse Events, using multivariate least absolute shrinkage and selection operator (LASSO) logistic regression and repeated 3-fold cross validation (training, validation, and test folds). This 3-fold cross-validation LASSO model procedure was used to predict toxicity progression from 43 asymptomatic patients during the PET study. Dose-volume metrics were also tested in both the multivariate classification and the symptom progression prediction analyses. Classification performance was quantified with the area under the curve (AUC) from receiver operating characteristic analysis on the test set from the 3-fold analyses. Results: Statistical analysis showed increasing nSUV is related to esophagitis severity. Axial-averaged maximum nSUV for 1 esophageal slice and esophageal length with at least 40% of axial-averaged nSUV both had AUCs of 0.85 for classifying grade 2 or higher esophagitis at the time of the PET study and AUCs of 0.91 and 0.92, respectively, for maximum grade 2 or higher by treatment completion.
Symptom progression was predicted with an AUC of 0.75. Dose metrics performed poorly at classifying esophagitis (AUC of 0.52, grade 2 or higher mid treatment) or predicting symptom progression (AUC of 0.67). Conclusions: Normalized uptake can objectively, locally, and noninvasively quantify esophagitis during radiation therapy and predict eventual symptoms from asymptomatic patients. Normalized uptake may provide patient-specific dose-response information not discernible from dose.
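The headline numbers in this record are ROC AUCs on held-out folds. As an illustrative aside (not the study's code), AUC can be computed dependency-free from held-out scores via the rank-sum identity; the labels and scores below are invented.

```python
# Minimal sketch of the evaluation metric used above: ROC AUC computed
# from classifier scores via the Mann-Whitney identity, i.e. the
# probability that a positive outscores a negative (ties count 1/2).

def roc_auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]                 # e.g. grade >= 2 esophagitis
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]    # e.g. nSUV-based model outputs
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance ranking, which is why the dose metrics' AUC of 0.52 reported above indicates essentially no discriminative value.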
Niedzielski, Joshua S; Yang, Jinzhong; Liao, Zhongxing; Gomez, Daniel R; Stingo, Francesco; Mohan, Radhe; Martel, Mary K; Briere, Tina M; Court, Laurence E
2016-11-01
Sharma, Ram C; Hara, Keitarou; Hirayama, Hidetake
2017-01-01
This paper presents the performance and evaluation of a number of machine learning classifiers for discriminating between vegetation physiognomic classes using satellite-based time-series of surface reflectance data. Six vegetation physiognomic classes were considered: Evergreen Coniferous Forest, Evergreen Broadleaf Forest, Deciduous Coniferous Forest, Deciduous Broadleaf Forest, Shrubs, and Herbs. Rich-feature data were prepared from time-series of the satellite data for the discrimination and cross-validation of the vegetation physiognomic types using a machine learning approach. A set of machine learning experiments comprising a number of supervised classifiers with different model parameters was conducted to assess how the discrimination of vegetation physiognomic classes varies with classifiers, input features, and ground truth data size. The performance of each experiment was evaluated using the 10-fold cross-validation method. The experiment using the Random Forests classifier provided the highest overall accuracy (0.81) and kappa coefficient (0.78). However, the accuracy metrics did not vary much across experiments. The accuracy metrics were found to be very sensitive to the input features and the size of the ground truth data. The results obtained in the research are expected to be useful for improving vegetation physiognomic mapping in Japan.
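The two metrics quoted above, overall accuracy and the kappa coefficient, can be sketched without any libraries. This is an illustration of the formulas only; the label vectors below are toy data, not the study's.

```python
# Dependency-free sketch of overall accuracy and Cohen's kappa, the two
# headline metrics above, computed from paired true/predicted labels.
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohens_kappa(y_true, y_pred):
    n = len(y_true)
    po = accuracy(y_true, y_pred)                # observed agreement
    t, p = Counter(y_true), Counter(y_pred)
    pe = sum(t[c] * p[c] for c in t) / (n * n)   # chance agreement
    return (po - pe) / (1 - pe)

# toy labels over the six physiognomic classes (abbreviated)
y_true = ["EC", "EB", "DC", "DB", "Shrub", "Herb", "EC", "EB"]
y_pred = ["EC", "EB", "DC", "DB", "Shrub", "Herb", "EB", "EB"]
print(accuracy(y_true, y_pred), cohens_kappa(y_true, y_pred))
```

Kappa discounts agreement expected by chance, which is why it is reported alongside raw accuracy for multi-class land-cover maps.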
KINKFOLD—an AutoLISP program for construction of geological cross-sections using borehole image data
NASA Astrophysics Data System (ADS)
Özkaya, Sait Ismail
2002-04-01
KINKFOLD is an AutoLISP program designed to construct geological cross-sections from borehole image or dip meter logs. The program uses the kink-fold method for cross-section construction. Beds are folded around hinge lines as angle bisectors so that bedding thickness remains unchanged. KINKFOLD may be used to model a wide variety of parallel fold structures, including overturned and faulted folds, and folds truncated by unconformities. The program accepts data from vertical or inclined boreholes. The KINKFOLD program cannot be used to model fault drag, growth folds, inversion structures or disharmonic folds where the bed thickness changes either because of deformation or deposition. Faulted structures and similar folds can be modelled by KINKFOLD by omitting dip measurements within fault drag zones and near axial planes of similar folds.
Wang, Sheng-Yin; Zhou, Xian-Hong; Zhang, An-Sheng; Li, Li-Li; Men, Xing-Yuan; Zhang, Si-Cong; Liu, Yong-Jie; Yu, Yi
2012-07-01
To understand the resistance risks of Frankliniella occidentalis Pergande against phoxim, this paper studied the resistance mechanisms of a phoxim-resistant F. occidentalis population and the cross-resistance of the population against other insecticides. The phoxim-resistant population had medium-level cross-resistance to chlorpyrifos, lambda-cyhalothrin, and methomyl, low-level cross-resistance to chlorfenapyr, imidacloprid, emamectin-benzoate, and spinosad, but no cross-resistance to acetamiprid and abamectin. The synergists piperonyl butoxide (PBO), S,S,S-tributyl phosphorotrithioate (DEF), and triphenyl phosphate (TPP) had significant synergism (P < 0.05) on the toxicity of phoxim to the resistant (XK), field (BJ), and susceptible (S) populations, while diethyl maleate (DEM) had no significant synergism in the XK and S populations but significant synergism in the BJ population. As compared with the S population, the XK and BJ populations had significantly increased activities of mixed-function oxidase P450 (2.79-fold and 1.48-fold), cytochrome b5 (2.88-fold and 1.88-fold), O-demethylase (2.60-fold and 1.68-fold), and carboxylesterase (2.02-fold and 1.61-fold, respectively), and the XK population had a significantly increased acetylcholinesterase activity (3.10-fold). Both the XK and BJ populations had an increased activity of glutathione S-transferases (1.11-fold and 1.20-fold, respectively), but the increment was not significant. The increased detoxification enzyme activities in F. occidentalis could play an important role in the resistance of this insect against phoxim.
NASA Astrophysics Data System (ADS)
Pei, Yangwen; Paton, Douglas A.; Wu, Kongyou; Xie, Liujuan
2017-08-01
The trishear algorithm, in which deformation occurs in a triangular zone in front of a propagating fault tip, is often used to understand fault-related folding. In comparison to kink-band methods, a key characteristic of the trishear algorithm is that non-uniform deformation within the triangular zone allows layer thickness and horizon length to change during deformation, which is commonly observed in natural structures. An example from the Lenghu5 fold-and-thrust belt (Qaidam Basin, Northern Tibetan Plateau) is interpreted to help understand how to employ trishear forward modelling to improve the accuracy of seismic interpretation. High-resolution fieldwork data, including high-angle dips, 'dragging structures', and a thinning hanging wall and thickening footwall, are used to determine the best-fit trishear model for the deformation of the Lenghu5 fold-and-thrust belt. We also consider the factors that increase the complexity of trishear models, including (a) fault-dip changes and (b) pre-existing faults. We integrate fault-dip changes and pre-existing faults to predict subsurface structures that are below seismic resolution. The analogue analysis by trishear models indicates that the Lenghu5 fold-and-thrust belt is controlled by an upward-steepening reverse fault above a pre-existing opposite-thrusting fault in the deeper subsurface. The validity of the trishear model is confirmed by the close agreement between the model and the high-resolution fieldwork. The validated trishear forward model provides geometric constraints on the faults and horizons in the seismic section, e.g., fault cutoffs and fault tip positions, the faults' intersecting relationships, and horizon/fault cross-cutting relationships. Subsurface prediction using the trishear algorithm can significantly increase the accuracy of seismic interpretation, particularly in seismic sections with a low signal/noise ratio.
Cross-modal face recognition using multi-matcher face scores
NASA Astrophysics Data System (ADS)
Zheng, Yufeng; Blasch, Erik
2015-05-01
The performance of face recognition can be improved using information fusion of multimodal images and/or multiple algorithms. When multimodal face images are available, cross-modal recognition is meaningful for security and surveillance applications. For example, a probe face may be a thermal image (especially at nighttime), while only visible face images are available in the gallery database. Matching a thermal probe face onto the visible gallery faces requires cross-modal matching approaches. A few such studies have been implemented in facial feature space with medium recognition performance. In this paper, we propose a cross-modal recognition approach, where multimodal faces are cross-matched in feature space and the recognition performance is enhanced with stereo fusion at the image, feature and/or score level. In the proposed scenario, there are two cameras for stereo imaging, two face imagers (visible and thermal) in each camera, and three recognition algorithms (circular Gaussian filter, face pattern byte, linear discriminant analysis). A score vector is formed from the three cross-matched face scores produced by these algorithms. A classifier (e.g., k-nearest neighbor, support vector machine, binomial logistic regression [BLR]) is trained and then tested with the score vectors using 10-fold cross-validations. The proposed approach was validated with a multispectral stereo face dataset from 105 subjects. Our experiments show very promising results: ACR (accuracy rate) = 97.84%, FAR (false accept rate) = 0.84% when cross-matching the fused thermal faces onto the fused visible faces using three face scores and the BLR classifier.
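The score-level fusion described above, where each probe yields a vector of matcher scores that a classifier then labels, can be sketched with a toy nearest-neighbour rule. This is not the paper's pipeline; the score values and the 1-NN classifier are illustrative stand-ins for the k-NN/SVM/BLR options named in the abstract.

```python
# Illustrative sketch of score-level fusion: each probe produces a
# 3-element score vector (one score per matcher), and a simple
# nearest-neighbour rule classifies it as a match or non-match.
import math

def nearest_neighbour(train, labels, query):
    dists = [math.dist(v, query) for v in train]
    return labels[dists.index(min(dists))]

# toy score vectors: (CGF score, FPB score, LDA score)
train = [(0.9, 0.8, 0.85), (0.2, 0.3, 0.25), (0.85, 0.9, 0.8), (0.1, 0.2, 0.3)]
labels = ["match", "non-match", "match", "non-match"]
print(nearest_neighbour(train, labels, (0.88, 0.82, 0.9)))
```

Fusing several matchers into one vector lets the classifier learn how much to trust each algorithm, rather than committing to a single matcher's threshold.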
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luo, Heng; Ye, Hao; Ng, Hui Wen
Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.
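To make the idea of network-based link prediction concrete, here is a generic, hypothetical sketch; it is not sNebula's actual algorithm. It scores an unobserved HLA-peptide pair by how similar the peptide is (in its neighbourhood of binding HLAs) to peptides already known to bind that HLA.

```python
# Generic sketch of bipartite link prediction on a toy HLA-peptide
# network: score a candidate pair by the Jaccard similarity between the
# query peptide's HLA neighbourhood and those of the HLA's known binders.

known = {  # HLA -> set of peptides observed to bind it (toy data)
    "HLA-A": {"p1", "p2"},
    "HLA-B": {"p2", "p3"},
}

def neighbours(peptide):
    return {h for h, ps in known.items() if peptide in ps}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def link_score(hla, peptide):
    return max((jaccard(neighbours(peptide), neighbours(q))
                for q in known[hla] if q != peptide), default=0.0)

# p3 has no observed edge to HLA-A, but it resembles p2, which binds HLA-A
print(link_score("HLA-A", "p3"))
```

Because the score is computed from network structure rather than from a trained per-HLA model, this style of method can rank candidates even for HLAs or peptides with sparse measurement data, which is the property the abstract emphasizes.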
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides
Luo, Heng; Ye, Hao; Ng, Hui Wen; Sakkiah, Sugunadevi; Mendrick, Donna L.; Hong, Huixiao
2016-01-01
PMID:27558848
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides
Luo, Heng; Ye, Hao; Ng, Hui Wen; ...
2016-08-25
NASA Astrophysics Data System (ADS)
Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin
2012-04-01
Most herbal medicines can be processed to fulfill the different requirements of therapy. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including the linear kernel, polynomial kernel and radial basis function (RBF) kernel, were checked for optimization of the LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was performed for classification, and the successive projections algorithm (SPA) was executed prior to building the LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross validation to evaluate each model's efficiency. The performance of the LS-SVM with the RBF kernel (RBF LS-SVM) was better than that with the other two kernels. The RF, RBF LS-SVM and SPA-LDA models successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using the LS-SVM with the RBF kernel, while RF was fast in training and making predictions.
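The "50 runs of 10-fold cross-validation" protocol used above can be sketched dependency-free: shuffle, split into 10 folds, score each held-out fold, and average the error over all runs. The classifier here is a deliberately uninformative majority-vote stub, not any of the study's models; any classifier slots into its place.

```python
# Dependency-free sketch of repeated k-fold cross-validation: average
# held-out error over `runs` independent shuffles of the data.
import random

def kfold_error(data, labels, classify, k=10, runs=50, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(data)))
    errors = []
    for _ in range(runs):
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]   # k near-equal folds
        for test in folds:
            ts = set(test)
            train = [i for i in idx if i not in ts]
            wrong = sum(classify([data[i] for i in train],
                                 [labels[i] for i in train],
                                 data[j]) != labels[j] for j in test)
            errors.append(wrong / len(test))
    return sum(errors) / len(errors)

def majority(train_x, train_y, query):
    # stub classifier: always predict the majority label of the training split
    return max(set(train_y), key=train_y.count)

data = list(range(80))                     # 80 spectra, as in the study design
labels = ["raw"] * 40 + ["processed"] * 40
print(kfold_error(data, labels, majority)) # the stub cannot beat chance here
```

Averaging over many shuffles reduces the variance that a single fold assignment introduces, which is why the abstract reports mean error rates over 50 runs.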
Generative Topographic Mapping of Conformational Space.
Horvath, Dragos; Baskin, Igor; Marcou, Gilles; Varnek, Alexandre
2017-10-01
Herein, Generative Topographic Mapping (GTM) was challenged to produce planar projections of the high-dimensional conformational space of complex molecules (the 1LE1 peptide). GTM is a probability-based mapping strategy, and its capacity to support property prediction models serves to objectively assess map quality (in terms of regression statistics). The properties to predict were total, non-bonded and contact energies, surface area and fingerprint darkness. Map building and selection were controlled by a previously introduced evolutionary strategy allowed to choose the best-suited conformational descriptors, the options including classical terms and novel atom-centric autocorrelograms. The latter condense interatomic distance patterns into descriptors of rather low dimensionality, yet precise enough to differentiate between close favorable contacts and atom clashes. A subset of 20 K conformers of the 1LE1 peptide, randomly selected from a pool of 2 M geometries (generated by the S4MPLE tool), was employed for map building and cross-validation of property regression models. The GTM build-up challenge reached robust three-fold cross-validated determination coefficients of Q² = 0.7-0.8 for all modeled properties. Mapping of the full 2 M conformer set produced intuitive and information-rich property landscapes. Functional and folding subspaces appear as well-separated zones, even though the RMSD with respect to the PDB structure was never used as a selection criterion for the maps. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Predicting drug-induced liver injury in human with Naïve Bayes classifier approach.
Zhang, Hui; Ding, Lan; Zou, Yi; Hu, Shui-Qing; Huang, Hai-Guo; Kong, Wei-Bao; Zhang, Ji
2016-10-01
Drug-induced liver injury (DILI) is one of the major safety concerns in drug development. Although various toxicological studies assessing DILI risk have been developed, these methods were not sufficient in predicting DILI in humans. Thus, developing new tools and approaches to better predict DILI risk in humans has become an important and urgent task. In this study, we aimed to develop a computational model for assessment of DILI risk using a large-scale human dataset and a Naïve Bayes classifier. The established Naïve Bayes prediction model was evaluated by 5-fold cross validation and an external test set. For the training set, the overall prediction accuracy of the 5-fold cross validation was 94.0 %. The sensitivity, specificity, positive predictive value and negative predictive value were 97.1, 89.2, 93.5 and 95.1 %, respectively. For the test set, the concordance was 72.6 %, with a sensitivity of 72.5 %, a specificity of 72.7 %, a positive predictive value of 80.4 % and a negative predictive value of 63.2 %. Furthermore, some important molecular descriptors related to DILI risk and some toxic/non-toxic fragments were identified. Thus, we hope the prediction model established here will be employed for the assessment of human DILI risk, and that the obtained molecular descriptors and substructures will be taken into consideration in the design of new candidate compounds to help medicinal chemists rationally select the chemicals with the best prospects to be effective and safe.
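The four performance measures quoted above all derive from a binary confusion matrix. The following sketch shows the formulas; the counts are illustrative, not the study's data.

```python
# Sketch of the binary-classification metrics reported above, computed
# from confusion-matrix counts (TP, FN, TN, FP). Toy counts only.

def dili_metrics(tp, fn, tn, fp):
    return {
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
        "ppv": tp / (tp + fp),                # positive predictive value
        "npv": tn / (tn + fn),                # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

m = dili_metrics(tp=97, fn=3, tn=89, fp=11)
print(m["sensitivity"], m["specificity"], m["accuracy"])
```

Reporting PPV and NPV alongside sensitivity and specificity matters because the predictive values shift with class prevalence, which is one reason the external test-set numbers above differ from the cross-validation numbers.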
The Vocal Cord Dysfunction Questionnaire: Validity and Reliability of the Persian Version.
Ghaemi, Hamide; Khoddami, Seyyedeh Maryam; Soleymani, Zahra; Zandieh, Fariborz; Jalaie, Shohreh; Ahanchian, Hamid; Khadivi, Ehsan
2017-12-25
The aim of this study was to develop, validate, and assess the reliability of the Persian version of the Vocal Cord Dysfunction Questionnaire (VCDQ-P). The study design was a cross-sectional or cultural survey. Forty-four patients with vocal fold dysfunction (VFD) and 40 healthy volunteers were recruited for the study. To assess content validity, the prefinal questions were given to 15 experts, who commented on how essential each item was. To detect face validity, ten patients with VFD rated the importance of the VCDQ-P items. Eighteen of the patients with VFD completed the VCDQ 1 week later for test-retest reliability. To detect absolute reliability, the standard error of measurement and smallest detectable change were calculated. Concurrent validity was assessed by having 34 patients with VFD complete the Persian Chronic Obstructive Pulmonary Disease (COPD) Assessment Test (CAT). Discriminant validity was measured from 34 participants. The VCDQ was further validated by administering the questionnaire to 40 healthy volunteers. Validation of the VCDQ as a treatment outcome tool was conducted in 18 patients with VFD using pre- and posttreatment scores. The internal consistency was confirmed (Cronbach α = 0.78). The test-retest reliability was excellent (intraclass correlation coefficient = 0.97). The standard error of measurement and smallest detectable change values were acceptable (0.39 and 1.08, respectively). There was a significant correlation between the VCDQ-P and the CAT total scores (P < 0.05). Discriminant validity showed a significant difference between groups. The VCDQ scores in patients with VFD before and after treatment were significantly different (P < 0.001). The VCDQ was cross-culturally adapted to Persian and demonstrated to be a valid and reliable self-administered questionnaire in the Persian-speaking population. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Xi, Jinghui; Pan, Yiou; Bi, Rui; Gao, Xiwu; Chen, Xuewei; Peng, Tianfei; Zhang, Min; Zhang, Hua; Hu, Xiaoyue; Shang, Qingli
2015-02-01
A resistant strain of Aphis glycines Matsumura (CRR) has developed 76.67-fold resistance to lambda-cyhalothrin compared with the susceptible (CSS) strain. The synergists piperonyl butoxide (PBO), S,S,S-tributyl phosphorotrithioate (DEF) and triphenyl phosphate (TPP) dramatically increased the toxicity of lambda-cyhalothrin to the resistant strain. Bioassay results indicated that the CRR strain had developed high levels of cross-resistance to chlorpyrifos (11.66-fold), acephate (8.20-fold), cypermethrin (53.24-fold), esfenvalerate (13.83-fold), cyfluthrin (9.64-fold), carbofuran (14.60-fold), methomyl (9.32-fold) and bifenthrin (4.81-fold), but did not have cross-resistance to chlorfenapyr, imidacloprid, diafenthiuron, or abamectin. The transcriptional levels of CYP6A2-like, CYP6A14-like and cytochrome b-c1 complex subunit 9-like increased significantly in the resistant strain compared with the susceptible strain. Similar trends were observed in the transcripts and DNA copy number of CarE and E4 esterase. Overall, these results demonstrate that increased esterase hydrolysis activity, combined with elevated cytochrome P450 monooxygenase detoxification, plays an important role in the high levels of lambda-cyhalothrin resistance and can cause cross-resistance to other insecticides in the CRR strain. Copyright © 2014 Elsevier Inc. All rights reserved.
Riaz, Qaiser; Vögele, Anna; Krüger, Björn; Weber, Andreas
2015-01-01
A number of previous works have shown that information about a subject is encoded in sparse kinematic information, such as that revealed by so-called point-light walkers. With the work at hand, we extend these results to classifications of soft biometrics from inertial sensor recordings at a single body location from a single step. We recorded accelerations and angular velocities of 26 subjects using inertial measurement units (IMUs) attached at four locations (chest, lower back, right wrist and left ankle) while they performed standardized gait tasks. The collected data were segmented into individual walking steps. We trained random forest classifiers in order to estimate soft biometrics (gender, age and height). We applied two different validation methods to the process, 10-fold cross-validation and subject-wise cross-validation. For all three classification tasks, we achieved high accuracy values for all four sensor locations. From these results, we can conclude that the data of a single walking step (6D: accelerations and angular velocities) allow for a robust estimation of the gender, height and age of a person. PMID:26703601
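The subject-wise cross-validation used above differs from plain 10-fold CV in one key way: every walking step from a given subject must land on the same side of the split, so the model is never tested on a subject it saw during training. A minimal, dependency-free sketch of such a splitter:

```python
# Sketch of subject-wise (grouped) cross-validation folds: samples are
# assigned to folds by subject, never individually, so no subject
# appears in both the training and the test side of any split.

def subject_wise_folds(subject_of_sample, n_folds):
    subjects = sorted(set(subject_of_sample))
    groups = [subjects[i::n_folds] for i in range(n_folds)]  # round-robin
    return [[i for i, s in enumerate(subject_of_sample) if s in set(g)]
            for g in groups]

# 6 samples (walking steps) recorded from 3 subjects
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
print(subject_wise_folds(subjects, n_folds=3))
```

Plain 10-fold CV can scatter one subject's steps across training and test folds, which inflates accuracy for identity-correlated targets like gender, age and height; grouped folds remove that leakage.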
Approximate l-fold cross-validation with Least Squares SVM and Kernel Ridge Regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Richard E; Zhang, Hao; Parker, Lynne Edwards
2013-01-01
Kernel methods have difficulties scaling to large modern data sets. The scalability issues are based on computational and memory requirements for working with a large matrix. These requirements have been addressed over the years by using low-rank kernel approximations or by improving the solvers' scalability. However, Least Squares Support Vector Machines (LS-SVM), a popular SVM variant, and Kernel Ridge Regression still have several scalability issues. In particular, the O(n^3) computational complexity for solving a single model, and the overall computational complexity associated with tuning hyperparameters, are still major problems. We address these problems by introducing an O(n log n) approximate l-fold cross-validation method that uses a multi-level circulant matrix to approximate the kernel. In addition, we prove our algorithm's computational complexity and present empirical runtimes on data sets with approximately 1 million data points. We also validate our approximate method's effectiveness at selecting hyperparameters on real-world and standard benchmark data sets. Lastly, we provide experimental results on using a multi-level circulant kernel approximation to solve LS-SVM problems with hyperparameters selected using our method.
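The speedup rests on a classical fact: a circulant matrix is diagonalized by the discrete Fourier transform, so a circulant linear system solves in O(n log n) with FFTs instead of the O(n^3) dense solve. This sketch shows that core trick on a single-level toy system, not the paper's multi-level construction.

```python
# Core trick behind the O(n log n) method above: a circulant system
# C x = b is a circular convolution, so by the convolution theorem
# x = ifft(fft(b) / fft(c)), where c is the first column of C.
import numpy as np

def solve_circulant(first_col, b):
    return np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(first_col)))

# build a small circulant matrix explicitly to check the FFT solve
c = np.array([4.0, 1.0, 0.0, 1.0])                        # first column of C
C = np.array([[c[(i - j) % 4] for j in range(4)] for i in range(4)])
b = np.array([1.0, 2.0, 3.0, 4.0])
x = solve_circulant(c, b)
print(np.allclose(C @ x, b))  # True
```

A dense solve of a kernel system costs O(n^3); replacing the kernel with a (multi-level) circulant approximation turns every solve inside l-fold cross-validation into FFT work, which is what makes million-point hyperparameter tuning tractable.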
An adaptive deep learning approach for PPG-based identification.
Jindal, V; Birjandtalab, J; Pouyan, M Baran; Nourani, M
2016-08-01
Wearable biosensors have become increasingly popular in healthcare due to their capabilities for low-cost and long-term biosignal monitoring. This paper presents a novel two-stage technique to offer biometric identification using these biosensors through Deep Belief Networks and Restricted Boltzmann Machines. Our identification approach improves robustness in current monitoring procedures within clinical, e-health and fitness environments using Photoplethysmography (PPG) signals through deep learning classification models. The approach was tested on the TROIKA dataset using 10-fold cross validation and achieved an accuracy of 96.1%.
Chen, Xing; Huang, Yu-An; You, Zhu-Hong; Yan, Gui-Ying; Wang, Xue-Song
2017-03-01
Accumulating clinical observations have indicated that microbes living in the human body are closely associated with a wide range of human noninfectious diseases, which provides promising insights into complex disease mechanisms. Predicting microbe-disease associations could not only boost human disease diagnosis and prognosis, but also improve new drug development. However, few efforts have been made to understand and predict human microbe-disease associations on a large scale until now. In this work, we constructed a microbe-human disease association network and further developed a novel computational model of KATZ measure for Human Microbe-Disease Association prediction (KATZHMDA) based on the assumption that functionally similar microbes tend to have similar interaction and non-interaction patterns with noninfectious diseases, and vice versa. To our knowledge, KATZHMDA is the first tool for microbe-disease association prediction. The reliable prediction performance could be attributed to the use of the KATZ measure and the introduction of Gaussian interaction profile kernel similarity for microbes and diseases. LOOCV and k-fold cross validation were implemented to evaluate the effectiveness of this novel computational model based on known microbe-disease associations obtained from the HMDAD database. As a result, KATZHMDA achieved reliable performance with average AUCs of 0.8130 ± 0.0054 and 0.8301 ± 0.0033 in 2-fold and 5-fold cross validation, respectively, and an AUC of 0.8382 in the LOOCV framework. It is anticipated that KATZHMDA could be used to identify more novel microbes associated with important noninfectious human diseases and therefore benefit drug discovery and human medical improvement. Matlab codes and the dataset explored in this work are available at http://dwz.cn/4oX5mS . xingchen@amss.ac.cn or zhuhongyou@gmail.com or wangxuesongcumt@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016.
Published by Oxford University Press.
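The KATZ measure used by KATZHMDA scores candidate associations by counting walks of increasing length in the association graph, damped by a decay factor. Below is a minimal sketch on a toy adjacency matrix; the matrix, decay factor, and walk cutoff are illustrative assumptions, and the full model's Gaussian interaction profile kernel similarity is omitted (the authors' Matlab code is linked above).

```python
import numpy as np

def katz_scores(A, beta=0.01, k=4):
    """Truncated KATZ measure: S = sum_{l=1..k} beta^l * A^l.

    A is the adjacency matrix of the association graph; beta must be
    small enough for the truncated series to be a sensible score.
    """
    S = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])
    for l in range(1, k + 1):
        power = power @ A        # A^l
        S += (beta ** l) * power
    return S

# Toy heterogeneous graph: 2 microbes (rows/cols 0-1) + 2 diseases (2-3).
A = np.array([
    [0, 0, 1, 0],   # microbe 0 is associated with disease 0
    [0, 0, 1, 1],   # microbe 1 is associated with diseases 0 and 1
    [1, 1, 0, 0],
    [0, 1, 0, 0],
], dtype=float)

S = katz_scores(A, beta=0.1, k=4)
# Score for the unobserved pair (microbe 0, disease 1): nonzero because
# a length-3 walk 0 -> disease 0 -> microbe 1 -> disease 1 exists.
print(round(S[0, 3], 6))
```

The nonzero score for an unobserved pair is exactly the mechanism by which walk-based measures propose novel associations.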
Aggio, Raphael B. M.; de Lacy Costello, Ben; White, Paul; Khalid, Tanzeela; Ratcliffe, Norman M.; Persad, Raj; Probert, Chris S. J.
2016-01-01
Prostate cancer is one of the most common cancers. Serum prostate-specific antigen (PSA) is used to aid the selection of men undergoing biopsies, but its use remains controversial. We propose a GC-sensor algorithm system for classifying urine samples from patients with urological symptoms. This pilot study included 155 men presenting to urology clinics: 58 were diagnosed with prostate cancer, 24 with bladder cancer, and 73 with haematuria and/or poor stream without cancer. Principal component analysis (PCA) was applied to assess the discrimination achieved, while linear discriminant analysis (LDA) and support vector machines (SVM) were used as statistical models for sample classification. Leave-one-out cross-validation (LOOCV), repeated 10-fold cross-validation (10FoldCV), repeated double cross-validation (DoubleCV) and Monte Carlo permutations were applied to assess performance. Significant separation was found between prostate cancer and control samples, between bladder cancer and controls, and between bladder and prostate cancer samples. For prostate cancer diagnosis, the GC/SVM system classified samples with 95% sensitivity and 96% specificity after LOOCV. For bladder cancer diagnosis, the SVM reported 96% sensitivity and 100% specificity after LOOCV, while the DoubleCV reported 87% sensitivity and 99% specificity; the SVM showed 78% sensitivity and 98% specificity in discriminating prostate from bladder cancer samples. The Monte Carlo permutation of class labels yielded chance-like accuracy values around 50%, suggesting the observed results for bladder and prostate cancer detection are not due to overfitting. The results of this pilot study indicate that the GC system is able to successfully identify patterns that allow classification of urine samples from patients with urological cancers.
An accurate diagnosis based on urine samples would reduce the number of negative prostate biopsies performed, and the frequency of surveillance cystoscopy for bladder cancer patients. Larger cohort studies are planned to investigate the potential of this system. Future work may lead to non-invasive breath analyses for diagnosing urological conditions. PMID:26865331
GIS-aided Statistical Landslide Susceptibility Modeling And Mapping Of Antipolo Rizal (Philippines)
NASA Astrophysics Data System (ADS)
Dumlao, A. J.; Victor, J. A.
2015-09-01
Slope instability associated with heavy rainfall or earthquakes is a familiar geotechnical problem in the Philippines. The main objective of this study is to perform a detailed landslide susceptibility assessment of Antipolo City. The statistical method used was logistic regression. The landslide inventory was compiled through interpretation of aerial photographs and satellite images with corresponding field verification. Both morphologic and non-morphologic factors contributing to landslide occurrence, and their spatial relationships, were considered. The analysis of landslide susceptibility was implemented in a Geographic Information System (GIS). The 17,320 randomly selected data points were divided into training and test sets. K-fold cross-validation was performed with k = 5: the model was fitted five times, each time on k − 1 folds as the training set with the remaining fold held out for validation, and the AUROC of each model was computed on its corresponding validation set. The AUROCs of the five models were 0.978, 0.977, 0.977, 0.974, and 0.979, respectively, implying that the models are effective in correctly predicting the occurrence and nonoccurrence of landslide activity. Field verification was also performed. The landslide susceptibility map generated from the model is classified into four categories: low, moderate, high and very high susceptibility. The study also shows that almost 40% of Antipolo City is potentially dangerous in terms of landslide occurrence.
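The 5-fold validation procedure described above, fitting a logistic regression on k − 1 folds and scoring the held-out fold by AUROC, can be sketched as follows. The synthetic data and feature count are illustrative assumptions; the study's actual GIS-derived factors are not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for landslide/non-landslide samples with
# morphologic and non-morphologic predictor columns.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

aucs = []
for train_idx, val_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                          random_state=0).split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    prob = model.predict_proba(X[val_idx])[:, 1]  # susceptibility probability
    aucs.append(roc_auc_score(y[val_idx], prob))

print([round(a, 3) for a in aucs])  # one AUROC per held-out fold
```

Each of the five AUROCs corresponds to one validation fold, matching the five per-model values reported in the abstract.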
Design and 4D Printing of Cross-Folded Origami Structures: A Preliminary Investigation.
Teoh, Joanne Ee Mei; An, Jia; Feng, Xiaofan; Zhao, Yue; Chua, Chee Kai; Liu, Yong
2018-03-03
In 4D printing research, different types of complex structure folding and unfolding have been investigated. However, research on cross-folding of origami structures (defined as a folding structure with at least two overlapping folds) has not been reported. This research focuses on the investigation of cross-folding structures using multi-material components along different axes and different horizontal hinge thickness with single homogeneous material. Tensile tests were conducted to determine the impact of multi-material components and horizontal hinge thickness. In the case of multi-material structures, the hybrid material composition has a significant impact on the overall maximum strain and Young's modulus properties. In the case of single material structures, the shape recovery speed is inversely proportional to the horizontal hinge thickness, while the flexural or bending strength is proportional to the horizontal hinge thickness. A hinge with a thickness of 0.5 mm could be folded three times prior to fracture whilst a hinge with a thickness of 0.3 mm could be folded only once prior to fracture. A hinge with a thickness of 0.1 mm could not even be folded without cracking. The introduction of a physical hole in the center of the folding/unfolding line provided stress relief and prevented fracture. A complex flower petal shape was used to successfully demonstrate the implementation of overlapping and non-overlapping folding lines using both single material segments and multi-material segments. Design guidelines for establishing cross-folding structures using multi-material components along different axes and different horizontal hinge thicknesses with single or homogeneous material were established. These guidelines can be used to design and implement complex origami structures with overlapping and non-overlapping folding lines. 
Combined overlapping folding structures could be implemented and allocating specific hole locations in the overall designs could be further explored. In addition, creating a more precise prediction by investigating sets of in between hinge thicknesses and comparing the folding times before fracture, will be the subject of future work.
Cross Validation Through Two-Dimensional Solution Surface for Cost-Sensitive SVM.
Gu, Bin; Sheng, Victor S; Tay, Keng Yeow; Romano, Walter; Li, Shuo
2017-06-01
Model selection plays an important role in the cost-sensitive SVM (CS-SVM). It has been proven that the global minimum cross-validation (CV) error can be efficiently computed based on the solution path for one-parameter learning problems. However, obtaining the global minimum CV error for CS-SVM from a one-dimensional solution path or traditional grid search is a challenge, because CS-SVM has two regularization parameters. In this paper, we propose a solution- and error-surfaces-based CV approach (CV-SES). More specifically, we first compute a two-dimensional solution surface for CS-SVM based on a bi-parameter space partition algorithm, which can fit solutions of CS-SVM for all values of both regularization parameters. Then, we compute a two-dimensional validation error surface for each CV fold, which can fit validation errors of CS-SVM for all values of both regularization parameters. Finally, we obtain the CV error surface by superposing the K validation error surfaces, from which the global minimum CV error of CS-SVM can be found. Experiments are conducted on seven datasets for cost-sensitive learning and on four datasets for imbalanced learning. The results show not only that the proposed CV-SES generalizes better than CS-SVM tuned with various hybrids of grid search and solution-path methods, and than the recently proposed cost-sensitive hinge-loss SVM with three-dimensional grid search, but also that CV-SES requires less running time.
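For contrast, the traditional two-parameter grid search that CV-SES is designed to improve upon can be sketched as below. Using scikit-learn's per-class weights as a stand-in for the two misclassification costs is an assumption for illustration; the paper's exact CS-SVM parametrization may differ, and the grid values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Hypothetical imbalanced data set.
X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)

best = (None, -np.inf)
for c_neg in [0.1, 1.0, 10.0]:          # cost on the majority class
    for c_pos in [0.1, 1.0, 10.0]:      # cost on the minority class
        clf = SVC(kernel="linear", class_weight={0: c_neg, 1: c_pos})
        score = cross_val_score(clf, X, y, cv=5).mean()  # K-fold CV error
        if score > best[1]:
            best = ((c_neg, c_pos), score)

print("best (C-, C+):", best[0], "mean CV accuracy:", round(best[1], 3))
```

A grid like this only probes the CV error at discrete points, whereas the solution-surface approach in the paper covers all values of both parameters at once.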
NoFold: RNA structure clustering without folding or alignment.
Middleton, Sarah A; Kim, Junhyong
2014-11-01
Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, for example, recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, as most methods for clustering structures are based on folding individual sequences or performing many pairwise alignments, they entail a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method constructs a distance function between two objects from their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures. © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Bozkurt, Selen; Bostanci, Asli; Turhan, Murat
2017-08-11
The goal of this study is to evaluate machine learning methods for classifying the obstructive sleep apnea (OSA) severity of patients with suspected sleep-disordered breathing as normal, mild, moderate or severe, based on non-polysomnographic variables: 1) clinical data, 2) symptoms and 3) physical examination. To produce classification models for OSA severity, five different machine learning methods (Bayesian network, Decision Tree, Random Forest, Neural Networks and Logistic Regression) were trained, while relevant variables and their relationships were derived empirically from observed data. Each model was trained and evaluated using 10-fold cross-validation, and classification performance was assessed with the true positive rate (TPR), false positive rate (FPR), positive predictive value (PPV), F measure and area under the receiver operating characteristic curve (ROC-AUC). Results of 10-fold cross-validated tests with different variable settings promisingly indicated that the OSA severity of suspected OSA patients can be classified using non-polysomnographic features, with a highest true positive rate of 0.71 and a lowest false positive rate of 0.15. Moreover, the test results for different variable settings revealed that the accuracy of the classification models improved significantly when physical examination variables were added. The study shows that machine learning methods can be used to estimate the probabilities of no, mild, moderate, and severe obstructive sleep apnea; such approaches may improve initial OSA screening and help refer only suspected moderate or severe OSA patients to sleep laboratories for the expensive tests.
Lundervold, Astri J.; Bøe, Tormod; Lundervold, Arvid
2017-01-01
Inattention in childhood is associated with academic problems later in life. The contribution of specific aspects of inattentive behaviour is, however, less known. We investigated feature importance of primary school teachers’ reports on nine aspects of inattentive behaviour, gender and age in predicting future academic achievement. Primary school teachers of n = 2491 children (7–9 years) rated nine items reflecting different aspects of inattentive behaviour in 2002. A mean academic achievement score from the previous semester in high school (2012) was available for each youth from an official school register. All scores were at a categorical level. Feature importances were assessed by using multinominal logistic regression, classification and regression trees analysis, and a random forest algorithm. Finally, a comprehensive pattern classification procedure using k-fold cross-validation was implemented. Overall, inattention was rated as more severe in boys, who also obtained lower academic achievement scores in high school than girls. Problems related to sustained attention and distractibility were together with age and gender defined as the most important features to predict future achievement scores. Using these four features as input to a collection of classifiers employing k-fold cross-validation for prediction of academic achievement level, we obtained classification accuracy, precision and recall that were clearly better than chance levels. Primary school teachers’ reports of problems related to sustained attention and distractibility were identified as the two most important features of inattentive behaviour predicting academic achievement in high school. Identification and follow-up procedures of primary school children showing these characteristics should be prioritised to prevent future academic failure. PMID:29182663
Tohira, Hideo; Jacobs, Ian; Mountain, David; Gibson, Nick; Yeo, Allen
2011-01-01
The Abbreviated Injury Scale (AIS) was revised in 2005 and updated in 2008 (AIS 2008). We aimed to compare the outcome prediction performance of AIS-based injury severity scoring tools under AIS 2008 and AIS 98. We used data from all major trauma patients admitted to the Royal Perth Hospital between 1994 and 2008. We selected five AIS-based injury severity scoring tools: the Injury Severity Score (ISS), New Injury Severity Score (NISS), modified Anatomic Profile (mAP), Trauma and Injury Severity Score (TRISS) and A Severity Characterization of Trauma (ASCOT). Survival after injury was the target outcome, and the area under the receiver operating characteristic curve (AUROC) the performance measure. First, we compared the five tools using all cases whose records included all variables required for the TRISS (complete dataset), using 10-fold cross-validation. Second, we compared the ISS and NISS under AIS 98 and AIS 2008 using all subjects (whole dataset). We identified 1,269 cases for the complete dataset and 4,174 for the whole dataset. With the 10-fold cross-validation, there were no clear differences in the AUROCs between the AIS 98- and AIS 2008-based scores. In the second comparison, the AIS 98-based ISS performed significantly worse than the AIS 2008-based ISS (p<0.0001), while there was no significant difference between the AIS 98- and AIS 2008-based NISSs. Researchers should be aware of these findings when selecting an injury severity scoring tool for their studies.
Rigid Origami via Optical Programming and Deferred Self-Folding of a Two-Stage Photopolymer.
Glugla, David J; Alim, Marvin D; Byars, Keaton D; Nair, Devatha P; Bowman, Christopher N; Maute, Kurt K; McLeod, Robert R
2016-11-02
We demonstrate the formation of shape-programmed, glassy origami structures using a single-layer photopolymer with two mechanically distinct phases. The latent origami pattern consisting of rigid, high cross-link density panels and flexible, low cross-link density creases is fabricated using a series of photomask exposures. Strong optical absorption of the polymer formulation creates depth-wise gradients in the cross-link density of the creases, enforcing directed folding which enables programming of both mountain and valley folds within the same sheet. These multiple photomask patterns can be sequentially applied because the sheet remains flat until immersed into a photopolymerizable monomer solution that differentially swells the polymer to fold and form the origami structure. After folding, a uniform photoexposure polymerizes the absorbed solution, permanently fixing the shape of the folded structure while simultaneously increasing the modulus of the folds. This approach creates sharp folds by mimicking the stiff panels and flexible creases of paper origami while overcoming the traditional trade-off of self-actuated materials that require low modulus for folding and high modulus for mechanical robustness. Using this process, we demonstrate a waterbomb base capable of supporting 1500 times its own weight.
Chiarotto, Alessandro; Vanti, Carla; Ostelo, Raymond W; Ferrari, Silvano; Tedesco, Giuseppe; Rocca, Barbara; Pillastrini, Paolo; Monticone, Marco
2015-11-01
The Pain Self-Efficacy Questionnaire (PSEQ) is a patient self-reported measurement instrument that evaluates pain self-efficacy beliefs in patients with chronic pain. The measurement properties of the PSEQ have been tested in its original and translated versions, showing satisfactory results for validity and reliability. The aims of this study were twofold: (1) to translate the PSEQ into Italian through a process of cross-cultural adaptation, and (2) to test the measurement properties of the Italian PSEQ (PSEQ-I). The cross-cultural adaptation was completed in 5 months without omitting any item of the original PSEQ. Measurement properties were tested in 165 patients with chronic low back pain (CLBP) (65% women, mean age 49.9 years). Factor analysis confirmed the one-factor structure of the questionnaire. Internal consistency (Cronbach's α = 0.94) and test-retest reliability (ICCagreement = 0.82) of the PSEQ-I showed good results. The smallest detectable change was 15.69 scale points. The PSEQ-I displayed high construct validity by meeting more than 75% of a priori hypotheses on correlations with measurement instruments assessing pain intensity, disability, anxiety, depression, pain catastrophizing, fear of movement, and coping strategies. Additionally, the PSEQ-I differentiated between patients taking and not taking pain medication. The results of this study suggest that the PSEQ-I can be used as a valid and reliable tool in Italian patients with CLBP. © 2014 World Institute of Pain.
NASA Astrophysics Data System (ADS)
Qin, Sanbo; Mittal, Jeetain; Zhou, Huan-Xiang
2013-08-01
We have developed a ‘postprocessing’ method for modeling biochemical processes such as protein folding under crowded conditions (Qin and Zhou 2009 Biophys. J. 97 12-19). In contrast to the direct simulation approach, in which the protein undergoing folding is simulated along with crowders, the postprocessing method requires only the folding simulation without crowders. The influence of the crowders is then obtained by taking conformations from the crowder-free simulation and calculating the free energies of transferring to the crowders. This postprocessing yields the folding free energy surface of the protein under crowding. Here the postprocessing results for the folding of three small proteins under ‘repulsive’ crowding are validated by those obtained previously by the direct simulation approach (Mittal and Best 2010 Biophys. J. 98 315-20). This validation confirms the accuracy of the postprocessing approach and highlights its distinct advantages in modeling biochemical processes under cell-like crowded conditions, such as enabling an atomistic representation of the test proteins.
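The postprocessing idea described above can be written compactly. Under the stated setup, the folding free energy surface along a folding coordinate (written here as $Q$, an assumed notation) under crowding is the crowder-free surface shifted by a conformation-dependent transfer free energy:

```latex
\Delta G_{\mathrm{crowd}}(Q) \;=\; \Delta G_{0}(Q) \;+\; \Delta\Delta G_{\mathrm{tr}}(Q)
```

where $\Delta G_{0}(Q)$ comes from the crowder-free folding simulation and $\Delta\Delta G_{\mathrm{tr}}(Q)$ is the free energy of transferring conformations at $Q$ from dilute solution into the crowder solution, computed in the postprocessing step.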
Munkácsy, Gyöngyi; Sztupinszki, Zsófia; Herman, Péter; Bán, Bence; Pénzváltó, Zsófia; Szarvas, Nóra; Győrffy, Balázs
2016-09-27
No independent cross-validation of the success rate of studies utilizing small interfering RNA (siRNA) for gene silencing had been completed before. To assess the influence of experimental parameters such as cell line, transfection technique, validation method, and type of control, we validated these across a large set of studies. We utilized gene chip data published for siRNA experiments to assess success rates and to compare the methods used in these experiments. We searched NCBI GEO for samples with whole transcriptome analysis before and after gene silencing and evaluated the efficiency for the target and off-target genes using the array-based expression data. The Wilcoxon signed-rank test was used to assess silencing efficacy, and Kruskal-Wallis tests and Spearman rank correlation were used to evaluate study parameters. Altogether 1,643 samples representing 429 experiments published in 207 studies were evaluated. The fold change (FC) of down-regulation of the target gene was above 0.7 in 18.5% and above 0.5 in 38.7% of experiments. Silencing efficiency was lowest in MCF7 and highest in SW480 cells (FC = 0.59 and FC = 0.30, respectively, P = 9.3E-06). Studies utilizing Western blot for validation performed better than those with quantitative polymerase chain reaction (qPCR) or microarray (FC = 0.43, FC = 0.47, and FC = 0.55, respectively, P = 2.8E-04). There was no correlation between type of control, transfection method, publication year, and silencing efficiency. Although gene silencing is a robust technique successfully cross-validated in the majority of experiments, efficiency remained insufficient in a significant proportion of studies. Selection of the cell line model and validation method had the highest influence on silencing proficiency.
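The per-experiment efficacy assessment described above, a Wilcoxon signed-rank test on paired before/after expression plus a fold-change summary, can be sketched on synthetic data. The values below are hypothetical stand-ins for log2 expression, not the study's GEO data.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
before = rng.normal(10.0, 0.5, size=30)           # log2 expression pre-silencing
after = before + rng.normal(-1.0, 0.3, size=30)   # knockdown shifts it down

# Paired nonparametric test of whether expression dropped after silencing.
stat, p = wilcoxon(after, before)

# Linear-scale fold change of the target gene (FC < 1 means down-regulation).
fc = float(np.mean(2.0 ** (after - before)))
print("significant:", p < 0.05, "| mean FC:", round(fc, 2))
```

An FC well below the abstract's 0.7 threshold, together with a small p-value, would count this simulated experiment as a successful knockdown.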
Sun, Jiangming; Carlsson, Lars; Ahlberg, Ernst; Norinder, Ulf; Engkvist, Ola; Chen, Hongming
2017-07-24
Conformal prediction has been proposed as a more rigorous way to define prediction confidence than other applicability domain concepts that have previously been used for QSAR modeling. One main advantage of such a method is that it provides a prediction region, potentially with multiple predicted labels, which contrasts with the single-valued (regression) or single-label (classification) output predictions of standard QSAR modeling algorithms. Standard conformal prediction might not be suitable for imbalanced data sets. Therefore, Mondrian cross-conformal prediction (MCCP), which combines Mondrian inductive conformal prediction with cross-fold calibration sets, has been introduced. In this study, the MCCP method was applied to 18 publicly available data sets with imbalance levels varying from 1:10 to 1:1000 (ratio of active to inactive compounds). Our results show that MCCP in general performed well on bioactivity data sets with various imbalance levels. More importantly, the method not only provides confidence of prediction and prediction regions compared with standard machine learning methods, but also produces valid predictions for the minority class. In addition, a compound-similarity-based nonconformity measure was investigated. Our results demonstrate that although it gives valid predictions, its efficiency is much worse than that of model-dependent metrics.
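A minimal from-scratch sketch of the Mondrian (class-conditional) inductive conformal step on hypothetical imbalanced data is below. The classifier, nonconformity measure, and significance level are illustrative assumptions, and the paper's MCCP additionally aggregates over cross-fold calibration sets, which this single-split sketch omits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced bioactivity-like data (about 9:1).
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=1)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.3,
                                            stratify=y, random_state=1)

clf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)

# Nonconformity: 1 - predicted probability of the example's own class.
cal_prob = clf.predict_proba(X_cal)
cal_score = 1.0 - cal_prob[np.arange(len(y_cal)), y_cal]

def mondrian_p_values(x, eps=1e-12):
    """Class-conditional (Mondrian) p-values for one test object."""
    prob = clf.predict_proba(x.reshape(1, -1))[0]
    p = {}
    for label in (0, 1):
        scores = cal_score[y_cal == label]   # calibrate within each class
        alpha = 1.0 - prob[label]            # nonconformity under this label
        p[label] = (np.sum(scores >= alpha - eps) + 1) / (len(scores) + 1)
    return p

p = mondrian_p_values(X[0])
region = [label for label, pv in p.items() if pv > 0.1]  # 90% confidence region
print(sorted(region))
```

Calibrating each class against its own examples is what makes the predictions valid for the minority class even under heavy imbalance.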
Cross-Study Homogeneity of Psoriasis Gene Expression in Skin across a Large Expression Range
Kerkof, Keith; Timour, Martin; Russell, Christopher B.
2013-01-01
Background In psoriasis, only limited overlap between sets of genes identified as differentially expressed (psoriatic lesional [PP] vs. psoriatic non-lesional [PN]) was found using statistical and fold-change cut-offs. To provide a framework for utilizing prior psoriasis data sets, we sought to understand the consistency of those sets. Methodology/Principal Findings Microarray expression profiling and qRT-PCR were used to characterize gene expression in PP and PN skin from psoriasis patients. cDNA (three new data sets) and cRNA hybridization (four existing data sets) data were compared using a common analysis pipeline. Agreement between data sets was assessed using varying qualitative and quantitative cut-offs to generate a differentially expressed gene (DEG) list in a source data set and then using the other data sets to validate the list. Concordance increased from 67% across all probe sets to over 99% across more than 10,000 probe sets when statistical filters were employed. The fold-change behavior of individual genes tended to be consistent across the multiple data sets. We found that genes with <2-fold change values were quantitatively reproducible between pairs of data sets. In a subset of transcripts with a role in inflammation, changes detected by microarray were confirmed by qRT-PCR with high concordance. For transcripts with both PN and PP levels within the microarray dynamic range, microarray and qRT-PCR were quantitatively reproducible, including minimal fold-changes in IL13, TNFSF11, and TNFRSF11B and >10-fold changes in either direction in genes such as CHRM3, IL12B and IFNG. Conclusions/Significance Gene expression changes in psoriatic lesions were consistent across different studies, despite differences in patient selection, sample handling, and microarray platforms, although between-study comparisons showed stronger agreement within than between platforms. We could use cut-offs as low as log10(ratio) = 0.1 (fold-change = 1.26), generating larger gene lists that validate on independent data sets.
The reproducibility of PP signatures across data sets suggests that different sample sets can be productively compared. PMID:23308107
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Z; MD Anderson Cancer Center, Houston, TX; Ho, A
Purpose: To develop and validate a prediction model using radiomics features extracted from MR images to distinguish radiation necrosis from tumor progression for brain metastases treated with Gamma knife radiosurgery. Methods: The images used to develop the model were T1 post-contrast MR scans from 71 patients who had had pathologic confirmation of necrosis or progression; 1 lesion was identified per patient (17 necrosis and 54 progression). Radiomics features were extracted from 2 images at 2 time points per patient, both obtained prior to resection. Each lesion was manually contoured on each image, and 282 radiomics features were calculated for each lesion. The correlation for each radiomics feature between the two time points was calculated within each group to identify a subset of features with distinct values between the two groups. The delta of this subset of radiomics features, characterizing changes from the earlier time to the later one, was included as a covariate to build a prediction model using support vector machines with a cubic polynomial kernel function. The model was evaluated with a 10-fold cross-validation. Results: Forty radiomics features were selected based on consistent correlation values of approximately 0 for the necrosis group and >0.2 for the progression group. In performing the 10-fold cross-validation, we narrowed this number down to 11 delta radiomics features for the model. This 11-delta-feature model showed an overall prediction accuracy of 83.1%, with a true positive rate of 58.8% in predicting necrosis and 90.7% for predicting tumor progression. The area under the curve for the prediction model was 0.79. Conclusion: These delta radiomics features extracted from MR scans showed potential for distinguishing radiation necrosis from tumor progression.
This tool may be a useful, noninvasive means of determining the status of an enlarging lesion after radiosurgery, aiding decision-making regarding surgical resection versus conservative medical management.
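The core modeling step, delta features from two time points fed to an SVM with a cubic polynomial kernel under 10-fold cross-validation, can be sketched as below. The synthetic features, label construction, and patient/feature counts are illustrative assumptions, not the study's data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n, d = 71, 11                        # 71 lesions, 11 delta features (as above)
t1 = rng.normal(size=(n, d))         # features at the earlier scan
t2 = t1 + rng.normal(size=(n, d))    # features at the later scan

delta = t2 - t1                      # delta radiomics features
# Toy "progression" label tied to the overall feature shift, for illustration.
y = (delta.mean(axis=1) > 0).astype(int)

svm = make_pipeline(StandardScaler(),
                    SVC(kernel="poly", degree=3))  # cubic polynomial kernel
acc = cross_val_score(svm, delta, y, cv=10).mean() # 10-fold cross-validation
print("mean CV accuracy above chance:", acc > 0.5)
```

Standardizing before a polynomial-kernel SVM is a common precaution since the kernel is sensitive to feature scale; the study does not state its preprocessing, so this is an assumption.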
Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.
Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing
2016-08-24
Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. The cysteine thiol groups of proteins are particularly susceptible to oxidation, and their reversible oxidation plays a critical role in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases, or depend heavily on protein structural data, and thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity and identified three types of features for efficient computational prediction of redox-sensitive cysteines: sequential distance to the nearby cysteines, the PSSM profile, and the predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed the Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using the primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation with the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to the current structure-based method. Further validation using an independent dataset indicates it is robust and achieves relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures broader applicability than other current implementations.
Accurate prediction of redox-sensitive cysteines not only enhances our understanding about the redox sensitivity of cysteine, it may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
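The SVM-RFE feature selection step mentioned above, recursively eliminating the features with the smallest linear-SVM weights, can be sketched with scikit-learn's RFE. The synthetic feature matrix and the retained-feature count are illustrative assumptions; the study's actual sequence-derived features are not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# 40 candidate sequence-derived features, of which 5 are informative.
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           random_state=3)

# SVM-RFE: repeatedly fit a linear SVM and drop the `step` features with
# the smallest absolute weights until the requested number remains.
selector = RFE(SVC(kernel="linear"), n_features_to_select=10, step=2)
selector.fit(X, y)

kept = np.where(selector.support_)[0]
print(len(kept))  # 10 retained feature indices
```

The retained subset would then be used to train the final classifier, as RSCP does after its feature-selection stage.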
Hravnak, Marilyn; Chen, Lujie; Dubrawski, Artur; Bose, Eliezer; Clermont, Gilles; Pinsky, Michael R.
2015-01-01
PURPOSE Huge hospital information system databases can be mined for knowledge discovery and decision support, but artifact in stored non-invasive vital sign (VS) high-frequency data streams limits its use. We used machine-learning (ML) algorithms trained on expert-labeled VS data streams to automatically classify VS alerts as real or artifact, thereby “cleaning” such data for future modeling. METHODS 634 admissions to a step-down unit had recorded continuous noninvasive VS monitoring data (heart rate [HR], respiratory rate [RR], peripheral arterial oxygen saturation [SpO2] at 1/20 Hz, and noninvasive oscillometric blood pressure [BP]). Periods in which data crossed stability thresholds defined VS event epochs. Data were divided into Block 1, the ML training/cross-validation set, and Block 2, the test set. Expert clinicians annotated Block 1 events as perceived real or artifact. After feature extraction, ML algorithms were trained to create and validate models automatically classifying events as real or artifact. The models were then tested on Block 2. RESULTS Block 1 yielded 812 VS events, with 214 (26%) judged by experts as artifact (RR 43%, SpO2 40%, BP 15%, HR 2%). ML algorithms applied to the Block 1 training/cross-validation set (10-fold cross-validation) gave area under the curve (AUC) scores of 0.97 for RR, 0.91 for BP and 0.76 for SpO2. Performance when applied to Block 2 test data was AUC 0.94 for RR, 0.84 for BP and 0.72 for SpO2. CONCLUSIONS ML-defined algorithms applied to archived multi-signal continuous VS monitoring data allowed accurate automated classification of VS alerts as real or artifact, and could support data mining for future model building. PMID:26438655
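The AUC scores above can be computed without any curve plotting: AUC equals the probability that a randomly chosen real event receives a higher classifier score than a randomly chosen artifact (the Mann-Whitney formulation). A small generic sketch, not the study's own implementation:

```python
def auc_score(y_true, scores):
    """AUC via the Mann-Whitney statistic: fraction of (positive, negative)
    pairs in which the positive outscores the negative; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```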
Zhou, Jiyun; Lu, Qin; Xu, Ruifeng; He, Yulan; Wang, Hongpeng
2017-08-29
Prediction of DNA-binding residues is important for understanding the protein-DNA recognition mechanism. Many computational methods have been proposed for this prediction, but most of them do not consider the relationships of evolutionary information between residues. In this paper, we first propose a novel residue encoding method, referred to as the Position Specific Score Matrix (PSSM) Relation Transformation (PSSM-RT), to encode residues by utilizing the relationships of evolutionary information between residues. PDNA-62 and PDNA-224 are used to evaluate PSSM-RT and two existing PSSM encoding methods by five-fold cross-validation. Performance evaluations indicate that PSSM-RT is more effective than previous methods. This validates the point that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction. An ensemble learning classifier (EL_PSSM-RT) is also proposed, combining an ensemble learning model with PSSM-RT to better handle the imbalance between binding and non-binding residues in datasets. EL_PSSM-RT is evaluated by five-fold cross-validation using PDNA-62 and PDNA-224 as well as two independent datasets, TS-72 and TS-61. Performance comparisons with existing predictors on the four datasets demonstrate that EL_PSSM-RT is the best-performing method among all the predictors, with improvements of 0.02-0.07 for MCC, 4.18-21.47% for ST and 0.013-0.131 for AUC. Furthermore, we analyze the importance of the pair-relationships extracted by PSSM-RT, and the results validate the usefulness of PSSM-RT for encoding DNA-binding residues. We propose a novel method for DNA-binding residue prediction that incorporates the relationship of evolutionary information between residues and ensemble learning.
Performance evaluation shows that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction and ensemble learning can be used to address the data imbalance issue between binding and non-binding residues. A web service of EL_PSSM-RT ( http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/ ) is provided for free access to the biological research community.
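One common way to realize the ensemble-learning idea for imbalanced binding/non-binding residue data is to partition the majority class into minority-sized chunks and train one base classifier per balanced subset. This is a generic sketch of that strategy, not EL_PSSM-RT's published procedure:

```python
import random

def balanced_subsets(majority_idx, minority_idx, seed=0):
    """Split the (shuffled) majority-class indices into chunks no larger than
    the minority class; each chunk plus all minority indices forms one
    balanced training subset for an ensemble member. The final chunk may be
    smaller if the class sizes do not divide evenly."""
    rng = random.Random(seed)
    shuffled = list(majority_idx)
    rng.shuffle(shuffled)
    k = len(minority_idx)
    return [shuffled[i:i + k] + list(minority_idx)
            for i in range(0, len(shuffled), k)]
```

At prediction time, the ensemble members' outputs would typically be combined by voting or score averaging.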
Polymer Uncrossing and Knotting in Protein Folding, and Their Role in Minimal Folding Pathways
Mohazab, Ali R.; Plotkin, Steven S.
2013-01-01
We introduce a method for calculating the extent to which chain non-crossing is important in the most efficient, optimal trajectories or pathways for a protein to fold. This involves recording all unphysical crossing events of a ghost chain, and calculating the minimal uncrossing cost that would have been required to avoid such events. A depth-first tree search algorithm is applied to find minimal transformations to fold α, β, α/β, and knotted proteins. In all cases, the extra uncrossing/non-crossing distance is a small fraction of the total distance travelled by a ghost chain. Different structural classes may be distinguished by the amount of extra uncrossing distance, and the effectiveness of such discrimination is compared with other order parameters. It was seen that non-crossing distance over chain length provided the best discrimination between structural and kinetic classes. The scaling of non-crossing distance with chain length implies an inevitable crossover to entanglement-dominated folding mechanisms for sufficiently long chains. We further quantify the minimal folding pathways by collecting the sequence of uncrossing moves, which generally involve leg, loop, and elbow-like uncrossing moves, and rendering the collection of these moves over the unfolded ensemble as a multiple-transformation “alignment”. The consensus minimal pathway is constructed and shown schematically for representative cases of an α, a β, and a knotted protein. An overlap parameter is defined between pathways; we find that α proteins have minimal overlap indicating diverse folding pathways, knotted proteins are highly constrained to follow a dominant pathway, and β proteins are somewhere in between. Thus we have shown how topological chain constraints can induce dominant pathway mechanisms in protein folding. PMID:23365638
Chen, Chen; Cai, Jing; Wang, Cuicui; Shi, Jingjin; Chen, Renjie; Yang, Changyuan; Li, Huichu; Lin, Zhijing; Meng, Xia; Zhao, Ang; Liu, Cong; Niu, Yue; Xia, Yongjie; Peng, Li; Zhao, Zhuohui; Chillrud, Steven; Yan, Beizhan; Kan, Haidong
2018-06-06
Epidemiologic studies of PM 2.5 (particulate matter with aerodynamic diameter ≤2.5 μm) and black carbon (BC) typically use ambient measurements as exposure proxies, given that individual measurement is infeasible among large populations. Failure to account for variation in exposure will bias epidemiologic study results. The ability of ambient measurement as a proxy of exposure in regions with heavy pollution is untested. We aimed to investigate effects of potential determinants and to estimate PM 2.5 and BC exposure by a modeling approach. We collected 417 24 h personal PM 2.5 and 130 72 h personal BC measurements from a panel of 36 nonsmoking college students in Shanghai, China. Each participant underwent 4 rounds of three consecutive 24-h sampling sessions from December 2014 to July 2015. We applied backward regression to construct mixed effect models incorporating all accessible variables of ambient pollution, climate and time-location information for exposure prediction. All models were evaluated by marginal R 2 and root mean square error (RMSE) from a leave-one-out cross-validation (LOOCV) and a 10-fold cross-validation (10-fold CV). Personal PM 2.5 was 47.6% lower than the ambient level, with a mean (±standard deviation, SD) level of 39.9 (±32.1) μg/m 3 ; whereas personal BC (6.1 (±2.8) μg/m 3 ) was about one-fold higher than the corresponding ambient concentrations. Ambient levels were the most significant determinants of PM 2.5 and BC exposure. Meteorological and season indicators were also important predictors. Our final models predicted 75% of the variance in 24 h personal PM 2.5 and 72 h personal BC. LOOCV analysis showed an R 2 (RMSE) of 0.73 (0.40) for PM 2.5 and 0.66 (0.27) for BC. Ten-fold CV analysis showed an R 2 (RMSE) of 0.73 (0.41) for PM 2.5 and 0.68 (0.26) for BC. We used readily accessible data and established intuitive models that can predict PM 2.5 and BC exposure.
This modeling approach can be a feasible solution for PM exposure estimation in epidemiological studies. Copyright © 2018 Elsevier Ltd. All rights reserved.
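The LOOCV R2/RMSE figures above follow a generic recipe: refit the model n times, each time predicting the single held-out observation, then score the out-of-sample predictions. A self-contained sketch for a one-predictor linear model (the mixed effect models in the study are much richer than this):

```python
def loocv_simple_regression(x, y):
    """Leave-one-out CV for y = a + b*x: refit on n-1 points, predict the
    held-out point, and return (R2, RMSE) of the out-of-sample predictions."""
    n = len(x)
    preds = []
    for i in range(n):
        xs = [x[j] for j in range(n) if j != i]
        ys = [y[j] for j in range(n) if j != i]
        mx, my = sum(xs) / (n - 1), sum(ys) / (n - 1)
        b = (sum((xv - mx) * (yv - my) for xv, yv in zip(xs, ys))
             / sum((xv - mx) ** 2 for xv in xs))
        a = my - b * mx
        preds.append(a + b * x[i])
    mean_y = sum(y) / n
    ss_res = sum((p - t) ** 2 for p, t in zip(preds, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot, (ss_res / n) ** 0.5
```

A 10-fold CV differs only in holding out a tenth of the data at a time instead of a single point.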
Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier
NASA Astrophysics Data System (ADS)
Wang, Leilei; Cheng, Jinyong
2018-03-01
Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new approach to protein secondary structure prediction using a Bayes classifier and an autoencoder network. Our experiments cover several algorithmic aspects, including the construction of the model and the setting of its parameters. The data set is the typical CB513 protein data set. Accuracy is assessed by 3-fold cross-validation, from which the Q3 accuracy is obtained. The results illustrate that the autoencoder network improves the prediction accuracy of protein secondary structure.
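Q3 itself is simply the per-residue fraction of correct three-state (helix/strand/coil) assignments; a hypothetical helper, not the paper's code:

```python
def q3_accuracy(true_ss, pred_ss):
    """Q3: fraction of residues whose predicted state (H/E/C) matches the
    observed secondary structure; both inputs are equal-length strings."""
    assert len(true_ss) == len(pred_ss)
    return sum(t == p for t, p in zip(true_ss, pred_ss)) / len(true_ss)
```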
Design and 4D Printing of Cross-Folded Origami Structures: A Preliminary Investigation
Teoh, Joanne Ee Mei; Feng, Xiaofan; Zhao, Yue; Liu, Yong
2018-01-01
In 4D printing research, different types of complex structure folding and unfolding have been investigated. However, research on cross-folding of origami structures (defined as a folding structure with at least two overlapping folds) has not been reported. This research focuses on the investigation of cross-folding structures using multi-material components along different axes and different horizontal hinge thickness with single homogeneous material. Tensile tests were conducted to determine the impact of multi-material components and horizontal hinge thickness. In the case of multi-material structures, the hybrid material composition has a significant impact on the overall maximum strain and Young’s modulus properties. In the case of single material structures, the shape recovery speed is inversely proportional to the horizontal hinge thickness, while the flexural or bending strength is proportional to the horizontal hinge thickness. A hinge with a thickness of 0.5 mm could be folded three times prior to fracture whilst a hinge with a thickness of 0.3 mm could be folded only once prior to fracture. A hinge with a thickness of 0.1 mm could not even be folded without cracking. The introduction of a physical hole in the center of the folding/unfolding line provided stress relief and prevented fracture. A complex flower petal shape was used to successfully demonstrate the implementation of overlapping and non-overlapping folding lines using both single material segments and multi-material segments. Design guidelines for establishing cross-folding structures using multi-material components along different axes and different horizontal hinge thicknesses with single or homogeneous material were established. These guidelines can be used to design and implement complex origami structures with overlapping and non-overlapping folding lines. 
Combined overlapping folding structures could be implemented, and the allocation of specific hole locations in the overall designs could be further explored. In addition, creating a more precise prediction by investigating sets of in-between hinge thicknesses and comparing the folding times before fracture will be the subject of future work. PMID:29510503
Ferraresso, Serena; Vitulo, Nicola; Mininni, Alba N; Romualdi, Chiara; Cardazzo, Barbara; Negrisolo, Enrico; Reinhardt, Richard; Canario, Adelino V M; Patarnello, Tomaso; Bargelloni, Luca
2008-12-03
Aquaculture represents the most sustainable alternative of seafood supply to substitute for the declining marine fisheries, but severe production bottlenecks remain to be solved. The application of genomic technologies offers much promise to rapidly increase our knowledge on biological processes in farmed species and overcome such bottlenecks. Here we present an integrated platform for mRNA expression profiling in the gilthead sea bream (Sparus aurata), a marine teleost of great importance for aquaculture. A public database was constructed, consisting of 19,734 unique clusters (3,563 contigs and 16,171 singletons). Functional annotation was obtained for 8,021 clusters. Over 4,000 sequences were also associated with a GO entry. Two 60mer probes were designed for each gene and synthesized in situ on glass slides using Agilent SurePrint technology. Platform reproducibility and accuracy were assessed on two early stages of sea bream development (one-day- and four-day-old larvae). Correlation between technical replicates was always > 0.99, with strong positive correlation between paired probes. A two-class SAM test identified 1,050 differentially expressed genes between the two developmental stages. Functional analysis suggested that down-regulated transcripts (407) in older larvae are mostly essential/housekeeping genes, whereas tissue-specific genes are up-regulated in parallel with the formation of key organs (eye, digestive system). Cross-validation of microarray data was carried out using quantitative qRT-PCR on 11 target genes, selected to reflect the whole range of fold-change and both up-regulated and down-regulated genes. A statistically significant positive correlation was obtained comparing expression levels for each target gene across all biological replicates.
Good concordance between qRT-PCR and microarray data was observed between 2- and 7-fold change, while fold-change compression in the microarray was present for differences greater than 10-fold in the qRT-PCR. A highly reliable oligo-microarray platform was developed and validated for the gilthead sea bream despite the presently limited knowledge of the species transcriptome. Because of the flexible design this array will be able to accommodate additional probes as soon as novel unique transcripts are available.
Wu, J; Awate, S P; Licht, D J; Clouchoux, C; du Plessis, A J; Avants, B B; Vossough, A; Gee, J C; Limperopoulos, C
2015-07-01
Traditional methods of dating a pregnancy based on history or sonographic assessment have a large variation in the third trimester. We aimed to assess the ability of various quantitative measures of brain cortical folding on MR imaging in determining fetal gestational age in the third trimester. We evaluated 8 different quantitative cortical folding measures to predict gestational age in 33 healthy fetuses by using T2-weighted fetal MR imaging. We compared the accuracy of the prediction of gestational age by these cortical folding measures with the accuracy of prediction by brain volume measurement and by a previously reported semiquantitative visual scale of brain maturity. Regression models were constructed, and measurement biases and variances were determined via a cross-validation procedure. The cortical folding measures are accurate in the estimation and prediction of gestational age (mean of the absolute error, 0.43 ± 0.45 weeks) and perform better than (P = .024) brain volume (mean of the absolute error, 0.72 ± 0.61 weeks) or sonography measures (SDs approximately 1.5 weeks, as reported in literature). Prediction accuracy is comparable with that of the semiquantitative visual assessment score (mean, 0.57 ± 0.41 weeks). Quantitative cortical folding measures such as global average curvedness can be an accurate and reliable estimator of gestational age and brain maturity for healthy fetuses in the third trimester and have the potential to be an indicator of brain-growth delays for at-risk fetuses and preterm neonates. © 2015 by American Journal of Neuroradiology.
Multi-parameter machine learning approach to the neuroanatomical basis of developmental dyslexia.
Płoński, Piotr; Gradkowski, Wojciech; Altarelli, Irene; Monzalvo, Karla; van Ermingen-Marbach, Muna; Grande, Marion; Heim, Stefan; Marchewka, Artur; Bogorodzki, Piotr; Ramus, Franck; Jednoróg, Katarzyna
2017-02-01
Despite decades of research, the anatomical abnormalities associated with developmental dyslexia are still not fully described. Studies have focused on between-group comparisons in which different neuroanatomical measures were generally explored in isolation, disregarding potential interactions between regions and measures. Here, for the first time a multivariate classification approach was used to investigate grey matter disruptions in children with dyslexia in a large (N = 236) multisite sample. A variety of cortical morphological features, including volumetric (volume, thickness and area) and geometric (folding index and mean curvature) measures were taken into account and generalizability of classification was assessed with both 10-fold and leave-one-out cross validation (LOOCV) techniques. Classification into control vs. dyslexic subjects achieved above chance accuracy (AUC = 0.66 and ACC = 0.65 in the case of 10-fold CV, and AUC = 0.65 and ACC = 0.64 using LOOCV) after principled feature selection. Features that discriminated between dyslexic and control children were exclusively situated in the left hemisphere including superior and middle temporal gyri, subparietal sulcus and prefrontal areas. They were related to geometric properties of the cortex, with generally higher mean curvature and a greater folding index characterizing the dyslexic group. Our results support the hypothesis that an atypical curvature pattern with extra folds in left hemispheric perisylvian regions characterizes dyslexia. Hum Brain Mapp 38:900-908, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Bereau, Tristan; Wang, Zun-Jing; Deserno, Markus
2014-03-01
Interfacial systems are at the core of fascinating phenomena in many disciplines, such as biochemistry, soft-matter physics, and food science. However, the parametrization of accurate, reliable, and consistent coarse-grained (CG) models for systems at interfaces remains a challenging endeavor. In the present work, we explore to what extent two independently developed solvent-free CG models of peptides and lipids—of different mapping schemes, parametrization methods, target functions, and validation criteria—can be combined by only tuning the cross-interactions. Our results show that the cross-parametrization can reproduce a number of structural properties of membrane peptides (for example, tilt and hydrophobic mismatch), in agreement with existing peptide-lipid CG force fields. We find encouraging results for two challenging biophysical problems: (i) membrane pore formation mediated by the cooperative action of several antimicrobial peptides, and (ii) the insertion and folding of the helix-forming peptide WALP23 in the membrane.
NASA Astrophysics Data System (ADS)
Frehner, Marcel; Reif, Daniel; Grasemann, Bernhard
2010-05-01
There are a large number of numerical finite element studies concerned with modeling the evolution of folded geological layers through time. This body of research includes many aspects of folding and many different approaches, such as two- and three-dimensional studies, single-layer folding, detachment folding, development of chevron folds, Newtonian, power-law viscous and more complex rheologies, influence of anisotropy, pure-shear, simple-shear and other boundary conditions and so forth. In recent years, studies of multilayer folding emerged, thanks to more advanced mesh generator software and increased computational power. Common to all of these studies is the fact that they consider a forward directed time evolution, as in nature. Very few studies use the finite element method for reverse-time simulations. In such studies, folded geological layers are taken as initial conditions for the numerical simulation. The folding process is reversed by changing the signs of the boundary conditions that supposedly drove the folding process. In such studies, the geometry of the geological layers before the folding process is searched and the amount of shortening necessary for the final folded geometry can be calculated. In contrast to a kinematic or geometric fold restoration procedure, the described approach takes the mechanical behavior of the geological layers into account, such as rheology and the relative strength of the individual layers. This approach is therefore called mechanical restoration of folds. In this study, the concept of mechanical restoration is applied to a two-dimensional 50km long NE-SW-cross-section through the Zagros Simply Folded Belt in Iraqi Kurdistan, NE from the city of Erbil. The Simply Folded Belt is dominated by gentle to open folding and faults are either absent or record only minor offset. Therefore, this region is ideal for testing the concept of mechanical restoration. 
The profile used is constructed from structural field measurements and digital elevation models using the dip-domain method for balancing the cross-section. The lithology consists of Cretaceous to Cenozoic sediments. Massive carbonate rock units act as the competent layers compared to the incompetent behavior of siltstone, claystone and marl layers. We show the first results of the mechanical restoration of the Zagros cross-section and we discuss advantages and disadvantages, as well as some technical aspects of the applied method. First results indicate that a shortening of at least 50% was necessary to create the present-day folded cross-section. This value is higher than estimates of the amount of shortening solely based on kinematic or geometric restoration. One particular problem that is discussed is the presence of (unnaturally) sharp edges in a balanced cross-section produced using the dip-domain method, which need to be eliminated for mechanical restoration calculations to get reasonable results.
Automatic classification of tissue malignancy for breast carcinoma diagnosis.
Fondón, Irene; Sarmiento, Auxiliadora; García, Ana Isabel; Silvestre, María; Eloy, Catarina; Polónia, António; Aguiar, Paulo
2018-05-01
Breast cancer is the second leading cause of cancer death among women. Its early diagnosis is extremely important to prevent avoidable deaths. However, malignancy assessment of tissue biopsies is complex and dependent on observer subjectivity. Moreover, hematoxylin and eosin (H&E)-stained histological images exhibit a highly variable appearance, even within the same malignancy level. In this paper, we propose a computer-aided diagnosis (CAD) tool for automated malignancy assessment of breast tissue samples based on the processing of histological images. We provide four malignancy levels as the output of the system: normal, benign, in situ and invasive. The method is based on the calculation of three sets of features related to nuclei, colour regions and textures, considering local characteristics and global image properties. By taking advantage of well-established image processing techniques, we build a feature vector for each image that serves as an input to an SVM (Support Vector Machine) classifier with a quadratic kernel. The method has been rigorously evaluated: first with a 5-fold cross-validation within an initial set of 120 images, second with an external set of 30 different images, and third with images with artefacts included. Accuracy levels range from 75.8% when the 5-fold cross-validation was performed, to 75% with the external set of new images, and 61.11% when the extremely difficult images were added to the classification experiment. The experimental results indicate that the proposed method is capable of distinguishing between four malignancy levels with high accuracy. Our results are close to those obtained with recent deep learning-based methods. Moreover, it performs better than other state-of-the-art methods based on feature extraction, and it can help improve the CAD of breast cancer. Copyright © 2018 Elsevier Ltd. All rights reserved.
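For context, the 5-fold protocol used above amounts to shuffling the 120 images once and rotating each fifth through the test role. A generic index splitter (illustrative, not the authors' code):

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle range(n) once, deal it into k near-equal folds, and yield
    (train_indices, test_indices) for each of the k rotations."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

With n=120 and k=5 this gives five disjoint test sets of 24 images, each paired with the remaining 96 for training.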
Tohira, Hideo; Jacobs, Ian; Mountain, David; Gibson, Nick; Yeo, Allen
2011-01-01
The Abbreviated Injury Scale (AIS) was revised in 2005 and updated in 2008 (AIS 2008). We aimed to compare the outcome prediction performance of AIS-based injury severity scoring tools using AIS 2008 and AIS 98. We used all major trauma patients hospitalized at the Royal Perth Hospital between 1994 and 2008. We selected five AIS-based injury severity scoring tools: the Injury Severity Score (ISS), New Injury Severity Score (NISS), modified Anatomic Profile (mAP), Trauma and Injury Severity Score (TRISS) and A Severity Characterization of Trauma (ASCOT). We selected survival after injury as the target outcome. We used the area under the Receiver Operating Characteristic curve (AUROC) as a performance measure. First, we compared the five tools using all cases whose records included all variables for the TRISS (complete dataset) using a 10-fold cross-validation. Second, we compared the ISS and NISS for AIS 98 and AIS 2008 using all subjects (whole dataset). We identified 1,269 and 4,174 cases for the complete dataset and the whole dataset, respectively. With the 10-fold cross-validation, there were no clear differences in the AUROCs between the AIS 98- and AIS 2008-based scores. With the second comparison, the AIS 98-based ISS performed significantly worse than the AIS 2008-based ISS (p<0.0001), while there was no significant difference between the AIS 98- and AIS 2008-based NISSs. Researchers should be aware of these findings when they select an injury severity scoring tool for their studies. PMID:22105401
Kasai, Takami; Motoori, Ken; Horikoshi, Takuro; Uchiyama, Katsuhiro; Yasufuku, Kazuhiro; Takiguchi, Yuichi; Takahashi, Fumiaki; Kuniyasu, Yoshio; Ito, Hisao
2010-08-01
To evaluate whether dual-time point scanning with integrated fluorine-18 fluorodeoxyglucose ((18)F-FDG) positron emission tomography and computed tomography (PET/CT) is useful for evaluation of mediastinal and hilar lymph nodes in non-small cell lung cancer diagnosed as operable by contrast-enhanced CT. PET/CT data and pathological findings of 560 nodal stations in 129 patients with pathologically proven non-small cell lung cancer diagnosed as operable by contrast-enhanced CT were reviewed retrospectively. Standardized uptake values (SUVs) on early scans (SUVe) 1h, and on delayed scans (SUVd) 2h after FDG injection of each nodal station were measured. Retention index (RI) (%) was calculated by subtracting SUVe from SUVd and dividing by SUVe. Logistic regression analysis was performed with seven kinds of models, consisting of (1) SUVe, (2) SUVd, (3) RI, (4) SUVe and SUVd, (5) SUVe and RI, (6) SUVd and RI, and (7) SUVe, SUVd and RI. The seven derived models were compared by receiver-operating characteristic (ROC) analysis. k-Fold cross-validation was performed with k values of 5 and 10. p<0.05 was considered statistically significant. Model (1) including the term of SUVe showed the largest area under the ROC curve among the seven models. The cut-off probability of metastasis of 3.5% with SUVe of 2.5 revealed a sensitivity of 78% and a specificity of 81% on ROC analysis, and approximately 60% and 80% on k-fold cross-validation. Single scanning of PET/CT is sufficiently useful for evaluating mediastinal and hilar nodes for metastasis. Copyright (c) 2009 Elsevier Ireland Ltd. All rights reserved.
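The retention index defined in the abstract ("calculated by subtracting SUVe from SUVd and dividing by SUVe", expressed as a percentage) is a one-line computation:

```python
def retention_index(suv_early, suv_delayed):
    """RI (%) = (SUVd - SUVe) / SUVe * 100, per the abstract's definition.
    Positive values indicate increasing FDG uptake on the delayed scan."""
    return (suv_delayed - suv_early) / suv_early * 100.0
```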
Viscoelastic properties of rabbit vocal folds after augmentation.
Hertegård, Stellan; Dahlqvist, Ake; Laurent, Claude; Borzacchiello, Assunta; Ambrosio, Luigi
2003-03-01
Vocal fold function is closely related to tissue viscoelasticity. Augmentation substances may alter the viscoelastic properties of vocal fold tissues and hence their vibratory capacity. We sought to investigate the viscoelastic properties of rabbit vocal folds in vitro after injections of various augmentation substances. Polytetrafluoroethylene (Teflon), cross-linked collagen (Zyplast), and cross-linked hyaluronan, hylan b gel (Hylaform) were injected into the lamina propria and the thyroarytenoid muscle of rabbit vocal folds. Dynamic viscosity of the injected vocal fold as a function of frequency was measured with a Bohlin parallel-plate rheometer during small-amplitude oscillation. All injected vocal folds showed a decreasing dynamic viscosity with increasing frequency. Vocal fold samples injected with hylan b gel showed the lowest dynamic viscosity, quite close to noninjected control samples. Vocal folds injected with polytetrafluoroethylene showed the highest dynamic viscosity followed by the collagen samples. The data indicated that hylan b gel in short-term renders the most natural viscoelastic properties to the vocal fold among the substances tested. This is of importance to restore/preserve the vibratory capacity of the vocal folds when glottal insufficiency is treated with injections.
Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.
Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook
2014-11-01
As many structures of protein-DNA complexes have become known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. The other SVM model, which uses both DNA and protein sequences, achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model, which used both DNA and protein sequence data, showed better performance than the first model, which used DNA sequence data only.
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Gao, Cong-Fen; Ma, Shao-Zhi; Shan, Cai-Hui; Wu, Shun-Fan
2014-09-01
The western flower thrips (WFT) Frankliniella occidentalis (Pergande) (Thysanoptera: Thripidae), an important pest of various crops in the world, has invaded China since 2003. To understand the risks and to determine possible mechanisms of resistance to thiamethoxam in WFT, a resistant strain was selected under the laboratory conditions. Cross-resistance and the possible biochemical resistance mechanisms were investigated in this study. A 15.1-fold thiamethoxam-resistant WFT strain (TH-R) was established after selection for 55 generations. Compared with the susceptible strain (TH-S), the selected TH-R strain showed extremely high level cross-resistance to imidaclothiz (392.1-fold) and low level cross-resistance to dinotefuran (5.7-fold), acetamiprid (2.9-fold) and emamectin benzoate (2.1-fold), respectively. No cross-resistance to other fourteen insecticides was detected. Synergism tests showed that piperonyl butoxide (PBO) and triphenyl phosphate (TPP) produced a high synergism of thiamethoxam effects in the TH-R strain (2.6- and 2.6-fold respectively). However, diethyl maleate (DEM) did not act synergistically with thiamethoxam. Biochemical assays showed that mixed function oxidase (MFO) activities and carboxylesterase (CarE) activity of the TH-R strain were 2.8- and 1.5-fold higher than that of the TH-S strain, respectively. When compared with the TH-S strain, the TH-R strain had a relative fitness of 0.64. The results show that WFT develops resistance to thiamethoxam after continuous application and thiamethoxam resistance had considerable fitness costs in the WFT. It appears that enhanced metabolism mediated by cytochrome P450 monooxygenases and CarE was a major mechanism for thiamethoxam resistance in the WFT. The use of cross-resistance insecticides, including imidaclothiz and dinotefuran, should be avoided for sustainable resistance management. Copyright © 2014 Elsevier Inc. All rights reserved.
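The fold-resistance and synergism figures quoted above are simple ratios of toxicity endpoints (e.g. LC50 values); a sketch with illustrative numbers (the 15.1-fold value is from the abstract, but the LC50 inputs below are made up):

```python
def resistance_ratio(lc50_resistant, lc50_susceptible):
    """Fold-resistance: how many times higher the resistant strain's
    LC50 is relative to the susceptible strain's."""
    return lc50_resistant / lc50_susceptible

def synergism_ratio(lc50_alone, lc50_with_synergist):
    """Synergism factor: how much a synergist (e.g. PBO or TPP)
    lowers the insecticide's LC50."""
    return lc50_alone / lc50_with_synergist
```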
NASA Astrophysics Data System (ADS)
Bowden, S.; Wireman, R.; Sautter, L.; Beutel, E. K.
2015-12-01
Bathymetric data were collected off the southwest coast of County Cork, Ireland by the joint INFOMAR project between the Marine Institute of Ireland and the Geological Survey of Ireland. Data were collected using a Kongsberg EM2040 multibeam sonar on the R/V Celtic Voyager, in August and September 2014, and were post-processed with CARIS HIPS and SIPS 8.1 and 9.0 software to create 2D and 3D bathymetric surfaces. From the computer-generated images, some of the lithologic formations were observed and their relative ages determined. The studied regions range in depth from 20 to 118 m, with shallower areas to the northeast. Several large rock outcrops occur, the largest of which shows a vertical rise of nearly 20 m. These outcrops are oriented in a northeast-southwest direction, and exhibit significant bed folding, regional folding, tilted beds, and cross joints. The folds studied are plunging chevron folds. These folds have a northeast-southwest fold axis orthogonal to the cross joints and are older than the jointing systems. The NE-SW joints are older than the NW-SE joints, based on their correlation with drainage and erosion patterns. Regional folding is the youngest feature, given its superposition on the chevron folding and jointing systems. The interaction of cross jointing and folding is consistent with the geologic history of the area, and creates a unique bathymetry worthy of further study.
Araki, Tadashi; Ikeda, Nobutaka; Shukla, Devarshi; Jain, Pankaj K; Londhe, Narendra D; Shrivastava, Vimal K; Banchhor, Sumit K; Saba, Luca; Nicolaides, Andrew; Shafique, Shoaib; Laird, John R; Suri, Jasjit S
2016-05-01
Percutaneous coronary interventional procedures need advance planning prior to stenting or an endarterectomy. Cardiologists use intravascular ultrasound (IVUS) for screening, risk assessment and stratification of coronary artery disease (CAD). We hypothesize that plaque components are vulnerable to rupture due to plaque progression. Currently, there are no standard grayscale IVUS tools for risk assessment of plaque rupture. This paper presents a novel strategy for risk stratification based on plaque morphology embedded with principal component analysis (PCA) for plaque feature dimensionality reduction and dominant feature selection technique. The risk assessment utilizes 56 grayscale coronary features in a machine learning framework while linking information from carotid and coronary plaque burdens due to their common genetic makeup. This system consists of a machine learning paradigm which uses a support vector machine (SVM) combined with PCA for optimal and dominant coronary artery morphological feature extraction. Carotid artery proven intima-media thickness (cIMT) biomarker is adapted as a gold standard during the training phase of the machine learning system. For the performance evaluation, K-fold cross validation protocol is adapted with 20 trials per fold. For choosing the dominant features out of the 56 grayscale features, a polling strategy of PCA is adapted where the original value of the features is unaltered. Different protocols are designed for establishing the stability and reliability criteria of the coronary risk assessment system (cRAS). Using the PCA-based machine learning paradigm and cross-validation protocol, a classification accuracy of 98.43% (AUC 0.98) with K=10 folds using an SVM radial basis function (RBF) kernel was achieved. A reliability index of 97.32% and machine learning stability criteria of 5% were met for the cRAS. 
This is the first computer-aided diagnosis (CADx) system of its kind that demonstrates coronary risk assessment and stratification together with a successful design of the machine learning system based on our assumptions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
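The cRAS pipeline described above (PCA for dominant-feature extraction feeding an RBF-kernel SVM, scored by K-fold cross-validation) can be sketched with scikit-learn; the synthetic features below merely stand in for the 56 grayscale plaque descriptors and the cIMT-derived labels, which are not public:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 56 grayscale plaque features (hypothetical data).
X, y = make_classification(n_samples=300, n_features=56, n_informative=10,
                           random_state=0)

# Scale -> PCA for dominant feature extraction -> SVM with an RBF kernel,
# mirroring the machine learning paradigm described in the abstract.
model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      SVC(kernel="rbf", gamma="scale"))

# K = 10 fold cross-validation protocol.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print(len(scores), round(scores.mean(), 3))
```

The reported 98.43% accuracy depends on the real features and labels; this sketch only reproduces the protocol, not the result.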
NASA Astrophysics Data System (ADS)
Frehner, Marcel; Reif, Daniel; Grasemann, Bernhard
2012-06-01
This paper compares kinematical and mechanical techniques for the palinspastic reconstruction of folded cross sections in collision orogens. The studied area and the reconstructed NE-SW trending, 55.5 km long cross section is located in the High Folded Zone of the Zagros fold-and-thrust belt in the Kurdistan region of Iraq. The present-day geometry of the cross section has been constructed from field as well as remote sensing data. In a first step, the structures and the stratigraphy are simplified and summarized in eight units trying to identify the main geometric and mechanical parameters. In a second step, the shortening is kinematically estimated using the dip domain method to 11%-15%. Then the same cross section is used in a numerical finite element model to perform dynamical unfolding simulations taking various rheological parameters into account. The main factor allowing for an efficient dynamic unfolding is the presence of interfacial slip conditions between the mechanically strong units. Other factors, such as Newtonian versus power law viscous rheology or the presence of a basement, affect the numerical simulations much less strongly. If interfacial slip is accounted for, fold amplitudes are reduced efficiently during the dynamical unfolding simulations, while welded layer interfaces lead to unrealistic shortening estimates. It is suggested that interfacial slip and decoupling of the deformation along detachment horizons is an important mechanical parameter that controlled the folding processes in the Zagros High Folded Zone.
NASA Astrophysics Data System (ADS)
Frehner, M.; Reif, D.; Grasemann, B.
2012-04-01
Our study compares kinematical and mechanical techniques for the palinspastic reconstruction of folded cross-sections in collision orogens. The studied area and the reconstructed NE-SW-trending, 55.5 km long cross-section is located in the High Folded Zone of the Zagros fold-and-thrust belt in the Kurdistan Region of Iraq. The present-day geometry of the cross-section has been constructed from field, as well as remote sensing data. In a first step, the structures and the stratigraphy are simplified and summarized in eight units trying to identify the main geometric and mechanical parameters. In a second step, the shortening is kinematically estimated using the dip-domain method to 11%-15%. Then the same cross-section is used in a numerical finite-element model to perform dynamical unfolding simulations taking various rheological parameters into account. The main factor allowing for an efficient dynamic unfolding is the presence of interfacial slip conditions between the mechanically strong units. Other factors, such as Newtonian vs. power-law viscous rheology or the presence of a basement affect the numerical simulations much less strongly. If interfacial slip is accounted for, fold amplitudes are reduced efficiently during the dynamical unfolding simulations, while welded layer interfaces lead to unrealistic shortening estimates. It is suggested that interfacial slip and decoupling of the deformation along detachment horizons is an important mechanical parameter that controlled the folding processes in the Zagros High Folded Zone.
A QSAR Model for Thyroperoxidase Inhibition and Screening ...
Thyroid hormones (THs) are critical modulators of a wide range of biological processes from neurodevelopment to metabolism. Well regulated levels of THs are critical during development and even moderate changes in maternal or fetal TH levels produce irreversible neurological deficits in children. The enzyme thyroperoxidase (TPO) plays a key role in the synthesis of THs. Inhibition of TPO by xenobiotics leads to decreased TH synthesis and, depending on the degree of synthesis inhibition, may result in adverse developmental outcomes. Recently, a high-throughput screening assay for TPO inhibition (AUR-TPO) was developed and used to screen the ToxCast Phase I and II chemicals. In the present study, we used the results from the AUR-TPO screening to develop a Quantitative Structure-Activity Relationship (QSAR) model for TPO inhibition in Leadscope®. The training set consisted of 898 discrete organic chemicals: 134 positive and 764 negative for TPO inhibition. A 10 times two-fold 50% cross-validation of the model was performed, yielding a balanced accuracy of 78.7% within its defined applicability domain. More recently, an additional ~800 chemicals from the US EPA Endocrine Disruption Screening Program (EDSP21) were screened using the AUR-TPO assay. This data was used for external validation of the QSAR model, demonstrating a balanced accuracy of 85.7% within its applicability domain. Overall, the cross- and external validations indicate a model with a high predictiv
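The "10 times two-fold 50% cross-validation" used to validate the QSAR model above corresponds to 2-fold cross-validation repeated 10 times, scored with balanced accuracy because of the 134/764 class imbalance. A hedged sketch with scikit-learn (the Leadscope model and ToxCast data are not reproduced; the classifier and data below are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Imbalanced stand-in for the 898-chemical training set
# (134 positives vs. 764 negatives for TPO inhibition).
X, y = make_classification(n_samples=898, n_features=20,
                           weights=[0.85, 0.15], random_state=1)

# Two folds (a 50% split), repeated 10 times, scored by balanced accuracy
# so the minority (inhibitor) class is weighted fairly.
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=10, random_state=1)
scores = cross_val_score(RandomForestClassifier(random_state=1), X, y,
                         cv=cv, scoring="balanced_accuracy")
print(len(scores), round(scores.mean(), 3))
```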
Budget Online Learning Algorithm for Least Squares SVM.
Jian, Ling; Shen, Shuqian; Li, Jundong; Liang, Xijun; Li, Lei
2017-09-01
Batch-mode least squares support vector machine (LSSVM) is often associated with an unbounded number of support vectors (SVs), making it unsuitable for applications involving large-scale streaming data. Limited-scale LSSVM, which allows efficient updating, seems to be a good solution to tackle this issue. In this paper, to train the limited-scale LSSVM dynamically, we present a budget online LSSVM (BOLSSVM) algorithm. Methodologically, by setting a fixed budget for SVs, we are able to update the LSSVM model according to the updated SV set dynamically without retraining from scratch. In particular, when a new small chunk of SVs substitutes for the old ones, the proposed algorithm employs a low-rank correction technique and the Sherman-Morrison-Woodbury formula to compute the inverse of the saddle-point matrix derived from the LSSVM's Karush-Kuhn-Tucker (KKT) system, which, in turn, updates the LSSVM model efficiently. In this way, the proposed BOLSSVM algorithm is especially useful for online prediction tasks. Another merit of the proposed BOLSSVM is that it can be used for k-fold cross validation. Specifically, compared with batch-mode learning methods, the computational complexity of the proposed BOLSSVM method is significantly reduced from O(n^4) to O(n^3) for leave-one-out cross validation with n training samples. The experimental results of classification and regression on benchmark data sets and real-world applications show the validity and effectiveness of the proposed BOLSSVM algorithm.
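The core of the budget update described above is the Sherman-Morrison-Woodbury identity: when a matrix receives a low-rank correction U Vᵀ (a small chunk of SVs swapped in), its known inverse can be refreshed at O(n²k) cost instead of recomputed at O(n³). A minimal NumPy sketch on a generic well-conditioned matrix (not the LSSVM KKT system itself):

```python
import numpy as np

def woodbury_update(A_inv, U, V):
    """Inverse of (A + U @ V.T) from a known A_inv via the
    Sherman-Morrison-Woodbury identity: only a k x k system is solved."""
    k = U.shape[1]
    S = np.eye(k) + V.T @ A_inv @ U          # small capacitance matrix
    return A_inv - A_inv @ U @ np.linalg.solve(S, V.T @ A_inv)

rng = np.random.default_rng(0)
n, k = 50, 3
A = rng.normal(size=(n, n)) + n * np.eye(n)  # well-conditioned base matrix
A_inv = np.linalg.inv(A)

# Rank-k correction standing in for replacing a small chunk of SVs.
U = rng.normal(size=(n, k))
V = rng.normal(size=(n, k))

updated = woodbury_update(A_inv, U, V)
direct = np.linalg.inv(A + U @ V.T)          # O(n^3) reference computation
print(np.allclose(updated, direct))
```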
Regenbogen, Sam; Wilkins, Angela D; Lichtarge, Olivier
2016-01-01
Biomedicine produces copious information it cannot fully exploit. Specifically, there is considerable need to integrate knowledge from disparate studies to discover connections across domains. Here, we used a Collaborative Filtering approach, inspired by online recommendation algorithms, in which non-negative matrix factorization (NMF) predicts interactions among chemicals, genes, and diseases only from pairwise information about their interactions. Our approach, applied to matrices derived from the Comparative Toxicogenomics Database, successfully recovered Chemical-Disease, Chemical-Gene, and Disease-Gene networks in 10-fold cross-validation experiments. Additionally, we could predict each of these interaction matrices from the other two. Integrating all three CTD interaction matrices with NMF led to good predictions of STRING, an independent, external network of protein-protein interactions. Finally, this approach could integrate the CTD and STRING interaction data to improve Chemical-Gene cross-validation performance significantly, and, in a time-stamped study, it predicted information added to CTD after a given date, using only data prior to that date. We conclude that collaborative filtering can integrate information across multiple types of biological entities, and that as a first step towards precision medicine it can compute drug repurposing hypotheses.
REGENBOGEN, SAM; WILKINS, ANGELA D.; LICHTARGE, OLIVIER
2015-01-01
Biomedicine produces copious information it cannot fully exploit. Specifically, there is considerable need to integrate knowledge from disparate studies to discover connections across domains. Here, we used a Collaborative Filtering approach, inspired by online recommendation algorithms, in which non-negative matrix factorization (NMF) predicts interactions among chemicals, genes, and diseases only from pairwise information about their interactions. Our approach, applied to matrices derived from the Comparative Toxicogenomics Database, successfully recovered Chemical-Disease, Chemical-Gene, and Disease-Gene networks in 10-fold cross-validation experiments. Additionally, we could predict each of these interaction matrices from the other two. Integrating all three CTD interaction matrices with NMF led to good predictions of STRING, an independent, external network of protein-protein interactions. Finally, this approach could integrate the CTD and STRING interaction data to improve Chemical-Gene cross-validation performance significantly, and, in a time-stamped study, it predicted information added to CTD after a given date, using only data prior to that date. We conclude that collaborative filtering can integrate information across multiple types of biological entities, and that as a first step towards precision medicine it can compute drug repurposing hypotheses. PMID:26776170
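The collaborative-filtering idea in the two records above, factorizing a partially observed interaction matrix with NMF and scoring held-out entries by the reconstruction, can be sketched as follows. The matrix is synthetic, and held-out entries are simply zeroed before fitting, a common simplification rather than necessarily the authors' exact protocol:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Hypothetical binary chemical-gene interaction matrix with planted
# low-rank structure (stand-in for a CTD-derived matrix).
W_true = rng.random((40, 4))
H_true = rng.random((4, 30))
M = (W_true @ H_true > 1.0).astype(float)

# Hold out ~10% of the entries, as in one fold of a 10-fold scheme.
mask = rng.random(M.shape) < 0.1
train = M.copy()
train[mask] = 0.0            # simplification: held-out entries set to zero

# Factorize the training matrix and score held-out cells by the
# reconstructed values, collaborative-filtering style.
model = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(train)
recon = W @ model.components_

# Held-out true interactions should score higher than held-out non-interactions.
pos = recon[mask & (M == 1)].mean()
neg = recon[mask & (M == 0)].mean()
print(pos > neg)
```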
Zhou, Yan; Cao, Hui
2013-01-01
We propose an augmented classical least squares (ACLS) calibration method for quantitative Raman spectral analysis against component information loss. Raman spectral signals with low analyte concentration correlations were selected and used as substitutes for the unknown quantitative component information during the CLS calibration procedure. The number of selected signals was determined using the leave-one-out root-mean-square error of cross-validation (RMSECV) curve. An ACLS model was built based on the augmented concentration matrix and the reference spectral signal matrix. The proposed method was compared with partial least squares (PLS) and principal component regression (PCR) using one example: a data set recorded from an experiment determining analyte concentrations by Raman spectroscopy. A 2-fold cross-validation with a Venetian-blinds strategy was used to evaluate the predictive power of the proposed method. One-way analysis of variance (ANOVA) was used to assess the difference in predictive power between the proposed method and the existing methods. Results indicated that the proposed method is effective at increasing the robustness of the traditional CLS model's predictive power against component information loss, and that its predictive power is comparable to that of PLS or PCR.
Assessment of local friction in protein folding dynamics using a helix cross-linker.
Markiewicz, Beatrice N; Jo, Hyunil; Culik, Robert M; DeGrado, William F; Gai, Feng
2013-11-27
Internal friction arising from local steric hindrance and/or the excluded volume effect plays an important role in controlling not only the dynamics of protein folding but also conformational transitions occurring within the native state potential well. However, experimental assessment of such local friction is difficult because it does not manifest itself as an independent experimental observable. Herein, we demonstrate, using the miniprotein trp-cage as a testbed, that it is possible to selectively increase the local mass density in a protein and hence the magnitude of local friction, thus making its effect directly measurable via folding kinetic studies. Specifically, we show that when a helix cross-linker, m-xylene, is placed near the most congested region of the trp-cage it leads to a significant decrease in both the folding rate (by a factor of 3.8) and unfolding rate (by a factor of 2.5 at 35 °C) but has little effect on protein stability. Thus, these results, in conjunction with those obtained with another cross-linked trp-cage and two uncross-linked variants, demonstrate the feasibility of using a nonperturbing cross-linker to help quantify the effect of internal friction. In addition, we estimate that a m-xylene cross-linker could lead to an increase in the roughness of the folding energy landscape by as much as 0.4-1.0 kBT.
Mexican sign language recognition using normalized moments and artificial neural networks
NASA Astrophysics Data System (ADS)
Solís-V., J.-Francisco; Toxqui-Quitl, Carina; Martínez-Martínez, David; H.-G., Margarita
2014-09-01
This work presents a framework designed for Mexican Sign Language (MSL) recognition. A data set of 24 static signs from the MSL was recorded, with 5 versions of each sign, captured with a digital camera under incoherent lighting conditions. Digital image processing was used to segment the hand gestures; a uniform background was chosen to avoid the use of gloved hands or special markers. Feature extraction was performed by calculating normalized geometric moments of the gray-scaled signs, and an artificial neural network then performed the recognition. Using 10-fold cross validation in Weka, the best result achieved a recognition rate of 95.83%.
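Normalized geometric (central) moments of the kind used as features here are straightforward to compute with NumPy. The sketch below uses a toy rectangular blob rather than real sign images, and checks the scale invariance that makes these moments useful shape descriptors:

```python
import numpy as np

def normalized_moments(img, orders=((2, 0), (1, 1), (0, 2), (3, 0), (0, 3))):
    """Scale-invariant normalized central moments eta_pq of a grayscale image."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00  # centroid
    etas = []
    for p, q in orders:
        mu_pq = ((x - xc) ** p * (y - yc) ** q * img).sum()  # central moment
        etas.append(mu_pq / m00 ** (1 + (p + q) / 2))        # normalization
    return np.array(etas)

# A simple rectangular blob stands in for a segmented sign image.
img = np.zeros((64, 64))
img[10:50, 20:40] = 1.0

feat = normalized_moments(img)
# Scaling the image 2x should leave the normalized moments nearly unchanged.
big = np.kron(img, np.ones((2, 2)))
print(np.allclose(feat, normalized_moments(big), atol=1e-3))
```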
Latifoğlu, Fatma; Kodaz, Halife; Kara, Sadik; Güneş, Salih
2007-08-01
This study was conducted to distinguish atherosclerosis patients from healthy subjects. To this end, we employed the maximum envelope of carotid artery Doppler sonograms, derived with the Fast Fourier Transform-Welch method, together with the Artificial Immune Recognition System (AIRS). The fuzzy appearance of carotid artery Doppler signals makes physicians suspicious about the existence of disease and sometimes causes false diagnoses. Our technique addresses this problem by using AIRS to decide and to assist the physician in making the final judgment with confidence. AIRS reached 99.29% classification accuracy using 10-fold cross validation. Results show that the proposed method classified the Doppler signals successfully.
Simultaneous dual modality optical and MR imaging of mouse dorsal skin-fold window chamber
NASA Astrophysics Data System (ADS)
Salek, Mir Farrokh; Pagel, Mark D.; Gmitro, Arthur F.
2011-02-01
Optical imaging and MRI have both been used extensively to study tumor microenvironment. The two imaging modalities are complementary and can be used to cross-validate one another for specific measurements. We have developed a modular platform that is capable of doing optical microscopy inside an MRI instrument. To do this, an optical relay system transfers the image to outside of the MR bore to a commercial grade CCD camera. This enables simultaneous optical and MR imaging of the same tissue and thus creates the ideal situation for comparative or complementary studies using both modalities. Initial experiments have been done using GFP labeled prostate cancer cells implanted in mouse dorsal skin fold window chamber. Vascular hemodynamics and vascular permeability were studied using our imaging system. Towards this goal, we developed a dual MR-Optical contrast agent by labeling BSA with both Gd-DTPA and Alexa Fluor. Overall system design and results of these preliminary vascular studies are presented.
Computer-aided detection of prostate cancer in T2-weighted MRI within the peripheral zone
NASA Astrophysics Data System (ADS)
Rampun, Andrik; Zheng, Ling; Malcolm, Paul; Tiddeman, Bernie; Zwiggelaar, Reyer
2016-07-01
In this paper we propose a prostate cancer computer-aided diagnosis (CAD) system and suggest a set of discriminant texture descriptors extracted from T2-weighted MRI data which can be used as a good basis for a multimodality system. For this purpose, 215 texture descriptors were extracted and eleven different classifiers were employed to achieve the best possible results. The proposed method was tested on 418 T2-weighted MR images taken from 45 patients and evaluated using 9-fold cross validation with five patients in each fold. The results were comparable to those of existing CAD systems using multimodality MRI. We achieved area under the receiver operating characteristic curve (Az) values of 90.0% ± 7.6%, 89.5% ± 8.9%, 87.9% ± 9.3% and 87.4% ± 9.2% for the Bayesian network, ADTree, random forest and multilayer perceptron classifiers, respectively, while a meta-voting classifier using average probability as the combination rule achieved 92.7% ± 7.4%.
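The "9-fold cross validation with five patients in each fold" is a grouped split: all images from one patient stay on the same side of every fold, avoiding patient-level leakage. A hedged sketch with scikit-learn's GroupKFold, using synthetic stand-ins for the 418 images and 215 descriptors:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Hypothetical stand-in: 418 MR slices drawn from 45 patients.
n_images, n_patients = 418, 45
patient_of = rng.integers(0, n_patients, size=n_images)
X = rng.normal(size=(n_images, 215))   # 215 texture descriptors per image

# 9 folds that split by patient, so no patient's images appear in both
# the training and the test side of any fold.
gkf = GroupKFold(n_splits=9)
leakage_free = all(
    set(patient_of[tr]).isdisjoint(patient_of[te])
    for tr, te in gkf.split(X, groups=patient_of)
)
print(leakage_free)
```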
Epitropaki, Olga; Martin, Robin
2004-04-01
The present empirical investigation had a 3-fold purpose: (a) to cross-validate L. R. Offermann, J. K. Kennedy, and P. W. Wirtz's (1994) scale of Implicit Leadership Theories (ILTs) in several organizational settings and to further provide a shorter scale of ILTs in organizations; (b) to assess the generalizability of ILTs across different employee groups; and (c) to evaluate ILTs' change over time. Two independent samples were used for the scale validation (N1 = 500 and N2 = 439). A 6-factor structure (Sensitivity, Intelligence, Dedication, Dynamism, Tyranny, and Masculinity) was found to most accurately represent ILTs in organizational settings. Regarding the generalizability of ILTs, although the 6-factor structure was consistent across different employee groups, there was only partial support for total factorial invariance. Finally, evaluation of gamma, beta, and alpha change provided support for ILTs' stability over time.
Kusumoto, Dai; Lachmann, Mark; Kunihiro, Takeshi; Yuasa, Shinsuke; Kishino, Yoshikazu; Kimura, Mai; Katsuki, Toshiomi; Itoh, Shogo; Seki, Tomohisa; Fukuda, Keiichi
2018-06-05
Deep learning technology is rapidly advancing and is now used to solve complex problems. Here, we used deep learning in convolutional neural networks to establish an automated method to identify endothelial cells derived from induced pluripotent stem cells (iPSCs), without the need for immunostaining or lineage tracing. Networks were trained to predict whether phase-contrast images contain endothelial cells based on morphology only. Predictions were validated by comparison to immunofluorescence staining for CD31, a marker of endothelial cells. Method parameters were then automatically and iteratively optimized to increase prediction accuracy. We found that prediction accuracy was correlated with network depth and pixel size of images to be analyzed. Finally, K-fold cross-validation confirmed that optimized convolutional neural networks can identify endothelial cells with high performance, based only on morphology. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Zammit, Andrea R; Hall, Charles B; Lipton, Richard B; Katz, Mindy J; Muniz-Terrera, Graciela
2018-05-01
The aim of this study was to identify natural subgroups of older adults based on cognitive performance, and to establish each subgroup's characteristics based on demographic factors, physical function, psychosocial well-being, and comorbidity. We applied latent class (LC) modeling to identify subgroups in baseline assessments of 1345 Einstein Aging Study (EAS) participants free of dementia. The EAS is a community-dwelling cohort study of 70+ year-old adults living in the Bronx, NY. We used 10 neurocognitive tests and 3 covariates (age, sex, education) to identify latent subgroups. We used goodness-of-fit statistics to identify the optimal class solution and assess model adequacy. We also validated our model using two-fold split-half cross-validation. The sample had a mean age of 78.0 (SD=5.4) and a mean of 13.6 years of education (SD=3.5). A 9-class solution based on cognitive performance at baseline was the best-fitting model. We characterized the 9 identified classes as (i) disadvantaged, (ii) poor language, (iii) poor episodic memory and fluency, (iv) poor processing speed and executive function, (v) low average, (vi) high average, (vii) average, (viii) poor executive and poor working memory, (ix) elite. The cross validation indicated stable class assignment with the exception of the average and high average classes. LC modeling in a community sample of older adults revealed 9 cognitive subgroups. Assignment of subgroups was reliable and associated with external validators. Future work will test the predictive validity of these groups for outcomes such as Alzheimer's disease, vascular dementia and death, as well as markers of biological pathways that contribute to cognitive decline. (JINS, 2018, 24, 511-523).
Panman, Matthijs R; van Dijk, Chris N; Meuzelaar, Heleen; Woutersen, S
2015-01-28
We present a simple method to measure the dynamics of cross peaks in time-resolved two-dimensional vibrational spectroscopy. By combining suitably weighted dispersed pump-probe spectra, we eliminate the diagonal contribution to the 2D-IR response, so that the dispersed pump-probe signal contains the projection of only the cross peaks onto one of the axes of the 2D-IR spectrum. We apply the method to investigate the folding dynamics of an alpha-helical peptide in a temperature-jump experiment and find characteristic folding and unfolding time constants of 260 ± 30 and 580 ± 70 ns at 298 K.
Yorulmaz Salman, Sibel; Aydınlı, Fatma; Ay, Recep
2015-07-01
Phytoseiulus persimilis of the family Phytoseiidae is an effective predatory mite species that is used to control pest mites. The LC50 and LC60 values of etoxazole were determined on P. persimilis using a leaf-disc method and a spraying tower. A laboratory selection population designated ETO6 was found to have a 111.63-fold resistance to etoxazole following 6 selection cycles. This population developed low cross-resistance to spinosad, spiromesifen, acetamiprid, indoxacarb, chlorantraniliprole and milbemectin, and moderate cross-resistance to deltamethrin. PBO, IBP and DEM synergised resistance 3.17-, 2.85- and 3.60-fold, respectively. Crossing experiments revealed that etoxazole resistance in the ETO6 population was intermediately dominant and polygenic. In addition, detoxifying enzyme activities were increased 2.71-fold for esterase, 3.09-fold for glutathione S-transferase (GST) and 2.76-fold for cytochrome P450 monooxygenase (P450) in the ETO6 population. Selection for etoxazole under laboratory conditions thus resulted in the development of etoxazole resistance in the predatory mite P. persimilis. Predatory mites that are resistant to pesticides are considered valuable for use in resistance management programmes within integrated pest control strategies. Copyright © 2014 Elsevier Inc. All rights reserved.
Luo, Heng; Ye, Hao; Ng, Hui; Shi, Leming; Tong, Weida; Mattes, William; Mendrick, Donna; Hong, Huixiao
2015-01-01
As the major histocompatibility complex (MHC), human leukocyte antigens (HLAs) are one of the most polymorphic genes in humans. Patients carrying certain HLA alleles may develop adverse drug reactions (ADRs) after taking specific drugs. Peptides play an important role in HLA-related ADRs as they are the necessary co-binders of HLAs with drugs. Many experimental data have been generated for understanding HLA-peptide binding. However, efficiently utilizing the data for understanding and accurately predicting HLA-peptide binding is challenging. Therefore, we developed a network analysis based method to understand and predict HLA-peptide binding. Qualitative Class I HLA-peptide binding data were harvested and prepared from four major databases. An HLA-peptide binding network was constructed from this dataset and modules were identified by the fast greedy modularity optimization algorithm. To examine the significance of signals in the yielded models, the modularity was compared with the modularity values generated from 1,000 random networks. The peptides and HLAs in the modules were characterized by similarity analysis. The neighbor-edges based and unbiased leverage algorithm (Nebula) was developed for predicting HLA-peptide binding. Leave-one-out (LOO) validations and two-fold cross-validations were conducted to evaluate the performance of Nebula using the constructed HLA-peptide binding network. Nine modules were identified from analyzing the HLA-peptide binding network, with a modularity higher than that of all the random networks. Peptide length and functional side chains of amino acids at certain positions of the peptides differed among the modules. HLA sequences were module dependent to some extent. Nebula achieved an overall prediction accuracy of 0.816 in the LOO validations and an average accuracy of 0.795 in the two-fold cross-validations, and outperformed the method reported in the literature. 
Network analysis is a useful approach for analyzing large and sparse datasets such as the HLA-peptide binding dataset. The modules identified from the network analysis clustered peptides and HLAs with similar sequences and properties of amino acids. Nebula performed well in the predictions of HLA-peptide binding. We demonstrated that network analysis coupled with Nebula is an efficient approach to understand and predict HLA-peptide binding interactions and thus, could further our understanding of ADRs.
2015-01-01
Background As the major histocompatibility complex (MHC), human leukocyte antigens (HLAs) are one of the most polymorphic genes in humans. Patients carrying certain HLA alleles may develop adverse drug reactions (ADRs) after taking specific drugs. Peptides play an important role in HLA-related ADRs as they are the necessary co-binders of HLAs with drugs. Many experimental data have been generated for understanding HLA-peptide binding. However, efficiently utilizing the data for understanding and accurately predicting HLA-peptide binding is challenging. Therefore, we developed a network analysis based method to understand and predict HLA-peptide binding. Methods Qualitative Class I HLA-peptide binding data were harvested and prepared from four major databases. An HLA-peptide binding network was constructed from this dataset and modules were identified by the fast greedy modularity optimization algorithm. To examine the significance of signals in the yielded models, the modularity was compared with the modularity values generated from 1,000 random networks. The peptides and HLAs in the modules were characterized by similarity analysis. The neighbor-edges based and unbiased leverage algorithm (Nebula) was developed for predicting HLA-peptide binding. Leave-one-out (LOO) validations and two-fold cross-validations were conducted to evaluate the performance of Nebula using the constructed HLA-peptide binding network. Results Nine modules were identified from analyzing the HLA-peptide binding network, with a modularity higher than that of all the random networks. Peptide length and functional side chains of amino acids at certain positions of the peptides differed among the modules. HLA sequences were module dependent to some extent. Nebula achieved an overall prediction accuracy of 0.816 in the LOO validations and an average accuracy of 0.795 in the two-fold cross-validations, and outperformed the method reported in the literature. 
Conclusions Network analysis is a useful approach for analyzing large and sparse datasets such as the HLA-peptide binding dataset. The modules identified from the network analysis clustered peptides and HLAs with similar sequences and properties of amino acids. Nebula performed well in the predictions of HLA-peptide binding. We demonstrated that network analysis coupled with Nebula is an efficient approach to understand and predict HLA-peptide binding interactions and thus, could further our understanding of ADRs. PMID:26424483
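The fast greedy modularity optimization step described in the two records above is available in NetworkX. The sketch below runs it on a toy graph with three planted cliques rather than the curated HLA-peptide binding network, which is not reproduced here:

```python
import networkx as nx
from networkx.algorithms.community import (greedy_modularity_communities,
                                           modularity)

# Toy graph with planted modules (stand-in for the HLA-peptide network):
# three 8-node cliques connected into a ring by single rewired edges.
G = nx.connected_caveman_graph(3, 8)

# Fast greedy modularity optimization, as used to identify binding modules;
# the resulting modularity can then be compared to random-network baselines.
communities = greedy_modularity_communities(G)
q = modularity(G, communities)
print(len(communities), round(q, 3))
```

On this toy graph the algorithm should recover the three planted cliques with a clearly positive modularity.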
A Study into the Collision-induced Dissociation (CID) Behavior of Cross-Linked Peptides*
Giese, Sven H.; Fischer, Lutz; Rappsilber, Juri
2016-01-01
Cross-linking/mass spectrometry resolves protein–protein interactions or protein folds with the help of distance constraints. Cross-linkers with specific properties such as isotope-labeled or collision-induced dissociation (CID)-cleavable cross-linkers are in frequent use to simplify the identification of cross-linked peptides. Here, we analyzed the mass spectrometric behavior of 910 unique cross-linked peptides in high-resolution MS1 and MS2 from published data, and validated the observations on a ninefold larger set of currently unpublished data, to explore whether detailed understanding of their fragmentation behavior would allow computational delivery of information that otherwise would be obtained via isotope labels or CID cleavage of cross-linkers. Isotope-labeled cross-linkers reveal cross-linked and linear fragments in fragmentation spectra. We show that fragment mass and charge alone provide this information, alleviating the need for isotope labeling for this purpose. Isotope-labeled cross-linkers also indicate cross-linker-containing, albeit not specifically cross-linked, peptides in MS1. We observed that acquisition can be guided to better than twofold enrich cross-linked peptides, with minimal losses, based on peptide mass and charge alone. With the help of CID-cleavable cross-linkers, individual spectra with only linear fragments can be recorded for each peptide in a cross-link. We show that cross-linked fragments of ordinary cross-linked peptides can be linearized computationally and that a simplified subspectrum can be extracted that is enriched in information on one of the two linked peptides. This allows identifying candidates for this peptide in a simplified database search, as we propose in a search strategy here. We conclude that the specific behavior of cross-linked peptides in mass spectrometers can be exploited to relax the requirements on cross-linkers. PMID:26719564
NASA Astrophysics Data System (ADS)
Gutiérrez, Jose Manuel; Maraun, Douglas; Widmann, Martin; Huth, Radan; Hertig, Elke; Benestad, Rasmus; Roessler, Ole; Wibig, Joanna; Wilcke, Renate; Kotlarski, Sven
2016-04-01
VALUE is an open European network to validate and compare downscaling methods for climate change research (http://www.value-cost.eu). A key deliverable of VALUE is the development of a systematic validation framework to enable the assessment and comparison of both dynamical and statistical downscaling methods. This framework is based on a user-focused validation tree, guiding the selection of relevant validation indices and performance measures for different aspects of the validation (marginal, temporal, spatial, multi-variable). Moreover, several experiments have been designed to isolate specific points in the downscaling procedure where problems may occur (assessment of intrinsic performance, effect of errors inherited from the global models, effect of non-stationarity, etc.). The list of downscaling experiments includes 1) cross-validation with perfect predictors, 2) GCM predictors -aligned with EURO-CORDEX experiment- and 3) pseudo reality predictors (see Maraun et al. 2015, Earth's Future, 3, doi:10.1002/2014EF000259, for more details). The results of these experiments are gathered, validated and publicly distributed through the VALUE validation portal, allowing for a comprehensive community-open downscaling intercomparison study. In this contribution we describe the overall results from Experiment 1), consisting of a European wide 5-fold cross-validation (with consecutive 6-year periods from 1979 to 2008) using predictors from ERA-Interim to downscale precipitation and temperatures (minimum and maximum) over a set of 86 ECA&D stations representative of the main geographical and climatic regions in Europe. As a result of the open call for contribution to this experiment (closed in Dec. 2015), over 40 methods representative of the main approaches (MOS and Perfect Prognosis, PP) and techniques (linear scaling, quantile mapping, analogs, weather typing, linear and generalized regression, weather generators, etc.) 
were submitted, including both data (downscaled values) and metadata (characterizing different aspects of the downscaling methods). This constitutes the largest and most comprehensive intercomparison of statistical downscaling methods to date. Here, we present an overall validation, analyzing marginal and temporal aspects to assess the intrinsic performance and added value of statistical downscaling methods at both annual and seasonal levels. This validation takes into account the different properties/limitations of different approaches and techniques (as reported in the provided metadata) in order to perform a fair comparison. It is pointed out that this experiment alone is not sufficient to evaluate the limitations of (MOS) bias correction techniques. Moreover, it also does not fully validate PP, since it does not reveal whether the right predictors were used and whether the PP assumption is valid. These problems will be analyzed in the subsequent community-open VALUE experiments 2) and 3), which will be open for participation during the present year.
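The Experiment 1 protocol above, a European-wide 5-fold cross-validation over consecutive 6-year periods of 1979-2008, can be sketched as a fold generator. This is a minimal illustration of the splitting scheme as described in the text; function and variable names are ours, not VALUE's.

```python
def consecutive_block_folds(start_year=1979, end_year=2008, n_folds=5):
    """Yield (train_years, test_years) pairs where each test block is a
    contiguous run of years, as in the VALUE Experiment 1 setup."""
    years = list(range(start_year, end_year + 1))
    block = len(years) // n_folds      # 30 years / 5 folds = 6-year blocks
    for k in range(n_folds):
        test = years[k * block:(k + 1) * block]
        train = [y for y in years if y not in test]
        yield train, test

for train, test in consecutive_block_folds():
    assert len(test) == 6 and len(train) == 24
```

Contiguous blocks (rather than random years) keep the temporal autocorrelation of climate series from leaking between training and test periods.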
NASA Astrophysics Data System (ADS)
Cheng, Zihao; Campbell, Robert E.
2007-02-01
Binding proteins suitable for expression and high affinity molecular recognition in the cytoplasm or nucleus of live cells have numerous applications in the biological sciences. In an effort to add a new minimal motif to the growing repertoire of validated non-immunoglobulin binding proteins, we have undertaken the development of a generic protein scaffold based on a single β-hairpin that can fold efficiently in the cytoplasm. We have developed a method, based on the measurement of fluorescence resonance energy transfer (FRET) between a genetically fused cyan fluorescent protein (CFP) and yellow fluorescent protein (YFP), that allows the structural stability of recombinant β-hairpin peptides to be rapidly assessed both in vitro and in vivo. We have previously reported the validation of this method when applied to a 16mer tryptophan zipper β-hairpin. We now describe the use of this method to evaluate the potential of a designed 20mer β-hairpin peptide with a 3rd Trp/Trp cross-strand pair to function as a generic protein scaffold. Quantitative analysis of the FRET efficiency, resistance to proteolysis (assayed by loss of FRET), and circular dichroism spectra revealed that the 20mer peptide is significantly more tolerant of destabilizing mutations than the 16mer peptide. Furthermore, we experimentally demonstrate that the in vitro determined β-hairpin stabilities are well correlated with in vivo β-hairpin stabilities as determined by FRET measurements of colonies of live bacteria expressing the recombinant peptides flanked by CFP and YFP. Finally, we report on our progress to develop highly folded 24mer and 28mer β-hairpin peptides through the use of fluorescence-based library screening.
Tunable mechanical stability and deformation response of a resilin-based elastomer.
Li, Linqing; Teller, Sean; Clifton, Rodney J; Jia, Xinqiao; Kiick, Kristi L
2011-06-13
Resilin, the highly elastomeric protein found in specialized compartments of most arthropods, possesses superior resilience and excellent high-frequency responsiveness. Enabled by biosynthetic strategies, we have designed and produced a modular, recombinant resilin-like polypeptide bearing both mechanically active and biologically active domains to create novel biomaterial microenvironments for engineering mechanically active tissues such as blood vessels, cardiovascular tissues, and vocal folds. Preliminary studies revealed that these recombinant materials exhibit promising mechanical properties and support the adhesion of NIH 3T3 fibroblasts. In this Article, we detail the characterization of the dynamic mechanical properties of these materials, as assessed via dynamic oscillatory shear rheology at various protein concentrations and cross-linking ratios. Simply by varying the polypeptide concentration and cross-linker ratios, the storage modulus G' can be easily tuned within the range of 500 Pa to 10 kPa. Stress-strain cycles and resilience measurements were probed via standard tensile testing methods and indicated the excellent resilience (>90%) of these materials, even when the mechanically active domains are intercepted by nonmechanically active biological cassettes. The mechanical properties of these materials were further evaluated with a custom-designed torsional wave apparatus (TWA) at frequencies close to human phonation, indicating elastic modulus values from 200 to 2500 Pa, which is within the range of experimental data collected on excised porcine and human vocal fold tissues. The results validate the outstanding mechanical properties of the engineered materials, which are highly comparable to the mechanical properties of targeted vocal fold tissues.
The ease of production of these biologically active materials, coupled to their outstanding mechanical properties over a range of compositions, suggests their potential in tissue regeneration applications.
Fronto-Temporal Connectivity Predicts ECT Outcome in Major Depression.
Leaver, Amber M; Wade, Benjamin; Vasavada, Megha; Hellemann, Gerhard; Joshi, Shantanu H; Espinoza, Randall; Narr, Katherine L
2018-01-01
Electroconvulsive therapy (ECT) is arguably the most effective available treatment for severe depression. Recent studies have used MRI data to predict clinical outcome to ECT and other antidepressant therapies. One challenge facing such studies is selecting from among the many available metrics, which characterize complementary and sometimes non-overlapping aspects of brain function and connectomics. Here, we assessed the ability of aggregated, functional MRI metrics of basal brain activity and connectivity to predict antidepressant response to ECT using machine learning. A radial support vector machine was trained using arterial spin labeling (ASL) and blood-oxygen-level-dependent (BOLD) functional magnetic resonance imaging (fMRI) metrics from n = 46 (26 female, mean age 42) depressed patients prior to ECT (majority right-unilateral stimulation). Image preprocessing was applied using standard procedures, and metrics included cerebral blood flow in ASL, and regional homogeneity, fractional amplitude of low-frequency modulations, and graph theory metrics (strength, local efficiency, and clustering) in BOLD data. A 5-repeated 5-fold cross-validation procedure with nested feature-selection validated model performance. Linear regressions were applied post hoc to aid interpretation of discriminative features. The range of balanced accuracy in models performing statistically above chance was 58-68%. Here, prediction of non-responders was slightly higher than for responders (maximum performance 74 and 64%, respectively). Several features were consistently selected across cross-validation folds, mostly within frontal and temporal regions. Among these were connectivity strength among: a fronto-parietal network [including left dorsolateral prefrontal cortex (DLPFC)], motor and temporal networks (near ECT electrodes), and/or subgenual anterior cingulate cortex (sgACC). 
Our data indicate that pattern classification of multimodal fMRI metrics can successfully predict ECT outcome, particularly for individuals who will not respond to treatment. Notably, connectivity with networks highly relevant to ECT and depression was consistently selected among the important predictive features. These included the left DLPFC and the sgACC, which are both targets of other neurostimulation therapies for depression, as well as connectivity between motor and right temporal cortices near electrode sites. Future studies that probe additional functional and structural MRI metrics and other patient characteristics may further improve the predictive power of these and similar models.
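The "5-repeated 5-fold" validation and the balanced-accuracy summary used above can be sketched in plain Python. The index-assignment scheme and names are illustrative; the study ran nested feature selection inside each fold, which is omitted here for brevity.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=5, seed=0):
    """Reshuffle the sample indices for each repeat, then deal them
    into k folds; yields (train, test) index lists, k*repeats in total."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]                     # every k-th shuffled index
            held = set(test)
            train = [i for i in idx if i not in held]
            yield train, test

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls -- robust to class imbalance, which
    matters when responders and non-responders are unequal in number."""
    recalls = []
    for c in set(y_true):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        total = sum(1 for t in y_true if t == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)
```

For n = 46 patients this yields 25 train/test splits whose scores are then averaged.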
Circuit topology of self-interacting chains: implications for folding and unfolding dynamics.
Mugler, Andrew; Tans, Sander J; Mashaghi, Alireza
2014-11-07
Understanding the relationship between molecular structure and folding is a central problem in disciplines ranging from biology to polymer physics and DNA origami. Topology can be a powerful tool to address this question. For a folded linear chain, the arrangement of intra-chain contacts is a topological property because rearranging the contacts requires discontinuous deformations. Conversely, the topology is preserved when continuously stretching the chain while maintaining the contact arrangement. Here we investigate how the folding and unfolding of linear chains with binary contacts is guided by the topology of contact arrangements. We formalize the topology by describing the relations between any two contacts in the structure, which for a linear chain can either be in parallel, in series, or crossing each other. We show that even when other determinants of folding rate such as contact order and size are kept constant, this 'circuit' topology determines folding kinetics. In particular, we find that the folding rate increases with the fractions of parallel and crossed relations. Moreover, we show how circuit topology constrains the conformational phase space explored during folding and unfolding: the number of forbidden unfolding transitions is found to increase with the fraction of parallel relations and to decrease with the fraction of series relations. Finally, we find that circuit topology influences whether distinct intermediate states are present, with crossed contacts being the key factor. The approach presented here can be more generally applied to questions on molecular dynamics, evolutionary biology, molecular engineering, and single-molecule biophysics.
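The three pairwise contact relations that define circuit topology (series, parallel, crossing) are easy to formalize when contacts are given as index pairs on a linear chain. The sketch below assumes contacts share no endpoints; the function name is ours.

```python
def relation(c1, c2):
    """Classify the circuit-topology relation between two contacts
    (i, j), i < j, on a linear chain: 'series' when the intervals are
    disjoint, 'parallel' when one is nested inside the other, and
    'cross' when they interleave."""
    (a, b), (c, d) = sorted([c1, c2])   # order by left endpoint: a <= c
    if b < c:                           # [a,b] ends before [c,d] starts
        return 'series'
    if d < b:                           # [c,d] nested inside [a,b]
        return 'parallel'
    return 'cross'                      # a < c < b < d
```

Counting the fractions of each relation over all contact pairs in a structure gives the quantities (fractions of parallel, series, and crossed relations) that the paper relates to folding rates and forbidden transitions.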
Decorrelation of the true and estimated classifier errors in high-dimensional settings.
Hanczar, Blaise; Hua, Jianping; Dougherty, Edward R
2007-01-01
The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity, which refers to the precision of error estimation, is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three commonly used error estimators (leave-one-out cross-validation, k-fold cross-validation, and the .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes.
We observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features; the relative correlation between the latter two shows no general trend and differs across models.
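The variance decomposition underlying the argument above can be written directly: Var(est − true) = Var(est) + Var(true) − 2ρ·SD(est)·SD(true), so for fixed variances the precision of error estimation degrades as the correlation ρ between estimated and true error drops. A one-function sketch (names are ours):

```python
import math

def deviation_variance(var_est, var_true, rho):
    """Variance of the deviation (estimated minus true error) given the
    two variances and their correlation rho. With rho = 1 and equal
    variances the deviation has zero variance; with rho = 0 the
    variances simply add -- the 'decorrelating effect' in the abstract."""
    return var_est + var_true - 2 * rho * math.sqrt(var_est * var_true)

# Perfectly correlated errors vs uncorrelated errors of unit variance:
print(deviation_variance(1.0, 1.0, 1.0), deviation_variance(1.0, 1.0, 0.0))
```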
NASA Astrophysics Data System (ADS)
Koestel, John; Bechtold, Michel; Jorda, Helena; Jarvis, Nicholas
2015-04-01
The saturated and near-saturated hydraulic conductivity of soil is of key importance for modelling water and solute fluxes in the vadose zone. Hydraulic conductivity measurements are cumbersome at the Darcy scale and practically impossible at larger scales where water and solute transport models are mostly applied. Hydraulic conductivity must therefore be estimated from proxy variables. Such pedotransfer functions are known to work decently well for e.g. water retention curves but rather poorly for near-saturated and saturated hydraulic conductivities. Recently, Weynants et al. (2009, Revisiting Vereecken pedotransfer functions: Introducing a closed-form hydraulic model. Vadose Zone Journal, 8, 86-95) reported a coefficient of determination of 0.25 (validation with an independent data set) for the saturated hydraulic conductivity from lab-measurements of Belgian soil samples. In our study, we trained boosted regression trees on a global meta-database containing tension-disk infiltrometer data (see Jarvis et al. 2013. Influence of soil, land use and climatic factors on the hydraulic conductivity of soil. Hydrology & Earth System Sciences, 17, 5185-5195) to predict the saturated hydraulic conductivity (Ks) and the conductivity at a tension of 10 cm (K10). We found coefficients of determination of 0.39 and 0.62 under a simple 10-fold cross-validation for Ks and K10. When carrying out the validation folded over the data sources, i.e. the source publications, we found that the corresponding coefficients of determination reduced to 0.15 and 0.36, respectively. We conclude that the stricter source-wise cross-validation should be applied in future pedotransfer studies to prevent overly optimistic validation results. The boosted regression trees also allowed for an investigation of relevant predictors for estimating the near-saturated hydraulic conductivity. We found that land use and bulk density were most important to predict Ks.
We also observed that Ks is large in fine- and coarse-textured soils and smaller in medium-textured soils. Completely different predictors were important for appraising K10, where the soil macropore system is air-filled and therefore inactive. Here, the average annual temperature and precipitation were most important. The reasons for this are unclear and require further research. The clay content and the organic matter content were also important predictors of K10. We suggest that a larger and more complete database may help to improve the prediction of K10, whereas it may be more fruitful to estimate Ks statistics of sampling sites instead of individual values, since Ks is highly variable over very short distances.
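The "source-wise" validation contrasted with simple 10-fold cross-validation above amounts to grouped, leave-one-source-out splitting: every sample from a given publication lands entirely in the training or entirely in the test set. A minimal sketch, with illustrative names:

```python
def source_wise_folds(sources):
    """Leave-one-source-out folds: each data source (e.g. a source
    publication) forms one test fold, so no source contributes samples
    to both training and test -- the stricter validation advocated above."""
    for s in sorted(set(sources)):
        test = [i for i, src in enumerate(sources) if src == s]
        train = [i for i, src in enumerate(sources) if src != s]
        yield train, test
```

With ordinary k-fold splitting, two samples from the same publication (same lab, same method, similar soils) can end up on opposite sides of the split, which inflates the apparent skill exactly as the drop from 0.39/0.62 to 0.15/0.36 illustrates.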
Cao, D-S; Zhao, J-C; Yang, Y-N; Zhao, C-X; Yan, J; Liu, S; Hu, Q-N; Xu, Q-S; Liang, Y-Z
2012-01-01
There is a great need to assess the harmful effects or toxicities of chemicals to which man is exposed. In the present paper, the simplified molecular input line entry specification (SMILES) representation-based string kernel, together with the state-of-the-art support vector machine (SVM) algorithm, was used to classify the toxicity of chemicals from the US Environmental Protection Agency Distributed Structure-Searchable Toxicity (DSSTox) database network. In this method, the molecular structure can be directly encoded by a series of SMILES substrings that represent the presence of chemical elements and different kinds of chemical bonds (double, triple and stereochemistry) in the molecules. Thus, the SMILES string kernel can accurately and directly measure the similarities of molecules via local information hidden in the molecules. Two model validation approaches, five-fold cross-validation and an independent validation set, were used for assessing the predictive capability of our developed models. The results obtained indicate that SVM based on the SMILES string kernel can be regarded as a very promising alternative modelling approach for potential toxicity prediction of chemicals.
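The SMILES string kernel idea, measuring molecular similarity through shared substrings of the SMILES text itself, can be sketched as a simple spectrum kernel over substring counts. This is a toy version with our own names; the paper's kernel may weight or select substrings differently.

```python
from collections import Counter

def substring_counts(smiles, max_len=3):
    """Count all substrings of the SMILES string up to length max_len;
    each substring acts as an implicit feature (atom symbols, bond
    characters, short local patterns)."""
    c = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(smiles) - n + 1):
            c[smiles[i:i + n]] += 1
    return c

def string_kernel(s1, s2, max_len=3):
    """Unnormalized spectrum kernel: inner product of the two
    substring-count vectors. Symmetric and positive semidefinite,
    so it can be fed directly to a kernel SVM."""
    c1, c2 = substring_counts(s1, max_len), substring_counts(s2, max_len)
    return sum(c1[k] * c2[k] for k in c1 if k in c2)
```

An SVM then operates entirely through such pairwise kernel values, never needing explicit molecular descriptors.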
NASA Astrophysics Data System (ADS)
Reif, Daniel; Grasemann, Bernhard; Lockhart, Duncan
2010-05-01
The Zagros fold-and-thrust belt has formed in detached Phanerozoic sedimentary cover rocks above a shortened crystalline Precambrian basement and evolved through the Late Cretaceous to Miocene collision between the Arabian and Eurasian plates, during which the Neotethys oceanic basin was closed. Deformation is partitioned into SW-directed folding and thrusting of the sediments and NW-SE to N-S trending dextral strike-slip faults. The sub-cylindrical, doubly-plunging fold trains with wavelengths of 5-10 km host more than half of the world's hydrocarbon reserves in mostly anticlinal traps. Generally the Zagros is divided into three NW-SE striking tectonic units: the Zagros Imbricate Zone, the Zagros Simply Folded Belt and the Zagros Foredeep. This work presents a balanced cross-section through the Simply Folded Belt, NE of the city of Erbil (Kurdistan, Iraq). The regional stratigraphy comprises mainly Cretaceous to Cenozoic folded sediments consisting of massive carbonate rocks (limestones, dolomites) that reacted as competent layers during folding, compared to the incompetent behavior of interlayered siltstones, claystones and marls. Although the overall security situation in Kurdistan is much better than in the rest of Iraq, structural field mapping was restricted to asphalt streets, mainly because of the contamination of the area with landmines and unexploded ordnance. In order to extend the structural measurements statistically over the investigated area, we used a newly developed software tool (www.terramath.com) for interactive structural mapping of spatial orientations (i.e. dip directions and dip angles) of the sedimentary bedding from digital elevation models. Structural field data and computed measurements were integrated and projected into NE-SW striking balanced cross-sections perpendicular to the regional trend of the fold axes. We used the software LithoTect (www.geologicsystems.com) for the restoration of the cross-sections.
Depending on the interpretation of the shape of the synclines, which are not exposed and covered by Neogene sediments, the shortening is in the order of 10-20%. The restoration confirms that large scale faulting is only of minor importance in the Simply Folded Belt.
Li, Yunhai; Lee, Kee Khoon; Walsh, Sean; Smith, Caroline; Hadingham, Sophie; Sorefan, Karim; Cawley, Gavin; Bevan, Michael W
2006-03-01
Establishing transcriptional regulatory networks by analysis of gene expression data and promoter sequences shows great promise. We developed a novel promoter classification method using a Relevance Vector Machine (RVM) and Bayesian statistical principles to identify discriminatory features in the promoter sequences of genes that can correctly classify transcriptional responses. The method was applied to microarray data obtained from Arabidopsis seedlings treated with glucose or abscisic acid (ABA). Of those genes showing >2.5-fold changes in expression level, approximately 70% were correctly predicted as being up- or down-regulated (under 10-fold cross-validation), based on the presence or absence of a small set of discriminative promoter motifs. Many of these motifs have known regulatory functions in sugar- and ABA-mediated gene expression. One promoter motif that was not known to be involved in glucose-responsive gene expression was identified as the strongest classifier of glucose-up-regulated gene expression. We show it confers glucose-responsive gene expression in conjunction with another promoter motif, thus validating the classification method. We were able to establish a detailed model of glucose and ABA transcriptional regulatory networks and their interactions, which will help us to understand the mechanisms linking metabolism with growth in Arabidopsis. This study shows that machine learning strategies coupled to Bayesian statistical methods hold significant promise for identifying functionally significant promoter sequences.
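The classifier above operates on the presence or absence of discriminative promoter motifs; building such binary features from promoter sequences is straightforward. The motif strings below are invented for illustration, not the glucose- or ABA-responsive motifs identified in the study.

```python
def motif_features(promoters, motifs):
    """Binary presence/absence feature matrix: one row per promoter
    sequence, one column per candidate motif. This is the kind of
    input a promoter classifier (RVM here, but any classifier) uses."""
    return [[1 if m in seq else 0 for m in motifs] for seq in promoters]

# Hypothetical promoters and hypothetical candidate motifs:
X = motif_features(["ACGTGG", "TTTT"], ["CGT", "TTT"])
```

A sparse Bayesian method such as the RVM then drives most motif weights to zero, leaving the small discriminative set the abstract describes.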
Makris, Eleftherios A.; Responte, Donald J.; Hu, Jerry C.; Athanasiou, Kyriacos A.
2014-01-01
The inability to recapitulate native tissue biomechanics, especially tensile properties, hinders progress in regenerative medicine. To address this problem, strategies have focused on enhancing collagen production. However, manipulating collagen cross-links, ubiquitous throughout all tissues and conferring mechanical integrity, has been underinvestigated. A series of studies examined the effects of lysyl oxidase (LOX), the enzyme responsible for the formation of collagen cross-links. Hypoxia-induced endogenous LOX was applied in multiple musculoskeletal tissues (i.e., cartilage, meniscus, tendons, ligaments). Results of these studies showed that both native and engineered tissues are enhanced by invoking a mechanism of hypoxia-induced pyridinoline (PYR) cross-links via intermediaries like LOX. Hypoxia was shown to enhance PYR cross-linking 1.4- to 6.4-fold and, concomitantly, to increase the tensile properties of collagen-rich tissues 1.3- to 2.2-fold. Direct administration of exogenous LOX was applied in native cartilage and neocartilage generated using a scaffold-free, self-assembling process of primary chondrocytes. Exogenous LOX was found to enhance native tissue tensile properties 1.9-fold. LOX concentration- and time-dependent increases in PYR content (∼16-fold compared with controls) and tensile properties (approximately fivefold compared with controls) of neocartilage were also detected, resulting in properties on par with native tissue. Finally, in vivo subcutaneous implantation of LOX-treated neocartilage in nude mice promoted further maturation of the neotissue, enhancing tensile properties and PYR content approximately threefold and 14-fold, respectively, compared with in vitro controls.
Collectively, these results provide the first report, to our knowledge, of endogenous (hypoxia-induced) and exogenous LOX applications for promoting collagen cross-linking and improving the tensile properties of a spectrum of native and engineered tissues both in vitro and in vivo. PMID:25349395
NASA Astrophysics Data System (ADS)
Dash, Jatindra K.; Kale, Mandar; Mukhopadhyay, Sudipta; Khandelwal, Niranjan; Prabhakar, Nidhi; Garg, Mandeep; Kalra, Naveen
2017-03-01
In this paper, we investigate the effect of the error criterion used during the training phase of an artificial neural network (ANN) on the accuracy of the classifier for classification of lung tissues affected by Interstitial Lung Diseases (ILD). The mean square error (MSE) and cross-entropy (CE) criteria are chosen, being the most popular choices in state-of-the-art implementations. The classification experiment was performed on six ILD patterns from the MedGIFT database, viz., Consolidation, Emphysema, Ground Glass Opacity, Micronodules, Fibrosis, and Healthy. The texture features from an arbitrary region of interest (AROI) are extracted using Gabor filters. Two different neural networks are trained with the scaled conjugate gradient backpropagation algorithm, using the MSE and CE error criterion functions, respectively, for weight updates. Performance is evaluated in terms of the average accuracy of these classifiers using 4-fold cross-validation. Each network is trained five times for each fold with randomly initialized weight vectors, and accuracies are computed. A significant improvement in classification accuracy is observed when the ANN is trained using CE (67.27%) as the error function compared to MSE (63.60%). Moreover, the standard deviation of the classification accuracy is lower for the network trained with the CE criterion (6.69) than for the one trained with MSE (10.32).
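The difference between the two error criteria is easy to demonstrate: for a confidently wrong prediction, cross-entropy grows without bound while MSE saturates near its maximum, which is one intuition for the stronger training signal (and here, better accuracy) obtained with CE. A minimal sketch with illustrative probability values:

```python
import math

def mse(p, y):
    """Mean squared error between predicted probabilities p and a
    one-hot target y."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)

def cross_entropy(p, y):
    """Cross-entropy for a one-hot target: -log of the probability
    assigned to the true class."""
    return -sum(yi * math.log(pi) for pi, yi in zip(p, y) if yi > 0)

# A confidently wrong prediction: the true class gets probability 0.01.
p, y = [0.01, 0.99], [1.0, 0.0]
# MSE is bounded (~0.98 here), while CE = -ln(0.01) ~ 4.6 and diverges
# as the true-class probability approaches zero.
print(mse(p, y), cross_entropy(p, y))
```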
Model selection for pion photoproduction
Landay, J.; Doring, M.; Fernandez-Ramirez, C.; ...
2017-01-12
Partial-wave analysis of meson- and photon-induced reactions is needed to enable the comparison of many theoretical approaches to data. In both energy-dependent and energy-independent parametrizations of partial waves, the selection of the model amplitude is crucial. Principles of the S matrix are implemented to a different degree in different approaches; but an often-overlooked aspect concerns the selection of undetermined coefficients and functional forms for fitting, leading to a minimal yet sufficient parametrization. We present an analysis of low-energy neutral pion photoproduction using the least absolute shrinkage and selection operator (LASSO) in combination with criteria from information theory and K-fold cross-validation. These methods are not yet widely known in the analysis of excited hadrons but will become relevant in the era of precision spectroscopy. The principle is first illustrated with synthetic data; then, its feasibility for real data is demonstrated by analyzing the latest available measurements of differential cross sections (dσ/dΩ), photon-beam asymmetries (Σ), and target asymmetry differential cross sections (dσT/dΩ ≡ T dσ/dΩ) in the low-energy regime.
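LASSO fits of the kind used above for coefficient selection reduce to iterating a soft-thresholding (shrink-and-clip) step that drives small coefficients exactly to zero; a tiny coordinate-descent sketch shows the mechanism. This is a generic illustration, not the parametrization or data of the analysis.

```python
def soft_threshold(z, t):
    """The LASSO proximal step: shrink z toward zero by t, clipping
    anything inside [-t, t] to exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Tiny coordinate-descent LASSO for the objective
    (1/2)||y - X beta||^2 + lam * ||beta||_1 (lists instead of arrays
    to stay dependency-free; purely illustrative)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            denom = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / denom
    return beta
```

The penalty lam plays the role of the selection knob that information criteria or K-fold cross-validation then tune.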
A multiscale decomposition approach to detect abnormal vasculature in the optic disc.
Agurto, Carla; Yu, Honggang; Murray, Victor; Pattichis, Marios S; Nemeth, Sheila; Barriga, Simon; Soliz, Peter
2015-07-01
This paper presents a multiscale method to detect neovascularization in the optic disc (NVD) using fundus images. Our method is applied to a manually selected region of interest (ROI) containing the optic disc. All the vessels in the ROI are segmented by adaptively combining contrast enhancement methods with a vessel segmentation technique. Textural features extracted using multiscale amplitude-modulation frequency-modulation, morphological granulometry, and fractal dimension are used. A linear SVM is used to perform the classification, which is tested by means of 10-fold cross-validation. The performance is evaluated using 300 images, achieving an AUC of 0.93 with a maximum accuracy of 88%.
Li, Jing; Hong, Wenxue
2014-12-01
The feature extraction and feature selection are important issues in pattern recognition. Based on the geometric algebra representation of vectors, a new feature extraction method using the blade coefficients of geometric algebra was proposed in this study. At the same time, an improved differential evolution (DE) feature selection method was proposed to address the resulting high dimensionality. Simple linear discriminant analysis was used as the classifier. The 10-fold cross-validation (10-CV) classification accuracy on a public breast cancer biomedical dataset exceeded 96%, superior to that of the original features and a traditional feature extraction method.
NASA Astrophysics Data System (ADS)
Barone, Fabrizio; Giordano, Gerardo
2018-02-01
We present the Extended Folded Pendulum Model (EFPM), a model developed for a quantitative description of the dynamical behavior of a folded pendulum generically oriented in space. This model, based on the Tait-Bryan angular reference system, highlights the relationship between the folded pendulum's orientation in the gravitational field and its natural resonance frequency. The model, validated by tests performed with a monolithic UNISA folded pendulum, highlights a new technique for implementing folded-pendulum-based tiltmeters.
Development of a QSAR Model for Thyroperoxidase Inhibition ...
Thyroid hormones (THs) are involved in multiple biological processes and are critical modulators of fetal development. Even moderate changes in maternal or fetal TH levels can produce irreversible neurological deficits in children, such as lower IQ. The enzyme thyroperoxidase (TPO) plays a key role in the synthesis of THs, and inhibition of TPO by xenobiotics results in decreased TH synthesis. Recently, a high-throughput screening assay for TPO inhibition (AUR-TPO) was developed and used to test the ToxCast Phase I and II chemicals. In the present study, we used the results from AUR-TPO to develop a Quantitative Structure-Activity Relationship (QSAR) model for TPO inhibition. The training set consisted of 898 discrete organic chemicals: 134 inhibitors and 764 non-inhibitors. A five-times two-fold cross-validation of the model was performed, yielding a balanced accuracy of 78.7%. More recently, an additional ~800 chemicals were tested in the AUR-TPO assay. These data were used for a blinded external validation of the QSAR model, demonstrating a balanced accuracy of 85.7%. Overall, the cross- and external validation indicate a robust model with high predictive performance. Next, we used the QSAR model to predict 72,526 REACH pre-registered substances. The model could predict 49.5% (35,925) of the substances in its applicability domain, and of these, 8,863 (24.7%) were predicted to be TPO inhibitors. Predictions from this screening can be used in a tiered approach to
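The "five times two-fold" cross-validation mentioned above splits the data in half five times, using each half once for training and once for validation, yielding ten train/validate pairs. A minimal sketch of the splitting scheme (names are ours):

```python
import random

def five_by_two_cv(n_samples, seed=0):
    """5x2 cross-validation: five independent shuffles, each split into
    two halves; every half serves once as training and once as
    validation set, giving ten (train, validate) index pairs."""
    rng = random.Random(seed)
    for _ in range(5):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        half = n_samples // 2
        a, b = idx[:half], idx[half:]
        yield a, b   # train on a, validate on b
        yield b, a   # and vice versa
```

Each split uses only half the data for training, which makes 5x2 CV a comparatively pessimistic (and hence conservative) estimate of model performance.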
Glavatskikh, Marta; Madzhidov, Timur; Solov'ev, Vitaly; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre
2016-12-01
In this work, we report QSPR modeling of the free energy ΔG of 1:1 hydrogen-bond complexes of different H-bond acceptors and donors. The modeling was performed on a large and structurally diverse set of 3373 complexes featuring a single hydrogen bond, for which ΔG was measured at 298 K in CCl4. The models were prepared using Support Vector Machine and Multiple Linear Regression, with ISIDA fragment descriptors. The marked-atoms strategy was applied at the fragmentation stage, in order to capture the location of H-bond donor and acceptor centers. Different strategies of model validation have been suggested, including the targeted omission of individual H-bond acceptors and donors from the training set, in order to check whether the predictive ability of the model is not limited to the interpolation of H-bond strength between two already encountered partners. Successfully cross-validating individual models were combined into a consensus model and challenged to predict external test sets of 629 and 12 complexes, in which donor and acceptor formed single and cooperative H-bonds, respectively. In all cases, SVM models outperform MLR. The SVM consensus model performs well both in 3-fold cross-validation (RMSE=1.50 kJ/mol) and on the external test sets containing complexes with single (RMSE=3.20 kJ/mol) and cooperative H-bonds (RMSE=1.63 kJ/mol).
Characterization of Asian Corn Borer Resistance to Bt Toxin Cry1Ie.
Wang, Yueqin; Yang, Jing; Quan, Yudong; Wang, Zhenying; Cai, Wanzhi; He, Kanglai
2017-06-07
A strain of the Asian corn borer (ACB), Ostrinia furnacalis (Guenée), has evolved >800-fold resistance to Cry1Ie (ACB-IeR) after 49 generations of selection. The inheritance pattern of resistance to Cry1Ie in the ACB-IeR strain and its cross-resistance to other Bt toxins were determined through bioassay by exposing neonates from genetic crosses to toxins incorporated into the diet. The responses of progenies from reciprocal F₁ crosses were similar (LC₅₀s: 76.07 vs. 74.32 μg/g), which suggested the resistance was autosomal. The effective dominance (h) decreased as the concentration of Cry1Ie increased; resistance was nearly or incompletely recessive on Cry1Ie maize leaf tissue (h = 0.02), but nearly or incompletely dominant (h = 0.98) on Cry1Ie maize silk. Bioassay of the backcross suggested that the resistance was controlled by more than one locus. In addition, the resistant strain did not exhibit cross-resistance to Cry1Ab (0.8-fold), Cry1Ac (0.8-fold), Cry1F (0.9-fold), and Cry1Ah (1.0-fold). The present study not only informs resistance management, but also suggests that Cry1Ie would be an appropriate candidate for expression with Cry1Ab, Cry1Ac, Cry1F, or Cry1Ah in the development of Bt maize.
Adaptation and validation of Mandarin Chinese version of the pediatric Voice Handicap Index (pVHI).
Lu, Dan; Huang, Mengjie; Li, Zhen; Yiu, Edwin M-L; Cheng, Ivy K-Y; Yang, Hui; Ma, Estella P-M
2018-01-01
The aim of this study was to adapt the English version of the pediatric Voice Handicap Index (pVHI) into Mandarin Chinese and validate it. METHODS: A cross-sectional study was performed from May 2016 to April 2017. A total of 367 parents participated in this study, and 338 parents completed the translated questionnaire without missing data, including 213 parents of children with voice disorders (patient group), and 125 parents of children without voice disorders (control group). The internal consistency, test-retest reliability, content validity, construct validity, clinical validity, and cutoff point were calculated. The most common voice disorder in the patient group was vocal fold nodules (77.9%), followed by chronic laryngitis (18.8%), and vocal fold polyps (3.3%). The prevalence of voice disorders was higher in boys (67.1%) than girls (32.9%). The most common vocal misuse and abuse habit was shouting loudly (n = 186, 87.3%), followed by speaking for a long time (n = 158, 74.2%), and crying loudly (n = 99, 46.5%). The internal consistency of the Mandarin Chinese version of pVHI was excellent in the patient group (Cronbach α = 0.95). The intraclass correlation coefficient indicated strong test-retest reliability (ICC = 0.99). The principal-component analysis demonstrated three factors with eigenvalues greater than 1, and the cumulative proportion was 66.23%. The mean total scores and mean subscale scores were significantly higher in the patient group than the control group (p < 0.05). The physical domain had the highest mean score among the three subscales (functional, physical and emotional) in the patient group. The optimal cutoff point of the Mandarin Chinese version of pVHI was 9.5 points, with a sensitivity of 80.3% and a specificity of 84.8%. The Mandarin Chinese version of pVHI is a reliable and valid tool to assess parents' perception of their children's voice disorders.
It is recommended that it can be used as a screening tool for discriminating between children with and without dysphonia. Copyright © 2017 Elsevier B.V. All rights reserved.
Predicting protein-binding RNA nucleotides with consideration of binding partners.
Tuvshinjargal, Narankhuu; Lee, Wook; Park, Byungkyu; Han, Kyungsook
2015-06-01
In recent years several computational methods have been developed to predict RNA-binding sites in proteins. Most of these methods do not consider the interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in proteins, the problem of predicting protein-binding sites in RNA has received little attention, mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and a Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross-validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, an NPV of 84.2% and an MCC of 0.48 in independent testing. For comparison, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10-fold cross-validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, an NPV of 92.2% and an MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, an NPV of 85.2% and an MCC of 0.45.
In both cross-validations and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
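All of the performance measures quoted above derive from the entries of a 2×2 confusion matrix. A minimal sketch of their definitions (the counts below are illustrative placeholders, not the paper's data):

```python
import numpy as np

def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, PPV, NPV and MCC
    from the four cells of a binary confusion matrix."""
    sens = tp / (tp + fn)   # sensitivity (true positive rate)
    spec = tn / (tn + fp)   # specificity (true negative rate)
    ppv = tp / (tp + fp)    # positive predictive value
    npv = tn / (tn + fn)    # negative predictive value
    # Matthews correlation coefficient: balanced even for skewed classes
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return sens, spec, ppv, npv, mcc

# Illustrative counts only:
sens, spec, ppv, npv, mcc = binary_metrics(tp=86, fp=14, tn=86, fn=14)
```

Unlike accuracy, MCC stays informative when the positive and negative classes are imbalanced, which is why it is reported alongside sensitivity and specificity here.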
Using deep learning for detecting gender in adult chest radiographs
NASA Astrophysics Data System (ADS)
Xue, Zhiyun; Antani, Sameer; Long, L. Rodney; Thoma, George R.
2018-03-01
In this paper, we present a method for automatically identifying the gender of an imaged person from their frontal chest x-ray images. Our work is motivated by the need to determine missing gender information in some datasets. The proposed method employs convolutional neural network (CNN) based deep learning and transfer learning to overcome the challenge of developing handcrafted features with limited data. Specifically, the method consists of four main steps: pre-processing, CNN feature extraction, feature selection, and classification. The method is tested on a combined dataset obtained from several sources with varying acquisition quality, with different pre-processing steps applied for each. For feature extraction, we tested and compared four CNN architectures, viz., AlexNet, VggNet, GoogLeNet, and ResNet. We applied a feature selection technique, since the feature length is larger than the number of images. Two popular classifiers, SVM and Random Forest, are used and compared. We evaluated the classification performance by cross-validation using seven performance measures. The best performer is the VggNet-16 feature extractor with the SVM classifier, with an accuracy of 86.6% and an ROC area of 0.932 under 5-fold cross-validation. We also discuss several misclassified cases and describe future work for performance improvement.
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial basis function (RBF) kernels, non-parametric models of relative blood volume (RBV) change with time, as well as percentage change in HR with respect to RBV, were obtained. The ε-insensitive loss function was used for SVR modeling. Selection of the design parameters, which include the capacity (C), the insensitivity region (ε) and the RBF kernel parameter (σ), was made using a grid search approach, and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model of RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) and testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training and testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
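The grid search over (C, ε, γ) with k-fold cross-validated mean squared error can be sketched with scikit-learn's SVR; the synthetic time series and the parameter grid below are illustrative assumptions, not the study's data or values:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for a relative-blood-volume (RBV) time course:
# an exponential decline with measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 120).reshape(-1, 1)  # hours of dialysis
rbv = -3.0 * (1 - np.exp(-t.ravel())) + rng.normal(0, 0.2, t.shape[0])

# Grid search over capacity C, insensitivity epsilon, and the RBF width
# (scikit-learn parameterizes the kernel by gamma rather than sigma),
# scored by mean squared error under k-fold cross-validation.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "epsilon": [0.01, 0.1], "gamma": [0.1, 1.0]},
    scoring="neg_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(t, rbv)
best_mse = -grid.best_score_  # average test-fold MSE of the selected model
```

The selected `grid.best_params_` plays the role of the (C, ε, σ) triple chosen in the paper, and `best_mse` is the cross-validated analogue of the AMSE.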
Stapanian, Martin A.; Adams, Jean V.; Fennessy, M. Siobhan; Mack, John; Micacchion, Mick
2013-01-01
A persistent question among ecologists and environmental managers is whether constructed wetlands are structurally or functionally equivalent to naturally occurring wetlands. We examined 19 variables collected from 10 constructed and nine natural emergent wetlands in Ohio, USA. Our primary objective was to identify candidate indicators of wetland class (natural or constructed), based on measurements of soil properties and an index of vegetation integrity, that can be used to track the progress of constructed wetlands toward a natural state. The method of nearest shrunken centroids was used to find a subset of variables that would serve as the best classifiers of wetland class, and error rate was calculated using a five-fold cross-validation procedure. The shrunken differences of percent total organic carbon (% TOC) and percent dry weight of the soil exhibited the greatest distances from the overall centroid. Classification based on these two variables yielded a misclassification rate of 11% based on cross-validation. Our results indicate that % TOC and percent dry weight can be used as candidate indicators of the status of emergent, constructed wetlands in Ohio and for assessing the performance of mitigation. The method of nearest shrunken centroids has excellent potential for further applications in ecology.
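The nearest-shrunken-centroids classifier with five-fold cross-validation can be sketched with scikit-learn's `NearestCentroid`; the synthetic 19-sample, 19-variable dataset and the shrinkage threshold are placeholders, not the wetland measurements:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 19 wetlands x 19 soil/vegetation variables,
# two classes (natural vs. constructed), only a few informative variables.
X, y = make_classification(n_samples=19, n_features=19, n_informative=2,
                           n_redundant=0, random_state=1)

# Shrinking the class centroids toward the overall centroid zeroes out
# uninformative variables; the surviving variables are the candidate
# indicators of wetland class.
clf = make_pipeline(StandardScaler(), NearestCentroid(shrink_threshold=0.5))
acc = cross_val_score(clf, X, y, cv=5).mean()
error_rate = 1 - acc  # five-fold cross-validated misclassification rate
```

The paper's 11% misclassification rate corresponds to `error_rate` computed on the real data with the two retained variables (% TOC and percent dry weight).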
Penalized spline estimation for functional coefficient regression models.
Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan
2010-04-01
The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.
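A minimal sketch of the P-spline idea, ridge-type shrinkage on a B-spline basis with the smoothing parameter λ chosen by generalized cross-validation (one of the criteria compared above). This is generic scatterplot smoothing under assumed knot placement and penalty order, not the paper's functional-coefficient estimator:

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_gcv(x, y, n_knots=20, degree=3, lambdas=(0.01, 0.1, 1, 10, 100)):
    """Fit a P-spline to (x, y) and return the lambda minimizing GCV."""
    # Open uniform knot vector with repeated boundary knots.
    knots = np.r_[[x.min()] * degree,
                  np.linspace(x.min(), x.max(), n_knots),
                  [x.max()] * degree]
    n_basis = len(knots) - degree - 1
    B = BSpline.design_matrix(x, knots, degree).toarray()
    # Second-order difference penalty on adjacent spline coefficients.
    D = np.diff(np.eye(n_basis), n=2, axis=0)
    best = None
    for lam in lambdas:
        # Hat matrix of the penalized (ridge-type) least-squares fit.
        H = B @ np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)
        resid = y - H @ y
        # GCV(lambda) = n * RSS / (n - trace(H))^2
        gcv = len(y) * (resid @ resid) / (len(y) - np.trace(H)) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam)
    return best[1]

x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x)
lam_best = pspline_gcv(x, y)
```

Because the fit is linear in y for fixed λ, the trace of the hat matrix gives the effective degrees of freedom, which is what makes GCV (and the exact fixed-λ inference mentioned above) cheap to compute.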
Computational Depth of Anesthesia via Multiple Vital Signs Based on Artificial Neural Networks.
Sadrawi, Muammar; Fan, Shou-Zen; Abbod, Maysam F; Jen, Kuo-Kuang; Shieh, Jiann-Shing
2015-01-01
This study evaluated a depth of anesthesia (DoA) index using artificial neural networks (ANN) as the modeling technique. Data from 63 patients in total were used: 17 for modeling and 46 for testing. Empirical mode decomposition (EMD) is used to separate the electroencephalography (EEG) signal from noise. A sample entropy index is then extracted from the filtered EEG signal for every 5-second segment. This index is combined with the mean values of other vital signs, that is, electromyography (EMG), heart rate (HR), pulse, systolic blood pressure (SBP), diastolic blood pressure (DBP), and signal quality index (SQI), to form the input for evaluating the DoA index. The scores of 5 doctors are averaged to obtain the output index. The mean absolute error (MAE) is used for performance evaluation. 10-fold cross-validation is performed in order to generalize the model. The ANN model is compared with the bispectral index (BIS). The results show that the ANN produces a lower MAE than BIS. For the correlation coefficient, the ANN also achieves a higher value than BIS on the 46-patient testing data. Sensitivity analysis and cross-validation are also applied; the results indicate that EMG is the most influential parameter.
Mortality risk score prediction in an elderly population using machine learning.
Rose, Sherri
2013-03-01
Standard practice for prediction often relies on parametric regression methods. Interesting new methods from the machine learning literature have been introduced in epidemiologic studies, such as random forest and neural networks. However, a priori, an investigator will not know which algorithm to select and may wish to try several. Here I apply the super learner, an ensemble machine learning approach that combines multiple algorithms into a single algorithm and returns a prediction function with the best cross-validated mean squared error. Super learning is a generalization of stacking methods. I used super learning in the Study of Physical Performance and Age-Related Changes in Sonomans (SPPARCS) to predict death among 2,066 residents of Sonoma, California, aged 54 years or more during the period 1993-1999. The super learner for predicting death (risk score) improved upon all single algorithms in the collection of algorithms, although its performance was similar to that of several algorithms. The super learner outperformed the worst algorithm (neural networks) by 44% with respect to estimated cross-validated mean squared error and had an R2 value of 0.201. The improvement of the super learner over random forest with respect to R2 was approximately 2-fold. Alternatives for risk score prediction include the super learner, which can provide improved performance.
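Since super learning generalizes stacking, its flavor can be sketched with scikit-learn's `StackingRegressor`: each base learner's out-of-fold predictions feed a meta-learner, mirroring the cross-validated weighting of candidate algorithms. The learner library and synthetic data below are illustrative, not the SPPARCS analysis:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the cohort data.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Candidate library: a parametric regression, a random forest, and a
# small neural network; a ridge meta-learner combines their
# out-of-fold predictions (cv=5 inside the stack).
stack = StackingRegressor(
    estimators=[
        ("ols", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("nn", MLPRegressor(hidden_layer_sizes=(16,), max_iter=1000,
                            random_state=0)),
    ],
    final_estimator=RidgeCV(),
    cv=5,
)

# Outer cross-validation estimates the ensemble's mean squared error,
# the criterion the super learner minimizes over its library.
mse_stack = -cross_val_score(stack, X, y,
                             scoring="neg_mean_squared_error", cv=5).mean()
```

The full super learner additionally constrains or optimizes the combination weights over the library; the stacking sketch above captures the core cross-validation machinery.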
Twofold processing for denoising ultrasound medical images.
Kishore, P V V; Kumar, K V V; Kumar, D Anil; Prasad, M V D; Goutham, E N D; Rahul, R; Krishna, C B S Vamsi; Sandeep, Y
2015-01-01
Ultrasound (US) medical imaging non-invasively pictures the inside of a human body for disease diagnostics. Speckle noise attacks ultrasound images, degrading their visual quality. A twofold processing algorithm is proposed in this work to reduce this multiplicative speckle noise. The first fold uses block-based thresholding, both hard (BHT) and soft (BST), on pixels in the wavelet domain with 8, 16, 32 and 64 non-overlapping block sizes. This first-fold process is a good denoising method for reducing speckle, but it also induces blurring of the object of interest. The second fold restores object boundaries and texture with adaptive wavelet fusion. The degraded object restoration in the block-thresholded US image is carried out through wavelet coefficient fusion of the object in the original US image and the block-thresholded US image. Fusion rules and wavelet decomposition levels are made adaptive for each block using gradient histograms with normalized differential mean (NDF) to introduce the highest level of contrast between the denoised pixels and the object pixels in the resultant image. The proposed twofold methods are thus named adaptive NDF block fusion with hard and soft thresholding (ANBF-HT and ANBF-ST). The results indicate visual quality improvement to an interesting level with the proposed twofold processing, where the first fold removes noise and the second fold restores object properties. Peak signal to noise ratio (PSNR), normalized cross correlation coefficient (NCC), edge strength (ES), image quality index (IQI) and structural similarity index (SSIM) measure the quantitative quality of the twofold processing technique. Validation of the proposed method is done by comparison with anisotropic diffusion (AD), total variational filtering (TVF) and empirical mode decomposition (EMD) for enhancement of US images. The US images were provided by the AMMA hospital radiology labs at Vijayawada, India.
Denis, Marie; Enquobahrie, Daniel A; Tadesse, Mahlet G; Gelaye, Bizu; Sanchez, Sixto E; Salazar, Manuel; Ananth, Cande V; Williams, Michelle A
2014-01-01
While available evidence supports the role of genetics in the pathogenesis of placental abruption (PA), PA-related placental genome variations and maternal-placental genetic interactions have not been investigated. Maternal blood and placental samples collected from participants in the Peruvian Abruptio Placentae Epidemiology study were genotyped using Illumina's Cardio-Metabochip platform. We examined 118,782 genome-wide SNPs and 333 SNPs in 32 candidate genes from mitochondrial biogenesis and oxidative phosphorylation pathways in placental DNA from 280 PA cases and 244 controls. We assessed maternal-placental interactions in the candidate gene SNPs and two imprinted regions (IGF2/H19 and C19MC). Univariate and penalized logistic regression models were fit to estimate odds ratios. We examined the combined effect of multiple SNPs on PA risk using weighted genetic risk scores (WGRS) with repeated ten-fold cross-validations. A multinomial model was used to investigate maternal-placental genetic interactions. In placental genome-wide and candidate gene analyses, no SNP was significant after false discovery rate correction. The top genome-wide association study (GWAS) hits were rs544201, rs1484464 (CTNNA2), rs4149570 (TNFRSF1A) and rs13055470 (ZNRF3) (p-values: 1.11e-05 to 3.54e-05). The top 200 SNPs of the GWAS overrepresented genes involved in cell cycle, growth and proliferation. The top candidate gene hits were rs16949118 (COX10) and rs7609948 (THRB) (p-values: 6.00e-03 and 8.19e-03). Participants in the highest quartile of WGRS based on cross-validations using SNPs selected from the GWAS and candidate gene analyses had an 8.40-fold (95% CI: 5.8-12.56) and a 4.46-fold (95% CI: 2.94-6.72) higher odds of PA compared to participants in the lowest quartile. We found maternal-placental genetic interactions on PA risk for two SNPs in PPARG (chr3:12313450 and chr3:12412978) and maternal imprinting effects for multiple SNPs in the C19MC and IGF2/H19 regions.
Variations in the placental genome and interactions between maternal-placental genetic variations may contribute to PA risk. Larger studies may help advance our understanding of PA pathogenesis.
Beretta, Lorenzo; Santaniello, Alessandro; Cappiello, Francesca; Chawla, Nitesh V; Vonk, Madelon C; Carreira, Patricia E; Allanore, Yannick; Popa-Diaconu, D A; Cossu, Marta; Bertolotti, Francesca; Ferraccioli, Gianfranco; Mazzone, Antonino; Scorza, Raffaella
2010-01-01
Systemic sclerosis (SSc) is a multiorgan disease with high mortality rates. Several clinical features have been associated with poor survival in different populations of SSc patients, but no clear and reproducible prognostic model to assess individual survival prediction in scleroderma patients has ever been developed. We used Cox regression and three data mining-based classifiers (Naïve Bayes Classifier [NBC], Random Forests [RND-F] and logistic regression [Log-Reg]) to develop a robust and reproducible 5-year prognostic model. All the models were built and internally validated by means of 5-fold cross-validation on a population of 558 Italian SSc patients. Their predictive ability and capability of generalisation was then tested on an independent population of 356 patients recruited from 5 external centres, and finally compared to the predictions made by two SSc domain experts on the same population. The NBC outperformed the Cox-based classifier and the other data mining algorithms after internal cross-validation (area under the receiver operating characteristic curve, AUROC: NBC=0.759; RND-F=0.736; Log-Reg=0.754 and Cox=0.724). The NBC also had a remarkable and better trade-off between sensitivity and specificity (e.g. balanced accuracy, BA) than the Cox-based classifier when tested on an independent population of SSc patients (BA: NBC=0.769, Cox=0.622). The NBC was also superior to domain experts in predicting 5-year survival in this population (AUROC=0.829 vs. AUROC=0.788 and BA=0.769 vs. BA=0.67). We provide a model to make consistent 5-year prognostic predictions in SSc patients. Its internal validity, as well as its capability of generalisation and reduced uncertainty compared to human experts, support its use at the bedside. Available at: http://www.nd.edu/~nchawla/survival.xls.
Stikic, Maja; Berka, Chris; Levendowski, Daniel J.; Rubio, Roberto F.; Tan, Veasna; Korszen, Stephanie; Barba, Douglas; Wurzer, David
2014-01-01
The objective of this study was to investigate the feasibility of physiological metrics such as ECG-derived heart rate and EEG-derived cognitive workload and engagement as potential predictors of performance on different training tasks. An unsupervised approach based on self-organizing neural network (NN) was utilized to model cognitive state changes over time. The feature vector comprised EEG-engagement, EEG-workload, and heart rate metrics, all self-normalized to account for individual differences. During the competitive training process, a linear topology was developed where the feature vectors similar to each other activated the same NN nodes. The NN model was trained and auto-validated on combat marksmanship training data from 51 participants that were required to make “deadly force decisions” in challenging combat scenarios. The trained NN model was cross validated using 10-fold cross-validation. It was also validated on a golf study in which additional 22 participants were asked to complete 10 sessions of 10 putts each. Temporal sequences of the activated nodes for both studies followed the same pattern of changes, demonstrating the generalization capabilities of the approach. Most node transition changes were local, but important events typically caused significant changes in the physiological metrics, as evidenced by larger state changes. This was investigated by calculating a transition score as the sum of subsequent state transitions between the activated NN nodes. Correlation analysis demonstrated statistically significant correlations between the transition scores and subjects' performances in both studies. This paper explored the hypothesis that temporal sequences of physiological changes comprise the discriminative patterns for performance prediction. These physiological markers could be utilized in future training improvement systems (e.g., through neurofeedback), and applied across a variety of training environments. PMID:25414629
Weidhaas, Joanne B.; Li, Shu-Xia; Winter, Kathryn; Ryu, Janice; Jhingran, Anuja; Miller, Bridgette; Dicker, Adam P.; Gaffney, David
2009-01-01
Purpose: To evaluate the potential of gene expression signatures to predict response to treatment in locally advanced cervical cancer treated with definitive chemotherapy and radiation. Experimental Design: Tissue biopsies were collected from patients participating in Radiation Therapy Oncology Group (RTOG) 0128, a phase II trial evaluating the benefit of celecoxib in addition to cisplatin chemotherapy and radiation for locally advanced cervical cancer. Gene expression profiling was done and signatures of pretreatment, mid-treatment (before the first implant), and “changed” gene expression patterns between pre- and mid-treatment samples were determined. The ability of the gene signatures to predict local control versus local failure was evaluated. A two-group t test was done to identify the initial gene set separating these end points. Supervised classification methods were used to enrich the gene sets. The results were further validated by leave-one-out and 2-fold cross-validation. Results: Twenty-two patients had suitable material from pretreatment samples for analysis, and 13 paired pre- and mid-treatment samples were obtained. The changed gene expression signatures between the pre- and mid-treatment biopsies predicted response to treatment, separating patients with local failures from those who achieved local control with a seven-gene signature. The in-sample prediction rate, leave-one-out prediction rate, and 2-fold prediction rate are 100% for this seven-gene signature. This signature was enriched for cell cycle genes. Conclusions: Changed gene expression signatures during therapy in cervical cancer can predict outcome as measured by local control. After further validation, such findings could be applied to direct additional therapy for cervical cancer patients treated with chemotherapy and radiation. PMID:19509178
Viscoelasticity of rabbit vocal folds after injection augmentation.
Dahlqvist, Ake; Gärskog, Ola; Laurent, Claude; Hertegård, Stellan; Ambrosio, Luigi; Borzacchiello, Assunta
2004-01-01
Vocal fold function is related to the viscoelasticity of the vocal fold tissue. Augmentation substances used for injection treatment of voice insufficiency may alter the viscoelastic properties of vocal folds and their vibratory capacity. The objective was to compare the mechanical properties (viscoelasticity) of various injectable substances and the viscoelasticity of rabbit vocal folds, 6 months after injection with one of these substances. Animal model. Cross-linked collagen (Zyplast), double cross-linked hyaluronan (hylan B gel), dextranomers in hyaluronan (DiHA), and polytetrafluoroethylene (Teflon) were injected into rabbit vocal folds. Six months after the injection, the animals were killed and the right- and left-side vocal folds were removed. Dynamic viscosity of the injected substances and the vocal folds was measured with a Bohlin parallel-plate rheometer during small-amplitude oscillation. All injected vocal folds showed a decreasing dynamic viscosity with increasing frequency. Hylan B gel and DiHA showed the lowest dynamic viscosity values, and vocal folds injected with these substances also showed the lowest dynamic viscosity (similar to noninjected control samples). Teflon (and vocal folds injected with Teflon) showed the highest dynamic viscosity values, followed by the collagen samples. Substances with low viscoelasticity alter the mechanical properties of the vocal fold to a lesser degree than substances with a high viscoelasticity. The data indicated that hylan B gel and DiHA render the most natural viscoelastic properties to the vocal folds. These substances seem appropriate for preserving or restoring the vibratory capacity of the vocal folds when glottal insufficiency is treated with augmentative injections.
Partial wave analysis for folded differential cross sections
NASA Astrophysics Data System (ADS)
Machacek, J. R.; McEachran, R. P.
2018-03-01
The value of modified effective range theory (MERT) and the connection between differential cross sections and phase shifts in low-energy electron scattering has long been recognized. Recent experimental techniques involving magnetically confined beams have introduced the concept of folded differential cross sections (FDCS), where the forward (θ ≤ π/2) and backward scattered (θ ≥ π/2) projectiles are unresolved; that is, the value measured at the angle θ is the sum of the signal for particles scattered into the angles θ and π - θ. We have developed an alternative approach to MERT in order to analyse low-energy folded differential cross sections for positrons and electrons. This results in a simplified expression for the FDCS when it is expressed in terms of partial waves, and thereby enables one to extract the first few phase shifts from a fit to an experimental FDCS at low energies. Thus, this method predicts forward and backward angle scattering (0 to π) using only experimental FDCS data, and can be used to determine the total elastic cross section solely from experimental results at low energy, which are limited in angular range.
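The folding operation on a partial-wave elastic differential cross section can be sketched as follows; the phase-shift values and wavenumber below are illustrative placeholders, not fitted to any experiment:

```python
import numpy as np
from scipy.special import eval_legendre

def dcs(theta, phase_shifts, k):
    """Elastic differential cross section from the first few partial-wave
    phase shifts delta_l, via the scattering amplitude
    f(theta) = (1/k) * sum_l (2l+1) exp(i delta_l) sin(delta_l) P_l(cos theta)."""
    f = sum(
        (2 * l + 1) * np.exp(1j * d) * np.sin(d) * eval_legendre(l, np.cos(theta))
        for l, d in enumerate(phase_shifts)
    ) / k
    return np.abs(f) ** 2

def fdcs(theta, phase_shifts, k):
    """Folded DCS: forward and backward scattering are unresolved,
    so FDCS(theta) = DCS(theta) + DCS(pi - theta)."""
    return dcs(theta, phase_shifts, k) + dcs(np.pi - theta, phase_shifts, k)

# Illustrative s-, p- and d-wave phase shifts (radians) and wavenumber.
theta = np.linspace(0, np.pi / 2, 91)
curve = fdcs(theta, phase_shifts=[0.8, 0.3, 0.05], k=0.5)
```

By construction the FDCS is symmetric about θ = π/2, which is why only the range 0 to π/2 needs to be measured; fitting `curve` with the phase shifts as free parameters is the extraction step described above.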
NASA Astrophysics Data System (ADS)
Wei, Jun; Sahiner, Berkman; Hadjiiski, Lubomir M.; Chan, Heang-Ping; Helvie, Mark A.; Roubidoux, Marilyn A.; Zhou, Chuan; Ge, Jun; Zhang, Yiheng
2006-03-01
We are developing a two-view information fusion method to improve the performance of our CAD system for mass detection. Mass candidates on each mammogram were first detected with our single-view CAD system. Potential object pairs on the two-view mammograms were then identified by using the distance between the object and the nipple. Morphological features, a Hessian feature, correlation coefficients between the two paired objects and texture features were used as input to train a similarity classifier that estimated a similarity score for each pair. Finally, a linear discriminant analysis (LDA) classifier was used to fuse the score from the single-view CAD system and the similarity score. A data set of 475 patients containing 972 mammograms with 475 biopsy-proven masses was used to train and test the CAD system. All cases contained the CC view and the MLO or LM view. We randomly divided the data set into two independent sets of 243 cases and 232 cases. The training and testing were performed using the 2-fold cross-validation method. The detection performance of the CAD system was assessed by free response receiver operating characteristic (FROC) analysis. The average test FROC curve was obtained by averaging the FP rates at the same sensitivity along the two corresponding test FROC curves from the 2-fold cross-validation. At case-based sensitivities of 90%, 85% and 80% on the test set, the single-view CAD system achieved FP rates of 2.0, 1.5, and 1.2 FPs/image, respectively. With the two-view fusion system, the FP rates were reduced to 1.7, 1.3, and 1.0 FPs/image, respectively, at the corresponding sensitivities. The improvement was found to be statistically significant (p<0.05) by the AFROC method. Our results indicate that the two-view fusion scheme can improve the performance of mass detection on mammograms.
Finkelman, Matthew D; Smits, Niels; Kulich, Ronald J; Zacharoff, Kevin L; Magnuson, Britta E; Chang, Hong; Dong, Jinghui; Butler, Stephen F
2017-07-01
The Screener and Opioid Assessment for Patients with Pain-Revised (SOAPP-R) is a 24-item questionnaire designed to assess risk of aberrant medication-related behaviors in chronic pain patients. The introduction of short forms of the SOAPP-R may save time and increase utilization by practitioners. To develop and evaluate candidate SOAPP-R short forms. Retrospective study. Pain centers. Four hundred and twenty-eight patients with chronic noncancer pain. Subjects had previously been administered the full-length version of the SOAPP-R and been categorized as positive or negative for aberrant medication-related behaviors via the Aberrant Drug Behavior Index (ADBI). Short forms of the SOAPP-R were developed using lasso logistic regression. Sensitivity, specificity, and area under the curve (AUC) of all forms were calculated with respect to the ADBI using the complete data set, training-test analysis, and 10-fold cross-validation. The coefficient alpha of each form was also calculated. An external set of 12 pain practitioners reviewed the forms for content. In the complete data set analysis, a form of 12 items exhibited sensitivity, specificity, and AUC greater than or equal to those of the full-length SOAPP-R (which were 0.74, 0.67, and 0.76, respectively). The short form had a coefficient alpha of 0.76. In the training-test analysis and 10-fold cross-validation, it exhibited an AUC value within 0.01 of that of the full-length SOAPP-R. The majority of external practitioners reported a preference for this short form. The 12-item version of the SOAPP-R has potential as a short risk screener and should be tested prospectively. © 2016 American Academy of Pain Medicine. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
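The AUC reported for each candidate short form can be computed directly from the score lists of ADBI-positive and ADBI-negative patients, as the probability that a randomly chosen positive outscores a randomly chosen negative. A small sketch with invented SOAPP-R-style totals (not the study's data):

```python
def auc(pos_scores, neg_scores):
    """Pairwise-rank AUC: ties count half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# hypothetical short-form totals for ADBI-positive vs ADBI-negative patients
positives = [18, 15, 12, 9]
negatives = [10, 7, 12, 5]
```

Comparing this AUC for a 12-item subset against the 24-item total, on held-out folds, is the kind of check the training-test and 10-fold cross-validation analyses perform.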
Chen, Gongbo; Knibbs, Luke D; Zhang, Wenyi; Li, Shanshan; Cao, Wei; Guo, Jianping; Ren, Hongyan; Wang, Boguang; Wang, Hao; Williams, Gail; Hamm, N A S; Guo, Yuming
2018-02-01
PM1 might be more hazardous than PM2.5 (particulate matter with an aerodynamic diameter ≤1 μm and ≤2.5 μm, respectively). However, studies on PM1 concentrations and its health effects are limited due to a lack of PM1 monitoring data. To estimate spatial and temporal variations of PM1 concentrations in China during 2005-2014 using satellite remote sensing, meteorology, and land use information. Two types of Moderate Resolution Imaging Spectroradiometer (MODIS) Collection 6 aerosol optical depth (AOD) data, Dark Target (DT) and Deep Blue (DB), were combined. A generalised additive model (GAM) was developed to link ground-monitored PM1 data with AOD data and other spatial and temporal predictors (e.g., urban cover, forest cover and calendar month). A 10-fold cross-validation was performed to assess the predictive ability. The results of 10-fold cross-validation showed that the R2 and Root Mean Squared Error (RMSE) for monthly prediction were 71% and 13.0 μg/m3, respectively. For seasonal prediction, the R2 and RMSE were 77% and 11.4 μg/m3, respectively. The predicted annual mean concentration of PM1 across China was 26.9 μg/m3. The PM1 level was highest in winter and lowest in summer. Generally, PM1 levels across China did not change substantially during the past decade, although in locally heavily polluted regions, such as south-western Hebei and the Beijing-Tianjin region, PM1 levels increased substantially. GAM with satellite-retrieved AOD, meteorology, and land use information has high predictive ability to estimate ground-level PM1. Ambient PM1 reached high levels in China during the past decade. The estimated results can be applied to evaluate the health effects of PM1. Copyright © 2017 Elsevier Ltd. All rights reserved.
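The cross-validation results above are summarised by R2 and RMSE between predicted and ground-monitored concentrations; both metrics are straightforward to compute per fold. A sketch with toy observed/predicted values (units would be μg/m3 in the paper's setting; the numbers are invented):

```python
def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    n = len(obs)
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / n) ** 0.5

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# toy monthly concentrations: observed vs model-predicted
obs = [20.0, 30.0, 40.0, 50.0]
pred = [22.0, 28.0, 41.0, 49.0]
```

In a 10-fold scheme these two statistics are computed on the held-out tenth of the stations each time and then pooled or averaged.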
Automatic identification of inertial sensor placement on human body segments during walking
2013-01-01
Background Current inertial motion capture systems are rarely used in biomedical applications. The attachment and connection of the sensors with cables is often a complex and time consuming task. Moreover, it is prone to errors, because each sensor has to be attached to a predefined body segment. By using wireless inertial sensors and automatic identification of their positions on the human body, the complexity of the set-up can be reduced and incorrect attachments are avoided. We present a novel method for the automatic identification of inertial sensors on human body segments during walking. This method allows the user to place (wireless) inertial sensors on arbitrary body segments. Next, the user walks for just a few seconds and the segment to which each sensor is attached is identified automatically. Methods Walking data was recorded from ten healthy subjects using an Xsens MVN Biomech system with full-body configuration (17 inertial sensors). Subjects were asked to walk for about 6 seconds at normal walking speed (about 5 km/h). After rotating the sensor data to a global coordinate frame with x-axis in walking direction, y-axis pointing left and z-axis vertical, RMS, mean, and correlation coefficient features were extracted from x-, y- and z-components and magnitudes of the accelerations, angular velocities and angular accelerations. As a classifier, a decision tree based on the C4.5 algorithm was developed using Weka (Waikato Environment for Knowledge Analysis). Results and conclusions After testing the algorithm with 10-fold cross-validation using 31 walking trials (involving 527 sensors), 514 sensors were correctly classified (97.5%). When a decision tree for a lower body plus trunk configuration (8 inertial sensors) was trained and tested using 10-fold cross-validation, 100% of the sensors were correctly identified. 
This decision tree was also tested on walking trials of 7 patients (17 walking trials) after anterior cruciate ligament reconstruction, which also resulted in 100% correct identification, thus illustrating the robustness of the method. PMID:23517757
Automatic identification of inertial sensor placement on human body segments during walking.
Weenk, Dirk; van Beijnum, Bert-Jan F; Baten, Chris T M; Hermens, Hermie J; Veltink, Peter H
2013-03-21
Current inertial motion capture systems are rarely used in biomedical applications. The attachment and connection of the sensors with cables is often a complex and time consuming task. Moreover, it is prone to errors, because each sensor has to be attached to a predefined body segment. By using wireless inertial sensors and automatic identification of their positions on the human body, the complexity of the set-up can be reduced and incorrect attachments are avoided. We present a novel method for the automatic identification of inertial sensors on human body segments during walking. This method allows the user to place (wireless) inertial sensors on arbitrary body segments. Next, the user walks for just a few seconds and the segment to which each sensor is attached is identified automatically. Walking data was recorded from ten healthy subjects using an Xsens MVN Biomech system with full-body configuration (17 inertial sensors). Subjects were asked to walk for about 6 seconds at normal walking speed (about 5 km/h). After rotating the sensor data to a global coordinate frame with x-axis in walking direction, y-axis pointing left and z-axis vertical, RMS, mean, and correlation coefficient features were extracted from x-, y- and z-components and magnitudes of the accelerations, angular velocities and angular accelerations. As a classifier, a decision tree based on the C4.5 algorithm was developed using Weka (Waikato Environment for Knowledge Analysis). After testing the algorithm with 10-fold cross-validation using 31 walking trials (involving 527 sensors), 514 sensors were correctly classified (97.5%). When a decision tree for a lower body plus trunk configuration (8 inertial sensors) was trained and tested using 10-fold cross-validation, 100% of the sensors were correctly identified. 
This decision tree was also tested on walking trials of 7 patients (17 walking trials) after anterior cruciate ligament reconstruction, which also resulted in 100% correct identification, thus illustrating the robustness of the method.
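The C4.5 classifier used above grows its tree by choosing, at each node, the split that most reduces class entropy (Weka's implementation additionally uses gain ratio and pruning). A dependency-free sketch of that information-gain computation on an invented discrete feature, with body-segment labels standing in for the sensor-placement classes:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a class-label list, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(feature, labels):
    """Entropy reduction from partitioning labels by a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# toy data: which segment a sensor sits on, plus two candidate features
segments = ["foot", "foot", "pelvis", "pelvis"]
informative = ["hi", "hi", "lo", "lo"]   # separates the classes perfectly
noisy = ["hi", "lo", "hi", "lo"]         # carries no class information
```

C4.5 would pick the informative feature (gain of one full bit here) and ignore the noisy one (zero gain); the paper's RMS and correlation features play the informative role for real sensor data.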
Refining Time-Activity Classification of Human Subjects Using the Global Positioning System.
Hu, Maogui; Li, Wei; Li, Lianfa; Houston, Douglas; Wu, Jun
2016-01-01
Detailed spatial location information is important in accurately estimating personal exposure to air pollution. The Global Positioning System (GPS) has been widely used to track personal paths and activities. Previous researchers have developed time-activity classification models based on GPS data, but most were developed for specific regions. An adaptive model for time-location classification can be widely applied to air pollution studies that use GPS to track individual-level time-activity patterns. Time-activity data were collected for seven days using GPS loggers and accelerometers from thirteen adult participants from Southern California under free-living conditions. We developed an automated model based on random forests to classify major time-activity patterns (i.e. indoor, outdoor-static, outdoor-walking, and in-vehicle travel). Sensitivity analysis was conducted to examine the contribution of the accelerometer data and the supplemental spatial data (i.e. roadway and tax parcel data) to the accuracy of time-activity classification. Our model was evaluated using both leave-one-fold-out and leave-one-subject-out methods. Maximum speeds in averaging time intervals of 7 and 5 minutes, and distance to primary highways with limited access were found to be the three most important variables in the classification model. Leave-one-fold-out cross-validation showed an overall accuracy of 99.71%. Sensitivities varied from 84.62% (outdoor walking) to 99.90% (indoor). Specificities varied from 96.33% (indoor) to 99.98% (outdoor static). The exclusion of accelerometer and ambient light sensor variables caused a slight loss in sensitivity for outdoor walking, but little loss in overall accuracy. However, leave-one-subject-out cross-validation showed considerable loss in sensitivity for outdoor static and outdoor walking conditions. The random forests classification model can achieve high accuracy for the four major time-activity categories. 
The model also performed well with just GPS, road and tax parcel data. However, caution is warranted when generalizing the model developed from a small number of subjects to other populations.
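The gap above between leave-one-fold-out and leave-one-subject-out performance comes down to how the splits are formed: holding out a whole subject keeps that person's samples entirely out of training, so the model cannot lean on person-specific patterns. A sketch of generating such splits from per-sample subject IDs (the IDs are invented):

```python
def leave_one_subject_out(subject_ids):
    """Yield (train_indices, test_indices) pairs, one per distinct subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield train, test

# toy per-sample subject labels (three subjects, unequal sample counts)
samples = ["s1", "s1", "s2", "s3", "s3", "s3"]
splits = list(leave_one_subject_out(samples))
```

Ordinary k-fold splitting would instead shuffle all samples together, letting the same subject appear on both sides of a split, which is why it gives the optimistic 99.71% figure.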
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, W; Tu, S
Purpose: We conducted a retrospective study of Radiomics research for classifying malignancy of small pulmonary nodules. A machine learning algorithm of logistic regression and an open research platform for Radiomics, IBEX (Imaging Biomarker Explorer), were used to evaluate the classification accuracy. Methods: The training set included 100 CT image series from cancer patients with small pulmonary nodules, where the average diameter is 1.10 cm. These patients registered at Chang Gung Memorial Hospital and received a CT-guided operation of lung cancer lobectomy. The specimens were classified by experienced pathologists as B (benign) or M (malignant). CT images with slice thickness of 0.625 mm were acquired from a GE BrightSpeed 16 scanner. The study was formally approved by our institutional internal review board. Nodules were delineated and 374 feature parameters were extracted from IBEX. We first used the t-test and p-value criteria to study which features can differentiate between groups B and M. Then we implemented a logistic regression algorithm to perform nodule malignancy classification. 10-fold cross-validation and the receiver operating characteristic curve (ROC) were used to evaluate the classification accuracy. Finally, hierarchical clustering analysis, Spearman rank correlation coefficient, and clustering heat map were used to further study correlation characteristics among different features. Results: 238 features were found to differentiate between groups B and M, based on whether their statistical p-values were less than 0.05. A forward search algorithm was used to select an optimal combination of features for the best classification, and 9 features were identified. Our study found the best accuracy of classifying malignancy was 0.79±0.01 with the 10-fold cross-validation. The area under the ROC curve was 0.81±0.02. Conclusion: Benign nodules may be treated as a malignant tumor in low-dose CT and patients may undergo unnecessary surgeries or treatments. 
Our study may help radiologists to differentiate nodule malignancy in low-dose CT.
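The univariate screening step above applies a two-sample t-test to each radiomic feature. A minimal sketch of the pooled-variance t statistic used as a filter, thresholding |t| rather than looking up a p-value so the example stays dependency-free (the feature values are invented, not IBEX output):

```python
def t_statistic(a, b):
    """Pooled two-sample t statistic for equal-variance groups."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)   # pooled variance
    return (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# toy values of one feature for benign (B) vs malignant (M) nodules
benign = [1.0, 1.2, 0.9, 1.1]
malignant = [2.0, 2.1, 1.9, 2.2]
```

In the study this screen keeps 238 of 374 features (p < 0.05) before the forward search and logistic regression; with a real p-value one would convert t through the Student distribution with na + nb - 2 degrees of freedom.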
Lin, Frank P Y; Pokorny, Adrian; Teng, Christina; Dear, Rachel; Epstein, Richard J
2016-12-01
Multidisciplinary team (MDT) meetings are used to optimise expert decision-making about treatment options, but such expertise is not digitally transferable between centres. To help standardise medical decision-making, we developed a machine learning model designed to predict MDT decisions about adjuvant breast cancer treatments. We analysed MDT decisions regarding adjuvant systemic therapy for 1065 breast cancer cases over eight years. Machine learning classifiers with and without bootstrap aggregation were correlated with MDT decisions (recommended, not recommended, or discussable) regarding adjuvant cytotoxic, endocrine and biologic/targeted therapies, then tested for predictability using stratified ten-fold cross-validations. The predictions so derived were duly compared with those based on published (ESMO and NCCN) cancer guidelines. Machine learning more accurately predicted adjuvant chemotherapy MDT decisions than did simple application of guidelines. No differences were found between MDT- vs. ESMO/NCCN- based decisions to prescribe either adjuvant endocrine (97%, p = 0.44/0.74) or biologic/targeted therapies (98%, p = 0.82/0.59). In contrast, significant discrepancies were evident between MDT- and guideline-based decisions to prescribe chemotherapy (87%, p < 0.01, representing 43% and 53% variations from ESMO/NCCN guidelines, respectively). Using ten-fold cross-validation, the best classifiers achieved areas under the receiver operating characteristic curve (AUC) of 0.940 for chemotherapy (95% C.I., 0.922-0.958), 0.899 for the endocrine therapy (95% C.I., 0.880-0.918), and 0.977 for trastuzumab therapy (95% C.I., 0.955-0.999) respectively. Overall, bootstrap aggregated classifiers performed better among all evaluated machine learning models. A machine learning approach based on clinicopathologic characteristics can predict MDT decisions about adjuvant breast cancer drug therapies. 
The discrepancy between MDT- and guideline-based decisions regarding adjuvant chemotherapy implies that certain non-clinicopathologic criteria, such as patient preference and resource availability, are factored into clinical decision-making by local experts but not captured by guidelines.
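The bootstrap-aggregated classifiers that performed best in the study above follow a simple recipe: train each base model on a resample of the cases drawn with replacement, then combine the models' outputs by majority vote. A sketch with a trivial one-feature threshold learner standing in for the study's base classifiers (the data are invented):

```python
import random

def train_stump(rows):
    """Learn a threshold on one feature: predict 1 above the class-mean midpoint."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0

def bagged_predict(rows, x, n_models=25, seed=0):
    """Majority vote over stumps trained on bootstrap resamples."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]   # bootstrap resample
        # a resample may miss one class; fall back to the full data if so
        if {y for _, y in sample} != {0, 1}:
            sample = rows
        votes += train_stump(sample)(x)
    return 1 if votes > n_models / 2 else 0

# toy cases: (feature value, MDT decision encoded 0/1)
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
```

Averaging over resamples reduces the variance of unstable base learners, which is consistent with the paper's finding that bagged classifiers outperformed their unbagged counterparts.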
Iterative variational mode decomposition based automated detection of glaucoma using fundus images.
Maheshwari, Shishir; Pachori, Ram Bilas; Kanhangad, Vivek; Bhandary, Sulatha V; Acharya, U Rajendra
2017-09-01
Glaucoma is one of the leading causes of permanent vision loss. It is an ocular disorder caused by increased fluid pressure within the eye. The clinical methods available for the diagnosis of glaucoma require skilled supervision. They are manual, time consuming, and out of reach of common people. Hence, there is a need for an automated glaucoma diagnosis system for mass screening. In this paper, we present a novel method for the automated diagnosis of glaucoma using digital fundus images. The variational mode decomposition (VMD) method is used in an iterative manner for image decomposition. Various features, namely Kapoor entropy, Renyi entropy, Yager entropy, and fractal dimensions, are extracted from VMD components. The ReliefF algorithm is used to select the discriminatory features, and these features are then fed to the least squares support vector machine (LS-SVM) for classification. Our proposed method achieved classification accuracies of 95.19% and 94.79% using three-fold and ten-fold cross-validation strategies, respectively. This system can aid ophthalmologists in confirming their manual reading of classes (glaucoma or normal) from fundus images. Copyright © 2017 Elsevier Ltd. All rights reserved.
A contact photo-cross-linking investigation of the active site of the 8-17 deoxyribozyme.
Liu, Yong; Sen, Dipankar
2008-09-12
The small RNA-cleaving 8-17 deoxyribozyme (DNAzyme) has been the subject of extensive mechanistic and structural investigation, including a number of recent single-molecule studies of its global folding. Little detailed insight exists, however, into this DNAzyme's active site; for instance, the identity of specific nucleotides that are proximal to or in contact with the scissile site in the substrate. Here, we report a systematic replacement of a number of bases within the magnesium-folded DNAzyme-substrate complex with thio- and halogen-substituted base analogues, which were then photochemically activated to generate contact cross-links within the complex. Mapping of the cross-links revealed a striking pattern of DNAzyme-substrate cross-links but an absence of significant intra-DNAzyme cross-links. Notably, the two nucleotides directly flanking the scissile phosphodiester cross-linked strongly with functionally important elements within the DNAzyme, the thymine of a G.T wobble base pair, a WCGR bulge loop, and a terminal AGC loop. Mutation of the wobble base pair to a G-C pair led to a significant folding instability of the DNAzyme-substrate complex. The cross-linking patterns obtained were used to generate a model for the DNAzyme's active site that had the substrate's scissile phosphodiester sandwiched between the DNAzyme's wobble thymine and its AGC and WCGR loops.
Optimization of C4.5 algorithm-based particle swarm optimization for breast cancer diagnosis
NASA Astrophysics Data System (ADS)
Muslim, M. A.; Rukmana, S. H.; Sugiharti, E.; Prasetiyo, B.; Alimah, S.
2018-03-01
Data mining has become a basic methodology for computational applications in medical domains. It can be applied in the health field for the diagnosis of breast cancer, heart disease, diabetes, and other conditions. Breast cancer is most common in women, with more than one million cases and nearly 600,000 deaths occurring worldwide each year. The most effective way to reduce breast cancer deaths is early diagnosis. This study aims to improve the accuracy of breast cancer diagnosis. The data used are the Wisconsin Breast Cancer (WBC) dataset from the UCI machine learning repository. The method used in this research is the C4.5 algorithm with Particle Swarm Optimization (PSO) for feature selection and for optimizing the C4.5 algorithm. Ten-fold cross-validation and a confusion matrix are used for validation. The result of this research is that the accuracy of the C4.5 algorithm with particle swarm optimization increased by 0.88%.
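Validation in the study above pairs ten-fold cross-validation with a confusion matrix. A sketch of tallying the matrix and overall accuracy from true vs. predicted labels, using a B/M (benign/malignant) convention with invented labels:

```python
def confusion_matrix(actual, predicted, classes=("B", "M")):
    """Rows index the actual class, columns the predicted class."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[idx[a]][idx[p]] += 1
    return m

def accuracy(matrix):
    """Fraction of cases on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# invented fold results: true labels vs classifier output
actual    = ["B", "B", "M", "M", "M", "B"]
predicted = ["B", "M", "M", "M", "B", "B"]
m = confusion_matrix(actual, predicted)
```

In a ten-fold protocol one such matrix is accumulated across the ten held-out folds before the final accuracy (and the 0.88% improvement figure) is reported.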
Predictive modeling of addiction lapses in a mobile health application.
Chih, Ming-Yuan; Patton, Timothy; McTavish, Fiona M; Isham, Andrew J; Judkins-Fisher, Chris L; Atwood, Amy K; Gustafson, David H
2014-01-01
The chronically relapsing nature of alcoholism leads to substantial personal, family, and societal costs. Addiction-comprehensive health enhancement support system (A-CHESS) is a smartphone application that aims to reduce relapse. To offer targeted support to patients who are at risk of lapses within the coming week, a Bayesian network model to predict such events was constructed using responses on 2,934 weekly surveys (called the Weekly Check-in) from 152 alcohol-dependent individuals who recently completed residential treatment. The Weekly Check-in is a self-monitoring service, provided in A-CHESS, to track patients' recovery progress. The model showed good predictability, with the area under receiver operating characteristic curve of 0.829 in the 10-fold cross-validation and 0.912 in the external validation. The sensitivity/specificity table assists the tradeoff decisions necessary to apply the model in practice. This study moves us closer to the goal of providing lapse prediction so that patients might receive more targeted and timely support. © 2013.
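The sensitivity/specificity table mentioned above is just the model's predicted lapse probabilities evaluated at a sweep of decision thresholds, so practitioners can pick the tradeoff they can act on. A sketch of building such a table (the probabilities are invented, not A-CHESS output):

```python
def sens_spec_table(pos_scores, neg_scores, thresholds):
    """Per threshold: sensitivity = flagged positives, specificity = cleared negatives."""
    table = []
    for t in thresholds:
        sens = sum(1 for s in pos_scores if s >= t) / len(pos_scores)
        spec = sum(1 for s in neg_scores if s < t) / len(neg_scores)
        table.append((t, sens, spec))
    return table

# toy predicted lapse risks from the model
lapse = [0.9, 0.8, 0.6, 0.4]      # patients who went on to lapse
no_lapse = [0.5, 0.3, 0.2, 0.1]   # patients who did not
table = sens_spec_table(lapse, no_lapse, [0.3, 0.5, 0.7])
```

Lowering the threshold catches more true lapses at the cost of more false alarms; the reported AUC of 0.829 summarises this whole curve in one number.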
Yin, Jinghai; Mu, Zhendong
2016-01-01
The rapid development of driver fatigue detection technology is of great significance for traffic safety. The authors’ main goals of this Letter are principally three: (i) A middleware architecture, defined as a process unit (PU), which can communicate with a personal electroencephalography (EEG) node (PEN) and a cloud server (CS). The PU receives EEG signals from the PEN, recognises the fatigue state of the driver, and transfers this information to the CS. The CS sends notification messages to the surrounding vehicles. (ii) An Android application for fatigue detection is built. The application can be used by the driver to detect the state of his/her fatigue based on EEG signals, and to warn neighbourhood vehicles. (iii) The detection algorithm for driver fatigue is applied based on fuzzy entropy. Ten-fold cross-validation and a support vector machine are used for classification. Experimental results show that the average accuracy of detecting driver fatigue is about 95%, implying that the algorithm is valid for detecting the driver's fatigue state. PMID:28529761
Yin, Jinghai; Hu, Jianfeng; Mu, Zhendong
2017-02-01
The rapid development of driver fatigue detection technology is of great significance for traffic safety. The authors' main goals of this Letter are principally three: (i) A middleware architecture, defined as a process unit (PU), which can communicate with a personal electroencephalography (EEG) node (PEN) and a cloud server (CS). The PU receives EEG signals from the PEN, recognises the fatigue state of the driver, and transfers this information to the CS. The CS sends notification messages to the surrounding vehicles. (ii) An Android application for fatigue detection is built. The application can be used by the driver to detect the state of his/her fatigue based on EEG signals, and to warn neighbourhood vehicles. (iii) The detection algorithm for driver fatigue is applied based on fuzzy entropy. Ten-fold cross-validation and a support vector machine are used for classification. Experimental results show that the average accuracy of detecting driver fatigue is about 95%, implying that the algorithm is valid for detecting the driver's fatigue state.
Breast Cancer Detection with Reduced Feature Set.
Mert, Ahmet; Kılıç, Niyazi; Bilgili, Erdem; Akan, Aydin
2015-01-01
This paper explores the feature reduction properties of independent component analysis (ICA) in a breast cancer decision support system. The Wisconsin diagnostic breast cancer (WDBC) dataset is reduced to a one-dimensional feature vector by computing an independent component (IC). The original data with 30 features and the reduced single feature (IC) are used to evaluate the diagnostic accuracy of classifiers such as k-nearest neighbor (k-NN), artificial neural network (ANN), radial basis function neural network (RBFNN), and support vector machine (SVM). The proposed classification using the IC is also compared with the original feature set under different validation (5/10-fold cross-validation) and partitioning (20%-40%) methods. These classifiers are evaluated on how effectively they categorize tumors as benign and malignant in terms of specificity, sensitivity, accuracy, F-score, Youden's index, discriminant power, and the receiver operating characteristic (ROC) curve with its criterion values including area under curve (AUC) and 95% confidence interval (CI). This represents an improvement in the diagnostic decision support system, while reducing computational complexity.
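The paper reduces the 30 WDBC features to a single component before classification. As a dependency-free stand-in for that reduction step, here is projection onto the first principal component found by power iteration; note this is PCA, not the ICA the paper actually uses, and the data are invented toy values:

```python
def first_component(rows, iters=200):
    """First principal component of mean-centred data via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(d)]
    x = [[r[i] - means[i] for i in range(d)] for r in rows]
    # covariance matrix of the centred data
    cov = [[sum(x[k][i] * x[k][j] for k in range(n)) / n
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]   # converges to the dominant eigenvector
    return v, means

def project(row, v, means):
    """Reduce one sample to a single scalar along the component."""
    return sum((row[i] - means[i]) * v[i] for i in range(len(v)))

# toy 2-feature samples lying roughly along the direction (1, 2)
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
v, means = first_component(rows)
```

The one scalar per sample that comes out of `project` is then what the downstream k-NN/SVM classifiers consume, mirroring the paper's one-IC pipeline in shape if not in method.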
Predictive Modeling of Addiction Lapses in a Mobile Health Application
Chih, Ming-Yuan; Patton, Timothy; McTavish, Fiona M.; Isham, Andrew; Judkins-Fisher, Chris L.; Atwood, Amy K.; Gustafson, David H.
2013-01-01
The chronically relapsing nature of alcoholism leads to substantial personal, family, and societal costs. Addiction-Comprehensive Health Enhancement Support System (A-CHESS) is a smartphone application that aims to reduce relapse. To offer targeted support to patients who are at risk of lapses within the coming week, a Bayesian network model to predict such events was constructed using responses on 2,934 weekly surveys (called the Weekly Check-in) from 152 alcohol-dependent individuals who recently completed residential treatment. The Weekly Check-in is a self-monitoring service, provided in A-CHESS, to track patients’ recovery progress. The model showed good predictability, with the area under receiver operating characteristic curve of 0.829 in the 10-fold cross-validation and 0.912 in the external validation. The sensitivity/specificity table assists the tradeoff decisions necessary to apply the model in practice. This study moves us closer to the goal of providing lapse prediction so that patients might receive more targeted and timely support. PMID:24035143
Tribology of alternative bearings.
Fisher, John; Jin, Zhongmin; Tipper, Joanne; Stone, Martin; Ingham, Eileen
2006-12-01
The tribological performance and the biological activity of the wear debris produced have been compared for highly cross-linked polyethylene, ceramic-on-ceramic, metal-on-metal, and modified metal bearings in a series of in vitro studies from a single laboratory. The functional lifetime demand of young and active patients is 10-fold greater than the estimated functional lifetime of traditional polyethylene. There is considerable interest in using larger diameter heads in these high demand patients. Highly cross-linked polyethylene shows a four-fold reduction in functional biological activity. Ceramic-on-ceramic bearings have the lowest wear rates and least reactive wear debris. The functional biological activity is 20-fold lower than with highly cross-linked polyethylene. Hence, ceramic-on-ceramic bearings address the tribological lifetime demand of highly active patients. Metal-on-metal bearings have substantially lower wear rates than highly cross-linked polyethylene and wear decreases with head diameter. Bedding-in wear is also lower with reduced radial clearance. Differential hardness ceramic-on-metal bearings and the application of ceramic-like coatings reduce metal wear and ion levels.
Improvement on a simplified model for protein folding simulation.
Zhang, Ming; Chen, Changjun; He, Yi; Xiao, Yi
2005-11-01
Improvements were made on a simplified protein model--the Ramachandran model--to achieve better computer simulation of protein folding. To check the validity of such improvements, we chose the ultrafast folding protein Engrailed Homeodomain as an example and explored several aspects of its folding. The Engrailed Homeodomain is a mainly alpha-helical protein of 61 residues from Drosophila melanogaster. We found that the simplified model of Engrailed Homeodomain can fold into a global minimum state with a tertiary structure in good agreement with its native structure.
Development and validation of a high-fidelity phonomicrosurgical trainer.
Klein, Adam M; Gross, Jennifer
2017-04-01
To validate the use of a high-fidelity phonomicrosurgical trainer. A high-fidelity phonomicrosurgical trainer, based on a previously validated model by Contag et al., was designed with multilayered vocal folds that more closely mimic the consistency of true vocal folds, containing intracordal lesions to practice phonomicrosurgical removal. A training module was developed to simulate the true phonomicrosurgical experience. A validation study with novice and expert surgeons was conducted. Novices and experts were instructed to remove the lesion from the synthetic vocal folds, and novices were given four training trials. Performances were measured by the amount of time spent and tissue injury (microflap, superficial, deep) to the vocal fold. An independent Student t test and Fisher exact tests were used to compare subjects. A matched-paired t test and Wilcoxon signed rank tests were used to compare novice performance on the first and fourth trials and assess for improvement. Experts completed the excision with fewer total errors than novices (P = .004) and caused less injury to the microflap (P = .05) and superficial tissue (P = .003). Novices improved their performance with training, making fewer total errors (P = .002) and superficial tissue injuries (P = .02) and spending less time on removal (P = .002) after several practice trials. This high-fidelity phonomicrosurgical trainer has been validated for novice surgeons. It can distinguish between experts and novices; and after training, it helped to improve novice performance. N/A. Laryngoscope, 127:888-893, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
NASA Astrophysics Data System (ADS)
Jacques, Dominique; Vieira, Romeu; Muchez, Philippe; Sintubin, Manuel
2018-02-01
The world-class W-Sn Panasqueira deposit consists of an extensive, subhorizontal vein swarm, peripheral to a late-orogenic greisen cupola. The vein swarm consists of hundreds of co-planar quartz veins that are overlapping and connected laterally over large distances. Various segmentation structures, a local zigzag geometry, and the occurrence of straight propagation paths indicate that they exploited a regional joint system. A detailed orientation analysis of the systematic joints reveals a geometrical relationship with the subvertical F2 fold generation, reflecting late-Variscan transpression. The joints are consistently orthogonal to the steeply plunging S0-S2 intersection lineation, both on the regional and the outcrop scale, and are thus defined as cross-fold or ac-joints. The joint system developed during the waning stages of the Variscan orogeny, when already uplifted to an upper-crustal level. Veining reactivated these cross-fold joints under the conditions of hydraulic overpressures and low differential stress. The consistent subperpendicular orientation of the veins relative to the non-cylindrical F2 hinge lines, also when having an inclined attitude, demonstrates that veining did not occur during far-field horizontal compression. Vein orientation is determined by local stress states variable on a meter-scale but with the minimum principal stress consistently subparallel to fold hinge lines. The conspicuous subhorizontal attitude of the Panasqueira vein swarm is thus dictated by the geometry of late-orogenic folds, which developed synchronous with oroclinal buckling of the Ibero-Armorican arc.
Lee, J; Kachman, S D; Spangler, M L
2017-08-01
Genomic selection (GS) has become an integral part of genetic evaluation methodology and has been applied to all major livestock species, including beef and dairy cattle, pigs, and chickens. Significant contributions in increased accuracy of selection decisions have been clearly illustrated in dairy cattle after practical application of GS. In the majority of U.S. beef cattle breeds, similar efforts have also been made to increase the accuracy of genetic merit estimates through the inclusion of genomic information into routine genetic evaluations using a variety of methods. However, prediction accuracies can vary relative to panel density, the number of folds used for cross-validation, and the choice of dependent variables (e.g., EBV, deregressed EBV, adjusted phenotypes). The aim of this study was to evaluate the accuracy of genomic predictors for Red Angus beef cattle with different strategies used in training and evaluation. The reference population consisted of 9,776 Red Angus animals whose genotypes were imputed to 2 medium-density panels consisting of over 50,000 (50K) and approximately 80,000 (80K) SNP. Using the imputed panels, we determined the influence of marker density, exclusion (deregressed EPD adjusting for parental information [DEPD-PA]) or inclusion (deregressed EPD without adjusting for parental information [DEPD]) of parental information in the deregressed EPD used as the dependent variable, and the number of clusters used to partition training animals (3, 5, or 10). A BayesC model with π set to 0.99 was used to predict molecular breeding values (MBV) for 13 traits for which EPD existed. The prediction accuracies were measured as genetic correlations between MBV and weighted deregressed EPD. The average accuracies across all traits were 0.540 and 0.552 when using the 50K and 80K SNP panels, respectively, and 0.538, 0.541, and 0.561 when using 3, 5, and 10 folds, respectively, for cross-validation. 
Using DEPD-PA as the response variable resulted in higher accuracies of MBV than those obtained by DEPD for growth and carcass traits. When DEPD were used as the response variable, accuracies were greater for threshold traits and those that are sex limited, likely due to the fact that these traits suffer from a lack of information content and excluding animals in training with only parental information substantially decreases the training population size. It is recommended that the contribution of parental average to deregressed EPD should be removed in the construction of genomic prediction equations. The difference in terms of prediction accuracies between the 2 SNP panels or the number of folds compared herein was negligible.
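The fold-count comparison above follows the generic k-fold pattern. As a minimal illustrative sketch (synthetic genotypes, with ridge regression standing in for the BayesC model; none of this is the study's actual code or data), prediction accuracy can be computed as the correlation between predicted and observed values within each held-out fold:

```python
import numpy as np

def kfold_accuracy(X, y, k=10, ridge=1.0, seed=0):
    """k-fold CV: accuracy = mean correlation(predicted, observed) over folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xtr, ytr = X[train], y[train]
        # Ridge solution w = (X'X + lambda*I)^-1 X'y (a stand-in for BayesC)
        w = np.linalg.solve(Xtr.T @ Xtr + ridge * np.eye(X.shape[1]), Xtr.T @ ytr)
        accs.append(np.corrcoef(X[fold] @ w, y[fold])[0, 1])
    return float(np.mean(accs))

# Synthetic data: 200 animals x 50 SNP markers, a few true effects
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 50)).astype(float)  # 0/1/2 genotypes
beta = np.zeros(50); beta[:5] = 1.0
y = X @ beta + rng.normal(0.0, 1.0, 200)

for k in (3, 5, 10):  # compare fold counts, as in the study design
    print(k, round(kfold_accuracy(X, y, k=k), 3))
```

As in the abstract, varying k typically changes the estimated accuracy only modestly; the scorer and data here are purely illustrative.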
Paik, E Sun; Sohn, Insuk; Baek, Sun-Young; Shim, Minhee; Choi, Hyun Jin; Kim, Tae-Joong; Choi, Chel Hun; Lee, Jeong-Won; Kim, Byoung-Gie; Lee, Yoo-Young; Bae, Duk-Soo
2017-01-01
Purpose This study was conducted to evaluate the prognostic significance of pre-treatment complete blood cell count (CBC), including white blood cell (WBC) differential, in epithelial ovarian cancer (EOC) patients with primary debulking surgery (PDS) and to develop nomograms for platinum sensitivity, progression-free survival (PFS), and overall survival (OS). Materials and Methods We retrospectively reviewed the records of 757 patients with EOC whose primary treatment consisted of surgical debulking and chemotherapy at Samsung Medical Center from 2002 to 2012. We subsequently created nomograms for platinum sensitivity, 3-year PFS, and 5-year OS as prediction models for prognostic variables including age, stage, grade, cancer antigen 125 level, residual disease after PDS, and pre-treatment WBC differential counts. The models were then validated by 10-fold cross-validation (CV). Results In addition to stage and residual disease after PDS, which are known predictors, lymphocyte and monocyte count were found to be significant prognostic factors for platinum-sensitivity, platelet count for PFS, and neutrophil count for OS on multivariate analysis. The area under the curves of platinum sensitivity, 3-year PFS, and 5-year OS calculated by the 10-fold CV procedure were 0.7405, 0.8159, and 0.815, respectively. Conclusion Prognostic factors including pre-treatment CBC were used to develop nomograms for platinum sensitivity, 3-year PFS, and 5-year OS of patients with EOC. These nomograms can be used to better estimate individual outcomes. PMID:27669704
Emotional Sentence Annotation Helps Predict Fiction Genre.
Samothrakis, Spyridon; Fasli, Maria
2015-01-01
Fiction, a prime form of entertainment, has evolved into multiple genres, which one can broadly attribute to different forms of stories. In this paper, we examine the hypothesis that works of fiction can be characterised by the emotions they portray. To investigate this hypothesis, we use works of fiction from Project Gutenberg and attribute basic emotional content to each individual sentence using Ekman's model. A time-smoothed version of the emotional content for each basic emotion is used to train extremely randomized trees. We show through 10-fold cross-validation that the emotional content of each work of fiction can help identify its genre with significantly higher probability than random. We also show that the most important differentiator between genre novels is fear.
Imputing data that are missing at high rates using a boosting algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cauthen, Katherine Regina; Lambert, Gregory; Ray, Jaideep
Traditional multiple imputation approaches may perform poorly for datasets with high rates of missingness unless a large number m of imputations is used. This paper implements an alternative machine learning-based approach to imputing data that are missing at high rates. Here, we use boosting to create a strong learner from a weak learner fitted to a dataset missing many observations. This approach may be applied to a variety of types of learners (models). The approach is demonstrated by application to a spatiotemporal dataset for predicting dengue outbreaks in India from meteorological covariates. A Bayesian spatiotemporal CAR model is boosted to produce imputations, and the overall RMSE from a k-fold cross-validation is used to assess imputation accuracy.
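As a hedged sketch of the general boosting idea only (a stump-based L2 booster on synthetic data, not the Bayesian spatiotemporal CAR model used in the paper), a weak learner can be fitted repeatedly to residuals and the resulting ensemble used to fill in held-out responses, with RMSE as the imputation-accuracy measure:

```python
import numpy as np

def fit_stump(x, y):
    """Weak learner: best single-threshold regression stump on one covariate."""
    best = (np.inf, 0.0, y.mean(), y.mean())
    for t in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # threshold, left value, right value

def boost(x, y, rounds=50, lr=0.1):
    """L2 boosting: each stump is fitted to the current residuals."""
    base = y.mean()
    pred = np.full_like(y, base)
    stumps = []
    for _ in range(rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred = pred + lr * np.where(x <= t, lv, rv)
        stumps.append((t, lv, rv))
    def predict(xnew):
        p = np.full_like(xnew, base, dtype=float)
        for t, lv, rv in stumps:
            p += lr * np.where(xnew <= t, lv, rv)
        return p
    return predict

# Synthetic covariate (e.g., a meteorological driver) and response
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.2, 300)

# "Impute" a held-out block of missing responses; assess by RMSE
miss = rng.choice(300, 60, replace=False)
obs = np.setdiff1d(np.arange(300), miss)
model = boost(x[obs], y[obs])
rmse = float(np.sqrt(np.mean((model(x[miss]) - y[miss])**2)))
print("imputation RMSE:", round(rmse, 3))
```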
McBride, Matthew K; Podgorski, Maciej; Chatani, Shunsuke; Worrell, Brady T; Bowman, Christopher N
2018-06-21
Ductile, cross-linked films were folded as a means to program temporary shapes without the need for complex heating cycles or specialized equipment. Certain cross-linked polymer networks, formed here with the thiol-isocyanate reaction, possessed the ability to be pseudoplastically deformed below the glass transition, and the original shape was recovered during heating through the glass transition. To circumvent the large forces required to plastically deform a glassy polymer network, we have utilized folding, which localizes the deformation in small creases, and achieved large dimensional changes with simple programming procedures. In addition to dimension changes, three-dimensional objects such as swans and airplanes were developed to demonstrate the application of origami principles to shape memory. We explored the fundamental mechanical properties that are required to fold polymer sheets and observed that a yield point that does not correspond to catastrophic failure is required. Unfolding occurred during heating through the glass transition, indicating that vitrification of the network had maintained the temporary, folded shape. Folding was demonstrated as a powerful tool to simply and effectively program ductile shape-memory polymers without the need for thermal cycling.
Baker, Matthew L.; Hryc, Corey F.; Zhang, Qinfen; Wu, Weimin; Jakana, Joanita; Haase-Pettingell, Cameron; Afonine, Pavel V.; Adams, Paul D.; King, Jonathan A.; Jiang, Wen; Chiu, Wah
2013-01-01
High-resolution structures of viruses have made important contributions to modern structural biology. Bacteriophages, the most diverse and abundant organisms on earth, replicate and infect all bacteria and archaea, making them excellent potential alternatives to antibiotics and therapies for multidrug-resistant bacteria. Here, we improved upon our previous electron cryomicroscopy structure of Salmonella bacteriophage epsilon15, achieving a resolution sufficient to determine the tertiary structures of both gp7 and gp10 protein subunits that form the T = 7 icosahedral lattice. This study utilizes recently established best practice for near-atomic to high-resolution (3–5 Å) electron cryomicroscopy data evaluation. The resolution and reliability of the density map were cross-validated by multiple reconstructions from truly independent data sets, whereas the models of the individual protein subunits were validated adopting the best practices from X-ray crystallography. Some sidechain densities are clearly resolved and show the subunit–subunit interactions within and across the capsomeres that are required to stabilize the virus. The presence of the canonical phage and jellyroll viral protein folds, gp7 and gp10, respectively, in the same virus suggests that epsilon15 may have emerged more recently relative to other bacteriophages. PMID:23840063
Fernandez-Miranda, Juan C; Pathak, Sudhir; Engh, Johnathan; Jarbo, Kevin; Verstynen, Timothy; Yeh, Fang-Cheng; Wang, Yibao; Mintz, Arlan; Boada, Fernando; Schneider, Walter; Friedlander, Robert
2012-08-01
High-definition fiber tracking (HDFT) is a novel combination of processing, reconstruction, and tractography methods that can track white matter fibers from cortex, through complex fiber crossings, to cortical and subcortical targets with subvoxel resolution. To perform neuroanatomical validation of HDFT and to investigate its neurosurgical applications. Six neurologically healthy adults and 36 patients with brain lesions were studied. Diffusion spectrum imaging data were reconstructed with a Generalized Q-Ball Imaging approach. Fiber dissection studies were performed in 20 human brains, and selected dissection results were compared with tractography. HDFT provides accurate replication of known neuroanatomical features such as the gyral and sulcal folding patterns, the characteristic shape of the claustrum, the segmentation of the thalamic nuclei, the decussation of the superior cerebellar peduncle, the multiple fiber crossing at the centrum semiovale, the complex angulation of the optic radiations, the terminal arborization of the arcuate tract, and the cortical segmentation of the dorsal Broca area. From a clinical perspective, we show that HDFT provides accurate structural connectivity studies in patients with intracerebral lesions, allowing qualitative and quantitative white matter damage assessment, aiding in understanding lesional patterns of white matter structural injury, and facilitating innovative neurosurgical applications. High-grade gliomas produce significant disruption of fibers, and low-grade gliomas cause fiber displacement. Cavernomas cause both displacement and disruption of fibers. Our HDFT approach provides an accurate reconstruction of white matter fiber tracts with unprecedented detail in both the normal and pathological human brain. Further studies to validate the clinical findings are needed.
Bridging a translational gap: using machine learning to improve the prediction of PTSD.
Karstoft, Karen-Inge; Galatzer-Levy, Isaac R; Statnikov, Alexander; Li, Zhiguo; Shalev, Arieh Y
2015-03-16
Predicting Posttraumatic Stress Disorder (PTSD) is a pre-requisite for targeted prevention. Current research has identified group-level risk-indicators, many of which (e.g., head trauma, receiving opiates) concern but a subset of survivors. Identifying interchangeable sets of risk indicators may increase the efficiency of early risk assessment. The study goal is to use supervised machine learning (ML) to uncover interchangeable, maximally predictive combinations of early risk indicators. Data variables (features) reflecting event characteristics, emergency department (ED) records and early symptoms were collected in 957 trauma survivors within ten days of ED admission, and used to predict PTSD symptom trajectories during the following fifteen months. A Target Information Equivalence Algorithm (TIE*) identified all minimal sets of features (Markov Boundaries; MBs) that maximized the prediction of a non-remitting PTSD symptom trajectory when integrated in a support vector machine (SVM). The predictive accuracy of each set of predictors was evaluated in a repeated 10-fold cross-validation and expressed as average area under the Receiver Operating Characteristics curve (AUC) for all validation trials. The average number of MBs per cross validation was 800. MBs' mean AUC was 0.75 (95% range: 0.67-0.80). The average number of features per MB was 18 (range: 12-32) with 13 features present in over 75% of the sets. Our findings support the hypothesized existence of multiple and interchangeable sets of risk indicators that equally and exhaustively predict non-remitting PTSD. ML's ability to increase prediction versatility is a promising step towards developing algorithmic, knowledge-based, personalized prediction of post-traumatic psychopathology.
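The repeated 10-fold AUC evaluation described above can be sketched generically. The minimal example below (synthetic features, with a simple class-mean-difference scorer standing in for the SVM; all data and names are illustrative) computes the rank-based AUC and averages it over repeated fold splits:

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def repeated_cv_auc(X, y, k=10, repeats=5, seed=0):
    """Average AUC over `repeats` independent k-fold partitions."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(repeats):
        idx = rng.permutation(len(y))
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            # Trivial linear scorer: direction of the class-mean difference
            w = X[train][y[train] == 1].mean(0) - X[train][y[train] == 0].mean(0)
            aucs.append(auc(X[fold] @ w, y[fold]))
    return float(np.mean(aucs))

rng = np.random.default_rng(2)
n, p = 400, 20
y = rng.integers(0, 2, n)
X = rng.normal(0, 1, (n, p)) + 0.7 * y[:, None]  # weakly informative features
print(round(repeated_cv_auc(X, y), 3))
```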
NASA Astrophysics Data System (ADS)
Shu, Liangshu; Yin, Hongwei; Faure, Michel; Chen, Yan
2017-06-01
The Xu-Huai thrust-and-fold belt, located on the southeastern margin of the North China Block, consists mainly of thrust and folded pre-Mesozoic strata. Its geodynamic evolution and tectonic setting have long been debated. This paper provides new evidence from geological mapping, structural analysis, and the construction and restoration of balanced cross-sections. Results suggest that this belt was subjected to two-phase deformation: an early-phase regional-scale NW-ward thrusting and folding, and a late-phase extension followed by the emplacement of dioritic and monzodioritic porphyrites dated at 131-135 Ma and locally by strike-slip shearing. According to the mapping, field observations and drill-hole data, three structural units were distinguished: (1) the pre-Neoproterozoic crystalline basement in the eastern segment; (2) the nappe unit, or thrust-and-fold zone, in the central segment, composed of Neoproterozoic to Ordovician carbonate rocks and Carboniferous-Permian coal-bearing rocks, about 2600 m thick; and (3) the western frontal zone. A major decollement fault has also been identified at the base of the nappe unit, on which thrust-and-fold bodies from tens of meters to kilometers in scale commonly developed. All pre-Mesozoic depositional sequences were involved in a widespread thrusting and folding event. Six incompetent-rock layers with biostratigraphic ages (Nanjing University, 1996) have been recognized, each occurring mainly at the top of the footwall and playing an important role in the development of the Xu-Huai thrust-and-fold belt. The geometry of the major decollement fault suggests that the nappe unit of this belt was rooted on its eastern side, near the Tan-Lu Fault Zone. Two geological cross-sections were chosen for structural balancing and restoration. From the balanced cross-sections, ramp-flat and imbricated faults as well as fault-related folds were identified.
A shortening of 20.6-29.6 km was obtained from restoration of balanced sections, corresponding to a shortening rate of 43.6-46.4%. This shortening deformation was likely related to the SE-ward intracontinental underthrust of the North China Block beneath the South China Block during the Mesozoic.
NASA Astrophysics Data System (ADS)
Coughlan, Carolyn A.; Chou, Li-Dek; Jing, Joseph C.; Chen, Jason J.; Rangarajan, Swathi; Chang, Theodore H.; Sharma, Giriraj K.; Cho, Kyoungrai; Lee, Donghoon; Goddard, Julie A.; Chen, Zhongping; Wong, Brian J. F.
2016-03-01
Diagnosis and treatment of vocal fold lesions has been a long-evolving science for the otolaryngologist. Contemporary practice requires biopsy of a glottal lesion in the operating room under general anesthesia for diagnosis. Current in-office technology is limited to visualizing the surface of the vocal folds with fiber-optic or rigid endoscopy and using stroboscopic or high-speed video to infer information about submucosal processes. Previous efforts using optical coherence tomography (OCT) have been limited by small working distances and imaging ranges. Here we report the first full field, high-speed, and long-range OCT images of awake patients’ vocal folds as well as cross-sectional video and Doppler analysis of their vocal fold motions during phonation. These vertical-cavity surface-emitting laser source (VCSEL) OCT images offer depth resolved, high-resolution, high-speed, and panoramic images of both the true and false vocal folds. This technology has the potential to revolutionize in-office imaging of the larynx.
The PDB_REDO server for macromolecular structure model optimization.
Joosten, Robbie P; Long, Fei; Murshudov, Garib N; Perrakis, Anastassis
2014-07-01
The refinement and validation of a crystallographic structure model is the last step before the coordinates and the associated data are submitted to the Protein Data Bank (PDB). The success of the refinement procedure is typically assessed by validating the models against geometrical criteria and the diffraction data, and is an important step in ensuring the quality of the PDB public archive [Read et al. (2011), Structure, 19, 1395-1412]. The PDB_REDO procedure aims for 'constructive validation', aspiring to consistent and optimal refinement parameterization and pro-active model rebuilding, not only correcting errors but striving for optimal interpretation of the electron density. A web server for PDB_REDO has been implemented, allowing thorough, consistent and fully automated optimization of the refinement procedure in REFMAC and partial model rebuilding. The goal of the web server is to help practicing crystallographers to improve their model prior to submission to the PDB. For this, additional steps were implemented in the PDB_REDO pipeline, both in the refinement procedure, e.g. testing of resolution limits and k-fold cross-validation for small test sets, and as new validation criteria, e.g. the density-fit metrics implemented in EDSTATS and ligand validation as implemented in YASARA. Innovative ways to present the refinement and validation results to the user are also described, which together with auto-generated Coot scripts can guide users to subsequent model inspection and improvement. It is demonstrated that using the server can lead to substantial improvement of structure models before they are submitted to the PDB.
Assessing a Top-Down Modeling Approach for Seasonal Scale Snow Sensitivity
NASA Astrophysics Data System (ADS)
Luce, C. H.; Lute, A.
2017-12-01
Mechanistic snow models are commonly applied to assess changes to snowpacks in a warming climate. Such assessments involve a number of assumptions about details of weather at daily to sub-seasonal time scales. Models of season-scale behavior can provide contrast for evaluating behavior at time scales more in concordance with climate warming projections. Such top-down models, however, involve a degree of empiricism, with attendant caveats about the potential of a changing climate to affect calibrated relationships. We estimated the sensitivity of snowpacks from 497 Snowpack Telemetry (SNOTEL) stations in the western U.S. based on differences in climate between stations (spatial analog). We examined the sensitivity of April 1 snow water equivalent (SWE) and mean snow residence time (SRT) to variations in Nov-Mar precipitation and average Nov-Mar temperature using multivariate local-fit regressions. We tested the modeling approach using a leave-one-out cross-validation as well as targeted two-fold non-random cross-validations contrasting, for example, warm vs. cold years, dry vs. wet years, and north vs. south stations. Nash-Sutcliffe Efficiency (NSE) values for the validations were strong for April 1 SWE, ranging from 0.71 to 0.90, and still reasonable, but weaker, for SRT, in the range of 0.64 to 0.81. From these ranges, we exclude validations where the training data do not represent the range of target data. A likely reason for differences in validation between the two metrics is that the SWE model reflects the influence of conservation of mass while using temperature as an indicator of the season-scale energy balance; in contrast, SRT depends more strongly on the energy balance aspects of the problem. Model forms with lower numbers of parameters generally validated better than more complex model forms, with the caveat that pseudoreplication could encourage selection of more complex models when validation contrasts were weak. 
Overall, the split sample validations confirm transferability of the relationships in space and time contingent upon full representation of validation conditions in the calibration data set. The ability of the top-down space-for-time models to predict in new time periods and locations lends confidence to their application for assessments and for improving finer time scale models.
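The Nash-Sutcliffe Efficiency and the targeted two-fold split used above have simple generic forms. The sketch below (synthetic station data and a plain linear fit in place of the multivariate local-fit regressions; every value is illustrative) trains on cold stations and validates on warm ones, mirroring a "warm vs. cold" contrast:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of the observations."""
    return 1.0 - np.sum((obs - sim)**2) / np.sum((obs - obs.mean())**2)

def fit_predict(Ttr, Ptr, ytr, Tte, Pte):
    """Plain least-squares fit in temperature and precipitation."""
    A = np.column_stack([np.ones_like(Ttr), Ttr, Ptr])
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return np.column_stack([np.ones_like(Tte), Tte, Pte]) @ coef

# Synthetic stations: SWE declines with temperature, rises with precipitation
rng = np.random.default_rng(3)
T = rng.uniform(-8, 4, 300)          # Nov-Mar mean temperature (deg C)
P = rng.uniform(200, 1200, 300)      # Nov-Mar precipitation (mm)
swe = 0.6 * P - 40.0 * T + rng.normal(0, 50, 300)

# Targeted two-fold split: train on cold stations, validate on warm ones
cold, warm = T < np.median(T), T >= np.median(T)
pred_warm = fit_predict(T[cold], P[cold], swe[cold], T[warm], P[warm])
print("warm-station NSE:", round(nse(swe[warm], pred_warm), 3))
```

In this synthetic setting the relationship is truly linear, so the targeted split validates well; as the abstract notes, real validations degrade when the training data do not span the target conditions.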
Che, Wunan; Huang, Jianlei; Guan, Fang; Wu, Yidong; Yang, Yihua
2015-08-01
Beet armyworm, Spodoptera exigua (Hübner), is a worldwide pest of many crops. Chemical insecticides are heavily used for its control in China, and serious field resistance has evolved to a variety of insecticides, including emamectin benzoate. Through repeated backcrossing to a susceptible strain (WH-S) and selection with emamectin benzoate, the trait conferring resistance to emamectin benzoate in a field-collected population of S. exigua (moderately resistant to emamectin benzoate and strongly resistant to pyrethroids and indoxacarb) was introgressed into WH-S to generate a near-isogenic resistant strain (WH-EB). Compared with WH-S, the WH-EB strain developed a 1,110-fold resistance to emamectin benzoate and a high level of cross-resistance to abamectin (202-fold), with low levels of cross-resistance to cypermethrin (10-fold) and chlorfluazuron (7-fold), but no cross-resistance to representatives of another six different classes of insecticides (chlorantraniliprole, chlorfenapyr, indoxacarb, spinosad, tebufenozide, and chlorpyrifos). Resistance to emamectin benzoate in WH-EB was autosomal, incompletely dominant, and polygenic. Limited cross-resistance in WH-EB indicates that emamectin benzoate can be rotated with other classes of insecticides to which it does not show cross-resistance to delay the evolution of resistance in S. exigua. The incompletely dominant nature of resistance in S. exigua may explain the rapid evolution of resistance to emamectin benzoate in the field, and careful deployment of this chemical within a resistance management program should be considered. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Effect of Replacing Race with Apolipoprotein L1 Genotype in Calculation of Kidney Donor Risk Index
Julian, B. A.; Gaston, R. S.; Brown, W. M.; Reeves-Daniel, A. M.; Israni, A. K.; Schladt, D. P.; Pastan, S. O.; Mohan, S.; Freedman, B. I.; Divers, J.
2016-01-01
Renal allografts from deceased African Americans with two apolipoprotein L1 gene (APOL1) renal-risk variants fail sooner than kidneys from donors with fewer variants. The Kidney Donor Risk Index (KDRI) was developed to evaluate organ offers by predicting allograft longevity and includes African American race as a risk factor. Substituting APOL1 genotype for race may refine the KDRI. For 622 deceased African American kidney donors, we applied a 10-fold cross-validation approach to estimate the contribution of APOL1 variants to a revised KDRI. Cross-validation was repeated 10,000 times to generate a distribution of the effect size associated with APOL1 genotype. The average effect size was used to derive the revised KDRI weighting. The mean current-KDRI score for all donors was 1.4930, versus mean revised-KDRI scores of 1.2518 for the 529 donors with 0/1 variants and 1.8527 for the 93 donors with 2 variants. The original and revised KDRIs had comparable survival prediction errors after transplantation, but the spread in the Kidney Donor Profile Index based on the presence or absence of 2 APOL1 variants was 37 percentage points. Replacing donor race with APOL1 genotype in the KDRI better defines the risk associated with kidneys transplanted from deceased African American donors, substantially improves the KDRI score for 85-90% of kidneys offered, and enhances the link between donor quality and recipient need. PMID:27862962
Evaluation of polygenic risk scores for predicting breast and prostate cancer risk.
Machiela, Mitchell J; Chen, Chia-Yen; Chen, Constance; Chanock, Stephen J; Hunter, David J; Kraft, Peter
2011-09-01
Recently, polygenic risk scores (PRS) have been shown to be associated with certain complex diseases. The approach is based on counting multiple alleles associated with disease across independent loci, without requiring compelling evidence that every locus has already achieved definitive genome-wide statistical significance. Whether PRS assist in the prediction of risk of common cancers is unknown. We built PRS from lists of genetic markers prioritized by their association with breast cancer (BCa) or prostate cancer (PCa) in a training data set and evaluated whether these scores could improve current genetic prediction of these specific cancers in independent test samples. We used genome-wide association data on 1,145 BCa cases and 1,142 controls from the Nurses' Health Study and 1,164 PCa cases and 1,113 controls from the Prostate Lung Colorectal and Ovarian Cancer Screening Trial. Ten-fold cross-validation was used to build and evaluate PRS with 10 to 60,000 independent single nucleotide polymorphisms (SNPs). For both BCa and PCa, the models that included only published risk alleles maximized the cross-validation estimate of the area under the ROC curve (0.53 for breast and 0.57 for prostate). We found no significant evidence that PRS using common variants improved risk prediction for BCa and PCa over replicated SNP scores. © 2011 Wiley-Liss, Inc.
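A polygenic risk score of the kind evaluated above is, at its core, a weighted count of risk alleles. The sketch below uses hypothetical per-SNP log odds-ratio weights and simulated genotypes (not the study's data; the enrichment scheme is purely illustrative) to show the case-control score separation such a score produces:

```python
import numpy as np

def prs(genotypes, weights):
    """Polygenic risk score: weighted count of risk alleles per individual."""
    return genotypes @ weights

rng = np.random.default_rng(4)
n_snps = 100
log_or = rng.normal(0, 0.05, n_snps)   # per-allele log odds-ratio weights
log_or[:10] += 0.3                     # assume 10 genuine risk loci

# Hypothetical 0/1/2 genotypes; cases are enriched for risk alleles
controls = rng.binomial(2, 0.3, (500, n_snps)).astype(float)
cases = rng.binomial(2, 0.3, (500, n_snps)).astype(float)
cases[:, :10] = np.minimum(cases[:, :10] + rng.binomial(1, 0.25, (500, 10)), 2)

gap = float(prs(cases, log_or).mean() - prs(controls, log_or).mean())
print("mean case-control PRS gap:", round(gap, 3))
```

In practice, as the abstract reports, such scores may separate cases from controls only weakly (AUC near 0.5) when most included variants carry little signal.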
Pedersen, Nicklas Juel; Jensen, David Hebbelstrup; Lelkaitis, Giedrius; Kiss, Katalin; Charabi, Birgitte; Specht, Lena; von Buchwald, Christian
2017-01-01
It is challenging to identify at diagnosis those patients with early oral squamous cell carcinoma (OSCC) who have a poor prognosis and those who have a high risk of harboring occult lymph node metastases. The aim of this study was to develop a standardized and objective digital scoring method to evaluate the predictive value of tumor budding. We developed a semi-automated image-analysis algorithm, Digital Tumor Bud Count (DTBC), to evaluate tumor budding. The algorithm was tested in 222 consecutive patients with early-stage OSCC, and the major endpoints were overall survival (OS) and progression-free survival (PFS). We subsequently constructed and cross-validated a binary logistic regression model and evaluated its clinical utility by decision curve analysis. A high DTBC was an independent predictor of both poor OS and PFS in a multivariate Cox regression model. The logistic regression model was able to identify patients with occult lymph node metastases with an area under the curve (AUC) of 0.83 (95% CI: 0.78–0.89, P <0.001) and a 10-fold cross-validated AUC of 0.79. Compared to other known histopathological risk factors, the DTBC had a higher diagnostic accuracy. The proposed, novel risk model could be used as a guide to identify patients who would benefit from an up-front neck dissection. PMID:28212555
The prediction of palmitoylation site locations using a multiple feature extraction method.
Shi, Shao-Ping; Sun, Xing-Yu; Qiu, Jian-Ding; Suo, Sheng-Bao; Chen, Xiang; Huang, Shu-Yun; Liang, Ru-Ping
2013-03-01
As an extremely important and ubiquitous post-translational lipid modification, palmitoylation plays a significant role in a variety of biological and physiological processes. Unlike other lipid modifications, protein palmitoylation and depalmitoylation are highly dynamic and can regulate both protein function and localization. The dynamic nature of palmitoylation is poorly understood because of the limitations in current assay methods. The in vivo or in vitro experimental identification of palmitoylation sites is both time consuming and expensive. Due to the large volume of protein sequences generated in the post-genomic era, it is extraordinarily important in both basic research and drug discovery to rapidly identify the attributes of a new protein's palmitoylation sites. In this work, a new computational method, WAP-Palm, combining multiple feature extraction, has been developed to predict the palmitoylation sites of proteins. The performance of the WAP-Palm model is measured herein and was found to have a sensitivity of 81.53%, a specificity of 90.45%, an accuracy of 85.99% and a Matthews correlation coefficient of 72.26% in 10-fold cross-validation test. The results obtained from both the cross-validation and independent tests suggest that the WAP-Palm model might facilitate the identification and annotation of protein palmitoylation locations. The online service is available at http://bioinfo.ncu.edu.cn/WAP-Palm.aspx. Copyright © 2013 Elsevier Inc. All rights reserved.
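The performance figures reported above follow from the standard confusion-matrix definitions. The small sketch below uses hypothetical counts (chosen only to roughly mirror the reported rates, not taken from the paper) to compute sensitivity, specificity, accuracy, and the Matthews correlation coefficient:

```python
import math

def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts."""
    sens = tp / (tp + fn)                      # sensitivity (recall)
    spec = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)      # accuracy
    mcc = (tp * tn - fp * fn) / math.sqrt(     # Matthews correlation coefficient
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, acc, mcc

# Hypothetical counts for a balanced 2,000-site evaluation
sens, spec, acc, mcc = metrics(tp=815, fp=95, tn=905, fn=185)
print(f"Sn={sens:.4f} Sp={spec:.4f} Acc={acc:.4f} MCC={mcc:.4f}")
```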
Moghaddar, N; van der Werf, J H J
2017-12-01
The objectives of this study were to estimate the additive and dominance variance components of several weight and ultrasound-scanned body composition traits in purebred and combined cross-bred sheep populations based on single nucleotide polymorphism (SNP) marker genotypes and then to investigate the effect of fitting additive and dominance effects on the accuracy of genomic evaluation. Additive and dominance variance components were estimated in a mixed model equation based on "average information restricted maximum likelihood" using additive and dominance (co)variances between animals calculated from 48,599 SNP marker genotypes. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of prediction was assessed based on a random 10-fold cross-validation. Across different weight and scanned body composition traits, dominance variance ranged from 0.0% to 7.3% of the phenotypic variance in the purebred population and from 7.1% to 19.2% in the combined cross-bred population. In the combined cross-bred population, the range of dominance variance decreased to 3.1% and 9.9% after accounting for heterosis effects. Accounting for dominance effects significantly improved the likelihood of the fitting model in the combined cross-bred population. This study showed a substantial dominance genetic variance for weight and ultrasound-scanned body composition traits, particularly in the cross-bred population; however, the improvement in the accuracy of genomic breeding values was small and statistically not significant. Dominance variance estimates in the combined cross-bred population could be overestimated if heterosis is not fitted in the model. © 2017 Blackwell Verlag GmbH.
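GBLUP rests on a genomic relationship matrix built from SNP genotypes. The sketch below constructs the widely used VanRaden form, G = ZZ'/(2 Σ p(1-p)), from simulated 0/1/2 genotypes; this is a standard textbook construction offered for illustration, not a reproduction of the study's software or model:

```python
import numpy as np

def grm(M):
    """VanRaden genomic relationship matrix from an (animals x SNPs) 0/1/2 matrix."""
    p = M.mean(axis=0) / 2.0                       # estimated allele frequencies
    Z = M - 2.0 * p                                # center by expected allele count
    return (Z @ Z.T) / (2.0 * np.sum(p * (1.0 - p)))

# Simulated genotypes: 50 unrelated animals, 1,000 SNPs at frequency 0.4
rng = np.random.default_rng(5)
M = rng.binomial(2, 0.4, size=(50, 1000)).astype(float)
G = grm(M)
print(G.shape, round(float(np.mean(np.diag(G))), 2))
```

For unrelated individuals in Hardy-Weinberg equilibrium the diagonal of G averages close to 1, which is a quick sanity check on the construction.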
Three dimensional simulations of viscous folding in diverging microchannels
NASA Astrophysics Data System (ADS)
Xu, Bingrui; Chergui, Jalel; Shin, Seungwon; Juric, Damir
2016-11-01
Three-dimensional simulations of the viscous folding in diverging microchannels reported by Cubaud and Mason are performed using the parallel code BLUE for multi-phase flows. The more viscous liquid L1 is injected into the channel from the center inlet, and the less viscous liquid L2 from two side inlets. Liquid L1 takes the form of a thin filament due to hydrodynamic focusing in the long channel that leads to the diverging region. The thread then becomes unstable to a folding instability, due to the longitudinal compressive stress applied to it by the diverging flow of liquid L2. We performed a parameter study in which the flow rate ratio, the viscosity ratio, the Reynolds number, and the shape of the channel were varied relative to a reference model. In our simulations, the cross section of the thread produced by focusing is elliptical rather than circular. The initial folding axis can be either parallel or perpendicular to the narrow dimension of the chamber. In the former case, the folding slowly transforms via twisting to perpendicular folding, or it may remain parallel. The direction of folding onset is determined by the velocity profile and the elliptical shape of the thread cross section in the channel that feeds the diverging part of the cell.
A Genome-Wide Knockout Screen to Identify Genes Involved in Acquired Carboplatin Resistance
2016-07-01
a GeCKOv2 library screen to identify genes that when knocked out render human ovarian cells > 2.5-fold resistant to CBDCA; 2) validate the ability of...resistance in either cell lines or clinical samples. The CRISPR-Cas9 technology now provides us with a major new tool to introduce knockout mutations
Ligand-promoted protein folding by biased kinetic partitioning.
Hingorani, Karan S; Metcalf, Matthew C; Deming, Derrick T; Garman, Scott C; Powers, Evan T; Gierasch, Lila M
2017-04-01
Protein folding in cells occurs in the presence of high concentrations of endogenous binding partners, and exogenous binding partners have been exploited as pharmacological chaperones. A combined mathematical modeling and experimental approach shows that a ligand improves the folding of a destabilized protein by biasing the kinetic partitioning between folding and alternative fates (aggregation or degradation). Computationally predicted inhibition of test protein aggregation and degradation as a function of ligand concentration are validated by experiments in two disparate cellular systems.
Ligand-Promoted Protein Folding by Biased Kinetic Partitioning
Hingorani, Karan S.; Metcalf, Matthew C.; Deming, Derrick T.; Garman, Scott C.; Powers, Evan T.; Gierasch, Lila M.
2017-01-01
Protein folding in cells occurs in the presence of high concentrations of endogenous binding partners, and exogenous binding partners have been exploited as pharmacological chaperones. A combined mathematical modeling and experimental approach shows that a ligand improves the folding of a destabilized protein by biasing the kinetic partitioning between folding and alternative fates (aggregation or degradation). Computationally predicted inhibition of test protein aggregation and degradation as a function of ligand concentration are validated by experiments in two disparate cellular systems. PMID:28218913
Mao, Xu-lian; Liu, Jin; Li, Xu-ke; Chi, Jia-jia; Liu, Yong-jie
2016-01-01
In order to investigate the development of resistance and the biochemical resistance mechanism of Laodelphax striatellus to buprofezin, rice-seedling spraying was used to continuously screen resistant strains of L. striatellus, and rice-seedling dipping was applied to determine the toxicity and cross-resistance of L. striatellus to insecticides. After 32 generations of screening with buprofezin, L. striatellus developed 168.49-fold resistance, with a realized heritability (h²) of 0.11. At a killing rate of 80%-90%, L. striatellus would be expected to develop 10-fold resistance to buprofezin after only 5 to 6 generations of breeding. Because the realized heritability of field populations is usually lower than that of laboratory-selected strains, field populations would need considerably longer to develop 10-fold resistance. The cross-resistance results showed that the resistant strain had high-level cross-resistance to thiamethoxam and imidacloprid, low-level cross-resistance to acetamiprid, and no cross-resistance to pymetrozine or chlorpyrifos. The activities of detoxification enzymes in different strains and the synergism of synergists were measured. The results showed that cytochrome P450 monooxygenase played a major role in the resistance of L. striatellus to buprofezin, esterase played a minor role, and glutathione S-transferase had no effect. Therefore, L. striatellus carries a high risk of developing resistance to buprofezin when it is used in the field, and resistance development might be delayed by using pymetrozine and chlorpyrifos.
Pattin, Kristine A.; White, Bill C.; Barney, Nate; Gui, Jiang; Nelson, Heather H.; Kelsey, Karl R.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H.
2008-01-01
Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free data mining method for detecting, characterizing, and interpreting epistasis in the absence of significant main effects in genetic and epidemiologic studies of complex traits such as disease susceptibility. The goal of MDR is to change the representation of the data using a constructive induction algorithm to make nonadditive interactions easier to detect using any classification method such as naïve Bayes or logistic regression. Traditionally, MDR-constructed variables have been evaluated with a naïve Bayes classifier combined with 10-fold cross-validation to obtain an estimate of predictive accuracy or generalizability of epistasis models, and permutation testing has been used to statistically evaluate the significance of models obtained through MDR. The advantage of permutation testing is that it controls for false positives due to multiple testing. The disadvantage is that permutation testing is computationally expensive. This is an important issue that arises in the context of detecting epistasis on a genome-wide scale. The goal of the present study was to develop and evaluate several alternatives to large-scale permutation testing for assessing the statistical significance of MDR models. Using data simulated from 70 different epistasis models, we compared the power and type I error rate of MDR using a 1000-fold permutation test with hypothesis testing using an extreme value distribution (EVD). We find that this new hypothesis testing method provides a reasonable alternative to the computationally expensive 1000-fold permutation test and is 50 times faster. We then demonstrate this new method by applying it to a genetic epidemiology study of bladder cancer susceptibility that was previously analyzed using MDR and assessed using a 1000-fold permutation test. PMID:18671250
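The permutation-testing procedure the study seeks to replace can be sketched generically: shuffle the class labels to destroy any true label-prediction association, recompute the accuracy, and count how often the shuffled accuracy matches or beats the observed one. A minimal illustrative sketch (not the MDR implementation; the data below are made up):

```python
import random

def permutation_p_value(labels, predictions, n_perm=1000, seed=0):
    """Estimate the p-value of an observed classification accuracy by
    permuting the labels and counting permuted accuracies >= observed."""
    rng = random.Random(seed)
    n = len(labels)
    observed = sum(l == p for l, p in zip(labels, predictions)) / n
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        perm_acc = sum(l == p for l, p in zip(shuffled, predictions)) / n
        if perm_acc >= observed:
            at_least += 1
    # add-one correction keeps the estimate strictly positive
    return (at_least + 1) / (n_perm + 1)

# A strongly predictive classifier on 40 balanced cases (38/40 correct)
# should yield a small p-value; chance-level accuracy would not.
labels = [0] * 20 + [1] * 20
predictions = labels[:38] + [1 - l for l in labels[38:]]
p = permutation_p_value(labels, predictions)
```

The EVD alternative described in the abstract replaces the 1000 shuffles with a parametric fit to a handful of permuted accuracies, which is where the reported 50-fold speedup comes from.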
Predicting introductory programming performance: A multi-institutional multivariate study
NASA Astrophysics Data System (ADS)
Bergin, Susan; Reilly, Ronan
2006-12-01
A model for predicting student performance on introductory programming modules is presented. The model uses attributes identified in a study carried out at four third-level institutions in the Republic of Ireland. Four instruments were used to collect the data and over 25 attributes were examined. A data reduction technique was applied and a logistic regression model using 10-fold stratified cross validation was developed. The model used three attributes: Leaving Certificate Mathematics result (final mathematics examination at second level), number of hours playing computer games while taking the module and programming self-esteem. Prediction success was significant with 80% of students correctly classified. The model also works well on a per-institution level. A discussion on the implications of the model is provided and future work is outlined.
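Stratified 10-fold cross-validation, as used for the logistic regression model above, assigns students to folds so that each fold preserves the overall class proportions. A minimal pure-Python sketch of the stratified splitting step (hypothetical pass/fail labels; not the authors' code):

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds so that every fold keeps
    roughly the overall class proportions (stratification)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)           # randomize within each class
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal class members round-robin
    return folds

# 80 "pass" / 20 "fail" students: each of the 10 folds gets 8 pass, 2 fail.
labels = ["pass"] * 80 + ["fail"] * 20
folds = stratified_folds(labels, k=10)
```

Each fold is then held out in turn while the classifier is fit on the rest, so the reported 80% classification rate is averaged over folds with representative class balance.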
RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest.
Ismail, Hamid D; Jones, Ahoi; Kim, Jung H; Newman, Robert H; Kc, Dukka B
2016-01-01
Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.
Emotional Sentence Annotation Helps Predict Fiction Genre
Samothrakis, Spyridon; Fasli, Maria
2015-01-01
Fiction, a prime form of entertainment, has evolved into multiple genres which one can broadly attribute to different forms of stories. In this paper, we examine the hypothesis that works of fiction can be characterised by the emotions they portray. To investigate this hypothesis, we use the work of fictions in the Project Gutenberg and we attribute basic emotional content to each individual sentence using Ekman’s model. A time-smoothed version of the emotional content for each basic emotion is used to train extremely randomized trees. We show through 10-fold Cross-Validation that the emotional content of each work of fiction can help identify each genre with significantly higher probability than random. We also show that the most important differentiator between genre novels is fear. PMID:26524352
Gitifar, Vahid; Eslamloueyan, Reza; Sarshar, Mohammad
2013-11-01
In this study, pretreatment of sugarcane bagasse and subsequent enzymatic hydrolysis is investigated using two categories of pretreatment methods: dilute acid (DA) pretreatment and a combined DA-ozonolysis (DAO) method. Both methods are applied at different solid ratios, sulfuric acid concentrations, autoclave residence times, bagasse moisture contents, and ozonolysis times. The results show that the DAO pretreatment can significantly increase the production of glucose compared to the DA method. Applying the k-fold cross-validation method, two optimal artificial neural networks (ANNs) are trained to estimate glucose concentrations for the DA and DAO pretreatment methods. Comparing the modeling results with experimental data indicates that the proposed ANNs have good estimation abilities. Copyright © 2013 Elsevier Ltd. All rights reserved.
A Comparison of Artificial Intelligence Methods on Determining Coronary Artery Disease
NASA Astrophysics Data System (ADS)
Babaoğlu, Ismail; Baykan, Ömer Kaan; Aygül, Nazif; Özdemir, Kurtuluş; Bayrak, Mehmet
The aim of this study is to compare a multi-layered perceptron neural network (MLPNN) and a support vector machine (SVM) for determining the existence of coronary artery disease from exercise stress testing (EST) data. EST and coronary angiography were performed on 480 patients, with 23 verifying features acquired from each. The robustness of the proposed methods is examined using classification accuracy, the k-fold cross-validation method and Cohen's kappa coefficient. The obtained classification accuracies are approximately 78% and 79% for MLPNN and SVM, respectively. Judging by Cohen's kappa coefficients, both the MLPNN and SVM methods are more satisfactory than the human-based method. Moreover, SVM is slightly better than MLPNN in terms of diagnostic accuracy, the average of sensitivity and specificity, and Cohen's kappa coefficient.
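Cohen's kappa, used above to compare the classifiers against the human-based method, corrects raw agreement for the agreement expected by chance. A small illustrative implementation:

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement between two label sequences,
    corrected for the agreement expected by chance from the marginals."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_exp = sum(
        (sum(t == c for t in y_true) / n) * (sum(p == c for p in y_pred) / n)
        for c in classes
    )
    return (p_obs - p_exp) / (1 - p_exp)

# Perfect agreement gives kappa = 1.
k_perfect = cohens_kappa([1, 0, 1, 0], [1, 0, 1, 0])
# Agreement no better than chance gives kappa = 0.
k_chance = cohens_kappa([1, 1, 0, 0], [1, 0, 1, 0])
```

Kappa near 0 means the classifier adds nothing over guessing from class frequencies, which is why it is a stricter yardstick than raw accuracy on imbalanced diagnostic data.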
Denman, Stuart E; McSweeney, Christopher S
2006-12-01
Traditional methods for enumerating and identifying microbial populations within the rumen can be time consuming and cumbersome. Methods that involve culturing and microscopy can also be inconclusive, particularly when studying anaerobic rumen fungi. A real-time PCR SYBR Green assay, using PCR primers to target total rumen fungi and the cellulolytic bacteria Ruminococcus flavefaciens and Fibrobacter succinogenes, is described, including design and validation. The DNA and crude protein contents with respect to the fungal biomass of both polycentric and monocentric fungal isolates were investigated across the fungal growth stages to aid in standard curve generation. The primer sets used were found to be target specific with no detectable cross-reactivity. Subsequently, the real-time PCR assay was employed in a study to detect these populations within cattle rumen. The anaerobic fungal target was observed to increase 3.6-fold from 0 to 12 h after feeding. The results also indicated a 5.4-fold increase in F. succinogenes target between 0 and 12 h after feeding, whereas R. flavefaciens was observed to maintain more or less consistent levels. This is the first report of a real-time PCR assay to estimate the rumen anaerobic fungal population.
Detrended fluctuation analysis for major depressive disorder.
Mumtaz, Wajid; Malik, Aamir Saeed; Ali, Syed Saad Azhar; Yasin, Mohd Azhar Mohd; Amin, Hafeezullah
2015-01-01
Clinical utility of electroencephalography (EEG)-based diagnostic studies is less clear for major depressive disorder (MDD). In this paper, a novel machine learning (ML) scheme is presented to discriminate MDD patients from healthy controls. The proposed method involves feature extraction, feature selection, classification and validation. The EEG data acquisition involved eyes-closed (EC) and eyes-open (EO) conditions. At the feature extraction stage, detrended fluctuation analysis (DFA) was performed on the EEG data to obtain scaling exponents; DFA assesses the presence or absence of long-range temporal correlations (LRTC) in the recorded EEG. The scaling exponents were used as input features to the proposed system. At the feature selection stage, 3 different techniques were used for comparison purposes. A logistic regression (LR) classifier was employed, and the method was validated by 10-fold cross-validation. We observed an effect of 3 different reference montages on the computed features: the DFA features performed better with the LE montage than with the IR and AR montages, whereas under Wilcoxon ranking AR performed better than LE and IR. Based on these results, it was concluded that DFA provides useful information to discriminate MDD patients and, with further validation, could be employed in clinics for the diagnosis of MDD.
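Detrended fluctuation analysis, the feature-extraction step above, integrates the signal, removes a linear trend within windows of several sizes, and reads the scaling exponent off the slope of log-fluctuation versus log-window-size. A compact pure-Python sketch (the window sizes are illustrative choices, not the authors' settings):

```python
import math
import random

def dfa_exponent(signal, window_sizes=(4, 8, 16, 32, 64)):
    """Detrended fluctuation analysis: return the scaling exponent alpha,
    the slope of log F(n) vs log n over the given window sizes."""
    mean = sum(signal) / len(signal)
    profile, s = [], 0.0
    for x in signal:                      # cumulative sum of centered signal
        s += x - mean
        profile.append(s)
    log_n, log_f = [], []
    for n in window_sizes:
        sq = []
        for start in range(0, len(profile) - n + 1, n):
            seg = profile[start:start + n]
            t = list(range(n))
            mt, ms = sum(t) / n, sum(seg) / n
            # least-squares linear detrend within the window
            b = sum((ti - mt) * (si - ms) for ti, si in zip(t, seg)) / sum(
                (ti - mt) ** 2 for ti in t)
            a = ms - b * mt
            sq.extend((si - (a + b * ti)) ** 2 for ti, si in zip(t, seg))
        log_n.append(math.log(n))
        log_f.append(math.log((sum(sq) / len(sq)) ** 0.5))
    m = len(log_n)
    mx, my = sum(log_n) / m, sum(log_f) / m
    return sum((x - mx) * (y - my) for x, y in zip(log_n, log_f)) / sum(
        (x - mx) ** 2 for x in log_n)

# Uncorrelated white noise should give alpha close to 0.5; LRTC push it higher.
rng = random.Random(2)
alpha = dfa_exponent([rng.gauss(0, 1) for _ in range(1024)])
```

Exponents above 0.5 indicate long-range temporal correlations, which is the property the scaling-exponent features feed to the classifier.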
RRegrs: an R package for computer-aided model selection with multiple regression models.
Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L
2015-01-01
Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, and therefore raise model reproducibility and comparison issues. Cheminformatics and bioinformatics make extensive use of predictive modelling and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tested several simple and complex regression models and validation schemes, produced unified reports, and offered the option to be integrated into more extensive studies. Additionally, such a methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated, fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields.
Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxide descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance, as well as its adaptability in terms of parameter optimization, could make RRegrs a popular framework to assist the initial exploration of predictive models and, with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; it is a fully validated procedure with application to QSAR modelling.
Erickson, Jennifer; Abbott, Kenneth; Susienka, Lucinda
2018-06-01
Homeless patients face a variety of obstacles in pursuit of basic social services. Acknowledging this, the Social Security Administration directs employees to prioritize homeless patients and handle their disability claims with special care. However, under existing manual processes for identification of homelessness, many homeless patients never receive the special service to which they are entitled. In this paper, we explore address validation and automatic annotation of electronic health records to improve identification of homeless patients. We developed a sample of claims containing medical records at the moment of arrival in a single office. Using address validation software, we reconciled patient addresses with public directories of homeless shelters, veterans' hospitals and clinics, and correctional facilities. Other tools annotated electronic health records. We trained random forests to identify homeless patients and validated each model with 10-fold cross validation. For our finished model, the area under the receiver operating characteristic curve was 0.942. The random forest improved sensitivity from 0.067 to 0.879 but decreased positive predictive value to 0.382. Presumed false positive classifications bore many characteristics of homelessness. Organizations could use these methods to prompt early collection of information necessary to avoid labor-intensive attempts to reestablish contact with homeless individuals. Annually, such methods could benefit tens of thousands of patients who are homeless, destitute, and in urgent need of assistance. We were able to identify many more homeless patients through a combination of automatic address validation and natural language processing of unstructured electronic health records. Copyright © 2018. Published by Elsevier Inc.
The PDB_REDO server for macromolecular structure model optimization
Joosten, Robbie P.; Long, Fei; Murshudov, Garib N.; Perrakis, Anastassis
2014-01-01
The refinement and validation of a crystallographic structure model is the last step before the coordinates and the associated data are submitted to the Protein Data Bank (PDB). The success of the refinement procedure is typically assessed by validating the models against geometrical criteria and the diffraction data, and is an important step in ensuring the quality of the PDB public archive [Read et al. (2011), Structure, 19, 1395–1412]. The PDB_REDO procedure aims for ‘constructive validation’, aspiring to consistent and optimal refinement parameterization and pro-active model rebuilding, not only correcting errors but striving for optimal interpretation of the electron density. A web server for PDB_REDO has been implemented, allowing thorough, consistent and fully automated optimization of the refinement procedure in REFMAC and partial model rebuilding. The goal of the web server is to help practicing crystallographers to improve their model prior to submission to the PDB. For this, additional steps were implemented in the PDB_REDO pipeline, both in the refinement procedure, e.g. testing of resolution limits and k-fold cross-validation for small test sets, and as new validation criteria, e.g. the density-fit metrics implemented in EDSTATS and ligand validation as implemented in YASARA. Innovative ways to present the refinement and validation results to the user are also described, which together with auto-generated Coot scripts can guide users to subsequent model inspection and improvement. It is demonstrated that using the server can lead to substantial improvement of structure models before they are submitted to the PDB. PMID:25075342
St-Louis, Etienne; Bracco, David; Hanley, James; Razek, Tarek; Baird, Robert
2017-10-12
There is a need for a pediatric trauma outcomes benchmarking model that is adapted for low- and middle-income countries (LMICs). We used the National Trauma Data Bank (NTDB) and applied constraints specific to resource-poor environments to develop and validate an LMIC-specific pediatric trauma score. We selected a sample of pediatric trauma patients aged 0-14 years in the NTDB from 2007 to 2012. The primary outcome was in-hospital death. Logistic regression was used to create the Pediatric Resuscitation and Trauma Outcome (PRESTO) score, which includes only low-tech predictor variables, i.e. those easily obtainable at point-of-care. Internal validation was performed using 10-fold cross-validation. External validation compared PRESTO to TRISS using ROC analyses. Among 651,030 patients, 64% were male. The median age was 7. The in-hospital mortality rate was 1.2%. Mean TRISS-predicted mortality was 0.04% (range 0%-43%). Independent predictors included in PRESTO (p<0.01) were age, blood pressure, neurologic status, need for supplemental oxygen, pulse, and oxygen saturation. The sensitivity and specificity of PRESTO were 95.7% and 94.0%. The resulting model had an AUC of 0.98, compared to 0.89 for TRISS. PRESTO satisfies the requirements of low-resource settings and is inherently adapted to children, allowing for benchmarking and eventual quality improvement initiatives. Further research is necessary for in-situ validation using prospectively collected LMIC data. Level III - Case-Control (Prognostic) Study. Copyright © 2017 Elsevier Inc. All rights reserved.
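AUC values like those reported above (0.98 for PRESTO vs. 0.89 for TRISS) can be computed directly from risk scores via the Mann-Whitney statistic: the AUC equals the probability that a randomly chosen patient who died received a higher score than a randomly chosen survivor. A small sketch with made-up scores (not the study's data):

```python
def auc_from_scores(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    fraction of (positive, negative) pairs where the positive case
    outranks the negative one, counting ties as half."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# A score that separates the two groups well yields an AUC near 1;
# here 15 of 16 pairs are ranked correctly, so AUC = 0.9375.
auc = auc_from_scores([0.9, 0.8, 0.7, 0.6], [0.5, 0.4, 0.65, 0.2])
```

This pairwise formulation is equivalent to integrating the ROC curve, and it makes clear why AUC is insensitive to the choice of a single classification threshold.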
Bittante, G; Ferragina, A; Cipolat-Gotet, C; Cecchinato, A
2014-10-01
Cheese yield is an important technological trait in the dairy industry. The aim of this study was to infer the genetic parameters of some cheese yield-related traits predicted using Fourier-transform infrared (FTIR) spectral analysis and compare the results with those obtained using an individual model cheese-producing procedure. A total of 1,264 model cheeses were produced using 1,500-mL milk samples collected from individual Brown Swiss cows, and individual measurements were taken for 10 traits: 3 cheese yield traits (fresh curd, curd total solids, and curd water as a percent of the weight of the processed milk), 4 milk nutrient recovery traits (fat, protein, total solids, and energy of the curd as a percent of the same nutrient in the processed milk), and 3 daily cheese production traits per cow (fresh curd, total solids, and water weight of the curd). Each unprocessed milk sample was analyzed using a MilkoScan FT6000 (Foss, Hillerød, Denmark) over the spectral range, from 5,000 to 900 wavenumber × cm(-1). The FTIR spectrum-based prediction models for the previously mentioned traits were developed using modified partial least-square regression. Cross-validation of the whole data set yielded coefficients of determination between the predicted and measured values in cross-validation of 0.65 to 0.95 for all traits, except for the recovery of fat (0.41). A 3-fold external validation was also used, in which the available data were partitioned into 2 subsets: a training set (one-third of the herds) and a testing set (two-thirds). The training set was used to develop calibration equations, whereas the testing subsets were used for external validation of the calibration equations and to estimate the heritabilities and genetic correlations of the measured and FTIR-predicted phenotypes. 
The cross-validation coefficients of determination obtained from the training sets were very similar to those obtained from the whole data set, but the coefficients of determination for the external validation sets were much lower for all traits (0.30 to 0.73), and particularly for fat recovery (0.05 to 0.18). For each testing subset, the (co)variance components for the measured and FTIR-predicted phenotypes were estimated using bivariate Bayesian analyses and linear models. The intraherd heritabilities for the predicted traits obtained from our internal cross-validation using the whole data set ranged from 0.085 for daily yield of curd solids to 0.576 for protein recovery, and were similar to those obtained from the measured traits (0.079 to 0.586, respectively). The heritabilities estimated from the testing data set used for external validation were more variable but similar (on average) to the corresponding values obtained from the whole data set. Moreover, the genetic correlations between the predicted and measured traits were high in general (0.791 to 0.996), and they were always higher than the corresponding phenotypic correlations (0.383 to 0.995), especially for the external validation subset. In conclusion, we herein report that application of the cross-validation technique to the whole data set tended to overestimate the predictive ability of FTIR spectra, give more precise phenotypic predictions than the calibrations obtained using smaller data sets, and yield genetic correlations similar to those obtained from the measured traits. Collectively, our findings indicate that FTIR predictions have the potential to be used as indicator traits for the rapid and inexpensive selection of dairy populations for improvement of cheese yield, milk nutrient recovery in curd, and daily cheese production per cow.
Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Cell-specific targeting by heterobivalent ligands.
Josan, Jatinder S; Handl, Heather L; Sankaranarayanan, Rajesh; Xu, Liping; Lynch, Ronald M; Vagner, Josef; Mash, Eugene A; Hruby, Victor J; Gillies, Robert J
2011-07-20
Current cancer therapies exploit either differential metabolism or targeting to specific individual gene products that are overexpressed in aberrant cells. The work described herein proposes an alternative approach--to specifically target combinations of cell-surface receptors using heteromultivalent ligands ("receptor combination approach"). As a proof-of-concept that functionally unrelated receptors can be noncovalently cross-linked with high avidity and specificity, a series of heterobivalent ligands (htBVLs) were constructed from analogues of the melanocortin peptide ligand ([Nle(4), dPhe(7)]-α-MSH) and the cholecystokinin peptide ligand (CCK-8). Binding of these ligands to cells expressing the human Melanocortin-4 receptor and the Cholecystokinin-2 receptor was analyzed. The MSH(7) and CCK(6) were tethered with linkers of varying rigidity and length, constructed from natural and/or synthetic building blocks. Modeling data suggest that a linker length of 20-50 Å is needed to simultaneously bind these two different G-protein coupled receptors (GPCRs). These ligands exhibited up to 24-fold enhancement in binding affinity to cells that expressed both (bivalent binding), compared to cells with only one (monovalent binding) of the cognate receptors. The htBVLs had up to 50-fold higher affinity than that of a monomeric CCK ligand, i.e., Ac-CCK(6)-NH(2). Cell-surface targeting of these two cell types with labeled heteromultivalent ligand demonstrated high avidity and specificity, thereby validating the receptor combination approach. This ability to noncovalently cross-link heterologous receptors and target individual cells using a receptor combination approach opens up new possibilities for specific cell targeting in vivo for therapy or imaging.
Cell-Specific Targeting by Heterobivalent Ligands
Josan, Jatinder S.; Handl, Heather L.; Sankaranarayanan, Rajesh; Xu, Liping; Lynch, Ronald M.; Vagner, Josef; Mash, Eugene A.; Hruby, Victor J.; Gillies, Robert J.
2012-01-01
Current cancer therapies exploit either differential metabolism or targeting to specific individual gene products that are overexpressed in aberrant cells. The work described herein proposes an alternative approach—to specifically target combinations of cell-surface receptors using heteromultivalent ligands (“receptor combination approach”). As a proof-of-concept that functionally unrelated receptors can be noncovalently cross-linked with high avidity and specificity, a series of heterobivalent ligands (htBVLs) were constructed from analogues of the melanocortin peptide ligand ([Nle4, DPhe7]-α-MSH) and the cholecystokinin peptide ligand (CCK-8). Binding of these ligands to cells expressing the human Melanocortin-4 receptor and the Cholecystokinin-2 receptor was analyzed. The MSH(7) and CCK(6) were tethered with linkers of varying rigidity and length, constructed from natural and/or synthetic building blocks. Modeling data suggest that a linker length of 20–50 Å is needed to simultaneously bind these two different G-protein coupled receptors (GPCRs). These ligands exhibited up to 24-fold enhancement in binding affinity to cells that expressed both (bivalent binding), compared to cells with only one (monovalent binding) of the cognate receptors. The htBVLs had up to 50-fold higher affinity than that of a monomeric CCK ligand, i.e., Ac-CCK(6)-NH2. Cell-surface targeting of these two cell types with labeled heteromultivalent ligand demonstrated high avidity and specificity, thereby validating the receptor combination approach. This ability to noncovalently cross-link heterologous receptors and target individual cells using a receptor combination approach opens up new possibilities for specific cell targeting in vivo for therapy or imaging. PMID:21639139
Ulfberg, J; Carter, N; Talbäck, M; Edling, C
1996-09-01
To evaluate excessive daytime sleepiness (EDS) at work and effects on reported work performance among men in the general population and male patients suffering from snoring and obstructive sleep apnea syndrome (OSAS). A cross-sectional study of Swedish men between the ages of 30 and 64 years in the county of Kopparberg, in mid-Sweden. A random sample of the general population (n = 285) and consecutive patients referred to a sleep laboratory who fulfilled objective diagnostic criteria (snorers = 289, OSAS = 62) responded to a questionnaire. Responders from the general population were divided into 2 groups, nonsnorers (n = 223) and snorers (n = 62). To validate a question on snoring in the questionnaire, 50 men, randomly selected from the sample of the general population, underwent sleep apnea screening in their homes. The specificity of the question about snoring was 83% and the sensitivity was 42%. The risk ratios for reporting EDS at work were 4-fold for snorers in the general population, 20-fold for snoring patients, and 40-fold for patients with OSAS as compared with nonsnoring men in the general population. Patients with OSAS and snoring patients both showed increased ratios on measures of difficulties with concentration, learning new tasks, and performing monotonous tasks when compared with nonsnorers. Snoring and sleep apnea were highly associated with EDS at work and subjective work performance problems. The results provide additional evidence that snoring is not merely a nuisance.
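The 4-, 20- and 40-fold figures above are risk ratios, i.e. simple incidence ratios between exposed and unexposed groups. A one-function sketch with hypothetical counts (not the study's data):

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk ratio: incidence of the outcome among the exposed divided by
    incidence among the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# Hypothetical 2x2 counts: 20 of 50 snorers vs 10 of 100 nonsnorers
# report EDS at work -> incidences 0.40 vs 0.10, so RR = 4.
rr = risk_ratio(20, 50, 10, 100)
```

A risk ratio of 1 would mean the outcome is equally common in both groups; the study's 40-fold ratio for OSAS patients indicates a very strong association.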
LRSSLMDA: Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction
Huang, Li
2017-01-01
Predicting novel microRNA (miRNA)-disease associations is clinically significant due to miRNAs’ potential roles as diagnostic biomarkers and therapeutic targets for various human diseases. Previous studies have demonstrated the viability of utilizing different types of biological data to computationally infer new disease-related miRNAs. Yet researchers face the challenge of how to effectively integrate diverse datasets and make reliable predictions. In this study, we presented a computational model named Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction (LRSSLMDA), which projected miRNAs/diseases’ statistical feature profile and graph theoretical feature profile to a common subspace. It used Laplacian regularization to preserve the local structures of the training data and a L1-norm constraint to select important miRNA/disease features for prediction. The strength of dimensionality reduction enabled the model to be easily extended to much higher dimensional datasets than those exploited in this study. Experimental results showed that LRSSLMDA outperformed ten previous models: the AUC of 0.9178 in global leave-one-out cross validation (LOOCV) and the AUC of 0.8418 in local LOOCV indicated the model’s superior prediction accuracy; and the average AUC of 0.9181 ± 0.0004 in 5-fold cross validation justified its accuracy and stability. In addition, three types of case studies further demonstrated its predictive power. Potential miRNAs related to Colon Neoplasms, Lymphoma, Kidney Neoplasms, Esophageal Neoplasms and Breast Neoplasms were predicted by LRSSLMDA. Respectively, 98%, 88%, 96%, 98% and 98% out of the top 50 predictions were validated by experimental evidences. Therefore, we conclude that LRSSLMDA would be a valuable computational tool for miRNA-disease association prediction. PMID:29253885
Inclusion of Endogenous Hormone Levels in Risk Prediction Models of Postmenopausal Breast Cancer
Tworoger, Shelley S.; Zhang, Xuehong; Eliassen, A. Heather; Qian, Jing; Colditz, Graham A.; Willett, Walter C.; Rosner, Bernard A.; Kraft, Peter; Hankinson, Susan E.
2014-01-01
Purpose Endogenous hormones are risk factors for postmenopausal breast cancer, and their measurement may improve our ability to identify high-risk women. Therefore, we evaluated whether inclusion of plasma estradiol, estrone, estrone sulfate, testosterone, dehydroepiandrosterone sulfate, prolactin, and sex hormone–binding globulin (SHBG) improved risk prediction for postmenopausal invasive breast cancer (n = 437 patient cases and n = 775 controls not using postmenopausal hormones) in the Nurses' Health Study. Methods We evaluated improvement in the area under the curve (AUC) for 5-year risk of invasive breast cancer by adding each hormone to the Gail and Rosner-Colditz risk scores. We used stepwise regression to identify the subset of hormones most associated with risk and assessed AUC improvement; we used 10-fold cross-validation to assess model overfitting. Results Each hormone was associated with breast cancer risk (odds ratio per doubling, 0.82 [SHBG] to 1.37 [estrone sulfate]). Individual hormones improved the AUC by 1.3 to 5.2 units relative to the Gail score and 0.3 to 2.9 units relative to the Rosner-Colditz score. Estrone sulfate, testosterone, and prolactin were selected by stepwise regression and increased the AUC by 5.9 units (P = .003) for the Gail score and 3.4 units (P = .04) for the Rosner-Colditz score. In cross-validation, the average AUC change across the validation datasets was 6.0 (P = .002) and 3.0 units (P = .03), respectively. Similar results were observed for estrogen receptor–positive disease (selected hormones: estrone sulfate, testosterone, prolactin, and SHBG; change in AUC, 8.8 [P < .001] for the Gail score and 5.8 [P = .004] for the Rosner-Colditz score). Conclusion Our results support that endogenous hormones improve risk prediction for invasive breast cancer and could help identify women who may benefit from chemoprevention or more screening. PMID:25135988
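The 10-fold cross-validation used to check overfitting partitions the subjects into ten folds, fitting on nine and measuring AUC change on the held-out tenth. A stdlib-only sketch of the fold assignment for a cohort of this size (illustrative; not the authors' code):

```python
import random

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(n=1212, k=10)        # 437 cases + 775 controls
assert sum(len(f) for f in folds) == 1212  # every subject lands in one fold
train = [i for f in folds[1:] for i in f]  # folds 1..9 train; fold 0 tests
print(len(folds[0]), len(train))           # 122 1090
```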
Carmona-Bayonas, A; Jiménez-Fonseca, P; Font, C; Fenoy, F; Otero, R; Beato, C; Plasencia, J M; Biosca, M; Sánchez, M; Benegas, M; Calvo-Temprano, D; Varona, D; Faez, L; de la Haba, I; Antonio, M; Madridano, O; Solis, M P; Ramchandani, A; Castañón, E; Marchena, P J; Martín, M; Ayala de la Peña, F; Vicente, V
2017-04-11
Our objective was to develop a prognostic stratification tool that enables patients with cancer and pulmonary embolism (PE), whether incidental or symptomatic, to be classified according to the risk of serious complications within 15 days. The sample comprised cases from a national registry of pulmonary thromboembolism in patients with cancer (1075 patients from 14 Spanish centres). Diagnosis was incidental in 53.5% of the events in this registry. Exhaustive CHAID analysis was applied with 10-fold cross-validation to predict development of serious complications following PE diagnosis. In all, 208 patients (19.3%; 95% confidence interval (CI), 17.1-21.8%) developed a serious complication after PE diagnosis. The 15-day mortality rate was 10.1% (95% CI, 8.4-12.1%). The decision tree detected six explanatory covariates: Hestia-like clinical decision rule (any risk criterion present vs none), Eastern Cooperative Oncology Group performance status (ECOG-PS; <2 vs ⩾2), O2 saturation (<90 vs ⩾90%), presence of PE-specific symptoms, tumour response (progression, unknown, or not evaluated vs others), and primary tumour resection. Three risk classes were created (low, intermediate, and high risk). The risk of serious complications within 15 days increases by group: 1.6, 9.4, and 30.6% (P<0.0001). Fifteen-day mortality rates also rise progressively in low-, intermediate-, and high-risk patients: 0.3, 6.1, and 17.1% (P<0.0001). The cross-validated risk estimate is 0.191 (s.e. = 0.012). The optimism-corrected area under the receiver operating characteristic curve is 0.779 (95% CI, 0.717-0.840). We have developed and internally validated a prognostic index to predict serious complications, with the potential to impact decision-making in patients with cancer and PE.
GIMDA: Graphlet interaction-based MiRNA-disease association prediction.
Chen, Xing; Guan, Na-Na; Li, Jian-Qiang; Yan, Gui-Ying
2018-03-01
MicroRNAs (miRNAs) have been confirmed to be closely related to various human complex diseases by many experimental studies. It is necessary and valuable to develop powerful and effective computational models to predict potential associations between miRNAs and diseases. In this work, we presented a prediction model of Graphlet Interaction for MiRNA-Disease Association prediction (GIMDA) by integrating the disease semantic similarity, miRNA functional similarity, Gaussian interaction profile kernel similarity and the experimentally confirmed miRNA-disease associations. The related score of a miRNA to a disease was calculated by measuring the graphlet interactions between two miRNAs or two diseases. The novelty of GIMDA lies in that we used graphlet interaction to analyse the complex relationships between two nodes in a graph. The AUCs of GIMDA in global and local leave-one-out cross-validation (LOOCV) turned out to be 0.9006 and 0.8455, respectively. The average AUC in five-fold cross-validation reached 0.8927 ± 0.0012. In case studies of colon neoplasms, kidney neoplasms and prostate neoplasms based on the HMDD V2.0 database, 45, 45, and 41 of the top 50 potential miRNAs predicted by GIMDA were validated by dbDEMC and miR2Disease. Additionally, in the case study of new diseases without any known associated miRNAs and the case study of predicting potential miRNA-disease associations using HMDD V1.0, there were also high percentages of the top 50 miRNAs verified by the experimental literature. © 2017 The Authors. Journal of Cellular and Molecular Medicine published by John Wiley & Sons Ltd and Foundation for Cellular and Molecular Medicine.
NASA Astrophysics Data System (ADS)
Yan, Xinping; Xu, Xiaojian; Sheng, Chenxing; Yuan, Chengqing; Li, Zhixiong
2018-01-01
Wear faults are among the chief causes of main-engine damage, significantly influencing the secure and economical operation of ships. It is difficult for engineers to utilize multi-source information to identify wear modes, so an intelligent wear mode identification model needs to be developed to assist engineers in diagnosing wear faults in diesel engines. For this purpose, a multi-level belief rule base (BBRB) system is proposed in this paper. The BBRB system consists of two-level belief rule bases, and the 2D and 3D characteristics of wear particles are used as antecedent attributes on each level. Quantitative and qualitative wear information with uncertainties can be processed simultaneously by the BBRB system. In order to enhance the efficiency of the BBRB, the silhouette value is adopted to determine referential points and the fuzzy c-means clustering algorithm is used to transform input wear information into belief degrees. In addition, the initial parameters of the BBRB system are constructed on the basis of expert domain knowledge and then optimized by a genetic algorithm to ensure the robustness of the system. To verify the validity of the BBRB system, experimental data acquired from real-world diesel engines are analyzed. Five-fold cross-validation is conducted on the experimental data, and the BBRB is compared with four other models under the same cross-validation. In addition, a verification dataset containing different wear particles is used to highlight the effectiveness of the BBRB system in wear mode identification. The verification results demonstrate that the proposed BBRB is effective and efficient for wear mode identification, with better performance and stability than competing systems.
Signal processing and neural network toolbox and its application to failure diagnosis and prognosis
NASA Astrophysics Data System (ADS)
Tu, Fang; Wen, Fang; Willett, Peter K.; Pattipati, Krishna R.; Jordan, Eric H.
2001-07-01
Many systems are comprised of components equipped with self-testing capability; however, if the system is complex, involves feedback, and the self-testing itself may occasionally be faulty, tracing faults to a single cause or multiple causes is difficult. Moreover, many sensors are incapable of reliable decision-making on their own. In such cases, a signal processing front-end that can match inference needs will be very helpful. This work is concerned with providing an object-oriented simulation environment for signal processing and neural network-based fault diagnosis and prognosis. In the toolbox, we implemented a wide range of spectral and statistical manipulation methods such as filters, harmonic analyzers, transient detectors, and multi-resolution decomposition to extract features for failure events from data collected by sensors. We then evaluated multiple learning paradigms for general classification, diagnosis and prognosis. The network models evaluated include the Restricted Coulomb Energy (RCE) Neural Network, Learning Vector Quantization (LVQ), Decision Trees (C4.5), Fuzzy Adaptive Resonance Theory (FuzzyArtmap), Linear Discriminant Rule (LDR), Quadratic Discriminant Rule (QDR), Radial Basis Functions (RBF), Multiple Layer Perceptrons (MLP) and Single Layer Perceptrons (SLP). Validation techniques such as N-fold cross-validation and the bootstrap are employed to evaluate the robustness of the network models. The trained networks are evaluated for their performance using test data on the basis of percent error rates obtained via cross-validation, time efficiency, and generalization ability to unseen faults. Finally, the usage of neural networks for the prediction of the residual life of turbine blades with thermal barrier coatings is described and the results are shown. The neural network toolbox has also been applied to fault diagnosis in mixed-signal circuits.
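The bootstrap evaluation mentioned above resamples the data with replacement to gauge the variability of a classifier's error rate. A stdlib-only sketch on a toy list of per-sample 0/1 misclassification indicators (hypothetical data, not from the paper):

```python
import random

def bootstrap_mean_error(errors, n_boot=1000, seed=0):
    """Resample 0/1 error indicators with replacement n_boot times;
    return the mean of the bootstrap replicate error rates."""
    rng = random.Random(seed)
    n = len(errors)
    means = [sum(rng.choices(errors, k=n)) / n for _ in range(n_boot)]
    return sum(means) / n_boot

errs = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]  # toy test-set results: 3 errors in 10
est = bootstrap_mean_error(errs)
print(round(est, 3))
```

Collecting the individual replicate means (rather than their average) would also yield a percentile confidence interval for the error rate.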
Hao, Ming; Wang, Yanli; Bryant, Stephen H
2016-02-25
Identification of drug-target interactions (DTI) is a central task in drug discovery processes. In this work, a simple but effective regularized least squares algorithm integrating nonlinear kernel fusion (RLS-KF) is proposed to perform DTI predictions. Using benchmark DTI datasets, our proposed algorithm achieves state-of-the-art results, with areas under the precision-recall curve (AUPR) of 0.915, 0.925, 0.853 and 0.909 for enzymes, ion channels (IC), G protein-coupled receptors (GPCR) and nuclear receptors (NR) based on 10-fold cross-validation. The performance can be further improved by using a recalculated kernel matrix, especially for the small set of nuclear receptors, with an AUPR of 0.945. Importantly, most of the top-ranked interaction predictions can be validated by experimental data reported in the literature, bioassay results in the PubChem BioAssay database, as well as other previous studies. Our analysis suggests that the proposed RLS-KF is helpful for studying DTI, drug repositioning and polypharmacology, and may help to accelerate drug discovery by identifying novel drug targets. Published by Elsevier B.V.
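AUPR is preferred over ROC AUC when true interactions are rare, as in DTI matrices, because precision directly penalizes false positives among the top-ranked pairs. A stdlib-only sketch computing AUPR in its average-precision form, summing precision at each newly recalled positive (illustrative; not the authors' implementation):

```python
def aupr(scores, labels):
    """Average precision: mean of the precision measured at each true
    positive, scanning predictions in order of descending score."""
    ranked = sorted(zip(scores, labels), reverse=True)
    tp, seen, precision_sum = 0, 0, 0.0
    for _, y in ranked:
        seen += 1
        if y == 1:
            tp += 1
            precision_sum += tp / seen
    return precision_sum / tp

# Toy ranking: positives at ranks 1 and 3 -> (1/1 + 2/3) / 2 = 5/6
ap = aupr([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(ap)
```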
NASA Astrophysics Data System (ADS)
Zhou, Si-Da; Ma, Yuan-Chen; Liu, Li; Kang, Jie; Ma, Zhi-Sai; Yu, Lei
2018-01-01
Identification of time-varying modal parameters contributes to the structural health monitoring, fault detection, vibration control, etc. of operational time-varying structural systems. However, it is a challenging task because no more information is available for identifying time-varying systems than for time-invariant systems. This paper presents a vector time-dependent autoregressive model and least squares support vector machine based modal parameter estimator for linear time-varying structural systems in the case of output-only measurements. To reduce the computational cost, a Wendland's compactly supported radial basis function is used to achieve the sparsity of the Gram matrix. A Gamma-test-based non-parametric approach to selecting the regularization factor is adopted for the proposed estimator to replace the time-consuming n-fold cross-validation. A series of numerical examples illustrate the advantages of the proposed modal parameter estimator in suppressing overestimation and handling short data records. A laboratory experiment has further validated the proposed estimator.
Hao, Z Q; Li, C M; Shen, M; Yang, X Y; Li, K H; Guo, L B; Li, X Y; Lu, Y F; Zeng, X Y
2015-03-23
Laser-induced breakdown spectroscopy (LIBS) with partial least squares regression (PLSR) has been applied to measuring the acidity of iron ore, which can be defined by the concentrations of the oxides CaO, MgO, Al₂O₃, and SiO₂. With conventional internal standard calibration, it is difficult to establish the calibration curves of CaO, MgO, Al₂O₃, and SiO₂ in iron ore due to serious matrix effects. PLSR is effective at addressing this problem due to its excellent performance in compensating for matrix effects. In this work, fifty samples were used to construct the PLSR calibration models for the above-mentioned oxides. These calibration models were validated by 10-fold cross-validation, minimizing the root-mean-square error (RMSE). Another ten samples were used as a test set. The acidities were calculated from the estimated concentrations of CaO, MgO, Al₂O₃, and SiO₂ using the PLSR models. The average relative error (ARE) and RMSE of the acidity reached 3.65% and 0.0048, respectively, for the test samples.
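The two figures of merit reported above are straightforward: RMSE over the test samples and the average relative error of predicted versus reference acidity. A stdlib-only sketch with hypothetical acidity values (the per-sample numbers are not given in the abstract):

```python
import math

def rmse(pred, ref):
    """Root-mean-square error over paired predictions and references."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def avg_relative_error(pred, ref):
    """Mean of |pred - ref| / ref, i.e. the ARE as a fraction."""
    return sum(abs(p - r) / r for p, r in zip(pred, ref)) / len(ref)

# Hypothetical acidity values for three test samples (not from the paper):
ref = [0.10, 0.12, 0.15]
pred = [0.104, 0.117, 0.146]
print(round(rmse(pred, ref), 4), round(100 * avg_relative_error(pred, ref), 2))
```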
Saraiva, Renata M; Bezerra, João; Perkusich, Mirko; Almeida, Hyggo; Siebra, Clauirton
2015-01-01
Recently there has been an increasing interest in applying information technology to support the diagnosis of diseases such as cancer. In this paper, we present a hybrid approach using case-based reasoning (CBR) and rule-based reasoning (RBR) to support cancer diagnosis. We used symptoms, signs, and personal information from patients as inputs to our model. To form specialized diagnoses, we used rules to define the input factors' importance according to the patient's characteristics. The model's output presents the probability of the patient having a type of cancer. To carry out this research, we had the approval of the ethics committee at Napoleão Laureano Hospital, in João Pessoa, Brazil. To define our model's cases, we collected real patient data at Napoleão Laureano Hospital. To define our model's rules and weights, we researched specialized literature and interviewed health professionals. To validate our model, we used K-fold cross-validation with the data collected at Napoleão Laureano Hospital. The results showed that our approach is an effective CBR system for diagnosing cancer.
Automating an integrated spatial data-mining model for landfill site selection
NASA Astrophysics Data System (ADS)
Abujayyab, Sohaib K. M.; Ahamad, Mohd Sanusi S.; Yahya, Ahmad Shukri; Ahmad, Siti Zubaidah; Aziz, Hamidi Abdul
2017-10-01
An integrated programming environment represents a robust approach to building a valid model for landfill site selection. One of the main challenges in the integrated model is the complicated processing and modelling due to the programming stages and several limitations. An automation process helps avoid these limitations and improves the interoperability between integrated programming environments. This work targets the automation of a spatial data-mining model for landfill site selection by integrating a spatial programming environment (Python-ArcGIS) and a non-spatial environment (MATLAB). The model was constructed using neural networks and is divided into nine stages distributed between MATLAB and Python-ArcGIS. A case study was taken from the northern part of Peninsular Malaysia, and 22 criteria were selected as input data to build the training and testing datasets. The outcomes show a high accuracy of 98.2% on the testing dataset using 10-fold cross-validation. The automated spatial data-mining model provides a solid platform for decision makers performing landfill site selection and planning operations on a regional scale.
AbuHassan, Kamal J; Bakhori, Noremylia M; Kusnin, Norzila; Azmi, Umi Z M; Tania, Marzia H; Evans, Benjamin A; Yusof, Nor A; Hossain, M A
2017-07-01
Tuberculosis (TB) remains one of the most devastating infectious diseases, and its treatment efficiency is largely influenced by the stage at which infection with the TB bacterium is diagnosed. The available methods for TB diagnosis are either time-consuming, costly, or inefficient. This study employs a signal generation mechanism for biosensing, known as Plasmonic ELISA, and computational intelligence to facilitate automatic diagnosis of TB. Plasmonic ELISA enables the detection of a few molecules of analyte through the incorporation of smart nanomaterials for better sensitivity of the developed detection system. The computational system uses k-means clustering and thresholding for image segmentation. This paper presents the results of the classification performance of the Plasmonic ELISA imaging data using various types of classifiers. The five-fold cross-validation results show a high accuracy rate (>97%) in classifying TB images using the entire dataset. Future work will focus on developing an intelligent mobile-enabled expert system to diagnose TB in real time. The intelligent system will be clinically validated and tested in collaboration with healthcare providers in Malaysia.
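The segmentation step pairs k-means clustering with thresholding; on a grayscale image this amounts to clustering pixel intensities and cutting between the cluster centers. A stdlib-only sketch of Lloyd's algorithm on 1-D intensities for k = 2 (illustrative, with toy pixel values; not the authors' pipeline):

```python
def kmeans_1d(values, k=2, iters=20):
    """Lloyd's algorithm on scalar values; returns sorted cluster centers."""
    # Seed centers with evenly spaced order statistics of the data.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:  # assign each value to its nearest center
            groups[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # Recompute each center as its group's mean (keep old center if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)

pixels = [12, 15, 14, 200, 210, 198, 13, 205]  # toy dark/bright intensities
lo, hi = kmeans_1d(pixels)
threshold = (lo + hi) / 2  # segment: foreground if intensity > threshold
print(lo, hi, threshold)
```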
Prediction of metabolites of epoxidation reaction in MetaTox.
Rudik, A V; Dmitriev, A V; Bezhentsev, V M; Lagunin, A A; Filimonov, D A; Poroikov, V V
2017-10-01
Biotransformation is a process of chemical modification that may produce reactive metabolites, in particular epoxides. Epoxide reactive metabolites may cause toxic effects. The prediction of such metabolites is important for drug development and ecotoxicology studies. Epoxides are formed by some oxidation reactions, usually catalysed by cytochromes P450, and represent a large class of three-membered cyclic ethers. Identification of molecules that may be epoxidized, and indication of the specific location of the epoxide functional group (called the SOE, site of epoxidation), are important for prediction of epoxide metabolites. Datasets of 355 molecules and 615 reactions were created for training and validation. The prediction of SOE is based on a combination of LMNA (Labelled Multilevel Neighbourhood of Atom) descriptors and a Bayesian-like algorithm implemented in the PASS software and the MetaTox web service. The average invariant accuracy of prediction (AUC) calculated in leave-one-out and 20-fold cross-validation procedures is 0.9. Prediction of epoxide formation based on the created SAR model is included as a component of the MetaTox web service (http://www.way2drug.com/mg).
Predicting drug-induced liver injury using ensemble learning methods and molecular fingerprints.
Ai, Haixin; Chen, Wen; Zhang, Li; Huang, Liangchao; Yin, Zimo; Hu, Huan; Zhao, Qi; Zhao, Jian; Liu, Hongsheng
2018-05-21
Drug-induced liver injury (DILI) is a major safety concern in the drug-development process, and various methods have been proposed to predict the hepatotoxicity of compounds during the early stages of drug trials. In this study, we developed an ensemble model using three machine learning algorithms and 12 molecular fingerprints from a dataset containing 1,241 diverse compounds. The ensemble model achieved an average accuracy of 71.1±2.6%, sensitivity of 79.9±3.6%, specificity of 60.3±4.8%, and area under the receiver operating characteristic curve (AUC) of 0.764±0.026 in five-fold cross-validation and an accuracy of 84.3%, sensitivity of 86.9%, specificity of 75.4%, and AUC of 0.904 in an external validation dataset of 286 compounds collected from the Liver Toxicity Knowledge Base (LTKB). Compared with previous methods, the ensemble model achieved relatively high accuracy and sensitivity. We also identified several substructures related to DILI. In addition, we provide a web server offering access to our models (http://ccsipb.lnu.edu.cn/toxicity/HepatoPred-EL/).
Harijan, Rajesh K.; Zoi, Ioanna; Antoniou, Dimitri; Schwartz, Steven D.; Schramm, Vern L.
2017-01-01
Heavy-enzyme isotope effects (15N-, 13C-, and 2H-labeled protein) explore mass-dependent vibrational modes linked to catalysis. Transition path-sampling (TPS) calculations have predicted femtosecond dynamic coupling at the catalytic site of human purine nucleoside phosphorylase (PNP). Coupling is observed in heavy PNPs, where slowed barrier crossing caused a normal heavy-enzyme isotope effect (k_chem,light/k_chem,heavy > 1.0). We used TPS to design the mutant F159Y PNP, predicted to improve barrier crossing for heavy F159Y PNP, in an attempt to generate a rare inverse heavy-enzyme isotope effect (k_chem,light/k_chem,heavy < 1.0). Steady-state kinetic comparison of light and heavy native PNPs to light and heavy F159Y PNPs revealed similar kinetic properties. Pre–steady-state chemistry was slowed 32-fold in F159Y PNP. Pre–steady-state comparisons of heavy and light native and F159Y PNPs found a normal heavy-enzyme isotope effect of 1.31 for native PNP and an inverse effect of 0.75 for F159Y PNP. Increased isotopic mass in F159Y PNP causes more efficient transition state formation. Independent validation of the inverse isotope effect for heavy F159Y PNP came from commitment-to-catalysis experiments. Most heavy enzymes demonstrate normal heavy-enzyme isotope effects, and F159Y PNP is a rare example of an inverse effect. Crystal structures and TPS dynamics of native and F159Y PNPs explore the catalytic-site geometry associated with these catalytic changes. Experimental validation of TPS predictions for barrier crossing establishes the connection of rapid protein dynamics and vibrational coupling to enzymatic transition state passage. PMID:28584087
An empirical assessment of validation practices for molecular classifiers
Castaldi, Peter J.; Dahabreh, Issa J.
2011-01-01
Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study design and methodological features, and compared the performance of molecular classifiers in internal cross-validation versus external validation for 28 studies where both had been performed. We demonstrate that the majority of studies pursued cross-validation practices that are likely to overestimate classifier performance. Most studies were markedly underpowered to detect a 20% decrease in sensitivity or specificity between internal cross-validation and external validation [median power was 36% (IQR, 21–61%) and 29% (IQR, 15–65%), respectively]. The median reported classification performance for sensitivity and specificity was 94% and 98%, respectively, in cross-validation and 88% and 81% for independent validation. The relative diagnostic odds ratio was 3.26 (95% CI 2.04–5.21) for cross-validation versus independent validation. Finally, we reviewed all studies (n = 758) which cited those in our study sample, and identified only one instance of additional subsequent independent validation of these classifiers. In conclusion, these results document that many cross-validation practices employed in the literature are potentially biased and genuine progress in this field will require adoption of routine external validation of molecular classifiers, preferably in much larger studies than in current practice. PMID:21300697
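The relative diagnostic odds ratio quoted above compares discrimination between validation settings; the diagnostic odds ratio itself is DOR = [sens/(1-sens)] × [spec/(1-spec)]. Note that the paper's relative DOR of 3.26 was estimated across paired studies, so plugging the quoted median sensitivities and specificities into the formula, as in this purely illustrative sketch, gives a different (much larger) gap:

```python
def diagnostic_odds_ratio(sens, spec):
    """DOR = odds of a positive test in the diseased over the non-diseased."""
    return (sens / (1 - sens)) * (spec / (1 - spec))

# Median performance figures quoted in the abstract:
dor_cv = diagnostic_odds_ratio(0.94, 0.98)   # internal cross-validation
dor_ext = diagnostic_odds_ratio(0.88, 0.81)  # independent validation
print(round(dor_cv), round(dor_ext))
```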
Origami tubes with reconfigurable polygonal cross-sections.
Filipov, E T; Paulino, G H; Tachi, T
2016-01-01
Thin sheets can be assembled into origami tubes to create a variety of deployable, reconfigurable and mechanistically unique three-dimensional structures. We introduce and explore origami tubes with polygonal, translational symmetric cross-sections that can reconfigure into numerous geometries. The tubular structures satisfy the mathematical definitions for flat and rigid foldability, meaning that they can fully unfold from a flattened state with deformations occurring only at the fold lines. The tubes do not need to be straight and can be constructed to follow a non-linear curved line when deployed. The cross-section and kinematics of the tubular structures can be reprogrammed by changing the direction of folding at some folds. We discuss the variety of tubular structures that can be conceived and we show limitations that govern the geometric design. We quantify the global stiffness of the origami tubes through eigenvalue and structural analyses and highlight the mechanical characteristics of these systems. The two-scale nature of this work indicates that, from a local viewpoint, the cross-sections of the polygonal tubes are reconfigurable while, from a global viewpoint, deployable tubes of desired shapes are achieved. This class of tubes has potential applications ranging from pipes and micro-robotics to deployable architecture in buildings.
Analysis of MHC class I folding: novel insights into intermediate forms
Simone, Laura C.; Tuli, Amit; Simone, Peter D.; Wang, Xiaojian; Solheim, Joyce C.
2012-01-01
Folding around a peptide ligand is integral to the antigen presentation function of major histocompatibility complex (MHC) class I molecules. Several lines of evidence indicate that the broadly cross-reactive 34-1-2 antibody is sensitive to folding of the MHC class I peptide-binding groove. Here, we show that peptide-loading complex proteins associated with the murine MHC class I molecule Kd are found primarily in association with the 34-1-2+ form. This led us to hypothesize that the 34-1-2 antibody may recognize intermediately, as well as fully, folded MHC class I molecules. In order to further characterize the form(s) of MHC class I molecules recognized by 34-1-2, we took advantage of its cross-reactivity with Ld. Recognition of the open and folded forms of Ld by the 64-3-7 and 30-5-7 antibodies, respectively, has been extensively characterized, providing us with parameters against which to compare 34-1-2 reactivity. We found that the 34-1-2+ Ld molecules displayed characteristics indicative of incomplete folding, including increased tapasin association, endoplasmic reticulum retention, and instability at the cell surface. Moreover, we demonstrate that an Ld-specific peptide induced folding of the 34-1-2+ Ld intermediate. Altogether, these results yield novel insights into the nature of MHC class I molecules recognized by the 34-1-2 antibody. PMID:22329842
NASA Astrophysics Data System (ADS)
Barber, Douglas E.; Stockli, Daniel F.; Koshnaw, Renas I.; Tamar-Agha, Mazin Y.; Yilmaz, Ismail O.
2016-04-01
The Bitlis-Zagros orogen in northern Iraq is a principal element of the Arabia-Eurasia continental collision and is characterized by the lateral intersection of two structural domains: the NW-SE trending Zagros proper system of Iran and the E-W trending Bitlis fold-thrust belt of Turkey and Syria. While these components in northern Iraq share a similar stratigraphic framework, they exhibit along-strike variations in the width and style of tectonic zones, fold morphology and trends, and structural inheritance. However, the distinctions between the Bitlis and Zagros segments remain poorly understood in terms of timing and deformation kinematics as well as first-order controls on fold-thrust development. Structural and stratigraphic study and seismic data combined with low-T thermochronometry provide the basis for reconstructions of the Bitlis-Zagros fold-thrust belt in southeastern Turkey and northern Iraq to elucidate the kinematic and temporal relationship of these two systems. Balanced cross-sections were constructed and incrementally restored to quantify the deformational evolution and to use as input for thermokinematic models (FETKIN) to generate thermochronometric ages along the topographic surface of each cross-section line. The forward-modeled thermochronometric ages were then compared to new and previously published apatite and zircon (U-Th)/He and fission-track ages from southeastern Turkey and northern Iraq to test the validity of the timing, rate, and fault-motion geometry associated with each reconstruction. The results of these balanced thermokinematic restorations integrated with constraints from syn-tectonic sedimentation suggest that the Zagros belt between Erbil and Suleimaniyah was affected by an initial phase of Late Cretaceous exhumation related to the Proto-Zagros collision.
During the main Zagros phase, deformation advanced rapidly and in-sequence from the Main Zagros Fault to the thin-skinned frontal thrusts (Kirkuk, Shakal, Qamar) from middle to latest Miocene times, followed by out-of-sequence development of the Mountain Front Flexure (Qaradagh anticline) by ~5 Ma. In contrast, initial exhumation in the northern Bitlis belt occurred by mid-Eocene time, followed by collisional deformation that propagated southward into northern Iraqi Kurdistan during the middle to late Miocene. Plio-Pleistocene deformation was partitioned into out-of-sequence reactivation of the Ora thrust along the Iraq-Turkey border, concurrent with development of the Sinjar and Abdulaziz inversion structures at the edge of the Bitlis deformation front. Overall, these data suggest the Bitlis and Zagros trends evolved relatively independently during Cretaceous and early Cenozoic times, resulting in very different structural and stratigraphic inheritance, before being affected contemporaneously by a major phase of in-sequence shortening during the middle to latest Miocene and out-of-sequence deformation since the Pliocene. Limited seismic sections corroborate the notion that the structural style and trend of the Bitlis fold belt is dominated by inverted Mesozoic extensional faults, whereas the Zagros structures are interpreted mostly as fault-propagation folds above a Triassic décollement. These pre-existing heterogeneities in the Bitlis contributed to the lower shortening estimates, variable anticline orientations, irregular fold spacing, and the fundamentally different orientations of the Zagros-Bitlis belt in Iraqi Kurdistan and Turkey.
Miyazaki, Ryoji; Myougo, Naomi; Mori, Hiroyuki; Akiyama, Yoshinori
2018-01-12
Many proteins form multimeric complexes that play crucial roles in various cellular processes. Studying how proteins are correctly folded and assembled into such complexes in a living cell is important for understanding the physiological roles and the qualitative and quantitative regulation of the complex. However, few methods are suitable for analyzing these rapidly occurring processes. Site-directed in vivo photo-cross-linking is an elegant technique that enables analysis of protein-protein interactions in living cells with high spatial resolution. However, the conventional site-directed in vivo photo-cross-linking method is unsuitable for analyzing dynamic processes. Here, by combining an improved site-directed in vivo photo-cross-linking technique with a pulse-chase approach, we developed a new method that can analyze the folding and assembly of a newly synthesized protein with high spatiotemporal resolution. We demonstrate that this method, named the pulse-chase and in vivo photo-cross-linking experiment (PiXie), enables the kinetic analysis of the formation of an Escherichia coli periplasmic (soluble) protein complex (PhoA). We also used our new technique to investigate assembly/folding processes of two membrane complexes (SecD-SecF in the inner membrane and LptD-LptE in the outer membrane), which provided new insights into the biogenesis of these complexes. Our PiXie method permits analysis of the dynamic behavior of various proteins and enables examination of protein-protein interactions at the level of individual amino acid residues. We anticipate that our new technique will have valuable utility for studies of protein dynamics in many organisms. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.
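The pulse-chase readout in PiXie follows how cross-linked products accumulate over chase time. As a loose illustration of the kind of kinetic analysis involved, and not the authors' actual fitting procedure, a single-exponential first-order assembly model can be sketched in Python; the rate constant is hypothetical:

```python
import math

def assembly_fraction(t, k):
    """Fraction of pulse-labeled protein assembled at chase time t,
    assuming irreversible first-order kinetics with rate constant k
    (an illustrative model, not the study's analysis)."""
    return 1.0 - math.exp(-k * t)

# With a hypothetical rate constant of 0.1 per second, half of the
# labeled protein is assembled after t_half = ln(2)/k, about 6.9 s.
t_half = math.log(2) / 0.1
```

Fitting such a curve to band intensities at successive chase times would yield an apparent assembly rate for the complex under study.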
Bigbee, William L.; Gopalakrishnan, Vanathi; Weissfeld, Joel L.; Wilson, David O.; Dacic, Sanja; Lokshin, Anna E.; Siegfried, Jill M.
2012-01-01
Introduction: Clinical decision-making in the setting of CT screening could benefit from accessible biomarkers that help predict the level of lung cancer risk in high-risk individuals with indeterminate pulmonary nodules. Methods: To identify candidate serum biomarkers, we measured 70 cancer-related proteins by Luminex xMAP® multiplexed immunoassays in a training set of sera from 56 patients with biopsy-proven primary non-small cell lung cancer and 56 age-, sex- and smoking-matched CT-screened controls. Results: We identified a panel of 10 serum biomarkers – prolactin, transthyretin, thrombospondin-1, E-selectin, C-C motif chemokine 5, macrophage migration inhibitory factor, plasminogen activator inhibitor, receptor tyrosine-protein kinase, Cyfra 21.1, and serum amyloid A – that distinguished lung cancer from controls with an estimated balanced accuracy (average of sensitivity and specificity) of 76.0%±3.8% from 20-fold internal cross-validation. We then iteratively evaluated this model in independent test and verification case/control studies, confirming the initial classification performance of the panel. The classification performance of the 10-biomarker panel was also analytically validated using ELISAs in a second independent case/control population, further validating the robustness of the panel. Conclusions: The performance of this 10-biomarker panel based model was 77.1% sensitivity/76.2% specificity in cross-validation in the expanded training set, and 73.3% sensitivity/93.3% specificity (balanced accuracy 83.3%) in the blinded verification set, with the best discriminative performance in Stage I/II cases: 85% sensitivity (balanced accuracy 89.2%). Importantly, the rate of misclassification of CT-screened controls was not different in most control subgroups with or without airflow obstruction or emphysema or pulmonary nodules. 
These biomarkers have potential to aid in the early detection of lung cancer and more accurate interpretation of indeterminate pulmonary nodules detected by screening CT. PMID:22425918
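The balanced accuracy quoted above is simply the average of sensitivity and specificity. A minimal sketch reproducing the blinded verification-set figure:

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy as defined in the abstract: the average of
    sensitivity and specificity, which is robust to class imbalance."""
    return (sensitivity + specificity) / 2.0

# The verification set's 73.3% sensitivity and 93.3% specificity
# give the quoted balanced accuracy of 83.3%.
verification = balanced_accuracy(0.733, 0.933)
```

Because each class contributes equally, a classifier that always predicts the majority class scores only 0.5, unlike raw accuracy on an unbalanced case/control split.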
Refining Time-Activity Classification of Human Subjects Using the Global Positioning System
Hu, Maogui; Li, Wei; Li, Lianfa; Houston, Douglas; Wu, Jun
2016-01-01
Background: Detailed spatial location information is important in accurately estimating personal exposure to air pollution. The Global Positioning System (GPS) has been widely used in tracking personal paths and activities. Previous researchers have developed time-activity classification models based on GPS data, but most of them were developed for specific regions. An adaptive model for time-location classification could be widely applied to air pollution studies that use GPS to track individual-level time-activity patterns. Methods: Time-activity data were collected for seven days using GPS loggers and accelerometers from thirteen adult participants from Southern California under free-living conditions. We developed an automated model based on random forests to classify major time-activity patterns (i.e., indoor, outdoor-static, outdoor-walking, and in-vehicle travel). Sensitivity analysis was conducted to examine the contribution of the accelerometer data and the supplemental spatial data (i.e., roadway and tax parcel data) to the accuracy of time-activity classification. Our model was evaluated using both leave-one-fold-out and leave-one-subject-out methods. Results: Maximum speeds in averaging time intervals of 7 and 5 minutes, and distance to primary highways with limited access, were found to be the three most important variables in the classification model. Leave-one-fold-out cross-validation showed an overall accuracy of 99.71%. Sensitivities varied from 84.62% (outdoor walking) to 99.90% (indoor). Specificities varied from 96.33% (indoor) to 99.98% (outdoor static). The exclusion of accelerometer and ambient light sensor variables caused a slight loss in sensitivity for outdoor walking, but little loss in overall accuracy. However, leave-one-subject-out cross-validation showed considerable loss in sensitivity for outdoor static and outdoor walking conditions. 
Conclusions: The random forest classification model can achieve high accuracy for the four major time-activity categories. The model also performed well with just GPS, road and tax parcel data. However, caution is warranted when generalizing the model developed from a small number of subjects to other populations. PMID:26919723
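The leave-one-subject-out evaluation that exposed the drop in sensitivity holds out all records of one participant at a time, so no subject contributes to both training and testing. A minimal sketch of such a splitter (the record schema here is hypothetical):

```python
def leave_one_subject_out(records):
    """Yield (train, test) splits where each test set contains all
    records of exactly one subject, so the same person never appears
    in both sets. `records` is a list of (subject_id, features, label)
    tuples (a hypothetical schema, not the study's data format)."""
    subjects = sorted({r[0] for r in records})
    for held_out in subjects:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield train, test

data = [("s1", [0.1], "indoor"), ("s1", [2.3], "outdoor-walking"),
        ("s2", [0.2], "indoor"), ("s3", [5.0], "in-vehicle")]
splits = list(leave_one_subject_out(data))  # one split per subject
```

Record-level k-fold splits can leak a subject's highly autocorrelated data into both sets, which is why the two evaluation schemes diverge.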
Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M
2012-07-27
The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act "Safe Harbor" method. This study focuses on the evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in our clinical notes, and when new methods are needed to improve performance. We installed and evaluated five text de-identification systems "out-of-the-box" using a corpus of VHA clinical documents. The systems based on machine learning methods were trained with the 2006 i2b2 de-identification corpora and evaluated with our VHA corpus, and also evaluated with a ten-fold cross-validation experiment using our VHA corpus. We counted exact, partial, and fully contained matches with reference annotations, considering each PHI type separately or as one unique 'PHI' category. Performance of the systems was assessed using recall (equivalent to sensitivity) and precision (equivalent to positive predictive value) metrics, as well as the F(2)-measure. Overall, systems based on rules and pattern matching achieved better recall, while precision was always better with systems based on machine learning approaches. The highest "out-of-the-box" F(2)-measure was 67% for partial matches; the best precision and recall were 95% and 78%, respectively. Finally, the ten-fold cross-validation experiment allowed for an increase of the F(2)-measure to 79% with partial matches. 
The "out-of-the-box" evaluation of text de-identification systems provided us with compelling insight into the best methods for de-identification of VHA clinical documents. The error analysis demonstrated an important need for customization to PHI formats specific to VHA documents. This study informed the planning and development of a "best-of-breed" automatic de-identification application for VHA clinical text.
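The F(2)-measure used throughout this evaluation is the F-beta score with beta = 2, which weights recall more heavily than precision; this suits de-identification, where missed PHI is costlier than over-redaction. A minimal sketch:

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta measure: with beta = 2, recall counts twice as much as
    precision, matching the F(2) metric used in this evaluation."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

# The recall-heavy weighting: swapping the same two values gives a
# higher F2 when the larger one is the recall.
high_recall = f_beta(0.60, 0.90)
high_precision = f_beta(0.90, 0.60)
```

With beta = 1 the same formula reduces to the familiar harmonic-mean F1 score.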
Parsing clinical text: how good are the state-of-the-art parsers?
Jiang, Min; Huang, Yang; Fan, Jung-wei; Tang, Buzhou; Denny, Josh; Xu, Hua
2015-01-01
Parsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain, including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain. In this study, we investigated the performance of three state-of-the-art parsers: the Stanford parser, the Bikel parser, and the Charniak parser, using the following two datasets: (1) a Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank-based guideline; and (2) the MiPACQ Treebank, which was developed from pathology notes and clinical notes and contains 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three state-of-the-art parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using the 10-fold cross validation method. Finally, we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank. Our results showed that the original parsers achieved lower performance on clinical text (Bracketing F-measure in the range of 66.6%-70.3%) compared to general English text. After retraining on the clinical Treebanks, all parsers achieved better performance, with the best performance from the Stanford parser, which reached the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross validation. 
When the combined clinical Treebanks and Penn Treebank were used, of the three parsers, the Charniak parser achieved the highest Bracketing F-measure of 73.53% on progress notes and the Stanford parser reached the highest F-measure of 84.15% on the MiPACQ corpus. Our study demonstrates that re-training using clinical Treebanks is critical for improving general English parsers' performance on clinical text, and that combining clinical and open-domain corpora might achieve optimal performance for parsing clinical text.
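The Bracketing F-measure scores a parse by comparing its labeled constituent spans against the gold Treebank annotation. A simplified, set-based sketch of the idea (full PARSEVAL scoring treats duplicate brackets and unary chains more carefully):

```python
def bracketing_scores(gold, predicted):
    """Bracketing precision, recall, and F over labeled constituent
    spans, each span a (label, start, end) tuple. This is a simplified,
    set-based version of the PARSEVAL-style metric reported above."""
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)
    p = matched / len(pred_set) if pred_set else 0.0
    r = matched / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy example: the predicted tree misplaces the VP's left boundary,
# so 2 of 3 brackets match the gold tree.
gold = [("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)]
pred = [("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)]
p, r, f = bracketing_scores(gold, pred)
```

Because both trees here have the same number of brackets, precision and recall coincide; in general they diverge when the parser over- or under-generates constituents.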
STRUM: structure-based prediction of protein stability changes upon single-point mutation.
Quan, Lijun; Lv, Qiang; Zhang, Yang
2016-10-01
Mutations in the human genome arise mainly through single nucleotide polymorphisms, some of which can affect the stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability, but most require features from experimental structures. Given the fast progress in protein structure prediction, this work explores the possibility of improving mutation-induced stability change prediction using low-resolution structure modeling. We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of the Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79, with a root-mean-square error of 1.2 kcal/mol in the mutation-based cross-validations. The PCC decreases if training and test mutations are separated into non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. 
These data demonstrate the feasibility of using low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability: http://zhanglab.ccmb.med.umich.edu/STRUM/ Contact: qiang@suda.edu.cn and zhng@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
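The headline PCC of 0.79 is the Pearson correlation between predicted and measured ΔΔG values across the cross-validated mutations. A minimal pure-Python sketch of the metric (the sample values are toy numbers, not STRUM output):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length
    sequences, the headline metric (0.79) reported for STRUM's
    predicted vs. measured ddG values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

# Toy ddG values in kcal/mol (illustrative only):
measured = [-1.2, 0.4, 2.1, 0.0, 3.3]
predicted = [-0.9, 0.1, 1.8, 0.5, 2.9]
r = pearson_r(measured, predicted)
```

PCC measures linear association only, which is why the abstract pairs it with a root-mean-square error to capture the absolute size of the prediction errors.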
Vigneault, Davis M; Xie, Weidi; Ho, Carolyn Y; Bluemke, David A; Noble, J Alison
2018-05-22
Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses. Variability in contrast, appearance, orientation, and placement of the heart between patients, clinical views, scanners, and protocols makes fully automatic semantic segmentation a notoriously difficult problem. Here, we present Ω-Net (Omega-Net): A novel convolutional neural network (CNN) architecture for simultaneous localization, transformation into a canonical orientation, and semantic segmentation. First, an initial segmentation is performed on the input image; second, the features learned during this initial segmentation are used to predict the parameters needed to transform the input image into a canonical orientation; and third, a final segmentation is performed on the transformed image. In this work, Ω-Nets of varying depths were trained to detect five foreground classes in any of three clinical views (short axis, SA; four-chamber, 4C; two-chamber, 2C), without prior knowledge of the view being segmented. This constitutes a substantially more challenging problem compared with prior work. The architecture was trained using three-fold cross-validation on a cohort of patients with hypertrophic cardiomyopathy (HCM, N=42) and healthy control subjects (N=21). Network performance, as measured by weighted foreground intersection-over-union (IoU), was substantially improved for the best-performing Ω-Net compared with U-Net segmentation without localization or orientation (0.858 vs 0.834). In addition, to be comparable with other works, Ω-Net was retrained from scratch using five-fold cross-validation on the publicly available 2017 MICCAI Automated Cardiac Diagnosis Challenge (ACDC) dataset. The Ω-Net outperformed the state-of-the-art method in segmentation of the LV and RV bloodpools, and performed slightly worse in segmentation of the LV myocardium. 
We conclude that this architecture represents a substantive advancement over prior approaches, with implications for biomedical image segmentation more generally. Published by Elsevier B.V.
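The weighted foreground intersection-over-union used to compare Ω-Net and U-Net aggregates per-class IoU across the foreground labels. A minimal sketch of one plausible formulation over flattened label maps (the class names and weights here are hypothetical, not those used for Ω-Net):

```python
def weighted_foreground_iou(pred, gold, classes, weights):
    """Per-class intersection-over-union on flattened label maps,
    combined with per-class weights. A sketch of the metric's general
    form; class names and weights are illustrative assumptions."""
    total, wsum = 0.0, 0.0
    for c, w in zip(classes, weights):
        inter = sum(1 for p, g in zip(pred, gold) if p == g == c)
        union = sum(1 for p, g in zip(pred, gold) if c in (p, g))
        if union:
            total += w * inter / union
            wsum += w
    return total / wsum if wsum else 0.0

# Toy 6-pixel example with background plus two foreground classes:
pred = ["bg", "lv", "lv", "rv", "bg", "lv"]
gold = ["bg", "lv", "rv", "rv", "bg", "lv"]
iou = weighted_foreground_iou(pred, gold, ["lv", "rv"], [2.0, 1.0])
```

Excluding the background class keeps the score from being inflated by the large, easily classified background region that dominates cine frames.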
Stratigraphy and structure of coalbed methane reservoirs in the United States: an overview
Pashin, J.C.
1998-01-01
Stratigraphy and geologic structure determine the shape, continuity and permeability of coal and are therefore critical considerations for designing exploration and production strategies for coalbed methane. Coal in the United States is dominantly of Pennsylvanian, Cretaceous and Tertiary age, and to date, more than 90% of the coalbed methane produced is from Pennsylvanian and Cretaceous strata of the Black Warrior and San Juan Basins. Investigations of these basins establish that sequence stratigraphy is a promising approach for regional characterization of coalbed methane reservoirs. Local stratigraphic variation within these strata is the product of sedimentologic and tectonic processes and is a consideration for selecting completion zones. Coalbed methane production in the United States is mainly from foreland and intermontane basins containing diverse compressional and extensional structures. Balanced structural models can be used to construct and validate cross sections as well as to quantify layer-parallel strain and predict the distribution of fractures. Folds and faults influence gas and water production in diverse ways. However, interwell heterogeneity related to fractures and shear structures makes the performance of individual wells difficult to predict.
Song, J; Pollom, E; Durkee, B
2015-06-15
Purpose: To predict response to radiation treatment using computational FDG-PET and CT images in locally advanced head and neck cancer (HNC). Methods: 68 patients with Stage III-IVB HNC treated with chemoradiation were included in this retrospective study. For each patient, we analyzed primary tumor and lymph nodes on PET and CT scans acquired both prior to and during radiation treatment, which led to 8 combinations of image datasets. From each image set, we extracted high-throughput radiomic features of the following types: statistical, morphological, textural, histogram, and wavelet, resulting in a total of 437 features. We then performed unsupervised redundancy removal and stability tests on these features. To avoid over-fitting, we trained a logistic regression model with simultaneous feature selection based on the least absolute shrinkage and selection operator (LASSO). To objectively evaluate the prediction ability, we performed 5-fold cross validation (CV) with 50 random repeats of stratified bootstrapping. Feature selection and model training were conducted solely on the training set and independently validated on the holdout test set. The receiver operating characteristic (ROC) curve of the pooled results and the area under the ROC curve (AUC) were calculated as figures of merit. Results: For predicting local-regional recurrence, our model built on pre-treatment PET of lymph nodes achieved the best performance (AUC=0.762) on 5-fold CV, which compared favorably with node volume and SUVmax (AUC=0.704 and 0.449, p<0.001). Wavelet coefficients turned out to be the most predictive features. Prediction of distant recurrence showed a similar trend, in which pre-treatment PET features of lymph nodes had the highest AUC of 0.705. Conclusion: The radiomics approach identified novel imaging features that are predictive of radiation treatment response. If prospectively validated in larger cohorts, they could aid in risk-adaptive treatment of HNC.
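The AUC figures of merit above can be read as the probability that a randomly chosen recurrence case receives a higher model score than a randomly chosen non-recurrence case. A minimal sketch of this rank-based (Mann-Whitney) computation, with toy scores:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve computed as the normalized
    Mann-Whitney U statistic: the probability that a random positive
    case outscores a random negative one, ties counting one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy risk scores for recurrence (positive) vs. no recurrence:
a = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

This pairwise form is equivalent to integrating the ROC curve by the trapezoid rule and makes clear why an AUC of 0.449, as reported for SUVmax, is slightly worse than chance.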
Cole, Adam J; David, Allan E; Wang, Jianxin; Galbán, Craig J; Hill, Hannah L; Yang, Victor C
2011-03-01
While successful magnetic tumor targeting of iron oxide nanoparticles has been achieved in a number of models, the rapid blood clearance of magnetically suitable particles by the reticuloendothelial system (RES) limits their availability for targeting. This work aimed to develop a long-circulating magnetic iron oxide nanoparticle (MNP) platform capable of sustained tumor exposure via the circulation and, thus, potentially enhanced magnetic tumor targeting. Aminated, cross-linked starch (DN) and aminosilane (A) coated MNPs were successfully modified with 5 kDa (A5, D5) or 20 kDa (A20, D20) polyethylene glycol (PEG) chains using simple N-Hydroxysuccinimide (NHS) chemistry and characterized. Identical PEG-weight analogues between platforms (A5 & D5, A20 & D20) were similar in size (140-190 nm) and relative PEG labeling (1.5% of surface amines - A5/D5, 0.4% - A20/D20), with all PEG-MNPs possessing magnetization properties suitable for magnetic targeting. Candidate PEG-MNPs were studied in RES simulations in vitro to predict long-circulating character. D5 and D20 performed best showing sustained size stability in cell culture medium at 37 °C and 7 (D20) to 10 (D5) fold less uptake in RAW264.7 macrophages when compared to previously targeted, unmodified starch MNPs (D). Observations in vitro were validated in vivo, with D5 (7.29 h) and D20 (11.75 h) showing much longer half-lives than D (0.12 h). Improved plasma stability enhanced tumor MNP exposure 100 (D5) to 150 (D20) fold as measured by plasma AUC(0-∞). Sustained tumor exposure over 24 h was visually confirmed in a 9L-glioma rat model (12 mg Fe/kg) using magnetic resonance imaging (MRI). Findings indicate that a polyethylene glycol modified, cross-linked starch-coated MNP is a promising platform for enhanced magnetic tumor targeting, warranting further study in tumor models. Copyright © 2010 Elsevier Ltd. All rights reserved.
Comparison of Random Forest and Support Vector Machine classifiers using UAV remote sensing imagery
Piragnolo, Marco; Masiero, Andrea; Pirotti, Francesco
2017-04-01
In recent years, surveying with unmanned aerial vehicles (UAVs) has attracted a great amount of attention due to decreasing costs and the higher precision and flexibility of usage. UAVs have been applied for geomorphological investigations, forestry, precision agriculture, cultural heritage assessment and archaeological purposes. They can also be used for land use and land cover (LULC) classification. In the literature, there are two main types of approaches for classification of remote sensing imagery: pixel-based and object-based. On one hand, the pixel-based approach mostly uses training areas to define classes and their respective spectral signatures. On the other hand, object-based classification considers pixels, scale, spatial information and texture information for creating homogeneous objects. Machine learning methods have been applied successfully for classification, and their use is increasing due to the availability of faster computing capabilities. These methods learn and train the model from previous computation. Two machine learning methods which have given good results in previous investigations are Random Forest (RF) and Support Vector Machine (SVM). The goal of this work is to compare the RF and SVM methods for classifying LULC using images collected with a fixed-wing UAV. The classification processing chain uses packages in R, an open source scripting language for data analysis, which provides all necessary algorithms. The imagery was acquired and processed in November 2015 with cameras providing reflectivity information over the red, blue, green and near infrared wavelengths over a testing area in the campus of Agripolis, in Italy. Images were elaborated and ortho-rectified through Agisoft Photoscan. The ortho-rectified image is the full data set, and the test set is derived from partial sub-setting of the full data set. Different tests have been carried out, using a percentage from 2% to 20% of the total. 
Ten training sets and ten validation sets are obtained from each test set. The control dataset consists of an independent visual classification done by an expert over the whole area. The classes are (i) broadleaf, (ii) building, (iii) grass, (iv) headland access path, (v) road, (vi) sowed land, (vii) vegetable. The RF and SVM are applied to the test set. The performance of the methods is evaluated using the three following accuracy metrics: Kappa index, classification accuracy and classification error. All three are calculated in three different ways: with K-fold cross validation, using the validation test set, and using the full test set. The analysis indicates that SVM achieves better scores under K-fold cross validation and on the validation test set. Using the full test set, RF achieves a better result in comparison to SVM. It also seems that SVM performs better with smaller training sets, whereas RF performs better as training sets get larger.
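The Kappa index reported alongside classification accuracy corrects the observed agreement between a classifier and the expert reference map for the agreement expected by chance. A minimal sketch with hypothetical labels drawn from the study's class list:

```python
def cohens_kappa(reference, predicted):
    """Cohen's kappa between two label sequences: observed agreement
    corrected for the chance agreement implied by each sequence's
    label frequencies."""
    n = len(reference)
    labels = set(reference) | set(predicted)
    po = sum(1 for x, y in zip(reference, predicted) if x == y) / n
    pe = sum((reference.count(c) / n) * (predicted.count(c) / n)
             for c in labels)
    return (po - pe) / (1 - pe) if pe != 1.0 else 1.0

# Hypothetical 6-pixel comparison against the expert classification:
truth = ["grass", "road", "grass", "road", "road", "grass"]
rf_pred = ["grass", "road", "grass", "building", "road", "grass"]
kappa = cohens_kappa(truth, rf_pred)
```

Kappa is 1 for perfect agreement and near 0 when the classifier does no better than guessing with the same class proportions, making it a stricter summary than raw accuracy on imbalanced LULC maps.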
Bian, Yunqiang; Ren, Weitong; Song, Feng; Yu, Jiafeng; Wang, Jihua
2018-05-01
Structure-based models, or Gō-like models, which are built from one or multiple particular experimental structures, have been successfully applied to the folding of proteins and RNAs. Recently, a variant termed the hybrid atomistic model has advanced the description of backbone and side chain interactions of traditional structure-based models by borrowing the description of local interactions from classical force fields. In this study, we assessed the validity of this model for the folding problem of the human telomeric DNA G-quadruplex, where local dihedral terms play important roles. A two-state model was developed and a set of molecular dynamics simulations was conducted to study the folding dynamics of the sequence Htel24, which was experimentally validated to adopt two different (3 + 1) hybrid G-quadruplex topologies in K+ solution. Consistent with the experimental observations, the hybrid-1 conformation was found to be more stable and the hybrid-2 conformation kinetically more favored. The simulations revealed that the hybrid-2 conformation folded in a more cooperative manner, which may be the reason why it was kinetically more accessible. Moreover, by building a Markov state model, a two-quartet G-quadruplex state and a misfolded state were identified as competing states that complicate the folding process of Htel24. In addition, the simulations showed that the transition between the hybrid-1 and hybrid-2 conformations may proceed through an ensemble of hairpin structures. The hybrid atomistic structure-based model reproduced the kinetic partitioning folding dynamics of Htel24 between two different folds, and thus can be used to study the complex folding processes of other G-quadruplex structures.
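In a two-state folding picture like the one adopted for Htel24, the equilibrium balance between folded and unfolded populations follows directly from the forward and backward rate constants. A minimal sketch (the rates are hypothetical, not fitted to these simulations):

```python
def two_state_populations(k_fold, k_unfold):
    """Equilibrium populations of a two-state folding model from the
    folding and unfolding rate constants, via K_eq = k_fold/k_unfold.
    Illustrative only; the rates are hypothetical."""
    keq = k_fold / k_unfold
    p_folded = keq / (1.0 + keq)
    return p_folded, 1.0 - p_folded

# If a conformation folds four times faster than it unfolds, it holds
# 80% of the population at equilibrium.
pf, pu = two_state_populations(4.0, 1.0)
```

Kinetic partitioning, as observed for Htel24, means the initially dominant fold (here hybrid-2) need not be the one favored by this equilibrium ratio.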
Kinematics, structural mechanics, and design of origami structures with smooth folds
Peraza Hernandez, Edwin Alexander
Origami provides novel approaches to the fabrication, assembly, and functionality of engineering structures in various fields such as aerospace, robotics, etc. With the increase in complexity of the geometry and materials for origami structures that provide engineering utility, computational models and design methods for such structures have become essential. Currently available models and design methods for origami structures are generally limited to the idealization of the folds as creases of zeroth-order geometric continuity. Such an idealization is not proper for origami structures having non-negligible thickness or maximum curvature at the folds restricted by material limitations. Thus, for general structures, creased folds of merely zeroth-order geometric continuity are not appropriate representations of structural response and a new approach is needed. The first contribution of this dissertation is a model for the kinematics of origami structures having realistic folds of non-zero surface area and exhibiting higher-order geometric continuity, here termed smooth folds. The geometry of the smooth folds and the constraints on their associated kinematic variables are presented. A numerical implementation of the model allowing for kinematic simulation of structures having arbitrary fold patterns is also described. Examples illustrating the capability of the model to capture realistic structural folding response are provided. Subsequently, a method for solving the origami design problem of determining the geometry of a single planar sheet and its pattern of smooth folds that morphs into a given three-dimensional goal shape, discretized as a polygonal mesh, is presented. The design parameterization of the planar sheet and the constraints that allow for a valid pattern of smooth folds and approximation of the goal shape in a known folded configuration are presented. Various testing examples considering goal shapes of diverse geometries are provided. 
Afterwards, a model for the structural mechanics of origami continuum bodies with smooth folds is presented. Such a model entails the integration of the presented kinematic model and existing plate theories in order to obtain a structural representation for folds having non-zero thickness and comprised of arbitrary materials. The model is validated against finite element analysis. The last contribution addresses the design and analysis of active material-based self-folding structures that morph via simultaneous folding towards a given three-dimensional goal shape starting from a planar configuration. Implementation examples including shape memory alloy (SMA)-based self-folding structures are provided.
ActivityAware: An App for Real-Time Daily Activity Level Monitoring on the Amulet Wrist-Worn Device.
Boateng, George; Batsis, John A; Halter, Ryan; Kotz, David
2017-03-01
Physical activity helps reduce the risk of cardiovascular disease, hypertension and obesity. The ability to monitor a person's daily activity level can inform self-management of physical activity and related interventions. For older adults with obesity, regular physical activity is critical to reduce the risk of long-term disability. In this work, we present ActivityAware, an application on the Amulet wrist-worn device that measures the daily activity levels (sedentary, moderate and vigorous) of individuals, continuously and in real time. The app implements an activity-level detection model: it continuously collects acceleration data on the Amulet, classifies the current activity level, updates the day's accumulated time spent at that activity level, logs the data for later analysis, and displays the results on the screen. We developed the activity-level detection model using a Support Vector Machine (SVM). We trained our classifiers using data from a user study in which subjects performed the following physical activities: sit, stand, lie down, walk and run. With 10-fold cross-validation and leave-one-subject-out (LOSO) cross-validation, we obtained preliminary results that suggest accuracies up to 98% for n=14 subjects. Testing the ActivityAware app revealed a projected battery life of up to 4 weeks before needing to recharge. The results are promising, indicating that the app may be used for activity-level monitoring and, eventually, for the development of interventions that could improve the health of individuals.
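The distinction between the two validation schemes above is easy to make concrete: 10-fold cross-validation mixes every subject's windows across folds, while leave-one-subject-out (LOSO) holds out all of one subject's data at a time. A minimal sketch of the LOSO split, assuming only a list of per-sample subject IDs (the IDs and fold layout below are invented; any classifier could be trained on each split):

```python
def loso_splits(subject_ids):
    """Yield (train, test) index lists, holding out one subject per fold."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield train, test

# Toy layout: 3 windows of accelerometer features from each of 4 subjects.
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
folds = list(loso_splits(ids))
print(len(folds))   # one fold per subject -> 4
```

LOSO is the stricter test here, since it measures how well the model generalizes to a person it has never seen.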
HAMDA: Hybrid Approach for MiRNA-Disease Association prediction.
Chen, Xing; Niu, Ya-Wei; Wang, Guang-Hui; Yan, Gui-Ying
2017-12-01
For decades, a large body of experimental research has collectively indicated that microRNAs (miRNAs) play indispensable roles in many critical biological processes and thus in the pathogenesis of human complex diseases. Because the resources and time required by traditional biological experiments are costly, increasing attention has been paid to the development of effective and feasible computational methods for predicting potential associations between diseases and miRNAs. In this study, we developed a computational model, Hybrid Approach for MiRNA-Disease Association prediction (HAMDA), which employs a hybrid graph-based recommendation algorithm to reveal novel miRNA-disease associations by integrating experimentally verified miRNA-disease associations, disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity. HAMDA takes into consideration not only network structure and information propagation but also node attributes, resulting in satisfactory prediction performance. Specifically, HAMDA obtained AUCs of 0.9035 and 0.8395 in the frameworks of global and local leave-one-out cross validation, respectively. Meanwhile, HAMDA also achieved good performance, with an AUC of 0.8965 ± 0.0012, in 5-fold cross validation. Additionally, we conducted case studies of three important human cancers to evaluate the performance of HAMDA. As a result, 90% (Lymphoma), 86% (Prostate Cancer) and 92% (Kidney Cancer) of the top 50 predicted miRNAs were confirmed by recent experimental literature, demonstrating the reliable prediction ability of HAMDA. Copyright © 2017 Elsevier Inc. All rights reserved.
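The AUCs reported above can be computed directly from cross-validation prediction scores. A minimal, self-contained sketch of the AUC (the probability that a randomly chosen true association outscores a randomly chosen non-association), in its rank-sum (Mann-Whitney U) formulation; the labels and scores below are illustrative, not from HAMDA:

```python
def auc(labels, scores):
    """AUC from binary labels (1 = positive) and prediction scores."""
    pairs = sorted(zip(scores, labels))
    # assign average 1-based ranks, handling tied scores in blocks
    rank_of = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        for k in range(i, j):
            rank_of[k] = (i + j + 1) / 2.0
        i = j
    pos_rank_sum = sum(r for r, (_, y) in zip(rank_of, pairs) if y == 1)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]   # one positive ranked below a negative
print(auc(labels, scores))                # 8 of 9 positive/negative pairs ordered correctly
```

In a leave-one-out framework, each candidate's held-out score is collected and the AUC is computed once over all of them.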
Babcock, Chad; Finley, Andrew O.; Bradford, John B.; Kolka, Randall K.; Birdsey, Richard A.; Ryan, Michael G.
2015-01-01
Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both residual spatial dependence and non-stationarity of model covariates through the introduction of spatial random effects. We explored this objective using four forest inventory datasets that are part of the North American Carbon Program, each comprising point-referenced measures of above-ground forest biomass and discrete LiDAR. For each dataset, we considered at least five regression model specifications of varying complexity. Models were assessed based on goodness of fit criteria and predictive performance using a 10-fold cross-validation procedure. Results showed that the addition of spatial random effects to the regression model intercept improved fit and predictive performance in the presence of substantial residual spatial dependence. Additionally, in some cases, allowing either some or all regression slope parameters to vary spatially, via the addition of spatial random effects, further improved model fit and predictive performance. In other instances, models showed improved fit but decreased predictive performance—indicating over-fitting and underscoring the need for cross-validation to assess predictive ability. The proposed Bayesian modeling framework provided access to pixel-level posterior predictive distributions that were useful for uncertainty mapping, diagnosing spatial extrapolation issues, revealing missing model covariates, and discovering locally significant parameters.
NASA Astrophysics Data System (ADS)
Nakapelyukh, Mykhaylo; Bubniak, Ihor; Bubniak, Andriy; Jonckheere, Raymond; Ratschbacher, Lothar
2018-01-01
The Carpathians are part of the Alpine-Carpathian-Dinaridic orogen surrounding the Pannonian basin. Their Ukrainian part constitutes an ancient subduction-accretion complex that evolved into a foreland fold-thrust belt with a shortening history that was perpendicular to the orogenic strike. Herein, we constrain the evolution of the Ukrainian part of the Carpathian fold-thrust belt by apatite fission-track dating of sedimentary and volcanic samples and by cross-section balancing and restoration. The apatite fission-track ages are uniform in the inner (southwestern) part of the fold-thrust belt, implying post-shortening erosion since 12-10 Ma. The ages in the leading and trailing edges record provenance, i.e., sources in the Trans-European suture zone and the Inner Carpathians, respectively, and show that these parts of the fold-thrust belt were not heated to more than 100 °C. Syn-orogenic strata show sediment recycling: in the interior of the fold-thrust belt, in the most thickened and most deeply eroded nappes, the apatite ages were reset, eroded, and redeposited in the syn-orogenic strata closer to the fore- and hinterland; the lag times are only a few million years. Two balanced cross sections, one constructed for this study and based on field and subsurface data, reveal an architecture characterized by nappe stacks separated by high-displacement thrusts; they record 340-390 km of shortening. A kinematic forward model highlights the evolution of the fold-thrust belt from the pre-contractional configuration, through the intermediate geometries during folding and thrusting and the post-shortening, erosional-unloading configuration at 12-10 Ma, to the present-day geometry. Average shortening rates between 32-20 Ma and 20-12 Ma amounted to 13 and 21 km/Ma, respectively, implying two-phased deformation of the Ukrainian fold-thrust belt.
An Automated Approach for Ranking Journals to Help in Clinician Decision Support
Jonnalagadda, Siddhartha R.; Moosavinasab, Soheil; Nath, Chinmoy; Li, Dingcheng; Chute, Christopher G.; Liu, Hongfang
2014-01-01
Point of care access to knowledge from full text journal articles supports decision-making and decreases medical errors. However, it is an overwhelming task to search through full text journal articles and find quality information needed by clinicians. We developed a method to rate journals for a given clinical topic, Congestive Heart Failure (CHF). Our method enables filtering of journals and ranking of journal articles based on source journal in relation to CHF. We also obtained a journal priority score, which automatically rates any journal based on its importance to CHF. Comparing our ranking with data gathered by surveying 169 cardiologists, who publish on CHF, our best Multiple Linear Regression model showed a correlation of 0.880, based on five-fold cross validation. Our ranking system can be extended to other clinical topics. PMID:25954382
Vehicular traffic noise prediction using soft computing approach.
Singh, Daljeet; Nigam, S P; Agrawal, V P; Kumar, Maneek
2016-12-01
A new approach for the development of vehicular traffic noise prediction models is presented. Four different soft computing methods, namely Generalized Linear Model, Decision Trees, Random Forests and Neural Networks, have been used to develop models to predict the hourly equivalent continuous sound pressure level, Leq, at different locations in the city of Patiala, India. The input variables include the traffic volume per hour, percentage of heavy vehicles and average speed of vehicles. The performance of the four models is compared on the basis of the performance criteria of coefficient of determination, mean square error and accuracy. 10-fold cross-validation is performed to check the stability of the Random Forest model, which gave the best results. A t-test is performed to check the fit of the model with the field data. Copyright © 2016 Elsevier Ltd. All rights reserved.
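The model-comparison procedure above (fit each candidate, score it by cross-validated error) can be sketched in a few lines. This is a hedged illustration, not the authors' code: the data are synthetic stand-ins for traffic volume and Leq, and a one-variable linear model is compared against a mean-only baseline by 10-fold cross-validated mean squared error:

```python
import random

def kfold(n, k, rng):
    """Shuffle indices and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs_tr, ys_tr):
    n = len(xs_tr)
    mx, my = sum(xs_tr) / n, sum(ys_tr) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs_tr, ys_tr))
             / sum((x - mx) ** 2 for x in xs_tr))
    return lambda x, a=my - slope * mx, b=slope: a + b * x

def fit_mean(xs_tr, ys_tr):
    return lambda x, m=sum(ys_tr) / len(ys_tr): m

rng = random.Random(0)
xs = [rng.uniform(100, 2000) for _ in range(200)]       # vehicles/hour (toy)
ys = [55 + 0.005 * x + rng.gauss(0, 1.0) for x in xs]   # Leq in dB(A) (toy)

def cv_mse(fit, k=10):
    errs = []
    for test in kfold(len(xs), k, rng):
        hold = set(test)
        train = [i for i in range(len(xs)) if i not in hold]
        f = fit([xs[i] for i in train], [ys[i] for i in train])
        errs += [(f(xs[i]) - ys[i]) ** 2 for i in test]
    return sum(errs) / len(errs)

mse_line, mse_mean = cv_mse(fit_line), cv_mse(fit_mean)
print(mse_line < mse_mean)   # the traffic-volume model should beat the baseline
```

The same loop accommodates any of the four methods in the abstract; only the `fit` function changes.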
Gesture recognition for smart home applications using portable radar sensors.
Wan, Qian; Li, Yiran; Li, Changzhi; Pal, Ranadip
2014-01-01
In this article, we consider the design of a human gesture recognition system based on pattern recognition of signatures from a portable smart radar sensor. Powered by AAA batteries, the smart radar sensor operates in the 2.4 GHz industrial, scientific and medical (ISM) band. We analyzed the feature space using principal components and application-specific time- and frequency-domain features extracted from radar signals for two different sets of gestures. We illustrate that a nearest-neighbor based classifier can achieve greater than 95% accuracy for multi-class classification using 10-fold cross-validation when features are extracted based on magnitude differences and Doppler shifts, as compared to features extracted through orthogonal transformations. The reported results illustrate the potential of intelligent radars integrated with a pattern recognition system for high-accuracy smart home and health monitoring purposes.
Detection of Unilateral Hearing Loss by Stationary Wavelet Entropy.
Zhang, Yudong; Nayak, Deepak Ranjan; Yang, Ming; Yuan, Ti-Fei; Liu, Bin; Lu, Huimin; Wang, Shuihua
2017-01-01
Sensorineural hearing loss is associated with serious neurological and psychiatric disease. T1-weighted volumetric images were acquired from fourteen subjects with right-sided hearing loss (RHL), fifteen subjects with left-sided hearing loss (LHL), and twenty healthy controls (HC). We treated a three-class classification problem: HC, LHL, and RHL. Stationary wavelet entropy was employed to extract global features from the magnetic resonance images of each subject. These stationary wavelet entropy features were used as input to a single-hidden-layer feedforward neural network classifier. Results of 10 repetitions of 10-fold cross-validation show that the accuracies for HC, LHL, and RHL are 96.94%, 97.14%, and 97.35%, respectively. Our developed system is promising and effective in detecting hearing loss. Copyright © Bentham Science Publishers.
Predicting ionizing radiation exposure using biochemically-inspired genomic machine learning.
Zhao, Jonathan Z L; Mucaki, Eliseos J; Rogan, Peter K
2018-01-01
Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large-scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest-neighbor imputation, and genes implicated in the literature as responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2, PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples, and traditional validation accuracies of up to 92% (DDB2, CD8A, TALDO1, PCNA, EIF4G2, LCN2, CDKN1A, PRKCH, ENO1, and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures.
Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.
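The sequential feature selection step in the Methods can be illustrated with a toy wrapper. This sketch replaces the mRMR pre-ranking and SVM scorer with a 1-nearest-neighbour leave-one-out scorer on synthetic data (all names and data here are invented); it shows the greedy forward search, where the feature that most improves the wrapped classifier is added at each step:

```python
import random

random.seed(1)
n, n_feat = 60, 5
labels = [i % 2 for i in range(n)]
# Features 0 and 1 are informative (class shifts the mean); 2-4 are noise.
X = [[random.gauss(3.0 * y if f < 2 else 0.0, 1.0) for f in range(n_feat)]
     for y in labels]

def loo_1nn_accuracy(feats):
    """Leave-one-out accuracy of 1-NN restricted to the given features."""
    hits = 0
    for i in range(n):
        dist = lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats)
        nearest = min((j for j in range(n) if j != i), key=dist)
        hits += labels[nearest] == labels[i]
    return hits / n

def forward_select(k):
    """Greedily add the feature that most improves the wrapped scorer."""
    chosen = []
    while len(chosen) < k:
        best = max((f for f in range(n_feat) if f not in chosen),
                   key=lambda f: loo_1nn_accuracy(chosen + [f]))
        chosen.append(best)
    return chosen

print(forward_select(2))   # typically picks the informative features first
```

Backward selection runs the same loop in reverse, removing the least useful feature at each step.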
Mental State Assessment and Validation Using Personalized Physiological Biometrics
Patel, Aashish N.; Howard, Michael D.; Roach, Shane M.; Jones, Aaron P.; Bryant, Natalie B.; Robinson, Charles S. H.; Clark, Vincent P.; Pilly, Praveen K.
2018-01-01
Mental state monitoring is a critical component of current and future human-machine interfaces, including semi-autonomous driving and flying, air traffic control, decision aids, training systems, and will soon be integrated into ubiquitous products like cell phones and laptops. Current mental state assessment approaches supply quantitative measures, but their only frame of reference is generic population-level ranges. What is needed are physiological biometrics that are validated in the context of task performance of individuals. Using curated intake experiments, we are able to generate personalized models of three key biometrics as useful indicators of mental state; namely, mental fatigue, stress, and attention. We demonstrate improvements to existing approaches through the introduction of new features. Furthermore, addressing the current limitations in assessing the efficacy of biometrics for individual subjects, we propose and employ a multi-level validation scheme for the biometric models by means of k-fold cross-validation for discrete classification and regression testing for continuous prediction. The paper not only provides a unified pipeline for extracting a comprehensive mental state evaluation from a parsimonious set of sensors (only EEG and ECG), but also demonstrates the use of validation techniques in the absence of empirical data. Furthermore, as an example of the application of these models to novel situations, we evaluate the significance of correlations of personalized biometrics to the dynamic fluctuations of accuracy and reaction time on an unrelated threat detection task using a permutation test. Our results provide a path toward integrating biometrics into augmented human-machine interfaces in a judicious way that can help to maximize task performance.
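The final step above, testing the significance of a correlation with a permutation test, can be sketched compactly. The two series below are synthetic stand-ins for a personalized biometric and task performance; this is an illustration of the statistical test, not the authors' pipeline:

```python
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def perm_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    hits = 0
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        hits += abs(pearson(x, shuffled)) >= observed
    return (hits + 1) / (n_perm + 1)   # add-one smoothing avoids p = 0

random.seed(42)
x = [random.gauss(0, 1) for _ in range(50)]        # "biometric" series
y_corr = [a + random.gauss(0, 0.3) for a in x]     # related outcome
y_null = [random.gauss(0, 1) for _ in range(50)]   # unrelated outcome
print(perm_test(x, y_corr), perm_test(x, y_null))
```

Shuffling destroys any real pairing between the series, so the observed correlation is compared against the distribution it would have under the null of no association.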
Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction.
Cheng, Hao; Garrick, Dorian J; Fernando, Rohan L
2017-01-01
A random multiple-regression model that simultaneously fits all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model. Naive application of leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis. The efficient strategy is 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers, and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with the number of observations.
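For linear models fit by (penalized) least squares, the efficiency gain described above rests on a classical identity: with hat matrix H, the leave-one-out residual is e_i / (1 - h_ii), so all n held-out errors follow from a single fit. A pure-Python sketch with ordinary least squares on synthetic data (not genomic; the abstract's GBLUP setting is analogous) verifies the identity against the naive n-refits approach:

```python
import random

def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_fit(X, y):
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

random.seed(3)
n = 12
X = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
y = [r[0] + 2 * r[1] - r[2] + random.gauss(0, 0.5) for r in X]

# Fast path: ONE fit gives residuals e_i and leverages h_ii = x_i'(X'X)^-1 x_i.
beta = ols_fit(X, y)
resid = [yi - sum(b * xi for b, xi in zip(beta, r)) for r, yi in zip(X, y)]
XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
h = [sum(xi * zi for xi, zi in zip(r, solve(XtX, list(r)))) for r in X]
loo_fast = [e / (1 - hi) for e, hi in zip(resid, h)]

# Naive path: refit n times, once per held-out observation.
loo_slow = []
for i in range(n):
    b = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
    loo_slow.append(y[i] - sum(bj * xj for bj, xj in zip(b, X[i])))

print(max(abs(a - b) for a, b in zip(loo_fast, loo_slow)))  # agrees to ~1e-12
```

The same Sherman-Morrison argument goes through for ridge-type penalized fits with a fixed penalty, which is why a single mixed-model analysis can yield the full leave-one-out error curve.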
Kloog, Itai; Nordio, Francesco; Coull, Brent A; Schwartz, Joel
2012-11-06
Satellite-derived aerosol optical depth (AOD) measurements have the potential to provide spatiotemporally resolved predictions of both long- and short-term exposures, but previous studies have generally shown moderate predictive power and lacked detailed high spatio-temporal resolution predictions across large domains. We aimed at extending our previous work by validating our model in another region with different geographical and meteorological characteristics, and by incorporating fine-scale land use regression and nonrandom missingness to better predict PM2.5 concentrations for days with or without satellite AOD measures. We start by calibrating AOD data for 2000-2008 across the Mid-Atlantic. We used mixed models regressing PM2.5 measurements against day-specific random intercepts, and fixed and random AOD and temperature slopes. We used inverse probability weighting to account for nonrandom missingness of AOD, nested regions within days to capture spatial variation in the daily calibration, and introduced a penalization method that reduces the dimensionality of the large number of spatial and temporal predictors without selecting different predictors in different locations. We then take advantage of the association between grid-cell-specific AOD values and PM2.5 monitoring data, together with associations between AOD values in neighboring grid cells, to develop grid-cell predictions when AOD is missing. Finally, to obtain local predictions (at a resolution of 50 m), we regressed the residuals from the predictions for each monitor from these previous steps against the local land use variables specific to each monitor. "Out-of-sample" 10-fold cross-validation was used to quantify the accuracy of our predictions at each step. For all days without AOD values, model performance was excellent (mean "out-of-sample" R² = 0.81, year-to-year variation 0.79-0.84).
Upon removal of outliers in the PM2.5 monitoring data, the results of the cross-validation procedure were even better (overall mean "out-of-sample" R² of 0.85). Further, cross-validation results revealed no bias in the predicted concentrations (slope of observed vs. predicted = 0.97-1.01). Our model allows one to reliably assess short-term and long-term human exposures in order to investigate the acute and chronic effects of ambient particles, respectively.
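The inverse probability weighting step for nonrandom AOD missingness can be illustrated with a toy example: when high values go missing more often, the naive mean of the observed data is biased, while weighting each observed value by 1/P(observed) recovers the truth. The retrieval-probability function and numbers below are invented for illustration:

```python
import random

random.seed(7)
values = [random.gauss(10, 2) for _ in range(20000)]   # "true" daily levels

def p_observed(v):
    """Assumed (or modelled) probability that a value is retrieved."""
    return 0.9 if v < 10 else 0.3    # high values go missing more often

obs = [v for v in values if random.random() < p_observed(v)]
naive = sum(obs) / len(obs)
ipw = (sum(v / p_observed(v) for v in obs)
       / sum(1 / p_observed(v) for v in obs))
truth = sum(values) / len(values)
print(round(naive, 2), round(ipw, 2), round(truth, 2))   # naive is biased low
```

In practice P(observed) is itself estimated (e.g. from cloud cover and season), but the weighting logic is the same.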
Golas, Sara Bersche; Shibahara, Takuma; Agboola, Stephen; Otaki, Hiroko; Sato, Jumpei; Nakae, Tatsuya; Hisamitsu, Toru; Kojima, Go; Felsted, Jennifer; Kakarmath, Sujay; Kvedar, Joseph; Jethwani, Kamal
2018-06-22
Heart failure is one of the leading causes of hospitalization in the United States. Advances in big data solutions allow for storage, management, and mining of large volumes of structured and semi-structured data, such as complex healthcare data. Applying these advances to complex healthcare data has led to the development of risk prediction models to help identify patients who would benefit most from disease management programs, in an effort to reduce readmissions and healthcare cost, but the results of these efforts have been varied. The primary aim of this study was to develop a 30-day readmission risk prediction model for heart failure patients discharged from a hospital admission. We used longitudinal electronic medical record data of heart failure patients admitted within a large healthcare system. Feature vectors included structured demographic, utilization, and clinical data, as well as selected extracts of unstructured data from clinician-authored notes. The risk prediction model was developed using deep unified networks (DUNs), a new mesh-like network structure of deep learning designed to avoid overfitting. The model was validated with 10-fold cross-validation and the results compared to models based on logistic regression, gradient boosting, and maxout networks. Overall model performance was assessed using the concordance statistic. We also selected a discrimination threshold based on maximum projected cost saving to the Partners Healthcare system. Data from 11,510 patients with 27,334 admissions and 6369 30-day readmissions were used to train the model. After data processing, the final model included 3512 variables. The DUNs model had the best performance after 10-fold cross-validation. AUCs for the prediction models were 0.664 ± 0.015, 0.650 ± 0.011, 0.695 ± 0.016 and 0.705 ± 0.015 for logistic regression, gradient boosting, maxout networks, and DUNs, respectively.
The DUNs model had an accuracy of 76.4% at the classification threshold that corresponded with maximum cost saving to the hospital. Deep learning techniques performed better than other traditional techniques in developing this EMR-based prediction model for 30-day readmissions in heart failure patients. Such models can be used to identify heart failure patients with impending hospitalization, enabling care teams to target interventions at their most high-risk patients and improving overall clinical outcomes.
Analysis of a crossed Bragg cell acousto-optical spectrometer for SETI
NASA Technical Reports Server (NTRS)
Gulkis, S.
1989-01-01
The search for radio signals from extraterrestrial intelligent beings (SETI) requires the use of large instantaneous bandwidth (500 MHz) and high resolution (20 Hz) spectrometers. Digital systems with a high degree of modularity can be used to provide this capability, and this method has been widely discussed. Another technique for meeting the SETI requirement is to use a crossed Bragg cell spectrometer as described by Psaltis and Casasent. This technique makes use of the Folded Spectrum concept, introduced by Thomas. The Folded Spectrum is a 2-D Fourier Transform of a raster-scanned 1-D signal. It is directly related to the long 1-D spectrum of the original signal and is ideally suited for optical signal processing. The folded spectrum technique has received little attention to date, primarily because early systems made use of photographic film, which is unsuitable for the real-time data analysis and voluminous data requirements of SETI. An analysis of the crossed Bragg cell spectrometer is presented as a method to achieve the spectral processing requirements for SETI. Systematic noise contributions unique to the Bragg cell system are discussed.
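The Folded Spectrum concept above has a compact numerical illustration: raster-scan a 1-D signal into a 2-D array and take its 2-D DFT; a single tone then appears as a localized peak whose row and column give the fine and coarse parts of its frequency (k = column·N1 + row for an N1-row raster). The sizes and tone below are toy values, and the Bragg-cell optics are not modeled:

```python
import cmath

N1, N2 = 8, 16                  # rows x columns; total length N1*N2 = 128
N = N1 * N2
k = 40                          # tone frequency, in cycles per N samples
signal = [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]
grid = [signal[r * N2:(r + 1) * N2] for r in range(N1)]   # raster scan

def dft2(g):
    """Direct (slow) 2-D DFT of an N1 x N2 grid."""
    return [[sum(g[r][c] * cmath.exp(-2j * cmath.pi * (p * r / N1 + q * c / N2))
                 for r in range(N1) for c in range(N2))
             for q in range(N2)] for p in range(N1)]

spec = dft2(grid)
peak = max((abs(spec[p][q]), p, q) for p in range(N1) for q in range(N2))
print(peak[1], peak[2])   # row = k mod N1 = 0, column = k // N1 = 5
```

Because k = 40 is a multiple of N1 here, the peak is exactly on a bin; off-multiple tones spread slightly along the column axis, which is the leakage an optical implementation must also manage.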
Guerrero-Romero, Fernando; Rodríguez-Morán, Martha
2010-03-01
To validate a method for screening cases of type 2 diabetes and monitoring at-risk people in a community in northern Mexico. The screening instrument for type 2 diabetes (ITD, for its Spanish acronym) was developed using a multiple logistic regression analysis that made it possible to determine the association between a new diagnosis of diabetes (a dependent variable) and 11 known risk factors. Internal validations were performed (through v-fold cross-validation), together with external validations (through the monitoring of a cohort of healthy individuals). In order to estimate the relative risk (RR) of developing type 2 diabetes, the total ITD score is calculated on the basis of an individual's risk factors and compared against a curve that shows the probability of that individual developing the disease. Of the 525 people in the cohort, 438 (83.4%) were followed for an average of 7 years (4.5 to 10 years), for a total of 2,696 person-years; 62 (14.2%) people developed diabetes during the time they were followed. Individuals scoring 55 points based on their risk factors demonstrated a significantly higher risk of developing diabetes within 7 years (RR = 6.1; 95% CI: 1.7 to 11.1); the risk was even higher for those with a score of 75 points (RR = 9.4; 95% CI: 2.1 to 11.5). The ITD is easy to use and a valid screening alternative for type 2 diabetes. Its use will allow more individuals to benefit from disease prevention methods and early diagnosis without substantially increasing costs and with minimal use of laboratory resources.
Instability, unfolding and aggregation of human lysozyme variants underlying amyloid fibrillogenesis
NASA Astrophysics Data System (ADS)
Booth, David R.; Sunde, Margaret; Bellotti, Vittorio; Robinson, Carol V.; Hutchinson, Winston L.; Fraser, Paul E.; Hawkins, Philip N.; Dobson, Christopher M.; Radford, Sheena E.; Blake, Colin C. F.; Pepys, Mark B.
1997-02-01
Tissue deposition of soluble proteins as amyloid fibrils underlies a range of fatal diseases. The two naturally occurring human lysozyme variants are both amyloidogenic, and are shown here to be unstable. They aggregate to form amyloid fibrils with transformation of the mainly helical native fold, observed in crystal structures, to the amyloid fibril cross-β fold. Biophysical studies suggest that partly folded intermediates are involved in fibrillogenesis, and this may be relevant to amyloidosis generally.
Reduced Set of Virulence Genes Allows High Accuracy Prediction of Bacterial Pathogenicity in Humans
Iraola, Gregorio; Vazquez, Gustavo; Spangenberg, Lucía; Naya, Hugo
2012-01-01
Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of different virulence-related genes among more than finished bacterial genomes from both human pathogenic and non-pathogenic strains belonging to different taxonomic groups (e.g., Actinobacteria, Gammaproteobacteria, Firmicutes). An accuracy of 95% using a cross-fold validation scheme with in-fold feature selection is obtained when classifying human pathogens and non-pathogens. A reduced subset of highly informative genes () is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at ), which displays not only the prediction (pathogen/non-pathogen) and an associated probability for pathogenicity, but also the presence/absence vector for the analyzed genes, so it is possible to decipher the subset of virulence genes responsible for the classification of the analyzed genome. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, corresponding to eight functional categories, all with evident and documented association with the phenotypes of interest. We also analyze which functional categories of virulence genes were more distinctive for pathogenicity in each taxonomic group, which seems to be a completely new kind of information and could lead to important evolutionary conclusions. PMID:22916122
Predicting turns in proteins with a unified model.
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Turns are a critical element of protein structure; they play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for individual turn types, including α-turns, β-turns, and γ-turns. However, for further protein structure and function prediction it is necessary to develop a unified model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurately predicting all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both obtained with high accuracy using technologies developed by our group. Sequence and structural evolution features, namely the profiles of sequence, secondary structure, and shape strings, are then generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, exceeding the best state-of-the-art predictors of individual turn types. Newly determined sequences and the EVA and CASP9 datasets were used as independent tests, and the results confirmed the good performance of TurnP for practical applications.
Predicting Turns in Proteins with a Unified Model
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Motivation Turns are a critical element of protein structure; they play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for individual turn types, including α-turns, β-turns, and γ-turns. However, for further protein structure and function prediction it is necessary to develop a unified model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurately predicting all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both obtained with high accuracy using technologies developed by our group. Sequence and structural evolution features, namely the profiles of sequence, secondary structure, and shape strings, are then generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, exceeding the best state-of-the-art predictors of individual turn types. Newly determined sequences and the EVA and CASP9 datasets were used as independent tests, and the results confirmed the good performance of TurnP for practical applications. PMID:23144872
Differences in Mouse and Human Non-Memory B Cell Pools
Benitez, Abigail; Weldon, Abby J.; Tatosyan, Lynnette; Velkuru, Vani; Lee, Steve; Milford, Terry-Ann; Francis, Olivia L.; Hsu, Sheri; Nazeri, Kavoos; Casiano, Carlos M.; Schneider, Rebekah; Gonzalez, Jennifer; Su, Rui-Jun; Baez, Ineavely; Colburn, Keith; Moldovan, Ioana; Payne, Kimberly J.
2014-01-01
Identifying cross-species similarities and differences in immune development and function is critical for maximizing the translational potential of animal models. Co-expression of CD21 and CD24 distinguishes transitional and mature B cell subsets in mice. Here, we validate these markers for identifying analogous subsets in humans and use them to compare the non-memory B cell pools in mice and humans, across tissues, during fetal/neonatal and adult life. Among human CD19+IgM+ B cells, the CD21/CD24 schema identifies distinct populations that correspond to T1 (transitional 1), T2 (transitional 2), FM (follicular mature), and MZ (marginal zone) subsets identified in mice. Markers specific to human B cell development validate the identity of MZ cells and the maturation status of human CD21/CD24 non-memory B cell subsets. A comparison of the non-memory B cell pools in bone marrow (BM), blood, and spleen in mice and humans shows that transitional B cells comprise a much smaller fraction in adult humans than in mice. T1 cells are a major contributor to the non-memory B cell pool in mouse BM, where their frequency is more than twice that in humans. Conversely, in spleen the T1:T2 ratio shows that T2 cells are proportionally ∼8-fold higher in humans than in mice. Despite the relatively small contribution of transitional B cells to the human non-memory pool, the number of naïve FM cells produced per transitional B cell is 3- to 6-fold higher across tissues than in mice. These data suggest that differing dynamics or mechanisms produce the non-memory B cell compartments in mice and humans. PMID:24719464
Kumar, Y Kiran; Mehta, Shashi Bhushan; Ramachandra, Manjunath
2017-01-01
The purpose of this work is to provide validation methods for evaluating the hemodynamic assessment of Cerebral Arteriovenous Malformation (CAVM). This article emphasizes the importance of validating noninvasive measurements for CAVM patients, which are designed using lumped models for complex vessel structure. The validation of the hemodynamic assessment is based on invasive clinical measurements and cross-validation against Philips' proprietary validated software tools Qflow and 2D Perfusion. The modeling results are validated for 30 CAVM patients at 150 vessel locations. Mean flow, diameter, and pressure were compared between modeling results and clinical/cross-validation measurements using an independent two-tailed Student t test. Exponential regression analysis was used to assess the relationships among blood flow, vessel diameter, and pressure. Univariate analyses of the relationships between vessel diameter, vessel cross-sectional area, AVM volume, AVM pressure, and AVM flow were performed with linear or exponential regression. Modeling results were compared with clinical measurements from vessel locations of cerebral regions, and the model was also cross-validated against the Philips tools. Our results show that the modeling results closely match the clinical results, with small deviations. In this article, we have validated our modeling results against clinical measurements. A new approach for cross-validation is proposed, demonstrating the accuracy of our results against a validated product in a clinical environment.
Observing Storm Surges from Space: A New Opportunity
NASA Astrophysics Data System (ADS)
Han, Guoqi; Ma, Zhimin; Chen, Dake; de Young, Brad; Chen, Nancy
2013-04-01
Coastal tide gauges can be used to monitor variations of a storm surge along the coast, but not in the cross-shelf direction. As a result, the cross-shelf structure of a storm surge has rarely been observed. In this study we focus on the Hurricane Igor-induced storm surge off Newfoundland, Canada. Altimetric observations at about 2:30 UTC, September 22, 2010 (hours after the passage of Hurricane Igor) reveal prominent cross-shelf variation of sea surface height during the storm passage, including a large nearshore slope and a mid-shelf depression. A significant coastal surge of 1 m derived from satellite altimetry is found to be consistent with tide-gauge measurements at the nearby St. John's station. The post-storm sea level variations at St. John's and Argentia are argued to be associated with free equatorward-propagating continental shelf waves (with phase speeds of 11-13 m/s), generated along the northeast Newfoundland coast hours after the storm moved away from St. John's. The cross-shelf e-folding scale of the shelf wave was estimated to be ~100 km. We further show approximate agreement between altimetric and tide-gauge observations in the Gulf of Mexico during Hurricanes Katrina (2005) and Isaac (2012). For the first time in the literature, this study demonstrates the capability of satellite altimetry to observe storm surges, complementing tide-gauge observations for the analysis of storm surge characteristics and for the validation and improvement of storm surge models.
NASA Astrophysics Data System (ADS)
Pouyandeh, Sima; Iubini, Stefano; Jurinovich, Sandro; Omar, Yasser; Mennucci, Benedetta; Piazza, Francesco
2017-12-01
In this paper, we work out a parameterization of environmental noise within the Haken-Strobl-Reineker (HSR) model for the PE545 light-harvesting complex, based on atomic-level quantum mechanics/molecular mechanics (QM/MM) simulations. We use this approach to investigate the role of various auto- and cross-correlations in the HSR noise tensor, confirming that site-energy autocorrelation (pure dephasing) terms dominate the noise-induced exciton mobility enhancement, followed by site energy-coupling cross-correlations for specific triplets of pigments. Interestingly, several cross-correlations of the latter kind, together with coupling-coupling cross-correlations, display clear low-frequency signatures in their spectral densities in the 30-70 cm⁻¹ region. These slow components lie at the limits of validity of the HSR approach, which requires that environmental fluctuations be faster than typical exciton transfer time scales. We show that a simple coarse-grained elastic-network-model (ENM) analysis of the PE545 protein naturally spotlights collective normal modes in this frequency range that represent specific concerted motions of the subnetwork of cysteines covalently linked to the pigments. This analysis strongly suggests that protein scaffolds in light-harvesting complexes are able to express specific collective, low-frequency normal modes providing a fold-rooted blueprint of exciton transport pathways. We speculate that ENM-based mixed quantum-classical methods, such as Ehrenfest dynamics, might be promising tools to disentangle the fundamental design principles of these dynamical processes in natural and artificial light-harvesting structures.
Reverse fault growth and fault interaction with frictional interfaces: insights from analogue models
NASA Astrophysics Data System (ADS)
Bonanno, Emanuele; Bonini, Lorenzo; Basili, Roberto; Toscani, Giovanni; Seno, Silvio
2017-04-01
The association of faulting and folding is a common feature in mountain chains, fold-and-thrust belts, and accretionary wedges. Kinematic models are developed and widely used to explain a range of relationships between faulting and folding. However, these models may not be completely appropriate for explaining shortening in mechanically heterogeneous rock bodies. Weak layers, bedding surfaces, or pre-existing faults placed ahead of a propagating fault tip may influence the fault propagation rate itself and the associated fold shape. In this work, we employed clay analogue models to investigate how mechanical discontinuities affect the propagation rate and the associated fold shape during the growth of reverse master faults. The simulated master faults dip at 30° and 45°, reflecting the range of the most frequent dip angles for active reverse faults that occur in nature. The mechanical discontinuities are simulated by pre-cutting the clay pack. For both experimental setups (30° and 45° dipping faults) we analyzed three different configurations: 1) isotropic, i.e. without precuts; 2) with one precut in the middle of the clay pack; and 3) with two evenly-spaced precuts. To test the repeatability of the processes and to obtain a statistically valid dataset, we replicated each configuration three times. The experiments were monitored by collecting successive snapshots with a high-resolution camera pointing at the side of the model. The pictures were then processed using the Digital Image Correlation (DIC) method in order to extract the displacement and shear-rate fields. These two quantities effectively show both the on-fault and off-fault deformation, indicating the activity along the newly-formed faults and whether and at what stage the discontinuities (precuts) are reactivated. To study the fault propagation and fold shape variability, we marked the position of the fault tips and the fold profiles for every successive step of deformation. 
Then we compared precut models with isotropic models to evaluate the trends of variability. Our results indicate that the discontinuities are reactivated especially when the tip of the newly-formed fault is either below or connected to them. During the stage of maximum activity along the precut, the faults slow down or even stop their propagation. The fault propagation systematically resumes when the angle between the fault and the precut is about 90° (critical angle); only during this stage the fault crosses the precut. The reactivation of the discontinuities induces an increase of the apical angle of the fault-related fold and produces wider limbs compared to the isotropic reference experiments.
Thoemmes, Stephen F; Stutzke, Crystal A; Du, Yanmei; Browning, Michael D; Buttrick, Peter M; Walker, Lori A
2014-01-31
Phosphorylation of cardiac troponin I is a well established mechanism by which cardiac contractility is modulated. However, there are a number of phosphorylation sites on TnI which contribute singly or in combination to influence cardiac function. Accordingly, methods for accurately measuring site-specific TnI phosphorylation are needed. Currently, two strategies are employed: mass spectrometry, which is costly, difficult and has a low throughput; and Western blotting using phospho-specific antibodies, which is limited by the availability of reagents. In this report, we describe a cohort of new site-specific TnI phosphoantibodies, generated against physiologically relevant phosphorylation sites, that are superior to the current commercially available antibodies: to phospho-serine 22/23 which shows a >5-fold phospho-specificity for phosphorylated TnI; to phospho-serine 43, which has >3-fold phospho-specificity for phosphorylated TnI; and phospho-serine 150 which has >2-fold phospho-specificity for phosphorylated TnI. These new antibodies demonstrated greater sensitivity and specificity for the phosphorylated TnI than the most widely used commercially available reagents. For example, at a protein load of 20 μg of total cardiac extract, a commercially available antibody recognized both phosphorylated and dephosphorylated TnI to the same degree. At the same protein load our phospho-serine 22/23 antibody exhibited no cross-reactivity with dephosphorylated TnI. These new tools should allow a more accurate assessment and a better understanding of the role of TnI phosphorylation in the response of the heart to pathologic stress. Copyright © 2013 Elsevier B.V. All rights reserved.
Cassaignau, Anaïs M E; Launay, Hélène M M; Karyadi, Maria-Evangelia; Wang, Xiaolin; Waudby, Christopher A; Deckert, Annika; Robertson, Amy L; Christodoulou, John; Cabrita, Lisa D
2016-08-01
During biosynthesis on the ribosome, an elongating nascent polypeptide chain can begin to fold, in a process that is central to all living systems. Detailed structural studies of co-translational protein folding are now beginning to emerge; such studies were previously limited, at least in part, by the inherently dynamic nature of emerging nascent chains, which precluded most structural techniques. NMR spectroscopy is able to provide atomic-resolution information for ribosome-nascent chain complexes (RNCs), but it requires large quantities (≥10 mg) of homogeneous, isotopically labeled RNCs. Further challenges include limited sample working concentration and stability of the RNC sample (which contribute to weak NMR signals) and resonance broadening caused by attachment to the large (2.4-MDa) ribosomal complex. Here, we present a strategy to generate isotopically labeled RNCs in Escherichia coli that are suitable for NMR studies. Uniform translational arrest of the nascent chains is achieved using a stalling motif, and isotopically labeled RNCs are produced at high yield using high-cell-density E. coli growth conditions. Homogeneous RNCs are isolated by combining metal affinity chromatography (to isolate ribosome-bound species) with sucrose density centrifugation (to recover intact 70S monosomes). Sensitivity-optimized NMR spectroscopy is then applied to the RNCs, combined with a suite of parallel NMR and biochemical analyses to cross-validate their integrity, including RNC-optimized NMR diffusion measurements to report on ribosome attachment in situ. Comparative NMR studies of RNCs with the analogous isolated proteins permit a high-resolution description of the structure and dynamics of a nascent chain during its progressive biosynthesis on the ribosome.
HMM-ModE: implementation, benchmarking and validation with HMMER3
2014-01-01
Background HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10-fold cross-validation and modifies the emission probabilities of profiles to reduce common fold-based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for the large-scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts, both at the level of parsing the HMM profiles and of modifying emission probabilities, to upgrade HMM-ModE to HMMER3, taking advantage of its probabilistic inference and high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation. Results The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER to show that the effect of local-local alignments is marked only in the case of profiles containing a large number of discontinuous match states. The method is tested on a gold-standard set of families, and we report a significant reduction in the number of false positive hits over the default HMM profiles. When implemented on GPCR sequences, the results showed an improvement in classification accuracy compared with other methods used to classify the family at different levels of its classification hierarchy. Conclusions The present findings show that the new version of HMM-ModE is a highly specific method for differentiating between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. 
The use of modified profile HMMs of GPCR sequences provides a simple yet highly specific method for classification of the family, being able to predict the sub-family specific sequences with high accuracy even though sequences share common physicochemical characteristics between sub-families. PMID:25073805
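The threshold-optimization step described in the HMM-ModE abstract above, choosing a discrimination cutoff by cross-validation, can be sketched as follows. This is an illustrative sketch, not HMM-ModE's actual code: the toy profile scores and the Matthews-correlation objective are assumptions made for the example; in practice the scores would come from HMMER searches of positive and negative training sequences.

```python
import random

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0 when undefined."""
    den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / den if den else 0.0

def best_threshold(scores, labels):
    """Pick the score cutoff that maximizes MCC on the given data."""
    best, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(s >= t and l for s, l in zip(scores, labels))
        fp = sum(s >= t and not l for s, l in zip(scores, labels))
        fn = sum(s < t and l for s, l in zip(scores, labels))
        tn = sum(s < t and not l for s, l in zip(scores, labels))
        m = mcc(tp, tn, fp, fn)
        if m > best:
            best, best_t = m, t
    return best_t

def cv_threshold(scores, labels, k=10, seed=0):
    """k-fold CV: optimize the cutoff on each training split, then average."""
    idx = list(range(len(scores)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    cuts = []
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        cuts.append(best_threshold([scores[i] for i in train],
                                   [labels[i] for i in train]))
    return sum(cuts) / len(cuts)

# Toy profile scores: true-family sequences score high, decoys low.
scores = [9, 8, 7, 6, 2, 1, 0, -1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
cut = cv_threshold(scores, labels, k=4)
```

Averaging per-fold cutoffs, rather than optimizing once on all data, guards against a threshold tuned to chance fluctuations in the training scores.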
Livingstone, Mark; Folkman, Lukas; Yang, Yuedong; Zhang, Ping; Mort, Matthew; Cooper, David N; Liu, Yunlong; Stantic, Bela; Zhou, Yaoqi
2017-10-01
Synonymous single-nucleotide variants (SNVs), although they do not alter the encoded protein sequences, have been implicated in many genetic diseases. Experimental studies indicate that synonymous SNVs can lead to changes in the secondary and tertiary structures of DNA and RNA, thereby affecting translational efficiency, cotranslational protein folding as well as the binding of DNA-/RNA-binding proteins. However, the importance of these various features in disease phenotypes is not clearly understood. Here, we have built a support vector machine (SVM) model (termed DDIG-SN) as a means to discriminate disease-causing synonymous variants. The model was trained and evaluated on nearly 900 disease-causing variants. The method achieves robust performance with the area under the receiver operating characteristic curve of 0.84 and 0.85 for protein-stratified 10-fold cross-validation and independent testing, respectively. We were able to show that the disease-causing effects in the immediate proximity to exon-intron junctions (1-3 bp) are driven by the loss of splicing motif strength, whereas the gain of splicing motif strength is the primary cause in regions further away from the splice site (4-69 bp). The method is available as a part of the DDIG server at http://sparks-lab.org/ddig. © 2017 Wiley Periodicals, Inc.
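The protein-stratified cross-validation mentioned in the abstract above keeps all variants of one protein in the same fold, so performance is not inflated by near-identical examples straddling the train/test split. A minimal sketch of such grouped fold assignment (the function and the toy protein labels are illustrative assumptions, not the DDIG-SN implementation):

```python
import random

def group_kfold(groups, k, seed=0):
    """Assign whole groups (e.g. proteins) to folds so that variants of
    the same protein never appear in both training and test sets."""
    uniq = sorted(set(groups))
    random.Random(seed).shuffle(uniq)
    fold_of = {g: i % k for i, g in enumerate(uniq)}
    folds = [[] for _ in range(k)]
    for i, g in enumerate(groups):
        folds[fold_of[g]].append(i)
    return folds

# Each variant is tagged with its parent protein (hypothetical labels).
groups = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p2"]
folds = group_kfold(groups, k=2)
```

A plain shuffled split would let variants of the same protein appear on both sides of the split, overstating generalization to unseen proteins.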
Zhang, Yan; Liu, Jun W; Zheng, Wen J; Wang, Lei; Zhang, Hong Y; Fang, Guo Z; Wang, Shuo
2008-02-01
In this study, an enzyme-linked immunosorbent assay (ELISA) was optimized and applied to the determination of endosulfan residues in 20 different kinds of food commodities including vegetables, dry fruits, tea and meat. The limit of detection (IC15) was 0.8 µg kg⁻¹ and the sensitivity (IC50) was 5.3 µg kg⁻¹. Three simple extraction methods were developed: shaking on a rotary shaker at 250 r min⁻¹ overnight, shaking on a rotary shaker for 1 h, and thorough mixing for 2 min. Methanol was used as the extraction solvent in this study. The extracts were diluted in 0.5% fish skin gelatin (FG) in phosphate-buffered saline (PBS) at various dilutions in order to remove matrix interference. For cabbage (purple and green), asparagus, Japanese green, Chinese cabbage, scallion, garland chrysanthemum, spinach and garlic, the extracts were diluted 10-fold; for carrots and tea, the extracts were diluted 15-fold and 900-fold, respectively. The extracts of celery, adzuki beans and chestnuts were diluted 20-fold to avoid matrix interference; ginger, vegetable soybean and peanut extracts were diluted 100-fold; mutton and chicken extracts were diluted 10-fold; and for eel, the dilution was 40-fold. Average recoveries were 63.13-125.61%. Validation was conducted by gas chromatography (GC) and gas chromatography-mass spectrometry (GC-MS). The results of this study will be useful for the wide application of ELISA to the rapid determination of pesticides in food samples.
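Converting a diluted-extract assay reading back to a residue level in the original commodity follows directly from the dilution factors listed above. A small sketch of that arithmetic; the sample mass, extract volume, and assay reading are hypothetical numbers chosen for the example, not values from the study:

```python
def sample_concentration(assay_ug_per_l, dilution_factor,
                         extract_volume_l, sample_mass_kg):
    """Back-calculate the residue level in the original commodity (µg/kg)
    from the ELISA reading of the diluted extract (µg/L)."""
    total_ug = assay_ug_per_l * dilution_factor * extract_volume_l
    return total_ug / sample_mass_kg

# Hypothetical tea example: 5 g sample extracted into 10 mL methanol,
# extract diluted 900-fold before the assay reads 0.5 µg/L.
residue = sample_concentration(0.5, 900, 0.010, 0.005)
```

The heavy dilutions (up to 900-fold for tea) trade sensitivity for freedom from matrix interference, which is why the detection limit in the undiluted extract must be well below the regulatory level of interest.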
NASA Astrophysics Data System (ADS)
Yin, An; Oertel, Gerhard
1993-06-01
In order to understand interactions between motion along thrusts and the associated style of deformation and strain distribution in their hangingwalls, geologic mapping and strain measurements were conducted in an excellently exposed thrust-related fold system in the Lewis thrust plate, northwestern Montana. This system consists of: (1) an E-directed basal thrust (the Gunsight thrust) that has a flat-ramp geometry and a slip of about 3.6 km; (2) an E-verging asymmetric anticline with its nearly vertical forelimb truncated by the basal thrust from below; (3) a 4-km wide fold belt, the frontal fold complex, that lies directly in front of the E-verging anticline; (4) a W-directed bedding-parallel fault (the Mount Thompson fault) that bounds the top of the frontal fold belt and separates it from the undeformed to broadly folded strata above; and (5) regionally developed, W-dipping spaced cleavage. Although the overall geometry of the thrust-related fold system differs from any previously documented fault-related folds, the E-verging anticline itself resembles geometrically a Rich-type fault-bend fold. The observed initial cut-off and fold interlimb angles of this anticline, however, cannot be explained by cross-section balancing models for the development of either a fault-bend fold or a fault propagation fold. Possible origins for the E-verging anticline include (1) the fold initiated as an open fault-bend fold and tightened only later during its emplacement along the basal thrust and (2) the fold started as either a fault-bend or a fault-propagation fold, but simultaneous or subsequent volume change incompatible with any balanced cross-section models altered its shape. Strain in the thrust-related fold system was determined by the preferred orientation of mica and chlorite grains. The direction and magnitude of the post-compaction strain varies from place to place. 
Strains in the forelimb of the hangingwall anticline imply bedding-parallel thinning at some localities and thickening at others. This inhomogeneity may be caused by the development of thrusts and folds. Strain in the backlimb of the hangingwall anticline implies bedding-parallel stretching in the thrust transport direction. This could be the effect of bending as the E-verging anticline was tightened and transported across the basal thrust ramp. Strain measured next to the Gunsight thrust again indicates locally varying shortening and extension in the transport direction, perhaps in response to inhomogeneous friction on the fault or else to a history of alternating strain hardening and softening in the basal thrust zone.
Deformation and kinematics of the central Kirthar Fold Belt, Pakistan
NASA Astrophysics Data System (ADS)
Hinsch, Ralph; Hagedorn, Peter; Asmar, Chloé; Nasim, Muhammad; Aamir Rasheed, Muhammad; Kiely, James M.
2017-04-01
The Kirthar Fold Belt is part of the lateral mountain belts in Pakistan linking the Himalayan orogeny with the Makran accretionary wedge. This region is deforming very obliquely to, indeed nearly parallel with, the regional plate motion vector. The study area is situated between the prominent Chaman strike-slip fault in the west and the undeformed foreland (Kirthar Foredeep/Middle Indus Basin) in the east. The Kirthar Fold Belt is subdivided into several crustal blocks/units based on structural orientation and deformation style (e.g. Kallat, Khuzdar, frontal Kirthar). This study uses newly acquired and depth-migrated 2D seismic lines, surface geology observations, and Google Earth assessments to construct three balanced cross sections for the frontal part of the fold belt. Further work was done to ensure the coherence of the constructed cross sections by taking a closer look at the regional context inferred from published data, simple analogue modelling, and constructed regional sketch sections. The Khuzdar area and the frontal Kirthar Fold Belt are dominated by folding. Large thrusts with major stratigraphic repetitions are not observed. Furthermore, strike-slip faults are scarce in the Khuzdar area and not observed in the frontal Kirthar Fold Belt. The regional structural elevation rises from the foreland across the Kirthar Fold Belt towards the hinterland (Khuzdar area). These observations indicate that basement-involved deformation is present at depth. The dominance of folding indicates a weak decollement below the folds (soft-linked deformation). The fold pattern in the Khuzdar area is complex, whereas the large folds of the central Kirthar Fold Belt trend SSW-NNE to N-S and are best described as large detachment folds that have been slightly uplifted by basement-involved transpressive deformation underneath. Towards the foreland, the deformation is apparently more hard-linked and involves fault-propagation folding and a small triangle zone in Cretaceous sediments. 
Shortening is in the order of 21-24% for the frontal structures. The deformation above the weak Eocene Ghazij shales is partly decoupled from the layers underneath, especially where the Ghazij shales are thick. Thus, not all structures visible at surface level in the Kirthar Fold Belt are also present in the deeper section, and vice versa (disharmonic folding). The structural architecture in the frontal central Kirthar Fold Belt shows only convergent structures nearly parallel to the regional plate motion vector of the Indian plate and thus represents an example of extreme strain partitioning.
Design and simulation of origami structures with smooth folds
Peraza Hernandez, E. A.; Lagoudas, D. C.
2017-01-01
Origami has enabled new approaches to the fabrication and functionality of multiple structures. Current methods for origami design are restricted to the idealization of folds as creases of zeroth-order geometric continuity. Such an idealization is not proper for origami structures of non-negligible fold thickness or maximum curvature at the folds restricted by material limitations. For such structures, folds are not properly represented as creases but rather as bent regions of higher-order geometric continuity. Such fold regions of arbitrary order of continuity are termed as smooth folds. This paper presents a method for solving the following origami design problem: given a goal shape represented as a polygonal mesh (termed as the goal mesh), find the geometry of a single planar sheet, its pattern of smooth folds, and the history of folding motion allowing the sheet to approximate the goal mesh. The parametrization of the planar sheet and the constraints that allow for a valid pattern of smooth folds are presented. The method is tested against various goal meshes having diverse geometries. The results show that every determined sheet approximates its corresponding goal mesh in a known folded configuration having fold angles obtained from the geometry of the goal mesh. PMID:28484322
Design and simulation of origami structures with smooth folds.
Peraza Hernandez, E A; Hartl, D J; Lagoudas, D C
2017-04-01
Origami has enabled new approaches to the fabrication and functionality of multiple structures. Current methods for origami design are restricted to the idealization of folds as creases of zeroth-order geometric continuity. Such an idealization is not proper for origami structures of non-negligible fold thickness or maximum curvature at the folds restricted by material limitations. For such structures, folds are not properly represented as creases but rather as bent regions of higher-order geometric continuity. Such fold regions of arbitrary order of continuity are termed as smooth folds. This paper presents a method for solving the following origami design problem: given a goal shape represented as a polygonal mesh (termed as the goal mesh), find the geometry of a single planar sheet, its pattern of smooth folds, and the history of folding motion allowing the sheet to approximate the goal mesh. The parametrization of the planar sheet and the constraints that allow for a valid pattern of smooth folds are presented. The method is tested against various goal meshes having diverse geometries. The results show that every determined sheet approximates its corresponding goal mesh in a known folded configuration having fold angles obtained from the geometry of the goal mesh.
Folding Properties of Two-Dimensional Deployable Membrane Using FEM Analyses
NASA Astrophysics Data System (ADS)
Satou, Yasutaka; Furuya, Hiroshi
Folding FEM analyses are presented to examine the folding properties of a two-dimensional deployable membrane for a precise deployment simulation. A fold model of the membrane is proposed by dividing the wrapping fold process into two regions: the folded state and the transient process. The cross-section of the folded state is assumed to be a repeating structure, and analytical procedures for the repeating structure are constructed. To investigate the mechanical properties of the crease in detail, the bending stiffness is considered in the FEM analyses. As results of the FEM analyses, the configuration of the membrane and the contact force from the adjacent membrane are obtained quantitatively for an arbitrary layer pitch. Possible occurrence of plastic deformation is estimated using the von Mises stress in the crease. The FEM results are compared with one-dimensional approximation analyses to evaluate these results.
Pyrethroid resistance and cross-resistance in the German cockroach, Blattella germanica (L).
Wei, Y; Appel, A G; Moar, W J; Liu, N
2001-11-01
A German cockroach (Blattella germanica (L)) strain, Apyr-R, was collected from Opelika, Alabama after control failures with pyrethroid insecticides. Levels of resistance to permethrin and deltamethrin in Apyr-R (97- and 480-fold, respectively, compared with a susceptible strain, ACY) were partially or mostly suppressed by piperonyl butoxide (PBO) and S,S,S-tributylphosphorotrithioate (DEF), suggesting that P450 monooxygenases and hydrolases are involved in resistance to these two pyrethroids in Apyr-R. However, incomplete suppression of pyrethroid resistance with PBO and DEF implies that one or more additional mechanisms are involved in resistance. Injection, compared with topical application, resulted in 43- and 48-fold increases in toxicity of permethrin in ACY and Apyr-R, respectively. Similarly, injection increased the toxicity of deltamethrin 27-fold in ACY and 28-fold in Apyr-R. These data indicate that cuticular penetration is one of the obstacles for the effectiveness of pyrethroids against German cockroaches. However, injection did not change the levels of resistance to either permethrin or deltamethrin, suggesting that a decrease in the rate of cuticular penetration may not play an important role in pyrethroid resistance in Apyr-R. Apyr-R showed cross-resistance to imidacloprid, with a resistance ratio of 10. PBO treatment resulted in no significant change in the toxicity of imidacloprid, implying that P450 monooxygenase-mediated detoxication is not the mechanism responsible for cross-resistance. Apyr-R showed no cross-resistance to spinosad, although spinosad had relatively low toxicity to German cockroaches compared with other insecticides tested in this study. This result further confirmed that the mode of action of spinosad to insects is unique. Fipronil, a relatively new insecticide, was highly toxic to German cockroaches, and the multi-resistance mechanisms in Apyr-R did not confer significant cross-resistance to this compound. Thus, we propose that fipronil could be a valuable tool in integrated resistance management of German cockroaches.
James, S. W.; Ranum, L. P. W.; Silflow, C. D.; Lefebvre, P. A.
1988-01-01
We have used genetic analysis to study the mode of action of two anti-microtubule herbicides, amiprophos-methyl (APM) and oryzalin (ORY). Over 200 resistant mutants were selected by growth on APM- or ORY-containing plates. The 21 independently isolated mutants examined in this study are 3- to 8-fold resistant to APM and are strongly cross-resistant to ORY and butamiphos, a close analog of APM. Two Mendelian genes, apm1 and apm2, are defined by linkage and complementation analysis. There are 20 alleles of apm1 and one temperature-sensitive lethal (33°) allele of apm2. Mapping by two-factor crosses places apm1 6.5 cM centromere proximal to uni1 and within 4 cM of pf7 on the uni linkage group, a genetically circular linkage group comprising genes which affect flagellar assembly or function; apm2 maps near the centromere of linkage group VIII. Allele-specific synthetic lethality is observed in crosses between apm2 and alleles of apm1. Also, self crosses of apm2 are zygotic lethal, whereas crosses of nine apm1 alleles inter se result in normal germination and tetrad viability. The mutants are recessive to their wild-type alleles but doubly heterozygous diploids (apm1 +/+ apm2) made with apm2 and any of 15 apm1 alleles display partial intergenic noncomplementation, expressed as intermediate resistance. Diploids homozygous for mutant alleles of apm1 are 4-6-fold resistant to APM and ORY; diploids homozygous for apm2 are ts(-) and 2-fold resistant to the herbicides. Doubly heterozygous diploids complement the ts(-) phenotype of apm2, but they are typically 1.5-2-fold resistant to APM and ORY. From the results described we suggest that the gene products of apm1 and apm2 may interact directly or function in the same structure or process. PMID:8608924
3D visualization of sheath folds in Ancient Roman marble wall coverings from Ephesos, Turkey
NASA Astrophysics Data System (ADS)
Wex, Sebastian; Passchier, Cees W.; de Kemp, Eric A.; İlhan, Sinan
2014-10-01
Archaeological excavations and restoration of a palatial Roman housing complex in Ephesos, Turkey yielded 40 wall-decorating plates of folded mylonitic marble (Cipollino verde), derived from the internal Hellenides near Karystos, Greece. Cipollino verde was commonly used for decoration purposes in Roman buildings. The plates were serial-sectioned from a single quarried block of 1.25 m3 and provided a research opportunity for detailed reconstruction of the 3D geometry of meter-scale folds in mylonitized marble. A GOCAD model is used to visualize the internal fold structures of the marble, comprising curtain folds and multilayered sheath folds. The sheath folds are unusual in that they have their intermediate axis normal to the parent layering. This agrees with regional tectonic studies, which suggest that Cipollino verde structures formed by local constrictional non-coaxial flow. Sheath fold cross-section geometry, exposed on the surface of a plate or outcrop, is found to be independent of the intersection angle of the fold structure with the studied plane. Consequently, a single surface cannot be used as an indicator of the three-dimensional geometry of transected sheath folds.
Dependence of Internal Friction on Folding Mechanism
Zheng, Wenwei; De Sancho, David; Hoppe, Travis; Best, Robert B
2015-03-11
An outstanding challenge in protein folding is understanding the origin of "internal friction" in folding dynamics, experimentally identified from the dependence of folding rates on solvent viscosity. A possible origin suggested by simulation is the crossing of local torsion barriers. However, it was unclear why internal friction varied from protein to protein or for different folding barriers of the same protein. Using all-atom simulations with variable solvent viscosity, in conjunction with transition-path sampling to obtain reaction rates and analysis via Markov state models, we are able to determine the internal friction in the folding of several peptides and miniproteins. In agreement with experiment, we find that the folding events with greatest internal friction are those that mainly involve helix formation, while hairpin formation exhibits little or no evidence of friction. Via a careful analysis of folding transition paths, we show that internal friction arises when torsion angle changes are an important part of the folding mechanism near the folding free energy barrier. These results suggest an explanation for the variation of internal friction effects from protein to protein and across the energy landscape of the same protein. PMID:25721133
Double Cross-Validation in Multiple Regression: A Method of Estimating the Stability of Results.
ERIC Educational Resources Information Center
Rowell, R. Kevin
In multiple regression analysis, where resulting predictive equation effectiveness is subject to shrinkage, it is especially important to evaluate result replicability. Double cross-validation is an empirical method by which an estimate of invariance or stability can be obtained from research data. A procedure for double cross-validation is…
2015-01-01
The 5-hydroxytryptamine 1A (5-HT1A) serotonin receptor has been an attractive target for treating mood and anxiety disorders such as schizophrenia. We have developed binary classification quantitative structure–activity relationship (QSAR) models of 5-HT1A receptor binding activity using data retrieved from the PDSP Ki database. The prediction accuracy of these models was estimated by external 5-fold cross-validation as well as using an additional validation set comprising 66 structurally distinct compounds from the World of Molecular Bioactivity database. These validated models were then used to mine three major types of chemical screening libraries, i.e., drug-like libraries, GPCR targeted libraries, and diversity libraries, to identify novel computational hits. The five best hits from each class of libraries were chosen for further experimental testing in radioligand binding assays, and nine of the 15 hits were confirmed to be active experimentally with binding affinity better than 10 μM. The most active compound, Lysergol, from the diversity library showed very high binding affinity (Ki) of 2.3 nM against 5-HT1A receptor. The novel 5-HT1A actives identified with the QSAR-based virtual screening approach could be potentially developed as novel anxiolytics or potential antischizophrenic drugs. PMID:24410373
Nayak, Deepak Ranjan; Dash, Ratnakar; Majhi, Banshidhar
2017-01-01
This paper presents an automatic classification system for segregating pathological brains from normal brains in magnetic resonance imaging scanning. The proposed system employs a contrast limited adaptive histogram equalization scheme to enhance the diseased region in brain MR images. A two-dimensional stationary wavelet transform is harnessed to extract features from the preprocessed images. The feature vector is constructed using the energy and entropy values, computed from the level-2 SWT coefficients. Then, the relevant and uncorrelated features are selected using a symmetric uncertainty ranking filter. Subsequently, the selected features are given as input to the proposed AdaBoost with support vector machine classifier, where SVM is used as the base classifier of the AdaBoost algorithm. To validate the proposed system, three standard MR image datasets, Dataset-66, Dataset-160, and Dataset-255, have been utilized. Results from 5 runs of k-fold stratified cross-validation indicate that the suggested scheme offers better performance than other existing schemes in terms of accuracy and number of features. The proposed system attains perfect classification on Dataset-66 and Dataset-160, whereas for Dataset-255 an accuracy of 99.45% is achieved. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
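The stratified k-fold protocol relied on here can be sketched in plain Python. This is a minimal illustration of the generic technique, not the authors' code; the class labels, fold count, and number of runs below are hypothetical:

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve
    the class proportions of `labels` as closely as possible."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal indices round-robin per class
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for f in range(k) if f != t for i in folds[f])
        yield train, test

# "5 runs of k-fold stratified cross validation": repeat with fresh seeds
labels = ["normal"] * 30 + ["pathological"] * 10
all_splits = [list(stratified_kfold(labels, k=5, seed=run)) for run in range(5)]
```

Repeating the whole k-fold procedure with different shuffles, then averaging, reduces the variance of the accuracy estimate that a single split would give.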
The Principle of the Micro-Electronic Neural Bridge and a Prototype System Design.
Huang, Zong-Hao; Wang, Zhi-Gong; Lu, Xiao-Ying; Li, Wen-Yuan; Zhou, Yu-Xuan; Shen, Xiao-Yan; Zhao, Xin-Tai
2016-01-01
The micro-electronic neural bridge (MENB) aims to rebuild lost motor function of paralyzed humans by routing movement-related signals from the brain, around the damaged part of the spinal cord, to the external effectors. This study focused on the prototype system design of the MENB, including the principle of the MENB, the neural signal detecting circuit and the functional electrical stimulation (FES) circuit design, and the spike detecting and sorting algorithm. In this study, we developed a novel improved amplitude threshold spike detecting method based on a variable forward difference threshold for both the training and bridging phases. The discrete wavelet transform (DWT), a new level feature coefficient selection method based on the Lilliefors test, and the k-means clustering method based on Mahalanobis distance were used for spike sorting. A real-time online spike detecting and sorting algorithm based on DWT and Euclidean distance was also implemented for the bridging phase. Tested by the data sets available at Caltech, in the training phase, the average sensitivity, specificity, and clustering accuracies are 99.43%, 97.83%, and 95.45%, respectively. Validated by the three-fold cross-validation method, the average sensitivity, specificity, and classification accuracy are 99.43%, 97.70%, and 96.46%, respectively.
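A generic amplitude-threshold spike detector conveys the basic idea behind the detection stage. This is the textbook baseline only, not the paper's variable forward-difference threshold method, and the trace, threshold, and refractory window below are invented:

```python
def detect_spikes(signal, threshold, refractory=3):
    """Generic amplitude-threshold spike detection: report sample indices
    where the signal crosses `threshold` upward, then skip a short
    refractory window so one spike is not counted twice."""
    spikes = []
    i = 1
    while i < len(signal):
        if signal[i] >= threshold and signal[i - 1] < threshold:
            spikes.append(i)
            i += refractory  # skip the refractory window after a detection
        else:
            i += 1
    return spikes

trace = [0, 1, 0, 5, 6, 2, 0, 0, 7, 1, 0]
idx = detect_spikes(trace, threshold=4)  # upward crossings at samples 3 and 8
```

Detected spike waveforms would then be passed to the sorting stage (feature extraction and clustering).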
ERIC Educational Resources Information Center
Acar, Tülin
2014-01-01
In literature, it has been observed that many enhanced criteria are limited by factor analysis techniques. Besides examinations of statistical structure and/or psychological structure, such validity studies as cross validation and classification-sequencing studies should be performed frequently. The purpose of this study is to examine cross…
When fast is better: protein folding fundamentals and mechanisms from ultrafast approaches
Muñoz, Victor; Cerminara, Michele
2016-01-01
Protein folding research stalled for decades because conventional experiments indicated that proteins fold slowly and in single strokes, whereas theory predicted a complex interplay between dynamics and energetics resulting in myriad microscopic pathways. Ultrafast kinetic methods turned the field upside down by providing the means to probe fundamental aspects of folding, test theoretical predictions and benchmark simulations. Accordingly, experimentalists could measure the timescales for all relevant folding motions, determine the folding speed limit and confirm that folding barriers are entropic bottlenecks. Moreover, a catalogue of proteins that fold extremely fast (microseconds) could be identified. Such fast-folding proteins cross shallow free energy barriers or fold downhill, and thus unfold with minimal co-operativity (gradually). A new generation of thermodynamic methods has exploited this property to map folding landscapes, interaction networks and mechanisms at nearly atomic resolution. In parallel, modern molecular dynamics simulations have finally reached the timescales required to watch fast-folding proteins fold and unfold in silico. All of these findings have buttressed the fundamentals of protein folding predicted by theory, and are now offering the first glimpses at the underlying mechanisms. Fast folding appears to also have functional implications as recent results connect downhill folding with intrinsically disordered proteins, their complex binding modes and ability to moonlight. These connections suggest that the coupling between downhill (un)folding and binding enables such protein domains to operate analogically as conformational rheostats. PMID:27574021
Differential gene expression in human abdominal aortic aneurysm and aortic occlusive disease
Moran, Corey S.; Schreurs, Charlotte; Lindeman, Jan H. N.; Walker, Philip J.; Nataatmadja, Maria; West, Malcolm; Holdt, Lesca M.; Hinterseher, Irene; Pilarsky, Christian; Golledge, Jonathan
2015-01-01
Abdominal aortic aneurysm (AAA) and aortic occlusive disease (AOD) represent common causes of morbidity and mortality in elderly populations which were previously believed to have common aetiologies. The aim of this study was to assess the gene expression in human AAA and AOD. We performed microarrays using aortic specimens obtained from 20 patients with small AAAs (≤55 mm), 29 patients with large AAAs (>55 mm), 9 AOD patients, and 10 control aortic specimens obtained from organ donors. Some differentially expressed genes were validated by quantitative-PCR (qRT-PCR)/immunohistochemistry. We identified 840 and 1,014 differentially expressed genes in small and large AAAs, respectively. Immune-related pathways including cytokine-cytokine receptor interaction and T-cell-receptor signalling were upregulated in both small and large AAAs. Examples of validated genes included CTLA4 (2.01-fold upregulated in small AAA, P = 0.002), NKTR (2.37- and 2.66-fold upregulated in small and large AAA with P = 0.041 and P = 0.015, respectively), and CD8A (2.57-fold upregulated in large AAA, P = 0.004). 1,765 differentially expressed genes were identified in AOD. Pathways upregulated in AOD included metabolic and oxidative phosphorylation categories. The UCP2 gene was downregulated in AOD (3.73-fold downregulated, validated P = 0.017). In conclusion, the AAA and AOD transcriptomes were very different suggesting that AAA and AOD have distinct pathogenic mechanisms. PMID:25944698
Park, Jinhee; Javier, Rios Jesus; Moon, Taesup; Kim, Youngwook
2016-11-24
Accurate classification of human aquatic activities using radar has a variety of potential applications such as rescue operations and border patrols. Nevertheless, the classification of activities on water using radar has not been extensively studied, unlike the case on dry ground, due to its unique challenge. Namely, not only is the radar cross section of a human on water small, but the micro-Doppler signatures are much noisier due to water drops and waves. In this paper, we first investigate whether discriminative signatures could be obtained for activities on water through a simulation study. Then, we show how we can effectively achieve high classification accuracy by applying deep convolutional neural networks (DCNN) directly to the spectrogram of real measurement data. From the five-fold cross-validation on our dataset, which consists of five aquatic activities, we report that the conventional feature-based scheme only achieves an accuracy of 45.1%. In contrast, the DCNN trained using only the collected data attains 66.7%, and the transfer learned DCNN, which takes a DCNN pre-trained on a RGB image dataset and fine-tunes the parameters using the collected data, achieves a much higher 80.3%, which is a significant performance boost.
Estimates of Commercial Motor Vehicles Using the Southwest Border Crossings
DOT National Transportation Integrated Search
2000-09-20
The United States has experienced almost a five-fold increase in commercial motor vehicle traffic to and from Mexico during the past sixteen years. There were more than 4 million commercial motor vehicle (CMV) crossings from Mexico into the United S...
Modeling of autocatalytic hydrolysis of adefovir dipivoxil in solid formulations.
Dong, Ying; Zhang, Yan; Xiang, Bingren; Deng, Haishan; Wu, Jingfang
2011-04-01
The stability and hydrolysis kinetics of a phosphate prodrug, adefovir dipivoxil, in solid formulations were studied. The stability relationship between five solid formulations was explored. An autocatalytic mechanism for hydrolysis could be proposed according to the kinetic behavior, which fits the Prout-Tompkins model well. Because the classical kinetic models could hardly describe and predict the hydrolysis kinetics of adefovir dipivoxil in solid formulations accurately at high temperatures, a feedforward multilayer perceptron (MLP) neural network was constructed to model the hydrolysis kinetics. The built-in approaches in Weka, such as lazy classifiers and rule-based learners (IBk, KStar, DecisionTable and M5Rules), were used to verify the performance of the MLP. The predictability of the models was evaluated by 10-fold cross-validation and an external test set. The results suggest that the MLP is generally applicable, offering an alternative and efficient way to model and predict autocatalytic hydrolysis kinetics for phosphate prodrugs.
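The Prout-Tompkins (autocatalytic) model has the closed form α(t) = 1/(1 + exp(−k(t − t½))), obtained by integrating dα/dt = kα(1−α), and can be fitted by ordinary least squares on its linearized form ln(α/(1−α)) = k(t − t½). A small self-contained sketch using a synthetic (not measured) degradation curve with a hypothetical rate constant:

```python
import math

def prout_tompkins_alpha(t, k, t_half):
    """Extent of reaction for the Prout-Tompkins model:
    dα/dt = k·α·(1−α)  =>  α(t) = 1 / (1 + exp(−k·(t − t_half)))."""
    return 1.0 / (1.0 + math.exp(-k * (t - t_half)))

def fit_prout_tompkins(times, alphas):
    """Estimate k and t_half by ordinary least squares on the
    linearized form  ln(α/(1−α)) = k·t − k·t_half."""
    ys = [math.log(a / (1.0 - a)) for a in alphas]
    n = len(times)
    mx = sum(times) / n
    my = sum(ys) / n
    k = (sum((x - mx) * (y - my) for x, y in zip(times, ys))
         / sum((x - mx) ** 2 for x in times))
    t_half = mx - my / k  # intercept = −k·t_half
    return k, t_half

# Synthetic hydrolysis curve (hypothetical parameters, for illustration only)
times = [1, 2, 3, 4, 5, 6]
alphas = [prout_tompkins_alpha(t, k=0.8, t_half=3.5) for t in times]
k_est, t_half_est = fit_prout_tompkins(times, alphas)
```

The sigmoidal shape of α(t) is what distinguishes autocatalytic degradation from simple first-order decay.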
Babaei, Sepideh; Geranmayeh, Amir; Seyyedsalehi, Seyyed Ali
2010-12-01
The supervised learning of recurrent neural networks well-suited for prediction of protein secondary structures from the underlying amino acids sequence is studied. Modular reciprocal recurrent neural networks (MRR-NN) are proposed to model the strong correlations between adjacent secondary structure elements. Besides, a multilayer bidirectional recurrent neural network (MBR-NN) is introduced to capture the long-range intramolecular interactions between amino acids in formation of the secondary structure. The final modular prediction system is devised based on the interactive integration of the MRR-NN and the MBR-NN structures to arbitrarily engage the neighboring effects of the secondary structure types concurrent with memorizing the sequential dependencies of amino acids along the protein chain. The advanced combined network augments the percentage accuracy (Q₃) to 79.36% and boosts the segment overlap (SOV) up to 70.09% when tested on the PSIPRED dataset in three-fold cross-validation. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
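The Q3 score reported above is simply the per-residue three-state accuracy of the predicted helix/strand/coil labels; a minimal sketch (the sequences below are invented, not from the PSIPRED dataset):

```python
def q3_accuracy(predicted, actual):
    """Per-residue three-state accuracy (Q3) for secondary-structure
    prediction: percentage of residues whose H/E/C label is correct."""
    assert len(predicted) == len(actual)
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

pred  = "HHHHCCEEEC"
truth = "HHHCCCEEEE"
q3 = q3_accuracy(pred, truth)  # 8 of 10 residues correct -> 80.0
```

The SOV score mentioned alongside Q3 additionally rewards contiguous segment overlap rather than isolated correct residues, and needs a more involved segment-matching computation.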
[Rapid identification of hogwash oil by using synchronous fluorescence spectroscopy].
Sun, Yan-Hui; An, Hai-Yang; Jia, Xiao-Li; Wang, Juan
2012-10-01
To identify hogwash oil quickly, the characteristic delta lambda of hogwash oil was determined by three-dimensional fluorescence spectroscopy with parallel factor analysis, and a model was built using synchronous fluorescence spectroscopy with support vector machines (SVM). The results showed that the characteristic delta lambda of hogwash oil was 60 nm. Collecting original spectra of different samples at the characteristic delta lambda of 60 nm, the best model was established when 5 principal components were selected from the original spectra and the radial basis function (RBF) was used as the kernel function; the optimal penalty factor C and kernel parameter g, obtained by grid searching with 6-fold cross-validation, were 512 and 0.5, respectively. The discrimination rate of the model was 100% for both training and prediction sets. Thus, synchronous fluorescence spectroscopy provides a quick and accurate means of identifying hogwash oil.
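The grid search over the SVM penalty factor C and RBF kernel parameter g can be sketched as below. The powers-of-two grid is a common convention (an assumption here, not stated in the abstract), and the cross-validated SVM accuracy is replaced by a synthetic scoring function so the sketch stays self-contained:

```python
from itertools import product

def grid_search(score_fn, Cs, gammas):
    """Return the (C, gamma) pair maximizing a cross-validated score.
    `score_fn(C, gamma)` is expected to run k-fold CV on the training
    set and return the mean accuracy for those hyperparameters."""
    return max(product(Cs, gammas), key=lambda p: score_fn(*p))

# Typical powers-of-two search grid
Cs = [2 ** e for e in range(-2, 11)]      # 0.25 ... 1024
gammas = [2 ** e for e in range(-9, 2)]   # ~0.002 ... 2

# Synthetic stand-in for a real 6-fold cross-validated SVM score,
# peaked at the values reported in the abstract (C=512, g=0.5)
def fake_cv_score(C, g):
    return -((C - 512) ** 2) / 1e6 - (g - 0.5) ** 2

best_C, best_g = grid_search(fake_cv_score, Cs, gammas)
```

In a real run, `score_fn` would train one SVM per fold and average the held-out accuracies, exactly the 6-fold procedure the abstract describes.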
Soft computing techniques toward modeling the water supplies of Cyprus.
Iliadis, L; Maris, F; Tachos, S
2011-10-01
This research effort aims at the application of soft computing techniques to water resources management. More specifically, the target is the development of reliable soft computing models capable of estimating the water supply for the case of the "Germasogeia" mountainous watersheds in Cyprus. Initially, ε-Regression Support Vector Machine (ε-RSVM) and fuzzy weighted ε-RSVM models have been developed that accept five input parameters. At the same time, reliable artificial neural networks have been developed to perform the same job. The 5-fold cross-validation approach has been employed in order to eliminate bad local behaviors and to produce a more representative training data set. Thus, the fuzzy weighted Support Vector Regression (SVR) combined with the fuzzy partition has been employed in an effort to enhance the quality of the results. Several rational and reliable models have been produced that can enhance the efficiency of water policy designers. Copyright © 2011 Elsevier Ltd. All rights reserved.
Xu, Wenzhao; Collingsworth, Paris D.; Bailey, Barbara; Carlson Mazur, Martha L.; Schaeffer, Jeff; Minsker, Barbara
2017-01-01
This paper proposes a geospatial analysis framework and software to interpret water-quality sampling data from towed undulating vehicles in near-real time. The framework includes data quality assurance and quality control processes, automated kriging interpolation along undulating paths, and local hotspot and cluster analyses. These methods are implemented in an interactive Web application developed using the Shiny package in the R programming environment to support near-real time analysis along with 2- and 3-D visualizations. The approach is demonstrated using historical sampling data from an undulating vehicle deployed at three rivermouth sites in Lake Michigan during 2011. The normalized root-mean-square error (NRMSE) of the interpolation averages approximately 10% in 3-fold cross validation. The results show that the framework can be used to track river plume dynamics and provide insights on mixing, which could be related to wind and seiche events.
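The NRMSE reported for the cross-validated kriging interpolation can be computed as below. The abstract does not state which normalization was used; division by the observed range is assumed here (normalizing by the mean is the other common choice), and the observation vectors are made up:

```python
import math

def nrmse(observed, predicted):
    """Normalized root-mean-square error as a percentage: RMSE divided
    by the range of the observed values (assumed normalization)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (max(observed) - min(observed))

# Hypothetical held-out observations vs. kriging predictions from one CV fold
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.2, 1.8, 3.1, 4.3, 4.9]
err = nrmse(obs, pred)
```

In 3-fold cross-validation, this value would be computed once per held-out fold and averaged, giving the ~10% figure the paper reports.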
Personalized recommendation based on preferential bidirectional mass diffusion
NASA Astrophysics Data System (ADS)
Chen, Guilin; Gao, Tianrun; Zhu, Xuzhen; Tian, Hui; Yang, Zhao
2017-03-01
Recommendation system provides a promising way to alleviate the dilemma of information overload. In physical dynamics, mass diffusion has been used to design effective recommendation algorithms on bipartite network. However, most of the previous studies focus overwhelmingly on unidirectional mass diffusion from collected objects to uncollected objects, while overlooking the opposite direction, leading to the risk of similarity estimation deviation and performance degradation. In addition, they are biased towards recommending popular objects which will not necessarily promote the accuracy but make the recommendation lack diversity and novelty that indeed contribute to the vitality of the system. To overcome the aforementioned disadvantages, we propose a preferential bidirectional mass diffusion (PBMD) algorithm by penalizing the weight of popular objects in bidirectional diffusion. Experiments are evaluated on three benchmark datasets (Movielens, Netflix and Amazon) by 10-fold cross validation, and results indicate that PBMD remarkably outperforms the mainstream methods in accuracy, diversity and novelty.
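Classical two-step mass diffusion on a bipartite network, which PBMD extends, can be sketched as follows. This is the standard unidirectional diffusion (resources spread from the target user's collected objects to users and back to objects), not the proposed PBMD algorithm, and the toy network is hypothetical:

```python
def mass_diffusion_scores(adj, user):
    """Two-step mass diffusion on a user-object bipartite network.
    `adj[u]` is the set of objects collected by user u. A unit of
    resource on each of the target user's objects spreads to users in
    proportion to 1/(object degree), then back out to objects in
    proportion to 1/(user degree); uncollected objects are ranked by
    the resource they receive."""
    users = list(adj)
    obj_degree = {}
    for u in users:
        for o in adj[u]:
            obj_degree[o] = obj_degree.get(o, 0) + 1
    # step 1: collected objects send resource back to their users
    user_resource = {}
    for o in adj[user]:
        for u in users:
            if o in adj[u]:
                user_resource[u] = user_resource.get(u, 0.0) + 1.0 / obj_degree[o]
    # step 2: each user redistributes its resource evenly over its objects
    scores = {}
    for u, r in user_resource.items():
        for o in adj[u]:
            scores[o] = scores.get(o, 0.0) + r / len(adj[u])
    # recommend only objects the target user has not yet collected
    return {o: s for o, s in scores.items() if o not in adj[user]}

adj = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"c"}}
rec = mass_diffusion_scores(adj, "u1")
```

PBMD modifies this scheme by also diffusing in the opposite direction and penalizing the weight of high-degree (popular) objects, which is what improves diversity and novelty.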
Deep Gaze Velocity Analysis During Mammographic Reading for Biometric Identification of Radiologists
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoon, Hong-Jun; Alamudun, Folami T.; Hudson, Kathy
Several studies have confirmed that the gaze velocity of the human eye can be utilized as a behavioral biometric or personalized biomarker. In this study, we leverage the local feature representation capacity of convolutional neural networks (CNNs) for eye gaze velocity analysis as the basis for biometric identification of radiologists performing breast cancer screening. Using gaze data collected from 10 radiologists reading 100 mammograms of various diagnoses, we compared the performance of a CNN-based classification algorithm with two deep learning classifiers, deep neural network and deep belief network, and a previously presented hidden Markov model classifier. The study showed that the CNN classifier is superior compared to alternative classification methods based on macro F1-scores derived from 10-fold cross-validation experiments. Our results further support the efficacy of eye gaze velocity as a biometric identifier of medical imaging experts.
Liu, Bin; Wu, Hao; Zhang, Deyuan; Wang, Xiaolong; Chou, Kuo-Chen
2017-02-21
To expedite the pace in conducting genome/proteome analysis, we have developed a Python package called Pse-Analysis. The powerful package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All the work a user needs to do is to input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor, followed by yielding the predicted results for the submitted query samples. All the aforementioned tedious jobs can be automatically done by the computer. Moreover, the multiprocessing technique was adopted to enhance computational speed by about 6-fold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be directly run on Windows, Linux, and Unix.
Computing Prediction and Functional Analysis of Prokaryotic Propionylation.
Wang, Li-Na; Shi, Shao-Ping; Wen, Ping-Ping; Zhou, Zhi-You; Qiu, Jian-Ding
2017-11-27
Identification and systematic analysis of candidates for protein propionylation are crucial steps for understanding its molecular mechanisms and biological functions. Although several proteome-scale methods have been performed to delineate potential propionylated proteins, the majority of lysine-propionylated substrates and their role in pathological physiology still remain largely unknown. By gathering various databases and literatures, experimental prokaryotic propionylation data were collated to be trained in a support vector machine with various features via a three-step feature selection method. A novel online tool for seeking potential lysine-propionylated sites (PropSeek, http://bioinfo.ncu.edu.cn/PropSeek.aspx) was built. Independent test results of leave-one-out and n-fold cross-validation were similar to each other, showing that PropSeek is a stable and robust predictor with satisfying performance. Meanwhile, analyses of Gene Ontology, Kyoto Encyclopedia of Genes and Genomes pathways, and protein-protein interactions implied a potential role of prokaryotic propionylation in protein synthesis and metabolism.
Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K
2014-10-01
Quantitative Structure-Activity Relationship (QSAR) models for binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of the GABA(A) receptor complex were calculated using the machine learning methods: artificial neural network (ANN) and support vector machine (SVM) techniques. The models obtained were compared with those obtained using multiple linear regression (MLR) analysis. The descriptor selection and model building were performed with 10-fold cross-validation using the training data set. The SVM and MLR coefficient of determination values are 0.944 and 0.879, respectively, for the training set and are higher than those of the ANN models. Though the SVM model shows improvement in fitting the training set, the ANN model was superior to SVM and MLR in predicting the test set. A randomization test was employed to check the suitability of the models.
Tracing the Geographical Origin of Onions by Strontium Isotope Ratio and Strontium Content.
Hiraoka, Hisaaki; Morita, Sakie; Izawa, Atsunobu; Aoyama, Keisuke; Shin, Ki-Cheol; Nakano, Takanori
2016-01-01
The strontium (Sr) isotope ratio ((87)Sr/(86)Sr) and Sr content were used to trace the geographical origin of onions from Japan and other countries, including China, the United States of America, New Zealand, Australia, and Thailand. The mean (87)Sr/(86)Sr ratio and Sr content (dry weight basis) for onions from Japan were 0.70751 and 4.6 mg kg(-1), respectively, and the values for onions from the other countries were 0.71199 and 12.4 mg kg(-1), respectively. Linear discriminant analysis was performed to classify onions produced in Japan from those produced in the other countries based on the Sr data. The discriminant equation derived from linear discriminant analysis was evaluated by 10-fold cross validation. As a result, the origins of 92% of onions were correctly classified between Japan and the other countries.
Lake, Bathilda B; Rossmeisl, John Henry; Cecere, Julie; Stafford, Phillip; Zimmerman, Kurt L
2018-01-01
A variety of inflammatory conditions of unknown cause (meningoencephalomyelitis of unknown etiology-MUE) and neoplastic diseases can affect the central nervous system (CNS) of dogs. MUE can mimic intracranial neoplasia clinically, radiologically, and, in some cases, even histologically. Serum immunosignature protein microarray assays have been used in humans to identify CNS diseases such as Alzheimer's and neoplasia, and in dogs, to detect lymphoma and its progression. This study evaluated the effectiveness of immunosignature profiles for distinguishing between three cohorts of dogs: healthy, intracranial neoplasia, and MUE. Using the learned peptide patterns for these three cohorts, classification prediction was evaluated for the same groups using a 10-fold cross-validation methodology. Classification was 100% accurate, 100% specific, and 100% sensitive. This pilot study demonstrates that immunosignature profiles may help serve as a minimally invasive tool to distinguish between MUE and intracranial neoplasia in dogs.
Single-accelerometer-based daily physical activity classification.
Long, Xi; Yin, Bin; Aarts, Ronald M
2009-01-01
In this study, a single tri-axial accelerometer placed on the waist was used to record the acceleration data for human physical activity classification. The data collection involved 24 subjects performing daily real-life activities in a naturalistic environment without researchers' intervention. For the purpose of assessing customers' daily energy expenditure, walking, running, cycling, driving, and sports were chosen as target activities for classification. This study compared a Bayesian classification with that of a Decision Tree based approach. A Bayes classifier has the advantage to be more extensible, requiring little effort in classifier retraining and software update upon further expansion or modification of the target activities. Principal components analysis was applied to remove the correlation among features and to reduce the feature vector dimension. Experiments using leave-one-subject-out and 10-fold cross validation protocols revealed a classification accuracy of approximately 80%, which was comparable with that obtained by a Decision Tree classifier.
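The pipeline described above (PCA for decorrelation, then a Bayes classifier, evaluated leave-one-subject-out) can be sketched as below. The accelerometer features, subject ids, and activity labels are synthetic placeholders, not the study's recordings.

```python
# Hedged sketch: PCA + Gaussian naive Bayes with a leave-one-subject-out
# protocol, mirroring the evaluation design above. Data are simulated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
n_subjects, per_subject, n_feats = 24, 20, 12
groups = np.repeat(np.arange(n_subjects), per_subject)   # subject ids
y = rng.integers(0, 5, size=n_subjects * per_subject)    # 5 activity classes
# class-dependent mean shift makes the toy problem learnable
X = rng.normal(size=(y.size, n_feats)) + y[:, None] * 0.8

model = make_pipeline(PCA(n_components=5), GaussianNB())
scores = cross_val_score(model, X, y, groups=groups, cv=LeaveOneGroupOut())
print(f"leave-one-subject-out accuracy: {scores.mean():.2f}")
```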
Reading Emotion From Mouse Cursor Motions: Affective Computing Approach.
Yamauchi, Takashi; Xiao, Kunchen
2018-04-01
Affective computing research has advanced emotion recognition systems using facial expressions, voices, gaits, and physiological signals, yet these methods are often impractical. This study integrates mouse cursor motion analysis into affective computing and investigates the idea that movements of the computer cursor can provide information about the emotion of the computer user. We extracted 16-26 trajectory features during a choice-reaching task and examined the link between emotion and cursor motions. Positive or negative emotions were induced in participants by music, film clips, or emotional pictures, and participants indicated their emotions with questionnaires. Our 10-fold cross-validation analysis shows that statistical models formed from "known" participants (training data) could predict nearly 10%-20% of the variance of positive affect and attentiveness ratings of "unknown" participants, suggesting that cursor movement patterns such as the area under curve and direction change help infer emotions of computer users. Copyright © 2017 Cognitive Science Society, Inc.
Deep Gaze Velocity Analysis During Mammographic Reading for Biometric Identification of Radiologists
Yoon, Hong-Jun; Alamudun, Folami T.; Hudson, Kathy; ...
2018-01-24
Several studies have confirmed that the gaze velocity of the human eye can be utilized as a behavioral biometric or personalized biomarker. In this study, we leverage the local feature representation capacity of convolutional neural networks (CNNs) for eye gaze velocity analysis as the basis for biometric identification of radiologists performing breast cancer screening. Using gaze data collected from 10 radiologists reading 100 mammograms of various diagnoses, we compared the performance of a CNN-based classification algorithm with two deep learning classifiers, deep neural network and deep belief network, and a previously presented hidden Markov model classifier. The study showed that the CNN classifier is superior compared to alternative classification methods based on macro F1-scores derived from 10-fold cross-validation experiments. Our results further support the efficacy of eye gaze velocity as a biometric identifier of medical imaging experts.
Chikh, Mohamed Amine; Saidi, Meryem; Settouti, Nesma
2012-10-01
The use of expert systems and artificial intelligence techniques in disease diagnosis has been increasing gradually. The Artificial Immune Recognition System (AIRS) is one of the methods used in medical classification problems. AIRS2 is a more efficient version of the AIRS algorithm. In this paper, we used a modified AIRS2 called MAIRS2, in which we replace the K-nearest neighbors algorithm with the fuzzy K-nearest neighbors to improve the diagnostic accuracy of diabetes diseases. The diabetes disease dataset used in our work is retrieved from the UCI machine learning repository. The performances of AIRS2 and MAIRS2 are evaluated regarding classification accuracy, sensitivity, and specificity values. The highest classification accuracies obtained when applying AIRS2 and MAIRS2 using 10-fold cross-validation were, respectively, 82.69% and 89.10%.
Prieto, Luis P; Sharma, Kshitij; Kidzinski, Łukasz; Rodríguez-Triana, María Jesús; Dillenbourg, Pierre
2018-04-01
The pedagogical modelling of everyday classroom practice is an interesting kind of evidence, both for educational research and teachers' own professional development. This paper explores the usage of wearable sensors and machine learning techniques to automatically extract orchestration graphs (teaching activities and their social plane over time), on a dataset of 12 classroom sessions enacted by two different teachers in different classroom settings. The dataset included mobile eye-tracking as well as audiovisual and accelerometry data from sensors worn by the teacher. We evaluated both time-independent and time-aware models, achieving median F1 scores of about 0.7-0.8 on leave-one-session-out k-fold cross-validation. Although these results show the feasibility of this approach, they also highlight the need for larger datasets, recorded in a wider variety of classroom settings, to provide automated tagging of classroom practice that can be used in everyday practice across multiple teachers.
Cross-Validating Chinese Language Mental Health Recovery Measures in Hong Kong
ERIC Educational Resources Information Center
Bola, John; Chan, Tiffany Hill Ching; Chen, Eric HY; Ng, Roger
2016-01-01
Objectives: Promoting recovery in mental health services is hampered by a shortage of reliable and valid measures, particularly in Hong Kong. We seek to cross validate two Chinese language measures of recovery and one of recovery-promoting environments. Method: A cross-sectional survey of people recovering from early episode psychosis (n = 121)…
Shad, Sarfraz Ali; Sayyed, Ali H; Saleem, Mushtaq A
2010-08-01
Spodoptera litura (F.) is a cosmopolitan pest that has developed resistance to several insecticides. The aim of the present study was to establish whether an emamectin-selected (Ema-SEL) population could render cross-resistance to other insecticides, and to investigate the genetics of resistance. Bioassays at G(1) gave resistance ratios (RRs) of 80-, 2980-, 3050- and 2800-fold for emamectin, abamectin, indoxacarb and acetamiprid, respectively, compared with a laboratory susceptible population Lab-PK. After three rounds of selection, resistance to emamectin in Ema-SEL increased significantly, with RRs of 730-fold and 13-fold compared with the Lab-PK and unselected (UNSEL) population respectively. Further studies revealed that three generations were required for a tenfold increase in resistance to emamectin. Resistance to abamectin, indoxacarb, acetamiprid and emamectin in UNSEL declined significantly compared with the field population at G(1). Furthermore, selection with emamectin reduced resistance to abamectin, indoxacarb and acetamiprid on a par with UNSEL. Crosses between Ema-SEL and Lab-PK indicated autosomal and incomplete dominance of resistance. A direct test of a monogenic model and Land's method suggested that resistance to emamectin was controlled by more than one locus. Instability of resistance and lack of cross-resistance to other insecticides suggest that insecticides with different modes of action should be recommended to reduce emamectin selection pressure. Copyright (c) 2010 Society of Chemical Industry.
Xiao, Z; Tang, Z; Qiang, J; Wang, S; Qian, W; Zhong, Y; Wang, R; Wang, J; Wu, L; Tang, W; Zhang, Z
2018-01-25
Intravoxel incoherent motion is a promising method for the differentiation of sinonasal lesions. This study aimed to evaluate the value of intravoxel incoherent motion in the differentiation of benign and malignant sinonasal lesions and to compare the diagnostic performance of intravoxel incoherent motion with that of conventional DWI. One hundred thirty-one patients with histologically proved solid sinonasal lesions (56 benign and 75 malignant) who underwent conventional DWI and intravoxel incoherent motion were recruited in this study. The diffusion coefficient (D), pseudodiffusion coefficient (D*), and perfusion fraction (f) values derived from intravoxel incoherent motion and ADC values derived from conventional DWI were measured and compared between the 2 groups using the Student t test. Receiver operating characteristic curve analysis, logistic regression analysis, and 10-fold cross-validation were performed to evaluate the diagnostic performance of single-parametric and multiparametric models. The mean ADC and D values were significantly lower in malignant sinonasal lesions than in benign sinonasal lesions (both P < .001). The mean f value was higher in malignant lesions than in benign lesions (P = .003). Multiparametric models can significantly improve the cross-validated areas under the curve for the differentiation of sinonasal lesions compared with single-parametric models (all corrected P < .05 except the D value). The model of D + f provided a better diagnostic performance than the ADC value (corrected P < .001). Intravoxel incoherent motion appears to be a more effective MR imaging technique than conventional DWI in the differentiation of benign and malignant sinonasal lesions. © 2018 by American Journal of Neuroradiology.
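The comparison above (a single-parameter model versus a multiparametric logistic model, scored by cross-validated AUC) can be sketched as below. The imaging parameters are simulated to follow the reported trends (lower D and higher f in malignant lesions), not the study's measurements.

```python
# Hedged sketch: comparing ADC alone against a D + f logistic model by
# 10-fold cross-validated ROC AUC. All values are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
n = 131                                   # 56 benign, 75 malignant
y = np.array([0] * 56 + [1] * 75)
D = rng.normal(1.2 - 0.3 * y, 0.15)       # lower D in malignant lesions
f = rng.normal(0.10 + 0.04 * y, 0.03)     # higher f in malignant lesions
ADC = D + rng.normal(0, 0.2, n)           # noisier single parameter

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_adc = cross_val_score(LogisticRegression(), ADC[:, None], y,
                          cv=cv, scoring="roc_auc").mean()
auc_df = cross_val_score(LogisticRegression(), np.column_stack([D, f]), y,
                         cv=cv, scoring="roc_auc").mean()
print(f"CV AUC, ADC alone: {auc_adc:.2f};  D + f: {auc_df:.2f}")
```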
NASA Astrophysics Data System (ADS)
Tailanián, Matías; Castiglioni, Enrique; Musé, Pablo; Fernández Flores, Germán.; Lema, Gabriel; Mastrángelo, Pedro; Almansa, Mónica; Fernández Liñares, Ignacio; Fernández Liñares, Germán.
2015-10-01
Soybean producers suffer from caterpillar damage in many areas of the world. Estimated average economic losses are annually 500 million USD in Brazil, Argentina, Paraguay and Uruguay. Designing efficient pest control management using selective and targeted pesticide applications is extremely important from both economic and environmental perspectives. With that in mind, we conducted a research program during the 2013-2014 and 2014-2015 planting seasons in a 4,000 ha soybean farm, seeking to achieve early pest detection. Nowadays pest presence is evaluated using manual, labor-intensive counting methods based on sampling strategies, which are time consuming and imprecise. The experiment was conducted as follows. Using manual counting methods as ground truth, a spectrometer capturing reflectance from 400 to 1100 nm was used to measure the reflectance of soy plants. A first conclusion, drawn from measuring the spectral response at the leaf level, was that stress is a property of the whole plant, since different leaves with different levels of damage yielded the same spectral response. Then, to assess the feasibility of automatically classifying plants as healthy, biotic-stressed, or abiotic-stressed, a pipeline combining feature extraction and selection from leaf spectral signatures with a Support Vector Machine classifier was designed. Optimization of the SVM parameters using grid search with cross-validation, along with classification evaluation by ten-fold cross-validation, showed a correct classification rate of 95%, consistently across both seasons. Controlled experiments using cages with different numbers of caterpillars--including caterpillar-free plants--were also conducted to evaluate consistency in trends of the spectral response as well as the extracted features.
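The tuning-plus-evaluation scheme described above (grid search over SVM parameters with inner cross-validation, ten-fold cross-validation for scoring) can be sketched as follows. The "spectral features" and three stress classes here are simulated, not the field data.

```python
# Hedged sketch: nested cross-validation, with GridSearchCV tuning an SVM
# inside each fold of a ten-fold outer evaluation. Data are simulated.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(3)
# three classes: healthy, biotic-stressed, abiotic-stressed
y = np.repeat([0, 1, 2], 50)
X = rng.normal(size=(150, 8)) + np.array([0.0, 1.0, 2.0])[y][:, None]

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(grid, X, y, cv=outer)
print(f"nested 10-fold CV accuracy: {scores.mean():.2f}")
```

Tuning inside each outer fold (rather than on the full data) keeps the reported accuracy honest, which matters for the small-signal settings discussed elsewhere in this collection.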
NASA Astrophysics Data System (ADS)
Bauer, Daniel R.; Olafsson, Ragnar; Montilla, Leonardo G.; Witte, Russell S.
2010-02-01
Understanding the tumor microenvironment is critical to characterizing how cancers operate and predicting how they will eventually respond to treatment. The mouse window chamber model is an excellent tool for cancer research, because it enables high resolution tumor imaging and cross-validation using multiple modalities. We describe a novel multimodality imaging system that incorporates three dimensional (3D) photoacoustics with pulse echo ultrasound for imaging the tumor microenvironment and tracking tissue growth in mice. Three mice were implanted with a dorsal skin flap window chamber. PC-3 prostate tumor cells, expressing green fluorescent protein (GFP), were injected into the skin. The ensuing tumor invasion was mapped using photoacoustic and pulse echo imaging, as well as optical and fluorescent imaging for comparison and cross validation. The photoacoustic imaging and spectroscopy system, consisting of a tunable (680-1000nm) pulsed laser and 25 MHz ultrasound transducer, revealed near infrared absorbing regions, primarily blood vessels. Pulse echo images, obtained simultaneously, provided details of the tumor microstructure and growth with 100-μm3 resolution. The tumor size in all three mice increased between three and five fold during 3+ weeks of imaging. Results were consistent with the optical and fluorescent images. Photoacoustic imaging revealed detailed maps of the tumor vasculature, whereas photoacoustic spectroscopy identified regions of oxygenated and deoxygenated blood vessels. The 3D photoacoustic and pulse echo imaging system provided complementary information to track the tumor microenvironment, evaluate new cancer therapies, and develop molecular imaging agents in vivo. Finally, these safe and noninvasive techniques are potentially applicable for human cancer imaging.
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross-validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross-validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
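The two-step idea above can be sketched as follows. This is a simplified illustration on simulated data: K-means clusters the genes, a Lasso screens within each cluster, and the cluster-level selection of step 2 is crudely approximated by a Lasso over per-cluster fitted scores (a true group Lasso needs a dedicated solver, which scikit-learn does not provide), so this is not the paper's method.

```python
# Hedged, simplified sketch of cluster-then-select gene screening.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p, k = 100, 40, 4                          # samples, "genes", clusters
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                                # 5 truly influential genes
y = X @ beta + rng.normal(size=n)

# cluster the genes (columns), as with K-means in the paper
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X.T)

# step 1: within-cluster Lasso; keep each cluster's fitted signal
cluster_scores = []
for c in range(k):
    idx = np.flatnonzero(clusters == c)
    lasso = LassoCV(cv=5).fit(X[:, idx], y)   # tuning by cross-validation
    cluster_scores.append(X[:, idx] @ lasso.coef_)

# step 2 (crude stand-in for the group Lasso): Lasso over cluster scores
group_sel = LassoCV(cv=5).fit(np.column_stack(cluster_scores), y)
print("clusters retained:", np.flatnonzero(group_sel.coef_ != 0))
```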
Predictive model of outcome of targeted nodal assessment in colorectal cancer.
Nissan, Aviram; Protic, Mladjan; Bilchik, Anton; Eberhardt, John; Peoples, George E; Stojadinovic, Alexander
2010-02-01
Improvement in staging accuracy is the principal aim of targeted nodal assessment in colorectal carcinoma. Technical factors independently predictive of false negative (FN) sentinel lymph node (SLN) mapping should be identified to facilitate operative decision making. To define independent predictors of FN SLN mapping and to develop a predictive model that could support surgical decisions. Data was analyzed from 2 completed prospective clinical trials involving 278 patients with colorectal carcinoma undergoing SLN mapping. Clinical outcome of interest was FN SLN(s), defined as one(s) with no apparent tumor cells in the presence of non-SLN metastases. To assess the independent predictive effect of a covariate for a nominal response (FN SLN), a logistic regression model was constructed and parameters estimated using maximum likelihood. A probabilistic Bayesian model was also trained and cross validated using 10-fold train-and-test sets to predict FN SLN mapping. Area under the curve (AUC) from receiver operating characteristics curves of these predictions was calculated to determine the predictive value of the model. Number of SLNs (<3; P = 0.03) and tumor-replaced nodes (P < 0.01) independently predicted FN SLN. Cross validation of the model created with Bayesian Network Analysis effectively predicted FN SLN (area under the curve = 0.84-0.86). The positive and negative predictive values of the model are 83% and 97%, respectively. This study supports a minimum threshold of 3 nodes for targeted nodal assessment in colorectal cancer, and establishes sufficient basis to conclude that SLN mapping and biopsy cannot be justified in the presence of clinically apparent tumor-replaced nodes.
Classification of Focal and Non Focal Epileptic Seizures Using Multi-Features and SVM Classifier.
Sriraam, N; Raghu, S
2017-09-02
Identifying epileptogenic zones prior to surgery is an essential and crucial step in treating patients having pharmacoresistant focal epilepsy. Electroencephalogram (EEG) is a significant measurement benchmark to assess patients suffering from epilepsy. This paper investigates the application of multi-features derived from different domains to recognize the focal and non focal epileptic seizures obtained from pharmacoresistant focal epilepsy patients from the Bern-Barcelona database. From the dataset, five different classification tasks were formed. A total of 26 features were extracted from focal and non focal EEG. Significant features were selected using the Wilcoxon rank sum test by setting p-value (p < 0.05) and z-score (|z| > 1.96) thresholds at the 95% significance level. The hypothesis was made that removing outliers improves the classification accuracy. Tukey's range test was adopted for pruning outliers from the feature set. Finally, 21 features were classified using an optimized support vector machine (SVM) classifier with 10-fold cross validation. The Bayesian optimization technique was adopted to minimize the cross-validation loss. From the simulation results, it was inferred that the highest sensitivity, specificity, and classification accuracy of 94.56%, 89.74%, and 92.15%, respectively, were achieved and found to be better than the state-of-the-art approaches. Further, it was observed that the classification accuracy improved from 80.2% with outliers to 92.15% without outliers. The classifier performance metrics ensure the suitability of the proposed multi-features with the optimized SVM classifier. It can be concluded that the proposed approach can be applied for recognition of focal EEG signals to localize epileptogenic zones.
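The screening-plus-classification stage above can be sketched as follows. The EEG-style features are simulated, and the Tukey outlier pruning and Bayesian hyperparameter tuning from the paper are omitted for brevity.

```python
# Hedged sketch: Wilcoxon rank-sum feature screening (p < 0.05), then an SVM
# scored by 10-fold cross-validation. Note that screening on the full data,
# as done here, slightly biases the CV accuracy upward.
import numpy as np
from scipy.stats import ranksums
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(5)
n_per, n_feats = 80, 26
y = np.array([0] * n_per + [1] * n_per)       # focal vs non-focal
X = rng.normal(size=(2 * n_per, n_feats))
X[y == 1, :8] += 1.0                          # 8 genuinely informative features

pvals = np.array([ranksums(X[y == 0, j], X[y == 1, j]).pvalue
                  for j in range(n_feats)])
X_sel = X[:, pvals < 0.05]                    # keep significant features only

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc = cross_val_score(SVC(), X_sel, y, cv=cv).mean()
print(f"features kept: {X_sel.shape[1]}, 10-fold CV accuracy: {acc:.2f}")
```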
NASA Astrophysics Data System (ADS)
Lu, Cheng-Tsung; Chen, Shu-An; Bretaña, Neil Arvin; Cheng, Tzu-Hsiu; Lee, Tzong-Yi
2011-10-01
In proteins, glutamate (Glu) residues are transformed into γ-carboxyglutamate (Gla) residues in a process called carboxylation. The process of protein carboxylation catalyzed by γ-glutamyl carboxylase is deemed to be important due to its involvement in biological processes such as the blood clotting cascade and bone growth. There is an increasing interest within the scientific community to identify protein carboxylation sites. However, experimental identification of carboxylation sites via mass spectrometry-based methods is observed to be expensive, time-consuming, and labor-intensive. Thus, we were motivated to design a computational method for identifying protein carboxylation sites. This work aims to investigate protein carboxylation by considering the composition of amino acids that surround modification sites. Given that a modified residue tends to be accessible on the surface of a protein, the solvent-accessible surface area (ASA) around carboxylation sites is also investigated. A radial basis function network is then employed to build a predictive model using various features for identifying carboxylation sites. Based on a five-fold cross-validation evaluation, a predictive model trained using the combined features of amino acid sequence (AA20D), amino acid composition, and ASA yields the highest accuracy at 0.874. Furthermore, an independent test involving data not included in the cross-validation process indicates that in silico identification is a feasible means of preliminary analysis. Additionally, the predictive method presented in this work is implemented as Carboxylator (http://csb.cse.yzu.edu.tw/Carboxylator/), a web-based tool for identifying carboxylated proteins with modification sites in order to help users in investigating γ-glutamyl carboxylation.
Alizadeh Sardroud, Hamed; Nemati, Sorour; Baradar Khoshfetrat, Ali; Nabavinia, Mahbobeh; Beygi Khosrowshahi, Younes
2017-08-01
The influence of gelatine concentration and of the cross-linker ions Ca2+ and Ba2+ was evaluated on the characteristics of alginate hydrogels and the proliferation behaviours of model adherent and suspendable stem cells (fibroblast and U937) embedded in alginate microcapsules. Increasing the gelatine concentration to 2.5% increased the extent of swelling to 15% and 25% for barium- and calcium-cross-linked hydrogels, respectively. Mechanical properties also decreased with increasing swelling of the hydrogels. Both increasing the gelatine concentration and using barium ions considerably increased the proliferation of the encapsulated model stem cells. A barium-cross-linked alginate-gelatine microcapsule tested as a bone building block showed a 13.5 ± 1.5-fold expansion of osteoblast cells after 21 days, with deposition of bone matrix. The haematopoietic stem cells cultured in the microcapsule also showed up to a 2-fold increase after 7 days without adding any growth factor. The study demonstrates that the barium-cross-linked alginate-gelatine microcapsule has potential for use as a simple and efficient 3D platform for stem cell production and modular tissue formation.
Afzal, M B S; Shad, S A
2016-06-01
The cotton mealybug Phenacoccus solenopsis (Tinsley) (Homoptera: Pseudococcidae) is a sucking pest of worldwide importance, causing huge losses by feeding upon cotton in various parts of the world. Because of the importance of this pest, this research was carried out to select for emamectin resistance in P. solenopsis in the laboratory and to study the cross-resistance, stability, realized heritability, and fitness cost of emamectin resistance. After selection from the third generation (G3) to G6, P. solenopsis developed very high emamectin resistance (159.24-fold) when compared to a susceptible unselected population (Unsel pop). The emamectin benzoate-selected population showed moderate (45.81-fold), low (14.06-fold), and no cross-resistance to abamectin, cypermethrin, and profenofos, respectively, compared to the Unsel pop. A significant decline in emamectin resistance was observed in the resistant population when not exposed to emamectin from G7 to G13. The estimated realized heritability (h(2)) for emamectin resistance was 0.84. A high fitness cost was associated with emamectin resistance in P. solenopsis. Results of this study may be helpful in devising insecticide resistance management strategies for P. solenopsis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schirmer, T.W.
1988-05-01
Detailed mapping and cross-section traverses provide the control for structural analysis and geometric modeling of the Ogden duplex, a complex thrust system exposed in the Wasatch Mountains, east of Ogden, Utah. The structures consist of east-dipping folded thrust faults, basement-cored horses, lateral ramps and folds, and tear faults. The sequence of thrusting, determined by means of lateral overlap of horses, thrust-splay relationships, and a top-to-bottom piggyback development, is Willard thrust, Ogden thrust, Weber thrust, and Taylor thrust. Major decollement zones occur in the Cambrian shales and limestones. The Tintic Quartzite is the marker for determining gross geometries of horses. This exposed duplex serves as a good model to illustrate the method of constructing a hanging-wall sequence diagram - a series of longitudinal cross sections that move forward in time and space, and show how a thrust system formed as it moved updip over various footwall ramps. A hanging-wall sequence diagram also shows the complex lateral variations in a thrust system and helps to locate lateral ramps, lateral folds, tear faults, and other features not shown on dip-oriented cross sections. 8 figures.
LeDell, Erin; Petersen, Maya; van der Laan, Mark
2015-01-01
In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we obtain an estimate of its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, the process of cross-validating a predictive model on even a relatively small data set can still require a large amount of computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC.
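The kind of closed-form, influence-function-based variance estimate described above can be illustrated with the DeLong decomposition of the empirical AUC, which avoids bootstrapping entirely. Scores and labels below are simulated, and the cross-validation layer is omitted for brevity, so this is a sketch of the idea rather than the paper's estimator.

```python
# Hedged sketch: empirical AUC with a DeLong-style (influence-function)
# variance estimate from placement values. Scores are simulated.
import numpy as np

rng = np.random.default_rng(6)
pos = rng.normal(1.0, 1.0, 200)   # scores of positive cases
neg = rng.normal(0.0, 1.0, 300)   # scores of negative cases

# pairwise comparison kernel: 1 if pos > neg, 0.5 on ties
psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
auc = psi.mean()

v10 = psi.mean(axis=1)            # placement values for positives
v01 = psi.mean(axis=0)            # placement values for negatives
var_auc = v10.var(ddof=1) / len(pos) + v01.var(ddof=1) / len(neg)
print(f"AUC = {auc:.3f}, SE = {np.sqrt(var_auc):.4f}")
```

The variance comes from a single pass over the placement values, which is why this style of estimator scales where the bootstrap does not.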
A cross-validation package driving Netica with python
Fienen, Michael N.; Plant, Nathaniel G.
2014-01-01
Bayesian networks (BNs) are powerful tools for probabilistically simulating natural systems and emulating process models. Cross-validation is a technique to avoid overfitting resulting from overly complex BNs. Overfitting reduces predictive skill. Cross-validation for BNs is known but rarely implemented, due partly to a lack of software tools designed to work with available BN packages. CVNetica is open-source, written in Python, and extends the Netica software package to perform cross-validation and read, rebuild, and learn BNs from data. Insights gained from cross-validation, and implications on prediction versus description, are illustrated with a data-driven oceanographic application and a model-emulation application. These examples show that overfitting occurs when BNs become more complex than allowed by supporting data, and that overfitting incurs computational costs as well as causing a reduction in prediction skill. CVNetica evaluates overfitting using several complexity metrics (we used level of discretization) and its impact on performance metrics (we used skill).
Lee, Sun; Bae, Yuna H; Worley, Marcia; Law, Anandi
2017-09-08
Barriers to medication adherence stem from multiple factors. An effective and convenient tool is needed to identify these barriers so that clinicians can provide a tailored, patient-centered consultation with patients. The Modified Drug Adherence Work-up Tool (M-DRAW) was developed as a 13-item checklist questionnaire to identify barriers to medication adherence. The response scale was a 4-point Likert scale of frequency of occurrence (1 = never to 4 = often). The checklist was accompanied by a GUIDE that provided corresponding motivational interview-based intervention strategies for each identified barrier. The current pilot study examined the psychometric properties of the M-DRAW checklist (reliability, responsiveness and discriminant validity) in patients taking one or more prescription medication(s) for chronic conditions. A cross-sectional sample of 26 patients was recruited between December 2015 and March 2016 at an academic medical center pharmacy in Southern California. A priming question that assessed self-reported adherence was used to separate participants into a control group of 17 "adherers" (65.4%) and an intervention group of nine "unintentional and intentional non-adherers" (34.6%). Comparable baseline characteristics were observed between the two groups. The M-DRAW checklist showed acceptable reliability (13 items; alpha = 0.74) for identifying factors and barriers leading to medication non-adherence. Discriminant validity of the tool and the priming question was established by the four-fold number of barriers to adherence identified within the self-selected intervention group compared to the control group (4.4 versus 1.2 barriers, p < 0.05). The current study did not investigate construct validity due to the small sample size and challenges with patient follow-up. Future testing of the tool will include construct validation.
Automatic Detection of Whole Night Snoring Events Using Non-Contact Microphone
Dafna, Eliran; Tarasiuk, Ariel; Zigel, Yaniv
2013-01-01
Objective: Although awareness of sleep disorders is increasing, limited information is available on whole night detection of snoring. Our study aimed to develop and validate a robust, high performance, and sensitive whole-night snore detector based on non-contact technology. Design: Sounds during polysomnography (PSG) were recorded using a directional condenser microphone placed 1 m above the bed. An AdaBoost classifier was trained and validated on manually labeled snoring and non-snoring acoustic events. Patients: Sixty-seven subjects (age 52.5±13.5 years, BMI 30.8±4.7 kg/m2, m/f 40/27) referred for PSG for obstructive sleep apnea diagnoses were prospectively and consecutively recruited. Twenty-five subjects were used for the design study; the validation study was blindly performed on the remaining forty-two subjects. Measurements and Results: To train the proposed sound detector, >76,600 acoustic episodes collected in the design study were manually classified by three scorers into snore and non-snore episodes (e.g., bedding noise, coughing, environmental). A feature selection process was applied to select the most discriminative features extracted from time and spectral domains. The average snore/non-snore detection rate (accuracy) for the design group was 98.4% based on a ten-fold cross-validation technique. When tested on the validation group, the average detection rate was 98.2% with sensitivity of 98.0% (snore as a snore) and specificity of 98.3% (noise as noise). Conclusions: Audio-based features extracted from time and spectral domains can accurately discriminate between snore and non-snore acoustic events. This audio analysis approach enables detection and analysis of snoring sounds from a full night in order to produce quantified measures for objective follow-up of patients. PMID:24391903
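The classification stage above (an AdaBoost classifier over acoustic features, scored by ten-fold cross-validation) can be sketched as follows. The snore/non-snore features here are simulated placeholders, not the recorded acoustic episodes.

```python
# Hedged sketch: AdaBoost snore/non-snore classification with ten-fold CV.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(7)
n = 600
y = rng.integers(0, 2, n)                    # 1 = snore, 0 = noise
X = rng.normal(size=(n, 10)) + y[:, None]    # separable toy features

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc = cross_val_score(clf, X, y, cv=cv).mean()
print(f"10-fold CV snore/non-snore accuracy: {acc:.2f}")
```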
Automatic detection of whole night snoring events using non-contact microphone.
Dafna, Eliran; Tarasiuk, Ariel; Zigel, Yaniv
2013-01-01
Although awareness of sleep disorders is increasing, limited information is available on whole night detection of snoring. Our study aimed to develop and validate a robust, high performance, and sensitive whole-night snore detector based on non-contact technology. Sounds during polysomnography (PSG) were recorded using a directional condenser microphone placed 1 m above the bed. An AdaBoost classifier was trained and validated on manually labeled snoring and non-snoring acoustic events. Sixty-seven subjects (age 52.5 ± 13.5 years, BMI 30.8 ± 4.7 kg/m(2), m/f 40/27) referred for PSG for obstructive sleep apnea diagnoses were prospectively and consecutively recruited. Twenty-five subjects were used for the design study; the validation study was blindly performed on the remaining forty-two subjects. To train the proposed sound detector, >76,600 acoustic episodes collected in the design study were manually classified by three scorers into snore and non-snore episodes (e.g., bedding noise, coughing, environmental). A feature selection process was applied to select the most discriminative features extracted from time and spectral domains. The average snore/non-snore detection rate (accuracy) for the design group was 98.4% based on a ten-fold cross-validation technique. When tested on the validation group, the average detection rate was 98.2% with sensitivity of 98.0% (snore as a snore) and specificity of 98.3% (noise as noise). Audio-based features extracted from time and spectral domains can accurately discriminate between snore and non-snore acoustic events. This audio analysis approach enables detection and analysis of snoring sounds from a full night in order to produce quantified measures for objective follow-up of patients.
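The ten-fold cross-validation protocol used in the snore-detection study can be sketched generically. The snippet below is a minimal illustration, with a trivial threshold rule on a single synthetic feature standing in for the authors' AdaBoost classifier; all names are my own.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_accuracy(X, y, fit, predict, k=10):
    """Average held-out accuracy over k folds."""
    accs = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        Xtr = [x for i, x in enumerate(X) if i not in held_out]
        ytr = [t for i, t in enumerate(y) if i not in held_out]
        model = fit(Xtr, ytr)
        hits = sum(predict(model, X[i]) == y[i] for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / k

# Toy stand-in "classifier": threshold midway between the class means.
def fit(X, y):
    m0 = sum(x for x, t in zip(X, y) if t == 0) / max(1, y.count(0))
    m1 = sum(x for x, t in zip(X, y) if t == 1) / max(1, y.count(1))
    return (m0 + m1) / 2

def predict(threshold, x):
    return 1 if x > threshold else 0
```

With a real dataset, `fit` and `predict` would wrap the trained boosted classifier rather than this threshold rule.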
A machine learning approach to multi-level ECG signal quality classification.
Li, Qiao; Rajagopalan, Cadathur; Clifford, Gari D
2014-12-01
Current electrocardiogram (ECG) signal quality assessment studies have aimed to provide a two-level classification: clean or noisy. However, clinical usage demands more specific noise-level classification for varying applications. This work outlines a five-level ECG signal quality classification algorithm. A total of 13 signal quality metrics were derived from segments of ECG waveforms, which were labeled by experts. A support vector machine (SVM) was trained to perform the classification, tested on a simulated dataset, and validated using data from the MIT-BIH arrhythmia database (MITDB). The simulated training and test datasets were created by selecting clean segments of the ECG in the 2011 PhysioNet/Computing in Cardiology Challenge database, and adding three types of real ECG noise at different signal-to-noise ratio (SNR) levels from the MIT-BIH Noise Stress Test Database (NSTDB). The MITDB was re-annotated for five levels of signal quality. Different combinations of the 13 metrics were trained and tested on the simulated datasets, and the combination that produced the highest classification accuracy was selected and validated on the MITDB. Performance was assessed using classification accuracy (Ac) and a single-class-overlap accuracy (OAc), which assumes that an individual classified into an adjacent class is acceptable. An Ac of 80.26% and an OAc of 98.60% on the test set were obtained by selecting 10 metrics, while an Ac of 57.26% and an OAc of 94.23% were obtained on the unseen MITDB validation data without retraining. With fivefold cross-validation, an Ac of 88.07±0.32% and an OAc of 99.34±0.07% were obtained on the validation fold of MITDB. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
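The two metrics reported above, classification accuracy (Ac) and single-class-overlap accuracy (OAc), have a simple definition for ordinal quality levels. A minimal sketch, assuming integer-coded levels 1-5 (not the paper's code):

```python
def accuracy_metrics(y_true, y_pred):
    """Ac: exact matches; OAc: matches allowing one adjacent ordinal class."""
    n = len(y_true)
    ac = sum(t == p for t, p in zip(y_true, y_pred)) / n
    oac = sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / n
    return ac, oac
```

OAc is always at least as large as Ac, since every exact match is also within one class of the truth.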
Cross-Validation of easyCBM Reading Cut Scores in Washington: 2009-2010. Technical Report #1109
ERIC Educational Resources Information Center
Irvin, P. Shawn; Park, Bitnara Jasmine; Anderson, Daniel; Alonzo, Julie; Tindal, Gerald
2011-01-01
This technical report presents results from a cross-validation study designed to identify optimal cut scores when using easyCBM[R] reading tests in Washington state. The cross-validation study analyzes data from the 2009-2010 academic year for easyCBM[R] reading measures. A sample of approximately 900 students per grade, randomly split into two…
Kaneko, Hiromasa; Funatsu, Kimito
2013-09-23
We propose predictive performance criteria for nonlinear regression models without cross-validation. The proposed criteria are the determination coefficient and the root-mean-square error for the midpoints between k-nearest-neighbor data points. These criteria can be used to evaluate predictive ability after the regression models are updated, whereas cross-validation cannot be performed in such a situation. The proposed method is effective and helpful in handling big data when cross-validation cannot be applied. By analyzing data from numerical simulations and quantitative structural relationships, we confirm that the proposed criteria enable the predictive ability of the nonlinear regression models to be appropriately quantified.
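One plausible reading of the proposed criteria, sketched for one-dimensional data: form midpoints between each point and its k nearest neighbours, evaluate the model at those midpoints, and score against the interpolated responses. The helper names and the 1-D simplification are my own, not the authors'.

```python
import math

def knn_midpoint_criteria(X, y, predict, k=2):
    """r^2 and RMSE evaluated at midpoints between k-nearest-neighbour pairs."""
    targets, preds = [], []
    for i, xi in enumerate(X):
        # k nearest neighbours of xi, excluding xi itself
        nbrs = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: abs(X[j] - xi))[:k]
        for j in nbrs:
            targets.append((y[i] + y[j]) / 2)        # interpolated response
            preds.append(predict((xi + X[j]) / 2))   # model at the midpoint
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(targets))
```

A model that interpolates the data exactly scores r² = 1 and RMSE = 0 on these criteria, with no cross-validation loop required.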
Cross-validation to select Bayesian hierarchical models in phylogenetics.
Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C
2016-05-26
Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.
Tian, Kuang-da; Qiu, Kai-xian; Li, Zu-hong; Lü, Ya-qiong; Zhang, Qiu-ju; Xiong, Yan-mei; Min, Shun-geng
2014-12-01
The purpose of the present paper is to determine calcium and magnesium in tobacco using NIR spectroscopy combined with the least squares-support vector machine (LS-SVM). Five hundred ground and dried tobacco samples from Qujing city, Yunnan province, China, were scanned with a MATRIX-I spectrometer (Bruker Optics, Bremen, Germany). At the beginning of data processing, outlier samples were eliminated for stability of the model. The remaining 487 samples were divided into several calibration and validation sets according to a hybrid modeling strategy. Monte-Carlo cross-validation was used to choose the best spectral preprocessing method from multiplicative scatter correction (MSC), standard normal variate transformation (SNV), S-G smoothing, 1st derivative, etc., and their combinations. To optimize the parameters of the LS-SVM model, multilayer grid search and 10-fold cross-validation were applied. The final LS-SVM models with the optimized parameters were trained on the calibration set and assessed on 287 validation samples picked by the Kennard-Stone method. For the quantitative model of calcium in tobacco, Savitzky-Golay FIR smoothing with frame size 21 showed the best performance. The regularization parameter λ of LS-SVM was e16.11, while the bandwidth of the RBF kernel σ2 was e8.42. The determination coefficient for calibration (Rc(2)) was 0.9755 and the determination coefficient for prediction (Rp(2)) was 0.9422, better than the performance of the PLS model (Rc(2)=0.9593, Rp(2)=0.9344). For the quantitative analysis of magnesium, SNV made the regression model more precise than the other preprocessing methods. The optimized λ was e15.25 and σ2 was e6.32. Rc(2) and Rp(2) were 0.9961 and 0.9301, respectively, better than the PLS model (Rc(2)=0.9716, Rp(2)=0.8924). After modeling, the whole process of NIR scanning and data analysis for one sample took only tens of seconds.
The overall results show that NIR spectroscopy combined with LS-SVM can be efficiently utilized for rapid and accurate analysis of calcium and magnesium in tobacco.
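The parameter tuning described above (grid search judged by 10-fold cross-validation) can be sketched with a stand-in model; here a one-parameter ridge fit through the origin replaces the LS-SVM, and the grid values are invented for illustration.

```python
import math

def ridge_fit(X, y, lam):
    """1-D ridge regression through the origin: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * t for x, t in zip(X, y)) / (sum(x * x for x in X) + lam)

def cv_rmse(X, y, lam, k=10):
    """k-fold cross-validated RMSE for a given regularization parameter."""
    errs = []
    for fold in [list(range(i, len(X), k)) for i in range(k)]:
        tr = [i for i in range(len(X)) if i not in fold]
        w = ridge_fit([X[i] for i in tr], [y[i] for i in tr], lam)
        errs += [(y[i] - w * X[i]) ** 2 for i in fold]
    return math.sqrt(sum(errs) / len(errs))

def grid_search(X, y, grid, k=10):
    """Return the grid value with the lowest k-fold CV RMSE."""
    return min(grid, key=lambda lam: cv_rmse(X, y, lam, k))
```

A multilayer search as in the study would repeat this on a finer grid centred on the winner of the previous pass.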
Vanderploeg, Rodney D; Cooper, Douglas B; Belanger, Heather G; Donnell, Alison J; Kennedy, Jan E; Hopewell, Clifford A; Scott, Steven G
2014-01-01
To develop and cross-validate internal validity scales for the Neurobehavioral Symptom Inventory (NSI). Four existing data sets were used: (1) outpatient clinical traumatic brain injury (TBI)/neurorehabilitation database from a military site (n = 403), (2) National Department of Veterans Affairs TBI evaluation database (n = 48 175), (3) Florida National Guard nonclinical TBI survey database (n = 3098), and (4) a cross-validation outpatient clinical TBI/neurorehabilitation database combined across 2 military medical centers (n = 206). Secondary analysis of existing cohort data to develop (study 1) and cross-validate (study 2) internal validity scales for the NSI. The NSI, Mild Brain Injury Atypical Symptoms, and Personality Assessment Inventory scores. Study 1: Three NSI validity scales were developed, composed of 5 unusual items (Negative Impression Management [NIM5]), 6 low-frequency items (LOW6), and the combination of 10 nonoverlapping items (Validity-10). Cut scores maximizing sensitivity and specificity on these measures were determined, using a Mild Brain Injury Atypical Symptoms score of 8 or more as the criterion for invalidity. Study 2: The same validity scale cut scores again resulted in the highest classification accuracy and optimal balance between sensitivity and specificity in the cross-validation sample, using a Personality Assessment Inventory Negative Impression Management scale with a T score of 75 or higher as the criterion for invalidity. The NSI is widely used in the Department of Defense and Veterans Affairs as a symptom-severity assessment following TBI, but is subject to symptom overreporting or exaggeration. This study developed embedded NSI validity scales to facilitate the detection of invalid response styles. The NSI Validity-10 scale appears to hold considerable promise for validity assessment when the NSI is used as a population-screening tool.
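Cut-score selection of the kind described in study 1 (choosing the scale value that best balances sensitivity and specificity against an external invalidity criterion) can be sketched as follows; the scores and labels are invented for illustration.

```python
def sens_spec(scores, invalid, cut):
    """Sensitivity/specificity of flagging scores >= cut as invalid."""
    tp = sum(s >= cut for s, v in zip(scores, invalid) if v)
    fn = sum(s < cut for s, v in zip(scores, invalid) if v)
    tn = sum(s < cut for s, v in zip(scores, invalid) if not v)
    fp = sum(s >= cut for s, v in zip(scores, invalid) if not v)
    return tp / (tp + fn), tn / (tn + fp)

def best_cut(scores, invalid):
    """Cut score maximizing sensitivity + specificity (Youden's J + 1)."""
    cuts = sorted(set(scores))
    return max(cuts, key=lambda c: sum(sens_spec(scores, invalid, c)))
```

Cross-validation then consists of checking whether the cut chosen on one sample keeps the same balance on an independent sample, as in study 2.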
When fast is better: protein folding fundamentals and mechanisms from ultrafast approaches.
Muñoz, Victor; Cerminara, Michele
2016-09-01
Protein folding research stalled for decades because conventional experiments indicated that proteins fold slowly and in single strokes, whereas theory predicted a complex interplay between dynamics and energetics resulting in myriad microscopic pathways. Ultrafast kinetic methods turned the field upside down by providing the means to probe fundamental aspects of folding, test theoretical predictions and benchmark simulations. Accordingly, experimentalists could measure the timescales for all relevant folding motions, determine the folding speed limit and confirm that folding barriers are entropic bottlenecks. Moreover, a catalogue of proteins that fold extremely fast (microseconds) could be identified. Such fast-folding proteins cross shallow free energy barriers or fold downhill, and thus unfold with minimal co-operativity (gradually). A new generation of thermodynamic methods has exploited this property to map folding landscapes, interaction networks and mechanisms at nearly atomic resolution. In parallel, modern molecular dynamics simulations have finally reached the timescales required to watch fast-folding proteins fold and unfold in silico. All of these findings have buttressed the fundamentals of protein folding predicted by theory, and are now offering the first glimpses at the underlying mechanisms. Fast folding appears to also have functional implications as recent results connect downhill folding with intrinsically disordered proteins, their complex binding modes and ability to moonlight. These connections suggest that the coupling between downhill (un)folding and binding enables such protein domains to operate analogically as conformational rheostats. © 2016 The Author(s).
Cameriere, Roberto; Velandia Palacio, Luz Andrea; Pinares, Jorge; Bestetti, Fiorella; Paba, Rossella; Coccia, Erminia; Ferrante, Luigi
2018-04-01
This retrospective cross-sectional study has two-fold aims: the first is to assess new cut-offs at the legal age thresholds (LATs) of 14 and 16 years old, and the second is to validate the cut-off of the third molar index I3M = 0.08 for 18 years of age in Chilean people. Orthopantomographs from 822 Chilean children aged from 11 to 22 (472 girls and 350 boys) were analysed. For the LAT of 14 years, cut-offs were found using ROC curves separately for boys and girls. The cut-offs for boys were I2M = 0.16 and I3M = 0.73, while for girls we obtained I2M = 0.10 and I3M = 0.77. For the LAT of 16 years we obtained the same cut-offs regardless of gender, which were 0.06 and 0.36 for I2M and I3M respectively. Concerning the validity of the I3M cut-off for 18 years of age in the Chilean population, the proportion of correctly classified individuals was 83% and the estimated post-test probability, PPV, was 93.2%, with a 95% confidence interval of 91.3% to 94.6%. Hence, the probability that a subject testing positive was 18 years of age or older was 93.2%, confirming the validity of the I3M cut-off for the Chilean population. Copyright © 2018 Elsevier B.V. All rights reserved.
Monga, Isha; Qureshi, Abid; Thakur, Nishant; Gupta, Amit Kumar; Kumar, Manoj
2017-01-01
Allele-specific siRNAs (ASP-siRNAs) have emerged as promising therapeutic molecules owing to their selectivity to inhibit the mutant allele or associated single-nucleotide polymorphisms (SNPs) sparing the expression of the wild-type counterpart. Thus, a dedicated bioinformatics platform encompassing updated ASP-siRNAs and an algorithm for the prediction of their inhibitory efficacy will be helpful in tackling currently intractable genetic disorders. In the present study, we have developed the ASPsiRNA resource (http://crdd.osdd.net/servers/aspsirna/) covering three components viz (i) ASPsiDb, (ii) ASPsiPred, and (iii) analysis tools like ASP-siOffTar. ASPsiDb is a manually curated database harboring 4543 (including 422 chemically modified) ASP-siRNAs targeting 78 unique genes involved in 51 different diseases. It furnishes comprehensive information from experimental studies on ASP-siRNAs along with multidimensional genetic and clinical information for numerous mutations. ASPsiPred is a two-layered algorithm to predict efficacy of ASP-siRNAs for fully complementary mutant (Effmut) and wild-type allele (Effwild) with one mismatch by ASPsiPredSVM and ASPsiPredmatrix, respectively. In ASPsiPredSVM, 922 unique ASP-siRNAs with experimentally validated quantitative Effmut were used. During 10-fold cross-validation (10nCV) employing various sequence features on the training/testing dataset (T737), the best predictive model achieved a maximum Pearson’s correlation coefficient (PCC) of 0.71. Further, the accuracy of the classifier to predict Effmut against novel genes was assessed by leave one target out cross-validation approach (LOTOCV). ASPsiPredmatrix was constructed from rule-based studies describing the effect of single siRNA:mRNA mismatches on the efficacy at 19 different locations of siRNA. 
Thus, ASPsiRNA encompasses the first database, prediction algorithm, and off-target analysis tool that is expected to accelerate research in the field of RNAi-based therapeutics for human genetic diseases. PMID:28696921
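The leave-one-target-out cross-validation (LOTOCV) used above to assess generalization to novel genes is a grouped hold-out scheme: every siRNA targeting one gene is held out together. A generic sketch with caller-supplied fit/predict callables (not the ASPsiPred implementation):

```python
def leave_one_group_out(X, y, groups, fit, predict):
    """Out-of-fold predictions, holding out one whole group per fold."""
    preds = [None] * len(X)
    for g in set(groups):
        train = [i for i in range(len(X)) if groups[i] != g]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(len(X)):
            if groups[i] == g:
                preds[i] = predict(model, X[i])
    return preds
```

Unlike plain 10-fold CV, no sequence from the held-out gene ever appears in training, so the estimate reflects performance on genuinely novel targets.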
NASA Astrophysics Data System (ADS)
Ali, Mumtaz; Deo, Ravinesh C.; Downs, Nathan J.; Maraseni, Tek
2018-07-01
Forecasting drought by means of the World Meteorological Organization-approved Standardized Precipitation Index (SPI) is considered a fundamental task to support socio-economic initiatives and effectively mitigate climate risk. This study aims to develop a robust drought modelling strategy to forecast multi-scalar SPI in drought-rich regions of Pakistan, where statistically significant lagged combinations of antecedent SPI are used to forecast future SPI. With an ensemble Adaptive Neuro-Fuzzy Inference System ('ensemble-ANFIS') executed via a 10-fold cross-validation procedure, a model is constructed from randomly partitioned input-target data, yielding 10 ensemble-ANFIS member outputs. Judged by mean square error and correlation coefficient in the training period, the optimal forecasts are attained by averaging the member simulations, and the model is benchmarked against the M5 Model Tree and Minimax Probability Machine Regression (MPMR). The results show that the proposed ensemble-ANFIS model's precision was notably better (in terms of the root mean square and mean absolute errors, including Willmott's, Nash-Sutcliffe and Legates-McCabe's indices) for the 6- and 12-month forecasts than for the 3-month forecasts, as verified by the largest proportion of errors registering in the smallest error band. Applying the 10-member simulations, the ensemble-ANFIS model was validated for its ability to forecast the severity (S), duration (D) and intensity (I) of drought (including the error bound). This enabled uncertainty between the multiple models to be rationalized more efficiently, leading to a reduction in forecast error caused by stochasticity in drought behaviour. Through cross-validations at diverse sites, a geographic signature in the modelled uncertainties was also calculated. 
Considering the superiority of ensemble-ANFIS approach and its ability to generate uncertainty-based information, the study advocates the versatility of a multi-model approach for drought-risk forecasting and its prime importance for estimating drought properties over confidence intervals to generate better information for strategic decision-making.
Brown, Christopher A.; Brown, Kevin S.
2010-01-01
Correlated amino acid substitution algorithms attempt to discover groups of residues that co-fluctuate due to either structural or functional constraints. Although these algorithms could inform both ab initio protein folding calculations and evolutionary studies, their utility for these purposes has been hindered by a lack of confidence in their predictions due to hard-to-control sources of error. To complicate matters further, naive users are confronted with a multitude of methods to choose from, in addition to the mechanics of assembling and pruning a dataset. We first introduce a new pair scoring method, called ZNMI (Z-scored-product Normalized Mutual Information), which drastically improves the performance of mutual information for co-fluctuating residue prediction. Second, and more importantly, we recast the process of finding coevolving residues in proteins as a data-processing pipeline inspired by the medical imaging literature. We construct an ensemble of alignment partitions that can be used in a cross-validation scheme to assess the effects of choices made during the procedure on the resulting predictions. This pipeline sensitivity study gives a measure of reproducibility (how similar are the predictions given perturbations to the pipeline?) and accuracy (are residue pairs with large couplings on average close in tertiary structure?). We choose a handful of published methods, along with ZNMI, and compare their reproducibility and accuracy on three diverse protein families. We find that (i) of the algorithms tested, while none appear to be both highly reproducible and accurate, ZNMI is one of the most accurate by far, and (ii) while users should be wary of predictions drawn from a single alignment, considering an ensemble of sub-alignments can help to determine both highly accurate and reproducible couplings. 
Our cross-validation approach should be of interest both to developers and end users of algorithms that try to detect correlated amino acid substitutions. PMID:20531955
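The mutual-information core that ZNMI normalizes can be sketched for a pair of alignment columns; the Z-scored-product normalization itself is not reproduced here.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """MI (in bits) between two alignment columns of equal length."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in pab.items():
        # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), with counts folded in
        mi += (c / n) * math.log2(c * n / (pa[a] * pb[b]))
    return mi
```

Perfectly correlated columns give MI equal to the column entropy, while independent columns give MI near zero; ZNMI's contribution is correcting MI's bias toward high-entropy columns.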
Pothula, Venu M.; Yuan, Stanley C.; Maerz, David A.; Montes, Lucresia; Oleszkiewicz, Stephen M.; Yusupov, Albert; Perline, Richard
2015-01-01
Background: Advanced predictive analytical techniques are being increasingly applied to clinical risk assessment. This study compared a neural network model to several other models in predicting the length of stay (LOS) in the cardiac surgical intensive care unit (ICU) based on pre-incision patient characteristics. Methods: Thirty-six variables collected from 185 cardiac surgical patients were analyzed for contribution to ICU LOS. The Automatic Linear Modeling (ALM) module of IBM-SPSS software identified 8 factors with statistically significant associations with ICU LOS; these factors were also analyzed with the Artificial Neural Network (ANN) module of the same software. The weighted contributions of each factor (“trained” data) were then applied to data for a “new” patient to predict ICU LOS for that individual. Results: Factors identified in the ALM model were: use of an intra-aortic balloon pump; O2 delivery index; age; use of positive cardiac inotropic agents; hematocrit; serum creatinine ≥ 1.3 mg/deciliter; gender; arterial pCO2. The r2 value for ALM prediction of ICU LOS in the initial (training) model was 0.356, p <0.0001. Cross-validation in prediction of a “new” patient yielded r2 = 0.200, p <0.0001. The same 8 factors analyzed with ANN yielded a training prediction r2 of 0.535 (p <0.0001) and a cross-validation prediction r2 of 0.410, p <0.0001. Two additional predictive algorithms were studied, but they had lower prediction accuracies. Our validated neural network model identified the upper quartile of ICU LOS with an odds ratio of 9.8 (p <0.0001). Conclusions: ANN demonstrated a 2-fold greater accuracy than ALM in prediction of observed ICU LOS. This greater accuracy would be presumed to result from the capacity of ANN to capture nonlinear effects and higher order interactions. Predictive modeling may be of value in early anticipation of risks of post-operative morbidity and utilization of ICU facilities. PMID:26710254
Shah, Neomi; Hanna, David B; Teng, Yanping; Sotres-Alvarez, Daniela; Hall, Martica; Loredo, Jose S; Zee, Phyllis; Kim, Mimi; Yaggi, H Klar; Redline, Susan; Kaplan, Robert C
2016-06-01
We developed and validated the first-ever sleep apnea (SA) risk calculator in a large population-based cohort of Hispanic/Latino subjects. Cross-sectional data on adults from the Hispanic Community Health Study/Study of Latinos (2008-2011) were analyzed. Subjective and objective sleep measurements were obtained. Clinically significant SA was defined as an apnea-hypopnea index ≥ 15 events per hour. Using logistic regression, four prediction models were created: three sex-specific models (female-only, male-only, and a sex × covariate interaction model to allow differential predictor effects), and one overall model with sex included as a main effect only. Models underwent 10-fold cross-validation and were assessed by using the C statistic. SA and its predictive variables; a total of 17 variables were considered. A total of 12,158 participants had complete sleep data available; 7,363 (61%) were women. The population-weighted prevalence of SA (apnea-hypopnea index ≥ 15 events per hour) was 6.1% in female subjects and 13.5% in male subjects. Male-only (C statistic, 0.808) and female-only (C statistic, 0.836) prediction models had the same predictor variables (ie, age, BMI, self-reported snoring). The sex-interaction model (C statistic, 0.836) contained sex, age, age × sex, BMI, BMI × sex, and self-reported snoring. The final overall model (C statistic, 0.832) contained age, BMI, snoring, and sex. We developed two websites for our SA risk calculator: one in English (https://www.montefiore.org/sleepapneariskcalc.html) and another in Spanish (http://www.montefiore.org/sleepapneariskcalc-es.html). We created an internally validated, highly discriminating, well-calibrated, and parsimonious prediction model for SA. Contrary to the study hypothesis, the variables did not have different predictive magnitudes in male and female subjects. Copyright © 2016 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.
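The C statistic used to assess these prediction models is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. A minimal sketch of the pairwise definition (real implementations use a faster rank-based formula):

```python
def c_statistic(scores, labels):
    """Concordance: P(score_case > score_control), ties counted as 0.5."""
    cases = [s for s, t in zip(scores, labels) if t == 1]
    ctrls = [s for s, t in zip(scores, labels) if t == 0]
    concordant = 0.0
    for c in cases:
        for d in ctrls:
            concordant += 1.0 if c > d else 0.5 if c == d else 0.0
    return concordant / (len(cases) * len(ctrls))
```

A value of 0.5 means the score is no better than chance; the values above 0.8 reported for the SA models indicate strong discrimination.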
Fox, Eric W; Hill, Ryan A; Leibowitz, Scott G; Olsen, Anthony R; Thornbrugh, Darren J; Weber, Marc H
2017-07-01
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables are used or stepwise procedures are employed which iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Stream Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. 
A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
Carmona-Bayonas, A; Jiménez-Fonseca, P; Font, C; Fenoy, F; Otero, R; Beato, C; Plasencia, J M; Biosca, M; Sánchez, M; Benegas, M; Calvo-Temprano, D; Varona, D; Faez, L; de la Haba, I; Antonio, M; Madridano, O; Solis, M P; Ramchandani, A; Castañón, E; Marchena, P J; Martín, M; Ayala de la Peña, F; Vicente, V
2017-01-01
Background: Our objective was to develop a prognostic stratification tool that enables patients with cancer and pulmonary embolism (PE), whether incidental or symptomatic, to be classified according to the risk of serious complications within 15 days. Methods: The sample comprised cases from a national registry of pulmonary thromboembolism in patients with cancer (1075 patients from 14 Spanish centres). Diagnosis was incidental in 53.5% of the events in this registry. The Exhaustive CHAID analysis was applied with 10-fold cross-validation to predict development of serious complications following PE diagnosis. Results: In all, 208 patients (19.3%; 95% confidence interval (CI), 17.1–21.8%) developed a serious complication after PE diagnosis. The 15-day mortality rate was 10.1% (95% CI, 8.4–12.1%). The decision tree detected six explanatory covariates: Hestia-like clinical decision rule (any risk criterion present vs none), Eastern Cooperative Oncology Group performance status (ECOG-PS; <2 vs ⩾2), O2 saturation (<90 vs ⩾90%), presence of PE-specific symptoms, tumour response (progression, unknown, or not evaluated vs others), and primary tumour resection. Three risk classes were created (low, intermediate, and high risk). The risk of serious complications within 15 days increases across the classes: 1.6%, 9.4%, and 30.6% (P<0.0001). Fifteen-day mortality rates also rise progressively in low-, intermediate-, and high-risk patients: 0.3%, 6.1%, and 17.1% (P<0.0001). The cross-validated risk estimate is 0.191 (s.e.=0.012). The optimism-corrected area under the receiver operating characteristic curve is 0.779 (95% CI, 0.717–0.840). Conclusions: We have developed and internally validated a prognostic index to predict serious complications with the potential to impact decision-making in patients with cancer and PE. PMID:28267709
Predicting the Operational Acceptability of Route Advisories
NASA Technical Reports Server (NTRS)
Evans, Antony; Lee, Paul
2017-01-01
NASA envisions a future Air Traffic Management system that allows safe, efficient growth in global operations, enabled by increasing levels of automation and autonomy. In a safety-critical system, the introduction of increasing automation and autonomy has to be done in stages, making human-system integrated concepts critical in the foreseeable future. One example where this is relevant is for tools that generate more efficient flight routings or reroute advisories. If these routes are not operationally acceptable, they will be rejected by human operators, and the associated benefits will not be realized. Operational acceptance is therefore required to enable the increased efficiency and reduced workload benefits associated with these tools. In this paper, the authors develop a predictor of operational acceptability for reroute advisories. Such a capability has applications in tools that identify more efficient routings around weather and congestion and that better meet airline preferences. The capability is based on applying data mining techniques to flight plan amendment data reported by the Federal Aviation Administration and data on requested reroutes collected from a field trial of the NASA-developed Dynamic Weather Routes tool, which advised efficient route changes to American Airlines dispatchers in 2014. 10-fold cross-validation was used for feature, model and parameter selection, while nested cross-validation was used to validate the model. The model performed well in predicting controller acceptance or rejection of a route change as indicated by the chosen performance metrics. Features identified as relevant to controller acceptance included the historical usage of the advised route, the location of the maneuver start point relative to the boundaries of the airspace sector containing the maneuver start (the maneuver start sector), the reroute deviation from the original flight plan, and the demand level in the maneuver start sector. 
A random forest with forty trees was the best performing of the five models evaluated in this paper.
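The nested cross-validation used above keeps feature/model/parameter selection inside each outer training set, so the outer folds give an unbiased performance estimate. A generic sketch with stand-in fit/score callables (fold assignment here is simple striding, not the study's procedure):

```python
def nested_cv(X, y, params, fit, score, outer_k=5, inner_k=4):
    """Mean outer-fold score, with the parameter chosen by inner CV only."""
    n = len(X)
    results = []
    for test in [list(range(i, n, outer_k)) for i in range(outer_k)]:
        train = [i for i in range(n) if i not in test]

        def inner_score(p):
            # inner CV over the outer-training indices only
            folds = [train[i::inner_k] for i in range(inner_k)]
            total = 0.0
            for f in folds:
                itr = [i for i in train if i not in f]
                m = fit([X[i] for i in itr], [y[i] for i in itr], p)
                total += score(m, [X[i] for i in f], [y[i] for i in f])
            return total / inner_k

        best = max(params, key=inner_score)
        model = fit([X[i] for i in train], [y[i] for i in train], best)
        results.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(results) / outer_k
```

The key point is that the outer test indices never influence the choice of `best`, which is what plain (non-nested) CV gets wrong when used both to tune and to evaluate.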
Fetit, Ahmed E; Novak, Jan; Peet, Andrew C; Arvanitis, Theodoros N
2015-09-01
The aim of this study was to assess the efficacy of three-dimensional texture analysis (3D TA) of conventional MR images for the classification of childhood brain tumours in a quantitative manner. The dataset comprised pre-contrast T1- and T2-weighted MRI series obtained from 48 children diagnosed with brain tumours (medulloblastoma, pilocytic astrocytoma and ependymoma). 3D and 2D TA were carried out on the images using first-, second- and higher-order statistical methods. Six supervised classification algorithms were trained with the most influential 3D and 2D textural features, and their performances in the classification of tumour types, using the two feature sets, were compared. Model validation was carried out using the leave-one-out cross-validation (LOOCV) approach, as well as stratified 10-fold cross-validation, in order to provide additional reassurance. McNemar's test was used to test the statistical significance of any improvements demonstrated by 3D-trained classifiers. Supervised learning models trained with 3D textural features showed improved classification performances over those trained with conventional 2D features. For instance, a neural network classifier showed 12% improvement in area under the receiver operator characteristics curve (AUC) and 19% in overall classification accuracy. These improvements were statistically significant for four of the tested classifiers, as per McNemar's tests. This study shows that 3D textural features extracted from conventional T1- and T2-weighted images can improve the diagnostic classification of childhood brain tumours. Long-term benefits of accurate, yet non-invasive, diagnostic aids include a reduction in surgical procedures, improvement in surgical and therapy planning, and support of discussions with patients' families. It remains necessary, however, to extend the analysis to a multicentre cohort in order to assess the scalability of the techniques used. Copyright © 2015 John Wiley & Sons, Ltd.
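McNemar's test, used above to compare paired classifiers on the same cases, depends only on the discordant pairs. A minimal sketch of the continuity-corrected chi-square statistic (the paper's exact variant may differ):

```python
def mcnemar(correct_a, correct_b):
    """Continuity-corrected McNemar chi-square from paired correctness flags."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # A right, B wrong
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # B right, A wrong
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)
```

The statistic is referred to a chi-square distribution with one degree of freedom; values above about 3.84 correspond to p < 0.05.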
Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods.
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil
2014-08-01
We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method, called "Patient Recursive Survival Peeling", is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called "combined" cross-validation, is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication.
Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods
Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil
2015-01-01
We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method, called “Patient Recursive Survival Peeling”, is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called “combined” cross-validation, is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication. PMID:26997922
Improving diagnostic recognition of primary hyperparathyroidism with machine learning.
Somnay, Yash R; Craven, Mark; McCoy, Kelly L; Carty, Sally E; Wang, Tracy S; Greenberg, Caprice C; Schneider, David F
2017-04-01
Parathyroidectomy offers the only cure for primary hyperparathyroidism, but today only 50% of primary hyperparathyroidism patients are referred for operation, in large part, because the condition is widely under-recognized. The diagnosis of primary hyperparathyroidism can be especially challenging with mild biochemical indices. Machine learning is a collection of methods in which computers build predictive algorithms based on labeled examples. With the aim of facilitating diagnosis, we tested the ability of machine learning to distinguish primary hyperparathyroidism from normal physiology using clinical and laboratory data. This retrospective cohort study used a labeled training set and 10-fold cross-validation to evaluate accuracy of the algorithm. Measures of accuracy included area under the receiver operating characteristic curve, precision (sensitivity), and positive and negative predictive value. Several different algorithms and ensembles of algorithms were tested using the Weka platform. Among 11,830 patients managed operatively at 3 high-volume endocrine surgery programs from March 2001 to August 2013, 6,777 underwent parathyroidectomy for confirmed primary hyperparathyroidism, and 5,053 control patients without primary hyperparathyroidism underwent thyroidectomy. Test-set accuracies for machine learning models were determined using 10-fold cross-validation. Age, sex, and serum levels of preoperative calcium, phosphate, parathyroid hormone, vitamin D, and creatinine were defined as potential predictors of primary hyperparathyroidism. Mild primary hyperparathyroidism was defined as primary hyperparathyroidism with normal preoperative calcium or parathyroid hormone levels. After testing a variety of machine learning algorithms, Bayesian network models proved most accurate, classifying correctly 95.2% of all primary hyperparathyroidism patients (area under receiver operating characteristic = 0.989). 
Omitting parathyroid hormone from the model did not decrease the accuracy significantly (area under receiver operating characteristic = 0.985). In mild disease cases, however, the Bayesian network model classified correctly 71.1% of patients with normal calcium and 92.1% with normal parathyroid hormone levels preoperatively. Combining Bayesian networking with AdaBoost improved the accuracy to 97.2% of all primary hyperparathyroidism cases (area under receiver operating characteristic = 0.994), and 91.9% of primary hyperparathyroidism patients with mild disease. This was significantly improved relative to Bayesian networking alone (P < .0001). Machine learning can diagnose primary hyperparathyroidism accurately without human input, even in mild disease. Incorporation of this tool into electronic medical record systems may aid in recognition of this under-diagnosed disorder. Copyright © 2016 Elsevier Inc. All rights reserved.
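The 10-fold cross-validation used to obtain the test-set accuracies above (and in most of the studies in this collection) can be sketched as a simple index splitter. This is a minimal illustrative version, not the Weka machinery the authors used: shuffle the sample indices once, deal them into k disjoint folds, and hold each fold out in turn.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    # Shuffle indices reproducibly, then deal them into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    # Yield (train, test) index lists, holding out each fold in turn.
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Averaging an accuracy metric over the k held-out folds gives the cross-validated estimate; a stratified variant would additionally balance class labels across folds.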
Zhang, Xueying; Chu, Yiyi; Wang, Yuxuan; Zhang, Kai
2018-08-01
The regulatory monitoring data of particulate matter with an aerodynamic diameter <2.5 μm (PM2.5) in Texas have limited spatial and temporal coverage. The purpose of this study is to estimate the ground-level PM2.5 concentrations on a daily basis using satellite-retrieved Aerosol Optical Depth (AOD) in the state of Texas. We obtained the AOD values at 1-km resolution generated through the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm based on the images retrieved from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellites. We then developed mixed-effects models based on AODs, land use features, geographic characteristics, and weather conditions, and the day-specific as well as site-specific random effects to estimate the PM2.5 concentrations (μg/m3) in the state of Texas during the period 2008-2013. The mixed-effects models' performance was evaluated using the coefficient of determination (R2) and square root of the mean squared prediction error (RMSPE) from ten-fold cross-validation, which randomly selected 90% of the observations for training purposes and 10% of the observations for assessing the models' true prediction ability. Mixed-effects regression models showed good prediction performance (R2 values from 10-fold cross-validation: 0.63-0.69). The model performance varied by region and study year; the East region of Texas and the year 2009 presented relatively higher prediction precision (R2: 0.62 for the East region; R2: 0.69 for the year 2009). The PM2.5 concentrations generated through our developed models at 1-km grid cells in the state of Texas showed a decreasing trend from 2008 to 2013 and a higher reduction of predicted PM2.5 in more polluted areas. Our findings suggest that mixed-effects regression models developed based on MAIAC AOD are a feasible approach to predict ground-level PM2.5 in Texas. Predicted PM2.5 concentrations at the 1-km resolution on a daily basis can be used for epidemiological studies to investigate short- and long-term health impacts of PM2.5 in Texas. Copyright © 2017 Elsevier B.V. All rights reserved.
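The two validation metrics this study reports, R2 (coefficient of determination) and RMSPE (square root of the mean squared prediction error), are simple arithmetic on held-out observations and predictions. A minimal sketch with illustrative variable names:

```python
import math

def r_squared(y, yhat):
    # 1 - SS_residual / SS_total
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def rmspe(y, yhat):
    # Root of the mean squared prediction error, in the units of y (here μg/m3).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))
```

In a cross-validation setting, y and yhat would be the observed and predicted values pooled over the held-out folds.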
Parsing clinical text: how good are the state-of-the-art parsers?
2015-01-01
Background Parsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain, including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain. Methods In this study, we investigated the performance of three state-of-the-art parsers: the Stanford parser, the Bikel parser, and the Charniak parser, using the following two datasets: (1) a Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank based guideline; and (2) the MiPACQ Treebank, which was developed based on pathology notes and clinical notes, containing 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three state-of-the-art parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using the 10-fold cross validation method. Finally we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank. Results Our results showed that the original parsers achieved lower performance on clinical text (Bracketing F-measure in the range of 66.6%-70.3%) compared to general English text. After retraining on the clinical Treebanks, all parsers achieved better performance, with the best performance from the Stanford parser, which reached the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross validation.
When the combined clinical Treebanks and Penn Treebank was used, of the three parsers, the Charniak parser achieved the highest Bracketing F-measure of 73.53% on progress notes and the Stanford parser reached the highest F-measure of 84.15% on the MiPACQ corpus. Conclusions Our study demonstrates that re-training using clinical Treebanks is critical for improving general English parsers' performance on clinical text, and combining clinical and open domain corpora might achieve optimal performance for parsing clinical text. PMID:26045009
Hajiloo, Mohsen; Sapkota, Yadav; Mackey, John R; Robson, Paula; Greiner, Russell; Damaraju, Sambasivarao
2013-02-22
Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification but each has limitations. We provide an alternative technique to address population stratification. We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual's continental and sub-continental ancestry. To predict an individual's continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control's λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). 
In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of 86.5% ± 2.4%, 95.6% ± 3.9%, 95.6% ± 2.1%, 98.3% ± 2.0%, and 95.9% ± 1.5%. However, ETHNOPRED was unable to produce a classifier that can accurately distinguish Chinese in Beijing vs. Chinese in Denver. ETHNOPRED is a novel technique for producing classifiers that can identify an individual's continental and sub-continental heritage, based on a small number of SNPs. We show that its learned classifiers are simple, cost-efficient, accurate, transparent, flexible, fast, applicable to large scale GWASs, and robust to missing values.
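ETHNOPRED's robustness to missing values follows from its design: because the ensemble's decision trees use disjoint SNP sets, a missing genotype silences at most one tree and the remaining trees still vote. A hedged toy sketch of that voting logic (not the ETHNOPRED code; tree internals are abstracted as callables that return a label, or None when the SNPs they need are missing):

```python
from collections import Counter

def ensemble_predict(trees, sample):
    # Collect one vote per tree; trees whose SNPs are missing abstain (None).
    votes = [t(sample) for t in trees]
    votes = [v for v in votes if v is not None]
    if not votes:
        return None  # every tree abstained
    # Majority vote over the remaining trees.
    return Counter(votes).most_common(1)[0][0]
```

With disjoint trees, each missing SNP removes at most one vote, so accuracy degrades gracefully rather than failing outright.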
NASA Technical Reports Server (NTRS)
Garner, Gregory G.; Thompson, Anne M.
2013-01-01
An ensemble statistical post-processor (ESP) is developed for the National Air Quality Forecast Capability (NAQFC) to address the unique challenges of forecasting surface ozone in Baltimore, MD. Air quality and meteorological data were collected from the eight monitors that constitute the Baltimore forecast region. These data were used to build the ESP using a moving-block bootstrap, regression tree models, and extreme-value theory. The ESP was evaluated using a 10-fold cross-validation to avoid evaluation with the same data used in the development process. Results indicate that the ESP is conditionally biased, likely due to slight overfitting while training the regression tree models. When viewed from the perspective of a decision-maker, the ESP provides a wealth of additional information previously not available through the NAQFC alone. The user is provided the freedom to tailor the forecast to the decision at hand by using decision-specific probability thresholds that define a forecast for an ozone exceedance. Taking advantage of the ESP, the user not only receives an increase in value over the NAQFC, but also receives value for
How to determine an optimal threshold to classify real-time crash-prone traffic conditions?
Yang, Kui; Yu, Rongjie; Wang, Xuesong; Quddus, Mohammed; Xue, Lifang
2018-08-01
One of the proactive approaches to reducing traffic crashes is to identify hazardous traffic conditions that may lead to a traffic crash, known as real-time crash prediction. Threshold selection is an essential step of real-time crash prediction: once a crash risk evaluation model has output the probability of a crash occurring given a specific traffic condition, the threshold provides the cut-off point for that posterior probability, separating potential crash warnings from normal traffic conditions. There is, however, a dearth of research on how to determine an optimal threshold effectively; the few studies that discuss the predictive performance of the models have chosen thresholds subjectively. Subjective methods cannot automatically identify the optimal thresholds under different traffic and weather conditions in real applications, so a theoretical method for selecting the threshold value is necessary to avoid subjective judgments. The purpose of this study is to provide a theoretical method for automatically identifying the optimal threshold. Considering the random effects of variable factors across all roadway segments, the mixed logit model was utilized to develop the crash risk evaluation model and further evaluate the crash risk. Cross-entropy, between-class variance and other theories were employed and investigated to empirically identify the optimal threshold, and K-fold cross-validation was used to validate the performance of the proposed threshold selection methods against several evaluation criteria. The results indicate that (i) the mixed logit model can obtain a good performance; and (ii) the classification performance of the threshold selected by the minimum cross-entropy method outperforms the other methods according to the criteria.
This method behaves well in automatically identifying thresholds for crash prediction, minimizing the cross-entropy between the original dataset, with its continuous probability of a crash occurring, and the binarized dataset obtained after using the thresholds to separate potential crash warnings from normal traffic conditions. Copyright © 2018 Elsevier Ltd. All rights reserved.
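The minimum cross-entropy selection described above can be sketched as follows. This is one plausible reading of the criterion, not the authors' exact formulation: binarize the predicted crash probabilities at each candidate threshold, score the binarized labels against the original probabilities with binary cross-entropy, and keep the threshold that minimizes it.

```python
import math

def cross_entropy(p, labels, eps=1e-12):
    # Mean binary cross-entropy between continuous crash probabilities p
    # and a 0/1 labeling; eps guards the logarithm.
    return -sum(l * math.log(max(q, eps)) + (1 - l) * math.log(max(1 - q, eps))
                for q, l in zip(p, labels)) / len(p)

def best_threshold(p, candidates):
    # Binarize p at each candidate threshold and keep the minimizer.
    return min(candidates,
               key=lambda t: cross_entropy(p, [1 if q >= t else 0 for q in p]))
```

Unlike a subjectively chosen cut-off, this search can be re-run automatically as traffic and weather conditions change.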
Assessing the Online Social Environment for Surveillance of Obesity Prevalence
Chunara, Rumi; Bouton, Lindsay; Ayers, John W.; Brownstein, John S.
2013-01-01
Background Understanding the social environmental around obesity has been limited by available data. One promising approach used to bridge similar gaps elsewhere is to use passively generated digital data. Purpose This article explores the relationship between online social environment via web-based social networks and population obesity prevalence. Methods We performed a cross-sectional study using linear regression and cross validation to measure the relationship and predictive performance of user interests on the online social network Facebook to obesity prevalence in metros across the United States of America (USA) and neighborhoods within New York City (NYC). The outcomes, proportion of obese and/or overweight population in USA metros and NYC neighborhoods, were obtained via the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance and NYC EpiQuery systems. Predictors were geographically specific proportion of users with activity-related and sedentary-related interests on Facebook. Results Higher proportion of the population with activity-related interests on Facebook was associated with a significant 12.0% (95% Confidence Interval (CI) 11.9 to 12.1) lower predicted prevalence of obese and/or overweight people across USA metros and 7.2% (95% CI: 6.8 to 7.7) across NYC neighborhoods. Conversely, greater proportion of the population with interest in television was associated with higher prevalence of obese and/or overweight people of 3.9% (95% CI: 3.7 to 4.0) (USA) and 27.5% (95% CI: 27.1 to 27.9, significant) (NYC). For activity-interests and national obesity outcomes, the average root mean square prediction error from 10-fold cross validation was comparable to the average root mean square error of a model developed using the entire data set. Conclusions Activity-related interests across the USA and sedentary-related interests across NYC were significantly associated with obesity prevalence. 
Further research is needed to understand how the online social environment relates to health outcomes and how it can be used to identify or target interventions. PMID:23637820
Cross-Validation of easyCBM Reading Cut Scores in Oregon: 2009-2010. Technical Report #1108
ERIC Educational Resources Information Center
Park, Bitnara Jasmine; Irvin, P. Shawn; Anderson, Daniel; Alonzo, Julie; Tindal, Gerald
2011-01-01
This technical report presents results from a cross-validation study designed to identify optimal cut scores when using easyCBM[R] reading tests in Oregon. The cross-validation study analyzes data from the 2009-2010 academic year for easyCBM[R] reading measures. A sample of approximately 2,000 students per grade, randomly split into two groups of…
Bhunia, Bibhas K; Kaplan, David L; Mandal, Biman B
2018-01-16
Recapitulation of the form and function of complex tissue organization using appropriate biomaterials impacts success in tissue engineering endeavors. The annulus fibrosus (AF) represents a complex, multilamellar, hierarchical structure consisting of collagen, proteoglycans, and elastic fibers. To mimic the intricacy of AF anatomy, a silk protein-based multilayered, disc-like angle-ply construct was fabricated, consisting of concentric layers of lamellar sheets. Scanning electron microscopy and fluorescence image analysis revealed cross-aligned and lamellar characteristics of the construct, mimicking the native hierarchical architecture of the AF. Induction of secondary structure in the silk constructs was confirmed by infrared spectroscopy and X-ray diffraction. The constructs showed a compressive modulus of 499.18 ± 86.45 kPa. Constructs seeded with porcine AF cells and human mesenchymal stem cells (hMSCs) showed ∼2.2-fold and ∼1.7-fold increases in proliferation on day 14, respectively, compared with initial seeding. Biochemical analysis, histology, and immunohistochemistry results showed the deposition of AF-specific extracellular matrix (sulfated glycosaminoglycan and collagen type I), indicating a favorable environment for both cell types, which was further validated by the expression of AF tissue-specific genes. The constructs seeded with porcine AF cells showed ∼11-, ∼5.1-, and ∼6.7-fold increases in the col1α1, sox9, and aggrecan genes, respectively. The differentiation of hMSCs to AF-like tissue was evident from the enhanced expression of the AF-specific genes. Overall, the constructs supported cell proliferation, differentiation, and ECM deposition resulting in AF-like tissue features based on ECM deposition and morphology, indicating potential for future studies related to intervertebral disc replacement therapy.
NASA Astrophysics Data System (ADS)
Battistella, C.; Robinson, D.; McQuarrie, N.; Ghoshal, S.
2017-12-01
Multiple valid balanced cross sections can be produced from mapped surface and subsurface data. By integrating low-temperature thermochronologic data, we are better able to predict subsurface geometries. Existing valid balanced cross sections for far western Nepal are few (Robinson et al., 2006) and do not incorporate thermochronologic data because the data did not exist. The data published along the Simikot cross section along the Karnali River since then include muscovite Ar, zircon U-Th/He and apatite fission track. We present new mapping and a new valid balanced cross section that takes into account the new field data as well as the limitations that thermochronologic data place on the kinematics of the cross section. Additional constraints include new geomorphology data acquired since 2006 that indicate areas of increased vertical uplift, which mark locations of buried ramps in the Main Himalayan thrust and guide the locations of Lesser Himalayan ramps in the balanced cross section. Future work will include flexural modeling, new low-temperature thermochronometric data, and 2-D thermokinematic models from sequentially forward modeled balanced cross sections in far western Nepal.
Kumar, Amod; Gaur, Gyanendra Kumar; Gandham, Ravi Kumar; Panigrahi, Manjit; Ghosh, Shrikant; Saravanan, B C; Bhushan, Bharat; Tiwari, Ashok Kumar; Sulabh, Sourabh; Priya, Bhuvana; V N, Muhasin Asaf; Gupta, Jay Prakash; Wani, Sajad Ahmad; Sahu, Amit Ranjan; Sahoo, Aditya Prasad
2017-01-01
Bovine tropical theileriosis is an important haemoprotozoan disease associated with high rates of morbidity and mortality, particularly in exotic and crossbred cattle. It is one of the major constraints of the livestock development programmes in India and Southeast Asia. Indigenous cattle (Bos indicus) are reported to be comparatively less affected than exotic and crossbred cattle. However, the genetic basis of resistance to tropical theileriosis in indigenous cattle is not well documented. Recent studies suggested that differentially expressed genes in exotic and indigenous cattle play a significant role in breed-specific resistance to tropical theileriosis. The present study was designed to determine the global gene expression profile in peripheral blood mononuclear cells derived from indigenous (Tharparkar) and cross-bred cattle following in vitro infection with T. annulata (Parbhani strain). Two separate microarray experiments were carried out, one each for cross-bred and Tharparkar cattle. The cross-bred cattle showed 1082 differentially expressed genes (DEGs). Of the total DEGs, 597 genes were down-regulated and 485 were up-regulated. Their fold change varied from 2283.93 to -4816.02. Tharparkar cattle showed 875 differentially expressed genes, including 451 down-regulated and 424 up-regulated. The fold change varied from 94.93 to -19.20. A subset of genes was validated by qRT-PCR, and the results correlated well with the microarray data, indicating that the microarray results provided an accurate report of transcript levels. Functional annotation of the DEGs confirmed their involvement in various pathways including response to oxidative stress, immune system regulation, cell proliferation, cytoskeletal changes, kinase activity and apoptosis. Gene network analysis of these DEGs plays an important role in understanding the interactions among genes.
It is therefore hypothesized that the different susceptibility to tropical theileriosis exhibited by indigenous and crossbred cattle is due to breed-specific differences in how infected cells interact with other immune cells, which ultimately influence the immune response mounted against T. annulata infection. Copyright © 2016 Elsevier B.V. All rights reserved.
An artificial neural network to predict resting energy expenditure in obesity.
Disse, Emmanuel; Ledoux, Séverine; Bétry, Cécile; Caussy, Cyrielle; Maitrepierre, Christine; Coupaye, Muriel; Laville, Martine; Simon, Chantal
2017-09-01
The resting energy expenditure (REE) determination is important in nutrition for adequate dietary prescription. The gold standard, i.e. indirect calorimetry, is not available in clinical settings. Thus, several predictive equations have been developed, but they lack accuracy in subjects with extreme weight, including obese populations. Artificial neural networks (ANN) are useful predictive tools in the area of artificial intelligence, used in numerous clinical fields. The aim of this study was to determine the relevance of ANN in predicting REE in obesity. A Multi-Layer Perceptron (MLP) feed-forward neural network with a back propagation algorithm was created and cross-validated in a cohort of 565 obese subjects (BMI within 30-50 kg/m2) with weight, height, sex and age as clinical inputs and REE measured by indirect calorimetry as output. The predictive performances of the ANN were compared to those of 23 predictive REE equations in the training set and in two independent sets of 100 and 237 obese subjects for external validation. Among the 23 established prediction equations for REE evaluated, the Harris & Benedict equations recalculated by Roza were the most accurate for the obese population, followed by the USA DRI, Müller and the original Harris & Benedict equations. The final 5-fold cross-validated three-layer 4-3-1 feed-forward back propagation ANN model developed in this study improved the precision and accuracy of REE prediction over linear equations (precision = 68.1%, MAPE = 8.6% and RMSPE = 210 kcal/d), independently of BMI subgroups within 30-50 kg/m2. External validation confirmed the better predictive performance of the ANN model (precision = 73% and 65%, MAPE = 7.7% and 8.6%, RMSPE = 187 kcal/d and 200 kcal/d in the 2 independent datasets) for the prediction of REE in obese subjects.
We developed and validated an ANN model for the prediction of REE in obese subjects that is more precise and accurate than established REE predictive equations independent from BMI subgroups. For convenient use in clinical settings, we provide a simple ANN-REE calculator available at: https://www.crnh-rhone-alpes.fr/fr/ANN-REE-Calculator. Copyright © 2017 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
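The accuracy metrics quoted above are easy to reproduce. MAPE is the mean absolute percentage error; "precision" in REE validation studies commonly means the share of predictions falling within ±10% of the measured value, which is an assumption here about the authors' definition. An illustrative sketch:

```python
def mape(y, yhat):
    # Mean absolute percentage error; y values must be nonzero.
    return 100 * sum(abs(a - b) / a for a, b in zip(y, yhat)) / len(y)

def within_10pct(y, yhat):
    # Share (%) of predictions within +/-10% of the measured value, a common
    # "accurate prediction" criterion in REE studies (assumed definition).
    return 100 * sum(1 for a, b in zip(y, yhat) if abs(b - a) <= 0.10 * a) / len(y)
```

Applied to held-out subjects from each cross-validation fold, these two numbers summarize the average error and the fraction of clinically acceptable predictions, respectively.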
Indicators of Ecological Change
2005-03-01
[Table fragment omitted: plant species list with cover values] ...cross-validation procedure. The cross-validation analysis determines the percentage of observations correctly classified. In essence, a cross-
Hertegård, S; Dahlqvist, A; Goodyer, E
2006-07-01
The scarring model resulted in significant damage and elevated viscoelasticity of the lamina propria. Hyaluronan preparations may alter viscoelasticity in scarred rabbit vocal folds. Vocal fold scarring results in stiffness of the lamina propria and severe voice problems. The aims of this study were to examine the degree of scarring achieved in the experiment and to measure the viscoelastic properties after injection of hyaluronan in rabbit vocal folds. Twenty-two vocal folds from 15 New Zealand rabbits were scarred, and 8 vocal folds served as controls. After 8 weeks, 12 of the scarred vocal folds received injections with 2 types of cross-linked hyaluronan products and 10 scarred folds were injected with saline. After 11 more weeks the animals were sacrificed. After dissection, 15 vocal folds were frozen for viscoelastic measurements, whereas 14 vocal folds were prepared and stained. Measurements were made of the lamina propria thickness. Viscoelasticity was measured on intact vocal folds with a linear skin rheometer (LSR) adapted to laryngeal measurements. Measurements on the digitized slides showed a thickened lamina propria in the scarred samples as compared with the normal vocal folds (p<0.05). The viscoelastic analysis showed a tendency towards stiffening of the scarred vocal folds as compared with the normal controls (p=0.05). There was a large variation in stiffness between the two injected hyaluronan products.
Lateral propagation of folding and thrust faulting at Mahan, S.E. Iran
NASA Astrophysics Data System (ADS)
Walker, R. T.
2003-12-01
Folding identified near the town of Mahan in S.E. Iran has no record of historical activity, and yet there are clear geomorphological indications of recent fold growth, presumably driven by movements on underlying thrust faults. The structures at Mahan may still be capable of producing destructive earthquakes, posing a considerable hazard to local population centres. We describe a drainage evolution that shows the effect of lateral propagation of surface folding and the effect of tilting above an underlying thrust fault. River systems cross and incise through the fold segments. Each of these rivers shows a distinct deflection parallel to the fold axis. This deflection starts several kilometres into the hanging-wall of the underlying thrust fault. Remnants of several abandoned drainage channels and abandoned alluvial fans are preserved within the folds. The westward lateral propagation of folding is also suggested by an increase in relief and exposure of deeper stratigraphic layers across fold segments in the east of the system, implying a greater cumulative displacement in the east than in the west. The preservation of numerous dry valleys across the fold suggests a continual forcing of drainage around the nose of the growing fold, rather than an along-strike variation in slip-rate.
Zhang, Yueliang; Han, Yangchun; Yang, Qiong; Wang, Lihua; He, Peng; Liu, Zewen; Li, Zhong; Guo, Huifang; Fang, Jichao
2018-04-01
Cycloxaprid is a new oxabridged cis-configuration neonicotinoid insecticide, the resistance development potential and underlying resistance mechanism of which were investigated in the small brown planthopper, Laodelphax striatellus (Fallén), an important agricultural pest of rice. A cycloxaprid-resistant strain (YN-CPD) only achieved 10-fold higher resistance, in contrast to 106-fold higher resistance to buprofezin and 332-fold higher resistance to chlorpyrifos achieved after exposure to similar selection pressure, and the cycloxaprid selected line showed no cross-resistance to the buprofezin and chlorpyrifos-selected resistance strains. Moreover, we identified 10 nicotinic acetylcholine receptor (nAChR) subunits from the transcriptome of L. striatellus, and six segments had open reading frames (ORFs). While we did not find mutations in the nAChR genes of L. striatellus, subunits Lsα1 and Lsβ1 exhibited, respectively, 9.60-fold and 3.36-fold higher expression in the resistant strain, while Lsα8 exhibited 0.44-fold lower expression. Suppression of Lsα1 through ingestion of dsLsα1 led to an increase in susceptibility to cycloxaprid. The findings indicate that resistance to cycloxaprid develops slowly compared with resistance to other chemicals and without cross-resistance to chlorpyrifos or buprofezin; over-expressed Lsα1 is associated with low cycloxaprid resistance levels, but the importance of over-expressed Lsβ1 and reduced expression of Lsα8 could not be excluded. © 2017 Society of Chemical Industry.
Cross-Validation of Predictor Equations for Armor Crewman Performance
1980-01-01
Technical Report 447: Cross-Validation of Predictor Equations for Armor Crewman Performance. Anthony J. Maitland, Newell K. Eaton, and Janet F. Neft.
Statistical validation of normal tissue complication probability models.
Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis
2012-09-01
To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
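The permutation testing recommended in this abstract can be sketched in a few lines. The following is a minimal illustration with synthetic data and a logistic-regression stand-in, not the authors' NTCP implementation; feature and sample counts are arbitrary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for an NTCP-style dataset: 10 dose/volume-like features,
# with a binary complication outcome driven by the first feature
X = rng.standard_normal((100, 10))
y = (X[:, 0] + 0.5 * rng.standard_normal(100) > 0).astype(int)

def cv_auc(features, labels):
    """Mean cross-validated area under the ROC curve."""
    return cross_val_score(LogisticRegression(), features, labels,
                           cv=5, scoring="roc_auc").mean()

observed = cv_auc(X, y)

# Permutation test: shuffling the labels breaks any real association, giving
# a null distribution against which the observed performance is compared
null = [cv_auc(X, rng.permutation(y)) for _ in range(20)]
p_value = (1 + sum(s >= observed for s in null)) / (1 + len(null))
print(observed, p_value)
```

In practice many more than 20 permutations would be used; the small count here keeps the sketch fast.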
Cross-Validation of the Africentrism Scale.
ERIC Educational Resources Information Center
Kwate, Naa Oyo A.
2003-01-01
Cross-validated the Africentrism Scale, investigating the relationship between Africentrism and demographic variables in a diverse sample of individuals of African descent. Results indicated that the scale demonstrated solid internal consistency and convergent validity. Age and education related to Africentrism, with younger and less educated…
Gianni, Stefano; Jemth, Per
2014-07-01
The only experimental strategy to address the structure of folding transition states, the so-called Φ-value analysis, relies on the synergy between site-directed mutagenesis and the measurement of reaction kinetics. Despite its importance, Φ-value analysis has often been criticized and its power to pinpoint structural information has been questioned. In this hypothesis article, we demonstrate that comparing Φ values between proteins not only highlights the robustness of folding pathways but also provides, per se, a strong validation of the method. © 2014 International Union of Biochemistry and Molecular Biology.
Mu, Xi-Chao; Zhang, Wei; Wang, Li-Xiang; Zhang, Shuai; Zhang, Kai; Gao, Cong-Fen; Wu, Shun-Fan
2016-11-01
Three rice planthoppers, the brown planthopper, Nilaparvata lugens, the white-backed planthopper, Sogatella furcifera, and the small brown planthopper, Laodelphax striatellus, are important pests of cultivated rice in tropical and temperate Asia. They have caused severe economic loss and developed resistance to insecticides from most chemical classes. Dinotefuran is a third-generation neonicotinoid that possesses broad-spectrum and systemic insecticidal activity. We determined the susceptibility to dinotefuran of field populations from major rice production areas in China from 2013 to 2015. All the populations of S. furcifera and L. striatellus remained susceptible to dinotefuran (0.7- to 1.4-fold for S. furcifera and 1.1- to 3.4-fold for L. striatellus). However, most strains of N. lugens (except FQ15) collected in 2015 had developed moderate resistance to dinotefuran, with resistance ratios (RR) ranging from 23.1- to 100.0-fold. Cross-resistance studies showed that chlorpyrifos-resistant and buprofezin-resistant Sogatella furcifera, chlorpyrifos-resistant and fipronil-resistant L. striatellus, and imidacloprid-resistant and buprofezin-resistant Nilaparvata lugens exhibited negligible or no cross-resistance to dinotefuran. Synergism tests showed that piperonyl butoxide (PBO) produced a high synergism of dinotefuran effects in the DY15 and JS15 populations (2.14- and 2.52-fold, respectively). The obvious increase in resistance to dinotefuran in N. lugens indicates that insecticide resistance management strategies are urgently needed to prevent or delay further increase of insecticide resistance in N. lugens. Copyright © 2016 Elsevier B.V. All rights reserved.
Sierakowska, Matylda; Sierakowski, Stanisław; Sierakowska, Justyna; Horton, Mike; Ndosi, Mwidimi
2015-03-01
To undertake cross-cultural adaptation and validation of the educational needs assessment tool (ENAT) for use with people with rheumatoid arthritis (RA) and systemic sclerosis (SSc) in Poland. The study involved two main phases: (1) cross-cultural adaptation of the ENAT from English into Polish and (2) cross-cultural validation of the Polish Educational Needs Assessment Tool (Pol-ENAT). The first phase followed an established process of cross-cultural adaptation of self-report measures. The second phase involved completion of the Pol-ENAT by patients and subjecting the data to Rasch analysis to assess the construct validity, unidimensionality, internal consistency and cross-cultural invariance. An adequate conceptual equivalence was achieved following the adaptation process. The dataset for validation comprised a total of 278 patients, 237 (85.3 %) of which were female. In each disease group (145, RA and 133, SSc), the 7 domains of the Pol-ENAT were found to fit the Rasch model, χ²(df) = 16.953(14), p = 0.259 and 8.132(14), p = 0.882 for RA and SSc, respectively. Internal consistency of the Pol-ENAT was high (patient separation index = 0.85 and 0.89 for SSc and RA, respectively), and unidimensionality was confirmed. Cross-cultural differential item functioning (DIF) was detected in some subscales, and DIF-adjusted conversion tables were calibrated to enable cross-cultural comparison of data between Poland and the UK. Using a standard process in cross-cultural adaptation, conceptual equivalence was achieved between the original (UK) ENAT and the adapted Pol-ENAT. Fit to the Rasch model confirmed that the construct validity, unidimensionality and internal consistency of the ENAT had been preserved.
The unique N-terminal zinc finger of synaptotagmin-like protein 4 reveals FYVE structure.
Miyamoto, Kazuhide; Nakatani, Arisa; Saito, Kazuki
2017-12-01
Synaptotagmin-like protein 4 (Slp4), expressed in human platelets, is associated with dense granule release. Slp4 comprises the N-terminal zinc finger, Slp homology domain, and C2 domains. We synthesized a compact construct (the Slp4N peptide) corresponding to the Slp4 N-terminal zinc finger. Herein, we have determined the solution structure of the Slp4N peptide by nuclear magnetic resonance (NMR). Furthermore, experimental chemical modification of Cys residues revealed that the Slp4N peptide binds two zinc atoms to mediate proper folding. NMR data showed that eight Cys residues coordinate zinc atoms in a cross-brace fashion. The Simple Modular Architecture Research Tool database predicted the structure of Slp4N as a RING finger. However, the actual structure of the Slp4N peptide adopts a unique C4C4-type FYVE fold and is distinct from a RING fold. To create an artificial RING finger (ARF) with specific ubiquitin-conjugating enzyme (E2)-binding capability, cross-brace structures with eight zinc-ligating residues are needed as the scaffold. The cross-brace structure of the Slp4N peptide could be utilized as the scaffold for the design of ARFs. © 2017 The Protein Society.
RNAiFold 2.0: a web server and software to design custom and Rfam-based RNA molecules.
Garcia-Martin, Juan Antonio; Dotu, Ivan; Clote, Peter
2015-07-01
Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. The web server, source code and Linux binaries are publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold2.0. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Dunley, John E.; Brunner, Jay F.; Doerr, Michael D.; Beers, E. H.
2006-01-01
Insecticide bioassays of the leafrollers, Choristoneura rosaceana (Harris), and Pandemis pyrusana Kearfott (Lepidoptera: Tortricidae), were used to investigate resistance and cross-resistance between azinphosmethyl and other insecticides. Comparisons of field-collected populations with susceptible laboratory colonies of both leafroller species were made in 1996–97, prior to registration and field introduction of several of the insecticides, and were re-tested in 2000–2001 following several years of use in the field. Insecticides tested included azinphosmethyl, chlorpyrifos, methyl parathion, tebufenozide, methoxyfenozide, spinosad, indoxacarb, acetamiprid, Bacillus thuringiensis, and azadirachtin. Azinphosmethyl-susceptible laboratory colonies were used for comparison to field populations. Resistance to azinphosmethyl was found in all populations of C. rosaceana (5.2–26.8 fold) and P. pyrusana (8.4–24.9 fold) collected from commercial orchards. Cross-resistance between azinphosmethyl and the insect growth regulators tebufenozide and methoxyfenozide was found in all but one population of the two leafroller species. No cross-resistance was found to chlorpyrifos. Some of the populations tested were cross-resistant to spinosad and indoxacarb, but the responses to these materials were more variable. PMID:19537964
Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
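The conventional protocol that "cross-validation and cross-testing" aims to improve — cross-validation on a training portion to choose parameters, then a single evaluation on a held-out test set — can be sketched as follows. This is a minimal scikit-learn illustration with synthetic data; the classifier and parameter grid are arbitrary choices, not those of the study:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for the recordings; 200 samples, 20 features
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# The test set is split off first and never touched during parameter search
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Cross-validation on the training portion selects the regularization strength C
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_tr, y_tr)

# Generalization performance is estimated once, on the untouched test set
print(search.best_params_, search.score(X_te, y_te))
```

With limited data, the quarter of the samples reserved for testing is exactly the statistical power the proposed approach tries to reclaim.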
Methods to compute reliabilities for genomic predictions of feed intake
USDA-ARS?s Scientific Manuscript database
For new traits without historical reference data, cross-validation is often the preferred method to validate reliability (REL). Time truncation is less useful because few animals gain substantial REL after the truncation point. Accurate cross-validation requires separating genomic gain from pedigree...
Sekhar, Ashok; Vallurupalli, Pramodh; Kay, Lewis E
2012-11-20
Friction plays a critical role in protein folding. Frictional forces originating from random solvent and protein fluctuations both retard motion along the folding pathway and activate protein molecules to cross free energy barriers. Studies of friction thus may provide insights into the driving forces underlying protein conformational dynamics. However, the molecular origin of friction in protein folding remains poorly understood because, with the exception of the native conformer, there generally is little detailed structural information on the other states participating in the folding process. Here, we study the folding of the four-helix bundle FF domain that proceeds via a transiently formed, sparsely populated compact on-pathway folding intermediate whose structure was elucidated previously. Because the intermediate is stabilized by both native and nonnative interactions, friction in the folding transition between intermediate and folded states is expected to arise from intrachain reorganization in the protein. However, the viscosity dependencies of rates of folding from or unfolding to the intermediate, as established by relaxation dispersion NMR spectroscopy, clearly indicate that contributions from internal friction are small relative to those from solvent, so solvent frictional forces drive the folding process. Our results emphasize the importance of solvent dynamics in mediating the interconversion between protein configurations, even those that are highly compact, and in equilibrium folding/unfolding fluctuations in general.
NASA Astrophysics Data System (ADS)
Hamada, Sh.
2018-03-01
Available experimental data for protons elastically scattered from 14N and 16O target nuclei are reanalyzed within the framework of the single folding optical potential (SFOP) model. In this model, the real part of the potential is derived on the basis of a single folding potential. The renormalization factor N_r is extracted for the two aforementioned nuclear systems. Theoretical calculations fairly reproduce the experimental data over the whole angular range. The energy dependence of the real and imaginary volume integrals as well as the reaction cross sections is discussed.
Crossed wires: 3D genome misfolding in human disease.
Norton, Heidi K; Phillips-Cremins, Jennifer E
2017-11-06
Mammalian genomes are folded into unique topological structures that undergo precise spatiotemporal restructuring during healthy development. Here, we highlight recent advances in our understanding of how the genome folds inside the 3D nucleus and how these folding patterns are miswired during the onset and progression of mammalian disease states. We discuss potential mechanisms underlying the link among genome misfolding, genome dysregulation, and aberrant cellular phenotypes. We also discuss cases in which the endogenous 3D genome configurations in healthy cells might be particularly susceptible to mutation or translocation. Together, these data support an emerging model in which genome folding and misfolding are critically linked to the onset and progression of a broad range of human diseases. © 2017 Norton and Phillips-Cremins.
Tiong, H Y; Goldfarb, D A; Kattan, M W; Alster, J M; Thuita, L; Yu, C; Wee, A; Poggio, E D
2009-03-01
We developed nomograms that predict transplant renal function at 1 year (Modification of Diet in Renal Disease equation [estimated glomerular filtration rate]) and 5-year graft survival after living donor kidney transplantation. Data for living donor renal transplants were obtained from the United Network for Organ Sharing registry for 2000 to 2003. Nomograms were designed using linear or Cox regression models to predict 1-year estimated glomerular filtration rate and 5-year graft survival based on pretransplant information including demographic factors, immunosuppressive therapy, immunological factors and organ procurement technique. A third nomogram was constructed to predict 5-year graft survival using additional information available by 6 months after transplantation. These data included delayed graft function, any treated rejection episodes and the 6-month estimated glomerular filtration rate. The nomograms were internally validated using 10-fold cross-validation. The renal function nomogram had an r-square value of 0.13. It worked best when predicting estimated glomerular filtration rate values between 50 and 70 ml per minute per 1.73 m². The 5-year graft survival nomograms had a concordance index of 0.71 for the pretransplant nomogram and 0.78 for the 6-month posttransplant nomogram. Calibration was adequate for all nomograms. Nomograms based on data from the United Network for Organ Sharing registry have been validated to predict the 1-year estimated glomerular filtration rate and 5-year graft survival. These nomograms may facilitate individualized patient care in living donor kidney transplantation.
Adam, B A; Smith, R N; Rosales, I A; Matsunami, M; Afzali, B; Oura, T; Cosimi, A B; Kawai, T; Colvin, R B; Mengel, M
2017-11-01
Molecular testing represents a promising adjunct for the diagnosis of antibody-mediated rejection (AMR). Here, we apply a novel gene expression platform in sequential formalin-fixed paraffin-embedded samples from nonhuman primate (NHP) renal transplants. We analyzed 34 previously described gene transcripts related to AMR in humans in 197 archival NHP samples, including 102 from recipients that developed chronic AMR, 80 from recipients without AMR, and 15 normal native nephrectomies. Three endothelial genes (VWF, DARC, and CAV1), derived from 10-fold cross-validation receiver operating characteristic curve analysis, demonstrated excellent discrimination between AMR and non-AMR samples (area under the curve = 0.92). This three-gene set correlated with classic features of AMR, including glomerulitis, capillaritis, glomerulopathy, C4d deposition, and DSAs (r = 0.39-0.63, p < 0.001). Principal component analysis confirmed the association between three-gene set expression and AMR and highlighted the ambiguity of v lesions and ptc lesions between AMR and T cell-mediated rejection (TCMR). Elevated three-gene set expression corresponded with the development of immunopathological evidence of rejection and often preceded it. Many recipients demonstrated mixed AMR and TCMR, suggesting that this represents the natural pattern of rejection. These data provide NHP animal model validation of recent updates to the Banff classification including the assessment of molecular markers for diagnosing AMR. © 2017 The American Society of Transplantation and the American Society of Transplant Surgeons.
Genomic Prediction Accounting for Residual Heteroskedasticity
Ou, Zhining; Tempelman, Robert J.; Steibel, Juan P.; Ernst, Catherine W.; Bates, Ronald O.; Bello, Nora M.
2015-01-01
Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. PMID:26564950
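The predictive-ability criterion used in the abstract above — the correlation between predicted and observed phenotypes across the folds of a five-fold cross-validation — can be sketched generically. Ridge regression stands in for a WGP model and the data are simulated, not the swine resource population:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))                                 # SNP-marker-like predictors
y = X[:, :5] @ rng.standard_normal(5) + rng.standard_normal(n)  # phenotype with 5 true signals

# Collect out-of-fold predictions from a five-fold cross-validation
preds = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    preds[test] = Ridge(alpha=1.0).fit(X[train], y[train]).predict(X[test])

# Predictive ability: correlation between predicted and observed phenotypes
r = np.corrcoef(preds, y)[0, 1]
print(round(r, 3))
```

Each phenotype is predicted exactly once, by a model that never saw it during training, so the correlation is an honest estimate of out-of-sample accuracy.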
Uniting Statistical and Individual-Based Approaches for Animal Movement Modelling
Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel
2014-01-01
The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, because field work to access internal states is practically impossible and current statistical models cannot produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent-pattern validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047
A Planet Hunters Search of the Kepler TCE Inventory
NASA Astrophysics Data System (ADS)
Schwamb, Meg; Lintott, Chris; Fischer, Debra; Smith, Arfon; Boyajian, Tabetha; Brewer, John; Giguere, Matt; Lynn, Stuart; Schawinski, Kevin; Simpson, Rob; Wang, Ji
2013-07-01
NASA's Kepler spacecraft has spent the past 4 years monitoring ~160,000 stars for the signatures of transiting exoplanets. Planet Hunters (http://www.planethunters.org), part of the Zooniverse (http://www.zooniverse.org) collection of citizen science projects, uses the power of human pattern recognition via the World Wide Web to identify transits in the Kepler public data. We have demonstrated the success of a citizen science approach with the project's discoveries including PH1 b, a transiting circumbinary planet in a four-star system, and over 20 previously unknown planet candidates. The Kepler team has released the list of 18,406 potential transit signals or threshold-crossing events (TCEs) identified in Quarters 1-12 (~1000 days) by their automated Transit Planet Search (TPS) algorithm. The majority of these detections found by TPS are triggered by transient events and are not valid planet candidates. To identify planetary candidates from the detected TCEs, a human review of the validation reports, generated by the Kepler pipeline for each TCE, is performed by several Kepler team members. We have undertaken an independent crowd-sourced effort to perform a systematic search of the Kepler Q1-12 TCE list. With the Internet we can obtain multiple assessments of each TCE's data validation report. Planet Hunters volunteers evaluate whether a transit is visible in the Kepler light curve folded on the expected period identified by TPS. We present the first results of this analysis.
Classification of burn wounds using support vector machines
NASA Astrophysics Data System (ADS)
Acha, Begona; Serrano, Carmen; Palencia, Sergio; Murillo, Juan Jose
2004-05-01
The purpose of this work is to improve a previous method developed by the authors for classifying burn wounds by depth. The inputs of the system are color and texture information, as these are the characteristics observed by physicians in order to give a diagnosis. Our previous work consisted of segmenting the burn wound from the rest of the image and classifying the burn by depth. In this paper we focus on the classification problem only. We previously proposed using a Fuzzy-ARTMAP neural network (NN); however, we may take advantage of powerful new classification tools such as Support Vector Machines (SVM). We apply a five-fold cross-validation scheme to divide the database into training and validation sets. Then, we apply a feature selection method for each classifier, which gives us the set of features that yields the smallest classification error for each classifier. Features used for classification are first-order statistical parameters extracted from the L*, u* and v* color components of the image. The feature selection algorithms used are the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) methods. As the data of the problem faced here are not linearly separable, the SVM was trained using several different kernels. The validation process shows that the SVM method, when using a Gaussian kernel of variance 1, outperforms the classification results obtained with the rest of the classifiers, yielding a classification error rate of 0.7%, whereas the Fuzzy-ARTMAP NN attained 1.6%.
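The training/validation scheme described above can be approximated with a short scikit-learn sketch. Synthetic features stand in for the first-order color statistics, and the RBF `gamma` value is one common convention for a Gaussian kernel of unit variance (gamma = 1/(2·sigma²)); none of this is the authors' code:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for first-order color statistics of burn-wound images
X, y = make_classification(n_samples=150, n_features=8, random_state=1)

# Gaussian (RBF) kernel of unit variance under the convention
# gamma = 1 / (2 * sigma**2), so sigma = 1 gives gamma = 0.5
clf = SVC(kernel="rbf", gamma=0.5)

# Five-fold cross-validation, mirroring the study's training/validation split
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Comparing the mean cross-validated accuracy across kernels (linear, polynomial, RBF with several widths) reproduces the kind of classifier comparison the abstract reports.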
Küçükdeveci, Ayse A; Sahin, Hülya; Ataman, Sebnem; Griffiths, Bridget; Tennant, Alan
2004-02-15
Guidelines have been established for cross-cultural adaptation of outcome measures. However, invariance across cultures must also be demonstrated through analysis of Differential Item Functioning (DIF). This is tested in the context of a Turkish adaptation of the Health Assessment Questionnaire (HAQ). Internal construct validity of the adapted HAQ is assessed by Rasch analysis; reliability, by internal consistency and the intraclass correlation coefficient; external construct validity, by association with impairments and American College of Rheumatology functional stages. Cross-cultural validity is tested through DIF by comparison with data from the UK version of the HAQ. The adapted version of the HAQ demonstrated good internal construct validity through fit of the data to the Rasch model (mean item fit 0.205; SD 0.998). Reliability was excellent (alpha = 0.97) and external construct validity was confirmed by expected associations. DIF for culture was found in only 1 item. Cross-cultural validity was found to be sufficient for use in international studies between the UK and Turkey. Future adaptation of instruments should include analysis of DIF at the field testing stage in the adaptation process.
Shi, Jade; Nobrega, R. Paul; Schwantes, Christian; ...
2017-03-08
The dynamics of globular proteins can be described in terms of transitions between a folded native state and less-populated intermediates, or excited states, which can play critical roles in both protein folding and function. Excited states are by definition transient species, and therefore are difficult to characterize using current experimental techniques. We report an atomistic model of the excited state ensemble of a stabilized mutant of an extensively studied flavodoxin fold protein CheY. We employed a hybrid simulation and experimental approach in which an aggregate 42 milliseconds of all-atom molecular dynamics were used as an informative prior for the structure of the excited state ensemble. The resulting prior was then refined against small-angle X-ray scattering (SAXS) data employing an established method (EROS). The most striking feature of the resulting excited state ensemble was an unstructured N-terminus stabilized by non-native contacts in a conformation that is topologically simpler than the native state. We then predict incisive single molecule FRET experiments, using these results, as a means of model validation. Our study demonstrates the paradigm of uniting simulation and experiment in a statistical model to study the structure of protein excited states and rationally design validating experiments.
NASA Astrophysics Data System (ADS)
Shi, Jade; Nobrega, R. Paul; Schwantes, Christian; Kathuria, Sagar V.; Bilsel, Osman; Matthews, C. Robert; Lane, T. J.; Pande, Vijay S.
2017-03-01
The dynamics of globular proteins can be described in terms of transitions between a folded native state and less-populated intermediates, or excited states, which can play critical roles in both protein folding and function. Excited states are by definition transient species, and therefore are difficult to characterize using current experimental techniques. Here, we report an atomistic model of the excited state ensemble of a stabilized mutant of an extensively studied flavodoxin fold protein CheY. We employed a hybrid simulation and experimental approach in which an aggregate 42 milliseconds of all-atom molecular dynamics were used as an informative prior for the structure of the excited state ensemble. This prior was then refined against small-angle X-ray scattering (SAXS) data employing an established method (EROS). The most striking feature of the resulting excited state ensemble was an unstructured N-terminus stabilized by non-native contacts in a conformation that is topologically simpler than the native state. Using these results, we then predict incisive single molecule FRET experiments as a means of model validation. This study demonstrates the paradigm of uniting simulation and experiment in a statistical model to study the structure of protein excited states and rationally design validating experiments.
Predicting RNA pseudoknot folding thermodynamics
Cao, Song; Chen, Shi-Jie
2006-01-01
Based on the experimentally determined atomic coordinates for RNA helices and the self-avoiding walks of the P (phosphate) and C4 (carbon) atoms in the diamond lattice for the polynucleotide loop conformations, we derive a set of conformational entropy parameters for RNA pseudoknots. Based on the entropy parameters, we develop a folding thermodynamics model that enables us to compute the sequence-specific RNA pseudoknot folding free energy landscape and thermodynamics. The model is validated through extensive experimental tests both for the native structures and for the folding thermodynamics. The model predicts strong sequence-dependent helix-loop competitions in the pseudoknot stability and the resultant conformational switches between different hairpin and pseudoknot structures. For instance, for the pseudoknot domain of human telomerase RNA, a native-like and a misfolded hairpin intermediate are found to coexist on the (equilibrium) folding pathways, and the interplay between the stabilities of these intermediates causes the conformational switch that may underlie a human telomerase disease. PMID:16709732
Gupta, Punkaj; Rettiganti, Mallikarjuna; Gossett, Jeffrey M; Daufeldt, Jennifer; Rice, Tom B; Wetzel, Randall C
2018-01-01
To create a novel tool to predict favorable neurologic outcomes during ICU stay among children with critical illness. Logistic regression models using adaptive lasso methodology were used to identify independent factors associated with favorable neurologic outcomes. A mixed effects logistic regression model was used to create the final prediction model including all predictors selected from the lasso model. Model validation was performed using a 10-fold internal cross-validation approach. Virtual Pediatric Systems (VPS, LLC, Los Angeles, CA) database. Patients less than 18 years old admitted to one of the participating ICUs in the Virtual Pediatric Systems database were included (2009-2015). None. A total of 160,570 patients from 90 hospitals qualified for inclusion. Of these, 1,675 patients (1.04%) were associated with a decline in Pediatric Cerebral Performance Category scale by at least 2 between ICU admission and ICU discharge (unfavorable neurologic outcome). The independent factors associated with unfavorable neurologic outcome included higher weight at ICU admission, higher Pediatric Index of Mortality-2 score at ICU admission, cardiac arrest, stroke, seizures, head/nonhead trauma, use of conventional mechanical ventilation and high-frequency oscillatory ventilation, prolonged ICU length of stay, and prolonged use of mechanical ventilation. The presence of chromosomal anomaly, cardiac surgery, and utilization of nitric oxide were associated with favorable neurologic outcome. The final online prediction tool can be accessed at https://soipredictiontool.shinyapps.io/GNOScore/. In an internal validation sample, our model predicted favorable neurologic outcomes in 139,688 patients, compared with 139,591 patients observed to have favorable neurologic outcomes. The area under the receiver operating characteristic curve for the validation model was 0.90.
This proposed prediction tool combines 20 risk factors into a single probability to predict favorable neurologic outcome during ICU stay among children with critical illness. Future studies should seek external validation and improved discrimination of this prediction tool.
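The adaptive lasso used above for variable selection can be sketched in pure Python as a two-step procedure: a near-unpenalized initial fit supplies per-coefficient weights, and a second, weighted lasso is solved by coordinate descent. The toy data, the weight stabilizer (1e-3), and the fixed penalty are illustrative assumptions; the paper tunes the penalty by 10-fold cross-validation.

```python
import random

def soft(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, w=None, n_iter=200):
    """Coordinate-descent lasso with per-coefficient penalty weights w."""
    n, p = len(X), len(X[0])
    w = w or [1.0] * p
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the partial residual (beta_j excluded)
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                      for k in range(p) if k != j)) for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft(rho, lam * w[j]) / z
    return beta

def adaptive_lasso(X, y, lam):
    """Two-step adaptive lasso: near-OLS initial fit, then reweighted lasso.
    Small initial coefficients get large weights and are pushed exactly to zero."""
    init = lasso_cd(X, y, 1e-6)                    # tiny penalty ~ OLS
    w = [1.0 / (abs(b) + 1e-3) for b in init]      # adaptive weights
    return lasso_cd(X, y, lam, w)

# Toy data: y depends on feature 0 only; feature 1 is irrelevant.
random.seed(2)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]
y = [2 * x[0] + random.gauss(0, 0.1) for x in X]
beta = adaptive_lasso(X, y, lam=5.0)
```

The oracle property the head abstract discusses rests on exactly this reweighting: the irrelevant coefficient is thresholded to exactly zero while the true signal is only lightly shrunk.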
The Cross Validation of the Attitudes toward Mainstreaming Scale (ATMS).
ERIC Educational Resources Information Center
Berryman, Joan D.; Neal, W. R. Jr.
1980-01-01
Reliability and factorial validity of the Attitudes Toward Mainstreaming Scale was supported in a cross-validation study with teachers. Three factors emerged: learning capability, general mainstreaming, and traditional limiting disabilities. Factor intercorrelations varied from .42 to .55; correlations between total scores and individual factors…
Electroproduction of pπ+π- off protons at 0.2
NASA Astrophysics Data System (ADS)
Fedotov, G. V.; Mokeev, V. I.; Burkert, V. D.; Elouadrhiri, L.; Golovatch, E. N.; Ishkhanov, B. S.; Isupov, E. L.; Shvedunov, N. V.; Adams, G.; Amaryan, M. J.; Ambrozewicz, P.; Anghinolfi, M.; Asavapibhop, B.; Asryan, G.; Avakian, H.; Baghdasaryan, H.; Baillie, N.; Ball, J. P.; Baltzell, N. A.; Batourine, V.; Battaglieri, M.; Bedlinskiy, I.; Bektasoglu, M.; Bellis, M.; Benmouna, N.; Biselli, A. S.; Bonner, B. E.; Bouchigny, S.; Boiarinov, S.; Bradford, R.; Branford, D.; Brooks, W. K.; Bültmann, S.; Butuceanu, C.; Calarco, J. R.; Careccia, S. L.; Carman, D. S.; Carnahan, B.; Chen, S.; Cole, P. L.; Coltharp, P.; Corvisiero, P.; Crabb, D.; Crannell, H.; Crede, V.; Cummings, J. P.; Dashyan, N. B.; Sanctis, E. De; Vita, R. De; Degtyarenko, P. V.; Denizli, H.; Dennis, L.; Dharmawardane, K. V.; Dickson, R.; Djalali, C.; Dodge, G. E.; Donnelly, J.; Doughty, D.; Dugger, M.; Dytman, S.; Dzyubak, O. P.; Egiyan, H.; Egiyan, K. S.; Eugenio, P.; Fatemi, R.; Feuerbach, R. J.; Forest, T. A.; Funsten, H.; Gavalian, G.; Gevorgyan, N. G.; Gilfoyle, G. P.; Giovanetti, K. L.; Girod, F. X.; Goetz, J. T.; Gothe, R. W.; Griffioen, K. A.; Guidal, M.; Guillo, M.; Guler, N.; Guo, L.; Gyurjyan, V.; Hadjidakis, C.; Hardie, J.; Hassall, N.; Hersman, F. W.; Hicks, K.; Hleiqawi, I.; Holtrop, M.; Hu, J.; Huertas, M.; Hyde, C. E.; Ilieva, Y.; Ireland, D. G.; Ito, M. M.; Jenkins, D.; Jo, H. S.; Joo, K.; Juengst, H. G.; Kellie, J. D.; Khandaker, M.; Kim, K. Y.; Kim, K.; Kim, W.; Klein, A.; Klein, F. J.; Klimenko, A.; Klusman, M.; Krahn, Z.; Kramer, L. H.; Kubarovsky, V.; Kuhn, J.; Kuhn, S. E.; Kuleshov, S.; Lachniet, J.; Laget, J. M.; Langheinrich, J.; Lawrence, D.; Lee, T.; Livingston, K.; Markov, N.; McCracken, M.; McKinnon, B.; McNabb, J. W. C.; Mecking, B. A.; Mestayer, M. D.; Meyer, C. A.; Mibe, T.; Mikhailov, K.; Mineeva, T.; Minehart, R.; Mirazita, M.; Miskimen, R.; Moriya, K.; Morrow, S. A.; Mueller, J.; Mutchler, G. S.; Nadel-Turonski, P.; Nasseripour, R.; Niccolai, S.; Niculescu, G.; Niculescu, I.; Niczyporuk, B. B.; Niyazov, R. A.; O'Rielly, G. V.; Osipenko, M.; Ostrovidov, A. I.; Park, K.; Pasyuk, E.; Paterson, C.; Pierce, J.; Pivnyuk, N.; Pocanic, D.; Pogorelko, O.; Pozdniakov, S.; Price, J. W.; Prok, Y.; Protopopescu, D.; Raue, B. A.; Ricco, G.; Ripani, M.; Ritchie, B. G.; Rosner, G.; Rossi, P.; Rowntree, D.; Rubin, P. D.; Sabatié, F.; Salgado, C.; Santoro, J. P.; Sapunenko, V.; Schumacher, R. A.; Serov, V. S.; Sharabian, Y. G.; Sharov, D.; Shaw, J.; Smith, E. S.; Smith, L. C.; Sober, D. I.; Stavinsky, A.; Stepanyan, S.; Stokes, B. E.; Stoler, P.; Stopani, K.; Strauch, S.; Taiuti, M.; Taylor, S.; Tedeschi, D. J.; Thompson, R.; Tkabladze, A.; Tkachenko, S.; Todor, L.; Tur, C.; Ungaro, M.; Vineyard, M. F.; Vlassov, A. V.; Weinstein, L. B.; Weygand, D. P.; Williams, M.; Wolin, E.; Wood, M. H.; Yegneswaran, A.; Zana, L.; Zhang, J.
2009-01-01
This paper reports on the most comprehensive data set obtained on differential and fully integrated cross sections for the process ep→e'pπ+π-. The data were collected with the CLAS detector at Jefferson Laboratory. Measurements were carried out in the as yet unexplored kinematic region of photon virtuality 0.2
Thomas, Reuben; Thomas, Russell S.; Auerbach, Scott S.; Portier, Christopher J.
2013-01-01
Background: Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. Objectives: To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Methods: Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Results: Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Conclusions: Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species. PMID:23737943
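AUC figures like those quoted here (0.66, 0.74, 0.91) can be computed directly from out-of-fold scores via the Mann-Whitney form of the AUC. A minimal sketch; the scores and 0/1 labels below are invented for illustration:

```python
def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive case outscores a randomly chosen negative case
    (ties count one half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly ranked set gives 1.0, a perfectly reversed ranking gives 0.0, and uninformative scores give 0.5.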
Lee, Kyoungyeul; Lee, Minho; Kim, Dongsup
2017-12-28
The identification of target molecules is important for understanding the mechanism of "target deconvolution" in phenotypic screening and "polypharmacology" of drugs. Because conventional methods of identifying targets require time and cost, in-silico target identification has been considered an alternative solution. One of the well-known in-silico methods of identifying targets involves structure-activity relationships (SARs). SARs have advantages such as low computational cost and high feasibility; however, the data dependency in the SAR approach causes imbalance of active data and ambiguity of inactive data throughout targets. We developed a ligand-based virtual screening model comprising 1121 target SAR models built using a random forest algorithm. The performance of each target model was tested by employing the ROC curve and the mean score using an internal five-fold cross-validation. Moreover, recall rates for top-k targets were calculated to assess the performance of target ranking. A benchmark model using an optimized sampling method and parameters was examined via an external validation set. The result shows recall rates of 67.6% and 73.9% for top-11 (1% of the total targets) and top-33, respectively. We provide a website for users to search the top-k targets for query ligands, available publicly at http://rfqsar.kaist.ac.kr . The target models that we built can be used both for predicting the activity of ligands toward each target and for ranking candidate targets for a query ligand using a unified scoring scheme. The scores are additionally fitted to probabilities so that users can estimate how likely a ligand-target interaction is to be active. The user interface of our web site is user friendly and intuitive, offering useful information and cross references.
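The top-k recall used above to evaluate target ranking can be sketched as follows; the ranked candidate lists and annotated true targets are invented for illustration:

```python
def topk_recall(ranked_lists, true_targets, k):
    """Fraction of query ligands whose annotated target appears among the
    top-k candidates returned by the per-target scoring models."""
    hits = sum(truth in ranks[:k]
               for ranks, truth in zip(ranked_lists, true_targets))
    return hits / len(true_targets)

# Hypothetical rankings for two query ligands and their annotated targets.
ranked = [["t1", "t2", "t3"], ["t4", "t5", "t6"]]
truth = ["t2", "t6"]
```

For example, `topk_recall(ranked, truth, 2)` counts only the first ligand as a hit, while widening to k=3 recovers both.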
Detecting intentional insulin omission for weight loss in girls with type 1 diabetes mellitus.
Pinhas-Hamiel, Orit; Hamiel, Uri; Greenfield, Yuval; Boyko, Valentina; Graph-Barel, Chana; Rachmiel, Marianna; Lerner-Geva, Liat; Reichman, Brian
2013-12-01
Intentional insulin omission is a unique inappropriate compensatory behavior that occurs in patients with type 1 diabetes mellitus, mostly in females, who omit or restrict their required insulin doses in order to lose weight. Diagnosis of this underlying disorder is difficult. We aimed to use clinical and laboratory criteria to create an algorithm to assist in the detection of intentional insulin omission. The distribution of HbA1c levels from 287 (181 females) patients with type 1 diabetes was used as a reference. Data from 26 patients with type 1 diabetes and intentional insulin omission were analysed. The Weka (Waikato Environment for Knowledge Analysis) machine learning software's decision tree classifier with 10-fold cross-validation was used to develop prediction models. Model performance was assessed by cross-validation in a further 43 patients. Adolescents with intentional insulin omission were discriminated by: female sex, HbA1c>9.2%, more than 20% of HbA1c measurements above the 90th percentile, the mean of the 3 highest delta HbA1c z-scores>1.28, current age and age at diagnosis. The models developed showed good discrimination (sensitivity and specificity 0.88 and 0.74, respectively). The external test dataset revealed good performance of the model, with a sensitivity and specificity of 1.00 and 0.97, respectively. Using data mining methods we developed a clinical prediction model to determine an individual's probability of intentionally omitting insulin. This model provides a decision support system for the detection of intentional insulin omission for weight loss in adolescent females with type 1 diabetes mellitus. Copyright © 2013 Wiley Periodicals, Inc.
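As an illustration of the evaluation setup above, here is a pure-Python 10-fold cross-validation of a one-feature decision stump, reporting pooled sensitivity and specificity. The HbA1c-like values and the simple threshold search are invented stand-ins for Weka's decision tree on the full feature set:

```python
import random

def kfold(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_sens_spec(values, labels, k=10):
    """k-fold CV of a decision stump: each fold's threshold is the training
    midpoint maximizing training accuracy; sensitivity and specificity are
    pooled over the held-out folds."""
    tp = fp = tn = fn = 0
    for fold in kfold(len(values), k):
        train = [i for i in range(len(values)) if i not in fold]
        cands = sorted(values[i] for i in train)
        mids = [(a + b) / 2 for a, b in zip(cands, cands[1:])]
        best = max(mids, key=lambda t: sum((values[i] > t) == labels[i] for i in train))
        for i in fold:
            pred = values[i] > best
            if pred and labels[i]:
                tp += 1
            elif pred:
                fp += 1
            elif labels[i]:
                fn += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented HbA1c-like data: controls around 7.5%, insulin omitters around 10%.
random.seed(3)
values = [random.gauss(7.5, 0.5) for _ in range(30)] + \
         [random.gauss(10.0, 0.5) for _ in range(30)]
labels = [False] * 30 + [True] * 30
sensitivity, specificity = cv_sens_spec(values, labels)
```

Because the threshold is re-fit inside each fold, the pooled sensitivity/specificity estimate is honest in the same sense as the paper's cross-validated figures.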
Thomas, Reuben; Thomas, Russell S; Auerbach, Scott S; Portier, Christopher J
2013-01-01
Several groups have employed genomic data from subchronic chemical toxicity studies in rodents (90 days) to derive gene-centric predictors of chronic toxicity and carcinogenicity. Genes are annotated to belong to biological processes or molecular pathways that are mechanistically well understood and are described in public databases. To develop a molecular pathway-based prediction model of long term hepatocarcinogenicity using 90-day gene expression data and to evaluate the performance of this model with respect to both intra-species, dose-dependent and cross-species predictions. Genome-wide hepatic mRNA expression was retrospectively measured in B6C3F1 mice following subchronic exposure to twenty-six (26) chemicals (10 were positive, 2 equivocal and 14 negative for liver tumors) previously studied by the US National Toxicology Program. Using these data, a pathway-based predictor model for long-term liver cancer risk was derived using random forests. The prediction model was independently validated on test sets associated with liver cancer risk obtained from mice, rats and humans. Using 5-fold cross validation, the developed prediction model had reasonable predictive performance with the area under receiver-operator curve (AUC) equal to 0.66. The developed prediction model was then used to extrapolate the results to data associated with rat and human liver cancer. The extrapolated model worked well for both extrapolated species (AUC value of 0.74 for rats and 0.91 for humans). The prediction models implied a balanced interplay between all pathway responses leading to carcinogenicity predictions. Pathway-based prediction models estimated from sub-chronic data hold promise for predicting long-term carcinogenicity and also for its ability to extrapolate results across multiple species.
NASA Astrophysics Data System (ADS)
Murugesan, Gowtham; Saghafi, Behrouz; Davenport, Elizabeth; Wagner, Ben; Urban, Jillian; Kelley, Mireille; Jones, Derek; Powers, Alex; Whitlow, Christopher; Stitzel, Joel; Maldjian, Joseph; Montillo, Albert
2018-02-01
The effect of repetitive sub-concussive head impact exposure in contact sports like American football on brain health is poorly understood, especially in the understudied populations of youth and high school players. These players, aged 9-18 years, may be particularly susceptible to impact exposure as their brains are undergoing rapid maturation. This study helps fill the void by quantifying the association between head impact exposure and functional connectivity, an important aspect of brain health measurable via resting-state fMRI (rs-fMRI). The contributions of this paper are threefold. First, the data from two separate studies (youth and high school) are combined to form a high-powered analysis with 60 players. These players experience head accelerations within overlapping ranges of impact exposure, making their combination particularly appropriate. Second, multiple features are extracted from rs-fMRI and tested for their association with impact exposure. One type of feature is the power spectral density decomposition of intrinsic, spatially distributed networks extracted via independent components analysis (ICA). Another feature type is the functional connectivity between brain regions often associated with mild traumatic brain injury (mTBI). Third, multiple supervised machine learning algorithms are evaluated for their stability and predictive accuracy in a low-bias, nested cross-validation modeling framework. Each classifier predicts whether a player sustained low or high levels of head impact exposure. The nested cross-validation reveals similarly high classification performance across the feature types, and the Support Vector, Extremely randomized trees, and Gradboost classifiers achieve F1-scores of up to 75%.
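The nested cross-validation framework mentioned above can be sketched in pure Python: an inner loop chooses a hyperparameter using only the outer-training data, and the outer loop scores that choice on held-out data, keeping the performance estimate nearly unbiased. Everything below is illustrative; the "model" is a toy threshold rule with no trainable parameters, so the inner loop simply scores each candidate on its validation folds.

```python
import random

def folds(idx, k, seed=0):
    """Deal a list of indices into k shuffled folds."""
    idx = idx[:]
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def accuracy(theta, xs, ys, idx):
    """Accuracy of the rule `predict positive if x > theta` on given indices."""
    return sum((xs[i] > theta) == ys[i] for i in idx) / len(idx)

def nested_cv(xs, ys, grid, outer_k=5, inner_k=3):
    """Nested CV: the inner loop picks theta from `grid` using only
    outer-training data; the outer loop estimates generalization accuracy."""
    all_idx = list(range(len(xs)))
    outer_scores = []
    for test in folds(all_idx, outer_k):
        train = [i for i in all_idx if i not in test]
        def inner_score(theta):
            return sum(accuracy(theta, xs, ys, val)
                       for val in folds(train, inner_k, seed=1)) / inner_k
        best_theta = max(grid, key=inner_score)
        outer_scores.append(accuracy(best_theta, xs, ys, test))
    return sum(outer_scores) / len(outer_scores)

# Invented 1-D feature: low-exposure players near 0, high-exposure near 2.5.
random.seed(4)
xs = [random.gauss(0.0, 0.4) for _ in range(25)] + \
     [random.gauss(2.5, 0.4) for _ in range(25)]
ys = [False] * 25 + [True] * 25
score = nested_cv(xs, ys, grid=[-1.0, 0.0, 0.75, 1.25, 3.0])
```

The key design point is that the test fold never influences the hyperparameter choice; with a real classifier, `inner_score` would also re-fit the model on each inner-training split.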
Binquet, C; Abrahamowicz, M; Mahboubi, A; Jooste, V; Faivre, J; Bonithon-Kopp, C; Quantin, C
2008-12-30
Flexible survival models, which avoid assumptions about hazards proportionality (PH) or linearity of continuous covariates' effects, bring the issues of model selection to a new level of complexity. Each 'candidate covariate' requires inter-dependent decisions regarding (i) its inclusion in the model, and representation of its effects on the log hazard as (ii) either constant over time or time-dependent (TD) and, for continuous covariates, (iii) either loglinear or non-loglinear (NL). Moreover, 'optimal' decisions for one covariate depend on the decisions regarding others. Thus, some efficient model-building strategy is necessary. We carried out an empirical study of the impact of the model selection strategy on the estimates obtained in flexible multivariable survival analyses of prognostic factors for mortality in 273 gastric cancer patients. We used 10 different strategies to select alternative multivariable parametric as well as spline-based models, allowing flexible modeling of non-parametric (TD and/or NL) effects. We employed 5-fold cross-validation to compare the predictive ability of alternative models. All flexible models indicated significant non-linearity and changes over time in the effect of age at diagnosis. Conventional 'parametric' models suggested the lack of a period effect, whereas more flexible strategies indicated a significant NL effect. Cross-validation confirmed that flexible models predicted mortality better. The resulting differences in the 'final model' selected by various strategies also had an impact on the risk prediction for individual subjects. Overall, our analyses underline (a) the importance of accounting for significant non-parametric effects of covariates and (b) the need for developing accurate model selection strategies for flexible survival analyses. Copyright 2008 John Wiley & Sons, Ltd.
Ariyama, Kaoru; Aoyama, Yoshinori; Mochizuki, Akashi; Homura, Yuji; Kadokura, Masashi; Yasui, Akemi
2007-01-24
Onions (Allium cepa L.) are produced in many countries and are one of the most popular vegetables in the world, thus leading to an enormous amount of international trade. It is currently important that a scientific technique be developed for determining geographic origin as a means to detect fraudulent labeling. We have therefore developed a technique based on mineral analysis and linear discriminant analysis (LDA). The onion samples used in this study were from Hokkaido, Hyogo, and Saga, which are the primary onion-growing areas in Japan, and from countries that export onions to Japan (China, the United States, New Zealand, Thailand, Australia, and Chile). Of 309 samples, 108 were from Hokkaido, 52 were from Saga, 77 were from Hyogo, and 72 were from abroad. Fourteen elements (Na, Mg, P, Mn, Co, Ni, Cu, Zn, Rb, Sr, Mo, Cd, Cs, and Ba) in the samples were determined by flame atomic absorption spectrometry, inductively coupled plasma optical emission spectrometry, and inductively coupled plasma mass spectrometry. The models established by LDA were used to discriminate the geographic origin between Hokkaido and abroad, Hyogo and abroad, and Saga and abroad. Ten-fold cross-validations were conducted using these models. The discrimination accuracies obtained by cross-validation between Hokkaido and abroad were 100 and 86%, respectively. Those between Hyogo and abroad were 100 and 90%, respectively. Those between Saga and abroad were 98 and 90%, respectively. In addition, it was demonstrated that the elemental fingerprint that a crop acquires from a specific production area does not easily change with variations in fertilization, crop year, variety, soil type, and production year, if appropriate elements are chosen.
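The LDA core of such a discrimination model can be sketched in pure Python for the two-class, two-feature case, where the pooled within-class scatter matrix has a closed-form 2x2 inverse. This is a reduced illustration (the paper's models use up to 14 elemental concentrations and 10-fold cross-validation); the toy data are invented:

```python
import random

def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def fisher_lda(X0, X1):
    """Two-class, two-feature Fisher LDA: w = Sw^{-1} (m1 - m0), with the
    pooled within-class scatter Sw inverted by the 2x2 closed form.
    Returns the discriminant direction w and the midpoint threshold c."""
    m0, m1 = mean(X0), mean(X1)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((X0, m0), (X1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[ s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det,  s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # decision threshold: projection of the midpoint between the class means
    c = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m0, m1))
    return w, c

def predict(w, c, x):
    """1 if x projects onto the class-1 side of the threshold, else 0."""
    return int(w[0] * x[0] + w[1] * x[1] > c)

# Toy "element concentration" data for two origins.
random.seed(5)
X0 = [[random.gauss(0, 0.5), random.gauss(0, 0.5)] for _ in range(20)]
X1 = [[random.gauss(3, 0.5), random.gauss(1, 0.5)] for _ in range(20)]
w, c = fisher_lda(X0, X1)
```

Since Sw is positive definite, w always points toward class 1, so "projection greater than c" is the class-1 side.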
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Shiju; Qian, Wei; Guan, Yubao
2016-06-15
Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLC patients by integrating oversampling, feature selection, and score fusion techniques and develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061, when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled the RBFN based classifier to yield improved prediction accuracy.
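The SMOTE oversampling step can be sketched in a few lines of pure Python: each synthetic minority sample is a random interpolation between a minority case and one of its k nearest minority-class neighbours. This is a simplified sketch of the standard algorithm, not the paper's exact implementation; the toy minority points are invented.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest neighbours of a within the minority class (excluding a)
        neigh = sorted((x for x in minority if x is not a),
                       key=lambda x: sum((xa - xb) ** 2 for xa, xb in zip(a, x)))[:k]
        b = rng.choice(neigh)
        t = rng.random()
        out.append([xa + t * (xb - xa) for xa, xb in zip(a, b)])
    return out

# Toy minority class (e.g. the 20 recurrence cases) in 2-D feature space.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote(minority, 5)
```

Because every synthetic coordinate is an interpolation, the new points stay inside the coordinate-wise range of the minority class, which is why SMOTE is applied before, not after, splitting off the evaluation folds.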
A Bayesian approach to in silico blood-brain barrier penetration modeling.
Martins, Ines Filipa; Teixeira, Ana L; Pinheiro, Luis; Falcao, Andre O
2012-06-25
The human blood-brain barrier (BBB) is a membrane that protects the central nervous system (CNS) by restricting the passage of solutes. The development of any new drug must take into account its existence, whether for designing new molecules that target components of the CNS or, on the other hand, to find new substances that should not penetrate the barrier. Several studies in the literature have attempted to predict BBB penetration, so far with limited success and few, if any, applications to real-world drug discovery and development programs. Part of the reason is that only about 2% of small molecules can cross the BBB, and the available data sets are not representative of that reality, being generally biased with an over-representation of molecules that show an ability to permeate the BBB (BBB positives). To circumvent this limitation, the current study aims to devise and use a new approach based on Bayesian statistics, coupled with state-of-the-art machine learning methods, to produce a robust model capable of being applied in real-world drug research scenarios. The data set used, gathered from the literature, totals 1970 curated molecules, one of the largest for similar studies. Random Forests and Support Vector Machines were tested in various configurations against several chemical descriptor set combinations. Models were tested in a 5-fold cross-validation process, and the best one was tested over an independent validation set. The best fitted model produced an overall accuracy of 95%, with a mean square contingency coefficient (ϕ) of 0.74, correctly identifying 83% of BBB positives and 96% of BBB negatives. This model was adapted into a Web based tool made available for the whole community at http://b3pp.lasige.di.fc.ul.pt.
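One standard way to apply a Bayesian prior correction to the prevalence mismatch described above is to re-map a classifier's posterior from the training prevalence to the ~2% deployment prevalence on the odds scale. Whether the paper uses exactly this correction is an assumption; the sketch shows the general technique:

```python
def recalibrate(p, prior_train, prior_true):
    """Adjust a posterior probability trained at prevalence `prior_train`
    to deployment prevalence `prior_true` via Bayes' rule on the odds scale."""
    odds = (p / (1 - p)) * (prior_true / (1 - prior_true)) \
                         / (prior_train / (1 - prior_train))
    return odds / (1 + odds)

# A score of 0.9 from a balanced training set, re-mapped to a 2% prior.
p_adj = recalibrate(0.9, prior_train=0.5, prior_true=0.02)
```

A confident-looking 0.9 on a balanced training set becomes a far more cautious probability once the rarity of true BBB positives is accounted for, which is the practical point of the abstract's complaint about biased data sets.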
Piette, Elizabeth R; Moore, Jason H
2018-01-01
Machine learning methods and conventions are increasingly employed for the analysis of large, complex biomedical data sets, including genome-wide association studies (GWAS). Reproducibility of machine learning analyses of GWAS can be hampered by biological and statistical factors, particularly so for the investigation of non-additive genetic interactions. Application of traditional cross validation to a GWAS data set may result in poor consistency between the training and testing data set splits due to an imbalance of the interaction genotypes relative to the data as a whole. We propose a new cross validation method, proportional instance cross validation (PICV), that preserves the original distribution of an independent variable when splitting the data set into training and testing partitions. We apply PICV to simulated GWAS data with epistatic interactions of varying minor allele frequencies and prevalences and compare performance to that of a traditional cross validation procedure in which individuals are randomly allocated to training and testing partitions. Sensitivity and positive predictive value are significantly improved across all tested scenarios for PICV compared to traditional cross validation. We also apply PICV to GWAS data from a study of primary open-angle glaucoma to investigate a previously-reported interaction, which fails to significantly replicate; PICV however improves the consistency of testing and training results. Application of traditional machine learning procedures to biomedical data may require modifications to better suit intrinsic characteristics of the data, such as the potential for highly imbalanced genotype distributions in the case of epistasis detection. The reproducibility of genetic interaction findings can be improved by considering this variable imbalance in cross validation implementation, such as with PICV. This approach may be extended to problems in other domains in which imbalanced variable distributions are a concern.
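The core of a PICV-style split is to partition each label group into the folds separately, so every fold preserves the overall distribution of the chosen variable (here, a rare genotype class). A minimal sketch with invented genotype labels:

```python
import random

def proportional_folds(labels, k, seed=0):
    """PICV-style partition: deal each label group into k folds separately,
    so each fold preserves the overall label distribution."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for value in set(labels):
        idx = [i for i, v in enumerate(labels) if v == value]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

# 90 common-genotype and 10 rare-genotype individuals, 5 folds.
genotypes = ["AA"] * 90 + ["aa"] * 10
parts = proportional_folds(genotypes, 5)
```

With a purely random split, some folds could easily receive zero rare-genotype individuals, which is exactly the training/testing inconsistency the abstract describes; the proportional split guarantees each fold its fair share.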
Measurement of flow separation in a human vocal folds model
NASA Astrophysics Data System (ADS)
Šidlof, Petr; Doaré, Olivier; Cadot, Olivier; Chaigne, Antoine
2011-07-01
The paper provides experimental data on flow separation from a model of the human vocal folds. Data were measured on a four times scaled physical model, where one vocal fold was fixed and the other oscillated due to fluid-structure interaction. The vocal folds were fabricated from silicone rubber and placed on elastic support in the wall of a transparent wind tunnel. A PIV system was used to visualize the flow fields immediately downstream of the glottis and to measure the velocity fields. From the visualizations, the position of the flow separation point was evaluated using a semiautomatic procedure and plotted for different airflow velocities. The separation point position was quantified relative to the orifice width separately for the left and right vocal folds to account for flow asymmetry. The results indicate that the flow separation point remains close to the narrowest cross-section during most of the vocal fold vibration cycle, but moves significantly further downstream shortly prior to and after glottal closure.
Arouri, Rabeh; Le Goff, Gaelle; Hemden, Hiethem; Navarro-Llopis, Vicente; M'saad, Mariem; Castañera, Pedro; Feyereisen, René; Hernández-Crespo, Pedro; Ortego, Félix
2015-09-01
The withdrawal of malathion in the European Union in 2009 resulted in a large increase in lambda-cyhalothrin applications for the control of the Mediterranean fruit fly, Ceratitis capitata, in Spanish citrus crops. Spanish field populations of C. capitata have developed resistance to lambda-cyhalothrin (6-14-fold), achieving LC50 values (129-287 ppm) higher than the recommended concentration for field treatments (125 ppm). These results contrast with the high susceptibility to lambda-cyhalothrin found in three Tunisian field populations. We have studied the mechanism of resistance in the laboratory-selected resistant strain W-1Kλ (205-fold resistance). Bioassays with synergists showed that resistance was almost completely suppressed by the P450 inhibitor PBO. The study of the expression of 53 P450 genes belonging to the CYP4, CYP6, CYP9 and CYP12 families in C. capitata revealed that CYP6A51 was overexpressed (13-18-fold) in the resistant strain. The W-1Kλ strain also showed high levels of cross-resistance to etofenprox (240-fold) and deltamethrin (150-fold). Field-evolved resistance to lambda-cyhalothrin has been found in C. capitata. Metabolic resistance mediated by P450 appears to be the main resistance mechanism in the resistant strain W-1Kλ. The levels of cross-resistance found may compromise the effectiveness of other pyrethroids for the control of this species. © 2014 Society of Chemical Industry.
López-Linares, Karen; Aranjuelo, Nerea; Kabongo, Luis; Maclair, Gregory; Lete, Nerea; Ceresa, Mario; García-Familiar, Ainhoa; Macía, Iván; González Ballester, Miguel A
2018-05-01
Computerized Tomography Angiography (CTA) based follow-up of Abdominal Aortic Aneurysms (AAA) treated with Endovascular Aneurysm Repair (EVAR) is essential to evaluate the progress of the patient and detect complications. In this context, accurate quantification of post-operative thrombus volume is required. However, a proper evaluation is hindered by the lack of automatic, robust and reproducible thrombus segmentation algorithms. We propose a new fully automatic approach based on Deep Convolutional Neural Networks (DCNN) for robust and reproducible thrombus region of interest detection and subsequent fine thrombus segmentation. The DetecNet detection network is adapted to perform region of interest extraction from a complete CTA, and a new segmentation network architecture, based on Fully Convolutional Networks and a Holistically-Nested Edge Detection Network, is presented. These networks are trained, validated and tested on 13 post-operative CTA volumes of different patients using a 4-fold cross-validation approach to provide more robustness to the results. Our pipeline achieves a Dice score of more than 82% for post-operative thrombus segmentation and provides a mean relative volume difference between ground truth and automatic segmentation that lies within the experienced human observer variance, without the need of human intervention in most common cases. Copyright © 2018 Elsevier B.V. All rights reserved.
Hsu, David
2015-09-27
Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and misinterpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results suggest an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
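The Jaccard-based stability measure described above can be sketched as follows: for each cluster in a reference partition, take the best-match Jaccard coefficient against a perturbed partition of the same observations. This is a minimal illustration; the function names and toy partitions are ours, not the paper's implementation.

```python
def jaccard(a, b):
    """Jaccard coefficient of two index sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cluster_stability(labels_ref, labels_pert):
    """For each cluster in the reference labeling, return the best-match
    Jaccard coefficient against the clusters of a perturbed labeling."""
    ref, pert = {}, {}
    for i, label in enumerate(labels_ref):
        ref.setdefault(label, set()).add(i)
    for i, label in enumerate(labels_pert):
        pert.setdefault(label, set()).add(i)
    return {label: max(jaccard(members, p) for p in pert.values())
            for label, members in ref.items()}
```

Values near 1 indicate a cluster that survives the perturbation; in practice the perturbation would be a bootstrap resample or noise injection, repeated many times and averaged.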
Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan
2015-06-01
Conventional clinical decision support systems are based on individual classifiers or simple combinations of these classifiers, which tend to show moderate performance. This research paper presents a novel classifier ensemble framework based on an enhanced bagging approach with a multi-objective weighted voting scheme for prediction and analysis of heart disease. The proposed model overcomes the limitations of conventional performance by utilizing an ensemble of five heterogeneous classifiers: Naïve Bayes, linear regression, quadratic discriminant analysis, instance based learner and support vector machines. Five different datasets are used for experimentation, evaluation and validation. The datasets are obtained from publicly available data repositories. Effectiveness of the proposed ensemble is investigated by comparison of results with several classifiers. Prediction results of the proposed ensemble model are assessed by ten-fold cross-validation and ANOVA statistics. The experimental evaluation shows that the proposed framework handles all types of attributes and achieved a high diagnosis accuracy of 84.16 %, with 93.29 % sensitivity, 96.70 % specificity, and 82.15 % f-measure. An F-ratio greater than the critical value and p-values below 0.05 at the 95 % confidence level indicate that the results are statistically significant for most of the datasets.
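The weighted voting step of such an ensemble can be sketched as below; the classifier outputs and weights are invented for illustration, and the paper's multi-objective weighting scheme is more elaborate than a single weight per classifier.

```python
def weighted_vote(predictions, weights):
    """Combine class labels from heterogeneous classifiers by weighted
    majority voting; each weight might reflect a classifier's
    validation performance."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    # The class with the largest accumulated weight wins.
    return max(scores, key=scores.get)
```

For example, with votes ["disease", "healthy", "disease", "healthy", "healthy"] and weights [0.9, 0.6, 0.8, 0.7, 0.5], "healthy" wins 1.8 to 1.7 even though the two strongest classifiers voted "disease".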
A New Protocol to Evaluate the Effect of Topical Anesthesia
List, Thomas; Mojir, Katerina; Svensson, Peter; Pigg, Maria
2014-01-01
This double-blind, placebo-controlled, randomized cross-over clinical experimental study tested the reliability, validity, and sensitivity to change of punctate pain thresholds and self-reported pain on needle penetration. Female subjects without orofacial pain were tested in 2 sessions at 1- to 2-week intervals. The test site was the mucobuccal fold adjacent to the first upper right premolar. Active lidocaine hydrochloride 2% (Dynexan) or placebo gel was applied for 5 minutes, and sensory testing was performed before and after application. The standardized quantitative sensory test protocol included mechanical pain threshold (MPT), pressure pain threshold (PPT), mechanical pain sensitivity (MPS), and needle penetration sensitivity (NPS) assessments. Twenty-nine subjects, mean (SD) age 29.0 (10.2) years, completed the study. Test-retest reliability intraclass correlation coefficient at 10-minute intervals between examinations was MPT 0.69, PPT 0.79, MPS 0.72, and NPS 0.86. A high correlation was found between NPS and MPS (r = 0.84; P < .001), whereas NPS and PPT were not significantly correlated. The study found good to excellent test-retest reliability for all measures. None of the sensory measures detected changes in sensitivity following lidocaine 2% or placebo gel. Electronic von Frey assessments of MPT/MPS on oral mucosa have good validity. PMID:25517548
QRS complex detection based on continuous density hidden Markov models using univariate observations
NASA Astrophysics Data System (ADS)
Sotelo, S.; Arenas, W.; Altuve, M.
2018-04-01
In the electrocardiogram (ECG), the detection of QRS complexes is a fundamental step in the ECG signal processing chain, since it allows the determination of other characteristic waves of the ECG and provides information about heart rate variability. In this work, an automatic QRS complex detector based on continuous density hidden Markov models (HMM) is proposed. HMMs were trained using univariate observation sequences taken either from QRS complexes or their derivatives. The detection approach is based on the log-likelihood comparison of the observation sequence with a fixed threshold. A sliding window was used to obtain the observation sequence to be evaluated by the model. The threshold was optimized by receiver operating characteristic curves. Sensitivity (Sen), specificity (Spc) and F1 score were used to evaluate the detection performance. The approach was validated using ECG recordings from the MIT-BIH Arrhythmia database. A 6-fold cross-validation shows that the best detection performance was achieved with a 2-state HMM trained with QRS complex sequences (Sen = 0.668, Spc = 0.360 and F1 = 0.309). We concluded that these univariate sequences provide enough information to characterize the QRS complex dynamics from HMM. Future work will explore the use of multivariate observations to increase the detection performance.
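The sliding-window, threshold-on-log-likelihood detection scheme can be sketched with a stand-in scoring model. Here an i.i.d. Gaussian log-likelihood replaces the trained HMM's forward log-likelihood, and the signal, parameters, and threshold are all invented for illustration.

```python
import math

def gauss_loglik(window, mu, sigma):
    """Log-likelihood of a window under an i.i.d. Gaussian model,
    standing in for the trained HMM's forward log-likelihood."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in window)

def detect(signal, width, mu, sigma, threshold):
    """Slide a window over the signal and flag start indices whose
    log-likelihood under the model exceeds the threshold."""
    hits = []
    for start in range(len(signal) - width + 1):
        if gauss_loglik(signal[start:start + width], mu, sigma) >= threshold:
            hits.append(start)
    return hits
```

In the paper the threshold is tuned on ROC curves; here it is just a fixed number, and a window whose samples match the model (mean 5) scores far above windows of baseline samples.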
Luo, Heng; Zhang, Ping; Cao, Xi Hang; Du, Dizheng; Ye, Hao; Huang, Hui; Li, Can; Qin, Shengying; Wan, Chunling; Shi, Leming; He, Lin; Yang, Lun
2016-11-02
The cost of developing a new drug has increased sharply over the past years. To ensure a reasonable return-on-investment, it is useful for drug discovery researchers in both industry and academia to identify all the possible indications for early pipeline molecules. For the first time, we propose the term computational "drug candidate positioning" or "drug positioning" to describe the above process. It is distinct from drug repositioning, which identifies new uses for existing drugs and maximizes their value. Since many therapeutic effects are mediated by unexpected drug-protein interactions, it is reasonable to analyze the chemical-protein interactome (CPI) profiles to predict indications. Here we introduce the server DPDR-CPI, which can make real-time predictions based only on the structure of the small molecule. When a user submits a molecule, the server will dock it across 611 human proteins, generating a CPI profile of features that can be used for predictions. It can suggest the likelihood of relevance of the input molecule towards ~1,000 human diseases with top predictions listed. DPDR-CPI achieved an overall AUROC of 0.78 during 10-fold cross-validations and AUROC of 0.76 for the independent validation. The server is freely accessible via http://cpi.bio-x.cn/dpdr/.
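The AUROC figures quoted above can be computed with the usual rank-based estimator: the probability that a randomly chosen positive is scored above a randomly chosen negative. This is a generic sketch, not the server's code; the labels and scores below are invented.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (positive, negative) pairs where
    the positive outscores the negative; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a 10-fold setting one would either pool the held-out scores of all folds before calling this, or average the per-fold AUROCs.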
García-Remesal, Miguel; García-Ruiz, Alejandro; Pérez-Rey, David; de la Iglesia, Diana; Maojo, Víctor
2013-01-01
Nanoinformatics is an emerging research field that uses informatics techniques to collect, process, store, and retrieve data, information, and knowledge on nanoparticles, nanomaterials, and nanodevices and their potential applications in health care. In this paper, we have focused on the solutions that nanoinformatics can provide to facilitate nanotoxicology research. For this, we have taken a computational approach to automatically recognize and extract nanotoxicology-related entities from the scientific literature. The desired entities belong to four different categories: nanoparticles, routes of exposure, toxic effects, and targets. The entity recognizer was trained using a corpus that we specifically created for this purpose and was validated by two nanomedicine/nanotoxicology experts. We evaluated the performance of our entity recognizer using 10-fold cross-validation. The precisions range from 87.6% (targets) to 93.0% (routes of exposure), while recall values range from 82.6% (routes of exposure) to 87.4% (toxic effects). These results prove the feasibility of using computational approaches to reliably perform different named entity recognition (NER)-dependent tasks, such as augmented reading or semantic searches. This research is a "proof of concept" that can be expanded to stimulate further developments that could assist researchers in managing data, information, and knowledge at the nanolevel, thus accelerating research in nanomedicine.
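Entity-level precision and recall, as reported per category above, can be sketched as follows. An entity counts as correct only if it matches a gold annotation exactly; the example entities are invented, not from the paper's corpus.

```python
def precision_recall(predicted, gold):
    """Entity-level precision and recall: a predicted entity is a true
    positive only if the same (text, category) pair appears in gold."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

Under 10-fold cross-validation, each document's entities are scored while its fold is held out, and the counts are aggregated per category.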
NASA Astrophysics Data System (ADS)
Jain, Sankalp; Kotsampasakou, Eleni; Ecker, Gerhard F.
2018-05-01
Cheminformatics datasets used in classification problems, especially those related to biological or physicochemical properties, are often imbalanced. This presents a major challenge in development of in silico prediction models, as the traditional machine learning algorithms are known to work best on balanced datasets. The class imbalance introduces a bias in the performance of these algorithms due to their preference towards the majority class. Here, we present a comparison of the performance of seven different meta-classifiers for their ability to handle imbalanced datasets, whereby Random Forest is used as base-classifier. Four different datasets that are directly (cholestasis) or indirectly (via inhibition of organic anion transporting polypeptide 1B1 and 1B3) related to liver toxicity were chosen for this purpose. The imbalance ratio in these datasets ranges between 4:1 and 20:1 for negative and positive classes, respectively. Three different sets of molecular descriptors for model development were used, and their performance was assessed in 10-fold cross-validation and on an independent validation set. Stratified bagging, MetaCost and CostSensitiveClassifier were found to be the best performing among all the methods. While MetaCost and CostSensitiveClassifier provided better sensitivity values, Stratified Bagging resulted in high balanced accuracies.
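One common form of stratified bagging for an imbalanced binary problem keeps every minority example in each bag and pairs it with an equal-sized random sample of the majority class, so each base learner trains on balanced data. This is a generic sketch, not the exact scheme benchmarked above; the minority label is assumed to be 1.

```python
import random

def stratified_bags(X, y, n_bags, seed=0):
    """Build balanced bags for an imbalanced binary dataset: each bag
    holds all minority examples plus an equal-sized random sample of
    the majority class (assumes label 1 is the minority)."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    bags = []
    for _ in range(n_bags):
        idx = minority + rng.sample(majority, len(minority))
        bags.append([(X[i], y[i]) for i in idx])
    return bags
```

A base classifier (Random Forest in the study) is then fit on each bag and the bag-level predictions are aggregated, e.g. by majority vote.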
Estimating aboveground biomass in interior Alaska with Landsat data and field measurements
Ji, Lei; Wylie, Bruce K.; Nossov, Dana R.; Peterson, Birgit E.; Waldrop, Mark P.; McFarland, Jack W.; Rover, Jennifer R.; Hollingsworth, Teresa N.
2012-01-01
Terrestrial plant biomass is a key biophysical parameter required for understanding ecological systems in Alaska. An accurate estimation of biomass at a regional scale provides an important data input for ecological modeling in this region. In this study, we created an aboveground biomass (AGB) map at 30-m resolution for the Yukon Flats ecoregion of interior Alaska using Landsat data and field measurements. Tree, shrub, and herbaceous AGB data in both live and dead forms were collected in summers and autumns of 2009 and 2010. Using the Landsat-derived spectral variables and the field AGB data, we generated a regression model and applied this model to map AGB for the ecoregion. A 3-fold cross-validation indicated that the AGB estimates had a mean absolute error of 21.8 Mg/ha and a mean bias error of 5.2 Mg/ha. Additionally, we validated the mapping results using an airborne lidar dataset acquired for a portion of the ecoregion. We found a significant relationship between the lidar-derived canopy height and the Landsat-derived AGB (R2 = 0.40). The AGB map showed that 90% of the ecoregion had AGB values ranging from 10 Mg/ha to 134 Mg/ha. Vegetation types and fires were the primary factors controlling the spatial AGB patterns in this ecoregion.
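The accuracy statistics reported above (mean absolute error and mean bias error) pool the held-out residuals across cross-validation folds. A minimal sketch with invented numbers:

```python
def error_stats(observed, predicted):
    """Mean absolute error (typical magnitude of error) and mean bias
    error (systematic over/under-prediction), pooled over the
    held-out folds of a cross-validation."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mbe = sum(residuals) / len(residuals)
    return mae, mbe
```

MAE and MBE answer different questions: a model can have a large MAE with near-zero MBE if its errors cancel, while a nonzero MBE (5.2 Mg/ha in the study) signals a systematic offset.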
Hayashi, Yoshihiro; Oishi, Takuya; Shirotori, Kaede; Marumo, Yuki; Kosugi, Atsushi; Kumada, Shungo; Hirai, Daijiro; Takayama, Kozo; Onuki, Yoshinori
2018-07-01
The aim of this study was to explore the potential of boosted tree (BT) to develop a correlation model between active pharmaceutical ingredient (API) characteristics and the tensile strength (TS) of tablets as a critical quality attribute. First, we evaluated 81 kinds of API characteristics, such as particle size distribution, bulk density, tapped density, Hausner ratio, moisture content, elastic recovery, molecular weight, and partition coefficient. Next, we prepared tablets containing 50% API, 49% microcrystalline cellulose, and 1% magnesium stearate using direct compression at 6, 8, and 10 kN, and measured TS. Then, we applied BT to our dataset to develop a correlation model. Finally, the constructed BT model was validated using k-fold cross-validation. Results showed that the BT model achieved high performance statistics, whereas multiple regression analysis resulted in poor estimations. Sensitivity analysis of the BT model revealed that the diameter of powder particles at the 10th percentile of the cumulative percentage size distribution was the most crucial factor for TS. In addition, moisture content, partition coefficient, and modal diameter also had appreciable influence. This study demonstrates that the BT model can provide a comprehensive understanding of the latent structure underlying APIs and the TS of tablets.
Mahajan, Ruhi; Viangteeravat, Teeradache; Akbilgic, Oguz
2017-12-01
A timely diagnosis of congestive heart failure (CHF) is crucial to evade a life-threatening event. This paper presents a novel probabilistic symbol pattern recognition (PSPR) approach to detect CHF in subjects from their cardiac interbeat (R-R) intervals. PSPR discretizes each continuous R-R interval time series by mapping it onto an eight-symbol alphabet and then models the pattern transition behavior in the symbolic representation of the series. The PSPR-based analysis of the discretized series from 107 subjects (69 normal and 38 CHF subjects) yielded discernible features to distinguish normal subjects and subjects with CHF. In addition to PSPR features, we also extracted features using time-domain heart rate variability measures such as the average and standard deviation of R-R intervals. An ensemble of bagged decision trees was used to classify the two groups, resulting in a five-fold cross-validation accuracy, specificity, and sensitivity of 98.1%, 100%, and 94.7%, respectively; a 20% holdout validation yielded an accuracy, specificity, and sensitivity of 99.5%, 100%, and 98.57%, respectively. Results from this study suggest that features obtained with the combination of PSPR and long-term heart rate variability measures can be used in developing automated CHF diagnosis tools. Copyright © 2017 Elsevier B.V. All rights reserved.
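The discretization-plus-transition step of PSPR can be sketched as below. Equal-width binning onto the eight-symbol alphabet is our assumption for illustration (the paper's exact mapping may differ); the transition counts are the raw material for pattern-transition features.

```python
def discretize(series, n_symbols=8):
    """Map a continuous R-R series onto symbols 0..n_symbols-1 by
    equal-width binning between the series min and max (an
    illustrative choice, not necessarily the paper's)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_symbols or 1.0   # guard constant series
    return [min(int((x - lo) / width), n_symbols - 1) for x in series]

def transition_counts(symbols, n_symbols=8):
    """Count symbol-pair transitions in the discretized series."""
    counts = [[0] * n_symbols for _ in range(n_symbols)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    return counts
```

Normalizing each row of the count matrix gives empirical transition probabilities, whose patterns differ between normal and CHF recordings.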
NASA Astrophysics Data System (ADS)
Jain, Sankalp; Kotsampasakou, Eleni; Ecker, Gerhard F.
2018-04-01
Cheminformatics datasets used in classification problems, especially those related to biological or physicochemical properties, are often imbalanced. This presents a major challenge in development of in silico prediction models, as the traditional machine learning algorithms are known to work best on balanced datasets. The class imbalance introduces a bias in the performance of these algorithms due to their preference towards the majority class. Here, we present a comparison of the performance of seven different meta-classifiers for their ability to handle imbalanced datasets, whereby Random Forest is used as base-classifier. Four different datasets that are directly (cholestasis) or indirectly (via inhibition of organic anion transporting polypeptide 1B1 and 1B3) related to liver toxicity were chosen for this purpose. The imbalance ratio in these datasets ranges between 4:1 and 20:1 for negative and positive classes, respectively. Three different sets of molecular descriptors for model development were used, and their performance was assessed in 10-fold cross-validation and on an independent validation set. Stratified bagging, MetaCost and CostSensitiveClassifier were found to be the best performing among all the methods. While MetaCost and CostSensitiveClassifier provided better sensitivity values, Stratified Bagging resulted in high balanced accuracies.
NASA Astrophysics Data System (ADS)
Melchiorre, C.; Castellanos Abella, E. A.; van Westen, C. J.; Matteucci, M.
2011-04-01
This paper describes a procedure for landslide susceptibility assessment based on artificial neural networks, and focuses on the estimation of the prediction capability, robustness, and sensitivity of susceptibility models. The study is carried out in the Guantanamo Province of Cuba, where 186 landslides were mapped using photo-interpretation. Twelve conditioning factors were mapped including geomorphology, geology, soils, land use, slope angle, slope direction, internal relief, drainage density, distance from roads and faults, rainfall intensity, and ground peak acceleration. A methodology was used that subdivided the database into 3 subsets. A training set was used for updating the weights. A validation set was used to stop the training procedure when the network started losing generalization capability, and a test set was used to calculate the performance of the network. A 10-fold cross-validation was performed in order to show that the results are repeatable. The prediction capability, the robustness analysis, and the sensitivity analysis were tested on 10 mutually exclusive datasets. The results show that by means of artificial neural networks it is possible to obtain models with high prediction capability and high robustness, and that an exploration of the effect of the individual variables is possible, even if they are considered as a black-box model.
Baker, J B; Dutta, D; Watson, D; Maddala, T; Munneke, B M; Shak, S; Rowinsky, E K; Xu, L-A; Harbison, C T; Clark, E A; Mauro, D J; Khambata-Ford, S
2011-02-01
Although it is accepted that metastatic colorectal cancers (mCRCs) that carry activating mutations in KRAS are unresponsive to anti-epidermal growth factor receptor (EGFR) monoclonal antibodies, a significant fraction of KRAS wild-type (wt) mCRCs are also unresponsive to anti-EGFR therapy. Genes encoding EGFR ligands amphiregulin (AREG) and epiregulin (EREG) are promising gene expression-based markers but have not been incorporated into a test to dichotomise KRAS wt mCRC patients with respect to sensitivity to anti-EGFR treatment. We used RT-PCR to test 110 candidate gene expression markers in primary tumours from 144 KRAS wt mCRC patients who received monotherapy with the anti-EGFR antibody cetuximab. Results were correlated with multiple clinical endpoints: disease control, objective response, and progression-free survival (PFS). Expression of many of the tested candidate genes, including EREG and AREG, strongly associate with all clinical endpoints. Using multivariate analysis with two-layer five-fold cross-validation, we constructed a four-gene predictive classifier. Strikingly, patients below the classifier cutpoint had PFS and disease control rates similar to those of patients with KRAS mutant mCRC. Gene expression appears to identify KRAS wt mCRC patients who receive little benefit from cetuximab. It will be important to test this model in an independent validation study.
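Two-layer five-fold cross-validation (nested CV), as used to build the four-gene classifier above, separates model selection from performance estimation: the outer loop holds out a test fold, and the inner loop re-splits the remaining data to tune the model. A minimal index-generation sketch, not the authors' code:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (a shuffle would
    normally precede this)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nested_cv_splits(n, k_outer=5, k_inner=5):
    """Two-layer k-fold: for each outer test fold, the remaining
    samples are re-split into inner folds for model selection."""
    splits = []
    for test in kfold_indices(n, k_outer):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        inner = [[train[j] for j in fold]
                 for fold in kfold_indices(len(train), k_inner)]
        splits.append((train, test, inner))
    return splits
```

The key property is that each outer test fold never participates in the inner-loop tuning that produced the model evaluated on it.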
NASA Astrophysics Data System (ADS)
Land, Walker H., Jr.; Masters, Timothy D.; Lo, Joseph Y.; McKee, Dan
2001-07-01
A new neural network technology was developed for improving the benign/malignant diagnosis of breast cancer using mammogram findings. A new paradigm, Adaptive Boosting (AB), takes a markedly different approach to solving Computational Intelligence (CI) problems. AB, a new machine learning paradigm, focuses on finding weak learning algorithms that initially need to provide only slightly better than random performance (i.e., approximately 55%) when processing a mammogram training set. Then, by successive development of additional architectures (using the mammogram training set), the adaptive boosting process improves the performance of the basic Evolutionary Programming derived neural network architectures. The results of these several EP-derived hybrid architectures are then intelligently combined and tested using a similar validation mammogram data set. Optimization focused on improving specificity and positive predictive value at very high sensitivities, where an analysis of the performance of the hybrid would be most meaningful. Using the DUKE mammogram database of 500 biopsy-proven samples, on average this hybrid was able to achieve (under statistical 5-fold cross-validation) a specificity of 48.3% and a positive predictive value (PPV) of 51.8% while maintaining 100% sensitivity. At 97% sensitivity, a specificity of 56.6% and a PPV of 55.8% were obtained.
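Reporting specificity and PPV at a fixed, very high sensitivity amounts to choosing the highest score threshold that still captures the required fraction of positives, then measuring the other statistics there. A generic sketch with invented labels and scores, not the paper's procedure:

```python
import math

def spec_ppv_at_sensitivity(labels, scores, target_sens=1.0):
    """Pick the highest threshold that keeps at least target_sens of
    the positives, then report specificity and PPV at that point."""
    pos = sorted((s for l, s in zip(labels, scores) if l == 1), reverse=True)
    k = math.ceil(target_sens * len(pos))
    threshold = pos[k - 1]              # k-th highest positive score
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    return tn / (tn + fp), tp / (tp + fp)   # (specificity, PPV)
```

Lowering the target sensitivity (e.g. from 100% to 97%, as in the study) raises the threshold and typically improves both specificity and PPV.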
Novel naïve Bayes classification models for predicting the carcinogenicity of chemicals.
Zhang, Hui; Cao, Zhi-Xing; Li, Meng; Li, Yu-Zhi; Peng, Cheng
2016-11-01
The carcinogenicity prediction has become a significant issue for the pharmaceutical industry. The purpose of this investigation was to develop a novel prediction model of carcinogenicity of chemicals by using a naïve Bayes classifier. The established model was validated by the internal 5-fold cross validation and external test set. The naïve Bayes classifier gave an average overall prediction accuracy of 90 ± 0.8% for the training set and 68 ± 1.9% for the external test set. Moreover, five simple molecular descriptors (e.g., AlogP, Molecular weight (MW), No. of H donors, Apol and Wiener) considered as important for the carcinogenicity of chemicals were identified, and some substructures related to the carcinogenicity were achieved. Thus, we hope the established naïve Bayes prediction model could be applied to filter early-stage molecules for this potential carcinogenicity adverse effect; and the identified five simple molecular descriptors and substructures of carcinogens would give a better understanding of the carcinogenicity of chemicals, and further provide guidance for medicinal chemists in the design of new candidate drugs and lead optimization, ultimately reducing the attrition rate in later stages of drug development. Copyright © 2016 Elsevier Ltd. All rights reserved.
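A naïve Bayes classifier over a few continuous descriptors can be sketched with Gaussian class-conditional likelihoods. The toy data below (two descriptors loosely echoing AlogP and molecular weight) are invented for illustration and are not from the study.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Fit Gaussian naive Bayes: per-class prior plus per-feature
    mean and variance (features assumed independent given the class)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        prior = len(rows) / len(X)
        stats = []
        for feature in zip(*rows):
            mu = sum(feature) / len(feature)
            var = sum((v - mu) ** 2 for v in feature) / len(feature) or 1e-9
            stats.append((mu, var))
        model[label] = (prior, stats)
    return model

def predict_gnb(model, row):
    """Classify by the highest posterior log-probability."""
    def logpost(prior, stats):
        lp = math.log(prior)
        for x, (mu, var) in zip(row, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return lp
    return max(model, key=lambda label: logpost(*model[label]))
```

In a 5-fold validation, `fit_gnb` would be called on four folds and `predict_gnb` scored on the fifth, rotating through all folds.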
Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models
ERIC Educational Resources Information Center
Shieh, Gwowen
2009-01-01
In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…
NASA Astrophysics Data System (ADS)
Galaviz, V. E.; Yost, M. G.; Simpson, C. D.; Camp, J. E.; Paulsen, M. H.; Elder, J. P.; Hoffman, L.; Flores, D.; Quintana, P. J. E.
2014-05-01
Pedestrians waiting to cross into the US from Mexico at Ports of Entry experience long wait times near idling vehicles. The near-road environment is associated with elevated pollutant levels and adverse health outcomes. This is the first exposure assessment conducted to quantify northbound pedestrian commuter exposure to traffic-related air pollutants at the U.S.-Mexico border San Ysidro Port of Entry (SYPOE). Seventy-three persons who regularly crossed the SYPOE in the pedestrian line and 18 persons who did not cross were recruited to wear personal air monitors for 24 h to measure the traffic pollutants particulate matter less than 2.5 μm (PM2.5), 1-nitropyrene (1-NP), a marker for diesel exhaust, and carbon monoxide (CO). Fixed-site concentrations of 1-NP, ultrafine particles (UFP), PM2.5, CO, and black carbon (BC) were collected at the SYPOE during the times subjects were crossing northbound, to approximate their exposure while standing in line during their border wait. Subjects who crossed the border in pedestrian lanes had a 6-fold increase in exposure to 1-NP, a 3-fold increase in exposure to CO, and a 2-fold increase in exposure to gravimetric PM2.5, vs. non-border commuters. Univariate regression analysis for UFP (median 40,000 particles cm-3) found that border wait time for vehicles explained 21% of variability and relative humidity 13%, but when modeled together neither predictor remained significant. Concentrations at the SYPOE of UFP, PM2.5, CO, and BC are similar to those in other near-roadway studies that show associations with acute and chronic adverse health effects. Although results are limited by small sample numbers, these findings warrant concern for adverse health effects experienced by pedestrian commuters waiting in a long northbound queue at the SYPOE and demonstrate a potential health benefit of reduced wait times at the border.
Molecular Origins of Internal Friction Effects on Protein Folding Rates
Sirur, Anshul
2014-01-01
Recent experiments on protein folding dynamics have revealed strong evidence for internal friction effects. That is, observed relaxation times are not simply proportional to the solvent viscosity as might be expected if the solvent were the only source of friction. However, a molecular interpretation of this remarkable phenomenon is currently lacking. Here, we use all-atom simulations of peptide and protein folding in explicit solvent, to probe the origin of the unusual viscosity dependence. We find that an important contribution to this effect, explaining the viscosity dependence of helix formation and the folding of a helix-containing protein, is the insensitivity of torsion angle isomerization to solvent friction. The influence of this landscape roughness can, in turn, be quantitatively explained by a rate theory including memory friction. This insensitivity of local barrier crossing to solvent friction is expected to contribute to the viscosity dependence of folding rates in larger proteins. PMID:24986114
Molecular origins of internal friction effects on protein-folding rates.
de Sancho, David; Sirur, Anshul; Best, Robert B
2014-07-02
Recent experiments on protein-folding dynamics have revealed strong evidence for internal friction effects. That is, observed relaxation times are not simply proportional to the solvent viscosity as might be expected if the solvent were the only source of friction. However, a molecular interpretation of this remarkable phenomenon is currently lacking. Here, we use all-atom simulations of peptide and protein folding in explicit solvent, to probe the origin of the unusual viscosity dependence. We find that an important contribution to this effect, explaining the viscosity dependence of helix formation and the folding of a helix-containing protein, is the insensitivity of torsion angle isomerization to solvent friction. The influence of this landscape roughness can, in turn, be quantitatively explained by a rate theory including memory friction. This insensitivity of local barrier crossing to solvent friction is expected to contribute to the viscosity dependence of folding rates in larger proteins.
Dideoxynucleoside resistance emerges with prolonged zidovudine monotherapy. The RV43 Study Group.
Mayers, D L; Japour, A J; Arduino, J M; Hammer, S M; Reichman, R; Wagner, K F; Chung, R; Lane, J; Crumpacker, C S; McLeod, G X
1994-01-01
Human immunodeficiency virus type 1 (HIV-1) isolates resistant to zidovudine (ZDV) have previously been demonstrated to exhibit in vitro cross-resistance to other similar dideoxynucleoside agents which contain a 3'-azido group. However, cross-resistance to didanosine (ddI) or dideoxycytidine (ddC) has been less well documented. ZDV, ddI, and ddC susceptibility data have been collected from clinical HIV-1 isolates obtained by five clinical centers and their respective retrovirology laboratories. All subjects were treated only with ZDV. Clinical HIV-1 isolates were isolated, amplified, and assayed for drug susceptibility in standardized cultures of phytohemagglutinin-stimulated donor peripheral blood mononuclear cells obtained from healthy seronegative donors. All five cohorts showed a correlation between decreased in vitro susceptibility to ZDV and decreased susceptibility to ddI and ddC. For each 10-fold decrease in ZDV susceptibility, an average corresponding decrease of 2.2-fold in ddI susceptibility was observed (129 isolates studied; P < 0.001, Fisher's test of combined significance). Similarly, susceptibility to ddC decreased 2.0-fold for each 10-fold decrease in ZDV susceptibility (82 isolates studied; P < 0.001, Fisher's test of combined significance). These data indicate that a correlation exists between HIV-1 susceptibilities to ZDV and ddI or ddC for clinical HIV-1 isolates. PMID:8192457
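The reported "2.2-fold decrease in ddI per 10-fold decrease in ZDV" corresponds to the slope of a least-squares fit of log10 susceptibilities. A sketch with fabricated fold-change values chosen to lie exactly on a line of slope log10(2.2); the function is generic, not the study's analysis code.

```python
import math

def log_slope(x_fold, y_fold):
    """Least-squares slope of log10(y) on log10(x). A slope b means
    each 10-fold change in x corresponds to a 10**b-fold change in y."""
    lx = [math.log10(v) for v in x_fold]
    ly = [math.log10(v) for v in y_fold]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den
```

With x fold-changes of 1, 10, 100 and y fold-changes of 1, 2.2, 4.84, the fitted slope is log10(2.2), i.e. a 2.2-fold change in y per 10-fold change in x.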
Enjeti, Anoop; Granter, Neil; Ashraf, Asma; Fletcher, Linda; Branford, Susan; Rowlings, Philip; Dooley, Susan
2015-10-01
An automated cartridge-based detection system (GeneXpert; Cepheid) is being widely adopted in low-throughput laboratories for monitoring BCR-ABL1 transcripts in chronic myelogenous leukaemia. This Australian study evaluated the longitudinal performance characteristics of the automated system. The automated cartridge-based system was compared prospectively with the manual qRT-PCR-based reference method at SA Pathology, Adelaide, over a period of 2.5 years. A conversion factor determination was followed by four re-validations. Peripheral blood samples (n = 129) with international scale (IS) values within detectable range were selected for assessment. The mean bias, the proportion of results within a specified fold difference (2-, 3- and 5-fold), the concordance rate of major molecular remission (MMR) and the concordance across a range of IS values on paired samples were evaluated. The initial conversion factor for the automated system was determined as 0.43. Except for the second re-validation, where a negative bias of 1.9-fold was detected, all other biases fell within desirable limits. A cartridge-specific conversion factor and efficiency value were introduced, and the conversion factor was confirmed to be stable in subsequent re-validation cycles. Concordance with the reference method/laboratory at >0.1-≤10 IS was 78.2% and at ≤0.001 was 80%, compared to 86.8% in the >0.01-≤0.1 IS range. The overall and MMR concordance were 85.7% and 94% respectively, for samples that fell within ± 5-fold of the reference laboratory value over the entire period of study. The conversion factor and performance characteristics for the automated system were longitudinally stable in the clinically relevant range, following the manufacturer's introduction of lot-specific efficiency values.
Khan, Hafiz Azhar Ali; Akram, Waseem; Shehzad, Khurram; Shaalan, Essam A
2011-07-22
Agrochemicals have been widely used in Pakistan for several years. This exposes mosquito populations, particularly those present around agricultural settings, to an intense selection pressure for insecticide resistance. The aim of the present study was to investigate the toxicity of representative agrochemicals against various populations of Aedes albopictus (Skuse) collected from three different regions from 2008-2010. For organophosphates and pyrethroids, the resistance ratios compared with susceptible Lab-PK were in the range of 157-266 fold for chlorpyrifos, 24-52 fold for profenofos, 41-71 fold for triazofos, and 15-26 fold for cypermethrin, 15-53 fold for deltamethrin and 21-58 fold for lambdacyhalothrin. The resistance ratios for carbamates and new insecticides were in the range of 13-22 fold for methomyl, 24-30 fold for thiodicarb, and 41-101 fold for indoxacarb, 14-27 fold for emamectin benzoate and 23-50 fold for spinosad. Pairwise comparisons of the log LC50s of insecticides revealed correlation among several insecticides, suggesting a possible cross-resistance mechanism. Moreover, resistance remained stable across 3 years, suggesting field selection for general fitness had also taken place for various populations of Ae. albopictus. Moderate to high levels of resistance to agrochemicals in Pakistani field populations of Ae. albopictus are reported here for the first time. The geographic extent of resistance is unknown but, if widespread, may lead to problems in future vector control.
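The resistance ratios above are simple fold-changes of LC50 relative to the susceptible Lab-PK strain. A one-function sketch with hypothetical LC50 values (the population names and numbers below are invented for illustration only):

```python
# Hypothetical LC50 values (any consistent units); only the ratios matter.
lab_pk_lc50 = 0.8
field_lc50 = {"Population A": 124.8, "Population B": 45.6}  # illustrative populations

def resistance_ratio(lc50_field, lc50_lab):
    """RR = LC50(field strain) / LC50(susceptible reference strain)."""
    return lc50_field / lc50_lab

ratios = {pop: resistance_ratio(v, lab_pk_lc50) for pop, v in field_lc50.items()}
```

With these illustrative numbers the two populations come out at 156- and 57-fold resistance, i.e. in the range reported for the organophosphates above.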
Du, Jian; Gridneva, Zoya; Gay, Melvin C L; Trengove, Robert D; Hartmann, Peter E; Geddes, Donna T
2017-01-01
Persistent organic pollutants in human milk (HM) at high levels are considered to be detrimental to the breastfed infant. To determine the pesticide concentration in HM, a pilot cross-sectional study of 40 Western Australian (WA) women was carried out. Gas chromatography-tandem mass spectrometry (GC-MS/MS) with a validated QuEChERS method was used for the analysis of 88 pesticides in HM. p,p'-Dichlorodiphenyldichloroethylene (p,p'-DDE) was found at a mean concentration of 62.8 ± 54.5 ng/g fat, whereas other organochlorines, organophosphates, carbamates and pyrethroids were not detected in HM. Overall, no association was observed between HM p,p'-DDE concentrations and maternal age, parity, body mass index and percentage fat mass. Furthermore, for the first time, no significant association was found between p,p'-DDE concentrations in HM and infant growth outcomes such as weight, length, head circumference and percentage fat mass. The calculated daily intake was significantly different from the estimated daily intake of total DDTs and was well below the guideline proposed by WHO. DDT levels in WA have also decreased significantly, by 42-fold, since the 1970s and are currently the lowest in Australia. Copyright © 2016 Elsevier Ltd. All rights reserved.
Mendoza, Jason A; Watson, Kathy; Chen, Tzu-An; Baranowski, Tom; Nicklas, Theresa A; Uscanga, Doris K; Hanfling, Marcus J
2012-01-01
Walking school buses (WSB) increased children's physical activity, but impact on pedestrian safety behaviors (PSB) is unknown. We tested the feasibility of a protocol evaluating changes to PSB during a WSB program. Outcomes were school-level street crossing PSB prior to (Time 1) and during weeks 4-5 (Time 2) of the WSB. The protocol collected 1252 observations at Time 1 and 2548 at Time 2. Mixed model analyses yielded: intervention schoolchildren had 5-fold higher odds (p<0.01) of crossing at the corner/crosswalk but 5-fold lower odds (p<0.01) of stopping at the curb. The protocol appears feasible for documenting changes to school-level PSB. Copyright © 2011 Elsevier Ltd. All rights reserved.
Analysis of a crossed Bragg-cell acousto optical spectrometer for SETI
NASA Technical Reports Server (NTRS)
Gulkis, S.
1986-01-01
The search for radio signals from extraterrestrial intelligent (SETI) beings requires the use of large instantaneous bandwidth (500 MHz) and high resolution (20 Hz) spectrometers. Digital systems with a high degree of modularity can be used to provide this capability, and this method has been widely discussed. Another technique for meeting the SETI requirement is to use a crossed Bragg-cell spectrometer as described by Psaltis and Casasent (1979). This technique makes use of the Folded Spectrum concept, introduced by Thomas (1966). The Folded Spectrum is a two-dimensional Fourier Transform of a raster scanned one-dimensional signal. It is directly related to the long one-dimensional spectrum of the original signal and is ideally suited for optical signal processing.
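The Folded Spectrum construction is easy to reproduce numerically: raster-scan the 1-D signal into a matrix and take its 2-D FFT; coarse frequency then reads along one axis and fine frequency along the other. A small NumPy sketch (the raster dimensions and tone frequency are illustrative; choosing the tone frequency as an exact multiple of the row count makes the 2-D peak land on a single bin):

```python
import numpy as np

# Raster-scan a 1-D signal into R rows of C samples, then take the 2-D FFT.
R, C = 8, 16          # illustrative raster dimensions; N = R * C total samples
N = R * C
m = 3                 # tone at f = m * R lands on an exact 2-D bin
f = m * R
n = np.arange(N)
x = np.exp(2j * np.pi * f * n / N)   # complex tone at 1-D frequency bin f

folded = np.fft.fft2(x.reshape(R, C))   # the "folded spectrum"
peak = np.unravel_index(np.argmax(np.abs(folded)), folded.shape)
```

For this tone the folded spectrum is zero everywhere except a single bin of magnitude N, illustrating how a long 1-D spectrum maps onto a compact 2-D transform suited to a crossed Bragg-cell processor.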
Tasayco, M L; Fuchs, J; Yang, X M; Dyalram, D; Georgescu, R E
2000-09-05
The approach of comparing folding and folding/binding processes is exquisitely poised to narrow down the regions of the sequence that drive protein folding. We have dissected the small single alpha/beta domain of oxidized Escherichia coli thioredoxin (Trx) into three complementary fragments (N, residues 1-37; M, residues 38-73; and C, residues 74-108) to study them in isolation and upon recombination by far-UV CD and NMR spectroscopy. The isolated fragments show a minimum of ellipticity of ca. 197 nm in their far-UV CD spectra without concentration dependence, chemical shifts of H(alpha) that are close to the random coil values, and no medium- and long-range NOE connectivities in their three-dimensional NMR spectra. These fragments behave as disordered monomers. Only the far-UV CD spectra of binary or ternary mixtures that contain N- and C-fragments are different from the sum of their individual spectra, which is indicative of folding and/or binding of these fragments. Indeed, the cross-peaks corresponding to the rather hydrophobic beta(2) and beta(4) regions of the beta-sheet of Trx disappear from the (1)H-(15)N HSQC spectra of isolated labeled N- and C-fragments, respectively, upon addition of the unlabeled complementary fragments. The disappearing cross-peaks indicate interactions between the beta(2) and beta(4) regions, and their reappearance at lower temperatures indicates unfolding and/or dissociation of heteromers that are predominantly held by hydrophobic forces. Our results argue that the folding of Trx begins by zippering two discontiguous and rather hydrophobic chain segments (beta(2) and beta(4)) corresponding to neighboring strands of the native beta-sheet.
Li, Zhufang; Terry, Brian; Olds, William; Protack, Tricia; Deminie, Carol; Minassian, Beatrice; Nowicka-Sans, Beata; Sun, Yongnian; Dicker, Ira; Hwang, Carey; Lataillade, Max; Hanna, George J; Krystal, Mark
2013-11-01
BMS-986001 is a novel HIV nucleoside reverse transcriptase inhibitor (NRTI). To date, little is known about its resistance profile. In order to examine the cross-resistance profile of BMS-986001 to NRTI mutations, a replicating virus system was used to examine specific amino acid mutations known to confer resistance to various NRTIs. In addition, reverse transcriptases from 19 clinical isolates with various NRTI mutations were examined in the Monogram PhenoSense HIV assay. In the site-directed mutagenesis studies, a virus containing a K65R substitution exhibited a 0.4-fold change in 50% effective concentration (EC50) versus the wild type, while the majority of viruses with the Q151M constellation (without M184V) exhibited changes in EC50 versus wild type of 0.23- to 0.48-fold. Susceptibility to BMS-986001 was also maintained in an L74V-containing virus (0.7-fold change), while an M184V-only-containing virus induced a 2- to 3-fold decrease in susceptibility. Increasing numbers of thymidine analog mutation pattern 1 (TAM-1) pathway mutations correlated with decreases in susceptibility to BMS-986001, while viruses with TAM-2 pathway mutations exhibited a 5- to 8-fold decrease in susceptibility, regardless of the number of TAMs. A 22-fold decrease in susceptibility to BMS-986001 was observed in a site-directed mutant containing the T69 insertion complex. Common non-NRTI (NNRTI) mutations had little impact on susceptibility to BMS-986001. The results from the site-directed mutants correlated well with the more complicated genotypes found in NRTI-resistant clinical isolates. Data from clinical studies are needed to determine the clinically relevant resistance cutoff values for BMS-986001.
Mies, J.W.
1993-01-01
Remnant blocks of marble from the Moretti-Harrah dimension-stone quarry provide excellent exposure of meter-scale sheath folds. Tubular structures with elliptical cross-sections (4 ???Ryz ??? 5) are the most common expression of the folds. The tubes are elongate subparallel to stretching lineation and are defined by centimeter-scale layers of schist. Eccentrically nested elliptical patterns and opposing asymmetry of folds ('S' and 'Z') are consistent with the sheath-fold interpretation. Sheath folds are locally numerous in the Moretti-Harrah quarry but are not widely distributed in the Sylacauga Marble Group; reconnaissance in neighboring quarries provided no additional observations. The presence of sheath folds in part of the Talladega slate belt indicates a local history of plastic, non-coaxial deformation. Such a history of deformation is substantiated by petrographic study of an extracted hinge from the Moretti-Harrah quarry. The sheath folds are modeled as due to passive amplification of initial structures during simple shear, using both analytic geometry and graphic simulation. As indicated by these models, relatively large shear strains (y ??? 9) and longitudinal initial structures are required. The shear strain presumably relates to NW-directed displacement of overlying crystalline rocks during late Paleozoic orogeny. ?? 1993.
Guidelines To Validate Control of Cross-Contamination during Washing of Fresh-Cut Leafy Vegetables.
Gombas, D; Luo, Y; Brennan, J; Shergill, G; Petran, R; Walsh, R; Hau, H; Khurana, K; Zomorodi, B; Rosen, J; Varley, R; Deng, K
2017-02-01
The U.S. Food and Drug Administration requires food processors to implement and validate processes that will result in significantly minimizing or preventing the occurrence of hazards that are reasonably foreseeable in food production. During production of fresh-cut leafy vegetables, microbial contamination that may be present on the product can spread throughout the production batch when the product is washed, thus increasing the risk of illnesses. The use of antimicrobials in the wash water is a critical step in preventing such water-mediated cross-contamination; however, many factors can affect antimicrobial efficacy in the production of fresh-cut leafy vegetables, and the procedures for validating this key preventive control have not been articulated. Producers may consider three options for validating antimicrobial washing as a preventive control for cross-contamination. Option 1 involves the use of a surrogate for the microbial hazard and the demonstration that cross-contamination is prevented by the antimicrobial wash. Option 2 involves the use of antimicrobial sensors and the demonstration that a critical antimicrobial level is maintained during worst-case operating conditions. Option 3 validates the placement of the sensors in the processing equipment with the demonstration that a critical antimicrobial level is maintained at all locations, regardless of operating conditions. These validation options developed for fresh-cut leafy vegetables may serve as examples for validating processes that prevent cross-contamination during washing of other fresh produce commodities.
Computer-aided Assessment of Regional Abdominal Fat with Food Residue Removal in CT
Makrogiannis, Sokratis; Caturegli, Giorgio; Davatzikos, Christos; Ferrucci, Luigi
2014-01-01
Rationale and Objectives: Separate quantification of abdominal subcutaneous and visceral fat regions is essential to understand the role of regional adiposity as risk factor in epidemiological studies. Fat quantification is often based on computed tomography (CT) because fat density is distinct from other tissue densities in the abdomen. However, the presence of intestinal food residues with densities similar to fat may reduce fat quantification accuracy. We introduce an abdominal fat quantification method in CT with interest in food residue removal. Materials and Methods: Total fat was identified in the feature space of Hounsfield units and divided into subcutaneous and visceral components using model-based segmentation. Regions of food residues were identified and removed from visceral fat using a machine learning method integrating intensity, texture, and spatial information. Cost-weighting and bagging techniques were investigated to address class imbalance. Results: We validated our automated food residue removal technique against semimanual quantifications. Our feature selection experiments indicated that joint intensity and texture features produce the highest classification accuracy at 95%. We explored generalization capability using k-fold cross-validation and receiver operating characteristic (ROC) analysis with variable k. Losses in accuracy and area under ROC curve between maximum and minimum k were limited to 0.1% and 0.3%. We validated tissue segmentation against reference semimanual delineations. The Dice similarity scores were as high as 93.1 for subcutaneous fat and 85.6 for visceral fat. Conclusions: Computer-aided regional abdominal fat quantification is a reliable computational tool for large-scale epidemiological studies. Our proposed intestinal food residue reduction scheme is an original contribution of this work. Validation experiments indicate very good accuracy and generalization capability. PMID:24119354
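The variable-k generalization experiment relies on nothing more than partitioning the sample into k near-equal folds and re-evaluating the classifier on each held-out fold. A minimal, dependency-free sketch of such a splitter (fold boundaries here are contiguous for simplicity; a real experiment would shuffle the indices first):

```python
def k_fold_indices(n, k):
    """Partition sample indices 0..n-1 into k contiguous, near-equal folds.

    Yields (train, test) index lists; every index appears in exactly one
    test fold, and fold sizes differ by at most one when k does not divide n.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

Running the model over the folds produced by `k_fold_indices(n, k)` for several values of k, and comparing the resulting accuracy and ROC-AUC estimates, is exactly the kind of sensitivity check the abstract describes.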
A support vector machine for predicting defibrillation outcomes from waveform metrics.
Howe, Andrew; Escalona, Omar J; Di Maio, Rebecca; Massot, Bertrand; Cromie, Nick A; Darragh, Karen M; Adgey, Jennifer; McEneaney, David J
2014-03-01
Algorithms to predict shock success based on VF waveform metrics could significantly enhance resuscitation by optimising the timing of defibrillation. The aim was to investigate robust methods of predicting defibrillation success in VF cardiac arrest patients using a support vector machine (SVM) optimisation approach. Frequency-domain (AMSA, dominant frequency and median frequency) and time-domain (slope and RMS amplitude) VF waveform metrics were calculated in a 4.1 s window prior to defibrillation. Conventional prediction test validity was assessed for each waveform parameter, using AUC>0.6 as the criterion for inclusion as a corroborative attribute processed by the SVM classification model. The latter used a Gaussian radial-basis-function (RBF) kernel and the error penalty factor C was fixed to 1. A two-fold cross-validation resampling technique was employed. A total of 41 patients had 115 defibrillation instances. AMSA, slope and RMS waveform metrics passed test validation with AUC>0.6 for predicting termination of VF and return-to-organised rhythm. Predictive accuracy of the optimised SVM design for termination of VF was 81.9% (± 1.24 SD); positive and negative predictivity were respectively 84.3% (± 1.98 SD) and 77.4% (± 1.24 SD); sensitivity and specificity were 87.6% (± 2.69 SD) and 71.6% (± 9.38 SD) respectively. AMSA, slope and RMS were the best frequency- and time-domain VF waveform predictors of termination of VF according to test validity assessment. This a priori knowledge can be used in a simplified, optimised SVM design that combines the predictive attributes of these VF waveform metrics for improved prediction accuracy and generalisation performance without requiring the definition of any threshold value on waveform metrics. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
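The predictivity, sensitivity and specificity figures reported above are standard test-validity statistics derived from a 2x2 confusion matrix of predicted versus observed shock outcomes. A minimal sketch of that derivation (the outcome counts used in the usage example are hypothetical, not the study's data):

```python
def shock_prediction_metrics(tp, fp, tn, fn):
    """Standard test-validity metrics from a 2x2 confusion matrix of counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv":         tp / (tp + fp),   # positive predictivity
        "npv":         tn / (tn + fn),   # negative predictivity
    }
```

For example, 80 true positives, 10 false positives, 25 true negatives and 5 false negatives give an accuracy of 87.5% with sensitivity around 94% and specificity around 71%.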
Flood loss modelling with FLF-IT: a new flood loss function for Italian residential structures
NASA Astrophysics Data System (ADS)
Hasanzadeh Nafari, Roozbeh; Amadio, Mattia; Ngo, Tuan; Mysiak, Jaroslav
2017-07-01
The damage triggered by different flood events costs the Italian economy millions of euros each year. This cost is likely to increase in the future due to climate variability and economic development. In order to avoid or reduce such significant financial losses, risk management requires tools which can provide a reliable estimate of potential flood impacts across the country. Flood loss functions are an internationally accepted method for estimating physical flood damage in urban areas. In this study, we derived a new flood loss function for Italian residential structures (FLF-IT), on the basis of empirical damage data collected from a recent flood event in the region of Emilia-Romagna. The function was developed based on a new Australian approach (FLFA), which represents the confidence limits that exist around the parameterized functional depth-damage relationship. After model calibration, the performance of the model was validated for the prediction of loss ratios and absolute damage values. It was also contrasted with an uncalibrated relative model with frequent usage in Europe. In this regard, a three-fold cross-validation procedure was carried out over the empirical sample to measure the range of uncertainty from the actual damage data. The predictive capability has also been studied for some sub-classes of water depth. The validation procedure shows that the newly derived function performs well (no bias and only 10 % mean absolute error), especially when the water depth is high. Results of these validation tests illustrate the importance of model calibration. The advantages of the FLF-IT model over other Italian models include calibration with empirical data, consideration of the epistemic uncertainty of data, and the ability to change parameters based on building practices across Italy.
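The "no bias and only 10 % mean absolute error" validation figures come from comparing predicted and observed loss ratios across the cross-validation folds. A minimal sketch of those two error measures (the predicted/observed values in the usage example are illustrative, not the study's data):

```python
def bias_and_mae(predicted, observed):
    """Mean error (bias) and mean absolute error between model predictions
    and empirical observations, e.g. flood loss ratios per building."""
    errors = [p - o for p, o in zip(predicted, observed)]
    n = len(errors)
    bias = sum(errors) / n                     # signed: over/under-prediction
    mae = sum(abs(e) for e in errors) / n      # unsigned average miss
    return bias, mae
```

A model can have zero bias (over- and under-predictions cancel) while still carrying a nonzero MAE, which is why the abstract reports both.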
Qureshi, Abid; Tandon, Himani; Kumar, Manoj
2015-11-01
Peptide-based antiviral therapeutics have gradually paved their way into mainstream drug discovery research. Experimental determination of peptides' antiviral activity, as expressed by their IC50 values, involves considerable effort. Therefore, we have developed "AVP-IC50 Pred," a regression-based algorithm to predict antiviral activity in terms of IC50 values (μM). A total of 759 non-redundant peptides from AVPdb and HIPdb were divided into a training/test set having 683 peptides (T(683)) and a validation set with 76 independent peptides (V(76)) for evaluation. We utilized important peptide sequence features like amino-acid compositions, binary profile of N8-C8 residues, physicochemical properties and their hybrids. Four different machine learning techniques (MLTs), namely Support vector machine, Random Forest, Instance-based classifier, and K-Star, were employed. During 10-fold cross validation, we achieved maximum Pearson correlation coefficients (PCCs) of 0.66, 0.64, 0.56, 0.55, respectively, for the above MLTs using the best combination of feature sets. All the predictive models also performed well on the independent validation dataset and achieved maximum PCCs of 0.74, 0.68, 0.59, 0.57, respectively, on the best combination of feature sets. The AVP-IC50 Pred web server is anticipated to assist the researchers working on antiviral therapeutics by enabling them to computationally screen many compounds and focus experimental validation on the most promising set of peptides, thus reducing cost and time efforts. The server is available at http://crdd.osdd.net/servers/ic50avp. © 2015 Wiley Periodicals, Inc.
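The PCC figures above measure agreement between predicted and experimental IC50 values. A dependency-free sketch of the Pearson correlation coefficient itself:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Perfectly linear agreement gives r = 1 (or -1 for an inverted relationship); the 0.55-0.74 range reported above indicates moderate-to-good linear agreement between predicted and measured activities.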
Maamary, Joel A; Cole, Ian; Darveniza, Paul; Pemberton, Cecilia; Brake, Helen Mary; Tisch, Stephen
2017-09-01
The objective of this study was to better define the relationship of laryngeal electromyography and video laryngostroboscopy in the diagnosis of vocal fold paralysis. This was a retrospective diagnostic cohort study with cross-sectional data analysis. Data were obtained from 57 patients with unilateral vocal fold paralysis who attended a large tertiary voice referral center. Electromyographic findings were classified according to recurrent laryngeal nerve, superior laryngeal nerve, and high vagal/combined lesions. Video laryngostroboscopy recordings were classified according to the position of the immobile fold into median, paramedian, lateral, and a foreshortened/hooded vocal fold. The position of the paralyzed vocal fold was then analyzed according to the lesion as determined by electromyography. The recurrent laryngeal nerve was affected in the majority of cases with left-sided lesions more common than right. Vocal fold position differed between recurrent laryngeal and combined vagal lesions. Recurrent laryngeal nerve lesions were more commonly associated with a laterally displaced immobile fold. No fold position was suggestive of a combined vagal lesion. The inter-rater reliability for determining fold position was high. Laryngeal electromyography is useful in diagnosing neuromuscular dysfunction of the larynx and best practice recommends its continued implementation along with laryngostroboscopy. While recurrent laryngeal nerve lesions are more likely to present with a lateral vocal fold, this does not occur in all cases. Such findings indicate that further unknown mechanisms contribute to fold position in unilateral paralysis. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Pohn, Howard A.
2000-01-01
Lateral ramps are zones where decollements change stratigraphic level along strike; they differ from frontal ramps, which are zones where decollements change stratigraphic level perpendicular to strike. In the Appalachian Mountains, the surface criteria for recognizing the subsurface presence of lateral ramps include (1) an abrupt change in wavelength or a termination of folds along strike, (2) a conspicuous change in the frequency of mapped faults or disturbed zones (extremely disrupted duplexes) at the surface, (3) long, straight river trends emerging onto the coastal plain or into the Appalachian Plateaus province, (4) major geomorphic discontinuities in the trend of the Blue Ridge province, (5) interruption of Mesozoic basins by cross-strike border faults, and (6) zones of modern and probable ancient seismic activity. Additional features related to lateral ramps include tectonic windows, cross-strike igneous intrusions, areas of giant landslides, and abrupt changes in Paleozoic sedimentation along strike. Proprietary strike-line seismic-reflection profiles cross three of the lateral ramps that were identified by using the surface criteria. The profiles confirm their presence and show their detailed nature in the subsurface. Like frontal ramps, lateral ramps are one of two possible consequences of fold-and-thrust-belt tectonics and are common elements in the Appalachian fold-and-thrust belt. A survey of other thrust belts in the United States and elsewhere strongly suggests that lateral ramps at depth can be identified by their surface effects. Lateral ramps probably are the result of thrust sheet motion caused by continued activation of ancient cratonic fracture systems. Such fractures localized the transform faults along which the continental segments adjusted during episodes of sea-floor spreading.
Khan, Hafiz Azhar Ali; Akram, Waseem; Iqbal, Javaid; Naeem-Ullah, Unsar
2015-01-01
The house fly, Musca domestica L., is an important ectoparasite with the ability to develop resistance to the insecticides used for its control. Thiamethoxam, a neonicotinoid, is a relatively new insecticide used effectively against house flies, with few reports of resistance around the globe. To understand the status of resistance to thiamethoxam, eight adult house fly strains were evaluated under laboratory conditions. In addition, to assess the risks of resistance development, cross-resistance potential and possible biochemical mechanisms, a field strain of house flies was selected with thiamethoxam in the laboratory. The results revealed that the field strains showed varying levels of resistance to thiamethoxam, with resistance ratios (RR) at LC50 ranging from 7.66- to 20.13-fold. Continuous selection of the field strain (Thia-SEL) for five generations increased the RR from the initial 7.66-fold to 33.59-fold. However, resistance declined significantly when the Thia-SEL strain was reared for the next five generations without exposure to thiamethoxam. Compared to the laboratory susceptible reference strain (Lab-susceptible), the Thia-SEL strain showed cross-resistance to imidacloprid. Synergism tests revealed that S,S,S-tributylphosphorotrithioate (DEF) and piperonyl butoxide (PBO) synergized the effects of thiamethoxam in the Thia-SEL strain (2.94- and 5.00-fold, respectively). In addition, biochemical analyses revealed that the activities of carboxylesterase (CarE) and mixed-function oxidase (MFO) in the Thia-SEL strain were significantly higher than in the Lab-susceptible strain. It seems that metabolic detoxification by CarE and MFO was a major mechanism of thiamethoxam resistance in the Thia-SEL strain of house flies. The results could be helpful in the future to develop an improved control strategy against house flies.
Wang, Xingliang; Wu, Shuwen; Gao, Weiyue; Wu, Yidong
2016-02-01
A field-collected strain (HF) of Plutella xylostella (L.) showed 420-fold resistance to fipronil compared with a susceptible laboratory strain (Roth). The HF-R strain, derived from the HF strain by 25 generations of successive selection with fipronil in the laboratory, developed 2,200-fold resistance to fipronil relative to the Roth strain. The F(1) progeny of the reciprocal crosses between HF-R and Roth showed 640-fold (R♀ × S♂) and 1,380-fold (R♂ × S♀) resistance to fipronil, indicating resistance is inherited as an incompletely dominant trait. Analysis of progeny from a backcross (F1♂ × S♀) suggests that resistance is controlled by one major locus. The LC(50) of the R♂ × S♀ cross F(1) progeny is slightly but significantly higher than that of the R♀ × S♂ cross F(1) progeny, suggesting a minor resistance gene on the Z chromosome. Sequence analysis of PxGABARα1 (an Rdl-homologous GABA receptor gene of P. xylostella) from the HF-R strain identified two mutations A282S and A282G (corresponding to the A302S mutation of the Drosophila melanogaster Rdl gene), which have been previously implicated in fipronil resistance in several insect species including P. xylostella. PxGABARα1 was previously mapped to the Z chromosome of P. xylostella. In conclusion, fipronil resistance in the HF-R strain of P. xylostella was incompletely dominant, and controlled by a major autosomal locus and a sex-linked minor gene (PxGABARα1) on the Z chromosome. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Jackman, A. L.; Kelland, L. R.; Kimbell, R.; Brown, M.; Gibson, W.; Aherne, G. W.; Hardcastle, A.; Boyle, F. T.
1995-01-01
Four cell lines, the mouse L1210 leukaemia, the human W1L2 lymphoblastoid and two human ovarian (CH1 and 41M) cell lines, were made resistant to ZD1694 (Tomudex) by continual exposure to incremental doses of the drug. A 500-fold increase in thymidylate synthase (TS) activity is the primary mechanism of resistance to ZD1694 in the W1L2:RD1694 cell line, which is consequently highly cross-resistant to other folate-based TS inhibitors, including BW1843U89, LY231514 and AG337, but sensitive to antifolates with other enzyme targets. The CH1:RD1694 cell line is 14-fold resistant to ZD1694, largely accounted for by the 4.2-fold increase in TS activity. Cross-resistance was observed to other TS inhibitors, including 5-fluorodeoxyuridine (FdUrd). 41M:RD1694 cells, when exposed to 0.1 microM [3H]ZD1694, accumulated approximately 20-fold less 3H-labelled material over 24 h than the parental line. Data are consistent with this being the result of impaired transport of the drug via the reduced folate/methotrexate carrier. Resistance was therefore observed to methotrexate but not to CB3717, a compound known to use this transport mechanism poorly. The mouse L1210:RD1694 cell line does not accumulate ZD1694 or Methotrexate (MTX) polyglutamates. Folylpolyglutamate synthetase substrate activity (using ZD1694 as the substrate) was decreased to approximately 13% of that observed in the parental line. Cross-resistance was found to those compounds known to be active through polyglutamation. PMID:7537518
Cole, Adam J.; David, Allan E.; Wang, Jianxin; Galbán, Craig J.; Hill, Hannah L.; Yang, Victor C.
2010-01-01
While successful magnetic tumor targeting of iron oxide nanoparticles has been achieved in a number of models, the rapid blood clearance of magnetically suitable particles by the reticuloendothelial system (RES) limits their availability for targeting. This work aimed to develop a long-circulating magnetic iron oxide nanoparticle (MNP) platform capable of sustained tumor exposure via the circulation and, thus, enhanced magnetic tumor targeting. Aminated, cross-linked starch (DN) and aminosilane (A) coated MNPs were successfully modified with 5 kDa (A5, D5) or 20 kDa (A20, D20) polyethylene glycol (PEG) chains using simple N-Hydroxysuccinimide (NHS) chemistry and characterized. Identical PEG-weight analogues between platforms (A5 & D5, A20 & D20) were similar in size (140–190 nm) and relative PEG labeling (1.5% of surface amines – A5/D5, 0.4% – A20/D20), with all PEG-MNPs possessing magnetization properties suitable for magnetic targeting. Candidate PEG-MNPs were studied in RES simulations in vitro to predict long-circulating character. D5 and D20 performed best, showing sustained size stability in cell culture medium at 37°C and 7 (D20) to 10 (D5) fold less uptake in RAW264.7 macrophages when compared to previously targeted, unmodified starch MNPs (D). Observations in vitro were validated in vivo, with D5 (7.29 hr) and D20 (11.75 hr) showing much longer half-lives than D (0.12 hr). Improved plasma stability enhanced tumor MNP exposure 100 (D5) to 150 (D20) fold as measured by plasma AUC0–∞. Sustained tumor exposure over 24 hours was visually confirmed in a 9L-glioma rat model (12 mg Fe/kg) using magnetic resonance imaging (MRI). Findings indicate that both D5 and D20 are promising MNP platforms for enhanced magnetic tumor targeting, warranting further study in tumor models. PMID:21176955
Reliable Digit Span: A Systematic Review and Cross-Validation Study
ERIC Educational Resources Information Center
Schroeder, Ryan W.; Twumasi-Ankrah, Philip; Baade, Lyle E.; Marshall, Paul S.
2012-01-01
Reliable Digit Span (RDS) is a heavily researched symptom validity test with a recent literature review yielding more than 20 studies ranging in dates from 1994 to 2011. Unfortunately, limitations within some of the research minimize clinical generalizability. This systematic review and cross-validation study was conducted to address these…
ERIC Educational Resources Information Center
Tseng, Chia-Ti Heather
2017-01-01
This study aims to investigate EFL learners' perspectives on the effectiveness of content-based instruction in a cross-cultural communication course. The main objectives of this study are threefold: (1) to examine students' perspectives regarding the effectiveness of content learning; (2) to examine students' perspectives regarding the…
Origami: Delineating Cosmic Structures with Phase-Space Folds
NASA Astrophysics Data System (ADS)
Neyrinck, Mark C.; Falck, Bridget L.; Szalay, Alex S.
2015-01-01
Structures like galaxies and filaments of galaxies in the Universe come about from the origami-like folding of an initially flat three-dimensional manifold in 6D phase space. The ORIGAMI method identifies these structures in a cosmological simulation, delineating the structures according to their outer folds. Structure identification is a crucial step in comparing cosmological simulations to observed maps of the Universe. The ORIGAMI definition is objective, dynamical and geometric: filament, wall and void particles are classified according to the number of orthogonal axes along which dark-matter streams have crossed. Here, we briefly review these ideas, and speculate on how ORIGAMI might be useful to find cosmic voids.
NASA Technical Reports Server (NTRS)
Berndt, Emily; Zavodsky, Bradley; Jedlovec, Gary; Elmer, Nicholas
2014-01-01
Tropopause folds are identified by warm, dry, high-potential vorticity, ozone-rich air and are one explanation for damaging non-convective wind events. Could improved model representation of stratospheric air and associated tropopause folding improve non-convective wind forecasts and high wind warnings? The goal of this study is to assess the impact of assimilating Hyperspectral Infrared (IR) profiles on forecasting stratospheric air, tropopause folds, and associated non-convective winds: (1) AIRS: Atmospheric Infrared Sounder (2) IASI: Infrared Atmospheric Sounding Interferometer (3) CrIMSS: Cross-track Infrared and Microwave Sounding Suite
Seebohm, B; Matinmehr, F; Köhler, J; Francino, A; Navarro-Lopéz, F; Perrot, A; Ozcelik, C; McKenna, W J; Brenner, B; Kraft, T
2009-08-05
The ability of myosin to generate motile forces is based on elastic distortion of a structural element of the actomyosin complex (cross-bridge) that allows strain to develop before filament sliding. Addressing the question of which part of the actomyosin complex experiences the main elastic distortion, we suggested previously that the converter domain might be the most compliant region of the myosin head domain. Here we test this proposal by studying functional effects of naturally occurring missense mutations in the beta-myosin heavy chain, 723Arg → Gly (R723G) and 736Ile → Thr (I736T), in comparison to 719Arg → Trp (R719W). All three mutations are associated with hypertrophic cardiomyopathy and are located in the converter region of the myosin head domain. We determined several mechanical parameters of single skinned slow fibers isolated from Musculus soleus biopsies of hypertrophic cardiomyopathy patients and healthy controls. Major findings of this study for mutation R723G were i) a >40% increase in fiber stiffness in rigor with a 2.9-fold increase in stiffness per myosin head (S*rigor,R723G = 0.84 pN/nm vs. S*rigor,WT = 0.29 pN/nm); and ii) a significant increase in force per head (F*10°C: 1.99 pN vs. 1.49 pN, a 1.3-fold increase; F*20°C: 2.56 pN vs. 1.92 pN, a 1.3-fold increase) as well as in stiffness per head during isometric steady-state contraction (S*active,10°C: 0.52 pN/nm vs. 0.28 pN/nm, a 1.9-fold increase). Similar changes were found for mutation R719W (2.6-fold increase in S*rigor; 1.8-fold increase in F*10°C and 1.6-fold in F*20°C; twofold increase in S*active,10°C). Changes in active cross-bridge cycling kinetics could not account for the increase in force and active stiffness. For the above estimates the previously determined fraction of mutated myosin in the biopsies was taken into account.
Data for wild-type myosin of slow soleus muscle fibers support previous findings that for the slow myosin isoform S* and F* are significantly lower than for fast myosin, e.g., of rabbit psoas muscle. The data indicate that two mutations, R723G and R719W, are associated with an increase in resistance to elastic distortion of the individual mutated myosin heads, whereas mutation I736T has essentially no effect. The data strongly support the notion that major elastic distortion occurs within the converter itself. Apparently, the compliance depends on specific residues, e.g., R719 and R723, presumably located at strategic positions near the long alpha-helix of the light-chain binding domain. Because amino acids 719 and 723 are nonconserved residues, cross-bridge stiffness may well be specifically tuned for different myosins.
Wang, Xu; Xiong, Youling L; Sato, Hiroaki
2017-09-27
Porcine myofibrillar protein (MP) was modified with a glucose oxidase (GluOx)-iron system that produces hydroxyl radicals and then subjected to microbial transglutaminase (TGase) cross-linking in 0.6 M NaCl at 4 °C. The resulting aggregation and gel formation of MP were examined. The GluOx-mediated oxidation promoted the formation of both soluble and insoluble protein aggregates via disulfide bonds and occlusions of hydrophobic groups. The subsequent TGase treatment converted protein aggregates into highly cross-linked polymers. MP-lipid emulsion composite gels formed with such polymers exhibited markedly enhanced gelling capacity: up to 4.4-fold increases in gel firmness and 3.5-fold increases in gel elasticity over nontreated protein. Microstructural examination showed small oil droplets dispersed in a densely packed gel matrix when MP was oxidatively modified, and the TGase treatment further contributed to such packing. The combined GluOx oxidation/TGase treatment shows promise for improving the textural properties of emulsified meat products.
Ogunyemi, Omolola; Teklehaimanot, Senait; Patty, Lauren; Moran, Erin; George, Sheba
2013-01-01
Screening guidelines for diabetic patients recommend yearly eye examinations to detect diabetic retinopathy and other forms of diabetic eye disease. However, annual screening rates for retinopathy in US urban safety net settings remain low. Using data gathered from a study of teleretinal screening in six urban safety net clinics, we assessed whether predictive modeling could be of value in identifying patients at risk of developing retinopathy. We developed and examined the accuracy of two predictive modeling approaches for diabetic retinopathy in a sample of 513 diabetic individuals, using routinely available clinical variables from retrospective medical record reviews. Bayesian networks and radial basis function (neural) networks were learned using ten-fold cross-validation. The predictive models were modestly predictive, with the best model having an AUC of 0.71. Using routinely available clinical variables to predict patients at risk of developing retinopathy and to target them for annual eye screenings may be of some usefulness to safety net clinics. PMID:23920536
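The evaluation loop reported in this record — train on all but one fold, score the held-out fold, and summarize discrimination by the area under the ROC curve — can be sketched in plain Python. This is a generic illustration, not the authors' code; `kfold_indices` and the rank-based AUC below are standard constructions.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic:
    the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a ten-fold run, `kfold_indices(len(data), 10)` defines the held-out sets, and the per-fold scores are pooled before calling `auc`.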
Alalwiat, Ahlam; Tang, Wen; Gerişlioğlu, Selim; Becker, Matthew L; Wesdemiotis, Chrys
2017-01-17
The bioconjugate BMP2-(PEO-HA)2, composed of a dendron with two monodisperse poly(ethylene oxide) (PEO) branches terminated by a hydroxyapatite-binding peptide (HA), and a focal point substituted with a bone growth stimulating peptide (BMP2), has been comprehensively characterized by mass spectrometry (MS) methods, encompassing matrix-assisted laser desorption ionization (MALDI), electrospray ionization (ESI), tandem mass spectrometry (MS2), and ion mobility mass spectrometry (IM-MS). MS2 experiments using different ion activation techniques validated the sequences of the synthetic, bioactive peptides HA and BMP2, which contained highly basic amino acid residues either at the N-terminus (BMP2) or C-terminus (HA). Application of MALDI-MS, ESI-MS, and IM-MS to the polymer-peptide biomaterial confirmed its composition. Collision cross-section measurements and molecular modeling indicated that BMP2-(PEO-HA)2 exists in several folded and extended conformations, depending on the degree of protonation. Protonation of all basic sites of the hybrid material nearly doubles its conformational space and accessible surface area.
NASA Astrophysics Data System (ADS)
Mahvash Mohammadi, Neda; Hezarkhani, Ardeshir
2018-07-01
Classification of mineralised zones is an important factor in the analysis of economic deposits. In this paper, the support vector machine (SVM), a supervised learning algorithm, is proposed for classification of mineralised zones in the Takht-e-Gonbad porphyry Cu-deposit (SE Iran) based on subsurface data. The effects of the input features on the SVM performance are evaluated via calculated accuracy rates. Ultimately, the SVM model is developed with input features of lithology, alteration, mineralisation, and level, and with the radial basis function (RBF) as the kernel function. Moreover, the optimal values of the parameters λ and C, obtained using the n-fold cross-validation method, are 0.001 and 0.01, respectively. The accuracy of this model is 0.931 for classification of mineralised zones in the Takht-e-Gonbad porphyry deposit. The results of the study confirm the efficiency of the SVM method for classification of the mineralised zones.
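The tuning procedure described here — score each candidate kernel parameter by n-fold cross-validated accuracy and keep the best — can be sketched without an SVM library. The Parzen-style kernel classifier below is a deliberately simple stand-in for the SVM (a real replication would use an SVM solver); the function names and the toy data in the usage are our own.

```python
import math
import random

def rbf(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_predict(train_X, train_y, x, gamma):
    """Parzen-style rule: pick the class with the largest mean kernel similarity."""
    score = {c: sum(rbf(x, t, gamma) for t, yt in zip(train_X, train_y) if yt == c)
                / train_y.count(c)
             for c in set(train_y)}
    return max(score, key=score.get)

def cv_accuracy(X, y, gamma, k=3, seed=0):
    """k-fold cross-validated accuracy for one kernel-parameter setting."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    hits = 0
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        hits += sum(kernel_predict(tX, ty, X[i], gamma) == y[i] for i in test)
    return hits / len(X)

def pick_gamma(X, y, gammas):
    """Grid search: keep the parameter with the best cross-validated accuracy."""
    return max(gammas, key=lambda g: cv_accuracy(X, y, g))
```

The same `cv_accuracy` loop extends to a two-parameter grid (e.g., C and a kernel width) by taking the max over parameter pairs.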
Automatic Identification of Critical Follow-Up Recommendation Sentences in Radiology Reports
Yetisgen-Yildiz, Meliha; Gunn, Martin L.; Xia, Fei; Payne, Thomas H.
2011-01-01
Communication of follow-up recommendations when abnormalities are identified on imaging studies is prone to error. When recommendations are not systematically identified and promptly communicated to referrers, poor patient outcomes can result. Using information technology can improve communication and improve patient safety. In this paper, we describe a text processing approach that uses natural language processing (NLP) and supervised text classification methods to automatically identify critical recommendation sentences in radiology reports. To increase the classification performance we enhanced the simple unigram token representation approach with lexical, semantic, knowledge-base, and structural features. We tested different combinations of those features with the Maximum Entropy (MaxEnt) classification algorithm. Classifiers were trained and tested with a gold standard corpus annotated by a domain expert. We applied 5-fold cross validation and our best performing classifier achieved 95.60% precision, 79.82% recall, 87.0% F-score, and 99.59% classification accuracy in identifying the critical recommendation sentences in radiology reports. PMID:22195225
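The headline numbers in this record (precision, recall, F-score) are simple functions of the confusion-matrix counts; a minimal reference implementation — ours, not the authors' — looks like this:

```python
def precision_recall_f1(gold, pred, positive=1):
    """Precision, recall, and F1 for the positive class from paired labels."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In a 5-fold setup like the one above, the counts are accumulated across folds before the final division.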
Automated Non-Alphanumeric Symbol Resolution in Clinical Texts
Moon, SungRim; Pakhomov, Serguei; Ryan, James; Melton, Genevieve B.
2011-01-01
Although clinical texts contain many symbols, relatively little attention has been given to symbol resolution by medical natural language processing (NLP) researchers. Interpreting the meaning of symbols may be viewed as a special case of Word Sense Disambiguation (WSD). One thousand instances of four common non-alphanumeric symbols (‘+’, ‘–’, ‘/’, and ‘#’) were randomly extracted from a clinical document repository and annotated by experts. The symbols and their surrounding context, bag-of-words (BoW) features, and heuristic rules were evaluated as features for the following classifiers: Naïve Bayes, Support Vector Machine, and Decision Tree, using 10-fold cross-validation. Accuracies for ‘+’, ‘–’, ‘/’, and ‘#’ were 80.11%, 80.22%, 90.44%, and 95.00%, respectively, with Naïve Bayes. While symbol context contributed the most, BoW was also helpful for disambiguation of some symbols. Symbol disambiguation with supervised techniques can be implemented with reasonable accuracy as a module for medical NLP systems. PMID:22195157
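A multinomial Naïve Bayes over context words — the best-performing of the classifiers compared in this record — fits in a few lines. This is a generic sketch; the toy sense labels and training contexts in the usage are invented for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing over context-word features."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: Counter() for c in classes}
    vocab = set()
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
        vocab.update(doc)
    return prior, word_counts, vocab, alpha

def predict_nb(model, doc):
    """Pick the class maximizing log prior + sum of smoothed log likelihoods."""
    prior, word_counts, vocab, alpha = model
    def logpost(c):
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        return math.log(prior[c]) + sum(
            math.log((word_counts[c][w] + alpha) / total) for w in doc)
    return max(prior, key=logpost)
```

For symbol disambiguation, `doc` would be the window of tokens around an occurrence of ‘+’ or ‘#’, and the label its annotated sense.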
NASA Astrophysics Data System (ADS)
Shiraishi, Yuhki; Takeda, Fumiaki
In this research, we have developed a sorting system for fishes, which comprises a conveyance part, an image-capturing part, and a sorting part. In the conveyance part, we have developed an independent conveyance system in order to separate one fish from an intertwined group of fishes. After the image of the separated fish is captured in the capturing part, a rotation-invariant feature is extracted using the two-dimensional fast Fourier transform: the mean value of the power spectrum over coefficients at the same distance from the origin in the spectral domain. The fishes are then classified by three-layered feed-forward neural networks. The experimental results show that the developed system classifies three kinds of fishes captured at various angles with a classification ratio of 98.95% for 1044 captured images of five fishes. Further experiments show a classification ratio of 90.7% for 300 fishes using the 10-fold cross-validation method.
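The rotation-invariant feature described here — averaging FFT power over coefficients equidistant from the origin — can be sketched with NumPy. This is our reconstruction, not the authors' code; it assumes a square image, and folding each frequency index to min(k, N−k) makes the radial bins exactly invariant under 90° rotations.

```python
import numpy as np

def radial_power_feature(img):
    """Mean FFT power at each integer radius from the zero-frequency origin.

    Assumes a square image. The symmetric index min(k, N-k) measures each
    coefficient's distance from DC, so the feature is rotation-invariant.
    """
    n = img.shape[0]
    power = np.abs(np.fft.fft2(img)) ** 2
    k = np.minimum(np.arange(n), n - np.arange(n))   # symmetric frequency index
    r = np.rint(np.sqrt(k[:, None] ** 2 + k[None, :] ** 2)).astype(int)
    feat = np.zeros(r.max() + 1)
    counts = np.zeros(r.max() + 1)
    np.add.at(feat, r, power)    # sum power into radial bins
    np.add.at(counts, r, 1)      # count coefficients per bin
    return feat / counts
```

The resulting vector would then feed a classifier such as the feed-forward network mentioned above.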
Sarker, Hillol; Sharmin, Moushumi; Ali, Amin Ahsan; Rahman, Md. Mahbubur; Bari, Rummana; Hossain, Syed Monowar; Kumar, Santosh
2015-01-01
Wearable wireless sensors for health monitoring are enabling the design and delivery of just-in-time interventions (JITI). Critical to the success of JITI is to time its delivery so that the user is available to be engaged. We take a first step in modeling users’ availability by analyzing 2,064 hours of physiological sensor data and 2,717 self-reports collected from 30 participants in a week-long field study. We use delay in responding to a prompt to objectively measure availability. We compute 99 features and identify 30 as most discriminating to train a machine learning model for predicting availability. We find that location, affect, activity type, stress, time, and day of the week, play significant roles in predicting availability. We find that users are least available at work and during driving, and most available when walking outside. Our model finally achieves an accuracy of 74.7% in 10-fold cross-validation and 77.9% with leave-one-subject-out. PMID:25798455
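The leave-one-subject-out protocol reported above holds out all data from one participant at a time, so the model is always tested on an unseen person (avoiding the within-subject leakage that plain 10-fold cross-validation allows). A minimal splitter — our illustration, not the authors' code — is:

```python
def leave_one_subject_out(subjects):
    """Yield (held_out, train_idx, test_idx), holding out one subject at a time."""
    for held_out in sorted(set(subjects)):
        test = [i for i, s in enumerate(subjects) if s == held_out]
        train = [i for i, s in enumerate(subjects) if s != held_out]
        yield held_out, train, test
```

With 30 participants, this yields 30 train/test splits; the reported 77.9% accuracy would be the average over such splits.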
A Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using Significant Features
Amudha, P.; Karthik, S.; Sivakumari, S.
2015-01-01
Intrusion detection has become a main part of network security due to the huge number of attacks that affect computers, a consequence of the extensive growth of internet connectivity and accessibility to information systems worldwide. To deal with this problem, this paper proposes a hybrid algorithm that integrates a Modified Artificial Bee Colony (MABC) with Enhanced Particle Swarm Optimization (EPSO) for the intrusion detection problem. The algorithms are combined to find better optimization results, and the classification accuracies are obtained by the 10-fold cross-validation method. The purpose of this paper is to select the most relevant features that can represent the pattern of the network traffic and to test their effect on the success of the proposed hybrid classification algorithm. To investigate the performance of the proposed method, the intrusion detection KDDCup'99 benchmark dataset from the UCI Machine Learning repository is used. The performance of the proposed method is compared with that of other machine learning algorithms and found to be significantly different. PMID:26221625
Chen, Yukun; Wrenn, Jesse; Xu, Hua; Spickard, Anderson; Habermann, Ralf; Powers, James; Denny, Joshua C
2014-01-01
Competence is essential for health care professionals. Current methods to assess competency, however, do not efficiently capture medical students' experience. In this preliminary study, we used machine learning and natural language processing (NLP) to identify geriatric competency exposures from students' clinical notes. The system applied NLP to generate the concepts and related features from notes. We extracted a refined list of concepts associated with corresponding competencies. This system was evaluated through 10-fold cross validation for six geriatric competency domains: "medication management (MedMgmt)", "cognitive and behavioral disorders (CBD)", "falls, balance, gait disorders (Falls)", "self-care capacity (SCC)", "palliative care (PC)", and "hospital care for elders (HCE)" - each an Association of American Medical Colleges competency for medical students. The system could accurately assess MedMgmt, SCC, HCE, and Falls competencies with F-measures of 0.94, 0.86, 0.85, and 0.84, respectively, but did not attain good performance for PC and CBD (0.69 and 0.62 in F-measure, respectively).
Mahalingam, Rajasekaran; Peng, Hung-Pin; Yang, An-Suei
2014-08-01
Protein-fatty acid interaction is vital for many cellular processes, and understanding this interaction is important for functional annotation as well as drug discovery. In this work, we present a method for predicting fatty acid (FA)-binding residues by using three-dimensional probability density distributions of interacting atoms of FAs on protein surfaces, which are derived from the known protein-FA complex structures. A machine learning algorithm was established to learn the characteristic patterns of the probability density maps specific to the FA-binding sites. The predictor was trained with five-fold cross-validation on a non-redundant training set and then evaluated with an independent test set as well as on a holo-apo pairs dataset. The results showed good accuracy in predicting the FA-binding residues. Further, the predictor developed in this study is implemented as an online server which is freely accessible at the following website, http://ismblab.genomics.sinica.edu.tw/. Copyright © 2014 Elsevier B.V. All rights reserved.
Huynh, Benjamin Q; Li, Hui; Giger, Maryellen L
2016-07-01
Convolutional neural networks (CNNs) show potential for computer-aided diagnosis (CADx) by learning features directly from the image data instead of using analytically extracted features. However, CNNs are difficult to train from scratch for medical images due to small sample sizes and variations in tumor presentations. Instead, transfer learning can be used to extract tumor information from medical images via CNNs originally pretrained for nonmedical tasks, alleviating the need for large datasets. Our database includes 219 breast lesions (607 full-field digital mammographic images). We compared support vector machine classifiers based on the CNN-extracted image features and our prior computer-extracted tumor features in the task of distinguishing between benign and malignant breast lesions. Five-fold cross validation (by lesion) was conducted with the area under the receiver operating characteristic (ROC) curve as the performance metric. Results show that classifiers based on CNN-extracted features (with transfer learning) perform comparably to those using analytically extracted features in terms of the area under the ROC curve.
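"Five-fold cross validation (by lesion)" means the folds are formed over lesions rather than images, so the several mammographic views of one lesion never straddle the train/test boundary. A minimal sketch of such a grouped split — our illustration, with hypothetical lesion IDs — is:

```python
def group_kfold(groups, k):
    """Assign whole groups (e.g., lesions) to folds so that all images of a
    lesion land in the same fold. Returns k lists of sample indices."""
    unique = sorted(set(groups))
    fold_of_group = {g: i % k for i, g in enumerate(unique)}  # round-robin
    folds = [[] for _ in range(k)]
    for idx, g in enumerate(groups):
        folds[fold_of_group[g]].append(idx)
    return folds
```

Each fold then serves once as the test set while the classifier (here, an SVM on CNN-extracted features) is trained on the rest.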
Maldonado, Ramon; Goodwin, Travis R; Harabagiu, Sanda M
2018-01-01
The automatic identification of relations between medical concepts in a large corpus of Electroencephalography (EEG) reports is an important step in the development of an EEG-specific patient cohort retrieval system as well as in the acquisition of EEG-specific knowledge from this corpus. EEG-specific relations involve medical concepts that are not typically mentioned in the same sentence or even the same section of a report, thus requiring extraction techniques that can handle such long-distance dependencies. To address this challenge, we present a novel framework which combines the advantages of a deep learning framework employing Dynamic Relational Memory (DRM) with active learning. While DRM enables the prediction of long-distance relations, active learning provides a mechanism for accurately identifying relations with minimal training data, obtaining a 5-fold cross-validation F1 score of 0.7475 on a set of 140 EEG reports selected with active learning. The results obtained with our novel framework show great promise.
Stability of halophilic proteins: from dipeptide attributes to discrimination classifier.
Zhang, Guangya; Huihua, Ge; Yi, Lin
2013-02-01
To investigate the molecular features responsible for protein halophilicity is of great significance for understanding the structural basis of protein halo-stability and would help to develop a practical strategy for designing halophilic proteins. In this work, we have systematically analyzed the dipeptide composition of halophilic and non-halophilic protein sequences. We observed that the halophilic proteins contained more DA, RA, AD, RR, AP, DD, PD, EA, VG and DV at the expense of LK, IL, II, IA, KK, IS, KA, GK, RK and AI. We identified some macromolecular signatures of halo-adaptation, and suggest that dipeptide composition may contain more information than amino acid composition. Based on the dipeptide composition, we have developed a machine learning method for classifying halophilic and non-halophilic proteins for the first time. The accuracy of our method was 100.0% on the training dataset and 93.1% under 10-fold cross-validation. We also discussed the influence of some specific dipeptides on prediction accuracy. Copyright © 2012 Elsevier B.V. All rights reserved.
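The dipeptide composition used here as the feature vector is just the frequency of each of the 20 × 20 = 400 ordered residue pairs among a sequence's overlapping pairs; a minimal implementation (ours, for illustration):

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 features

def dipeptide_composition(seq):
    """Fraction of each of the 400 dipeptides among the overlapping pairs of seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(dp) / len(pairs) for dp in DIPEPTIDES]
```

The resulting 400-dimensional vectors are what a classifier like the one above would be trained on.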
A Real-Time Earthquake Precursor Detection Technique Using TEC from a GPS Network
NASA Astrophysics Data System (ADS)
Alp Akyol, Ali; Arikan, Feza; Arikan, Orhan
2016-07-01
Anomalies have been observed in the ionospheric electron density distribution prior to strong earthquakes. However, most of the reported results are obtained by retrospective earthquake analysis, so their implementation in practice is highly problematic. Recently, a novel earthquake precursor detection technique based on spatio-temporal analysis of Total Electron Content (TEC) data obtained from the Turkish National Permanent GPS Network (TNPGN) was developed by the IONOLAB group (www.ionolab.org). In the present study, the developed detection technique is implemented in a causal setup over the available data set in a test phase, which enables real-time implementation. The performance of the developed earthquake prediction technique is evaluated by 10-fold cross-validation over the data obtained in 2011. Among the 23 earthquakes with magnitudes higher than 5, the developed technique can detect precursors of 14 earthquakes while producing 8 false alarms. This study is supported by TUBITAK 115E915 and joint TUBITAK 114E092 and AS CR 14/001 projects.
Emotion detection model of Filipino music
NASA Astrophysics Data System (ADS)
Noblejas, Kathleen Alexis; Isidro, Daryl Arvin; Samonte, Mary Jane C.
2017-02-01
This research explored the creation of a model to detect emotion in Filipino songs. The emotion model used was based on Paul Ekman's six basic emotions. The songs were classified into the following genres: kundiman, novelty, pop, and rock. The songs were annotated by a group of music experts based on the emotion the song induces in the listener. Musical features of the songs were extracted using jAudio, while the lyric features were extracted with a Bag-of-Words feature representation. The audio and lyric features of the Filipino songs were extracted for classification by the three chosen classifiers: Naïve Bayes, Support Vector Machines, and k-Nearest Neighbors. The goal of the research was to determine which classifier would work best for Filipino music. Evaluation was done by 10-fold cross-validation, and accuracy, precision, recall, and F-measure results were compared. The models were also tested with unknown test data to further determine their accuracy through the prediction results.
Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection.
Liu, Liang; Cai, Yudong; Lu, Wencong; Feng, Kaiyan; Peng, Chunrong; Niu, Bing
2009-03-06
Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection framework, this paper presents a computational system to predict PPIs (protein-protein interactions) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poorly performing and/or redundant features, resulting in 103 remaining features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall prediction accuracy of 76.18%, evaluated by a 10-fold cross-validation test, which is 1.46% higher than with the initial 114 features and 6.51% higher than with the 20 features coded by amino acid composition. The PPIs predictor developed for this research is available for public use at http://chemdata.shu.edu.cn/ppi.
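The wrapper stage of such a hybrid selector scores candidate feature subsets by the accuracy of the KNN learner itself. A compact forward-selection sketch — a simplification of mRMR-KNNs-wrapper, with our own function names and toy data in the usage — is:

```python
def knn_predict(train, labels, x, k=3):
    """Plain k-nearest-neighbours majority vote with squared Euclidean distance."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def loo_accuracy(X, y, feats, k=3):
    """Leave-one-out accuracy of KNN restricted to the given feature subset."""
    proj = [[row[f] for f in feats] for row in X]
    hits = sum(
        knn_predict(proj[:i] + proj[i + 1:], y[:i] + y[i + 1:], proj[i], k) == y[i]
        for i in range(len(X)))
    return hits / len(X)

def greedy_wrapper(X, y, k=3):
    """Forward selection: keep adding the feature that most improves accuracy."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        acc, f = max((loo_accuracy(X, y, selected + [f], k), f) for f in remaining)
        if acc <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = acc
    return selected, best
```

A filter step (such as mRMR ranking) would normally prune the candidate pool before this wrapper loop, exactly to keep the subset search cheap.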
Ejlerskov, Katrine T.; Jensen, Signe M.; Christensen, Line B.; Ritz, Christian; Michaelsen, Kim F.; Mølgaard, Christian
2014-01-01
For 3-year-old children, suitable methods to estimate body composition are sparse. We aimed to develop predictive equations for estimating fat-free mass (FFM) from bioelectrical impedance (BIA) and anthropometry, using dual-energy X-ray absorptiometry (DXA) as the reference method, with data from 99 healthy 3-year-old Danish children. Predictive equations were derived from two multiple linear regression models: a comprehensive model (height²/resistance (RI) plus six anthropometric measurements) and a simple model (RI, height, weight). Their uncertainty was quantified by means of a 10-fold cross-validation approach. The prediction error of FFM was 3.0% for both equations (root mean square error: 360 and 356 g, respectively). The derived equations produced BIA-based predictions of FFM and FM close to the DXA scan results. We suggest that the predictive equations can be applied in similar population samples aged 2–4 years. The derived equations may prove useful for studies linking body composition to early risk factors and early onset of obesity. PMID:24463487
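The "simple model" and its cross-validated prediction error can be sketched as below. The data here are simulated and the coefficients are not those of the paper; only the structure (FFM regressed on RI, height, and weight, with RMSE from 10-fold cross-validation) follows the abstract.

```python
# Sketch: FFM ~ RI + height + weight, uncertainty via 10-fold CV RMSE.
# All numbers below are simulated stand-ins, not the study's data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 99  # the study used 99 children
height = rng.normal(95, 4, n)                 # cm
weight = rng.normal(14, 1.5, n)               # kg
ri = height**2 / rng.normal(700, 50, n)       # resistance index: height^2 / R
ffm = 0.5 * ri + 0.05 * height + 0.3 * weight + rng.normal(0, 0.36, n)

X = np.column_stack([ri, height, weight])
neg_mse = cross_val_score(LinearRegression(), X, ffm, cv=10,
                          scoring="neg_mean_squared_error")
rmse = np.sqrt(-neg_mse.mean())  # pooled cross-validated RMSE
print(f"10-fold CV RMSE: {rmse:.3f} kg")
```

Reporting the cross-validated RMSE, rather than the in-sample residual error, is what lets the paper quote a prediction error that should generalize to new children from the same population.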
Pathological brain detection based on wavelet entropy and Hu moment invariants.
Zhang, Yudong; Wang, Shuihua; Sun, Ping; Phillips, Preetha
2015-01-01
With the aim of developing an accurate pathological brain detection system, we proposed a novel automatic computer-aided diagnosis (CAD) method to detect pathological brains, as distinct from normal brains, in magnetic resonance imaging (MRI) scans. The problem remains a challenge for technicians and clinicians, since MR imaging generates exceptionally large amounts of data. A new two-step approach was proposed in this study: wavelet entropy (WE) and Hu moment invariants (HMI) for feature extraction, and the generalized eigenvalue proximal support vector machine (GEPSVM) for classification. To further enhance classification accuracy, the popular radial basis function (RBF) kernel was employed. Results over 10 runs of k-fold stratified cross-validation showed that the proposed "WE + HMI + GEPSVM + RBF" method was superior to existing methods with respect to classification accuracy, obtaining average classification accuracies of 100%, 100%, and 99.45% on Dataset-66, Dataset-160, and Dataset-255, respectively. The proposed method is effective and can be applied to realistic use.
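The feature-extraction-plus-classifier structure can be illustrated with a small sketch. This is not the paper's method: a one-level Haar transform in NumPy stands in for the wavelet decomposition, sklearn's `SVC` replaces GEPSVM (which has no standard library implementation), and the 1-D signals are simulated.

```python
# Sketch: wavelet-entropy feature extraction followed by an RBF-kernel SVM,
# evaluated with 10-fold cross-validation. Signals are simulated stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def haar_level1(x):
    """One-level Haar wavelet transform of a 1-D signal of even length."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
    return a, d

def wavelet_entropy(x):
    """Shannon entropy of the normalized subband energies."""
    a, d = haar_level1(x)
    energy = np.array([np.sum(a**2), np.sum(d**2)])
    p = energy / energy.sum()
    return -np.sum(p * np.log(p + 1e-12))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 120)
# Class 0: smooth signals (random walks); class 1: noisy signals with
# relatively more detail-band energy, so their wavelet entropies differ.
signals = [rng.normal(0, 1, 64).cumsum() if c == 0 else rng.normal(0, 1, 64)
           for c in y]
X = np.array([[wavelet_entropy(s)] for s in signals])

scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f}")
```

The entropy condenses each signal into a single number summarizing how its energy is spread across subbands, which is the sense in which wavelet entropy acts as a compact feature for the classifier.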
Evolutionary Wavelet Neural Network ensembles for breast cancer and Parkinson's disease prediction.
Khan, Maryam Mahsal; Mendes, Alexandre; Chalup, Stephan K
2018-01-01
Wavelet Neural Networks are a combination of neural networks and wavelets and have mostly been used in the area of time-series prediction and control. Recently, Evolutionary Wavelet Neural Networks have been employed to develop cancer prediction models. The present study proposes using ensembles of Evolutionary Wavelet Neural Networks. The search for a high-quality ensemble is directed by a fitness function that incorporates the accuracy of the classifiers both independently and as part of the ensemble itself. The ensemble approach is tested on three publicly available biomedical benchmark datasets, one on breast cancer and two on Parkinson's disease, using a 10-fold cross-validation strategy. Our experimental results show that, for the first dataset, the performance was similar to previous studies reported in the literature. On the second dataset, the Evolutionary Wavelet Neural Network ensembles performed better than all previous methods. The third dataset is relatively new, and this study is the first to report benchmark results on it.
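The ensemble evaluation setup can be sketched as follows. This is illustrative only: Evolutionary Wavelet Neural Networks are not a standard library component, so small multilayer perceptrons stand in for the ensemble members, and majority voting with 10-fold cross-validation mirrors only the evaluation protocol, not the evolutionary search or its fitness function.

```python
# Sketch: a majority-vote ensemble of small neural networks evaluated with
# 10-fold cross-validation on the breast cancer benchmark dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Three differently seeded networks stand in for the evolved ensemble members.
members = [
    (f"mlp{i}", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(8,),
                                            max_iter=500, random_state=i)))
    for i in range(3)
]
ensemble = VotingClassifier(members, voting="hard")  # majority vote
scores = cross_val_score(ensemble, X, y, cv=10)
print(f"10-fold CV ensemble accuracy: {scores.mean():.3f}")
```

In the paper, the fitness function rewards members that are accurate both alone and within the ensemble, which encourages diversity; here the diversity comes only from the different random initializations.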