Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction.
Cheng, Hao; Garrick, Dorian J; Fernando, Rohan L
2017-01-01
A random multiple-regression model that simultaneously fits all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction using whole-genome data. Leave-one-out cross-validation can be used to quantify the predictive ability of a statistical model. Naive application of leave-one-out cross-validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient leave-one-out cross-validation strategies are presented here, requiring little more effort than a single analysis. The efficient strategies are 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers, and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with the number of observations.
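The speed-up rests on a standard linear-model identity: with the variance components (equivalently, a ridge penalty) held fixed, the leave-one-out residual of observation i equals its ordinary residual divided by 1 − hᵢᵢ, so a single fit yields all n validation errors. A numpy sketch under that assumption, using plain ridge regression as a simplified stand-in for GBLUP (the sizes and λ below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 30, 10, 1.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

# Single fit: hat matrix H = X (X'X + lam I)^{-1} X'
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
resid = y - H @ y
loo_fast = resid / (1.0 - np.diag(H))  # all n LOO residuals from one analysis

# Naive LOO: refit n times, once per held-out observation
loo_naive = np.empty(n)
for i in range(n):
    m = np.ones(n, bool)
    m[i] = False
    b = np.linalg.solve(X[m].T @ X[m] + lam * np.eye(p), X[m].T @ y[m])
    loo_naive[i] = y[i] - X[i] @ b

print(np.allclose(loo_fast, loo_naive))  # True
```

One hat-matrix computation replaces n refits, which is consistent with the kind of savings the abstract reports growing as n grows.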
Reduction of bias and variance for evaluation of computer-aided diagnostic schemes.
Li, Qiang; Doi, Kunio
2006-04-01
Computer-aided diagnostic (CAD) schemes have been developed to assist radiologists in detecting various lesions in medical images. In addition to the development, an equally important problem is the reliable evaluation of the performance levels of various CAD schemes. It is good to see that more and more investigators are employing more reliable evaluation methods such as leave-one-out and cross-validation, instead of less reliable methods such as resubstitution, for assessing their CAD schemes. However, the common applications of leave-one-out and cross-validation evaluation methods do not necessarily imply that the estimated performance levels are accurate and precise. Pitfalls often occur in the use of leave-one-out and cross-validation evaluation methods, and they lead to unreliable estimation of performance levels. In this study, we first identified a number of typical pitfalls for the evaluation of CAD schemes, and conducted a Monte Carlo simulation experiment for each of the pitfalls to demonstrate quantitatively the extent of bias and/or variance caused by the pitfall. Our experimental results indicate that considerable bias and variance may exist in the estimated performance levels of CAD schemes if one employs various flawed leave-one-out and cross-validation evaluation methods. In addition, for promoting and utilizing a high standard for reliable evaluation of CAD schemes, we attempt to make recommendations, whenever possible, for overcoming these pitfalls. We believe that, with the recommended evaluation methods, we can considerably reduce the bias and variance in the estimated performance levels of CAD schemes.
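A classic pitfall of this kind is performing feature selection on the full dataset before the leave-one-out loop, which leaks information from every held-out case into training. A small Monte Carlo sketch on pure-noise features with a nearest-centroid classifier (all sizes and the classifier are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 40, 1000, 10
X = rng.standard_normal((n, p))   # pure noise: the true accuracy is 50%
y = np.repeat([0, 1], n // 2)

def top_k(Xtr, ytr):
    # rank features by absolute class-mean difference, keep the best k
    d = Xtr[ytr == 1].mean(axis=0) - Xtr[ytr == 0].mean(axis=0)
    return np.argsort(-np.abs(d))[:k]

def loo_accuracy(select_inside_loop):
    hits = 0
    for i in range(n):
        m = np.ones(n, bool)
        m[i] = False
        cols = top_k(X[m], y[m]) if select_inside_loop else top_k(X, y)
        mu0 = X[m][:, cols][y[m] == 0].mean(axis=0)
        mu1 = X[m][:, cols][y[m] == 1].mean(axis=0)
        xi = X[i, cols]
        pred = int(np.sum((xi - mu1) ** 2) < np.sum((xi - mu0) ** 2))
        hits += int(pred == y[i])
    return hits / n

acc_biased = loo_accuracy(select_inside_loop=False)  # leaks the held-out sample
acc_proper = loo_accuracy(select_inside_loop=True)   # selection redone per fold
print(acc_biased, acc_proper)  # the leaky estimate is typically far above chance
```

Only the second estimate is honest; the first illustrates the optimistic bias the authors quantify.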
NASA Astrophysics Data System (ADS)
Petersen, D.; Naveed, P.; Ragheb, A.; Niedieker, D.; El-Mashtoly, S. F.; Brechmann, T.; Kötting, C.; Schmiegel, W. H.; Freier, E.; Pox, C.; Gerwert, K.
2017-06-01
Endoscopy plays a major role in the early recognition of cancers that are not externally accessible, and thereby in increasing the survival rate. Raman spectroscopic fiber-optical approaches can help to decrease the impact on the patient, increase objectivity in tissue characterization, reduce expenses and provide a significant time advantage in endoscopy. In gastroenterology, early recognition of malignant and precursor lesions is relevant. Instantaneous and precise differentiation between adenomas as precursor lesions for cancer and hyperplastic polyps on the one hand, and between high- and low-risk alterations on the other hand, is important. Raman fiber-optical measurements of colon biopsy samples taken during colonoscopy were carried out during a clinical study, and samples of adenocarcinoma (22), tubular adenomas (141), hyperplastic polyps (79) and normal tissue (101) from 151 patients were analyzed. This allows us to focus on the bioinformatic analysis and to set the stage for Raman endoscopic measurements. Since spectral differences between normal and cancerous biopsy samples are small, special care has to be taken in data analysis. Using a leave-one-patient-out cross-validation scheme, three different outlier identification methods were investigated to decrease the influence of systematic errors, like a residual risk of misplacement of the sample and spectral dilution of marker bands (esp. cancerous tissue), and thereby optimize the experimental design. Furthermore, other validation methods, such as leave-one-sample-out and leave-one-spectrum-out cross-validation schemes, were compared with leave-one-patient-out cross-validation. High-risk lesions were differentiated from low-risk lesions with a sensitivity of 79%, specificity of 74% and an accuracy of 77%, and cancer from normal tissue with a sensitivity of 79%, specificity of 83% and an accuracy of 81%. Additionally applied outlier identification enabled us to improve the recognition of neoplastic biopsy samples.
How to test validity in orthodontic research: a mixed dentition analysis example.
Donatelli, Richard E; Lee, Shin-Jae
2015-02-01
The data used to test the validity of a prediction method should be different from the data used to generate the prediction model. In this study, we explored whether an independent data set is mandatory for testing the validity of a new prediction method and how validity can be tested without independent new data. Several validation methods were compared in an example using the data from a mixed dentition analysis with a regression model. The validation errors of real mixed dentition analysis data and simulation data were analyzed for increasingly large data sets. The validation results of both the real and the simulation studies demonstrated that the leave-1-out cross-validation method had the smallest errors. The largest errors occurred in the traditional simple validation method. The differences between the validation methods diminished as the sample size increased. The leave-1-out cross-validation method seems to be an optimal validation method for improving the prediction accuracy in a data set with limited sample sizes.
Jiang, Wenyu; Simon, Richard
2007-12-20
This paper first provides a critical review of some existing methods for estimating the prediction error in classifying microarray data where the number of genes greatly exceeds the number of specimens. Special attention is given to the bootstrap-related methods. When the sample size n is small, we find that all the reviewed methods suffer from either substantial bias or variability. We introduce a repeated leave-one-out bootstrap (RLOOB) method that predicts for each specimen in the sample using bootstrap learning sets of size ln. We then propose an adjusted bootstrap (ABS) method that fits a learning curve to the RLOOB estimates calculated with different bootstrap learning set sizes. The ABS method is robust across the situations we investigate and provides a slightly conservative estimate for the prediction error. Even with small samples, it does not suffer from the large upward bias of the leave-one-out bootstrap and the 0.632+ bootstrap, nor from the large variability of leave-one-out cross-validation in microarray applications.
Zhang, Mengliang; Zhao, Yang; Harrington, Peter de B; Chen, Pei
2016-03-01
Two simple fingerprinting methods, flow-injection coupled to ultraviolet spectroscopy and proton nuclear magnetic resonance, were used for discriminating between Aurantii fructus immaturus and Fructus poniciri trifoliatae immaturus. Both methods were combined with partial least-squares discriminant analysis. In the flow-injection method, four data representations were evaluated: total ultraviolet absorbance chromatograms, averaged ultraviolet spectra, absorbance at 193, 205, 225, and 283 nm, and absorbance at 225 and 283 nm. Prediction rates of 100% were achieved for all data representations by partial least-squares discriminant analysis using leave-one-sample-out cross-validation. The prediction rate for the proton nuclear magnetic resonance data by partial least-squares discriminant analysis with leave-one-sample-out cross-validation was also 100%. A new validation set of data was collected by flow-injection with ultraviolet spectroscopic detection two weeks later and predicted by partial least-squares discriminant analysis models constructed by the initial data representations with no parameter changes. The classification rates were 95% with the total ultraviolet absorbance chromatograms datasets and 100% with the other three datasets. Flow-injection with ultraviolet detection and proton nuclear magnetic resonance are simple, high throughput, and low-cost methods for discrimination studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Xiaolin; Ye, Li; Wang, Xiaoxiang
2012-12-15
Several recent reports suggested that hydroxylated polybrominated diphenyl ethers (HO-PBDEs) may disturb thyroid hormone homeostasis. To illuminate the structural features for thyroid hormone activity of HO-PBDEs and the binding mode between HO-PBDEs and the thyroid hormone receptor (TR), the hormone activity of a series of HO-PBDEs toward thyroid receptor β was studied based on the combination of 3D-QSAR, molecular docking, and molecular dynamics (MD) methods. The ligand- and receptor-based 3D-QSAR models were obtained using the Comparative Molecular Similarity Index Analysis (CoMSIA) method. The optimum CoMSIA model with region focusing yielded satisfactory statistical results: the leave-one-out cross-validation correlation coefficient (q²) was 0.571 and the non-cross-validation correlation coefficient (r²) was 0.951. Furthermore, the results of internal validation such as bootstrapping, leave-many-out cross-validation, and progressive scrambling as well as external validation indicated the rationality and good predictive ability of the best model. In addition, molecular docking elucidated the conformations of compounds and key amino acid residues at the docking pocket, and MD simulation further determined the binding process and validated the rationality of the docking results. Highlights: • The thyroid hormone activities of HO-PBDEs were studied by 3D-QSAR. • The binding modes between HO-PBDEs and TRβ were explored. • 3D-QSAR, molecular docking, and molecular dynamics (MD) methods were performed.
Correcting for Optimistic Prediction in Small Data Sets
Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.
2014-01-01
The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
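Leave-pair-out cross-validation scores every (event, non-event) pair with a model trained without that pair; the fraction of correctly ordered pairs is the optimism-adjusted C statistic. A toy numpy sketch using a simple mean-difference linear score in place of a fitted clinical risk model (the data and the scoring rule are illustrative only, not the authors' screening models):

```python
import numpy as np

rng = np.random.default_rng(2)
n_ev, n_ne, p = 15, 25, 4
X1 = rng.standard_normal((n_ev, p)) + 0.5   # events, with a shifted mean
X0 = rng.standard_normal((n_ne, p))         # non-events

correct = 0.0
for i in range(n_ev):
    for j in range(n_ne):
        # refit the score direction without the held-out pair
        w = (np.delete(X1, i, axis=0).mean(axis=0)
             - np.delete(X0, j, axis=0).mean(axis=0))
        correct += float(X1[i] @ w > X0[j] @ w)

c_lpo = correct / (n_ev * n_ne)   # leave-pair-out estimate of the C statistic
print(round(c_lpo, 3))
```

Because each pair is scored by a model that never saw either member, the estimate avoids the resubstitution optimism the abstract describes.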
Divya, O; Mishra, Ashok K
2007-05-29
Quantitative determination of kerosene fraction present in diesel has been carried out based on excitation emission matrix fluorescence (EEMF) along with parallel factor analysis (PARAFAC) and N-way partial least squares regression (N-PLS). EEMF is a simple, sensitive and nondestructive method suitable for the analysis of multifluorophoric mixtures. Calibration models consisting of varying compositions of diesel and kerosene were constructed and their validation was carried out using leave-one-out cross validation method. The accuracy of the model was evaluated through the root mean square error of prediction (RMSEP) for the PARAFAC, N-PLS and unfold PLS methods. N-PLS was found to be a better method compared to PARAFAC and unfold PLS method because of its low RMSEP values.
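The RMSEP from leave-one-out cross-validation is simply the root mean square of the held-out prediction errors. A generic numpy sketch for a linear calibration model, used here as a simplified stand-in for the PARAFAC/N-PLS models (the synthetic two-component mixture data and all sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_channels = 60, 20
conc = rng.uniform(0, 1, n)                        # 'kerosene' fraction per sample
pure_k = np.abs(rng.standard_normal(n_channels))   # hypothetical kerosene spectrum
pure_d = np.abs(rng.standard_normal(n_channels))   # hypothetical diesel spectrum
S = np.outer(conc, pure_k) + np.outer(1 - conc, pure_d)
S += 0.01 * rng.standard_normal(S.shape)           # measurement noise

errors = []
for i in range(n):
    m = np.ones(n, bool)
    m[i] = False
    # least-squares calibration fit on the remaining samples
    coef, *_ = np.linalg.lstsq(S[m], conc[m], rcond=None)
    errors.append(conc[i] - float(S[i] @ coef))

rmsep = float(np.sqrt(np.mean(np.square(errors))))
print(round(rmsep, 4))
```

A lower RMSEP across methods is the criterion by which the abstract prefers N-PLS over PARAFAC and unfold PLS.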
Bairy, Santhosh Kumar; Suneel Kumar, B V S; Bhalla, Joseph Uday Tej; Pramod, A B; Ravikumar, Muttineni
2009-04-01
c-Src kinase plays an important role in cell growth and differentiation, and its inhibitors can be useful for the treatment of various diseases, including cancer, osteoporosis, and metastatic bone disease. Three-dimensional quantitative structure-activity relationship (3D-QSAR) studies were carried out on quinazoline derivatives inhibiting c-Src kinase. Molecular field analysis (MFA) models with four different alignment techniques, namely GLIDE, GOLD, LIGANDFIT and least-squares based methods, were developed. The GLIDE-based MFA model showed better results (leave-one-out cross-validated correlation coefficient r²cv = 0.923 and non-cross-validated correlation coefficient r² = 0.958) when compared with the other models. These results help us to understand the nature of descriptors required for activity of these compounds and thereby provide guidelines to design novel and potent c-Src kinase inhibitors.
Rojas, Cristian; Duchowicz, Pablo R; Tripaldi, Piercosimo; Pis Diez, Reinaldo
2015-11-27
A quantitative structure-property relationship (QSPR) was developed for modeling the retention index of 1184 flavor and fragrance compounds measured using a Carbowax 20M glass capillary gas chromatography column. The 4885 molecular descriptors were calculated using Dragon software, and then were simultaneously analyzed through multivariable linear regression analysis using the replacement method (RM) variable subset selection technique. We proceeded in three steps, the first one by considering all descriptor blocks, the second one by excluding conformational descriptor blocks, and the last one by analyzing only 3D-descriptor families. The models were validated through an external test set of compounds. Cross-validation methods such as leave-one-out and leave-many-out were applied, together with Y-randomization and applicability domain analysis. The developed model was used to estimate the retention index (I) of a set of 22 molecules. The results clearly suggest that 3D-descriptors do not offer relevant information for modeling the retention index, while a topological index such as the Randić-like index from the reciprocal squared distance matrix has a high relevance for this purpose.
Refining Time-Activity Classification of Human Subjects Using the Global Positioning System.
Hu, Maogui; Li, Wei; Li, Lianfa; Houston, Douglas; Wu, Jun
2016-01-01
Detailed spatial location information is important in accurately estimating personal exposure to air pollution. The Global Positioning System (GPS) has been widely used in tracking personal paths and activities. Previous researchers have developed time-activity classification models based on GPS data, but most of them were developed for specific regions. An adaptive model for time-location classification can be widely applied to air pollution studies that use GPS to track individual-level time-activity patterns. Time-activity data were collected for seven days using GPS loggers and accelerometers from thirteen adult participants from Southern California under free living conditions. We developed an automated model based on random forests to classify major time-activity patterns (i.e. indoor, outdoor-static, outdoor-walking, and in-vehicle travel). Sensitivity analysis was conducted to examine the contribution of the accelerometer data and the supplemental spatial data (i.e. roadway and tax parcel data) to the accuracy of time-activity classification. Our model was evaluated using both leave-one-fold-out and leave-one-subject-out methods. Maximum speeds in averaging time intervals of 7 and 5 minutes, and distance to primary highways with limited access, were found to be the three most important variables in the classification model. Leave-one-fold-out cross-validation showed an overall accuracy of 99.71%. Sensitivities varied from 84.62% (outdoor walking) to 99.90% (indoor). Specificities varied from 96.33% (indoor) to 99.98% (outdoor static). The exclusion of accelerometer and ambient light sensor variables caused a slight loss in sensitivity for outdoor walking, but little loss in overall accuracy. However, leave-one-subject-out cross-validation showed considerable loss in sensitivity for outdoor static and outdoor walking conditions. The random forests classification model can achieve high accuracy for the four major time-activity categories.
The model also performed well with just GPS, road and tax parcel data. However, caution is warranted when generalizing the model developed from a small number of subjects to other populations.
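The gap between the two evaluation schemes comes from how folds are formed: leave-one-subject-out holds out all records from one participant at a time, testing generalization to unseen people rather than to unseen records. A minimal pure-Python sketch of generating such splits (the subject IDs and record counts are made up for illustration):

```python
from collections import defaultdict

# toy records: (subject_id, feature_row_index)
records = [(s, i) for i, s in enumerate(
    ["s01", "s01", "s02", "s03", "s03", "s03", "s02", "s04"])]

def leave_one_subject_out(records):
    """Yield (train_indices, test_indices) with one subject held out per fold."""
    by_subject = defaultdict(list)
    for subj, idx in records:
        by_subject[subj].append(idx)
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        train = [i for s, i in records if s != held_out]
        yield train, test

folds = list(leave_one_subject_out(records))
print(len(folds))  # one fold per subject -> 4
```

With only thirteen subjects in the study, each such fold removes a large, correlated chunk of data, which is why per-subject validation is the harder (and more honest) test here.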
Besalú, Emili
2016-01-01
The Superposing Significant Interaction Rules (SSIR) method is described. It is a general combinatorial and symbolic procedure able to rank compounds belonging to combinatorial analogue series. The procedure generates structure-activity relationship (SAR) models and also serves as an inverse SAR tool. The method is fast and can deal with large databases. SSIR operates from statistical significances calculated from the available library of compounds and according to the previously attached molecular labels of interest or non-interest. The required symbolic codification allows dealing with almost any combinatorial data set, even in a confidential manner, if desired. The application example categorizes molecules as binding or non-binding, and consensus ranking SAR models are generated from training and two distinct cross-validation methods: leave-one-out and balanced leave-two-out (BL2O), the latter being suited for the treatment of binary properties. PMID:27240346
Robust Smoothing: Smoothing Parameter Selection and Applications to Fluorescence Spectroscopy
Lee, Jong Soo; Cox, Dennis D.
2009-01-01
Fluorescence spectroscopy has emerged in recent years as an effective way to detect cervical cancer. Investigation of the data preprocessing stage uncovered a need for a robust smoothing to extract the signal from the noise. Various robust smoothing methods for estimating fluorescence emission spectra are compared and data driven methods for the selection of smoothing parameter are suggested. The methods currently implemented in R for smoothing parameter selection proved to be unsatisfactory, and a computationally efficient procedure that approximates robust leave-one-out cross validation is presented. PMID:20729976
Exploring Mouse Protein Function via Multiple Approaches.
Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong
2016-01-01
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification.
Therefore, the accuracy of the presented method may be much higher in reality.
NASA Astrophysics Data System (ADS)
Haddad, Khaled; Rahman, Ataur; A Zaman, Mohammad; Shrestha, Surendra
2013-03-01
In regional hydrologic regression analysis, model selection and validation are regarded as important steps. Here, model selection is usually based on some measure of goodness-of-fit between the model prediction and observed data. In Regional Flood Frequency Analysis (RFFA), leave-one-out (LOO) validation or a fixed-percentage leave-out validation (e.g., 10%) is commonly adopted to assess the predictive ability of regression-based prediction equations. This paper develops a Monte Carlo Cross Validation (MCCV) technique (which has widely been adopted in chemometrics and econometrics) in RFFA using Generalised Least Squares Regression (GLSR) and compares it with the most commonly adopted LOO validation approach. The study uses simulated and regional flood data from the state of New South Wales in Australia. It is found that when developing hydrologic regression models, application of the MCCV is likely to result in a more parsimonious model than the LOO. It has also been found that the MCCV can provide a more realistic estimate of a model's predictive ability when compared with the LOO.
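Monte Carlo cross-validation repeatedly draws random train/validation partitions, leaving out a fixed fraction each time, in contrast with LOO's n deterministic single-observation folds. A numpy sketch with an ordinary least-squares stand-in for the regional regression (the sizes, the 20% leave-out fraction, and the data are illustrative assumptions, not the paper's GLSR setup):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + 0.1 * rng.standard_normal(n)

def mccv_rmse(X, y, n_splits=200, leave_out=0.2):
    """Monte Carlo CV: average prediction error over random partitions."""
    n = len(y)
    n_test = int(n * leave_out)
    errs = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.extend(y[test] - X[test] @ beta)
    return float(np.sqrt(np.mean(np.square(errs))))

rmse = mccv_rmse(X, y)
print(round(rmse, 3))  # close to the simulated noise level (0.1)
```

Leaving out a larger fraction per split penalizes over-parameterized models more heavily than LOO does, which is one intuition for why MCCV tends to select the more parsimonious model.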
Spatial interpolation quality assessments for soil sensor transect datasets
USDA-ARS's Scientific Manuscript database
Near-ground geophysical soil sensors provide extremely valuable information for precision agriculture applications. Indeed, their readings can be used as proxy for many soil parameters. Typically, leave-one-out (loo) cross-validation (CV) of spatial interpolation of sensor data returns overly optimi...
Tipton, John; Hooten, Mevin B.; Goring, Simon
2017-01-01
Scientific records of temperature and precipitation have been kept for several hundred years, but for many areas, only a shorter record exists. To understand climate change, there is a need for rigorous statistical reconstructions of the paleoclimate using proxy data. Paleoclimate proxy data are often sparse, noisy, indirect measurements of the climate process of interest, making each proxy uniquely challenging to model statistically. We reconstruct spatially explicit temperature surfaces from sparse and noisy measurements recorded at historical United States military forts and other observer stations from 1820 to 1894. One common method for reconstructing the paleoclimate from proxy data is principal component regression (PCR). With PCR, one learns a statistical relationship between the paleoclimate proxy data and a set of climate observations that are used as patterns for potential reconstruction scenarios. We explore PCR in a Bayesian hierarchical framework, extending classical PCR in a variety of ways. First, we model the latent principal components probabilistically, accounting for measurement error in the observational data. Next, we extend our method to better accommodate outliers that occur in the proxy data. Finally, we explore alternatives to the truncation of lower-order principal components using different regularization techniques. One fundamental challenge in paleoclimate reconstruction efforts is the lack of out-of-sample data for predictive validation. Cross-validation is of potential value, but is computationally expensive and potentially sensitive to outliers in sparse data scenarios. To overcome the limitations that a lack of out-of-sample records presents, we test our methods using a simulation study, applying proper scoring rules, including a computationally efficient approximation to leave-one-out cross-validation based on the log score, to validate model performance.
The result of our analysis is a spatially explicit reconstruction of spatio-temporal temperature from a very sparse historical record.
Tóth, Gergely; Bodai, Zsolt; Héberger, Károly
2013-10-01
The coefficient of determination (R²) and its leave-one-out cross-validated analogue (denoted by Q² or R²cv) are the most frequently published values used to characterize the predictive performance of models. In this article we use R² and Q² in a reversed aspect: to detect uncommon, i.e. influential, points in any data set. The term (1 - Q²)/(1 - R²) corresponds to the ratio of the predictive residual sum of squares (PRESS) to the residual sum of squares (RSS). The ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the (1 - Q²)/(1 - R²) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is founded upon the routinely calculated Q² and R² values and warns model builders to verify the training set, to perform influence analysis, or even to switch to robust modeling.
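For ordinary least squares, both quantities come from a single fit: Q² uses PRESS = Σ(eᵢ/(1-hᵢᵢ))² via the hat-matrix shortcut, R² uses RSS = Σeᵢ², and since 0 < 1-hᵢᵢ < 1 the ratio (1 - Q²)/(1 - R²) = PRESS/RSS is at least 1. A numpy sketch (the data are illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
y = X @ rng.standard_normal(p + 1) + 0.5 * rng.standard_normal(n)

H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
e = y - H @ y                                 # ordinary residuals
press = np.sum((e / (1 - np.diag(H))) ** 2)   # predictive residual sum of squares
rss = np.sum(e ** 2)
tss = np.sum((y - y.mean()) ** 2)

r2 = 1 - rss / tss
q2 = 1 - press / tss
ratio = (1 - q2) / (1 - r2)                   # equals PRESS / RSS, always >= 1
print(round(float(ratio), 3))
```

Influential points inflate their own LOO residuals far more than their ordinary residuals, pushing this ratio well above 1, which is what the proposed F test screens for.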
Zhou, Yan; Cao, Hui
2013-01-01
We propose an augmented classical least squares (ACLS) calibration method for quantitative Raman spectral analysis against component information loss. The Raman spectral signals with low analyte concentration correlations were selected and used as the substitutes for unknown quantitative component information during the CLS calibration procedure. The number of selected signals was determined by using the leave-one-out root-mean-square error of cross-validation (RMSECV) curve. An ACLS model was built based on the augmented concentration matrix and the reference spectral signal matrix. The proposed method was compared with partial least squares (PLS) and principal component regression (PCR) using one example: a data set recorded from an experiment of analyte concentration determination using Raman spectroscopy. A 2-fold cross-validation with a Venetian blinds strategy was exploited to evaluate the predictive power of the proposed method. One-way analysis of variance (ANOVA) was used to assess the predictive power difference between the proposed method and existing methods. Results indicated that the proposed method is effective at increasing the robust predictive power of the traditional CLS model against component information loss, and its predictive power is comparable to that of PLS or PCR.
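For context, plain classical least squares (CLS) calibration, the baseline that ACLS augments, estimates pure-component spectra from known concentrations and then inverts them to predict new concentrations. A numpy sketch with synthetic two-component spectra (not the authors' Raman data, and without the augmentation step):

```python
import numpy as np

rng = np.random.default_rng(6)
n, n_channels = 25, 60
C = rng.uniform(0.1, 1.0, (n, 2))                      # known concentrations, 2 analytes
K_true = np.abs(rng.standard_normal((2, n_channels)))  # true pure-component spectra
S = C @ K_true + 0.01 * rng.standard_normal((n, n_channels))

# Calibration: S ~ C K  ->  K = (C'C)^{-1} C' S
K = np.linalg.solve(C.T @ C, C.T @ S)

# Prediction for a new spectrum: c = (K K')^{-1} K s
c_new = np.array([0.3, 0.7])
s_new = c_new @ K_true + 0.01 * rng.standard_normal(n_channels)
c_hat = np.linalg.solve(K @ K.T, K @ s_new)

print(np.round(c_hat, 2))  # approximately [0.3, 0.7]
```

CLS fails when a contributing component is missing from C, which is the "component information loss" the ACLS augmentation (selecting low-correlation spectral signals as substitutes) is designed to address.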
Near-affine-invariant texture learning for lung tissue analysis using isotropic wavelet frames.
Depeursinge, Adrien; Van de Ville, Dimitri; Platon, Alexandra; Geissbuhler, Antoine; Poletti, Pierre-Alexandre; Müller, Henning
2012-07-01
We propose near-affine-invariant texture descriptors derived from isotropic wavelet frames for the characterization of lung tissue patterns in high-resolution computed tomography (HRCT) imaging. Affine invariance is desirable to enable learning of nondeterministic textures without a priori localizations, orientations, or sizes. When combined with complementary gray-level histograms, the proposed method allows a global classification accuracy of 76.9% with balanced precision among five classes of lung tissue using a leave-one-patient-out cross validation, in accordance with clinical practice.
Doubly stochastic radial basis function methods
NASA Astrophysics Data System (ADS)
Yang, Fenglian; Yan, Liang; Ling, Leevan
2018-06-01
We propose a doubly stochastic radial basis function (DSRBF) method for function recovery. Instead of using a constant, we treat the RBF shape parameters as stochastic variables whose distributions are determined by a stochastic leave-one-out cross validation (LOOCV) estimation. A careful operation count is provided in order to determine the ranges of all the parameters in our methods. The overhead cost for setting up the proposed DSRBF method is O(n^2) for function recovery problems with n basis functions. Numerical experiments confirm that the proposed method not only outperforms the constant shape parameter formulation (in terms of accuracy with comparable computational cost) but also outperforms the optimal LOOCV formulation (in terms of both accuracy and computational cost).
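As a rough illustration of the deterministic criterion that such stochastic LOOCV builds on, the sketch below scores candidate Gaussian RBF shape parameters by brute-force leave-one-out interpolation error; the grid, node count, and test function are arbitrary choices, not taken from the paper:

```python
import numpy as np

def loocv_error(x, y, eps):
    """Brute-force LOOCV RMSE of a 1D Gaussian RBF interpolant with shape eps."""
    n = len(x)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        xt, yt = x[mask], y[mask]
        # interpolation matrix on the remaining nodes
        A = np.exp(-(eps * (xt[:, None] - xt[None, :]))**2)
        # lstsq tolerates the near-singular matrices that flat kernels produce
        c, *_ = np.linalg.lstsq(A, yt, rcond=None)
        pred = np.exp(-(eps * (x[i] - xt))**2) @ c
        errs.append((pred - y[i])**2)
    return np.sqrt(np.mean(errs))

x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x)                       # synthetic target function
eps_grid = [0.5, 1.0, 2.0, 4.0, 8.0]
errors = {eps: loocv_error(x, y, eps) for eps in eps_grid}
best_eps = min(errors, key=errors.get)
```

The DSRBF idea replaces the single `best_eps` with draws from a distribution fitted to such LOOCV scores; this sketch only shows the underlying selection criterion.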
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luo, Heng; Ye, Hao; Ng, Hui Wen; Sakkiah, Sugunadevi; Mendrick, Donna L.; Hong, Huixiao
2016-01-01
Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, the existing methods have several limitations. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out and five-fold cross-validation. Furthermore, this algorithm can predict not only peptides of different lengths and different types of HLAs, but also peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.
Sparse brain network using penalized linear regression
NASA Astrophysics Data System (ADS)
Lee, Hyekyoung; Lee, Dong Soo; Kang, Hyejin; Kim, Boong-Nyun; Chung, Moo K.
2011-03-01
Sparse partial correlation is a useful connectivity measure for brain networks when it is difficult to compute the exact partial correlation in the small-n large-p setting. In this paper, we formulate the problem of estimating partial correlation as a sparse linear regression with an l1-norm penalty. The method is applied to a brain network consisting of parcellated regions of interest (ROIs), which are obtained from FDG-PET images of children with autism spectrum disorder (ASD) and pediatric control (PedCon) subjects. To validate the results, we check the reproducibility of the obtained brain networks by leave-one-out cross validation and compare the clustered structures derived from the brain networks of ASD and PedCon.
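The l1-penalized regression at the core of such sparse-network estimation can be sketched with plain ISTA (iterative soft-thresholding); the synthetic design below stands in for ROI measurements and is not the authors' FDG-PET data:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5*||X b - y||^2 + lam*||b||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)        # gradient of the least-squares term
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))              # synthetic "other ROI" regressors
true = np.zeros(10)
true[:2] = [2.0, -3.0]                     # only two true partial connections
y = X @ true + rng.normal(scale=0.1, size=50)
beta = lasso_ista(X, y, lam=5.0)
```

Regressing each ROI on all the others this way, and reading nonzero coefficients as edges, yields the sparse partial-correlation graph the abstract describes.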
Shirahata, Mitsuaki; Iwao-Koizumi, Kyoko; Saito, Sakae; Ueno, Noriko; Oda, Masashi; Hashimoto, Nobuo; Takahashi, Jun A; Kato, Kikuya
2007-12-15
Current morphology-based glioma classification methods do not adequately reflect the complex biology of gliomas, thus limiting their prognostic ability. In this study, we focused on anaplastic oligodendroglioma and glioblastoma, which typically follow distinct clinical courses. Our goal was to construct a clinically useful molecular diagnostic system based on gene expression profiling. The expression of 3,456 genes in 32 patients, 12 and 20 of whom had prognostically distinct anaplastic oligodendroglioma and glioblastoma, respectively, was measured by PCR array. In addition to unsupervised methods, we performed a supervised analysis using a weighted voting algorithm to construct a diagnostic system discriminating anaplastic oligodendroglioma from glioblastoma. The diagnostic accuracy of this system was evaluated by leave-one-out cross-validation. The clinical utility was tested on a microarray-based data set of 50 malignant gliomas from a previous study. Unsupervised analysis showed divergent global gene expression patterns between the two tumor classes. A supervised binary classification model showed 100% (95% confidence interval, 89.4-100%) diagnostic accuracy by leave-one-out cross-validation using 168 diagnostic genes. Applied to a gene expression data set from a previous study, our model correlated better with outcome than histologic diagnosis, and also displayed 96.6% (28 of 29) consistency with the molecular classification scheme used for these histologically controversial gliomas in the original article. Furthermore, we observed that histologically diagnosed glioblastoma samples that shared anaplastic oligodendroglioma molecular characteristics tended to be associated with longer survival. Our molecular diagnostic system showed reproducible clinical utility and prognostic ability superior to traditional histopathologic diagnosis for malignant glioma.
SU-E-J-85: Leave-One-Out Perturbation (LOOP) Fitting Algorithm for Absolute Dose Film Calibration
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chu, A; Ahmad, M; Chen, Z
2014-06-01
Purpose: To introduce an outlier-recognition fitting routine for film dosimetry. It is not only flexible enough to accommodate any linear or non-linear regression, but can also provide information on the minimal number of sampling points, critical sampling distributions, and the evaluation of analytical functions for absolute film-dose calibration. Methods: The technique, leave-one-out (LOO) cross validation, is often used for statistical analyses of model performance. We used LOO analyses with perturbed bootstrap fitting, called leave-one-out perturbation (LOOP), for film-dose calibration. Given a threshold, the LOO process detects unfit points ("outliers") compared to other cohorts, and a bootstrap fitting process follows to seek any possibilities of using perturbations for further improvement. After that, outliers were reconfirmed by traditional t-test statistics and eliminated; another LOOP feedback then produced the final result. An over-sampled film-dose-calibration dataset was collected as a reference (dose range: 0-800 cGy), and various simulated conditions for outliers and sampling distributions were derived from the reference. Comparisons over the various conditions were made, and the performance of polynomial and rational fitting functions was evaluated. Results: (1) LOOP demonstrates sensitive outlier recognition through the statistical correlation between a left-out outlier and an exceptionally better goodness-of-fit. (2) With sufficient statistical information, LOOP can correct outliers under some low-sampling conditions that other "robust fits", e.g. Least Absolute Residuals, cannot. (3) Complete cross-validated analyses of LOOP indicate that the rational function performs far better than the polynomial. Even with 5 data points including one outlier, LOOP with a rational function can restore more than 95% of the value back to its reference, while the polynomial fitting completely failed under the same conditions.
Conclusion: LOOP can cooperate with any fitting routine, functioning as a "robust fit". In addition, it can serve as a benchmark for film-dose calibration fitting performance.
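A stripped-down version of the leave-one-out outlier screen (without LOOP's bootstrap-perturbation or t-test stages) can be sketched as follows; the quadratic film response, the injected outlier, and the z-score threshold are illustrative assumptions, not the authors' protocol:

```python
import numpy as np

def loo_outliers(dose, od, degree=3, z_thresh=2.5):
    """Flag calibration points whose leave-one-out prediction residual is extreme."""
    n = len(dose)
    resid = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        # fit the calibration curve to the remaining points only
        coeffs = np.polyfit(dose[mask], od[mask], degree)
        resid[i] = od[i] - np.polyval(coeffs, dose[i])
    # a simple z-score on the LOO residuals stands in for LOOP's t-test
    z = (resid - resid.mean()) / resid.std()
    return np.where(np.abs(z) > z_thresh)[0]

dose = np.linspace(0, 800, 12)                      # cGy, matching the abstract's range
od = 1.2e-3 * dose - 0.5e-6 * dose**2               # synthetic optical-density response
od[5] += 0.15                                       # inject one outlier
flagged = loo_outliers(dose, od)
```

The key property the abstract exploits is visible here: the outlier's own LOO fit excludes it, so its residual stands far outside the cohort.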
Khalid, Tanzeela; White, Paul; De Lacy Costello, Ben; Persad, Raj; Ewen, Richard; Johnson, Emmanuel; Probert, Chris S.; Ratcliffe, Norman
2013-01-01
There is a need to reduce the number of cystoscopies on patients with haematuria. Presently there are no reliable biomarkers to screen for bladder cancer. In this paper, we evaluate a new, simple, in-house fabricated GC-sensor device for the diagnosis of bladder cancer based on volatiles. Sensor outputs from 98 urine samples were used to build and test diagnostic models. Samples were taken from 24 patients with transitional (urothelial) cell carcinoma (age 27-91 years, median 71 years) and 74 controls presenting with urological symptoms, but without a urological malignancy (age 29-86 years, median 64 years); results were analysed using two statistical approaches to assess the robustness of the methodology. A two-group linear discriminant analysis method using a total of 9 time points (which equates to 9 biomarkers) correctly assigned 24/24 (100%) of cancer cases and 70/74 (94.6%) of controls. Under leave-one-out cross-validation, 23/24 (95.8%) of cancer cases were correctly predicted, along with 69/74 (93.2%) of controls. For partial least squares discriminant analysis, the corresponding leave-one-out cross-validation prediction values were 95.8% (cancer cases) and 94.6% (controls). These data are an improvement on those reported by other groups studying headspace gases and are also superior to current clinical techniques. This new device shows potential for the diagnosis of bladder cancer, but the data must be reproduced in a larger study.
Lahmiri, Salim; Gargour, Christian S; Gabrea, Marcel
2014-10-01
An automated diagnosis system that uses complex continuous wavelet transform (CWT) to process retina digital images and support vector machines (SVMs) for classification purposes is presented. In particular, each retina image is transformed into two one-dimensional signals by concatenating image rows and columns separately. The mathematical norm of the phase angles found in each one-dimensional signal at each level of CWT decomposition is relied on to characterise the texture of normal images against abnormal images affected by exudates, drusen and microaneurysms. The leave-one-out cross-validation method was adopted to conduct experiments, and the results from the SVM show that the proposed approach gives better results than those obtained by other methods based on the correct classification rate, sensitivity and specificity.
Wang, Hui; Qin, Feng; Ruan, Liu; Wang, Rui; Liu, Qi; Ma, Zhanhong; Li, Xiaolong; Cheng, Pei; Wang, Haiguang
2016-01-01
It is important to implement detection and assessment of plant diseases based on remotely sensed data for disease monitoring and control. Hyperspectral data of healthy leaves, leaves in incubation period and leaves in diseased period of wheat stripe rust and wheat leaf rust were collected under in-field conditions using a black-paper-based measuring method developed in this study. After data preprocessing, the models to identify the diseases were built using distinguished partial least squares (DPLS) and support vector machine (SVM), and the disease severity inversion models of stripe rust and the disease severity inversion models of leaf rust were built using quantitative partial least squares (QPLS) and support vector regression (SVR). All the models were validated by using leave-one-out cross validation and external validation. The diseases could be discriminated using both distinguished partial least squares and support vector machine with the accuracies of more than 99%. For each wheat rust, disease severity levels were accurately retrieved using both the optimal QPLS models and the optimal SVR models with the coefficients of determination (R2) of more than 0.90 and the root mean square errors (RMSE) of less than 0.15. The results demonstrated that identification and severity evaluation of stripe rust and leaf rust at the leaf level could be implemented based on the hyperspectral data acquired using the developed method. A scientific basis was provided for implementing disease monitoring by using aerial and space remote sensing technologies.
Diagnostic accuracy of eye movements in assessing pedophilia.
Fromberger, Peter; Jordan, Kirsten; Steinkrauss, Henrike; von Herder, Jakob; Witzel, Joachim; Stolpmann, Georg; Kröner-Herwig, Birgit; Müller, Jürgen Leo
2012-07-01
Given that recurrent sexual interest in prepubescent children is one of the strongest single predictors for pedosexual offense recidivism, valid and reliable diagnosis of pedophilia is of particular importance. Nevertheless, current assessment methods still fail to fulfill psychometric quality criteria. The aim of the study was to evaluate the diagnostic accuracy of eye-movement parameters in regard to pedophilic sexual preferences. Eye movements were measured while 22 pedophiles (according to ICD-10 F65.4 diagnosis), 8 non-pedophilic forensic controls, and 52 healthy controls simultaneously viewed the picture of a child and the picture of an adult. Fixation latency was assessed as a parameter for automatic attentional processes and relative fixation time to account for controlled attentional processes. Receiver operating characteristic (ROC) analyses, which are based on calculated age-preference indices, were carried out to determine the classifier performance. Cross-validation using the leave-one-out method was used to test the validity of classifiers. Pedophiles showed significantly shorter fixation latencies and significantly longer relative fixation times for child stimuli than either of the control groups. Classifier performance analysis revealed an area under the curve (AUC) = 0.902 for fixation latency and an AUC = 0.828 for relative fixation time. The eye-tracking method based on fixation latency discriminated between pedophiles and non-pedophiles with a sensitivity of 86.4% and a specificity of 90.0%. Cross-validation demonstrated good validity of eye-movement parameters. Despite some methodological limitations, measuring eye movements seems to be a promising approach to assess deviant pedophilic interests. Eye movements, which represent automatic attentional processes, demonstrated high diagnostic accuracy. © 2012 International Society for Sexual Medicine.
Accuracy assessment of high resolution satellite imagery orientation by leave-one-out method
NASA Astrophysics Data System (ADS)
Brovelli, Maria Antonia; Crespi, Mattia; Fratarcangeli, Francesca; Giannone, Francesca; Realini, Eugenio
Interest in high-resolution satellite imagery (HRSI) is spreading in several application fields, at both scientific and commercial levels. Fundamental and critical goals for the geometric use of this kind of imagery are their orientation and orthorectification, processes able to georeference the imagery and correct the geometric deformations they undergo during acquisition. In order to exploit the actual potentialities of orthorectified imagery in Geomatics applications, the definition of a methodology to assess the spatial accuracy achievable from oriented imagery is a crucial topic. In this paper we propose a new method for accuracy assessment based on Leave-One-Out Cross-Validation (LOOCV), a model validation method already applied in different fields such as machine learning, bioinformatics and, generally, any other field requiring an evaluation of the performance of a learning algorithm (e.g. geostatistics), but never applied to HRSI orientation accuracy assessment. The proposed method exhibits interesting features which are able to overcome the most remarkable drawbacks of the commonly used method (Hold-Out Validation, HOV), based on partitioning the known ground points into two sets: the first is used in the orientation-orthorectification model (GCPs, Ground Control Points) and the second is used to validate the model itself (CPs, Check Points). In fact, the HOV is generally not reliable and is not applicable when a low number of ground points is available. To test the proposed method we implemented a new routine that performs the LOOCV in the software SISAR, developed by the Geodesy and Geomatics Team at the Sapienza University of Rome to perform the rigorous orientation of HRSI; this routine was tested on some EROS-A and QuickBird images. Moreover, these images were also oriented using the widely recognized commercial software OrthoEngine v. 10 (included in the Geomatica suite by PCI), manually performing the LOOCV since only the HOV is implemented. The software comparison confirmed the overall correctness and good performance of the SISAR model, while the results showed the good features of the LOOCV method.
Multisensor system for toxic gases detection generated on indoor environments
NASA Astrophysics Data System (ADS)
Durán, C. M.; Monsalve, P. A. G.; Mosquera, C. J.
2016-11-01
This work describes a wireless multisensor system for the detection of toxic gases generated in indoor environments (e.g., underground coal mines). The artificial multisensor system proposed in this study was developed from a set of six low-cost chemical gas sensors (MQ series) with overlapping sensitivities to detect hazardous gases in the air. A statistical parameter was applied to the data set, and two pattern recognition methods, Principal Component Analysis (PCA) and Discriminant Function Analysis (DFA), were used for feature selection. The toxic gas categories were classified with a Probabilistic Neural Network (PNN) in order to validate the results previously obtained. Tests were carried out to verify the feasibility of the application through a wireless communication model, which allowed the sensor signals to be monitored and stored for appropriate analysis. The success rate in discriminating the measurements was 100%, using an artificial neural network with leave-one-out as the cross validation method.
Pesteie, Mehran; Abolmaesumi, Purang; Ashab, Hussam Al-Deen; Lessoway, Victoria A; Massey, Simon; Gunka, Vit; Rohling, Robert N
2015-06-01
Injection therapy is a commonly used solution for back pain management. This procedure typically involves percutaneous insertion of a needle between or around the vertebrae, to deliver anesthetics near nerve bundles. Most frequently, spinal injections are performed either blindly using palpation or under the guidance of fluoroscopy or computed tomography. Recently, due to the drawbacks of the ionizing radiation of such imaging modalities, there has been a growing interest in using ultrasound imaging as an alternative. However, the complex spinal anatomy with different wave-like structures, affected by speckle noise, makes the accurate identification of the appropriate injection plane difficult. The aim of this study was to propose an automated system that can identify the optimal plane for epidural steroid injections and facet joint injections. A multi-scale and multi-directional feature extraction system to provide automated identification of the appropriate plane is proposed. Local Hadamard coefficients are obtained using the sequency-ordered Hadamard transform at multiple scales. Directional features are extracted from local coefficients which correspond to different regions in the ultrasound images. An artificial neural network is trained based on the local directional Hadamard features for classification. The proposed method yields distinctive features for classification which successfully classified 1032 images out of 1090 for epidural steroid injection and 990 images out of 1052 for facet joint injection. In order to validate the proposed method, a leave-one-out cross-validation was performed. The average classification accuracy for leave-one-out validation was 94% for epidural and 90% for facet joint targets. Also, the feature extraction time for the proposed method was 20 ms for a native 2D ultrasound image.
A real-time machine learning system based on the local directional Hadamard features extracted by the sequency-ordered Hadamard transform for detecting the laminae and facet joints in ultrasound images has been proposed. The system has the potential to assist the anesthesiologists in quickly finding the target plane for epidural steroid injections and facet joint injections.
Lithgow, Brian J; Moussavi, Zahra
2018-06-05
Electrovestibulography (EVestG) recordings have been previously applied toward classifying and/or measuring the severity of several neurological disorders including depression with and without anxiety. This study's objectives were to: (1) extract EVestG features representing physiological differences of healthy women during their menses, and follicular and luteal phases of their menstrual cycle, and (2) compare these features to those observed in previous studies for depression with and without anxiety. Three EVestG recordings were made on 15 young healthy menstruating females during menses, and follicular and luteal phases. Three features were extracted, using the shape and timing of the detected spontaneously evoked vestibulo-acoustic field potentials. Using these features, a 3-way separation of the 3 phases was achieved, with a leave-one-out cross-validation, resulting in accuracy of > 72%. Using an EVestG shape feature, separation of the follicular and luteal phases was achieved with a leave-one-out cross-validation accuracy of > 93%. The mechanism of separation was not like that in previous depression analyses, and is postulated to be more akin to a form of anxiety and/or progesterone sensitivity. © 2018 S. Karger AG, Basel.
Gómez-Carracedo, M P; Andrade, J M; Rutledge, D N; Faber, N M
2007-03-07
Selecting the correct dimensionality is critical for obtaining partial least squares (PLS) regression models with good predictive ability. Although calibration and validation sets are best established using experimental designs, industrial laboratories cannot afford such an approach. Typically, samples are collected in a (formally) undesigned way, spread over time, and their measurements are included in routine measurement processes. This makes it hard to evaluate PLS model dimensionality. In this paper, classical criteria (leave-one-out cross-validation and adjusted Wold's criterion) are compared to recently proposed alternatives (smoothed PLS-PoLiSh and a randomization test) to seek out the optimum dimensionality of PLS models. Kerosene (jet fuel) samples were measured by attenuated total reflectance mid-IR spectrometry and their spectra were used to predict eight important properties determined using reference methods that are time-consuming and prone to analytical errors. The alternative methods were shown to give reliable dimensionality predictions when compared to external validation. By contrast, the simpler methods seemed to be largely affected by the largest changes in the modeling capabilities of the first components.
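The adjusted Wold's criterion mentioned here can be sketched as a stopping rule on the LOOCV PRESS curve: keep adding components until the PRESS ratio between successive models rises above a threshold, meaning the extra component no longer helps. The PRESS values and the 0.95 threshold below are illustrative, not from the kerosene data:

```python
import numpy as np

def wold_dimensionality(press, threshold=0.95):
    """press[k] = LOOCV PRESS of the model with k+1 components.

    Return the number of components to keep: stop once
    PRESS(k+2 components)/PRESS(k+1 components) exceeds the threshold.
    """
    for k in range(len(press) - 1):
        if press[k + 1] / press[k] > threshold:
            return k + 1
    return len(press)

# Illustrative PRESS curve for models with 1..6 components:
press = np.array([10.0, 4.0, 2.1, 1.9, 1.85, 1.92])
n_comp = wold_dimensionality(press)
```

Here the curve flattens after the fourth component (1.85/1.9 > 0.95), so four components are retained; a bare PRESS minimum would have picked five.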
QSPR using MOLGEN-QSPR: the challenge of fluoroalkane boiling points.
Rücker, Christoph; Meringer, Markus; Kerber, Adalbert
2005-01-01
By means of the new software MOLGEN-QSPR, a multilinear regression model for the boiling points of lower fluoroalkanes is established. The model is based exclusively on simple descriptors derived directly from molecular structure and nevertheless describes a broader set of data more precisely than previous attempts that used either more demanding (quantum chemical) descriptors or more demanding (nonlinear) statistical methods such as neural networks. The model's internal consistency was confirmed by leave-one-out cross-validation. The model was used to predict all unknown boiling points of fluorobutanes, and the quality of predictions was estimated by means of comparison with boiling point predictions for fluoropentanes.
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
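The supervised cross-validation idea, holding out whole subtypes rather than random samples so that test proteins are only distantly related to the training set, can be sketched as a simple splitter; the protein labels and family names below are made up:

```python
def leave_one_subtype_out(samples):
    """Yield (held_out_subtype, train, test) splits.

    samples: list of (sample_id, class_label, subtype) tuples.
    Unlike random k-fold, an entire subtype is withheld at a time, so the
    test set never shares a subtype with the training set.
    """
    subtypes = sorted({s[2] for s in samples})
    for held_out in subtypes:
        train = [s for s in samples if s[2] != held_out]
        test = [s for s in samples if s[2] == held_out]
        yield held_out, train, test

# Toy data: two classes, three subtypes (e.g. families within superfamilies)
data = [
    ("p1", "kinase", "family_A"), ("p2", "kinase", "family_A"),
    ("p3", "kinase", "family_B"),
    ("p4", "protease", "family_C"), ("p5", "protease", "family_C"),
]
splits = list(leave_one_subtype_out(data))
```

An accuracy estimated over such splits answers the question the abstract poses: how well does the classifier generalize to novel, distantly related subtypes of the known classes.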
Padilla-Buritica, Jorge I.; Martinez-Vargas, Juan D.; Castellanos-Dominguez, German
2016-01-01
Lately, research on computational models of emotion has been receiving much attention due to their potential for understanding the mechanisms of emotion and their promising broad range of applications that could bridge the gap between human and machine interaction. We propose a new method for emotion classification that relies on features extracted from those active brain areas that are most likely related to emotions. To this end, we carry out the selection of spatially compact regions of interest that are computed using the brain neural activity reconstructed from electroencephalography data. Throughout this study, we consider three representative feature extraction methods widely applied to emotion detection tasks: power spectral density, wavelets, and Hjorth parameters. Further feature selection is carried out using principal component analysis. For validation purposes, these features are used to feed a support vector machine classifier that is trained under the leave-one-out cross-validation strategy. Results obtained on real affective data show that incorporating the proposed training method, in combination with the enhanced spatial resolution provided by the source estimation, improves the accuracy of discrimination for most of the considered emotions, namely dominance, valence, and liking.
Model selection for the North American Breeding Bird Survey: A comparison of methods
Link, William; Sauer, John; Niven, Daniel
2017-01-01
The North American Breeding Bird Survey (BBS) provides data for >420 bird species at multiple geographic scales over 5 decades. Modern computational methods have facilitated the fitting of complex hierarchical models to these data. It is easy to propose and fit new models, but little attention has been given to model selection. Here, we discuss and illustrate model selection using leave-one-out cross validation, and the Bayesian Predictive Information Criterion (BPIC). Cross-validation is enormously computationally intensive; we thus evaluate the performance of the Watanabe-Akaike Information Criterion (WAIC) as a computationally efficient approximation to the BPIC. Our evaluation is based on analyses of 4 models as applied to 20 species covered by the BBS. Model selection based on BPIC provided no strong evidence of one model being consistently superior to the others; for 14/20 species, none of the models emerged as superior. For the remaining 6 species, a first-difference model of population trajectory was always among the best fitting. Our results show that WAIC is not reliable as a surrogate for BPIC. Development of appropriate model sets and their evaluation using BPIC is an important innovation for the analysis of BBS data.
Linear combination methods to improve diagnostic/prognostic accuracy on future observations
Kang, Le; Liu, Aiyi; Tian, Lili
2014-01-01
Multiple diagnostic tests or biomarkers can be combined to improve diagnostic accuracy. The problem of finding the optimal linear combination of biomarkers to maximise the area under the receiver operating characteristic (ROC) curve has been extensively addressed in the literature. The purpose of this article is threefold: (1) to provide an extensive review of the existing methods for biomarker combination; (2) to propose a new combination method, namely, the nonparametric stepwise approach; (3) to use the leave-one-pair-out cross-validation method, instead of the re-substitution method, which is overoptimistic and hence might lead to wrong conclusions, to empirically evaluate and compare the performance of different linear combination methods in yielding the largest area under the ROC curve. A data set on Duchenne muscular dystrophy was analysed to illustrate the applications of the discussed combination methods. PMID:23592714
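A minimal sketch of the leave-one-pair-out idea: for every (case, control) pair, the combination weights are refit on the remaining subjects, and the pair counts as a success if the held-out case outranks the held-out control; the fraction of successful pairs estimates the AUC. Least squares against the 0/1 label is used here only as a simple stand-in for the AUC-maximising combination methods reviewed in the article, and the data are synthetic.

```python
import numpy as np

def lopo_auc(X, y):
    """Leave-one-pair-out AUC estimate for a linear biomarker combination.
    X: (n_subjects, n_biomarkers); y: 0 = control, 1 = case."""
    cases = np.where(y == 1)[0]
    controls = np.where(y == 0)[0]
    wins, total = 0, 0
    for i in cases:
        for j in controls:
            mask = np.ones(len(y), dtype=bool)
            mask[[i, j]] = False                  # hold out the pair
            w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
            wins += int(X[i] @ w > X[j] @ w)      # does the case outrank?
            total += 1
    return wins / total

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (15, 2)),     # controls
               rng.normal(1.5, 1.0, (15, 2))])    # cases
y = np.array([0] * 15 + [1] * 15)
auc = lopo_auc(X, y)
```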
Sivan, Sree Kanth; Manga, Vijjulatha
2012-02-01
Multiple receptor conformation docking (MRCD) and clustering of dock poses allow seamless incorporation of receptor binding conformations for a wide range of ligands with varied structural scaffolds. The accuracy of the approach was tested on a set of 120 cyclic urea molecules having HIV-1 protease inhibitory activity, using 12 high-resolution X-ray crystal structures and one NMR-resolved conformation of HIV-1 protease extracted from the Protein Data Bank. A cross validation was performed on 25 non-cyclic urea HIV-1 protease inhibitors of varied structure. Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) models were generated using 60 molecules in the training set by applying the leave-one-out cross-validation method; r²(LOO) values of 0.598 and 0.674 and non-cross-validated regression coefficients r² of 0.983 and 0.985 were obtained for CoMFA and CoMSIA, respectively. The predictive ability of these models was determined using a test set of 60 cyclic urea molecules, which gave predictive correlations (r²(pred)) of 0.684 and 0.64 for CoMFA and CoMSIA respectively, indicating good internal predictive ability. Based on this information, 25 non-cyclic urea molecules were taken as a test set to check the external predictive ability of these models. This gave a remarkable outcome, with r²(pred) of 0.61 and 0.53 for CoMFA and CoMSIA, respectively. The results invariably show that this method is useful for performing 3D QSAR analysis on molecules having different structural motifs.
A novel multi-target regression framework for time-series prediction of drug efficacy.
Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin
2017-01-18
Excavating from small samples is a challenging pharmacokinetic problem, where statistical methods can be applied. Pharmacokinetic data is special due to the small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescription. The main purpose of our study is to obtain some knowledge of the correlation in TCM prescription. Here, a novel method named Multi-target Regression Framework to deal with the problem of efficacy prediction is proposed. We employ the correlation between the values of different time sequences and add predictive targets of previous time as features to predict the value of current time. Several experiments are conducted to test the validity of our method and the results of leave-one-out cross-validation clearly manifest the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance, and appears to be more suitable for this task.
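The chaining idea above — feeding previously observed targets back in as features for later time points — can be sketched as follows. Ridge regression stands in for the paper's support vector regression, and the data are synthetic; this illustrates the framework, not the authors' implementation.

```python
import numpy as np

# Chained multi-target regression: when predicting the target at time t,
# the targets observed at earlier time points are appended to the features.
rng = np.random.default_rng(4)
n, p, T = 30, 3, 4                          # samples, features, time points
X = rng.normal(size=(n, p))
# correlated time series: each step depends on the previous values
Y = np.cumsum(rng.normal(size=(n, T)) + X[:, :1], axis=1)

def ridge_fit(A, b, lam=1e-2):
    """Ridge regression weights (closed form), stand-in for SVR."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

models = []
for t in range(T):
    A = np.hstack([X, Y[:, :t]])            # features + earlier targets
    models.append(ridge_fit(A, Y[:, t]))
```

At prediction time, targets for t = 0, 1, … are filled in sequentially, each feeding the next model's feature vector.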
Chen, Yinsheng; Li, Zeju; Wu, Guoqing; Yu, Jinhua; Wang, Yuanyuan; Lv, Xiaofei; Ju, Xue; Chen, Zhongping
2018-07-01
Due to the totally different therapeutic regimens needed for primary central nervous system lymphoma (PCNSL) and glioblastoma (GBM), accurate differentiation of the two diseases by noninvasive imaging techniques is important for clinical decision-making. Thirty cases of PCNSL and 66 cases of GBM with conventional T1-contrast magnetic resonance imaging (MRI) were analyzed in this study. A convolutional neural network was used to segment tumors automatically. A modified scale-invariant feature transform (SIFT) method was utilized to extract three-dimensional local voxel arrangement information from the segmented tumors. A Fisher vector was proposed to normalize the dimension of the SIFT features. An improved genetic algorithm (GA) was used to extract the SIFT features with PCNSL-GBM discrimination ability. The dataset was divided into a cross-validation cohort and an independent validation cohort in the ratio of 2:1. A support vector machine with leave-one-out cross-validation, based on 20 cases of PCNSL and 44 cases of GBM, was employed to build and validate the differentiation model. Among 16,384 high-throughput features, 1356 features showed significant differences between PCNSL and GBM with p < 0.05, and 420 features with p < 0.001. A total of 496 features were finally chosen by the improved GA. The proposed method produced PCNSL vs. GBM differentiation with an area under the curve (AUC) of 99.1% (98.2%), accuracy of 95.3% (90.6%), sensitivity of 85.0% (80.0%) and specificity of 100% (95.5%) on the cross-validation cohort (and independent validation cohort). Owing to the local voxel arrangement characterization provided by the SIFT features, the proposed method produced more competitive PCNSL-GBM differentiation performance using conventional MRI than methods based on advanced MRI.
Concussion classification via deep learning using whole-brain white matter fiber strains
Cai, Yunliang; Wu, Shaoju; Zhao, Wei; Li, Zhigang; Wu, Zheyang; Ji, Songbai
2018-01-01
Developing an accurate and reliable injury predictor is central to the biomechanical studies of traumatic brain injury. State-of-the-art efforts continue to rely on empirical, scalar metrics based on kinematics or model-estimated tissue responses explicitly pre-defined in a specific brain region of interest. They could suffer from loss of information. A single training dataset has also been used to evaluate performance but without cross-validation. In this study, we developed a deep learning approach for concussion classification using implicit features of the entire voxel-wise white matter fiber strains. Using reconstructed American National Football League (NFL) injury cases, leave-one-out cross-validation was employed to objectively compare injury prediction performances against two baseline machine learning classifiers (support vector machine (SVM) and random forest (RF)) and four scalar metrics via univariate logistic regression (Brain Injury Criterion (BrIC), cumulative strain damage measure of the whole brain (CSDM-WB) and the corpus callosum (CSDM-CC), and peak fiber strain in the CC). Feature-based machine learning classifiers including deep learning, SVM, and RF consistently outperformed all scalar injury metrics across all performance categories (e.g., leave-one-out accuracy of 0.828–0.862 vs. 0.690–0.776, and .632+ error of 0.148–0.176 vs. 0.207–0.292). Further, deep learning achieved the best cross-validation accuracy, sensitivity, AUC, and .632+ error. These findings demonstrate the superior performances of deep learning in concussion prediction and suggest its promise for future applications in biomechanical investigations of traumatic brain injury. PMID:29795640
Li, Zhenghua; Cheng, Fansheng; Xia, Zhining
2011-01-01
The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) were studied using the molecular electronegativity-distance vector (MEDV). Linear relationships between the gas chromatographic retention index and the MEDV were established with a multiple linear regression (MLR) model. Variable selection by stepwise multiple regression (SMR), together with the predictive ability of the optimized model appraised by leave-one-out cross-validation, showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in the ratio of 2:1, statistical analysis showed that our model possessed almost equal statistical quality, very similar regression coefficients and good robustness. The quantitative structure-retention relationship (QSRR) model established here may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
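For linear models such as the MLR above, the cross-validated coefficient does not require n refits: the exact leave-one-out residuals of ordinary least squares follow from the hat-matrix identity e_loo,i = e_i / (1 - h_ii), the same shortcut exploited by the efficient LOOCV strategies discussed elsewhere in this collection. A sketch with synthetic data, verified against the naive n-refit loop:

```python
import numpy as np

# Efficient leave-one-out residuals for ordinary least squares:
# e_loo_i = e_i / (1 - h_ii), where h_ii are hat-matrix diagonals.
rng = np.random.default_rng(0)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -1.0, 0.5, 1.5])
y = X @ beta + 0.1 * rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix
e = y - H @ y                              # ordinary residuals
e_loo = e / (1 - np.diag(H))               # exact LOO residuals, one pass

# naive check: actually refit n times, leaving one sample out each time
e_naive = np.empty(n)
for i in range(n):
    m = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[m], y[m], rcond=None)
    e_naive[i] = y[i] - X[i] @ b

# cross-validated coefficient (often reported as q2 or Rcv^2)
q2 = 1 - np.sum(e_loo**2) / np.sum((y - y.mean())**2)
```

The single-fit shortcut and the n-refit loop give identical residuals, which is why leave-one-out validation of linear models can cost little more than one analysis.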
Schadl, Kornél; Vassar, Rachel; Cahill-Rowley, Katelyn; Yeom, Kristin W; Stevenson, David K; Rose, Jessica
2018-01-01
Advanced neuroimaging and computational methods offer opportunities for more accurate prognosis. We hypothesized that near-term regional white matter (WM) microstructure, assessed on diffusion tensor imaging (DTI) using exhaustive feature selection with cross-validation, would predict neurodevelopment in preterm children. Near-term MRI and DTI obtained at 36.6 ± 1.8 weeks postmenstrual age in 66 very-low-birth-weight preterm neonates were assessed. 60 of 66 had follow-up neurodevelopmental evaluation with the Bayley Scales of Infant-Toddler Development, 3rd edition (BSID-III) at 18-22 months. Linear models with exhaustive feature selection and leave-one-out cross-validation computed from the DTI identified the sets of three brain regions most predictive of cognitive and motor function; logistic regression models were computed to classify high-risk infants scoring one standard deviation below the mean. Cognitive impairment was predicted (100% sensitivity, 100% specificity; AUC = 1) by near-term right middle-temporal gyrus MD, right cingulate-cingulum MD, and left caudate MD. Motor impairment was predicted (90% sensitivity, 86% specificity; AUC = 0.912) by left precuneus FA, right superior occipital gyrus MD, and right hippocampus FA. Cognitive score variance was explained (29.6%, cross-validated R² = 0.296) by left posterior-limb-of-internal-capsule MD, genu RD, and right fusiform gyrus AD. Motor score variance was explained (31.7%, cross-validated R² = 0.317) by left posterior-limb-of-internal-capsule MD, right parahippocampal gyrus AD, and right middle-temporal gyrus AD. Searching a large DTI feature space more accurately identified neonatal neuroimaging correlates of neurodevelopment.
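Exhaustive search over three-region subsets scored by leave-one-out error, as described above, can be sketched generically. The PRESS statistic is computed with the exact hat-matrix shortcut so each candidate model needs only one fit; the data and the planted informative features below are synthetic.

```python
import numpy as np
from itertools import combinations

def loo_press(X, y):
    """PRESS statistic via the hat-matrix shortcut (exact LOO for OLS)."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.solve(A.T @ A, A.T)
    e_loo = (y - H @ y) / (1 - np.diag(H))
    return np.sum(e_loo**2)

def best_triplet(X, y):
    """Exhaustively score every 3-feature linear model by LOO error."""
    return min(combinations(range(X.shape[1]), 3),
               key=lambda c: loo_press(X[:, c], y))

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 8))
# outcome truly depends on features 1, 4 and 6 only
y = 2 * X[:, 1] - 3 * X[:, 4] + X[:, 6] + 0.1 * rng.normal(size=60)
chosen = best_triplet(X, y)
```

Because every candidate subset is judged on held-out error rather than in-sample fit, the search resists the overfitting that plagues resubstitution-based selection.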
Liu, Ming; Zhao, Jing; Lu, XiaoZuo; Li, Gang; Wu, Taixia; Zhang, LiFu
2018-05-10
With spectral methods, noninvasive in vivo determination of blood hyperviscosity has great potential and clinical value in diagnosis. In this study, 67 male subjects (41 healthy and 26 with hyperviscosity according to blood sample analysis) participated. Reflectance spectra of the subjects' tongue tips were measured, and a classification method based on principal component analysis combined with an artificial neural network model was built to identify hyperviscosity. Hold-out and leave-one-out methods, widely accepted for model validation, were used to avoid significant bias and lessen overfitting. To measure the performance of the classification, sensitivity, specificity, accuracy and F-measure were calculated. The accuracies obtained with the hold-out method (100 repetitions) and the leave-one-out method (67 folds) were 88.05% and 97.01%, respectively. Experimental results indicate that the classification model has practical value and demonstrate the feasibility of using spectroscopy for noninvasive identification of hyperviscosity.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single-classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
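Of the three fusion rules, weighted majority voting is the simplest to sketch: each classifier's vote for a class is weighted (for example by its validation F1 score) and the class with the largest total wins. The weights and predictions below are made up for illustration.

```python
import numpy as np

def weighted_majority_vote(predictions, weights):
    """Fuse class predictions from several classifiers.
    predictions: (n_classifiers, n_samples) integer class labels.
    weights: per-classifier weights (e.g., validation F1 scores)."""
    predictions = np.asarray(predictions)
    classes = np.unique(predictions)
    n = predictions.shape[1]
    fused = np.empty(n, dtype=predictions.dtype)
    for j in range(n):
        # total weight backing each candidate class for sample j
        scores = {c: sum(w for p, w in zip(predictions[:, j], weights) if p == c)
                  for c in classes}
        fused[j] = max(scores, key=scores.get)
    return fused

# three classifiers, four samples, three activity classes (0, 1, 2)
preds = [[0, 1, 1, 2],
         [0, 1, 2, 2],
         [1, 1, 2, 0]]
fused = weighted_majority_vote(preds, weights=[0.9, 0.8, 0.4])
```

Note that for sample three, the two weaker classifiers together (0.8 + 0.4) outvote the strongest single classifier (0.9), which is the point of weighting by validation performance rather than counting heads.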
Budget Online Learning Algorithm for Least Squares SVM.
Jian, Ling; Shen, Shuqian; Li, Jundong; Liang, Xijun; Li, Lei
2017-09-01
Batch-mode least squares support vector machine (LSSVM) learning is often associated with an unbounded number of support vectors (SVs), making it unsuitable for applications involving large-scale streaming data. Limited-scale LSSVM, which allows efficient updating, seems to be a good solution to tackle this issue. In this paper, to train the limited-scale LSSVM dynamically, we present a budget online LSSVM (BOLSSVM) algorithm. Methodologically, by setting a fixed budget for the SVs, we are able to update the LSSVM model according to the updated SV set dynamically without retraining from scratch. In particular, when a new small chunk of SVs substitutes for the old ones, the proposed algorithm employs a low-rank correction technique and the Sherman-Morrison-Woodbury formula to compute the inverse of the saddle-point matrix derived from the LSSVM's Karush-Kuhn-Tucker (KKT) system, which, in turn, updates the LSSVM model efficiently. In this way, the proposed BOLSSVM algorithm is especially useful for online prediction tasks. Another merit of the proposed BOLSSVM is that it can be used for k-fold cross validation. Specifically, compared with batch-mode learning methods, the computational complexity of the proposed BOLSSVM method is significantly reduced from O(n^4) to O(n^3) for leave-one-out cross validation with n training samples. The experimental results of classification and regression on benchmark data sets and real-world applications show the validity and effectiveness of the proposed BOLSSVM algorithm.
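The key linear-algebra step is the Sherman-Morrison-Woodbury identity, which turns a low-rank modification of an already-inverted matrix into a small k-by-k solve instead of a full re-inversion. A numerical sketch with generic matrices (not the LSSVM KKT system itself):

```python
import numpy as np

# Woodbury identity, as used to update an inverse after a rank-k change:
# (A + U C V)^-1 = A^-1 - A^-1 U (C^-1 + V A^-1 U)^-1 V A^-1
rng = np.random.default_rng(3)
n, k = 8, 2                                   # full system size, chunk size
A = rng.normal(size=(n, n)) + n * np.eye(n)   # well-conditioned base matrix
U = rng.normal(size=(n, k))
C = np.eye(k)
V = rng.normal(size=(k, n))

A_inv = np.linalg.inv(A)                      # assumed already available
inner = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)   # only k x k
updated_inv = A_inv - A_inv @ U @ inner @ V @ A_inv
```

Only the k-by-k `inner` matrix is inverted from scratch, which is what makes swapping a small chunk of support vectors cheap relative to refactorizing the whole system.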
NASA Astrophysics Data System (ADS)
Otake, Y.; Murphy, R. J.; Grupp, R. B.; Sato, Y.; Taylor, R. H.; Armand, M.
2015-03-01
A robust atlas-to-subject registration using a statistical deformation model (SDM) is presented. The SDM uses statistics of voxel-wise displacement learned from pre-computed deformation vectors of a training dataset. This allows an atlas instance to be directly translated into an intensity volume and compared with a patient's intensity volume. Rigid and nonrigid transformation parameters were simultaneously optimized via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), with image similarity used as the objective function. The algorithm was tested on CT volumes of the pelvis from 55 female subjects. A performance comparison of the CMA-ES and Nelder-Mead downhill simplex optimization algorithms with the mutual information and normalized cross-correlation similarity metrics was conducted. Simulation studies using synthetic subjects were performed, as well as leave-one-out cross-validation studies. Both studies suggested that mutual information and CMA-ES achieved the best performance. The leave-one-out test demonstrated 4.13 mm error with respect to the true displacement field, and 26,102 function evaluations in 180 seconds, on average.
van Hees, Vincent T; Renström, Frida; Wright, Antony; Gradmark, Anna; Catt, Michael; Chen, Kong Y; Löf, Marie; Bluck, Les; Pomeroy, Jeremy; Wareham, Nicholas J; Ekelund, Ulf; Brage, Søren; Franks, Paul W
2011-01-01
Few studies have compared the validity of objective measures of physical activity energy expenditure (PAEE) in pregnant and non-pregnant women. PAEE is commonly estimated with accelerometers attached to the hip or waist, but little is known about the validity and participant acceptability of wrist attachment. The objectives of the current study were to assess the validity of a simple summary measure derived from a wrist-worn accelerometer (GENEA, Unilever Discover, UK) to estimate PAEE in pregnant and non-pregnant women, and to evaluate participant acceptability. Non-pregnant (N = 73) and pregnant (N = 35) Swedish women (aged 20-35 yrs) wore the accelerometer on their wrist for 10 days during which total energy expenditure (TEE) was assessed using doubly-labelled water. PAEE was calculated as 0.9×TEE-REE. British participants (N = 99; aged 22-65 yrs) wore accelerometers on their non-dominant wrist and hip for seven days and were asked to score the acceptability of monitor placement (scored 1 [least] through 10 [most] acceptable). There was no significant correlation between body weight and PAEE. In non-pregnant women, acceleration explained 24% of the variation in PAEE, which decreased to 19% in leave-one-out cross-validation. In pregnant women, acceleration explained 11% of the variation in PAEE, which was not significant in leave-one-out cross-validation. Median (IQR) acceptability of wrist and hip placement was 9(8-10) and 9(7-10), respectively; there was a within-individual difference of 0.47 (p<.001). A simple summary measure derived from a wrist-worn tri-axial accelerometer adds significantly to the prediction of energy expenditure in non-pregnant women and is scored acceptable by participants.
Maier, Oskar; Wilms, Matthias; von der Gablentz, Janina; Krämer, Ulrike; Handels, Heinz
2014-03-01
Automatic segmentation of ischemic stroke lesions in magnetic resonance (MR) images is important in clinical practice and for neuroscientific trials. The key problem is to detect largely inhomogeneous regions of varying sizes, shapes and locations. We present a stroke lesion segmentation method based on local features extracted from multi-spectral MR data that are selected to model a human observer's discrimination criteria. A support vector machine classifier is trained on expert-segmented examples and then used to classify formerly unseen images. Leave-one-out cross validation on eight datasets with lesions of varying appearances is performed, showing our method to compare favourably with other published approaches in terms of accuracy and robustness. Furthermore, we compare a number of feature selectors and closely examine each feature's and MR sequence's contribution.
Choiri, S.; Ainurofiq, A.; Ratri, R.; Zulmi, M. U.
2018-03-01
Nifedipine (NIF) is a photo-labile drug that degrades easily when exposed to sunlight. This research aimed to develop an analytical method using high-performance liquid chromatography and implemented a quality-by-design approach to obtain an effective, efficient, and validated analytical method for NIF and its degradants. A 2² full factorial design with a curvature as a center point was applied to optimize the analytical conditions for NIF and its degradants. Mobile phase composition (MPC) and flow rate (FR) were the factors evaluated against the system suitability parameters. The selected condition was validated by cross-validation using a leave-one-out technique. Alteration of the MPC significantly affected retention time. Furthermore, an increase in FR reduced the tailing factor. In addition, the interaction of both factors increased the theoretical plates and the resolution of NIF and its degradants. The selected analytical condition for NIF and its degradants was validated over the range 1-16 µg/mL, showing good linearity, precision, and accuracy, and was efficient, with an analysis time within 10 min.
Majumder, S. K.; Krishna, H.; Sidramesh, M.; Chaturvedi, P.; Gupta, P. K.
2011-08-01
We report the results of a comparative evaluation of in vivo fluorescence and Raman spectroscopy for diagnosis of oral neoplasia. The study, carried out at Tata Memorial Hospital, Mumbai, involved 26 healthy volunteers and 138 patients being screened for neoplasm of the oral cavity. Spectral measurements were taken from multiple sites of abnormal as well as apparently uninvolved contralateral regions of the oral cavity in each patient. The tissue sites investigated belonged to one of four histopathology categories: 1) squamous cell carcinoma (SCC), 2) oral sub-mucous fibrosis (OSMF), 3) leukoplakia (LP) and 4) normal squamous tissue. A probability-based multivariate statistical algorithm utilizing nonlinear Maximum Representation and Discrimination Feature for feature extraction and Sparse Multinomial Logistic Regression for classification was developed for direct multi-class classification in a leave-one-patient-out cross-validation mode. The results reveal that the performance of Raman spectroscopy is considerably superior to that of fluorescence in stratifying the oral tissues into their respective histopathologic categories. The best classification accuracy was observed to be 90%, 93%, 94%, and 89% for SCC, OSMF, leukoplakia, and normal oral tissues, respectively, on the basis of leave-one-patient-out cross-validation, with an overall accuracy of 91%. However, when a binary classification was employed to distinguish spectra from all the SCC, OSMF and leukoplakia tissue sites together from normal, fluorescence and Raman spectroscopy were seen to have almost comparable performance, with Raman yielding a marginally better classification accuracy of 98.5% as compared to 94% for fluorescence.
Vasanthanathan, Poongavanam; Lakshmi, Manickavasagam; Arockia Babu, Marianesan; Kaskhedikar, Sathish Gopalrao
2006-06-01
A quantitative structure-activity relationship (Hansch approach) was applied to twenty chromene derivatives with lanosterol 14alpha-demethylase inhibitory activity against eight fungal organisms. Various physicochemical descriptors and the reported minimum inhibitory concentration values against the different fungal organisms were used as independent and dependent variables, respectively. The best models for the eight fungal organisms were first validated by the leave-one-out cross-validation procedure. It was revealed that thermodynamic parameters had an overall significant correlation with antifungal activity, and these studies provide an insight for designing new molecules.
Development of gait segmentation methods for wearable foot pressure sensors.
Crea, S; De Rossi, S M M; Donati, M; Reberšek, P; Novak, D; Vitiello, N; Lenzi, T; Podobnik, J; Munih, M; Carrozza, M C
2012-01-01
We present an automated segmentation method based on the analysis of plantar pressure signals recorded from two synchronized wireless foot insoles. Given the strict limits on computational power and power consumption typical of wearable electronic components, our aim is to investigate the capability of a Hidden Markov Model machine-learning method to detect gait phases at different levels of complexity in the processing of the wearable pressure sensor signals. Three different datasets are therefore developed: raw voltage values, calibrated sensor signals, and a calibrated estimation of total ground reaction force and position of the plantar center of pressure. The method is tested on a pool of 5 healthy subjects through a leave-one-out cross validation. The results show high classification performance when using the estimated biomechanical variables, on average 96%. Calibrated signals and raw voltage values show higher delays and dispersions in phase-transition detection, suggesting lower reliability for online applications.
Samadi; Wajizah, S.; Munawar, A. A.
2018-02-01
Feed is an important factor in animal production. The purpose of this study is to apply the NIRS method to determining feed values. NIRS spectra were acquired for feed samples in the wavelength range of 1000-2500 nm with 32 scans and a 0.2 nm wavelength interval. Spectral data were corrected by de-trending (DT) and standard normal variate (SNV) methods. Prediction models for in vitro dry matter digestibility (IVDMD) and in vitro organic matter digestibility (IVOMD) were established using principal component regression (PCR) and validated using leave-one-out cross validation (LOOCV). Prediction performance was quantified using the correlation coefficient (r) and the residual predictive deviation (RPD) index. The results showed that IVDMD and IVOMD can be predicted using SNV-corrected spectra, with r and RPD values of 0.93 and 2.78 for IVDMD, and 0.90 and 2.35 for IVOMD, respectively. In conclusion, the NIRS technique appears feasible for predicting animal feed nutritive values.
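Standard normal variate correction, used above to pre-process the spectra, simply centres and scales each spectrum by its own mean and standard deviation, removing multiplicative scatter effects. A minimal sketch:

```python
import numpy as np

def snv(spectra):
    """Standard normal variate correction: centre and scale each
    spectrum (row) by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# two spectra differing only by a multiplicative scatter factor of 10
raw = np.array([[1.0, 2.0, 3.0, 4.0],
                [10.0, 20.0, 30.0, 40.0]])
corrected = snv(raw)
```

After SNV, the two rows become identical, since row-wise standardization cancels any per-sample offset and scaling.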
A novel method to estimate the affinity of HLA-A∗0201 restricted CTL epitope
Xu, Yun-sheng; Lin, Yong; Zhu, Bo; Lin, Zhi-hua
2009-02-01
A set of 70 peptides with affinity for the class I MHC HLA-A∗0201 molecule was subjected to quantitative structure-affinity relationship studies based on the SCORE function, with good results (r² = 0.6982, RMS = 0.280). The 'leave-one-out' cross-validation (LOO-CV) and an outer test set of 18 samples were then used to validate the QSAR model. The results of the LOO-CV were q² = 0.6188, RMS = 0.315, and the results for the outer test set were r² = 0.5633, RMS = 0.2292, all of which show that the QSAR model has good predictability. Statistical analysis showed that hydrophobic and hydrogen-bond interactions played a significant role in peptide-MHC binding. The study also provides useful information for structure modification of CTL epitopes and lays a theoretical basis for the molecular design of therapeutic vaccines.
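Given leave-one-out predictions, the cross-validated q² reported above is computed as 1 - PRESS/SS, with PRESS the sum of squared LOO prediction errors and SS the total sum of squares about the mean. A minimal sketch with made-up numbers:

```python
import numpy as np

def q_squared(y, y_loo):
    """Cross-validated q2: 1 - PRESS / total sum of squares, where
    y_loo holds the leave-one-out prediction for each sample."""
    y, y_loo = np.asarray(y, float), np.asarray(y_loo, float)
    press = np.sum((y - y_loo) ** 2)          # LOO prediction error
    ss = np.sum((y - y.mean()) ** 2)          # variance about the mean
    return 1 - press / ss

# illustrative observed affinities and their LOO predictions
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_loo = np.array([1.2, 1.9, 3.1, 3.8, 5.2])
q2 = q_squared(y, y_loo)
```

Because each prediction comes from a model that never saw its own sample, q² is typically lower than the fitted r² and is the more honest of the two figures.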
Saavedra, Laura M; Romanelli, Gustavo P; Rozo, Ciro E; Duchowicz, Pablo R
2018-01-01
The insecticidal activity of a series of 62 plant-derived molecules against the chikungunya, dengue and Zika vector, the Aedes aegypti (Diptera: Culicidae) mosquito, is subjected to a Quantitative Structure-Activity Relationship (QSAR) analysis. The Replacement Method (RM) variable subset selection technique based on Multivariable Linear Regression (MLR) proves successful for exploring 4885 molecular descriptors calculated with Dragon 6. The predictive capability of the obtained models is confirmed through an external test set of compounds, Leave-One-Out (LOO) cross-validation and Y-randomization. The present study constitutes a first necessary computational step toward designing less toxic insecticides. Copyright © 2017 Elsevier B.V. All rights reserved.
Predicting prolonged dose titration in patients starting warfarin.
Finkelman, Brian S; French, Benjamin; Bershaw, Luanne; Brensinger, Colleen M; Streiff, Michael B; Epstein, Andrew E; Kimmel, Stephen E
2016-11-01
Patients initiating warfarin therapy generally experience a dose-titration period of weeks to months, during which time they are at higher risk of both thromboembolic and bleeding events. Accurate prediction of prolonged dose titration could help clinicians determine which patients might be better treated by alternative anticoagulants that, while more costly, do not require dose titration. A prediction model was derived in a prospective cohort of patients starting warfarin (n = 390), using Cox regression, and validated in an external cohort (n = 663) from a later time period. Prolonged dose titration was defined as a dose-titration period >12 weeks. Predictor variables were selected using a modified best subsets algorithm, with leave-one-out cross-validation to reduce overfitting. The final model had five variables: warfarin indication, insurance status, number of doctor's visits in the previous year, smoking status, and heart failure. The area under the ROC curve (AUC) in the derivation cohort was 0.66 (95%CI 0.60, 0.74) using leave-one-out cross-validation, but only 0.59 (95%CI 0.54, 0.64) in the external validation cohort, and varied across clinics. Including genetic factors in the model did not improve the AUC (0.59; 95%CI 0.54, 0.65). Relative utility curves indicated that the model was unlikely to provide a clinically meaningful benefit compared with no prediction. Our results suggest that prolonged dose titration cannot be accurately predicted in warfarin patients using traditional clinical, social, and genetic predictors, and that accurate prediction will need to accommodate heterogeneities across clinical sites and over time. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Rekha, Pachaiappan; Aruna, Prakasa Rao; Ganesan, Singaravelu
2016-03-01
Many research works based on fluorescence spectroscopy have proven its potential in the diagnosis of various diseases using the spectral signatures of native key fluorophores such as tryptophan, tyrosine, collagen, NADH, FAD and porphyrin. The distribution, concentration and conformation of these fluorophores may change depending on the pathological and metabolic conditions of cells and tissues. In this study, we attempted to characterize the blood plasma of normal subjects and oral cancer patients by native fluorescence spectroscopy at 280 nm excitation. Further, the fluorescence data were analyzed using a multivariate statistical method, linear discriminant analysis (LDA), with the leave-one-out cross-validation method. The results illustrate the potential of the fluorescence spectroscopy technique in the diagnosis of oral cancer using blood plasma.
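The analysis step above, a two-class Fisher linear discriminant validated by leave-one-out cross-validation, can be sketched in numpy. Synthetic feature vectors stand in for the plasma fluorescence features; the group sizes and separation are invented for illustration.

```python
import numpy as np

def lda_train(X, y):
    """Two-class Fisher linear discriminant: returns (w, c) such that a
    sample x is assigned to class 1 when w @ x > c."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Xc = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    Sw = Xc.T @ Xc                       # pooled within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)     # discriminant direction
    return w, w @ (m0 + m1) / 2          # threshold at projected midpoint

rng = np.random.default_rng(1)
n = 30                                   # subjects per group (synthetic)
X0 = rng.normal(0, 1, size=(n, 4))       # "normal" plasma features
X1 = rng.normal(0, 1, size=(n, 4))
X1[:, 0] += 4.0                          # "cancer" features, shifted
X, y = np.vstack([X0, X1]), np.r_[np.zeros(n), np.ones(n)]

# Leave-one-out cross-validation: refit the LDA without sample i, classify i
correct = 0
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    w, c = lda_train(X[mask], y[mask])
    correct += int((w @ X[i] > c) == bool(y[i]))
accuracy = correct / len(y)
```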
Yao, Yibing; Fan, Yu; Wu, Jun; Wan, Haisu; Wang, Jing; Lam, Stephen; Lam, Wan L.; Girard, Luc; Gazdar, Adi F.; Wu, Zhihao; Zhou, Qinghua
2015-01-01
To identify a panel of tumor-associated autoantibodies that can potentially be used as biomarkers for the early diagnosis of non-small cell lung cancer (NSCLC). Thirty-five unique, in-frame expressed phage proteins were isolated. Based on gene expression profiling, four proteins were selected for further study. Both receiver operating characteristic curve analysis and the leave-one-out method revealed that combined measurements of the four antibodies produced better predictive accuracy than any single marker alone. Leave-one-out validation also showed significant relevance for all stages of NSCLC patients. The panel of autoantibodies has high potential for detecting early-stage NSCLC. PMID:22713465
Crovato, César David Paredes; Schuck, Adalberto
2007-10-01
This paper presents a dysphonic voice classification system using the wavelet packet transform and the best basis algorithm (BBA) for dimensionality reduction, with six artificial neural networks (ANNs) acting as specialist systems. Each ANN was a three-layer multilayer perceptron with 64 input nodes and one output node; the number of neurons in the intermediate layer depended on the associated training pathology group. The dysphonic voice database was separated into five pathology groups and one healthy control group. Each ANN was trained on and associated with one of the six groups and fed with the entropy values of the best basis tree (BBT) nodes, using the multiple cross validation (MCV) method with the leave-one-out (LOO) variation. The success rates obtained were 87.5%, 95.31%, 87.5%, 100%, 96.87% and 89.06% for groups one to six, respectively.
Determination of total phenolic compounds in compost by infrared spectroscopy.
Cascant, M M; Sisouane, M; Tahiri, S; Krati, M El; Cervera, M L; Garrigues, S; de la Guardia, M
2016-06-01
Middle and near infrared (MIR and NIR) spectroscopy were applied to determine the total phenolic compounds (TPC) content in compost samples, based on models built using partial least squares (PLS) regression. Multiplicative scatter correction, standard normal variate and first derivative were employed as spectra pretreatments, and the number of latent variables was optimized by leave-one-out cross-validation. The performance of the PLS-ATR-MIR and PLS-DR-NIR models was evaluated according to the root mean square errors of cross validation and prediction (RMSECV and RMSEP), the coefficient of determination for prediction (Rpred(2)) and the residual predictive deviation (RPD), the latter reaching values of 5.83 and 8.26 for MIR and NIR, respectively. Copyright © 2016 Elsevier B.V. All rights reserved.
Weidhaas, Joanne B.; Li, Shu-Xia; Winter, Kathryn; Ryu, Janice; Jhingran, Anuja; Miller, Bridgette; Dicker, Adam P.; Gaffney, David
2009-01-01
Purpose: To evaluate the potential of gene expression signatures to predict response to treatment in locally advanced cervical cancer treated with definitive chemotherapy and radiation. Experimental Design: Tissue biopsies were collected from patients participating in Radiation Therapy Oncology Group (RTOG) 0128, a phase II trial evaluating the benefit of celecoxib in addition to cisplatin chemotherapy and radiation for locally advanced cervical cancer. Gene expression profiling was performed, and signatures of pretreatment, mid-treatment (before the first implant), and "changed" gene expression patterns between pre- and mid-treatment samples were determined. The ability of the gene signatures to predict local control versus local failure was evaluated. A two-group t-test was performed to identify the initial gene set separating these end points, and supervised classification methods were used to enrich the gene sets. The results were further validated by leave-one-out and 2-fold cross-validation. Results: Twenty-two patients had suitable material from pretreatment samples for analysis, and 13 paired pre- and mid-treatment samples were obtained. The changed gene expression signatures between the pre- and mid-treatment biopsies predicted response to treatment, separating patients with local failures from those who achieved local control with a seven-gene signature. The in-sample, leave-one-out, and 2-fold prediction rates were all 100% for this seven-gene signature, which was enriched for cell cycle genes. Conclusions: Changed gene expression signatures during therapy in cervical cancer can predict outcome as measured by local control. After further validation, such findings could be applied to direct additional therapy for cervical cancer patients treated with chemotherapy and radiation. PMID:19509178
Refinement of a Method for Identifying Probable Archaeological Sites from Remotely Sensed Data
NASA Technical Reports Server (NTRS)
Tilton, James C.; Comer, Douglas C.; Priebe, Carey E.; Sussman, Daniel; Chen, Li
2012-01-01
To facilitate locating archaeological sites before they are compromised or destroyed, we are developing approaches for generating maps of probable archaeological sites by detecting subtle anomalies in vegetative cover, soil chemistry, and soil moisture through analysis of remotely sensed data from multiple sources. We previously reported some success in this effort with a statistical analysis of slope, radar, and Ikonos data (including tasseled cap and NDVI transforms) using Student's t-test. We report here on new developments in our work: an analysis of 8-band multispectral WorldView-2 data. The WorldView-2 analysis begins by computing medians and median absolute deviations for the pixels in various annuli around each site of interest on the 28 band-difference ratios. We then use principal components analysis followed by linear discriminant analysis to train a classifier that assigns a posterior probability that a location is an archaeological site. We tested the procedure using leave-one-out cross validation, with a second leave-one-out step to choose parameters, on a 9,859 x 23,000 subset of the WorldView-2 data over the western portion of Ft. Irwin, CA, USA. We used 100 known non-sites and trained one classifier for lithic sites (n=33) and one classifier for habitation sites (n=16). We then analyzed convex combinations of scores from the Archaeological Predictive Model (APM) and our scores. We found that the combined scores had a higher area under the ROC curve than either individual method, indicating that including WorldView-2 data in the analysis improved the predictive power of the provided APM.
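The core classification step above (PCA followed by LDA, producing a posterior probability per location) can be sketched with scikit-learn. The feature values, class sizes, and component count below are invented stand-ins for the annulus statistics, not the study's data or tuning.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Invented per-location statistics over the 28 band-difference ratios
X_sites = rng.normal(0.5, 1.0, size=(33, 28))    # known sites (e.g. lithic)
X_non = rng.normal(-0.5, 1.0, size=(100, 28))    # known non-sites
X = np.vstack([X_sites, X_non])
y = np.r_[np.ones(33), np.zeros(100)]

# Dimensionality reduction, then a linear discriminant classifier
clf = make_pipeline(PCA(n_components=10),
                    LinearDiscriminantAnalysis()).fit(X, y)

# Posterior probability that each location is an archaeological site
posterior = clf.predict_proba(X)[:, 1]
```

In the study itself these posteriors are combined with APM scores via convex combinations before computing the ROC curve.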
Rational selection of training and test sets for the development of validated QSAR models
NASA Astrophysics Data System (ADS)
Golbraikh, Alexander; Shen, Min; Xiao, Zhiyan; Xiao, Yun-De; Lee, Kuo-Hsiung; Tropsha, Alexander
2003-02-01
Quantitative Structure-Activity Relationship (QSAR) models are used increasingly to screen chemical databases and/or virtual chemical libraries for potentially bioactive molecules. These developments emphasize the importance of rigorous model validation to ensure that the models have acceptable predictive power. Using the k nearest neighbors (kNN) variable selection QSAR method for the analysis of several datasets, we have demonstrated recently that the widely accepted leave-one-out (LOO) cross-validated R2 (q2) is an inadequate characteristic to assess the predictive ability of the models [Golbraikh, A., Tropsha, A. Beware of q2! J. Mol. Graphics Mod. 20, 269-276, (2002)]. Herein, we provide additional evidence that there exists no correlation between the values of q2 for the training set and accuracy of prediction (R2) for the test set and argue that this observation is a general property of any QSAR model developed with LOO cross-validation. We suggest that external validation using rationally selected training and test sets provides a means to establish a reliable QSAR model. We propose several approaches to the division of experimental datasets into training and test sets and apply them in QSAR studies of 48 functionalized amino acid anticonvulsants and a series of 157 epipodophyllotoxin derivatives with antitumor activity. We formulate a set of general criteria for the evaluation of predictive power of QSAR models.
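The q2 statistic under discussion is simply 1 − PRESS/SS_tot, where PRESS is the sum of squared leave-one-out prediction errors. For a linear model it need not be computed by refitting n times: the standard hat-matrix identity gives the LOO residual as e_i/(1 − h_ii). A numpy sketch on synthetic data (illustrative only, not the kNN QSAR method of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design with intercept
y = X @ rng.normal(size=p + 1) + 0.5 * rng.normal(size=n)

# Explicit leave-one-out loop: PRESS = sum of held-out squared errors
press = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    press += (y[i] - X[i] @ b) ** 2

# Hat-matrix shortcut: LOO residual = residual / (1 - leverage), no refitting
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix
press_fast = np.sum((resid / (1 - np.diag(H))) ** 2)

q2 = 1 - press / np.sum((y - y.mean()) ** 2)     # LOO cross-validated R2
```

The paper's point is precisely that a high q2 computed this way on the training set does not guarantee predictive power on an external test set.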
Cross-Cultural Detection of Depression from Nonverbal Behaviour.
Alghowinem, Sharifa; Goecke, Roland; Cohn, Jeffrey F; Wagner, Michael; Parker, Gordon; Breakspear, Michael
2015-05-01
Millions of people worldwide suffer from depression. Do commonalities exist in their nonverbal behavior that would enable cross-culturally viable screening and assessment of severity? We investigated the generalisability of an approach to detect depression severity cross-culturally using video-recorded clinical interviews from Australia, the USA and Germany. The material varied in type of interview, subtypes of depression, inclusion of healthy control subjects, cultural background, and recording environment. The analysis focussed on temporal features of participants' eye gaze and head pose. Several approaches to training and testing within and between datasets were evaluated. The strongest results were found for training across all datasets and testing across datasets using leave-one-subject-out cross-validation. In contrast, generalisability was attenuated when training on only one or two of the three datasets and testing on subjects from the dataset(s) not used in training. These findings highlight the importance of using training data exhibiting the expected range of variability.
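Leave-one-subject-out differs from ordinary leave-one-out in that all recordings of one subject are held out together, preventing identity leakage between training and test folds. A minimal numpy sketch of the fold construction follows; the features are synthetic and the nearest-centroid classifier is a placeholder, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10 subjects x 6 recordings each, 4 gaze/head-pose features (synthetic)
subjects = np.repeat(np.arange(10), 6)
y = np.repeat(np.tile([0, 1], 5), 6)        # one label per subject
X = rng.normal(size=(60, 4)) + 2.0 * y[:, None]  # label-dependent shift

def nearest_centroid(X_tr, y_tr, X_te):
    """Placeholder classifier: assign to the closer class centroid."""
    c0, c1 = X_tr[y_tr == 0].mean(0), X_tr[y_tr == 1].mean(0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    return (d1 < d0).astype(int)

hits, folds = 0, 0
for s in np.unique(subjects):
    test = subjects == s                     # hold out ALL samples of subject s
    pred = nearest_centroid(X[~test], y[~test], X[test])
    hits += int(np.sum(pred == y[test]))
    folds += 1
accuracy = hits / len(y)
```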
Flumignan, Danilo Luiz; Boralle, Nivaldo; Oliveira, José Eduardo de
2010-06-30
In this work, the combination of carbon nuclear magnetic resonance ((13)C NMR) fingerprinting with pattern-recognition analyses provides an original and alternative approach to screening commercial gasoline quality. Soft Independent Modelling of Class Analogy (SIMCA) was performed on spectroscopic fingerprints to classify representative commercial gasoline samples, selected by Hierarchical Cluster Analysis (HCA) over several months at retail gas stations, into previously quality-defined classes. With the optimized (13)C NMR-SIMCA algorithm, sensitivity values of 99.0% were obtained in the training set with leave-one-out cross-validation and 92.0% in the external prediction set. Governmental laboratories could employ this method as a rapid screening analysis to discourage adulteration practices. Copyright 2010 Elsevier B.V. All rights reserved.
Huh, Jung Wook; Kim, Sung Chun; Sohn, Insuk; Jung, Sin-Ho; Kim, Hee Cheol
2016-01-01
Background: In this study, we established and validated a model for predicting the prognosis of stage IIA colon cancer patients based on expression profiles of aptamers in serum. Methods: Blood samples were collected from 227 consecutive patients with pathologic T3N0M0 (stage IIA) colon cancer. We incubated 1,149 pools of clinically significant serum-molecule-binding aptamers with serum from patients to obtain aptamers bound to serum molecules, which were then amplified and labeled. Oligonucleotide arrays were constructed with the base sequences of the 1,149 aptamers and reacted with the labeled products to produce profiles of the aptamers bound to serum molecules. These profiles were used to divide the colon cancer patients into low- and high-risk groups based on the clinical information for the serum samples. The Cox proportional hazards model and leave-one-out cross-validation (LOOCV) were used to evaluate predictive performance. Results: During a median follow-up period of 5 years, 29 of the 227 patients (11.9%) experienced recurrence. There were 212 patients (93.4%) in the low-risk group and 15 patients (6.6%) in the high-risk group in our aptamer prognosis model. Postoperative recurrence significantly correlated with age and aptamer risk stratification (p = 0.046 and p = 0.001, respectively). In multivariate analysis, aptamer risk stratification (p < 0.001) was an independent predictor of recurrence. Disease-free survival curves calculated according to aptamer risk level, predicted through a LOOCV procedure, and age showed significant differences (p < 0.001 from permutations). Conclusion: Aptamer risk stratification can be a valuable prognostic factor in stage II colon cancer patients. PMID:26908450
Computing Prediction and Functional Analysis of Prokaryotic Propionylation.
Wang, Li-Na; Shi, Shao-Ping; Wen, Ping-Ping; Zhou, Zhi-You; Qiu, Jian-Ding
2017-11-27
Identification and systematic analysis of candidates for protein propionylation are crucial steps toward understanding its molecular mechanisms and biological functions. Although several proteome-scale studies have been performed to delineate potential propionylated proteins, the majority of lysine-propionylated substrates and their roles in pathological physiology still remain largely unknown. By gathering data from various databases and the literature, experimental prokaryotic propionylation data were collated and used to train a support vector machine with various features via a three-step feature selection method. A novel online tool for seeking potential lysine-propionylated sites, PropSeek (http://bioinfo.ncu.edu.cn/PropSeek.aspx), was built. Independent test results for leave-one-out and n-fold cross-validation were similar to each other, showing that PropSeek is a stable and robust predictor with satisfying performance. Meanwhile, analyses of Gene Ontology, Kyoto Encyclopedia of Genes and Genomes pathways, and protein-protein interactions implied a potential role of prokaryotic propionylation in protein synthesis and metabolism.
Kumar, Ravindra; Kumari, Bandana; Kumar, Manish
2017-01-01
The endoplasmic reticulum plays an important role in many cellular processes, including protein synthesis, folding and post-translational processing of newly synthesized proteins. It is also the site of quality control for misfolded proteins and the entry point of extracellular proteins into the secretory pathway. Hence, at any given time, the endoplasmic reticulum contains two different cohorts of proteins: (i) proteins involved in endoplasmic reticulum-specific functions, which reside in the lumen of the endoplasmic reticulum, called endoplasmic reticulum resident proteins, and (ii) proteins in the process of moving to the extracellular space. Thus, endoplasmic reticulum resident proteins must somehow be distinguished from newly synthesized secretory proteins, which pass through the endoplasmic reticulum on their way out of the cell. Here we report a novel support vector machine (SVM)-based method for predicting endoplasmic reticulum resident proteins, named ERPred, in which different forms of protein features were used as inputs to the SVM to develop the prediction models. During training, the leave-one-out approach of cross-validation was used; maximum performance, an accuracy of 81.42%, was obtained with a combination of amino acid compositions of different parts of the proteins. When evaluated on an independent dataset, ERPred predicted with a sensitivity of 72.31% and specificity of 83.69%.
We have also annotated six different proteomes to predict candidate endoplasmic reticulum resident proteins in them. A webserver, ERPred, was developed to make the method available to the scientific community; it can be accessed at http://proteininformatics.org/mkumar/erpred/index.html. We found that only 66 of the 124 proteins in the training dataset had endoplasmic reticulum retention signals, which shows that these signals are not an absolute necessity for endoplasmic reticulum resident proteins to remain inside the endoplasmic reticulum. This observation also strongly indicates the role of additional factors in the retention of proteins inside the endoplasmic reticulum. Our proposed predictor, ERPred, is a signal-independent tool, tuned for the prediction of endoplasmic reticulum resident proteins even if the query protein does not contain a specific ER-retention signal.
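Amino-acid-composition features feeding an SVM, validated leave-one-out as above, can be sketched with scikit-learn. The toy sequences, residue biases, and SVM settings below are invented for illustration; ERPred's real features and parameters differ.

```python
import numpy as np
from sklearn.svm import SVC

AA = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """20-dimensional amino acid composition feature vector."""
    return np.array([seq.count(a) / len(seq) for a in AA])

rng = np.random.default_rng(0)

def fake_seq(bias, n=80):
    # Toy sequence with a class-specific residue bias (illustrative only)
    pool = bias * 5 + AA
    return "".join(rng.choice(list(pool), size=n))

seqs = [fake_seq("KDEL") for _ in range(15)] + \
       [fake_seq("AGST") for _ in range(15)]
X = np.array([aa_composition(s) for s in seqs])
y = np.array([1] * 15 + [0] * 15)        # 1 = toy "ER resident" class

# Leave-one-out cross-validation with a linear SVM
correct = 0
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    clf = SVC(kernel="linear", C=1.0).fit(X[keep], y[keep])
    correct += int(clf.predict(X[i:i+1])[0] == y[i])
accuracy = correct / len(y)
```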
Efficient mining of association rules for the early diagnosis of Alzheimer's disease
NASA Astrophysics Data System (ADS)
Chaves, R.; Górriz, J. M.; Ramírez, J.; Illán, I. A.; Salas-Gonzalez, D.; Gómez-Río, M.
2011-09-01
In this paper, a novel technique based on association rules (ARs) is presented in order to find relations among activated brain areas in single photon emission computed tomography (SPECT) imaging. The aim of this work is to discover associations among attributes that characterize the perfusion patterns of normal subjects and to make use of them for the early diagnosis of Alzheimer's disease (AD). First, voxel-as-feature-based activation estimation methods are used to find the three-dimensional activated brain regions of interest (ROIs) for each patient. These ROIs then serve as input for mining ARs with a minimum support and confidence among activation blocks, using a set of controls. In this context, the support and confidence measures are related to the proportion of functional areas that are singularly and mutually activated across the brain. Finally, we perform image classification by comparing the number of ARs verified by each subject under test to a given threshold that depends on the number of previously mined rules. Several classification experiments were carried out in order to evaluate the proposed methods using a SPECT database consisting of 41 controls (NOR) and 56 AD patients labeled by trained physicians. The proposed methods were validated by means of the leave-one-out cross validation strategy, yielding up to 94.87% classification accuracy, thus outperforming recently developed methods for computer-aided diagnosis of AD.
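The support and confidence measures used above reduce to simple proportions over the set of control subjects. A minimal pure-Python sketch with invented activation data (the region names and thresholds are illustrative, not the paper's):

```python
from itertools import combinations

# Invented example: activated regions per control subject
activations = [
    {"frontal", "parietal", "temporal"},
    {"frontal", "parietal"},
    {"frontal", "temporal"},
    {"frontal", "parietal", "occipital"},
]
n = len(activations)

def support(itemset):
    """Fraction of subjects in which all regions are jointly activated."""
    return sum(itemset <= s for s in activations) / n

def confidence(antecedent, consequent):
    """P(consequent activated | antecedent activated)."""
    return support(antecedent | consequent) / support(antecedent)

# Mine all 2-region rules meeting a minimum support and confidence
regions = sorted(set().union(*activations))
rules = [(a, b, confidence({a}, {b}))
         for a, b in combinations(regions, 2)
         if support({a, b}) >= 0.5 and confidence({a}, {b}) >= 0.6]
# With the data above, only the rule frontal -> parietal survives
```

At test time, a subject violating many of the mined "normal" rules would be flagged as a potential AD case.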
Chotimah, Chusnul; Sudjadi; Riyanto, Sugeng; Rohman, Abdul
2015-01-01
Purpose: Analysis of drugs in multicomponent systems is officially carried out using chromatographic techniques; however, these are laborious and involve sophisticated instrumentation. Therefore, UV-VIS spectrophotometry coupled with multivariate calibration by partial least squares (PLS) was developed for the quantitative analysis of metamizole, thiamin and pyridoxin in the presence of cyanocobalamine without any separation step. Methods: Calibration and validation samples were prepared. The calibration model was built from a series of sample mixtures containing these drugs in defined proportions. Cross-validation of the calibration samples using the leave-one-out technique was used to identify the smaller set of components providing the greatest predictive ability. The calibration model was evaluated based on the coefficient of determination (R2) and the root mean square error of calibration (RMSEC). Results: The coefficient of determination (R2) for the relationship between actual and predicted values for all studied drugs was higher than 0.99, indicating good accuracy. The RMSEC values obtained were relatively low, indicating good precision. The accuracy and precision of the developed method showed no significant difference compared to those obtained by the official HPLC method. Conclusion: The developed method (UV-VIS spectrophotometry in combination with PLS) was successfully used for the analysis of metamizole, thiamin and pyridoxin in tablet dosage form. PMID:26819934
Boosting specificity of MEG artifact removal by weighted support vector machine.
Duan, Fang; Phothisonothai, Montri; Kikuchi, Mitsuru; Yoshimura, Yuko; Minabe, Yoshio; Watanabe, Kastumi; Aihara, Kazuyuki
2013-01-01
An automatic artifact removal method for magnetoencephalography (MEG) is presented in this paper. The proposed method is based on independent component analysis (ICA) and support vector machine (SVM). Unlike previous studies, we consider two factors that influence performance. First, the class imbalance of the independent components (ICs) of MEG is handled by a weighted SVM. Second, instead of simply setting a fixed weight for each class, a re-weighting scheme is used to preserve useful MEG ICs. Experimental results on a manually labeled MEG dataset showed that the proposed method correctly distinguished the artifacts from the MEG ICs, while 99.72% ± 0.67 of the MEG ICs were preserved. The classification accuracy was 97.91% ± 1.39. In addition, the method was not sensitive to individual differences: leave-one-subject-out cross-validation showed an average accuracy of 97.41% ± 2.14.
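Class weighting in an SVM, as used above to handle the imbalance between artifact and brain ICs, corresponds to per-class misclassification costs; in scikit-learn terms this is the `class_weight` parameter of `SVC`. A sketch on synthetic IC features follows (the paper's re-weighting scheme is not reproduced here; the data, weights, and feature dimension are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic IC features: 95 "brain" ICs vs 5 "artifact" ICs (imbalanced)
X_brain = rng.normal(0.0, 1.0, size=(95, 3))
X_art = rng.normal(1.8, 1.0, size=(5, 3))
X = np.vstack([X_brain, X_art])
y = np.r_[np.zeros(95), np.ones(5)]

plain = SVC(kernel="linear").fit(X, y)
# Heavier penalty for misclassifying the rare artifact class
weighted = SVC(kernel="linear", class_weight={0: 1, 1: 10}).fit(X, y)

def recall_artifact(model):
    """Fraction of artifact ICs the model recovers (on the training data)."""
    return float(model.predict(X[y == 1]).mean())

r_plain, r_weighted = recall_artifact(plain), recall_artifact(weighted)
```

Up-weighting the minority class expands its decision region, trading a little specificity on brain ICs for higher artifact recall, which is the trade-off the paper's re-weighting scheme tunes.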
NASA Astrophysics Data System (ADS)
Hariharan, Harishwaran; Aklaghi, Nima; Baker, Clayton A.; Rangwala, Huzefa; Kosecka, Jana; Sikdar, Siddhartha
2016-04-01
In spite of major advances in the biomechanical design of upper extremity prosthetics, these devices continue to lack intuitive control. Conventional myoelectric control strategies typically utilize electromyography (EMG) signal amplitude sensed from forearm muscles. EMG has limited specificity in resolving deep muscle activity and a poor signal-to-noise ratio. We have been investigating alternative control strategies that rely on real-time ultrasound imaging, which can overcome many of the limitations of EMG. In this work, we present an ultrasound image sequence classification method that utilizes spatiotemporal features to describe muscle activity and classify motor intent. Ultrasound images of the forearm muscles were obtained from able-bodied subjects and a trans-radial amputee while they attempted different hand movements. A grid-based approach is used to test the feasibility of using spatiotemporal features by classifying hand motions performed by the subjects. Using leave-one-out cross validation on image sequences acquired from able-bodied subjects, we observe that the grid-based approach is able to discern four hand motions with 95.31% accuracy. In the case of the trans-radial amputee, we are able to discern three hand motions with 80% accuracy. In a second set of experiments, we study classification accuracy by extracting spatiotemporal sub-sequences that depict activity due to the motion of local anatomical interfaces. Short, time- and space-limited cuboidal sequences are initially extracted and assigned an optical flow behavior label based on a response function. The image space is clustered based on the location of the cuboids, and features are calculated from the cuboids in each cluster. Using sequences of known motions, we extract feature vectors that describe each motion. A K-nearest neighbor classifier is designed for the classification experiments.
Using leave-one-out cross validation on image sequences from an amputee subject, we demonstrate that the classifier is able to discern three important hand motions with an accuracy of 93.33%, a precision of 91-100% and a recall rate of 80-100%. We anticipate that ultrasound imaging based methods will address some limitations of conventional myoelectric sensing, while adding advantages inherent to ultrasound imaging.
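The K-nearest-neighbor classification with leave-one-out validation described above can be sketched directly in numpy. Synthetic 2-D feature vectors stand in for the cuboid cluster features; k = 3 and the class layout are assumptions for illustration.

```python
import numpy as np

def knn_loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-NN majority-vote classifier."""
    # Pairwise Euclidean distances between all samples
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # a sample never votes for itself
    correct = 0
    for i in range(len(y)):
        nn = np.argsort(D[i])[:k]        # k nearest other samples
        votes = np.bincount(y[nn])
        correct += int(np.argmax(votes) == y[i])
    return correct / len(y)

rng = np.random.default_rng(0)
# Three synthetic "hand motion" classes, 12 sequences each
centers = np.array([[0, 0], [4, 0], [0, 4]])
X = np.vstack([c + rng.normal(size=(12, 2)) for c in centers])
y = np.repeat(np.arange(3), 12)

acc = knn_loo_accuracy(X, y, k=3)
```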
Fine-tuning convolutional deep features for MRI based brain tumor classification
NASA Astrophysics Data System (ADS)
Ahmed, Kaoutar B.; Hall, Lawrence O.; Goldgof, Dmitry B.; Liu, Renhao; Gatenby, Robert A.
2017-03-01
Prediction of survival time from brain tumor magnetic resonance images (MRI) is not commonly performed and would ordinarily be a time-consuming process. However, current cross-sectional imaging techniques, particularly MRI, can be used to generate many features that may provide information on a patient's prognosis, including survival. This information can potentially be used to identify individuals who would benefit from more aggressive therapy. Rather than using pre-defined and hand-engineered features as with current radiomics methods, we investigated the use of deep features extracted from pre-trained convolutional neural networks (CNNs) in predicting survival time. We also provide evidence for the power of domain-specific fine-tuning in improving the performance of a pre-trained CNN, even though our dataset is small. We fine-tuned a CNN initially trained on a large natural image recognition dataset (Imagenet ILSVRC) and transferred the learned feature representations to the survival time prediction task, obtaining over 81% accuracy in leave-one-out cross validation.
Tan, Jin; Li, Rong; Jiang, Zi-Tao; Tang, Shu-Hua; Wang, Ying; Shi, Meng; Xiao, Yi-Qian; Jia, Bin; Lu, Tian-Xiang; Wang, Hao
2017-02-15
Synchronous front-face fluorescence spectroscopy was developed for the discrimination of used frying oil (UFO) from edible vegetable oil (EVO), the estimation of how long the UFO had been used, and the determination of the adulteration of EVO with UFO. Both the heating time of laboratory-prepared UFO and the adulteration of EVO with UFO could be determined by partial least squares regression (PLSR). To simulate EVO adulteration with UFO, fifty adulterated samples with adulterant amounts in the range of 1-50% were prepared for each kind of oil. PLSR was then adopted to build the model, and both full (leave-one-out) cross-validation and external validation were performed to evaluate the predictive ability. Under the optimum conditions, the plots of observed versus predicted values exhibited high linearity (R(2)>0.96). The root mean square errors of cross-validation (RMSECV) and of prediction (RMSEP) were both lower than 3%. Copyright © 2016 Elsevier Ltd. All rights reserved.
Joint source based analysis of multiple brain structures in studying major depressive disorder
NASA Astrophysics Data System (ADS)
Ramezani, Mahdi; Rasoulian, Abtin; Hollenstein, Tom; Harkness, Kate; Johnsrude, Ingrid; Abolmaesumi, Purang
2014-03-01
We propose a joint Source-Based Analysis (jSBA) framework to identify brain structural variations in patients with Major Depressive Disorder (MDD). In this framework, features representing position, orientation and size (i.e. pose), shape, and local tissue composition are extracted. Subsequently, simultaneous analysis of these features within a joint analysis method is performed to generate the basis sources that show significant differences between subjects with MDD and healthy controls. Moreover, in a leave-one-out cross-validation experiment, we use a Fisher Linear Discriminant (FLD) classifier to identify individuals within the MDD group. Results show that we can classify the MDD subjects with an accuracy of 76% solely based on the information gathered from the joint analysis of pose, shape, and tissue composition in multiple brain structures.
Pfau, Maximilian; Lindner, Moritz; Goerdt, Lukas; Thiele, Sarah; Nadal, Jennifer; Schmid, Matthias; Schmitz-Valckenberg, Steffen; Sadda, SriniVas R; Holz, Frank G; Fleckenstein, Monika
2018-05-16
To systematically compare the prognostic value of multiple shape-descriptive factors for the natural course of geographic atrophy. A total of 296 eyes of 201 patients (130 female; mean age 72.2 ± 13.08 years) with a median follow-up of 2.38 years from two prospective, noninterventional natural history studies (Fundus-Autofluorescence-in-Age-related-Macular-Degeneration [clinicaltrials.gov identifier NCT00393692] and Directional-Spread-in-Geographic-Atrophy [NCT02051998]) were included in the analysis. Serial fundus autofluorescence images were annotated using semiautomated image analysis software to determine the lesion area, circularity, perimeter, and caliper diameters. These variables and the fundus autofluorescence phenotype were evaluated for prediction of future square root progression rates using linear mixed-effects models. For the combined model, leave-one-out cross validation at the patient level (Scenario 1: previously unknown patient) resulted in a goodness of fit (R value) of 0.244, and leave-one-out cross validation at the visit level (Scenario 2: previous observation of the patient) in an R value of 0.391. This indicated that shape-descriptive factors could explain 24.4% of the variance in geographic atrophy progression in previously unknown patients and 39.1% in patients with previous observations. These findings confirm the relevance of shape-descriptive factors and previous progression as prognostic variables for geographic atrophy progression. However, a substantial part of the remaining variation in geographic atrophy progression seems to depend on other variables, some of which are visible in optical coherence tomography.
Ding, H; Chen, C; Zhang, X
2016-01-01
The linear solvation energy relationship (LSER) was applied to predict the adsorption coefficient (K) of synthetic organic compounds (SOCs) on single-walled carbon nanotubes (SWCNTs). A total of 40 log K values were used to develop and validate the LSER model. The adsorption data for 34 SOCs were collected from 13 published articles and the other six were obtained in our experiment. The optimal model composed of four descriptors was developed by a stepwise multiple linear regression (MLR) method. The adjusted r(2) (r(2)adj) and root mean square error (RMSE) were 0.84 and 0.49, respectively, indicating good fitness. The leave-one-out cross-validation Q(2) ([Formula: see text]) was 0.79, suggesting the robustness of the model was satisfactory. The external Q(2) ([Formula: see text]) and RMSE (RMSEext) were 0.72 and 0.50, respectively, showing the model's strong predictive ability. Hydrogen bond donating interaction (bB) and cavity formation and dispersion interactions (vV) stood out as the two most influential factors controlling the adsorption of SOCs onto SWCNTs. The equilibrium concentration would affect the fitness and predictive ability of the model, while the coefficients varied slightly.
NASA Astrophysics Data System (ADS)
Luo, X.; Heck, B.; Awange, J. L.
2013-12-01
Global Navigation Satellite Systems (GNSS) are emerging as possible tools for remote sensing of high-resolution atmospheric water vapour, which improves weather forecasting through numerical weather prediction models. Nowadays, the GNSS-derived tropospheric zenith total delay (ZTD), comprising zenith dry delay (ZDD) and zenith wet delay (ZWD), is achievable with sub-centimetre accuracy. However, if no representative near-site meteorological information is available, the quality of the ZDD derived from tropospheric models is degraded, leading to inaccurate estimation of the water vapour component ZWD as the difference between ZTD and ZDD. On the basis of freely accessible regional surface meteorological data, this paper proposes a height-dependent linear correction model for the a priori ZDD. By applying the ordinary least-squares estimation (OLSE), bootstrapping (BOOT), and leave-one-out cross-validation (CROS) methods, the model parameters are estimated and analysed with respect to outlier detection. The model validation is carried out using GNSS stations with near-site meteorological measurements. The results verify the efficiency of the proposed ZDD correction model, showing a significant reduction in the mean bias from several centimetres to about 5 mm. The OLSE method enables fast computation, while the CROS procedure allows for outlier detection. All three methods produce consistent results after outlier elimination, which improves the regression quality by about 20% and the model accuracy by up to 30%.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Y; Zou, J; Murillo, P
Purpose: Chemo-radiation therapy (CRT) is widely used in treating patients with locally advanced non-small cell lung cancer (NSCLC). Determination of the likelihood of patient response to treatment and optimization of the treatment regimen are of clinical significance. To date, no imaging biomarker has reliably correlated with NSCLC patient survival rate. This pilot study extracts CT texture information from tumor regions for patient survival prediction. Methods: Thirteen patients with stage II-III NSCLC were treated using CRT with a median dose of 6210 cGy. Non-contrast-enhanced CT images were acquired for treatment planning and retrospectively collected for this study. Texture analysis was applied in segmented tumor regions using the Local Binary Pattern (LBP) method. By comparing its HU with those of neighboring voxels, the LBPs of a voxel were measured at multiple scales with different group radii and numbers of neighbors. The LBP histograms formed a multi-dimensional texture vector for each patient, which was then used to establish and test a Support Vector Machine (SVM) model to predict patients' one-year survival. The leave-one-out cross validation strategy was used recursively to enlarge the training set and derive a reliable predictor. The predictions were compared with the true clinical outcomes. Results: A 10-dimensional LBP histogram was extracted from the 3D segmented tumor region for each of the 13 patients. Using the SVM model with the leave-one-out strategy, only 1 out of 13 patients was misclassified. The experiments showed an accuracy of 93%, sensitivity of 100%, and specificity of 86%. Conclusion: Within the framework of a Support Vector Machine based model, the Local Binary Pattern method is able to extract a quantitative imaging biomarker for the prediction of NSCLC patient survival. More patients are to be included in the study.
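The leave-one-out evaluation loop described above can be sketched as follows. The study trained an SVM on LBP histograms; this dependency-free stand-in substitutes a 1-nearest-neighbour classifier on hypothetical toy vectors, purely to show the hold-one-out bookkeeping:

```python
# Leave-one-out classification accuracy: each sample is predicted by a
# model that never saw it. A 1-NN classifier stands in for the SVM here.

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def loo_accuracy(features, labels):
    correct = 0
    for i in range(len(features)):
        # "Train" on everything except sample i, then predict sample i
        # as the label of its nearest remaining neighbour.
        nearest = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: euclid(features[i], features[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(features)

# Toy "histograms": two well-separated classes (1 = survived > 1 year).
features = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85],
            [0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]
labels = [1, 1, 1, 0, 0, 0]
acc = loo_accuracy(features, labels)  # 1.0 on this separable toy set
```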
Davies, John R; Chang, Yu-mei; Bishop, D Timothy; Armstrong, Bruce K; Bataille, Veronique; Bergman, Wilma; Berwick, Marianne; Bracci, Paige M; Elwood, J Mark; Ernstoff, Marc S; Green, Adele; Gruis, Nelleke A; Holly, Elizabeth A; Ingvar, Christian; Kanetsky, Peter A; Karagas, Margaret R; Lee, Tim K; Le Marchand, Loïc; Mackie, Rona M; Olsson, Håkan; Østerlind, Anne; Rebbeck, Timothy R; Reich, Kristian; Sasieni, Peter; Siskind, Victor; Swerdlow, Anthony J; Titus, Linda; Zens, Michael S; Ziegler, Andreas; Gallagher, Richard P.; Barrett, Jennifer H; Newton-Bishop, Julia
2015-01-01
Background We report the development of a cutaneous melanoma risk algorithm based upon 7 factors: hair colour, skin type, family history, freckling, nevus count, number of large nevi and history of sunburn, intended to form the basis of a self-assessment webtool for the general public. Methods Predicted odds of melanoma were estimated by analysing a pooled dataset from 16 case-control studies using logistic random coefficients models. Risk categories were defined based on the distribution of the predicted odds in the controls from these studies. Imputation was used to estimate missing data in the pooled datasets. The 30th, 60th and 90th centiles were used to distribute individuals into four risk groups for their age, sex and geographic location. Cross-validation was used to test the robustness of the thresholds for each group by leaving out each study one by one. Performance of the model was assessed in an independent UK case-control study dataset. Results Cross-validation confirmed the robustness of the threshold estimates. Cases and controls were well discriminated in the independent dataset (area under the curve 0.75, 95% CI 0.73-0.78). 29% of cases were in the highest risk group compared with 7% of controls, and 43% of controls were in the lowest risk group compared with 13% of cases. Conclusion We have identified a composite score representing an estimate of relative risk and successfully validated this score in an independent dataset. Impact This score may be a useful tool to inform members of the public about their melanoma risk. PMID:25713022
Drosos, Juan Carlos; Viola-Rhenals, Maricela; Vivas-Reyes, Ricardo
2010-06-25
Polycyclic aromatic compounds (PAHs) are of concern in environmental chemistry and toxicology. In the present work, a QSRR study was performed for 209 previously reported PAHs using quantum mechanics and descriptors from other sources, estimated by different approaches. The B3LYP/6-31G* level of theory was used for geometry optimization and quantum-mechanics-related variables. A good linear relationship between gas-chromatographic retention index and electronic or topological descriptors was found by stepwise linear regression analysis. The molecular polarizability (alpha) and the second-order molecular connectivity Kier and Hall index ((2)chi) showed significant correlation with retention index, with high squared coefficients of determination (R(2) = 0.950 and 0.962, respectively). A one-variable QSRR model is presented for each descriptor, and both models demonstrate significant predictive capacity, established using the leave-many-out (LMO, excluding 25% of rows) cross-validation coefficients (q(2)(CV-LMO25%) = 0.947 and 0.960, respectively). Furthermore, the physicochemical interpretation of the selected descriptors allowed a detailed explanation of the source of the observed statistical correlation. The model analysis suggests that a single descriptor is sufficient to establish a consistent retention index-structure relationship. Only moderate or non-significant improvement in the quantitative results or statistical validation parameters was observed when more terms were introduced into the predictive equation. The proposed one-parameter QSRR model offers a consistent scheme to predict chromatographic properties of PAH compounds. Copyright 2010 Elsevier B.V. All rights reserved.
Random forest classification of large volume structures for visuo-haptic rendering in CT images
NASA Astrophysics Data System (ADS)
Mastmeyer, Andre; Fortmeier, Dirk; Handels, Heinz
2016-03-01
For patient-specific voxel-based visuo-haptic rendering of CT scans of the liver area, the fully automatic segmentation of large volume structures such as skin, soft tissue, lungs and intestine (risk structures) is important. Using a machine learning based approach, several existing segmentations from 10 segmented gold-standard patients are learned by random decision forests, individually and collectively. The core of this paper is feature selection and the application of the learned classifiers to a new patient data set. In a leave-some-out cross-validation, the obtained full volume segmentations are compared to the gold-standard segmentations of the untrained patients. The proposed classifiers use a multi-dimensional feature space to estimate the hidden truth, instead of relying on clinical standard threshold and connectivity based methods. The results of our efficient whole-body section classification are multi-label maps of the considered tissues. For visuo-haptic simulation, other small volume structures would have to be segmented additionally. We also take a look at these structures (liver vessels). In an experimental leave-some-out study of 10 patients, the proposed method performs much more efficiently than state-of-the-art methods. In two variants of leave-some-out experiments we obtain best mean DICE ratios of 0.79, 0.97, 0.63 and 0.83 for skin, soft tissue, hard bone and risk structures. Liver structures are segmented with DICE 0.93 for the liver, 0.43 for blood vessels and 0.39 for bile vessels.
Predicting protein-binding regions in RNA using nucleotide profiles and compositions.
Choi, Daesik; Park, Byungkyu; Chae, Hanju; Lee, Wook; Han, Kyungsook
2017-03-14
Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use. We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained on a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remained high (87.6% accuracy and 0.752 MCC). In testing on independent datasets, the model achieved an accuracy of 82.2% and an MCC of 0.656. Testing our model and other state-of-the-art methods on the same dataset showed that our model is better than the others.
Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions for finding protein-binding regions in RNA sequences, although a slight additional performance gain was obtained when the sequence profiles were used together with nucleotide compositions. These are preliminary results of ongoing research, but they demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding .
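The sensitivity, specificity, accuracy and MCC figures above all derive from the same four confusion-matrix counts; a minimal sketch with hypothetical counts:

```python
# Classification metrics from confusion-matrix counts
# (tp/fp/tn/fn values are toy numbers, not the paper's results).

def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient; 0.0 when a margin is empty.
    num = tp * tn - fp * fn
    den = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return num / den if den else 0.0

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Symmetric toy case: 9 of 10 correct in each class.
score = mcc(tp=9, fp=1, tn=9, fn=1)  # 0.8
```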
Tabu search and binary particle swarm optimization for feature selection using microarray data.
Chuang, Li-Yeh; Yang, Cheng-Huei; Yang, Cheng-Hong
2009-12-01
Gene expression profiles have great potential as a medical diagnosis tool because they represent the state of a cell at the molecular level. In cancer-type classification research, available training datasets generally have a fairly small sample size compared to the number of genes involved. This fact poses an unprecedented challenge to some classification methodologies due to training data limitations. Therefore, a good method for selecting genes relevant for sample classification is needed to improve predictive accuracy, and to avoid incomprehensibility due to the large number of genes investigated. In this article, we propose to combine tabu search (TS) and binary particle swarm optimization (BPSO) for feature selection. BPSO acts as a local optimizer each time the TS has been run for a single generation. The K-nearest neighbor method with leave-one-out cross-validation and a support vector machine with one-versus-rest serve as evaluators for the TS and BPSO. The proposed method was applied to 11 classification problems taken from the literature and compared with other feature selection methods. Experimental results show that our method simplifies features effectively and either obtains higher classification accuracy or uses fewer features than the other methods.
BEaST: brain extraction based on nonlocal segmentation technique.
Eskildsen, Simon F; Coupé, Pierrick; Fonov, Vladimir; Manjón, José V; Leung, Kelvin K; Guizard, Nicolas; Wassef, Shafik N; Østergaard, Lasse Riis; Collins, D Louis
2012-02-01
Brain extraction is an important step in the analysis of brain images. The variability in brain morphology and the difference in intensity characteristics due to imaging sequences make the development of a general purpose brain extraction algorithm challenging. To address this issue, we propose a new robust method (BEaST) dedicated to produce consistent and accurate brain extraction. This method is based on nonlocal segmentation embedded in a multi-resolution framework. A library of 80 priors is semi-automatically constructed from the NIH-sponsored MRI study of normal brain development, the International Consortium for Brain Mapping, and the Alzheimer's Disease Neuroimaging Initiative databases. In testing, a mean Dice similarity coefficient of 0.9834±0.0053 was obtained when performing leave-one-out cross validation selecting only 20 priors from the library. Validation using the online Segmentation Validation Engine resulted in a top ranking position with a mean Dice coefficient of 0.9781±0.0047. Robustness of BEaST is demonstrated on all baseline ADNI data, resulting in a very low failure rate. The segmentation accuracy of the method is better than two widely used publicly available methods and recent state-of-the-art hybrid approaches. BEaST provides results comparable to a recent label fusion approach, while being 40 times faster and requiring a much smaller library of priors. Copyright © 2011 Elsevier Inc. All rights reserved.
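The Dice similarity coefficient used above to score brain extractions is a simple overlap measure between two binary masks; a sketch with toy voxel sets:

```python
# Dice similarity coefficient between two binary segmentation masks,
# represented here as sets of voxel indices (toy data).

def dice(mask_a, mask_b):
    a, b = set(mask_a), set(mask_b)
    # 2 * |intersection| / (|A| + |B|): 1.0 for identical masks,
    # 0.0 for disjoint ones.
    return 2 * len(a & b) / (len(a) + len(b))

# Two 4-voxel masks overlapping in 2 voxels.
d = dice({1, 2, 3, 4}, {3, 4, 5, 6})  # 2*2 / (4+4) = 0.5
```

Values such as the 0.9834 ± 0.0053 reported above mean the automatic and gold-standard brain masks overlap almost completely.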
Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H
2017-03-01
For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, with a support vector machine classifier. The method was applied to four microarray datasets, and the performance was assessed by the leave-one-out cross-validation method. Comparative performance assessment of the proposed method with other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting a smaller number of genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type. Copyright © 2017. Published by Elsevier Inc.
Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M
2015-01-01
Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.
Ciofi, Lorenzo; Renai, Lapo; Rossini, Daniele; Ancillotti, Claudia; Falai, Alida; Fibbi, Donatella; Bruzzoniti, Maria Concetta; Santana-Rodriguez, José Juan; Orlandini, Serena; Del Bubba, Massimo
2018-01-01
The applicability of a direct injection UHPLC-MS/MS method for the analysis of several perfluoroalkyl acids (PFAAs) in a wide range of water matrices was investigated. The method is based on the direct injection of 100 µL of centrifuged water sample, without any other sample treatment. Very good method detection limits (0.014-0.44 ng L⁻¹) and excellent intra- and inter-day precision (RSD% values in the range 1.8-4.4% and 2.7-5.7%, respectively) were achieved, with a total analysis time of 20 min per sample. A high number of samples - i.e. 8 drinking waters (DW), 12 ground waters (GW), 13 surface waters (SW), 8 influents and 11 effluents of wastewater treatment plants (WWTP-IN and WWTP-OUT) - were processed and the extent of matrix effect (ME) was calculated, highlighting the strong prevalence of |ME| < 20%. The occurrence of |ME| > 50% was occasionally observed only for perfluorooctanesulphonic and perfluorodecanoic acids. Linear discriminant analysis highlighted the great contribution of the sample origin (i.e. DW, GW, SW, WWTP-IN and WWTP-OUT) to the ME. Partial least squares regression (PLS) and leave-one-out cross-validation were performed in order to interpret and predict the signal suppression or enhancement phenomena as a function of the physicochemical parameters of the water samples (i.e. conductivity, hardness and chemical oxygen demand) and the background chromatographic area. The PLS approach resulted only in an approximate screening, due to the low prediction power of the PLS models. However, for most analytes in most samples, the fitted and cross-validated values were such as to correctly distinguish between |ME| above 20% and |ME| below this limit. PFAAs in the aforementioned water samples were quantified by means of the standard addition method, highlighting their occurrence mainly in WWTP influents and effluents, at concentrations as high as one hundred µg L⁻¹. Copyright © 2017 Elsevier B.V. All rights reserved.
2012-01-01
Objective: Odor exposure is an environmental stressor that is responsible for many citizens' complaints about air pollution in non-urban areas. However, information about the exposure-response relation is scarce. One of the main challenges is to identify a measurable compound that can be related to odor annoyance responses. We investigated the association between regional and temporal variation of ammonia (NH3) concentrations in five Danish non-urban regions and environmental odor annoyance as perceived by the local residents. Methods: A cross-sectional study in which the NH3 concentration was obtained from the national air quality monitoring program and from emission-dispersion modelling, and odor pollution perception from questionnaires. The exposure-response model was a sigmoid model. Linear regression analyses were used to estimate the model constants after equation transformations. The model was validated using the leave-one-out cross validation (LOOCV) statistical method. Results: About 45% of the respondents were annoyed by odor pollution in their residential areas. The perceived odor was characterized by all respondents as animal waste odor. The exposure-annoyance sigmoid model showed that the prevalence of odor annoyance was significantly associated with NH3 concentrations (measured and estimated) at the local air quality monitoring stations (p < 0.01, R2 = 0.99 and p < 0.05, R2 = 0.93, respectively). Prediction errors were below 5.1% and 20%, respectively. The seasonal pattern of odor perception was associated with the seasonal variation in NH3 concentrations (p < 0.001, adjusted R2 = 0.68). Conclusion: The results suggest that atmospheric NH3 levels at local air quality stations could be used as indicators of the prevalence of odor annoyance in non-urban residential communities. PMID:22513250
NASA Astrophysics Data System (ADS)
Sepehri, Bakhtyar; Ghavami, Raouf
2017-02-01
In this research, molecular docking and CoMFA were used to determine the interactions of α,β-unsaturated carbonyl-based compounds and oxime analogs with P-glycoprotein and to predict their activity. The molecular docking study showed that these molecules establish strong van der Waals interactions with the side chains of PHE-332, PHE-728 and PHE-974. Based on the effect of the number of components on the squared correlation coefficient in cross validation tests (including leave-one-out and leave-many-out), CoMFA models with five components were built to predict the pIC50 of molecules in seven cancer cell lines (including Panc-1 (pancreas cancer cell line), PaCa-2 (pancreatic carcinoma cell line), MCF-7 (breast cancer cell line), A-549 (epithelial), HT-29 (colon cancer cell line), H-460 (lung cancer cell line), PC-3 (prostate cancer cell line)). R2 values for the training and test sets were in the range of 0.94-0.97 and 0.84-0.92, respectively, and for the LOO and LMO cross validation tests, q2 values were in the range of 0.75-0.82 and 0.65-0.73, respectively. Based on the molecular docking results and the steric and electrostatic contour maps extracted for the CoMFA models, four new molecules with higher predicted activity than the most active compound in the data set were designed.
NASA Astrophysics Data System (ADS)
Hemmateenejad, Bahram; Rezaei, Zahra; Khabnadideh, Soghra; Saffari, Maryam
2007-11-01
Carbamazepine (CBZ) undergoes enzyme biotransformation through epoxidation with the formation of its metabolite, carbamazepine-10,11-epoxide (CBZE). A simple chemometrics-assisted spectrophotometric method has been proposed for simultaneous determination of CBZ and CBZE in plasma. A liquid extraction procedure was used to separate the analytes from plasma, and the UV absorbance spectra of the resultant solutions were subjected to partial least squares (PLS) regression. The optimum number of PLS latent variables was selected according to the PRESS values of leave-one-out cross-validation. An HPLC method was also employed for comparison. The respective mean recoveries for the analysis of CBZ and CBZE in synthetic mixtures were 102.57 (±0.25)% and 103.00 (±0.09)% for PLS, and 99.40 (±0.15)% and 102.20 (±0.02)% for HPLC. The concentrations of CBZ and CBZE were also determined in five patients using the PLS and HPLC methods. The results showed that the data obtained by PLS were comparable with those obtained by the HPLC method.
GRMDA: Graph Regression for MiRNA-Disease Association Prediction
Chen, Xing; Yang, Jing-Ru; Guan, Na-Na; Li, Jian-Qiang
2018-01-01
Nowadays, as more and more associations between microRNAs (miRNAs) and diseases have been discovered, miRNA has gradually become a hot topic in the biological field. Because of the high cost in time and money of carrying out biological experiments, a computational method that can help scientists choose the most likely miRNA-disease associations for further experimental study is urgently needed. In this study, we proposed a method of Graph Regression for MiRNA-Disease Association prediction (GRMDA) which combines known miRNA-disease associations, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity. We used Gaussian interaction profile kernel similarity to compensate for the incompleteness of miRNA functional similarity and disease semantic similarity. Furthermore, graph regression was performed synchronously in three latent spaces, including the association space, the miRNA similarity space, and the disease similarity space, using two matrix factorization approaches, Singular Value Decomposition and Partial Least-Squares, to extract important related attributes and filter the noise. In leave-one-out cross validation and five-fold cross validation, GRMDA obtained AUCs of 0.8272 and 0.8080 ± 0.0024, respectively. Thus, its performance is better than that of some previous models. In the case study of Lymphoma using the recorded miRNA-disease associations in the HMDD V2.0 database, 88% of the top 50 predicted miRNAs were verified by the experimental literature. In order to test the performance of GRMDA on new diseases with no known related miRNAs, we took Breast Neoplasms as an example by regarding all the known related miRNAs as unknown ones. We found that 100% of the top 50 predicted miRNAs were verified. Moreover, 84% of the top 50 predicted miRNAs in a case study of Esophageal Neoplasms based on HMDD V1.0 were verified to have known associations.
In conclusion, GRMDA is an effective and practical method for miRNA-disease association prediction. PMID:29515453
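The AUCs reported above from leave-one-out and five-fold cross validation can be computed from prediction scores with the rank-based (Mann-Whitney) formulation; a sketch with hypothetical scores:

```python
# Rank-based AUC: the probability that a randomly chosen positive
# (known association) is scored above a randomly chosen negative,
# with ties counted as one half. Scores below are toy values.

def auc(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two known associations (1) and two non-associations (0); one positive
# is ranked below one negative, so 3 of 4 pairs are ordered correctly.
a = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])  # 0.75
```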
Drop coating deposition Raman spectroscopy of blood plasma for the detection of colorectal cancer
NASA Astrophysics Data System (ADS)
Li, Pengpeng; Chen, Changshui; Deng, Xiaoyuan; Mao, Hua; Jin, Shaoqin
2015-03-01
We have recently applied the technique of drop coating deposition Raman (DCDR) spectroscopy for colorectal cancer (CRC) detection using blood plasma. The aim of this study was to develop a more convenient and stable method based on blood plasma for noninvasive CRC detection. Significant differences were observed in the DCDR spectra between healthy (n=105) and cancer (n=75) plasma samples from 21 volunteers and 15 CRC patients, particularly in spectral features related to proteins, nucleic acids, and β-carotene. Principal component analysis and linear discriminant analysis, together with leave-one-out cross validation, were applied to the DCDR spectra and yielded a sensitivity of 100% (75/75) and a specificity of 98.1% (103/105) for the detection of CRC. This study demonstrates that DCDR spectroscopy of blood plasma, combined with multivariate statistical algorithms, has the potential for the noninvasive detection of CRC.
Weisberg, Arel; Lakis, Rollin E; Simpson, Michael F; Horowitz, Leo; Craparo, Joseph
2014-01-01
The versatility of laser-induced breakdown spectroscopy (LIBS) as an analytical method for high-temperature applications was demonstrated through measurement of the concentrations of the lanthanide elements europium (Eu) and praseodymium (Pr) in molten eutectic lithium chloride-potassium chloride (LiCl-KCl) salts at a temperature of 500 °C. Laser pulses (1064 nm, 7 ns, 120 mJ/pulse) were focused on the top surface of the molten salt samples in a laboratory furnace under an argon atmosphere, and the resulting LIBS signals were collected using a broadband Echelle-type spectrometer. Partial least squares (PLS) regression using leave-one-sample-out cross-validation was used to quantify the concentrations of Eu and Pr in the samples. The root mean square error of prediction (RMSEP) for Eu was 0.13% (absolute) over a concentration range of 0-3.01%, and for Pr was 0.13% (absolute) over a concentration range of 0-1.04%.
Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.
Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon
2015-01-01
Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, the intrinsic differences between benchmarking sets and real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods and data sets and highlight three main types of biases found in benchmarking sets, i.e. "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementation for three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs. Copyright © 2014 Elsevier Inc. All rights reserved.
Prioritizing chronic obstructive pulmonary disease (COPD) candidate genes in COPD-related networks
Zhang, Yihua; Li, Wan; Feng, Yuyan; Guo, Shanshan; Zhao, Xilei; Wang, Yahui; He, Yuehan; He, Weiming; Chen, Lina
2017-01-01
Chronic obstructive pulmonary disease (COPD) is a multi-factor disease that can be caused by many factors, including disturbances of metabolism and protein-protein interactions (PPIs). In this paper, a weighted COPD-related metabolic network and a weighted COPD-related PPI network were constructed based on COPD disease genes and functional information. Candidate genes in each weighted COPD-related network were prioritized using a gene prioritization method. Literature review and functional enrichment analysis of the top 100 genes in these two networks suggested the correlation of COPD and these genes. The performance of our gene prioritization method was superior to that of ToppGene and ToppNet for genes from the COPD-related metabolic network or the COPD-related PPI network, as assessed by leave-one-out cross-validation, literature validation and functional enrichment analysis. The top-ranked genes prioritized from the COPD-related metabolic and PPI networks could promote a better understanding of the molecular mechanism of this disease from different perspectives. The top 100 genes in the COPD-related metabolic and PPI networks might be potential markers for the diagnosis and treatment of COPD. PMID:29262568
Bueno, Justin; Sikirzhytski, Vitali; Lednev, Igor K
2012-05-15
Near-infrared (NIR) Raman microspectroscopy combined with advanced statistics was used to differentiate gunshot residue (GSR) particles originating from different caliber ammunition. The firearm discharge process is analogous to a complex chemical reaction. The reagents of this process are represented by the chemical composition of the ammunition, firearm, and cartridge case. The specific firearm parameters determine the conditions of the reaction and thus the subsequent product, GSR. We found that Raman spectra collected from these products are characteristic for different caliber ammunition. GSR particles from 9 mm and 0.38 caliber ammunition, collected under identical discharge conditions, were used to demonstrate the capability of confocal Raman microspectroscopy for the discrimination and identification of GSR particles. The caliber differentiation algorithm is based on support vector machines (SVM) and partial least squares (PLS) discriminant analyses, validated by a leave-one-out cross-validation method. This study demonstrates for the first time that NIR Raman microspectroscopy has the potential for the reagentless differentiation of GSR based upon forensically relevant parameters, such as caliber size. When fully developed, this method should have a significant impact on the efficiency of crime scene investigations.
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have a cluster structure, where the clusters consist of co-regulated genes that tend to have coordinated functions. However, most available statistical methods for gene selection do not take this cluster structure into consideration. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
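The two-step structure described above can be sketched compactly. Note that this is only a simplified illustration on synthetic data: step 1 uses an ordinary Lasso within each K-means gene cluster as in the paper, but step 2 is approximated here by a plain Lasso over the pooled selections, since a true group Lasso penalty requires a dedicated solver (e.g. a group-lasso package) not shown here.

```python
# Simplified sketch of the two-step supervised group Lasso idea.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p, k = 60, 100, 5
X = rng.normal(size=(n, p))                       # expression: samples x genes
y = X[:, 0] + X[:, 1] + 0.2 * rng.normal(size=n)  # outcome driven by 2 genes

# Cluster genes (columns) by their expression profiles across samples.
clusters = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X.T)

# Step 1: within-cluster gene selection with the Lasso.
selected = []
for c in range(k):
    idx = np.flatnonzero(clusters == c)
    coef = Lasso(alpha=0.1).fit(X[:, idx], y).coef_
    selected.extend(idx[coef != 0])

# Step 2 (approximation of the group Lasso): sparse model over pooled genes.
final = Lasso(alpha=0.05).fit(X[:, selected], y)
print(f"{len(selected)} genes survive step 1; "
      f"{np.sum(final.coef_ != 0)} survive step 2")
```

In the paper, the `alpha` tuning parameters at both steps would be chosen by V-fold cross-validation rather than fixed as above.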
Wavelet analysis enables system-independent texture analysis of optical coherence tomography images.
Lingley-Papadopoulos, Colleen A; Loew, Murray H; Zara, Jason M
2009-01-01
Texture analysis for tissue characterization is a current area of optical coherence tomography (OCT) research. We discuss some of the differences between OCT systems and the effects those differences have on the resulting images and subsequent image analysis. In addition, as an example, two algorithms for the automatic recognition of bladder cancer are compared: one that was developed on a single system with no consideration for system differences, and one that was developed to address the issues associated with system differences. The first algorithm had a sensitivity of 73% and specificity of 69% when tested using leave-one-out cross-validation on data taken from a single system. When tested on images from another system with a different central wavelength, however, the method classified all images as cancerous regardless of the true pathology. By contrast, with the use of wavelet analysis and the removal of system-dependent features, the second algorithm reported sensitivity and specificity values of 87 and 58%, respectively, when trained on images taken with one imaging system and tested on images taken with another.
van Os-Medendorp, Harmieke; Appelman-Noordermeer, Simone; Bruijnzeel-Koomen, Carla; de Bruin-Weller, Marjolein
2015-01-01
Background: Little is known about the prevalence of sick leave due to atopic dermatitis (AD). The current literature on factors influencing sick leave is mostly derived from other chronic inflammatory diseases. This study aimed to determine the prevalence of sick leave due to AD and to identify influencing factors. Methods: A cross-sectional study was carried out in adult patients with AD. Outcome measures: sick leave during the two-week and one-year periods, socio-demographic characteristics, disease severity, quality of life and socio-occupational factors. Logistic regression analyses were used to determine factors influencing sick leave over the two-week period. Results: In total, 253 patients were included; 12% of the patients had to take sick leave in the last two weeks due to AD and 42% in the past year. A higher level of symptom interference (OR 1.26; 95% CI 1.13-1.40) or perfectionism/diligence (OR 0.90; 95% CI 0.83-0.96) may respectively increase or decrease the number of sick leave days. Conclusion: Sick leave in patients with AD is a common problem, and symptom interference and perfectionism/diligence appeared to influence it. Novel approaches are needed to deal with symptoms at work or school to reduce the amount of sick leave due to AD. PMID:26239345
Raman spectroscopy based screening of IgG positive and negative sera for dengue virus infection
NASA Astrophysics Data System (ADS)
Bilal, M.; Saleem, M.; Bial, Maria; Khan, Saranjam; Ullah, Rahat; Ali, Hina; Ahmed, M.; Ikram, Masroor
2017-11-01
A quantitative analysis for the screening of immunoglobulin-G (IgG)-positive human sera samples is presented for dengue virus infection. The regression model was developed using 79 samples, while 20 samples were used to test the performance of the model. An R-squared (r²) value of 0.91 was obtained through a leave-one-sample-out cross-validation method, which supports the validity of this model. This model incorporates the molecular changes associated with IgG. Molecular analysis based on regression coefficients revealed that myristic acid, coenzyme-A, alanine, arabinose, arginine, vitamin C, carotene, fumarate, galactosamine, glutamate, lactic acid, stearic acid, tryptophan and vaccenic acid are positively correlated with IgG, while amide III, collagen, proteins, fatty acids, phospholipids and fucose are negatively correlated. For blindly tested samples, excellent agreement was found between the model-predicted and the clinical values of IgG. The sensitivity, specificity, accuracy and area under the receiver operator characteristic curve are 100%, 83.3%, 95% and 0.99, respectively, which confirms the high quality of the model.
Finding models to detect Alzheimer's disease by fusing structural and neuropsychological information
NASA Astrophysics Data System (ADS)
Giraldo, Diana L.; García-Arteaga, Juan D.; Velasco, Nelson; Romero, Eduardo
2015-12-01
Alzheimer's disease (AD) is a neurodegenerative disease that affects higher brain functions. Initial diagnosis of AD is based on the patient's clinical history and a battery of neuropsychological tests. The accuracy of the diagnosis is highly dependent on the examiner's skills and on the evolution of a variable clinical picture. This work presents an automatic strategy that learns probabilistic brain models for different stages of the disease, reducing complexity, parameter adjustment and computational cost. The proposed method starts by setting a probabilistic class description using the information stored in the neuropsychological tests, followed by constructing the different structural class models using membership values from the learned probabilistic functions. These models are then used as a reference frame for the classification problem: a new case is assigned to a particular class simply by projecting onto the different models. The validation was performed using leave-one-out cross-validation with two classes: Normal Control (NC) subjects and patients diagnosed with mild AD. In this experiment, a sensitivity of 80% and a specificity of 79% were achieved.
Nam, J G; Kang, K M; Choi, S H; Lim, W H; Yoo, R-E; Kim, J-H; Yun, T J; Sohn, C-H
2017-12-01
Glioblastoma is the most common primary brain malignancy, and differentiation of true progression from pseudoprogression is clinically important. Our purpose was to compare the diagnostic performance of dynamic contrast-enhanced pharmacokinetic parameters using the fixed T1 and the measured T1 in differentiating true progression from pseudoprogression of glioblastoma after chemoradiation with temozolomide. This retrospective study included 37 patients with histopathologically confirmed glioblastoma with new enhancing lesions after temozolomide chemoradiation, defined as true progression (n = 15) or pseudoprogression (n = 22). Dynamic contrast-enhanced pharmacokinetic parameters, including the volume transfer constant, the rate transfer constant, the blood plasma volume per unit volume, and the extravascular extracellular space per unit volume, were calculated using both the fixed T1 of 1000 ms and the T1 measured using the multiple flip-angle method. Intra- and interobserver reproducibility was assessed using the intraclass correlation coefficient. Dynamic contrast-enhanced pharmacokinetic parameters were compared between the 2 groups using univariate and multivariate analysis. Diagnostic performance was evaluated by receiver operating characteristic analysis and leave-one-out cross-validation. The intraclass correlation coefficients of all the parameters from both T1 values were fair to excellent (0.689-0.999). The volume transfer constant and rate transfer constant from the fixed T1 were significantly higher in patients with true progression (P = .048 and .010, respectively). Multivariate analysis revealed that the rate transfer constant from the fixed T1 was the only independent variable (OR, 1.77 × 10⁵) and showed substantial diagnostic power on receiver operating characteristic analysis (area under the curve, 0.752; P = .002). The sensitivity and specificity on leave-one-out cross-validation were 73.3% (11/15) and 59.1% (13/22), respectively.
The dynamic contrast-enhanced parameter of rate transfer constant from the fixed T1 acted as a preferable marker to differentiate true progression from pseudoprogression. © 2017 by American Journal of Neuroradiology.
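The validation scheme used above, i.e. leave-one-out predictions from a model built on a single pharmacokinetic parameter and then summarized as sensitivity, specificity and AUC, can be sketched generically. The data below are synthetic stand-ins, and logistic regression is used here only as a plausible single-parameter classifier; the abstract does not specify the exact model form.

```python
# Sketch: leave-one-out predictions, then sensitivity/specificity and AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n_true, n_pseudo = 15, 22
# One pharmacokinetic parameter per patient (synthetic stand-in).
x = np.concatenate([rng.normal(1.0, 0.4, n_true),
                    rng.normal(0.6, 0.4, n_pseudo)])
y = np.array([1] * n_true + [0] * n_pseudo)   # 1 = true progression

# Each patient's probability comes from a model fit on the other 36.
proba = cross_val_predict(LogisticRegression(), x.reshape(-1, 1), y,
                          cv=LeaveOneOut(), method="predict_proba")[:, 1]
pred = proba >= 0.5
sens = np.mean(pred[y == 1])      # fraction of true progression detected
spec = np.mean(~pred[y == 0])     # fraction of pseudoprogression detected
auc = roc_auc_score(y, proba)
print(f"sensitivity {sens:.2f}  specificity {spec:.2f}  AUC {auc:.2f}")
```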
Automatic detection of kidney in 3D pediatric ultrasound images using deep neural networks
NASA Astrophysics Data System (ADS)
Tabrizi, Pooneh R.; Mansoor, Awais; Biggs, Elijah; Jago, James; Linguraru, Marius George
2018-02-01
Ultrasound (US) imaging is the routine and safe diagnostic modality for detecting pediatric urology problems, such as hydronephrosis in the kidney. Hydronephrosis is the swelling of one or both kidneys because of the build-up of urine. Early detection of hydronephrosis can lead to a substantial improvement in kidney health outcomes. Generally, US imaging is a challenging modality for the evaluation of pediatric kidneys with differing shape, size, and texture characteristics. The aim of this study is to present an automatic detection method to help kidney analysis in pediatric 3DUS images. The method localizes the kidney based on its minimum-volume oriented bounding box using deep neural networks. Separate deep neural networks are trained to estimate the kidney position, orientation, and scale, making the method computationally efficient by avoiding full parameter training. The performance of the method was evaluated using a dataset of 45 kidneys (18 normal and 27 diseased kidneys diagnosed with hydronephrosis) through leave-one-out cross-validation. Quantitative results show the proposed detection method could extract the kidney position, orientation, and scale ratio with root mean square values of 1.3 ± 0.9 mm, 6.34 ± 4.32 degrees, and 1.73 ± 0.04, respectively. This method could be helpful in automating kidney segmentation for routine clinical evaluation.
Rodriguez-Saona, L E; Koca, N; Harper, W J; Alvarez, V B
2006-05-01
There is a need for rapid and simple techniques that can be used to predict the quality of cheese. The aim of this research was to develop a simple and rapid screening tool for monitoring Swiss cheese composition by using Fourier transform infrared spectroscopy. Twenty Swiss cheese samples from different manufacturers and degrees of maturity were evaluated. Direct measurements of Swiss cheese slices (approximately 0.5 g) were made using a MIRacle 3-reflection diamond attenuated total reflectance (ATR) accessory. Reference methods for moisture (vacuum oven), protein content (Kjeldahl), and fat (Babcock) were used. Calibration models were developed based on a cross-validated (leave-one-out approach) partial least squares regression. The information-rich infrared spectral range for Swiss cheese samples was from 3,000 to 2,800 cm⁻¹ and 1,800 to 900 cm⁻¹. The performance statistics for cross-validated models gave estimates for standard error of cross-validation of 0.45, 0.25, and 0.21% for moisture, protein, and fat, respectively, and correlation coefficients r > 0.96. Furthermore, the ATR infrared protocol allowed for the classification of cheeses according to manufacturer and aging based on unique spectral information, especially of carbonyl groups, probably due to their distinctive lipid composition. Attenuated total reflectance infrared spectroscopy allowed for the rapid (approximately 3-min analysis time) and accurate analysis of the composition of Swiss cheese. This technique could contribute to the development of simple and rapid protocols for monitoring complex biochemical changes, and predicting the final quality of the cheese.
ERIC Educational Resources Information Center
Young, Debra Hunt
2010-01-01
The indicators and predictors of dropout as documented in the literature are vast and encompass influences such as family, motivation, socio-economic status, and academic achievement, and could be accepted as universal reasons students choose to leave school and not return. This qualitative study investigated the reasons why students in one West…
Parametric vs. non-parametric statistics of low resolution electromagnetic tomography (LORETA).
Thatcher, R W; North, D; Biver, C
2005-01-01
This study compared the relative statistical sensitivity of non-parametric and parametric statistics of 3-dimensional current sources as estimated by the EEG inverse solution Low Resolution Electromagnetic Tomography (LORETA). One would expect approximately 5% false positives (classification of a normal as abnormal) at the P < .025 level of probability (two-tailed test) and approximately 1% false positives at the P < .005 level. EEG digital samples (2-second intervals sampled at 128 Hz, 1 to 2 minutes eyes closed) from 43 normal adult subjects were imported into the Key Institute's LORETA program. We then used the Key Institute's cross-spectrum and the Key Institute's LORETA output files (*.lor) as the 2,394 gray matter pixel representation of 3-dimensional currents at different frequencies. The mean and standard deviation *.lor files were computed for each of the 2,394 gray matter pixels for each of the 43 subjects. Tests of Gaussianity and different transforms were computed in order to best approximate a normal distribution for each frequency and gray matter pixel. The relative sensitivity of parametric vs. non-parametric statistics was compared using a "leave-one-out" cross-validation method in which individual normal subjects were withdrawn and then statistically classified as being either normal or abnormal based on the remaining subjects. Log10 transforms approximated Gaussian distribution in the range of 95% to 99% accuracy. Parametric Z score tests at P < .05 cross-validation demonstrated an average misclassification rate of approximately 4.25%, and the range over the 2,394 gray matter pixels was 27.66% to 0.11%. At P < .01, parametric Z score cross-validation false positives averaged 0.26% and ranged from 6.65% to 0% false positives. The non-parametric Key Institute's t-max statistic at P < .05 had an average misclassification error rate of 7.64% and ranged from 43.37% to 0.04% false positives.
The non-parametric t-max at P < .01 had an average misclassification rate of 6.67% and ranged from 41.34% to 0% false positives of the 2,394 gray matter pixels for any cross-validated normal subject. In conclusion, adequate approximation to Gaussian distribution and high cross-validation accuracy can be achieved by the Key Institute's LORETA programs by using a log10 transform and parametric statistics, and parametric normative comparisons had lower false positive rates than the non-parametric tests.
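The leave-one-out normative procedure described above, withdrawing each subject, computing a normative mean and SD from the rest, and flagging the withdrawn subject by Z score against a two-tailed threshold, can be sketched as follows. The data are synthetic log10-transformed stand-ins, not LORETA output files.

```python
# Sketch: leave-one-out normative Z-score classification of normal subjects.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_subjects, n_pixels = 43, 2394
# Log10 transform of positive-valued current estimates -> roughly Gaussian.
data = np.log10(rng.lognormal(mean=0.0, sigma=0.5,
                              size=(n_subjects, n_pixels)))

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)          # two-tailed critical value

false_positives = 0.0
for i in range(n_subjects):
    rest = np.delete(data, i, axis=0)     # normative group without subject i
    z = (data[i] - rest.mean(axis=0)) / rest.std(axis=0, ddof=1)
    # per-pixel rate at which a normal subject is flagged abnormal
    false_positives += np.mean(np.abs(z) > z_crit)

rate = false_positives / n_subjects
print(f"average per-pixel false-positive rate: {rate:.3%}")
```

Rates slightly above the nominal 5% are expected here because the normative mean and SD are themselves estimated from a finite sample, which is the kind of deviation the study quantifies.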
Decorrelation of the true and estimated classifier errors in high-dimensional settings.
Hanczar, Blaise; Hua, Jianping; Dougherty, Edward R
2007-01-01
The aim of many microarray experiments is to build discriminatory diagnosis and prognosis models. Given the huge number of features and the small number of examples, model validity, which refers to the precision of error estimation, is a critical issue. Previous studies have addressed this issue via the deviation distribution (estimated error minus true error), in particular, the deterioration of cross-validation precision in high-dimensional settings where feature selection is used to mitigate the peaking phenomenon (overfitting). Because classifier design is based upon random samples, both the true and estimated errors are sample-dependent random variables, and one would expect a loss of precision if the estimated and true errors are not well correlated, so that natural questions arise as to the degree of correlation and the manner in which lack of correlation impacts error estimation. We demonstrate the effect of correlation on error precision via a decomposition of the variance of the deviation distribution, observe that the correlation is often severely decreased in high-dimensional settings, and show that the effect of high dimensionality on error estimation tends to result more from its decorrelating effects than from its impact on the variance of the estimated error. We consider the correlation between the true and estimated errors under different experimental conditions using both synthetic and real data, several feature-selection methods, different classification rules, and three commonly used error estimators (leave-one-out cross-validation, k-fold cross-validation, and .632 bootstrap). Moreover, three scenarios are considered: (1) feature selection, (2) known feature set, and (3) all features. Only the first is of practical interest; however, the other two are needed for comparison purposes.
We observe that the true and estimated errors tend to be much more correlated in the case of a known feature set than with either feature selection or using all features, with the relative correlation between the latter two showing no general trend, but differing for different models.
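The central quantity above, the correlation between the true error and its cross-validated estimate over repeated random samples, can be computed by direct simulation. The sketch below uses a simple Gaussian two-class model and a k-NN classifier as arbitrary illustrative choices; it is not the paper's experimental design.

```python
# Sketch: simulate the correlation between true and LOO-estimated errors.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(3)
n_train, n_test, p = 30, 2000, 5

true_errs, loo_errs = [], []
for _ in range(40):                        # repeated random samples
    X = rng.normal(size=(n_train + n_test, p))
    y = rng.integers(0, 2, size=n_train + n_test)
    X[y == 1] += 0.7                       # mean shift between classes
    Xtr, ytr = X[:n_train], y[:n_train]
    Xte, yte = X[n_train:], y[n_train:]

    clf = KNeighborsClassifier(n_neighbors=3).fit(Xtr, ytr)
    true_errs.append(1 - clf.score(Xte, yte))   # large test set ~ "true" error
    loo = cross_val_score(clf, Xtr, ytr, cv=LeaveOneOut())
    loo_errs.append(1 - loo.mean())             # LOO estimate on the sample

r = np.corrcoef(true_errs, loo_errs)[0, 1]
print(f"correlation between true and LOO-estimated error: {r:.2f}")
```

Adding feature selection inside the loop, and raising `p` far above `n_train`, is how one would reproduce the decorrelation effect the abstract describes.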
Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M
2018-02-01
Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that pooling animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies for the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gains in genomic prediction accuracies on de-regressed EBV were slightly smaller (0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals available for training in genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and the accuracies were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.
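For linear ridge/GBLUP-style models of the kind used in genomic prediction, leave-one-out cross-validation does not require n refits: the LOO residual is e_i / (1 - H_ii), where H is the hat matrix, which is the efficiency that makes LOO attractive for maximizing training size. The sketch below verifies this shortcut numerically for ridge regression on synthetic marker data; it is a generic illustration, not the authors' Brangus analysis or software.

```python
# Sketch: exact leave-one-out residuals for ridge regression via the
# hat-matrix shortcut e_i / (1 - H_ii), checked against naive refitting.
import numpy as np

rng = np.random.default_rng(4)
n, p, lam = 50, 200, 1.0
X = rng.normal(size=(n, p))                    # marker genotypes (synthetic)
y = X[:, :10].sum(axis=1) + rng.normal(size=n)

# Ridge fit via the n x n dual form (efficient when p >> n):
# H = K (K + lam I)^-1 with K = X X'.
K = X @ X.T
H = K @ np.linalg.inv(K + lam * np.eye(n))
resid = y - H @ y
loo_fast = resid / (1 - np.diag(H))            # shortcut LOO residuals

# Naive LOO for comparison: refit with observation i removed each time.
loo_naive = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Ki = X[mask] @ X[mask].T
    alpha = np.linalg.solve(Ki + lam * np.eye(n - 1), y[mask])
    loo_naive[i] = y[i] - X[i] @ X[mask].T @ alpha

print("max difference:", np.max(np.abs(loo_fast - loo_naive)))
```

The agreement is exact up to floating-point error, so the full LOO scheme costs little more than a single fit, mirroring the efficient strategies proposed for GBLUP.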
Explore Efficient Local Features from RGB-D Data for One-Shot Learning Gesture Recognition.
Wan, Jun; Guo, Guodong; Li, Stan Z
2016-08-01
Availability of handy RGB-D sensors has brought about a surge of gesture recognition research and applications. Among various approaches, the one-shot learning approach is advantageous because it requires a minimal amount of data. Here, we provide a thorough review of one-shot learning gesture recognition from RGB-D data and propose a novel spatiotemporal feature extracted from RGB-D data, namely mixed features around sparse keypoints (MFSK). In the review, we analyze the challenges that we are facing, and point out some future research directions which may enlighten researchers in this field. The proposed MFSK feature is robust and invariant to scale, rotation and partial occlusions. To alleviate the insufficiency of one-shot training samples, we augment the training samples by artificially synthesizing versions at various temporal scales, which is beneficial for coping with gestures performed at varying speed. We evaluate the proposed method on the ChaLearn gesture dataset (CGD). The results show that our approach outperforms all currently published approaches on the challenging data of CGD, such as the translated, scaled and occluded subsets. When applied to RGB-D datasets that are not one-shot (e.g., the Cornell Activity Dataset-60 and MSR Daily Activity 3D dataset), the proposed feature also produces very promising results under leave-one-out cross validation or one-shot learning.
Driver fatigue detection through multiple entropy fusion analysis in an EEG-based system.
Min, Jianliang; Wang, Ping; Hu, Jianfeng
2017-01-01
Driver fatigue is an important contributor to road accidents, and fatigue detection has major implications for transportation safety. The aim of this research is to analyze the multiple entropy fusion method and evaluate several channel regions to effectively detect a driver's fatigue state based on electroencephalogram (EEG) records. First, we fused multiple entropies, i.e., spectral entropy, approximate entropy, sample entropy and fuzzy entropy, as features compared with autoregressive (AR) modeling by four classifiers. Second, we captured four significant channel regions according to weight-based electrodes via a simplified channel selection method. Finally, the evaluation model for detecting driver fatigue was established with four classifiers based on the EEG data from four channel regions. Twelve healthy subjects performed continuous simulated driving for 1-2 hours with EEG monitoring on a static simulator. The leave-one-out cross-validation approach obtained an accuracy of 98.3%, a sensitivity of 98.3% and a specificity of 98.2%. The experimental results verified the effectiveness of the proposed method, indicating that the multiple entropy fusion features are significant factors for inferring the fatigue state of a driver.
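Of the entropies fused above, sample entropy is representative and compact enough to sketch. The implementation below is a generic SampEn(m, r) on synthetic signals, not the authors' EEG feature extractor; the tolerance convention r = 0.2 × SD is a common default and an assumption here.

```python
# Sketch: sample entropy, one of the entropy features fused for EEG fatigue
# detection. Irregular signals score higher than regular ones.
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    """SampEn(m, r) of a 1-D signal, with tolerance r = r_frac * std(x)."""
    x = np.asarray(x, dtype=float)
    r = r_frac * x.std()

    def matching_pairs(mm):
        # Embed the signal into overlapping templates of length mm.
        emb = np.lib.stride_tricks.sliding_window_view(x, mm)
        # Chebyshev distance between every pair of templates.
        d = np.max(np.abs(emb[:, None] - emb[None, :]), axis=-1)
        n = emb.shape[0]
        return (np.sum(d <= r) - n) / 2        # exclude self-matches

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    return -np.log(a / b)

rng = np.random.default_rng(6)
noise = rng.normal(size=500)                          # irregular signal
regular = np.sin(np.linspace(0, 20 * np.pi, 500))     # regular signal
print(sample_entropy(noise), sample_entropy(regular))
```

In a fusion pipeline such as the one described, SampEn would be computed per channel and epoch and concatenated with spectral, approximate, and fuzzy entropies before classification.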
NASA Astrophysics Data System (ADS)
Cánovas-García, Fulgencio; Alonso-Sarría, Francisco; Gomariz-Castillo, Francisco; Oñate-Valdivieso, Fernando
2017-06-01
Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimation of classification accuracy based on the so-called out-of-bag cross-validation method. It is usually assumed that such estimation is not biased and may be used instead of validation based on an external data set or a cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, within a training patch, pixels or objects are not independent (from a statistical point of view) of each other; however, they are split by bootstrapping into in-bag and out-of-bag as if they were truly independent. We believe that putting whole patches, rather than individual pixels/objects, in one set or the other would produce a less biased out-of-bag cross-validation. To deal with the problem, we propose a modification of the random forest algorithm to split training patches instead of the pixels (or objects) that compose them. This modified algorithm does not overestimate accuracy and has no lower predictive capability than the original. When its results are validated with an external data set, the accuracy is not different from that obtained with the original algorithm. We analysed three remote sensing images with different classification approaches (pixel- and object-based); in the three cases reported, the modification we propose produces a less biased accuracy estimation.
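The bias mechanism described above can be demonstrated with an ordinary cross-validation analogue: splitting correlated patch pixels at random inflates accuracy relative to splitting whole patches. The sketch below uses scikit-learn's `KFold` vs `GroupKFold` on synthetic patches as a stand-in for the in-bag/out-of-bag bootstrap split; it illustrates the phenomenon, not the authors' modified random forest itself.

```python
# Sketch: random pixel-wise splits vs patch-wise (grouped) splits when
# training pixels within a patch are spatially correlated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, GroupKFold, KFold

rng = np.random.default_rng(5)
n_patches, pixels_per_patch, p = 40, 20, 8

groups, X, y = [], [], []
for g in range(n_patches):
    label = g % 2
    offset = rng.normal(scale=1.5, size=p)   # shared patch-level effect
    X.append(offset + 0.5 * label
             + rng.normal(scale=0.5, size=(pixels_per_patch, p)))
    y += [label] * pixels_per_patch
    groups += [g] * pixels_per_patch
X, y, groups = np.vstack(X), np.array(y), np.array(groups)

clf = RandomForestClassifier(n_estimators=100, random_state=5)
# Pixel-wise split: correlated pixels from one patch land on both sides.
acc_random = cross_val_score(
    clf, X, y, cv=KFold(5, shuffle=True, random_state=5)).mean()
# Patch-wise split: each patch stays entirely in one fold.
acc_grouped = cross_val_score(
    clf, X, y, cv=GroupKFold(5), groups=groups).mean()
print(f"pixel-wise split: {acc_random:.2f}  patch-wise split: {acc_grouped:.2f}")
```

The pixel-wise estimate is optimistic because the forest partly memorizes patch-level structure, which is exactly the dependence the proposed patch-wise bootstrapping removes.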
A novel approach for food intake detection using electroglottography
Farooq, Muhammad; Fontana, Juan M; Sazonov, Edward
2014-01-01
Many methods for monitoring diet and food intake rely on subjects self-reporting their daily intake. These methods are subjective, potentially inaccurate and need to be replaced by more accurate and objective methods. This paper presents a novel approach that uses an Electroglottograph (EGG) device for an objective and automatic detection of food intake. Thirty subjects participated in a 4-visit experiment involving the consumption of meals with self-selected content. Variations in the electrical impedance across the larynx caused by the passage of food during swallowing were captured by the EGG device. To compare performance of the proposed method with a well-established acoustical method, a throat microphone was used for monitoring swallowing sounds. Both signals were segmented into non-overlapping epochs of 30 s and processed to extract wavelet features. Subject-independent classifiers were trained using Artificial Neural Networks, to identify periods of food intake from the wavelet features. Results from leave-one-out cross-validation showed an average per-epoch classification accuracy of 90.1% for the EGG-based method and 83.1% for the acoustic-based method, demonstrating the feasibility of using an EGG for food intake detection. PMID:24671094
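The subject-independent evaluation used above amounts to leave-one-subject-out rather than leave-one-epoch-out, so no subject appears in both training and validation. A sketch with a small MLP standing in for the Artificial Neural Network, on synthetic epoch features:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
n_subjects, epochs = 10, 12
subject = np.repeat(np.arange(n_subjects), epochs)
y = np.tile(np.repeat([0, 1], epochs // 2), n_subjects)   # intake vs. no intake
X = rng.normal(size=(n_subjects * epochs, 6)) + y[:, None] * 2.0  # toy wavelet features

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
# One fold per subject: all of a subject's epochs are held out together.
scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=subject)
per_subject_accuracy = scores.mean()
```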
Wang, Jinglu; Qu, Susu; Wang, Weixiao; Guo, Liyuan; Zhang, Kunlin; Chang, Suhua; Wang, Jing
2016-11-01
A number of gene expression profiling studies of bipolar disorder have been published. Beyond the use of different array chips and tissues, the variety of data-processing pipelines across cohorts has aggravated the inconsistency of results among these genome-wide gene expression profiling studies. By searching the gene expression databases, we obtained six data sets for prefrontal cortex (PFC) of bipolar disorder with raw data and combinable platforms. We used standardized pre-processing and quality control procedures to analyze each data set separately and then combined them into a large gene expression matrix with 101 bipolar disorder subjects and 106 controls. A standard linear mixed-effects model was used to calculate the differentially expressed genes (DEGs). Multiple levels of sensitivity analyses and cross validation with genetic data were conducted. Functional and network analyses were carried out on the basis of the DEGs. As a result, we identified 198 unique differentially expressed genes in the PFC of bipolar disorder subjects relative to controls. Among them, 115 DEGs were robust to at least three leave-one-out tests or different pre-processing methods; 51 DEGs were validated with genetic association signals. Pathway enrichment analysis showed these DEGs were related to regulation of the neurological system, cell death and apoptosis, and several basic binding processes. A protein-protein interaction network further identified one key hub gene. We have contributed the most comprehensive integrated analysis of bipolar disorder expression profiling studies in the PFC to date. The DEGs, especially those with multiple validations, may denote a common signature of bipolar disorder and contribute to the pathogenesis of disease. Copyright © 2016 Elsevier Ltd. All rights reserved.
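The leave-one-out robustness check used for the DEGs can be illustrated with a toy pooled effect size: drop each of the six data sets in turn and ask whether the direction of effect survives every removal. All numbers below are illustrative, not from the study:

```python
import numpy as np

# Hypothetical per-dataset effect sizes (e.g. log fold changes) and sample sizes.
effects = np.array([0.8, 0.6, 0.9, 0.7, 0.5, 0.75])
weights = np.array([30, 25, 40, 35, 20, 28], dtype=float)

def pooled(eff, w):
    """Sample-size-weighted mean effect across data sets."""
    return np.sum(eff * w) / np.sum(w)

full = pooled(effects, weights)
# Recompute the pooled effect with each data set left out in turn.
leave_one_out = [pooled(np.delete(effects, i), np.delete(weights, i))
                 for i in range(len(effects))]
# "Robust" here: the sign of the effect survives every leave-one-out removal.
robust = all(np.sign(e) == np.sign(full) for e in leave_one_out)
```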
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies.
David, Maria Pamela C; Concepcion, Gisela P; Padlan, Eduardo A
2010-02-08
All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 amyloidogenic and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study is limited, and consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. 
The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194
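The LOO evaluation of a naive Bayesian classifier described above can be sketched as follows. Features and labels are synthetic; a real run would use sequence-derived features such as amino-acid composition vectors:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
n = 60
X = rng.normal(size=(n, 20))     # e.g. 20 amino-acid frequency features
y = rng.integers(0, 2, size=n)   # 1 = amyloidogenic, 0 = non-amyloidogenic
X[y == 1, :5] += 1.0             # shift one class in a few features for the demo

# Average LOO accuracy: n fits, each tested on the single held-out sequence.
loo_accuracy = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut()).mean()
```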
CNN-BLPred: a Convolutional neural network based predictor for β-Lactamases (BL) and their classes.
White, Clarence; Ismail, Hamid D; Saigo, Hiroto; Kc, Dukka B
2017-12-28
The β-Lactamase (BL) enzyme family is an important class of enzymes that plays a key role in bacterial resistance to antibiotics. As the number of newly identified BL enzymes increases daily, it is imperative to develop a computational tool to classify newly identified BL enzymes into one of their classes. There are two types of classification of BL enzymes: Molecular Classification and Functional Classification. Existing computational methods only address Molecular Classification, and their performance is unsatisfactory. We addressed the unsatisfactory performance of the existing methods by implementing a Deep Learning approach called the Convolutional Neural Network (CNN). We developed CNN-BLPred, an approach for the classification of BL proteins. CNN-BLPred uses Gradient Boosted Feature Selection (GBFS) in order to select the ideal feature set for each BL classification. Based on rigorous benchmarking of CNN-BLPred using both leave-one-out cross-validation and independent test sets, CNN-BLPred performed better than the other existing algorithms. Compared with other architectures of CNN, Recurrent Neural Network, and Random Forest, the simple CNN architecture with only one convolutional layer performs best. After feature extraction, we were able to remove ~95% of the 10,912 features using Gradient Boosted Trees. During 10-fold cross validation, we increased the accuracy of the classic BL predictions by 7%. We also increased the accuracy of Class A, Class B, Class C, and Class D performance by an average of 25.64%. The independent test results followed a similar trend. We implemented a deep learning algorithm known as a Convolutional Neural Network (CNN) to develop a classifier for BL classification. 
Combined with feature selection on an exhaustive feature set and balancing methods such as Random Oversampling (ROS), Random Undersampling (RUS) and the Synthetic Minority Oversampling Technique (SMOTE), CNN-BLPred performs significantly better than existing algorithms for BL classification.
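A key detail of the balancing step is that resampling must see only the training fold, never the held-out fold, or cross-validation scores become optimistic. A sketch with plain random oversampling (SMOTE or RUS would slot into the same place), on synthetic data with a gradient-boosted classifier standing in for the CNN:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(4)
X = rng.normal(size=(90, 10))
y = np.array([0] * 75 + [1] * 15)   # imbalanced classes
X[y == 1] += 1.2                    # make the minority class separable

def oversample(Xt, yt, rng):
    """Random oversampling: duplicate minority samples until classes balance."""
    minority = np.flatnonzero(yt == 1)
    extra = rng.choice(minority, size=np.sum(yt == 0) - minority.size, replace=True)
    idx = np.concatenate([np.arange(yt.size), extra])
    return Xt[idx], yt[idx]

scores = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    # Resample *inside* the fold, on training data only.
    Xb, yb = oversample(X[tr], y[tr], rng)
    clf = GradientBoostingClassifier(random_state=0).fit(Xb, yb)
    scores.append(clf.score(X[te], y[te]))
balanced_cv_accuracy = float(np.mean(scores))
```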
Ruiz-Gonzalez, Ruben; Gomez-Gil, Jaime; Gomez-Gil, Francisco Javier; Martínez-Martínez, Víctor
2014-01-01
The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM)-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i) accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii) the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii) when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels. PMID:25372618
Muniyappa, Ranganath; Irving, Brian A; Unni, Uma S; Briggs, William M; Nair, K Sreekumaran; Quon, Michael J; Kurpad, Anura V
2010-12-01
Insulin resistance is highly prevalent in Asian Indians and contributes to worldwide public health problems, including diabetes and related disorders. Surrogate measurements of insulin sensitivity/resistance are used frequently to study Asian Indians, but these are not formally validated in this population. In this study, we compared the ability of simple surrogate indices to accurately predict insulin sensitivity as determined by the reference glucose clamp method. In this cross-sectional study of Asian-Indian men (n = 70), we used a calibration model to assess the ability of simple surrogate indices for insulin sensitivity [quantitative insulin sensitivity check index (QUICKI), homeostasis model assessment (HOMA2-IR), fasting insulin-to-glucose ratio (FIGR), and fasting insulin (FI)] to predict an insulin sensitivity index derived from the reference glucose clamp method (SI(Clamp)). Predictive accuracy was assessed by both root mean squared error (RMSE) of prediction as well as leave-one-out cross-validation-type RMSE of prediction (CVPE). QUICKI, FIGR, and FI, but not HOMA2-IR, had modest linear correlations with SI(Clamp) (QUICKI: r = 0.36; FIGR: r = -0.36; FI: r = -0.27; P < 0.05). No significant differences were noted among CVPE or RMSE from any of the surrogate indices when compared with QUICKI. Surrogate measurements of insulin sensitivity/resistance such as QUICKI, FIGR, and FI are easily obtainable in large clinical studies, but these may only be useful as secondary outcome measurements in assessing insulin sensitivity/resistance in clinical studies of Asian Indians.
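The two error measures used above, the root mean squared error of prediction (RMSE) from the calibration fit and its leave-one-out cross-validated counterpart (CVPE), can be sketched as follows on simulated surrogate-index data. For ordinary least squares, the LOO error can never be smaller than the in-sample RMSE, since each LOO residual inflates the ordinary residual:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(6)
n = 70
surrogate = rng.normal(size=(n, 1))                       # e.g. QUICKI values
si_clamp = 0.6 * surrogate[:, 0] + rng.normal(scale=0.5, size=n)

# In-sample calibration error (RMSE of prediction).
model = LinearRegression().fit(surrogate, si_clamp)
rmse = np.sqrt(np.mean((model.predict(surrogate) - si_clamp) ** 2))

# Leave-one-out cross-validation-type RMSE of prediction (CVPE).
loo_pred = cross_val_predict(LinearRegression(), surrogate, si_clamp, cv=LeaveOneOut())
cvpe = np.sqrt(np.mean((loo_pred - si_clamp) ** 2))
```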
Shetty, N; Løvendahl, P; Lund, M S; Buitenhuis, A J
2017-01-01
The present study explored the effectiveness of Fourier transform mid-infrared (FT-IR) spectral profiles as a predictor for dry matter intake (DMI) and residual feed intake (RFI). The partial least squares regression method was used to develop the prediction models. The models were validated using different external test sets, one randomly leaving out 20% of the records (validation A), the second randomly leaving out 20% of cows (validation B), and a third (for DMI prediction models) randomly leaving out one cow (validation C). The data included 1,044 records from 140 cows; 97 were Danish Holstein and 43 Danish Jersey. Results showed better accuracies for validation A compared with other validation methods. Milk yield (MY) contributed largely to DMI prediction; MY explained 59% of the variation and the validated model error root mean square error of prediction (RMSEP) was 2.24 kg. The model was improved by adding live weight (LW) as an additional predictor trait, where the accuracy R2 increased from 0.59 to 0.72 and the error RMSEP decreased from 2.24 to 1.83 kg. When only the milk FT-IR spectral profile was used in DMI prediction, a lower prediction ability was obtained, with R2 = 0.30 and RMSEP = 2.91 kg. However, once the spectral information was added, along with MY and LW as predictors, model accuracy improved and R2 increased to 0.81 and RMSEP decreased to 1.49 kg. Prediction accuracies of RFI changed throughout lactation. The RFI prediction model for the early-lactation stage was better compared with across-lactation or mid- and late-lactation stages, with R2 = 0.46 and RMSEP = 1.70. The most important spectral wavenumbers that contributed to DMI and RFI prediction models included fat, protein, and lactose peaks. Comparable prediction results were obtained when using infrared-predicted fat, protein, and lactose instead of full spectra, indicating that FT-IR spectral data do not add significant new information to improve DMI and RFI prediction models. 
Therefore, in practice, if full FT-IR spectral data are not stored, it is possible to achieve similar DMI or RFI prediction results based on standard milk control data. For DMI, the milk fat region was responsible for the major variation in milk spectra; for RFI, the major variation in milk spectra was within the milk protein region. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Martínez Gila, Diego Manuel; Cano Marchal, Pablo; Gómez Ortega, Juan; Gámez García, Javier
2018-03-25
Normally, olive oil quality is assessed by chemical analysis according to international standards. These norms define chemical and organoleptic markers, and depending on the markers, the olive oil can be labelled as lampante, virgin, or extra virgin olive oil (EVOO), the last being an indicator of top quality. The polyphenol content is related to EVOO organoleptic features, and different scientific works have studied the positive influence that these compounds have on human health. The work carried out in this paper focuses on studying relations between the polyphenol content in olive oil samples and its spectral response in the near infrared spectrum. In this context, several acquisition parameters were assessed to optimize the measurement process within the virgin olive oil production process. The best regression model reached a mean error value of 156.14 mg/kg in leave-one-out cross validation, and the highest regression coefficient was 0.81 through holdout validation. PMID:29587403
Nargotra, Amit; Sharma, Sujata; Koul, Jawahir Lal; Sangwan, Pyare Lal; Khan, Inshad Ali; Kumar, Ashwani; Taneja, Subhash Chander; Koul, Surrinder
2009-10-01
Quantitative structure activity relationship (QSAR) analysis of piperine analogs as inhibitors of the efflux pump NorA from Staphylococcus aureus has been performed in order to obtain a highly accurate model enabling prediction of inhibition of S. aureus NorA by new chemical entities from natural as well as synthetic sources. An algorithm based on the genetic function approximation method of variable selection in Cerius2 was used to generate the model. Among the several types of descriptors considered in generating the QSAR model, viz. topological, spatial, thermodynamic, information content and E-state indices, three descriptors, namely the partial negative surface area of the compounds, the area of the molecular shadow in the XZ plane and the heat of formation of the molecules, resulted in a statistically significant model with r(2)=0.962 and cross-validation parameter q(2)=0.917. The validation of the QSAR models was done by cross-validation, leave-25%-out and external test set prediction. The theoretical approach indicates that an increase in the exposed partial negative surface area increases the inhibitory activity of the compound against NorA, whereas the area of the molecular shadow in the XZ plane is inversely proportional to the inhibitory activity. The model also explains the relationship of the heat of formation of the compound with the inhibitory activity. The model is not only able to predict the activity of new compounds but also explains the important regions in the molecules in a quantitative manner.
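The q(2) statistic quoted above is the leave-one-out analogue of r(2): 1 - PRESS/SS, where PRESS sums squared LOO prediction errors. For a least-squares model the LOO residuals come cheaply from the hat matrix, with no refitting. A sketch with hypothetical descriptor data:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 40, 3
# Design matrix: intercept plus three hypothetical descriptors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 0.8, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.2, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat (projection) matrix
resid = y - H @ y                            # ordinary residuals
loo_resid = resid / (1 - np.diag(H))         # exact LOO residuals for OLS

press = np.sum(loo_resid ** 2)
ss = np.sum((y - y.mean()) ** 2)
q2 = 1 - press / ss                          # cross-validated r^2
r2 = 1 - np.sum(resid ** 2) / ss
```

Since each LOO residual inflates the ordinary residual, q2 is always at most r2, which is why QSAR models report both.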
Nayana, M Ravi Shashi; Sekhar, Y Nataraja; Nandyala, Haritha; Muttineni, Ravikumar; Bairy, Santosh Kumar; Singh, Kriti; Mahmood, S K
2008-10-01
In the present study, a series of 179 quinoline and quinazoline heterocyclic analogues exhibiting inhibitory activity against Gastric (H+/K+)-ATPase were investigated using the comparative molecular field analysis (CoMFA) and comparative molecular similarity indices (CoMSIA) methods. Both the models exhibited good correlation between the calculated 3D-QSAR fields and the observed biological activity for the respective training set compounds. The most optimal CoMFA and CoMSIA models yielded significant leave-one-out cross-validation coefficient, q(2) of 0.777, 0.744 and conventional cross-validation coefficient, r(2) of 0.927, 0.914 respectively. The predictive ability of generated models was tested on a set of 52 compounds having broad range of activity. CoMFA and CoMSIA yielded predicted activities for test set compounds with r(pred)(2) of 0.893 and 0.917 respectively. These validation tests not only revealed the robustness of the models but also demonstrated that for our models r(pred)(2) based on the mean activity of test set compounds can accurately estimate external predictivity. The factors affecting activity were analyzed carefully according to standard coefficient contour maps of steric, electrostatic, hydrophobic, acceptor and donor fields derived from the CoMFA and CoMSIA. These contour plots identified several key features which explain the wide range of activities. The results obtained from models offer important structural insight into designing novel peptic-ulcer inhibitors prior to their synthesis.
Fetit, Ahmed E; Novak, Jan; Peet, Andrew C; Arvanitis, Theodoros N
2015-09-01
The aim of this study was to assess the efficacy of three-dimensional texture analysis (3D TA) of conventional MR images for the classification of childhood brain tumours in a quantitative manner. The dataset comprised pre-contrast T1- and T2-weighted MRI series obtained from 48 children diagnosed with brain tumours (medulloblastoma, pilocytic astrocytoma and ependymoma). 3D and 2D TA were carried out on the images using first-, second- and higher-order statistical methods. Six supervised classification algorithms were trained with the most influential 3D and 2D textural features, and their performances in the classification of tumour types, using the two feature sets, were compared. Model validation was carried out using the leave-one-out cross-validation (LOOCV) approach, as well as stratified 10-fold cross-validation, in order to provide additional reassurance. McNemar's test was used to test the statistical significance of any improvements demonstrated by 3D-trained classifiers. Supervised learning models trained with 3D textural features showed improved classification performances over those trained with conventional 2D features. For instance, a neural network classifier showed a 12% improvement in area under the receiver operator characteristics curve (AUC) and 19% in overall classification accuracy. These improvements were statistically significant for four of the tested classifiers, as per McNemar's tests. This study shows that 3D textural features extracted from conventional T1- and T2-weighted images can improve the diagnostic classification of childhood brain tumours. Long-term benefits of accurate, yet non-invasive, diagnostic aids include a reduction in surgical procedures, improvement in surgical and therapy planning, and support of discussions with patients' families. It remains necessary, however, to extend the analysis to a multicentre cohort in order to assess the scalability of the techniques used. Copyright © 2015 John Wiley & Sons, Ltd.
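McNemar's test, used above to compare 3D- and 2D-trained classifiers on the same cases, depends only on the discordant pairs: cases one classifier gets right and the other gets wrong. A minimal exact version using only the standard library; the counts are illustrative, not from the study:

```python
from math import comb

# b: cases correct under 3D features only; c: correct under 2D features only.
b, c = 14, 3
n_discordant = b + c

# Exact two-sided McNemar p-value: under the null, the smaller discordant
# count follows Binomial(n_discordant, 0.5).
tail = sum(comb(n_discordant, k) for k in range(min(b, c) + 1)) / 2 ** n_discordant
p_value = min(1.0, 2 * tail)
```

With 14 vs. 3 discordant cases the exact two-sided p-value falls below 0.05, i.e. the 3D feature set helps significantly on these (illustrative) counts.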
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Yongjun; Paul, Anjan Kumar; Kim, Namkug, E-mail: namkugkim@gmail.com
Purpose: To develop a semiautomated computer-aided diagnosis (CAD) system for thyroid cancer using two-dimensional ultrasound images that can be used to yield a second opinion in the clinic to differentiate malignant and benign lesions. Methods: A total of 118 ultrasound images that included axial and longitudinal images from patients with biopsy-confirmed malignant (n = 30) and benign (n = 29) nodules were collected. Thyroid CAD software was developed to extract quantitative features from these images based on thyroid nodule segmentation in which adaptive diffusion flow for active contours was used. Various features, including histogram, intensity differences, elliptical fit, gray-level co-occurrence matrixes, and gray-level run-length matrixes, were evaluated for each region imaged. Based on these imaging features, a support vector machine (SVM) classifier was used to differentiate benign and malignant nodules. Leave-one-out cross-validation with sequential forward feature selection was performed to evaluate the overall accuracy of this method. Additionally, analyses with contingency tables and receiver operating characteristic (ROC) curves were performed to compare the performance of CAD with visual inspection by expert radiologists based on established gold standards. Results: Most univariate features for this proposed CAD system attained accuracies that ranged from 78.0% to 83.1%. When optimal SVM parameters that were established using a grid search method with features that radiologists use for visual inspection were employed, the authors could attain rates of accuracy that ranged from 72.9% to 84.7%. Using leave-one-out cross-validation results in a multivariate analysis of various features, the highest accuracy achieved using the proposed CAD system was 98.3%, whereas visual inspection by radiologists reached 94.9% accuracy. 
To obtain the highest accuracies, “axial ratio” and “max probability” in axial images were most frequently included in the optimal feature sets for the authors’ proposed CAD system, while “shape” and “calcification” in longitudinal images were most frequently included in the optimal feature sets for visual inspection by radiologists. The computed areas under curves in the ROC analysis were 0.986 and 0.979 for the proposed CAD system and visual inspection by radiologists, respectively; no significant difference was detected between these groups. Conclusions: The use of thyroid CAD to differentiate malignant from benign lesions shows accuracy similar to that obtained via visual inspection by radiologists. Thyroid CAD might be considered a viable way to generate a second opinion for radiologists in clinical practice.
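The "grid search for optimal SVM parameters, scored by leave-one-out" step can be sketched as follows; the nodule features and grid values are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(9)
n = 40
X = rng.normal(size=(n, 6))          # toy nodule features
y = np.repeat([0, 1], n // 2)        # 0 = benign, 1 = malignant
X[y == 1] += 1.4                     # separate the classes for the demo

# Every (C, gamma) pair is scored by full leave-one-out cross-validation.
grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=LeaveOneOut()).fit(X, y)
best_loo_accuracy = search.best_score_
```

Note that with LOO inside the grid search, each candidate parameter pair costs n model fits, which is why this is usually reserved for small datasets like the one above.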
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Shiju; Qian, Wei; Guan, Yubao
2016-06-15
Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLS patients by integrating oversampling, feature selection, and score fusion techniques and develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061, when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled RBFN based classifier to yield improved prediction accuracy.
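The score-fusion step (combining probabilities from an image-feature classifier and a clinical-marker classifier, judged by AUC) can be sketched as below. Logistic models stand in for the RBFN classifiers purely for brevity, and both feature blocks are simulated:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(10)
n = 94
y = np.array([0] * 74 + [1] * 20)        # DFS vs. recurrence, as in the abstract
qi = rng.normal(size=(n, 8))             # toy quantitative image features
cb = rng.normal(size=(n, 4))             # toy clinical/biological markers
qi[y == 1] += 0.8
cb[y == 1] += 0.8

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# Out-of-fold probability scores from each classifier separately.
p_qi = cross_val_predict(LogisticRegression(max_iter=1000), qi, y,
                         cv=cv, method="predict_proba")[:, 1]
p_cb = cross_val_predict(LogisticRegression(max_iter=1000), cb, y,
                         cv=cv, method="predict_proba")[:, 1]

auc_qi = roc_auc_score(y, p_qi)
# Simple score fusion: average the two probability scores.
auc_fused = roc_auc_score(y, (p_qi + p_cb) / 2)
```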
Blanchard, P; Wong, AJ; Gunn, GB; Garden, AS; Mohamed, ASR; Rosenthal, DI; Crutison, J; Wu, R; Zhang, X; Zhu, XR; Mohan, R; Amin, MV; Fuller, CD; Frank, SJ
2017-01-01
Objective To externally validate head and neck cancer (HNC) photon-derived normal tissue complication probability (NTCP) models in patients treated with proton beam therapy (PBT). Methods This prospective cohort consisted of HNC patients treated with PBT at a single institution. NTCP models were selected based on the availability of data for validation and evaluated using the leave-one-out cross-validated area under the curve (AUC) for the receiver operating characteristics curve. Results 192 patients were included. The most prevalent tumor site was oropharynx (n=86, 45%), followed by sinonasal (n=28), nasopharyngeal (n=27) or parotid (n=27) tumors. Apart from the prediction of acute mucositis (reduction of AUC of 0.17), the models overall performed well. The validation (PBT) AUC and the published AUC were respectively 0.90 versus 0.88 for feeding tube 6 months post-PBT; 0.70 versus 0.80 for physician rated dysphagia 6 months post-PBT; 0.70 versus 0.80 for dry mouth 6 months post-PBT; and 0.73 versus 0.85 for hypothyroidism 12 months post-PBT. Conclusion While the drop in NTCP model performance was expected in PBT patients, the models showed robustness and remained valid. Further work is warranted, but these results support the validity of the model-based approach for treatment selection for HNC patients. PMID:27641784
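A leave-one-out cross-validated AUC of the kind used to evaluate these NTCP models can be computed as follows; the dose metric, logistic model form and outcomes are all simulated, not taken from the cohort:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(11)
n = 120
mean_dose = rng.uniform(10, 60, size=n)        # toy Gy to an organ at risk
# Simulated complication outcomes from a logistic dose-response.
logit = 0.1 * (mean_dose - 35.0)
complication = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X = mean_dose.reshape(-1, 1)
# Each patient's predicted probability comes from a model that never saw them.
p_loo = cross_val_predict(LogisticRegression(), X, complication,
                          cv=LeaveOneOut(), method="predict_proba")[:, 1]
loo_auc = roc_auc_score(complication, p_loo)
```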
Dehing-Oberije, Cary; Aerts, Hugo; Yu, Shipeng; De Ruysscher, Dirk; Menheere, Paul; Hilvo, Mika; van der Weide, Hiska; Rao, Bharat; Lambin, Philippe
2011-10-01
Currently, prediction of survival for non-small-cell lung cancer patients treated with (chemo)radiotherapy is mainly based on clinical factors. The hypothesis of this prospective study was that blood biomarkers related to hypoxia, inflammation, and tumor load would have an added prognostic value for predicting survival. Clinical data and blood samples were collected prospectively (NCT00181519, NCT00573040, and NCT00572325) from 106 inoperable non-small-cell lung cancer patients (Stages I-IIIB), treated with curative intent with radiotherapy alone or combined with chemotherapy. Blood biomarkers, including lactate dehydrogenase, C-reactive protein, osteopontin, carbonic anhydrase IX, interleukin (IL) 6, IL-8, carcinoembryonic antigen (CEA), and cytokeratin fragment 21-1, were measured. A multivariate model, built on a large patient population (N = 322) and externally validated, was used as a baseline model. An extended model was created by selecting additional biomarkers. The model's performance was expressed as the area under the curve (AUC) of the receiver operating characteristic and assessed by use of leave-one-out cross validation as well as a validation cohort (n = 52). The baseline model consisted of gender, World Health Organization performance status, forced expiratory volume, number of positive lymph node stations, and gross tumor volume and yielded an AUC of 0.72. The extended model included two additional blood biomarkers (CEA and IL-6) and resulted in a leave-one-out AUC of 0.81. The performance of the extended model was significantly better than the clinical model (p = 0.004). The AUC on the validation cohort was 0.66 and 0.76, respectively. The performance of the prognostic model for survival improved markedly by adding two blood biomarkers: CEA and IL-6. Copyright © 2011 Elsevier Inc. All rights reserved.
In vivo Raman spectroscopy of cervix cancers
NASA Astrophysics Data System (ADS)
Rubina, S.; Sathe, Priyanka; Dora, Tapas Kumar; Chopra, Supriya; Maheshwari, Amita; Krishna, C. Murali
2014-03-01
Cervix cancer is the third most common female cancer worldwide. It is the leading cancer among Indian females, with more than a million newly diagnosed cases and 50% mortality annually. The high mortality rates can be attributed to late diagnosis. The efficacy of Raman spectroscopy in classifying normal and pathological conditions in cervix cancers has already been demonstrated in diverse populations. Our earlier ex vivo studies showed the feasibility of classifying normal and cancerous cervix tissues, as well as responders/non-responders to concurrent chemoradiotherapy (CCRT). The present study explored the feasibility of in vivo Raman spectroscopic methods for classifying normal and cancerous conditions in the Indian population. A total of 182 normal and 132 tumor in vivo Raman spectra from 63 subjects were recorded under clinical supervision using a fiberoptic-probe-coupled HE-785 spectrometer. Spectra were acquired for 5 s, averaged over 3 accumulations, at 80 mW laser power. Spectra of normal conditions show strong collagenous features, while tumors show an abundance of non-collagenous proteins and DNA. Preprocessed spectra were subjected to Principal Component-Linear Discriminant Analysis (PCLDA) followed by leave-one-out cross-validation. Classification efficiencies of ~96.7% and 100% were observed for normal and cancerous conditions, respectively. The findings corroborate earlier studies and suggest the applicability of Raman spectroscopic methods, in combination with an appropriate multivariate tool, for objective, noninvasive, and rapid diagnosis of cervical cancers in the Indian population. In view of these encouraging results, extensive validation studies will be undertaken to confirm the findings.
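Leave-one-out cross-validation, as used above, can be sketched in a few lines. The nearest-centroid classifier and the two-feature toy data below are illustrative stand-ins (the study used PCLDA on preprocessed spectra), not the authors' implementation:

```python
import math

def nearest_centroid_predict(train, labels, x):
    """Predict the label of x by its distance to per-class mean vectors."""
    groups = {}
    for xi, yi in zip(train, labels):
        groups.setdefault(yi, []).append(xi)
    best, best_d = None, math.inf
    for label, rows in groups.items():
        centroid = [sum(col) / len(col) for col in zip(*rows)]
        d = math.dist(centroid, x)
        if d < best_d:
            best, best_d = label, d
    return best

def loocv_accuracy(data, labels):
    """Leave-one-out: train on n-1 samples, test on the held-out one."""
    correct = 0
    for i in range(len(data)):
        pred = nearest_centroid_predict(
            data[:i] + data[i + 1:], labels[:i] + labels[i + 1:], data[i])
        correct += pred == labels[i]
    return correct / len(data)

# Toy two-class data (e.g. spectra reduced to two features)
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [3.0, 3.1], [2.9, 3.0], [3.1, 2.9]]
y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
print(loocv_accuracy(X, y))  # well-separated classes -> 1.0
```

Each sample is classified by a model trained on the remaining n-1 samples, so the accuracy estimate never reuses a training point for testing.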
NASA Astrophysics Data System (ADS)
Iwashita, Fabio; Brooks, Andrew; Spencer, John; Borombovits, Daniel; Curwen, Graeme; Olley, Jon
2015-04-01
Assessing bank stability using geotechnical models traditionally involves the laborious collection of data on the bank and floodplain stratigraphy, as well as in-situ geotechnical data for each sedimentary unit within a river bank. The application of geotechnical bank stability models is therefore limited to sites where extensive field data have been collected, and their ability to predict bank erosion at the reach scale is limited without a very extensive and expensive field data collection program. Challenges in the construction and application of riverbank erosion and hydraulic numerical models include their one-dimensionality, steady-state requirements, lack of calibration data, and nonuniqueness. Numerical models can also be too rigid to detect unexpected features such as the onset of trends, non-linear relations, or patterns restricted to sub-samples of a data set. These shortcomings create the need for an alternative modelling approach capable of using the available data. The Self-Organizing Map (SOM) approach is well suited to the analysis of noisy, sparse, nonlinear, multidimensional, and scale-dependent data; it is a type of unsupervised artificial neural network with hybrid competitive-cooperative learning. In this work we present a method that uses a database of geotechnical data collected at over 100 sites throughout Queensland, Australia, to develop a modelling approach in which geotechnical parameters (soil effective cohesion, friction angle, soil erodibility, and critical stress) are derived from particle size distribution (PSD) data. The model framework and predicted values were evaluated in two ways: by splitting the dataset into training and validation sets, and through a bootstrap approach. The basis of the bootstrap cross-validation is a leave-one-out strategy: one data value is left out of the training set while a new SOM is created to estimate that missing value from the remaining data. As a new SOM is created up to 30 times for each value under scrutiny, this forms the basis of a stochastic framework in which residuals are used to evaluate error statistics and model bias. The proposed method is suitable for estimating soil geotechnical properties, revealing and quantifying relationships between geotechnical variables and particle size distribution that are not properly captured by linear multivariate statistical approaches.
NASA Astrophysics Data System (ADS)
Jamal, Wasifa; Das, Saptarshi; Oprescu, Ioana-Anastasia; Maharatna, Koushik; Apicella, Fabio; Sicca, Federico
2014-08-01
Objective. The paper investigates the presence of autism using functional brain connectivity measures derived from the electroencephalogram (EEG) of children during face perception tasks. Approach. Phase-synchronized patterns from 128-channel EEG signals are obtained for typical children and children with autism spectrum disorder (ASD). The phase-synchronized states, or synchrostates, temporally switch amongst themselves as an underlying process for the completion of a particular cognitive task. We used 12 subjects in each group (ASD and typical) and analyzed their EEG while they processed fearful, happy, and neutral faces. The minimally and maximally occurring synchrostates for each subject are chosen for the extraction of brain connectivity features, which are used for classification between the two groups of subjects. Among different supervised learning techniques, we explored discriminant analysis and the support vector machine, both with polynomial kernels, for the classification task. Main results. Leave-one-out cross-validation of the classification algorithm gives 94.7% accuracy as the best performance, with corresponding sensitivity and specificity values of 85.7% and 100%, respectively. Significance. The proposed method gives high classification accuracies and outperforms other contemporary research results. The effectiveness of the proposed method for classification of autistic and typical children suggests the possibility of using it on a larger population to validate it for clinical practice.
Raman spectroscopy-based screening of IgM positive and negative sera for dengue virus infection
NASA Astrophysics Data System (ADS)
Bilal, M.; Saleem, M.; Bilal, Maria; Ijaz, T.; Khan, Saranjam; Ullah, Rahat; Raza, A.; Khurram, M.; Akram, W.; Ahmed, M.
2016-11-01
A statistical method based on Raman spectroscopy for the screening of immunoglobulin M (IgM) in dengue virus (DENV) infected human sera is presented. In total, 108 sera samples were collected and their antibody indexes (AI) for IgM were determined through enzyme-linked immunosorbent assay (ELISA). Raman spectra of these samples were acquired using a 785 nm excitation laser. Seventy-eight Raman spectra were selected at random for the development of a statistical model using partial least squares (PLS) regression, while the remaining 30 were used for testing the developed model. An R-squared (r²) value of 0.929 was determined using the leave-one-sample-out (LOO) cross-validation method, showing the validity of this model. The model considers all molecular changes related to IgM concentration and describes their role in infection. A graphical user interface (GUI) platform has been developed to run the multivariate model for the prediction of the AI of IgM for blindly tested samples, and excellent agreement has been found between model-predicted and clinically determined values. Parameters including sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve for these tested samples are also reported to visualize model performance.
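The leave-one-sample-out r² reported above can be illustrated generically: each observation is predicted by a model fitted to the remaining samples, and the pooled prediction errors give a cross-validated R². An ordinary least-squares line stands in for the PLS regression, and the toy x/y pairs are assumptions, not the study's data:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (yv - my) for x, yv in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def loo_q2(xs, ys):
    """Leave-one-out cross-validated R^2: each point is predicted by a
    line fitted to the remaining n-1 points (PRESS-based)."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a * xs[i] + b)) ** 2
    my = sum(ys) / len(ys)
    return 1.0 - press / sum((yv - my) ** 2 for yv in ys)

# Near-linear toy data standing in for spectral score vs. antibody index
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 5.9]
print(round(loo_q2(xs, ys), 3))  # close to 1 for near-linear data
```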
2015-01-01
Background: microRNA (miRNA) expression plays an influential role in cancer classification and malignancy, and miRNAs are feasible as alternative diagnostic markers for pancreatic cancer, a highly aggressive neoplasm with silent early symptoms, high metastatic potential, and resistance to conventional therapies. Methods: In this study, we evaluated the benefits of multi-omics data analysis by integrating miRNA and mRNA expression data in pancreatic cancer. Using support vector machine (SVM) modelling and leave-one-out cross-validation (LOOCV), we evaluated the diagnostic performance of single or multiple markers based on miRNA and mRNA expression profiles from 104 pancreatic ductal adenocarcinoma (PDAC) tissues and 17 benign pancreatic tissues. To select even more reliable and robust markers, we performed validation with independent datasets from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) data depositories. For validation, miRNA activity was estimated from miRNA-target gene interactions and mRNA expression datasets in pancreatic cancer. Results: Using a comprehensive identification approach, we successfully identified 705 multi-markers with powerful diagnostic performance for PDAC. In addition, these marker candidates were annotated with cancer pathways using gene ontology analysis. Conclusions: Our prediction models have strong potential for the diagnosis of pancreatic cancer. PMID:26328610
NASA Astrophysics Data System (ADS)
Chandra, Malavika; Scheiman, James; Simeone, Diane; McKenna, Barbara; Purdy, Julianne; Mycek, Mary-Ann
2010-01-01
Pancreatic adenocarcinoma is one of the leading causes of cancer death, in part because of the inability of current diagnostic methods to reliably detect early-stage disease. We present the first assessment of the diagnostic accuracy of algorithms developed for pancreatic tissue classification using data from fiber optic probe-based bimodal optical spectroscopy, a real-time approach that would be compatible with minimally invasive diagnostic procedures for early cancer detection in the pancreas. A total of 96 fluorescence and 96 reflectance spectra are considered from 50 freshly excised tissue sites, including human pancreatic adenocarcinoma, chronic pancreatitis (inflammation), and normal tissues, in nine patients. Classification algorithms using linear discriminant analysis are developed to distinguish among tissues, and leave-one-out cross-validation is employed to assess the classifiers' performance. The spectral areas and ratios classifier (SpARC) algorithm employs a combination of reflectance and fluorescence data and has the best performance, with sensitivity, specificity, negative predictive value, and positive predictive value for correctly identifying adenocarcinoma of 85, 89, 92, and 80%, respectively.
Lu, Shao Hua; Li, Bao Qiong; Zhai, Hong Lin; Zhang, Xin; Zhang, Zhuo Yong
2018-04-25
Terahertz time-domain spectroscopy (THz-TDS) has been applied in many fields; however, it still encounters drawbacks in the analysis of multicomponent mixtures due to severe spectral overlap. Here, an effective approach to quantitative analysis is proposed and applied to the determination of a ternary mixture of amino acids in a foxtail millet substrate. Using three parameters derived from the THz-TDS, images were constructed, and Tchebichef image moments were used to extract the information of the target components. The quantitative models were then obtained by stepwise regression. The correlation coefficients of leave-one-out cross-validation (R²loo-cv) were greater than 0.9595. For the external test set, the predictive correlation coefficients (R²p) were greater than 0.8026 and the root mean square errors of prediction (RMSEp) were less than 1.2601. Compared with the traditional methods (PLS and N-PLS), our approach is more accurate, robust, and reliable, and can be an excellent approach for quantifying multiple components with THz-TDS spectroscopy. Copyright © 2017 Elsevier Ltd. All rights reserved.
Classification of prostate cancer grade using temporal ultrasound: in vivo feasibility study
NASA Astrophysics Data System (ADS)
Ghavidel, Sahar; Imani, Farhad; Khallaghi, Siavash; Gibson, Eli; Khojaste, Amir; Gaed, Mena; Moussa, Madeleine; Gomez, Jose A.; Siemens, D. Robert; Leveridge, Michael; Chang, Silvia; Fenster, Aaron; Ward, Aaron D.; Abolmaesumi, Purang; Mousavi, Parvin
2016-03-01
Temporal ultrasound has been shown to have high classification accuracy in differentiating cancer from benign tissue. In this paper, we extend the temporal ultrasound method to classify lower-grade prostate cancer (PCa) against all other grades. We use a group of nine patients with mostly lower-grade PCa, in whom cancerous regions are also limited. A critical challenge is to train a classifier with limited aggressive cancerous tissue relative to low-grade cancerous tissue. To resolve the problem of imbalanced data, we use the Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic samples for the minority class. We calculate spectral features of temporal ultrasound data and perform feature selection using random forests. With a leave-one-patient-out cross-validation strategy, an area under the receiver operating characteristic curve (AUC) of 0.74 is achieved, with overall sensitivity and specificity of 70%. Applying an unsupervised learning approach prior to the proposed method improves sensitivity and AUC to 80% and 0.79. This work represents promising results for classifying lower- and higher-grade PCa with limited cancerous training samples using temporal ultrasound.
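SMOTE, as used above to balance the classes, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. A minimal stdlib sketch (the toy 2-D samples and the k and seed values are arbitrary assumptions):

```python
import random

def smote(minority, n_new, k=2, seed=42):
    """Generate synthetic minority-class samples by interpolating between
    a random sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (excluding x)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Toy minority-class samples (e.g. two spectral features)
minority = [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.2]]
new_samples = smote(minority, n_new=4)
print(len(new_samples))  # 4 synthetic samples inside the minority region
```

Because each synthetic point lies on a segment between two real minority samples, oversampling stays inside the minority region rather than duplicating points exactly.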
Zhang, Xiaotian; Yin, Jian; Zhang, Xu
2018-03-02
Increasing evidence suggests that dysregulation of microRNAs (miRNAs) may lead to a variety of diseases; identifying disease-related miRNAs is therefore a crucial problem. Many computational approaches have been proposed to predict binary miRNA-disease associations. In this study, in order to predict the underlying miRNA-disease association types, a semi-supervised model, the network-based label propagation algorithm for multiple types of miRNA-disease associations (NLPMMDA), is proposed, using mutual information derived from a heterogeneous network. The NLPMMDA method integrates disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity of miRNAs and diseases to construct the heterogeneous network. As a semi-supervised model, NLPMMDA does not require verified negative samples. Leave-one-out cross-validation (LOOCV) was implemented for four known types of miRNA-disease associations and demonstrated the reliable performance of our method. Moreover, case studies of lung cancer and breast cancer confirmed the effectiveness of NLPMMDA in predicting novel miRNA-disease associations and their association types.
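Label propagation over a similarity network is commonly implemented as a random walk with restart: scores diffuse from known disease nodes across edges while a restart term keeps probability anchored at the seeds. A minimal sketch on a toy symmetric graph (the adjacency matrix, restart probability, and seed node are illustrative assumptions, not the NLPMMDA network):

```python
def random_walk_with_restart(adj, seed_idx, restart=0.3, iters=200):
    """Iterate p <- (1-r)*W*p + r*p0, where W is the column-normalized
    adjacency matrix and p0 is concentrated on the seed node."""
    n = len(adj)
    col = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col[j] if col[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 if i == seed_idx else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(iters):
        p = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n))
             + restart * p0[i] for i in range(n)]
    return p

# Toy similarity network: node 0 is the known disease miRNA; nodes 1 and 2
# are its neighbours; node 3 connects to the seed only through node 2.
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
scores = random_walk_with_restart(adj, seed_idx=0)
print(scores)  # nodes nearer the seed receive higher scores
```

The stationary scores rank candidate nodes by network proximity to the seed, which is how leave-one-out validation of such methods scores the held-out association.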
NASA Astrophysics Data System (ADS)
Luo, Shuwen; Chen, Changshui; Mao, Hua; Jin, Shaoqin
2013-06-01
The feasibility of early detection of gastric cancer using near-infrared (NIR) Raman spectroscopy (RS), by distinguishing premalignant lesions (adenomatous polyp, n=27) and cancer tissues (adenocarcinoma, n=33) from normal gastric tissues (n=45), is evaluated. Significant differences in the Raman spectra are observed among normal, adenomatous polyp, and adenocarcinoma gastric tissues at 936, 1003, 1032, 1174, 1208, 1323, 1335, 1450, and 1655 cm-1. Diverse statistical methods are employed to develop effective diagnostic algorithms for classifying the Raman spectra of the different types of ex vivo gastric tissue, including principal component analysis (PCA), linear discriminant analysis (LDA), and naive Bayesian classifier (NBC) techniques. Compared with PCA-LDA algorithms, PCA-NBC techniques together with the leave-one-out cross-validation method provide better discrimination of normal, adenomatous polyp, and adenocarcinoma gastric tissues, yielding superior sensitivities of 96.3%, 96.9%, and 96.9%, and specificities of 93%, 100%, and 95.2%, respectively. Therefore, NIR RS combined with multivariate statistical algorithms has potential for early diagnosis of gastric premalignant lesions and cancer at the molecular level.
The use of atlas registration and graph cuts for prostate segmentation in magnetic resonance images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Korsager, Anne Sofie, E-mail: asko@hst.aau.dk; Østergaard, Lasse Riis; Fortunati, Valerio
2015-04-15
Purpose: An automatic method for 3D prostate segmentation in magnetic resonance (MR) images is presented for planning image-guided radiotherapy treatment of prostate cancer. Methods: A spatial prior based on intersubject atlas registration is combined with organ-specific intensity information in a graph cut segmentation framework. The segmentation is tested on 67 axial T2-weighted MR images in a leave-one-out cross-validation experiment and compared with both manual reference segmentations and multiatlas-based segmentations using majority-voting atlas fusion. The impact of atlas selection is investigated in both the traditional atlas-based segmentation and the new graph cut method that combines atlas and intensity information in order to improve segmentation accuracy. The best results were achieved using the method that combines intensity information, shape information, and atlas selection in the graph cut framework. Results: A mean Dice similarity coefficient (DSC) of 0.88 and a mean surface distance (MSD) of 1.45 mm with respect to the manual delineation were achieved. Conclusions: This approaches the interobserver DSC of 0.90 and interobserver MSD of 1.15 mm, and is comparable to other studies performing prostate segmentation in MR.
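The Dice similarity coefficient reported above measures overlap between two binary segmentations. A minimal sketch on toy flattened masks (illustrative, not the study's 3D data):

```python
def dice_coefficient(seg_a, seg_b):
    """Dice similarity coefficient of two binary masks:
    DSC = 2*|A intersect B| / (|A| + |B|)."""
    a = {i for i, v in enumerate(seg_a) if v}
    b = {i for i, v in enumerate(seg_b) if v}
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# Flattened toy masks: automatic vs. manual prostate delineation
auto   = [0, 1, 1, 1, 0, 0, 1, 1]
manual = [0, 1, 1, 0, 0, 1, 1, 1]
print(dice_coefficient(auto, manual))  # 2*4 / (5+5) = 0.8
```

A DSC of 1.0 means identical masks and 0.0 means no overlap, which is why values near 0.9 approach interobserver agreement.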
High wavenumber Raman spectroscopic characterization of normal and oral cancer using blood plasma
NASA Astrophysics Data System (ADS)
Pachaiappan, Rekha; Prakasarao, Aruna; Suresh Kumar, Murugesan; Singaravelu, Ganesan
2017-02-01
Blood plasma contains biomolecules released from cells and tissues after metabolism and reflects the pathological condition of the subject. The analysis of biofluids for disease diagnosis is attractive in cancer diagnostics due to the ease of sample collection and transport, the possibility of multiple sampling for regular screening of the disease, and reduced invasiveness for patients. Hence, the intention of this study was to apply near-infrared (NIR) Raman spectroscopy in the high wavenumber (HW) region (2500-3400 cm-1) to the diagnosis of oral malignancy using blood plasma. From the Raman spectra it is observed that the biomolecules protein and lipid play a major role in the discrimination between groups. Diagnostic algorithms based on principal component analysis coupled with linear discriminant analysis (PCA-LDA), with the leave-one-patient-out cross-validation method applied to the HW Raman spectra, yielded promising results in the identification of oral malignancy. The details of the results will be discussed.
Optical biopsy using fluorescence spectroscopy for prostate cancer diagnosis
NASA Astrophysics Data System (ADS)
Wu, Binlin; Gao, Xin; Smith, Jason; Bailin, Jacob
2017-02-01
Native fluorescence spectra are acquired from fresh normal and cancerous human prostate tissues. The fluorescence data are analyzed using a multivariate analysis algorithm, non-negative matrix factorization. The non-negative spectral components are retrieved and attributed to the native fluorophores in tissue, such as collagen, reduced nicotinamide adenine dinucleotide (NADH), and flavin adenine dinucleotide (FAD). The retrieved weights of the components, e.g. NADH and FAD, are used to estimate the relative concentrations of the native fluorophores and the redox ratio. A machine learning algorithm, the support vector machine (SVM), is used to distinguish normal and cancerous tissue samples based on either the relative concentrations of NADH and FAD or the redox ratio alone. The classification performance is reported using statistical measures such as sensitivity, specificity, and accuracy, along with the area under the receiver operating characteristic (ROC) curve. A leave-one-out cross-validation method is used to evaluate the predictive performance of the SVM classifier and to avoid bias due to overfitting.
The Application of FT-IR Spectroscopy for Quality Control of Flours Obtained from Polish Producers
Ceglińska, Alicja; Reder, Magdalena; Ciemniewska-Żytkiewicz, Hanna
2017-01-01
Samples of wheat, spelt, rye, and triticale flours produced by different Polish mills were studied by both classic chemical methods and FT-IR MIR spectroscopy. An attempt was made to statistically correlate FT-IR spectral data with reference data on the content of various components, for example, proteins, fats, ash, and fatty acids, as well as properties such as moisture, falling number, and energetic value. This correlation resulted in calibrated and validated statistical models for versatile evaluation of unknown flour samples. The calibration data set was used to construct calibration models using CSR and PLS with the leave-one-out cross-validation technique. The calibrated models were validated with a validation data set. The results obtained confirmed that the application of statistical models based on MIR spectral data is a robust, accurate, precise, rapid, inexpensive, and convenient methodology for determining flour characteristics, as well as for detecting the content of selected flour ingredients. The obtained models' characteristics were as follows: R2 = 0.97, PRESS = 2.14; R2 = 0.96, PRESS = 0.69; R2 = 0.95, PRESS = 1.27; R2 = 0.94, PRESS = 0.76, for content of proteins, lipids, ash, and moisture level, respectively. The best results for CSR models were obtained for protein, ash, and crude fat (R2 = 0.86, 0.82, and 0.78, respectively). PMID:28243483
Jóźwiak, Michał; Stępień, Karolina; Wrzosek, Małgorzata; Olejarz, Wioletta; Kubiak-Tomaszewska, Grażyna; Filipowska, Anna; Filipowski, Wojciech; Struga, Marta
2018-04-03
Thirty new derivatives of palmitic acid were efficiently synthesized. The obtained compounds fall into three groups: thiosemicarbazides (compounds 1-10), 1,2,4-triazoles (compounds 1a-10a), and 1,3,4-thiadiazoles (compounds 1b-10b). ¹H-NMR, ¹³C-NMR, and MS methods were used to confirm the structures of the derivatives. All obtained compounds were tested in vitro against a number of microorganisms, including Gram-positive cocci, Gram-negative rods, and Candida albicans. Compounds 4, 5, 6, and 8 showed significant inhibition of C. albicans, with MIC values in the range 1.56-50 μg/mL. A halogen atom, especially at the 3-position of the phenyl group, was significantly important for antifungal activity. The biological activity against Candida albicans and selected molecular descriptors were used as the basis for QSAR models determined by means of multiple linear regression. The models were validated by means of leave-one-out cross-validation. The obtained QSAR models were characterized by high determination coefficients and good predictive power.
Zeng, Xiao-Lan; Wang, Hong-Jun; Wang, Yan
2012-02-01
The possible molecular geometries of 134 halogenated methyl-phenyl ethers were optimized at the B3LYP/6-31G* level with the Gaussian 98 program. The calculated structural parameters were taken as theoretical descriptors to establish two novel QSPR models for predicting the aqueous solubility (-lgS(w,l)) and the n-octanol/water partition coefficient (lgK(ow)) of halogenated methyl-phenyl ethers. The two models each contain three variables: the energy of the lowest unoccupied molecular orbital (E(LUMO)), the most positive atomic partial charge in the molecule (q(+)), and the quadrupole moment (Q(yy) or Q(zz)); their R values are 0.992 and 0.970, respectively, and their standard errors of estimate in modeling (SD) are 0.132 and 0.178, respectively. The results of leave-one-out (LOO) cross-validation on the training set and of validation with external test sets both show that the models exhibit optimum stability and good predictive power. We suggest that the two QSPR models derived here can be used to accurately predict S(w,l) and K(ow) for untested halogenated methyl-phenyl ether congeners. Copyright © 2011 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Beger, Richard D.; Buzatu, Dan A.; Wilkes, Jon G.
2002-10-01
A three-dimensional quantitative spectrometric data-activity relationship (3D-QSDAR) modeling technique has been developed which uses NMR spectral and structural information combined in a 3D-connectivity matrix. The 3D-connectivity matrix was built by displaying all possible assigned carbon NMR chemical shifts, carbon-to-carbon connections, and distances between the carbons. Two-dimensional ¹³C-¹³C COSY and 2D slices from the distance dimension of the 3D-connectivity matrix were used to produce a relationship among the 2D spectral patterns for polychlorinated dibenzofurans, dibenzodioxins, and biphenyls (PCDFs, PCDDs, and PCBs, respectively) binding to the aryl hydrocarbon receptor (AhR). We refer to this technique as comparative structural connectivity spectral analysis (CoSCoSA) modeling. All CoSCoSA models were developed using forward multiple linear regression analysis of the predicted ¹³C NMR structure-connectivity spectral bins. A CoSCoSA model for 26 PCDFs had an explained variance (r²) of 0.93 and an average leave-four-out cross-validated variance (q4²) of 0.89. A CoSCoSA model for 14 PCDDs produced an r² of 0.90 and an average leave-two-out cross-validated variance (q2²) of 0.79. One CoSCoSA model for 12 PCBs gave an r² of 0.91 and an average q2² of 0.80. Another CoSCoSA model for all 52 compounds had an r² of 0.85 and an average q4² of 0.52. Major benefits of CoSCoSA modeling include ease of development, since the technique does not use molecular docking routines.
Qin, Li-Tang; Liu, Shu-Shen; Liu, Hai-Ling
2010-02-01
A five-variable model (model M2) was developed for the bioconcentration factors (BCFs) of nonpolar organic compounds (NPOCs), using the molecular electronegativity distance vector (MEDV) to characterize the structures of the NPOCs and variable selection and modeling based on prediction (VSMP) to select the optimum descriptors. The estimated correlation coefficient (r²) and the leave-one-out cross-validation correlation coefficient (q²) of model M2 were 0.9271 and 0.9171, respectively. The model was externally validated by splitting the whole data set into a representative training set of 85 chemicals and a validation set of 29 chemicals. The results show that the main structural factors influencing the BCFs of NPOCs are -cCc, cCcc, -Cl, and -Br (where "-" refers to a single bond and "c" refers to a conjugated bond). The quantitative structure-property relationship (QSPR) model can effectively predict the BCFs of NPOCs, and its predictions can also extend the current BCF database of experimental values.
Griffis, Joseph C; Allendorfer, Jane B; Szaflarski, Jerzy P
2016-01-15
Manual lesion delineation by an expert is the standard for lesion identification in MRI scans, but it is time-consuming and can introduce subjective bias. Alternative methods often require multi-modal MRI data, user interaction, scans from a control population, and/or arbitrary statistical thresholding. We present an approach for automatically identifying stroke lesions in individual T1-weighted MRI scans using naïve Bayes classification. Probabilistic tissue segmentation and image algebra were used to create feature maps encoding information about missing and abnormal tissue. Leave-one-case-out training and cross-validation were used to obtain out-of-sample predictions for each of 30 cases with left hemisphere stroke lesions. Our method correctly predicted lesion locations for 30/30 un-trained cases. Post-processing with smoothing (8 mm FWHM) and cluster-extent thresholding (100 voxels) was found to improve performance. Quantitative evaluations of post-processed out-of-sample predictions on 30 cases revealed high spatial overlap (mean Dice similarity coefficient = 0.66) and volume agreement (mean percent volume difference = 28.91; Pearson's r = 0.97) with manual lesion delineations. Our automated approach agrees with manual tracing. It provides an alternative to automated methods that require multi-modal MRI data, additional control scans, or user interaction to achieve optimal performance. Our fully trained classifier has applications in neuroimaging and clinical contexts. Copyright © 2015 Elsevier B.V. All rights reserved.
Choi, Kwanghun; Spohn, Marie; Park, Soo Jin; Huwe, Bernd; Ließ, Mareike
2017-01-01
Nitrogen (N) and phosphorus (P) in topsoils are critical for plant nutrition. Relatively little is known about the spatial patterns of N and P in the organic layer of mountainous landscapes. Therefore, the spatial distributions of N and P in both the organic layer and the A horizon were analyzed using a light detection and ranging (LiDAR) digital elevation model and vegetation metrics. The objective of the study was to analyze the effect of vegetation and topography on the spatial patterns of N and P in a small forested watershed in South Korea. Soil samples were collected using the conditioned Latin hypercube method. LiDAR vegetation metrics, the normalized difference vegetation index (NDVI), and terrain parameters were derived as predictors. Spatially explicit predictions of N/P ratios were obtained using a random forest with uncertainty analysis. We tested different strategies of model validation (repeated 2-fold to 20-fold and leave-one-out cross-validation). Repeated 10-fold cross-validation was selected for model validation due to its comparatively high accuracy and low variance of prediction. Surface curvature was the best predictor of P content in the organic layer and in the A horizon, while LiDAR vegetation metrics and NDVI were important predictors of N in the organic layer. N/P ratios increased with surface curvature and were higher on the convex upper slope than on the concave lower slope. This was due to P enrichment of the soil on the lower slope and a more even spatial distribution of N. Our digital soil maps showed that the topsoils on the upper slopes contain relatively little P. These findings are critical for understanding N and P dynamics in mountainous ecosystems. PMID:28837590
Rainfall frequency analysis for ungauged sites using satellite precipitation products
NASA Astrophysics Data System (ADS)
Gado, Tamer A.; Hsu, Kuolin; Sorooshian, Soroosh
2017-11-01
The occurrence of extreme rainfall events and their impacts on hydrologic systems and society are critical considerations in the design and management of a large number of water resources projects. As precipitation records are often limited or unavailable at many sites, it is essential to develop better methods for regional estimation of extreme rainfall at these partially gauged or ungauged sites. In this study, an innovative method for regional rainfall frequency analysis for ungauged sites is presented. The new method (hereafter called RRFA-S) is based on corrected annual maximum series obtained from a satellite precipitation product (e.g., PERSIANN-CDR). The probability matching method (PMM) is used for bias correction, matching the CDF of satellite-based precipitation data with that of the gauged data. The RRFA-S method was assessed through a comparative study with the traditional index flood method using the available annual maximum series of daily rainfall in two regions of the USA (11 sites in Colorado and 18 sites in California). The leave-one-out cross-validation technique was used to represent the ungauged site condition. The results of this numerical application show that the quantile estimates obtained from the new approach are more accurate and more robust than those given by the traditional index flood method.
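The probability matching method above aligns the satellite and gauge distributions by pairing equal quantiles. A minimal nearest-rank quantile-mapping sketch (the toy records and the nearest-rank rule are illustrative assumptions, not the study's implementation):

```python
def quantile_map(satellite, gauge, value):
    """Map a satellite rainfall value onto the gauge distribution by
    matching empirical quantiles (nearest-rank probability matching)."""
    sat = sorted(satellite)
    gau = sorted(gauge)
    rank = sum(1 for v in sat if v <= value)  # rank of `value`, 0..len(sat)
    # Same relative rank in the gauge record (integer arithmetic, clamped)
    idx = max(0, min(len(gau) - 1, rank * len(gau) // len(sat) - 1))
    return gau[idx]

# Toy records: the satellite product systematically underestimates gauge rain
satellite = [5, 8, 10, 12, 15, 20, 25, 30, 40, 50]
gauge     = [7, 11, 14, 17, 21, 28, 35, 42, 56, 70]
print(quantile_map(satellite, gauge, 25))  # -> 35, the matching gauge quantile
```

Each corrected satellite value inherits the magnitude the gauge record assigns to the same empirical quantile, which is the bias correction applied to the annual maximum series before frequency analysis.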
Liu, Bin; Jin, Min; Zeng, Pan
2015-10-01
The identification of gene-phenotype relationships is very important for the treatment of human diseases. Studies have shown that genes causing the same or similar phenotypes tend to interact with each other in a protein-protein interaction (PPI) network. Thus, many identification methods based on the PPI network model have achieved good results. However, in the PPI network, some interactions between the proteins encoded by candidate gene and the proteins encoded by known disease genes are very weak. Therefore, some studies have combined the PPI network with other genomic information and reported good predictive performances. However, we believe that the results could be further improved. In this paper, we propose a new method that uses the semantic similarity between the candidate gene and known disease genes to set the initial probability vector of a random walk with a restart algorithm in a human PPI network. The effectiveness of our method was demonstrated by leave-one-out cross-validation, and the experimental results indicated that our method outperformed other methods. Additionally, our method can predict new causative genes of multifactor diseases, including Parkinson's disease, breast cancer and obesity. The top predictions were good and consistent with the findings in the literature, which further illustrates the effectiveness of our method. Copyright © 2015 Elsevier Inc. All rights reserved.
de Heer, K; Kok, M G M; Fens, N; Weersink, E J M; Zwinderman, A H; van der Schee, M P C; Visser, C E; van Oers, M H J; Sterk, P J
2016-03-01
Currently, there is no noninvasive test that can reliably diagnose early invasive pulmonary aspergillosis (IA). An electronic nose (eNose) can discriminate various lung diseases through an analysis of exhaled volatile organic compounds. We recently published a proof-of-principle study showing that patients with prolonged chemotherapy-induced neutropenia and IA have a distinct exhaled breath profile (or breathprint) that can be discriminated with an eNose. An eNose is cheap and noninvasive, and it yields results within minutes. We determined whether Aspergillus fumigatus colonization may also be detected with an eNose in cystic fibrosis (CF) patients. Exhaled breath samples of 27 CF patients were analyzed with a Cyranose 320. Culture of sputum samples defined the A. fumigatus colonization status. eNose data were classified using canonical discriminant analysis after principal component reduction. Our primary outcome was cross-validated accuracy, defined as the percentage of correctly classified subjects using the leave-one-out method. The P value was calculated by the generation of 100,000 random alternative classifications. Nine of the 27 subjects were colonized by A. fumigatus. In total, 3 subjects were misclassified, resulting in a cross-validated accuracy of the Cyranose detecting IA of 89% (P = 0.004; sensitivity, 78%; specificity, 94%). Receiver operating characteristic (ROC) curve analysis showed an area under the curve (AUC) of 0.89. The results indicate that A. fumigatus colonization leads to a distinctive breathprint in CF patients. The present proof-of-concept data merit external validation and monitoring studies. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
NDRC: A Disease-Causing Genes Prioritized Method Based on Network Diffusion and Rank Concordance.
Fang, Minghong; Hu, Xiaohua; Wang, Yan; Zhao, Junmin; Shen, Xianjun; He, Tingting
2015-07-01
Disease-causing gene prioritization is very important for understanding disease mechanisms and for biomedical applications such as drug design. Previous studies have shown that promising candidate genes are mostly ranked according to their relatedness to known disease genes or closely related disease genes. Therefore, a dangling gene (isolated gene) with no edges in the network cannot be effectively prioritized. These approaches tend to prioritize genes that are highly connected in the PPI network while performing poorly when applied to loosely connected disease genes. To address these problems, we propose a new disease-causing gene prioritization method based on network diffusion and rank concordance (NDRC). The method is evaluated by leave-one-out cross-validation on 1931 diseases in which at least one gene is known to be involved, and it is able to rank the true causal gene first in 849 of all 2542 cases. The experimental results suggest that NDRC significantly outperforms other existing methods such as RWR, VAVIEN, DADA and PRINCE in identifying loosely connected disease genes and successfully puts dangling genes forward as potential candidate disease genes. Furthermore, we apply the NDRC method to study three representative diseases: Meckel syndrome 1, Protein C deficiency and Peroxisome biogenesis disorder 1A (Zellweger). Our study has also found that certain complex disease-causing genes can be divided into several modules that are closely associated with different disease phenotypes.
Chang, Yongjun; Paul, Anjan Kumar; Kim, Namkug; Baek, Jung Hwan; Choi, Young Jun; Ha, Eun Ju; Lee, Kang Dae; Lee, Hyoung Shin; Shin, DaeSeock; Kim, Nakyoung
2016-01-01
To develop a semiautomated computer-aided diagnosis (CAD) system for thyroid cancer using two-dimensional ultrasound images that can be used to yield a second opinion in the clinic to differentiate malignant and benign lesions. A total of 118 ultrasound images that included axial and longitudinal images from patients with biopsy-confirmed malignant (n = 30) and benign (n = 29) nodules were collected. Thyroid CAD software was developed to extract quantitative features from these images based on thyroid nodule segmentation in which adaptive diffusion flow for active contours was used. Various features, including histogram, intensity differences, elliptical fit, gray-level co-occurrence matrices, and gray-level run-length matrices, were evaluated for each region imaged. Based on these imaging features, a support vector machine (SVM) classifier was used to differentiate benign and malignant nodules. Leave-one-out cross-validation with sequential forward feature selection was performed to evaluate the overall accuracy of this method. Additionally, analyses with contingency tables and receiver operating characteristic (ROC) curves were performed to compare the performance of CAD with visual inspection by expert radiologists based on established gold standards. Most univariate features for this proposed CAD system attained accuracies that ranged from 78.0% to 83.1%. When optimal SVM parameters that were established using a grid search method with features that radiologists use for visual inspection were employed, the authors could attain rates of accuracy that ranged from 72.9% to 84.7%. Using leave-one-out cross-validation results in a multivariate analysis of various features, the highest accuracy achieved using the proposed CAD system was 98.3%, whereas visual inspection by radiologists reached 94.9% accuracy. To obtain the highest accuracies, "axial ratio" and "max probability" in axial images were most frequently included in the optimal feature sets for the authors' proposed CAD system, while "shape" and "calcification" in longitudinal images were most frequently included in the optimal feature sets for visual inspection by radiologists. The computed areas under curves in the ROC analysis were 0.986 and 0.979 for the proposed CAD system and visual inspection by radiologists, respectively; no significant difference was detected between these groups. The use of thyroid CAD to differentiate malignant from benign lesions shows accuracy similar to that obtained via visual inspection by radiologists. Thyroid CAD might be considered a viable way to generate a second opinion for radiologists in clinical practice.
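The evaluation loop described, leave-one-out cross-validation wrapped around sequential forward feature selection, can be sketched as below. A nearest-centroid classifier stands in for the paper's SVM, and all names are illustrative.

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy on the selected feature indices, using a
    nearest-centroid classifier as a simple stand-in for an SVM."""
    n = len(X)
    correct = 0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        # Per-class centroids computed without the held-out sample.
        cents = {}
        for label in set(y):
            rows = [X[j] for j in train if y[j] == label]
            cents[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
        # Classify the held-out sample by nearest centroid.
        dists = {lab: sum((X[i][f] - c[k]) ** 2 for k, f in enumerate(feats))
                 for lab, c in cents.items()}
        if min(dists, key=dists.get) == y[i]:
            correct += 1
    return correct / n

def forward_select(X, y, n_feats):
    """Sequential forward selection scored by LOO accuracy."""
    chosen, remaining = [], list(range(len(X[0])))
    while remaining and len(chosen) < n_feats:
        best = max(remaining, key=lambda f: loo_accuracy(X, y, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With one informative feature and one noise feature, the selector picks the informative one first, mirroring how "axial ratio" and "max probability" dominated the optimal feature sets above.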
Nikam, P. H.; Kareparamban, J. A.; Jadhav, A. P.; Kadam, V. J.
2013-01-01
Ursolic acid, a pentacyclic triterpenoid, possesses a wide range of pharmacological activities. It shows hypoglycemic, antiandrogenic, antibacterial, anti-inflammatory, antioxidant, diuretic and cynogenic activity. It is commonly present in plants, especially in the coating of leaves and fruits, such as apple fruit, vinca leaves, rosemary leaves, and eucalyptus leaves. A simple high-performance thin-layer chromatographic method has been developed for the quantification of ursolic acid from apple peel (Malus domestica). The samples were dissolved in methanol, and linear ascending development was carried out in a twin-trough glass chamber. The mobile phase was toluene:ethyl acetate:glacial acetic acid (70:30:2). The linear regression analysis data for the calibration plots showed a good linear relationship, with r2=0.9982, in the concentration range 0.2-7 μg/spot with respect to peak area. The method was validated for linearity, accuracy, precision, and robustness according to the ICH guidelines. Statistical analysis of the data showed that the method is reproducible and selective for the estimation of ursolic acid. PMID:24302805
Wang, Li; Shi, Feng; Li, Gang; Lin, Weili; Gilmore, John H.; Shen, Dinggang
2014-01-01
Segmentation of infant brain MR images is challenging due to insufficient image quality, severe partial volume effect, and ongoing maturation and myelination process. During the first year of life, the signal contrast between white matter (WM) and gray matter (GM) in MR images undergoes inverse changes. In particular, the inversion of WM/GM signal contrast appears around 6–8 months of age, where brain tissues appear isointense and hence exhibit extremely low tissue contrast, posing significant challenges for automated segmentation. In this paper, we propose a novel segmentation method to address the above-mentioned challenge based on the sparse representation of the complementary tissue distribution information from T1, T2 and diffusion-weighted images. Specifically, we first derive an initial segmentation from a library of aligned multi-modality images with ground-truth segmentations by using sparse representation in a patch-based fashion. The segmentation is further refined by the integration of the geometrical constraint information. The proposed method was evaluated on 22 6-month-old training subjects using leave-one-out cross-validation, as well as 10 additional infant testing subjects, showing superior results in comparison to other state-of-the-art methods. PMID:24505729
Driver fatigue detection through multiple entropy fusion analysis in an EEG-based system
Min, Jianliang; Wang, Ping
2017-01-01
Driver fatigue is an important contributor to road accidents, and fatigue detection has major implications for transportation safety. The aim of this research is to analyze the multiple entropy fusion method and evaluate several channel regions to effectively detect a driver's fatigue state based on electroencephalogram (EEG) records. First, we fused multiple entropies, i.e., spectral entropy, approximate entropy, sample entropy and fuzzy entropy, as features compared with autoregressive (AR) modeling by four classifiers. Second, we captured four significant channel regions according to weight-based electrodes via a simplified channel selection method. Finally, the evaluation model for detecting driver fatigue was established with four classifiers based on the EEG data from four channel regions. Twelve healthy subjects performed continuous simulated driving for 1–2 hours with EEG monitoring on a static simulator. The leave-one-out cross-validation approach obtained an accuracy of 98.3%, a sensitivity of 98.3% and a specificity of 98.2%. The experimental results verified the effectiveness of the proposed method, indicating that the multiple entropy fusion features are significant factors for inferring the fatigue state of a driver. PMID:29220351
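One of the fused features, sample entropy, can be computed for a single channel as follows. This is a minimal single-channel sketch; m = 2 and the tolerance r are conventional defaults, not necessarily the paper's exact settings.

```python
import math

def sample_entropy(signal, m=2, r=0.2):
    """Sample entropy of a 1-D signal: ln(B/A), where B counts pairs of
    m-length templates within tolerance r (Chebyshev distance) and A
    counts the same for length m+1. Lower values mean more regularity."""
    def count_matches(k):
        templates = [tuple(signal[i:i + k]) for i in range(len(signal) - k + 1)]
        hits = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # excludes self-matches
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    hits += 1
        return hits
    b, a = count_matches(m), count_matches(m + 1)
    return math.log(b / a) if a > 0 and b > 0 else float("inf")
```

A perfectly periodic signal yields a value near zero, which is why entropy features discriminate the more regular EEG of a fatigued driver from alert-state activity.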
Activity classification using realistic data from wearable sensors.
Pärkkä, Juha; Ermes, Miikka; Korpipää, Panu; Mäntyjärvi, Jani; Peltola, Johannes; Korhonen, Ilkka
2006-01-01
Automatic classification of everyday activities can be used for promotion of health-enhancing physical activities and a healthier lifestyle. In this paper, methods used for classification of everyday activities like walking, running, and cycling are described. The aim of the study was to find out how to recognize activities, which sensors are useful and what kind of signal processing and classification is required. A large and realistic data library of sensor data was collected. Sixteen test persons took part in the data collection, resulting in approximately 31 h of annotated, 35-channel data recorded in an everyday environment. The test persons carried a set of wearable sensors while performing several activities during the 2-h measurement session. Classification results of three classifiers are shown: custom decision tree, automatically generated decision tree, and artificial neural network. The classification accuracies using leave-one-subject-out cross validation range from 58 to 97% for custom decision tree classifier, from 56 to 97% for automatically generated decision tree, and from 22 to 96% for artificial neural network. Total classification accuracy is 82% for custom decision tree classifier, 86% for automatically generated decision tree, and 82% for artificial neural network.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rueegsegger, Michael B.; Bach Cuadra, Meritxell; Pica, Alessia
Purpose: Ocular anatomy and radiation-associated toxicities provide unique challenges for external beam radiation therapy. For treatment planning, precise modeling of organs at risk and tumor volume is crucial. Development of a precise eye model and automatic adaptation of this model to patients' anatomy remain problematic because of organ shape variability. This work introduces the application of a 3-dimensional (3D) statistical shape model as a novel method for precise eye modeling for external beam radiation therapy of intraocular tumors. Methods and Materials: Manual and automatic segmentations were compared for 17 patients, based on head computed tomography (CT) volume scans. A 3D statistical shape model of the cornea, lens, and sclera as well as of the optic disc position was developed. Furthermore, an active shape model was built to enable automatic fitting of the eye model to CT slice stacks. Cross-validation was performed based on leave-one-out tests for all training shapes by measuring Dice coefficients and mean segmentation errors between automatic segmentation and manual segmentation by an expert. Results: Cross-validation revealed a Dice similarity of 95% ± 2% for the sclera and cornea and 91% ± 2% for the lens. Overall, mean segmentation error was found to be 0.3 ± 0.1 mm. Average segmentation time was 14 ± 2 s on a standard personal computer. Conclusions: Our results show that the solution presented outperforms state-of-the-art methods in terms of accuracy, reliability, and robustness. Moreover, the eye model shape as well as its variability is learned from a training set rather than by making shape assumptions (eg, as with the spherical or elliptical model). Therefore, the model appears to be capable of modeling nonspherically and nonelliptically shaped eyes.
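The Dice similarity used in the leave-one-out tests above is straightforward to compute from two segmentation masks; a minimal sketch, with masks represented as sets of voxel coordinates:

```python
def dice(a, b):
    """Dice similarity between two voxel sets, e.g. an automatic
    segmentation mask versus a manual expert mask: 2|A∩B| / (|A|+|B|)."""
    inter = len(a & b)
    return 2 * inter / (len(a) + len(b)) if a or b else 1.0
```

Identical masks score 1.0, disjoint masks 0.0; the 95% sclera/cornea figure above corresponds to dice ≈ 0.95.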
Madhavan, Dinesh B; Baldock, Jeff A; Read, Zoe J; Murphy, Simon C; Cunningham, Shaun C; Perring, Michael P; Herrmann, Tim; Lewis, Tom; Cavagnaro, Timothy R; England, Jacqueline R; Paul, Keryn I; Weston, Christopher J; Baker, Thomas G
2017-05-15
Reforestation of agricultural lands with mixed-species environmental plantings can effectively sequester C. While accurate and efficient methods for predicting soil organic C content and composition have recently been developed for soils under agricultural land uses, such methods under forested land uses are currently lacking. This study aimed to develop a method using infrared spectroscopy for accurately predicting total organic C (TOC) and its fractions (particulate, POC; humus, HOC; and resistant, ROC organic C) in soils under environmental plantings. Soils were collected from 117 paired agricultural-reforestation sites across Australia. TOC fractions were determined in a subset of 38 reforested soils using physical fractionation by automated wet-sieving and 13C nuclear magnetic resonance (NMR) spectroscopy. Mid- and near-infrared spectra (MNIRS, 6000-450 cm⁻¹) were acquired from finely-ground soils from environmental plantings and agricultural land. Satisfactory prediction models based on MNIRS and partial least squares regression (PLSR) were developed for TOC and its fractions. Leave-one-out cross-validations of MNIRS-PLSR models indicated accurate predictions (R² > 0.90, negligible bias, ratio of performance to deviation > 3) and fraction-specific functional group contributions to beta coefficients in the models. TOC and its fractions were predicted using the cross-validated models and soil spectra for 3109 reforested and agricultural soils. The reliability of predictions determined using k-nearest neighbour score distance indicated that >80% of predictions were within the satisfactory inlier limit. The study demonstrated the utility of infrared spectroscopy (MNIRS-PLSR) to rapidly and economically determine TOC and its fractions and thereby accurately describe the effects of land use change such as reforestation on agricultural soils. Copyright © 2017 Elsevier Ltd. All rights reserved.
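The ratio of performance to deviation (RPD) criterion cited above (values above 3 taken as accurate) is simply the standard deviation of the reference values divided by the prediction RMSE; a minimal sketch:

```python
import math

def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of the reference values
    over the RMSE of prediction (e.g. cross-validated predictions)."""
    n = len(reference)
    mean = sum(reference) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in reference) / (n - 1))
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)
    return sd / rmse
```

Intuitively, RPD asks how much smaller the model's error is than the spread of the data; a model no better than predicting the mean scores near 1.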
Kumar, Ravindra; Kumari, Bandana; Srivastava, Abhishikha; Kumar, Manish
2014-10-29
Nuclear receptor proteins (NRP) are transcription factors that regulate many vital cellular processes in animal cells. NRPs form a super-family of phylogenetically related proteins and are divided into different sub-families on the basis of ligand characteristics and their functions. In the post-genomic era, when new proteins are being added to the database in high-throughput mode, it becomes imperative to identify new NRPs using information from the amino acid sequence alone. In this study we report an SVM-based two-level prediction system, NRfamPred, using the dipeptide composition of proteins as input. At the 1st level, NRfamPred screens whether the query protein is an NRP or non-NRP; if the query protein belongs to the NRP class, prediction moves to the 2nd level and predicts the sub-family. Using leave-one-out cross-validation, we achieved an overall accuracy of 97.88% at the 1st level and 98.11% at the 2nd level with dipeptide composition. Benchmarking on independent datasets showed that NRfamPred had accuracy comparable to other existing methods developed on the same dataset. Our method predicted the existence of 76 NRPs in the human proteome, of which 14 are novel NRPs. NRfamPred also predicted the sub-families of these 14 NRPs.
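The dipeptide-composition input used by such predictors is a 400-dimensional vector of dipeptide frequencies over the 20 standard amino acids; a minimal sketch (the function name is illustrative):

```python
def dipeptide_composition(seq, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Fraction of each of the 400 possible dipeptides in a protein
    sequence, i.e. the fixed-length feature vector fed to a classifier."""
    pairs = [a + b for a in alphabet for b in alphabet]
    counts = {p: 0 for p in pairs}
    for i in range(len(seq) - 1):  # sliding window of width 2
        counts[seq[i:i + 2]] += 1
    total = max(len(seq) - 1, 1)
    return [counts[p] / total for p in pairs]
```

The fixed 400-entry layout is what lets proteins of arbitrary length be compared by a single SVM.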
Mapping the Diagnosis Axis of an Interface Terminology to the NANDA International Taxonomy
Juvé Udina, Maria-Eulàlia; Gonzalez Samartino, Maribel; Matud Calvo, Cristina
2012-01-01
Background. Nursing terminologies are designed to support nursing practice but, as with any other clinical tool, they should be evaluated. Cross-mapping is a formal method for examining the validity of the existing controlled vocabularies. Objectives. The study aims to assess the inclusiveness and expressiveness of the nursing diagnosis axis of a newly implemented interface terminology by cross-mapping with the NANDA-I taxonomy. Design/Methods. The study applied a descriptive design, using a cross-sectional, bidirectional mapping strategy. The sample included 728 concepts from both vocabularies. Concept cross-mapping was carried out to identify one-to-one, negative, and hierarchical connections. The analysis was conducted using descriptive statistics. Results. Agreement of the raters' mapping achieved 97%. More than 60% of the nursing diagnosis concepts in the NANDA-I taxonomy were mapped to concepts in the diagnosis axis of the new interface terminology; 71.1% were reversely mapped. Conclusions. Main results for outcome measures suggest that the diagnosis axis of this interface terminology meets the validity criterion of cross-mapping when mapped from and to the NANDA-I taxonomy. PMID:22830046
Design and Implementation of a Smart Home System Using Multisensor Data Fusion Technology.
Hsu, Yu-Liang; Chou, Po-Huan; Chang, Hsing-Cheng; Lin, Shyan-Lung; Yang, Shih-Chin; Su, Heng-Yi; Chang, Chih-Chien; Cheng, Yuan-Sheng; Kuo, Yu-Chen
2017-07-15
This paper aims to develop a multisensor data fusion technology-based smart home system by integrating wearable intelligent technology, artificial intelligence, and sensor fusion technology. We have developed the following three systems to create an intelligent smart home environment: (1) a wearable motion sensing device to be placed on residents' wrists and its corresponding 3D gesture recognition algorithm to implement a convenient automated household appliance control system; (2) a wearable motion sensing device mounted on a resident's feet and its indoor positioning algorithm to realize an effective indoor pedestrian navigation system for smart energy management; (3) a multisensor circuit module and an intelligent fire detection and alarm algorithm to realize a home safety and fire detection system. In addition, an intelligent monitoring interface is developed to provide real-time information about the smart home system, such as environmental temperatures, CO concentrations, communicative environmental alarms, household appliance status, human motion signals, and the results of gesture recognition and indoor positioning. Furthermore, an experimental testbed for validating the effectiveness and feasibility of the smart home system was built and verified experimentally. The results showed that the 3D gesture recognition algorithm could achieve recognition rates for automated household appliance control of 92.0%, 94.8%, 95.3%, and 87.7% by the 2-fold cross-validation, 5-fold cross-validation, 10-fold cross-validation, and leave-one-subject-out cross-validation strategies. For indoor positioning and smart energy management, the distance accuracy and positioning accuracy were around 0.22% and 3.36% of the total traveled distance in the indoor environment. For home safety and fire detection, the classification rate achieved 98.81% accuracy for determining the conditions of the indoor living environment.
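The leave-one-subject-out strategy in the results above differs from k-fold in that all samples from one resident are held out together, so no subject appears in both training and test sets; a minimal split generator (names illustrative):

```python
def leave_one_subject_out(subject_ids):
    """Yield (train_idx, test_idx) pairs where each fold holds out every
    sample belonging to one subject."""
    subjects = sorted(set(subject_ids))
    for s in subjects:
        test = [i for i, sid in enumerate(subject_ids) if sid == s]
        train = [i for i, sid in enumerate(subject_ids) if sid != s]
        yield train, test
```

The gap between the 10-fold rate (95.3%) and the leave-one-subject-out rate (87.7%) reported above is typical: subject-wise splits prevent a model from exploiting person-specific motion patterns shared across folds.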
Structure- and ligand-based structure-activity relationships for a series of inhibitors of aldolase.
Ferreira, Leonardo G; Andricopulo, Adriano D
2012-12-01
Aldolase has emerged as a promising molecular target for the treatment of human African trypanosomiasis. In recent years, owing to the increasing number of patients infected with Trypanosoma brucei, there has been an urgent need for new drugs to treat this neglected disease. In the present study, two-dimensional fragment-based quantitative structure-activity relationship (QSAR) models were generated for a series of inhibitors of aldolase. Through the application of leave-one-out and leave-many-out cross-validation procedures, significant correlation coefficients were obtained (r²=0.98 and q²=0.77) as an indication of the internal and external statistical consistency of the models. The best model was employed to predict pKi values for a series of test set compounds, and the predicted values were in good agreement with the experimental results, showing the predictive power of the model for untested compounds. Moreover, structure-based molecular modeling studies were performed to investigate the binding mode of the inhibitors in the active site of the parasitic target enzyme. The structural and QSAR results provided useful molecular information for the design of new aldolase inhibitors within this structural class.
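The cross-validated q² reported above is computed from leave-one-out predictions as 1 − PRESS/SS; a minimal sketch:

```python
def q2_loo(y_true, y_pred_loo):
    """Cross-validated q²: 1 - PRESS/SS, where each entry of y_pred_loo
    was predicted with that compound left out of model training."""
    mean = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred_loo))
    ss = sum((t - mean) ** 2 for t in y_true)
    return 1 - press / ss
```

Unlike r², which is fitted in-sample, q² penalizes a model that cannot reproduce held-out activities, which is why q² = 0.77 is reported alongside r² = 0.98.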
Parastar, Hadi; Mostafapour, Sara; Azimi, Gholamhasan
2016-01-01
Comprehensive two-dimensional gas chromatography and flame ionization detection combined with unfolded-partial least squares is proposed as a simple, fast and reliable method to assess the quality of gasoline and to detect its potential adulterants. The data for the calibration set are first baseline corrected using a two-dimensional asymmetric least squares algorithm. The number of significant partial least squares components used to build the model was determined by the minimum root-mean-square error of leave-one-out cross-validation, which gave four components. In this regard, blends of gasoline with kerosene, white spirit and paint thinner as frequently used adulterants were used to make calibration samples. Appropriate statistical parameters of regression coefficient of 0.996-0.998, root-mean-square error of prediction of 0.005-0.010 and relative error of prediction of 1.54-3.82% for the calibration set show the reliability of the developed method. In addition, the developed method is externally validated with three samples in the validation set (with a relative error of prediction below 10.0%). Finally, to test the applicability of the proposed strategy for the analysis of real samples, five real gasoline samples collected from gas stations were used for this purpose, and the gasoline proportions were in the range of 70-85%. Also, the relative standard deviations were below 8.5% for different samples in the prediction set. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Wang, Kun; Jiang, Tianzi; Liang, Meng; Wang, Liang; Tian, Lixia; Zhang, Xinqing; Li, Kuncheng; Liu, Zhening
2006-01-01
In this work, we proposed a discriminative model of Alzheimer's disease (AD) on the basis of multivariate pattern classification and functional magnetic resonance imaging (fMRI). This model used the correlation/anti-correlation coefficients of two intrinsically anti-correlated networks in resting brains, which have been suggested by two recent studies, as the feature of classification. Pseudo-Fisher Linear Discriminative Analysis (pFLDA) was then performed on the feature space and a linear classifier was generated. Using leave-one-out (LOO) cross validation, our results showed a correct classification rate of 83%. We also compared the proposed model with another one based on the whole brain functional connectivity. Our proposed model outperformed the other one significantly, and this implied that the two intrinsically anti-correlated networks may be a more susceptible part of the whole brain network in the early stage of AD.
Empirical performance of interpolation techniques in risk-neutral density (RND) estimation
NASA Astrophysics Data System (ADS)
Bahaludin, H.; Abdullah, M. H.
2017-03-01
The objective of this study is to evaluate the empirical performance of interpolation techniques in risk-neutral density (RND) estimation. First, the empirical performance is evaluated using statistical analysis based on the implied mean and the implied variance of the RND. Second, interpolation performance is measured by pricing error. We propose using the leave-one-out cross-validation (LOOCV) pricing error for interpolation selection. The statistical analyses indicate that there are statistical differences between the interpolation techniques: second-order polynomial, fourth-order polynomial and smoothing spline. The LOOCV pricing error results show that fourth-order polynomial interpolation provides the best fit to option prices, as it has the lowest error.
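The proposed LOOCV pricing error removes each option quote, re-prices it from the remaining quotes with the candidate interpolant, and aggregates the errors. In this sketch plain linear interpolation stands in for the paper's polynomial and spline interpolants, and all names are illustrative.

```python
import math

def loocv_interp_error(strikes, prices):
    """Leave-one-out pricing error for linear interpolation across strikes:
    each interior quote is removed, re-priced from its neighbours, and the
    RMSE of the reconstruction is returned."""
    errs = []
    for i in range(1, len(strikes) - 1):  # endpoints cannot be interpolated
        x0, x1 = strikes[i - 1], strikes[i + 1]
        w = (strikes[i] - x0) / (x1 - x0)
        fitted = (1 - w) * prices[i - 1] + w * prices[i + 1]
        errs.append((fitted - prices[i]) ** 2)
    return math.sqrt(sum(errs) / len(errs))
```

Competing interpolants would each be scored this way, and the one with the lowest LOOCV pricing error selected, which is the selection rule the study proposes.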
Majumdar, Subhabrata; Basak, Subhash C
2018-04-26
Proper validation is an important aspect of QSAR modelling. External validation is one of the widely used validation methods in QSAR where the model is built on a subset of the data and validated on the rest of the samples. However, its effectiveness for datasets with a small number of samples but large number of predictors remains suspect. Calculating hundreds or thousands of molecular descriptors using currently available software has become the norm in QSAR research, owing to computational advances in the past few decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical chemometric dataset today has high value of p but small n (i.e. n < p). Motivated by the evidence of inadequacies of external validation in estimating the true predictive capability of a statistical model in recent literature, this paper performs an extensive and comparative study of this method with several other validation techniques. We compared four validation methods: leave-one-out, K-fold, external and multi-split validation, using statistical models built using the LASSO regression, which simultaneously performs variable selection and modelling. We used 300 simulated datasets and one real dataset of 95 congeneric amine mutagens for this evaluation. External validation metrics have high variation among different random splits of the data, hence are not recommended for predictive QSAR models. LOO has the overall best performance among all validation methods applied in our scenario. Results from external validation are too unstable for the datasets we analyzed. Based on our findings, we recommend using the LOO procedure for validating QSAR predictive models built on high-dimensional small-sample data. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Sun, Jun; Zhou, Xin; Wu, Xiaohong; Zhang, Xiaodong; Li, Qinglin
2016-02-26
Fast identification of moisture content in tobacco plant leaves plays a key role in the tobacco cultivation industry and benefits the management of tobacco plants on the farm. To identify the moisture content of tobacco plant leaves in a fast and nondestructive way, a method involving Mahalanobis distance coupled with Monte Carlo cross-validation (MD-MCCV) was proposed in this study to eliminate outlier samples. The hyperspectral data of 200 tobacco plant leaf samples at 20 moisture gradients were obtained using a FieldSpc(®) 3 spectrometer. Savitzky-Golay smoothing (SG), roughness penalty smoothing (RPS), kernel smoothing (KS) and median smoothing (MS) were used to preprocess the raw spectra. In addition, Mahalanobis distance (MD), Monte Carlo cross-validation (MCCV) and Mahalanobis distance coupled with Monte Carlo cross-validation (MD-MCCV) were applied to detect outlier samples in the raw spectrum and the four smoothed spectra. The successive projections algorithm (SPA) was used to extract the most influential wavelengths. Multiple linear regression (MLR) was applied to build prediction models based on the preprocessed spectral features at the characteristic wavelengths. The results showed that the four best prediction models were MD-MCCV-SG (Rp(2) = 0.8401 and RMSEP = 0.1355), MD-MCCV-RPS (Rp(2) = 0.8030 and RMSEP = 0.1274), MD-MCCV-KS (Rp(2) = 0.8117 and RMSEP = 0.1433), and MD-MCCV-MS (Rp(2) = 0.9132 and RMSEP = 0.1162). The MD-MCCV algorithm performed best among MD alone, MCCV alone and no sample pretreatment in eliminating outlier samples across the 20 moisture gradients of tobacco plant leaves, and MD-MCCV can therefore be used to eliminate outlier samples in spectral preprocessing. Copyright © 2016 Elsevier Inc. All rights reserved.
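The Mahalanobis-distance screening at the core of MD-MCCV can be sketched as follows, assuming each spectrum has been reduced to two summary features so the 2x2 covariance matrix can be inverted by hand. The full MD-MCCV method additionally repeats random subsampling (the MCCV part), which is omitted here; all data are synthetic.

```python
import random

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean,
    with the 2x2 covariance matrix inverted explicitly."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det   # inverse covariance
    d2 = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2.append(ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy)
    return d2

random.seed(1)
# 50 well-behaved samples summarised by two spectral features, plus one gross outlier
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)] + [(8.0, 8.0)]
d2 = mahalanobis_sq_2d(data)
outlier_index = max(range(len(d2)), key=d2.__getitem__)   # flag the largest distance
```

In practice a distance threshold (rather than simply the maximum) would mark samples for removal before calibration.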
Estévez Campo, Enrique José; López-Lázaro, Sandra; López-Morago Rodríguez, Claudia; Alemán Aguilera, Inmaculada; Botella López, Miguel Cecilio
2018-05-01
Sex determination of unknown individuals is one of the primary goals of Physical and Forensic Anthropology. The adult skeleton can be sexed using both morphological and metric traits on a large number of bones. The human pelvis is often used as an important element of adult sex determination. However, studies of the pelvic bone in subadult individuals face several limitations due to the absence of sexually dimorphic characteristics. In this study, we analyse the sexual dimorphism of the immature pubis and ischium, with respect to their shape (Procrustes residuals) and size (centroid size), using an identified sample of subadult individuals comprising 58 individuals for the pubis and 83 for the ischium, aged between birth and 1 year of life, from the Granada osteological collection of identified infants (Granada, Spain). Geometric morphometric methods and discriminant analysis were applied. The results of intra- and inter-observer error showed good and excellent agreement in the location of coordinates of landmarks and semilandmarks, respectively. Principal component analysis performed on shape and size variables showed superposition of the two sexes, suggesting a low degree of sexual dimorphism. Canonical variate analysis did not show significant differences between male and female shapes. As a consequence, discriminant analysis with leave-one-out cross validation provided low classification accuracy. The results suggested a low degree of sexual dimorphism, supported by the absence of significant sexual dimorphism in the subadult sample and poor cross-validated classification accuracy. The inclusion of centroid size as a discriminant variable does not significantly improve the results of the analysis. The similarities found between the sexes prevent consideration of pubic and ischial morphology as a sex estimator in early stages of development.
The authors suggest extending this study by analysing the different trajectories of shape and size in later ontogeny between males and females. Copyright © 2018 Elsevier B.V. All rights reserved.
HAMDA: Hybrid Approach for MiRNA-Disease Association prediction.
Chen, Xing; Niu, Ya-Wei; Wang, Guang-Hui; Yan, Gui-Ying
2017-12-01
For decades, experimental studies have collectively indicated that microRNAs (miRNAs) play indispensable roles in many critical biological processes and thus in the pathogenesis of complex human diseases. Because the resources and time required by traditional biological experiments are expensive, increasing attention has been paid to the development of effective and feasible computational methods for predicting potential associations between diseases and miRNAs. In this study, we developed a computational model, Hybrid Approach for MiRNA-Disease Association prediction (HAMDA), which employs a hybrid graph-based recommendation algorithm to reveal novel miRNA-disease associations by integrating experimentally verified miRNA-disease associations, disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity. HAMDA takes into account not only network structure and information propagation but also node attributes, resulting in satisfactory prediction performance. Specifically, HAMDA obtained AUCs of 0.9035 and 0.8395 in the frameworks of global and local leave-one-out cross validation, respectively. Meanwhile, HAMDA also achieved good performance, with an AUC of 0.8965 ± 0.0012, in 5-fold cross validation. Additionally, we conducted case studies of three important human cancers to evaluate the performance of HAMDA. As a result, 90% (lymphoma), 86% (prostate cancer) and 92% (kidney cancer) of the top 50 predicted miRNAs were confirmed by recent experimental literature, demonstrating the reliable prediction ability of HAMDA. Copyright © 2017 Elsevier Inc. All rights reserved.
New public QSAR model for carcinogenicity
2010-01-01
Background One of the main goals of the new chemical regulation REACH (Registration, Evaluation and Authorization of Chemicals) is to fill the gaps in data on chemical properties affecting human health. (Q)SAR models are accepted as a suitable source of information. The EU-funded CAESAR project aimed to develop models for the prediction of 5 endpoints for regulatory purposes. Carcinogenicity is one of the endpoints under consideration. Results Models for the prediction of carcinogenic potency according to the specific requirements of chemical regulation were developed. A dataset of 805 non-congeneric chemicals extracted from the Carcinogenic Potency Database (CPDBAS) was used. The Counter Propagation Artificial Neural Network (CP ANN) algorithm was implemented. In this article two alternative models for predicting carcinogenicity are described. The first model employed eight MDL descriptors (model A) and the second twelve Dragon descriptors (model B). CAESAR's models have been assessed according to the OECD principles for the validation of QSAR. For model validity we used a wide series of statistical checks. Models A and B yielded accuracies on the training set (644 compounds) of 91% and 89%, respectively; the accuracies on the test set (161 compounds) were 73% and 69%, while the specificities were 69% and 61%, respectively. Sensitivity in both cases was 75%. The accuracy of the leave-20%-out cross validation on the training set was 66% for model A and 62% for model B. To verify whether the models perform correctly on new compounds, external validation was carried out on an external test set of 738 compounds. We obtained external-validation accuracies of 61.4% and 60.0%, sensitivities of 64.0% and 61.8% and specificities of 58.9% and 58.4%, respectively, for models A and B.
Conclusion Carcinogenicity is a particularly important endpoint, and it is expected that QSAR models will not replace human experts' opinions and conventional methods. However, we believe that a combination of several methods will provide useful support to the overall evaluation of carcinogenicity. In the present paper, models for the classification of carcinogenic compounds using MDL and Dragon descriptors were developed. The models could be used to set priorities among chemicals for further testing. The models at the CAESAR site were implemented in Java and are publicly accessible. PMID:20678182
Garrard, Lili; Price, Larry R.; Bott, Marjorie J.; Gajewski, Byron J.
2016-01-01
Item response theory (IRT) models provide an appropriate alternative to the classical ordinal confirmatory factor analysis (CFA) during the development of patient-reported outcome measures (PROMs). Current literature has identified the assessment of IRT model fit as both challenging and underdeveloped (Sinharay & Johnson, 2003; Sinharay, Johnson, & Stern, 2006). This study evaluates the performance of Ordinal Bayesian Instrument Development (OBID), a Bayesian IRT model with a probit link function approach, through applications in two breast cancer-related instrument development studies. The primary focus is to investigate an appropriate method for comparing Bayesian IRT models in PROMs development. An exact Bayesian leave-one-out cross-validation (LOO-CV) approach (Vehtari & Lampinen, 2002) is implemented to assess prior selection for the item discrimination parameter in the IRT model and subject content experts’ bias (in a statistical sense and not to be confused with psychometric bias as in differential item functioning) toward the estimation of item-to-domain correlations. Results support the utilization of content subject experts’ information in establishing evidence for construct validity when sample size is small. However, the incorporation of subject experts’ content information in the OBID approach can be sensitive to the level of expertise of the recruited experts. More stringent efforts need to be invested in the appropriate selection of subject experts to efficiently use the OBID approach and reduce potential bias during PROMs development. PMID:27667878
Garrard, Lili; Price, Larry R; Bott, Marjorie J; Gajewski, Byron J
2016-10-01
Item response theory (IRT) models provide an appropriate alternative to the classical ordinal confirmatory factor analysis (CFA) during the development of patient-reported outcome measures (PROMs). Current literature has identified the assessment of IRT model fit as both challenging and underdeveloped (Sinharay & Johnson, 2003; Sinharay, Johnson, & Stern, 2006). This study evaluates the performance of Ordinal Bayesian Instrument Development (OBID), a Bayesian IRT model with a probit link function approach, through applications in two breast cancer-related instrument development studies. The primary focus is to investigate an appropriate method for comparing Bayesian IRT models in PROMs development. An exact Bayesian leave-one-out cross-validation (LOO-CV) approach (Vehtari & Lampinen, 2002) is implemented to assess prior selection for the item discrimination parameter in the IRT model and subject content experts' bias (in a statistical sense and not to be confused with psychometric bias as in differential item functioning) toward the estimation of item-to-domain correlations. Results support the utilization of content subject experts' information in establishing evidence for construct validity when sample size is small. However, the incorporation of subject experts' content information in the OBID approach can be sensitive to the level of expertise of the recruited experts. More stringent efforts need to be invested in the appropriate selection of subject experts to efficiently use the OBID approach and reduce potential bias during PROMs development.
Zemp, Roland; Tanadini, Matteo; Plüss, Stefan; Schnüriger, Karin; Singh, Navrag B; Taylor, William R; Lorenzetti, Silvio
2016-01-01
Occupational musculoskeletal disorders, particularly chronic low back pain (LBP), are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user's sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest). Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples). Sixteen force sensor values and the backrest angle were used as the explanatory variables (features) for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.
Williams, Ammon; Bryce, Keith; Phongikaroon, Supathorn
2017-10-01
Pyroprocessing of used nuclear fuel (UNF) has many advantages, including proliferation resistance. However, as part of the process, special nuclear materials accumulate in the electrolyte salt and present material accountability and safeguards concerns. The main motivation of this work was to explore a laser-induced breakdown spectroscopy (LIBS) approach as an online monitoring technique to enhance the material accountability of special nuclear materials in pyroprocessing. In this work, a vacuum extraction method was used to draw the molten salt (CeCl3-GdCl3-LiCl-KCl) up into 4 mm diameter Pyrex tubes, where it froze. The salt was then removed and the solid salt was measured using LIBS and inductively coupled plasma mass spectrometry (ICP-MS). A total of 36 samples were made, varying the CeCl3 and GdCl3 (surrogates for uranium and plutonium, respectively) concentrations from 0.5 wt% to 5 wt%. From these samples, univariate calibration curves for Ce and Gd were generated using peak area and peak intensity methods. For Ce, the Ce 551.1 nm line using the peak area provided the best calibration curve, with a limit of detection (LOD) of 0.099 wt% and a root mean squared error of cross-validation (RMSECV) of 0.197 wt%. For Gd, the best curve was generated using the peak intensity of the Gd 564.2 nm line, resulting in a LOD of 0.027 wt% and a RMSECV of 0.295 wt%. The RMSECV for the univariate cases was determined using leave-one-out cross-validation. In addition to the univariate calibration curves, partial least squares (PLS) regression was used to develop a calibration model. The PLS models yielded similar results, with RMSECV values (determined using Venetian blind cross-validation with 17% left out per split) of 0.30 wt% and 0.29 wt% for Ce and Gd, respectively. This work has shown that solid pyroprocessing salt can be qualitatively and quantitatively monitored using LIBS.
This work has the potential of significantly enhancing the material monitoring and safeguards of special nuclear materials in pyroprocessing.
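The leave-one-out RMSECV used for the univariate calibration curves need not refit the regression n times: for an ordinary least-squares line, the held-out residual is available in closed form as e_i / (1 - h_ii), where h_ii is the leverage of point i (the standard PRESS-residual shortcut). A pure-Python sketch with illustrative concentration/peak-area pairs verifies the shortcut against the naive n-refit loop:

```python
import math

def rmsecv_shortcut(xs, ys):
    """LOO RMSECV for y = a + b*x without refitting: the leave-one-out
    residual of a linear least-squares fit is e_i / (1 - h_ii), where
    h_ii = 1/n + (x_i - mean)^2 / Sxx is the leverage of point i."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    total = 0.0
    for x, y in zip(xs, ys):
        e = y - (a + b * x)
        h = 1 / n + (x - mx) ** 2 / sxx
        total += (e / (1 - h)) ** 2
    return math.sqrt(total / n)

def rmsecv_naive(xs, ys):
    """Same quantity computed by literally refitting n times (the slow way)."""
    n = len(xs)
    total = 0.0
    for k in range(n):
        tx, ty = xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:]
        m = len(tx)
        mx, my = sum(tx) / m, sum(ty) / m
        sxx = sum((x - mx) ** 2 for x in tx)
        b = sum((x - mx) * (y - my) for x, y in zip(tx, ty)) / sxx
        a = my - b * mx
        total += (ys[k] - (a + b * xs[k])) ** 2
    return math.sqrt(total / n)

conc = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0]     # wt%, illustrative
peak = [0.9, 2.1, 2.9, 4.2, 6.1, 7.8, 10.3]    # peak area, illustrative
shortcut = rmsecv_shortcut(conc, peak)
naive = rmsecv_naive(conc, peak)
```

The two values agree to machine precision; the closed-form route is what makes LOO cheap for linear calibration models.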
Watch-Dog: Detecting Self-Harming Activities From Wrist Worn Accelerometers.
Bharti, Pratool; Panwar, Anurag; Gopalakrishna, Ganesh; Chellappan, Sriram
2018-05-01
In a 2012 survey, in the United States alone, there were more than 35,000 reported suicides, approximately 1800 of them psychiatric inpatients. Recent Centers for Disease Control and Prevention (CDC) reports indicate an upward trend in these numbers. In psychiatric facilities, staff perform intermittent or continuous observation of patients manually in order to prevent such tragedies, but studies show that these measures are insufficient and also consume staff time and resources. In this paper, we present the Watch-Dog system to address the problem of detecting self-harming activities attempted by inpatients in clinical settings. Watch-Dog comprises three key components: data sensed by tiny accelerometer sensors worn on the wrists of subjects; an efficient algorithm to classify whether a user is active versus dormant (i.e., performing a physical activity versus not performing any activity); and a novel decision selection algorithm based on random forests and continuity indices for fine-grained activity classification. With data acquired from 11 subjects performing a series of activities (both self-harming and otherwise), Watch-Dog achieves classification accuracies of , , and for same-user 10-fold cross-validation, cross-user 10-fold cross-validation, and cross-user leave-one-out evaluation, respectively. We believe that the problem addressed in this paper is practical, important, and timely. We also believe that our proposed system is practically deployable, and related discussions are provided in this paper.
Willis, Brian H; Riley, Richard D
2017-09-20
An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice? Does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity: the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how meta-analysis estimates may be tested for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random-effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, between-study variance, study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
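The leave-one-out idea applied to meta-analysis can be sketched as follows: leave each study out in turn and re-pool the remainder, giving the ingredients from which a validation statistic such as Vn would be built. This sketch uses fixed-effect inverse-variance pooling and invented data; the paper's Vn statistic and its distribution are not reproduced here.

```python
def pooled_estimate(effects, variances):
    """Fixed-effect inverse-variance pooled estimate and its variance."""
    weights = [1 / v for v in variances]
    wsum = sum(weights)
    est = sum(w * e for w, e in zip(weights, effects)) / wsum
    return est, 1 / wsum

def leave_one_out_estimates(effects, variances):
    """Re-pool with each study left out in turn; each held-out study can
    then be compared against the estimate built without it."""
    out = []
    for k in range(len(effects)):
        e = effects[:k] + effects[k + 1:]
        v = variances[:k] + variances[k + 1:]
        out.append(pooled_estimate(e, v)[0])
    return out

effects = [0.30, 0.25, 0.40, 0.10, 0.35]    # illustrative study effect sizes
variances = [0.02, 0.03, 0.05, 0.04, 0.02]  # illustrative within-study variances
full, _ = pooled_estimate(effects, variances)
loo = leave_one_out_estimates(effects, variances)
```

Because pooling is a convex combination of the study effects, both the full and each leave-one-out estimate lie within the range of the observed effects.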
Dietz, Hans Peter; D’hooge, Jan; Barratt, Dean; Deprest, Jan
2018-01-01
Segmentation of the levator hiatus in ultrasound allows the extraction of biometrics, which are of importance for pelvic floor disorder assessment. We present a fully automatic method using a convolutional neural network (CNN) to outline the levator hiatus in a two-dimensional image extracted from a three-dimensional ultrasound volume. In particular, our method uses a recently developed scaled exponential linear unit (SELU) as a nonlinear self-normalizing activation function, which for the first time has been applied in medical imaging with CNN. SELU has important advantages such as being parameter-free and mini-batch independent, which may help to overcome memory constraints during training. A dataset with 91 images from 35 patients during Valsalva, contraction, and rest, all labeled by three operators, is used for training and evaluation in a leave-one-patient-out cross validation. Results show a median Dice similarity coefficient of 0.90 with an interquartile range of 0.08, with equivalent performance to the three operators (with a Williams' index of 1.03), and outperforming a U-Net architecture without the need for batch normalization. We conclude that the proposed fully automatic method achieved equivalent accuracy in segmenting the pelvic floor levator hiatus compared to a previous semiautomatic approach. PMID:29340289
Bonmati, Ester; Hu, Yipeng; Sindhwani, Nikhil; Dietz, Hans Peter; D'hooge, Jan; Barratt, Dean; Deprest, Jan; Vercauteren, Tom
2018-04-01
Segmentation of the levator hiatus in ultrasound allows the extraction of biometrics, which are of importance for pelvic floor disorder assessment. We present a fully automatic method using a convolutional neural network (CNN) to outline the levator hiatus in a two-dimensional image extracted from a three-dimensional ultrasound volume. In particular, our method uses a recently developed scaled exponential linear unit (SELU) as a nonlinear self-normalizing activation function, which for the first time has been applied in medical imaging with CNN. SELU has important advantages such as being parameter-free and mini-batch independent, which may help to overcome memory constraints during training. A dataset with 91 images from 35 patients during Valsalva, contraction, and rest, all labeled by three operators, is used for training and evaluation in a leave-one-patient-out cross validation. Results show a median Dice similarity coefficient of 0.90 with an interquartile range of 0.08, with equivalent performance to the three operators (with a Williams' index of 1.03), and outperforming a U-Net architecture without the need for batch normalization. We conclude that the proposed fully automatic method achieved equivalent accuracy in segmenting the pelvic floor levator hiatus compared to a previous semiautomatic approach.
Automated Detection of Atrial Fibrillation Based on Time-Frequency Analysis of Seismocardiograms.
Hurnanen, Tero; Lehtonen, Eero; Tadi, Mojtaba Jafari; Kuusela, Tom; Kiviniemi, Tuomas; Saraste, Antti; Vasankari, Tuija; Airaksinen, Juhani; Koivisto, Tero; Pankaala, Mikko
2017-09-01
In this paper, a novel method to detect atrial fibrillation (AFib) from a seismocardiogram (SCG) is presented. The proposed method is based on linear classification of the spectral entropy and a heart rate variability index computed from the SCG. The performance of the developed algorithm is demonstrated on data gathered from 13 patients in a clinical setting. After motion artifact removal, a total of 119 min of AFib data and 126 min of sinus rhythm data were considered for automated AFib detection. No other arrhythmias were considered in this study. The proposed algorithm requires no direct heartbeat peak detection from the SCG data, which makes it tolerant of interpersonal variations in SCG morphology and of noise. Furthermore, the proposed method relies solely on the SCG and needs no complementary electrocardiography to be functional. For the considered data, the detection method performs well even on relatively low-quality SCG signals. Using a majority voting scheme that takes five randomly selected segments from a signal and classifies these segments using the proposed algorithm, we obtained an average true positive rate of [Formula: see text] and an average true negative rate of [Formula: see text] for detecting AFib in leave-one-out cross-validation. This paper facilitates the adoption of microelectromechanical-sensor-based heart monitoring devices for arrhythmia detection.
Bello, Alessandra; Bianchi, Federica; Careri, Maria; Giannetto, Marco; Mori, Giovanni; Musci, Marilena
2007-11-05
A new NIR method based on multivariate calibration for the determination of ethanol in industrially packed wholemeal bread was developed and validated. GC-FID was used as the reference method for determining the actual ethanol concentration of different samples of wholemeal bread with known contents of added ethanol, ranging from 0 to 3.5% (w/w). Stepwise discriminant analysis was carried out on the NIR dataset in order to reduce the number of original variables by selecting those able to discriminate between samples of different ethanol concentrations. With the selected variables, a multivariate calibration model was then obtained by multiple linear regression. The prediction power of the linear model was optimized by a leave-one-out procedure, further reducing the number of original variables.
RRegrs: an R package for computer-aided model selection with multiple regression models.
Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L
2015-01-01
Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best-model criteria, as they all affect the accuracy and efficiency of the produced predictive models, thereby raising model reproducibility and comparison issues. Cheminformatics and bioinformatics use predictive modelling extensively and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tested several simple and complex regression models and validation schemes, produced unified reports, and offered the option to be integrated into more extensive studies. Additionally, such a methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear Regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated, fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields.
Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxide descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance, as well as its adaptability in terms of parameter optimization, could make RRegrs a popular framework to assist the initial exploration of predictive models, and with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; it is a fully validated procedure with application to QSAR modelling.
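The repeated 10-fold scheme RRegrs offers can be illustrated with a minimal split generator in Python (RRegrs itself is an R package built on caret; this sketches only the resampling logic, with a small n and k for brevity):

```python
import random

def repeated_kfold(n, k, repeats, seed=0):
    """Yield (train, test) index lists for `repeats` independent shuffles
    of k-fold cross-validation (the resampling scheme behind repeated k-fold CV)."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                     # fresh shuffle for each repeat
        for fold in range(k):
            test = idx[fold::k]              # every k-th position -> near-equal folds
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test

# 3 repeats of 5-fold CV on 25 samples -> 15 train/test splits
folds = list(repeated_kfold(n=25, k=5, repeats=3))
```

Averaging a model's error over all such splits reduces the dependence of the estimate on any single random partition.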
Automated Authorship Attribution Using Advanced Signal Classification Techniques
Ebrahimpour, Maryam; Putniņš, Tālis J.; Berryman, Matthew J.; Allison, Andrew; Ng, Brian W.-H.; Abbott, Derek
2013-01-01
In this paper, we develop two automated authorship attribution schemes, one based on Multiple Discriminant Analysis (MDA) and the other based on a Support Vector Machine (SVM). The classification features we exploit are based on word frequencies in the text. We preprocess each text by stripping it of all characters except a-z and space, in order to increase the portability of the software to different types of texts. We test the methodology on a corpus of undisputed English texts, and use leave-one-out cross validation to demonstrate classification accuracies in excess of 90%. We further test our methods on the Federalist Papers, which have a partly disputed authorship and a fair degree of scholarly consensus. Finally, we apply our methodology to the question of the authorship of the Letter to the Hebrews by comparing it against a number of original Greek texts of known authorship. These tests identify where some of the limitations lie, motivating a number of open questions for future work. An open source implementation of our methodology is freely available for use at https://github.com/matthewberryman/author-detection. PMID:23437047
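The pipeline described, strip to a-z and space, build word-frequency vectors, score with leave-one-out cross-validation, can be sketched in pure Python. A nearest-centroid classifier stands in for the paper's MDA and SVM classifiers, and the six toy "documents" and function-word vocabulary below are invented for illustration:

```python
from collections import Counter

def normalise(text):
    """Strip to a-z and space, as in the paper's preprocessing step."""
    return "".join(c for c in text.lower() if c.isascii() and (c.isalpha() or c == " "))

def freq_vector(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    words = normalise(text).split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in vocab]

def nearest_centroid_loocv(docs, labels, vocab):
    """LOO accuracy of a nearest-centroid classifier on word frequencies."""
    vecs = [freq_vector(d, vocab) for d in docs]
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    correct = 0
    for k in range(len(docs)):
        centroids = {}
        for lab in set(labels):
            members = [v for i, v in enumerate(vecs) if i != k and labels[i] == lab]
            centroids[lab] = [sum(col) / len(members) for col in zip(*members)]
        pred = min(centroids, key=lambda lab: dist(vecs[k], centroids[lab]))
        correct += pred == labels[k]
    return correct / len(docs)

# toy corpus: "author A" overuses "the", "author B" overuses "a"
docs = [
    "the cat sat on the mat and the dog lay by the door",
    "the rain fell on the roof of the old house",
    "the ship sailed over the sea to the far shore",
    "a bird flew in a sky so wide and a wind so cold",
    "a man walked in a field with a dog in tow",
    "a light shone in a window of a small inn",
]
labels = ["A", "A", "A", "B", "B", "B"]
vocab = ["the", "a", "of", "in", "on"]
accuracy = nearest_centroid_loocv(docs, labels, vocab)
```

With real texts, the same LOO loop would wrap whatever classifier is chosen; function-word frequencies are the classic stylometric feature set.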
Álvarez, Ángela; Yáñez, Jorge; Contreras, David; Saavedra, Renato; Sáez, Pedro; Amarasiriwardena, Dulasiri
2017-11-01
The use of propellant for making improvised explosive devices (IEDs) is an incipient criminal practice. Propellant can be used as an initiator in explosive mixtures along with other components such as coal, ammonium nitrate, sulfur, etc. Identification of the brand of propellant used in homemade explosives can provide additional forensic information from this evidence. In this work, four of the most common propellant brands were characterized by Fourier-transform infrared photoacoustic spectroscopy (FTIR-PAS), a non-destructive micro-analytical technique. The spectra show characteristic signals of typical compounds in the propellants, such as nitrocellulose, nitroglycerin, guanidine and diphenylamine. Differentiation of propellant components was achieved by combining FTIR-PAS with chemometric classification methods. Principal component analysis (PCA) and soft independent modelling of class analogy (SIMCA) were used to achieve effective differentiation and classification (100%) of propellant brands. Furthermore, propellant brand differentiation was also assessed using partial least squares discriminant analysis (PLS-DA) with leave-one-out cross-validation (∼97%) and external validation (∼100%). Our results show the ability of FTIR-PAS combined with chemometric analysis to identify and differentiate propellant brands in different explosive formulations of IEDs. Copyright © 2017 Elsevier B.V. All rights reserved.
Intraoperative Raman Spectroscopy of Soft Tissue Sarcomas
Nguyen, John Q.; Gowani, Zain S.; O’Connor, Maggie; Pence, Isaac J.; Nguyen, The-Quyen; Holt, Ginger E.; Schwartz, Herbert S.; Halpern, Jennifer L.; Mahadevan-Jansen, Anita
2017-01-01
Background and Objective Soft tissue sarcomas (STS) are a rare and heterogeneous group of malignant tumors that are often treated through surgical resection. Current intraoperative margin assessment methods are limited and highlight the need for an improved approach with respect to time and specificity. Here we investigate the potential of near-infrared Raman spectroscopy for the intraoperative differentiation of STS from surrounding normal tissue. Materials and Methods In vivo Raman measurements at 785 nm excitation were intraoperatively acquired from subjects undergoing STS resection using a probe-based spectroscopy system. A multivariate classification algorithm was developed to automatically identify spectral features that differentiate STS from the surrounding normal muscle and fat. The classification algorithm was subsequently tested using leave-one-subject-out cross-validation. Results With the exclusion of well-differentiated liposarcomas, the algorithm was able to classify STS from the surrounding normal muscle and fat with a sensitivity and specificity of 89.5% and 96.4%, respectively. Conclusion These results suggest that single-point near-infrared Raman spectroscopy could be utilized as a rapid and non-destructive surgical guidance tool for identifying abnormal tissue margins in need of further excision. PMID:27454580
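Leave-one-subject-out cross-validation, as used in this study, differs from ordinary LOO in that all measurements from one subject are held out together, preventing leakage of a subject's spectra between training and test sets. A minimal fold generator (subject labels are illustrative):

```python
def leave_one_subject_out(subject_ids):
    """Yield (train, test) index pairs where each test fold holds every
    measurement from exactly one subject, so no subject contributes to
    both the training and the test side of a fold."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield train, test

# one label per spectral measurement; several measurements per subject (illustrative)
subjects = ["s1", "s1", "s2", "s2", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
```

Grouping by subject rather than by measurement gives a more honest estimate of performance on patients the model has never seen.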
Full-motion video analysis for improved gender classification
NASA Astrophysics Data System (ADS)
Flora, Jeffrey B.; Lochtefeld, Darrell F.; Iftekharuddin, Khan M.
2014-06-01
The ability of computer systems to perform gender classification using the dynamic motion of the human subject has important applications in medicine, human factors, and human-computer interface systems. Previous works in motion analysis have used data from sensors (including gyroscopes, accelerometers, and force plates), radar signatures, and video. However, full-motion video motion-capture data provides a dataset with higher temporal and spatial resolution for the analysis of dynamic motion. Works using motion capture data have been limited by small datasets collected in a controlled environment. In this paper, we apply machine learning techniques to a new dataset that has a larger number of subjects. Additionally, these subjects move unrestricted through a capture volume, representing a more realistic, less controlled environment. We conclude that existing linear classification methods are insufficient for gender classification on this larger dataset captured in a relatively uncontrolled environment. A method based on a nonlinear support vector machine classifier is proposed to obtain gender classification for the larger dataset. In experimental testing with a dataset consisting of 98 trials (49 subjects, 2 trials per subject), classification rates using leave-one-out cross-validation are improved from 73% using linear discriminant analysis to 88% using the nonlinear support vector machine classifier.
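The linear-versus-nonlinear contrast above can be reproduced on any linearly inseparable dataset. The sketch below uses kernel ridge "classification" as a lightweight stand-in for the SVM (it gives the same linear-vs-RBF comparison with a simpler closed-form fit); the XOR-style data are synthetic and unrelated to the paper's gait features.

```python
import numpy as np

rng = np.random.default_rng(2)

# XOR-style data: four tight clusters, label = sign(x1 * x2),
# which no linear boundary can separate.
n = 40
centers = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], float)
X = np.vstack([c + 0.2 * rng.normal(size=(n // 4, 2)) for c in centers])
y = np.array([1] * (n // 2) + [-1] * (n // 2))

def kernel_ridge_loocv(X, y, kernel, lam=1e-3):
    """LOOCV accuracy of a kernel ridge 'classifier' (sign of the fit)."""
    correct = 0
    for i in range(len(y)):
        m = np.arange(len(y)) != i
        K = kernel(X[m], X[m])
        alpha = np.linalg.solve(K + lam * np.eye(m.sum()), y[m])
        k_te = kernel(X[i:i + 1], X[m])[0]
        correct += np.sign(k_te @ alpha) == y[i]
    return correct / len(y)

linear = lambda A, B: A @ B.T
rbf = lambda A, B: np.exp(-np.linalg.norm(A[:, None] - B[None], axis=2) ** 2)

acc_lin = kernel_ridge_loocv(X, y, linear)
acc_rbf = kernel_ridge_loocv(X, y, rbf)
print(f"linear: {acc_lin:.2%}, RBF: {acc_rbf:.2%}")
```

On this data the linear model hovers near chance while the RBF kernel separates the clusters, mirroring the LDA-to-SVM improvement reported above.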
2012-01-01
Background Previous validation studies of sick leave measures have focused on self-reports. Register-based sick leave data are considered to be valid; however, methodological problems may be associated with such data. A Danish national register on sickness benefit (DREAM) has been widely used in sick leave research. On the basis of sick leave records from 3,554 and 2,311 eldercare workers in 14 different workplaces, the aim of this study was to: 1) validate registered sickness benefit data from DREAM against workplace-registered sick leave spells of at least 15 days; and 2) validate self-reported sick leave days during one year against workplace-registered sick leave. Methods Agreement between workplace-registered sick leave and DREAM-registered sickness benefit was reported as sensitivities, specificities and positive predictive values. A receiver operating characteristic curve and a Bland-Altman plot were used to study the concordance with the sick leave duration of the first spell. By means of an analysis of agreement between self-reported and workplace-registered sick leave, sensitivity and specificity were calculated. Ninety-five percent confidence intervals (95% CI) were used. Results The probability that registered DREAM data on sickness benefit agree with workplace-registered sick leave of at least 15 days was 96.7% (95% CI: 95.6-97.6). Specificity was close to 100% (95% CI: 98.3-100). The registered DREAM data on sickness benefit overestimated the duration of sick leave spells by an average of 1.4 (SD: 3.9) weeks. Separate analysis of pregnancy-related sick leave revealed a maximum sensitivity of 20% (95% CI: 4.3-48.1). The sensitivity of self-reporting at least one or at least 56 sick leave day(s) was 94.5% (95% CI: 93.4-95.5) and 58.5% (95% CI: 51.1-65.6), respectively. The corresponding specificities were 85.3% (95% CI: 81.4-88.6) and 98.9% (95% CI: 98.3-99.3).
Conclusions The DREAM register offered valid measures of sick leave spells of at least 15 days among eldercare employees. Pregnancy-related sick leave should be excluded in studies planning to use DREAM data on sickness benefit. Self-reported sick leave became more imprecise when number of absence days increased, but the sensitivity and specificity were acceptable for lengths not exceeding one week. PMID:22894644
Energy-Based Metrics for Arthroscopic Skills Assessment.
Poursartip, Behnaz; LeBel, Marie-Eve; McCracken, Laura C; Escoto, Abelardo; Patel, Rajni V; Naish, Michael D; Trejos, Ana Luisa
2017-08-05
Minimally invasive skills assessment methods are essential in developing efficient surgical simulators and implementing consistent skills evaluation. Although numerous methods have been investigated in the literature, there is still a need to further improve the accuracy of surgical skills assessment. Energy expenditure can be an indication of motor skills proficiency. The goals of this study are to develop objective metrics based on energy expenditure, normalize these metrics, and investigate classifying trainees using these metrics. To this end, different forms of energy consisting of mechanical energy and work were considered and their values were divided by the related value of an ideal performance to develop normalized metrics. These metrics were used as inputs for various machine learning algorithms including support vector machines (SVM) and neural networks (NNs) for classification. The accuracy of the combination of the normalized energy-based metrics with these classifiers was evaluated through a leave-one-subject-out cross-validation. The proposed method was validated using 26 subjects at two experience levels (novices and experts) in three arthroscopic tasks. The results showed that there are statistically significant differences between novices and experts for almost all of the normalized energy-based metrics. The accuracy of classification using SVM and NN methods was between 70% and 95% for the various tasks. The results show that the normalized energy-based metrics and their combination with SVM and NN classifiers are capable of providing accurate classification of trainees. The assessment method proposed in this study can enhance surgical training by providing appropriate feedback to trainees about their level of expertise and can be used in the evaluation of proficiency.
NASA Astrophysics Data System (ADS)
Oleszko, Adam; Hartwich, Jadwiga; Wójtowicz, Anna; Gąsior-Głogowska, Marlena; Huras, Hubert; Komorowska, Małgorzata
2017-08-01
Hypertriglyceridemia, defined as plasma triglyceride (TG) above 1.7 mmol/L, is one of the cardiovascular risk factors. Very low density lipoproteins (VLDL) are the main TG carriers. Despite being time-consuming and requiring well-qualified staff and expensive instrumentation, ultracentrifugation remains the gold standard for VLDL isolation. Therefore, a faster and simpler method of VLDL-TG determination is needed. Vibrational spectroscopy, including FT-IR and Raman, is a widely used technique in lipid and protein research. The aim of this study was to assess Raman and FT-IR spectroscopy for the determination of VLDL-TG directly in serum, with the isolation step omitted. TG concentrations in serum and in ultracentrifuged VLDL fractions from 32 patients were measured with a reference colorimetric method. FT-IR and Raman spectra of VLDL and serum samples were acquired. Partial least squares (PLS) regression was used for calibration and leave-one-out cross-validation. Our results confirmed the possibility of reagent-free determination of VLDL-TG directly in serum with both Raman and FT-IR spectroscopy. Quantitative VLDL testing by FT-IR and/or Raman spectroscopy applied directly to maternal serum seems to be a promising screening test to identify women with increased risk of adverse pregnancy outcomes, and a patient-friendly method of choice based on ease of performance, accuracy and efficiency.
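PLS calibration with leave-one-out cross-validation, as used above, can be sketched with a minimal NIPALS PLS1 implementation. The "spectra" and TG values below are synthetic and noiseless so that the LOO predictions become numerically exact; nothing here reproduces the authors' actual calibration or data.

```python
import numpy as np

def pls1_coef(X, y, n_comp):
    """Fit PLS1 by NIPALS deflation; return (x_mean, y_mean, B)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q.append((yr @ t) / tt)
        Xr = Xr - np.outer(t, p)        # deflate X by the loading
        yr = yr - q[-1] * t             # deflate y by the score
        W.append(w); P.append(p)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X-space
    return x_mean, y_mean, B

def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean

# Synthetic stand-in: 32 "serum spectra" (20 points) with an exactly
# linear VLDL-TG response, so full-rank PLS recovers it.
rng = np.random.default_rng(3)
X = rng.normal(size=(32, 20))
y = X @ rng.normal(size=20)

# Leave-one-out cross-validation of the PLS calibration.
preds = np.array([
    pls1_predict(pls1_coef(np.delete(X, i, 0), np.delete(y, i), n_comp=20),
                 X[i:i + 1])[0]
    for i in range(len(y))])
rmse = np.sqrt(np.mean((preds - y) ** 2))
print(f"LOOCV RMSE: {rmse:.3e}")
```

With noisy real spectra, `n_comp` would itself be chosen by the same LOO loop rather than set to the full rank.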
2015-01-01
Background As the major histocompatibility complex (MHC) in humans, the human leukocyte antigens (HLAs) are among the most polymorphic genes in humans. Patients carrying certain HLA alleles may develop adverse drug reactions (ADRs) after taking specific drugs. Peptides play an important role in HLA-related ADRs as they are the necessary co-binders of HLAs with drugs. Many experimental data have been generated for understanding HLA-peptide binding. However, efficiently utilizing the data for understanding and accurately predicting HLA-peptide binding is challenging. Therefore, we developed a network-analysis-based method to understand and predict HLA-peptide binding. Methods Qualitative Class I HLA-peptide binding data were harvested and prepared from four major databases. An HLA-peptide binding network was constructed from this dataset and modules were identified by the fast greedy modularity optimization algorithm. To examine the significance of signals in the yielded models, the modularity was compared with the modularity values generated from 1,000 random networks. The peptides and HLAs in the modules were characterized by similarity analysis. The neighbor-edges-based and unbiased leverage algorithm (Nebula) was developed for predicting HLA-peptide binding. Leave-one-out (LOO) validations and two-fold cross-validations were conducted to evaluate the performance of Nebula using the constructed HLA-peptide binding network. Results Nine modules were identified from analyzing the HLA-peptide binding network, with a modularity higher than that of any of the random networks. Peptide length and the functional side chains of amino acids at certain positions of the peptides differed among the modules. HLA sequences were module-dependent to some extent. Nebula achieved an overall prediction accuracy of 0.816 in the LOO validations and an average accuracy of 0.795 in the two-fold cross-validations, and outperformed the method reported in the literature.
Conclusions Network analysis is a useful approach for analyzing large and sparse datasets such as the HLA-peptide binding dataset. The modules identified from the network analysis clustered peptides and HLAs with similar sequences and properties of amino acids. Nebula performed well in the predictions of HLA-peptide binding. We demonstrated that network analysis coupled with Nebula is an efficient approach to understand and predict HLA-peptide binding interactions and thus, could further our understanding of ADRs. PMID:26424483
Kwon, Yong-Kook; Bong, Yeon-Sik; Lee, Kwang-Sik; Hwang, Geum-Sook
2014-10-15
ICP-MS and ¹H NMR are commonly used to determine the geographical origin of food and crops. In this study, data from multielemental analysis performed by ICP-AES/ICP-MS and metabolomic data obtained from ¹H NMR were integrated to improve the reliability of determining the geographical origin of medicinal herbs. Astragalus membranaceus and Paeonia albiflora with different origins in Korea and China were analysed by ¹H NMR and ICP-AES/ICP-MS, and an integrated multivariate analysis was performed to characterise the differences between their origins. Four classification methods were applied: linear discriminant analysis (LDA), k-nearest neighbour classification (KNN), support vector machines (SVM), and partial least squares-discriminant analysis (PLS-DA). Results were compared using leave-one-out cross-validation and external validation. The integration of multielemental and metabolomic data was more suitable for determining geographical origin than the use of each individual data set alone. The integration of the two analytical techniques allowed diverse environmental factors such as climate and geology, to be considered. Our study suggests that an appropriate integration of different types of analytical data is useful for determining the geographical origin of food and crops with a high degree of reliability. Copyright © 2014 Elsevier Ltd. All rights reserved.
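Fusing blocks measured on very different scales (elemental concentrations vs. NMR bucket intensities) typically requires per-block autoscaling before concatenation. A hedged numpy sketch with invented data, using a nearest-centroid classifier in place of the four classifiers compared in the study:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two analytical blocks for the same 40 samples (20 per origin class):
# a "multielemental" block and an "NMR" block on very different scales.
n = 40
y = np.repeat([0, 1], n // 2)
elem = 100.0 * rng.normal(size=(n, 5))     # e.g. elemental concentrations
nmr = rng.normal(size=(n, 5))              # e.g. NMR bucket intensities
elem[:, 0] += 300.0 * y                    # planted origin signal
nmr[:, 0] += 3.0 * y

def loocv_accuracy(X, y):
    """Nearest-centroid LOOCV; z-scoring is fit on the training fold only."""
    correct = 0
    for i in range(len(y)):
        m = np.arange(len(y)) != i
        mu, sd = X[m].mean(axis=0), X[m].std(axis=0)
        Z_tr, z_te = (X[m] - mu) / sd, (X[i] - mu) / sd
        cents = np.array([Z_tr[y[m] == c].mean(axis=0) for c in (0, 1)])
        correct += np.argmin(np.linalg.norm(cents - z_te, axis=1)) == y[i]
    return correct / len(y)

# Autoscaling puts both blocks on a common footing before concatenation,
# so the high-magnitude elemental block cannot drown out the NMR block.
accs = {name: loocv_accuracy(X, y)
        for name, X in [("elements", elem), ("NMR", nmr),
                        ("fused", np.hstack([elem, nmr]))]}
for name, acc in accs.items():
    print(f"{name:>8}: LOOCV accuracy {acc:.2%}")
```

Fitting the scaling inside each LOO fold, rather than once on the full data, keeps the held-out sample from influencing its own preprocessing.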
NASA Astrophysics Data System (ADS)
Chen, Jiang; Zhu, Weining; Tian, Yong Q.; Yu, Qian; Zheng, Yuhan; Huang, Litong
2017-07-01
Colored dissolved organic matter (CDOM) and chlorophyll-a (Chla) are important water quality parameters and play crucial roles in aquatic environments. Remote sensing of CDOM and Chla concentrations for inland lakes is often limited by low spatial resolution. The newly launched Sentinel-2 satellite provides high spatial resolution (10, 20, and 60 m). Empirical band-ratio models were developed to derive CDOM and Chla concentrations in Lake Huron. The leave-one-out cross-validation method was used for model calibration and validation. The best CDOM retrieval algorithm is a B3/B5 model with coefficient of determination (R²) = 0.884, root-mean-squared error (RMSE) = 0.731 m⁻¹, relative root-mean-squared error (RRMSE) = 28.02%, and bias = -0.1 m⁻¹. The best Chla retrieval algorithm is a B5/B4 model with R² = 0.49, RMSE = 9.972 mg/m³, RRMSE = 48.47%, and bias = -0.116 mg/m³. Neural network models were further implemented to improve inversion accuracy. The application of the two best band-ratio models to Sentinel-2 imagery with 10 m × 10 m pixel size demonstrates the high potential of the sensor for monitoring water quality of inland lakes.
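The reported validation metrics (R², RMSE, RRMSE, bias) for a band-ratio model can be computed from LOO predictions as follows. The band reflectances and the linear CDOM-ratio relationship below are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in: Sentinel-2-like band reflectances and a CDOM
# value assumed linear in the B3/B5 ratio (illustrative only).
n = 50
b3 = rng.uniform(0.02, 0.10, n)
b5 = rng.uniform(0.01, 0.08, n)
cdom = 0.5 + 4.0 * (b3 / b5) + 0.05 * rng.normal(size=n)

ratio = b3 / b5

# Leave-one-out cross-validation of the linear band-ratio model.
preds = np.empty(n)
for i in range(n):
    m = np.arange(n) != i
    slope, intercept = np.polyfit(ratio[m], cdom[m], 1)
    preds[i] = slope * ratio[i] + intercept

resid = preds - cdom
r2 = 1 - np.sum(resid ** 2) / np.sum((cdom - cdom.mean()) ** 2)
rmse = np.sqrt(np.mean(resid ** 2))
rrmse = 100 * rmse / cdom.mean()           # relative RMSE, percent
bias = resid.mean()
print(f"R2={r2:.3f} RMSE={rmse:.3f} RRMSE={rrmse:.1f}% bias={bias:.3f}")
```

Because every metric is computed from held-out predictions, it estimates how the ratio model would perform on unseen pixels rather than on its own calibration data.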
Dong, Pei-Pei; Ge, Guang-Bo; Zhang, Yan-Yan; Ai, Chun-Zhi; Li, Guo-Hui; Zhu, Liang-Liang; Luan, Hong-Wei; Liu, Xing-Bao; Yang, Ling
2009-10-16
Seven pairs of epimers and one pair of isomeric metabolites of taxanes, each pair of which have similar structures but different retention behaviors, together with an additional 13 taxanes with different substitutions, were chosen to investigate the quantitative structure-retention relationship (QSRR) of taxanes in ultra fast liquid chromatography (UFLC). A Monte Carlo variable selection (MCVS) method was adopted to choose descriptors. The four selected descriptors were used to build QSRR models with multiple linear regression (MLR) and artificial neural network (ANN) modeling techniques. Both the linear and nonlinear models show good predictive ability; the ANN model was better, with determination coefficients R² for the training, validation and test sets of 0.9892, 0.9747 and 0.9840, respectively. The results of 100 rounds of leave-12-out cross-validation showed the robustness of this model. All the isomers can be correctly differentiated by this model. According to the selected descriptors, three-dimensional structural information was critical for the recognition of epimers. Hydrophobic interaction was the uppermost factor for retention in UFLC. Molecular polarizability and polarity were also closely correlated with retention behavior. This QSRR model will be useful for the separation and identification of taxanes, including epimers and metabolites, from botanical or biological samples.
Physiological reactivity to non-idiographic virtual reality stimuli in veterans with and without PTSD
Webb, Andrea K; Vincent, Ashley L; Jin, Alvin B; Pollack, Mark H
2015-01-01
Background Post-traumatic stress disorder (PTSD) is currently diagnosed via a clinical interview in which subjective self-reports of traumatic events and associated experiences are discussed with a mental health professional. The reliability and validity of diagnoses can be improved with the use of objective physiological measures. Methods In this study, physiological activity was recorded from 58 male veterans (PTSD Diagnosis: n = 16; Trauma Exposed/No PTSD Diagnosis: n = 23; No Trauma/No PTSD Diagnosis: n = 19) with and without PTSD and combat trauma exposure in response to emotionally evocative non-idiographic virtual reality stimuli. Results Statistically significant differences among the Control, Trauma, and PTSD groups were present during the viewing of two virtual reality videos. Skin conductance and interbeat interval features were extracted for each of ten video events (five events of increasing severity per video). These features were submitted to three stepwise discriminant function analyses to assess classification accuracy for the Control versus Trauma, Control versus PTSD, and Trauma versus PTSD pairings of participant groups. Leave-one-out cross-validation classification accuracy was between 71% and 94%. Conclusions These results are promising and suggest the utility of objective physiological measures in assisting with PTSD diagnosis. PMID:25642387
A Systematic Approach to Predicting Spring Force for Sagittal Craniosynostosis Surgery.
Zhang, Guangming; Tan, Hua; Qian, Xiaohua; Zhang, Jian; Li, King; David, Lisa R; Zhou, Xiaobo
2016-05-01
Spring-assisted surgery (SAS) can effectively treat scaphocephaly by reshaping crania with the appropriate spring force. However, it is difficult to accurately estimate spring force without considering the biomechanical properties of tissues. This study presents and validates a reliable system to accurately predict the spring force for sagittal craniosynostosis surgery. The authors randomly chose 23 patients who underwent SAS and had been followed for at least 2 years. An elastic model was designed to characterize the biomechanical behavior of calvarial bone tissue for each individual. After simulating the contact force at the accurate position of the skull strip with the springs, the finite element method was applied to calculate the stress at each tissue node based on the elastic model. A support vector regression approach was then used to model the relationships between the biomechanical properties generated from spring force, bone thickness, and the change of cephalic index after surgery. Therefore, for a new patient, the optimal spring force can be predicted based on the learned model with virtual spring simulation and a dynamic programming approach prior to SAS. Leave-one-out cross-validation was implemented to assess the accuracy of our prediction. As a result, the mean prediction accuracy of this model was 93.35%, demonstrating the great potential of this model as a useful adjunct for preoperative planning.
Detection of nasopharyngeal cancer using confocal Raman spectroscopy and genetic algorithm technique
NASA Astrophysics Data System (ADS)
Li, Shao-Xin; Chen, Qiu-Yan; Zhang, Yan-Jiao; Liu, Zhi-Ming; Xiong, Hong-Lian; Guo, Zhou-Yi; Mai, Hai-Qiang; Liu, Song-Hao
2012-12-01
Raman spectroscopy (RS) and a genetic algorithm (GA) were applied to distinguish nasopharyngeal cancer (NPC) from normal nasopharyngeal tissue. A total of 225 Raman spectra were acquired from 120 tissue sites of 63 nasopharyngeal patients: 56 Raman spectra from normal tissue and 169 Raman spectra from NPC tissue. The GA integrated with linear discriminant analysis (LDA) was developed to differentiate NPC and normal tissue according to spectral variables in the selected regions of 792-805, 867-880, 996-1009, 1086-1099, 1288-1304, 1663-1670, and 1742-1752 cm⁻¹, related to the proteins, nucleic acids and lipids of tissue. The GA-LDA algorithm with the leave-one-out cross-validation method provided a sensitivity of 69.2% and specificity of 100%. These results are better than those of principal component analysis applied to the same Raman dataset of nasopharyngeal tissue, which gave a sensitivity of 63.3% and specificity of 94.6%. This demonstrates that Raman spectroscopy combined with the GA-LDA diagnostic algorithm has enormous potential to detect and diagnose nasopharyngeal cancer.
Electrofishing capture probability of smallmouth bass in streams
Dauwalter, D.C.; Fisher, W.L.
2007-01-01
Abundance estimation is an integral part of understanding the ecology and advancing the management of fish populations and communities. Mark-recapture and removal methods are commonly used to estimate the abundance of stream fishes. Alternatively, abundance can be estimated by dividing the number of individuals sampled by the probability of capture. We conducted a mark-recapture study and used multiple repeated-measures logistic regression to determine the influence of fish size, sampling procedures, and stream habitat variables on the cumulative capture probability for smallmouth bass Micropterus dolomieu in two eastern Oklahoma streams. The predicted capture probability was used to adjust the number of individuals sampled to obtain abundance estimates. The observed capture probabilities were higher for larger fish and decreased with successive electrofishing passes for larger fish only. Model selection suggested that the number of electrofishing passes, fish length, and mean thalweg depth affected capture probabilities the most; there was little evidence for any effect of electrofishing power density and woody debris density on capture probability. Leave-one-out cross-validation showed that the cumulative capture probability model predicts smallmouth bass abundance accurately. © Copyright by the American Fisheries Society 2007.
Prediction of microsleeps using pairwise joint entropy and mutual information between EEG channels.
Baseer, Abdul; Weddell, Stephen J; Jones, Richard D
2017-07-01
Microsleeps are involuntary and brief instances of complete loss of responsiveness, typically of 0.5-15 s duration. They adversely affect performance in extended attention-driven jobs and can be fatal. Our aim was to predict microsleeps from 16-channel EEG signals. Two information-theoretic concepts - pairwise joint entropy and mutual information - were independently used to continuously extract features from the EEG signals. A k-nearest-neighbor (kNN) estimator with k = 3 was used to calculate both joint entropy and mutual information. Highly correlated features were discarded and the rest were ranked using the Fisher score, followed by the average 3-fold cross-validated area under the curve of the receiver operating characteristic (AUC-ROC). The leave-one-out method (LOOM) was used to test the performance of the microsleep prediction system on independent data. The best prediction 0.25 s ahead gave an AUC-ROC, sensitivity, precision, geometric mean (GM), and φ of 0.93, 0.68, 0.33, 0.75, and 0.38, respectively, with joint entropy features and a single linear discriminant analysis (LDA) classifier.
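Fisher-score ranking, the feature-selection step above, is straightforward to sketch: the score compares between-class separation to within-class spread, feature by feature. The EEG-derived features below are synthetic, with class information planted in a single feature:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic features: 200 epochs, 10 features, binary state
# (1 = microsleep imminent). Only feature 0 carries class information.
n, p = 200, 10
y = (rng.random(n) < 0.5).astype(int)
X = rng.normal(size=(n, p))
X[:, 0] += 1.5 * y                         # planted class signal

def fisher_scores(X, y):
    """Fisher score per feature: between-class over within-class variance."""
    scores = np.empty(X.shape[1])
    mu = X.mean(axis=0)
    for j in range(X.shape[1]):
        num = den = 0.0
        for c in np.unique(y):
            xc = X[y == c, j]
            num += len(xc) * (xc.mean() - mu[j]) ** 2
            den += len(xc) * xc.var()
        scores[j] = num / den
    return scores

ranking = np.argsort(fisher_scores(X, y))[::-1]
print("features ranked by Fisher score:", ranking)
```

In the pipeline above this ranking would feed the downstream cross-validated AUC-ROC selection before the final LOOM evaluation.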
Stawiski, Konrad; Strzałka, Alicja; Puła, Anna; Bijakowski, Krzysztof
2015-01-01
Medical nutrition therapy has a pivotal role in the management of chronic gastrointestinal disorders such as chronic pancreatitis, inflammatory bowel diseases (Leśniowski-Crohn's disease and ulcerative colitis) and irritable bowel syndrome. The aim of this study is to develop, deploy and evaluate an interactive application for the Windows and Android operating systems, which could serve as a digital diet diary and as an analysis and prediction tool for both the patient and the doctor. The software gathers details about the patient's diet and associated fettle in order to estimate fettle change after future meals, specifically for an individual patient. In this paper we describe the process of idea development and application design, a feasibility assessment using a phone survey, a preliminary evaluation on 6 healthy individuals, and early results of a clinical trial, which is still an ongoing study. Results suggest that the applied approximation approach (Shepard's method of 6-dimensional metric interpolation) has the potential to predict fettle accurately, as shown by leave-one-out cross-validation (LOOCV).
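Shepard's method is inverse-distance-weighted interpolation: a new query is predicted as the average of stored outcomes, weighted by proximity in the 6-dimensional descriptor space. A sketch with invented meal descriptors and a smooth synthetic response, evaluated by LOOCV as in the paper:

```python
import numpy as np

def shepard_predict(X, y, x_new, power=2.0):
    """Shepard inverse-distance-weighted interpolation at x_new."""
    d = np.linalg.norm(X - x_new, axis=1)
    exact = d == 0.0
    if exact.any():                  # query coincides with a stored point
        return float(y[exact].mean())
    w = 1.0 / d ** power
    return float(w @ y / w.sum())

rng = np.random.default_rng(7)
# Synthetic 6-D meal descriptors and a "fettle change" score in [-1, 1].
X = rng.uniform(0, 1, size=(30, 6))
y = np.tanh(X.sum(axis=1) - 3.0)     # smooth synthetic response

# Leave-one-out cross-validation: predict each logged meal from the rest.
preds = np.array([shepard_predict(np.delete(X, i, 0), np.delete(y, i), X[i])
                  for i in range(len(y))])
mae = np.abs(preds - y).mean()
print(f"LOOCV mean absolute error: {mae:.3f}")
```

A useful property visible in the sketch: because every prediction is a weighted average of observed scores, it can never stray outside the range of the training outcomes.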
Detection of hypertensive retinopathy using vessel measurements and textural features.
Agurto, Carla; Joshi, Vinayak; Nemeth, Sheila; Soliz, Peter; Barriga, Simon
2014-01-01
Features that indicate hypertensive retinopathy have been well described in the medical literature. This paper presents a new system to automatically classify subjects with hypertensive retinopathy (HR) using digital color fundus images. Our method consists of the following steps: 1) normalization and enhancement of the image; 2) determination of regions of interest based on automatic location of the optic disc; 3) segmentation of the retinal vasculature and measurement of vessel width and tortuosity; 4) extraction of color features; 5) classification of vessel segments as arteries or veins; 6) calculation of artery-vein ratios using the six widest (major) vessels for each category; 7) calculation of mean red intensity and saturation values for all arteries; 8) calculation of amplitude-modulation frequency-modulation (AM-FM) features for the entire image; and 9) classification of features into HR and non-HR using linear regression. This approach was tested on 74 digital color fundus photographs taken with TOPCON and CANON retinal cameras using leave-one-out cross-validation. An area under the ROC curve (AUC) of 0.84 was achieved with sensitivity and specificity of 90% and 67%, respectively.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dréan, Gaël; Acosta, Oscar, E-mail: Oscar.Acosta@univ-rennes1.fr; Simon, Antoine
2016-06-15
Purpose: Recent studies revealed a trend toward voxelwise population analysis in order to understand the local dose/toxicity relationships in prostate cancer radiotherapy. Such approaches require, however, an accurate interindividual mapping of the anatomies and 3D dose distributions toward a common coordinate system. This step is challenging due to the high interindividual variability. In this paper, the authors propose a method designed for interindividual nonrigid registration of the rectum and dose mapping for population analysis. Methods: The method is based on the computation of a normalized structural description of the rectum using a Laplacian-based model. This description takes advantage of the tubular structure of the rectum and its centerline to be embedded in a nonrigid registration-based scheme. The performance of the method was evaluated on 30 individuals treated for prostate cancer in a leave-one-out cross-validation. Results: Performance was measured using classical metrics (Dice score and Hausdorff distance), along with new metrics devised to better assess dose mapping in relation with structural deformation (dose-organ overlap). Considering these scores, the proposed method outperforms intensity-based and distance-map-based registration methods. Conclusions: The proposed method allows for accurately mapping interindividual 3D dose distributions toward a single anatomical template, opening the way for further voxelwise statistical analysis.
A universal hybrid decision tree classifier design for human activity classification.
Chien, Chieh; Pottie, Gregory J
2012-01-01
A system that reliably classifies daily life activities can contribute to more effective and economical treatments for patients with chronic conditions or undergoing rehabilitative therapy. We propose a universal hybrid decision tree classifier for this purpose. The tree classifier can flexibly implement different decision rules at its internal nodes, and can be adapted from a population-based model when supplemented by training data for individuals. The system was tested using seven subjects, each monitored by 14 triaxial accelerometers. Each subject performed fourteen different activities typical of daily life. Using leave-one-out cross-validation, our decision tree produced an average classification accuracy of 89.9%. In contrast, MATLAB's personalized tree classifiers, using Gini's diversity index as the split criterion followed by optimal tuning of the thresholds for each subject, yielded 69.2%.
Moran, Lara; Andres, Sonia; Allen, Paul; Moloney, Aidan P
2018-08-01
Visible-near infrared spectroscopy (Vis-NIRS) has been suggested to have potential for the authentication of food products. The aim of the present preliminary study was to assess whether this technology can be used to authenticate the ageing time (3, 7, 14 and 21 days post mortem) of beef steaks from three different muscles (M. Longissimus thoracis, M. Gluteus medius and M. Semitendinosus). Various mathematical pre-treatments were applied to the spectra to correct scattering and overlapping effects, and then partial least squares discriminant analysis (PLS-DA) procedures were applied. The best models were specific to each muscle, and the ability to predict ageing time was validated using full (leave-one-out) cross-validation, whereas authentication performance was evaluated using the parameters of sensitivity, specificity and overall correct classification. The results indicate that overall correct classification ranging from 94.2 to 100% was achieved, depending on the muscle. In conclusion, Vis-NIRS technology seems a valid tool for the authentication of the ageing time of beef steaks. Copyright © 2018 Elsevier Ltd. All rights reserved.
Can smartwatches replace smartphones for posture tracking?
Mortazavi, Bobak; Nemati, Ebrahim; VanderWall, Kristina; Flores-Rodriguez, Hector G; Cai, Jun Yu Jacinta; Lucier, Jessica; Naeim, Arash; Sarrafzadeh, Majid
2015-10-22
This paper introduces a human posture tracking platform to identify the human postures of sitting, standing or lying down, based on a smartwatch. This work develops such a system as a proof-of-concept study to investigate a smartwatch's ability to be used in future remote health monitoring systems and applications. This work validates the smartwatch's ability to track the posture of users accurately in a laboratory setting while reducing the sampling rate to potentially improve battery life, the first step in verifying that such a system would work in future clinical settings. The algorithm developed classifies the transitions between the three posture states of sitting, standing and lying down by identifying these transition movements, as well as other movements that might be mistaken for these transitions. The system is trained and developed on a Samsung Galaxy Gear smartwatch, and the algorithm was validated through leave-one-subject-out cross-validation of 20 subjects. The system can identify the appropriate transitions at only 10 Hz with an F-score of 0.930, indicating its ability to effectively replace smartphones, if needed.
Predicting the risk of toxic blooms of golden alga from cell abundance and environmental covariates
Patino, Reynaldo; VanLandeghem, Matthew M.; Denny, Shawn
2016-01-01
Golden alga (Prymnesium parvum) is a toxic haptophyte that has caused considerable ecological damage to marine and inland aquatic ecosystems worldwide. Studies focused primarily on laboratory cultures have indicated that toxicity is poorly correlated with the abundance of golden alga cells. This relationship, however, has not been rigorously evaluated in the field, where environmental conditions are much different. The ability to predict toxicity using readily measured environmental variables and golden alga abundance would allow managers to make rapid assessments of ichthyotoxicity potential without laboratory bioassay confirmation, which requires additional resources. To assess the potential utility of these relationships, several a priori models relating lethal levels of golden alga ichthyotoxicity to golden alga abundance and environmental covariates were constructed. Model parameters were estimated using archived data from four river basins in Texas and New Mexico (Colorado, Brazos, Red, Pecos). Model predictive ability was quantified using cross-validation, sensitivity, and specificity, and the relative ranking of environmental covariate models was determined by Akaike Information Criterion values and Akaike weights. Overall, abundance was a generally good predictor of ichthyotoxicity, as leave-one-out cross-validated accuracy of golden alga abundance-only models ranged from ∼80% to ∼90%. Environmental covariates improved predictions, especially the ability to predict lethally toxic events (i.e., increased sensitivity), and the top-ranked environmental covariate models differed among the four basins. These associations may be useful for monitoring as well as for understanding the abiotic factors that influence toxicity during blooms.
Breast cancer detection via Hu moment invariant and feedforward neural network
NASA Astrophysics Data System (ADS)
Zhang, Xiaowei; Yang, Jiquan; Nguyen, Elijah
2018-04-01
One in eight women will develop breast cancer during her lifetime. This study used Hu moment invariants and a feedforward neural network to diagnose breast cancer. With the help of K-fold cross-validation, we can test the out-of-sample accuracy of our method. We found that our method can improve the accuracy of detecting breast cancer and reduce the difficulty of diagnosis.
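Hu moment invariants are polynomial combinations of normalized central image moments that are unchanged by translation, scaling, and rotation, which is why they make robust shape features. A numpy sketch of the first four invariants (the study's exact feature set and network are not reproduced here), checked against an exact 90° rotation of a synthetic image:

```python
import numpy as np

def hu_moments(img):
    """First four Hu invariant moments of a 2-D grayscale image."""
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]].astype(float)
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00

    def eta(p, q):                   # normalized central moment
        mu = ((xs - xc) ** p * (ys - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = e20 + e02
    h2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    h3 = (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2
    h4 = (e30 + e12) ** 2 + (e21 + e03) ** 2
    return np.array([h1, h2, h3, h4])

# Invariance check on a synthetic asymmetric shape: rotating the image
# by 90 degrees leaves the invariants unchanged (up to float rounding).
img = np.zeros((64, 64))
img[10:40, 20:35] = 1.0              # vertical bar of "mass"
img[45:55, 5:60] = 0.5               # horizontal bar, breaking symmetry
h_orig = hu_moments(img)
h_rot = hu_moments(np.rot90(img))
print(h_orig, h_rot)
```

In a full pipeline these invariants, computed per mammogram region, would be the inputs to the feedforward network evaluated under K-fold cross-validation.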
Amuzu-Aweh, E N; Bijma, P; Kinghorn, B P; Vereijken, A; Visscher, J; van Arendonk, J A M; Bovenhuis, H
2013-12-01
Prediction of heterosis has a long history with mixed success, partly due to low numbers of genetic markers and/or small data sets. We investigated the prediction of heterosis for egg number, egg weight and survival days in domestic white Leghorns, using ∼400 000 individuals from 47 crosses and allele frequencies on ∼53 000 genome-wide single nucleotide polymorphisms (SNPs). When heterosis is due to dominance, and dominance effects are independent of allele frequencies, heterosis is proportional to the squared difference in allele frequency (SDAF) between parental pure lines (not necessarily homozygous). Under these assumptions, a linear model including regression on SDAF partitions crossbred phenotypes into pure-line values and heterosis, even without pure-line phenotypes. We therefore used models where phenotypes of crossbreds were regressed on the SDAF between parental lines. Accuracy of prediction was determined using leave-one-out cross-validation. SDAF predicted heterosis for egg number and weight with an accuracy of ∼0.5, but did not predict heterosis for survival days. Heterosis predictions allowed preselection of pure lines before field-testing, saving ∼50% of field-testing cost with only 4% loss in heterosis. Accuracies from cross-validation were lower than from the model-fit, suggesting that accuracies previously reported in literature are overestimated. Cross-validation also indicated that dominance cannot fully explain heterosis. Nevertheless, the dominance model had considerable accuracy, clearly greater than that of a general/specific combining ability model. This work also showed that heterosis can be modelled even when pure-line phenotypes are unavailable. We concluded that SDAF is a useful predictor of heterosis in commercial layer breeding.
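Under the dominance assumption stated above, heterosis is proportional to the squared difference in allele frequency (SDAF) between parental lines. The sketch below computes SDAF and a through-origin least-squares slope; the allele frequencies are invented, and the regression is a simplified stand-in for the paper's full model.

```python
# Squared difference in allele frequency (SDAF) between two pure lines,
# averaged over SNPs, plus a least-squares slope of heterosis on SDAF.
def sdaf(freqs_line1, freqs_line2):
    diffs = [(a - b) ** 2 for a, b in zip(freqs_line1, freqs_line2)]
    return sum(diffs) / len(diffs)

def slope_through_origin(x, y):
    """Heterosis assumed proportional to SDAF under dominance."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical allele frequencies at three SNPs in two parental lines:
s = sdaf([0.1, 0.9, 0.5], [0.9, 0.1, 0.5])
print(round(s, 3))  # → 0.427
```

Divergent loci (0.1 vs 0.9) contribute strongly to SDAF, while loci with equal frequencies contribute nothing, which is why SDAF carries the heterosis signal under this model.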
Prediction of skin sensitization potency using machine learning approaches.
Zang, Qingda; Paris, Michael; Lehmann, David M; Bell, Shannon; Kleinstreuer, Nicole; Allen, David; Matheson, Joanna; Jacobs, Abigail; Casey, Warren; Strickland, Judy
2017-07-01
The replacement of animal use in testing for regulatory classification of skin sensitizers is a priority for US federal agencies that use data from such testing. Machine learning models that classify substances as sensitizers or non-sensitizers without using animal data have been developed and evaluated. Because some regulatory agencies require that sensitizers be further classified into potency categories, we developed statistical models to predict skin sensitization potency for murine local lymph node assay (LLNA) and human outcomes. Input variables for our models included six physicochemical properties and data from three non-animal test methods: direct peptide reactivity assay; human cell line activation test; and KeratinoSens™ assay. Models were built to predict three potency categories using four machine learning approaches and were validated using external test sets and leave-one-out cross-validation. A one-tiered strategy modeled all three categories of response together while a two-tiered strategy modeled sensitizer/non-sensitizer responses and then classified the sensitizers as strong or weak sensitizers. The two-tiered model using the support vector machine with all assay and physicochemical data inputs provided the best performance, yielding accuracy of 88% for prediction of LLNA outcomes (120 substances) and 81% for prediction of human test outcomes (87 substances). The best one-tiered model predicted LLNA outcomes with 78% accuracy and human outcomes with 75% accuracy. By comparison, the LLNA predicts human potency categories with 69% accuracy (60 of 87 substances correctly categorized). These results suggest that computational models using non-animal methods may provide valuable information for assessing skin sensitization potency. Copyright © 2017 John Wiley & Sons, Ltd.
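The two-tiered strategy can be sketched as a pair of chained decisions; the two `tier` functions below are trivial single-score stand-ins for the paper's trained SVMs, and the score and thresholds are hypothetical.

```python
# Two-tiered potency prediction: tier 1 decides sensitizer vs non-sensitizer;
# tier 2 splits predicted sensitizers into strong vs weak.
def two_tier_predict(x, tier1_is_sensitizer, tier2_is_strong):
    if not tier1_is_sensitizer(x):
        return "non-sensitizer"
    return "strong" if tier2_is_strong(x) else "weak"

# Hypothetical threshold classifiers standing in for trained models:
pred = two_tier_predict(0.9, lambda s: s > 0.5, lambda s: s > 0.8)
print(pred)  # → strong
```

Splitting the three-category problem into two binary ones lets each tier be trained on a simpler decision boundary, which is consistent with the two-tiered model outperforming the one-tiered model above.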
Portable visible and near-infrared spectrophotometer for triglyceride measurements.
Kobayashi, Takanori; Kato, Yukiko Hakariya; Tsukamoto, Megumi; Ikuta, Kazuyoshi; Sakudo, Akikazu
2009-01-01
An affordable and portable machine is required for the practical use of visible and near-infrared (Vis-NIR) spectroscopy. A portable fruit tester comprising a Vis-NIR spectrophotometer was modified for use in the transmittance mode and employed to quantify triglyceride levels in serum in combination with a chemometric analysis. Transmittance spectra collected in the 600- to 1100-nm region were subjected to a partial least-squares regression analysis and leave-one-out cross-validation to develop a chemometrics model for predicting triglyceride concentrations in serum. The model yielded a coefficient of determination in cross-validation (R2VAL) of 0.7831 with a standard error of cross-validation (SECV) of 43.68 mg/dl. The detection limit of the model was 148.79 mg/dl. Furthermore, masked samples predicted by the model yielded a coefficient of determination in prediction (R2PRED) of 0.6856 with a standard error of prediction (SEP) and detection limit of 61.54 and 159.38 mg/dl, respectively. The portable Vis-NIR spectrophotometer may prove convenient for the measurement of triglyceride concentrations in serum, although obstacles remain before practical use, which are discussed.
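The cross-validation statistics quoted above (a coefficient of determination and a standard error of cross-validation) can be computed from paired reference and predicted concentrations as sketched here; the values below are illustrative, not the study's data.

```python
# R2 and standard error of cross-validation (SECV) from paired
# reference/predicted values.
def cv_stats(y_ref, y_pred):
    n = len(y_ref)
    mean = sum(y_ref) / n
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean) ** 2 for r in y_ref)
    return 1 - ss_res / ss_tot, (ss_res / n) ** 0.5

# Hypothetical reference vs cross-validated triglyceride values (mg/dl):
r2, secv = cv_stats([100, 150, 200, 250], [110, 140, 210, 240])
print(round(r2, 3), round(secv, 1))  # → 0.968 10.0
```

R2 measures the fraction of reference variance explained by the predictions, while SECV reports the typical prediction error in the measurement's own units (mg/dl).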
Bueno, Justin; Sikirzhytski, Vitali; Lednev, Igor K
2013-08-06
The ability to link a suspect to a particular shooting incident is a principal task for many forensic investigators. Here, we attempt to achieve this goal by analysis of gunshot residue (GSR) through the use of attenuated total reflectance (ATR) Fourier transform infrared spectroscopy (FT-IR) combined with statistical analysis. The firearm discharge process is analogous to a complex chemical process. Therefore, the products of this process (GSR) will vary based upon numerous factors, including the specific combination of the firearm and ammunition which was discharged. Differentiation of FT-IR data, collected from GSR particles originating from three different firearm-ammunition combinations (0.38 in., 0.40 in., and 9 mm calibers), was achieved using projection to latent structures discriminant analysis (PLS-DA). The technique was validated both internally, by leave-one-out cross-validation, and externally. External validation was achieved via assignment (caliber identification) of unknown FT-IR spectra from unknown GSR particles. The results demonstrate great potential for ATR-FT-IR spectroscopic analysis of GSR for forensic purposes.
PVP-SVM: Sequence-Based Prediction of Phage Virion Proteins Using a Support Vector Machine
Manavalan, Balachandran; Shin, Tae H.; Lee, Gwang
2018-01-01
Accurately identifying bacteriophage virion proteins from uncharacterized sequences is important to understand interactions between the phage and its host bacteria in order to develop new antibacterial drugs. However, identification of such proteins using experimental techniques is expensive and often time consuming; hence, development of an efficient computational algorithm for the prediction of phage virion proteins (PVPs) prior to in vitro experimentation is needed. Here, we describe a support vector machine (SVM)-based PVP predictor, called PVP-SVM, which was trained with 136 optimal features. A feature selection protocol was employed to identify the optimal features from a large set that included amino acid composition, dipeptide composition, atomic composition, physicochemical properties, and chain-transition-distribution. PVP-SVM achieved an accuracy of 0.870 during leave-one-out cross-validation, which was 6% higher than control SVM predictors trained with all features, indicating the efficiency of the feature selection method. Furthermore, PVP-SVM displayed superior performance compared to the currently available method, PVPred, and two other machine-learning methods developed in this study when objectively evaluated with an independent dataset. For the convenience of the scientific community, a user-friendly and publicly accessible web server has been established at www.thegleelab.org/PVP-SVM/PVP-SVM.html. PMID:29616000
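One of the feature families listed above, amino acid composition, can be sketched directly: the fraction of each of the 20 standard residues in a sequence. The sequence below is a made-up example, not drawn from the PVP dataset.

```python
# Amino acid composition: fraction of each of the 20 standard residues,
# one of the feature families combined before feature selection.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}

feats = aa_composition("MKTAYIAKQR")   # hypothetical 10-residue sequence
print(feats["A"], feats["K"])          # → 0.2 0.2
```

Composition-style features like this give a fixed-length numeric vector regardless of sequence length, which is what allows an SVM to be trained across proteins of different sizes.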
Tahir, Fahima; Fahiem, Muhammad Abuzar
2014-01-01
The quality of pharmaceutical products plays an important role in the pharmaceutical industry as well as in our lives. Usage of defective tablets can be harmful for patients. In this research we propose a nondestructive method to identify defective and nondefective tablets using their surface morphology. Three different environmental factors, temperature, humidity and moisture, are analyzed to evaluate the performance of the proposed method. Multiple textural features are extracted from the surface of the defective and nondefective tablets. These textural features are gray level cooccurrence matrix, run length matrix, histogram, autoregressive model and HAAR wavelet. In total, 281 textural features were extracted from the images. We performed an analysis on all 281 features, the top 15, and the top 2. The top 15 features are extracted using three different feature reduction techniques: chi-square, gain ratio and relief-F. In this research we have used three different classifiers, support vector machine, K-nearest neighbors and naïve Bayes, to calculate the accuracies of the proposed method in two experiments: leave-one-out cross-validation and train/test models. We tested each classifier against all selected features and then compared their results. The experiments showed that in most cases SVM performed better than the other two classifiers.
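One classifier/protocol pair from the comparison above can be sketched minimally: a 1-nearest-neighbour stand-in for the KNN classifier, scored by leave-one-out cross-validation. The feature vectors below are toy values, not tablet texture features.

```python
# 1-NN under leave-one-out: each sample is predicted from its nearest
# neighbour among the remaining samples.
def one_nn_loocv(X, y):
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sqdist(X[i], X[j]))
        correct += (y[nearest] == y[i])
    return correct / len(X)

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]  # two toy clusters
y = [0, 0, 0, 1, 1, 1]                                # defective / nondefective
print(one_nn_loocv(X, y))  # → 1.0
```

Excluding the query point itself from the neighbour search is what makes this leave-one-out rather than a (trivially perfect) resubstitution estimate.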
Aggio, Raphael B. M.; de Lacy Costello, Ben; White, Paul; Khalid, Tanzeela; Ratcliffe, Norman M.; Persad, Raj; Probert, Chris S. J.
2016-01-01
Prostate cancer is one of the most common cancers. Serum prostate-specific antigen (PSA) is used to aid the selection of men undergoing biopsies. Its use remains controversial. We propose a GC-sensor algorithm system for classifying urine samples from patients with urological symptoms. This pilot study includes 155 men presenting to urology clinics; 58 were diagnosed with prostate cancer, 24 with bladder cancer and 73 with haematuria and/or poor stream, without cancer. Principal component analysis (PCA) was applied to assess the discrimination achieved, while linear discriminant analysis (LDA) and support vector machine (SVM) were used as statistical models for sample classification. Leave-one-out cross-validation (LOOCV), repeated 10-fold cross-validation (10FoldCV), repeated double cross-validation (DoubleCV) and Monte Carlo permutations were applied to assess performance. Significant separation was found between prostate cancer and control samples, bladder cancer and controls, and between bladder and prostate cancer samples. For prostate cancer diagnosis, the GC/SVM system classified samples with 95% sensitivity and 96% specificity after LOOCV. For bladder cancer diagnosis, the SVM reported 96% sensitivity and 100% specificity after LOOCV, while the DoubleCV reported 87% sensitivity and 99% specificity, with SVM showing 78% and 98% sensitivity between prostate and bladder cancer samples. Evaluation of the Monte Carlo permutation of class labels obtained chance-like accuracy values around 50%, suggesting that the observed results for bladder cancer and prostate cancer detection are not due to overfitting. The results of the pilot study presented here indicate that the GC system is able to successfully identify patterns that allow classification of urine samples from patients with urological cancers.
An accurate diagnosis based on urine samples would reduce the number of negative prostate biopsies performed, and the frequency of surveillance cystoscopy for bladder cancer patients. Larger cohort studies are planned to investigate the potential of this system. Future work may lead to non-invasive breath analyses for diagnosing urological conditions. PMID:26865331
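The Monte Carlo permutation check used above can be sketched as follows: class labels are shuffled many times and the evaluation is repeated, so a genuine signal should collapse to chance-level accuracy. The stand-in "classifier" here simply measures agreement with the true labels; it is not the study's GC/SVM pipeline.

```python
import random

# Monte Carlo label permutation: shuffling class labels and re-scoring
# should give chance-level accuracy if results are not an artefact.
def permutation_baseline(y_true, evaluate, n_perm=200, seed=0):
    """evaluate(labels) -> accuracy under a given labelling."""
    rng = random.Random(seed)
    accs = []
    for _ in range(n_perm):
        perm = y_true[:]
        rng.shuffle(perm)
        accs.append(evaluate(perm))
    return sum(accs) / n_perm

y = [0, 1] * 20  # balanced toy labels

def agree(labels):
    # Stand-in scorer: fraction of shuffled labels matching the true ones.
    return sum(a == b for a, b in zip(labels, y)) / len(y)

print(permutation_baseline(y, agree))  # hovers near 0.5 (chance level)
```

If the real (unshuffled) accuracy sits far above this permutation baseline, as in the study, the result is unlikely to be explained by overfitting alone.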
Tomaschewski-Barlem, Jamila Geri; Lunardi, Valéria Lerch; Barlem, Edison Luiz Devos; da Silveira, Rosemary Silva; Dalmolin, Graziele de Lima; Ramos, Aline Marcelino
2015-01-01
Objective: to adapt culturally and validate the Protective Nursing Advocacy Scale for Brazilian nurses. Method: methodological study carried out with 153 nurses from two hospitals in the South region of Brazil, one public and the other philanthropic. The cross-cultural adaptation of the Protective Nursing Advocacy Scale was performed according to international standards, and its validation was carried out for use in the Brazilian context, by means of factor analysis and Cronbach's alpha as a measure of internal consistency. Results: by means of evaluation by a committee of experts and application of a pre-test, face validity and content validity of the instrument were considered satisfactory. From the factor analysis, five constructs were identified: negative implications of the advocacy practice, advocacy actions, facilitators of the advocacy practice, perceptions that favor advocacy practice, and barriers to advocacy practice. The instrument showed satisfactory internal consistency, with Cronbach's alpha values ranging from 0.70 to 0.87. Conclusion: it was concluded that the Protective Nursing Advocacy Scale - Brazilian version, is a valid and reliable instrument for use in the evaluation of beliefs and actions of health advocacy, performed by Brazilian nurses in their professional practice environment. PMID:26444169
Lippolis, Vincenzo; Ferrara, Massimo; Cervellieri, Salvatore; Damascelli, Anna; Epifani, Filomena; Pascale, Michelangelo; Perrone, Giancarlo
2016-02-02
The availability of rapid diagnostic methods for monitoring ochratoxigenic species during the seasoning processes for dry-cured meats is crucial and constitutes a key stage in preventing the risk of ochratoxin A (OTA) contamination. A rapid, easy-to-perform and non-invasive method using an electronic nose (e-nose) based on metal oxide semiconductors (MOS) was developed to discriminate dry-cured meat samples into two classes based on fungal contamination: class P (samples contaminated by OTA-producing Penicillium strains) and class NP (samples contaminated by OTA non-producing Penicillium strains). Two OTA-producing strains of Penicillium nordicum and two OTA non-producing strains of Penicillium nalgiovense and Penicillium salamii were tested. The feasibility of this approach was initially evaluated by e-nose analysis of 480 samples of both yeast extract sucrose (YES) and meat-based agar media inoculated with the tested Penicillium strains and incubated up to 14 days. The high recognition percentages (higher than 82%) obtained by Discriminant Function Analysis (DFA), both in calibration and in cross-validation (leave-more-out approach), for both YES and meat-based samples demonstrated the validity of the approach. The e-nose method was subsequently developed and validated for the analysis of dry-cured meat samples. A total of 240 e-nose analyses were carried out using inoculated sausages, seasoned by a laboratory-scale process and sampled at 5, 7, 10 and 14 days. DFA provided calibration models that permitted discrimination of dry-cured meat samples after only 5 days of seasoning, with mean recognition percentages in calibration and cross-validation of 98 and 88%, respectively. A further validation of the developed e-nose method was performed using 60 dry-cured meat samples produced by an industrial-scale seasoning process, showing a total recognition percentage of 73%.
The pattern of volatile compounds of dry-cured meat samples was identified and characterized by a developed HS-SPME/GC-MS method. Seven volatile compounds (2-methyl-1-butanol, octane, 1R-α-pinene, d-limonene, undecane, tetradecanal, 9-(Z)-octadecenoic acid methyl ester) allowed discrimination between dry-cured meat samples of classes P and NP. These results demonstrate that MOS-based electronic nose can be a useful tool for a rapid screening in preventing OTA contamination in the cured meat supply chain. Copyright © 2015 Elsevier B.V. All rights reserved.
Li, Guang; Wei, Jie; Huang, Hailiang; Gaebler, Carl Philipp; Yuan, Amy; Deasy, Joseph O
2015-12-01
To automatically estimate average diaphragm motion trajectory (ADMT) based on four-dimensional computed tomography (4DCT), facilitating clinical assessment of respiratory motion and motion variation and retrospective motion study. We have developed an effective motion extraction approach and a machine-learning-based algorithm to estimate the ADMT. Eleven patients with 22 sets of 4DCT images (4DCT1 at simulation and 4DCT2 at treatment) were studied. After automatically segmenting the lungs, the differential volume-per-slice (dVPS) curves of the left and right lungs were calculated as a function of slice number for each phase with respect to full exhalation. After a 5-slice moving average was performed, the discrete cosine transform (DCT) was applied to analyze the dVPS curves in the frequency domain. The dimensionality of the spectrum data was reduced by using several of the lowest-frequency coefficients (f_v) to account for most of the spectrum energy (Σ f_v²). The multiple linear regression (MLR) method was then applied to determine the weights of these frequencies by fitting the ground truth (the measured ADMT), represented by three pivot points of the diaphragm on each side. The leave-one-out cross-validation method was employed to analyze the statistical performance of the prediction results in three image sets: 4DCT1, 4DCT2, and 4DCT1 + 4DCT2. The seven lowest frequencies in the DCT domain were found to be sufficient to approximate the patient dVPS curves (R = 91%-96% in MLR fitting). The mean error in the predicted ADMT using the leave-one-out method was 0.3 ± 1.9 mm for the left-side diaphragm and 0.0 ± 1.4 mm for the right-side diaphragm. The prediction error is lower in 4DCT2 than 4DCT1, and is the lowest in 4DCT1 and 4DCT2 combined. This frequency-analysis-based machine learning technique was employed to predict the ADMT automatically with an acceptable error (0.2 ± 1.6 mm).
This volumetric approach is not affected by the presence of the lung tumors, providing an automatic robust tool to evaluate diaphragm motion.
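The frequency-analysis step above can be sketched as a type-II discrete cosine transform of a sampled curve, truncated to its lowest-frequency coefficients; the signal below is synthetic, not a dVPS curve.

```python
import math

# DCT-II of a sampled curve; keeping only the first m coefficients
# concentrates most of a smooth signal's energy into a few numbers.
def dct_ii(x):
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
                for t in range(n)) for k in range(n)]

def low_freq(x, m):
    """Dimensionality reduction: the m lowest-frequency coefficients."""
    return dct_ii(x)[:m]

coeffs = dct_ii([1.0, 1.0, 1.0, 1.0])  # constant (maximally smooth) signal
print(coeffs[0])                        # → 4.0 (all energy at k = 0)
```

For smooth physiological curves, the energy compaction of the DCT is what makes a handful of coefficients an adequate regression input, as in the seven-coefficient result above.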
Auditory evoked potentials in patients with major depressive disorder measured by Emotiv system.
Wang, Dongcui; Mo, Fongming; Zhang, Yangde; Yang, Chao; Liu, Jun; Chen, Zhencheng; Zhao, Jinfeng
2015-01-01
In a previous study (unpublished), the Emotiv headset was validated for capturing event-related potentials (ERPs) from normal subjects. In the present follow-up study, the signal quality of the Emotiv headset was tested by the accuracy rate of discriminating Major Depressive Disorder (MDD) patients from normal subjects. ERPs of 22 MDD patients and 15 normal subjects were induced by an auditory oddball task, and the amplitudes of the N1, N2 and P3 ERP components were specifically analyzed. The features of the ERPs were statistically investigated. It was found that the Emotiv headset is capable of discriminating the abnormal N1, N2 and P3 components in MDD patients. The Relief-F algorithm was applied to all features for feature selection. The selected features were then input to a linear discriminant analysis (LDA) classifier with leave-one-out cross-validation to characterize the ERP features of MDD. The 127 possible combinations of the selected 7 ERP features were classified using LDA. The best classification accuracy achieved was 89.66%. These results suggest that MDD patients are identifiable from normal subjects by ERPs measured with the Emotiv headset.
van Os-Medendorp, Harmieke; Appelman-Noordermeer, Simone; Bruijnzeel-Koomen, Carla; de Bruin-Weller, Marjolein
2015-03-27
Little is known about the prevalence of sick leave due to atopic dermatitis (AD). The current literature on factors influencing sick leave is mostly derived from other chronic inflammatory diseases. This study aimed to determine the prevalence of sick leave due to AD and to identify influencing factors. A cross-sectional study was carried out in adult patients with AD, measuring sick leave during the two-week and one-year periods, socio-demographic characteristics, disease severity, quality of life and socio-occupational factors. Logistic regression analyses were used to determine factors influencing sick leave over the two-week period. In total, 253 patients were included; 12% of the patients had to take sick leave in the last two weeks due to AD and 42% in the past year. A higher level of symptom interference (OR 1.26; 95% CI 1.13-1.40) or of perfectionism/diligence (OR 0.90; 95% CI 0.83-0.96) may respectively increase or decrease the number of sick leave days. Sick leave in patients with AD is a common problem, and symptom interference and perfectionism/diligence appeared to influence it. Novel approaches are needed to deal with symptoms at work or school to reduce the amount of sick leave due to AD.
Shiao, S Pamela K; Grayson, James; Yu, Chong Ho; Wasek, Brandi; Bottiglieri, Teodoro
2018-02-16
For the personalization of polygenic/omics-based health care, the purpose of this study was to examine the gene-environment interactions and predictors of colorectal cancer (CRC) by including five key genes in the one-carbon metabolism pathways. In this proof-of-concept study, we included a total of 54 families and 108 participants, 54 CRC cases and 54 matched family friends representing four major racial ethnic groups in southern California (White, Asian, Hispanic, and Black). We used three phases of data analytics, including exploratory, family-based analyses adjusting for the dependence within the family for sharing genetic heritage, the ensemble method, and generalized regression models for predictive modeling with a machine learning validation procedure to validate the results for enhanced prediction and reproducibility. The results revealed that despite the family members sharing genetic heritage, the CRC group had greater combined gene polymorphism rates than the family controls (p < 0.05) on MTHFR C677T, MTR A2756G, MTRR A66G, and DHFR 19 bp, except MTHFR A1298C. The four racial groups presented different polymorphism rates for four genes (all p < 0.05), except MTHFR A1298C. Following the ensemble method, the most influential factors were identified, and the best predictive models were generated using the generalized regression models, with Akaike's information criterion and leave-one-out cross-validation methods. Body mass index (BMI) and gender were consistent predictors of CRC for both models when individual genes versus total polymorphism counts were used, and alcohol use was interactive with BMI status. Body mass index status was also interactive with both gender and MTHFR C677T gene polymorphism, and exposure to environmental pollutants was an additional predictor. These results point to the important roles of environmental and modifiable factors in relation to gene-environment interactions in the prevention of CRC.
Amiresmaili, Mohammadreza; Khosravi, Sajad; Feyzabadi, Vahid Yazdi
2014-01-01
Background: The rural family physician program, the new reform in the Iranian health system, has been implemented since 2005. Its success depends much on physician retention. The present study aimed to identify factors influencing physicians' willingness to leave this program in Kerman province. Methods: The present cross-sectional study was performed in Kerman province in 2011. All family physicians working in this program (n = 271) were studied using a questionnaire. Data analysis was carried out using descriptive statistics and logistic regression through SPSS version 18.0. Results: Twenty-six percent (70) of the physicians had left the program in the past. In addition, 77.3% (208) intended to leave in the near future. Opportunity for continuing education, inappropriate and long working hours, unsuitable salary requirements, irregular payments, lack of job security and high working responsibility were regarded as the most important reasons for leaving the program in the past and for the intention to leave in the future, respectively. According to univariate logistic regression, younger physicians (odds ratio [OR] = 2.479; 95% confidence interval [CI]: 1.261-4.872) and physicians who had older children (OR = 4.743; 95% CI: 1.441-15.607) were more willing to leave the plan in the near future; however, this was not significant in multivariate logistic regression. Conclusions: Physician retention in the family physician program faces serious doubts for different reasons. The success of the program is endangered because of the pivotal role of human resources. Hence, a revision of the program's human resources policies seems necessary in order to reduce physician departures and improve its effectiveness. PMID:25400891
NASA Astrophysics Data System (ADS)
Sosa, Germán. D.; Cruz-Roa, Angel; González, Fabio A.
2015-01-01
This work addresses the problem of lung sound classification, in particular, the problem of distinguishing between wheeze and normal sounds. Wheezing sound detection is an important step in associating lung sounds with an abnormal state of the respiratory system, usually tuberculosis or another chronic obstructive pulmonary disease (COPD). The paper presents an approach for automatic lung sound classification, which uses different state-of-the-art sound features in combination with a C-weighted support vector machine (SVM) classifier that works better for unbalanced data. The feature extraction methods used here are commonly applied in speech recognition and related problems because they capture the most informative spectral content from the original signals. The evaluated methods were: Fourier transform (FT), wavelet decomposition using a Wavelet Packet Transform bank of filters (WPT) and Mel Frequency Cepstral Coefficients (MFCC). For comparison, we evaluated and contrasted the proposed approach against previous works using different combinations of features and/or classifiers. The different methods were evaluated on a set of lung sounds including normal and wheezing sounds. A leave-two-out per-case cross-validation approach was used, which, in each fold, chooses as validation set a couple of cases, one including normal sounds and the other including wheezing sounds. Experimental results were reported in terms of traditional classification performance measures: sensitivity, specificity and balanced accuracy. Our best results using the suggested approach, C-weighted SVM and MFCC, achieve 82.1% balanced accuracy, the best result reported for this problem so far. These results suggest that supervised classifiers based on kernel methods are able to learn better models for this challenging classification problem, even using the same feature extraction methods.
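The balanced-accuracy metric reported above, appropriate for unbalanced wheeze/normal classes, is the mean of sensitivity and specificity; the labels below are toy values, not the paper's data.

```python
# Balanced accuracy = (sensitivity + specificity) / 2, robust to class
# imbalance between positive (1, e.g. wheeze) and negative (0) examples.
def balanced_accuracy(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]   # one missed positive, no false positives
print(balanced_accuracy(y_true, y_pred))  # → 0.75
```

Unlike plain accuracy, this metric cannot be inflated by always predicting the majority class, which is why it suits the minority wheeze class here.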
Automatic tissue segmentation of head and neck MR images for hyperthermia treatment planning
NASA Astrophysics Data System (ADS)
Fortunati, Valerio; Verhaart, René F.; Niessen, Wiro J.; Veenland, Jifke F.; Paulides, Margarethus M.; van Walsum, Theo
2015-08-01
A hyperthermia treatment requires accurate, patient-specific treatment planning. This planning is based on 3D anatomical models which are generally derived from computed tomography. Because of its superior soft tissue contrast, magnetic resonance imaging (MRI) information can be introduced to improve the quality of these 3D patient models and therefore the treatment planning itself. Thus, we present here an automatic atlas-based segmentation algorithm for MR images of the head and neck. Our method combines multiatlas local weighting fusion with intensity modelling. The accuracy of the method was evaluated using a leave-one-out cross-validation experiment over a set of 11 patients for which manual delineations were available. The accuracy of the proposed method was high both in terms of the Dice similarity coefficient (DSC) and the 95th percentile Hausdorff surface distance (HSD), with median DSC higher than 0.8 for all tissues except sclera. For all tissues except the spine tissues, the accuracy approached the interobserver agreement/variability both in terms of DSC and HSD. The positive effect of adding intensity modelling to the multiatlas fusion decreased when a more accurate atlas fusion method was used. Using the proposed approach we improved the performance of the approach previously presented for H&N hyperthermia treatment planning, making the method suitable for clinical application.
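The Dice similarity coefficient used above to score overlap between automatic and manual segmentations can be sketched over voxel index sets; the two masks below are toy examples.

```python
# DSC = 2|A ∩ B| / (|A| + |B|) over voxel sets; 1.0 means perfect overlap.
def dice(a, b):
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

auto   = {(0, 0), (0, 1), (1, 0), (1, 1)}   # hypothetical automatic mask
manual = {(0, 1), (1, 0), (1, 1), (2, 1)}   # hypothetical manual delineation
print(dice(auto, manual))  # → 0.75
```

DSC is a volume-overlap measure, which is why it is usually paired (as above) with a surface distance such as the 95th percentile Hausdorff distance to catch boundary errors that overlap alone can hide.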
NASA Astrophysics Data System (ADS)
Li, Ke; Ye, Chuyang; Yang, Zhen; Carass, Aaron; Ying, Sarah H.; Prince, Jerry L.
2016-03-01
Cerebellar peduncles (CPs) are white matter tracts connecting the cerebellum to other brain regions. Automatic segmentation methods for the CPs have been proposed for studying their structure and function. Usually the performance of these methods is evaluated by comparing segmentation results with manual delineations (ground truth). However, when a segmentation method is run on new data (for which no ground truth exists), it is highly desirable to efficiently detect and assess algorithm failures so that these cases can be excluded from scientific analysis. In this work, two outlier detection methods aimed at assessing the performance of an automatic CP segmentation algorithm are presented. The first is a univariate non-parametric method using a box-whisker plot. We first categorize automatic segmentation results of a dataset of diffusion tensor imaging (DTI) scans from 48 subjects as either a success or a failure. We then design three groups of features from the image data of nine categorized failures for failure detection. Results show that most of these features can efficiently detect the true failures. The second method, supervised classification, was employed on a larger DTI dataset of 249 manually categorized subjects. Four classifiers, linear discriminant analysis (LDA), logistic regression (LR), support vector machine (SVM), and random forest classification (RFC), were trained using the designed features and evaluated using leave-one-out cross-validation. Results show that LR performs worst among the four classifiers and the other three perform comparably, which demonstrates the feasibility of automatically detecting segmentation failures using classification methods.
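The box-whisker detection method rests on Tukey's fences: values beyond 1.5 times the interquartile range outside the quartiles are flagged. The sketch below uses crude index-based quartiles rather than a full quantile estimator, and the feature values are invented.

```python
# Flag values beyond 1.5 * IQR outside the quartiles (Tukey's rule),
# a simplified univariate box-whisker outlier detector.
def tukey_outliers(values, k=1.5):
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]   # crude quartile positions
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

print(tukey_outliers([1, 2, 3, 4, 5, 100]))  # → [100]
```

Being non-parametric, the rule needs no distributional assumption about the segmentation feature, which matches its use here as a first-pass failure detector.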
CoMSIA and Docking Study of Rhenium Based Estrogen Receptor Ligand Analogs
Wolohan, Peter; Reichert, David E.
2007-01-01
OPLS all-atom force field parameters were developed in order to model a diverse set of novel rhenium based estrogen receptor ligands whose relative binding affinities (RBA) to the estrogen receptor alpha isoform (ERα) with respect to 17β-Estradiol were available. The binding properties of these novel rhenium based organometallic complexes were studied with a combination of Comparative Molecular Similarity Indices Analysis (CoMSIA) and docking. A total of 29 estrogen receptor ligands, consisting of 11 rhenium complexes and 18 organic ligands, were docked inside the ligand-binding domain (LBD) of ERα utilizing the program Gold. The top-ranked pose was used to construct CoMSIA models from a training set of 22 of the estrogen receptor ligands, which were selected at random. In addition, scoring functions from the docking runs and the polar volume (PV) were also studied to investigate their ability to predict ERα RBA. A partial least-squares analysis consisting of the CoMSIA steric, electrostatic and hydrophobic indices together with the polar volume proved sufficiently predictive, having a correlation coefficient, r², of 0.94 and a cross-validated correlation coefficient, q², obtained by the leave-one-out method, of 0.68. Analysis of the scoring functions from Gold showed particularly poor correlation with ERα RBA, which did not improve when the rhenium complexes were excluded to leave only the organic ligands. The combined CoMSIA and polar volume model correctly ranked the ligands in order of increasing ERα RBA, illustrating the utility of this method as a prescreening tool in the development of novel rhenium based estrogen receptor ligands. PMID:17280694
A prediction scheme of tropical cyclone frequency based on lasso and random forest
NASA Astrophysics Data System (ADS)
Tan, Jinkai; Liu, Hexiang; Li, Mengya; Wang, Jun
2017-07-01
This study proposes a novel prediction scheme for tropical cyclone frequency (TCF) over the Western North Pacific (WNP). We considered large-scale meteorological factors including sea surface temperature, sea level pressure, the Niño-3.4 index, wind shear, vorticity, the subtropical high, and sea ice cover, since the gradual change of these factors in the context of climate change would cause a gradual variation of the annual TCF. Specifically, we focus on the correlation between the year-to-year increments of these factors and of TCF. The least absolute shrinkage and selection operator (Lasso) method was used for variable selection and dimension reduction from 11 initial predictors. Then, a prediction model based on random forest (RF) was established using the training samples (1978-2011) for calibration and the testing samples (2012-2016) for validation. The RF model reproduced the major variation and trend of TCF in the calibration period, and also fitted well with the observed TCF in the validation period, though with some deviations. Leave-one-out cross validation of the model showed that most of the predicted TCF values are consistent with the observed TCF, with a high correlation coefficient. A comparison between the results of the RF model and a multiple linear regression (MLR) model suggested that the RF model is more practical and capable of giving reliable TCF predictions over the WNP.
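The two-stage scheme described here (Lasso screening of 11 predictors, then a random forest fit on the survivors) can be sketched as follows. This is a minimal illustration, not the authors' code; the predictor matrix and coefficients are simulated.

```python
# Stage 1: Lasso selects predictors from 11 candidates.
# Stage 2: a random forest is trained on the selected predictors only.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_years, n_predictors = 34, 11          # 1978-2011 calibration period
X = rng.normal(size=(n_years, n_predictors))
# simulated year-to-year TCF increments driven by two of the predictors
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.3, size=n_years)

lasso = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # predictors with nonzero Lasso weight
print("selected predictor indices:", selected)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[:, selected], y)
print("in-sample R^2:", round(rf.score(X[:, selected], y), 3))
```

Lasso's L1 penalty drives uninformative coefficients exactly to zero, which is what makes it usable as a dimension-reduction step before the forest.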
Choi, Bongsam
2018-01-01
[Purpose] This study aimed to cross-culturally adapt and validate the Korean version of a physical activity measure (K-PAM) for community-dwelling elderly. [Subjects and Methods] One hundred and thirty-eight community-dwelling elderly people, 32 male and 106 female, participated in the study. All participants were asked to fill out a fifty-one-item questionnaire measuring perceived difficulty in the activities of daily living (ADL) for the elderly. A one-parameter item response theory model (Rasch analysis) was applied to determine the construct validity and to inspect item-level psychometric properties of the 51 ADL items of the K-PAM. [Results] Person separation reliability (analogous to Cronbach's alpha) for internal consistency ranged from 0.93 to 0.94. A total of 16 items were misfit to the Rasch model. After deletion of the misfit items, the remaining 35 ADL items of the K-PAM formed an empirically meaningful hierarchy from easy to hard. The item-person map analysis showed that item difficulty was well matched to elderly people of moderate and low ability, except for high ceilings. [Conclusion] The cross-culturally adapted K-PAM was shown to be sufficient for establishing construct validity, with stable psychometric properties confirmed by person separation reliability and fit statistics.
Harmony Search as a Powerful Tool for Feature Selection in QSPR Study of the Drugs Lipophilicity.
Bahadori, Behnoosh; Atabati, Morteza
2017-01-01
Lipophilicity represents one of the most studied and most frequently used fundamental physicochemical properties. In the present work, the harmony search (HS) algorithm is suggested for feature selection in quantitative structure-property relationship (QSPR) modeling to predict the lipophilicity, determined by UHPLC, of neutral, acidic, basic and amphoteric drugs. Harmony search is a music-based metaheuristic optimization algorithm, inspired by the observation that the aim of music is to search for a perfect state of harmony. Semi-empirical quantum-chemical calculations at the AM1 level were used to find the optimum 3D geometry of the studied molecules, and 1497 descriptors were calculated with the Dragon software. The nine descriptors selected by the harmony search algorithm were applied to model development using multiple linear regression (MLR). In comparison with other feature selection methods such as genetic algorithms and simulated annealing, the harmony search algorithm gave better results. The root mean square errors (RMSE) with and without leave-one-out cross validation (LOOCV) were 0.417 and 0.302, respectively. The results were compared with those obtained from the genetic algorithm and simulated annealing methods and showed that HS is a helpful tool for feature selection with fine performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, J; Gong, G; Cui, Y
Purpose: To predict early pathological response of breast cancer to neoadjuvant chemotherapy (NAC) based on quantitative, multi-region analysis of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Methods: In this institutional review board-approved study, 35 patients diagnosed with stage II/III breast cancer were retrospectively investigated using DCE-MR images acquired before and after the first cycle of NAC. First, principal component analysis (PCA) was used to reduce the dimensionality of the high-temporal-resolution DCE-MRI data. We then partitioned the whole tumor into multiple subregions using k-means clustering based on the PCA-defined eigenmaps. Within each tumor subregion, we extracted four quantitative Haralick texture features based on the gray-level co-occurrence matrix (GLCM). The change in texture features in each tumor subregion between pre- and during-NAC scans was used to predict pathological complete response after NAC. Results: Three tumor subregions were identified through clustering, each with distinct enhancement characteristics. In univariate analysis, all imaging predictors except one extracted from the tumor subregion associated with fast wash-out were statistically significant (p < 0.05) after correcting for multiple testing, with areas under the ROC curve (AUCs) between 0.75 and 0.80. In multivariate analysis, the proposed imaging predictors achieved an AUC of 0.79 (p = 0.002) in leave-one-out cross validation. This improved upon conventional imaging predictors such as tumor volume (AUC = 0.53) and texture features based on whole-tumor analysis (AUC = 0.65). Conclusion: The heterogeneity of the tumor subregion associated with fast wash-out on DCE-MRI predicted early pathological response to neoadjuvant chemotherapy in breast cancer.
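The tumor-partitioning step (PCA on voxel time-courses followed by k-means into three subregions) can be illustrated on mock data. This is a hedged sketch of only that step; the three enhancement patterns below are invented stand-ins for the real kinetic classes.

```python
# PCA-defined eigenmaps + k-means partitioning of synthetic DCE time-courses.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_voxels, n_timepoints = 500, 40
t = np.linspace(0, 1, n_timepoints)
# three mock enhancement patterns: fast wash-out, slow uptake, plateau
patterns = np.stack([np.exp(-3 * t), t, np.minimum(2 * t, 0.8)])
labels_true = rng.integers(0, 3, n_voxels)
curves = patterns[labels_true] + rng.normal(scale=0.05,
                                            size=(n_voxels, n_timepoints))

# dimensionality reduction of the high-temporal-resolution data, then clustering
eigenmaps = PCA(n_components=3).fit_transform(curves)
subregions = KMeans(n_clusters=3, n_init=10,
                    random_state=0).fit_predict(eigenmaps)
print("voxels per subregion:", np.bincount(subregions))
```

In the real pipeline each cluster label would be mapped back to voxel coordinates to define a spatial subregion, from which GLCM texture features are then extracted.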
QSPR for predicting chloroform formation in drinking water disinfection.
Luilo, G B; Cabaniss, S E
2011-01-01
Chlorination is the most widely used technique for water disinfection, but may lead to the formation of chloroform (trichloromethane; TCM) and other by-products. This article reports the first quantitative structure-property relationship (QSPR) for predicting the formation of TCM in chlorinated drinking water. Model compounds (n = 117) drawn from 10 literature sources were divided into training data (n = 90, analysed by five-way leave-many-out internal cross-validation) and external validation data (n = 27). QSPR internal cross-validation had Q² = 0.94 and root mean square error (RMSE) of 0.09 moles TCM per mole compound, consistent with external validation Q² of 0.94 and RMSE of 0.08 moles TCM per mole compound, and met criteria for high predictive power and robustness. In contrast, the log TCM QSPR performed poorly and did not meet the criteria for predictive power. The QSPR predictions were consistent with experimental values for TCM formation from tannic acid and for model fulvic acid structures. The descriptors used are consistent with a relatively small number of important TCM precursor structures based upon 1,3-dicarbonyls or 1,3-diphenols.
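The validation protocol described (five-way leave-many-out internal cross-validation reporting Q² and RMSE) amounts to 5-fold cross-validation of a regression model. A minimal sketch on simulated descriptors, not the published QSPR:

```python
# Five-way leave-many-out (5-fold) cross-validation of a linear QSPR,
# reporting Q^2 and RMSE. Descriptors and responses are simulated.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 4))            # 90 training compounds, 4 descriptors
y = X @ np.array([0.3, -0.2, 0.1, 0.05]) + rng.normal(scale=0.05, size=90)

cv = KFold(n_splits=5, shuffle=True, random_state=1)
pred = cross_val_predict(LinearRegression(), X, y, cv=cv)

# Q^2: fraction of variance explained by the cross-validated predictions
q2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
rmse = np.sqrt(np.mean((y - pred) ** 2))
print(f"Q^2 = {q2:.3f}, RMSE = {rmse:.3f}")
```

Unlike the fitted R², Q² is computed from predictions made on held-out folds only, which is why it is the standard robustness criterion in QSPR work.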
Kantsadi, Anastassia L; Parmenopoulou, Vanessa; Bakalov, Dimitar N; Snelgrove, Laura; Stravodimos, George A; Chatzileontiadou, Demetra S M; Manta, Stella; Panagiotopoulou, Angeliki; Hayes, Joseph M; Komiotis, Dimitri; Leonidas, Demetres D
2015-01-01
Glycogen phosphorylase (GP), a validated target for the development of anti-hyperglycaemic agents, has been targeted for the design of novel glycopyranosylamine inhibitors. Exploiting the two most potent inhibitors from our previous study of N-acyl-β-D-glucopyranosylamines (Parmenopoulou et al., Bioorg. Med. Chem. 2014, 22, 4810), we have extended the linking group to -NHCONHCO- between the glucose moiety and the aliphatic/aromatic substituent in the GP catalytic site β-cavity. The N-acyl-N´-(β-D-glucopyranosyl) urea inhibitors were synthesized and their efficiency assessed by biochemical methods, revealing inhibition constant values of 4.95 µM and 2.53 µM. Crystal structures of GP in complex with these inhibitors were determined and analyzed, providing data for further structure based design efforts. A novel Linear Response - Molecular Mechanics Coulomb Surface Area (LR-MM-CBSA) method has been developed which relates predicted and experimental binding free energies for a training set of N-acyl-N´-(β-D-glucopyranosyl) urea ligands with a correlation coefficient R(2) of 0.89 and leave-one-out cross-validation (LOO-cv) Q(2) statistic of 0.79. The method has significant applications to direct future lead optimization studies, where ligand entropy loss on binding is revealed as a key factor to be considered. ADMET property predictions revealed that apart from potential permeability issues, the synthesized N-acyl-N´-(β-D-glucopyranosyl) urea inhibitors have drug-like potential without any toxicity warnings.
Genomic Prediction of Seed Quality Traits Using Advanced Barley Breeding Lines.
Nielsen, Nanna Hellum; Jahoor, Ahmed; Jensen, Jens Due; Orabi, Jihad; Cericola, Fabio; Edriss, Vahid; Jensen, Just
2016-01-01
Genomic selection was recently introduced in plant breeding. The objective of this study was to develop genomic prediction for important seed quality parameters in spring barley. The aim was to predict breeding values without expensive phenotyping of large sets of lines. A total of 309 advanced spring barley lines tested at two locations, each with three replicates, were phenotyped, and each line was genotyped with the Illumina iSelect 9K barley chip. The population originated from two different breeding sets, which were phenotyped in two different years. Phenotypic measurements considered were: seed size, protein content, protein yield, test weight and ergosterol content. A leave-one-out cross-validation strategy revealed high prediction accuracies ranging between 0.40 and 0.83. Prediction across breeding sets resulted in reduced accuracies compared to the leave-one-out strategy. Furthermore, predicting across full- and half-sib families resulted in reduced prediction accuracies. Additionally, predictions were performed using reduced marker sets and reduced training population sets. In conclusion, using fewer than 200 lines in the training set can result in low prediction accuracy, and the accuracy will then be highly dependent on the family structure of the selected training set. However, the results also indicate that relatively small training sets (200 lines) are sufficient for genomic prediction in commercial barley breeding. In addition, our results indicate a minimum marker set of 1,000 to decrease the risk of low prediction accuracy for some traits or some families.
Luo, Heng; Ye, Hao; Ng, Hui; Shi, Leming; Tong, Weida; Mattes, William; Mendrick, Donna; Hong, Huixiao
2015-01-01
As the major histocompatibility complex (MHC), human leukocyte antigens (HLAs) are among the most polymorphic genes in humans. Patients carrying certain HLA alleles may develop adverse drug reactions (ADRs) after taking specific drugs. Peptides play an important role in HLA-related ADRs as they are the necessary co-binders of HLAs with drugs. A large amount of experimental data has been generated for understanding HLA-peptide binding. However, efficiently utilizing these data for understanding and accurately predicting HLA-peptide binding is challenging. Therefore, we developed a network analysis based method to understand and predict HLA-peptide binding. Qualitative Class I HLA-peptide binding data were harvested and prepared from four major databases. An HLA-peptide binding network was constructed from this dataset and modules were identified by the fast greedy modularity optimization algorithm. To examine the significance of signals in the yielded models, the modularity was compared with the modularity values generated from 1,000 random networks. The peptides and HLAs in the modules were characterized by similarity analysis. The neighbor-edges based and unbiased leverage algorithm (Nebula) was developed for predicting HLA-peptide binding. Leave-one-out (LOO) validations and two-fold cross-validations were conducted to evaluate the performance of Nebula using the constructed HLA-peptide binding network. Nine modules were identified from analyzing the HLA-peptide binding network, with a modularity higher than that of all the random networks. Peptide length and functional side chains of amino acids at certain positions of the peptides differed among the modules. HLA sequences were module dependent to some extent. Nebula achieved an overall prediction accuracy of 0.816 in the LOO validations and an average accuracy of 0.795 in the two-fold cross-validations, and outperformed the method reported in the literature.
Network analysis is a useful approach for analyzing large and sparse datasets such as the HLA-peptide binding dataset. The modules identified from the network analysis clustered peptides and HLAs with similar sequences and properties of amino acids. Nebula performed well in the predictions of HLA-peptide binding. We demonstrated that network analysis coupled with Nebula is an efficient approach to understand and predict HLA-peptide binding interactions and thus, could further our understanding of ADRs.
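The module-finding step (greedy modularity optimization on a binding network) can be illustrated with NetworkX. The graph below is a random mock-up of HLA-peptide binding, not the real dataset, and the node names are hypothetical.

```python
# Greedy modularity optimization on a mock HLA-peptide binding network.
import random
import networkx as nx
from networkx.algorithms.community import (greedy_modularity_communities,
                                           modularity)

random.seed(0)
G = nx.Graph()
hlas = [f"HLA{i}" for i in range(6)]
peptides = [f"pep{i}" for i in range(30)]
for p in peptides:
    # each mock peptide binds two randomly chosen HLAs
    for h in random.sample(hlas, 2):
        G.add_edge(h, p)

communities = list(greedy_modularity_communities(G))
q = modularity(G, communities)
print(f"{len(communities)} modules, modularity = {q:.3f}")
```

In the study, the observed modularity was judged significant by comparison with 1,000 random networks; the same check could be run here by rewiring the mock graph repeatedly and recomputing q.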
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ren, Shangjie; Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California; Hara, Wendy
Purpose: To develop a reliable method to estimate electron density based on anatomic magnetic resonance imaging (MRI) of the brain. Methods and Materials: We proposed a unifying multi-atlas approach for electron density estimation based on standard T1- and T2-weighted MRI. First, a composite atlas was constructed through a voxelwise matching process using multiple atlases, with the goal of mitigating effects of inherent anatomic variations between patients. Next we computed for each voxel 2 kinds of conditional probabilities: (1) electron density given its image intensity on T1- and T2-weighted MR images; and (2) electron density given its spatial location in a reference anatomy, obtained by deformable image registration. These were combined into a unifying posterior probability density function using the Bayesian formalism, which provided the optimal estimates for electron density. We evaluated the method on 10 patients using leave-one-patient-out cross-validation. Receiver operating characteristic analyses for detecting different tissue types were performed. Results: The proposed method significantly reduced the errors in electron density estimation, with a mean absolute Hounsfield unit error of 119, compared with 140 and 144 (P<.0001) using conventional T1-weighted intensity and geometry-based approaches, respectively. For detection of bony anatomy, the proposed method achieved an 89% area under the curve, 86% sensitivity, 88% specificity, and 90% accuracy, which improved upon intensity and geometry-based approaches (area under the curve: 79% and 80%, respectively). Conclusion: The proposed multi-atlas approach provides robust electron density estimation and bone detection based on anatomic MRI. If validated on a larger population, our work could enable the use of MRI as a primary modality for radiation treatment planning.
Xia, Jie; Jin, Hongwei; Liu, Zhenming; Zhang, Liangren; Wang, Xiang Simon
2014-05-27
Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could bring the biases to the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCRs targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the "artificial enrichment" and "analogue bias" of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
NASA Astrophysics Data System (ADS)
Cortesi, N.; Trigo, R.; Gonzalez-Hidalgo, J. C.; Ramos, A. M.
2012-06-01
Precipitation over the Iberian Peninsula (IP) is highly variable and shows large spatial contrasts between wet mountainous regions, to the north, and dry regions in the inland plains and southern areas. In this work, a high-density monthly precipitation dataset for the IP was coupled with a set of 26 atmospheric circulation weather types (Trigo and DaCamara, 2000) to reconstruct Iberian monthly precipitation from October to May at the very high resolution of 3030 precipitation series (overall mean density of one station per 200 km²). A stepwise linear regression model with forward selection was used to develop the monthly reconstructed precipitation series, calibrated and validated over the 1948-2003 period. Validation was conducted by means of a leave-one-out cross-validation over the calibration period. The results show good model performance for the selected months, with a mean coefficient of variation (CV) around 0.6 for the validation period, being particularly robust over the western and central sectors of the IP, while the predicted values in the Mediterranean and northern coastal areas are less accurate. For three long station records (Lisbon, Madrid and Valencia) we show the comparison between model and original data as an example of how these models can be used to obtain monthly precipitation fields since the 1850s over most of the IP for this very high density network.
Detection of degenerative change in lateral projection cervical spine x-ray images
NASA Astrophysics Data System (ADS)
Jebri, Beyrem; Phillips, Michael; Knapp, Karen; Appelboam, Andy; Reuben, Adam; Slabaugh, Greg
2015-03-01
Degenerative changes to the cervical spine can be accompanied by neck pain, which can result from narrowing of the intervertebral disc space and growth of osteophytes. In a lateral x-ray image of the cervical spine, degenerative changes are characterized by vertebral bodies that have indistinct boundaries and limited spacing between vertebrae. In this paper, we present a machine learning approach to detect and localize degenerative changes in lateral x-ray images of the cervical spine. Starting from a user-supplied set of points in the center of each vertebral body, we fit a central spline, from which a region of interest is extracted and image features are computed. A Random Forest classifier labels regions as degenerative change or normal. Leave-one-out cross-validation studies performed on a dataset of 103 patients demonstrate accuracy above 95%.
Basavanhally, Ajay; Viswanath, Satish; Madabhushi, Anant
2015-01-01
Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on both synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets. PMID:25993029
NASA Astrophysics Data System (ADS)
Mansuy, N. R.; Paré, D.; Thiffault, E.
2015-12-01
Large-scale mapping of soil properties is increasingly important for environmental resource management. While forested areas play critical environmental roles at local and global scales, forest soil maps are typically at low resolution. The objective of this study was to generate continuous national maps of selected soil variables (C, N and soil texture) for the Canadian managed forest landbase at 250 m resolution. We produced these maps using the kNN method with a training dataset of 538 ground plots from the National Forest Inventory (NFI) across Canada, and 18 environmental predictor variables. The best predictor variables (7 topographic and 5 climatic variables) were selected using the Least Absolute Shrinkage and Selection Operator method. On average, for all soil variables, topographic predictors explained 37% of the total variance versus 64% for the climatic predictors. The relative root mean square error (RMSE%) calculated with the leave-one-out cross-validation method gave values ranging between 22% and 99%, depending on the soil variable tested. RMSE values below 40% can be considered a good imputation in light of the low density of points used in this study. The study demonstrates strong capabilities for mapping forest soil properties at 250 m resolution, compared with the current Soil Landscape of Canada System, which is largely oriented towards the agricultural landbase. The methodology used here can potentially contribute to the national and international need for spatially explicit soil information in resource management science.
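The kNN imputation with leave-one-out relative RMSE (RMSE%) described here can be sketched as follows. This is a hedged reconstruction on fabricated plot data, not the study's workflow; the predictor count mirrors the 7 topographic + 5 climatic variables.

```python
# kNN imputation of a soil property from 12 environmental predictors,
# scored by leave-one-out relative RMSE (RMSE%). Data are simulated.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n_plots = 200
X = rng.normal(size=(n_plots, 12))    # stand-ins for topographic + climatic predictors
# a mock soil variable (e.g. carbon) driven by two of the predictors
y = 50 + 10 * X[:, 0] - 5 * X[:, 7] + rng.normal(scale=3, size=n_plots)

pred = cross_val_predict(KNeighborsRegressor(n_neighbors=5), X, y,
                         cv=LeaveOneOut())
rmse_pct = 100 * np.sqrt(np.mean((y - pred) ** 2)) / y.mean()
print(f"RMSE% = {rmse_pct:.1f}")
```

Expressing the error relative to the mean of the target, as the study does, lets very different soil variables (C, N, texture fractions) be compared on one scale.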
Protein Sub-Nuclear Localization Prediction Using SVM and Pfam Domain Information
Kumar, Ravindra; Jain, Sohni; Kumari, Bandana; Kumar, Manish
2014-01-01
The nucleus is the largest and most highly organized organelle of eukaryotic cells. Within the nucleus exist a number of pseudo-compartments, which are not separated by any membrane, yet each of them contains only a specific set of proteins. Understanding protein sub-nuclear localization can hence be an important step towards understanding the biological functions of the nucleus. Here we describe SubNucPred, a method we developed for predicting the sub-nuclear localization of proteins. This method predicts protein localization for 10 different sub-nuclear locations sequentially by combining the presence or absence of unique Pfam domains with an amino acid composition based SVM model. The prediction accuracy during leave-one-out cross-validation was 85.05% for centromeric proteins, 76.85% for chromosomal proteins, 81.27% for nuclear speckle proteins, 81.79% for nucleolar proteins, 79.37% for nuclear envelope proteins, 77.78% for nuclear matrix proteins, 76.98% for nucleoplasm proteins, 88.89% for nuclear pore complex proteins, 75.40% for PML body proteins and 83.33% for telomeric proteins. Comparison with other reported methods showed that SubNucPred performs better than existing methods. A web-server for predicting protein sub-nuclear localization, named SubNucPred, has been established at http://14.139.227.92/mkumar/subnucpred/. A standalone version of SubNucPred can also be downloaded from the web-server. PMID:24897370
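One building block named in this abstract, an SVM on 20-dimensional amino acid composition evaluated by leave-one-out cross-validation, can be sketched as below. The sequences are mock data with an invented compositional bias, not real nuclear proteins.

```python
# SVM on amino acid composition with leave-one-out cross-validation.
# Sequences are synthetic; class 1 is artificially enriched in K/R.
import numpy as np
from collections import Counter
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

AA = "ACDEFGHIKLMNPQRSTVWY"
rng = np.random.default_rng(0)

def aa_composition(seq):
    """Fraction of each of the 20 amino acids in the sequence."""
    counts = Counter(seq)
    return np.array([counts[a] / len(seq) for a in AA])

def mock_seq(bias):
    """Random 200-residue sequence, optionally biased toward K and R."""
    p = np.ones(20)
    p[AA.index("K")] += bias
    p[AA.index("R")] += bias
    return "".join(rng.choice(list(AA), size=200, p=p / p.sum()))

seqs = [mock_seq(0) for _ in range(30)] + [mock_seq(5) for _ in range(30)]
X = np.array([aa_composition(s) for s in seqs])
y = np.array([0] * 30 + [1] * 30)

acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=LeaveOneOut()).mean()
print(f"LOO accuracy: {acc:.2f}")
```

SubNucPred additionally gates each location on the presence of unique Pfam domains before the composition-based SVM; that lookup step is omitted here.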
Multivariate reference technique for quantitative analysis of fiber-optic tissue Raman spectroscopy.
Bergholt, Mads Sylvest; Duraipandian, Shiyamala; Zheng, Wei; Huang, Zhiwei
2013-12-03
We report a novel method that uses multivariate reference signals (fused silica and sapphire Raman signals generated by a ball-lens fiber-optic Raman probe) for quantitative analysis of in vivo tissue Raman measurements in real time. Partial least-squares (PLS) regression modeling is applied to extract the characteristic internal reference Raman signals (e.g., the shoulder of the prominent fused silica boson peak (~130 cm⁻¹); distinct sapphire ball-lens peaks (380, 417, 646, and 751 cm⁻¹)) from the ball-lens fiber-optic Raman probe for quantitative analysis of fiber-optic Raman spectroscopy. To evaluate the analytical value of this novel multivariate reference technique, a rapid Raman spectroscopy system coupled with a ball-lens fiber-optic Raman probe was used for in vivo oral tissue Raman measurements (n = 25 subjects) under 785 nm laser excitation powers ranging from 5 to 65 mW. An accurate linear relationship (R² = 0.981) with a root-mean-square error of cross validation (RMSECV) of 2.5 mW was obtained for predicting laser excitation power changes based on a leave-one-subject-out cross-validation, which is superior to the normal univariate reference method (RMSE = 6.2 mW). A root-mean-square error of prediction (RMSEP) of 2.4 mW (R² = 0.985) was also achieved for real-time laser power prediction when we applied the multivariate method independently to five new subjects (n = 166 spectra). We further apply the multivariate reference technique to quantitative analysis of gelatin tissue phantoms, which gives an RMSEP of ~2.0% (R² = 0.998) independent of laser excitation power variations.
This work demonstrates that the multivariate reference technique can be advantageously used to monitor and correct for variations in laser excitation power and fiber coupling efficiency in situ, standardizing the tissue Raman intensity to realize quantitative analysis of tissue Raman measurements in vivo, which is particularly appealing in challenging Raman endoscopic applications.
Computer-aided detection of bladder mass within non-contrast-enhanced region of CT Urography (CTU)
NASA Astrophysics Data System (ADS)
Cha, Kenny H.; Hadjiiski, Lubomir M.; Chan, Heang-Ping; Caoili, Elaine M.; Cohan, Richard H.; Weizer, Alon; Zhou, Chuan
2016-03-01
We are developing a computer-aided detection system for bladder cancer in CT urography (CTU). We have previously developed methods for detection of bladder masses within the contrast-enhanced region of the bladder. In this study, we investigated methods for detection of bladder masses within the non-contrast-enhanced region. The bladder was first segmented using a newly developed deep-learning convolutional neural network in combination with level sets. The non-contrast-enhanced region was separated from the contrast-enhanced region with a maximum-intensity-projection-based method. The non-contrast region was smoothed and a gray-level threshold was employed to segment the bladder wall and potential masses. The bladder wall was transformed into a straightened thickness profile, which was analyzed to identify lesion candidates as a prescreening step. The lesion candidates were segmented using our auto-initialized cascaded level set (AI-CALS) segmentation method, and 27 morphological features were extracted for each candidate. Stepwise feature selection with simplex optimization and leave-one-case-out resampling were used for training and validation of a false-positive (FP) classifier. In each leave-one-case-out cycle, features were selected from the training cases and a linear discriminant analysis (LDA) classifier was designed to merge the selected features into a single score for classification of the left-out test case. A data set of 33 cases with 42 biopsy-proven lesions in the non-contrast-enhanced region was collected. During prescreening, the system obtained 83.3% sensitivity at an average of 2.4 FPs/case. After feature extraction and FP reduction by LDA, the system achieved 81.0% sensitivity at 2.0 FPs/case, and 73.8% sensitivity at 1.5 FPs/case.
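The leave-one-case-out scheme described above (retraining the classifier with each case held out, so that every candidate from a patient is scored by a model that never saw that patient) can be sketched as follows. This is a minimal illustration on synthetic candidate features, not the authors' pipeline: a plain two-class Fisher LDA stands in for their stepwise-feature-selection-plus-LDA classifier, and the data, case grouping, and median-score decision threshold are all invented for the demo.

```python
import numpy as np

def fisher_lda_weights(X, y):
    # Two-class Fisher LDA: w = Sw^-1 (mu1 - mu0), lightly regularized.
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)
    return np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), mu1 - mu0)

def leave_one_case_out_scores(X, y, case_ids):
    # Score each candidate with an LDA trained on all OTHER cases.
    scores = np.empty(len(y))
    for case in np.unique(case_ids):
        test = case_ids == case
        w = fisher_lda_weights(X[~test], y[~test])  # train without this case
        scores[test] = X[test] @ w                  # score the left-out case
    return scores

rng = np.random.default_rng(0)
n = 120
y = rng.integers(0, 2, n)                      # 0 = FP candidate, 1 = lesion
X = rng.normal(size=(n, 4)) + 1.5 * y[:, None] # class-1 features shifted
cases = np.repeat(np.arange(30), 4)            # 4 candidates per case
s = leave_one_case_out_scores(X, y, cases)
acc = ((s > np.median(s)).astype(int) == y).mean()
```

The key property is that the held-out case contributes nothing to the class means or pooled scatter used to score it, so the accuracy estimate is not optimistically biased by per-case leakage.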
Spatial-temporal features of thermal images for Carpal Tunnel Syndrome detection
NASA Astrophysics Data System (ADS)
Estupinan Roldan, Kevin; Ortega Piedrahita, Marco A.; Benitez, Hernan D.
2014-02-01
Disorders associated with repeated trauma account for about 60% of all occupational illnesses, with Carpal Tunnel Syndrome (CTS) being the most frequently consulted today. Infrared Thermography (IT) has come to play an important role in the field of medicine. IT is non-invasive and detects diseases based on measuring temperature variations. IT represents a possible alternative to prevalent methods for diagnosis of CTS (i.e., nerve conduction studies and electromyography). This work presents a set of spatial-temporal features extracted from thermal images taken from healthy and ill patients. Support Vector Machine (SVM) classifiers are tested on this feature space using Leave-One-Out (LOO) validation error. The results of the proposed approach show linear separability and lower validation errors when compared to features used in previous works that do not account for spatial variability of temperature.
Yu, Shaohui; Xiao, Xue; Ding, Hong; Xu, Ge; Li, Haixia; Liu, Jing
2017-08-05
Quantitative analysis is very difficult in excitation-emission fluorescence spectroscopy of multi-component mixtures whose fluorescence peaks overlap severely. As an effective method for quantitative analysis, partial least squares can extract latent variables from both the independent and the dependent variables, so it can model multiple correlations between variables. However, several factors usually affect the prediction results of partial least squares, such as noise and the distribution and number of samples in the calibration set. This work focuses on the calibration-set problems mentioned above. Firstly, the outliers in the calibration set are removed by leave-one-out cross-validation. Then, according to two different prediction requirements, the EWPLS method and the VWPLS method are proposed. The independent and dependent variables are weighted in the EWPLS method by the maximum error of the recovery rate and in the VWPLS method by the maximum variance of the recovery rate. Three organic compounds with severely overlapping excitation-emission fluorescence spectra are selected for the experiments. The step adjustment parameter, the iteration number and the sample amount in the calibration set are discussed. The results show that the EWPLS and VWPLS methods are superior to the PLS method, especially in the case of small calibration sets. Copyright © 2017 Elsevier B.V. All rights reserved.
Douglas, R K; Nawar, S; Alamar, M C; Mouazen, A M; Coulon, F
2018-03-01
Visible and near infrared spectrometry (vis-NIRS) coupled with data mining techniques can offer fast and cost-effective quantitative measurement of total petroleum hydrocarbons (TPH) in contaminated soils. The literature, however, shows significant differences in the performance of vis-NIRS between linear and non-linear calibration methods. This study compared the performance of linear partial least squares regression (PLSR) with non-linear random forest (RF) regression for the calibration of vis-NIRS when analysing TPH in soils. A total of 88 soil samples (3 uncontaminated and 85 contaminated) collected from three sites located in the Niger Delta were scanned using an analytical spectral device (ASD) spectrophotometer (350-2500 nm) in diffuse reflectance mode. Sequential ultrasonic solvent extraction-gas chromatography (SUSE-GC) was used as the reference quantification method for TPH, which equals the sum of the aliphatic and aromatic fractions ranging between C10 and C35. Prior to model development, spectra were subjected to pre-processing including noise cut, maximum normalization, first derivative and smoothing. Then 65 samples were selected as the calibration set and the remaining 20 samples as the validation set. Both vis-NIR spectrometry and gas chromatography profiles of the 85 soil samples were subjected to RF and PLSR with leave-one-out cross-validation (LOOCV) for the calibration models. Results showed that the RF calibration model, with a coefficient of determination (R²) of 0.85, a root mean square error of prediction (RMSEP) of 68.43 mg kg⁻¹, and a residual prediction deviation (RPD) of 2.61, outperformed PLSR (R² = 0.63, RMSEP = 107.54 mg kg⁻¹ and RPD = 2.55) in cross-validation. These results indicate that the RF modelling approach accounts for the nonlinearity of the soil spectral responses, hence providing significantly higher prediction accuracy compared to the linear PLSR.
It is recommended to adopt vis-NIRS coupled with the RF modelling approach as a portable and cost-effective method for the rapid quantification of TPH in soils. Copyright © 2017 Elsevier B.V. All rights reserved.
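The linear-versus-nonlinear calibration comparison above rests on a generic pattern: estimate each model's out-of-sample error with leave-one-out cross-validation and compare. The sketch below illustrates only that pattern; ordinary least squares stands in for PLSR, a k-nearest-neighbours regressor stands in for the random forest, and the data are synthetic, so the numbers bear no relation to the soil study.

```python
import numpy as np

def loocv_rmse(fit_predict, X, y):
    # Leave-one-out CV: refit on n-1 samples, predict the held-out one.
    preds = np.array([
        fit_predict(np.delete(X, i, 0), np.delete(y, i), X[i])
        for i in range(len(y))
    ])
    return np.sqrt(np.mean((preds - y) ** 2))

def linear_fp(Xtr, ytr, xte):
    # Ordinary least squares with intercept (stand-in for a linear model).
    A = np.c_[Xtr, np.ones(len(Xtr))]
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return np.r_[xte, 1.0] @ coef

def knn_fp(Xtr, ytr, xte, k=5):
    # k-NN mean (stand-in for a nonlinear learner such as a random forest).
    d = np.linalg.norm(Xtr - xte, axis=1)
    return ytr[np.argsort(d)[:k]].mean()

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(80, 2))
y = np.sin(X[:, 0]) * X[:, 1] + 0.05 * rng.normal(size=80)  # nonlinear target
rmse_lin = loocv_rmse(linear_fp, X, y)
rmse_knn = loocv_rmse(knn_fp, X, y)
```

On this deliberately nonlinear target the nonlinear stand-in attains a lower LOOCV error than the linear one, which is the same kind of evidence the study reports for RF over PLSR.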
Tang, Rongnian; Chen, Xupeng; Li, Chuang
2018-05-01
Near-infrared spectroscopy is an efficient, low-cost technology with potential as an accurate method for detecting the nitrogen content of natural rubber leaves. The successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration, which uses projection operations to select a variable subset with minimum multi-collinearity. However, due to fluctuations in the correlation between variables, high collinearity may still exist among non-adjacent variables of the subset obtained by basic SPA. Based on analysis of the correlation matrix of the spectral data, this paper proposes a correlation-based SPA (CB-SPA) that applies the successive projections algorithm within regions of consistent correlation. The results show that CB-SPA can select variable subsets with more valuable variables and less multi-collinearity. Meanwhile, models established on the CB-SPA subset outperform those on basic SPA subsets in predicting nitrogen content in terms of both cross-validation and external prediction. Moreover, CB-SPA is more efficient, as the time cost of its selection procedure is one-twelfth that of basic SPA.
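Basic SPA, as summarized above, greedily picks the variable whose column has the largest norm after projecting out the columns already selected, which suppresses collinearity among the chosen variables. A minimal sketch of that greedy projection loop follows; it is not the authors' CB-SPA, the starting variable is fixed arbitrarily at column 0, and the smooth "spectra" are synthetic.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Successive projections: repeatedly pick the column with the largest
    norm orthogonal to the span of the already-selected columns."""
    selected = [start]
    P = X.astype(float).copy()
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # project every column onto the orthogonal complement of v
        P = P - np.outer(v, (v @ P) / (v @ v))
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0           # never re-pick a chosen column
        selected.append(int(np.argmax(norms)))
    return selected

rng = np.random.default_rng(2)
w = np.linspace(0, 1, 120)                     # 120 "wavelengths"
centers = rng.uniform(0.1, 0.9, size=(60, 3))  # 3 random peaks per sample
X = np.array([sum(np.exp(-(w - c) ** 2 / 0.005) for c in row)
              for row in centers])             # smooth, collinear columns
idx = spa_select(X, 5)
cond_spa = np.linalg.cond(X[:, idx])           # SPA-selected wavelengths
cond_adj = np.linalg.cond(X[:, :5])            # 5 adjacent wavelengths
```

Because neighbouring wavelengths of a smooth spectrum are nearly proportional, the adjacent subset is badly conditioned, while the SPA subset spreads across the axis and is far better conditioned, which is exactly the multi-collinearity argument made in the abstract.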
LRSSLMDA: Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction
Huang, Li
2017-01-01
Predicting novel microRNA (miRNA)-disease associations is clinically significant due to miRNAs’ potential roles as diagnostic biomarkers and therapeutic targets for various human diseases. Previous studies have demonstrated the viability of utilizing different types of biological data to computationally infer new disease-related miRNAs. Yet researchers face the challenge of how to effectively integrate diverse datasets and make reliable predictions. In this study, we presented a computational model named Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction (LRSSLMDA), which projected miRNAs’/diseases’ statistical feature profile and graph theoretical feature profile to a common subspace. It used Laplacian regularization to preserve the local structures of the training data and an L1-norm constraint to select important miRNA/disease features for prediction. The strength of dimensionality reduction enabled the model to be easily extended to much higher dimensional datasets than those exploited in this study. Experimental results showed that LRSSLMDA outperformed ten previous models: the AUC of 0.9178 in global leave-one-out cross validation (LOOCV) and the AUC of 0.8418 in local LOOCV indicated the model’s superior prediction accuracy; and the average AUC of 0.9181 ± 0.0004 in 5-fold cross validation justified its accuracy and stability. In addition, three types of case studies further demonstrated its predictive power. Potential miRNAs related to Colon Neoplasms, Lymphoma, Kidney Neoplasms, Esophageal Neoplasms and Breast Neoplasms were predicted by LRSSLMDA; respectively, 98%, 88%, 96%, 98% and 98% of the top 50 predictions were validated by experimental evidence. Therefore, we conclude that LRSSLMDA would be a valuable computational tool for miRNA-disease association prediction. PMID:29253885
GIMDA: Graphlet interaction-based MiRNA-disease association prediction.
Chen, Xing; Guan, Na-Na; Li, Jian-Qiang; Yan, Gui-Ying
2018-03-01
MicroRNAs (miRNAs) have been confirmed to be closely related to various human complex diseases by many experimental studies. It is necessary and valuable to develop powerful and effective computational models to predict potential associations between miRNAs and diseases. In this work, we presented a prediction model of Graphlet Interaction for MiRNA-Disease Association prediction (GIMDA) by integrating the disease semantic similarity, miRNA functional similarity, Gaussian interaction profile kernel similarity and the experimentally confirmed miRNA-disease associations. The relatedness score of a miRNA to a disease was calculated by measuring the graphlet interactions between two miRNAs or two diseases. The novelty of GIMDA lies in using graphlet interaction to analyse the complex relationships between two nodes in a graph. The AUCs of GIMDA in global and local leave-one-out cross-validation (LOOCV) turned out to be 0.9006 and 0.8455, respectively. The average result of five-fold cross-validation reached 0.8927 ± 0.0012. In case studies of colon neoplasms, kidney neoplasms and prostate neoplasms based on the HMDD v2.0 database, 45, 45 and 41 of the top 50 potential miRNAs predicted by GIMDA were validated by dbDEMC and miR2Disease, respectively. Additionally, in the case study of new diseases without any known associated miRNAs and the case study of predicting potential miRNA-disease associations using HMDD v1.0, high percentages of the top 50 miRNAs were also verified by experimental literature. © 2017 The Authors. Journal of Cellular and Molecular Medicine published by John Wiley & Sons Ltd and Foundation for Cellular and Molecular Medicine.
Barroso, Pedro José; Martín, Julia; Santos, Juan Luis; Aparicio, Irene; Alonso, Esteban
2018-01-01
In this work, an analytical method, based on sonication-assisted extraction, clean-up by dispersive solid-phase extraction and determination by liquid chromatography-tandem mass spectrometry, has been developed and validated for the simultaneous determination of 15 emerging pollutants in leaves from four ornamental tree species. Target compounds include perfluorinated organic compounds, plasticizers, surfactants, brominated flame retardant, and preservatives. The method was optimized using Box-Behnken statistical experimental design with response surface methodology and validated in terms of recovery, accuracy, precision, and method detection and quantification limits. Quantification of target compounds was carried out using matrix-matched calibration curves. The highest recoveries were achieved for the perfluorinated organic compounds (mean values up to 87%) and preservatives (up to 88%). The lowest recoveries were achieved for plasticizers (51%) and brominated flame retardant (63%). Method detection and quantification limits were in the ranges 0.01-0.09 ng/g dry matter (dm) and 0.02-0.30 ng/g dm, respectively, for most of the target compounds. The method was successfully applied to the determination of the target compounds on leaves from four tree species used as urban ornamental trees (Citrus aurantium, Celtis australis, Platanus hispanica, and Jacaranda mimosifolia). Graphical abstract Analytical method for the biomonitorization of emerging pollutants in outdoor air.
Breast-Lesion Characterization using Textural Features of Quantitative Ultrasound Parametric Maps.
Sadeghi-Naini, Ali; Suraweera, Harini; Tran, William Tyler; Hadizad, Farnoosh; Bruni, Giancarlo; Rastegar, Rashin Fallah; Curpen, Belinda; Czarnota, Gregory J
2017-10-20
This study evaluated, for the first time, the efficacy of quantitative ultrasound (QUS) spectral parametric maps in conjunction with texture-analysis techniques to non-invasively differentiate benign from malignant breast lesions. Ultrasound B-mode images and radiofrequency data were acquired from 78 patients with suspicious breast lesions. QUS spectral-analysis techniques were performed on radiofrequency data to generate parametric maps of mid-band fit, spectral slope, spectral intercept, spacing among scatterers, average scatterer diameter, and average acoustic concentration. Texture-analysis techniques were applied to determine imaging biomarkers consisting of mean, contrast, correlation, energy and homogeneity features of parametric maps. These biomarkers were utilized to classify benign versus malignant lesions with leave-one-patient-out cross-validation. Results were compared to histopathology findings from biopsy specimens and radiology reports on MR images to evaluate the accuracy of the technique. Among the biomarkers investigated, one mean-value parameter and 14 textural features demonstrated statistically significant differences (p < 0.05) between the two lesion types. A hybrid biomarker developed using a stepwise feature selection method could classify the lesions with a sensitivity of 96%, a specificity of 84%, and an AUC of 0.97. Findings from this study pave the way towards adapting novel QUS-based frameworks for breast cancer screening and rapid diagnosis in the clinic.
RKNNMDA: Ranking-based KNN for MiRNA-Disease Association prediction.
Chen, Xing; Wu, Qiao-Feng; Yan, Gui-Ying
2017-07-03
Cumulative verified experimental studies have demonstrated that microRNAs (miRNAs) can be closely related to the development and progression of human complex diseases. Based on the assumption that functionally similar miRNAs may have a strong correlation with phenotypically similar diseases and vice versa, researchers have developed various effective computational models which combine heterogeneous biological data sets, including disease similarity networks, miRNA similarity networks, and known disease-miRNA association networks, to identify potential relationships between miRNAs and diseases in biomedical research. Considering the limitations of previous computational studies, we introduced a novel computational method, Ranking-based KNN for miRNA-Disease Association prediction (RKNNMDA), to predict potentially related miRNAs for diseases; our method obtained an AUC of 0.8221 based on leave-one-out cross validation. In addition, RKNNMDA was applied to 3 kinds of important human cancers for further performance evaluation. The results showed that 96%, 80% and 94% of the predicted top 50 potentially related miRNAs for Colon Neoplasms, Esophageal Neoplasms, and Prostate Neoplasms, respectively, have been confirmed by experimental literature. Moreover, RKNNMDA can be used to predict potential miRNAs for diseases without any known miRNAs, and it is anticipated that RKNNMDA will be of great use for novel miRNA-disease association identification.
NASA Astrophysics Data System (ADS)
Lin, Z. D.; Wang, Y. B.; Wang, R. J.; Wang, L. S.; Lu, C. P.; Zhang, Z. Y.; Song, L. T.; Liu, Y.
2017-07-01
A total of 130 topsoil samples collected from Guoyang County, Anhui Province, China, were used to establish a Vis-NIR model for the prediction of organic matter content (OMC) in lime concretion black soils. Different spectral pretreatments were applied to minimize irrelevant and useless information in the spectra and to increase the correlation of the spectra with the measured values. Subsequently, the Kennard-Stone (KS) method and sample set partitioning based on joint x-y distances (SPXY) were used to select the training set. The successive projections algorithm (SPA) and a genetic algorithm (GA) were then applied for wavelength optimization. Finally, the principal component regression (PCR) model was constructed, in which the optimal number of principal components was determined using the leave-one-out cross validation technique. The results show that the combination of the Savitzky-Golay (SG) filter for smoothing and multiplicative scatter correction (MSC) can eliminate the effects of noise and baseline drift; the SPXY method is preferable to KS for sample selection; and both SPA and GA can significantly reduce the number of wavelength variables and favorably increase the accuracy. GA in particular greatly improved the prediction accuracy of soil OMC, with Rcc, RMSEP, and RPD up to 0.9316, 0.2142, and 2.3195, respectively.
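Choosing the number of principal components for PCR by leave-one-out cross validation, as done above, can be sketched as follows. The data are synthetic with three latent factors, and the study's pretreatment, KS/SPXY sample selection, and GA wavelength optimization steps are omitted; only the PCR-with-LOOCV component-selection step is illustrated.

```python
import numpy as np

def pcr_fit_predict(Xtr, ytr, Xte, k):
    # Principal component regression with k components.
    mu, ybar = Xtr.mean(0), ytr.mean()
    Xc = Xtr - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                          # top-k loadings
    T = Xc @ V                            # training scores
    b, *_ = np.linalg.lstsq(T, ytr - ybar, rcond=None)
    return ((Xte - mu) @ V) @ b + ybar

def loo_rmse(X, y, k):
    # Leave-one-out RMSE of PCR with k components.
    preds = [pcr_fit_predict(np.delete(X, i, 0), np.delete(y, i),
                             X[i:i + 1], k)[0] for i in range(len(y))]
    return np.sqrt(np.mean((np.array(preds) - y) ** 2))

rng = np.random.default_rng(3)
latent = rng.normal(size=(50, 3))                  # 3 underlying factors
X = np.repeat(latent, 20, axis=1) + 0.1 * rng.normal(size=(50, 60))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=50)
errs = {k: loo_rmse(X, y, k) for k in (1, 2, 3, 5, 8)}
best_k = min(errs, key=errs.get)
```

With three true latent factors, the LOOCV error drops sharply up to three components and then flattens, so scanning `k` and keeping the minimizer reproduces the component-selection logic described in the abstract.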
Chen, Qiu-Feng; Chen, Hua-Jun; Liu, Jun; Sun, Tao; Shen, Qun-Tai
2016-01-01
Machine learning-based approaches play an important role in examining functional magnetic resonance imaging (fMRI) data in a multivariate manner and extracting features predictive of group membership. This study was performed to assess the potential of brain intrinsic activity measurements to identify minimal hepatic encephalopathy (MHE) in cirrhotic patients, using the support vector machine (SVM) method. Resting-state fMRI data were acquired in 16 cirrhotic patients with MHE and 19 cirrhotic patients without MHE. The regional homogeneity (ReHo) method was used to investigate the local synchrony of intrinsic brain activity. The Psychometric Hepatic Encephalopathy Score (PHES) was used to define the MHE condition. An SVM classifier was then applied, using leave-one-out cross-validation, to determine the discriminative ReHo map for MHE. The discrimination map highlights a set of regions, including the prefrontal cortex, anterior cingulate cortex, anterior insular cortex, inferior parietal lobule, precentral and postcentral gyri, superior and medial temporal cortices, and middle and inferior occipital gyri. The optimized discriminative model showed a total accuracy of 82.9% and a sensitivity of 81.3%. Our results suggest that a combination of the SVM approach and brain intrinsic activity measurement could be helpful for the detection of MHE in cirrhotic patients.
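The evaluation logic above, leave-one-out cross-validation of a classifier on per-subject feature vectors, can be sketched generically. To keep the sketch dependency-light, a nearest-centroid rule stands in for the SVM, and the feature data are simulated with a group shift loosely mirroring the 19/16 subject split; none of this reproduces the study's actual ReHo features.

```python
import numpy as np

def loo_accuracy(X, y, fit_predict):
    # Leave-one-out: each subject is classified by a model trained
    # on all remaining subjects.
    hits = 0
    for i in range(len(y)):
        pred = fit_predict(np.delete(X, i, 0), np.delete(y, i), X[i])
        hits += int(pred == y[i])
    return hits / len(y)

def nearest_centroid(Xtr, ytr, xte):
    # Simple stand-in for the SVM used in the study.
    cents = {c: Xtr[ytr == c].mean(0) for c in np.unique(ytr)}
    return min(cents, key=lambda c: np.linalg.norm(xte - cents[c]))

rng = np.random.default_rng(4)
n0, n1 = 19, 16                        # mirrors the 19 no-MHE / 16 MHE split
X = np.vstack([rng.normal(0.0, 1, (n0, 10)),
               rng.normal(1.0, 1, (n1, 10))])  # group-shifted feature vectors
y = np.r_[np.zeros(n0, int), np.ones(n1, int)]
acc = loo_accuracy(X, y, nearest_centroid)
```

With only 35 subjects, leave-one-out is the natural resampling choice, since every fold still trains on 34 of the 35 subjects.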
NASA Astrophysics Data System (ADS)
Cicchi, Riccardo; Anand, Suresh; Fantechi, Riccardo; Giordano, Flavio; Gacci, Mauro; Conti, Valerio; Nesi, Gabriella; Buccoliero, Anna Maria; Carini, Marco; Guerrini, Renzo; Pavone, Francesco Saverio
2017-07-01
An optical fiber probe for multimodal spectroscopy was designed, developed and used for tissue diagnostics. The probe, based on a fiber bundle with optical fibers of various sizes and properties, allows spectroscopic measurements with different techniques, including fluorescence, Raman, and diffuse reflectance, to be performed using the same probe. Two visible laser diodes were used for fluorescence spectroscopy, a laser diode emitting in the NIR was used for Raman spectroscopy, and a fiber-coupled halogen lamp for diffuse reflectance. The developed probe was successfully employed for diagnostic purposes on various tissues, including brain and bladder. In particular, the device allowed discriminating healthy tissue from both tumor and dysplastic tissue, as well as performing tumor grading. The diagnostic capabilities of the method, determined using cross-validation with a leave-one-out approach, demonstrated high sensitivity and specificity for all the examined samples, as well as good agreement with the histopathological examination performed on the same samples. The obtained results demonstrate that the multimodal approach is crucial for improving diagnostic capabilities with respect to what can be obtained from the individual techniques. The experimental setup presented here can improve diagnostic capabilities on a broad range of tissues and has the potential of being used clinically for guiding surgical resection in the near future.
In vivo Raman spectroscopy for oral cancers diagnosis
NASA Astrophysics Data System (ADS)
Singh, S. P.; Deshmukh, Atul; Chaturvedi, Pankaj; Krishna, C. Murali
2012-01-01
Oral squamous cell carcinoma ranks sixth among the major malignancies worldwide. Tobacco habits are known as a major causative factor in tumor carcinogenesis in oral cancer. Optical spectroscopy methods, including Raman, are being actively pursued as alternatives/adjuncts for cancer diagnosis. Earlier studies have demonstrated the feasibility of classifying normal, premalignant and malignant oral tissues ex vivo. In the present study we recorded in vivo spectra from contralateral normal and diseased sites of 50 subjects with pathologically confirmed lesions of the buccal mucosa, using a fiber-optic-probe-coupled HE-785 Raman spectrometer. Spectra were recorded at similar points according to teeth positions, with an average acquisition time of 8 seconds. A total of 215 and 225 spectra from normal and tumor sites, respectively, were recorded. The fingerprint region (1200-1800 cm⁻¹) was utilized for classification using LDA. A standard model was developed using 125 normal and 139 tumor spectra from 27 subjects. Two separate clusters with an efficiency of ~95% were obtained. Leave-one-out cross-validation yielded ~90% efficiency. The remaining 90 normal and 86 tumor spectra were used as test data and the prediction efficiency of the model was evaluated. Findings of the study indicate that Raman spectroscopic methods, in combination with appropriate multivariate tools, can be used for objective, noninvasive and rapid diagnosis.
Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.
Williams, Philip H; Eyles, Rod; Weiller, Georg
2012-01-01
MicroRNAs (miRNAs) are nonprotein coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein coding sequences. MirTools, mirDeep2, and miRanalyzer require "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor using a cross-section of different species to accurately predict miRNAs outside the training set. We wanted a system that did not require read-count for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It only requires that the corresponding genome or transcriptome is available within a sequence window that includes the precursor candidate so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA(∗) duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.
Beyer, Andreas; Grohganz, Holger; Löbmann, Korbinian; Rades, Thomas; Leopold, Claudia S
2015-10-27
To benefit from the optimized dissolution properties of active pharmaceutical ingredients in their amorphous forms, co-amorphisation as a viable tool to stabilize these amorphous phases is of both academic and industrial interest. Reports dealing with the physical stability and recrystallization behavior of co-amorphous systems are, however, limited to qualitative evaluations based on the corresponding X-ray powder diffractograms. Therefore, the objective of this study was to develop a quantification model based on X-ray powder diffractometry (XRPD), followed by a multivariate partial least squares regression approach, that enables the simultaneous determination of up to four solid-state fractions: crystalline naproxen, γ-indomethacin, α-indomethacin, and co-amorphous naproxen-indomethacin. For this purpose, a calibration set that covers the whole range of possible combinations of the four components was prepared and analyzed by XRPD. To test the model performance, leave-one-out cross validation was performed and revealed root mean square errors of validation between 3.11% and 3.45% for the crystalline molar fractions and 5.57% for the co-amorphous molar fraction. In summary, even four solid-state phases, including one co-amorphous phase, can be quantified with this XRPD data-based approach.
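A minimal sketch of the underlying idea, quantifying phase fractions of a mixture from its diffractogram, follows. Everything here is invented for illustration: four synthetic "pure phase" peak patterns, mixtures with known fractions, and classical least squares standing in for the partial least squares calibration actually used in the study.

```python
import numpy as np

rng = np.random.default_rng(5)
q = np.linspace(0, 1, 200)                 # normalized diffraction axis

def pattern(centers):
    # Synthetic pure-phase pattern: a set of narrow Gaussian peaks.
    return sum(np.exp(-(q - c) ** 2 / 2e-4) for c in centers)

pures = np.array([pattern(cs) for cs in
                  [(0.2, 0.5), (0.3, 0.7), (0.4, 0.8), (0.15, 0.9)]])

fracs = rng.dirichlet(np.ones(4), size=30)  # true phase fractions (sum to 1)
mix = fracs @ pures + 0.01 * rng.normal(size=(30, 200))  # noisy mixtures

# Recover fractions per diffractogram by least squares against pure patterns.
est = np.array([np.linalg.lstsq(pures.T, m, rcond=None)[0] for m in mix])
rmse = np.sqrt(np.mean((est - fracs) ** 2))
```

PLS is preferred in practice precisely when clean pure-phase reference patterns are unavailable or overlapping; the least-squares version above only works because the synthetic patterns are known and well separated.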
Lo, Wen-Yen; Chien, Li-Yin; Hwang, Fang-Ming; Huang, Nicole; Chiou, Shu-Ti
2018-03-01
The aim of this study was to examine the structural relationships linking job stress to leaving intentions through job satisfaction, depressed mood and stress adaptation among hospital nurses. High turnover among nurses is a global concern. Structural relationships linking job stress to leaving intentions have not been thoroughly examined. Two nationwide cross-sectional surveys of full-time hospital staff in 2011 and 2014. The study participants were 26,945 and 19,386 full-time clinical nurses in 2011 and 2014 respectively. Structural equation modelling was used to examine the interrelationships among the study variables based on the hypothesized model. We used cross-validation procedures to ensure the stability and validity of the model in the two samples. There were five main paths from job stress to intention to leave the hospital. In addition to the direct path, job stress directly affected job satisfaction and depressed mood, which in turn affected intention to leave the hospital. Stress adaptation mitigated the effects of job stress on job satisfaction and depressed mood, which led to intention to leave the hospital. Intention to leave the hospital preceded intention to leave the profession. Those variables explained about 55% of the variance in intention to leave the profession in both years. The model fit was good for both samples, suggesting validity of the model. Strategies to decrease turnover intentions among nurses could focus on creating a less stressful work environment, increasing job satisfaction and stress adaptation and decreasing depressed mood. Hospitals should cooperate in this issue to decrease nurse turnover. © 2017 John Wiley & Sons Ltd.
Real-Time Food Authentication Using a Miniature Mass Spectrometer.
Gerbig, Stefanie; Neese, Stephan; Penner, Alexander; Spengler, Bernhard; Schulz, Sabine
2017-10-17
Food adulteration is a threat to public health and the economy. In order to determine food adulteration efficiently, rapid and easy-to-use on-site analytical methods are needed. In this study, a miniaturized mass spectrometer in combination with three ambient ionization methods was used for food authentication. The chemical fingerprints of three milk types, five fish species, and two coffee types were measured using electrospray ionization, desorption electrospray ionization, and low temperature plasma ionization. Minimum sample preparation was needed for the analysis of liquid and solid food samples. Mass spectrometric data was processed using the laboratory-built software MS food classifier, which allows for the definition of specific food profiles from reference data sets using multivariate statistical methods and the subsequent classification of unknown data. Applicability of the obtained mass spectrometric fingerprints for food authentication was evaluated using different data processing methods, leave-10%-out cross-validation, and real-time classification of new data. Classification accuracy of 100% was achieved for the differentiation of milk types and fish species, and a classification accuracy of 96.4% was achieved for coffee types in cross-validation experiments. Measurement of two milk mixtures yielded correct classification of >94%. For real-time classification, the accuracies were comparable. Functionality of the software program and its performance is described. Processing time for a reference data set and a newly acquired spectrum was found to be 12 s and 2 s, respectively. These proof-of-principle experiments show that the combination of a miniaturized mass spectrometer, ambient ionization, and statistical analysis is suitable for on-site real-time food authentication.
Bullinger, Monika; Quitmann, Julia; Silva, Neuza; Rohenkohl, Anja; Chaplin, John E; DeBusk, Kendra; Mimoun, Emmanuelle; Feigerlova, Eva; Herdman, Michael; Sanz, Dolores; Wollmann, Hartmut; Pleil, Andreas; Power, Michael
2014-01-01
Testing cross-cultural equivalence of patient-reported outcomes requires sufficiently large samples per country, which is difficult to achieve in rare endocrine paediatric conditions. We describe a novel approach to cross-cultural testing of the Quality of Life in Short Stature Youth (QoLISSY) questionnaire in five countries by sequentially taking one country out (TOCO) from the total sample and iteratively comparing the resulting psychometric performance. Development of the QoLISSY proceeded from focus group discussions through pilot testing to field testing in 268 short-statured patients and their parents. To explore cross-cultural equivalence, the iterative TOCO technique was used to examine and compare the validity, reliability, and convergence of patient and parent responses on QoLISSY in the field test dataset, and to predict QoLISSY scores from clinical, socio-demographic and psychosocial variables. Validity and reliability indicators were satisfactory for each sample after iteratively omitting one country. Comparisons with the total sample revealed cross-cultural equivalence in internal consistency and construct validity for patients and parents, high inter-rater agreement and a substantial proportion of QoLISSY variance explained by predictors. The TOCO technique is a powerful method to overcome problems of country-specific testing of patient-reported outcome instruments. It provides an empirical support to QoLISSY's cross-cultural equivalence and is recommended for future research.
Raman spectroscopic studies on exfoliated cells of oral and cervix
NASA Astrophysics Data System (ADS)
Hole, Arti; Sahu, Aditi; Shaikh, Rubina; Tyagi, Gunjan; Murali Krishna, C.
2018-01-01
Visual inspection followed by biopsy is the standard procedure for cancer diagnosis. Due to the invasive nature of current diagnostic methods, patients are often non-compliant. Hence, it is necessary to explore less invasive and rapid methods for early detection. Exfoliative cytology is a simple, rapid, and less invasive technique. It is thus well accepted by patients and is suitable for routine applications in population screening programs. Raman spectroscopy (RS) has been increasingly explored for disease diagnosis in the recent past. In vivo RS has previously shown promise in the management of both oral and cervix cancers. In vivo applications require on-site instrumentation and stringent experimental conditions. Hence, RS of less invasive samples like exfoliated cells has been explored, as this facilitates collection at multiple screening centers followed by analysis at a centralized facility. In the present study, the efficacy of Raman spectroscopy in classifying 15 normal and 29 abnormal oral exfoliated cell specimens and 28 normal and 38 abnormal cervix specimens was explored. Spectra were acquired by Raman microprobe (HE 785, Horiba-Jobin-Yvon, France) from several areas to span the pellet. Spectral acquisition parameters were: microscopic objective 40X, power 40 mW, acquisition time 15 s, and 3 averaged accumulations. PCA and PC-LDA of pre-processed spectra were carried out on a 4-model system of normal and tumor specimens of both cervix and oral tissue. Leave-one-out cross-validation findings indicate 73% correct classification. Findings suggest RS of exfoliated cells may serve as a patient-friendly, non-invasive, rapid and objective method for management of cervix and oral cancers.
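The HE 785 spectral data are not available, but the PCA followed by LDA with leave-one-out validation pipeline the study describes can be sketched with scikit-learn on synthetic "spectra" (the class separation, feature count, and number of components are arbitrary assumptions, not the study's values):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-ins for preprocessed Raman spectra: 50 "wavenumbers",
# normal vs abnormal classes separated by a mean intensity shift.
X = np.vstack([rng.normal(0.0, 1.0, (15, 50)),
               rng.normal(1.0, 1.0, (15, 50))])
y = np.array([0] * 15 + [1] * 15)

model = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
loo_accuracy = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
```

Putting PCA inside the pipeline ensures it is refit on each training fold, avoiding the optimistic bias that arises from decomposing the full dataset before splitting.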
Predicting radiotherapy outcomes using statistical learning techniques
NASA Astrophysics Data System (ADS)
El Naqa, Issam; Bradley, Jeffrey D.; Lindsay, Patricia E.; Hope, Andrew J.; Deasy, Joseph O.
2009-09-01
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture the potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data. In this work, several types of linear and nonlinear kernels for generating interaction terms and approximating the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizability' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal components analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models.
This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model variables. These models have the capacity to predict on unseen data. Part of this work was first presented at the Seventh International Conference on Machine Learning and Applications, San Diego, CA, USA, 11-13 December 2008.
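The institutional and RTOG datasets are not public, but the kernel effect the authors report can be reproduced on any nonlinearly separable toy problem. A minimal sketch with scikit-learn, assuming a synthetic dataset and default SVM parameters (neither is from the paper):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# A toy two-risk-group problem that is nonlinear by construction:
# one class forms a ring around the other.
X, y = make_circles(n_samples=60, noise=0.05, factor=0.4, random_state=0)

# Leave-one-out accuracy of a linear vs a nonlinear (RBF) kernel SVM.
acc = {kernel: cross_val_score(SVC(kernel=kernel), X, y,
                               cv=LeaveOneOut()).mean()
       for kernel in ("linear", "rbf")}
```

As in the paper's finding for endpoints that looked nonlinear on PCA, the nonlinear kernel dominates on leave-one-out testing, while the linear kernel cannot do better than chance on this geometry.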
2011-01-01
Background: Moving a forensic mental health patient from one level of therapeutic security to a lower level or to the community is influenced by more than risk assessment and risk management. We set out to construct and validate structured professional judgement instruments for consistency and transparency in decision making. Methods: Two instruments were developed, the seven-item DUNDRUM-3 programme completion instrument and the six-item DUNDRUM-4 recovery instrument. These were assessed for all 95 forensic patients at Ireland's only forensic mental health hospital. Results: The two instruments had good internal consistency (Cronbach's alpha 0.911 and 0.887). Scores distinguished those allowed no leave or accompanied leave from those with unaccompanied leave (ANOVA F = 38.1 and 50.3 respectively, p < 0.001). Scores also distinguished those in acute/high security units from those in medium or in low secure/pre-discharge units. Each individual item distinguished these levels of need significantly. The DUNDRUM-3 and DUNDRUM-4 correlated moderately with measures of dynamic risk and with the CANFOR staff-rated unmet need (Spearman r = 0.5, p < 0.001). Conclusions: The DUNDRUM-3 programme completion items distinguished significantly between levels of therapeutic security while the DUNDRUM-4 recovery items consistently distinguished those given unaccompanied leave outside the hospital and those in the lowest levels of therapeutic security. These data form the basis for a prospective study of outcomes now underway. PMID:21722396
Piao, Yongjun; Piao, Minghao; Ryu, Keun Ho
2017-01-01
Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNAs can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross-validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/. Copyright © 2016 Elsevier Ltd. All rights reserved.
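The core combination rule, averaging the posterior probabilities of base classifiers trained on different feature subsets, can be sketched as follows. This is a simplified stand-in: the expression matrix is synthetic, the subsets are arbitrary disjoint blocks rather than the paper's relevance/redundancy-driven selection, and logistic regression replaces its base learners:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Hypothetical expression matrix: 40 samples x 12 "miRNAs", two cancer types.
X = np.vstack([rng.normal(0.0, 1.0, (20, 12)),
               rng.normal(1.0, 1.0, (20, 12))])
y = np.array([0] * 20 + [1] * 20)

# Stand-in for relevance/redundancy-based subset generation:
# three disjoint feature subsets, one base classifier each.
subsets = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
posterior = np.zeros((len(y), 2))
for cols in subsets:
    base = LogisticRegression().fit(X[:, cols], y)
    posterior += base.predict_proba(X[:, cols])
posterior /= len(subsets)            # average posterior probability
ensemble_pred = posterior.argmax(axis=1)
```

In a faithful evaluation the fit/predict steps would sit inside a 10-fold or leave-one-out loop, as the abstract describes; here resubstitution is shown only to keep the combination rule visible.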
Tackling Missing Data in Community Health Studies Using Additive LS-SVM Classifier.
Wang, Guanjin; Deng, Zhaohong; Choi, Kup-Sze
2018-03-01
Missing data is a common issue in community health and epidemiological studies. Direct removal of samples with missing data can lead to reduced sample size and information bias, which deteriorates the significance of the results. While data imputation methods are available to deal with missing data, they are limited in performance and can introduce noise into the dataset. Instead of data imputation, a novel method based on the additive least square support vector machine (LS-SVM) is proposed in this paper for predictive modeling when the input features of the model contain missing data. The method also determines simultaneously the influence of the features with missing values on the classification accuracy using a fast leave-one-out cross-validation strategy. The performance of the method is evaluated by applying it to predict the quality of life (QOL) of elderly people using health data collected in the community. The dataset involves demographics, socioeconomic status, health history, and the outcomes of health assessments of 444 community-dwelling elderly people, with 5% to 60% of data missing in some of the input features. The QOL is measured using a standard questionnaire of the World Health Organization. Results show that the proposed method outperforms four conventional methods for handling missing data (case deletion, feature deletion, mean imputation, and K-nearest neighbor imputation), with the average QOL prediction accuracy reaching 0.7418. It is potentially a promising technique for tackling missing data in community health research and other applications.
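Fast leave-one-out strategies of the kind used here exploit the fact that, for linear smoothers such as LS-SVM and ridge-type models, the leave-one-out residual is available in closed form without refitting. A sketch for plain ridge regression (a simplified stand-in for the paper's additive LS-SVM; for brevity the intercept is penalized along with the coefficients):

```python
import numpy as np

def loo_residuals_ridge(X, y, lam=1.0):
    """Exact leave-one-out residuals for ridge regression via the
    hat-matrix identity e_i^(loo) = e_i / (1 - h_ii), no refitting."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])      # add intercept column
    H = A @ np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T)
    resid = y - H @ y                          # ordinary residuals
    return resid / (1.0 - np.diag(H))          # leave-one-out residuals
```

The identity follows from the Sherman-Morrison formula, so full leave-one-out validation costs one fit instead of n; this is what makes simultaneous feature-influence screening via LOOCV tractable.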
Zhu, Jie; Qin, Yufang; Liu, Taigang; Wang, Jun; Zheng, Xiaoqi
2013-01-01
Identification of gene-phenotype relationships is a fundamental challenge in human health research. Based on the observation that genes causing the same or similar phenotypes tend to correlate with each other in the protein-protein interaction network, many network-based approaches have been proposed based on different underlying models. A recent comparative study showed that diffusion-based methods achieve state-of-the-art predictive performance. In this paper, a new diffusion-based method is proposed to prioritize candidate disease genes. The diffusion profile of a disease was defined as the stationary distribution of candidate genes given a random walk with restart in which similarities between phenotypes are incorporated. Then, candidate disease genes are prioritized by comparing their diffusion profiles with that of the disease. Finally, the effectiveness of our method was demonstrated through leave-one-out cross-validation against control genes from artificial linkage intervals and randomly chosen genes. A comparative study showed that our method achieves improved performance compared to some classical diffusion-based methods. To further illustrate our method, we used our algorithm to predict new causative genes for 16 multifactorial diseases including prostate cancer and Alzheimer's disease, and the top predictions were in good agreement with literature reports. Our study indicates that integration of multiple information sources, especially the phenotype similarity profile data, and introduction of a global similarity measure between disease and gene diffusion profiles are helpful for prioritizing candidate disease genes. Programs and data are available upon request.
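The random walk with restart at the heart of such diffusion profiles is compact to state. A minimal sketch, assuming a toy adjacency matrix and a uniform restart probability (the paper's phenotype-similarity weighting of the seed vector is not reproduced here):

```python
import numpy as np

def diffusion_profile(W, seed, restart=0.3, tol=1e-10):
    """Stationary distribution of a random walk with restart on a
    network with adjacency matrix W. `seed` is the restart
    distribution, e.g. known disease genes (possibly phenotype-weighted)."""
    P = W / W.sum(axis=0, keepdims=True)   # column-stochastic transitions
    p = seed.copy()
    while True:
        p_next = (1 - restart) * (P @ p) + restart * seed
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
```

Candidate genes are then ranked by comparing their own profiles against the disease profile. The iteration is a contraction with factor (1 - restart), so convergence to the unique stationary distribution is guaranteed.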
Adaptive Local Spatiotemporal Features from RGB-D Data for One-Shot Learning Gesture Recognition
Lin, Jia; Ruan, Xiaogang; Yu, Naigong; Yang, Yee-Hong
2016-01-01
Noise and constant empirical motion constraints affect the extraction of distinctive spatiotemporal features from one or a few samples per gesture class. To tackle these problems, an adaptive local spatiotemporal feature (ALSTF) using fused RGB-D data is proposed. First, motion regions of interest (MRoIs) are adaptively extracted using grayscale and depth velocity variance information to greatly reduce the impact of noise. Then, corners are used as keypoints if their depth, and velocities of grayscale and of depth meet several adaptive local constraints in each MRoI. With further filtering of noise, an accurate and sufficient number of keypoints is obtained within the desired moving body parts (MBPs). Finally, four kinds of multiple descriptors are calculated and combined in extended gradient and motion spaces to represent the appearance and motion features of gestures. The experimental results on the ChaLearn gesture, CAD-60 and MSRDailyActivity3D datasets demonstrate that the proposed feature achieves higher performance compared with published state-of-the-art approaches under the one-shot learning setting and comparable accuracy under the leave-one-out cross validation. PMID:27999337
Biagiotti, R; Desii, C; Vanzi, E; Gacci, G
1999-02-01
To compare the performance of artificial neural networks (ANNs) with that of multiple logistic regression (MLR) models for predicting ovarian malignancy in patients with adnexal masses by using transvaginal B-mode and color Doppler flow ultrasonography (US). A total of 226 adnexal masses were examined before surgery: Fifty-one were malignant and 175 were benign. The data were divided into training and testing subsets by using a "leave-n-out" method. The training subsets were used to compute the optimum MLR equations and to train the ANNs. The cross-validation subsets were used to estimate the performance of each of the two models in predicting ovarian malignancy. At testing, three-layer back-propagation networks, based on the same input variables selected by using MLR (i.e., women's ages, papillary projections, random echogenicity, peak systolic velocity, and resistance index), had a significantly higher sensitivity than did MLR (96% vs 84%; McNemar test, p = .04). The Brier scores for ANNs were significantly lower than those calculated for MLR (Student t test for paired samples, p = .004). ANNs might have potential for categorizing adnexal masses as either malignant or benign on the basis of multiple variables related to demographic and US features.
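The Brier score used to compare the two models is simple to state: the mean squared difference between a predicted probability and the 0/1 outcome, with lower values indicating better-calibrated predictions. A minimal implementation (the example values below are illustrative, not the study's):

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between the predicted malignancy
    probability and the observed 0/1 outcome; lower is better."""
    pairs = list(zip(predicted_probs, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)
```

A model that confidently predicts the right class scores 0; a model that hedges at 0.5 everywhere scores 0.25 regardless of the outcomes, which is why the score rewards both discrimination and calibration.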
Intraoperative Raman spectroscopy of soft tissue sarcomas.
Nguyen, John Q; Gowani, Zain S; O'Connor, Maggie; Pence, Isaac J; Nguyen, The-Quyen; Holt, Ginger E; Schwartz, Herbert S; Halpern, Jennifer L; Mahadevan-Jansen, Anita
2016-10-01
Soft tissue sarcomas (STS) are a rare and heterogeneous group of malignant tumors that are often treated through surgical resection. Current intraoperative margin assessment methods are limited and highlight the need for an improved approach with respect to time and specificity. Here we investigate the potential of near-infrared Raman spectroscopy for the intraoperative differentiation of STS from surrounding normal tissue. In vivo Raman measurements at 785 nm excitation were intraoperatively acquired from subjects undergoing STS resection using a probe-based spectroscopy system. A multivariate classification algorithm was developed in order to automatically identify spectral features that can be used to differentiate STS from the surrounding normal muscle and fat. The classification algorithm was subsequently tested using leave-one-subject-out cross-validation. With the exclusion of well-differentiated liposarcomas, the algorithm was able to classify STS from the surrounding normal muscle and fat with a sensitivity and specificity of 89.5% and 96.4%, respectively. These results suggest that single point near-infrared Raman spectroscopy could be utilized as a rapid and non-destructive surgical guidance tool for identifying abnormal tissue margins in need of further excision. Lasers Surg. Med. 48:774-781, 2016. © 2016 Wiley Periodicals, Inc.
Development of a real time activity monitoring Android application utilizing SmartStep.
Hegde, Nagaraj; Melanson, Edward; Sazonov, Edward
2016-08-01
Footwear-based activity monitoring systems are becoming popular in academic research as well as consumer industry segments. In our previous work, we presented developmental aspects of an insole-based activity and gait monitoring system, SmartStep, which is a socially acceptable, fully wireless and versatile insole. The present work describes the development of an Android application that captures the SmartStep data wirelessly over Bluetooth Low Energy (BLE), computes features on the received data, runs activity classification algorithms and provides real-time feedback. The development of activity classification methods was based on the data from a human study involving 4 participants. Participants were asked to perform activities of sitting, standing, walking, and cycling while they wore the SmartStep insole system. Multinomial Logistic Discrimination (MLD) was utilized in the development of the machine learning model for activity prediction. The resulting classification model was implemented in an Android smartphone. The Android application was benchmarked for power consumption and CPU loading. Leave-one-out cross-validation resulted in an average accuracy of 96.9% during the model training phase. The Android application for real-time activity classification was tested on a human subject wearing SmartStep, resulting in a testing accuracy of 95.4%.
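With only four participants, holding out one participant at a time is the natural cross-validation unit. A sketch of that scheme with scikit-learn, using multinomial logistic regression as a stand-in for the paper's MLD model (the per-window features, class means, and counts below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(2)
# Hypothetical per-window insole features: 4 participants x 3 activities,
# 10 windows each, 5 features per window.
X, y, groups = [], [], []
for subject in range(4):
    for activity, mu in enumerate((0.0, 3.0, 6.0)):
        X.append(rng.normal(mu, 1.0, (10, 5)))
        y += [activity] * 10
        groups += [subject] * 10
X = np.vstack(X)

# Hold out all windows of one participant at a time.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, np.array(y),
                      groups=np.array(groups), cv=LeaveOneGroupOut()).mean()
```

Grouping by participant prevents windows from the same person appearing in both training and test folds, which would otherwise inflate the reported accuracy.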
NASA Astrophysics Data System (ADS)
Hao, Ling; Greer, Tyler; Page, David; Shi, Yatao; Vezina, Chad M.; Macoska, Jill A.; Marker, Paul C.; Bjorling, Dale E.; Bushman, Wade; Ricke, William A.; Li, Lingjun
2016-08-01
Lower urinary tract symptoms (LUTS) are a range of irritative or obstructive symptoms that commonly afflict the aging population. The diagnosis is mostly based on patient-reported symptoms, and current medication often fails to completely eliminate these symptoms. There is a pressing need for objective non-invasive approaches to measure symptoms and understand disease mechanisms. We developed an in-depth workflow combining urine metabolomics analysis and machine learning bioinformatics to characterize metabolic alterations and support objective diagnosis of LUTS. Machine learning feature selection and statistical tests were combined to identify candidate biomarkers, which were statistically validated with leave-one-patient-out cross-validation and absolutely quantified by a selected reaction monitoring assay. Receiver operating characteristic analysis showed highly accurate prediction power of candidate biomarkers to stratify patients into diseased or non-diseased categories. The key metabolites and pathways may be correlated with smooth muscle tone changes, increased collagen content, and inflammation, which have been identified as potential contributors to urinary dysfunction in humans and rodents. Periurethral tissue staining revealed a significant increase in collagen content and tissue stiffness in men with LUTS. Together, our study provides the first characterization and validation of LUTS urinary metabolites and pathways to support the future development of a urine-based diagnostic test for LUTS.
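The area under the receiver operating characteristic curve used to grade the candidate biomarkers has a convenient rank-based form: the probability that a randomly chosen diseased sample scores above a randomly chosen non-diseased one. A minimal implementation (example scores are illustrative, not the study's):

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive (diseased) sample scores
    above a negative one (Wilcoxon-Mann-Whitney); ties count as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 1.0 means perfect separation of the two groups, 0.5 means the biomarker is no better than chance.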
Gu, Jianli; Li, Jitian; Huang, Manyu; Zhang, Zhiyong; Li, Dongsheng; Song, Guoying; Ding, Xingpo; Li, Wuyin
2014-01-01
Osteosarcoma (OS) is the most common malignant bone tumor. To identify OS-related specific proteins for early diagnosis of OS, a novel approach, surface-enhanced laser desorption/ionization-time-of-flight mass spectrometry (SELDI-TOF-MS) to serum samples from 25 OS patients, 16 osteochondroma, and 26 age-matched normal human volunteers as controls, was performed. Two proteins showed a significantly different expression in OS serum samples from control groups. Proteomic profiles and external leave-one-out cross-validation analysis showed that the correct rate of allocation, the sensitivity, and the specificity of diagnosis were 100%. These two proteins were further identified by searching the EPO-KB database, and one of the proteins identified as Serine rich region profile is involved in various cellular signaling cascades and tumor genesis. The presence of these two proteins in OS patients but absence from premalignant and normal human controls implied that they can be potential biomarkers for early diagnosis of OS.
Farmer, William H.; Koltun, Greg
2017-01-01
Study region: The state of Ohio in the United States, a humid, continental climate. Study focus: The estimation of nonexceedance probabilities of daily streamflows as an alternative means of establishing the relative magnitudes of streamflows associated with hydrologic and water-quality observations. New hydrological insights for the region: Several methods for estimating nonexceedance probabilities of daily mean streamflows are explored, including single-index methodologies (nearest-neighboring index) and geospatial tools (kriging and topological kriging). These methods were evaluated by conducting leave-one-out cross-validations based on analyses of nearly 7 years of daily streamflow data from 79 unregulated streamgages in Ohio and neighboring states. The pooled, ordinary kriging model, with a median Nash–Sutcliffe performance of 0.87, was superior to the single-site index methods, though there was some bias in the tails of the probability distribution. Incorporating network structure through topological kriging did not improve performance. The pooled, ordinary kriging model was applied to 118 locations without systematic streamgaging across Ohio where instantaneous streamflow measurements had been made concurrent with water-quality sampling on at least 3 separate days. Spearman rank correlations between estimated nonexceedance probabilities and measured streamflows were high, with a median value of 0.76. In consideration of application, the degree of regulation in a set of sample sites helped to specify the streamgages required to implement kriging approaches successfully.
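The Nash–Sutcliffe efficiency used to score the leave-one-out predictions compares a model's squared errors against those of simply predicting the observed mean. A minimal implementation (example values are illustrative):

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe model efficiency: 1 is a perfect fit, 0 means the
    model is no better than predicting the observed mean, and negative
    values mean it is worse."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A median value of 0.87 across streamgages, as reported for the pooled ordinary kriging model, therefore indicates the model explains most of the day-to-day variability at a typical held-out site.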
NASA Astrophysics Data System (ADS)
Wilson, Machelle; Ustin, Susan L.; Rocke, David
2003-03-01
Remote sensing technologies with high spatial and spectral resolution show a great deal of promise in addressing critical environmental monitoring issues, but the ability to analyze and interpret the data lags behind the technology. Robust analytical methods are required before the wealth of data available through remote sensing can be applied to a wide range of environmental problems for which remote detection is the best method. In this study we compare the classification effectiveness of two relatively new techniques on data consisting of leaf-level reflectance from plants that have been exposed to varying levels of heavy metal toxicity. If these methodologies work well on leaf-level data, then there is some hope that they will also work well on data from airborne and space-borne platforms. The classification methods compared were support vector machine classification of exposed and non-exposed plants based on the reflectance data, and partial least squares compression of the reflectance data followed by classification using logistic discrimination (PLS/LD). PLS/LD was performed in two ways: we used the continuous concentration data as the response during compression, and then used the binary response required during logistic discrimination; we also used a binary response during compression followed by logistic discrimination. The statistic we used to compare the effectiveness of the methodologies was the leave-one-out cross-validation estimate of the prediction error.
Liu, Ming; He, Lin; Hu, Xiaopeng; Liu, Peiqing; Luo, Hai-Bin
2010-12-01
The nociceptin/orphanin FQ receptor (NOP) has been implicated in a wide range of biological functions, including pain, anxiety, depression and drug abuse. In particular, its agonists have great potential to be developed into anxiolytics. However, the crystal structure of NOP is still not available. In the present work, both structure-based and ligand-based modeling methods have been used to achieve a comprehensive understanding of 67 N-substituted spiropiperidine analogues as NOP agonists. The comparative molecular-field analysis (CoMFA) method was performed to formulate a reasonable 3D-QSAR model (cross-validated coefficient q(2)=0.819 and conventional r(2)=0.950), whose robustness and predictability were further verified by leave-eight-out, Y-randomization, and external test-set validations. The excellent fit of CoMFA to the affinity differences among these compounds was attributed to the contributions of electrostatic/hydrogen-bonding and steric/hydrophobic interactions, which was supported by the Surflex-Dock and CDOCKER molecular-docking simulations based on the 3D model of NOP built by the homology modeling method. The CoMFA contour maps and the molecular docking simulations were integrated to propose a binding mode for the spiropiperidine analogues at the binding site of NOP. Copyright © 2010 Elsevier Ltd. All rights reserved.
Freye, Chris E; Fitz, Brian D; Billingsley, Matthew C; Synovec, Robert E
2016-06-01
The chemical composition and several physical properties of RP-1 fuels were studied using comprehensive two-dimensional (2D) gas chromatography (GC×GC) coupled with flame ionization detection (FID). A "reversed column" GC×GC configuration was implemented with a RTX-wax column as the first dimension ((1)D) and a RTX-1 column as the second dimension ((2)D). Modulation was achieved using a high temperature diaphragm valve mounted directly in the oven. Using leave-one-out cross-validation (LOOCV), the summed GC×GC-FID signal of three compound-class selective 2D regions (alkanes, cycloalkanes, and aromatics) was regressed against previously measured ASTM-derived values for these compound classes, yielding root mean square errors of cross-validation (RMSECV) of 0.855, 0.734, and 0.530 mass%, respectively. For comparison, using partial least squares (PLS) analysis with LOOCV, the GC×GC-FID signal of the entire 2D separations was regressed against the same ASTM values, yielding RMSECV values of 1.52, 2.76, and 0.945 mass% for the three compound classes (alkanes, cycloalkanes, and aromatics), respectively. Additionally, a more detailed PLS analysis was undertaken of the compound classes (n-alkanes, iso-alkanes, mono-, di-, and tri-cycloalkanes, and aromatics), and of physical properties previously determined by ASTM methods (such as net heat of combustion, hydrogen content, density, kinematic viscosity, sustained boiling temperature and vapor rise temperature). Results from these PLS studies using the relatively simple to use and inexpensive GC×GC-FID instrumental platform are compared to previously reported results using the GC×GC-TOFMS instrumental platform. Copyright © 2016 Elsevier B.V. All rights reserved.
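The RMSECV statistic reported throughout this study is the root mean square of the leave-one-out prediction residuals. A minimal sketch for an ordinary least-squares calibration by explicit refitting (the study's PLS models would replace the `lstsq` fit; the data below are hypothetical):

```python
import numpy as np

def rmsecv(X, y):
    """Root mean square error of leave-one-out cross-validation for an
    ordinary least-squares fit with intercept, by explicit refitting."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    loo_errors = []
    for i in range(n):
        keep = np.arange(n) != i                       # drop sample i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        loo_errors.append(y[i] - A[i] @ beta)          # predict sample i
    return float(np.sqrt(np.mean(np.square(loo_errors))))
```

RMSECV carries the units of the response, which is why the paper can report it directly in mass%.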
Prediction of metabolites of epoxidation reaction in MetaTox.
Rudik, A V; Dmitriev, A V; Bezhentsev, V M; Lagunin, A A; Filimonov, D A; Poroikov, V V
2017-10-01
Biotransformation is a process of chemical modification that may generate reactive metabolites, in particular epoxides, which can cause toxic effects. The prediction of such metabolites is important for drug development and ecotoxicology studies. Epoxides are formed by some oxidation reactions, usually catalysed by cytochromes P450, and represent a large class of three-membered cyclic ethers. Identification of molecules that may be epoxidized, and indication of the specific location of the epoxide functional group (the site of epoxidation, SOE), are important for the prediction of epoxide metabolites. Datasets of 355 molecules and 615 reactions were created for training and validation. The prediction of SOE is based on a combination of LMNA (Labelled Multilevel Neighbourhood of Atom) descriptors and a Bayesian-like algorithm implemented in the PASS software and the MetaTox web-service. The average invariant accuracy of prediction (AUC) calculated in leave-one-out and 20-fold cross-validation procedures is 0.9. Prediction of epoxide formation based on the created SAR model is included as a component of the MetaTox web-service ( http://www.way2drug.com/mg ).
Chen, Tinggui; Li, Yayun; Zhang, Liwei
2017-05-12
It is difficult to screen out as many active components as possible from natural plants all at one time. In this study, subfractions of Forsythia suspensa leaves were first prepared; then, their inhibitory activities against pancreatic lipase were tested; finally, the most inhibitory subfraction was screened by self-made immobilized pancreatic lipase. Results showed that nine ligands, including eight inhibitors and one promoter, were screened out all at one time. They were three flavonoids (rutin, IC50: 149 ± 6.0 μmol/L; hesperidin, 52.4 μmol/L; kaempferol-3-O-rutinoside, isolated from F. suspensa leaves for the first time, with an IC50 that notably reached 2.9 ± 0.5 μmol/L), two polyphenols (chlorogenic acid, 3150 ± 120 μmol/L; caffeic acid, 1394 ± 52 μmol/L), two lignans (phillyrin, a promoter; arctigenin, 2129 ± 10.5 μmol/L), and two phenethyl alcohols (forsythiaside A, 2155 ± 8.5 μmol/L; and its isomer). Their action mechanisms included competitive inhibition, competitive promotion, noncompetitive inhibition, and uncompetitive inhibition. In sum, using the appropriate methods, more active ingredients can be simply and quickly screened out all at one time from a complex natural product system. In addition, F. suspensa leaves contain numerous inhibitors of pancreatic lipase.
A Probabilistic Atlas of Diffuse WHO Grade II Glioma Locations in the Brain
Baumann, Cédric; Zouaoui, Sonia; Yordanova, Yordanka; Blonski, Marie; Rigau, Valérie; Chemouny, Stéphane; Taillandier, Luc; Bauchet, Luc; Duffau, Hugues; Paragios, Nikos
2016-01-01
Diffuse WHO grade II gliomas are diffusively infiltrative brain tumors characterized by an unavoidable anaplastic transformation. Their management is strongly dependent on their location in the brain due to interactions with functional regions and potential differences in molecular biology. In this paper, we present the construction of a probabilistic atlas mapping the preferential locations of diffuse WHO grade II gliomas in the brain. This is carried out through a sparse graph whose nodes correspond to clusters of tumors grouped by spatial proximity. The interest of such an atlas is illustrated via two applications. The first one correlates tumor location with the patient's age via a statistical analysis, highlighting the interest of the atlas for studying the origins and behavior of the tumors. The second exploits the fact that the tumors have preferential locations for automatic segmentation. Through a coupled decomposed Markov Random Field model, the atlas guides the segmentation process, and characterizes which preferential location the tumor belongs to and consequently which behavior it could be associated with. Leave-one-out cross validation experiments on a large database highlight the robustness of the graph, and yield promising segmentation results. PMID:26751577
Surface-enhanced Raman spectroscopy for differentiation between benign and malignant thyroid tissues
NASA Astrophysics Data System (ADS)
Li, Zuanfang; Li, Chao; Lin, Duo; Huang, Zufang; Pan, Jianji; Chen, Guannan; Lin, Juqiang; Liu, Nenrong; Yu, Yun; Feng, Shangyuan; Chen, Rong
2014-04-01
The aim of this study was to evaluate the potential of applying silver nanoparticle-based surface-enhanced Raman scattering (SERS) to discriminate different types of human thyroid tissues. SERS measurements were performed on three groups of tissue samples: thyroid cancers (n = 32), nodular goiters (n = 20) and normal thyroid tissues (n = 25). Tentative assignments of the measured tissue SERS spectra suggest interesting cancer-specific biomolecular differences. Principal component analysis (PCA) and linear discriminant analysis (LDA), together with the leave-one-out cross-validation technique, yielded diagnostic sensitivities of 92%, 75% and 87.5%, and specificities of 82.6%, 89.4% and 84.4%, respectively, for differentiation among normal, nodular and malignant thyroid tissue samples. This work demonstrates that tissue SERS spectroscopy combined with multivariate diagnostic algorithms has great potential for detection of thyroid cancer at the molecular level.
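A PCA-then-LDA classifier evaluated by leave-one-out cross-validation, as in the study above, can be sketched in a few lines. The following is a minimal illustration on synthetic two-class "spectra" using a hand-rolled Fisher discriminant; the sample sizes, number of retained components, class shift and the two-class simplification are all invented for the sketch (the study itself discriminated three tissue groups):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spectra": 25 normal and 32 malignant samples, 200 bins.
n0, n1, p = 25, 32, 200
X = np.vstack([rng.normal(0.0, 1.0, (n0, p)),
               rng.normal(0.6, 1.0, (n1, p))])
y = np.array([0] * n0 + [1] * n1)

def fit_predict(X_tr, y_tr, x_te, n_pc=5):
    """PCA via SVD, then two-class Fisher LDA on the PC scores."""
    mu = X_tr.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_tr - mu, full_matrices=False)
    P = Vt[:n_pc].T                      # PCA loadings
    T = (X_tr - mu) @ P                  # training scores
    t = (x_te - mu) @ P                  # test score
    m0, m1 = T[y_tr == 0].mean(0), T[y_tr == 1].mean(0)
    Sw = (np.cov(T[y_tr == 0].T) * (np.sum(y_tr == 0) - 1)
          + np.cov(T[y_tr == 1].T) * (np.sum(y_tr == 1) - 1))
    w = np.linalg.solve(Sw, m1 - m0)     # Fisher discriminant direction
    return int(t @ w > w @ (m0 + m1) / 2)

# Leave-one-out: refit PCA and LDA with each sample held out in turn.
pred = np.array([fit_predict(np.delete(X, i, 0), np.delete(y, i), X[i])
                 for i in range(len(y))])
sensitivity = float((pred[y == 1] == 1).mean())  # malignant correctly flagged
specificity = float((pred[y == 0] == 0).mean())  # normal correctly cleared
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Note that the PCA is refit inside every fold; fitting it once on all samples would leak information from the held-out spectrum into the projection.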
Zhang, Xue-Xi; Yin, Jian-Hua; Mao, Zhi-Hua; Xia, Yang
2015-01-01
Fourier transform infrared imaging (FTIRI) combined with chemometric algorithms has strong potential for extracting complex chemical information from biological tissues. FTIRI and partial least squares-discriminant analysis (PLS-DA) were used to differentiate healthy and osteoarthritic (OA) cartilage for the first time. A PLS model was built on a calibration matrix of spectra randomly selected from the FTIRI spectral datasets of healthy and lesioned cartilage. Leave-one-out cross-validation was performed in the PLS model, and the fitting coefficient between actual and predicted categorical values of the calibration matrix reached 0.95. In the calibration and prediction matrices, the percentages of correctly identified healthy and lesioned cartilage spectra were 100% and 90.24%, respectively. These results demonstrate that FTIRI combined with PLS-DA could provide a promising approach for the categorical identification of healthy and OA cartilage specimens. PMID:26057029
Complex versus simple models: ion-channel cardiac toxicity prediction.
Mistry, Hitesh B
2018-01-01
There is growing interest in applying detailed mathematical models of the heart to ion-channel-related cardiac toxicity prediction. However, there is debate as to whether such complex models are required. Here, the predictive performance of two established large-scale biophysical cardiac models was compared with that of a simple linear model, Bnet. Three ion-channel datasets were extracted from the literature. Each compound was assigned a cardiac risk category using two different classification schemes based on information within CredibleMeds. The predictive performance of each model on each dataset, for each classification scheme, was assessed via leave-one-out cross-validation. Overall, the Bnet model performed as well as the leading cardiac models on two of the datasets and outperformed both cardiac models on the remaining one. These results highlight the importance of benchmarking complex versus simple models, and also encourage the development of simple models.
Single-accelerometer-based daily physical activity classification.
Long, Xi; Yin, Bin; Aarts, Ronald M
2009-01-01
In this study, a single tri-axial accelerometer placed on the waist was used to record acceleration data for human physical activity classification. The data collection involved 24 subjects performing daily real-life activities in a naturalistic environment without researchers' intervention. For the purpose of assessing customers' daily energy expenditure, walking, running, cycling, driving, and sports were chosen as target activities for classification. This study compared a Bayesian classifier with a Decision Tree-based approach. A Bayes classifier has the advantage of being more extensible, requiring little effort in classifier retraining and software updates upon further expansion or modification of the target activities. Principal component analysis was applied to remove the correlation among features and to reduce the feature vector dimension. Experiments using leave-one-subject-out and 10-fold cross-validation protocols revealed a classification accuracy of approximately 80%, comparable with that obtained by a Decision Tree classifier.
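The leave-one-subject-out protocol used above differs from ordinary LOOCV in that all epochs from one subject are held out together, so the classifier is always tested on an unseen person. A minimal sketch with a hand-rolled Gaussian naive Bayes classifier on synthetic accelerometer-style features (subject count, feature design and all constants are invented; the study used 24 subjects and five activities):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic features (e.g. per-epoch statistics of tri-axial acceleration)
# for 8 hypothetical subjects and 2 activities (0 = walking, 1 = running).
n_sub, n_per, p = 8, 20, 4
X, y, sid = [], [], []
for s in range(n_sub):
    offset = rng.normal(0, 0.3, p)             # subject-specific bias
    for label, shift in ((0, 0.0), (1, 1.5)):
        X.append(rng.normal(shift, 1.0, (n_per, p)) + offset)
        y += [label] * n_per
        sid += [s] * n_per
X, y, sid = np.vstack(X), np.array(y), np.array(sid)

def gnb_fit(X_tr, y_tr):
    """Per-class feature means, variances and priors (Gaussian naive Bayes)."""
    return {c: (X_tr[y_tr == c].mean(0),
                X_tr[y_tr == c].var(0) + 1e-9,
                np.mean(y_tr == c))
            for c in np.unique(y_tr)}

def gnb_predict(stats, X_te):
    classes = list(stats)
    scores = [(-0.5 * (np.log(2 * np.pi * v) + (X_te - m) ** 2 / v)).sum(1)
              + np.log(pr)
              for m, v, pr in stats.values()]
    return np.array(classes)[np.argmax(scores, axis=0)]

# Leave-one-subject-out: hold out all epochs of one subject per fold.
accs = [(gnb_predict(gnb_fit(X[sid != s], y[sid != s]), X[sid == s])
         == y[sid == s]).mean() for s in range(n_sub)]
accuracy = float(np.mean(accs))
print(f"leave-one-subject-out accuracy: {accuracy:.2f}")
```

Retraining the classifier is just a call to `gnb_fit` on new data, which reflects the extensibility the abstract attributes to the Bayes approach.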
Predicting DNA hybridization kinetics from sequence
NASA Astrophysics Data System (ADS)
Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu
2018-01-01
Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants from sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted from similar reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
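The neighbour-voting idea, predicting an unknown sequence's rate constant from feature-similar reactions with known constants, behaves much like distance-weighted k-nearest-neighbour regression under LOOCV. A hedged sketch on synthetic six-feature descriptors (the feature model, the weighting rule and all constants are invented; the authors' actual features and optimized weights differ):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical six-feature descriptors for 100 probe/target pairs, with
# log10 rate constants driven by a smooth function of the features.
n, p = 100, 6
F = rng.uniform(0, 1, (n, p))
log_k = 4.0 + 1.0 * F[:, 0] - 0.6 * F[:, 1] + rng.normal(0, 0.1, n)

def wnv_predict(F_tr, y_tr, f, k=5):
    """Neighbour voting: average the k nearest training rate constants,
    weighted by inverse distance in feature space."""
    d = np.linalg.norm(F_tr - f, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)
    return np.sum(w * y_tr[idx]) / np.sum(w)

# Leave-one-out cross-validation: predict each pair from the other 99.
pred = np.array([wnv_predict(np.delete(F, i, 0), np.delete(log_k, i), F[i])
                 for i in range(n)])
# A prediction within a factor of 3 is within log10(3) on the log scale.
within_3x = float(np.mean(np.abs(pred - log_k) < np.log10(3)))
print(f"fraction predicted within a factor of 3: {within_3x:.2f}")
```

Working on log10 rate constants makes "within a factor of 3" a fixed additive tolerance, which is how multiplicative accuracy claims like the paper's are naturally scored.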
Prieto, Luis P; Sharma, Kshitij; Kidzinski, Łukasz; Rodríguez-Triana, María Jesús; Dillenbourg, Pierre
2018-04-01
The pedagogical modelling of everyday classroom practice is an interesting kind of evidence, both for educational research and teachers' own professional development. This paper explores the usage of wearable sensors and machine learning techniques to automatically extract orchestration graphs (teaching activities and their social plane over time), on a dataset of 12 classroom sessions enacted by two different teachers in different classroom settings. The dataset included mobile eye-tracking as well as audiovisual and accelerometry data from sensors worn by the teacher. We evaluated both time-independent and time-aware models, achieving median F1 scores of about 0.7-0.8 on leave-one-session-out k-fold cross-validation. Although these results show the feasibility of this approach, they also highlight the need for larger datasets, recorded in a wider variety of classroom settings, to provide automated tagging of classroom practice that can be used in everyday practice across multiple teachers.
Lungu, Angela; Swift, Andrew J; Capener, David; Kiely, David; Hose, Rod; Wild, Jim M
2016-06-01
Accurately identifying patients with pulmonary hypertension (PH) using noninvasive methods is challenging, and right heart catheterization (RHC) is the gold standard. Magnetic resonance imaging (MRI) has been proposed as an alternative to echocardiography and RHC in the assessment of cardiac function and pulmonary hemodynamics in patients with suspected PH. The aim of this study was to assess whether machine learning using computational modeling techniques and image-based metrics of PH can improve the diagnostic accuracy of MRI in PH. Seventy-two patients with suspected PH attending a referral center underwent RHC and MRI within 48 hours. Fifty-seven patients were diagnosed with PH, and 15 had no PH. A number of functional and structural cardiac and cardiovascular markers derived from 2 mathematical models and also solely from MRI of the main pulmonary artery and heart were integrated into a classification algorithm to investigate the diagnostic utility of the combination of the individual markers. A physiological marker based on the quantification of wave reflection in the pulmonary artery was shown to perform best individually, but optimal diagnostic performance was found by the combination of several image-based markers. Classifier results, validated using leave-one-out cross validation, demonstrated that combining computation-derived metrics reflecting hemodynamic changes in the pulmonary vasculature with measurement of right ventricular morphology and function, in a decision support algorithm, provides a method to noninvasively diagnose PH with high accuracy (92%). The high diagnostic accuracy of these MRI-based model parameters may reduce the need for RHC in patients with suspected PH.
A climatological model of North Indian Ocean tropical cyclone genesis, tracks and landfall
NASA Astrophysics Data System (ADS)
Wahiduzzaman, Mohammad; Oliver, Eric C. J.; Wotherspoon, Simon J.; Holbrook, Neil J.
2017-10-01
Extensive damage and loss of life can be caused by tropical cyclones (TCs) that make landfall. Modelling of TC landfall probability benefits insurance/re-insurance companies, decision makers, government policy and planning, and residents in coastal areas. In this study, we develop a climatological model of tropical cyclone genesis, tracks and landfall for North Indian Ocean (NIO) rim countries based on kernel density estimation, a generalised additive model (GAM) including an Euler integration step, and landfall detection using a country mask approach. Using a 35-year record (1979-2013) of tropical cyclone track observations from the Joint Typhoon Warning Centre (part of the International Best Track Archive for Climate Stewardship, Version 6), the GAM is fitted to the observed cyclone track velocities as a smooth function of location in each season. The distribution of cyclone genesis points is approximated by kernel density estimation. Model-simulated TC genesis points are randomly drawn from the fitted kernel, and the cyclone paths, represented by the GAM together with stochastic innovations applied at each step, are simulated to generate a suite of NIO rim landfall statistics. Three hindcast validation methods are applied to evaluate the integrity of the model. First, leave-one-out cross validation is applied, whereby the country of landfall is determined by majority vote (the country receiving the highest percentage of simulated landfalls). Second, the probability distribution of simulated landfall is evaluated against the observed landfall. Third, the distances between the points of observed and simulated landfall are compared and quantified.
Overall, the model shows very good cross-validated hindcast skill of modelled landfalling cyclones against observations in each of the NIO tropical cyclone seasons and for most NIO rim countries, with only a relatively small difference in the percentage of predicted landfall locations compared with observations.
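The genesis-sampling and track-integration steps described above can be sketched as follows: Gaussian kernel density sampling for genesis points, a toy stand-in for the GAM velocity field, Euler integration with stochastic innovations, and a crude longitude threshold in place of polygon-based country-mask landfall detection. Every number, the genesis cluster and the velocity field are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical genesis points (lon, lat) clustered in the Bay of Bengal.
genesis_obs = rng.normal([88.0, 12.0], [3.0, 2.0], (200, 2))

def sample_genesis(n, bandwidth=1.0):
    """Gaussian kernel density sampling: pick an observed genesis point
    at random and perturb it with the kernel."""
    picks = genesis_obs[rng.integers(0, len(genesis_obs), n)]
    return picks + rng.normal(0, bandwidth, picks.shape)

def velocity(pos):
    """Stand-in for the GAM-fitted track velocity field (deg/day):
    north-westward drift whose westward component grows with latitude."""
    lon, lat = pos
    return np.array([-0.5 - 0.02 * lat, 0.8])

def simulate_track(start, days=8, dt=0.25, sigma=0.3):
    """Euler integration of the velocity field with stochastic innovations."""
    track = [np.asarray(start, float)]
    for _ in range(int(days / dt)):
        step = velocity(track[-1]) * dt + rng.normal(0, sigma * np.sqrt(dt), 2)
        track.append(track[-1] + step)
    return np.array(track)

tracks = [simulate_track(g) for g in sample_genesis(50)]
# Crude "country mask": call it landfall when a track crosses 80 deg E
# (a stand-in for proper polygon-based landfall detection).
landfall_frac = float(np.mean([np.any(t[:, 0] < 80.0) for t in tracks]))
print(f"fraction of simulated tracks making 'landfall': {landfall_frac:.2f}")
```

Repeating the simulation many times per genesis sample is what turns this machinery into the suite of landfall statistics the paper validates against observations.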
Kehimkar, Benjamin; Parsons, Brendon A; Hoggard, Jamin C; Billingsley, Matthew C; Bruno, Thomas J; Synovec, Robert E
2015-01-01
Recent efforts in predicting rocket propulsion (RP-1) fuel performance through modeling put greater emphasis on obtaining detailed and accurate fuel properties, as well as elucidating the relationships between fuel compositions and their properties. Herein, we study multidimensional chromatographic data obtained by comprehensive two-dimensional gas chromatography combined with time-of-flight mass spectrometry (GC × GC-TOFMS) to analyze RP-1 fuels. For GC × GC separations, RTX-Wax (polar stationary phase) and RTX-1 (non-polar stationary phase) columns were implemented for the primary and secondary dimensions, respectively, to separate the chemical compound classes (alkanes, cycloalkanes, aromatics, etc.), providing a significant level of chemical compositional information. The GC × GC-TOFMS data were analyzed using partial least squares regression (PLS) chemometric analysis to model and predict advanced distillation curve (ADC) data for ten RP-1 fuels that were previously analyzed using the ADC method. The PLS modeling provides insight into the chemical species that impact the ADC data. The PLS modeling correlates compositional information found in the GC × GC-TOFMS chromatograms of each RP-1 fuel, and their respective ADC, and allows prediction of the ADC for each RP-1 fuel with good precision and accuracy. The root-mean-square error of calibration (RMSEC) ranged from 0.1 to 0.5 °C, and was typically below ∼0.2 °C, for the PLS calibration of the ADC modeling with GC × GC-TOFMS data, indicating a good fit of the model to the calibration data. Likewise, the predictive power of the overall method via PLS modeling was assessed using leave-one-out cross-validation (LOOCV) yielding root-mean-square error of cross-validation (RMSECV) ranging from 1.4 to 2.6 °C, and was typically below ∼2.0 °C, at each % distilled measurement point during the ADC analysis.
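Computing an RMSECV by leave-one-out cross-validation of a PLS model, as done above, can be illustrated with a bare-bones PLS1. The synthetic data, the two-component limit and the single scalar response are all simplifications for the sketch (the study modelled full distillation curves from GC × GC-TOFMS chromatograms):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in: 10 "fuels" x 50 chromatogram intensity channels,
# with a distillation temperature driven by a few channels.
n, p = 10, 50
X = rng.normal(0, 1, (n, p))
beta_true = np.zeros(p)
beta_true[:5] = [2.0, -1.5, 1.0, 0.8, -0.5]
y = X @ beta_true + rng.normal(0, 0.3, n) + 180.0   # deg C

def pls1_fit(X, y, ncomp=2):
    """Bare-bones PLS1 (NIPALS with deflation); returns a linear predictor."""
    x_mean, y_mean = X.mean(0), y.mean()
    Xc, yc, W, P, Q = X - x_mean, y - y_mean, [], [], []
    for _ in range(ncomp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        p_load = Xc.T @ t / (t @ t)
        q = yc @ t / (t @ t)
        Xc = Xc - np.outer(t, p_load)       # deflate X
        yc = yc - q * t                     # deflate y
        W.append(w); P.append(p_load); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)     # regression vector
    return lambda Xn: (Xn - x_mean) @ B + y_mean

# RMSECV from leave-one-out cross-validation.
errs = []
for i in range(n):
    model = pls1_fit(np.delete(X, i, 0), np.delete(y, i), ncomp=2)
    errs.append(model(X[i:i+1])[0] - y[i])
rmsecv = float(np.sqrt(np.mean(np.square(errs))))
print(f"RMSECV: {rmsecv:.2f} deg C")
```

The gap between the (optimistic) calibration error and the RMSECV is exactly the RMSEC-versus-RMSECV contrast the abstract reports.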
Fraley, Stephanie I.; Athamanolap, Pornpat; Masek, Billie J.; Hardick, Justin; Carroll, Karen C.; Hsieh, Yu-Hsiang; Rothman, Richard E.; Gaydos, Charlotte A.; Wang, Tza-Huei; Yang, Samuel
2016-01-01
High Resolution Melt (HRM) is a versatile and rapid post-PCR DNA analysis technique primarily used to differentiate sequence variants among only a few short amplicons. We recently developed a one-vs-one support vector machine algorithm (OVO SVM) that enables the use of HRM for identifying numerous short amplicon sequences automatically and reliably. Herein, we set out to maximize the discriminating power of HRM + SVM for a single genetic locus by testing longer amplicons harboring significantly more sequence information. Using universal primers that amplify the hypervariable bacterial 16S rRNA gene as a model system, we found that long amplicons yield more complex HRM curve shapes. We developed a novel nested OVO SVM approach to take advantage of this feature and achieved 100% accuracy in the identification of 37 clinically relevant bacteria in leave-one-out cross-validation. A subset of organisms was independently tested. Those from pure culture were identified with high accuracy, while those tested directly from clinical blood bottles displayed more technical variability and reduced accuracy. Our findings demonstrate that long sequences can be accurately and automatically profiled by HRM with a novel nested SVM approach, and suggest that clinical sample testing is feasible with further optimization. PMID:26778280
Tang, Qi; Li, Qiang; Xie, Dong; Chu, Ketao; Liu, Lidong; Liao, Chengcheng; Qin, Yunying; Wang, Zheng; Su, Danke
2018-05-21
This study aimed to investigate the utility of a volumetric apparent diffusion coefficient (ADC) histogram method for distinguishing non-puerperal mastitis (NPM) from breast cancer (BC), and to compare this method with a traditional 2-dimensional measurement method. Pretreatment diffusion-weighted imaging data at 3.0 T were obtained for 80 patients (NPM, n = 27; BC, n = 53) and retrospectively assessed. Two readers measured ADC values according to 2 distinct region-of-interest (ROI) protocols. The first protocol generated ADC histograms for each lesion, and various parameters were examined. In the second protocol, 3 freehand (TF) ROIs for local lesions were generated to obtain a mean ADC value (defined as ADC-ROITF). All of the ADC values were compared by an independent-samples t test or the Mann-Whitney U test. Receiver operating characteristic curves and a leave-one-out cross-validation method were also used to assess the diagnostic performance of the significant parameters. The ADC values for NPM were characterized by significantly higher mean, 5th to 95th percentile, maximum and mode ADCs compared with the corresponding ADCs for BC (all P < 0.05). However, the minimum, skewness, and kurtosis ADC values, as well as ADC-ROITF, did not significantly differ between the NPM and BC cases. Thus, volumetric ADC histogram generation appears superior to the traditional 2-dimensional method examined here, and represents a promising image analysis method for distinguishing NPM from BC.
Farooq, Muhammad; Sazonov, Edward
2017-11-01
Several methods have been proposed for automatic and objective monitoring of food intake, but their performance suffers in the presence of speech and motion artifacts. This paper presents a novel sensor system and algorithms for detection and characterization of chewing bouts from a piezoelectric strain sensor placed on the temporalis muscle. The proposed data acquisition device was incorporated into the temple of eyeglasses. The system was tested by ten participants in a two-part experiment, one part under controlled laboratory conditions and the other in unrestricted free-living. The proposed food intake recognition method first performed energy-based segmentation to isolate candidate chewing segments (instead of using epochs of fixed duration, as commonly reported in the research literature), with subsequent classification of the segments by linear support vector machine models. At the participant level (combining data from both laboratory and free-living experiments), with ten-fold leave-one-out cross-validation, chewing was recognized with an average F-score of 96.28%, and the resultant area under the curve was 0.97, which are higher than any previously reported results. A multivariate regression model was used to estimate chew counts from segments classified as chewing, with an average mean absolute error of 3.83% at the participant level. These results suggest that the proposed system is able to identify chewing segments in the presence of speech and motion artifacts, and to automatically and accurately quantify chewing behavior, both under controlled laboratory conditions and in unrestricted free-living.
NASA Astrophysics Data System (ADS)
Horiba, Kazuki; Muramatsu, Chisako; Hayashi, Tatsuro; Fukui, Tatsumasa; Hara, Takeshi; Katsumata, Akitoshi; Fujita, Hiroshi
2015-03-01
Findings on dental panoramic radiographs (DPRs) have shown that the mandibular cortical index (MCI), based on the morphology of the mandibular inferior cortex, is significantly correlated with osteoporosis. MCI on DPRs can be categorized into one of three groups and has high potential for identifying patients with osteoporosis. However, most DPRs are used only for diagnosing dental conditions by dentists in their routine clinical work. Moreover, MCI is generally assessed subjectively rather than quantified. In this study, we investigated a computer-aided diagnosis (CAD) system that automatically classifies mandibular cortical bone for detection of osteoporotic patients at an early stage. First, the inferior border of the mandibular bone was detected by use of an active contour method. Second, regions of interest including the cortical bone were extracted and analyzed for thickness and roughness. Finally, a support vector machine (SVM) classified cases into three MCI categories using features including the thickness and roughness. Ninety-eight DPRs were used to evaluate our proposed scheme. The numbers of cases classified as Class I, II, and III by a dental radiologist were 56, 25, and 17, respectively. Experimental results based on leave-one-out cross-validation showed that the sensitivities for Classes I, II, and III were 94.6%, 57.7%, and 94.1%, respectively. The distribution of the groups in the feature space indicates the possibility of MCI quantification by the proposed method. Therefore, our scheme has potential for identifying osteoporotic patients at an early stage.
Koca, N; Rodriguez-Saona, L E; Harper, W J; Alvarez, V B
2007-08-01
Short-chain free fatty acids (FFA) are important sources of cheese flavor and have been reported to be indicators for assessing quality. The objective of this research was to develop a simple and rapid screening tool for monitoring the short-chain FFA contents in Swiss cheese by using Fourier transform infrared spectroscopy (FTIR). Forty-four Swiss cheese samples were evaluated by using a MIRacle three-reflection diamond attenuated total reflectance (ATR) accessory. Two different sampling techniques were used for FTIR/ATR measurement: direct measurement of Swiss cheese slices (approximately 0.5 g) and measurement of a water-soluble fraction of cheese. The amounts of FFA (propionic, acetic, and butyric acids) in the water-soluble fraction of samples were analyzed by gas chromatography-flame ionization detection as a reference method. Calibration models for both direct measurement and the water-soluble fraction of cheese were developed based on a cross-validated (leave-one-out approach) partial least squares regression by using the regions of 3,000 to 2,800, 1,775 to 1,680, and 1,500 to 900 cm(-1) for short-chain FFA in cheese. Promising performance statistics were obtained for the calibration models of both direct measurement and the water-soluble fraction, with improved performance statistics obtained from the water-soluble extract, particularly for propionic acid. Partial least squares models generated from FTIR/ATR spectra by direct measurement of cheeses gave standard errors of cross-validation of 9.7 mg/100 g of cheese for propionic acid, 9.3 mg/100 g of cheese for acetic acid, and 5.5 mg/100 g of cheese for butyric acid, and correlation coefficients >0.9. Standard error of cross-validation values for the water-soluble fraction were 4.4 mg/100 g of cheese for propionic acid, 9.2 mg/100 g of cheese for acetic acid, and 5.2 mg/100 g of cheese for butyric acid with correlation coefficients of 0.98, 0.95, and 0.92, respectively. 
Infrared spectroscopy and chemometrics accurately and precisely predicted the short-chain FFA content in Swiss cheeses and in the water-soluble fraction of the cheese.
EEG-based Affect and Workload Recognition in a Virtual Driving Environment for ASD Intervention
Wade, Joshua W.; Key, Alexandra P.; Warren, Zachary E.; Sarkar, Nilanjan
2017-01-01
Objective: To build group-level classification models capable of recognizing the affective states and mental workload of individuals with autism spectrum disorder (ASD) during driving skill training. Methods: Twenty adolescents with ASD participated in a six-session virtual reality driving simulator based experiment, during which their electroencephalogram (EEG) data were recorded alongside driving events and a therapist's rating of their affective states and mental workload. Five feature generation approaches, including statistical features, fractal dimension features, higher order crossings (HOC)-based features, power features from frequency bands, and power features from bins (Δf = 2 Hz), were applied to extract relevant features. Individual differences were removed with a two-step feature calibration method. Finally, binary classification results based on the k-nearest neighbors algorithm and a univariate feature selection method were evaluated by leave-one-subject-out nested cross-validation to compare feature types and identify discriminative features. Results: The best classification results were achieved using power features from bins for engagement (0.95) and boredom (0.78), and HOC-based features for enjoyment (0.90), frustration (0.88), and workload (0.86). Conclusion: Offline EEG-based group-level classification models are feasible for recognizing binary low and high intensities of affect and workload of individuals with ASD in the context of driving. However, while promising, applying the models in an online adaptive driving task requires further development. Significance: The developed models provide a basis for an EEG-based passive brain-computer interface system that has the potential to benefit individuals with ASD through an affect- and workload-based individualized driving skill training intervention. PMID:28422647
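The key point of leave-one-subject-out nested cross-validation is that feature calibration and univariate selection are refit inside every fold, so nothing about the held-out subject influences them. A minimal sketch with synthetic EEG-style features and a hand-rolled k-NN classifier; the dimensions, the standardization-based calibration and the mean-difference selection rule are all invented simplifications of the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic EEG-like features: 20 subjects x 12 epochs x 30 features,
# with only the first 4 features carrying the (binary) affect label.
n_sub, n_ep, p = 20, 12, 30
X, y, sid = [], [], []
for s in range(n_sub):
    base = rng.normal(0, 0.5, p)              # subject-specific baseline
    for e in range(n_ep):
        label = e % 2
        x = base + rng.normal(0, 1, p)
        x[:4] += 1.2 * label                  # informative features
        X.append(x); y.append(label); sid.append(s)
X, y, sid = np.array(X), np.array(y), np.array(sid)

def select_k(X_tr, y_tr, k=6):
    """Univariate selection: top-k features by absolute class-mean difference."""
    score = np.abs(X_tr[y_tr == 1].mean(0) - X_tr[y_tr == 0].mean(0))
    return np.argsort(score)[-k:]

def knn_predict(X_tr, y_tr, X_te, k=5):
    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return (y_tr[nn].mean(1) > 0.5).astype(int)

# Nested leave-one-subject-out: calibration and selection refit per fold,
# so no information from the held-out subject leaks into either step.
accs = []
for s in range(n_sub):
    tr, te = sid != s, sid == s
    mu, sd = X[tr].mean(0), X[tr].std(0) + 1e-9
    Xtr, Xte = (X[tr] - mu) / sd, (X[te] - mu) / sd   # calibration
    feats = select_k(Xtr, y[tr])
    accs.append((knn_predict(Xtr[:, feats], y[tr], Xte[:, feats]) == y[te]).mean())
accuracy = float(np.mean(accs))
print(f"leave-one-subject-out accuracy: {accuracy:.2f}")
```

Selecting features once on the full dataset before cross-validating is a classic source of the optimistic bias that the head of this collection (Li & Doi, 2006) warns about.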
NASA Astrophysics Data System (ADS)
Yuan, Hua; Zhang, Yan; Chen, Chun-Ni; Li, Meng-Yang
2018-03-01
The substituent cross-interaction effect in substituted benzylidene anilines (p-X-C6H4-CH=N-C6H4-Y-p) has been observed and widely investigated. To investigate whether the substituent cross-interaction effect exists in all conjugated systems containing a polar C=N bond, this paper employed 2-X-5-Y pyrimidines as model compounds. The influences of substituents X and Y on the 1H NMR and 13C NMR chemical shifts of 2,5-disubstituted pyrimidines were systematically investigated. Quantitative structure-chemical shift relationship models were built for δ(H4,6), δ(C2), δ(C4,6) and δ(C5) with four to six molecular descriptors. These models were confirmed to have good stability and predictive performance by leave-one-out cross validation. This study indicates that the substituent effects in 2,5-disubstituted pyrimidines are much more complex than those in the substituted benzylidene anilines, and that structural factors beyond the Hammett parameter should be taken into consideration. Unlike in the substituted benzylidene anilines, the cross-interaction effect (Δσ2) of substituents X and Y contributes little to δ(H4,6), δ(C2), δ(C5) and δ(C4,6) of 2,5-disubstituted pyrimidines.
Li, Longhai; Feng, Cindy X; Qiu, Shi
2017-06-30
An important statistical task in disease mapping problems is to identify divergent regions with unusually high or low risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is the gold standard for estimating the predictive p-values that can flag such divergent regions. However, actual LOOCV is time-consuming because one needs to rerun a Markov chain Monte Carlo analysis for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called integrated importance sampling (iIS), for estimating LOOCV predictive p-values using only Markov chain samples drawn from the posterior based on the full data set. The key step in iIS is to integrate away the latent variables associated with the test observation with respect to their conditional distribution, without reference to the actual observation. By the general theory of importance sampling, the formula used by iIS can be proved to be equivalent to the LOOCV predictive p-value. We compare iIS with three other existing methods on two disease mapping datasets. Our empirical results show that the predictive p-values estimated with iIS are almost identical to those estimated with actual LOOCV, and outperform those given by the three existing methods, namely posterior predictive checking, ordinary importance sampling, and the ghosting method of Marshall and Spiegelhalter (2003). Copyright © 2017 John Wiley & Sons, Ltd.
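The ordinary importance sampling baseline that iIS improves on can be demonstrated on a toy conjugate model, where the exact LOOCV predictive p-value is available in closed form for comparison. The model, prior and data below are invented, and the sketch has no latent variables, which is precisely the simple case where ordinary IS already works well:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

rng = np.random.default_rng(6)

# Toy model: y_j ~ N(theta, 1) with theta ~ N(0, 10^2); the last
# observation is made mildly divergent on purpose.
y = np.append(rng.normal(0.0, 1.0, 29), 2.5)
n, i = len(y), 29

# Posterior over theta from the FULL data set (conjugate, so sampled
# directly here; in practice these would be MCMC draws).
post_var = 1.0 / (n + 1.0 / 100.0)
theta = rng.normal(post_var * y.sum(), np.sqrt(post_var), 50_000)

# Ordinary importance sampling for the LOO predictive p-value of y_i:
# weight each posterior draw by 1/p(y_i | theta_s), then average the
# predictive tail probability P(y_rep >= y_i | theta_s) under the weights.
log_w = 0.5 * (y[i] - theta) ** 2      # proportional to -log N(y_i | theta, 1)
w = np.exp(log_w - log_w.max())        # stabilized weights
tail = np.array([1.0 - Phi(y[i] - t) for t in theta])
p_is = float(np.sum(w * tail) / np.sum(w))

# Exact LOO predictive p-value for comparison (conjugate algebra).
v_i = 1.0 / ((n - 1) + 1.0 / 100.0)
m_i = v_i * (y.sum() - y[i])
p_exact = 1.0 - Phi((y[i] - m_i) / np.sqrt(v_i + 1.0))
print(f"IS estimate {p_is:.4f} vs exact LOO p-value {p_exact:.4f}")
```

With latent variables per observation (as in disease mapping), the ordinary weights 1/p(y_i | theta_s, z_s) become highly variable, which is the failure mode iIS addresses by integrating the test observation's latent variable out first.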
Raman spectroscopy of bio fluids: an exploratory study for oral cancer detection
NASA Astrophysics Data System (ADS)
Brindha, Elumalai; Rajasekaran, Ramu; Aruna, Prakasarao; Koteeswaran, Dornadula; Ganesan, Singaravelu
2016-03-01
Oral cancer is one of the most common cancers in India, accounting for one third of the global oral cancer burden. Raman spectroscopy of tissues has gained much attention in diagnostic oncology, as it provides unique spectral signatures corresponding to metabolic alterations under different pathological conditions and micro-environments. Several studies have accordingly reported the use of Raman spectroscopy to discriminate diseased conditions from their normal counterparts at the cellular and tissue level, but only limited studies are available on bio-fluids. Recently, optical characterization of bio-fluids has also geared up for biomarker identification in disease diagnosis. In this context, an attempt was made to study the metabolic variations in the blood, urine and saliva of oral cancer patients and normal subjects using Raman spectroscopy. Principal Component based Linear Discriminant Analysis (PC-LDA) followed by Leave-One-Out Cross-Validation (LOOCV) was employed to assess the statistical significance of the present technique in discriminating malignant conditions from normal subjects.
Mohamadi Monavar, H; Afseth, N K; Lozano, J; Alimardani, R; Omid, M; Wold, J P
2013-07-15
The purpose of this study was to evaluate the feasibility of Raman spectroscopy for predicting the purity of caviars. Ninety-three wild caviar samples of three different types, namely Beluga, Asetra and Sevruga, were analysed by Raman spectroscopy in the range 1995 cm(-1) to 545 cm(-1). In addition, 60 samples combining every two types were examined. The chemical origin of the samples was identified by reference measurements on pure samples. Linear chemometric methods such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were used for data visualisation and classification, permitting clear distinction between different caviars. Non-linear methods, namely Artificial Neural Networks (ANN), were also used to classify caviar samples. Two different networks were tested: a Probabilistic Neural Network with Radial-Basis Function (PNN) and a Multilayer Feed-Forward Network with Back Propagation (BP-NN). In both cases, scores of principal components (PCs) were chosen as input nodes for the input layer in the PC-ANN models, in order to reduce data redundancy and training time. Leave One Out (LOO) cross validation was applied to check the performance of the networks. The PCA results indicated that features such as type and purity can be used to discriminate different caviar samples. These findings were also supported by LDA, with efficiency between 83.77% and 100%, and confirmed by the developed PC-ANN models, which classified pure caviar samples with 93.55% and 71.00% accuracy using the BP network and PNN, respectively. In comparison, the LDA, PNN and BP-NN models for predicting caviar type achieved 90.3%, 73.1% and 91.4% accuracy. 
Partial least squares regression (PLSR) models were built under cross validation and tested with different independent data sets, yielding determination coefficients (R(2)) of 0.86, 0.83, 0.92 and 0.91, with root mean square errors (RMSE) of validation of 0.32, 0.11, 0.03 and 0.09 for fatty acids 16:0, 20:5 and 22:6 and for fat, respectively.
NASA Astrophysics Data System (ADS)
Tailanián, Matías; Castiglioni, Enrique; Musé, Pablo; Fernández Flores, Germán.; Lema, Gabriel; Mastrángelo, Pedro; Almansa, Mónica; Fernández Liñares, Ignacio; Fernández Liñares, Germán.
2015-10-01
Soybean producers suffer from caterpillar damage in many areas of the world. Estimated average economic losses are 500 million USD annually in Brazil, Argentina, Paraguay and Uruguay. Designing efficient pest control management using selective and targeted pesticide applications is extremely important from both economic and environmental perspectives. With that in mind, we conducted a research program during the 2013-2014 and 2014-2015 planting seasons on a 4,000 ha soybean farm, seeking to achieve early pest detection. Currently, pest presence is evaluated using manual, labor-intensive counting methods based on sampling strategies that are time-consuming and imprecise. The experiment was conducted as follows. Using manual counts as ground truth, a spectrometer capturing reflectance from 400 to 1100 nm was used to measure the reflectance of soy plants. A first conclusion, resulting from measuring the spectral response at the leaf level, was that stress is a property of the whole plant, since leaves with different levels of damage yielded the same spectral response. Then, to assess the classification of plants as healthy, biotic-stressed or abiotic-stressed, a pipeline of feature extraction and selection from leaf spectral signatures, combined with a Support Vector Machine classifier, was designed. Optimization of the SVM parameters using grid search with cross-validation, along with classification evaluation by ten-fold cross-validation, showed a correct classification rate of 95%, consistently across both seasons. Controlled experiments using cages with different numbers of caterpillars--including caterpillar-free plants--were also conducted to evaluate consistency in the trends of the spectral response as well as the extracted features.
Long, Zhuqing; Jing, Bin; Yan, Huagang; Dong, Jianxin; Liu, Han; Mo, Xiao; Han, Ying; Li, Haiyun
2016-09-07
Mild cognitive impairment (MCI) represents a transitional state between normal aging and Alzheimer's disease (AD). Non-invasive diagnostic methods are desirable to identify MCI for early therapeutic interventions. In this study, we proposed a support vector machine (SVM)-based method to discriminate between MCI patients and normal controls (NCs) using multi-level characteristics of magnetic resonance imaging (MRI). This method adopted a radial basis function (RBF) as the kernel function, and a grid search method to optimize the two parameters of the SVM. The calculated characteristics, i.e., the Hurst exponent (HE), amplitude of low-frequency fluctuations (ALFF), regional homogeneity (ReHo) and gray matter density (GMD), were adopted as the classification features. Leave-one-out cross-validation (LOOCV) was used to evaluate the classification performance of the method. Applying the proposed method to experimental data from 29 MCI patients and 33 healthy subjects, we achieved a classification accuracy of up to 96.77%, with a sensitivity of 93.10% and a specificity of 100%, and an area under the curve (AUC) of up to 0.97. Furthermore, the most discriminative features for classification were found to predominantly involve default-mode regions, such as the hippocampus (HIP), parahippocampal gyrus (PHG), posterior cingulate gyrus (PCG) and middle frontal gyrus (MFG), and subcortical regions such as the lentiform nucleus (LN) and amygdala (AMYG). Therefore, our method is promising in distinguishing MCI patients from NCs and may be useful for the diagnosis of MCI. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.
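The leave-one-out evaluation reported above (each subject held out once, then accuracy, sensitivity and specificity tallied) can be sketched with a simple nearest-centroid classifier standing in for the SVM; the labels and data below are illustrative, not the study's:

```python
def nearest_centroid(X_train, y_train, x):
    """Assign x to the class whose feature-mean (centroid) is closest."""
    centroids = {}
    for label in set(y_train):
        pts = [X_train[i] for i in range(len(X_train)) if y_train[i] == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

def loocv_sens_spec(X, y, positive):
    """Leave-one-out CV: hold out each sample once, then tally
    sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        pred = nearest_centroid(X_tr, y_tr, X[i])
        if y[i] == positive:
            tp += pred == positive
            fn += pred != positive
        else:
            tn += pred != positive
            fp += pred == positive
    return tp / (tp + fn), tn / (tn + fp)
```

The key point, as in the study, is that the held-out subject contributes nothing to the model that classifies it.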
NASA Astrophysics Data System (ADS)
Mundava, C.; Helmholz, P.; Schut, A. G. T.; Corner, R.; McAtee, B.; Lamb, D. W.
2014-09-01
The objective of this paper is to test the relationships between Above Ground Biomass (AGB) and remotely sensed vegetation indices for AGB assessments in the Kimberley area in Western Australia. For 19 different sites, vegetation indices were derived from eight Landsat ETM+ scenes over a period of two years (2011-2013). The sites were divided into three groups (Open plains, Bunch grasses and Spinifex) based on similarities in dominant vegetation types. Dry and green biomass fractions were measured at these sites. Single and multiple regression relationships between vegetation indices and green and total AGB were calibrated and validated using a "leave site out" cross validation. Four tests were compared: (1) relationships between AGB and vegetation indices combining all sites; (2) separate relationships per site group; (3) multiple regressions including selected vegetation indices per site group; and (4) as in 3 but including rainfall and elevation data. Results indicate that relationships based on single vegetation indices are moderately accurate for green biomass in wide open plains covered with annual grasses. The cross-validation results for green AGB improved for a combination of indices for the Open plains and Bunch grasses sites, but not for the Spinifex sites. When rainfall and elevation data were included, cross-validation results improved slightly, with Q2 values of 0.49 and 0.72 for the Open plains and Bunch grasses sites, respectively. Cross-validation results for total AGB were moderately accurate (Q2 of 0.41) for Open plains but weak or absent for the other site groups despite good calibration results, indicating a strong influence of site-specific factors.
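The "leave site out" validation used above holds out every observation from one site at a time and scores predictions with a cross-validated Q2 (one minus the prediction error sum of squares over the total sum of squares). A minimal sketch, assuming a single-predictor least-squares model and illustrative data:

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def leave_one_site_out_q2(x, y, sites):
    """Predict each site's observations from a model fitted on all other
    sites, then return the cross-validated Q2."""
    preds = [None] * len(x)
    for s in set(sites):
        tr = [i for i, si in enumerate(sites) if si != s]
        a, b = fit_line([x[i] for i in tr], [y[i] for i in tr])
        for i, si in enumerate(sites):
            if si == s:
                preds[i] = a + b * x[i]
    my = sum(y) / len(y)
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    return 1 - press / sum((yi - my) ** 2 for yi in y)
```

Grouping the folds by site, rather than by observation, is what keeps the validation honest about transfer between sites.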
Rosowsky, Erlene; Young, Alexander S; Malloy, Mary C; van Alphen, S P J; Ellison, James M
2018-03-01
The Delphi method is a consensus-building technique using expert opinion to formulate a shared framework for understanding a topic with limited empirical support. This cross-validation study replicates one completed in the Netherlands and Belgium, and explores US experts' views on the diagnosis and treatment of older adults with personality disorders (PD). Twenty-one geriatric PD experts participated in a Delphi survey addressing diagnosis and treatment of older adults with PD. The European survey was translated and administered electronically. First-round consensus was reached for 16 out of 18 items relevant to diagnosis and specific mental health programs for personality disorders in older adults. Experts agreed on the usefulness of establishing criteria for specific types of treatments. The majority of psychologists did not initially agree on the usefulness of pharmacotherapy. Expert consensus was reached following two subsequent rounds after clarification addressing medication use. Study results suggest consensus among US experts regarding psychosocial treatments. Limited acceptance amongst US psychologists of the suitability of pharmacotherapy for late-life PDs contrasted with the views expressed by experts surveyed in the Netherlands and Belgium studies.
NASA Astrophysics Data System (ADS)
Sun, Wei; Ding, Wei; Yan, Huifang; Duan, Shunli
2018-06-01
Shoe-mounted pedestrian navigation systems based on micro inertial sensors rely on zero velocity updates to correct their positioning errors in time, which makes accurate determination of the zero-velocity interval crucial during normal walking. However, as walking gaits are complicated and vary from person to person, it is difficult to detect walking gaits with a fixed threshold method. This paper proposes a pedestrian gait classification method based on a hidden Markov model. Pedestrian gait data are collected with a micro inertial measurement unit installed at the instep. On the basis of analyzing the characteristics of the pedestrian walk, a single-axis angular rate gyro output is used to classify gait features. The angular rate data are modeled as a univariate Gaussian mixture model with three components, and a four-state left–right continuous hidden Markov model (CHMM) is designed to classify the normal walking gait. The model parameters are trained and optimized using the Baum–Welch algorithm, and the sliding window Viterbi algorithm is then used to decode the gait. Walking data were collected from eight subjects walking along the same route at three different speeds, and the leave-one-subject-out cross-validation method was conducted to test the model. Experimental results show that the proposed algorithm can accurately detect the zero-velocity intervals of different walking gaits. The location experiment shows that the precision of CHMM-based pedestrian navigation improved by 40% when compared to the angular rate threshold method.
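The Viterbi decoding step at the heart of the gait classifier can be shown in a self-contained sketch. For brevity this toy HMM uses two states and discrete "hi"/"lo" gyro observations instead of the paper's four-state continuous model; all probabilities below are made up for illustration:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an observation sequence
    (standard HMM decoding by dynamic programming)."""
    # V[t][s] = (best prob of any path ending in state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            prob, prev = max(
                (V[-2][p][0] * trans_p[p][s] * emit_p[s][o], p) for p in states)
            V[-1][s] = (prob, prev)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))
```

In the paper's setting the same recursion runs over a sliding window of gyro samples, with Gaussian-mixture emission densities in place of the discrete table.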
Liu, Zhiming; Luo, Jiawei
2017-08-01
Associating protein complexes with human inherited diseases is critical for better understanding of biological processes and the functional mechanisms of disease. Many protein complexes have been identified and functionally annotated by computational and purification methods so far; however, the particular roles they play in causing disease have not yet been well determined. In this study, we present a novel method to identify associations between protein complexes and diseases. First, we construct a disease-protein heterogeneous network based on data integration and Laplacian normalization. Second, we apply a random walk with restart on heterogeneous network (RWRH) algorithm to this network to quantify the strength of the association between proteins and the query disease. Third, we sum the scores of member proteins to obtain a summary score for each candidate protein complex, and then rank all candidate protein complexes according to their scores. With a series of leave-one-out cross-validation experiments, we found that our method not only achieves high performance but also demonstrates robustness with respect to the parameters and the network structure. We tested our approach with breast cancer and selected the top 20 highly ranked protein complexes; 17 of these are supported by evidence connecting them with breast cancer. Our proposed method is effective in identifying disease-related protein complexes based on data integration and Laplacian normalization. Copyright © 2017. Published by Elsevier Ltd.
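The core scoring step, a random walk with restart, iterates a probability vector over a normalized network until it converges; the steady-state probabilities then rank nodes by proximity to the seed (query) nodes. A minimal sketch on a tiny hand-built network (the real method runs on a heterogeneous disease-protein network):

```python
def random_walk_with_restart(W, seed, restart=0.7, tol=1e-10):
    """Steady-state visiting probabilities of a restarting random walk.
    W is a column-normalized adjacency matrix given as nested lists;
    seed is the set of indices the walker restarts from."""
    n = len(W)
    p0 = [1.0 / len(seed) if i in seed else 0.0 for i in range(n)]
    p = p0[:]
    while True:
        # One step of the walk, mixed with a restart to the seed nodes.
        p_new = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
                 + restart * p0[i]
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(p_new, p)) < tol:
            return p_new
        p = p_new
```

Because W is column-stochastic, the vector stays a probability distribution, and nodes closer to the seed end up with higher scores.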
Che, Wenkai; Sun, Laijun; Zhang, Qian; Zhang, Dan; Ye, Dandan; Tan, Wenyi; Wang, Lekai; Dai, Changjun
2017-10-01
Azodicarbonamide is widely used in the flour industry as a flour gluten fortifier in many countries, but some studies have found it to be harmful to people and unsuitable as a flour additive. Applying a rapid, convenient, and noninvasive technique in food analytical procedures for safety inspection has become an urgent need. This paper used Vis/NIR reflectance spectroscopy, which is based on physical property analysis, to predict the concentration of azodicarbonamide in flour. Spectral data in the range from 400 to 2498 nm were obtained by scanning 101 samples, which were prepared using a stepwise dilution method. Furthermore, the combination of leave-one-out cross-validation and the Mahalanobis distance method was used to eliminate abnormal spectral data, and the correlation coefficient method was used to choose characteristic wavebands. Partial least squares, back propagation neural network, and radial basis function models were established separately for prediction. Comparing the prediction results of the 3 models, the radial basis function model gave the best predictions, with a correlation coefficient (R), root mean square error of prediction (RMSEP), and ratio of performance to deviation (RPD) of 0.99996, 0.5467, and 116.5858, respectively. Azodicarbonamide has been banned or limited in many countries. This paper proposes a method to predict the azodicarbonamide concentration in wheat flour, which will be used for a rapid, convenient, and noninvasive detection device. © 2017 Institute of Food Technologists®.
Quality grading of Atlantic salmon (Salmo salar) by computer vision.
Misimi, E; Erikson, U; Skavhaug, A
2008-06-01
In this study, we present a promising method of computer vision-based quality grading of whole Atlantic salmon (Salmo salar). Using computer vision, it was possible to differentiate among different quality grades of Atlantic salmon based on the external geometrical information contained in the fish images. Initially, before the image acquisition, the fish were subjectively graded and labeled into grading classes by a qualified human inspector in the processing plant. Prior to classification, the salmon images were segmented into binary images, and feature extraction was then performed on the geometrical parameters of the fish from the grading classes. The classification algorithm was a threshold-based classifier, which was designed using linear discriminant analysis. The performance of the classifier was tested using the leave-one-out cross-validation method, and the classification results showed a good agreement between the classification done by human inspectors and by computer vision. The computer vision-based method correctly classified 90% of the salmon in the data set, as compared with the classification by the human inspector. Overall, it was shown that computer vision can be used as a powerful tool to grade Atlantic salmon into quality grades in a fast and nondestructive manner with a relatively simple classifier algorithm. The low cost of implementing today's advanced computer vision solutions makes this method feasible for industrial purposes in fish plants, as it can replace manual labor, on which grading tasks still rely.
Jeong, In-Young; Kim, Ji-Soo
2018-04-01
To identify the relationship between emergency nurses' intention to leave the hospital and their coping methods following workplace violence. Emergency departments report a high prevalence of workplace violence, with nurses being at particular risk of violence from patients and patients' relatives. Violence negatively influences nurses' personal and professional lives and increases their turnover. This is a cross-sectional, descriptive survey study. Participants were nurses (n = 214) with over one year of experience working in an emergency department. We measured workplace violence, coping after workplace violence experiences and job satisfaction using scales validated through a preliminary survey. Questionnaires were distributed to all nurses who signed informed consent forms. Multiple logistic regression analysis was used to identify the relationships between nurses' intention to leave the hospital and their coping methods after workplace violence. Verbal abuse was the most frequent form of violence and more often originated from patients' relatives than from patients. Of the nurses who experienced violence, 61.0% considered leaving the hospital. As for coping, nurses who employed problem-focused coping most frequently sought to identify the problems that cause violence, while nurses who employed emotion-focused coping primarily attempted to endure the situation. The multiple logistic regression analysis revealed that female sex, emotion-focused coping and job satisfaction were significantly related to emergency nurses' intention to leave. Emotion-focused coping seems to have a stronger effect on intention to leave after experiencing violence than does job satisfaction. Nurse managers should begin providing emergency nurses with useful information to guide their management of violence experiences. Nurse managers should also encourage nurses to report violent experiences to the administrative department rather than resorting to emotion-focused coping.
Nurses should be provided with the opportunity to communicate their feelings to their colleagues. © 2017 John Wiley & Sons Ltd.
Cross-Validating Chinese Language Mental Health Recovery Measures in Hong Kong
ERIC Educational Resources Information Center
Bola, John; Chan, Tiffany Hill Ching; Chen, Eric HY; Ng, Roger
2016-01-01
Objectives: Promoting recovery in mental health services is hampered by a shortage of reliable and valid measures, particularly in Hong Kong. We seek to cross validate two Chinese language measures of recovery and one of recovery-promoting environments. Method: A cross-sectional survey of people recovering from early episode psychosis (n = 121)…
Porter, Teresita M.; Golding, G. Brian
2012-01-01
Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, also in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and NBC runs significantly faster than the other tested methods. All methods performed poorly with the shortest 50–100 bp sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys. PMID:22558215
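The composition-based approach tested above (the RDP naïve Bayesian classifier) scores a query sequence against per-taxon k-mer frequency profiles. A heavily simplified, library-free sketch of that idea, with made-up reference sequences and Laplace smoothing in place of the RDP's exact priors:

```python
from collections import Counter
import math

def kmers(seq, k=4):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train_nbc(references, k=4):
    """Per-taxon k-mer counts from labeled reference sequences."""
    model = {}
    for taxon, seqs in references.items():
        counts = Counter()
        for s in seqs:
            counts.update(kmers(s, k))
        model[taxon] = (counts, sum(counts.values()))
    return model

def classify(model, query, k=4, alpha=1.0):
    """Assign the query to the taxon maximizing the smoothed
    log-likelihood of its k-mers (naive Bayes with a uniform prior)."""
    best, best_ll = None, None
    for taxon, (counts, total) in model.items():
        ll = sum(math.log((counts[w] + alpha) / (total + alpha * 4 ** k))
                 for w in kmers(query, k))
        if best_ll is None or ll > best_ll:
            best, best_ll = taxon, ll
    return best
```

A leave-one-out evaluation of such a classifier simply retrains the model with each reference sequence withheld and checks whether it is assigned back to its own taxon.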
NASA Astrophysics Data System (ADS)
Metusala, D.
2017-07-01
This alternative method provides a simple and faster procedure for preparing cross-sections of leaves and roots in herbaceous plants, especially for living specimens of orchids (Orchidaceae). The method uses a clamp-on hand sliding microtome to make cross-sections of leaves and roots, with the sections preserved inside microtubes containing preservation liquid. This preservation technique allows the sections to be restained and used again in the future. The method is more practical than the paraffin embedding method because it does not require the additional steps of paraffin embedding and deparaffinization. It may also provide better cross-section results than the free-hand sectioning method. The procedure is very feasible and is recommended for use in plant anatomy observation.
Roine, Antti; Saviauk, Taavi; Kumpulainen, Pekka; Karjalainen, Markus; Tuokko, Antti; Aittoniemi, Janne; Vuento, Risto; Lekkala, Jukka; Lehtimäki, Terho; Tammela, Teuvo L; Oksala, Niku K J
2014-01-01
Urinary tract infection (UTI) is a common disease with significant morbidity and economic burden, accounting for a significant part of the workload in clinical microbiology laboratories. Current clinical chemistry point-of-care diagnostics rely on imperfect dipstick analysis, which only provides indirect and insensitive evidence of urinary bacterial pathogens. An electronic nose (eNose) is a handheld device mimicking mammalian olfaction that potentially offers affordable and rapid analysis of samples, without preparation, at atmospheric pressure. In this study we demonstrate the applicability of an ion mobility spectrometry (IMS)-based eNose to discriminate the most common UTI pathogens from the gaseous headspace of culture plates rapidly and without sample preparation. We gathered a total of 101 culture samples containing the four most common UTI bacteria (E. coli, S. saprophyticus, E. faecalis and Klebsiella spp.) and sterile culture plates. The samples were analyzed using a ChemPro 100i device, consisting of an IMS cell and six semiconductor sensors. Data analysis was conducted by linear discriminant analysis (LDA) and logistic regression (LR). The results were validated by leave-one-out and 5-fold cross-validation analysis. In discriminating sterile from bacterial samples, a sensitivity of 95% and a specificity of 97% were achieved. The bacterial species were identified with a sensitivity of 95% and a specificity of 96% using the eNose, as compared to urine bacterial cultures. These findings strongly demonstrate the ability of our eNose to discriminate bacterial cultures and provide a proof of principle for using this method in urinalysis of UTI.
Robust prediction of individual creative ability from brain functional connectivity.
Beaty, Roger E; Kenett, Yoed N; Christensen, Alexander P; Rosenberg, Monica D; Benedek, Mathias; Chen, Qunlin; Fink, Andreas; Qiu, Jiang; Kwapil, Thomas R; Kane, Michael J; Silvia, Paul J
2018-01-30
People's ability to think creatively is a primary means of technological and cultural progress, yet the neural architecture of the highly creative brain remains largely undefined. Here, we employed a recently developed method in functional brain imaging analysis, connectome-based predictive modeling, to identify a brain network associated with high creative ability, using functional magnetic resonance imaging (fMRI) data acquired from 163 participants engaged in a classic divergent thinking task. At the behavioral level, we found a strong correlation between creative thinking ability and self-reported creative behavior and accomplishment in the arts and sciences (r = 0.54). At the neural level, we found a pattern of functional brain connectivity related to high creative thinking ability consisting of frontal and parietal regions within default, salience, and executive brain systems. In a leave-one-out cross-validation analysis, we show that this neural model can reliably predict the creative quality of ideas generated by novel participants within the sample. Furthermore, in a series of external validation analyses using data from two independent task fMRI samples and a large task-free resting-state fMRI sample, we demonstrate robust prediction of individual creative thinking ability from the same pattern of brain connectivity. The findings thus reveal a whole-brain network associated with high creative ability, comprising cortical hubs within default, salience, and executive systems (intrinsic functional networks that tend to work in opposition), suggesting that highly creative people are characterized by the ability to simultaneously engage these large-scale brain networks.
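Connectome-based predictive modeling, as used above, selects the features most correlated with behavior in the training subjects only, fits a linear model, and predicts each held-out subject; the correlation between predicted and observed scores measures predictive power. A minimal single-feature sketch with illustrative data (real models pool many connectivity edges):

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def loocv_predict(features, behavior):
    """Leave-one-out: for each subject, pick the single feature most
    correlated with behavior in the training set, fit a line, and
    predict the held-out subject."""
    preds = []
    n = len(behavior)
    for i in range(n):
        tr = [j for j in range(n) if j != i]
        y = [behavior[j] for j in tr]
        # Feature selection on training data only (avoids leakage).
        best = max(range(len(features[0])),
                   key=lambda f: abs(pearson([features[j][f] for j in tr], y)))
        x = [features[j][best] for j in tr]
        mx, my = sum(x) / len(x), sum(y) / len(y)
        b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
            sum((xi - mx) ** 2 for xi in x)
        preds.append(my + b * (features[i][best] - mx))
    return preds
```

Crucially, both feature selection and model fitting are redone inside every fold, so the held-out subject never influences its own prediction.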
Peng, Youyi; Keenan, Susan M; Zhang, Qiang; Kholodovych, Vladyslav; Welsh, William J
2005-03-10
Three-dimensional quantitative structure-activity relationship (3D-QSAR) models were constructed using comparative molecular field analysis (CoMFA) on a series of opioid receptor antagonists. To obtain statistically significant and robust CoMFA models, a sizable data set of naltrindole and naltrexone analogues was assembled by pooling biological and structural data from independent studies. A process of "leave one data set out", similar to the traditional "leave one out" cross-validation procedure employed in partial least squares (PLS) analysis, was utilized to study the feasibility of pooling data in the present case. These studies indicate that our approach yields statistically significant and highly predictive CoMFA models from the pooled data set of delta, mu, and kappa opioid receptor antagonists. All models showed excellent internal predictability and self-consistency: q(2) = 0.69/r(2) = 0.91 (delta), q(2) = 0.67/r(2) = 0.92 (mu), and q(2) = 0.60/r(2) = 0.96 (kappa). The CoMFA models were further validated using two separate test sets: one test set was selected randomly from the pooled data set, while the other test set was retrieved from other published sources. The overall excellent agreement between CoMFA-predicted and experimental binding affinities for a structurally diverse array of ligands across all three opioid receptor subtypes gives testimony to the superb predictive power of these models. CoMFA field analysis demonstrated that the variations in binding affinity of opioid antagonists are dominated by steric rather than electrostatic interactions with the three opioid receptor binding sites. The CoMFA steric-electrostatic contour maps corresponding to the delta, mu, and kappa opioid receptor subtypes reflected the characteristic similarities and differences in the familiar "message-address" concept of opioid receptor ligands. 
Structural modifications to increase selectivity for the delta over mu and kappa opioid receptors have been predicted on the basis of the CoMFA contour maps. The structure-activity relationships (SARs) together with the CoMFA models should find utility for the rational design of subtype-selective opioid receptor antagonists.
Raman spectroscopic study of keratin 8 knockdown oral squamous cell carcinoma derived cells
NASA Astrophysics Data System (ADS)
Singh, S. P.; Alam, Hunain; Dmello, Crismita; Vaidya, Milind M.; Krishna, C. Murali
2012-03-01
Keratins are among the most widely used markers for oral cancers. Keratins 8 and 18 are expressed in simple epithelia, where they perform both mechanical and regulatory functions. Their expression is not seen in normal oral tissues but is often seen in oral squamous cell carcinoma. Aberrant expression of keratins 8 and 18 is the most common change in human oral cancer. Optical spectroscopic methods are sensitive to biochemical changes and are being projected as novel diagnostic tools for cancer diagnosis. The aim of this study was to evaluate the potential of Raman spectroscopy in detecting minor changes associated with differential levels of keratin expression in tongue-cancer-derived AW13516 cells. Knockdown clones for K8 were generated and synchronized by growing under serum-free conditions. Cell pellets from three independent experiments in duplicate were used for recording Raman spectra with a fiberoptic-probe-coupled HE-785 Raman instrument. A total of 123 and 96 spectra from knockdown clones and vector controls, respectively, in the 1200-1800 cm-1 region were successfully utilized for classification using LDA. Two separate clusters with a classification efficiency of ~95% were obtained. Leave-one-out cross-validation yielded ~63% efficiency. Findings of the study demonstrate the potential of Raman spectroscopy in detecting even subtle changes such as variations in keratin expression levels. Future studies towards identifying Raman signals from keratin in oral cells can help in precise cancer diagnosis.
The psychometric properties of an Iranian translation of the Work Ability Index (WAI) questionnaire.
Abdolalizadeh, M; Arastoo, A A; Ghsemzadeh, R; Montazeri, A; Ahmadi, K; Azizi, A
2012-09-01
This study was carried out to evaluate the psychometric properties of an Iranian translation of the Work Ability Index (WAI) questionnaire. In this methodological study, nurses and healthcare workers aged 40 years and older who worked in educational hospitals in Ahvaz (236 workers) in 2010 completed the questionnaire, and 60 of the workers filled out the WAI questionnaire a second time to ensure test-retest reliability. The forward-backward method was applied to translate the questionnaire from English into Persian. The psychometric properties of the Iranian translation of the WAI were assessed using the following tests: internal consistency (to test reliability); test-retest analysis; exploratory factor analysis (construct validity); discriminant validity, by comparing the mean WAI score in two groups of employees with different levels of sick leave; and criterion validity, by determining the correlation between the Persian version of the short form health survey (SF-36) and the WAI score. Cronbach's alpha coefficient was estimated to be 0.79, indicating sufficiently high internal consistency. The intraclass correlation coefficient was 0.92. Factor analysis indicated three factors in the structure of work ability: self-perceived work ability (24.5% of the variance), mental resources (22.23% of the variance), and presence of disease and health-related limitation (18.55% of the variance). Statistical tests showed that this questionnaire was capable of discriminating between two groups of employees with different levels of sick leave. Criterion validity analysis showed that this instrument and all dimensions of the Iranian version of the SF-36 were correlated significantly. Item-total correlations corrected for overlap were good for all items except one.
The findings of the study show that the Iranian version of the WAI is a reliable and valid measure of work ability and can be used both in research and in practical activities.
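The internal-consistency statistic reported above, Cronbach's alpha, compares the sum of per-item variances to the variance of the total scores. A minimal sketch with illustrative item data:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.
    items: one inner list of scores per item, aligned across respondents.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[p] for item in items) for p in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))
```

Perfectly parallel items give alpha = 1, while items that do not covary drive alpha toward (or below) zero, which is why 0.79 is read as adequate internal consistency.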
Huet, S; Marie, J P; Gualde, N; Robert, J
1998-12-15
Multidrug resistance (MDR) associated with overexpression of the MDR1 gene and of its product, P-glycoprotein (Pgp), plays an important role in limiting cancer treatment efficacy. Many studies have investigated Pgp expression in clinical samples of hematological malignancies but have failed to give a definitive conclusion on its usefulness. One convenient method for fluorescent detection of Pgp in malignant cells is flow cytometry, which however gives variable results from one laboratory to another, partly due to the lack of a rigorously tested reference method. The purpose of this technical note is to describe each step of a reference flow cytometric method. The guidelines for sample handling, staining and analysis have been established both for Pgp detection with monoclonal antibodies directed against extracellular epitopes (MRK16, UIC2 and 4E3), and for measurement of Pgp functional activity with Rhodamine 123 as a fluorescent probe. Both methods have been validated on cultured cell lines and clinical samples by 12 laboratories of the French Drug Resistance Network. This cross-validated multicentric study points out steps that are crucial for the accuracy and reproducibility of the results, such as cell viability, data analysis and the expression of results.
Reflectance spectroscopy: a tool for predicting the risk of iron chlorosis in soils
NASA Astrophysics Data System (ADS)
Cañasveras, J. C.; Barrón, V.; Del Campillo, M. C.; Viscarra Rossel, R. A.
2012-04-01
Chlorosis due to iron (Fe) deficiency is the most important nutritional problem a plant can have in calcareous soils. The most characteristic symptom of Fe chlorosis is interveinal yellowing of the youngest leaves due to a lack of chlorophyll caused by a disorder in Fe nutrition. Fe chlorosis is related to calcium carbonate equivalent (CCE), clay content and oxalate-extractable Fe (Feo). The conventional techniques for determining these and other properties, based on laboratory analysis, are time-consuming and costly. Reflectance spectroscopy (RS) is a rapid, non-destructive, less expensive alternative tool that can be used to enhance or replace conventional methods of soil analysis. The aim of this work was to assess the usefulness of RS for the determination of some properties of Mediterranean soils, including clay content, CCE, Feo, cation exchange capacity (CEC), organic matter (OM) and pHw, with emphasis on those with an especially marked influence on the risk of Fe chlorosis. To this end, we used partial least-squares regression (PLS) to construct calibration models, leave-one-out cross-validation and an independent validation set. Our results testify to the usefulness of qualitative soil interpretations based on the variable importance for projection (VIP) as derived by PLS decomposition. The accuracy of predictions in each of the Vis-NIR, MIR and combined spectral regions differed considerably between properties. The R2adj and root mean square error (RMSE) for the external validation predictions were as follows: 0.83 and 37 mg kg-1 for clay content in the Vis-NIR-MIR range; 0.99 and 25 mg kg-1 for CCE and 0.80 and 0.1 mg kg-1 for Feo in the MIR range; 0.93 and 3 cmolc kg-1 for CEC in the Vis-NIR range; 0.87 and 2 mg kg-1 for OM in the Vis-NIR-MIR range; and 0.61 and 0.2 for pHw in the MIR range.
These results testify to the potential of RS in the Vis, NIR and MIR ranges for efficient soil analysis, the acquisition of soil information and the assessment of the risk of Fe chlorosis in soils.
Przednowek, Krzysztof; Iskra, Janusz; Wiktorowicz, Krzysztof; Krzeszowski, Tomasz; Maszczyk, Adam
2017-12-01
This paper presents a novel approach to planning training loads in hurdling using artificial neural networks. The neural models performed the task of generating loads for athletes' training for the 400 meters hurdles. All the models were calculated based on the training data of 21 Polish National Team hurdlers, aged 22.25 ± 1.96 years, competing between 1989 and 2012. The analysis included 144 training plans that represented different stages in the annual training cycle. The main contribution of this paper is to develop neural models for planning training loads for the entire career of a typical hurdler. In the models, 29 variables were used, where four characterized the runner and 25 described the training process. Two artificial neural networks were used: a multi-layer perceptron and a network with radial basis functions. To assess the quality of the models, the leave-one-out cross-validation method was used, in which the Normalized Root Mean Squared Error was calculated. The analysis shows that the method generating the smallest error was the radial basis function network with nine neurons in the hidden layer. Most of the calculated training loads demonstrated a non-linear relationship across the entire competitive period. The resulting model can be used as a tool to assist a coach in planning training loads during a selected training period.
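The model-comparison criterion above, a Normalized Root Mean Squared Error computed under leave-one-out cross-validation, can be sketched as a generic harness that takes any fit/predict pair. Here the range of the observed values is used as the normalizer, and a trivial mean predictor stands in for the neural models; both choices are illustrative:

```python
def loocv_nrmse(X, y, fit, predict):
    """Leave-one-out cross-validation returning RMSE normalized by the
    observed range of y, so models on different scales are comparable."""
    n = len(y)
    sq_err = 0.0
    for i in range(n):
        # Fit on all samples except i, then predict the held-out sample.
        model = fit([X[j] for j in range(n) if j != i],
                    [y[j] for j in range(n) if j != i])
        sq_err += (predict(model, X[i]) - y[i]) ** 2
    rmse = (sq_err / n) ** 0.5
    return rmse / (max(y) - min(y))
```

Swapping in a multi-layer perceptron or a radial basis function network only changes the fit/predict pair; the cross-validation loop and the error metric stay the same, which is what makes the comparison between models fair.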
NOXclass: prediction of protein-protein interaction types.
Zhu, Hongbo; Domingues, Francisco S; Sommer, Ingolf; Lengauer, Thomas
2006-01-19
Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure. NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at http://noxclass.bioinf.mpi-inf.mpg.de/.
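The leave-one-out accuracy estimate above can be sketched with a minimal validation loop. A nearest-centroid classifier stands in for the support vector machine, and the interface properties and dataset are not reproduced, so only the evaluation protocol itself mirrors the paper.

```python
import numpy as np

def nearest_centroid_predict(Xtr, ytr, x):
    """Assign x to the class whose feature centroid is nearest (SVM stand-in)."""
    labels = np.unique(ytr)
    cents = np.array([Xtr[ytr == c].mean(0) for c in labels])
    return labels[np.argmin(((cents - x) ** 2).sum(1))]

def loo_accuracy(X, y):
    """Leave-one-out accuracy: retrain on n-1 interfaces, test on the held-out one."""
    hits = 0
    for i in range(len(y)):
        m = np.arange(len(y)) != i
        hits += nearest_centroid_predict(X[m], y[m], X[i]) == y[i]
    return hits / len(y)
```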
NASA Astrophysics Data System (ADS)
Dormer, James D.; Halicek, Martin; Ma, Ling; Reilly, Carolyn M.; Schreibmann, Eduard; Fei, Baowei
2018-02-01
Cardiovascular disease is a leading cause of death in the United States. The identification of cardiac diseases on conventional three-dimensional (3D) CT can have many clinical applications. An automated method that can distinguish between healthy and diseased hearts could improve diagnostic speed and accuracy when the only modality available is conventional 3D CT. In this work, we proposed and implemented convolutional neural networks (CNNs) to identify diseased hearts on CT images. Six patients with healthy hearts and six with previous cardiovascular disease events received chest CT. After the left atrium of each heart was segmented, 2D and 3D patches were created. A subset of the patches was then used to train separate convolutional neural networks using leave-one-out cross-validation of patient pairs. The results of the two neural networks were compared, with 3D patches producing the higher testing accuracy. The full list of 3D patches from the left atrium was then classified using the optimal 3D CNN model, and receiver operating characteristic (ROC) curves were produced. The final average area under the curve (AUC) from the ROC curves was 0.840 +/- 0.065 and the average accuracy was 78.9% +/- 5.9%. This demonstrates that the CNN-based method is capable of distinguishing healthy hearts from those with previous cardiovascular disease.
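Leave-one-out cross-validation over patients, rather than over individual patches, is the key detail above: every patch from the held-out patient must leave the training set together, or patch-level correlations leak into the test fold. A minimal index-splitting sketch with hypothetical patient IDs:

```python
import numpy as np

def leave_one_patient_out(patient_ids):
    """Yield (train_idx, test_idx) pairs holding out every patch of one patient."""
    ids = np.asarray(patient_ids)
    for p in np.unique(ids):
        yield np.where(ids != p)[0], np.where(ids == p)[0]
```

Each fold's train/test index arrays can then feed any patch-level classifier while keeping the evaluation patient-wise.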
QSAR models for thiophene and imidazopyridine derivatives inhibitors of the Polo-Like Kinase 1.
Comelli, Nieves C; Duchowicz, Pablo R; Castro, Eduardo A
2014-10-01
The inhibitory activity of 103 thiophene and 33 imidazopyridine derivatives against Polo-Like Kinase 1 (PLK1), expressed as pIC50 (-logIC50), was predicted by QSAR modeling. Multivariate linear regression (MLR) was employed to model the relationship between 0D and 3D molecular descriptors and the biological activities of the molecules, using the replacement method (RM) as a variable selection tool. The 136 compounds were separated into several training and test sets. Two splitting approaches, distribution of biological data and structural diversity, and the statistical experimental design procedure D-optimal distance were applied to the dataset. The significance of the training set models was confirmed by statistically higher values of the internal leave-one-out cross-validated coefficient of determination (Q2) and the external predictive coefficient of determination for the test set (Rtest2). The model developed from a training set obtained with the D-optimal distance protocol, using the 3D descriptor space along with activity values, separated chemical features that allowed us to distinguish high and low pIC50 values reasonably well. We then verified that this model was sufficient to reliably and accurately predict the activity of external, structurally diverse compounds. The robustness of the model was properly characterized by means of standard procedures, and its applicability domain (AD) was analyzed by the leverage method. Copyright © 2014 Elsevier B.V. All rights reserved.
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space, in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of possibly redundant inputs, as shown in various empirical evaluations with up to 90-dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
Nohara, Ryuki; Endo, Yui; Murai, Akihiko; Takemura, Hiroshi; Kouchi, Makiko; Tada, Mitsunori
2016-08-01
Individual human models are usually created by direct 3D scanning or by deforming a template model according to measured dimensions. In this paper, we propose a method to estimate all the dimensions necessary for human model individualization (the full set) from a small number of measured dimensions (a subset) and a human dimension database. For this purpose, we solved a multiple regression equation from the dimension database, with the full-set dimensions as the objective variables and the subset dimensions as the explanatory variables. The full-set dimensions are thus obtained simply by multiplying the subset dimensions by the coefficient matrix of the regression equation. We verified the accuracy of our method by imputing hand, foot, and whole-body dimensions from their respective dimension databases, employing leave-one-out cross-validation in the evaluation. The mean absolute errors (MAE) between the measured and estimated dimensions were computed from 4 dimensions (hand length, breadth, middle finger breadth at proximal, and middle finger depth at proximal) in the hand, 3 dimensions (foot length, breadth, and lateral malleolus height) in the foot, and height and weight in the whole body. The average MAE of the non-measured dimensions was 4.58% for the hand, 4.42% for the foot, and 3.54% for the whole body, while that of the measured dimensions was 0.00%.
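The imputation step above amounts to a least-squares coefficient matrix mapping subset dimensions to full-set dimensions, evaluated by leave-one-out MAE. A rough numpy sketch under that reading; the function names and synthetic measurements are assumptions:

```python
import numpy as np

def fit_imputer(subset, full):
    """Least-squares coefficient matrix mapping subset dims (+bias) to full set."""
    A = np.hstack([subset, np.ones((len(subset), 1))])
    C, *_ = np.linalg.lstsq(A, full, rcond=None)
    return C

def impute(C, subset_row):
    """Estimate the full set of dimensions for one subject."""
    return np.append(subset_row, 1.0) @ C

def loo_mae_percent(subset, full):
    """Leave-one-subject-out mean absolute error, in percent."""
    errs = []
    for i in range(len(full)):
        m = np.arange(len(full)) != i
        C = fit_imputer(subset[m], full[m])
        errs.append(np.abs(impute(C, subset[i]) - full[i]) / full[i] * 100)
    return float(np.mean(errs))
```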
Jiang, Yang; Gong, Yuanzheng; Rubenstein, Joel H; Wang, Thomas D; Seibel, Eric J
2017-04-01
Multimodal endoscopy using fluorescence molecular probes is a promising method of surveying the entire esophagus to detect cancer progression. The ratio of target fluorescence to that of the surrounding background yields a quantitative value that is diagnostic for progression from Barrett's esophagus to high-grade dysplasia (HGD) and esophageal adenocarcinoma (EAC). However, current quantification of fluorescent images is done only after the endoscopic procedure. We developed a Chan-Vese-based algorithm to segment fluorescence targets, with subsequent morphological operations to generate the background, thus calculating target/background (T/B) ratios, potentially to provide real-time guidance for biopsy and endoscopic therapy. With an initial processing speed of 2 fps and by calculating the T/B ratio for each frame, our method provides quasi-real-time quantification of the molecular probe labeling to the endoscopist. Furthermore, an automatic computer-aided diagnosis algorithm can be applied to the recorded endoscopic video, and the overall T/B ratio is calculated for each patient. The receiver operating characteristic curve was employed to determine the threshold for classification of HGD/EAC using leave-one-out cross-validation. With 92% sensitivity and 75% specificity to classify HGD/EAC, our automatic algorithm shows promising results for a surveillance procedure to help manage esophageal cancer and other cancers inspected by endoscopy.
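The threshold-selection step above can be sketched as a leave-one-out loop that, on each training fold, picks the ROC operating point maximizing Youden's J and then classifies the held-out patient. The scores below are synthetic stand-ins for per-patient T/B ratios, and the specific threshold rule is an assumption:

```python
import numpy as np

def roc_points(scores, labels):
    """FPR, TPR and thresholds for the rule score >= threshold -> positive."""
    order = np.argsort(-scores)
    s, l = scores[order], labels[order]
    tpr = np.cumsum(l) / l.sum()
    fpr = np.cumsum(1 - l) / (1 - l).sum()
    return fpr, tpr, s

def loo_classify(scores, labels):
    """Leave one patient out; threshold chosen on the rest by Youden's J."""
    preds = np.empty(len(labels), dtype=int)
    for i in range(len(labels)):
        m = np.arange(len(labels)) != i
        fpr, tpr, thr = roc_points(scores[m], labels[m])
        t = thr[np.argmax(tpr - fpr)]
        preds[i] = scores[i] >= t
    return preds
```

Sensitivity and specificity are then read off by comparing `preds` with the true labels, as in the figures quoted above.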
Sun, Jin; Kelbert, Anna; Egbert, G.D.
2015-01-01
Long-period global-scale electromagnetic induction studies of deep Earth conductivity are based almost exclusively on magnetovariational methods and require accurate models of external source spatial structure. We describe approaches to inverting for both the external sources and three-dimensional (3-D) conductivity variations and apply these methods to long-period (T≥1.2 days) geomagnetic observatory data. Our scheme involves three steps: (1) Observatory data from 60 years (only partly overlapping and with many large gaps) are reduced and merged into dominant spatial modes using a scheme based on frequency domain principal components. (2) Resulting modes are inverted for corresponding external source spatial structure, using a simplified conductivity model with radial variations overlain by a two-dimensional thin sheet. The source inversion is regularized using a physically based source covariance, generated through superposition of correlated tilted zonal (quasi-dipole) current loops, representing ionospheric source complexity smoothed by Earth rotation. Free parameters in the source covariance model are tuned by a leave-one-out cross-validation scheme. (3) The estimated data modes are inverted for 3-D Earth conductivity, assuming the source excitation estimated in step 2. Together, these developments constitute key components in a practical scheme for simultaneous inversion of the catalogue of historical and modern observatory data for external source spatial structure and 3-D Earth conductivity.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fave, X; Court, L; UT Health Science Center, Graduate School of Biomedical Sciences, Houston, TX
Purpose: To determine how radiomics features change during radiation therapy and whether those changes (delta-radiomics features) can improve prognostic models built with clinical factors. Methods: 62 radiomics features, including histogram, co-occurrence, run-length, gray-tone difference, and shape features, were calculated from pretreatment and weekly intra-treatment CTs for 107 stage III NSCLC patients (5-9 images per patient). Image preprocessing for each feature was determined using the set of pretreatment images: bit-depth resampling and/or a smoothing filter were tested for their impact on volume-correlation and on the significance of each feature in univariate Cox regression models, to maximize their information content. Next, the optimized features were calculated from the intra-treatment images and tested in linear mixed-effects models to determine which features changed significantly with dose-fraction. The slopes of these significant features were defined as delta-radiomics features. To test their prognostic potential, multivariate Cox regression models were fitted, first using only clinical features and then clinical+delta-radiomics features, for overall-survival, local-recurrence, and distant-metastases. Leave-one-out cross validation was used for model-fitting and patient predictions. Concordance indices (c-index) and p-values for the log-rank test, with patients stratified at the median, were calculated. Results: Approximately one-half of the 62 optimized features required no preprocessing, one-fourth required smoothing, and one-fourth required smoothing and resampling. Of these, 54 changed significantly during treatment. For overall-survival, the c-index improved from 0.52 for clinical factors alone to 0.62 for clinical+delta-radiomics features. For distant-metastases, the c-index improved from 0.53 to 0.58, while for local-recurrence it did not improve.
Patient stratification significantly improved (p-value<0.05) for overall-survival and distant-metastases when delta-radiomics features were included. The delta-radiomics versions of autocorrelation, kurtosis, and compactness were selected most frequently in leave-one-out iterations. Conclusion: Weekly changes in radiomics features can potentially be used to evaluate treatment response and predict patient outcomes. High-risk patients could be recommended for dose escalation or consolidation chemotherapy. This project was funded in part by grants from the National Cancer Institute (NCI) and the Cancer Prevention Research Institute of Texas (CPRIT).
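Reading the "slopes of these significant features" as per-patient least-squares slopes of a feature over the weekly images, a minimal sketch is shown below; the mixed-effects modeling and the Cox regressions are omitted, and the patient data are hypothetical:

```python
import numpy as np

def delta_feature(weeks, values):
    """Slope of an OLS line through weekly feature values (a delta-radiomics feature)."""
    slope, _ = np.polyfit(weeks, values, 1)
    return slope

# Hypothetical weekly measurements of one radiomics feature for two patients.
patients = {"p1": (np.array([0, 1, 2, 3]), np.array([1.0, 1.4, 1.9, 2.5])),
            "p2": (np.array([0, 1, 2, 3]), np.array([2.0, 1.8, 1.5, 1.1]))}
deltas = {pid: delta_feature(w, v) for pid, (w, v) in patients.items()}
```

The resulting per-patient slopes would then enter the survival models as covariates alongside the clinical factors.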
NASA Astrophysics Data System (ADS)
Cook, Ellyn J.; van der Kaars, Sander
2006-10-01
We review attempts to derive quantitative climatic estimates from Australian pollen data, including the climatic envelope, climatic indicator and modern analogue approaches, and outline the need to pursue alternatives for use as input to, or validation of, simulations by models of past, present and future climate patterns. To this end, we have constructed and tested modern pollen-climate transfer functions for mainland southeastern Australia and Tasmania using the existing southeastern Australian pollen database, and for northern Australia using a new pollen database we are developing. After testing for statistical significance, 11 parameters were selected for mainland southeastern Australia, seven for Tasmania and six for northern Australia. The functions are based on weighted-averaging partial least squares regression, and their predictive ability was evaluated against modern observational climate data using leave-one-out cross-validation. Functions for summer, annual and winter rainfall and temperatures are most robust for southeastern Australia, while in Tasmania functions for minimum temperature of the coldest period, mean winter and mean annual temperature are the most reliable. In northern Australia, annual and summer rainfall and annual and summer moisture indices are the strongest. The validation of all functions means that all can be applied with confidence to Quaternary pollen records from these three areas.
NASA Technical Reports Server (NTRS)
Pliutau, Denis; Prasad, Narasimha S
2013-01-01
Studies were performed to carry out semi-empirical validation of a new measurement approach that we propose for the determination of molecular mixing ratios. The approach is based on relative measurements in bands of O2 and other molecules and as such may be best described as cross-band relative absorption (CoBRA). The current validation studies rely upon well-verified and established theoretical and experimental databases, satellite data assimilations and modeling codes such as HITRAN, the line-by-line radiative transfer model (LBLRTM), and the modern-era retrospective analysis for research and applications (MERRA). The approach holds promise for atmospheric mixing ratio measurements of CO2 and a variety of other molecules currently under investigation for several future satellite lidar missions. One of the advantages of the method is a significant reduction in temperature sensitivity uncertainties, which is illustrated with an application to the ASCENDS mission for the measurement of CO2 mixing ratios (XCO2). Additional advantages of the method include the possibility of closely matching cross-band weighting function combinations, which is harder to achieve using conventional differential absorption techniques, and the potential for additional corrections for water vapor and other interferences without using data from numerical weather prediction (NWP) models.
Pakravan, M; Abedinzadeh, H; Safaeepur, J
2007-08-01
The distribution of mucilage cells (MC) in the leaves and petals of two species of Malva L. (Malva neglecta Wallr. and M. nicaeensis All.), one species of Althaea L. (A. officinalis L.) and one species of Alcea L. (A. angulata (Freyn and Sint.) Freyn and Sint. ex Iljin) was studied. Except in A. angulata, where mucilage cells were observed in both the epidermis and the mesophyll of the leaves, mucilage cells were confined to the epidermis. All species have mucilage cells in the petals. The area of the mucilaginous elements in the leaves and petals of each species, determined planimetrically on definite cross-sections, was used as a comparative measure alongside the mucilage content determined by extracting the raw mucilage with the hot extraction method (HEM); species were then compared by dry weight. A correlation between a greater area of mucilaginous elements and a higher measured mucilage content was shown, based on microscopic examination of cross-sections of the organs fixed and stained with ruthenium red. The results showed that the mucilage content in both the leaves and the petals of Malva neglecta was higher than in the other species.
Eeftens, Marloes; Meier, Reto; Schindler, Christian; Aguilera, Inmaculada; Phuleria, Harish; Ineichen, Alex; Davey, Mark; Ducret-Stich, Regina; Keidel, Dirk; Probst-Hensch, Nicole; Künzli, Nino; Tsai, Ming-Yi
2016-04-18
Land Use Regression (LUR) is a popular method to explain and predict spatial contrasts in air pollution concentrations, but LUR models for ultrafine particles, such as particle number concentration (PNC), are especially scarce. Moreover, no models have been previously presented for the lung deposited surface area (LDSA) of ultrafine particles. The additional value of ultrafine particle metrics has not been well investigated due to lack of exposure measurements and models. Air pollution measurements were performed in 2011 and 2012 in the eight areas of the Swiss SAPALDIA study at up to 40 sites per area for NO2 and at 20 sites in four areas for markers of particulate air pollution. We developed multi-area LUR models for biannual average concentrations of PM2.5, PM2.5 absorbance, PM10, PMcoarse, PNC and LDSA, as well as alpine, non-alpine and study-area-specific models for NO2, using predictor variables which were available at a national level. Models were validated using leave-one-out cross-validation, as well as independent external validation with routine monitoring data. Model explained variance (R2) was moderate for the various PM mass fractions PM2.5 (0.57), PM10 (0.63) and PMcoarse (0.45), and was high for PM2.5 absorbance (0.81), PNC (0.87) and LDSA (0.91). Study-area-specific LUR models for NO2 (R2 range 0.52-0.89) outperformed combined-area alpine (R2 = 0.53) and non-alpine (R2 = 0.65) models in terms of both cross-validation and independent external validation, and were better able to account for between-area variability. Predictor variables related to traffic and national dispersion model estimates were important predictors. LUR models for all pollutants captured spatial variability of long-term average concentrations, performed adequately in validation, and could be successfully applied to the SAPALDIA cohort. Dispersion model predictions or area indicators served well to capture the between-area variance.
For NO2, applying study-area specific models was preferable over applying combined-area alpine/non-alpine models. Correlations between pollutants were higher in the model predictions than in the measurements, so it will remain challenging to disentangle their health effects.
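The leave-one-out validation of a land-use-regression model reduces to refitting an ordinary least-squares model n times and scoring the held-out predictions with R2. A compact sketch with hypothetical predictor variables (e.g. traffic and dispersion-model columns):

```python
import numpy as np

def loocv_r2(X, y):
    """Leave-one-out cross-validated R2 of an OLS regression with intercept."""
    n = len(y)
    A = np.hstack([X, np.ones((n, 1))])  # append intercept column
    preds = np.empty(n)
    for i in range(n):
        m = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(A[m], y[m], rcond=None)
        preds[i] = A[i] @ beta
    ss_res = ((y - preds) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot
```

Unlike the in-sample R2, this statistic penalizes models that fit noise at individual monitoring sites, which is why it is preferred for comparing candidate LUR specifications.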
Modeling and predicting tumor response in radioligand therapy.
Kletting, Peter; Thieme, Anne; Eberhardt, Nina; Rinscheid, Andreas; D'Alessandria, Calogero; Allmann, Jakob; Wester, Hans-Jürgen; Tauber, Robert; Beer, Ambros J; Glatting, Gerhard; Eiber, Matthias
2018-05-10
The aim of this work was to develop a theranostic method that allows the prediction of PSMA-positive tumor volume after radioligand therapy (RLT), based on a pre-therapeutic PET/CT measurement and physiologically based pharmacokinetic/pharmacodynamic (PBPK/PD) modeling, exemplified by RLT using 177Lu-labeled PSMA for imaging and therapy (PSMA I&T). Methods: A recently developed PBPK model for 177Lu PSMA I&T RLT was extended to account for (exponential) tumor growth and reduction due to irradiation (linear-quadratic model). Data of 13 patients with metastatic castration-resistant prostate cancer (mCRPC) were retrospectively analyzed. Pharmacokinetic/pharmacodynamic parameters were simultaneously fitted in a Bayesian framework to PET/CT activity concentrations, planar scintigraphy data and tumor volumes before and 6 weeks after therapy. The method was validated using the leave-one-out jackknife method. The tumor volume post therapy was predicted based on pre-therapy PET/CT imaging and PBPK/PD modeling. Results: The relative deviation between the predicted and measured tumor volumes for PSMA-positive tumor cells (6 weeks post therapy) was 1±40% when one patient (PSA-negative) was excluded from the population. The radiosensitivity for the PSA-positive patients was determined to be 0.0172±0.0084 Gy-1. Conclusion: The proposed method is the first attempt to use solely PET/CT and modeling methods to predict the PSMA-positive tumor volume after radioligand therapy. Internal validation shows that this is feasible with acceptable accuracy. Improvement of the method and external validation of the model are ongoing. Copyright © 2018 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
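Under a deliberately simplified reading of the tumor model above (a single absorbed dose, linear-quadratic cell kill, exponential regrowth, and no PBPK coupling of the dose delivery), the volume prediction could be sketched as:

```python
import numpy as np

def predicted_volume(v0, dose_gy, alpha, beta, growth_rate, t_days):
    """Linear-quadratic surviving fraction followed by exponential regrowth.

    v0: pre-therapy PSMA-positive tumor volume
    alpha, beta: radiosensitivity parameters of the LQ model (Gy^-1, Gy^-2)
    growth_rate: exponential growth constant (per day)
    """
    sf = np.exp(-(alpha * dose_gy + beta * dose_gy ** 2))
    return v0 * sf * np.exp(growth_rate * t_days)
```

In the actual model the absorbed dose itself is an output of the fitted pharmacokinetics, so this closed form only illustrates how the LQ term and the growth term combine.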
Random ensemble learning for EEG classification.
Hosseini, Mohammad-Parsa; Pompili, Dario; Elisevich, Kost; Soltanian-Zadeh, Hamid
2018-01-01
Real-time detection of seizure activity in epilepsy patients is critical in averting seizure activity and improving patients' quality of life. Accurate evaluation, presurgical assessment, seizure prevention, and emergency alerts all depend on the rapid detection of seizure onset. A new method of feature selection and classification for rapid and precise seizure detection is discussed wherein informative components of electroencephalogram (EEG)-derived data are extracted and an automatic method is presented using infinite independent component analysis (I-ICA) to select independent features. The feature space is divided into subspaces via random selection and multichannel support vector machines (SVMs) are used to classify these subspaces. The result of each classifier is then combined by majority voting to establish the final output. In addition, a random subspace ensemble using a combination of SVM, multilayer perceptron (MLP) neural network and an extended k-nearest neighbors (k-NN), called extended nearest neighbor (ENN), is developed for the EEG and electrocorticography (ECoG) big data problem. To evaluate the solution, a benchmark ECoG of eight patients with temporal and extratemporal epilepsy was implemented in a distributed computing framework as a multitier cloud-computing architecture. Using leave-one-out cross-validation, the accuracy, sensitivity, specificity, and both false positive and false negative ratios of the proposed method were found to be 0.97, 0.98, 0.96, 0.04, and 0.02, respectively. Application of the solution to cases under investigation with ECoG has also been effected to demonstrate its utility. Copyright © 2017 Elsevier B.V. All rights reserved.
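The random-subspace-with-majority-voting idea above can be sketched independently of the base learners. Here a nearest-centroid classifier stands in for the SVM/MLP/ENN ensemble members, and all data and parameters are illustrative:

```python
import numpy as np

def train_subspace_ensemble(X, y, n_models=15, subspace_frac=0.5, seed=0):
    """Train one nearest-centroid model per random feature subspace."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(subspace_frac * d))
    labels = np.unique(y)
    models = []
    for _ in range(n_models):
        feats = rng.choice(d, size=k, replace=False)
        cents = np.array([X[y == c][:, feats].mean(0) for c in labels])
        models.append((feats, cents))
    return labels, models

def predict_majority(labels, models, x):
    """Combine the subspace classifiers by majority voting."""
    votes = np.zeros(len(labels), dtype=int)
    for feats, cents in models:
        votes[np.argmin(((cents - x[feats]) ** 2).sum(1))] += 1
    return labels[np.argmax(votes)]
```

Wrapping this predictor in a leave-one-out loop over recordings would reproduce the evaluation protocol reported above.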
Gong, Gordon; Mattevada, Sravan; O'Bryant, Sid E
2014-04-01
Exposure to arsenic causes many diseases. Most Americans in rural areas use groundwater for drinking, which may contain arsenic above the currently allowable level, 10 µg/L. It is cost-effective to estimate groundwater arsenic levels based on data from wells with known arsenic concentrations. We compared the accuracy of several commonly used interpolation methods in estimating arsenic concentrations in >8000 wells in Texas using the leave-one-out cross-validation technique. The correlation coefficient between measured and estimated arsenic levels was greater with inverse distance weighted (IDW) interpolation than with Gaussian kriging, spherical kriging or cokriging when analyzing data from wells in the entire state of Texas (p<0.0001). The correlation coefficient was significantly lower with cokriging than with any other method (p<0.006) for wells in Texas, east Texas or the Edwards aquifer. The correlation coefficient was significantly greater for wells in the southwestern Texas Panhandle than in east Texas, and was higher for wells in the Ogallala aquifer than in the Edwards aquifer (p<0.0001) regardless of interpolation method. In regression analysis, the best models were obtained when well depth and/or elevation were entered into the model as covariates, regardless of area/aquifer or interpolation method, and models with IDW were better than those with kriging in any area/aquifer. In conclusion, the accuracy of estimating groundwater arsenic levels in Texas depends on both the interpolation method and the wells' geographic distribution and characteristics. Taking well depth and elevation into regression analysis as covariates significantly increases the accuracy of estimating groundwater arsenic levels in Texas, with IDW in particular. Published by Elsevier Inc.
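The comparison above rests on leave-one-out cross-validation of spatial interpolators. A minimal IDW implementation with a LOOCV correlation score is sketched below; the power parameter and the synthetic well coordinates are assumptions:

```python
import numpy as np

def idw(known_xy, known_v, query_xy, power=2, eps=1e-12):
    """Inverse-distance-weighted estimate at query points from known wells."""
    d = np.sqrt(((known_xy[:, None, :] - query_xy[None, :, :]) ** 2).sum(-1))
    w = 1.0 / (d ** power + eps)   # eps guards against division by zero
    return (w * known_v[:, None]).sum(0) / w.sum(0)

def loocv_corr(xy, v, power=2):
    """Correlation between measured values and their leave-one-out IDW estimates."""
    est = np.array([
        idw(np.delete(xy, i, 0), np.delete(v, i), xy[i:i+1], power)[0]
        for i in range(len(v))
    ])
    return np.corrcoef(v, est)[0, 1]
```

The kriging alternatives would slot into the same loop, which is what makes the per-method correlation coefficients directly comparable.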
[Rapid identification of potato cultivars using NIR-excited fluorescence and Raman spectroscopy].
Dai, Fen; Bergholt, Mads Sylvest; Benjamin, Arnold Julian Vinoj; Hong, Tian-Sheng; Zhiwei, Huang
2014-03-01
Potato is one of the most important foods in the world. Rapid and noninvasive identification of potato cultivars plays an important role in the better use of varieties. In this study, the ability of optical spectroscopic techniques, including near-infrared (NIR) Raman spectroscopy and NIR fluorescence spectroscopy, to noninvasively identify potato cultivars was evaluated. A rapid NIR Raman spectroscopy system was used to measure the composite Raman and NIR fluorescence spectra of 3 different species of potatoes (98 samples in total) under 785 nm laser excitation. Pure Raman and NIR fluorescence spectra were then extracted from the composite spectra. Finally, partial least squares-discriminant analysis (PLS-DA) was used to analyze and classify the Raman spectra of the 3 types of potatoes. All samples were divided into two sets at random, a calibration set (74 samples) and a prediction set (24 samples), and the model was validated using the leave-one-out cross-validation method. The results showed that both the NIR-excited fluorescence spectra and the pure Raman spectra could be used to identify the three potato cultivars. The fluorescence spectra distinguished the Favorita variety well (sensitivity: 1, specificity: 0.86, accuracy: 0.92), but the results for the Diamant (sensitivity: 0.75, specificity: 0.75, accuracy: 0.75) and Granola (sensitivity: 0.16, specificity: 0.89, accuracy: 0.71) cultivars were poorer. We demonstrated that Raman spectroscopy uncovered the main biochemical compositions of the potato species and provided better classification sensitivity, specificity and accuracy (sensitivity: 1, specificity: 1 and accuracy: 1 for all 3 potato cultivars) than fluorescence spectroscopy.
A simple metric to predict stream water quality from storm runoff in an urban watershed.
Easton, Zachary M; Sullivan, Patrick J; Walter, M Todd; Fuka, Daniel R; Petrovic, A Martin; Steenhuis, Tammo S
2010-01-01
The contribution of runoff from various land uses to stream channels in a watershed is often speculated and used to underpin many model predictions. However, these contributions, often based on little or no measurements in the watershed, fail to appropriately consider the influence of the hydrologic location of a particular landscape unit in relation to the stream network. A simple model was developed to predict storm runoff and the phosphorus (P) status of a perennial stream in an urban watershed in New York State using the covariance structure of runoff from different landscape units in the watershed to predict runoff in time. One hundred and twenty-seven storm events were divided into parameterization (n = 85) and forecasting (n = 42) data sets. Runoff, dissolved P (DP), and total P (TP) were measured at nine sites distributed among three land uses (high maintenance, unmaintained, wooded), three positions in the watershed (near the outlet, midwatershed, upper watershed), and in the stream at the watershed outlet. The autocorrelation among runoff and P concentrations from the watershed landscape units (n = 9) and the covariance between measurements from the landscape units and measurements from the stream were calculated and used to predict the stream response. Models, validated using leave-one-out cross-validation and a forecasting method, were able to correctly capture temporal trends in streamflow and stream P chemistry (Nash-Sutcliffe efficiencies, 0.49-0.88). The analysis suggests that the covariance structure was consistent for all models, indicating that the physical processes governing runoff and P loss from these landscape units were stationary in time and that landscapes located in hydraulically active areas have a direct hydraulic link to the stream. This methodology provides insight into the impact of various urban landscape units on stream water quantity and quality.
Bitella, Giovanni; Rossi, Roberta; Bochicchio, Rocco; Perniola, Michele; Amato, Mariana
2014-10-21
Monitoring soil water content at high spatio-temporal resolution, coupled to other sensor data, is crucial for applications oriented towards water sustainability in agriculture, such as precision irrigation or phenotyping root traits for drought tolerance. The cost of instrumentation, however, limits measurement frequency and the number of sensors. The objective of this work was to design a low-cost "open hardware" platform for multi-sensor measurements including water content at different depths, and air and soil temperatures. The system is based on an open-source Arduino microcontroller board, programmed in a simple integrated development environment (IDE). Low-cost high-frequency dielectric probes were used in the platform and lab tested on three non-saline soils (EC 1:2.5 < 0.1 mS/cm). Empirical calibration curves were subjected to cross-validation (leave-one-out method), and the normalized root mean square errors (NRMSE) were 0.09 for the overall model, 0.09 for the sandy soil, 0.07 for the clay loam and 0.08 for the sandy loam. The overall model (pooled soil data) fitted the data very well (R2 = 0.89), showing high stability and generating very similar RMSEs during training and validation (RMSE(training) = 2.63; RMSE(validation) = 2.61). Data recorded on the card were automatically sent to a remote server, allowing repeated field-data quality checks. This work provides a framework for the replication and upgrading of a customized low-cost platform, consistent with the open-source approach whereby sharing information on equipment design and software facilitates the adoption and continuous improvement of existing technologies.
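The calibration check above, a leave-one-out NRMSE of an empirical sensor-reading-to-water-content curve, can be sketched with a polynomial fit. The quadratic form and the synthetic readings are assumptions, not the platform's actual calibration:

```python
import numpy as np

def loo_nrmse_poly(raw, vwc, deg=2):
    """Leave-one-out NRMSE of a polynomial calibration curve raw -> water content."""
    preds = np.empty(len(vwc))
    for i in range(len(vwc)):
        m = np.arange(len(vwc)) != i
        coef = np.polyfit(raw[m], vwc[m], deg)
        preds[i] = np.polyval(coef, raw[i])
    return np.sqrt(np.mean((preds - vwc) ** 2)) / (vwc.max() - vwc.min())
```

Fitting separate curves per soil type, as done above, simply means running this loop once per soil subset and once on the pooled data.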
HGIMDA: Heterogeneous graph inference for miRNA-disease association prediction.
Chen, Xing; Yan, Chenggang Clarence; Zhang, Xu; You, Zhu-Hong; Huang, Yu-An; Yan, Gui-Ying
2016-10-04
Recently, microRNAs (miRNAs) have drawn increasing attention because accumulating experimental studies have indicated that miRNAs can play critical roles in multiple biological processes as well as in the development and progression of complex human diseases. Using the huge number of known heterogeneous biological datasets to predict potential associations between miRNAs and diseases is an important topic in biology, medicine, and bioinformatics. In this study, considering the limitations of previous computational methods, we developed the computational model of Heterogeneous Graph Inference for MiRNA-Disease Association prediction (HGIMDA) to uncover potential miRNA-disease associations by integrating miRNA functional similarity, disease semantic similarity, Gaussian interaction profile kernel similarity, and experimentally verified miRNA-disease associations into a heterogeneous graph. HGIMDA obtained AUCs of 0.8781 and 0.8077 based on global and local leave-one-out cross validation, respectively. Furthermore, HGIMDA was applied to three important human cancers for performance evaluation. As a result, 90% (Colon Neoplasms), 88% (Esophageal Neoplasms) and 88% (Kidney Neoplasms) of the top 50 predicted miRNAs are confirmed by recent experimental reports. Furthermore, HGIMDA can be effectively applied to new diseases and new miRNAs without any known associations, which overcomes an important limitation of many previous computational models.
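The global LOOCV AUCs reported above are typically obtained by holding out each known association in turn, ranking it against candidate associations by predicted score, and applying the rank-sum (Mann-Whitney) identity. A minimal AUC sketch on toy scores (ties ignored; not the HGIMDA implementation):

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pairs = sorted(zip(scores, labels))          # ascending by score
    pos_rank_sum = sum(r + 1 for r, (_, l) in enumerate(pairs) if l == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# toy scores for held-out known associations (label 1) vs. candidates (label 0)
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfect ranking -> 1.0
```

An AUC of 0.8781 means that a randomly chosen known association outranks a randomly chosen candidate about 88% of the time.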
Improved Rubin-Bodner Model for the Prediction of Soft Tissue Deformations
Zhang, Guangming; Xia, James J.; Liebschner, Michael; Zhang, Xiaoyan; Kim, Daeseung; Zhou, Xiaobo
2016-01-01
In craniomaxillofacial (CMF) surgery, a reliable way of simulating the soft tissue deformation resulting from skeletal reconstruction is vitally important for preventing the risk of postoperative facial distortion. However, it is difficult to simulate the soft tissue behaviors affected by different types of CMF surgery. This study presents an integrated biomechanical and statistical learning model to improve the accuracy and reliability of predictions of soft facial tissue behavior. The Rubin-Bodner (RB) model is initially used to describe the biomechanical behavior of the soft facial tissue. Subsequently, a finite element model (FEM) computes the stress at each node of the soft facial tissue mesh resulting from bone displacement. Next, the Generalized Regression Neural Network (GRNN) method is implemented to obtain the relationship between the facial soft tissue deformation and the stress distribution corresponding to different CMF surgical types, and to improve estimation of the elastic parameters included in the RB model. The soft facial tissue deformation can therefore be predicted by combining biomechanical properties with the statistical model. Leave-one-out cross-validation was used on eleven patients. As a result, the average prediction error of our model (0.7035 mm) is lower than those resulting from other approaches. It also demonstrates that the more accurate biomechanical information the model has, the better prediction performance it can achieve. PMID:27717593
Zhang, Jian; Zhao, Xiaowei; Sun, Pingping; Gao, Bo; Ma, Zhiqiang
2014-01-01
B-cell epitopes are regions of the antigen surface that can be recognized by certain antibodies and elicit an immune response. Identification of epitopes for a given antigen chain finds vital applications in vaccine and drug research. Experimental identification of B-cell epitopes is time-consuming and resource-intensive, which motivates computational approaches. In this paper, a novel cost-sensitive ensemble algorithm is proposed for predicting antigenic determinant residues, and a spatial clustering algorithm is then adopted to identify the potential epitopes. Firstly, we explore various discriminative features from primary sequences. Secondly, a cost-sensitive ensemble scheme is introduced to deal with the imbalanced learning problem. Thirdly, we adopt a spatial clustering algorithm to determine which residues may potentially form epitopes. Based on the strategies mentioned above, a new predictor, called CBEP (conformational B-cell epitope prediction), is proposed in this study. CBEP achieves good prediction performance, with mean AUC scores (AUCs) of 0.721 and 0.703 on two benchmark datasets (bound and unbound) using leave-one-out cross-validation (LOOCV). When compared with previous prediction tools, CBEP produces higher sensitivity and comparable specificity values. A web server named CBEP that implements the proposed method is available for academic use.
Towards Measuring Stress with Smartphones and Wearable Devices During Workday and Sleep.
Muaremi, Amir; Arnrich, Bert; Tröster, Gerhard
2013-01-01
Work should be a source of health, pride, and happiness, in the sense of enhancing motivation and strengthening personal development. Healthy and motivated employees perform better and remain loyal to the company for a longer time. But when a person constantly experiences high workload over a long period and is unable to recover, work may have prolonged negative effects and cause serious illnesses such as chronic stress disease. In this work, we present a solution for assessing the stress experience of people using features derived from smartphones and wearable chest belts. In particular, we use information from audio, physical activity, and communication data collected during the workday and heart rate variability data collected at night during sleep to build multinomial logistic regression models. We evaluate our system in a real work environment and in daily-routine scenarios of 35 employees over a period of 4 months and apply the leave-one-day-out cross-validation method for each user individually to estimate the prediction accuracy. Using only smartphone features, we get an accuracy of 55%, and using only heart rate variability features, we get an accuracy of 59%. The combination of all features leads to a rate of 61% for a three-level stress classification problem (low, moderate, and high perceived stress).
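Leave-one-day-out cross-validation is leave-one-group-out splitting with days as the groups: all events from one day are held out together, so the model is never tested on a day it was trained on. A minimal sketch with hypothetical per-event records (field names are illustrative, not from the study):

```python
from collections import defaultdict

def leave_one_group_out(samples):
    """Yield (group, train, test) splits, holding out one group (a day) at a time."""
    by_day = defaultdict(list)
    for s in samples:
        by_day[s["day"]].append(s)
    for day in sorted(by_day):
        test = by_day[day]
        train = [s for d in sorted(by_day) if d != day for s in by_day[d]]
        yield day, train, test

# hypothetical per-event records: day label, feature value, stress class
data = [{"day": d, "x": d * 0.1, "y": d % 3} for d in range(1, 5) for _ in range(2)]
for day, train, test in leave_one_group_out(data):
    print(day, len(train), len(test))  # every day: 6 training, 2 held-out events
```

A classifier fitted on each training split and scored on the held-out day gives the per-user accuracies reported above.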
Yang, Xiaofei; Gao, Lin; Guo, Xingli; Shi, Xinghua; Wu, Hao; Song, Fei; Wang, Bingbo
2014-01-01
Increasing evidence has indicated that long non-coding RNAs (lncRNAs) are implicated in and associated with many complex human diseases. Despite the accumulation of lncRNA-disease associations, only a few studies have investigated the roles of these associations in pathogenesis. In this paper, we investigated lncRNA-disease associations from a network view to understand the contribution of these lncRNAs to complex diseases. Specifically, we studied both the properties of the diseases in which the lncRNAs were implicated and those of the lncRNAs associated with complex diseases. Given that both protein-coding genes and lncRNAs are involved in human diseases, we constructed a coding-non-coding gene-disease bipartite network based on known associations between diseases and disease-causing genes. We then applied a propagation algorithm to uncover the hidden lncRNA-disease associations in this network. The algorithm was evaluated by leave-one-out cross validation on 103 diseases in which at least two genes were known to be involved, and achieved an AUC of 0.7881. Our algorithm successfully predicted 768 potential lncRNA-disease associations between 66 lncRNAs and 193 diseases. Furthermore, our results for Alzheimer's disease, pancreatic cancer, and gastric cancer were verified by other independent studies. PMID:24498199
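The propagation step can be sketched as a random-walk-with-restart style update on the bipartite network. This toy example (hypothetical graph and parameters, not the authors' implementation) illustrates how scores flow from a seed disease to candidate genes:

```python
def propagate(adj, seed, alpha=0.8, iters=50):
    """Random-walk-with-restart style propagation on a graph.

    adj: {node: [neighbours]}; seed: initial score vector;
    alpha: restart weight pulling scores back toward the seed.
    """
    score = dict(seed)
    for _ in range(iters):
        new = {}
        for node in adj:
            # each neighbour distributes its score evenly over its own edges
            spread = sum(score.get(nb, 0.0) / len(adj[nb]) for nb in adj[node])
            new[node] = alpha * seed.get(node, 0.0) + (1 - alpha) * spread
        score = new
    return score

# toy bipartite graph: genes g1, g2 <-> diseases d1, d2
adj = {"g1": ["d1"], "g2": ["d1", "d2"], "d1": ["g1", "g2"], "d2": ["g2"]}
scores = propagate(adj, {"d1": 1.0})
print(scores["g1"] > scores["d2"])  # nodes nearer the seed score higher
```

Ranking unlabelled nodes by their converged scores is what yields candidate associations for each held-out disease in the LOOCV.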
NASA Astrophysics Data System (ADS)
Cicchi, Riccardo; Anand, Suresh; Rossari, Susanna; Sturiale, Alessandro; Giordano, Flavio; De Giorgi, Vincenzo; Maio, Vincenza; Massi, Daniela; Nesi, Gabriella; Buccoliero, Anna Maria; Tonelli, Francesco; Guerrini, Renzo; Pimpinelli, Nicola; Pavone, Francesco S.
2015-03-01
Two different optical fiber probes for combined Raman and fluorescence spectroscopic measurements were designed, developed and used for tissue diagnostics. Two visible laser diodes were used for fluorescence spectroscopy, whereas a laser diode emitting in the NIR was used for Raman spectroscopy. The two probes were based on fiber bundles with a central multimode optical fiber, used for delivering light to the tissue, and 24 surrounding optical fibers for signal collection. Both fluorescence and Raman spectra were acquired using the same detection unit, based on a cooled CCD camera, connected to a spectrograph. The two probes were successfully employed for diagnostic purposes on various tissues in a good agreement with common routine histology. This study included skin, brain and bladder tissues and in particular the classification of: malignant melanoma against melanocytic lesions and healthy skin; urothelial carcinoma against healthy bladder mucosa; brain tumor against dysplastic brain tissue. The diagnostic capabilities were determined using a cross-validation method with a leave-one-out approach, finding very high sensitivity and specificity for all the examined tissues. The obtained results demonstrated that the multimodal approach is crucial for improving diagnostic capabilities. The system presented here can improve diagnostic capabilities on a broad range of tissues and has the potential of being used for endoscopic inspections in the near future.
Scanning elastic scattering spectroscopy detects metastatic breast cancer in sentinel lymph nodes
NASA Astrophysics Data System (ADS)
Austwick, Martin R.; Clark, Benjamin; Mosse, Charles A.; Johnson, Kristie; Chicken, D. Wayne; Somasundaram, Santosh K.; Calabro, Katherine W.; Zhu, Ying; Falzon, Mary; Kocjan, Gabrijela; Fearn, Tom; Bown, Stephen G.; Bigio, Irving J.; Keshtgar, Mohammed R. S.
2010-07-01
A novel method for rapidly detecting metastatic breast cancer within excised sentinel lymph node(s) of the axilla is presented. Elastic scattering spectroscopy (ESS) is a point-contact technique that collects broadband optical spectra sensitive to absorption and scattering within the tissue. A statistical discrimination algorithm was generated from a training set of nearly 3000 clinical spectra and used to test clinical spectra collected from an independent set of nodes. Freshly excised nodes were bivalved and mounted under a fiber-optic plate. Stepper motors raster-scanned a fiber-optic probe over the plate to interrogate the node's cut surface, creating a 20×20 grid of spectra. These spectra were analyzed to create a map of cancer risk across the node surface. Rules were developed to convert these maps to a prediction for the presence of cancer in the node. Using these analyses, a leave-one-out cross-validation to optimize discrimination parameters on 128 scanned nodes gave a sensitivity of 69% for detection of clinically relevant metastases (71% for macrometastases) and a specificity of 96%, comparable to literature results for touch imprint cytology, a standard technique for intraoperative diagnosis. ESS has the advantage of not requiring a pathologist to review the tissue sample.
Prognosis Relevance of Serum Cytokines in Pancreatic Cancer
Alejandre, Maria José; Palomino-Morales, Rogelio J.; Prados, Jose; Aránega, Antonia; Delgado, Juan R.; Irigoyen, Antonio; Martínez-Galán, Joaquina; Ortuño, Francisco M.
2015-01-01
The overall survival of patients with pancreatic ductal adenocarcinoma is extremely low. Although gemcitabine is the standard chemotherapy for this disease, clinical outcomes show no significant improvement, even when it is combined with adjuvant treatments. Prognostic markers are urgently needed. The aim of this study was to analyze the potential value of serum cytokines for finding a profile that can predict clinical outcome in patients with pancreatic cancer, and to establish a practical prognosis index that significantly predicts patients' outcomes. We conducted an extensive analysis of serum prognosis biomarkers using an antibody array comprising 507 human cytokines. Overall survival was estimated using the Kaplan-Meier method. Univariate and multivariate Cox proportional hazards models were used to analyze prognosis factors. To determine the extent to which survival could be predicted based on this index, we used the leave-one-out cross-validation model. The multivariate model showed better performance and could represent a novel panel of serum cytokines that correlates with poor prognosis in pancreatic cancer. B7-1/CD80, EG-VEGF/PK1, IL-29, NRG1-beta1/HRG1-beta1, and PD-ECGF expressions portend a poor prognosis for patients with pancreatic cancer, and these cytokines could represent novel therapeutic targets for this disease. PMID:26346854
Burnside, Elizabeth S.; Drukker, Karen; Li, Hui; Bonaccio, Ermelinda; Zuley, Margarita; Ganott, Marie; Net, Jose M.; Sutton, Elizabeth; Brandt, Kathleen R.; Whitman, Gary; Conzen, Suzanne; Lan, Li; Ji, Yuan; Zhu, Yitan; Jaffe, Carl; Huang, Erich; Freymann, John; Kirby, Justin; Morris, Elizabeth; Giger, Maryellen
2015-01-01
Background To demonstrate that computer-extracted image phenotypes (CEIPs) of biopsy-proven breast cancer on MRI can accurately predict pathologic stage. Methods We used a dataset of de-identified breast MRIs organized by the National Cancer Institute in The Cancer Imaging Archive. We analyzed 91 biopsy-proven breast cancer cases with pathologic stage (stage I = 22; stage II = 58; stage III = 11) and surgically proven nodal status (negative nodes = 46, ≥ 1 positive node = 44, no nodes examined = 1). We characterized tumors by (a) radiologist measured size, and (b) CEIP. We built models combining two CEIPs to predict tumor pathologic stage and lymph node involvement, evaluated them in leave-one-out cross-validation with area under the ROC curve (AUC) as figure of merit. Results Tumor size was the most powerful predictor of pathologic stage but CEIPs capturing biologic behavior also emerged as predictive (e.g. stage I+II vs. III demonstrated AUC = 0.83). No size measure was successful in the prediction of positive lymph nodes but adding a CEIP describing tumor “homogeneity,” significantly improved this discrimination (AUC = 0.62, p=.003) over chance. Conclusions Our results indicate that MRI phenotypes show promise for predicting breast cancer pathologic stage and lymph node status. PMID:26619259
Unique volatolomic signatures of TP53 and KRAS in lung cells
Davies, M P A; Barash, O; Jeries, R; Peled, N; Ilouze, M; Hyde, R; Marcus, M W; Field, J K; Haick, H
2014-01-01
Background: Volatile organic compounds (VOCs) are potential biomarkers for cancer detection in breath, but it is unclear if they reflect specific mutations. To test this, we have compared human bronchial epithelial cell (HBEC) cell lines carrying the KRASV12 mutation, knockdown of TP53 or both with parental HBEC cells. Methods: VOC from headspace above cultured cells were collected by passive sampling and analysed by thermal desorption gas chromatography mass spectrometry (TD-GC–MS) or sensor array with discriminant factor analysis (DFA). Results: In TD-GC–MS analysis, individual compounds had limited ability to discriminate between cell lines, but by applying DFA analysis combinations of 20 VOCs successfully discriminated between all cell types (accuracies 80–100%, with leave-one-out cross validation). Sensor array detection DFA demonstrated the ability to discriminate samples based on their cell type for all comparisons with accuracies varying between 77% and 93%. Conclusions: Our results demonstrate that minimal genetic changes in bronchial airway cells lead to detectable differences in levels of specific VOCs identified by TD-GC–MS or of patterns of VOCs identified by sensor array output. From the clinical aspect, these results suggest the possibility of breath analysis for detection of minimal genetic changes for earlier diagnosis or for genetic typing of lung cancers. PMID:25051409
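The discriminant factor analysis used above is not reproduced here, but the leave-one-out protocol behind the quoted accuracies can be illustrated with a nearest-centroid stand-in classifier on toy two-feature "VOC" data (values hypothetical):

```python
def nearest_centroid_loocv(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (a simple
    stand-in for the discriminant analysis used in the study)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for cls in set(y):
            pts = [X[j] for j in range(len(X)) if j != i and y[j] == cls]
            centroids[cls] = [sum(c) / len(pts) for c in zip(*pts)]
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(X[i], centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

# toy 2-feature "VOC levels" for two cell types (hypothetical values)
X = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
y = ["parental", "parental", "parental", "mutant", "mutant", "mutant"]
print(nearest_centroid_loocv(X, y))  # -> 1.0 on this well-separated toy set
```

Each sample is classified by a model trained without it, which is what makes the 77-100% accuracies above cross-validated rather than resubstitution figures.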
Veeraraghavan, Harini; Dashevsky, Brittany Z; Onishi, Natsuko; Sadinski, Meredith; Morris, Elizabeth; Deasy, Joseph O; Sutton, Elizabeth J
2018-03-19
We present a segmentation approach that combines GrowCut (GC) with cancer-specific multi-parametric Gaussian Mixture Model (GCGMM) to produce accurate and reproducible segmentations. We evaluated GCGMM using a retrospectively collected 75 invasive ductal carcinoma with ERPR+ HER2- (n = 15), triple negative (TN) (n = 9), and ER-HER2+ (n = 57) cancers with variable presentation (mass and non-mass enhancement) and background parenchymal enhancement (mild and marked). Expert delineated manual contours were used to assess the segmentation performance using Dice coefficient (DSC), mean surface distance (mSD), Hausdorff distance, and volume ratio (VR). GCGMM segmentations were significantly more accurate than GrowCut (GC) and fuzzy c-means clustering (FCM). GCGMM's segmentations and the texture features computed from those segmentations were the most reproducible compared with manual delineations and other analyzed segmentation methods. Finally, random forest (RF) classifier trained with leave-one-out cross-validation using features extracted from GCGMM segmentation resulted in the best accuracy for ER-HER2+ vs. ERPR+/TN (GCGMM 0.95, expert 0.95, GC 0.90, FCM 0.92) and for ERPR + HER2- vs. TN (GCGMM 0.92, expert 0.91, GC 0.77, FCM 0.83).
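The Dice coefficient (DSC) used above measures voxel overlap between two segmentations: twice the intersection divided by the sum of the two set sizes. A minimal sketch with toy voxel sets (not the GCGMM pipeline):

```python
def dice(a, b):
    """Dice similarity coefficient between two voxel sets: 2|A∩B| / (|A|+|B|)."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

manual = {(0, 0), (0, 1), (1, 0), (1, 1)}  # expert contour voxels (toy)
auto = {(0, 1), (1, 0), (1, 1), (2, 1)}    # hypothetical automated output
print(dice(manual, auto))  # 2*3 / (4+4) = 0.75
```

DSC = 1 indicates identical segmentations, 0 indicates no overlap.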
Stapelfeldt, Christina Malmose; Jensen, Chris; Andersen, Niels Trolle; Fleten, Nils; Nielsen, Claus Vinther
2012-08-15
Previous validation studies of sick leave measures have focused on self-reports. Register-based sick leave data are considered to be valid; however methodological problems may be associated with such data. A Danish national register on sickness benefit (DREAM) has been widely used in sick leave research. On the basis of sick leave records from 3,554 and 2,311 eldercare workers in 14 different workplaces, the aim of this study was to: 1) validate registered sickness benefit data from DREAM against workplace-registered sick leave spells of at least 15 days; 2) validate self-reported sick leave days during one year against workplace-registered sick leave. Agreement between workplace-registered sick leave and DREAM-registered sickness benefit was reported as sensitivities, specificities and positive predictive values. A receiver-operating characteristic curve and a Bland-Altman plot were used to study the concordance with sick leave duration of the first spell. By means of an analysis of agreement between self-reported and workplace-registered sick leave sensitivity and specificity was calculated. Ninety-five percent confidence intervals (95% CI) were used. The probability that registered DREAM data on sickness benefit agrees with workplace-registered sick leave of at least 15 days was 96.7% (95% CI: 95.6-97.6). Specificity was close to 100% (95% CI: 98.3-100). The registered DREAM data on sickness benefit overestimated the duration of sick leave spells by an average of 1.4 (SD: 3.9) weeks. Separate analysis on pregnancy-related sick leave revealed a maximum sensitivity of 20% (95% CI: 4.3-48.1).The sensitivity of self-reporting at least one or at least 56 sick leave day/s was 94.5 (95% CI: 93.4 - 95.5) % and 58.5 (95% CI: 51.1 - 65.6) % respectively. The corresponding specificities were 85.3 (95% CI: 81.4 - 88.6) % and 98.9 (95% CI: 98.3 - 99.3) %. The DREAM register offered valid measures of sick leave spells of at least 15 days among eldercare employees. 
Pregnancy-related sick leave should be excluded in studies planning to use DREAM data on sickness benefit. Self-reported sick leave became more imprecise when number of absence days increased, but the sensitivity and specificity were acceptable for lengths not exceeding one week.
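The sensitivities, specificities, and positive predictive values above all derive from a 2x2 agreement table between the register and the workplace records. A minimal sketch with hypothetical counts (not the study's data):

```python
def diagnostics(tp, fp, fn, tn):
    """Sensitivity, specificity, and positive predictive value from a 2x2 table.

    tp/fn: spells present in the reference that the measure did / did not flag;
    tn/fp: spells absent in the reference that the measure did / did not flag.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

# hypothetical agreement counts between register and workplace records
print(diagnostics(tp=290, fp=10, fn=10, tn=690))
```

Confidence intervals for these proportions (as reported in the study) would then be computed from the same counts.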
Adhi, Mehreen; Semy, Salim K; Stein, David W; Potter, Daniel M; Kuklinski, Walter S; Sleeper, Harry A; Duker, Jay S; Waheed, Nadia K
2016-05-01
To present novel software algorithms applied to spectral-domain optical coherence tomography (SD-OCT) for automated detection of diabetic retinopathy (DR). Thirty-one diabetic patients (44 eyes) and 18 healthy, nondiabetic controls (20 eyes) who underwent volumetric SD-OCT imaging and fundus photography were retrospectively identified. A retina specialist independently graded DR stage. Trained automated software generated a retinal thickness score signifying macular edema and a cluster score signifying microaneurysms and/or hard exudates for each volumetric SD-OCT. Of 44 diabetic eyes, 38 had DR and six did not. Leave-one-out cross-validation using a linear discriminant at a missed detection/false alarm ratio of 3.00 yielded a software sensitivity of 92% and specificity of 69% for DR detection when compared to clinical assessment. Novel software algorithms applied to commercially available SD-OCT can successfully detect DR and may have potential as a viable screening tool for DR in the future. [Ophthalmic Surg Lasers Imaging Retina. 2016;47:410-417.]. Copyright 2016, SLACK Incorporated.
Gaura, Elena; Kemp, John; Brusey, James
2013-12-01
The paper demonstrates that wearable sensor systems, coupled with real-time on-body processing and actuation, can enhance safety for wearers of heavy protective equipment who are subjected to harsh thermal environments by reducing risk of Uncompensable Heat Stress (UHS). The work focuses on Explosive Ordnance Disposal operatives and shows that predictions of UHS risk can be performed in real-time with sufficient accuracy for real-world use. Furthermore, it is shown that the required sensory input for such algorithms can be obtained with wearable, non-intrusive sensors. Two algorithms, one based on Bayesian nets and another on decision trees, are presented for determining the heat stress risk, considering the mean skin temperature prediction as a proxy. The algorithms are trained on empirical data and have accuracies of 92.1±2.9% and 94.4±2.1%, respectively when tested using leave-one-subject-out cross-validation. In applications such as Explosive Ordnance Disposal operative monitoring, such prediction algorithms can enable autonomous actuation of cooling systems and haptic alerts to minimize casualties.
Bakire, Serge; Yang, Xinya; Ma, Guangcai; Wei, Xiaoxuan; Yu, Haiying; Chen, Jianrong; Lin, Hongjun
2018-01-01
Organic chemicals in the aquatic ecosystem may inhibit algal growth and subsequently lead to a decline in primary productivity. Growth inhibition tests are required for ecotoxicological assessments for regulatory purposes. In silico studies play an important role in replacing or reducing animal tests and decreasing experimental expense owing to their efficiency. In this work, a series of theoretical models was developed for predicting algal growth inhibition (log EC50) after 72 h of exposure to diverse chemicals. In total, 348 organic compounds were classified into five modes of toxic action using the Verhaar scheme. Each model was established using molecular descriptors that characterize electronic and structural properties. External validation and leave-one-out cross validation proved the statistical robustness of the derived models. Thus they can be used to predict log EC50 values for chemicals that lack authorized algal growth inhibition values (72 h). This work systematically studied algal growth inhibition according to toxic mode, and the developed model suite covers all five toxic modes. The outcome of this research will promote toxic mechanism analysis and is applicable to structurally diverse chemicals. Copyright © 2017 Elsevier Ltd. All rights reserved.
Webb, Andrea K; Vincent, Ashley L; Jin, Alvin B; Pollack, Mark H
2015-02-01
Post-traumatic stress disorder (PTSD) currently is diagnosed via clinical interview in which subjective self reports of traumatic events and associated experiences are discussed with a mental health professional. The reliability and validity of diagnoses can be improved with the use of objective physiological measures. In this study, physiological activity was recorded from 58 male veterans (PTSD Diagnosis n = 16; Trauma Exposed/No PTSD Diagnosis: n = 23; No Trauma/No PTSD Diagnosis: n = 19) with and without PTSD and combat trauma exposure in response to emotionally evocative non-idiographic virtual reality stimuli. Statistically significant differences among the Control, Trauma, and PTSD groups were present during the viewing of two virtual reality videos. Skin conductance and interbeat interval features were extracted for each of ten video events (five events of increasing severity per video). These features were submitted to three stepwise discriminant function analyses to assess classification accuracy for Control versus Trauma, Control versus PTSD, and Trauma versus PTSD pairings of participant groups. Leave-one-out cross-validation classification accuracy was between 71 and 94%. These results are promising and suggest the utility of objective physiological measures in assisting with PTSD diagnosis.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into a 13 dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all N possible models [in the case of the wind fields, N = 2^13 - 1 = 8191] and rank them according to the cross-validation error. In both cases training was carried out applying a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and efficiently model complex high dimensional data, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 
1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data: Theory, Applications and Software. EPFL Press, 2009. 2. Kanevski M. Spatial Predictions of Soil Contamination Using General Regression Neural Networks. Systems Research and Information Systems, 8(4), 1999. 3. Robert S., Foresti L., Kanevski M. Spatial prediction of monthly wind speeds in complex terrain with adaptive general regression neural networks. International Journal of Climatology, 33, pp. 1793-1804, 2013.
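GRNN predictions are Nadaraya-Watson kernel-weighted averages of training targets, and the kernel bandwidth can itself be selected by leave-one-out error, in the spirit of the procedure described above. A minimal one-dimensional sketch with toy data (isotropic Gaussian kernel; not the adaptive, anisotropic version used in the study):

```python
import math

def grnn_predict(x, xs, ys, sigma):
    """Nadaraya-Watson / GRNN estimate: kernel-weighted mean of training targets."""
    w = [math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def loo_error(xs, ys, sigma):
    """Leave-one-out mean squared error for a candidate kernel bandwidth."""
    err = 0.0
    for i in range(len(xs)):
        xt, yt = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        err += (ys[i] - grnn_predict(xs[i], xt, yt, sigma)) ** 2
    return err / len(xs)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.1, 1.9, 3.2, 3.9]   # roughly linear toy data
best = min([0.3, 1.0, 3.0], key=lambda s: loo_error(xs, ys, s))
print(best)  # the smallest candidate bandwidth fits this toy set best
```

The same leave-one-out criterion, applied over feature subsets instead of bandwidths, gives the model ranking over the 8191 wind-field models mentioned above.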
Zunjar, Vishwanath; Dash, Ranjeet Prasad; Jivrajani, Mehul; Trivedi, Bhavna; Nivsarkar, Manish
2016-04-02
The decoction of Carica papaya Linn. leaves is used in folklore medicine in certain parts of Malaysia and Indonesia for the treatment of different types of thrombocytopenia associated with diseases and drugs. Several scientific studies on humans and animal models have confirmed the efficacy of the decoction of papaya leaves for the treatment of disease-induced and drug-induced thrombocytopenia; however, very little is known about the bioactive compounds responsible for the observed activity. The aim of the present study was to identify the active phytochemical component of the Carica papaya Linn. leaf decoction responsible for antithrombocytopenic activity in busulfan-induced thrombocytopenic rats. Antithrombocytopenic activity was assessed in busulfan-induced thrombocytopenic Wistar rats. The antithrombocytopenic activity of different bio-guided fractions was evaluated by monitoring blood platelet count. The bioactive compound carpaine was isolated and purified by chromatographic methods, confirmed by spectroscopic methods (LC-MS and 1D/2D 1H/13C NMR), and its structure was confirmed by single crystal X-ray diffraction. Quantification of carpaine was carried out by LC-MS/MS equipped with an XTerra(®) MS C18 column and ESI-MS detector using 90:10 CH3CN:CH3COONH4 (6 mM) under isocratic conditions and detected with multiple reaction monitoring (MRM) in positive ion mode. Two different phytochemical groups were isolated from the decoction of Carica papaya leaves: phenolics and alkaloids. Of these, only the alkaloid fraction showed good biological activity. Carpaine was isolated from the alkaloid fraction and exhibited potent activity in sustaining platelet counts up to 555.50 ± 85.17 × 10^9/L with no acute toxicity. This study scientifically validates the popular usage of the decoction of Carica papaya leaves and proves that alkaloids, particularly carpaine, present in the leaves are responsible for the antithrombocytopenic activity. 
Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Benchmark of Machine Learning Methods for Classification of a SENTINEL-2 Image
NASA Astrophysics Data System (ADS)
Pirotti, F.; Sunar, F.; Piragnolo, M.
2016-06-01
Thanks mainly to ESA and USGS, a large bulk of free images of the Earth is readily available nowadays. One of the main goals of remote sensing is to label images according to a set of semantic categories, i.e. image classification. This is a very challenging issue, since land cover of a specific class may present a large spatial and spectral variability and objects may appear at different scales and orientations. In this study, we report the results of benchmarking 9 machine learning algorithms tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset. The following machine learning methods (MLM) have been tested: linear discriminant analysis, k-nearest neighbour, random forests, support vector machines, multi-layered perceptron, multi-layered perceptron ensemble, ctree, boosting, logarithmic regression. The validation is carried out using a control dataset which consists of an independent classification in 11 land-cover classes of an area of about 60 km², obtained by manual visual interpretation of high resolution images (20 cm ground sampling distance) by experts. In this study five out of the eleven classes are used, since the others have too few samples (pixels) for testing and validating subsets. The classes used are the following: (i) urban, (ii) sowable areas, (iii) water, (iv) tree plantations, (v) grasslands. Validation is carried out using three different approaches: (i) using pixels from the training dataset (train), (ii) using pixels from the training dataset and applying cross-validation with the k-fold method (kfold) and (iii) using all pixels from the control dataset. Five accuracy indices are calculated for the comparison between the values predicted with each model and control values over three sets of data: the training dataset (train), the whole control dataset (full) and with k-fold cross-validation (kfold) with ten folds.
Results from validation of predictions over the whole dataset (full) show that the random forests method achieves the highest values, with a kappa index ranging from 0.55 with the largest number of training pixels to 0.42 with the smallest. The two neural networks (multi-layered perceptron and its ensemble) and the support vector machines (with default radial basis function kernel) follow closely with comparable performance.
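The benchmarking scheme described above can be sketched with scikit-learn: several classifiers are compared under k-fold cross-validation using Cohen's kappa. This is a minimal illustration under assumed synthetic data and a reduced model list, not the authors' Sentinel-2 pipeline.

```python
# Sketch: comparing classifiers with 10-fold cross-validation and Cohen's
# kappa. The synthetic "pixel" data and the three-model subset are
# illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import cohen_kappa_score

# Stand-in for pixel spectra: 5 land-cover classes, 10 "band" features.
X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_classes=5, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM (RBF)": SVC(),  # default radial basis function kernel
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
kappas = {}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=cv)
    kappas[name] = cohen_kappa_score(y, pred)
    print(f"{name}: kappa = {kappas[name]:.2f}")
```

Kappa corrects raw accuracy for chance agreement, which is why class-imbalanced remote-sensing benchmarks often prefer it.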
Effect of curvature on the backscattering from leaves
NASA Technical Reports Server (NTRS)
Sarabandi, K.; Senior, T. B. A.; Ulaby, F. T.
1988-01-01
Using a model previously developed for the backscattering cross section of a planar leaf at X-band frequencies and above, the effect of leaf curvature is examined. For normal incidence on a rectangular section of a leaf curved in one and two dimensions, an integral expression for the backscattered field is evaluated numerically and by a stationary phase approximation, leading to a simple analytical expression for the cross section reduction produced by the curvature. Numerical results based on the two methods are virtually identical, and in excellent agreement with measured data for rectangular sections of coleus leaves applied to the surfaces of styrofoam cylinders and spheres of different radii.
NASA Astrophysics Data System (ADS)
Xu, Ye; Sonka, Milan; McLennan, Geoffrey; Guo, Junfeng; Hoffman, Eric
2005-04-01
Lung parenchyma evaluation via multidetector-row CT (MDCT) has significantly altered clinical practice in the early detection of lung disease. Our goal is to enhance our texture-based tissue classification ability to differentiate early pathologic processes by extending our 2-D Adaptive Multiple Feature Method (AMFM) to 3-D AMFM. We performed MDCT on 34 human volunteers in five categories: emphysema in severe Chronic Obstructive Pulmonary Disease (COPD) (EC), emphysema in mild COPD (MC), normal-appearing lung in COPD (NC), non-smokers with normal lung function (NN), and smokers with normal function (NS). We volumetrically excluded the airway and vessel regions, calculated 24 volumetric texture features for each Volume of Interest (VOI), and used Bayesian rules for discrimination. Leave-one-out and half-half methods were used for testing. Sensitivity, specificity and accuracy were calculated. The accuracy of the leave-one-out method for the four-class classification (3-D/2-D) is: EC: 84.9%/70.7%; MC: 89.8%/82.7%; NC: 87.5%/49.6%; NN: 100.0%/60.0%. The accuracy of the leave-one-out method for the two-class classification (3-D/2-D) is: NN: 99.3%/71.6%; NS: 99.7%/74.5%. We conclude that 3-D AMFM analysis of the lung parenchyma improves discrimination compared to 2-D analysis of the same images.
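The leave-one-out protocol used above, a Bayesian classifier scored by holding out each observation once, can be sketched in a few lines. The synthetic 24-feature data and the choice of Gaussian naive Bayes stand in for the authors' AMFM texture features and Bayesian rules.

```python
# Minimal sketch of leave-one-out evaluation of a Bayesian classifier on
# texture-feature vectors. Data are synthetic stand-ins for the 24
# volumetric texture features per VOI; this is not the authors' AMFM code.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Stand-in: a two-class problem (e.g. NN vs NS), 24 features per VOI.
X, y = make_classification(n_samples=60, n_features=24, n_informative=10,
                           random_state=0)

# Each of the 60 samples is held out once; the model is refit 60 times.
scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
accuracy = scores.mean()
print(f"Leave-one-out accuracy: {accuracy:.1%}")
```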
Predicting helix–helix interactions from residue contacts in membrane proteins
Lo, Allan; Chiu, Yi-Yuan; Rødland, Einar Andreas; Lyu, Ping-Chiang; Sung, Ting-Yi; Hsu, Wen-Lian
2009-01-01
Motivation: Helix–helix interactions play a critical role in the structure assembly, stability and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction. Results: We present a new hierarchical framework for contact prediction, with an application in membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence and their pairing relationships are further predicted in the second level. Statistical analyses on contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves the conventional direct approach in prediction accuracy, with 80% reduction of input for prediction. Furthermore, the predicted contacts are then used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity and specificity are 56%, 40% and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix-packing in membrane proteins. Availability: http://bio-cluster.iis.sinica.edu.tw/TMhit/ Contact: tsung@iis.sinica.edu.tw Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19244388
Papini, Gabriele; Bonomi, Alberto G; Stut, Wim; Kraal, Jos J; Kemps, Hareld M C; Sartor, Francesco
2017-01-01
Cardiorespiratory fitness (CRF) provides important diagnostic and prognostic information. It is measured directly via laboratory maximal testing or indirectly via submaximal protocols making use of predictor parameters such as submaximal [Formula: see text], heart rate, workload, and perceived exertion. We have established an innovative methodology which can provide CRF prediction based only on body motion during a periodic movement. Thirty healthy subjects (40% females, 31.3 ± 7.8 yrs, 25.1 ± 3.2 BMI) and 18 male coronary artery disease (CAD) patients (56.6 ± 7.4 yrs, 28.7 ± 4.0 BMI) performed a [Formula: see text] test on a cycle ergometer as well as a 45-second squatting protocol at a fixed tempo (80 bpm). A tri-axial accelerometer was used to monitor movements during the squat exercise test. Three regression models were developed to predict CRF based on subject characteristics and a new accelerometer-derived feature describing motion decay. For each model, the Pearson correlation coefficient and the root mean squared error percentage were calculated using the leave-one-subject-out cross-validation method (rcv, RMSEcv). The model built with all healthy individuals' data showed an rcv = 0.68 and an RMSEcv = 16.7%. The CRF prediction improved when only healthy individuals with normal to lower fitness (CRF < 40 ml/min/kg) were included, showing an rcv = 0.91 and RMSEcv = 8.7%. Finally, our accelerometry-based CRF prediction in CAD patients, the majority of whom were taking β-blockers, still showed high accuracy (rcv = 0.91; RMSEcv = 9.6%). In conclusion, motion decay and subject characteristics can be used to accurately predict CRF in healthy people as well as in CAD patients taking β-blockers. This method could represent a valid alternative for patients taking β-blockers, but needs to be further validated in a larger population.
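The two validation statistics used above, a cross-validated Pearson correlation (rcv) and a percentage RMSE (RMSEcv), computed under leave-one-subject-out splitting, can be sketched as follows. The subject features, the target construction, and the plain linear model are illustrative assumptions, not the authors' three regression models.

```python
# Sketch of leave-one-subject-out cross-validation for a CRF-style
# regression, reporting rcv (Pearson r) and RMSEcv (% of mean target).
# All data below are synthetic assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(0)
n_subjects = 30
# One row per subject: [age, BMI, motion-decay feature] (illustrative).
X = rng.normal(size=(n_subjects, 3))
y = 40 + 5 * X[:, 2] + rng.normal(scale=2, size=n_subjects)  # "VO2max"
groups = np.arange(n_subjects)  # one observation per subject here

# With one observation per subject this reduces to leave-one-out, but
# LeaveOneGroupOut generalises to repeated measurements per subject.
pred = cross_val_predict(LinearRegression(), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
r_cv = np.corrcoef(y, pred)[0, 1]
rmse_pct = np.sqrt(np.mean((y - pred) ** 2)) / y.mean() * 100
print(f"rcv = {r_cv:.2f}, RMSEcv = {rmse_pct:.1f}%")
```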
NASA Astrophysics Data System (ADS)
Bierstedt, Svenja E.; Hünicke, Birgit; Zorita, Eduardo; Ludwig, Juliane
2017-07-01
We statistically analyse the relationship between the structure of migrating dunes in the southern Baltic and the driving wind conditions over the past 26 years, with the long-term aim of using migrating dunes as a proxy for past wind conditions at an interannual resolution. The present analysis is based on the dune record derived from geo-radar measurements by Ludwig et al. (2017). The dune system is located at the Baltic Sea coast of Poland and is migrating from west to east along the coast. The dunes present layers with different thicknesses that can be assigned to absolute dates at interannual timescales and put in relation to seasonal wind conditions. To statistically analyse this record and calibrate it as a wind proxy, we used a gridded regional meteorological reanalysis data set (coastDat2) covering recent decades. The identified link between the dune annual layers and wind conditions was additionally supported by the co-variability between dune layers and observed sea level variations in the southern Baltic Sea. We include precipitation and temperature in our analysis, in addition to wind, to learn more about the dependency between these three atmospheric factors and their common influence on the dune system. We set up a statistical linear model based on the correlation between the frequency of days with specific wind conditions in a given season and dune migration velocities derived for that season. To some extent, the dune records can be seen as analogous to tree-ring width records, and hence we use a proxy validation method usually applied in dendrochronology, leave-one-out cross-validation, when the observational record is short. The correlations between the wind record from the reanalysis and the wind record derived from the dune structure are in the range between 0.28 and 0.63, yielding similar statistical validation skill as dendroclimatological records.
NASA Astrophysics Data System (ADS)
Julià Selvas, Núria; Ninyerola Casals, Miquel
2015-04-01
An automatic system has been implemented to predict fire risk in the Principality of Andorra, a small country located in the eastern Pyrenees mountain range, bordered by Catalonia and France; owing to its location, its landscape is a set of rugged mountains with an average elevation of around 2000 meters. The system is based on the Fire Weather Index (FWI), which consists of several components, each measuring a different aspect of the fire danger, calculated from the values of the weather variables at midday. CENMA (Centre d'Estudis de la Neu i de la Muntanya d'Andorra) has a network of around 10 automatic meteorological stations, located in different places, peaks and valleys, that measure weather data such as relative humidity, wind direction and speed, surface temperature, rainfall and snow cover every ten minutes; these data are sent daily and automatically to the implemented system, where they are processed to filter out incorrect measurements and to homogenize measurement units. The data are then used to calculate all components of the FWI at midday at the level of each station, creating a database with the values of the homogeneous measurements and the FWI components for each weather station. In order to extend and model these data over the whole Andorran territory and to obtain a continuous map, an interpolation method based on multiple regression with spline residual interpolation has been implemented. This interpolation considers the FWI data as well as other relevant predictors such as latitude, altitude, global solar radiation and sea distance. The obtained values (maps) are validated using leave-one-out cross-validation. The discrete and continuous maps are rendered as tiled raster maps and published in a web portal conforming to the Web Map Service (WMS) standard of the Open Geospatial Consortium (OGC). Metadata and other reference maps (fuel maps, topographic maps, etc.) are also available from this geoportal.
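The "multiple regression with spline residual interpolation" step can be sketched as a two-stage procedure: fit a regression of the station FWI values on the predictors, then interpolate the station residuals spatially with a thin-plate spline and add them back to the regression trend. The station coordinates, the single elevation predictor, and the FWI values below are synthetic assumptions for illustration.

```python
# Sketch of regression + thin-plate-spline residual interpolation
# (regression-kriging style). Station data are synthetic assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
n_stations = 10
xy = rng.uniform(0, 30, size=(n_stations, 2))     # station coords (km)
elev = rng.uniform(900, 2600, size=n_stations)    # elevation (m)
fwi = 30 - 0.01 * elev + rng.normal(scale=1.0, size=n_stations)

# Step 1: regression on predictors (only elevation here, for brevity).
reg = LinearRegression().fit(elev[:, None], fwi)
resid = fwi - reg.predict(elev[:, None])

# Step 2: thin-plate-spline interpolation of the station residuals.
spline = RBFInterpolator(xy, resid, kernel="thin_plate_spline")

# Prediction at a new grid cell = regression trend + interpolated residual.
new_xy = np.array([[15.0, 15.0]])
new_elev = np.array([[2000.0]])
fwi_pred = reg.predict(new_elev) + spline(new_xy)
print(f"Predicted FWI: {fwi_pred[0]:.1f}")
```

Leave-one-out validation of such a map consists of refitting both stages with one station withheld and comparing the prediction at that station against its observed value.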
Silva, Daniel R.; Brenzan, Mislaine A.; Kambara, Lauro M.; Cortez, Lucia E. R.; Cortez, Diógenes A. G.
2013-01-01
Background: Piper ovatum (Piperaceae) has been used in traditional medicine for the treatment of inflammations and as an analgesic. Previous studies have shown important biological activities of the extracts and amides from P. ovatum leaves. Objective: In this study, a high-performance liquid chromatographic (HPLC) method was developed and validated for quantitative determination of the amides in different parts of Piper ovatum. Materials and Methods: The analysis was carried out on a Metasil ODS column (150 × 4.6 mm, 5 μm) at room temperature. HPLC conditions were as follows: acetonitrile (A) and water (B) with 1.0% acetic acid. The gradient elution used was 0–30 min, 0–60% A; 30–40 min, 60% A. The flow rate was 1.0 mL/min, with detection at 280 nm. Results: The validation, using piperlonguminine as the standard, demonstrated that the method shows linearity (linear correlation coefficient = 0.998), precision (relative standard deviation <5%) and accuracy (mean recovery = 103.78%) in the concentration range 31.25–500 μg/mL. The limits of detection and quantification were 1.21 and 4.03 μg/mL, respectively. This method allowed the identification and quantification of piperlonguminine and piperovatine in the hydroethanolic extracts of P. ovatum obtained from the leaves, stems and roots. All the extracts showed the same chromatographic profile. The leaves and roots contained the highest concentrations of piperlonguminine, and the stems and leaves showed the highest concentrations of piperovatine. Conclusion: This HPLC method is suitable for routine quantitative analysis of amides in extracts of Piper ovatum and phytopharmaceuticals containing this herb. PMID:24174818
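The validation figures quoted above follow standard calibration arithmetic: linearity from a least-squares calibration line, and limits of detection and quantification from the ICH-style formulas LOD = 3.3σ/S and LOQ = 10σ/S, where σ is the residual standard deviation and S the slope. The calibration data below are synthetic assumptions, not the paper's measurements.

```python
# Sketch of HPLC calibration arithmetic: linearity, LOD and LOQ.
# Concentrations and peak areas are synthetic illustrative values.
import numpy as np

conc = np.array([31.25, 62.5, 125.0, 250.0, 500.0])   # ug/mL
area = np.array([1010, 2040, 4000, 8150, 16020])      # detector response

slope, intercept = np.polyfit(conc, area, 1)
pred = slope * conc + intercept
# Residual standard deviation with n-2 degrees of freedom.
sigma = np.sqrt(((area - pred) ** 2).sum() / (len(conc) - 2))
r = np.corrcoef(conc, area)[0, 1]

lod = 3.3 * sigma / slope   # limit of detection
loq = 10 * sigma / slope    # limit of quantification
print(f"r = {r:.4f}, LOD = {lod:.2f} ug/mL, LOQ = {loq:.2f} ug/mL")
```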
Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.
Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço
2017-11-01
The variations found in the elemental composition of ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets apprehended by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with a C4.5 decision tree to help interpret the clustering results. We found that two clusters best described the data, which may correspond to the approximate number of sources of the drug supplying the cities where the seizures occurred. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values in the classification of the samples. © 2017 American Academy of Forensic Sciences.
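The two-step analysis described above, unsupervised clustering followed by a decision tree evaluated with leave-one-out cross-validation to interpret the clusters, can be sketched as below. The synthetic "tablet" data, the two simulated sources, and the use of scikit-learn's CART tree in place of C4.5 are all illustrative assumptions.

```python
# Sketch: K-means on elemental profiles, then a shallow decision tree
# (standing in for C4.5) scored with leave-one-out cross-validation on the
# cluster labels. Data are synthetic, not the ICP-MS measurements.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
# Stand-in for 25 element concentrations in 40 tablets from two "sources".
source_a = rng.normal(loc=0.0, size=(20, 25))
source_b = rng.normal(loc=3.0, size=(20, 25))
X = np.vstack([source_a, source_b])

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A shallow, interpretable tree: which few elements separate the clusters?
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
acc = cross_val_score(tree, X, clusters, cv=LeaveOneOut()).mean()
print(f"LOOCV accuracy of tree on cluster labels: {acc:.1%}")
```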
NASA Astrophysics Data System (ADS)
Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo
2017-04-01
This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through surface bonded piezoelectric sensor and actuator is analyzed by the system identification algorithm to get the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open source software of Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation regarding classification accuracy, precision, recall, kappa statistic and ROC Area.
Sarker, Hillol; Sharmin, Moushumi; Ali, Amin Ahsan; Rahman, Md. Mahbubur; Bari, Rummana; Hossain, Syed Monowar; Kumar, Santosh
2015-01-01
Wearable wireless sensors for health monitoring are enabling the design and delivery of just-in-time interventions (JITI). Critical to the success of JITI is to time its delivery so that the user is available to be engaged. We take a first step in modeling users’ availability by analyzing 2,064 hours of physiological sensor data and 2,717 self-reports collected from 30 participants in a week-long field study. We use delay in responding to a prompt to objectively measure availability. We compute 99 features and identify 30 as most discriminating to train a machine learning model for predicting availability. We find that location, affect, activity type, stress, time, and day of the week, play significant roles in predicting availability. We find that users are least available at work and during driving, and most available when walking outside. Our model finally achieves an accuracy of 74.7% in 10-fold cross-validation and 77.9% with leave-one-subject-out. PMID:25798455
Gender classification of running subjects using full-body kinematics
NASA Astrophysics Data System (ADS)
Williams, Christina M.; Flora, Jeffrey B.; Iftekharuddin, Khan M.
2016-05-01
This paper proposes novel automated gender classification of subjects while engaged in running activity. The machine learning techniques include a preprocessing step using principal component analysis followed by classification with linear discriminant analysis, nonlinear support vector machines, and decision stump with AdaBoost. The dataset consists of 49 subjects (25 males, 24 females, 2 trials each), all equipped with approximately 80 retroreflective markers. The trials are reflective of the subject's entire body moving unrestrained through a capture volume at a self-selected running speed, thus producing highly realistic data. The classification accuracy using leave-one-out cross validation for the 49 subjects is improved from 66.33% using linear discriminant analysis to 86.74% using the nonlinear support vector machine. Results are further improved to 87.76% by means of implementing a nonlinear decision stump with AdaBoost classifier. The experimental findings suggest that the linear classification approaches are inadequate in classifying gender for a large dataset with subjects running in a moderately uninhibited environment.
NIR detection of honey adulteration reveals differences in water spectral pattern.
Bázár, György; Romvári, Róbert; Szabó, András; Somogyi, Tamás; Éles, Viktória; Tsenkova, Roumiana
2016-03-01
High fructose corn syrup (HFCS) was mixed with four artisanal Robinia honeys at various ratios (0–40%) and near infrared (NIR) spectra were recorded with a fiber optic immersion probe. Levels of HFCS adulteration could be detected accurately using leave-one-honey-out cross-validation (RMSECV = 1.48; R²CV = 0.987), partial least squares regression and the 1300–1800 nm spectral interval containing absorption bands related to both water and carbohydrates. Aquaphotomics-based evaluations showed that unifloral honeys contained more highly organized water than the industrial sugar syrup, supposedly because of the greater variety of molecules dissolved in the multi-component honeys. Adulteration with HFCS caused a gradual reduction of water molecular structures, especially water trimers, which facilitate interaction with other molecules. Quick, non-destructive NIR spectroscopy combined with aquaphotomics could be used to describe water molecular structures in honey and to detect a rather common form of adulteration. Copyright © 2015 Elsevier Ltd. All rights reserved.
Mahrooghy, Majid; Ashraf, Ahmed B; Daye, Dania; Mies, Carolyn; Feldman, Michael; Rosen, Mark; Kontos, Despina
2013-01-01
Breast tumors are heterogeneous lesions. Intra-tumor heterogeneity presents a major challenge for cancer diagnosis and treatment. Few studies have worked on capturing tumor heterogeneity from imaging, and most studies to date consider aggregate measures for tumor characterization. In this work we capture tumor heterogeneity by partitioning tumor pixels into subregions and extracting heterogeneity wavelet kinetic (HetWave) features from breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to obtain the spatiotemporal patterns of the wavelet coefficients and contrast agent uptake from each partition. Using a genetic algorithm for feature selection, and a logistic regression classifier with leave-one-out cross-validation, we tested our proposed HetWave features for the task of classifying breast cancer recurrence risk. The classifier based on our features gave an ROC AUC of 0.78, outperforming previously proposed kinetic, texture, and spatial enhancement variance features, which give AUCs of 0.69, 0.64, and 0.65, respectively.
NASA Astrophysics Data System (ADS)
Shaltout, Abdallah A.; Moharram, Mohammed A.; Mostafa, Nasser Y.
2012-01-01
This work is the first attempt to quantify trace elements in the Catha edulis plant (Khat) with a fundamental parameter approach. C. edulis is a famous drug plant in east Africa and Arabian Peninsula. We have previously confirmed that hydroxyapatite represents one of the main inorganic compounds in the leaves and stalks of C. edulis. Comparable plant leaves from basil, mint and green tea were included in the present investigation as well as trifolium leaves were included as a non-related plant. The elemental analyses of the plants were done by Wavelength Dispersive X-Ray Fluorescence (WDXRF) spectroscopy. Standard-less quantitative WDXRF analysis was carried out based on the fundamental parameter approaches. According to the standard-less analysis algorithms, there is an essential need for an accurate determination of the amount of organic material in the sample. A new approach, based on the differential thermal analysis, was successfully used for the organic material determination. The obtained results based on this approach were in a good agreement with the commonly used methods. Depending on the developed method, quantitative analysis results of eighteen elements including; Al, Br, Ca, Cl, Cu, Fe, K, Na, Ni, Mg, Mn, P, Rb, S, Si, Sr, Ti and Zn were obtained for each plant. The results of the certified reference materials of green tea (NCSZC73014, China National Analysis Center for Iron and Steel, Beijing, China) confirmed the validity of the proposed method.
Mavridis, Lazaros; Janes, Robert W
2017-01-01
Circular dichroism (CD) spectroscopy is extensively utilized for determining the percentages of secondary structure content present in proteins. However, although a large contributor, secondary structure is not the only factor that influences the shape and magnitude of the CD spectrum produced. Other structural features can make contributions, so an entire protein structural conformation can give rise to a CD spectrum. There is a need for an application capable of generating protein CD spectra from atomic coordinates. However, no empirically derived method to do this currently exists. PDB2CD has been created as an empirically based approach to the generation of protein CD spectra from atomic coordinates. The method utilizes a combination of structural features within the conformation of a protein: not only its percentage secondary structure content, but also the juxtaposition of these structural components relative to one another, and the overall structure similarity of the query protein to proteins in our dataset, the SP175 dataset, the 'gold standard' set obtained from the Protein Circular Dichroism Data Bank (PCDDB). A significant number of the CD spectra associated with the 71 proteins in this dataset have been reproduced with excellent accuracy using a leave-one-out cross-validation process. The method also creates spectra in good agreement with those of a test set of 14 proteins from the PCDDB. The PDB2CD package provides a web-based, user-friendly approach to enable researchers to produce CD spectra from protein atomic coordinates. http://pdb2cd.cryst.bbk.ac.uk CONTACT: r.w.janes@qmul.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Young, Jonathan; Ridgway, Gerard; Leung, Kelvin; Ourselin, Sebastien
2012-02-01
It is well known that hippocampal atrophy is a marker of the onset of Alzheimer's disease (AD), and as a result hippocampal volumetry has been used in a number of studies to provide early diagnosis of AD and predict conversion of mild cognitive impairment patients to AD. However, rates of atrophy are not uniform across the hippocampus, making shape analysis a potentially more accurate biomarker. This study examines the hippocampi from 226 healthy controls, 148 AD patients and 330 MCI patients obtained from T1-weighted structural MRI images from the ADNI database. The hippocampi are anatomically segmented using the MAPS multi-atlas segmentation method, and the resulting binary images are then processed with SPHARM software to decompose their shapes as a weighted sum of spherical harmonic basis functions. The resulting parameterizations are then used as feature vectors in Support Vector Machine (SVM) classification. A wrapper-based feature selection method was used, as this considers the utility of features in discriminating classes in combination, fully exploiting the multivariate nature of the data and optimizing the selected set of features for the type of classifier used. The leave-one-out cross-validated accuracy obtained on training data is 88.6% for classifying AD vs controls and 74% for classifying MCI-converters vs MCI-stable with very compact feature sets, showing that this is a highly promising method. There is currently a considerable fall in accuracy on unseen data, indicating that the feature selection is sensitive to the data used; however, feature ensemble methods may overcome this.
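Wrapper-based feature selection, as opposed to filter methods, scores candidate feature subsets by the cross-validated performance of the target classifier itself. A minimal sketch using sequential forward selection with an SVM follows; the synthetic 30-feature data stand in for the SPHARM coefficient vectors, and this is not the authors' implementation.

```python
# Sketch of wrapper-based feature selection: features are chosen by how
# well the target classifier (an SVM) performs with them in combination,
# as judged by cross-validation. Data are synthetic assumptions.
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, n_features=30, n_informative=5,
                           random_state=0)

svm = SVC(kernel="linear")
# Forward selection: greedily add the feature whose addition maximizes
# the classifier's cross-validated score.
selector = SequentialFeatureSelector(svm, n_features_to_select=5, cv=5)
selector.fit(X, y)
X_sel = selector.transform(X)

acc_all = cross_val_score(svm, X, y, cv=5).mean()
acc_sel = cross_val_score(svm, X_sel, y, cv=5).mean()
print(f"All 30 features: {acc_all:.1%}; selected 5: {acc_sel:.1%}")
```

Because the subset is tuned to the classifier and the data at hand, such wrappers are prone to the data sensitivity noted above; nesting the selection inside the outer cross-validation loop is the usual safeguard.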
Huang, Yu-An; You, Zhu-Hong; Chen, Xing; Huang, Zhi-An; Zhang, Shanwen; Yan, Gui-Ying
2017-10-16
Accumulating clinical research has shown that specific microbes with abnormal levels are closely associated with the development of various human diseases. Knowledge of microbe-disease associations can provide valuable insights for understanding complex disease mechanisms as well as for the prevention, diagnosis and treatment of various diseases. However, little effort has been made to predict microbial candidates for human complex diseases on a large scale. In this work, we developed a new computational model for predicting microbe-disease associations by combining two single recommendation methods. Based on the assumption that functionally similar microbes tend to be involved in the mechanisms of similar diseases, we adopted neighbor-based collaborative filtering and a graph-based scoring method to compute the association possibility of microbe-disease pairs. The promising prediction performance can be attributed to the use of a hybrid approach based on two single recommendation methods as well as the introduction of Gaussian kernel-based similarity and symptom-based disease similarity. To evaluate the performance of the proposed model, we implemented leave-one-out and fivefold cross validations on the HMDAD database, which was recently built as the first database collecting experimentally-confirmed microbe-disease associations. As a result, NGRHMDA achieved reliable results with AUCs of 0.9023 ± 0.0031 and 0.9111 in the validation frameworks of fivefold CV and LOOCV, respectively. In addition, 78.2% of microbe samples and 66.7% of disease samples were found to be consistent with the basic assumption of our work that microbes tend to be involved in similar disease clusters, and vice versa. Compared with other methods, the prediction results yielded by NGRHMDA demonstrate its effective prediction performance for microbe-disease associations.
It is anticipated that NGRHMDA can be used as a useful tool to search for the most promising microbial candidates for various diseases, and thereby advance medical knowledge and drug development. The codes and dataset of our work can be downloaded from https://github.com/yahuang1991/NGRHMDA .
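The neighbor-based collaborative-filtering idea, scoring an unobserved microbe-disease pair by the known associations of the microbe's most similar neighbors, can be illustrated with a toy sketch. The tiny random association matrix, the Gaussian-kernel similarity over association profiles, and the k = 3 neighborhood are all illustrative assumptions, not the NGRHMDA implementation.

```python
# Toy sketch of neighbor-based collaborative filtering for association
# prediction. Not the NGRHMDA code; data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
# Known microbe-disease associations: 8 microbes x 6 diseases (0/1).
A = (rng.random((8, 6)) < 0.3).astype(float)

# Gaussian (RBF) kernel similarity between microbe association profiles.
d2 = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
sim = np.exp(-d2 / d2.mean())

def cf_score(m, d, k=3):
    """Weighted vote of the k microbes most similar to m on disease d."""
    others = [i for i in range(A.shape[0]) if i != m]
    nbrs = sorted(others, key=lambda i: -sim[m, i])[:k]
    w = np.array([sim[m, i] for i in nbrs])
    return float((w * A[nbrs, d]).sum() / w.sum())

score = cf_score(0, 0)
print(f"Association score for (microbe 0, disease 0): {score:.2f}")
```

In leave-one-out validation, each known association is removed in turn and the model's score for that pair is compared against scores for unobserved pairs, which is how the LOOCV AUC above is obtained.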
Chen, Sheng; Yao, Liping; Chen, Bao
2016-11-01
The enhancement of lung nodules in chest radiographs (CXRs) plays an important role in manual as well as computer-aided detection (CADe) of lung cancer. In this paper, we propose a parameterized logarithmic image processing (PLIP) method combined with a Laplacian of Gaussian (LoG) filter to enhance lung nodules in CXRs. We first applied several LoG filters with varying parameters to an original CXR to enhance the nodule-like structures as well as the edges in the image. We then applied the PLIP model, which can enhance lung nodule images with high contrast and is beneficial for extracting effective features for nodule detection in the CADe scheme. Our method combines the advantages of both the PLIP and LoG algorithms. To test our nodule enhancement method, we evaluated a CADe scheme with relatively high performance in nodule detection, using a publicly available database containing 140 nodules in 140 CXRs enhanced through our nodule enhancement method. The CADe scheme attained sensitivities of 81% and 70% at an average of 5.0 false positives (FP) and 2.0 FP per image, respectively, in a leave-one-out cross-validation test. By contrast, the CADe scheme based on the original images recorded sensitivities of 77% and 63% at 5.0 FP and 2.0 FP, respectively. We introduced a measurement of enhancement by entropy evaluation to objectively assess our method. Experimental results show that the proposed method achieves effective enhancement of lung nodules in CXRs for both radiologists and CADe schemes.
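The multi-scale LoG step, applying Laplacian-of-Gaussian filters at several scales to highlight blob-like (nodule-like) structures, can be sketched with scipy. The synthetic image and the chosen sigmas are illustrative assumptions, and the PLIP contrast stage of the paper is not reproduced here.

```python
# Minimal sketch of multi-scale Laplacian-of-Gaussian blob enhancement
# using scipy's gaussian_laplace. Image and scales are illustrative.
import numpy as np
from scipy.ndimage import gaussian_laplace

# Synthetic chest-image stand-in: smooth ramp background plus one bright
# Gaussian "nodule" centred at (64, 64).
yy, xx = np.mgrid[0:128, 0:128]
image = 0.002 * yy + np.exp(-((yy - 64) ** 2 + (xx - 64) ** 2) / (2 * 4.0 ** 2))

# The negated LoG responds positively to bright blobs; sigma**2 scale
# normalization makes responses comparable across scales. Take the
# maximum response over several scales.
sigmas = [2.0, 4.0, 6.0]
responses = [-(s ** 2) * gaussian_laplace(image, sigma=s) for s in sigmas]
enhanced = np.max(responses, axis=0)

peak = np.unravel_index(np.argmax(enhanced), enhanced.shape)
print(f"Strongest blob response at {peak}")  # near the nodule centre
```

The linear background ramp has zero Laplacian, so it is suppressed while the nodule is enhanced, which is the property that makes LoG filtering useful before feature extraction.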
Bjorner, Jakob Bue; Pejtersen, Jan Hyld
2010-02-01
To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732, with 3,517 wage earners) with a one-year register-based follow-up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have a negative impact on construct validity. These results can be used to develop better short-form measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
Murai, Hideko; Nakayama, Takeo
2008-01-01
Background: In Japan, the temporary leave and drop-out rate of university or junior college students has been increasing in recent years, and many cases have been attributed to psychological problems. To establish a mental health support system for entering students, we conducted a questionnaire and follow-up survey, and explored predictors of temporary leaves and drop-outs among junior college women. Methods: Our sample consisted of 485 first-year female students attending a junior college in Osaka, Japan. Between 1998 and 2002, the following factors were assessed: lifestyle, college life, subjective well-being measured by the General Well-Being Schedule (GWBS), self-esteem, and emotional support network. A follow-up survey was conducted over one year. Results: Thirty-seven women who had taken temporary leaves or had dropped out during the first year showed unfavorable responses regarding lifestyle, college life and/or subjective well-being compared with other students. No differences in self-esteem or emotional support network were found between the two groups. A multiple regression analysis showed that the absence of an interesting club activity, smoking, and low levels of life satisfaction and emotional stability measured by the GWBS were predictors of temporary leaves and drop-outs. Conclusion: It may be possible to determine which students are at risk of taking temporary leaves or dropping out based on their psychological state and lifestyle at the time of enrollment in college. More support is needed to help students at high risk of taking temporary leaves or dropping out remain in school. PMID:18305364
Kerry, Ruth; Goovaerts, Pierre; Smit, Izak P.J.; Ingram, Ben R.
2015-01-01
Kruger National Park (KNP), South Africa, provides protected habitats for the unique animals of the African savannah. For the past 40 years, annual aerial surveys of herbivores have been conducted to aid management decisions based on (1) the spatial distribution of species throughout the park and (2) total species populations in a year. The surveys are extremely time consuming and costly. For many years, the whole park was surveyed, but in 1998 a transect survey approach was adopted. This is cheaper and less time consuming but leaves gaps in the data spatially. Also the distance method currently employed by the park only gives estimates of total species populations but not their spatial distribution. We compare the ability of multiple indicator kriging and area-to-point Poisson kriging to accurately map species distribution in the park. A leave-one-out cross-validation approach indicates that multiple indicator kriging makes poor estimates of the number of animals, particularly the few large counts, as the indicator variograms for such high thresholds are pure nugget. Poisson kriging was applied to the prediction of two types of abundance data: spatial density and proportion of a given species. Both Poisson approaches had standardized mean absolute errors (St. MAEs) of animal counts at least an order of magnitude lower than multiple indicator kriging. The spatial density, Poisson approach (1), gave the lowest St. MAEs for the most abundant species and the proportion, Poisson approach (2), did for the least abundant species. Incorporating environmental data into Poisson approach (2) further reduced St. MAEs. PMID:25729318
ActivityAware: An App for Real-Time Daily Activity Level Monitoring on the Amulet Wrist-Worn Device.
Boateng, George; Batsis, John A; Halter, Ryan; Kotz, David
2017-03-01
Physical activity helps reduce the risk of cardiovascular disease, hypertension and obesity. The ability to monitor a person's daily activity level can inform self-management of physical activity and related interventions. For older adults with obesity, regular physical activity is critical to reduce the risk of long-term disability. In this work, we present ActivityAware, an application on the Amulet wrist-worn device that measures the daily activity levels (sedentary, moderate and vigorous) of individuals, continuously and in real time. The app implements an activity-level detection model, continuously collects acceleration data on the Amulet, classifies the current activity level, updates the day's accumulated time spent at that activity level, logs the data for later analysis, and displays the results on the screen. We developed an activity-level detection model using a Support Vector Machine (SVM). We trained our classifiers using data from a user study, where subjects performed the following physical activities: sit, stand, lie down, walk and run. With 10-fold cross validation and leave-one-subject-out (LOSO) cross validation, we obtained preliminary results that suggest accuracies of up to 98% for n=14 subjects. Testing the ActivityAware app revealed a projected battery life of up to 4 weeks before needing to recharge. The results are promising, indicating that the app may be used for activity-level monitoring, and eventually for the development of interventions that could improve the health of individuals.
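The subject-wise evaluation described above keeps every window from a held-out person out of training. A minimal sketch with scikit-learn illustrates the scheme; the synthetic data, feature count, and kernel choice are assumptions for illustration, not the ActivityAware pipeline:

```python
# Hypothetical sketch of leave-one-subject-out (LOSO) cross-validation for an
# activity-level classifier. Data and features are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_subjects, windows_per_subject, n_features = 14, 30, 6
X = rng.normal(size=(n_subjects * windows_per_subject, n_features))
y = rng.integers(0, 3, size=X.shape[0])        # 0=sedentary, 1=moderate, 2=vigorous
groups = np.repeat(np.arange(n_subjects), windows_per_subject)

# LOSO: each fold holds out all windows from one subject, so the
# classifier is always evaluated on a person it has never seen.
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print(f"LOSO accuracy: {scores.mean():.2f} over {len(scores)} folds")
```

Grouping by subject matters because adjacent windows from the same person are correlated; plain 10-fold CV that mixes a subject's windows across folds typically reports optimistic accuracy.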
Artificial neural network classifier predicts neuroblastoma patients' outcome.
Cangelosi, Davide; Pelassa, Simone; Morini, Martina; Conte, Massimo; Bosco, Maria Carla; Eva, Alessandra; Sementa, Angela Rita; Varesio, Luigi
2016-11-08
More than fifty percent of neuroblastoma (NB) patients with adverse prognosis do not benefit from treatment, making the identification of new potential targets mandatory. Hypoxia is a condition of low oxygen tension, occurring in poorly vascularized tissues, which activates specific genes and contributes to the acquisition of the tumor aggressive phenotype. We defined a gene expression signature (NB-hypo), which measures the hypoxic status of the neuroblastoma tumor. We aimed at developing a classifier predicting neuroblastoma patients' outcome based on the assessment of the adverse effects of tumor hypoxia on the progression of the disease. A multi-layer perceptron (MLP) was trained on the expression values of the 62 probe sets constituting the NB-hypo signature to develop a predictive model for neuroblastoma patients' outcome. We utilized the expression data of 100 tumors in a leave-one-out analysis to select and construct the classifier, and the expression data of the remaining 82 tumors to test the classifier performance in an external dataset. We utilized gene set enrichment analysis (GSEA) to evaluate the enrichment of hypoxia-related gene sets in patients predicted with "Poor" or "Good" outcome. We utilized the expression of the 62 probe sets of the NB-hypo signature in 182 neuroblastoma tumors to develop an MLP classifier predicting patients' outcome (NB-hypo classifier). We trained and validated the classifier in a leave-one-out cross-validation analysis on 100 tumor gene expression profiles. We externally tested the resulting NB-hypo classifier on an independent set of 82 tumors. The NB-hypo classifier predicted the patients' outcome with a remarkable accuracy of 87%. The NB-hypo classifier prediction resulted in a 2% classification error when applied to clinically defined low-intermediate risk neuroblastoma patients. The prediction was 100% accurate in assessing the death of five low/intermediate-risk patients.
GSEA of tumor gene expression profile demonstrated the hypoxic status of the tumor in patients with poor prognosis. We developed a robust classifier predicting neuroblastoma patients' outcome with a very low error rate and we provided independent evidence that the poor outcome patients had hypoxic tumors, supporting the potential of using hypoxia as target for neuroblastoma treatment.
2012-01-10
A model is proposed to correct above-water and satellite water-leaving radiance data for bidirectional effects. The proposed model is first validated with a one-year time series of in situ ... proposed model over the current one, demonstrating the need for a specific case 2 water BRDF correction algorithm as well as the feasibility of enhancing ...
Seasonal changes and effect of harvest on glucosinolates in Isatis leaves.
Mohn, Tobias; Suter, Kathrin; Hamburger, Matthias
2008-04-01
The seasonal fluctuation of glucosinolates in five defined Isatis tinctoria accessions and one Isatis indigotica accession (first year, rosette stage), grown on field plots under identical conditions, was investigated. Analysis of the intact glucosinolates was carried out with shock-frozen, freeze-dried leaf samples using a recently developed and validated PLE (pressurized liquid extraction) protocol and ion-pair HPLC coupled with ESI-MS in negative mode. When comparing the two Isatis species, significant qualitative and quantitative differences in the glucosinolate patterns were observed. Differences among the various Isatis tinctoria accessions were much smaller. We studied the effects of repeated harvesting during the growth season on glucosinolate concentrations and found that repeated harvest did not have a major effect on glucosinolate concentrations of newly grown leaves. Glucosinolates could not be detected in woad leaves submitted to conventional drying.
Anatomical brain images alone can accurately diagnose chronic neuropsychiatric illnesses.
Bansal, Ravi; Staib, Lawrence H; Laine, Andrew F; Hao, Xuejun; Xu, Dongrong; Liu, Jun; Weissman, Myrna; Peterson, Bradley S
2012-01-01
Diagnoses using imaging-based measures alone offer the hope of improving the accuracy of clinical diagnosis, thereby reducing the costs associated with incorrect treatments. Previous attempts to use brain imaging for diagnosis, however, have had only limited success in diagnosing patients independent of the samples used to derive the diagnostic algorithms. We aimed to develop a classification algorithm that can accurately diagnose chronic, well-characterized neuropsychiatric illness in single individuals, given the availability of sufficiently precise delineations of brain regions across several neural systems in anatomical MR images of the brain. We have developed an automated method to diagnose individuals as having one of various neuropsychiatric illnesses using only anatomical MRI scans. The method employs a semi-supervised learning algorithm that discovers natural groupings of brains based on the spatial patterns of variation in the morphology of the cerebral cortex and other brain regions. We used split-half and leave-one-out cross-validation analyses in large MRI datasets to assess the reproducibility and diagnostic accuracy of those groupings. In MRI datasets from persons with Attention-Deficit/Hyperactivity Disorder, Schizophrenia, Tourette Syndrome, Bipolar Disorder, or persons at high or low familial risk for Major Depressive Disorder, our method discriminated with high specificity and nearly perfect sensitivity the brains of persons who had one specific neuropsychiatric disorder from the brains of healthy participants and the brains of persons who had a different neuropsychiatric disorder. Although the classification algorithm presupposes the availability of precisely delineated brain regions, our findings suggest that patterns of morphological variation across brain surfaces, extracted from MRI scans alone, can successfully diagnose the presence of chronic neuropsychiatric disorders.
Extensions of these methods are likely to provide biomarkers that will aid in identifying biological subtypes of those disorders, predicting disease course, and individualizing treatments for a wide range of neuropsychiatric illnesses.
MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites
2017-01-01
Quality control of MRI is essential for excluding problematic acquisitions and avoiding bias in subsequent image processing and analysis. Visual inspection is subjective and impractical for large-scale datasets. Although automated quality assessments have been demonstrated on single-site datasets, it is unclear whether such solutions can generalize to unseen data acquired at new sites. Here, we introduce the MRI Quality Control tool (MRIQC), a tool for extracting quality measures and fitting a binary (accept/exclude) classifier. Our tool can be run both locally and as a free online service via the OpenNeuro.org portal. The classifier is trained on a publicly available, multi-site dataset (17 sites, N = 1102). We perform model selection evaluating different normalization and feature exclusion approaches aimed at maximizing across-site generalization and estimate an accuracy of 76%±13% on new sites, using leave-one-site-out cross-validation. We confirm that result on a held-out dataset (2 sites, N = 265), also obtaining a 76% accuracy. Even though the performance of the trained classifier is statistically above chance, we show that it is susceptible to site effects and unable to account for artifacts specific to new sites. MRIQC performs with high accuracy in intra-site prediction, but performance on unseen sites leaves room for improvement, which might require more labeled data and new approaches to between-site variability. Overcoming these limitations is crucial for a more objective quality assessment of neuroimaging data, and to enable the analysis of extremely large and multi-site samples. PMID:28945803
Chen, Xing; Niu, Ya-Wei; Wang, Guang-Hui; Yan, Gui-Ying
2017-12-12
Recently, as research on microRNAs (miRNAs) continues, there is plenty of experimental evidence indicating that miRNAs are associated with the development and progression of various human complex diseases. Hence, it is necessary and urgent to pay more attention to the study of predicting disease-associated miRNAs, which may be helpful for effective prevention, diagnosis and treatment of human diseases. In particular, constructing computational methods to predict potential miRNA-disease associations is worthy of further study because of its feasibility and effectiveness. In this work, we developed a novel computational model of multiple kernels learning-based Kronecker regularized least squares for MiRNA-disease association prediction (MKRMDA), which could reveal potential miRNA-disease associations by automatically optimizing the combination of multiple kernels for disease and miRNA. MKRMDA obtained AUCs of 0.9040 and 0.8446 in global and local leave-one-out cross validation, respectively. Meanwhile, MKRMDA achieved an average AUC of 0.8894 ± 0.0015 in fivefold cross validation. Furthermore, we conducted three different kinds of case studies on some important human cancers for further performance evaluation. In the case studies of colonic cancer, esophageal cancer and lymphoma based on known miRNA-disease associations in the HMDDv2.0 database, 76, 94 and 88% of the corresponding top 50 predicted miRNAs were confirmed by experimental reports, respectively. In another two kinds of case studies, for new diseases without any known associated miRNAs and for diseases only with known associations in the HMDDv1.0 database, the verified ratios of two different cancers were 88 and 94%, respectively. All the results mentioned above adequately showed the reliable prediction ability of MKRMDA. We anticipate that MKRMDA could serve to facilitate further developments in the field and follow-up investigations by biomedical researchers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pražnikar, Jure (University of Primorska); Turk, Dušan, E-mail: dusan.turk@ijs.si
2014-12-01
The maximum-likelihood free-kick target, which calculates model error estimates from the work set and a randomly displaced model, proved superior in the accuracy and consistency of refinement of crystal structures compared with the maximum-likelihood cross-validation target, which calculates error estimates from the test set and the unperturbed model. The refinement of a molecular model is a computational procedure by which the atomic model is fitted to the diffraction data. The commonly used target in the refinement of macromolecular structures is the maximum-likelihood (ML) function, which relies on the assessment of model errors. The current ML functions rely on cross-validation. They utilize phase-error estimates that are calculated from a small fraction of diffraction data, called the test set, that are not used to fit the model. An approach has been developed that uses the work set to calculate the phase-error estimates in the ML refinement by simulating the model errors via the random displacement of atomic coordinates. It is called ML free-kick refinement, as it uses the ML formulation of the target function and is based on the idea of freeing the model from the model bias imposed by the chemical energy restraints used in refinement. This approach for the calculation of error estimates is superior to the cross-validation approach: it reduces the phase error and increases the accuracy of molecular models, is more robust, provides clearer maps and may use a smaller portion of data for the test set for the calculation of Rfree, or may leave it out completely.
NASA Astrophysics Data System (ADS)
Talai, Sahand; Boelmans, Kai; Sedlacik, Jan; Forkert, Nils D.
2017-03-01
Parkinsonian syndromes encompass a spectrum of neurodegenerative diseases, which can be classified into various subtypes. The differentiation of these subtypes is typically conducted based on clinical criteria. Due to the overlap of intra-syndrome symptoms, accurate differential diagnosis based on clinical guidelines remains a challenge, with failure rates up to 25%. The aim of this study is to present an image-based classification method for patients with Parkinson's disease (PD) and patients with progressive supranuclear palsy (PSP), an atypical variant of PD. To this end, apparent diffusion coefficient (ADC) parameter maps were calculated based on diffusion-tensor magnetic resonance imaging (MRI) datasets. Mean ADC values were determined in 82 brain regions using an atlas-based approach. The extracted mean ADC values for each patient were then used as features for classification with a linear-kernel support vector machine classifier. To increase classification accuracy, feature selection was performed, which yielded the top 17 attributes as the final input features. A leave-one-out cross validation based on 56 PD and 21 PSP subjects revealed that the proposed method is capable of differentiating PD and PSP patients with an accuracy of 94.8%. In conclusion, the classification of PD and PSP patients based on ADC features obtained from diffusion MRI datasets is a promising new approach for the differentiation of Parkinsonian syndromes in the broader context of decision support systems.
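The pipeline described above (region-wise mean ADC features, selection of the top 17 attributes, linear SVM, leave-one-out evaluation) can be sketched as follows. The data here are synthetic and the selection criterion (univariate F-test) is an assumption, not necessarily the study's method; note that nesting selection inside the pipeline re-runs it per fold, which keeps the held-out subject out of the selection step:

```python
# Illustrative sketch, not the authors' code: 82 atlas-region ADC features,
# top-17 feature selection, linear SVM, leave-one-out cross-validation.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_pd, n_psp, n_regions = 56, 21, 82
X = rng.normal(size=(n_pd + n_psp, n_regions))   # mean ADC per atlas region
X[n_pd:, :10] += 1.0                             # synthetic group difference
y = np.array([0] * n_pd + [1] * n_psp)           # 0 = PD, 1 = PSP

clf = make_pipeline(SelectKBest(f_classif, k=17), SVC(kernel="linear"))
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()  # 77 folds of size 1
print(f"LOOCV accuracy: {acc:.3f}")
```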
NASA Astrophysics Data System (ADS)
Chakraborty, Jayasree; Langdon-Embry, Liana; Escalon, Joanna G.; Allen, Peter J.; Lowery, Maeve A.; O'Reilly, Eileen M.; Do, Richard K. G.; Simpson, Amber L.
2016-03-01
Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related death in the United States. The five-year survival rate for all stages is approximately 6%, and approximately 2% when presenting with distant disease.1 Only 10-20% of all patients present with resectable disease, but recurrence rates are high, with only 5 to 15% remaining free of disease at 5 years. At this time, we are unable to distinguish resectable PDAC patients with occult metastatic disease from those with potentially curable disease. Early classification of these tumor types may eventually lead to changes in initial management, including the use of neoadjuvant chemotherapy or radiation, or in the choice of postoperative adjuvant treatments. Texture analysis is an emerging methodology in oncologic imaging for quantitatively assessing tumor heterogeneity that could potentially aid in the stratification of these patients. The present study derives several texture-based features from CT images of PDAC patients, acquired prior to neoadjuvant chemotherapy, and analyzes their performance, individually as well as in combination, as prognostic markers. A fuzzy minimum redundancy maximum relevance method with a leave-one-image-out technique is included to select discriminating features from the set of extracted features. With a naive Bayes classifier, the proposed method predicts the 5-year overall survival of PDAC patients prior to neoadjuvant therapy and achieves the best results in terms of an area under the receiver operating characteristic curve of 0.858 and an accuracy of 83.0% with four-fold cross-validation.
Automated analysis of free speech predicts psychosis onset in high-risk youths
Bedi, Gillinder; Carrillo, Facundo; Cecchi, Guillermo A; Slezak, Diego Fernández; Sigman, Mariano; Mota, Natália B; Ribeiro, Sidarta; Javitt, Daniel C; Copelli, Mauro; Corcoran, Cheryl M
2015-01-01
Background/Objectives: Psychiatry lacks the objective clinical tests routinely used in other specializations. Novel computerized methods to characterize complex behaviors such as speech could be used to identify and predict psychiatric illness in individuals. Aims: In this proof-of-principle study, our aim was to test automated speech analyses combined with Machine Learning to predict later psychosis onset in youths at clinical high-risk (CHR) for psychosis. Methods: Thirty-four CHR youths (11 females) had baseline interviews and were assessed quarterly for up to 2.5 years; five transitioned to psychosis. Using automated analysis, transcripts of interviews were evaluated for semantic and syntactic features predicting later psychosis onset. Speech features were fed into a convex hull classification algorithm with leave-one-subject-out cross-validation to assess their predictive value for psychosis outcome. The canonical correlation between the speech features and prodromal symptom ratings was computed. Results: Derived speech features included a Latent Semantic Analysis measure of semantic coherence and two syntactic markers of speech complexity: maximum phrase length and use of determiners (e.g., which). These speech features predicted later psychosis development with 100% accuracy, outperforming classification from clinical interviews. Speech features were significantly correlated with prodromal symptoms. Conclusions: Findings support the utility of automated speech analysis to measure subtle, clinically relevant mental state changes in emergent psychosis. Recent developments in computer science, including natural language processing, could provide the foundation for future development of objective clinical tests for psychiatry. PMID:27336038
Wang, Li; Shi, Feng; Gao, Yaozong; Li, Gang; Gilmore, John H.; Lin, Weili; Shen, Dinggang
2014-01-01
Segmentation of infant brain MR images is challenging due to poor spatial resolution, severe partial volume effect, and the ongoing maturation and myelination process. During the first year of life, the brain image contrast between white and gray matter undergoes dramatic changes. In particular, the image contrast inverses around 6–8 months of age, where the white and gray matter tissues are isointense in T1- and T2-weighted images and hence exhibit extremely low tissue contrast, posing significant challenges for automated segmentation. In this paper, we propose a general framework that adopts sparse representation to fuse the multi-modality image information and further incorporates anatomical constraints for brain tissue segmentation. Specifically, we first derive an initial segmentation from a library of aligned images with ground-truth segmentations by using sparse representation in a patch-based fashion for the multi-modality T1, T2 and FA images. The segmentation result is further iteratively refined by integration of the anatomical constraint. The proposed method was evaluated on 22 infant brain MR images acquired at around 6 months of age by using a leave-one-out cross-validation, as well as on 10 other unseen testing subjects. Our method achieved high accuracy in terms of the Dice ratios that measure the volume overlap between automated and manual segmentations, i.e., 0.889±0.008 for white matter and 0.870±0.006 for gray matter. PMID:24291615
Automatic polyp detection in colonoscopy videos
NASA Astrophysics Data System (ADS)
Yuan, Zijie; IzadyYazdanabadi, Mohammadhassan; Mokkapati, Divya; Panvalkar, Rujuta; Shin, Jae Y.; Tajbakhsh, Nima; Gurudu, Suryakanth; Liang, Jianming
2017-02-01
Colon cancer is the second leading cause of cancer death in the US [1]. Colonoscopy is the primary method for screening and prevention of colon cancer, but during colonoscopy a significant number (25% [2]) of polyps (precancerous abnormal growths inside of the colon) are missed; therefore, the goal of our research is to reduce the polyp miss-rate of colonoscopy. This paper presents a method to detect polyps automatically in a colonoscopy video. Our system has two stages: candidate generation and candidate classification. In candidate generation (stage 1), we chose 3,463 frames (including 1,718 with-polyp frames) from a real-time colonoscopy video database. We first applied preprocessing procedures, namely intensity adjustment, edge detection and morphology operations. We extracted each connected component (edge contour) as one candidate patch from the pre-processed image. With the help of ground truth (GT) images, 2 constraints were applied to each candidate patch, dividing and saving them into polyp and non-polyp groups. In candidate classification (stage 2), we trained and tested convolutional neural networks (CNNs) with the AlexNet architecture [3] to classify each candidate into the with-polyp or non-polyp class. Each with-polyp patch was augmented by rotation, translation and scaling to promote invariance and obtain a more robust CNN. We applied leave-2-patients-out cross-validation to this model (4 of 6 cases were chosen as the training set and the remaining 2 as the testing set). The system accuracy and sensitivity are 91.47% and 91.76%, respectively.
A Machine Learning Approach to Automated Gait Analysis for the Noldus Catwalk System.
Frohlich, Holger; Claes, Kasper; De Wolf, Catherine; Van Damme, Xavier; Michel, Anne
2018-05-01
Gait analysis of animal disease models can provide valuable insights into in vivo compound effects and thus help in preclinical drug development. The purpose of this paper is to establish a computational gait analysis approach for the Noldus Catwalk system, in which footprints are automatically captured and stored. We present a - to our knowledge - first machine learning based approach for the Catwalk system, which comprises step decomposition, definition and extraction of meaningful features, multivariate step sequence alignment, feature selection, and training of different classifiers (gradient boosting machine, random forest, and elastic net). Using animal-wise leave-one-out cross validation, we demonstrate that with our method we can reliably separate movement patterns of a putative Parkinson's disease animal model and several control groups. Furthermore, we show that we can predict the time point after and the type of different brain lesions, and can even forecast the brain region where the intervention was applied. We provide an in-depth analysis of the features involved in our classifiers via statistical techniques for model interpretation. A machine learning method for automated analysis of data from the Noldus Catwalk system was established. Our work shows the ability of machine learning to discriminate pharmacologically relevant animal groups based on their walking behavior in a multivariate manner. Further interesting aspects of the approach include the ability to learn from past experiments, to improve as more data arrive, and to make predictions for single animals in future studies.
Hoggarth, Petra A; Innes, Carrie R H; Dalrymple-Alford, John C; Jones, Richard D
2013-12-01
To generate a robust model of computerized sensory-motor and cognitive test performance to predict on-road driving assessment outcomes in older persons with diagnosed or suspected cognitive impairment. A logistic regression model classified pass–fail outcomes of a blinded on-road driving assessment. Generalizability of the model was tested using leave-one-out cross-validation. Three specialist clinics in New Zealand. Drivers (n=279; mean age 78.4, 65% male) with diagnosed or suspected dementia, mild cognitive impairment, unspecified cognitive impairment, or memory problems referred for a medical driving assessment. A computerized battery of sensory-motor and cognitive tests and an on-road medical driving assessment. One hundred fifty-five participants (55.5%) received an on-road fail score. Binary logistic regression correctly classified 75.6% of the sample into on-road pass and fail groups. The cross-validation indicated accuracy of the model of 72.0% with sensitivity for detecting on-road fails of 73.5%, specificity of 70.2%, positive predictive value of 75.5%, and negative predictive value of 68%. The off-road assessment prediction model resulted in a substantial number of people who were assessed as likely to fail despite passing an on-road assessment and vice versa. Thus, despite a large multicenter sample, the use of off-road tests previously found to be useful in other older populations, and a carefully constructed and tested prediction model, off-road measures have yet to be found that are sufficiently accurate to allow acceptable determination of on-road driving safety of cognitively impaired older drivers. © 2013, Copyright the Authors Journal compilation © 2013, The American Geriatrics Society.
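The validation scheme above, a logistic regression classifying pass/fail with leave-one-out generalizability checks, can be sketched as follows. The data and the number of predictor tests are synthetic stand-ins, not the study's off-road battery:

```python
# Hypothetical sketch: logistic regression for pass/fail classification with
# leave-one-out predictions summarized as sensitivity and specificity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(2)
n, n_tests = 279, 8
X = rng.normal(size=(n, n_tests))                # computerized test scores
# 1 = on-road fail, driven by two of the tests plus noise (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

# cross_val_predict gives one held-out prediction per participant,
# so a single confusion matrix summarizes the whole LOOCV run.
pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print(f"accuracy={(tp + tn) / n:.3f}  "
      f"sensitivity={tp / (tp + fn):.3f}  specificity={tn / (tn + fp):.3f}")
```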
Wolf, Kathrin; Cyrys, Josef; Harciníková, Tatiana; Gu, Jianwei; Kusch, Thomas; Hampel, Regina; Schneider, Alexandra; Peters, Annette
2017-02-01
Important health relevance has been suggested for ultrafine particles (UFP) and ozone, but studies on long-term effects are scarce, mainly due to the lack of appropriate spatial exposure models. We designed a measurement campaign to develop land use regression (LUR) models to predict the spatial variability of particle number concentration (PNC, as an indicator for UFP), ozone and several other air pollutants in the Augsburg region, Southern Germany. Three bi-weekly measurements of PNC, ozone, particulate matter (PM10, PM2.5), soot (PM2.5abs) and nitrogen oxides (NOx, NO2) were performed at 20 sites in 2014/15. Annual average concentrations were calculated and temporally adjusted using measurements from a continuous background station. As geographic predictors we offered several traffic and land use variables, altitude, population and building density. Models were validated using leave-one-out cross-validation. Adjusted model explained variance (R2) was high for PNC and ozone (0.89 and 0.88). Cross-validation adjusted R2 was slightly lower (0.82 and 0.81) but still indicated a very good fit. LUR models for the other pollutants performed well, with adjusted R2 between 0.68 (PMcoarse) and 0.94 (NO2). Contrary to previous studies, ozone showed only a moderate correlation with NO2 (Pearson's r=-0.26). PNC was moderately correlated with ozone and PM2.5, but highly correlated with NOx (r=0.91). For PNC and NOx, the LUR models comprised similar predictors, and future epidemiological analyses evaluating health effects need to consider these similarities. Copyright © 2016 Elsevier B.V. All rights reserved.
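The leave-one-out R² used to validate the LUR models can be computed by predicting each site from a model fitted to the remaining sites. This minimal sketch uses a single hypothetical predictor rather than the paper's full set of traffic and land-use variables:

```python
def fit_simple_lr(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def loocv_r2(xs, ys):
    """Leave-one-out cross-validated R^2: each site is predicted from a
    model fitted to all the other sites."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_simple_lr(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# invented monitoring data: a near-linear predictor-response relationship
xs = list(range(10))
ys = [2 * x + 0.3 * (-1) ** x for x in range(10)]
print(round(loocv_r2(xs, ys), 3))  # close to 1 on this near-linear toy data
```

A cross-validated R² slightly below the fitted R², as in the abstract (0.82 vs 0.89), is the expected behavior because each prediction comes from a model that never saw that site.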
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fried, David V.; Graduate School of Biomedical Sciences, The University of Texas Health Science Center at Houston, Houston, Texas; Tucker, Susan L.
2014-11-15
Purpose: To determine whether pretreatment CT texture features can improve patient risk stratification beyond conventional prognostic factors (CPFs) in stage III non-small cell lung cancer (NSCLC). Methods and Materials: We retrospectively reviewed 91 cases of stage III NSCLC treated with definitive chemoradiation therapy. All patients underwent pretreatment diagnostic contrast-enhanced computed tomography (CE-CT) followed by 4-dimensional CT (4D-CT) for treatment simulation. We used the average-CT and expiratory (T50-CT) images from the 4D-CT along with the CE-CT for texture extraction. Histogram, gradient, co-occurrence, gray tone difference, and filtration-based techniques were used for texture feature extraction. Penalized Cox regression implementing cross-validation was used for covariate selection and modeling. Models incorporating texture features from the 3 image types and CPFs were compared with models incorporating CPFs alone for overall survival (OS), local-regional control (LRC), and freedom from distant metastases (FFDM). Predictive Kaplan-Meier curves were generated using leave-one-out cross-validation. Patients were stratified based on whether their predicted outcome was above or below the median. Reproducibility of texture features was evaluated using test-retest scans from independent patients and quantified using concordance correlation coefficients (CCC). We compared models incorporating the reproducibility seen on test-retest scans to our original models and determined the classification reproducibility. Results: Models incorporating both texture features and CPFs demonstrated a significant improvement in risk stratification compared with models using CPFs alone for OS (P=.046), LRC (P=.01), and FFDM (P=.005). The average CCCs were 0.89, 0.91, and 0.67 for texture features extracted from the average-CT, T50-CT, and CE-CT, respectively. Incorporating reproducibility within our models yielded 80.4% (±3.7% SD), 78.3% (±4.0% SD), and 78.8% (±3.9% SD) classification reproducibility in terms of OS, LRC, and FFDM, respectively. Conclusions: Pretreatment tumor texture may provide prognostic information beyond that obtained from CPFs. Models incorporating feature reproducibility achieved classification rates of ∼80%. External validation would be required to establish texture as a prognostic factor.
Suffert, Frédéric; Delestre, Ghislain; Carpentier, Florence; Gazeau, Gwilherm; Walker, Anne-Sophie; Gélisse, Sandrine; Duplaix, Clémentine
2016-07-01
The wheat pathogen Zymoseptoria tritici is a relevant fungal model organism for investigations of the epidemiological determinants of sexual reproduction. The objective of this experimental study was to determine which intrinsic factors, including parental fitness and timing conditions of infection, affect the numbers of ascospores produced. We first performed 28 crosses on adult wheat plants in semi-controlled conditions, with 10 isolates characterized for their fitness traits. We validated the efficiency of the crossing method, opening up new perspectives for epidemiological studies. We found that the ability to reproduce sexually was determined, at least partly, by the parental genotypes. We also found that the number of ascospores released was correlated with the mean size of the sporulating lesions of the parental isolates on the one hand, and the absolute difference in the latent periods of these isolates on the other. No functional trade-off between the two modes of reproduction in Z. tritici was revealed: there was no adaptive compromise between pathogenicity (asexual multiplication on leaves) and transmission (intensity of sexual reproduction on wheat debris). Moreover, a few days' difference in the latent periods of the two parental isolates, such that one progressed more rapidly in the host tissue than the other, seemed to be slightly beneficial to ascosporogenesis. This may be because the first parental isolate breaks down host defenses, thereby facilitating infection for the other parental isolate. However, a larger difference (a few weeks), generated by leaving two to three weeks between the inoculations of the plant with the parental isolates, was clearly detrimental to ascosporogenesis. In this case, the host tissues were likely colonized by the first isolate, leaving less host resources available for the second, consistent with a competition effect during the asexual stage. Copyright © 2016 Elsevier Inc. All rights reserved.
Liang, Xianrui; Ma, Meiling; Su, Weike
2013-01-01
Background: A method for chemical fingerprint analysis of Hibiscus mutabilis L. leaves was developed based on ultra-performance liquid chromatography with photodiode array detection (UPLC-PAD) combined with similarity analysis (SA) and hierarchical clustering analysis (HCA). Materials and Methods: 10 batches of Hibiscus mutabilis L. leaf samples were collected from different regions of China. UPLC-PAD was employed to collect chemical fingerprints of the Hibiscus mutabilis L. leaves. Results: The relative standard deviations (RSDs) of the relative retention times (RRT) and relative peak areas (RPA) of 10 characteristic peaks (one of them identified as rutin) in precision, repeatability and stability tests were less than 3%, and the fingerprint analysis method was validated as suitable for Hibiscus mutabilis L. leaves. Conclusions: The chromatographic fingerprints showed abundant qualitative diversity of chemical constituents in the 10 batches of Hibiscus mutabilis L. leaf samples from different locations by similarity analysis, on the basis of the correlation coefficients calculated between each pair of fingerprints. Moreover, the HCA method clustered the samples into four classes, and the HCA dendrogram showed the close or distant relations among the 10 samples, which was consistent with the SA result to some extent. PMID:23930008
NASA Astrophysics Data System (ADS)
Li, Lianfa; Wu, Anna H.; Cheng, Iona; Chen, Jiu-Chiuan; Wu, Jun
2017-10-01
Monitoring of fine particulate matter with diameter <2.5 μm (PM2.5) began in 1999 in the US, and even later in many other countries. The lack of historical PM2.5 data limits epidemiological studies of long-term PM2.5 exposure and health outcomes such as cancer. In this study, we aimed to design a flexible approach to reliably estimate historical PM2.5 concentrations by incorporating spatial effects and measurements of existing co-pollutants such as particulate matter with diameter <10 μm (PM10) and meteorological variables. Monitoring data of PM10, PM2.5, and meteorological variables covering the entire state of California were obtained from 1999 through 2013. We developed a spatiotemporal model that quantified non-linear associations between PM2.5 concentrations and the following predictor variables: spatiotemporal factors (PM10 and meteorological variables), spatial factors (land-use patterns, traffic, elevation, distance to shorelines, and spatial autocorrelation), and season. Our model accounted for regional- (county-) scale spatial autocorrelation, using a spatial weight matrix, and for local-scale spatiotemporal variability, using local covariates in an additive non-linear model. The spatiotemporal model was evaluated using leave-one-site-month-out cross-validation. Our final daily model had an R2 of 0.81, with PM10, meteorological variables, and spatial autocorrelation explaining 55%, 10%, and 10% of the variance in PM2.5 concentrations, respectively. The model had a cross-validation R2 of 0.83 for monthly PM2.5 concentrations (N = 8170) and 0.79 for daily PM2.5 concentrations (N = 51,421), with few extreme values in prediction. Further, the incorporation of spatial effects reduced bias in predictions. Our approach achieved a cross-validation R2 of 0.61 for the daily model when PM10 was replaced by total suspended particulate. Our model can robustly estimate historical PM2.5 concentrations in California for periods when PM2.5 measurements were not available.
Roy, Kunal; Leonard, J Thomas
2005-01-01
CCR5 receptor binding affinity of a series of 3-(4-benzylpiperidin-1-yl)propylamine congeners was subjected to QSAR study using the linear free energy related (LFER) model of Hansch. Appropriate indicator variables encoding different group contributions and different physicochemical variables such as hydrophobicity (pi), electronic (Hammett sigma), and steric (molar refractivity, STERIMOL values) parameters of the phenyl ring substituents of the compounds were used as predictor variables. The Hansch analysis explores the importance of lipophilicity and electron-donating substituents for the binding affinity. However, this method could not give more insight into the structure-activity relationships because of the diverse molecular features in the data set. 3D-QSAR analyses of the same data set using Molecular Shape Analysis (MSA), Receptor Surface Analysis (RSA), and Molecular Field Analysis (MFA) techniques were also performed. The best model with acceptable statistical quality was derived from the MSA, which showed the importance of the relative negative charge (RNCG): substituents with a high RNCG value have more binding affinity than the unsubstituted piperidine and phenyl (R1 position) congeners. The relative negative charge surface area (RNCS) is detrimental (e.g. R2 = 3,4-Cl2) to the activity. An increase in the length of the molecule in the Z dimension (Lz) is conducive (e.g. R3 = sulfonylmorpholino) to the binding affinity, while an increase in the area of the molecular shadow in the XZ plane (Sxz) is detrimental (e.g. R1 = N-c-hexylmethyl-5-oxopyrrolidin-3-yl). The presence of a chiral center makes the molecule less active (e.g. R1 = N-methyl-5-oxopyrrolidin-3-yl). An increase in the van der Waals area, the molecular volume, and the difference between the volume of the individual molecule and the shape reference compound is conducive (e.g. R3 = (CH3)2NSO2-) to the binding affinity. Substituents with higher JursFPSA_2 values (fractional charged partial surface area), like the N-methylsulfonylpiperidin-4-yl (R1 position) group, have better binding affinity than substituents such as 4-chlorophenylamino (R1 position). Unsubstituted piperidines (R1 position), with lower JursFNSA_1 values, have lower binding affinity than the 4-chlorophenyl substituted compounds. The MFA-derived equation shows interaction energies at different grid points, while the RSA model shows the importance of hydrophobicity and charge at different regions of the molecules. The models were validated through the leave-one-out, leave-15%-out, and leave-25%-out cross-validation techniques. The developed models were also subjected to a randomization test (99% confidence level). Although the MSA-derived models had excellent statistical qualities for both the training and test sets, the RSA and MFA results for the test sets are not statistically comparable with the MSA-derived models.
Monga, Isha; Qureshi, Abid; Thakur, Nishant; Gupta, Amit Kumar; Kumar, Manoj
2017-01-01
Allele-specific siRNAs (ASP-siRNAs) have emerged as promising therapeutic molecules owing to their selectivity to inhibit the mutant allele or associated single-nucleotide polymorphisms (SNPs) while sparing the expression of the wild-type counterpart. Thus, a dedicated bioinformatics platform encompassing updated ASP-siRNAs and an algorithm for the prediction of their inhibitory efficacy will be helpful in tackling currently intractable genetic disorders. In the present study, we have developed the ASPsiRNA resource (http://crdd.osdd.net/servers/aspsirna/) covering three components, viz. (i) ASPsiDb, (ii) ASPsiPred, and (iii) analysis tools like ASP-siOffTar. ASPsiDb is a manually curated database harboring 4543 (including 422 chemically modified) ASP-siRNAs targeting 78 unique genes involved in 51 different diseases. It furnishes comprehensive information from experimental studies on ASP-siRNAs along with multidimensional genetic and clinical information for numerous mutations. ASPsiPred is a two-layered algorithm to predict the efficacy of ASP-siRNAs for the fully complementary mutant allele (Effmut), by ASPsiPredSVM, and for the wild-type allele with one mismatch (Effwild), by ASPsiPredmatrix. In ASPsiPredSVM, 922 unique ASP-siRNAs with experimentally validated quantitative Effmut were used. During 10-fold cross-validation (10nCV) employing various sequence features on the training/testing dataset (T737), the best predictive model achieved a maximum Pearson's correlation coefficient (PCC) of 0.71. Further, the accuracy of the classifier in predicting Effmut against novel genes was assessed by a leave-one-target-out cross-validation (LOTOCV) approach. ASPsiPredmatrix was constructed from rule-based studies describing the effect of single siRNA:mRNA mismatches on efficacy at 19 different locations of the siRNA.
Thus, ASPsiRNA encompasses the first database, prediction algorithm, and off-target analysis tool that is expected to accelerate research in the field of RNAi-based therapeutics for human genetic diseases. PMID:28696921
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ren, S; Tianjin University, Tianjin; Hara, W
Purpose: MRI has a number of advantages over CT as a primary modality for radiation treatment planning (RTP). However, one key bottleneck problem remains: the lack of electron density information in MRI. In this work, a reliable method to map electron density is developed by leveraging the differential contrast of multi-parametric MRI. Methods: We propose a probabilistic Bayesian approach for electron density mapping based on T1- and T2-weighted MRI, using multiple patients as atlases. For each voxel, we compute two conditional probabilities: (1) electron density given its image intensity on T1- and T2-weighted MR images, and (2) electron density given its geometric location in a reference anatomy. The two sources of information (image intensity and spatial location) are combined into a unifying posterior probability density function using the Bayesian formalism. The mean value of the posterior probability density function provides the estimated electron density. Results: We evaluated the method on 10 head and neck patients and performed leave-one-out cross-validation (9 patients as atlases and the remaining 1 as the test case). The proposed method significantly reduced the errors in electron density estimation, with a mean absolute HU error of 138, compared with 193 for the T1-weighted intensity approach and 261 without density correction. For bone detection (HU>200), the proposed method had an accuracy of 84% and a sensitivity of 73% at a specificity of 90% (AUC = 87%). In comparison, the AUC for bone detection was 73% and 50% using the intensity approach and without density correction, respectively. Conclusion: The proposed unifying method provides accurate electron density estimation and bone detection based on multi-parametric MRI of the head, with its highly heterogeneous anatomy. This could allow for accurate dose calculation and reference image generation for patient setup in MRI-based radiation treatment planning.
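The paper combines an intensity-based and a location-based conditional density into one posterior and takes its mean. As a hedged illustration: if both conditionals are approximated as independent Gaussians, the posterior mean reduces to a precision-weighted average. The HU numbers below are invented, not taken from the study:

```python
def fuse_gaussian_estimates(mu1, var1, mu2, var2):
    """Precision-weighted combination of two independent Gaussian estimates
    of the same quantity; the posterior is Gaussian with these moments."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    return mu, 1.0 / (w1 + w2)

# hypothetical voxel: intensity-based estimate 300 HU (variance 10000),
# atlas/location-based estimate 500 HU (variance 2500)
mu, var = fuse_gaussian_estimates(300.0, 10000.0, 500.0, 2500.0)
print(mu, var)  # posterior pulled toward the lower-variance estimate (~460 HU)
```

The lower-variance (more confident) source dominates, which is the qualitative behavior one wants when intensity is ambiguous (e.g. bone vs air on MRI) but the atlas location is informative.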
Differentiation of Glioblastoma and Lymphoma Using Feature Extraction and Support Vector Machine.
Yang, Zhangjing; Feng, Piaopiao; Wen, Tian; Wan, Minghua; Hong, Xunning
2017-01-01
Differentiation of glioblastoma multiforme (GBM) and lymphoma using multi-sequence magnetic resonance imaging (MRI) is an important task that is valuable for treatment planning. However, this task is a challenge because GBMs and lymphomas may have a similar appearance in MR images. This similarity may lead to misclassification and could affect the treatment results. In this paper, we propose a semi-automatic method based on multi-sequence MRI to differentiate these two types of brain tumors. Our method consists of three steps: 1) the key slice is selected from the 3D MRIs and regions of interest (ROIs) are drawn around the tumor region; 2) different features are extracted based on prior clinical knowledge and validated using a t-test; and 3) features that are helpful for classification are used to build an original feature vector, and a support vector machine is applied to perform classification. In total, 58 GBM cases and 37 lymphoma cases are used to validate our method. A leave-one-out cross-validation strategy is adopted in our experiments. The global accuracy of our method was determined as 96.84%, which indicates that our method is effective for the differentiation of GBM and lymphoma and can be applied in clinical diagnosis. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
QSAR Analysis of 2-Amino or 2-Methyl-1-Substituted Benzimidazoles Against Pseudomonas aeruginosa
Podunavac-Kuzmanović, Sanja O.; Cvetković, Dragoljub D.; Barna, Dijana J.
2009-01-01
A set of benzimidazole derivatives was tested for inhibitory activity against the Gram-negative bacterium Pseudomonas aeruginosa, and minimum inhibitory concentrations were determined for all the compounds. Quantitative structure-activity relationship (QSAR) analysis was applied to fourteen of the abovementioned derivatives using a combination of various physicochemical, steric, electronic, and structural molecular descriptors. A multiple linear regression (MLR) procedure was used to model the relationships between the molecular descriptors and the antibacterial activity of the benzimidazole derivatives. The stepwise regression method was used to derive the most significant models as calibration models for predicting the inhibitory activity of this class of molecules. The best QSAR models were further validated by the leave-one-out technique as well as by the calculation of statistical parameters for the established theoretical models. To confirm the predictive power of the models, an external set of molecules was used. High agreement between experimental and predicted inhibitory values, obtained in the validation procedure, indicated the good quality of the derived QSAR models. PMID:19468332
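The leave-one-out validation of an MLR QSAR model is usually summarized by a cross-validated Q². A sketch of that computation on an invented descriptor matrix (the paper's actual descriptors and activities are not reproduced here):

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out cross-validated Q^2 for an MLR model y ~ X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    press = 0.0  # predictive residual sum of squares
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X1[mask], y[mask], rcond=None)
        press += (y[i] - X1[i] @ beta) ** 2
    return 1 - press / np.sum((y - y.mean()) ** 2)

# hypothetical 14-compound data set with two descriptors (e.g. pi, sigma)
rng = np.random.default_rng(0)
X = rng.normal(size=(14, 2))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=14)
print(round(loo_q2(X, y), 3))  # close to 1 for this low-noise toy data
```

A Q² well below the fitted R² would flag an overfitted QSAR model, which is exactly why the abstract pairs LOO validation with an external test set.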
Białek, Michał; Markiewicz, Łukasz; Sawicki, Przemysław
2015-01-01
Delayed lotteries are much more common in everyday life than are pure lotteries. Usually, we need to wait to find out the outcome of a risky decision (e.g., investing in a stock market, engaging in a relationship). However, most research has studied time discounting and probability discounting in isolation, using methodologies designed specifically to track changes in one parameter. The most commonly used method is adjusting, but its reported validity and time stability in research on discounting are suboptimal. The goal of this study was to introduce a novel method for analyzing delayed lotteries, conjoint analysis, which is hypothetically more suitable for analyzing individual preferences in this area. A set of two studies compared conjoint analysis with adjusting. The results suggest that individual parameters of discounting strength estimated with conjoint analysis have higher predictive value (Studies 1 and 2) and are more stable over time (Study 2) compared with adjusting. Despite the exploratory character of the reported studies, we suggest on the basis of these findings that future research on delayed lotteries should be cross-validated using both methods.
NASA Astrophysics Data System (ADS)
Liu, L.; Du, L.; Liao, Y.
2017-12-01
Based on the ensemble hindcast dataset of CSM1.1m by NCC/CMA, Bayesian merging models and a two-step statistical model are developed and employed to predict monthly grid/station precipitation in the Huaihe River basin, China, during summer at lead times of 1 to 3 months. The hindcast dataset spans the period 1991 to 2014. The skill of the two models is evaluated using the area under the ROC curve (AUC) in a leave-one-out cross-validation framework and is compared to the skill of CSM1.1m. CSM1.1m has the highest skill for summer precipitation when initialized in April and the lowest when initialized in May, and has the highest skill for precipitation in June but the lowest for precipitation in July. Compared with the raw outputs of the climate model, some schemes of the two approaches have higher skill for predictions initialized in March and May, but almost all schemes have lower skill for predictions initialized in April. Compared to the two-step approach, one sampling scheme of the Bayesian merging approach has higher skill for predictions initialized in March, but lower skill for those initialized in May. The results suggest that the two statistical models are potentially useful for monthly summer precipitation forecasts initialized in March and May over the Huaihe River basin, whereas the raw CSM1.1m forecast is preferable when initialized in April. Finally, the summer runoff during 1991 to 2014 is simulated with a hydrological model driven by the climate hindcasts of CSM1.1m and the two statistical models.
Kalderstam, Jonas; Edén, Patrik; Ohlsson, Mattias
2015-01-01
We investigate a new method to place patients into risk groups in censored survival data. Properties such as median survival time, and end survival rate, are implicitly improved by optimizing the area under the survival curve. Artificial neural networks (ANN) are trained to either maximize or minimize this area using a genetic algorithm, and combined into an ensemble to predict one of low, intermediate, or high risk groups. Estimated patient risk can influence treatment choices, and is important for study stratification. A common approach is to sort the patients according to a prognostic index and then group them along the quartile limits. The Cox proportional hazards model (Cox) is one example of this approach. Another method of doing risk grouping is recursive partitioning (Rpart), which constructs a decision tree where each branch point maximizes the statistical separation between the groups. ANN, Cox, and Rpart are compared on five publicly available data sets with varying properties. Cross-validation, as well as separate test sets, are used to validate the models. Results on the test sets show comparable performance, except for the smallest data set where Rpart's predicted risk groups turn out to be inverted, an example of crossing survival curves. Cross-validation shows that all three models exhibit crossing of some survival curves on this small data set but that the ANN model manages the best separation of groups in terms of median survival time before such crossings. The conclusion is that optimizing the area under the survival curve is a viable approach to identify risk groups. Training ANNs to optimize this area combines two key strengths from both prognostic indices and Rpart. First, a desired minimum group size can be specified, as for a prognostic index. Second, the ability to utilize non-linear effects among the covariates, which Rpart is also able to do.
Intratumor heterogeneity of DCE-MRI reveals Ki-67 proliferation status in breast cancer
NASA Astrophysics Data System (ADS)
Cheng, Hu; Fan, Ming; Zhang, Peng; Liu, Bin; Shao, Guoliang; Li, Lihua
2018-03-01
Breast cancer is a highly heterogeneous disease both biologically and clinically, and certain pathologic parameters, e.g., Ki-67 expression, are useful in predicting the prognosis of patients. The aim of this study was to identify intratumor heterogeneity of breast cancer for predicting Ki-67 proliferation status in estrogen receptor (ER)-positive breast cancer patients. A dataset of 77 patients who underwent dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) examination was collected. Of these patients, 51 had high and 26 had low Ki-67 expression. We partitioned each breast tumor into subregions using two methods, based on the values of time to peak (TTP) and peak enhancement rate (PER). Within each tumor subregion, image features were extracted, including statistical and morphological features from DCE-MRI. Classification models were applied to each region separately to assess whether classifiers based on features extracted from the various subregions differed in predictive performance. The area under the receiver operating characteristic curve (AUC) was computed using the leave-one-out cross-validation (LOOCV) method. The classifier using features related to moderate time to peak achieved the best performance, with an AUC of 0.826, compared with those based on the other regions. Using a multi-classifier fusion method, the AUC was significantly (P=0.03) increased to 0.858+/-0.032, compared with an AUC of 0.778 for the classifier using features from the entire tumor. The results demonstrate that features reflecting heterogeneity in intratumoral subregions can improve classifier performance for predicting Ki-67 proliferation status over a classifier using features from the entire tumor alone.
Saichek, Nicholas R; Cox, Christopher R; Kim, Seungki; Harrington, Peter B; Stambach, Nicholas R; Voorhees, Kent J
2016-04-23
The Staphylococcus genus is composed of 44 species, with S. aureus being the most pathogenic. Isolates of S. aureus are generally susceptible to β-lactam antibiotics, but extensive use of this class of drugs has led to the increasing emergence of resistant strains. The increased occurrence of coagulase-negative staphylococci as well as S. aureus infections, some with resistance to multiple classes of antibiotics, has driven the necessity for innovative options for treatment and infection control. Despite these increasing needs, current methods still only possess species-level capabilities and require secondary testing to determine antibiotic resistance. This study describes the use of metal oxide laser ionization mass spectrometry fatty acid (FA) profiling as a rapid, simultaneous Staphylococcus identification and antibiotic resistance determination method. Principal component analysis was used to classify 50 Staphylococcus isolates. Leave-one-spectrum-out cross-validation indicated 100% correct assignment at the species and strain level. Fuzzy rule-building expert system classification and self-optimizing partial least squares discriminant analysis, with more rigorous evaluations, also consistently achieved greater than 94% and 84% accuracy, respectively. Preliminary analysis differentiating MRSA from MSSA demonstrated the feasibility of simultaneous determination of strain identification and antibiotic resistance. The utility of CeO2-MOLI MS FA profiling coupled with multivariate statistical analysis for performing strain-level differentiation of various Staphylococcus species proved to be a fast and reliable tool for identification. The simultaneous strain-level detection and antibiotic resistance determination achieved with this method should greatly improve outcomes and reduce clinical costs for therapeutic management and infection control.
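The PCA step that separates isolate classes can be sketched as an SVD of the mean-centered spectra, following standard chemometrics practice; the "fatty-acid profiles" below are synthetic, not MOLI MS data:

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA via SVD of the mean-centered data matrix; returns sample scores
    on the top principal components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# synthetic example: two isolate groups whose profiles differ in one FA band
rng = np.random.default_rng(7)
base = rng.normal(size=8)               # shared baseline profile
offset = np.zeros(8); offset[0] = 4.0   # group B is shifted in band 0
A = base + 0.1 * rng.normal(size=(5, 8))
B = base + offset + 0.1 * rng.normal(size=(5, 8))
scores = pca_scores(np.vstack([A, B]))
# the two groups land on opposite sides of PC1 (the sign of a PC is arbitrary)
print(scores[:5, 0].mean() * scores[5:, 0].mean() < 0)  # True
```

Leave-one-spectrum-out validation would then repeat the classification with each spectrum held out in turn, as in the animal-wise scheme but at the level of single spectra.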
Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin
2017-01-01
Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated. A comprehensive comparison of various machine learning methods in differentiating low-grade gliomas (LGGs) and high-grade gliomas (HGGs), as well as WHO grade II, III and IV gliomas, based on multi-parametric MRI images was performed in the current study. The parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using a leave-one-out cross-validation (LOOCV) strategy. In addition, the influence of parameter selection on classification performance was investigated. We found that the support vector machine (SVM) exhibited superior performance to the other classifiers. By combining all tumor attributes with the synthetic minority over-sampling technique (SMOTE), the highest classification accuracy was achieved: 0.945 for LGG versus HGG and 0.961 for grade II, III and IV gliomas. Application of the Recursive Feature Elimination (RFE) attribute selection strategy further improved the classification accuracies. The performance of the LibSVM, SMO, and IBk classifiers was influenced by key parameters such as kernel type, C, gamma, and K. SVM is a promising tool for developing an automated preoperative glioma grading system, especially when combined with the RFE strategy. Model parameters should be considered in glioma grading model optimization. PMID:28599282
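A toy sketch of the RFE idea (not the study's implementation, which used the attribute-selection methods of its ML toolkit): repeatedly fit a simple linear scorer to +/-1 labels and drop the feature with the smallest absolute weight.

```python
import numpy as np

def rfe_rank(X, y, keep=1):
    """Toy recursive feature elimination: fit a least-squares linear scorer
    to +/-1 labels and repeatedly drop the weakest feature by |weight|."""
    remaining = list(range(X.shape[1]))
    dropped = []
    while len(remaining) > keep:
        w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        weakest = int(np.argmin(np.abs(w)))     # index into `remaining`
        dropped.append(remaining.pop(weakest))
    return remaining, dropped                   # kept features, elimination order

# synthetic data where only feature 2 carries the class signal
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = np.sign(2.0 * X[:, 2] + 0.1 * rng.normal(size=60))
kept, order = rfe_rank(X, y, keep=1)
print(kept)  # feature 2 should survive the elimination
```

Real RFE (e.g. with an SVM) uses the classifier's own weights as the ranking criterion, but the elimination loop has the same shape.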
Liu, Wei; Wang, Zhen-Zhong; Qing, Jian-Ping; Li, Hong-Juan; Xiao, Wei
2014-01-01
Background: Peach kernels, which contain various fatty acids, play an important role in the regulation of a variety of physiological and biological functions. Objective: To establish an innovative and rapid diffuse reflectance near-infrared spectroscopy (DR-NIR) analysis method, combined with chemometric techniques, for the qualitative and quantitative determination of peach kernels. Materials and Methods: Peach kernel samples from nine different origins were analyzed, with high-performance liquid chromatography (HPLC) as the reference method. DR-NIR spectra were recorded in the range 1100-2300 nm. Principal component analysis (PCA) and the partial least squares regression (PLSR) algorithm were applied to obtain prediction models. Savitzky-Golay smoothing and the first derivative were adopted for spectral pre-processing, and PCA was applied to classify the varieties of the samples. For the quantitative calibration, models for linoleic and oleinic acids were established with the PLSR algorithm, and the optimal number of principal components (PCs) was selected with leave-one-out (LOO) cross-validation. The established models were evaluated with the root mean square error of deviation (RMSED) and the corresponding correlation coefficients (R2). Results: The PCA of the DR-NIR spectra yielded a clear classification of the two varieties of peach kernel. PLSR had the better predictive ability: the correlation coefficients of the two calibration models were above 0.99, and the RMSED values for linoleic and oleinic acids were 1.266% and 1.412%, respectively. Conclusion: DR-NIR combined with PCA and the PLSR algorithm can be used efficiently to identify and quantify peach kernels and also helps to solve the variety problem. PMID:25422544
IntNetLncSim: an integrative network analysis method to infer human lncRNA functional similarity
Hu, Yang; Yang, Haixiu; Zhou, Chen; Sun, Jie; Zhou, Meng
2016-01-01
Increasing evidence indicates that long non-coding RNAs (lncRNAs) are involved in various biological processes and complex diseases by communicating with mRNAs and miRNAs. Exploiting the interactions between lncRNAs and mRNAs/miRNAs to infer lncRNA functional similarity (LFS) is an effective way to explore the function of lncRNAs and predict novel lncRNA-disease associations. In this article, we propose an integrative framework, IntNetLncSim, to infer LFS by modeling the information flow in an integrated network that comprises both lncRNA-related transcriptional and post-transcriptional information. The performance of IntNetLncSim was evaluated by investigating the relationship of LFS with the similarity of lncRNA-related mRNA sets (LmRSets) and miRNA sets (LmiRSets). LFS computed by IntNetLncSim was significantly positively correlated with the LmRSet (Pearson r2=0.8424) and LmiRSet (Pearson r2=0.2601) similarities, and the performance of IntNetLncSim was superior to several previous methods. When the LFS was applied to identify novel lncRNA-disease relationships, we achieved an area under the ROC curve of 0.7300 on experimentally verified lncRNA-disease associations under leave-one-out cross-validation. Furthermore, highly ranked lncRNA-disease associations confirmed by literature mining demonstrated the excellent performance of IntNetLncSim. Finally, a web-accessible system is provided for querying LFS and potential lncRNA-disease relationships: http://www.bio-bigdata.com/IntNetLncSim. PMID:27323856
Zhang, Xin; Yan, Lin-Feng; Hu, Yu-Chuan; Li, Gang; Yang, Yang; Han, Yu; Sun, Ying-Zhi; Liu, Zhi-Cheng; Tian, Qiang; Han, Zi-Yang; Liu, Le-De; Hu, Bin-Quan; Qiu, Zi-Yu; Wang, Wen; Cui, Guang-Bin
2017-07-18
Current machine learning techniques provide the opportunity to develop noninvasive and automated glioma grading tools by utilizing quantitative parameters derived from multi-modal magnetic resonance imaging (MRI) data. However, the efficacies of different machine learning methods in glioma grading have not been investigated. The current study presents a comprehensive comparison of machine learning methods for differentiating low-grade gliomas (LGGs) from high-grade gliomas (HGGs), as well as WHO grade II, III and IV gliomas, based on multi-parametric MRI images. Parametric histogram and image texture attributes of 120 glioma patients were extracted from the perfusion, diffusion and permeability parametric maps of preoperative MRI. Then, 25 commonly used machine learning classifiers combined with 8 independent attribute selection methods were applied and evaluated using a leave-one-out cross-validation (LOOCV) strategy, and the influence of parameter selection on classification performance was investigated. We found that the support vector machine (SVM) exhibited superior performance to the other classifiers. By combining all tumor attributes with the synthetic minority over-sampling technique (SMOTE), the highest classification accuracies of 0.945 for LGG versus HGG and 0.961 for grade II, III and IV gliomas were achieved. Applying the Recursive Feature Elimination (RFE) attribute selection strategy further improved the classification accuracies. In addition, the performance of the LibSVM, SMO and IBk classifiers was influenced by key parameters such as the kernel type, C, gamma and K. SVM is a promising tool for developing an automated preoperative glioma grading system, especially when combined with the RFE strategy, and model parameters should be considered in glioma grading model optimization.
Constructing an integrated gene similarity network for the identification of disease genes.
Tian, Zhen; Guo, Maozu; Wang, Chunyu; Xing, LinLin; Wang, Lei; Zhang, Yin
2017-09-20
Discovering novel genes that are involved in human diseases is a challenging task in biomedical research. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods are mainly based on protein-protein interaction (PPI) networks. However, since these PPI networks contain false positives and cover less than half of known human genes, their reliability and coverage are low. Therefore, it is highly necessary to fuse multiple genomic data sources to construct a credible gene similarity network and then infer disease genes on the whole-genome scale. We propose a novel method, named RWRB, to infer causal genes of diseases of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is reconstructed based on the similarity network fusion (SNF) method. Finally, we employ the random walk with restart algorithm on the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN and the phenotype-gene association network, to prioritize candidate disease genes. We investigate the effectiveness of RWRB through leave-one-out cross-validation in inferring phenotype-gene relationships. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB benefits from the IGSN, which has wider coverage and higher reliability than current PPI networks. Moreover, we conduct a comprehensive case study of Alzheimer's disease and predict some novel disease genes that are supported by the literature. RWRB is an effective and reliable algorithm for prioritizing candidate disease genes on the genomic scale. Software and supplementary information are available at http://nclab.hit.edu.cn/~tianzhen/RWRB/ .
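The random walk with restart step used by methods of this kind can be sketched as follows (a minimal NumPy illustration on a toy graph; the restart probability and the five-node adjacency matrix are hypothetical, not the paper's actual networks):

```python
import numpy as np

def rwr(W, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a weighted graph.

    W: symmetric adjacency matrix; seeds: indices of known disease genes.
    Returns the steady-state visiting probability of every node."""
    col = W.sum(axis=0)
    P = W / np.where(col == 0, 1, col)   # column-stochastic transition matrix
    p0 = np.zeros(len(W))
    p0[list(seeds)] = 1.0 / len(seeds)   # restart distribution on the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * P @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

# Toy 5-node graph: a cluster {0, 1, 2} joined to {3, 4} by a single edge.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], float)
scores = rwr(W, seeds=[0])
ranking = np.argsort(-scores)            # candidate genes ranked by proximity
```

Nodes close to the seed set receive higher steady-state probability, which is the basis for the candidate-gene ranking.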
Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad
2014-01-01
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
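The idea of a word-based naïve Bayesian classifier with bootstrap support, as in the Wang et al. approach adapted above, can be sketched roughly as follows (a toy with 4-mers and made-up reference sequences; the real classifier uses 8-mer word sets over large curated reference databases):

```python
import math
import random
from collections import Counter

K = 4  # k-mer length (the RDP-style classifier uses 8; 4 keeps the toy small)

def kmers(seq):
    return {seq[i:i + K] for i in range(len(seq) - K + 1)}

def train(ref):
    """ref: list of (taxon, sequence). Returns per-taxon word probabilities
    with simple add-half smoothing over the shared vocabulary."""
    n_seqs = Counter(t for t, _ in ref)
    counts = {t: Counter() for t in n_seqs}
    for t, s in ref:
        counts[t].update(kmers(s))
    vocab = set().union(*(kmers(s) for _, s in ref))
    return {t: {w: (counts[t][w] + 0.5) / (n_seqs[t] + 1.0) for w in vocab}
            for t in n_seqs}

def classify(model, seq, n_boot=100, rng=random.Random(1)):
    words = [w for w in kmers(seq) if any(w in m for m in model.values())]
    def best(ws):
        return max(model, key=lambda t: sum(math.log(model[t].get(w, 1e-9))
                                            for w in ws))
    call = best(words)
    # Bootstrap support: fraction of resampled word subsets giving the same call.
    hits = sum(best(rng.choices(words, k=max(1, len(words) // 8))) == call
               for _ in range(n_boot))
    return call, hits / n_boot

ref = [("Lepidoptera", "ACGTACGTACGTACGT"), ("Diptera", "TTGGTTGGTTGGTTGG")]
model = train(ref)
taxon, support = classify(model, "ACGTACGTACGT")
```

The bootstrap support value plays the role of the per-rank confidence discussed in the abstract: low-support calls can be summarized to a more inclusive rank instead of being reported at species level.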
Guo, Xiali; Cui, Meng; Deng, Min; Liu, Xingxing; Huang, Xueyong; Zhang, Xinglei; Luo, Liping
2017-01-01
Five chemotypes, the isoborneol-type, camphora-type, cineole-type, linalool-type and borneol-type of Cinnamomum camphora (L.) Presl, have been identified at the molecular level based on multivariate analysis of mass spectral fingerprints recorded from a total of 750 raw leaf samples (i.e., 150 leaves collected for each chemotype) using desorption atmospheric pressure chemical ionization mass spectrometry (DAPCI-MS). Both volatile and semi-volatile metabolites of the fresh leaves of C. camphora were simultaneously detected by DAPCI-MS without any sample pretreatment, reducing the analysis time from half a day with conventional methods (e.g., GC-MS) down to 30 s. The pattern recognition results obtained using principal component analysis (PCA) were cross-checked by cluster analysis (CA), showing that the differentiation visualized in the DAPCI-MS spectral fingerprints was validated with 100% accuracy. The study demonstrates that DAPCI-MS meets the challenging requirements for accurate differentiation of all five chemotypes of C. camphora leaves, motivating more advanced applications of DAPCI-MS in plant science and forestry studies. PMID:28425482
Knee cartilage segmentation using active shape models and local binary patterns
NASA Astrophysics Data System (ADS)
González, Germán.; Escalante-Ramírez, Boris
2014-05-01
Segmentation of knee cartilage is useful for timely diagnosis and treatment of osteoarthritis (OA). This paper presents a semiautomatic segmentation technique based on Active Shape Models (ASM) combined with Local Binary Patterns (LBP) and LBP variants that describe the texture surrounding the femoral cartilage. The proposed technique is tested on a 16-image database of different patients and validated with the Leave-One-Out method. We compare different segmentation techniques: ASM-LBP, ASM-medianLBP, and the ASM proposed by Cootes. The ASM-LBP approaches are tested with different radii to decide which of them best describes the cartilage texture. The results show that ASM-medianLBP performs better than ASM-LBP and ASM. Furthermore, we add a routine which improves robustness against two principal problems: oversegmentation and initialization.
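The basic 8-neighbour LBP code computation underlying such texture descriptors can be sketched as follows (a minimal NumPy version; the median-LBP variant, which compares neighbours against a neighbourhood median rather than the centre pixel, is not shown):

```python
import numpy as np

def lbp8(img):
    """Basic 8-neighbour local binary pattern codes for the interior pixels.

    Each neighbour >= centre contributes one bit, clockwise from top-left."""
    c = img[1:-1, 1:-1]
    nbrs = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
            img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
            img[2:, :-2], img[1:-1, :-2]]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(nbrs):
        code |= ((n >= c).astype(np.uint8) << bit)
    return code

# Tiny 3x3 patch: centre 4, all neighbours brighter except the right one (3).
img = np.array([[5, 5, 5],
                [5, 4, 3],
                [5, 5, 5]], dtype=float)
codes = lbp8(img)
```

Histograms of these codes over a region form the texture feature vector that an ASM-based profile search can then compare against the training profiles.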
Adams, James; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen
2017-01-01
Introduction A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. “Leave-one-out” cross-validation was used to ensure statistical independence of results. Results and Discussion Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. 
The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate Speech), but significant associations were found for UTM with all eleven autism-related assessments with cross-validation R2 values ranging from 0.12–0.48. PMID:28068407
Bakrania, Kishan; Yates, Thomas; Rowlands, Alex V.; Esliger, Dale W.; Bunnewell, Sarah; Sanders, James; Davies, Melanie; Khunti, Kamlesh; Edwardson, Charlotte L.
2016-01-01
Objectives (1) To develop and internally-validate Euclidean Norm Minus One (ENMO) and Mean Amplitude Deviation (MAD) thresholds for separating sedentary behaviours from common light-intensity physical activities using raw acceleration data collected from both hip- and wrist-worn tri-axial accelerometers; and (2) to compare and evaluate the performance of the ENMO and MAD metrics. Methods Thirty-three adults [mean age (standard deviation (SD)) = 27.4 (5.9) years; mean BMI (SD) = 23.9 (3.7) kg/m2; 20 females (60.6%)] wore four accelerometers: an ActiGraph GT3X+ and a GENEActiv on the right hip, and an ActiGraph GT3X+ and a GENEActiv on the non-dominant wrist. Under laboratory conditions, participants performed 16 different activities (11 sedentary behaviours and 5 light-intensity physical activities) for 5 minutes each. ENMO and MAD were computed from the raw acceleration data, and logistic regression and receiver-operating-characteristic (ROC) analyses were implemented to derive thresholds for activity discrimination. Areas under ROC curves (AUROC) were calculated to summarise performance, and the thresholds were assessed by executing leave-one-out cross-validations. Results For both hip and wrist monitor placements, in comparison to the ActiGraph GT3X+ monitors, the ENMO and MAD values derived from the GENEActiv devices were observed to be slightly higher, particularly for the lower-intensity activities. Monitor-specific hip and wrist ENMO and MAD thresholds showed excellent ability for separating sedentary behaviours from motion-based light-intensity physical activities (in general, AUROCs >0.95), with validation indicating robustness. However, poor classification was experienced when attempting to isolate standing still from sedentary behaviours (in general, AUROCs <0.65). The ENMO and MAD metrics tended to perform similarly across activities and accelerometer brands.
Conclusions Researchers can utilise these robust monitor-specific hip and wrist ENMO and MAD thresholds, in order to accurately separate sedentary behaviours from common motion-based light-intensity physical activities. However, caution should be taken if isolating sedentary behaviours from standing is of particular interest. PMID:27706241
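The ENMO and MAD metrics themselves are simple to compute from raw tri-axial acceleration in g units; a minimal sketch on synthetic epochs (illustrative only, not the study's processing pipeline):

```python
import numpy as np

def enmo(acc_g):
    """Euclidean Norm Minus One: vector magnitude minus 1 g, negative values
    truncated to zero, averaged over the epoch."""
    r = np.linalg.norm(acc_g, axis=1)
    return np.maximum(r - 1.0, 0.0).mean()

def mad(acc_g):
    """Mean Amplitude Deviation: mean absolute deviation of the vector
    magnitude around its epoch mean."""
    r = np.linalg.norm(acc_g, axis=1)
    return np.abs(r - r.mean()).mean()

# A perfectly still sensor measures only gravity: both metrics are zero.
still = np.tile([0.0, 0.0, 1.0], (100, 1))
# Alternating extra acceleration on the z axis simulates light movement.
moving = still.copy()
moving[::2, 2] += 0.2
```

Classifying an epoch as sedentary vs light-intensity activity then reduces to comparing these values against the monitor-specific thresholds reported in the study.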
Ghasemi, Jahan B; Safavi-Sohi, Reihaneh; Barbosa, Euzébio G
2012-02-01
A quasi 4D-QSAR has been carried out on a series of potent Gram-negative LpxC inhibitors. This approach makes use of the molecular dynamics (MD) trajectories and topology information retrieved from the GROMACS package. The methodology is based on the generation of a conformational ensemble profile, CEP, for each compound instead of only one conformation, followed by the calculation of intermolecular interaction energies at each grid point, considering probes and all aligned conformations resulting from the MD simulations. These interaction energies are the independent variables employed in the QSAR analysis. The proposed methodology was compared with the comparative molecular field analysis (CoMFA) formalism, and it jointly explores the main features of CoMFA and 4D-QSAR models. Step-wise multiple linear regression was used for the selection of the most informative variables. After variable selection, multiple linear regression (MLR) and partial least squares (PLS) methods were used to build the regression models. Leave-N-out cross-validation (LNO) and Y-randomization were performed to confirm the robustness of the models, in addition to analysis of an independent test set. The best models provided the following statistics: [Formula in text] (PLS) and [Formula in text] (MLR). A docking study with the CDOCKER algorithm was performed to investigate the major interactions in the protein-ligand complex. Visualization of the descriptors of the best model helps interpret the model from a chemical point of view, supporting the applicability of this new approach in rational drug design.
Deep residual networks for automatic segmentation of laparoscopic videos of the liver
NASA Astrophysics Data System (ADS)
Gibson, Eli; Robu, Maria R.; Thompson, Stephen; Edwards, P. Eddie; Schneider, Crispin; Gurusamy, Kurinchi; Davidson, Brian; Hawkes, David J.; Barratt, Dean C.; Clarkson, Matthew J.
2017-03-01
Motivation: For primary and metastatic liver cancer patients undergoing liver resection, a laparoscopic approach can reduce recovery times and morbidity while offering equivalent curative results; however, only about 10% of tumours reside in anatomical locations that are currently accessible for laparoscopic resection. Augmenting laparoscopic video with registered vascular anatomical models from pre-procedure imaging could support using laparoscopy in a wider population. Segmentation of liver tissue on laparoscopic video supports the robust registration of anatomical liver models by filtering out false anatomical correspondences between pre-procedure and intra-procedure images. In this paper, we present a convolutional neural network (CNN) approach to liver segmentation in laparoscopic liver procedure videos. Method: We defined a CNN architecture comprising fully-convolutional deep residual networks with multi-resolution loss functions. The CNN was trained in a leave-one-patient-out cross-validation on 2050 video frames from 6 liver resections and 7 laparoscopic staging procedures, and evaluated using the Dice score. Results: The CNN yielded segmentations with Dice scores >=0.95 for the majority of images; however, the inter-patient variability in median Dice score was substantial. Four failure modes were identified from low scoring segmentations: minimal visible liver tissue, inter-patient variability in liver appearance, automatic exposure correction, and pathological liver tissue that mimics non-liver tissue appearance. Conclusion: CNNs offer a feasible approach for accurately segmenting liver from other anatomy on laparoscopic video, but additional data or computational advances are necessary to address challenges due to the high inter-patient variability in liver appearance.
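The Dice score used for evaluation above is a standard overlap measure between binary masks; a minimal sketch (the convention of returning 1.0 for two empty masks is our assumption):

```python
import numpy as np

def dice(pred, truth):
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: defined here as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy masks: a 4-pixel square vs a 6-pixel region extended by one column.
a = np.zeros((4, 4), int); a[1:3, 1:3] = 1
b = np.zeros((4, 4), int); b[1:3, 1:4] = 1
```

For the toy masks the intersection has 4 pixels, so the score is 2*4/(4+6) = 0.8; per-image scores like this are what the paper summarizes per patient.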
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dou, T; Ruan, D; Heinrich, M
2016-06-15
Purpose: To obtain a functional relationship that calibrates the lung tissue density change under free-breathing conditions by correlating Jacobian values with Hounsfield units. Methods: Free-breathing lung computed tomography images were acquired using a fast helical CT protocol, with 25 scans acquired per patient. Using a state-of-the-art deformable registration algorithm, a set of deformation vector fields (DVFs) was generated to provide spatial mapping from the reference image geometry to the other free-breathing scans. These DVFs were used to generate Jacobian maps, which estimate voxelwise volume change. Subsequently, the set of 25 corresponding Jacobian values and voxel intensities in Hounsfield units (HU) was collected, and linear regression was performed based on the mass conservation relationship to correlate the volume change with the density change. Based on the resulting fitting coefficients, the tissues were classified into parenchymal (Type I), vascular (Type II), and soft tissue (Type III) types. These coefficients modeled the voxelwise density variation during quiet breathing. The accuracy of the proposed method was assessed using the mean absolute difference in HU between the CT scan intensities and the model-predicted values. In addition, validation experiments employing a leave-five-out method were performed to evaluate the model accuracy. Results: The computed mean model errors were 23.30±9.54 HU, 29.31±10.67 HU, and 35.56±20.56 HU for regions I, II, and III, respectively. The cross-validation experiments, averaged over 100 trials, had a mean error of 30.02 ± 1.67 HU over the entire lung. These mean values were comparable with the estimated CT image background noise. Conclusion: The reported validation statistics confirmed the lung density model during free breathing. The proposed technique is general and could be applied to a wide range of problem scenarios where accurate dynamic lung density information is needed. This work was supported in part by NIH R01 CA0096679.
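The mass-conservation-based regression can be sketched on synthetic data: if the mass in a voxel is conserved, density scales inversely with the local volume (Jacobian J), so HU + 1000 is proportional to 1/J and one can fit HU = a + b*(1/J) per voxel. All coefficients and noise levels below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
J = rng.uniform(0.8, 1.2, size=25)       # Jacobians over 25 free-breathing scans
true_a, true_b = -1000.0, 950.0          # hypothetical voxel coefficients
hu = true_a + true_b / J + rng.normal(0.0, 5.0, size=25)  # ~5 HU scan noise

# Least-squares fit of HU against 1/J with an intercept.
A = np.column_stack([np.ones_like(J), 1.0 / J])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, hu, rcond=None)

# Mean absolute model error in HU, analogous to the paper's accuracy metric.
residual = np.abs(A @ np.array([a_hat, b_hat]) - hu).mean()
```

The fitted slope and intercept play the role of the per-voxel coefficients used to classify tissue types, and the residual corresponds to the mean absolute HU error reported per region.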
Testate amoeba transfer function performance along localised hydrological gradients.
Tsyganov, Andrey N; Mityaeva, Olga A; Mazei, Yuri A; Payne, Richard J
2016-09-01
Testate amoeba transfer functions are widely used for reconstruction of palaeo-hydrological regime in peatlands. However, the limitations of this approach have become apparent with increasing attention to validation and assessing sources of uncertainty. This paper investigates effects of peatland type and sampling depth on the performance of a transfer function using an independent test-set from four Sphagnum-dominated sites in European Russia (Penza Region). We focus on transfer function performance along localised hydrological gradients, which is a useful analogue for predictive ability through time. The performance of the transfer function with the independent test-set was generally weaker than for the leave-one-out or bootstrap cross-validations. However, the transfer function was robust for the reconstruction of relative changes in water-table depth, provided the presence of good modern analogues and overlap in water-table depth ranges. When applied to subsurface samples, the performance of the transfer function was reduced due to selective decomposition, the presence of deep-dwelling taxa or vertical transfer of shells. Our results stress the importance of thorough testing of transfer functions, and highlight the role of taphonomic processes in determining results. Further studies of stratification, taxonomy and taphonomy of testate amoebae will be needed to improve the robustness of transfer function output. Copyright © 2015 Elsevier GmbH. All rights reserved.
Zhang, Yong-Hong; Xia, Zhi-Ning; Qin, Li-Tang; Liu, Shu-Shen
2010-09-01
The objective of this paper is to build a reliable model based on molecular electronegativity distance vector (MEDV) descriptors for predicting blood-brain barrier (BBB) permeability and to reveal the effects of molecular structural segments on the BBB permeability. Using 70 structurally diverse compounds, partial least squares regression (PLSR) models between the BBB permeability and the MEDV descriptors were developed and validated by the variable selection and modeling based on prediction (VSMP) technique. The estimation ability, stability, and predictive power of a model are evaluated by the estimated correlation coefficient (r), the leave-one-out (LOO) cross-validation correlation coefficient (q), and the predictive correlation coefficient (R(p)). The PLSR model M1, based on a training set of 57 samples, showed good quality: r=0.9202, q=0.7956, and R(p)=0.6649. To identify the most important structural factors affecting the BBB permeability of compounds, variable importance in projection (VIP) analysis was performed on the MEDV descriptors. It was found that some structural fragments in compounds, such as -CH(3), -CH(2)-, =CH-, =C, ≡C-, -CH<, =C<, =N-, -NH-, =O, and -OH, are the most important factors affecting the BBB permeability. (c) 2010. Published by Elsevier Inc.
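The leave-one-out q statistic reported here is the cross-validated analogue of r, computed from the predictive residual sum of squares, q^2 = 1 - PRESS/SS_tot. A minimal sketch for an ordinary least-squares model on synthetic descriptors (all data hypothetical):

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out cross-validated q^2 for an ordinary least-squares model."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])   # add an intercept column
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        b, *_ = np.linalg.lstsq(X1[mask], y[mask], rcond=None)
        press += (y[i] - X1[i] @ b) ** 2    # error on the held-out sample
    return 1.0 - press / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(7)
X = rng.normal(size=(57, 3))                # 57 training samples, 3 descriptors
y = X @ np.array([1.0, 0.5, -0.3]) + 0.2 * rng.normal(size=57)
q2 = loo_q2(X, y)
```

Because every prediction is made on a sample excluded from fitting, q^2 is always at most the fitted r^2, which is why q=0.7956 sits below r=0.9202 in the abstract.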
Spatial prediction of near surface soil water retention functions using hydrogeophysics
NASA Astrophysics Data System (ADS)
Gibson, J. P.; Franz, T. E.
2017-12-01
The hydrological community often turns to widely available spatial datasets such as SSURGO to characterize the spatial variability of soil across a landscape of interest. This has served as a reasonable first approximation when localized soil data are lacking. However, previous work has shown that information loss within land surface models primarily stems from parameterization. Localized soil sampling is both expensive and time-intensive, so a need exists for connecting spatial datasets with ground observations. Given that hydrogeophysics is data-dense, rapid, and relatively easy to adopt, it is a promising technique to help dovetail localized soil sampling with larger spatial datasets. In this work, we utilize two geophysical techniques, the cosmic-ray neutron probe and electromagnetic induction, to identify temporally stable soil moisture patterns. This is achieved by measuring numerous times over a range of wet to dry field conditions in order to apply an empirical orthogonal function analysis. We then present measured water retention functions of shallow cores extracted within each temporally stable zone. Lastly, we use the soil moisture patterns as a covariate to predict soil hydraulic properties in areas without measurements, and validate the predictions using a leave-one-out cross-validation analysis. Using these approaches to better constrain soil hydraulic property variability, we speculate that further research can better estimate hydrologic fluxes in areas of interest.
Mazzotti, M; Bartoli, I; Castellazzi, G; Marzani, A
2014-09-01
The paper aims at validating a recently proposed Semi Analytical Finite Element (SAFE) formulation coupled with a 2.5D Boundary Element Method (2.5D BEM) for the extraction of dispersion data in immersed waveguides of generic cross-section. To this end, three-dimensional vibroacoustic analyses are carried out on two waveguides of square and rectangular cross-section immersed in water using the commercial Finite Element software Abaqus/Explicit. Real wavenumber and attenuation dispersive data are extracted by means of a modified Matrix Pencil Method. It is demonstrated that the results obtained using the two techniques are in very good agreement. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
McReynolds, Naomi; Cooke, Fiona G. M.; Chen, Mingzhou; Powis, Simon J.; Dholakia, Kishan
2017-02-01
Moving towards label-free techniques for cell identification is essential for many clinical and research applications. Raman spectroscopy and digital holographic microscopy (DHM) are both label-free, non-destructive optical techniques capable of providing complementary information. We demonstrate a multi-modal system which may simultaneously acquire Raman spectra and DHM images to provide both a molecular and a morphological description of a sample. In this study we use Raman spectroscopy and DHM to discriminate between three immune cell populations: CD4+ T cells, B cells, and monocytes, which together comprise key functional immune cell subsets in immune responses to invading pathogens. Various parameters that may be used to describe the phase images, such as pixel-value histograms and texture measures, are also examined. Using our system it is possible to consider each technique individually or in combination. Principal component analysis is used on the data set to discriminate between cell types, and leave-one-out cross-validation is used to estimate the efficiency of our method. Raman spectroscopy provides specific chemical information but requires relatively long acquisition times; combining it with a faster modality such as DHM could help achieve higher throughput rates. The combination of these two complementary optical techniques provides a wealth of information for cell characterisation, which is a step towards achieving label-free technology for the identification of human immune cells.
Liu, Fengping; Cao, Chenzhong; Cheng, Bin
2011-01-01
A quantitative structure–property relationship (QSPR) analysis of aliphatic alcohols is presented. Four physicochemical properties were studied: boiling point (BP), n-octanol–water partition coefficient (lg POW), water solubility (lg W) and chromatographic retention indices (RI) on different polar stationary phases. To investigate the quantitative structure–property relationship of aliphatic alcohols, the molecular structure ROH is divided into two parts, R and OH, to generate structural parameters. It is proposed that the properties of aliphatic alcohols are affected by three main factors: the alkyl group R, the substituent group OH, and the interaction between R and OH. On the basis of the polarizability effect index (PEI) previously developed by Cao, the novel molecular polarizability effect index (MPEI), combined with the odd-even index (OEI) and the sum of eigenvalues of the bond-connecting matrix (SX1CH) previously developed by our team, were used to predict the properties of aliphatic alcohols. The sets of molecular descriptors were derived directly from the structure of the compounds based on graph theory. QSPR models were generated using only calculated descriptors and multiple linear regression techniques. These QSPR models showed high values of the multiple correlation coefficient (R > 0.99) and Fisher-ratio statistics. Leave-one-out cross-validation demonstrated the final models to be statistically significant and reliable. PMID:21731451
Preliminary experiments on quantification of skin condition
NASA Astrophysics Data System (ADS)
Kitajima, Kenzo; Iyatomi, Hitoshi
2014-03-01
In this study, we investigated a preliminary method for assessing skin conditions such as moisture retention and skin fineness by image analysis alone. We captured facial images from volunteer subjects aged between their 30s and 60s with a Pocket Micro (R) device (Scalar Co., Japan). This device has two image capturing modes: a normal mode and a non-reflection mode that uses an equipped polarization filter. We captured skin images from a total of 68 spots on the subjects' faces using both modes (i.e., a total of 136 skin images). The moisture-retaining property of the skin and a subjective evaluation score of skin fineness on a 5-point scale were obtained in advance as gold standards (their means and SDs were 35.15 +/- 3.22 (μS) and 3.45 +/- 1.17, respectively). We extracted a total of 107 image features from each image and built linear regression models estimating the abovementioned criteria with stepwise feature selection. Under leave-one-out cross-validation, the developed model for estimating skin moisture achieved an MSE of 1.92 (μS) with 6 selected parameters, while the model for skin fineness achieved an MSE of 0.51 points with 7 parameters. We confirmed that the developed models predicted the moisture-retaining property and fineness of the skin appropriately from captured images alone.
Yan, Huagang; Dong, Jianxin; Mo, Xiao; Li, Dan; Liu, Chunhong; Li, Haiyun
2017-01-01
Major depressive disorder (MDD) is a leading worldwide psychiatric disorder with a high recurrence rate; it is therefore desirable to distinguish current MDD (cMDD) from remitted MDD (rMDD) so that appropriate therapeutic interventions can be chosen. In this study, 19 cMDD patients, 19 rMDD patients and 19 well-matched healthy controls (HC) were enrolled and scanned with resting-state functional magnetic resonance imaging (rs-fMRI). The Hurst exponent (HE) of the rs-fMRI signal in the AAL-90 and AAL-1024 atlases was calculated and compared between groups. Then, a radial basis function (RBF) based support vector machine was proposed to discriminate each pair of the cMDD, rMDD and HC groups using the abnormal HE features, and leave-one-out cross-validation was used to evaluate the classification performance. Applying the proposed method with the AAL-1024 and AAL-90 atlases respectively, 87% and 84% of subjects were correctly identified between cMDD and HC, 84% and 71% between rMDD and HC, and 89% and 74% between cMDD and rMDD. Our results indicated that the HE is an effective feature for distinguishing cMDD and rMDD from HC, and that the recognition performance with the AAL-1024 parcellation was better than that with the conventional AAL-90 parcellation. PMID:29163844
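The leave-one-out classification protocol described above can be sketched as follows. A nearest-centroid rule stands in for the paper's RBF support vector machine, and the data are synthetic; only the evaluation scheme, fitting on n-1 subjects and testing on the held-out one, mirrors the study.

```python
import numpy as np

def loo_accuracy(X, y, fit_predict):
    """Leave-one-out classification accuracy for any fit/predict routine."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i            # train on all other subjects
        if fit_predict(X[mask], y[mask], X[i]) == y[i]:
            correct += 1
    return correct / n

def nearest_centroid(X_tr, y_tr, x):
    # stand-in classifier: assign x to the class with the nearest mean
    classes = np.unique(y_tr)
    dists = [np.linalg.norm(x - X_tr[y_tr == c].mean(axis=0)) for c in classes]
    return classes[int(np.argmin(dists))]

# two synthetic groups of 19 "subjects" with 5 features each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (19, 5)), rng.normal(2, 1, (19, 5))])
y = np.array([0] * 19 + [1] * 19)
acc = loo_accuracy(X, y, nearest_centroid)
```

With n as small as 19 per group, leave-one-out is attractive because every subject serves as both training and test data, which is exactly why it was used here.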
Myocardial strain estimation from CT: towards computer-aided diagnosis on infarction identification
NASA Astrophysics Data System (ADS)
Wong, Ken C. L.; Tee, Michael; Chen, Marcus; Bluemke, David A.; Summers, Ronald M.; Yao, Jianhua
2015-03-01
Regional myocardial strains have the potential for early quantification and detection of cardiac dysfunction. Although image modalities such as tagged and strain-encoded MRI can provide motion information for the myocardium, they are uncommon in clinical routine. In contrast, cardiac CT images are usually available, but they only provide motion information at salient features such as the cardiac boundaries. To estimate myocardial strains from a CT image sequence, we adopted a cardiac biomechanical model with hyperelastic material properties to relate the motion of the cardiac boundaries to the myocardial deformation. The frame-to-frame displacements of the cardiac boundaries are obtained using B-spline deformable image registration based on mutual information, and are enforced as boundary conditions on the biomechanical model. The system equation is solved by the finite element method to provide the dense displacement field of the myocardium, and the regional values of the three principal strains and the six strains in cylindrical coordinates are computed in terms of the American Heart Association nomenclature. To study the potential of the estimated regional strains for identifying myocardial infarction, experiments were performed on cardiac CT image sequences of ten canines with artificially induced myocardial infarctions. The leave-one-subject-out cross-validations show that, using the optimal strain magnitude thresholds computed from ROC curves, the radial strain and the first principal strain have the best performance.
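Choosing a strain-magnitude threshold from an ROC curve, as in the cross-validations above, can be sketched with a small numpy routine. The criterion shown (Youden's J) and the synthetic strain values are assumptions for illustration; infarcted segments are modeled with reduced strain magnitude, since infarcted myocardium deforms less.

```python
import numpy as np

def roc_optimal_threshold(scores, labels):
    """Scan candidate thresholds and keep the one maximizing Youden's
    J = sensitivity + specificity - 1. Here a segment is called infarcted
    when its strain magnitude falls at or below the threshold."""
    best_t, best_j = None, -1.0
    pos, neg = labels == 1, labels == 0
    for t in np.unique(scores):
        pred = scores <= t                 # low strain magnitude -> infarct
        sens = np.mean(pred[pos])          # true positive rate
        spec = np.mean(~pred[neg])         # true negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

# synthetic regional strain magnitudes (illustrative values only)
rng = np.random.default_rng(2)
normal = rng.normal(0.30, 0.05, 40)
infarct = rng.normal(0.10, 0.05, 40)
scores = np.concatenate([normal, infarct])
labels = np.array([0] * 40 + [1] * 40)
threshold, youden_j = roc_optimal_threshold(scores, labels)
```

In a leave-one-subject-out setting, the threshold would be selected on the training subjects only and then applied to the held-out subject's segments.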
NASA Astrophysics Data System (ADS)
Chaney, Nathaniel W.; Herman, Jonathan D.; Ek, Michael B.; Wood, Eric F.
2016-11-01
With their origins in numerical weather prediction and climate modeling, land surface models aim to accurately partition the surface energy balance. An overlooked challenge in these schemes is the role of model parameter uncertainty, particularly at unmonitored sites. This study provides global parameter estimates for the Noah land surface model using 85 eddy covariance sites in the global FLUXNET network. The at-site parameters are first calibrated using a Latin Hypercube-based ensemble of the most sensitive parameters, determined by the Sobol method, to be the minimum stomatal resistance (rs,min), the Zilitinkevich empirical constant (Czil), and the bare soil evaporation exponent (fxexp). Calibration leads to an increase in the mean Kling-Gupta Efficiency performance metric from 0.54 to 0.71. These calibrated parameter sets are then related to local environmental characteristics using the Extra-Trees machine learning algorithm. The fitted Extra-Trees model is used to map the optimal parameter sets over the globe at a 5 km spatial resolution. The leave-one-out cross validation of the mapped parameters using the Noah land surface model suggests that there is the potential to skillfully relate calibrated model parameter sets to local environmental characteristics. The results demonstrate the potential to use FLUXNET to tune the parameterizations of surface fluxes in land surface models and to provide improved parameter estimates over the globe.
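Relating calibrated parameter sets to local environmental covariates with Extra-Trees, as described above, might look roughly like this. The covariate names, the target parameter values and all numbers are synthetic placeholders; only the fit-then-map pattern follows the study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Hypothetical setup: rows = calibration sites, columns = local environmental
# characteristics (e.g. mean precipitation, sand fraction); target = one
# calibrated Noah parameter such as the minimum stomatal resistance rs_min.
rng = np.random.default_rng(3)
n_sites = 85                                  # mirroring the 85 FLUXNET sites
env = rng.uniform(size=(n_sites, 4))          # environmental covariates
rs_min = 40 + 120 * env[:, 0] - 30 * env[:, 1] + rng.normal(0, 5, n_sites)

model = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(env, rs_min)

# "map" the parameter at an unmonitored location from its covariates alone
new_site = rng.uniform(size=(1, 4))
prediction = model.predict(new_site)[0]
```

Leaving each site out in turn, refitting the model on the remaining sites, and predicting the held-out site's calibrated parameters is the cross-validation the study uses to test whether the mapping is skillful.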
Muñoz-Ruiz, Miguel Ángel; Hall, Anette; Mattila, Jussi; Koikkalainen, Juha; Herukka, Sanna-Kaisa; Husso, Minna; Hänninen, Tuomo; Vanninen, Ritva; Liu, Yawu; Hallikainen, Merja; Lötjönen, Jyrki; Remes, Anne M.; Alafuzoff, Irina; Soininen, Hilkka; Hartikainen, Päivi
2016-01-01
Background: Disease State Index (DSI) and its visualization, the Disease State Fingerprint (DSF), form a computer-assisted clinical decision-making tool that combines patient data and compares them with cases with known outcomes. Aims: To investigate the ability of the DSI to diagnose frontotemporal dementia (FTD) and Alzheimer's disease (AD). Methods: The study cohort consisted of 38 patients with FTD, 57 with AD and 22 controls. Autopsy verification of FTD with TDP-43-positive pathology was available for 14 cases and of AD pathology for 12 cases. We utilized data from neuropsychological tests, volumetric magnetic resonance imaging, single-photon emission tomography, cerebrospinal fluid biomarkers and the APOE genotype. The DSI classification results were calculated with a combination of leave-one-out cross-validation and bootstrapping. A DSF visualization of an FTD patient is presented as an example. Results: The DSI distinguishes controls from FTD (area under the receiver-operator curve, AUC = 0.99) and from AD (AUC = 1.00) very well, and achieves a good differential diagnosis between AD and FTD (AUC = 0.89). In the subsamples of autopsy-confirmed cases (AUC = 0.97) and clinically diagnosed cases (AUC = 0.94), differential diagnosis of AD and FTD performs very well. Conclusions: DSI is a promising computer-assisted biomarker approach for aiding the diagnostic process in dementing diseases. Here, the DSI separates controls from dementia and differentiates between AD and FTD. PMID:27703465
Adaptive Laplacian filtering for sensorimotor rhythm-based brain-computer interfaces.
Lu, Jun; McFarland, Dennis J; Wolpaw, Jonathan R
2013-02-01
Sensorimotor rhythms (SMRs) are 8-30 Hz oscillations in the electroencephalogram (EEG) recorded from the scalp over sensorimotor cortex that change with movement and/or movement imagery. Many brain-computer interface (BCI) studies have shown that people can learn to control SMR amplitudes and can use that control to move cursors and other objects in one, two or three dimensions. At the same time, if SMR-based BCIs are to be useful for people with neuromuscular disabilities, their accuracy and reliability must be improved substantially. These BCIs often use spatial filtering methods such as the common average reference (CAR), the Laplacian (LAP) filter or the common spatial pattern (CSP) filter to enhance the signal-to-noise ratio of the EEG. Here, we test the hypothesis that a new filter design, called an 'adaptive Laplacian (ALAP) filter', can provide better performance for SMR-based BCIs. An ALAP filter employs a Gaussian kernel to construct a smooth spatial gradient of channel weights and then simultaneously seeks the optimal kernel radius of this spatial filter and the regularization parameter of linear ridge regression. This optimization is based on minimizing the leave-one-out cross-validation error through a gradient descent method and is computationally feasible. Using a variety of BCI data from a total of 22 individuals, we compare the performance of the ALAP filter to the CAR, small LAP, large LAP and CSP filters. With a large number of channels and limited data, ALAP performs significantly better than CSP, CAR, small LAP and large LAP in both classification accuracy and mean-squared error. Using fewer channels restricted to motor areas, ALAP is still superior to CAR, small LAP and large LAP, but equally matched to CSP. Thus, ALAP may help to improve the accuracy and robustness of SMR-based BCIs.
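Minimizing the leave-one-out cross-validation error of linear ridge regression, as the ALAP optimization does, is computationally feasible because ridge admits a closed-form LOO error: no model has to be refitted n times. A minimal numpy sketch on synthetic data, scanning the regularization parameter rather than using gradient descent:

```python
import numpy as np

def ridge_loocv_error(X, y, lam):
    """Closed-form leave-one-out CV error for ridge regression.

    Uses the identity e_i = (y_i - yhat_i) / (1 - H_ii), where
    H = X (X'X + lam*I)^-1 X' is the ridge hat matrix, so the LOO error
    comes from a single fit instead of n refits.
    """
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    residuals = (y - H @ y) / (1 - np.diag(H))
    return np.mean(residuals ** 2)

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=50)

# scan the regularization parameter on a log grid
lams = np.logspace(-3, 2, 20)
errors = [ridge_loocv_error(X, y, lam) for lam in lams]
best_lam = lams[int(np.argmin(errors))]
```

This is the same shortcut that makes efficient LOOCV strategies orders of magnitude faster than naively refitting once per observation.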
Archfield, Stacey A.; Pugliese, Alessio; Castellarin, Attilio; Skøien, Jon O.; Kiang, Julie E.
2013-01-01
In the United States, estimation of flood frequency quantiles at ungauged locations has been largely based on regional regression techniques that relate measurable catchment descriptors to flood quantiles. More recently, spatial interpolation techniques of point data have been shown to be effective for predicting streamflow statistics (i.e., flood flows and low-flow indices) in ungauged catchments. Literature reports successful applications of two techniques, canonical kriging, CK (or physiographical-space-based interpolation, PSBI), and topological kriging, TK (or top-kriging). CK performs the spatial interpolation of the streamflow statistic of interest in the two-dimensional space of catchment descriptors. TK predicts the streamflow statistic along river networks taking both the catchment area and nested nature of catchments into account. It is of interest to understand how these spatial interpolation methods compare with generalized least squares (GLS) regression, one of the most common approaches to estimate flood quantiles at ungauged locations. By means of a leave-one-out cross-validation procedure, the performance of CK and TK was compared to GLS regression equations developed for the prediction of 10, 50, 100 and 500 yr floods for 61 streamgauges in the southeast United States. TK substantially outperforms GLS and CK for the study area, particularly for large catchments. The performance of TK over GLS highlights an important distinction between the treatments of spatial correlation when using regression-based or spatial interpolation methods to estimate flood quantiles at ungauged locations. The analysis also shows that coupling TK with CK slightly improves the performance of TK; however, the improvement is marginal when compared to the improvement in performance over GLS.
Cha, Kenny H; Hadjiiski, Lubomir M; Samala, Ravi K; Chan, Heang-Ping; Cohan, Richard H; Caoili, Elaine M; Paramagul, Chintana; Alva, Ajjai; Weizer, Alon Z
2016-12-01
Assessing the response of bladder cancer to neoadjuvant chemotherapy is crucial for reducing morbidity and increasing the quality of life of patients. Change in tumor volume during treatment is generally used to predict treatment outcome. We are developing a method for bladder cancer segmentation in CT using a pilot data set of 62 cases. 65,000 regions of interest were extracted from pre-treatment CT images to train a deep-learning convolution neural network (DL-CNN) for tumor boundary detection using leave-one-case-out cross-validation. The results were compared to our previous AI-CALS method. For all lesions in the data set, the longest diameter and its perpendicular were measured by two radiologists, and 3D manual segmentation was obtained from one radiologist. The World Health Organization (WHO) criteria and the Response Evaluation Criteria In Solid Tumors (RECIST) were calculated, and the prediction accuracy for complete response to chemotherapy was estimated by the area under the receiver operating characteristic curve (AUC). The AUCs were 0.73 ± 0.06, 0.70 ± 0.07, and 0.70 ± 0.06, respectively, for the volume change calculated using the DL-CNN segmentation, the AI-CALS method and the manual contours. The differences did not achieve statistical significance. The AUCs using the WHO criteria were 0.63 ± 0.07 and 0.61 ± 0.06, while the AUCs using RECIST were 0.65 ± 0.07 and 0.63 ± 0.06 for the two radiologists, respectively. Our results indicate that DL-CNN can produce accurate bladder cancer segmentation for calculating tumor size change in response to treatment. The volume change performed better than the estimates from the WHO criteria and RECIST for the prediction of complete response.
Serum Prognostic Biomarkers in Head and Neck Cancer Patients
Lin, Ho-Sheng; Siddiq, Fauzia; Talwar, Harvinder S.; Chen, Wei; Voichita, Calin; Draghici, Sorin; Jeyapalan, Gerald; Chatterjee, Madhumita; Fribley, Andrew; Yoo, George H.; Sethi, Seema; Kim, Harold; Sukari, Ammar; Folbe, Adam J.; Tainsky, Michael A.
2014-01-01
Objectives/Hypothesis: A reliable estimate of survival is important as it may impact treatment choice. The objective of this study is to identify serum autoantibody biomarkers that can be used to improve prognostication for patients affected with head and neck squamous cell carcinoma (HNSCC). Study Design: Prospective cohort study. Methods: A panel of 130 serum biomarkers, previously selected for cancer detection using microarray-based serological profiling and specialized bioinformatics, was evaluated for potential as prognostic biomarkers in a cohort of 119 HNSCC patients followed for up to 12.7 years. A biomarker was considered positive if its reactivity to the particular patient's serum was greater than one standard deviation above the mean reactivity to sera from the other 118 patients, using a leave-one-out cross-validation model. Survival curves were estimated according to the Kaplan-Meier method, and statistically significant differences in survival were examined using the log-rank test. Independent prognostic biomarkers were identified following analysis using multivariate Cox proportional hazards models. Results: Poor overall survival was associated with African American race (hazard ratio [HR] for death = 2.61; 95% confidence interval [CI]: 1.58–4.33; P < .001), advanced stage (HR = 2.79; 95% CI: 1.40–5.57; P = .004), and recurrent disease (HR = 6.66; 95% CI: 2.54–17.44; P < .001). On multivariable Cox analysis adjusted for covariates (race and stage), six of the 130 markers evaluated were found to be independent prognosticators of overall survival. Conclusions: The results shown here are promising and demonstrate the potential use of serum biomarkers for prognostication in HNSCC patients. Further clinical trials including larger samples of patients across multiple centers may be warranted. PMID:24347532
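The leave-one-out thresholding rule described in the Methods, calling a marker positive when its reactivity exceeds the mean plus one standard deviation of the other patients' sera, can be sketched as follows. The reactivity values are synthetic placeholders, not data from the study.

```python
import numpy as np

def loo_positive_calls(reactivity):
    """Flag a patient's marker as positive if it exceeds mean + 1 SD of the
    reactivities of all *other* patients (leave-one-out thresholding)."""
    n = len(reactivity)
    calls = np.zeros(n, dtype=bool)
    for i in range(n):
        others = np.delete(reactivity, i)   # exclude the patient being scored
        calls[i] = reactivity[i] > others.mean() + others.std(ddof=1)
    return calls

rng = np.random.default_rng(5)
reactivity = rng.normal(1.0, 0.2, 119)      # hypothetical marker reactivities
reactivity[:5] += 1.5                        # five strongly reactive sera
calls = loo_positive_calls(reactivity)
```

Excluding the scored patient from the threshold calculation is what keeps each call out-of-sample; using the full-cohort mean and SD would let the patient's own serum shift the cutoff.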
A method to classify schizophrenia using inter-task spatial correlations of functional brain images.
Michael, Andrew M; Calhoun, Vince D; Andreasen, Nancy C; Baum, Stefi A
2008-01-01
The clinical heterogeneity of schizophrenia (SCZ) and the overlap of its self-reported and observed symptoms with those of other mental disorders make its diagnosis a difficult task. At present, no laboratory-based or image-based diagnostic tool for SCZ exists, and such tools are desired to support existing methods for more precise diagnosis. Functional magnetic resonance imaging (fMRI) is currently employed to identify and correlate cognitive processes related to SCZ and its symptoms. Fusion of multiple fMRI tasks that probe different cognitive processes may help to better understand the hidden networks of this complex disorder. In this paper we utilize three different fMRI tasks and introduce an approach to classify subjects based on inter-task spatial correlations of brain activation. The technique was applied to groups of patients and controls and its validity was checked with the leave-one-out method. We show that the classification rate increases when information from multiple tasks is combined.
Physicians' intention to leave direct patient care: an integrative review.
Degen, Christiane; Li, Jian; Angerer, Peter
2015-09-08
In light of the growing shortage of physicians worldwide, the problem of physicians who intend to leave direct patient care has become more acute, particularly in terms of quality of care and health-care costs. A literature search was carried out following Cooper's five-stage model for conducting an integrative literature review. Database searches were made in MEDLINE, PsycINFO and Web of Science in May 2014. A total of 17 studies from five countries were identified and the study results synthesized. Measures and percentages of physicians' intention to leave varied between the studies. Variables associated with intention to leave were demographics, with age- and gender-specific findings, family or personal domain, working time and psychosocial working conditions, job-related well-being and other career-related aspects. Gender differences were identified in several risk clusters. Factors such as long working hours and work-family conflict were particularly relevant for female physicians' intention to leave. Health-care managers and policy-makers should take action to improve physicians' working hours and psychosocial working conditions in order to prevent a high rate of intention to leave and limit the number of physicians actually leaving direct patient care. Further research is needed on gender-specific needs in the workplace, the connection between intention to leave and actually leaving and measures of intention to leave as well as using qualitative methods to gain a deeper understanding and developing validated questionnaires.
3D active shape models of human brain structures: application to patient-specific mesh generation
NASA Astrophysics Data System (ADS)
Ravikumar, Nishant; Castro-Mateos, Isaac; Pozo, Jose M.; Frangi, Alejandro F.; Taylor, Zeike A.
2015-03-01
The use of biomechanics-based numerical simulations has attracted growing interest in recent years for computer-aided diagnosis and treatment planning. With this in mind, a method for automatic mesh generation of brain structures of interest, using statistical models of shape (SSM) and appearance (SAM), for personalised computational modelling is presented. SSMs are constructed as point distribution models (PDMs), while SAMs are trained using intensity profiles sampled from a training set of T1-weighted magnetic resonance images. The brain structures of interest are the cortical surface (cerebrum, cerebellum & brainstem), the lateral ventricles and the falx cerebri membrane. Two methods for establishing correspondences across the training set of shapes are investigated and compared (based on SSM quality): the Coherent Point Drift (CPD) point-set registration method and a B-spline mesh-to-mesh registration method. The MNI-305 (Montreal Neurological Institute) average brain atlas is used to generate the template mesh, which is deformed and registered to each training case to establish correspondence over the training set of shapes. 18 healthy patients' T1-weighted MR images form the training set used to generate the SSM and SAM. Both model training and model fitting are performed over multiple brain structures simultaneously. Compactness and generalisation errors of the BSpline-SSM and CPD-SSM are evaluated and used to quantitatively compare the SSMs; leave-one-out cross validation is used to evaluate SSM quality in terms of these measures. The mesh-based SSM is found to generalise better and to be more compact than the CPD-based SSM. The quality of the best-fit model instances from the trained SSMs on test cases is evaluated using the Hausdorff distance (HD) and mean absolute surface distance (MASD) metrics.
Zhang, Huiling; Huang, Qingsheng; Bei, Zhendong; Wei, Yanjie; Floudas, Christodoulos A
2016-03-01
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM, which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP, which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on a contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and an MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM proteins increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/. © 2016 Wiley Periodicals, Inc.
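The reported accuracy, coverage, specificity and MCC can all be computed from confusion-matrix counts. The counts below are invented for illustration; they merely show why contact predictors can combine high accuracy and specificity with low coverage and a modest MCC, as in the results above.

```python
import math

def contact_metrics(tp, fp, tn, fn):
    """Accuracy here means precision of the predicted contacts, and coverage
    means recall of the true contacts, following common usage in the contact
    prediction literature (the paper's exact conventions are assumed)."""
    accuracy = tp / (tp + fp)          # fraction of predicted contacts that are real
    coverage = tp / (tp + fn)          # fraction of real contacts recovered
    specificity = tn / (tn + fp)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return accuracy, coverage, specificity, mcc

# invented counts: residue contacts are rare, so a predictor can be highly
# specific yet recover only a small fraction of the true contacts
acc, cov, spec, mcc = contact_metrics(tp=120, fp=60, tn=9800, fn=850)
```

Because non-contacts vastly outnumber contacts, specificity stays near 1 almost regardless of the predictor, which is why MCC is the more informative single summary.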
NASA Astrophysics Data System (ADS)
Liang, Sheng-Fu; Chen, Yi-Chun; Wang, Yu-Lin; Chen, Pin-Tzu; Yang, Chia-Hsiang; Chiueh, Herming
2013-08-01
Objective. Around 1% of the world's population is affected by epilepsy, and nearly 25% of patients cannot be treated effectively by available therapies. The presence of closed-loop seizure-triggered stimulation provides a promising solution for these patients. Realization of fast, accurate, and energy-efficient seizure detection is the key to such implants. In this study, we propose a two-stage on-line seizure detection algorithm with low energy consumption for temporal lobe epilepsy (TLE). Approach. Multi-channel signals are processed through independent component analysis and the most representative independent component (IC) is automatically selected to eliminate artifacts. Seizure-like intracranial electroencephalogram (iEEG) segments are rapidly detected in the first stage of the proposed method and these seizures are confirmed in the second stage. The conditional activation of the second-stage signal processing reduces the computational effort, and hence the energy, since most of the non-seizure events are filtered out in the first stage. Main results. Long-term iEEG recordings of 11 patients who suffered from TLE were analyzed via leave-one-out cross validation. The proposed method has a detection accuracy of 95.24%, a false alarm rate of 0.09/h, and an average detection delay time of 9.2 s. For the six patients with mesial TLE, a detection accuracy of 100.0%, a false alarm rate of 0.06/h, and an average detection delay time of 4.8 s can be achieved. The hierarchical approach provides a 90% energy reduction, yielding an effective and energy-efficient implementation for real-time epileptic seizure detection. Significance. An on-line seizure detection method that can be applied to monitor continuous iEEG signals of patients who suffered from TLE was developed. An IC selection strategy to automatically determine the most seizure-related IC for seizure detection was also proposed. The system has the advantages of (1) high detection accuracy, (2) a low false alarm rate, (3) short detection latency, and (4) an energy-efficient design for hardware implementation.
An empirical assessment of validation practices for molecular classifiers
Castaldi, Peter J.; Dahabreh, Issa J.
2011-01-01
Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study design and methodological features, and compared the performance of molecular classifiers in internal cross-validation versus external validation for 28 studies where both had been performed. We demonstrate that the majority of studies pursued cross-validation practices that are likely to overestimate classifier performance. Most studies were markedly underpowered to detect a 20% decrease in sensitivity or specificity between internal cross-validation and external validation [median power was 36% (IQR, 21–61%) and 29% (IQR, 15–65%), respectively]. The median reported classification performance for sensitivity and specificity was 94% and 98%, respectively, in cross-validation and 88% and 81% for independent validation. The relative diagnostic odds ratio was 3.26 (95% CI 2.04–5.21) for cross-validation versus independent validation. Finally, we reviewed all studies (n = 758) which cited those in our study sample, and identified only one instance of additional subsequent independent validation of these classifiers. In conclusion, these results document that many cross-validation practices employed in the literature are potentially biased and genuine progress in this field will require adoption of routine external validation of molecular classifiers, preferably in much larger studies than in current practice. PMID:21300697
Karunamoorthi, Kaliyaperumal; Mulelam, Adane; Wassie, Fentahun
2009-01-12
A cross-sectional descriptive study was carried out to assess the knowledge and customary usage of traditional insect/mosquito repellent plants among the inhabitants of Addis Zemen Town, Ethiopia. Stratified, systematic random sampling was used to select 393 households from the total of 5161 households. One adult from each household was interviewed. The ethnobotanical survey was carried out from February 2007 to March 2007. Data analysis was carried out using SPSS, version 9.0. Ranges and means were analysed, and appropriate tables, graphs and percentages were produced. The level of significance was determined using 95% confidence intervals and p-values. Overall, 97.2% of the respondents had ample knowledge of, and customarily used, traditional insect/mosquito repellent plants. Application of smoke (91.55%), by burning plant parts such as leaves, stems and roots, was the method most commonly known in the local community. Leaves were used by 90.2% for smoke application. Knowledge and usage of traditional insect/mosquito repellent plants were significantly associated with sex (p=0.013) and lower income of respondents (p=0.002), but had no significant association with age or educational status. Furthermore, the survey indicated that the most commonly known traditional insect/mosquito repellent plants were Woira (Olea europaea) 44%, Tinjut (Otostegia integrifolia) 39%, Neem (Azadirachta indica) 14.1%, Wogert (Silene macroserene) 1.4%, and Kebercho (Echinops sp.) 1.1%. Indigenous traditional repellent plants have been used by local communities since ancient times for various medicinal purposes; moreover, unlike existing modern synthetic chemical repellents, they are not toxic. Therefore, the traditional use of repellent plants should be encouraged and promoted in the local community.
Ren, Biye
2003-01-01
Structure-boiling point relationships are studied for a series of oxo organic compounds by means of multiple linear regression (MLR) analysis. Excellent MLR models based on the recently introduced Xu index and the atom-type-based AI indices are obtained for two subsets containing 77 ethers and 107 carbonyl compounds, respectively, and for a combined set of 184 oxo compounds. The best models are tested using leave-one-out cross-validation and an external test set. The MLR model produces a correlation coefficient of r = 0.9977 and a standard error of s = 3.99 degrees C for the training set of 184 compounds, r(cv) = 0.9974 and s(cv) = 4.16 degrees C for the cross-validation set, and r(pred) = 0.9949 and s(pred) = 4.38 degrees C for the prediction set of 21 compounds. For the two subsets of 77 ethers and 107 carbonyl compounds, the quality of the models is further improved; the standard errors are reduced to 3.30 and 3.02 degrees C, respectively. Furthermore, the results indicate that the boiling points of the studied oxo compounds depend predominantly on molecular size and also on individual atom types, especially oxygen heteroatoms, owing to strong polar interactions between molecules. These excellent structure-boiling point models not only provide profound insights into the role of structural features in a molecule but also illustrate the usefulness of these indices in QSPR/QSAR modeling of complex compounds.
Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance and being able to choose better parameters and fit a better model. We propose a novel approach, termed “cross-validation and cross-testing,” that improves this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
Yoo, Kwangsun; Rosenberg, Monica D; Hsu, Wei-Ting; Zhang, Sheng; Li, Chiang-Shan R; Scheinost, Dustin; Constable, R Todd; Chun, Marvin M
2018-02-15
Connectome-based predictive modeling (CPM; Finn et al., 2015; Shen et al., 2017) was recently developed to predict individual differences in traits and behaviors, including fluid intelligence (Finn et al., 2015) and sustained attention (Rosenberg et al., 2016a), from functional brain connectivity (FC) measured with fMRI. Here, using the CPM framework, we compared the predictive power of three different measures of FC (Pearson's correlation, accordance, and discordance) and two different prediction algorithms (linear and partial least square [PLS] regression) for attention function. Accordance and discordance are recently proposed FC measures that respectively track in-phase synchronization and out-of-phase anti-correlation (Meskaldji et al., 2015). We defined connectome-based models using task-based or resting-state FC data, and tested the effects of (1) functional connectivity measure and (2) feature-selection/prediction algorithm on individualized attention predictions. Models were internally validated in a training dataset using leave-one-subject-out cross-validation, and externally validated with three independent datasets. The training dataset included fMRI data collected while participants performed a sustained attention task and rested (N = 25; Rosenberg et al., 2016a). The validation datasets included: 1) data collected during performance of a stop-signal task and at rest (N = 83, including 19 participants who were administered methylphenidate prior to scanning; Farr et al., 2014a; Rosenberg et al., 2016b), 2) data collected during Attention Network Task performance and rest (N = 41, Rosenberg et al., in press), and 3) resting-state data and ADHD symptom severity from the ADHD-200 Consortium (N = 113; Rosenberg et al., 2016a). 
Models defined using all combinations of functional connectivity measure (Pearson's correlation, accordance, and discordance) and prediction algorithm (linear and PLS regression) predicted attentional abilities, with correlations between predicted and observed measures of attention as high as 0.9 for internal validation, and 0.6 for external validation (all p's < 0.05). Models trained on task data outperformed models trained on rest data. Pearson's correlation and accordance features generally showed a small numerical advantage over discordance features, while PLS regression models were usually better than linear regression models. Overall, in addition to correlation features combined with linear models (Rosenberg et al., 2016a), it is useful to consider accordance features and PLS regression for CPM. Copyright © 2017 Elsevier Inc. All rights reserved.
One size fits all electronics for insole-based activity monitoring.
Hegde, Nagaraj; Bries, Matthew; Melanson, Edward; Sazonov, Edward
2017-07-01
Footwear-based wearable sensors are becoming prominent in many areas of monitoring health and wellness, such as gait and activity monitoring. In our previous research we introduced an insole-based wearable system, SmartStep, which is completely integrated in a socially acceptable package. From a manufacturing perspective, SmartStep's electronics had to be custom-made for each shoe size, greatly complicating the manufacturing process. In this work we explore the possibility of making a universal electronics platform for SmartStep - SmartStep 3.0 - which can be used in the most common insole sizes without modification. A pilot human subject experiment was run to compare the accuracy of the one-size-fits-all SmartStep 3.0 against the custom-sized SmartStep 2.0. A total of ~10 hours of data was collected in the pilot study involving three participants performing different activities of daily living while wearing SmartStep 2.0 and SmartStep 3.0. Leave-one-out cross validation resulted in a 98.5% average accuracy for SmartStep 2.0 and 98.3% for SmartStep 3.0, suggesting that SmartStep 3.0 can be as accurate as SmartStep 2.0 while fitting the most common shoe sizes.
Shahreen, Shejuty; Banik, Joyanta; Hafiz, Abdul; Rahman, Shahnaz; Zaman, Anahita Tanzia; Shoyeb, Md Abu; Chowdhury, Majeedul H; Rahmatullah, Mohammed
2012-01-01
Averrhoa carambola L. (Oxalidaceae), Ficus hispida L.f. (Moraceae), and Syzygium samarangense (Blume) Merr. & L.M. Perry (Myrtaceae) are three common plants in Bangladesh, the fruits of which are edible. The leaves and fruits of A. carambola and F. hispida are used by folk medicinal practitioners for treatment of diabetes, while the leaves of S. samarangense are used for treatment of cold, itches, and waist pain. Since scientific studies are absent on the antihyperglycemic effects of the leaves of the three plants, the objective of the present study was to evaluate the antihyperglycemic potential of methanolic extracts of leaves of the plants in oral glucose tolerance tests carried out with glucose-loaded mice. The extracts at different doses were administered one hour prior to glucose administration, and blood glucose levels were measured two hours after glucose administration (p.o.) using the glucose oxidase method. Significant oral hypoglycemic activity was found with the leaf extracts of all three plants tested. The fall in serum glucose levels was dose-dependent for each plant, being highest at the highest dose tested of 400 mg extract per kg body weight. At this dose, the extracts of A. carambola, F. hispida, and S. samarangense caused, respectively, 34.1, 22.7, and 59.3% reductions in serum glucose levels when compared to control animals. The standard antihyperglycemic drug, glibenclamide, caused a 57.3% reduction in serum glucose levels versus control. Among the three plants evaluated, the methanolic extract of leaves of S. samarangense proved the most potent in demonstrating antihyperglycemic effects. The results validate the folk medicinal uses of A. carambola and F. hispida in the treatment of diabetes, and indicate that the leaves of S. samarangense can also possibly be used for amelioration of diabetes-induced hyperglycemia.
WaLIDD score, a new tool to diagnose dysmenorrhea and predict medical leave in university students
Teherán, Aníbal A; Piñeros, Luis Gabriel; Pulido, Fabián; Mejía Guatibonza, María Camila
2018-01-01
Background Dysmenorrhea is a frequent and misdiagnosed symptom affecting the quality of life in young women. A working ability, location, intensity, days of pain, dysmenorrhea (WaLIDD) score was designed to diagnose dysmenorrhea and to predict medical leave. Methods This cross-sectional design included young medical students, who completed a self-administered questionnaire that contained the verbal rating score (VRS; pain and drug subscales) and WaLIDD scales. The correlation between scales was established through Spearman test. The area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, and likelihood ratio (LR +/−) were evaluated to diagnose students availing medical leave due to dysmenorrhea; moreover, to predict medical leave in students with dysmenorrhea, a binary logistic regression was performed. Results In all, 585 students, with a mean age of 21 years and menarche at 12 years, participated. Most of them had regular cycles, 5 days of menstrual blood flow and 1–2 days of lower abdominal pain. The WaLIDD scale presented an adequate internal consistency and strong correlation with VRS subscales. With a cutoff of >6 for WaLIDD and 2 for VRS subscales (drug subscale and pain subscale) to identify students with dysmenorrhea, these scales presented an area under the curve (AUC) ROC of 0.82, 0.62, and 0.67, respectively. To identify students taking medical leave due to dysmenorrhea, WaLIDD (cutoff >9) and VRS subscales (cutoff >2) presented an AUC ROC of 0.97, 0.68, and 0.81; moreover, the WaLIDD scale showed a good LR +14.2 (95% CI, 13.5–14.9), LR −0.00 (95% CI, undefined), and predictive risk (OR 5.38; 95% CI, 1.78–16.2). Conclusion This research allowed a comparison between two multidimensional scales regarding their capabilities, one previously validated and a new one, to discriminate among the general population of medical students, among those with dysmenorrhea or those availing medical leave secondary to dysmenorrhea. 
WaLIDD score showed a larger effect size than the pain and drug score in the students. In addition, this study demonstrated the ability to predict this combination of events. PMID:29398923
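The reported sensitivity, specificity and likelihood ratios all follow from a 2x2 confusion table at a chosen score cutoff, with LR+ = sensitivity/(1-specificity) and LR- = (1-sensitivity)/specificity. A small sketch with hypothetical scores and WaLIDD-style "greater than" cutoffs (the data here are invented for illustration):

```python
import numpy as np

def diagnostic_metrics(score, truth, cutoff):
    """Sensitivity, specificity and likelihood ratios at a score cutoff.

    score > cutoff is called positive, mirroring the '>6'/'>9' style
    cutoffs in the abstract; truth is a boolean array of true status.
    """
    pred = score > cutoff
    tp = np.sum(pred & truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec) if spec < 1 else float('inf')
    lr_neg = (1 - sens) / spec if spec > 0 else float('inf')
    return sens, spec, lr_pos, lr_neg
```

An LR+ of 14.2, as reported for the WaLIDD cutoff of >9, means a positive result raises the odds of the outcome about fourteen-fold.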
Furuhama, A; Toida, T; Nishikawa, N; Aoki, Y; Yoshioka, Y; Shiraishi, H
2010-07-01
The KAshinhou Tool for Ecotoxicity (KATE) system, including ecotoxicity quantitative structure-activity relationship (QSAR) models, was developed by the Japanese National Institute for Environmental Studies (NIES) using the database of aquatic toxicity results gathered by the Japanese Ministry of the Environment and the US EPA fathead minnow database. In this system chemicals can be entered according to their one-dimensional structures and classified by substructure. The QSAR equations for predicting the toxicity of a chemical compound assume a linear correlation between its log P value and its aquatic toxicity. KATE uses a structural domain called C-judgement, defined by the substructures of specified functional groups in the QSAR models. Internal validation by the leave-one-out method confirms that the QSAR equations, with r(2) > 0.7, RMSE
NASA Astrophysics Data System (ADS)
Zhao, Jianhua; Zeng, Haishan; Kalia, Sunil; Lui, Harvey
2017-02-01
Background: Raman spectroscopy is a non-invasive optical technique which can measure molecular vibrational modes within tissue. A large-scale clinical study (n = 518) has demonstrated that real-time Raman spectroscopy could distinguish malignant from benign skin lesions with good diagnostic accuracy; this was validated by a follow-up independent study (n = 127). Objective: Most of the previous diagnostic algorithms have typically been based on analyzing the full band of the Raman spectra, either in the fingerprint or high wavenumber regions. Our objective in this presentation is to explore wavenumber selection based analysis in Raman spectroscopy for skin cancer diagnosis. Methods: A wavenumber selection algorithm was implemented using variably-sized wavenumber windows, which were determined by the correlation coefficient between wavenumbers. Wavenumber windows were chosen based on accumulated frequency from leave-one-out cross-validated stepwise regression or least absolute shrinkage and selection operator (LASSO). The diagnostic algorithms were then generated from the selected wavenumber windows using multivariate statistical analyses, including principal component and general discriminant analysis (PC-GDA) and partial least squares (PLS). A total cohort of 645 confirmed lesions from 573 patients encompassing skin cancers, precancers and benign skin lesions were included. Lesion measurements were divided into a training cohort (n = 518) and a testing cohort (n = 127) according to the measurement time. Results: The area under the receiver operating characteristic curve (ROC) improved from 0.861-0.891 to 0.891-0.911 and the diagnostic specificity for sensitivity levels of 0.99-0.90 increased respectively from 0.17-0.65 to 0.20-0.75 by selecting specific wavenumber windows for analysis. Conclusion: Wavenumber selection based analysis in Raman spectroscopy improves skin cancer diagnostic specificity at high sensitivity levels.
NASA Astrophysics Data System (ADS)
Mahrooghy, Majid; Ashraf, Ahmed B.; Daye, Dania; Mies, Carolyn; Rosen, Mark; Feldman, Michael; Kontos, Despina
2014-03-01
We evaluate the prognostic value of sparse representation-based features by applying the K-SVD algorithm on multiparametric kinetic, textural, and morphologic features in breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). K-SVD is an iterative dimensionality reduction method that optimally reduces the initial feature space by updating the dictionary columns jointly with the sparse representation coefficients. Therefore, by using K-SVD, we not only provide a sparse representation of the features and condense the information in a few coefficients but also reduce the dimensionality. The extracted K-SVD features are evaluated by a machine learning algorithm including a logistic regression classifier for the task of classifying high versus low breast cancer recurrence risk as determined by a validated gene expression assay. The features are evaluated using ROC curve analysis and leave-one-out cross validation for different sparse representation and dimensionality reduction numbers. Optimal sparse representation is obtained when the number of dictionary elements is 4 (K=4) and the maximum number of non-zero coefficients is 2 (L=2). We compare K-SVD with ANOVA-based feature selection for the same prognostic features. The ROC results show that the AUCs of the K-SVD based (K=4, L=2), the ANOVA-based, and the original features (i.e., no dimensionality reduction) are 0.78, 0.71, and 0.68, respectively. From the results, it can be inferred that by using sparse representation of the originally extracted multi-parametric, high-dimensional data, we can condense the information on a few coefficients with the highest predictive value. In addition, the dimensionality reduction introduced by K-SVD can prevent models from over-fitting.
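K-SVD alternates two steps: L-sparse coding of every signal against the current dictionary (here via orthogonal matching pursuit) and a rank-1 SVD update of each dictionary atom together with its coefficients. A compact numpy sketch with K=4 atoms and L=2 non-zero coefficients, matching the abstract's optimal settings; the synthetic signals are an illustrative assumption, not the study's imaging features:

```python
import numpy as np

def omp(D, x, L):
    """Orthogonal matching pursuit: code x with at most L atoms of D."""
    idx, r = [], x.astype(float)
    coef = np.zeros(0)
    for _ in range(L):
        corr = np.abs(D.T @ r)
        corr[idx] = -1.0                       # never re-pick an atom
        idx.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        r = x - D[:, idx] @ coef               # residual after refit
    g = np.zeros(D.shape[1])
    g[idx] = coef
    return g

def ksvd(Y, K=4, L=2, n_iter=15, seed=0):
    """Minimal K-SVD: learn a K-atom dictionary giving L-sparse codes of Y."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(Y.shape[0], K))
    D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
    for _ in range(n_iter):
        G = np.column_stack([omp(D, y, L) for y in Y.T])   # sparse coding
        for k in range(K):                                 # dictionary update
            users = np.nonzero(G[k])[0]        # signals that use atom k
            if users.size == 0:
                continue
            # residual with atom k removed, restricted to its users
            E = Y[:, users] - D @ G[:, users] + np.outer(D[:, k], G[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                  # rank-1 update of atom k
            G[k, users] = s[0] * Vt[0]         # and of its coefficients
    return D, G
```

The L coefficients per signal then serve as the condensed feature vector fed to the downstream classifier.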
Bitella, Giovanni; Rossi, Roberta; Bochicchio, Rocco; Perniola, Michele; Amato, Mariana
2014-01-01
Monitoring soil water content at high spatio-temporal resolution, coupled to other sensor data, is crucial for applications oriented towards water sustainability in agriculture, such as precision irrigation or phenotyping root traits for drought tolerance. The cost of instrumentation, however, limits measurement frequency and the number of sensors. The objective of this work was to design a low cost “open hardware” platform for multi-sensor measurements including water content at different depths and air and soil temperatures. The system is based on an open-source ARDUINO microcontroller board, programmed in a simple integrated development environment (IDE). Low cost high-frequency dielectric probes were used in the platform and lab-tested on three non-saline soils (ECe 1:2.5 < 0.1 mS/cm). Empirical calibration curves were subjected to leave-one-out cross-validation, and the normalized root mean square errors (NRMSE) were 0.09 for the overall model, 0.09 for the sandy soil, 0.07 for the clay loam and 0.08 for the sandy loam. The overall model (pooled soil data) fitted the data very well (R2 = 0.89) and showed high stability, generating very similar RMSEs during training and validation (RMSEtraining = 2.63; RMSEvalidation = 2.61). Data recorded on the card were automatically sent to a remote server allowing repeated field-data quality checks. This work provides a framework for the replication and upgrading of a customized low cost platform, consistent with the open source approach whereby sharing information on equipment design and software facilitates the adoption and continuous improvement of existing technologies. PMID:25337742
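Leave-one-out validation of a sensor calibration curve is direct to script: each point is predicted from a curve fitted to all the others, and the RMSE of those held-out predictions is normalised by the observed range. A numpy sketch assuming a quadratic calibration between raw dielectric readings and volumetric water content (the polynomial degree and data are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def loo_nrmse(raw, vwc, deg=2):
    """Leave-one-out NRMSE for a polynomial sensor calibration curve.

    raw: sensor readings; vwc: reference volumetric water content.
    Each point is predicted from a polynomial fitted to the remaining
    points; the RMSE is normalised by the observed range of vwc.
    """
    n = len(raw)
    err = np.empty(n)
    for i in range(n):
        m = np.arange(n) != i                  # hold out point i
        coef = np.polyfit(raw[m], vwc[m], deg)
        err[i] = vwc[i] - np.polyval(coef, raw[i])
    rmse = np.sqrt(np.mean(err**2))
    return rmse / (vwc.max() - vwc.min())
```

Per-soil NRMSE values like those in the abstract come from running the same routine on each soil's subset of the calibration data.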
Vyas, V K; Gupta, N; Ghate, M; Patel, S
2014-01-01
In this study we designed novel substituted benzimidazole derivatives and predicted their absorption, distribution, metabolism, excretion and toxicity (ADMET) properties, based on a predictive 3D QSAR study on 132 substituted benzimidazoles as AngII-AT1 receptor antagonists. The two best predicted compounds were synthesized and evaluated for AngII-AT1 receptor antagonism. Three different alignment tools for comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were used. The best 3D QSAR models were obtained using the rigid body (Distill) alignment method. CoMFA and CoMSIA models were found to be statistically significant with leave-one-out correlation coefficients (q(2)) of 0.630 and 0.623, respectively, cross-validated coefficients (r(2)cv) of 0.651 and 0.630, respectively, and conventional coefficients of determination (r(2)) of 0.848 and 0.843, respectively. 3D QSAR models were validated using a test set of 24 compounds, giving satisfactory predicted results (r(2)pred) of 0.727 and 0.689 for the CoMFA and CoMSIA models, respectively. We have identified some key features in substituted benzimidazole derivatives, such as lipophilicity and H-bonding at the 2- and 5-positions of the benzimidazole nucleus, respectively, for AT1 receptor antagonistic activity. We designed 20 novel substituted benzimidazole derivatives and predicted their activity. In silico ADMET properties were also predicted for these designed molecules. Finally, the compounds with best predicted activity were synthesized and evaluated for in vitro angiotensin II-AT1 receptor antagonism.
Computer-aided diagnosis of pulmonary diseases using x-ray darkfield radiography
NASA Astrophysics Data System (ADS)
Einarsdóttir, Hildur; Yaroshenko, Andre; Velroyen, Astrid; Bech, Martin; Hellbach, Katharina; Auweter, Sigrid; Yildirim, Önder; Meinel, Felix G.; Eickelberg, Oliver; Reiser, Maximilian; Larsen, Rasmus; Kjær Ersbøll, Bjarne; Pfeiffer, Franz
2015-12-01
In this work we develop a computer-aided diagnosis (CAD) scheme for classification of pulmonary disease for grating-based x-ray radiography. In addition to conventional transmission radiography, the grating-based technique provides a dark-field imaging modality, which utilizes the scattering properties of the x-rays. This modality has shown great potential for diagnosing early stage emphysema and fibrosis in mouse lungs in vivo. The CAD scheme is developed to assist radiologists and other medical experts to develop new diagnostic methods when evaluating grating-based images. The scheme consists of three stages: (i) automatic lung segmentation; (ii) feature extraction from lung shape and dark-field image intensities; (iii) classification between healthy, emphysema and fibrosis lungs. A study of 102 mice was conducted with 34 healthy, 52 emphysema and 16 fibrosis subjects. Each image was manually annotated to build an experimental dataset. System performance was assessed by: (i) determining the quality of the segmentations; (ii) validating emphysema and fibrosis recognition by a linear support vector machine using leave-one-out cross-validation. In terms of segmentation quality, we obtained an overlap percentage (Ω) of 92.63 ± 3.65%, a Dice similarity coefficient (DSC) of 89.74 ± 8.84% and a Jaccard similarity coefficient of 82.39 ± 12.62%. For classification, the accuracy, sensitivity and specificity of diseased lung recognition was 100%. Classification between emphysema and fibrosis resulted in an accuracy of 93%, whilst the sensitivity was 94% and specificity 88%. In addition to the automatic classification of lungs, deviation maps created by the CAD scheme provide a visual aid for medical experts to further assess the severity of pulmonary disease in the lung, and highlight regions affected.
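The three segmentation-quality scores are simple set overlaps between the predicted and annotated binary masks. A sketch follows; note that the overlap percentage Ω is taken here as the fraction of the ground-truth region recovered, which is one common definition and an assumption on our part, since the abstract does not define it:

```python
import numpy as np

def segmentation_scores(pred, truth):
    """Overlap (Ω), Dice (DSC) and Jaccard coefficients for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.sum(pred & truth)
    union = np.sum(pred | truth)
    omega = inter / np.sum(truth)                    # fraction of truth recovered
    dice = 2 * inter / (np.sum(pred) + np.sum(truth))
    jac = inter / union
    return omega, dice, jac
```

Dice and Jaccard are monotonically related (Jaccard = Dice / (2 - Dice)), which is why the two reported values move together.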
Wearable Vector Electrical Bioimpedance System to Assess Knee Joint Health
Hersek, Sinan; Töreyin, Hakan; Teague, Caitlin N.; Millard-Stafford, Mindy L.; Jeong, Hyeon-Ki; Bavare, Miheer M.; Wolkoff, Paul; Sawka, Michael N.; Inan, Omer T.
2017-01-01
Objective We designed and validated a portable electrical bioimpedance (EBI) system to quantify knee joint health. Methods Five separate experiments were performed to demonstrate the: (1) ability of the EBI system to assess knee injury and recovery; (2) inter-day variability of knee EBI measurements; (3) sensitivity of the system to small changes in interstitial fluid volume; (4) reduction of EBI measurement error using acceleration signals; (5) use of the system with dry electrodes integrated into a wearable knee wrap. Results (1) The absolute difference in resistance (R) and reactance (X) from the left to the right knee was able to distinguish injured from healthy knees (p<0.05); the absolute difference in R decreased significantly (p<0.05) in injured subjects following rehabilitation. (2) The average inter-day variability (standard deviation) of the absolute difference in knee R was 2.5 Ω, and for X was 1.2 Ω. (3) Local heating/cooling resulted in a significant decrease/increase in knee R (p<0.01). (4) The proposed subject position detection algorithm achieved 97.4% leave-one-subject-out cross-validated accuracy and 98.2% precision in detecting when the subject is in the correct position to take measurements. (5) Linear regressions between the knee R and X measured using the wet electrodes and the designed wearable knee wrap were highly correlated (r2 = 0.8 and 0.9, respectively). Conclusion This work demonstrates the use of wearable EBI measurements in monitoring knee joint health. Significance The proposed wearable system has the potential for assessing knee joint health outside the clinic/lab and helping guide rehabilitation. PMID:28026745
Radiogenomics to characterize regional genetic heterogeneity in glioblastoma
Hu, Leland S.; Ning, Shuluo; Eschbacher, Jennifer M.; Baxter, Leslie C.; Gaw, Nathan; Ranjbar, Sara; Plasencia, Jonathan; Dueck, Amylou C.; Peng, Sen; Smith, Kris A.; Nakaji, Peter; Karis, John P.; Quarles, C. Chad; Wu, Teresa; Loftus, Joseph C.; Jenkins, Robert B.; Sicotte, Hugues; Kollmeyer, Thomas M.; O'Neill, Brian P.; Elmquist, William; Hoxworth, Joseph M.; Frakes, David; Sarkaria, Jann; Swanson, Kristin R.; Tran, Nhan L.; Li, Jing; Mitchell, J. Ross
2017-01-01
Background Glioblastoma (GBM) exhibits profound intratumoral genetic heterogeneity. Each tumor comprises multiple genetically distinct clonal populations with different therapeutic sensitivities. This has implications for targeted therapy and genetically informed paradigms. Contrast-enhanced (CE)-MRI and conventional sampling techniques have failed to resolve this heterogeneity, particularly for nonenhancing tumor populations. This study explores the feasibility of using multiparametric MRI and texture analysis to characterize regional genetic heterogeneity throughout MRI-enhancing and nonenhancing tumor segments. Methods We collected multiple image-guided biopsies from primary GBM patients throughout regions of enhancement (ENH) and nonenhancing parenchyma (so called brain-around-tumor, [BAT]). For each biopsy, we analyzed DNA copy number variants for core GBM driver genes reported by The Cancer Genome Atlas. We co-registered biopsy locations with MRI and texture maps to correlate regional genetic status with spatially matched imaging measurements. We also built multivariate predictive decision-tree models for each GBM driver gene and validated accuracies using leave-one-out-cross-validation (LOOCV). Results We collected 48 biopsies (13 tumors) and identified significant imaging correlations (univariate analysis) for 6 driver genes: EGFR, PDGFRA, PTEN, CDKN2A, RB1, and TP53. Predictive model accuracies (on LOOCV) varied by driver gene of interest. Highest accuracies were observed for PDGFRA (77.1%), EGFR (75%), CDKN2A (87.5%), and RB1 (87.5%), while lowest accuracy was observed in TP53 (37.5%). Models for 4 driver genes (EGFR, RB1, CDKN2A, and PTEN) showed higher accuracy in BAT samples (n = 16) compared with those from ENH segments (n = 32). Conclusion MRI and texture analysis can help characterize regional genetic heterogeneity, which offers potential diagnostic value under the paradigm of individualized oncology. PMID:27502248
Depeursinge, Adrien; Kurtz, Camille; Beaulieu, Christopher; Napel, Sandy; Rubin, Daniel
2014-08-01
We describe a framework to model visual semantics of liver lesions in CT images in order to predict the visual semantic terms (VST) reported by radiologists in describing these lesions. Computational models of VST are learned from image data using linear combinations of high-order steerable Riesz wavelets and support vector machines (SVM). In a first step, these models are used to predict the presence of each semantic term that describes liver lesions. In a second step, the distances between all VST models are calculated to establish a nonhierarchical computationally-derived ontology of VST containing inter-term synonymy and complementarity. A preliminary evaluation of the proposed framework was carried out using 74 liver lesions annotated with a set of 18 VSTs from the RadLex ontology. A leave-one-patient-out cross-validation resulted in an average area under the ROC curve of 0.853 for predicting the presence of each VST. The proposed framework is expected to foster human-computer synergies for the interpretation of radiological images while using rotation-covariant computational models of VSTs to 1) quantify their local likelihood and 2) explicitly link them with pixel-based image content in the context of a given imaging domain.
NASA Astrophysics Data System (ADS)
Yoon, Hong-Jun; Carmichael, Tandy R.; Tourassi, Georgia
2014-03-01
Two people may analyze a visual scene in two completely different ways. Our study sought to determine whether human gaze may be used to establish the identity of an individual. To accomplish this objective we investigated the gaze patterns of twelve individuals viewing still images with different spatial relationships. Specifically, we created 5 visual "dot-pattern" tests to be shown on a standard computer monitor. These tests challenged the viewer's capacity to distinguish proximity, alignment, and perceptual organization. Each test included 50 images of varying difficulty (total of 250 images). Eye-tracking data were collected from each individual while taking the tests. The eye-tracking data were converted into gaze velocities and analyzed with Hidden Markov Models to develop personalized gaze profiles. Using leave-one-out cross-validation, we observed that these personalized profiles could differentiate among the 12 users with classification accuracy ranging between 53% and 76%, depending on the test. This was statistically significantly better than random guessing (i.e., 8.3% or 1 out of 12). Classification accuracy was higher for the tests where the users' average gaze velocity per case was lower. The study findings support the feasibility of using gaze as a biometric or personalized biomarker. These findings could have implications in Radiology training and the development of personalized e-learning environments.
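The study fits Hidden Markov Models to gaze velocities and identifies the user whose model best explains a held-out recording. The identify-by-maximum-likelihood idea can be shown with a much simpler per-user Gaussian velocity profile; this toy stand-in for the HMM is our simplification, not the authors' method, and the velocity distributions are invented:

```python
import numpy as np

def fit_profiles(sessions):
    """Fit a Gaussian gaze-velocity profile (mean, variance) per user."""
    return [(float(np.mean(s)), float(np.var(s)) + 1e-9) for s in sessions]

def identify(profiles, velocities):
    """Return the index of the user whose profile assigns the highest
    total Gaussian log-likelihood to the observed gaze velocities."""
    v = np.asarray(velocities, dtype=float)
    ll = [-0.5 * np.sum(np.log(2 * np.pi * var) + (v - mu)**2 / var)
          for mu, var in profiles]
    return int(np.argmax(ll))
```

An HMM replaces the single Gaussian with hidden states (e.g. fixation vs. saccade regimes), capturing the temporal structure that made the published profiles discriminative.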
Cawley, Gavin C; Talbot, Nicola L C
2006-10-01
Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm.
BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
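The marginalization step described above can be sketched as follows. This is a reconstruction from the abstract, assuming the standard Laplace prior over the N model weights; the paper's exact derivation may differ in details.

```latex
% Laplace prior over the weights with scale parameter \lambda:
p(\mathbf{w}\mid\lambda) = \left(\tfrac{\lambda}{2}\right)^{N}
    \exp\!\Big(-\lambda \sum_{i=1}^{N} |w_i|\Big)
% Uninformative Jeffreys hyperprior on the scale:
p(\lambda) \propto \tfrac{1}{\lambda}
% Integrating \lambda out analytically (a Gamma integral), with s = \sum_i |w_i|:
p(\mathbf{w}) \propto \int_{0}^{\infty} \lambda^{N-1} e^{-\lambda s}\, d\lambda
             = \frac{\Gamma(N)}{s^{N}}
% so the marginal log-penalty no longer involves \lambda:
-\log p(\mathbf{w}) = N \log \sum_{i=1}^{N} |w_i| + \text{const}
```

Because the resulting penalty contains no regularization parameter, the expensive cross-validation search over that parameter is eliminated, which accounts for the speed-up reported above.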
Development and Validation of an HPLC Method for Karanjin in Pongamia pinnata Linn. Leaves.
Katekhaye, S; Kale, M S; Laddha, K S
2012-01-01
A rapid, simple and specific reversed-phase HPLC method has been developed for analysis of karanjin in Pongamia pinnata Linn. leaves. HPLC analysis was performed on a C(18) column using an 85:13.5:1.5 (v/v) mixture of methanol, water and acetic acid as isocratic mobile phase at a flow rate of 1 ml/min. UV detection was at 300 nm. The method was validated for accuracy, precision, linearity, and specificity. Validation revealed that the method is specific, accurate, precise, reliable and reproducible. Good linear correlation coefficients (r(2)>0.997) were obtained for calibration plots in the ranges tested. Limit of detection was 4.35 μg and limit of quantification was 16.56 μg. Intra- and inter-day RSD of retention times and peak areas was less than 1.24% and recovery was between 95.05 and 101.05%. The established HPLC method is suitable for efficient quantitative analysis of karanjin in Pongamia pinnata leaves.
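The linearity, LOD and LOQ figures reported above can be computed from a calibration curve as sketched below. The concentration/peak-area values here are hypothetical, and the formulas are the common ICH Q2(R1) ones (LOD = 3.3·σ/S, LOQ = 10·σ/S, with σ the residual standard deviation and S the slope); the authors' exact procedure may differ.

```python
import numpy as np

# Hypothetical calibration data: concentration (μg) vs. peak area
conc = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
area = np.array([12.1, 24.3, 48.0, 97.2, 193.5])

slope, intercept = np.polyfit(conc, area, 1)
residuals = area - (slope * conc + intercept)
sigma = residuals.std(ddof=2)          # residual SD of the regression

# ICH Q2(R1) detection/quantification limits
lod = 3.3 * sigma / slope
loq = 10.0 * sigma / slope

# coefficient of determination for the calibration plot
r2 = 1 - (residuals ** 2).sum() / ((area - area.mean()) ** 2).sum()
```

A linear calibration with r2 > 0.997, as reported in the abstract, corresponds to residuals that are small relative to the spread of the peak areas.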
NASA Astrophysics Data System (ADS)
Mantilla, Juan; Garreau, Mireille; Bellanger, Jean-Jacques; Paredes, José Luis
2013-11-01
Assessment of the cardiac Left Ventricle (LV) wall motion is generally based on visual inspection or quantitative analysis of 2D+t sequences acquired in short-axis cardiac cine-Magnetic Resonance Imaging (MRI). Most often, cardiac dynamics are analyzed globally from two particular phases of the cardiac cycle. In this paper, we propose an automated method to classify regional wall motion in LV function based on spatio-temporal profiles and Support Vector Machines (SVM). This approach yields a binary classification between normal and abnormal motion, without the need for pre-processing and by exploiting all the images of the cardiac cycle. In each short-axis MRI slice level (basal, median, and apical), the spatio-temporal profiles are extracted from the selection of a subset of diametrical lines crossing opposite LV segments. Initialized at the end-diastole phase, the profiles are concatenated with their corresponding projections into the successive temporal phases of the cardiac cycle. These profiles are associated with different types of information that derive from the image (gray levels), Fourier, Wavelet or Curvelet domains. The approach has been tested on a set of 14 abnormal and 6 healthy patients by using a leave-one-out cross validation and two kernel functions for the SVM classifier. The best classification performance is yielded by using the four-level db4 wavelet transform and SVM with a linear kernel. At each slice level the results provided a classification rate of 87.14% in apical level, 95.48% in median level and 93.65% in basal level.
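The wavelet feature extraction on the spatio-temporal profiles can be illustrated with a much simpler single-level Haar transform, standing in for the four-level db4 transform the paper found best; the profile values below are hypothetical.

```python
import numpy as np

def haar_features(profile):
    """Single-level Haar DWT of a 1-D intensity profile (even length).

    Approximation coefficients capture the coarse shape of the profile;
    detail coefficients capture edges. A db4, four-level decomposition
    as in the paper follows the same pattern with longer filters.
    """
    x = np.asarray(profile, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return np.concatenate([approx, detail])

# hypothetical gray-level profile along one diametrical line
profile = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coeffs = haar_features(profile)
```

Because the Haar transform is orthonormal, the feature vector preserves the profile's energy, so no information is discarded before the SVM stage.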
Low-back electromyography (EMG) data-driven load classification for dynamic lifting tasks
Ojeda, Lauro; Johnson, Daniel D.; Gates, Deanna; Mower Provost, Emily; Barton, Kira
2018-01-01
Objective Numerous devices have been designed to support the back during lifting tasks. To improve the utility of such devices, this research explores the use of preparatory muscle activity to classify muscle loading and initiate appropriate device activation. The goal of this study was to determine the earliest time window that enabled accurate load classification during a dynamic lifting task. Methods Nine subjects performed thirty symmetrical lifts, split evenly across three weight conditions (no-weight, 10-lbs and 24-lbs), while low-back muscle activity data were collected. Seven descriptive statistics features were extracted from 100 ms windows of data. A multinomial logistic regression (MLR) classifier was trained and tested, employing leave-one-subject-out cross-validation, to classify lifted load values. Dimensionality reduction was achieved through feature cross-correlation analysis and greedy feedforward selection. The time of full load support by the subject was defined as load-onset. Results Regions of highest average classification accuracy started at 200 ms before until 200 ms after load-onset with average accuracies ranging from 80% (±10%) to 81% (±7%). The average recall for each class ranged from 69–92%. Conclusion These inter-subject classification results indicate that preparatory muscle activity can be leveraged to identify the intent to lift a weight up to 100 ms prior to load-onset. The high accuracies shown indicate the potential to utilize intent classification for assistive device applications. Significance Active assistive devices, e.g., exoskeletons, could prevent back injury by off-loading low-back muscles. Early intent classification allows more time for actuators to respond and integrate seamlessly with the user. PMID:29447252
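The windowed feature extraction step above can be sketched as follows. The seven statistics chosen here (mean, SD, min, max, RMS, mean absolute value, waveform length) are common EMG descriptors, and the 1 kHz sampling rate is an assumption; the paper's exact feature set and rate may differ.

```python
import numpy as np

def window_features(emg, fs=1000, win_ms=100):
    """Descriptive-statistic features over non-overlapping 100 ms EMG windows."""
    win = int(fs * win_ms / 1000)
    n_win = len(emg) // win
    segs = emg[: n_win * win].reshape(n_win, win)
    return np.column_stack([
        segs.mean(axis=1),
        segs.std(axis=1),
        segs.min(axis=1),
        segs.max(axis=1),
        np.sqrt((segs ** 2).mean(axis=1)),          # root mean square
        np.abs(segs).mean(axis=1),                  # mean absolute value
        np.abs(np.diff(segs, axis=1)).sum(axis=1),  # waveform length
    ])

rng = np.random.default_rng(1)
emg = rng.normal(0, 1, 1000)   # 1 s of synthetic low-back EMG at 1 kHz
F = window_features(emg)       # one 7-feature row per 100 ms window
```

Each row of F would then feed the MLR classifier, with leave-one-subject-out cross-validation grouping all windows from one subject into the held-out fold.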
NASA Astrophysics Data System (ADS)
Fabre, Anne-Claire; Salesa, Manuel J.; Cornette, Raphael; Antón, Mauricio; Morales, Jorge; Peigné, Stéphane
2015-06-01
Inferences of function and ecology in extinct taxa have long been a subject of interest because they are fundamental to understanding the evolutionary history of species. In this study, we use a quantitative approach to investigate the locomotor behaviour of Simocyon batalleri, a key taxon related to the ailurid family. To do so, we use 3D surface geometric morphometric approaches on the three long bones of the forelimb of an extant reference sample. Next, we test the locomotor strategy of S. batalleri using a leave-one-out cross-validated linear discriminant analysis. Our results show that S. batalleri is included in the morphospace of the living species of musteloids. However, each bone of the forelimb appears to show a different functional signal, suggesting that inferring the lifestyle or locomotor behaviour of fossils can be difficult and dependent on the bone investigated. This highlights the importance of studying, where possible, a maximum of skeletal elements to be able to make robust inferences on the lifestyle of extinct species. Finally, our results suggest that S. batalleri may be more arboreal than previously suggested.
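A leave-one-out cross-validated discriminant analysis of the kind used here can be sketched with a two-class Fisher LDA on synthetic shape coordinates; the group labels and data below are illustrative assumptions, not the paper's measurements.

```python
import numpy as np

def fisher_lda_predict(Xtr, ytr, x):
    """Two-class Fisher LDA: project onto w = Sw^{-1}(m1 - m0), threshold at the midpoint."""
    X0, X1 = Xtr[ytr == 0], Xtr[ytr == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))   # pooled within-group scatter
    w = np.linalg.solve(Sw, m1 - m0)
    return int(w @ x > w @ (m0 + m1) / 2)

# synthetic shape variables for two locomotor groups of extant species
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 1, (15, 2)),    # e.g. "arboreal"
               rng.normal([3, 3], 1, (15, 2))])   # e.g. "terrestrial"
y = np.array([0] * 15 + [1] * 15)

# leave-one-out: each specimen is classified by a model trained on all the others
preds = [fisher_lda_predict(np.delete(X, i, 0), np.delete(y, i), X[i])
         for i in range(len(y))]
acc = np.mean(np.array(preds) == y)
```

The fossil taxon is then projected into the same discriminant space and assigned to the locomotor group whose extant members it most resembles, one bone at a time.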
On Time Domain Analysis of Photoplethysmogram Signals for Monitoring Heat Stress
Elgendi, Mohamed; Fletcher, Rich; Norton, Ian; Brearley, Matt; Abbott, Derek; Lovell, Nigel H.; Schuurmans, Dale
2015-01-01
There are a limited number of studies on heat stress dynamics during exercise using the photoplethysmogram (PPG) and its second derivative (APG). Here, we investigate the most suitable index from short PPG signal recordings for heat stress assessment. The APG waveform consists of a, b, c and d waves in systole and an e wave in diastole. Our preliminary results indicate that the use of the energy of the aa area, derived from PPG signals measured from emergency responders in tropical conditions, is promising in determining the heat stress level using 20-s recordings. After examining 14 time domain features using leave-one-out cross-validation, we found that the aa energy extracted from PPG signals is the most informative feature for classifying heat-stressed subjects, with an overall accuracy of 79%. Moreover, the combination of the aa energy with the traditional heart rate variability index of heat stress (i.e., the square root of the mean of the squares of the successive aa intervals) improved the heat stress detection to an overall accuracy of 83%. PMID:26404271
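The heart rate variability index mentioned above (the square root of the mean of the squared successive differences, applied to a-a intervals) can be computed directly; the interval values below are hypothetical.

```python
import numpy as np

def rmssd(aa_intervals_ms):
    """Square root of the mean of squared successive differences of a-a intervals."""
    d = np.diff(aa_intervals_ms)
    return np.sqrt((d ** 2).mean())

# hypothetical a-a intervals (ms) from a 20-s APG recording
aa = np.array([820.0, 810.0, 830.0, 825.0, 815.0])
value = rmssd(aa)
```

A feature like this, combined with the aa energy, is what lifted the reported classification accuracy from 79% to 83%.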