A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.
Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem
2018-06-12
Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
Jacquin, Laval; Cao, Tuong-Vi; Ahmadi, Nourollah
2016-01-01
One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods, used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Furthermore, another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel "trick" concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Some parametric and kernel methods; least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression were thereupon compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
Approximate l-fold cross-validation with Least Squares SVM and Kernel Ridge Regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Richard E; Zhang, Hao; Parker, Lynne Edwards
2013-01-01
Kernel methods have difficulties scaling to large modern data sets. The scalability issues are based on computational and memory requirements for working with a large matrix. These requirements have been addressed over the years by using low-rank kernel approximations or by improving the solvers scalability. However, Least Squares Support VectorMachines (LS-SVM), a popular SVM variant, and Kernel Ridge Regression still have several scalability issues. In particular, the O(n^3) computational complexity for solving a single model, and the overall computational complexity associated with tuning hyperparameters are still major problems. We address these problems by introducing an O(n log n) approximate l-foldmore » cross-validation method that uses a multi-level circulant matrix to approximate the kernel. In addition, we prove our algorithm s computational complexity and present empirical runtimes on data sets with approximately 1 million data points. We also validate our approximate method s effectiveness at selecting hyperparameters on real world and standard benchmark data sets. Lastly, we provide experimental results on using a multi-level circulant kernel approximation to solve LS-SVM problems with hyperparameters selected using our method.« less
Learning a peptide-protein binding affinity predictor with kernel ridge regression
2013-01-01
Background The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to a specific MHC alleles would be of tremendous benefit to improve vaccine based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. Results We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, comprised of the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for it’s approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. Conclusion On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve system biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/. PMID:23497081
2018-04-25
unlimited. NOTICES Disclaimers The findings in this report are not to be construed as an official Department of the Army position unless so...this report, intermolecular potentials for 1,3,5-triamino-2,4,6-trinitrobenzene (TATB) are developed using machine learning techniques. Three...potentials based on support vector regression, kernel ridge regression, and a neural network are fit using symmetry-adapted perturbation theory. The
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
2013-01-01
Background Arguably, genotypes and phenotypes may be linked in functional forms that are not well addressed by the linear additive models that are standard in quantitative genetics. Therefore, developing statistical learning models for predicting phenotypic values from all available molecular information that are capable of capturing complex genetic network architectures is of great importance. Bayesian kernel ridge regression is a non-parametric prediction model proposed for this purpose. Its essence is to create a spatial distance-based relationship matrix called a kernel. Although the set of all single nucleotide polymorphism genotype configurations on which a model is built is finite, past research has mainly used a Gaussian kernel. Results We sought to investigate the performance of a diffusion kernel, which was specifically developed to model discrete marker inputs, using Holstein cattle and wheat data. This kernel can be viewed as a discretization of the Gaussian kernel. The predictive ability of the diffusion kernel was similar to that of non-spatial distance-based additive genomic relationship kernels in the Holstein data, but outperformed the latter in the wheat data. However, the difference in performance between the diffusion and Gaussian kernels was negligible. Conclusions It is concluded that the ability of a diffusion kernel to capture the total genetic variance is not better than that of a Gaussian kernel, at least for these data. Although the diffusion kernel as a choice of basis function may have potential for use in whole-genome prediction, our results imply that embedding genetic markers into a non-Euclidean metric space has very small impact on prediction. Our results suggest that use of the black box Gaussian kernel is justified, given its connection to the diffusion kernel and its similar predictive performance. PMID:23763755
DRREP: deep ridge regressed epitope predictor.
Sher, Gene; Zhi, Degui; Zhang, Shaojie
2017-10-03
The ability to predict epitopes plays an enormous role in vaccine development in terms of our ability to zero in on where to do a more thorough in-vivo analysis of the protein in question. Though for the past decade there have been numerous advancements and improvements in epitope prediction, on average the best benchmark prediction accuracies are still only around 60%. New machine learning algorithms have arisen within the domain of deep learning, text mining, and convolutional networks. This paper presents a novel analytically trained and string kernel using deep neural network, which is tailored for continuous epitope prediction, called: Deep Ridge Regressed Epitope Predictor (DRREP). DRREP was tested on long protein sequences from the following datasets: SARS, Pellequer, HIV, AntiJen, and SEQ194. DRREP was compared to numerous state of the art epitope predictors, including the most recently published predictors called LBtope and DMNLBE. Using area under ROC curve (AUC), DRREP achieved a performance improvement over the best performing predictors on SARS (13.7%), HIV (8.9%), Pellequer (1.5%), and SEQ194 (3.1%), with its performance being matched only on the AntiJen dataset, by the LBtope predictor, where both DRREP and LBtope achieved an AUC of 0.702. DRREP is an analytically trained deep neural network, thus capable of learning in a single step through regression. By combining the features of deep learning, string kernels, and convolutional networks, the system is able to perform residue-by-residue prediction of continues epitopes with higher accuracy than the current state of the art predictors.
Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M
2017-06-01
The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.
Chung, Moo K.; Qiu, Anqi; Seo, Seongho; Vorperian, Houri K.
2014-01-01
We present a novel kernel regression framework for smoothing scalar surface data using the Laplace-Beltrami eigenfunctions. Starting with the heat kernel constructed from the eigenfunctions, we formulate a new bivariate kernel regression framework as a weighted eigenfunction expansion with the heat kernel as the weights. The new kernel regression is mathematically equivalent to isotropic heat diffusion, kernel smoothing and recently popular diffusion wavelets. Unlike many previous partial differential equation based approaches involving diffusion, our approach represents the solution of diffusion analytically, reducing numerical inaccuracy and slow convergence. The numerical implementation is validated on a unit sphere using spherical harmonics. As an illustration, we have applied the method in characterizing the localized growth pattern of mandible surfaces obtained in CT images from subjects between ages 0 and 20 years by regressing the length of displacement vectors with respect to the template surface. PMID:25791435
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jolly, Brian C.; Helmreich, Grant; Cooley, Kevin M.
In support of fully ceramic microencapsulated (FCM) fuel development, coating development work is ongoing at Oak Ridge National Laboratory (ORNL) to produce tri-structural isotropic (TRISO) coated fuel particles with both UN kernels and surrogate (uranium-free) kernels. The nitride kernels are used to increase fissile density in these SiC-matrix fuel pellets with details described elsewhere. The surrogate TRISO particles are necessary for separate effects testing and for utilization in the consolidation process development. This report focuses on the fabrication and characterization of surrogate TRISO particles which use 800μm in diameter ZrO 2 microspheres as the kernel.
Chung, Moo K; Qiu, Anqi; Seo, Seongho; Vorperian, Houri K
2015-05-01
We present a novel kernel regression framework for smoothing scalar surface data using the Laplace-Beltrami eigenfunctions. Starting with the heat kernel constructed from the eigenfunctions, we formulate a new bivariate kernel regression framework as a weighted eigenfunction expansion with the heat kernel as the weights. The new kernel method is mathematically equivalent to isotropic heat diffusion, kernel smoothing and recently popular diffusion wavelets. The numerical implementation is validated on a unit sphere using spherical harmonics. As an illustration, the method is applied to characterize the localized growth pattern of mandible surfaces obtained in CT images between ages 0 and 20 by regressing the length of displacement vectors with respect to a surface template. Copyright © 2015 Elsevier B.V. All rights reserved.
Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.
2014-01-01
Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289
Kernel Partial Least Squares for Nonlinear Regression and Discrimination
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Clancy, Daniel (Technical Monitor)
2002-01-01
This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jolly, Brian C.; Lindemer, Terrence; Terrani, Kurt A.
In support of fully ceramic matrix (FCM) fuel development, coating development work has begun at the Oak Ridge National Laboratory (ORNL) to produce tri-isotropic (TRISO) coated fuel particles with UN kernels. The nitride kernels are used to increase heavy metal density in these SiC-matrix fuel pellets with details described elsewhere. The advanced gas reactor (AGR) program at ORNL used fluidized bed chemical vapor deposition (FBCVD) techniques for TRISO coating of UCO (two phase mixture of UO 2 and UC x) kernels. Similar techniques were employed for coating of the UN kernels, however significant changes in processing conditions were required tomore » maintain acceptable coating properties due to physical property and dimensional differences between the UCO and UN kernels.« less
Triso coating development progress for uranium nitride kernels
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jolly, Brian C.; Lindemer, Terrence; Terrani, Kurt A.
2015-08-01
In support of fully ceramic matrix (FCM) fuel development [1-2], coating development work is ongoing at the Oak Ridge National Laboratory (ORNL) to produce tri-structural isotropic (TRISO) coated fuel particles with UN kernels [3]. The nitride kernels are used to increase fissile density in these SiC-matrix fuel pellets with details described elsewhere [4]. The advanced gas reactor (AGR) program at ORNL used fluidized bed chemical vapor deposition (FBCVD) techniques for TRISO coating of UCO (two phase mixture of UO2 and UCx) kernels [5]. Similar techniques were employed for coating of the UN kernels, however significant changes in processing conditions weremore » required to maintain acceptable coating properties due to physical property and dimensional differences between the UCO and UN kernels (Table 1).« less
Garriga, Miguel; Romero-Bravo, Sebastián; Estrada, Félix; Escobar, Alejandro; Matus, Iván A.; del Pozo, Alejandro; Astudillo, Cesar A.; Lobos, Gustavo A.
2017-01-01
Phenotyping, via remote and proximal sensing techniques, of the agronomic and physiological traits associated with yield potential and drought adaptation could contribute to improvements in breeding programs. In the present study, 384 genotypes of wheat (Triticum aestivum L.) were tested under fully irrigated (FI) and water stress (WS) conditions. The following traits were evaluated and assessed via spectral reflectance: Grain yield (GY), spikes per square meter (SM2), kernels per spike (KPS), thousand-kernel weight (TKW), chlorophyll content (SPAD), stem water soluble carbohydrate concentration and content (WSC and WSCC, respectively), carbon isotope discrimination (Δ13C), and leaf area index (LAI). The performances of spectral reflectance indices (SRIs), four regression algorithms (PCR, PLSR, ridge regression RR, and SVR), and three classification methods (PCA-LDA, PLS-DA, and kNN) were evaluated for the prediction of each trait. For the classification approaches, two classes were established for each trait: The lower 80% of the trait variability range (Class 1) and the remaining 20% (Class 2 or elite genotypes). Both the SRIs and regression methods performed better when data from FI and WS were combined. The traits that were best estimated by SRIs and regression methods were GY and Δ13C. For most traits and conditions, the estimations provided by RR and SVR were the same, or better than, those provided by the SRIs. PLS-DA showed the best performance among the categorical methods and, unlike the SRI and regression models, most traits were relatively well-classified within a specific hydric condition (FI or WS), proving that classification approach is an effective tool to be explored in future studies related to genotype selection. PMID:28337210
Garriga, Miguel; Romero-Bravo, Sebastián; Estrada, Félix; Escobar, Alejandro; Matus, Iván A; Del Pozo, Alejandro; Astudillo, Cesar A; Lobos, Gustavo A
2017-01-01
Phenotyping, via remote and proximal sensing techniques, of the agronomic and physiological traits associated with yield potential and drought adaptation could contribute to improvements in breeding programs. In the present study, 384 genotypes of wheat ( Triticum aestivum L.) were tested under fully irrigated (FI) and water stress (WS) conditions. The following traits were evaluated and assessed via spectral reflectance: Grain yield (GY), spikes per square meter (SM2), kernels per spike (KPS), thousand-kernel weight (TKW), chlorophyll content (SPAD), stem water soluble carbohydrate concentration and content (WSC and WSCC, respectively), carbon isotope discrimination (Δ 13 C), and leaf area index (LAI). The performances of spectral reflectance indices (SRIs), four regression algorithms (PCR, PLSR, ridge regression RR, and SVR), and three classification methods (PCA-LDA, PLS-DA, and k NN) were evaluated for the prediction of each trait. For the classification approaches, two classes were established for each trait: The lower 80% of the trait variability range (Class 1) and the remaining 20% (Class 2 or elite genotypes). Both the SRIs and regression methods performed better when data from FI and WS were combined. The traits that were best estimated by SRIs and regression methods were GY and Δ 13 C. For most traits and conditions, the estimations provided by RR and SVR were the same, or better than, those provided by the SRIs. PLS-DA showed the best performance among the categorical methods and, unlike the SRI and regression models, most traits were relatively well-classified within a specific hydric condition (FI or WS), proving that classification approach is an effective tool to be explored in future studies related to genotype selection.
Ridge: a computer program for calculating ridge regression estimates
Donald E. Hilt; Donald W. Seegrist
1977-01-01
Least-squares coefficients for multiple-regression models may be unstable when the independent variables are highly correlated. Ridge regression is a biased estimation procedure that produces stable estimates of the coefficients. Ridge regression is discussed, and a computer program for calculating the ridge coefficients is presented.
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
Locally-Based Kernal PLS Smoothing to Non-Parametric Regression Curve Fitting
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Trejo, Leonard J.; Wheeler, Kevin; Korsmeyer, David (Technical Monitor)
2002-01-01
We present a novel smoothing approach to non-parametric regression curve fitting. This is based on kernel partial least squares (PLS) regression in reproducing kernel Hilbert space. It is our concern to apply the methodology for smoothing experimental data where some level of knowledge about the approximate shape, local inhomogeneities or points where the desired function changes its curvature is known a priori or can be derived based on the observed noisy data. We propose locally-based kernel PLS regression that extends the previous kernel PLS methodology by incorporating this knowledge. We compare our approach with existing smoothing splines, hybrid adaptive splines and wavelet shrinkage techniques on two generated data sets.
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity which cause regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points are then investigated on the performance of the logistic ridge regression estimator through real data set and simulation study. The findings signify that logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
ERIC Educational Resources Information Center
Bulcock, J. W.
The problem of model estimation when the data are collinear was examined. Though the ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…
ERIC Educational Resources Information Center
Lee, Wan-Fung; Bulcock, Jeffrey Wilson
The purposes of this study are: (1) to demonstrate the superiority of simple ridge regression over ordinary least squares regression through theoretical argument and empirical example; (2) to modify ridge regression through use of the variance normalization criterion; and (3) to demonstrate the superiority of simple ridge regression based on the…
Méndez, Nelson; Oviedo-Pastrana, Misael; Mattar, Salim; Caicedo-Castro, Isaac; Arrieta, German
2017-01-01
The Zika virus disease (ZVD) has had a huge impact on public health in Colombia for the numbers of people affected and the presentation of Guillain-Barre syndrome (GBS) and microcephaly cases associated to ZVD. A retrospective descriptive study was carried out, we analyze the epidemiological situation of ZVD and its association with microcephaly and GBS during a 21-month period, from October 2015 to June 2017. The variables studied were: (i) ZVD cases, (ii) ZVD cases in pregnant women, (iii) laboratory-confirmed ZVD in pregnant women, (iv) ZVD cases associated with microcephaly, (v) laboratory-confirmed ZVD associated with microcephaly, and (vi) ZVD associated to GBS cases. Average number of cases, attack rates (AR) and proportions were also calculated. The studied variables were plotted by epidemiological weeks and months. The distribution of ZVD cases in Colombia was mapped across the time using Kernel density estimator and QGIS software; we adopted Kernel Ridge Regression (KRR) and the Gaussian Kernel to estimate the number of Guillain Barre cases given the number of ZVD cases. One hundred eight thousand eighty-seven ZVD cases had been reported in Colombia, including 19,963 (18.5%) in pregnant women, 710 (0.66%) associated with microcephaly (AR, 4.87 cases per 10,000 live births) and 453 (0.42%) ZVD associated to GBS cases (AR, 41.9 GBS cases per 10,000 ZVD cases). It appears the cases of GBS increased in parallel with the cases of ZVD, cases of microcephaly appeared 5 months after recognition of the outbreak. The kernel density map shows that throughout the study period, the states most affected by the Zika outbreak in Colombia were mainly San Andrés and Providencia islands, Casanare, Norte de Santander, Arauca and Huila. The KRR shows that there is no proportional relationship between the number of GBS and ZVD cases. During the cross validation, the RMSE achieved for the second order polynomial kernel, the linear kernel, the sigmoid kernel, and the Gaussian kernel are 9.15, 9.2, 10.7, and 7.2 respectively. This study updates the epidemiological analysis of the ZVD situation in Colombia describes the geographical distribution of ZVD and shows the functional relationship between ZVD cases and GBS.
NASA Astrophysics Data System (ADS)
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference of linear regression model using ordinary least squares (OLS) estimators would be severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods which were reported to be less sensitive to the presence of outliers. In addition, ridge regression technique was employed to tackle multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when simultaneous presence of multicollinearity and multiple outliers occurred in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators; M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observation, level of collinearity and percentage of outliers used. However, when outliers occurred in only single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, the robust ridge regression is the best alternative as compared to robust and conventional least squares estimators when dealing with simultaneous presence of multicollinearity and outliers.
A Comparison of Methods for Nonparametric Estimation of Item Characteristic Curves for Binary Items
ERIC Educational Resources Information Center
Lee, Young-Sun
2007-01-01
This study compares the performance of three nonparametric item characteristic curve (ICC) estimation procedures: isotonic regression, smoothed isotonic regression, and kernel smoothing. Smoothed isotonic regression, employed along with an appropriate kernel function, provides better estimates and also satisfies the assumption of strict…
Chen, Lidong; Basu, Anup; Zhang, Maojun; Wang, Wei; Liu, Yu
2014-03-20
A complementary catadioptric imaging technique was proposed to solve the problem of low and nonuniform resolution in omnidirectional imaging. To enhance this research, our paper focuses on how to generate a high-resolution panoramic image from the captured omnidirectional image. To avoid the interference between the inner and outer images while fusing the two complementary views, a cross-selection kernel regression method is proposed. First, in view of the complementarity of sampling resolution in the tangential and radial directions between the inner and the outer images, respectively, the horizontal gradients in the expected panoramic image are estimated based on the scattered neighboring pixels mapped from the outer, while the vertical gradients are estimated using the inner image. Then, the size and shape of the regression kernel are adaptively steered based on the local gradients. Furthermore, the neighboring pixels in the next interpolation step of kernel regression are also selected based on the comparison between the horizontal and vertical gradients. In simulation and real-image experiments, the proposed method outperforms existing kernel regression methods and our previous wavelet-based fusion method in terms of both visual quality and objective evaluation.
Geographically weighted regression model on poverty indicator
NASA Astrophysics Data System (ADS)
Slamet, I.; Nugroho, N. F. T. A.; Muslich
2017-12-01
In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.
ℓ(p)-Norm multikernel learning approach for stock market price forecasting.
Shao, Xigao; Wu, Kun; Liao, Bifeng
2012-01-01
Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ(1)-norm multiple support vector regression model.
Pronobis, Wiktor; Tkatchenko, Alexandre; Müller, Klaus-Robert
2018-06-12
Machine learning (ML) based prediction of molecular properties across chemical compound space is an important and alternative approach to efficiently estimate the solutions of highly complex many-electron problems in chemistry and physics. Statistical methods represent molecules as descriptors that should encode molecular symmetries and interactions between atoms. Many such descriptors have been proposed; all of them have advantages and limitations. Here, we propose a set of general two-body and three-body interaction descriptors which are invariant to translation, rotation, and atomic indexing. By adapting the successfully used kernel ridge regression methods of machine learning, we evaluate our descriptors on predicting several properties of small organic molecules calculated using density-functional theory. We use two data sets. The GDB-7 set contains 6868 molecules with up to 7 heavy atoms of type CNO. The GDB-9 set is composed of 131722 molecules with up to 9 heavy atoms containing CNO. When trained on 5000 random molecules, our best model achieves an accuracy of 0.8 kcal/mol (on the remaining 1868 molecules of GDB-7) and 1.5 kcal/mol (on the remaining 126722 molecules of GDB-9) respectively. Applying a linear regression model on our novel many-body descriptors performs almost equal to a nonlinear kernelized model. Linear models are readily interpretable: a feature importance ranking measure helps to obtain qualitative and quantitative insights on the importance of two- and three-body molecular interactions for predicting molecular properties computed with quantum-mechanical methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Svanda, Michal, E-mail: michal@astronomie.cz; Astronomical Institute, Charles University in Prague, Faculty of Mathematics and Physics, V Holesovickach 2, CZ-18000 Prague 8
2013-09-20
The consistency of time-distance inversions for horizontal components of the plasma flow on supergranular scales in the upper solar convection zone is checked by comparing the results derived using two k-{omega} filtering procedures-ridge filtering and phase-speed filtering-commonly used in time-distance helioseismology. I show that both approaches result in similar flow estimates when finite-frequency sensitivity kernels are used. I further demonstrate that the performance of the inversion improves (in terms of a simultaneously better averaging kernel and a lower noise level) when the two approaches are combined together in one inversion. Using the combined inversion, I invert for horizontal flows inmore » the upper 10 Mm of the solar convection zone. The flows connected with supergranulation seem to be coherent only for the top {approx}5 Mm; deeper down there is a hint of change of the convection scales toward structures larger than supergranules.« less
ℓ p-Norm Multikernel Learning Approach for Stock Market Price Forecasting
Shao, Xigao; Wu, Kun; Liao, Bifeng
2012-01-01
Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ 1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ p-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ 1-norm multiple support vector regression model. PMID:23365561
2010-01-01
Background Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes of which only few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge by experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results We have proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure approach which ranked the knockout gene on average at position 17 with an AUC value of 83.7%. Conclusion In this study we could identify promising candidate genes using network based machine learning approaches even if no knowledge is available about the disease or phenotype. PMID:20840752
NASA Astrophysics Data System (ADS)
Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah
2017-08-01
Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. This situation will be more severe in the presence of outliers it will cause fatter tails in the error distributions than the normal distributions. The well-known procedure that is robust to multicollinearity problem is the ridge regression method. This method however is expected to be affected by the presence of outliers due to some assumptions imposed in the modeling procedure. Thus, the robust version of existing ridge method with some modification in the inverse matrix and the estimated response value is introduced. The performance of the proposed method is discussed and comparisons are made with several existing estimators namely, Ordinary Least Squares (OLS), ridge regression and robust ridge regression based on GM-estimates. The finding of this study is able to produce reliable parameter estimates in the presence of both multicollinearity and outliers in the data.
Optimization of fixture layouts of glass laser optics using multiple kernel regression.
Su, Jianhua; Cao, Enhua; Qiao, Hong
2014-05-10
We aim to build an integrated fixturing model to describe the structural properties and thermal properties of the support frame of glass laser optics. Therefore, (a) a near global optimal set of clamps can be computed to minimize the surface shape error of the glass laser optic based on the proposed model, and (b) a desired surface shape error can be obtained by adjusting the clamping forces under various environmental temperatures based on the model. To construct the model, we develop a new multiple kernel learning method and call it multiple kernel support vector functional regression. The proposed method uses two layer regressions to group and order the data sources by the weights of the kernels and the factors of the layers. Because of that, the influences of the clamps and the temperature can be evaluated by grouping them into different layers.
ERIC Educational Resources Information Center
Mugrage, Beverly; And Others
Three ridge regression solutions are compared with ordinary least squares regression and with principal components regression using all components. Ridge regression, particularly the Lawless-Wang solution, out-performed ordinary least squares regression and the principal components solution on the criteria of stability of coefficient and closeness…
Kernel analysis of partial least squares (PLS) regression models.
Shinzawa, Hideyuki; Ritthiruangdej, Pitiporn; Ozaki, Yukihiro
2011-05-01
An analytical technique based on kernel matrix representation is demonstrated to provide further chemically meaningful insight into partial least squares (PLS) regression models. The kernel matrix condenses essential information about scores derived from PLS or principal component analysis (PCA). Thus, it becomes possible to establish the proper interpretation of the scores. A PLS model for the total nitrogen (TN) content in multiple Thai fish sauces is built with a set of near-infrared (NIR) transmittance spectra of the fish sauce samples. The kernel analysis of the scores effectively reveals that the variation of the spectral feature induced by the change in protein content is substantially associated with the total water content and the protein hydration. Kernel analysis is also carried out on a set of time-dependent infrared (IR) spectra representing transient evaporation of ethanol from a binary mixture solution of ethanol and oleic acid. A PLS model to predict the elapsed time is built with the IR spectra and the kernel matrix is derived from the scores. The detailed analysis of the kernel matrix provides penetrating insight into the interaction between the ethanol and the oleic acid.
An Approximate Approach to Automatic Kernel Selection.
Ding, Lizhong; Liao, Shizhong
2016-02-02
Kernel selection is a fundamental problem of kernel-based learning algorithms. In this paper, we propose an approximate approach to automatic kernel selection for regression from the perspective of kernel matrix approximation. We first introduce multilevel circulant matrices into automatic kernel selection, and develop two approximate kernel selection algorithms by exploiting the computational virtues of multilevel circulant matrices. The complexity of the proposed algorithms is quasi-linear in the number of data points. Then, we prove an approximation error bound to measure the effect of the approximation in kernel matrices by multilevel circulant matrices on the hypothesis and further show that the approximate hypothesis produced with multilevel circulant matrices converges to the accurate hypothesis produced with kernel matrices. Experimental evaluations on benchmark datasets demonstrate the effectiveness of approximate kernel selection.
Adaptive kernel regression for freehand 3D ultrasound reconstruction
NASA Astrophysics Data System (ADS)
Alshalalfah, Abdel-Latif; Daoud, Mohammad I.; Al-Najar, Mahasen
2017-03-01
Freehand three-dimensional (3D) ultrasound imaging enables low-cost and flexible 3D scanning of arbitrary-shaped organs, where the operator can freely move a two-dimensional (2D) ultrasound probe to acquire a sequence of tracked cross-sectional images of the anatomy. Often, the acquired 2D ultrasound images are irregularly and sparsely distributed in the 3D space. Several 3D reconstruction algorithms have been proposed to synthesize 3D ultrasound volumes based on the acquired 2D images. A challenging task during the reconstruction process is to preserve the texture patterns in the synthesized volume and ensure that all gaps in the volume are correctly filled. This paper presents an adaptive kernel regression algorithm that can effectively reconstruct high-quality freehand 3D ultrasound volumes. The algorithm employs a kernel regression model that enables nonparametric interpolation of the voxel gray-level values. The kernel size of the regression model is adaptively adjusted based on the characteristics of the voxel that is being interpolated. In particular, when the algorithm is employed to interpolate a voxel located in a region with dense ultrasound data samples, the size of the kernel is reduced to preserve the texture patterns. On the other hand, the size of the kernel is increased in areas that include large gaps to enable effective gap filling. The performance of the proposed algorithm was compared with seven previous interpolation approaches by synthesizing freehand 3D ultrasound volumes of a benign breast tumor. The experimental results show that the proposed algorithm outperforms the other interpolation approaches.
On Quantile Regression in Reproducing Kernel Hilbert Spaces with Data Sparsity Constraint
Zhang, Chong; Liu, Yufeng; Wu, Yichao
2015-01-01
For spline regressions, it is well known that the choice of knots is crucial for the performance of the estimator. As a general learning framework covering the smoothing splines, learning in a Reproducing Kernel Hilbert Space (RKHS) has a similar issue. However, the selection of training data points for kernel functions in the RKHS representation has not been carefully studied in the literature. In this paper we study quantile regression as an example of learning in a RKHS. In this case, the regular squared norm penalty does not perform training data selection. We propose a data sparsity constraint that imposes thresholding on the kernel function coefficients to achieve a sparse kernel function representation. We demonstrate that the proposed data sparsity method can have competitive prediction performance for certain situations, and have comparable performance in other cases compared to that of the traditional squared norm penalty. Therefore, the data sparsity method can serve as a competitive alternative to the squared norm penalty method. Some theoretical properties of our proposed method using the data sparsity constraint are obtained. Both simulated and real data sets are used to demonstrate the usefulness of our data sparsity constraint. PMID:27134575
Deep Restricted Kernel Machines Using Conjugate Feature Duality.
Suykens, Johan A K
2017-08-01
The aim of this letter is to propose a theory of deep restricted kernel machines offering new foundations for deep learning with kernel machines. From the viewpoint of deep learning, it is partially related to restricted Boltzmann machines, which are characterized by visible and hidden units in a bipartite graph without hidden-to-hidden connections and deep learning extensions as deep belief networks and deep Boltzmann machines. From the viewpoint of kernel machines, it includes least squares support vector machines for classification and regression, kernel principal component analysis (PCA), matrix singular value decomposition, and Parzen-type models. A key element is to first characterize these kernel machines in terms of so-called conjugate feature duality, yielding a representation with visible and hidden units. It is shown how this is related to the energy form in restricted Boltzmann machines, with continuous variables in a nonprobabilistic setting. In this new framework of so-called restricted kernel machine (RKM) representations, the dual variables correspond to hidden features. Deep RKM are obtained by coupling the RKMs. The method is illustrated for deep RKM, consisting of three levels with a least squares support vector machine regression level and two kernel PCA levels. In its primal form also deep feedforward neural networks can be trained within this framework.
Javanrouh, Niloufar; Daneshpour, Maryam S; Soltanian, Ali Reza; Tapak, Leili
2018-06-05
Obesity is a serious health problem that leads to low quality of life and early mortality. To the purpose of prevention and gene therapy for such a worldwide disease, genome wide association study is a powerful tool for finding SNPs associated with increased risk of obesity. To conduct an association analysis, kernel machine regression is a generalized regression method, has an advantage of considering the epistasis effects as well as the correlation between individuals due to unknown factors. In this study, information of the people who participated in Tehran cardio-metabolic genetic study was used. They were genotyped for the chromosomal region, evaluation 986 variations located at 16q12.2; build 38hg. Kernel machine regression and single SNP analysis were used to assess the association between obesity and SNPs genotyped data. We found that associated SNP sets with obesity, were almost in the FTO (P = 0.01), AIKTIP (P = 0.02) and MMP2 (P = 0.02) genes. Moreover, two SNPs, i.e., rs10521296 and rs11647470, showed significant association with obesity using kernel regression (P = 0.02). In conclusion, significant sets were randomly distributed throughout the region with more density around the FTO, AIKTIP and MMP2 genes. Furthermore, two intergenic SNPs showed significant association after using kernel machine regression. Therefore, more studies have to be conducted to assess their functionality or precise mechanism. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Dougherty, Andrew W.
Metal oxides are a staple of the sensor industry. The combination of their sensitivity to a number of gases, and the electrical nature of their sensing mechanism, make the particularly attractive in solid state devices. The high temperature stability of the ceramic material also make them ideal for detecting combustion byproducts where exhaust temperatures can be high. However, problems do exist with metal oxide sensors. They are not very selective as they all tend to be sensitive to a number of reduction and oxidation reactions on the oxide's surface. This makes sensors with large numbers of sensors interesting to study as a method for introducing orthogonality to the system. Also, the sensors tend to suffer from long term drift for a number of reasons. In this thesis I will develop a system for intelligently modeling metal oxide sensors and determining their suitability for use in large arrays designed to analyze exhaust gas streams. It will introduce prior knowledge of the metal oxide sensors' response mechanisms in order to produce a response function for each sensor from sparse training data. The system will use the same technique to model and remove any long term drift from the sensor response. It will also provide an efficient means for determining the orthogonality of the sensor to determine whether they are useful in gas sensing arrays. The system is based on least squares support vector regression using the reciprocal kernel. The reciprocal kernel is introduced along with a method of optimizing the free parameters of the reciprocal kernel support vector machine. The reciprocal kernel is shown to be simpler and to perform better than an earlier kernel, the modified reciprocal kernel. Least squares support vector regression is chosen as it uses all of the training points and an emphasis was placed throughout this research for extracting the maximum information from very sparse data. The reciprocal kernel is shown to be effective in modeling the sensor responses in the time, gas and temperature domains, and the dual representation of the support vector regression solution is shown to provide insight into the sensor's sensitivity and potential orthogonality. Finally, the dual weights of the support vector regression solution to the sensor's response are suggested as a fitness function for a genetic algorithm, or some other method for efficiently searching large parameter spaces.
Mixed kernel function support vector regression for global sensitivity analysis
NASA Astrophysics Data System (ADS)
Cheng, Kai; Lu, Zhenzhou; Wei, Yuhao; Shi, Yan; Zhou, Yicheng
2017-11-01
Global sensitivity analysis (GSA) plays an important role in exploring the respective effects of input variables on an assigned output response. Amongst the wide sensitivity analyses in literature, the Sobol indices have attracted much attention since they can provide accurate information for most models. In this paper, a mixed kernel function (MKF) based support vector regression (SVR) model is employed to evaluate the Sobol indices at low computational cost. By the proposed derivation, the estimation of the Sobol indices can be obtained by post-processing the coefficients of the SVR meta-model. The MKF is constituted by the orthogonal polynomials kernel function and Gaussian radial basis kernel function, thus the MKF possesses both the global characteristic advantage of the polynomials kernel function and the local characteristic advantage of the Gaussian radial basis kernel function. The proposed approach is suitable for high-dimensional and non-linear problems. Performance of the proposed approach is validated by various analytical functions and compared with the popular polynomial chaos expansion (PCE). Results demonstrate that the proposed approach is an efficient method for global sensitivity analysis.
Ahn, Jae Joon; Kim, Young Min; Yoo, Keunje; Park, Joonhong; Oh, Kyong Joo
2012-11-01
For groundwater conservation and management, it is important to accurately assess groundwater pollution vulnerability. This study proposed an integrated model using ridge regression and a genetic algorithm (GA) to effectively select the major hydro-geological parameters influencing groundwater pollution vulnerability in an aquifer. The GA-Ridge regression method determined that depth to water, net recharge, topography, and the impact of vadose zone media were the hydro-geological parameters that influenced trichloroethene pollution vulnerability in a Korean aquifer. When using these selected hydro-geological parameters, the accuracy was improved for various statistical nonlinear and artificial intelligence (AI) techniques, such as multinomial logistic regression, decision trees, artificial neural networks, and case-based reasoning. These results provide a proof of concept that the GA-Ridge regression is effective at determining influential hydro-geological parameters for the pollution vulnerability of an aquifer, and in turn, improves the AI performance in assessing groundwater pollution vulnerability.
Using ridge regression in systematic pointing error corrections
NASA Technical Reports Server (NTRS)
Guiar, C. N.
1988-01-01
A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
Structured Kernel Subspace Learning for Autonomous Robot Navigation.
Kim, Eunwoo; Choi, Sungjoon; Oh, Songhwai
2018-02-14
This paper considers two important problems for autonomous robot navigation in a dynamic environment, where the goal is to predict pedestrian motion and control a robot with the prediction for safe navigation. While there are several methods for predicting the motion of a pedestrian and controlling a robot to avoid incoming pedestrians, it is still difficult to safely navigate in a dynamic environment due to challenges, such as the varying quality and complexity of training data with unwanted noises. This paper addresses these challenges simultaneously by proposing a robust kernel subspace learning algorithm based on the recent advances in nuclear-norm and l 1 -norm minimization. We model the motion of a pedestrian and the robot controller using Gaussian processes. The proposed method efficiently approximates a kernel matrix used in Gaussian process regression by learning low-rank structured matrix (with symmetric positive semi-definiteness) to find an orthogonal basis, which eliminates the effects of erroneous and inconsistent data. Based on structured kernel subspace learning, we propose a robust motion model and motion controller for safe navigation in dynamic environments. We evaluate the proposed robust kernel learning in various tasks, including regression, motion prediction, and motion control problems, and demonstrate that the proposed learning-based systems are robust against outliers and outperform existing regression and navigation methods.
GPU-accelerated Kernel Regression Reconstruction for Freehand 3D Ultrasound Imaging.
Wen, Tiexiang; Li, Ling; Zhu, Qingsong; Qin, Wenjian; Gu, Jia; Yang, Feng; Xie, Yaoqin
2017-07-01
Volume reconstruction method plays an important role in improving reconstructed volumetric image quality for freehand three-dimensional (3D) ultrasound imaging. By utilizing the capability of programmable graphics processing unit (GPU), we can achieve a real-time incremental volume reconstruction at a speed of 25-50 frames per second (fps). After incremental reconstruction and visualization, hole-filling is performed on GPU to fill remaining empty voxels. However, traditional pixel nearest neighbor-based hole-filling fails to reconstruct volume with high image quality. On the contrary, the kernel regression provides an accurate volume reconstruction method for 3D ultrasound imaging but with the cost of heavy computational complexity. In this paper, a GPU-based fast kernel regression method is proposed for high-quality volume after the incremental reconstruction of freehand ultrasound. The experimental results show that improved image quality for speckle reduction and details preservation can be obtained with the parameter setting of kernel window size of [Formula: see text] and kernel bandwidth of 1.0. The computational performance of the proposed GPU-based method can be over 200 times faster than that on central processing unit (CPU), and the volume with size of 50 million voxels in our experiment can be reconstructed within 10 seconds.
Gaussian process regression for geometry optimization
NASA Astrophysics Data System (ADS)
Denzel, Alexander; Kästner, Johannes
2018-03-01
We implemented a geometry optimizer based on Gaussian process regression (GPR) to find minimum structures on potential energy surfaces. We tested both a two times differentiable form of the Matérn kernel and the squared exponential kernel. The Matérn kernel performs much better. We give a detailed description of the optimization procedures. These include overshooting the step resulting from GPR in order to obtain a higher degree of interpolation vs. extrapolation. In a benchmark against the Limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer of the DL-FIND library on 26 test systems, we found the new optimizer to generally reduce the number of required optimization steps.
The Variance Normalization Method of Ridge Regression Analysis.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate…
Simultaneous multiple non-crossing quantile regression estimation using kernel constraints
Liu, Yufeng; Wu, Yichao
2011-01-01
Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation. PMID:22190842
Out-of-Sample Extensions for Non-Parametric Kernel Methods.
Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang
2017-02-01
Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
OPC modeling by genetic algorithm
NASA Astrophysics Data System (ADS)
Huang, W. C.; Lai, C. M.; Luo, B.; Tsai, C. K.; Tsay, C. S.; Lai, C. W.; Kuo, C. C.; Liu, R. G.; Lin, H. T.; Lin, B. J.
2005-05-01
Optical proximity correction (OPC) is usually used to pre-distort mask layouts to make the printed patterns as close to the desired shapes as possible. For model-based OPC, a lithographic model to predict critical dimensions after lithographic processing is needed. The model is usually obtained via a regression of parameters based on experimental data containing optical proximity effects. When the parameters involve a mix of the continuous (optical and resist models) and the discrete (kernel numbers) sets, the traditional numerical optimization method may have difficulty handling model fitting. In this study, an artificial-intelligent optimization method was used to regress the parameters of the lithographic models for OPC. The implemented phenomenological models were constant-threshold models that combine diffused aerial image models with loading effects. Optical kernels decomposed from Hopkin"s equation were used to calculate aerial images on the wafer. Similarly, the numbers of optical kernels were treated as regression parameters. This way, good regression results were obtained with different sets of optical proximity effect data.
Ridge Regression Signal Processing
NASA Technical Reports Server (NTRS)
Kuhl, Mark R.
1990-01-01
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
Feature Selection for Ridge Regression with Provable Guarantees.
Paul, Saurabh; Drineas, Petros
2016-04-01
We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets; a subset of TechTC-300 data sets, to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
Optimizing Support Vector Machine Parameters with Genetic Algorithm for Credit Risk Assessment
NASA Astrophysics Data System (ADS)
Manurung, Jonson; Mawengkang, Herman; Zamzami, Elviawaty
2017-12-01
Support vector machine (SVM) is a popular classification method known to have strong generalization capabilities. SVM can solve the problem of classification and linear regression or nonlinear kernel which can be a learning algorithm for the ability of classification and regression. However, SVM also has a weakness that is difficult to determine the optimal parameter value. SVM calculates the best linear separator on the input feature space according to the training data. To classify data which are non-linearly separable, SVM uses kernel tricks to transform the data into a linearly separable data on a higher dimension feature space. The kernel trick using various kinds of kernel functions, such as : linear kernel, polynomial, radial base function (RBF) and sigmoid. Each function has parameters which affect the accuracy of SVM classification. To solve the problem genetic algorithms are proposed to be applied as the optimal parameter value search algorithm thus increasing the best classification accuracy on SVM. Data taken from UCI repository of machine learning database: Australian Credit Approval. The results show that the combination of SVM and genetic algorithms is effective in improving classification accuracy. Genetic algorithms has been shown to be effective in systematically finding optimal kernel parameters for SVM, instead of randomly selected kernel parameters. The best accuracy for data has been upgraded from kernel Linear: 85.12%, polynomial: 81.76%, RBF: 77.22% Sigmoid: 78.70%. However, for bigger data sizes, this method is not practical because it takes a lot of time.
Multiple kernel SVR based on the MRE for remote sensing water depth fusion detection
NASA Astrophysics Data System (ADS)
Wang, Jinjin; Ma, Yi; Zhang, Jingyu
2018-03-01
Remote sensing has an important means of water depth detection in coastal shallow waters and reefs. Support vector regression (SVR) is a machine learning method which is widely used in data regression. In this paper, SVR is used to remote sensing multispectral bathymetry. Aiming at the problem that the single-kernel SVR method has a large error in shallow water depth inversion, the mean relative error (MRE) of different water depth is retrieved as a decision fusion factor with single kernel SVR method, a multi kernel SVR fusion method based on the MRE is put forward. And taking the North Island of the Xisha Islands in China as an experimentation area, the comparison experiments with the single kernel SVR method and the traditional multi-bands bathymetric method are carried out. The results show that: 1) In range of 0 to 25 meters, the mean absolute error(MAE)of the multi kernel SVR fusion method is 1.5m,the MRE is 13.2%; 2) Compared to the 4 single kernel SVR method, the MRE of the fusion method reduced 1.2% (1.9%) 3.4% (1.8%), and compared to traditional multi-bands method, the MRE reduced 1.9%; 3) In 0-5m depth section, compared to the single kernel method and the multi-bands method, the MRE of fusion method reduced 13.5% to 44.4%, and the distribution of points is more concentrated relative to y=x.
An enhanced structure tensor method for sea ice ridge detection from GF-3 SAR imagery
NASA Astrophysics Data System (ADS)
Zhu, T.; Li, F.; Zhang, Y.; Zhang, S.; Spreen, G.; Dierking, W.; Heygster, G.
2017-12-01
In SAR imagery, ridges or leads are shown as the curvilinear features. The proposed ridge detection method is facilitated by their curvilinear shapes. The bright curvilinear features are recognized as the ridges while the dark curvilinear features are classified as the leads. In dual-polarization HH or HV channel of C-band SAR imagery, the bright curvilinear feature may be false alarm because the frost flowers of young leads may show as bright pixels associated with changes in the surface salinity under calm surface conditions. Wind roughened leads also trigger the backscatter increasing that can be misclassified as ridges [1]. Thus the width limitation is considered in this proposed structure tensor method [2], since only shape feature based method is not enough for detecting ridges. The ridge detection algorithm is based on the hypothesis that the bright pixels are ridges with curvilinear shapes and the ridge width is less 30 meters. Benefited from GF-3 with high spatial resolution of 3 meters, we provide an enhanced structure tensor method for detecting the significant ridge. The preprocessing procedures including the calibration and incidence angle normalization are also investigated. The bright pixels will have strong response to the bandpass filtering. The ridge training samples are delineated from the SAR imagery in the Log-Gabor filters to construct structure tensor. From the tensor, the dominant orientation of the pixel representing the ridge is determined by the dominant eigenvector. For the post-processing of structure tensor, the elongated kernel is desired to enhance the ridge curvilinear shape. Since ridge presents along a certain direction, the ratio of the dominant eigenvector will be used to measure the intensity of local anisotropy. The convolution filter has been utilized in the constructed structure tensor is used to model spatial contextual information. Ridge detection results from GF-3 show the proposed method performs better compared to the direct threshold method.
Penalized nonparametric scalar-on-function regression via principal coordinates
Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu
2016-01-01
A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963
A locally adaptive kernel regression method for facies delineation
NASA Astrophysics Data System (ADS)
Fernàndez-Garcia, D.; Barahona-Palomo, M.; Henri, C. V.; Sanchez-Vila, X.
2015-12-01
Facies delineation is defined as the separation of geological units with distinct intrinsic characteristics (grain size, hydraulic conductivity, mineralogical composition). A major challenge in this area stems from the fact that only a few scattered pieces of hydrogeological information are available to delineate geological facies. Several methods to delineate facies are available in the literature, ranging from those based only on existing hard data, to those including secondary data or external knowledge about sedimentological patterns. This paper describes a methodology to use kernel regression methods as an effective tool for facies delineation. The method uses both the spatial and the actual sampled values to produce, for each individual hard data point, a locally adaptive steering kernel function, self-adjusting the principal directions of the local anisotropic kernels to the direction of highest local spatial correlation. The method is shown to outperform the nearest neighbor classification method in a number of synthetic aquifers whenever the available number of hard data is small and randomly distributed in space. In the case of exhaustive sampling, the steering kernel regression method converges to the true solution. Simulations ran in a suite of synthetic examples are used to explore the selection of kernel parameters in typical field settings. It is shown that, in practice, a rule of thumb can be used to obtain suboptimal results. The performance of the method is demonstrated to significantly improve when external information regarding facies proportions is incorporated. Remarkably, the method allows for a reasonable reconstruction of the facies connectivity patterns, shown in terms of breakthrough curves performance.
Multineuron spike train analysis with R-convolution linear combination kernel.
Tezuka, Taro
2018-06-01
A spike train kernel provides an effective way of decoding information represented by a spike train. Some spike train kernels have been extended to multineuron spike trains, which are simultaneously recorded spike trains obtained from multiple neurons. However, most of these multineuron extensions were carried out in a kernel-specific manner. In this paper, a general framework is proposed for extending any single-neuron spike train kernel to multineuron spike trains, based on the R-convolution kernel. Special subclasses of the proposed R-convolution linear combination kernel are explored. These subclasses have a smaller number of parameters and make optimization tractable when the size of data is limited. The proposed kernel was evaluated using Gaussian process regression for multineuron spike trains recorded from an animal brain. It was compared with the sum kernel and the population Spikernel, which are existing ways of decoding multineuron spike trains using kernels. The results showed that the proposed approach performs better than these kernels and also other commonly used neural decoding methods. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Pham, Tien-Lam; Nguyen, Nguyen-Duong; Nguyen, Van-Doan; Kino, Hiori; Miyake, Takashi; Dam, Hieu-Chi
2018-05-01
We have developed a descriptor named Orbital Field Matrix (OFM) for representing material structures in datasets of multi-element materials. The descriptor is based on the information regarding atomic valence shell electrons and their coordination. In this work, we develop an extension of OFM called OFM1. We have shown that these descriptors are highly applicable in predicting the physical properties of materials and in providing insights on the materials space by mapping into a low embedded dimensional space. Our experiments with transition metal/lanthanide metal alloys show that the local magnetic moments and formation energies can be accurately reproduced using simple nearest-neighbor regression, thus confirming the relevance of our descriptors. Using kernel ridge regressions, we could accurately reproduce formation energies and local magnetic moments calculated based on first-principles, with mean absolute errors of 0.03 μB and 0.10 eV/atom, respectively. We show that meaningful low-dimensional representations can be extracted from the original descriptor using descriptive learning algorithms. Intuitive prehension on the materials space, qualitative evaluation on the similarities in local structures or crystalline materials, and inference in the designing of new materials by element substitution can be performed effectively based on these low-dimensional representations.
Alamaniotis, Miltiadis; Bargiotas, Dimitrios; Tsoukalas, Lefteri H
2016-01-01
Integration of energy systems with information technologies has facilitated the realization of smart energy systems that utilize information to optimize system operation. To that end, crucial in optimizing energy system operation is the accurate, ahead-of-time forecasting of load demand. In particular, load forecasting allows planning of system expansion, and decision making for enhancing system safety and reliability. In this paper, the application of two types of kernel machines for medium term load forecasting (MTLF) is presented and their performance is recorded based on a set of historical electricity load demand data. The two kernel machine models and more specifically Gaussian process regression (GPR) and relevance vector regression (RVR) are utilized for making predictions over future load demand. Both models, i.e., GPR and RVR, are equipped with a Gaussian kernel and are tested on daily predictions for a 30-day-ahead horizon taken from the New England Area. Furthermore, their performance is compared to the ARMA(2,2) model with respect to mean average percentage error and squared correlation coefficient. Results demonstrate the superiority of RVR over the other forecasting models in performing MTLF.
graphkernels: R and Python packages for graph comparison
Ghisu, M Elisabetta; Llinares-López, Felipe; Borgwardt, Karsten
2018-01-01
Abstract Summary Measuring the similarity of graphs is a fundamental step in the analysis of graph-structured data, which is omnipresent in computational biology. Graph kernels have been proposed as a powerful and efficient approach to this problem of graph comparison. Here we provide graphkernels, the first R and Python graph kernel libraries including baseline kernels such as label histogram based kernels, classic graph kernels such as random walk based kernels, and the state-of-the-art Weisfeiler-Lehman graph kernel. The core of all graph kernels is implemented in C ++ for efficiency. Using the kernel matrices computed by the package, we can easily perform tasks such as classification, regression and clustering on graph-structured samples. Availability and implementation The R and Python packages including source code are available at https://CRAN.R-project.org/package=graphkernels and https://pypi.python.org/pypi/graphkernels. Contact mahito@nii.ac.jp or elisabetta.ghisu@bsse.ethz.ch Supplementary information Supplementary data are available online at Bioinformatics. PMID:29028902
graphkernels: R and Python packages for graph comparison.
Sugiyama, Mahito; Ghisu, M Elisabetta; Llinares-López, Felipe; Borgwardt, Karsten
2018-02-01
Measuring the similarity of graphs is a fundamental step in the analysis of graph-structured data, which is omnipresent in computational biology. Graph kernels have been proposed as a powerful and efficient approach to this problem of graph comparison. Here we provide graphkernels, the first R and Python graph kernel libraries including baseline kernels such as label histogram based kernels, classic graph kernels such as random walk based kernels, and the state-of-the-art Weisfeiler-Lehman graph kernel. The core of all graph kernels is implemented in C ++ for efficiency. Using the kernel matrices computed by the package, we can easily perform tasks such as classification, regression and clustering on graph-structured samples. The R and Python packages including source code are available at https://CRAN.R-project.org/package=graphkernels and https://pypi.python.org/pypi/graphkernels. mahito@nii.ac.jp or elisabetta.ghisu@bsse.ethz.ch. Supplementary data are available online at Bioinformatics. © The Author(s) 2017. Published by Oxford University Press.
Comparison of buried sand ridges and regressive sand ridges on the outer shelf of the East China Sea
NASA Astrophysics Data System (ADS)
Wu, Ziyin; Jin, Xianglong; Zhou, Jieqiong; Zhao, Dineng; Shang, Jihong; Li, Shoujun; Cao, Zhenyi; Liang, Yuyang
2017-06-01
Based on multi-beam echo soundings and high-resolution single-channel seismic profiles, linear sand ridges in U14 and U2 on the East China Sea (ECS) shelf are identified and compared in detail. Linear sand ridges in U14 are buried sand ridges, which are 90 m below the seafloor. It is presumed that these buried sand ridges belong to the transgressive systems tract (TST) formed 320-200 ka ago and that their top interface is the maximal flooding surface (MFS). Linear sand ridges in U2 are regressive sand ridges. It is presumed that these buried sand ridges belong to the TST of the last glacial maximum (LGM) and that their top interface is the MFS of the LGM. Four sub-stage sand ridges of U2 are discerned from the high-resolution single-channel seismic profile and four strikes of regressive sand ridges are distinguished from the submarine topographic map based on the multi-beam echo soundings. These multi-stage and multi-strike linear sand ridges are the response of, and evidence for, the evolution of submarine topography with respect to sea-level fluctuations since the LGM. Although the difference in the age of formation between U14 and U2 is 200 ka and their sequences are 90 m apart, the general strikes of the sand ridges are similar. This indicates that the basic configuration of tidal waves on the ECS shelf has been stable for the last 200 ka. A basic evolutionary model of the strata of the ECS shelf is proposed, in which sea-level change is the controlling factor. During the sea-level change of about 100 ka, five to six strata are developed and the sand ridges develop in the TST. A similar story of the evolution of paleo-topography on the ECS shelf has been repeated during the last 300 ka.
NASA Astrophysics Data System (ADS)
Verrelst, Jochem; Rivera, J. P.; Alonso, L.; Guanter, L.; Moreno, J.
2012-04-01
ESA’s upcoming satellites Sentinel-2 (S2) and Sentinel-3 (S3) aim to ensure continuity for Landsat 5/7, SPOT- 5, SPOT-Vegetation and Envisat MERIS observations by providing superspectral images of high spatial and temporal resolution. S2 and S3 will deliver near real-time operational products with a high accuracy for land monitoring. This unprecedented data availability leads to an urgent need for developing robust and accurate retrieval methods. Machine learning regression algorithms could be powerful candidates for the estimation of biophysical parameters from satellite reflectance measurements because of their ability to perform adaptive, nonlinear data fitting. By using data from the ESA-led field campaign SPARC (Barrax, Spain), it was recently found [1] that Gaussian processes regression (GPR) outperformed competitive machine learning algorithms such as neural networks, support vector regression) and kernel ridge regression both in terms of accuracy and computational speed. For various Sentinel configurations (S2-10m, S2- 20m, S2-60m and S3-300m) three important biophysical parameters were estimated: leaf chlorophyll content (Chl), leaf area index (LAI) and fractional vegetation cover (FVC). GPR was the only method that reached the 10% precision required by end users in the estimation of Chl. In view of implementing the regressor into operational monitoring applications, here the portability of locally trained GPR models to other images was evaluated. The associated confidence maps proved to be a good indicator for evaluating the robustness of the trained models. Consistent retrievals were obtained across the different images, particularly over agricultural sites. To make the method suitable for operational use, however, the poorer confidences over bare soil areas suggest that the training dataset should be expanded with inputs from various land cover types.
Ridge Regression for Interactive Models.
ERIC Educational Resources Information Center
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Omnibus Risk Assessment via Accelerated Failure Time Kernel Machine Modeling
Sinnott, Jennifer A.; Cai, Tianxi
2013-01-01
Summary Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai et al., 2011). In this paper, we derive testing and prediction methods for KM regression under the accelerated failure time model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. PMID:24328713
Compound analysis via graph kernels incorporating chirality.
Brown, J B; Urata, Takashi; Tamura, Takeyuki; Arai, Midori A; Kawabata, Takeo; Akutsu, Tatsuya
2010-12-01
High accuracy is paramount when predicting biochemical characteristics using Quantitative Structural-Property Relationships (QSPRs). Although existing graph-theoretic kernel methods combined with machine learning techniques are efficient for QSPR model construction, they cannot distinguish topologically identical chiral compounds which often exhibit different biological characteristics. In this paper, we propose a new method that extends the recently developed tree pattern graph kernel to accommodate stereoisomers. We show that Support Vector Regression (SVR) with a chiral graph kernel is useful for target property prediction by demonstrating its application to a set of human vitamin D receptor ligands currently under consideration for their potential anti-cancer effects.
Sepsis mortality prediction with the Quotient Basis Kernel.
Ribas Ripoll, Vicent J; Vellido, Alfredo; Romero, Enrique; Ruiz-Rodríguez, Juan Carlos
2014-05-01
This paper presents an algorithm to assess the risk of death in patients with sepsis. Sepsis is a common clinical syndrome in the intensive care unit (ICU) that can lead to severe sepsis, a severe state of septic shock or multi-organ failure. The proposed algorithm may be implemented as part of a clinical decision support system that can be used in combination with the scores deployed in the ICU to improve the accuracy, sensitivity and specificity of mortality prediction for patients with sepsis. In this paper, we used the Simplified Acute Physiology Score (SAPS) for ICU patients and the Sequential Organ Failure Assessment (SOFA) to build our kernels and algorithms. In the proposed method, we embed the available data in a suitable feature space and use algorithms based on linear algebra, geometry and statistics for inference. We present a simplified version of the Fisher kernel (practical Fisher kernel for multinomial distributions), as well as a novel kernel that we named the Quotient Basis Kernel (QBK). These kernels are used as the basis for mortality prediction using soft-margin support vector machines. The two new kernels presented are compared against other generative kernels based on the Jensen-Shannon metric (centred, exponential and inverse) and other widely used kernels (linear, polynomial and Gaussian). Clinical relevance is also evaluated by comparing these results with logistic regression and the standard clinical prediction method based on the initial SAPS score. As described in this paper, we tested the new methods via cross-validation with a cohort of 400 test patients. The results obtained using our methods compare favourably with those obtained using alternative kernels (80.18% accuracy for the QBK) and the standard clinical prediction method, which are based on the basal SAPS score or logistic regression (71.32% and 71.55%, respectively). The QBK presented a sensitivity and specificity of 79.34% and 83.24%, which outperformed the other kernels analysed, logistic regression and the standard clinical prediction method based on the basal SAPS score. Several scoring systems for patients with sepsis have been introduced and developed over the last 30 years. They allow for the assessment of the severity of disease and provide an estimate of in-hospital mortality. Physiology-based scoring systems are applied to critically ill patients and have a number of advantages over diagnosis-based systems. Severity score systems are often used to stratify critically ill patients for possible inclusion in clinical trials. In this paper, we present an effective algorithm that combines both scoring methodologies for the assessment of death in patients with sepsis that can be used to improve the sensitivity and specificity of the currently available methods. Copyright © 2014 Elsevier B.V. All rights reserved.
Kernel-based whole-genome prediction of complex traits: a review.
Morota, Gota; Gianola, Daniel
2014-01-01
Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial bias function (RBF) kernels the non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The e-insensitivity based loss function was used for SVR modeling. Selection of the design parameters which includes capacity (C), insensitivity region (e) and the RBF kernel parameter (sigma) was made based on a grid search approach and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
Omnibus risk assessment via accelerated failure time kernel machine modeling.
Sinnott, Jennifer A; Cai, Tianxi
2013-12-01
Integrating genomic information with traditional clinical risk factors to improve the prediction of disease outcomes could profoundly change the practice of medicine. However, the large number of potential markers and possible complexity of the relationship between markers and disease make it difficult to construct accurate risk prediction models. Standard approaches for identifying important markers often rely on marginal associations or linearity assumptions and may not capture non-linear or interactive effects. In recent years, much work has been done to group genes into pathways and networks. Integrating such biological knowledge into statistical learning could potentially improve model interpretability and reliability. One effective approach is to employ a kernel machine (KM) framework, which can capture nonlinear effects if nonlinear kernels are used (Scholkopf and Smola, 2002; Liu et al., 2007, 2008). For survival outcomes, KM regression modeling and testing procedures have been derived under a proportional hazards (PH) assumption (Li and Luan, 2003; Cai, Tonini, and Lin, 2011). In this article, we derive testing and prediction methods for KM regression under the accelerated failure time (AFT) model, a useful alternative to the PH model. We approximate the null distribution of our test statistic using resampling procedures. When multiple kernels are of potential interest, it may be unclear in advance which kernel to use for testing and estimation. We propose a robust Omnibus Test that combines information across kernels, and an approach for selecting the best kernel for estimation. The methods are illustrated with an application in breast cancer. © 2013, The International Biometric Society.
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients
NASA Astrophysics Data System (ADS)
Gorgees, HazimMansoor; Mahdi, FatimahAssim
2018-05-01
This article concerns with comparing the performance of different types of ordinary ridge regression estimators that have been already proposed to estimate the regression parameters when the near exact linear relationships among the explanatory variables is presented. For this situations we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than other methods since it has smaller mean square error (MSE) than the other stated methods.
Zhao, Ni; Chen, Jun; Carroll, Ian M.; Ringel-Kulka, Tamar; Epstein, Michael P.; Zhou, Hua; Zhou, Jin J.; Ringel, Yehuda; Li, Hongzhe; Wu, Michael C.
2015-01-01
High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Distance-based analysis is a popular strategy for evaluating the overall association between microbiome diversity and outcome, wherein the phylogenetic distance between individuals’ microbiome profiles is computed and tested for association via permutation. Despite their practical popularity, distance-based approaches suffer from important challenges, especially in selecting the best distance and extending the methods to alternative outcomes, such as survival outcomes. We propose the microbiome regression-based kernel association test (MiRKAT), which directly regresses the outcome on the microbiome profiles via the semi-parametric kernel machine regression framework. MiRKAT allows for easy covariate adjustment and extension to alternative outcomes while non-parametrically modeling the microbiome through a kernel that incorporates phylogenetic distance. It uses a variance-component score statistic to test for the association with analytical p value calculation. The model also allows simultaneous examination of multiple distances, alleviating the problem of choosing the best distance. Our simulations demonstrated that MiRKAT provides correctly controlled type I error and adequate power in detecting overall association. “Optimal” MiRKAT, which considers multiple candidate distances, is robust in that it suffers from little power loss in comparison to when the best distance is used and can achieve tremendous power gain in comparison to when a poor distance is chosen. Finally, we applied MiRKAT to real microbiome datasets to show that microbial communities are associated with smoking and with fecal protease levels after confounders are controlled for. PMID:25957468
Fast metabolite identification with Input Output Kernel Regression.
Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho
2016-06-15
An important problematic of metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated to the output kernel. The second phase is a preimage problem, consisting in mapping back the predicted output feature vectors to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. celine.brouard@aalto.fi Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Fast metabolite identification with Input Output Kernel Regression
Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho
2016-01-01
Motivation: An important problematic of metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated to the output kernel. The second phase is a preimage problem, consisting in mapping back the predicted output feature vectors to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307628
Javed, Faizan; Savkin, Andrey V; Chan, Gregory S H; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-11-01
This study aims to assess the blood volume and heart rate (HR) responses during haemodialysis in fluid overloaded patients by a nonparametric nonlinear regression approach based on a support vector machine (SVM). Relative blood volume (RBV) and electrocardiogram (ECG) was recorded from 23 haemodynamically stable renal failure patients during regular haemodialysis. Modelling was performed on 18 fluid overloaded patients (fluid removal of >2 L). SVM-based regression was used to obtain the models of RBV change with time as well as the percentage change in HR with respect to RBV. Mean squared error (MSE) and goodness of fit (R(2)) were used for comparison among different kernel functions. The design parameters were estimated using a grid search approach and the selected models were validated by a k-fold cross-validation technique. For the model of HR versus RBV change, a radial basis function (RBF) kernel (MSE = 17.37 and R(2) = 0.932) gave the least MSE compared to linear (MSE = 25.97 and R(2) = 0.898) and polynomial (MSE = 18.18 and R(2)= 0.929). The MSE was significantly lower for training data set when using RBF kernel compared to other kernels (p < 0.01). The RBF kernel also provided a slightly better fit of RBV change with time (MSE = 1.12 and R(2) = 0.91) compared to a linear kernel (MSE = 1.46 and R(2) = 0.88). The modelled HR response was characterized by an initial drop and a subsequent rise during progressive reduction in RBV, which may be interpreted as the reflex response to a transition from central hypervolaemia to hypovolaemia. These modelled curves can be used as references to a controller that can be designed to regulate the haemodynamic variables to ensure the stability of patients undergoing haemodialysis.
Influence of wheat kernel physical properties on the pulverizing process.
Dziki, Dariusz; Cacak-Pietrzak, Grażyna; Miś, Antoni; Jończyk, Krzysztof; Gawlik-Dziki, Urszula
2014-10-01
The physical properties of wheat kernel were determined and related to pulverizing performance by correlation analysis. Nineteen samples of wheat cultivars about similar level of protein content (11.2-12.8 % w.b.) and obtained from organic farming system were used for analysis. The kernel (moisture content 10 % w.b.) was pulverized by using the laboratory hammer mill equipped with round holes 1.0 mm screen. The specific grinding energy ranged from 120 kJkg(-1) to 159 kJkg(-1). On the basis of data obtained many of significant correlations (p < 0.05) were found between wheat kernel physical properties and pulverizing process of wheat kernel, especially wheat kernel hardness index (obtained on the basis of Single Kernel Characterization System) and vitreousness significantly and positively correlated with the grinding energy indices and the mass fraction of coarse particles (> 0.5 mm). Among the kernel mechanical properties determined on the basis of uniaxial compression test only the rapture force was correlated with the impact grinding results. The results showed also positive and significant relationships between kernel ash content and grinding energy requirements. On the basis of wheat physical properties the multiple linear regression was proposed for predicting the average particle size of pulverized kernel.
Effects of Employing Ridge Regression in Structural Equation Models.
ERIC Educational Resources Information Center
McQuitty, Shaun
1997-01-01
LISREL 8 invokes a ridge option when maximum likelihood or generalized least squares are used to estimate a structural equation model with a nonpositive definite covariance or correlation matrix. Implications of the ridge option for model fit, parameter estimates, and standard errors are explored through two examples. (SLD)
Nondestructive In Situ Measurement Method for Kernel Moisture Content in Corn Ear.
Zhang, Han-Lin; Ma, Qin; Fan, Li-Feng; Zhao, Peng-Fei; Wang, Jian-Xu; Zhang, Xiao-Dong; Zhu, De-Hai; Huang, Lan; Zhao, Dong-Jie; Wang, Zhong-Yi
2016-12-20
Moisture content is an important factor in corn breeding and cultivation. A corn breed with low moisture at harvest is beneficial for mechanical operations, reduces drying and storage costs after harvesting and, thus, reduces energy consumption. Nondestructive measurement of kernel moisture in an intact corn ear allows us to select corn varieties with seeds that have high dehydration speeds in the mature period. We designed a sensor using a ring electrode pair for nondestructive measurement of the kernel moisture in a corn ear based on a high-frequency detection circuit. Through experiments using the effective scope of the electrodes' electric field, we confirmed that the moisture in the corn cob has little effect on corn kernel moisture measurement. Before the sensor was applied in practice, we investigated temperature and conductivity effects on the output impedance. Results showed that the temperature was linearly related to the output impedance (both real and imaginary parts) of the measurement electrodes and the detection circuit's output voltage. However, the conductivity has a non-monotonic dependence on the output impedance (both real and imaginary parts) of the measurement electrodes and the output voltage of the high-frequency detection circuit. Therefore, we reduced the effect of conductivity on the measurement results through measurement frequency selection. Corn moisture measurement results showed a quadric regression between corn ear moisture and the imaginary part of the output impedance, and there is also a quadric regression between corn kernel moisture and the high-frequency detection circuit output voltage at 100 MHz. In this study, two corn breeds were measured using our sensor and gave R ² values for the quadric regression equation of 0.7853 and 0.8496.
Hybrid approach of selecting hyperparameters of support vector machine for regression.
Jeng, Jin-Tsong
2006-06-01
To select the hyperparameters of the support vector machine for regression (SVR), a hybrid approach is proposed to determine the kernel parameter of the Gaussian kernel function and the epsilon value of Vapnik's epsilon-insensitive loss function. The proposed hybrid approach includes a competitive agglomeration (CA) clustering algorithm and a repeated SVR (RSVR) approach. Since the CA clustering algorithm is used to find the nearly "optimal" number of clusters and the centers of clusters in the clustering process, the CA clustering algorithm is applied to select the Gaussian kernel parameter. Additionally, an RSVR approach that relies on the standard deviation of a training error is proposed to obtain an epsilon in the loss function. Finally, two functions, one real data set (i.e., a time series of quarterly unemployment rate for West Germany) and an identification of nonlinear plant are used to verify the usefulness of the hybrid approach.
Least square regularized regression in sum space.
Xu, Yong-Li; Chen, Di-Rong; Li, Han-Xiong; Liu, Lu
2013-04-01
This paper proposes a least square regularized regression algorithm in sum space of reproducing kernel Hilbert spaces (RKHSs) for nonflat function approximation, and obtains the solution of the algorithm by solving a system of linear equations. This algorithm can approximate the low- and high-frequency component of the target function with large and small scale kernels, respectively. The convergence and learning rate are analyzed. We measure the complexity of the sum space by its covering number and demonstrate that the covering number can be bounded by the product of the covering numbers of basic RKHSs. For sum space of RKHSs with Gaussian kernels, by choosing appropriate parameters, we tradeoff the sample error and regularization error, and obtain a polynomial learning rate, which is better than that in any single RKHS. The utility of this method is illustrated with two simulated data sets and five real-life databases.
NASA Astrophysics Data System (ADS)
Nugroho, N. F. T. A.; Slamet, I.
2018-05-01
Poverty is a socio-economic condition of a person or group of people who can not fulfil their basic need to maintain and develop a dignified life. This problem still cannot be solved completely in Central Java Province. Currently, the percentage of poverty in Central Java is 13.32% which is higher than the national poverty rate which is 11.13%. In this research, data of percentage of poor people in Central Java Province has been analyzed through geographically weighted regression (GWR). The aim of this research is therefore to model poverty percentage data in Central Java Province using GWR with weighted function of kernel bisquare, and tricube. As the results, we obtained GWR model with bisquare and tricube kernel weighted function on poverty percentage data in Central Java province. From the GWR model, there are three categories of region which are influenced by different of significance factors.
An information theoretic approach of designing sparse kernel adaptive filters.
Liu, Weifeng; Park, Il; Principe, José C
2009-12-01
This paper discusses an information theoretic approach of designing sparse kernel adaptive filters. To determine useful data to be learned and remove redundant ones, a subjective information measure called surprise is introduced. Surprise captures the amount of information a datum contains which is transferable to a learning system. Based on this concept, we propose a systematic sparsification scheme, which can drastically reduce the time and space complexity without harming the performance of kernel adaptive filters. Nonlinear regression, short term chaotic time-series prediction, and long term time-series forecasting examples are presented.
NASA Astrophysics Data System (ADS)
Ma, Zhi-Sai; Liu, Li; Zhou, Si-Da; Yu, Lei; Naets, Frank; Heylen, Ward; Desmet, Wim
2018-01-01
The problem of parametric output-only identification of time-varying structures in a recursive manner is considered. A kernelized time-dependent autoregressive moving average (TARMA) model is proposed by expanding the time-varying model parameters onto the basis set of kernel functions in a reproducing kernel Hilbert space. An exponentially weighted kernel recursive extended least squares TARMA identification scheme is proposed, and a sliding-window technique is subsequently applied to fix the computational complexity for each consecutive update, allowing the method to operate online in time-varying environments. The proposed sliding-window exponentially weighted kernel recursive extended least squares TARMA method is employed for the identification of a laboratory time-varying structure consisting of a simply supported beam and a moving mass sliding on it. The proposed method is comparatively assessed against an existing recursive pseudo-linear regression TARMA method via Monte Carlo experiments and shown to be capable of accurately tracking the time-varying dynamics. Furthermore, the comparisons demonstrate the superior achievable accuracy, lower computational complexity and enhanced online identification capability of the proposed kernel recursive extended least squares TARMA approach.
Lu, Zhao; Sun, Jing; Butts, Kenneth
2014-05-01
Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
Adaptive Laplacian filtering for sensorimotor rhythm-based brain-computer interfaces.
Lu, Jun; McFarland, Dennis J; Wolpaw, Jonathan R
2013-02-01
Sensorimotor rhythms (SMRs) are 8-30 Hz oscillations in the electroencephalogram (EEG) recorded from the scalp over sensorimotor cortex that change with movement and/or movement imagery. Many brain-computer interface (BCI) studies have shown that people can learn to control SMR amplitudes and can use that control to move cursors and other objects in one, two or three dimensions. At the same time, if SMR-based BCIs are to be useful for people with neuromuscular disabilities, their accuracy and reliability must be improved substantially. These BCIs often use spatial filtering methods such as common average reference (CAR), Laplacian (LAP) filter or common spatial pattern (CSP) filter to enhance the signal-to-noise ratio of EEG. Here, we test the hypothesis that a new filter design, called an 'adaptive Laplacian (ALAP) filter', can provide better performance for SMR-based BCIs. An ALAP filter employs a Gaussian kernel to construct a smooth spatial gradient of channel weights and then simultaneously seeks the optimal kernel radius of this spatial filter and the regularization parameter of linear ridge regression. This optimization is based on minimizing the leave-one-out cross-validation error through a gradient descent method and is computationally feasible. Using a variety of kinds of BCI data from a total of 22 individuals, we compare the performances of ALAP filter to CAR, small LAP, large LAP and CSP filters. With a large number of channels and limited data, ALAP performs significantly better than CSP, CAR, small LAP and large LAP both in classification accuracy and in mean-squared error. Using fewer channels restricted to motor areas, ALAP is still superior to CAR, small LAP and large LAP, but equally matched to CSP. Thus, ALAP may help to improve the accuracy and robustness of SMR-based BCIs.
Adaptive Laplacian filtering for sensorimotor rhythm-based brain-computer interfaces
NASA Astrophysics Data System (ADS)
Lu, Jun; McFarland, Dennis J.; Wolpaw, Jonathan R.
2013-02-01
Objective. Sensorimotor rhythms (SMRs) are 8-30 Hz oscillations in the electroencephalogram (EEG) recorded from the scalp over sensorimotor cortex that change with movement and/or movement imagery. Many brain-computer interface (BCI) studies have shown that people can learn to control SMR amplitudes and can use that control to move cursors and other objects in one, two or three dimensions. At the same time, if SMR-based BCIs are to be useful for people with neuromuscular disabilities, their accuracy and reliability must be improved substantially. These BCIs often use spatial filtering methods such as common average reference (CAR), Laplacian (LAP) filter or common spatial pattern (CSP) filter to enhance the signal-to-noise ratio of EEG. Here, we test the hypothesis that a new filter design, called an ‘adaptive Laplacian (ALAP) filter’, can provide better performance for SMR-based BCIs. Approach. An ALAP filter employs a Gaussian kernel to construct a smooth spatial gradient of channel weights and then simultaneously seeks the optimal kernel radius of this spatial filter and the regularization parameter of linear ridge regression. This optimization is based on minimizing the leave-one-out cross-validation error through a gradient descent method and is computationally feasible. Main results. Using a variety of kinds of BCI data from a total of 22 individuals, we compare the performances of ALAP filter to CAR, small LAP, large LAP and CSP filters. With a large number of channels and limited data, ALAP performs significantly better than CSP, CAR, small LAP and large LAP both in classification accuracy and in mean-squared error. Using fewer channels restricted to motor areas, ALAP is still superior to CAR, small LAP and large LAP, but equally matched to CSP. Significance. Thus, ALAP may help to improve the accuracy and robustness of SMR-based BCIs.
Sparse kernel methods for high-dimensional survival data.
Evers, Ludger; Messow, Claudia-Martina
2008-07-15
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques however are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model however yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where-akin to support vector classification-the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Evans, Thomas; Hamilton, Steven; Slattery, Stuart
Profugus is an open-source mini-application (mini-app) for radiation transport and reactor applications. It contains the fundamental computational kernels used in the Exnihilo code suite from Oak Ridge National Laboratory. However, Exnihilo is production code with a substantial user base. Furthermore, Exnihilo is export controlled. This makes collaboration with computer scientists and computer engineers difficult. Profugus is designed to bridge that gap. By encapsulating the core numerical algorithms in an abbreviated code base that is open-source, computer scientists can analyze the algorithms and easily make code-architectural changes to test performance without compromising the production code values of Exnihilo. Profugus is notmore » meant to be production software with respect to problem analysis. The computational kernels in Profugus are designed to analyze performance, not correctness. Nonetheless, users of Profugus can setup and run problems with enough real-world features to be useful as proof-of-concept for actual production work.« less
Balancing Particle and Mesh Computation in a Particle-In-Cell Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Worley, Patrick H; D'Azevedo, Eduardo; Hager, Robert
2016-01-01
The XGC1 plasma microturbulence particle-in-cell simulation code has both particle-based and mesh-based computational kernels that dominate performance. Both of these are subject to load imbalances that can degrade performance and that evolve during a simulation. Each separately can be addressed adequately, but optimizing just for one can introduce significant load imbalances in the other, degrading overall performance. A technique has been developed based on Golden Section Search that minimizes wallclock time given prior information on wallclock time, and on current particle distribution and mesh cost per cell, and also adapts to evolution in load imbalance in both particle and meshmore » work. In problems of interest this doubled the performance on full system runs on the XK7 at the Oak Ridge Leadership Computing Facility compared to load balancing only one of the kernels.« less
Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures.
Bobb, Jennifer F; Valeri, Linda; Claus Henn, Birgit; Christiani, David C; Wright, Robert O; Mazumdar, Maitreyi; Godleski, John J; Coull, Brent A
2015-07-01
Because humans are invariably exposed to complex chemical mixtures, estimating the health effects of multi-pollutant exposures is of critical concern in environmental epidemiology, and to regulatory agencies such as the U.S. Environmental Protection Agency. However, most health effects studies focus on single agents or consider simple two-way interaction models, in part because we lack the statistical methodology to more realistically capture the complexity of mixed exposures. We introduce Bayesian kernel machine regression (BKMR) as a new approach to study mixtures, in which the health outcome is regressed on a flexible function of the mixture (e.g. air pollution or toxic waste) components that is specified using a kernel function. In high-dimensional settings, a novel hierarchical variable selection approach is incorporated to identify important mixture components and account for the correlated structure of the mixture. Simulation studies demonstrate the success of BKMR in estimating the exposure-response function and in identifying the individual components of the mixture responsible for health effects. We demonstrate the features of the method through epidemiology and toxicology applications. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Kwong, Qi Bin; Ong, Ai Ling; Teh, Chee Keng; Chew, Fook Tim; Tammi, Martti; Mayes, Sean; Kulaveerasingam, Harikrishna; Yeoh, Suat Hui; Harikrishna, Jennifer Ann; Appleton, David Ross
2017-06-06
Genomic selection (GS) uses genome-wide markers to select individuals with the desired overall combination of breeding traits. A total of 1,218 individuals from a commercial population of Ulu Remis x AVROS (UR x AVROS) were genotyped using the OP200K array. The traits of interest included: shell-to-fruit ratio (S/F, %), mesocarp-to-fruit ratio (M/F, %), kernel-to-fruit ratio (K/F, %), fruit per bunch (F/B, %), oil per bunch (O/B, %) and oil per palm (O/P, kg/palm/year). Genomic heritabilities of these traits were estimated to be in the range of 0.40 to 0.80. GS methods assessed were RR-BLUP, Bayes A (BA), Cπ (BC), Lasso (BL) and Ridge Regression (BRR). All methods resulted in almost equal prediction accuracy. The accuracy achieved ranged from 0.40 to 0.70, correlating with the heritability of traits. By selecting the most important markers, RR-BLUP B has the potential to outperform other methods. The marker density for certain traits can be further reduced based on the linkage disequilibrium (LD). Together with in silico breeding, GS is now being used in oil palm breeding programs to hasten parental palm selection.
An application of robust ridge regression model in the presence of outliers to real data problem
NASA Astrophysics Data System (ADS)
Shariff, N. S. Md.; Ferdaos, N. A.
2017-09-01
Multicollinearity and outliers are often leads to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure that is robust to multicollinearity problem is the ridge regression method. This method however is believed are affected by the presence of outlier. The combination of GM-estimation and ridge parameter that is robust towards both problems is on interest in this study. As such, both techniques are employed to investigate the relationship between stock market price and macroeconomic variables in Malaysia due to curiosity of involving the multicollinearity and outlier problem in the data set. There are four macroeconomic factors selected for this study which are Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results towards the presence of multicollinearity and outliers in the real data.
An experimental validation of genomic selection in octoploid strawberry
Gezan, Salvador A; Osorio, Luis F; Verma, Sujeet; Whitaker, Vance M
2017-01-01
The primary goal of genomic selection is to increase genetic gains for complex traits by predicting performance of individuals for which phenotypic data are not available. The objective of this study was to experimentally evaluate the potential of genomic selection in strawberry breeding and to define a strategy for its implementation. Four clonally replicated field trials, two in each of 2 years comprised of a total of 1628 individuals, were established in 2013–2014 and 2014–2015. Five complex yield and fruit quality traits with moderate to low heritability were assessed in each trial. High-density genotyping was performed with the Affymetrix Axiom IStraw90 single-nucleotide polymorphism array, and 17 479 polymorphic markers were chosen for analysis. Several methods were compared, including Genomic BLUP, Bayes B, Bayes C, Bayesian LASSO Regression, Bayesian Ridge Regression and Reproducing Kernel Hilbert Spaces. Cross-validation within training populations resulted in higher values than for true validations across trials. For true validations, Bayes B gave the highest predictive abilities on average and also the highest selection efficiencies, particularly for yield traits that were the lowest heritability traits. Selection efficiencies using Bayes B for parent selection ranged from 74% for average fruit weight to 34% for early marketable yield. A breeding strategy is proposed in which advanced selection trials are utilized as training populations and in which genomic selection can reduce the breeding cycle from 3 to 2 years for a subset of untested parents based on their predicted genomic breeding values. PMID:28090334
Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.
Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan
2016-11-01
In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat ( L.) and maize ( L.) data sets. For single-environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models did show up to 60 to 68% superiority over the corresponding single environment for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that accounts for small, more complex marker main effects and marker-specific interaction effects. Copyright © 2016 Crop Science Society of America.
Locally Based Kernel PLS Regression De-noising with Application to Event-Related Potentials
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Trejo, Leonard J.; Wheeler, Kevin; Tino, Peter
2002-01-01
The close relation of signal de-noising and regression problems dealing with the estimation of functions reflecting dependency between a set of inputs and dependent outputs corrupted with some level of noise have been employed in our approach.
Roso, V M; Schenkel, F S; Miller, S P; Schaeffer, L R
2005-08-01
Breed additive, dominance, and epistatic loss effects are of concern in the genetic evaluation of a multibreed population. Multiple regression equations used for fitting these effects may show a high degree of multicollinearity among predictor variables. Typically, when strong linear relationships exist, the regression coefficients have large SE and are sensitive to changes in the data file and to the addition or deletion of variables in the model. Generalized ridge regression methods were applied to obtain stable estimates of direct and maternal breed additive, dominance, and epistatic loss effects in the presence of multicollinearity among predictor variables. Preweaning weight gains of beef calves in Ontario, Canada, from 1986 to 1999 were analyzed. The genetic model included fixed direct and maternal breed additive, dominance, and epistatic loss effects, fixed environmental effects of age of the calf, contemporary group, and age of the dam x sex of the calf, random additive direct and maternal genetic effects, and random maternal permanent environment effect. The degree and the nature of the multicollinearity were identified and ridge regression methods were used as an alternative to ordinary least squares (LS). Ridge parameters were obtained using two different objective methods: 1) generalized ridge estimator of Hoerl and Kennard (R1); and 2) bootstrap in combination with cross-validation (R2). Both ridge regression methods outperformed the LS estimator with respect to mean squared error of predictions (MSEP) and variance inflation factors (VIF) computed over 100 bootstrap samples. The MSEP of R1 and R2 were similar, and they were 3% less than the MSEP of LS. The average VIF of LS, R1, and R2 were equal to 26.81, 6.10, and 4.18, respectively. Ridge regression methods were particularly effective in decreasing the multicollinearity involving predictor variables of breed additive effects. Because of a high degree of confounding between estimates of maternal dominance and direct epistatic loss effects, it was not possible to compare the relative importance of these effects with a high level of confidence. The inclusion of epistatic loss effects in the additive-dominance model did not cause noticeable reranking of sires, dams, and calves based on across-breed EBV. More precise estimates of breed effects as a result of this study may result in more stable across-breed estimated breeding values over the years.
Factors affecting cadmium absorbed by pistachio kernel in calcareous soils, southeast of Iran.
Shirani, H; Hosseinifard, S J; Hashemipour, H
2018-03-01
Cadmium (Cd) which does not have a biological role is one of the most toxic heavy metals for organisms. This metal enters environment through industrial processes and fertilizers. The main objective of this study was to determine the relationships between absorbed Cd by pistachio kernel and some of soil physical and chemical characteristics using modeling by stepwise regression and Artificial Neural Network (ANN), in calcareous soils in Rafsanjan region, southeast of Iran. For these purposes, 220 pistachio orchards were selected, and soil samples were taken from two depths of 0-40 and 40-80cm. Besides, fruit and leaf samples from branches with and without fruit were taken in each sampling point. The results showed that affecting factors on absorbed Cd by pistachio kernel which were obtained by regression method (pH and clay percent) were not interpretable, and considering unsuitable vales of determinant coefficient (R 2 ) and Root Mean Squares Error (RMSE), the model did not have sufficient validity. However, ANN modeling was highly accurate and reliable. Based on its results, soil available P and Zn and soil salinity were the most important factors affecting the concentration of Cd in pistachio kernel in pistachio growing areas of Rafsanjan. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Shah-Heydari pour, A.; Pahlavani, P.; Bigdeli, B.
2017-09-01
According to the industrialization of cities and the apparent increase in pollutants and greenhouse gases, the importance of forests as the natural lungs of the earth is felt more than ever to clean these pollutants. Annually, a large part of the forests is destroyed due to the lack of timely action during the fire. Knowledge about areas with a high-risk of fire and equipping these areas by constructing access routes and allocating the fire-fighting equipment can help to eliminate the destruction of the forest. In this research, the fire risk of region was forecasted and the risk map of that was provided using MODIS images by applying geographically weighted regression model with Gaussian kernel and ordinary least squares over the effective parameters in forest fire including distance from residential areas, distance from the river, distance from the road, height, slope, aspect, soil type, land use, average temperature, wind speed, and rainfall. After the evaluation, it was found that the geographically weighted regression model with Gaussian kernel forecasted 93.4% of the all fire points properly, however the ordinary least squares method could forecast properly only 66% of the fire points.
Kim, Sungjin; Jinich, Adrián; Aspuru-Guzik, Alán
2017-04-24
We propose a multiple descriptor multiple kernel (MultiDK) method for efficient molecular discovery using machine learning. We show that the MultiDK method improves both the speed and accuracy of molecular property prediction. We apply the method to the discovery of electrolyte molecules for aqueous redox flow batteries. Using multiple-type-as opposed to single-type-descriptors, we obtain more relevant features for machine learning. Following the principle of "wisdom of the crowds", the combination of multiple-type descriptors significantly boosts prediction performance. Moreover, by employing multiple kernels-more than one kernel function for a set of the input descriptors-MultiDK exploits nonlinear relations between molecular structure and properties better than a linear regression approach. The multiple kernels consist of a Tanimoto similarity kernel and a linear kernel for a set of binary descriptors and a set of nonbinary descriptors, respectively. Using MultiDK, we achieve an average performance of r 2 = 0.92 with a test set of molecules for solubility prediction. We also extend MultiDK to predict pH-dependent solubility and apply it to a set of quinone molecules with different ionizable functional groups to assess their performance as flow battery electrolytes.
Zhang, Wencan; Leong, Siew Mun; Zhao, Feifei; Zhao, Fangju; Yang, Tiankui; Liu, Shaoquan
2018-05-01
With an interest to enhance the aroma of palm kernel oil (PKO), Viscozyme L, an enzyme complex containing a wide range of carbohydrases, was applied to alter the carbohydrates in palm kernels (PK) to modulate the formation of volatiles upon kernel roasting. After Viscozyme treatment, the content of simple sugars and free amino acids in PK increased by 4.4-fold and 4.5-fold, respectively. After kernel roasting and oil extraction, significantly more 2,5-dimethylfuran, 2-[(methylthio)methyl]-furan, 1-(2-furanyl)-ethanone, 1-(2-furyl)-2-propanone, 5-methyl-2-furancarboxaldehyde and 2-acetyl-5-methylfuran but less 2-furanmethanol and 2-furanmethanol acetate were found in treated PKO; the correlation between their formation and simple sugar profile was estimated by using partial least square regression (PLS1). Obvious differences in pyrroles and Strecker aldehydes were also found between the control and treated PKOs. Principal component analysis (PCA) clearly discriminated the treated PKOs from that of control PKOs on the basis of all volatile compounds. Such changes in volatiles translated into distinct sensory attributes, whereby treated PKO was more caramelic and burnt after aqueous extraction and more nutty, roasty, caramelic and smoky after solvent extraction. Copyright © 2018 Elsevier Ltd. All rights reserved.
Pulsations, interpulsations, and sea-floor spreading.
NASA Technical Reports Server (NTRS)
Pessagno, E. A., Jr.
1973-01-01
It is postulated that worldwide transgressions (pulsations) and regressions (interpulsations) through the course of geologic time are related to the elevation and subsidence of oceanic ridge systems and to sea-floor spreading. Two multiple working hypotheses are advanced to explain major transgressions and regressions and the elevation and subsidence of oceanic ridge systems. One hypothesis interrelates the sea-floor spreading hypothesis to the hypothesis of sub-Mohorovicic serpentinization. The second hypothesis relates the sea-floor spreading hypothesis to a hypothesis involving thermal expansion and contraction.
NASA Astrophysics Data System (ADS)
Hart, Vern; Burrow, Damon; Li, X. Allen
2017-08-01
A systematic method is presented for determining optimal parameters in variable-kernel deformable image registration of cone beam CT and CT images, in order to improve accuracy and convergence for potential use in online adaptive radiotherapy. Assessed conditions included the noise constant (symmetric force demons), the kernel reduction rate, the kernel reduction percentage, and the kernel adjustment criteria. Four such parameters were tested in conjunction with reductions of 5, 10, 15, 20, 30, and 40%. Noise constants ranged from 1.0 to 1.9 for pelvic images in ten prostate cancer patients. A total of 516 tests were performed and assessed using the structural similarity index. Registration accuracy was plotted as a function of iteration number and a least-squares regression line was calculated, which implied an average improvement of 0.0236% per iteration. This baseline was used to determine if a given set of parameters under- or over-performed. The most accurate parameters within this range were applied to contoured images. The mean Dice similarity coefficient was calculated for bladder, prostate, and rectum with mean values of 98.26%, 97.58%, and 96.73%, respectively; corresponding to improvements of 2.3%, 9.8%, and 1.2% over previously reported values for the same organ contours. This graphical approach to registration analysis could aid in determining optimal parameters for Demons-based algorithms. It also establishes expectation values for convergence rates and could serve as an indicator of non-physical warping, which often occurred in cases >0.6% from the regression line.
ERIC Educational Resources Information Center
Rule, David L.
Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…
Bivariate discrete beta Kernel graduation of mortality data.
Mazza, Angelo; Punzo, Antonio
2015-07-01
Various parametric/nonparametric techniques have been proposed in literature to graduate mortality data as a function of age. Nonparametric approaches, as for example kernel smoothing regression, are often preferred because they do not assume any particular mortality law. Among the existing kernel smoothing approaches, the recently proposed (univariate) discrete beta kernel smoother has been shown to provide some benefits. Bivariate graduation, over age and calendar years or durations, is common practice in demography and actuarial sciences. In this paper, we generalize the discrete beta kernel smoother to the bivariate case, and we introduce an adaptive bandwidth variant that may provide additional benefits when data on exposures to the risk of death are available; furthermore, we outline a cross-validation procedure for bandwidths selection. Using simulations studies, we compare the bivariate approach proposed here with its corresponding univariate formulation and with two popular nonparametric bivariate graduation techniques, based on Epanechnikov kernels and on P-splines. To make simulations realistic, a bivariate dataset, based on probabilities of dying recorded for the US males, is used. Simulations have confirmed the gain in performance of the new bivariate approach with respect to both the univariate and the bivariate competitors.
Schaid, Daniel J
2010-01-01
Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
Yao, H; Hruska, Z; Kincaid, R; Brown, R; Cleveland, T; Bhatnagar, D
2010-05-01
The objective of this study was to examine the relationship between fluorescence emissions of corn kernels inoculated with Aspergillus flavus and aflatoxin contamination levels within the kernels. Aflatoxin contamination in corn has been a long-standing problem plaguing the grain industry with potentially devastating consequences to corn growers. In this study, aflatoxin-contaminated corn kernels were produced through artificial inoculation of corn ears in the field with toxigenic A. flavus spores. The kernel fluorescence emission data were taken with a fluorescence hyperspectral imaging system when corn kernels were excited with ultraviolet light. Raw fluorescence image data were preprocessed and regions of interest in each image were created for all kernels. The regions of interest were used to extract spectral signatures and statistical information. The aflatoxin contamination level of single corn kernels was then chemically measured using affinity column chromatography. A fluorescence peak shift phenomenon was noted among different groups of kernels with different aflatoxin contamination levels. The fluorescence peak shift was found to move more toward the longer wavelength in the blue region for the highly contaminated kernels and toward the shorter wavelengths for the clean kernels. Highly contaminated kernels were also found to have a lower fluorescence peak magnitude compared with the less contaminated kernels. It was also noted that a general negative correlation exists between measured aflatoxin and the fluorescence image bands in the blue and green regions. The correlation coefficients of determination, r(2), was 0.72 for the multiple linear regression model. The multivariate analysis of variance found that the fluorescence means of four aflatoxin groups, <1, 1-20, 20-100, and >or=100 ng g(-1) (parts per billion), were significantly different from each other at the 0.01 level of alpha. Classification accuracy under a two-class schema ranged from 0.84 to 0.91 when a threshold of either 20 or 100 ng g(-1) was used. Overall, the results indicate that fluorescence hyperspectral imaging may be applicable in estimating aflatoxin content in individual corn kernels.
SVM-Based Synthetic Fingerprint Discrimination Algorithm and Quantitative Optimization Strategy
Chen, Suhang; Chang, Sheng; Huang, Qijun; He, Jin; Wang, Hao; Huang, Qiangui
2014-01-01
Synthetic fingerprints are a potential threat to automatic fingerprint identification systems (AFISs). In this paper, we propose an algorithm to discriminate synthetic fingerprints from real ones. First, four typical characteristic factors—the ridge distance features, global gray features, frequency feature and Harris Corner feature—are extracted. Then, a support vector machine (SVM) is used to distinguish synthetic fingerprints from real fingerprints. The experiments demonstrate that this method can achieve a recognition accuracy rate of over 98% for two discrete synthetic fingerprint databases as well as a mixed database. Furthermore, a performance factor that can evaluate the SVM's accuracy and efficiency is presented, and a quantitative optimization strategy is established for the first time. After the optimization of our synthetic fingerprint discrimination task, the polynomial kernel with a training sample proportion of 5% is the optimized value when the minimum accuracy requirement is 95%. The radial basis function (RBF) kernel with a training sample proportion of 15% is a more suitable choice when the minimum accuracy requirement is 98%. PMID:25347063
Does money matter in inflation forecasting?
NASA Astrophysics Data System (ADS)
Binner, J. M.; Tino, P.; Tepper, J.; Anderson, R.; Jones, B.; Kendall, G.
2010-11-01
This paper provides the most fully comprehensive evidence to date on whether or not monetary aggregates are valuable for forecasting US inflation in the early to mid 2000s. We explore a wide range of different definitions of money, including different methods of aggregation and different collections of included monetary assets. In our forecasting experiment we use two nonlinear techniques, namely, recurrent neural networks and kernel recursive least squares regression-techniques that are new to macroeconomics. Recurrent neural networks operate with potentially unbounded input memory, while the kernel regression technique is a finite memory predictor. The two methodologies compete to find the best fitting US inflation forecasting models and are then compared to forecasts from a naïve random walk model. The best models were nonlinear autoregressive models based on kernel methods. Our findings do not provide much support for the usefulness of monetary aggregates in forecasting inflation. Beyond its economic findings, our study is in the tradition of physicists’ long-standing interest in the interconnections among statistical mechanics, neural networks, and related nonparametric statistical methods, and suggests potential avenues of extension for such studies.
Spectral imaging using consumer-level devices and kernel-based regression.
Heikkinen, Ville; Cámara, Clara; Hirvonen, Tapani; Penttinen, Niko
2016-06-01
Hyperspectral reflectance factor image estimations were performed in the 400-700 nm wavelength range using a portable consumer-level laptop display as an adjustable light source for a trichromatic camera. Targets of interest were ColorChecker Classic samples, Munsell Matte samples, geometrically challenging tempera icon paintings from the turn of the 20th century, and human hands. Measurements and simulations were performed using Nikon D80 RGB camera and Dell Vostro 2520 laptop screen as a light source. Estimations were performed without spectral characteristics of the devices and by emphasizing simplicity for training sets and estimation model optimization. Spectral and color error images are shown for the estimations using line-scanned hyperspectral images as the ground truth. Estimations were performed using kernel-based regression models via a first-degree inhomogeneous polynomial kernel and a Matérn kernel, where in the latter case the median heuristic approach for model optimization and link function for bounded estimation were evaluated. Results suggest modest requirements for a training set and show that all estimation models have markedly improved accuracy with respect to the DE00 color distance (up to 99% for paintings and hands) and the Pearson distance (up to 98% for paintings and 99% for hands) from a weak training set (Digital ColorChecker SG) case when small representative training data were used in the estimation.
Stochastic subset selection for learning with kernel machines.
Rhinelander, Jason; Liu, Xiaoping P
2012-06-01
Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super linearly in computational effort with the number of training samples and is often used for the offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm works by using a stochastic indexing technique when selecting a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.
A simulation study on Bayesian Ridge regression models for several collinearity levels
NASA Astrophysics Data System (ADS)
Efendi, Achmad; Effrihan
2017-12-01
When analyzing data with multiple regression model if there are collinearities, then one or several predictor variables are usually omitted from the model. However, there sometimes some reasons, for instance medical or economic reasons, the predictors are all important and should be included in the model. Ridge regression model is not uncommon in some researches to use to cope with collinearity. Through this modeling, weights for predictor variables are used for estimating parameters. The next estimation process could follow the concept of likelihood. Furthermore, for the estimation nowadays the Bayesian version could be an alternative. This estimation method does not match likelihood one in terms of popularity due to some difficulties; computation and so forth. Nevertheless, with the growing improvement of computational methodology recently, this caveat should not at the moment become a problem. This paper discusses about simulation process for evaluating the characteristic of Bayesian Ridge regression parameter estimates. There are several simulation settings based on variety of collinearity levels and sample sizes. The results show that Bayesian method gives better performance for relatively small sample sizes, and for other settings the method does perform relatively similar to the likelihood method.
The relationship of the concentration of air pollutants to wind direction has been determined by nonparametric regression using a Gaussian kernel. The results are smooth curves with error bars that allow for the accurate determination of the wind direction where the concentrat...
Decreasing Multicollinearity: A Method for Models with Multiplicative Functions.
ERIC Educational Resources Information Center
Smith, Kent W.; Sasaki, M. S.
1979-01-01
A method is proposed for overcoming the problem of multicollinearity in multiple regression equations where multiplicative independent terms are entered. The method is not a ridge regression solution. (JKS)
Alchemical and structural distribution based representation for universal quantum machine learning
NASA Astrophysics Data System (ADS)
Faber, Felix A.; Christensen, Anders S.; Huang, Bing; von Lilienfeld, O. Anatole
2018-06-01
We introduce a representation of any atom in any chemical environment for the automatized generation of universal kernel ridge regression-based quantum machine learning (QML) models of electronic properties, trained throughout chemical compound space. The representation is based on Gaussian distribution functions, scaled by power laws and explicitly accounting for structural as well as elemental degrees of freedom. The elemental components help us to lower the QML model's learning curve, and, through interpolation across the periodic table, even enable "alchemical extrapolation" to covalent bonding between elements not part of training. This point is demonstrated for the prediction of covalent binding in single, double, and triple bonds among main-group elements as well as for atomization energies in organic molecules. We present numerical evidence that resulting QML energy models, after training on a few thousand random training instances, reach chemical accuracy for out-of-sample compounds. Compound datasets studied include thousands of structurally and compositionally diverse organic molecules, non-covalently bonded protein side-chains, (H2O)40-clusters, and crystalline solids. Learning curves for QML models also indicate competitive predictive power for various other electronic ground state properties of organic molecules, calculated with hybrid density functional theory, including polarizability, heat-capacity, HOMO-LUMO eigenvalues and gap, zero point vibrational energy, dipole moment, and highest vibrational fundamental frequency.
Structured functional additive regression in reproducing kernel Hilbert spaces.
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2014-06-01
Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.
Graphical Evaluation of the Ridge-Type Robust Regression Estimators in Mixture Experiments
Erkoc, Ali; Emiroglu, Esra
2014-01-01
In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in the cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of biasing parameter, we use fraction of design space plots for evaluating the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on Hald cement data set. PMID:25202738
Graphical evaluation of the ridge-type robust regression estimators in mixture experiments.
Erkoc, Ali; Emiroglu, Esra; Akay, Kadri Ulas
2014-01-01
In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in the cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of biasing parameter, we use fraction of design space plots for evaluating the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on Hald cement data set.
Accurate interatomic force fields via machine learning with covariant kernels
NASA Astrophysics Data System (ADS)
Glielmo, Aldo; Sollich, Peter; De Vita, Alessandro
2017-06-01
We present a novel scheme to accurately predict atomic forces as vector quantities, rather than sets of scalar components, by Gaussian process (GP) regression. This is based on matrix-valued kernel functions, on which we impose the requirements that the predicted force rotates with the target configuration and is independent of any rotations applied to the configuration database entries. We show that such covariant GP kernels can be obtained by integration over the elements of the rotation group SO (d ) for the relevant dimensionality d . Remarkably, in specific cases the integration can be carried out analytically and yields a conservative force field that can be recast into a pair interaction form. Finally, we show that restricting the integration to a summation over the elements of a finite point group relevant to the target system is sufficient to recover an accurate GP. The accuracy of our kernels in predicting quantum-mechanical forces in real materials is investigated by tests on pure and defective Ni, Fe, and Si crystalline systems.
Comparative efficacy of storage bags, storability and damage potential of bruchid beetle.
Harish, G; Nataraja, M V; Ajay, B C; Holajjer, Prasanna; Savaliya, S D; Gedia, M V
2014-12-01
Groundnut during storage is attacked by number of stored grain pests and management of these insect pests particularly bruchid beetle, Caryedon serratus (Oliver) is of prime importance as they directly damage the pod and kernels. In this regard different storage bags that could be used and duration up to which we can store groundnut has been studied. Super grain bag recorded minimum number of eggs laid and less damage and minimum weight loss in pods and kernels in comparison to other storage bags. Analysis of variance for multiple regression models were found to be significant in all bags for variables viz, number of eggs laid, damage in pods and kernels, weight loss in pods and kernels throughout the season. Multiple comparison results showed that there was a high probability of eggs laid and pod damage in lino bag, fertilizer bag and gunny bag, whereas super grain bag was found to be more effective in managing the C. serratus owing to very low air circulation.
Cao, Ana; Santiago, Rogelio; Ramos, Antonio J; Souto, Xosé C; Aguín, Olga; Malvar, Rosa Ana; Butrón, Ana
2014-05-02
In northwestern Spain, where weather is rainy and mild throughout the year, Fusarium verticillioides is the most prevalent fungus in kernels and a significant risk of fumonisin contamination has been exposed. In this study, detailed information about environmental and maize genotypic factors affecting F. verticillioides infection, fungal growth and fumonisin content in maize kernels was obtained in order to establish control points to reduce fumonisin contamination. Evaluations were conducted in a total of 36 environments and factorial regression analyses were performed to determine the contribution of each factor to variability among environments, genotypes, and genotype × environment interactions for F. verticillioides infection, fungal growth and fumonisin content. Flowering and kernel drying were the most critical periods throughout the growing season for F. verticillioides infection and fumonisin contamination. Around flowering, wetter and cooler conditions limited F. verticillioides infection and growth, and high temperatures increased fumonisin contents. During kernel drying, increased damaged kernels favored fungal growth, and higher ear damage by corn borers and hard rainfall favored fumonisin accumulation. Later planting dates and especially earlier harvest dates reduced the risk of fumonisin contamination, possibly due to reduced incidence of insects and accumulation of rainfall during the kernel drying period. The use of maize varieties resistant to Sitotroga cerealella, with good husk coverage and non-excessive pericarp thickness could also be useful to reduce fumonisin contamination of maize kernels. Copyright © 2014 Elsevier B.V. All rights reserved.
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study. PMID:20376286
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.
Structured functional additive regression in reproducing kernel Hilbert spaces
Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen
2013-01-01
Summary Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application. PMID:25013362
Multi-Target Regression via Robust Low-Rank Learning.
Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo
2018-02-01
Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
Advantages of normalization regression estimation over ridge regression estimation are demonstrated by reference to Bloom's model of school learning. Theoretical concern centered on the structure of scholastic achievement at grade 10 in Canadian high schools. Data on 886 students were randomly sampled from the Carnegie Human Resources Data Bank.…
Jović, Ozren; Smrečki, Neven; Popović, Zora
2016-04-01
A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is performed on six data sets of FTIR, two data sets of UV-vis and one data set of DSC. The obtained results show that models built with ridge regression on optimal variables selected with iRR significantly outperfom models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperfomed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for p<0.05). Also, iRR can be a fast alternative to iPLS, especially in case of unknown degree of complexity of analyzed system, i.e. if upper limit of number of latent variables is not easily estimated for iPLS. Adulteration of hempseed (H) oil, a well known health beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H oil, R oil and Su oil (HRSu) were considered. The obtained accuracy indicates that using iRR on FTIR and UV-vis data, each particular oil can be very successfully quantified (in all 8 cases RMSEP<1.2%). This means that FTIR-ATR coupled with iRR can very rapidly and effectively determine the level of adulteration in the adulterated hempseed oil (R(2)>0.99). Copyright © 2015 Elsevier B.V. All rights reserved.
Huang, Desheng; Guan, Peng; Guo, Junqiao; Wang, Ping; Zhou, Baosen
2008-01-01
Background The effects of climate variations on bacillary dysentery incidence have gained more recent concern. However, the multi-collinearity among meteorological factors affects the accuracy of correlation with bacillary dysentery incidence. Methods As a remedy, a modified method to combine ridge regression and hierarchical cluster analysis was proposed for investigating the effects of climate variations on bacillary dysentery incidence in northeast China. Results All weather indicators, temperatures, precipitation, evaporation and relative humidity have shown positive correlation with the monthly incidence of bacillary dysentery, while air pressure had a negative correlation with the incidence. Ridge regression and hierarchical cluster analysis showed that during 1987–1996, relative humidity, temperatures and air pressure affected the transmission of the bacillary dysentery. During this period, all meteorological factors were divided into three categories. Relative humidity and precipitation belonged to one class, temperature indexes and evaporation belonged to another class, and air pressure was the third class. Conclusion Meteorological factors have affected the transmission of bacillary dysentery in northeast China. Bacillary dysentery prevention and control would benefit from by giving more consideration to local climate variations. PMID:18816415
Explaining Support Vector Machines: A Color Based Nomogram
Van Belle, Vanya; Van Calster, Ben; Van Huffel, Sabine; Suykens, Johan A. K.; Lisboa, Paulo
2016-01-01
Problem setting Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Due to the large choice of kernels they can be applied with, a large variety of data can be analysed using these tools. Machine learning thanks its popularity to the good performance of the resulting models. However, interpreting the models is far from obvious, especially when non-linear kernels are used. Hence, the methods are used as black boxes. As a consequence, the use of SVMs is less supported in areas where interpretability is important and where people are held responsible for the decisions made by models. Objective In this work, we investigate whether SVMs using linear, polynomial and RBF kernels can be explained such that interpretations for model-based decisions can be provided. We further indicate when SVMs can be explained and in which situations interpretation of SVMs is (hitherto) not possible. Here, explainability is defined as the ability to produce the final decision based on a sum of contributions which depend on one single or at most two input variables. Results Our experiments on simulated and real-life data show that explainability of an SVM depends on the chosen parameter values (degree of polynomial kernel, width of RBF kernel and regularization constant). When several combinations of parameter values yield the same cross-validation performance, combinations with a lower polynomial degree or a larger kernel width have a higher chance of being explainable. Conclusions This work summarizes SVM classifiers obtained with linear, polynomial and RBF kernels in a single plot. Linear and polynomial kernels up to the second degree are represented exactly. For other kernels an indication of the reliability of the approximation is presented. The complete methodology is available as an R package and two apps and a movie are provided to illustrate the possibilities offered by the method. PMID:27723811
A hybrid neural network model for noisy data regression.
Lee, Eric W M; Lim, Chee Peng; Yuen, Richard K K; Lo, S M
2004-04-01
A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA ART) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted as GRNNFA, is able to retain these advantages and, at the same time, to reduce the computational requirements in calculating and storing information of the kernels. A clustering version of the GRNN is designed with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by the geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets have been conducted to assess and compare effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task for predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA in noisy data regression problems.
[New method of mixed gas infrared spectrum analysis based on SVM].
Bai, Peng; Xie, Wen-Jun; Liu, Jun-Hua
2007-07-01
A new method of infrared spectrum analysis based on support vector machine (SVM) for mixture gas was proposed. The kernel function in SVM was used to map the seriously overlapping absorption spectrum into high-dimensional space, and after transformation, the high-dimensional data could be processed in the original space, so the regression calibration model was established, then the regression calibration model with was applied to analyze the concentration of component gas. Meanwhile it was proved that the regression calibration model with SVM also could be used for component recognition of mixture gas. The method was applied to the analysis of different data samples. Some factors such as scan interval, range of the wavelength, kernel function and penalty coefficient C that affect the model were discussed. Experimental results show that the component concentration maximal Mean AE is 0.132%, and the component recognition accuracy is higher than 94%. The problems of overlapping absorption spectrum, using the same method for qualitative and quantitative analysis, and limit number of training sample, were solved. The method could be used in other mixture gas infrared spectrum analyses, promising theoretic and application values.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
Multicollinearity refers to the presence of highly intercorrelated independent variables in structural equation models, that is, models estimated by using techniques such as least squares regression and maximum likelihood. There is a problem of multicollinearity in both the natural and social sciences where theory formulation and estimation is in…
NASA Astrophysics Data System (ADS)
Mohan, Dhanya; Kumar, C. Santhosh
2016-03-01
Predicting the physiological condition (normal/abnormal) of a patient is highly desirable to enhance the quality of health care. Multi-parameter patient monitors (MPMs) using heart rate, arterial blood pressure, respiration rate and oxygen saturation (S pO2) as input parameters were developed to monitor the condition of patients, with minimum human resource utilization. The Support vector machine (SVM), an advanced machine learning approach popularly used for classification and regression is used for the realization of MPMs. For making MPMs cost effective, we experiment on the hardware implementation of the MPM using support vector machine classifier. The training of the system is done using the matlab environment and the detection of the alarm/noalarm condition is implemented in hardware. We used different kernels for SVM classification and note that the best performance was obtained using intersection kernel SVM (IKSVM). The intersection kernel support vector machine classifier MPM has outperformed the best known MPM using radial basis function kernel by an absoute improvement of 2.74% in accuracy, 1.86% in sensitivity and 3.01% in specificity. The hardware model was developed based on the improved performance system using Verilog Hardware Description Language and was implemented on Altera cyclone-II development board.
A Fast Reduced Kernel Extreme Learning Machine.
Deng, Wan-Yu; Ong, Yew-Soon; Zheng, Qing-Hua
2016-04-01
In this paper, we present a fast and accurate kernel-based supervised algorithm referred to as the Reduced Kernel Extreme Learning Machine (RKELM). In contrast to the work on Support Vector Machine (SVM) or Least Square SVM (LS-SVM), which identifies the support vectors or weight vectors iteratively, the proposed RKELM randomly selects a subset of the available data samples as support vectors (or mapping samples). By avoiding the iterative steps of SVM, significant cost savings in the training process can be readily attained, especially on Big datasets. RKELM is established based on the rigorous proof of universal learning involving reduced kernel-based SLFN. In particular, we prove that RKELM can approximate any nonlinear functions accurately under the condition of support vectors sufficiency. Experimental results on a wide variety of real world small instance size and large instance size applications in the context of binary classification, multi-class problem and regression are then reported to show that RKELM can perform at competitive level of generalized performance as the SVM/LS-SVM at only a fraction of the computational effort incurred. Copyright © 2015 Elsevier Ltd. All rights reserved.
[Study on application of SVM in prediction of coronary heart disease].
Zhu, Yue; Wu, Jianghua; Fang, Ying
2013-12-01
Base on the data of blood pressure, plasma lipid, Glu and UA by physical test, Support Vector Machine (SVM) was applied to identify coronary heart disease (CHD) in patients and non-CHD individuals in south China population for guide of further prevention and treatment of the disease. Firstly, the SVM classifier was built using radial basis kernel function, liner kernel function and polynomial kernel function, respectively. Secondly, the SVM penalty factor C and kernel parameter sigma were optimized by particle swarm optimization (PSO) and then employed to diagnose and predict the CHD. By comparison with those from artificial neural network with the back propagation (BP) model, linear discriminant analysis, logistic regression method and non-optimized SVM, the overall results of our calculation demonstrated that the classification performance of optimized RBF-SVM model could be superior to other classifier algorithm with higher accuracy rate, sensitivity and specificity, which were 94.51%, 92.31% and 96.67%, respectively. So, it is well concluded that SVM could be used as a valid method for assisting diagnosis of CHD.
Mohr, Johannes A; Jain, Brijnesh J; Obermayer, Klaus
2008-09-01
Quantitative structure activity relationship (QSAR) analysis is traditionally based on extracting a set of molecular descriptors and using them to build a predictive model. In this work, we propose a QSAR approach based directly on the similarity between the 3D structures of a set of molecules measured by a so-called molecule kernel, which is independent of the spatial prealignment of the compounds. Predictors can be build using the molecule kernel in conjunction with the potential support vector machine (P-SVM), a recently proposed machine learning method for dyadic data. The resulting models make direct use of the structural similarities between the compounds in the test set and a subset of the training set and do not require an explicit descriptor construction. We evaluated the predictive performance of the proposed method on one classification and four regression QSAR datasets and compared its results to the results reported in the literature for several state-of-the-art descriptor-based and 3D QSAR approaches. In this comparison, the proposed molecule kernel method performed better than the other QSAR methods.
Multi-pose facial correction based on Gaussian process with combined kernel function
NASA Astrophysics Data System (ADS)
Shi, Shuyan; Ji, Ruirui; Zhang, Fan
2018-04-01
In order to improve the recognition rate of various postures, this paper proposes a method of facial correction based on Gaussian Process which build a nonlinear regression model between the front and the side face with combined kernel function. The face images with horizontal angle from -45° to +45° can be properly corrected to front faces. Finally, Support Vector Machine is employed for face recognition. Experiments on CAS PEAL R1 face database show that Gaussian process can weaken the influence of pose changes and improve the accuracy of face recognition to certain extent.
Christoper J. Schmitt; A. Dennis Lemly; Parley V. Winger
1993-01-01
Data from several sources were collated and analyzed by correlation, regression, and principal components analysis to define surrrogate variables for use in the brook trout (Salvelinus fontinalis) habitat suitability index (HSI) model, and to evaluate the applicability of the model for assessing habitat in high elevation streams of the southern Blue Ridge Province (...
Ridge regression for predicting elastic moduli and hardness of calcium aluminosilicate glasses
NASA Astrophysics Data System (ADS)
Deng, Yifan; Zeng, Huidan; Jiang, Yejia; Chen, Guorong; Chen, Jianding; Sun, Luyi
2018-03-01
It is of great significance to design glasses with satisfactory mechanical properties predictively through modeling. Among various modeling methods, data-driven modeling is such a reliable approach that can dramatically shorten research duration, cut research cost and accelerate the development of glass materials. In this work, the ridge regression (RR) analysis was used to construct regression models for predicting the compositional dependence of CaO-Al2O3-SiO2 glass elastic moduli (Shear, Bulk, and Young’s moduli) and hardness based on the ternary diagram of the compositions. The property prediction over a large glass composition space was accomplished with known experimental data of various compositions in the literature, and the simulated results are in good agreement with the measured ones. This regression model can serve as a facile and effective tool for studying the relationship between the compositions and the property, enabling high-efficient design of glasses to meet the requirements for specific elasticity and hardness.
Lim, Changwon
2015-03-30
Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System weremore » used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. - Highlights: • Hypertrophy (H) and hypertrophic carcinogenesis (C) were studied by toxicogenomics. • Important genes for H and C were selected by logistic ridge regression analysis. • Amino acid biosynthesis and oxidative responses may be involved in C. • Predictive models for H and C provided 94.8% and 82.7% accuracy, respectively. • The identified genes could be useful for assessment of liver hypertrophy.« less
Time reversal seismic imaging using laterally reflected surface waves in southern California
NASA Astrophysics Data System (ADS)
Tape, C.; Liu, Q.; Tromp, J.; Plesch, A.; Shaw, J. H.
2010-12-01
We use observed post-surface-wave seismic waveforms to image shallow (upper 10 km) lateral reflectors in southern California. Our imaging technique employs the 3D crustal model m16 of Tape et al. (2009), which is accurate for most local earthquakes over the period range 2-30 s. Model m16 captures the resonance of the major sedimentary basins in southern California, as well as some lateral surface wave reflections associated with these basins. We apply a 3D Gaussian smoothing function (12 km horizontal, 2 km vertical) to model m16. This smoothing has the effect of suppressing synthetic waveforms within the period range of interest (3-10 s) that are associated with reflections (single and multiple) from these basins. The smoothed 3D model serves as the background model within which we propagate an ``adjoint wavefield'' comprised of time-reversed windows of post-surface-wave coda waveforms that are initiated at the respective station locations. This adjoint wavefield constructively interferes with the regular wavefield in the locations of potential reflectors. The potential reflectors are revealed in an ``event kernel,'' which is the time-integrated volumetric field for each earthquake. By summing (or ``stacking'') the event kernels from 28 well-recorded earthquakes, we identify several reflectors using this imaging procedure. The most prominent lateral reflectors occur in proximity to: the southernmost San Joaquin basin, the Los Angeles basin, the San Pedro basin, the Ventura basin, the Manix basin, the San Clemente--Santa Cruz--Santa Barbara ridge, and isolated segments of the San Jacinto and San Andreas faults. The correspondence between observed coherent coda waveforms and the imaged reflectors provides a solid basis for interpreting the kernel features as material contrasts. The 3D spatial extent and amplitude of the kernel features provide constraints on the geometry and material contrast of the imaged reflectors.
Gradient descent for robust kernel-based regression
NASA Astrophysics Data System (ADS)
Guo, Zheng-Chu; Hu, Ting; Shi, Lei
2018-06-01
In this paper, we study the gradient descent algorithm generated by a robust loss function over a reproducing kernel Hilbert space (RKHS). The loss function is defined by a windowing function G and a scale parameter σ, which can include a wide range of commonly used robust losses for regression. There is still a gap between theoretical analysis and optimization process of empirical risk minimization based on loss: the estimator needs to be global optimal in the theoretical analysis while the optimization method can not ensure the global optimality of its solutions. In this paper, we aim to fill this gap by developing a novel theoretical analysis on the performance of estimators generated by the gradient descent algorithm. We demonstrate that with an appropriately chosen scale parameter σ, the gradient update with early stopping rules can approximate the regression function. Our elegant error analysis can lead to convergence in the standard L 2 norm and the strong RKHS norm, both of which are optimal in the mini-max sense. We show that the scale parameter σ plays an important role in providing robustness as well as fast convergence. The numerical experiments implemented on synthetic examples and real data set also support our theoretical results.
Liu, Wei; Wang, Zhen-Zhong; Qing, Jian-Ping; Li, Hong-Juan; Xiao, Wei
2014-01-01
Background: Peach kernels which contain kinds of fatty acids play an important role in the regulation of a variety of physiological and biological functions. Objective: To establish an innovative and rapid diffuse reflectance near-infrared spectroscopy (DR-NIR) analysis method along with chemometric techniques for the qualitative and quantitative determination of a peach kernel. Materials and Methods: Peach kernel samples from nine different origins were analyzed with high-performance liquid chromatography (HPLC) as a reference method. DR-NIR is in the spectral range 1100-2300 nm. Principal component analysis (PCA) and partial least squares regression (PLSR) algorithm were applied to obtain prediction models, The Savitzky-Golay derivative and first derivative were adopted for the spectral pre-processing, PCA was applied to classify the varieties of those samples. For the quantitative calibration, the models of linoleic and oleinic acids were established with the PLSR algorithm and the optimal principal component (PC) numbers were selected with leave-one-out (LOO) cross-validation. The established models were evaluated with the root mean square error of deviation (RMSED) and corresponding correlation coefficients (R2). Results: The PCA results of DR-NIR spectra yield clear classification of the two varieties of peach kernel. PLSR had a better predictive ability. The correlation coefficients of the two calibration models were above 0.99, and the RMSED of linoleic and oleinic acids were 1.266% and 1.412%, respectively. Conclusion: The DR-NIR combined with PCA and PLSR algorithm could be used efficiently to identify and quantify peach kernels and also help to solve variety problem. PMID:25422544
Liu, Shelley H; Bobb, Jennifer F; Lee, Kyu Ha; Gennings, Chris; Claus Henn, Birgit; Bellinger, David; Austin, Christine; Schnaas, Lourdes; Tellez-Rojo, Martha M; Hu, Howard; Wright, Robert O; Arora, Manish; Coull, Brent A
2018-07-01
The impact of neurotoxic chemical mixtures on children's health is a critical public health concern. It is well known that during early life, toxic exposures may impact cognitive function during critical time intervals of increased vulnerability, known as windows of susceptibility. Knowledge on time windows of susceptibility can help inform treatment and prevention strategies, as chemical mixtures may affect a developmental process that is operating at a specific life phase. There are several statistical challenges in estimating the health effects of time-varying exposures to multi-pollutant mixtures, such as: multi-collinearity among the exposures both within time points and across time points, and complex exposure-response relationships. To address these concerns, we develop a flexible statistical method, called lagged kernel machine regression (LKMR). LKMR identifies critical exposure windows of chemical mixtures, and accounts for complex non-linear and non-additive effects of the mixture at any given exposure window. Specifically, LKMR estimates how the effects of a mixture of exposures change with the exposure time window using a Bayesian formulation of a grouped, fused lasso penalty within a kernel machine regression (KMR) framework. A simulation study demonstrates the performance of LKMR under realistic exposure-response scenarios, and demonstrates large gains over approaches that consider each time window separately, particularly when serial correlation among the time-varying exposures is high. Furthermore, LKMR demonstrates gains over another approach that inputs all time-specific chemical concentrations together into a single KMR. We apply LKMR to estimate associations between neurodevelopment and metal mixtures in Early Life Exposures in Mexico and Neurotoxicology, a prospective cohort study of child health in Mexico City.
NASA Astrophysics Data System (ADS)
Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang
2017-01-01
Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for the tasks, in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affects the quality of life dramatically. In this paper, we use the data acquired from multi-modal neuroimaging data to diagnose PD by investigating the brain regions, known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions, specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods.
Adeli, Ehsan; Wu, Guorong; Saghafi, Behrouz; An, Le; Shi, Feng; Shen, Dinggang
2017-01-01
Feature selection methods usually select the most compact and relevant set of features based on their contribution to a linear regression model. Thus, these features might not be the best for a non-linear classifier. This is especially crucial for the tasks, in which the performance is heavily dependent on the feature selection techniques, like the diagnosis of neurodegenerative diseases. Parkinson’s disease (PD) is one of the most common neurodegenerative disorders, which progresses slowly while affects the quality of life dramatically. In this paper, we use the data acquired from multi-modal neuroimaging data to diagnose PD by investigating the brain regions, known to be affected at the early stages. We propose a joint kernel-based feature selection and classification framework. Unlike conventional feature selection techniques that select features based on their performance in the original input feature space, we select features that best benefit the classification scheme in the kernel space. We further propose kernel functions, specifically designed for our non-negative feature types. We use MRI and SPECT data of 538 subjects from the PPMI database, and obtain a diagnosis accuracy of 97.5%, which outperforms all baseline and state-of-the-art methods. PMID:28120883
Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods.
Vizcaíno, Iván P; Carrera, Enrique V; Muñoz-Romero, Sergio; Cumbal, Luis H; Rojo-Álvarez, José Luis
2017-10-16
Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yields a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer's kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was the SVR with Mercer's kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides with significant improvement of the estimation quality, consistently for all the six water quality variables, which points out the relevance of including a priori knowledge of the problem.
Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods
Vizcaíno, Iván P.; Muñoz-Romero, Sergio; Cumbal, Luis H.
2017-01-01
Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yields a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer’s kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was the SVR with Mercer’s kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides with significant improvement of the estimation quality, consistently for all the six water quality variables, which points out the relevance of including a priori knowledge of the problem. PMID:29035333
Fingerprint Ridge Density as a Potential Forensic Anthropological Tool for Sex Identification.
Dhall, Jasmine Kaur; Kapoor, Anup Kumar
2016-03-01
In cases of partial or poor print recovery and lack of database/suspect print, fingerprint evidence is generally neglected. In light of such constraints, this study was designed to examine whether ridge density can aid in narrowing down the investigation for sex identification. The study was conducted on the right-hand index digit of 245 males and 246 females belonging to the Punjabis of Delhi region. Five ridge density count areas, namely upper radial, radial, ulnar, upper ulnar, and proximal, were selected and designated. Probability of sex origin was calculated, and stepwise discriminant function analysis was performed to determine the discriminating ability of the selected areas. Females were observed with a significantly higher ridge density than males in all the five areas. Discriminant function analysis and logistic regression exhibited 96.8% and 97.4% accuracy, respectively, in sex identification. Hence, fingerprint ridge density is a potential tool for sex identification, even from partial prints. © 2015 American Academy of Forensic Sciences.
Source Region Identification Using Kernel Smoothing
As described in this paper, Nonparametric Wind Regression is a source-to-receptor source apportionment model that can be used to identify and quantify the impact of possible source regions of pollutants as defined by wind direction sectors. It is described in detail with an exam...
Thompson, T.A.; Lepper, K.; Endres, A.L.; Johnston, J.W.; Baedke, S.J.; Argyilan, E.P.; Booth, R.K.; Wilcox, D.A.
2011-01-01
The Nipissing phase was the last pre-modern high-water stage of the upper Great Lakes. Represented as either a one- or two-peak highstand, the Nipissing occurred following a long-term lake-level rise. This transgression was primarily an erosional event with only the final stage of the transgression preserved as barriers, spits, and strandplains of beach ridges. South of Alpena, Michigan, mid to late Holocene coastal deposits occur as a strandplain between Devils Lake and Lake Huron. The landward part of this strandplain is a higher elevation platform that formed during the final stage of lake-level rise to the Nipissing peak. The pre-Nipissing shoreline transgressed over Devils Lake lagoonal deposits from 6.4 to 6.1. ka. The first beach ridge formed ~ 6. ka, and then the shoreline advanced toward Lake Huron, producing beach ridges about every 70. years. This depositional regression produced a slightly thickening wedge of sediment during a lake-level rise that formed 20 beach ridges. The rise ended at 4.5. ka at the Nipissing peak. This peak was short-lived, as lake level fell > 4. m during the following 500. years. During this lake-level rise and subsequent fall, the shoreline underwent several forms of shoreline behavior, including erosional transgression, aggradation, depositional transgression, depositional regression, and forced regression. Other upper Great Lakes Nipissing platforms indicate that the lake-level change observed at Alpena of a rapid pre-Nipissing lake-level rise followed by a slower rise to the Nipissing peak, and a post-Nipissing rapid lake-level fall is representative of mid Holocene lake level in the upper Great Lakes. ?? 2011 Elsevier B.V.
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
2018-06-29
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
Stable Local Volatility Calibration Using Kernel Splines
NASA Astrophysics Data System (ADS)
Coleman, Thomas F.; Li, Yuying; Wang, Cheng
2010-09-01
We propose an optimization formulation using L1 norm to ensure accuracy and stability in calibrating a local volatility function for option pricing. Using a regularization parameter, the proposed objective function balances the calibration accuracy with the model complexity. Motivated by the support vector machine learning, the unknown local volatility function is represented by a kernel function generating splines and the model complexity is controlled by minimizing the 1-norm of the kernel coefficient vector. In the context of the support vector regression for function estimation based on a finite set of observations, this corresponds to minimizing the number of support vectors for predictability. We illustrate the ability of the proposed approach to reconstruct the local volatility function in a synthetic market. In addition, based on S&P 500 market index option data, we demonstrate that the calibrated local volatility surface is simple and resembles the observed implied volatility surface in shape. Stability is illustrated by calibrating local volatility functions using market option data from different dates.
Kan, Hirohito; Kasai, Harumasa; Arai, Nobuyuki; Kunitomo, Hiroshi; Hirose, Yasujiro; Shibamoto, Yuta
2016-09-01
An effective background field removal technique is desired for more accurate quantitative susceptibility mapping (QSM) prior to dipole inversion. The aim of this study was to evaluate the accuracy of regularization enabled sophisticated harmonic artifact reduction for phase data with varying spherical kernel sizes (REV-SHARP) method using a three-dimensional head phantom and human brain data. The proposed REV-SHARP method used the spherical mean value operation and Tikhonov regularization in the deconvolution process, with varying 2-14mm kernel sizes. The kernel sizes were gradually reduced, similar to the SHARP with varying spherical kernel (VSHARP) method. We determined the relative errors and relationships between the true local field and estimated local field in REV-SHARP, VSHARP, projection onto dipole fields (PDF), and regularization enabled SHARP (RESHARP). Human experiment was also conducted using REV-SHARP, VSHARP, PDF, and RESHARP. The relative errors in the numerical phantom study were 0.386, 0.448, 0.838, and 0.452 for REV-SHARP, VSHARP, PDF, and RESHARP. REV-SHARP result exhibited the highest correlation between the true local field and estimated local field. The linear regression slopes were 1.005, 1.124, 0.988, and 0.536 for REV-SHARP, VSHARP, PDF, and RESHARP in regions of interest on the three-dimensional head phantom. In human experiments, no obvious errors due to artifacts were present in REV-SHARP. The proposed REV-SHARP is a new method combined with variable spherical kernel size and Tikhonov regularization. This technique might make it possible to be more accurate backgroud field removal and help to achive better accuracy of QSM. Copyright © 2016 Elsevier Inc. All rights reserved.
A Comparison of Strategies for Estimating Conditional DIF
ERIC Educational Resources Information Center
Moses, Tim; Miao, Jing; Dorans, Neil J.
2010-01-01
In this study, the accuracies of four strategies were compared for estimating conditional differential item functioning (DIF), including raw data, logistic regression, log-linear models, and kernel smoothing. Real data simulations were used to evaluate the estimation strategies across six items, DIF and No DIF situations, and four sample size…
Building machine learning force fields for nanoclusters
NASA Astrophysics Data System (ADS)
Zeni, Claudio; Rossi, Kevin; Glielmo, Aldo; Fekete, Ádám; Gaston, Nicola; Baletto, Francesca; De Vita, Alessandro
2018-06-01
We assess Gaussian process (GP) regression as a technique to model interatomic forces in metal nanoclusters by analyzing the performance of 2-body, 3-body, and many-body kernel functions on a set of 19-atom Ni cluster structures. We find that 2-body GP kernels fail to provide faithful force estimates, despite succeeding in bulk Ni systems. However, both 3- and many-body kernels predict forces within an ˜0.1 eV/Å average error even for small training datasets and achieve high accuracy even on out-of-sample, high temperature structures. While training and testing on the same structure always provide satisfactory accuracy, cross-testing on dissimilar structures leads to higher prediction errors, posing an extrapolation problem. This can be cured using heterogeneous training on databases that contain more than one structure, which results in a good trade-off between versatility and overall accuracy. Starting from a 3-body kernel trained this way, we build an efficient non-parametric 3-body force field that allows accurate prediction of structural properties at finite temperatures, following a newly developed scheme [A. Glielmo et al., Phys. Rev. B 95, 214302 (2017)]. We use this to assess the thermal stability of Ni19 nanoclusters at a fractional cost of full ab initio calculations.
Kernel machine methods for integrative analysis of genome-wide methylation and genotyping studies.
Zhao, Ni; Zhan, Xiang; Huang, Yen-Tsung; Almli, Lynn M; Smith, Alicia; Epstein, Michael P; Conneely, Karen; Wu, Michael C
2018-03-01
Many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation in addition to genotype in the same subjects. However, integrating information from both data types is challenging. In this paper, we propose a composite kernel machine regression model to test the joint epigenetic and genetic effect. Our approach works at the gene level, which allows for a common unit of analysis across different data types. The model compares the pairwise similarities in the phenotype to the pairwise similarities in the genotype and methylation values; and high correspondence is suggestive of association. A composite kernel is constructed to measure the similarities in the genotype and methylation values between pairs of samples. We demonstrate through simulations and real data applications that the proposed approach can correctly control type I error, and is more robust and powerful than using only the genotype or methylation data in detecting trait-associated genes. We applied our method to investigate the genetic and epigenetic regulation of gene expression in response to stressful life events using data that are collected from the Grady Trauma Project. Within the kernel machine testing framework, our methods allow for heterogeneity in effect sizes, nonlinear, and interactive effects, as well as rapid P-value computation. © 2017 WILEY PERIODICALS, INC.
Kim, Jongin; Park, Hyeong-jun
2016-01-01
The purpose of this study is to classify EEG data on imagined speech in a single trial. We recorded EEG data while five subjects imagined different vowels, /a/, /e/, /i/, /o/, and /u/. We divided each single trial dataset into thirty segments and extracted features (mean, variance, standard deviation, and skewness) from all segments. To reduce the dimension of the feature vector, we applied a feature selection algorithm based on the sparse regression model. These features were classified using a support vector machine with a radial basis function kernel, an extreme learning machine, and two variants of an extreme learning machine with different kernels. Because each single trial consisted of thirty segments, our algorithm decided the label of the single trial by selecting the most frequent output among the outputs of the thirty segments. As a result, we observed that the extreme learning machine and its variants achieved better classification rates than the support vector machine with a radial basis function kernel and linear discrimination analysis. Thus, our results suggested that EEG responses to imagined speech could be successfully classified in a single trial using an extreme learning machine with a radial basis function and linear kernel. This study with classification of imagined speech might contribute to the development of silent speech BCI systems. PMID:28097128
Tensor manifold-based extreme learning machine for 2.5-D face recognition
NASA Astrophysics Data System (ADS)
Chong, Lee Ying; Ong, Thian Song; Teoh, Andrew Beng Jin
2018-01-01
We explore the use of the Gabor regional covariance matrix (GRCM), a flexible matrix-based descriptor that embeds the Gabor features in the covariance matrix, as a 2.5-D facial descriptor and an effective means of feature fusion for 2.5-D face recognition problems. Despite its promise, matching is not a trivial problem for GRCM since it is a special instance of a symmetric positive definite (SPD) matrix that resides in non-Euclidean space as a tensor manifold. This implies that GRCM is incompatible with the existing vector-based classifiers and distance matchers. Therefore, we bridge the gap of the GRCM and extreme learning machine (ELM), a vector-based classifier for the 2.5-D face recognition problem. We put forward a tensor manifold-compliant ELM and its two variants by embedding the SPD matrix randomly into reproducing kernel Hilbert space (RKHS) via tensor kernel functions. To preserve the pair-wise distance of the embedded data, we orthogonalize the random-embedded SPD matrix. Hence, classification can be done using a simple ridge regressor, an integrated component of ELM, on the random orthogonal RKHS. Experimental results show that our proposed method is able to improve the recognition performance and further enhance the computational efficiency.
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi
2017-03-01
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Ghadiriyan Arani, M.; Pahlavani, P.; Effati, M.; Noori Alamooti, F.
2017-09-01
Today, one of the social problems influencing on the lives of many people is the road traffic crashes especially the highway ones. In this regard, this paper focuses on highway of capital and the most populous city in the U.S. state of Georgia and the ninth largest metropolitan area in the United States namely Atlanta. Geographically weighted regression and general centrality criteria are the aspects of traffic used for this article. In the first step, in order to estimate of crash intensity, it is needed to extract the dual graph from the status of streets and highways to use general centrality criteria. With the help of the graph produced, the criteria are: Degree, Pageranks, Random walk, Eccentricity, Closeness, Betweenness, Clustering coefficient, Eigenvector, and Straightness. The intensity of crash point is counted for every highway by dividing the number of crashes in that highway to the total number of crashes. Intensity of crash point is calculated for each highway. Then, criteria and crash point were normalized and the correlation between them was calculated to determine the criteria that are not dependent on each other. The proposed hybrid approach is a good way to regression issues because these effective measures result to a more desirable output. R2 values for geographically weighted regression using the Gaussian kernel was 0.539 and also 0.684 was obtained using a triple-core cube. The results showed that the triple-core cube kernel is better for modeling the crash intensity.
NASA Astrophysics Data System (ADS)
Ilie, Iulia; Dittrich, Peter; Carvalhais, Nuno; Jung, Martin; Heinemeyer, Andreas; Migliavacca, Mirco; Morison, James I. L.; Sippel, Sebastian; Subke, Jens-Arne; Wilkinson, Matthew; Mahecha, Miguel D.
2017-09-01
Accurate model representation of land-atmosphere carbon fluxes is essential for climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented by a steadily evolving body of mechanistic theory, provides the main basis for developing such models. The strongly increasing availability of measurements may facilitate new ways of identifying suitable model structures using machine learning. Here, we explore the potential of gene expression programming (GEP) to derive relevant model formulations based solely on the signals present in data by automatically applying various mathematical transformations to potential predictors and repeatedly evolving the resulting model structures. In contrast to most other machine learning regression techniques, the GEP approach generates readable
models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (random forests, support vector machines, artificial neural networks, and kernel ridge regressions). Based on real observations we explore the responses of the different components of terrestrial respiration at an oak forest in south-eastern England. We find that the GEP-retrieved models are often better in prediction than some established respiration models. Based on their structures, we find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics. We noticed that the GEP models are only partly portable across respiration components, the identification of a general
terrestrial respiration model possibly prevented by equifinality issues. Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing more traditional modelling approaches.
Heterogeneity in Schooling Rates of Return
ERIC Educational Resources Information Center
Henderson, Daniel J.; Polachek, Solomon W.; Wang, Le
2011-01-01
This paper relaxes the assumption of homogeneous rates of return to schooling by employing nonparametric kernel regression. This approach allows us to examine the differences in rates of return to education both across and within groups. Similar to previous studies we find that on average blacks have higher returns to education than whites,…
Nonparametric Item Response Curve Estimation with Correction for Measurement Error
ERIC Educational Resources Information Center
Guo, Hongwen; Sinharay, Sandip
2011-01-01
Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
On the role of cost-sensitive learning in multi-class brain-computer interfaces.
Devlaminck, Dieter; Waegeman, Willem; Wyns, Bart; Otte, Georges; Santens, Patrick
2010-06-01
Brain-computer interfaces (BCIs) present an alternative way of communication for people with severe disabilities. One of the shortcomings in current BCI systems, recently put forward in the fourth BCI competition, is the asynchronous detection of motor imagery versus resting state. We investigated this extension to the three-class case, in which the resting state is considered virtually lying between two motor classes, resulting in a large penalty when one motor task is misclassified into the other motor class. We particularly focus on the behavior of different machine-learning techniques and on the role of multi-class cost-sensitive learning in such a context. To this end, four different kernel methods are empirically compared, namely pairwise multi-class support vector machines (SVMs), two cost-sensitive multi-class SVMs and kernel-based ordinal regression. The experimental results illustrate that ordinal regression performs better than the other three approaches when a cost-sensitive performance measure such as the mean-squared error is considered. By contrast, multi-class cost-sensitive learning enables us to control the number of large errors made between two motor tasks.
Predicting β-Turns in Protein Using Kernel Logistic Regression
Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, FangXiang; Li, Min
2013-01-01
A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develope an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). The kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is often not found in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved Q total of 80.7% and MCC of 50% on BT426 dataset. These results show that KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields probabilistic outcome and has a well-defined extension to multiclass case. PMID:23509793
Predicting β-turns in protein using kernel logistic regression.
Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, Fangxiang; Li, Min
2013-01-01
A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average 25% of amino acids in protein structures are located in β-turns. It is very important to develope an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). The kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is often not found in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved Q total of 80.7% and MCC of 50% on BT426 dataset. These results show that KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields probabilistic outcome and has a well-defined extension to multiclass case.
Analytics of Radioactive Materials Released in the Fukushima Daiichi Nuclear Accident
DOE Office of Scientific and Technical Information (OSTI.GOV)
Egarievwe, Stephen U.; Nuclear Engineering Department, University of Tennessee, Knoxville, TN; Coble, Jamie B.
The 2011 Fukushima Daiichi nuclear accident in Japan resulted in the release of radioactive materials into the atmosphere, the nearby sea, and the surrounding land. Following the accident, several meteorological models were used to predict the transport of the radioactive materials to other continents such as North America and Europe. Also of high importance is the dispersion of radioactive materials locally and within Japan. Based on the International Atomic Energy Agency (IAEA) Convention on Early Notification of a nuclear accident, several radiological data sets were collected on the accident by the Japanese authorities. Among the radioactive materials monitored, are I-131more » and Cs-137 which form the major contributions to the contamination of drinking water. The radiation dose in the atmosphere was also measured. It is impractical to measure contamination and radiation dose in every place of interest. Therefore, modeling helps to predict contamination and radiation dose. Some modeling studies that have been reported in the literature include the simulation of transport and deposition of I-131 and Cs-137 from the accident, Cs-137 deposition and contamination of Japanese soils, and preliminary estimates of I-131 and Cs-137 discharged from the plant into the atmosphere. In this paper, we present statistical analytics of I-131 and Cs-137 with the goal of predicting gamma dose from the Fukushima Daiichi nuclear accident. The data sets used in our study were collected from the IAEA Fukushima Monitoring Database. As part of this study, we investigated several regression models to find the best algorithm for modeling the gamma dose. The modeling techniques used in our study include linear regression, principal component regression (PCR), partial least square (PLS) regression, and ridge regression. Our preliminary results on the first set of data showed that the linear regression model with one variable was the best with a root mean square error of 0.0133 μSv/h, compared to 0.0210 μSv/h for PCR, 0.231 μSv/h for ridge regression L-curve, 0.0856 μSv/h for PLS, and 0.0860 μSv/h for ridge regression cross validation. Complete results using the full datasets for these models will also be presented. (authors)« less
Kernel approach to molecular similarity based on iterative graph similarity.
Rupp, Matthias; Proschak, Ewgenij; Schneider, Gisbert
2007-01-01
Similarity measures for molecules are of basic importance in chemical, biological, and pharmaceutical applications. We introduce a molecular similarity measure defined directly on the annotated molecular graph, based on iterative graph similarity and optimal assignments. We give an iterative algorithm for the computation of the proposed molecular similarity measure, prove its convergence and the uniqueness of the solution, and provide an upper bound on the required number of iterations necessary to achieve a desired precision. Empirical evidence for the positive semidefiniteness of certain parametrizations of our function is presented. We evaluated our molecular similarity measure by using it as a kernel in support vector machine classification and regression applied to several pharmaceutical and toxicological data sets, with encouraging results.
Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method
Zhang, Tingting; Kou, S. C.
2010-01-01
Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure. PMID:21258615
Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method.
Zhang, Tingting; Kou, S C
2010-01-01
Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure.
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values so that the data become high dimensional data suffering from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
Robust kernel representation with statistical local features for face recognition.
Yang, Meng; Zhang, Lei; Shiu, Simon Chi-Keung; Zhang, David
2013-06-01
Factors such as misalignment, pose variation, and occlusion make robust face recognition a difficult problem. It is known that statistical features such as local binary pattern are effective for local feature extraction, whereas the recently proposed sparse or collaborative representation-based classification has shown interesting results in robust face recognition. In this paper, we propose a novel robust kernel representation model with statistical local features (SLF) for robust face recognition. Initially, multipartition max pooling is used to enhance the invariance of SLF to image registration error. Then, a kernel-based representation model is proposed to fully exploit the discrimination information embedded in the SLF, and robust regression is adopted to effectively handle the occlusion in face images. Extensive experiments are conducted on benchmark face databases, including extended Yale B, AR (A. Martinez and R. Benavente), multiple pose, illumination, and expression (multi-PIE), facial recognition technology (FERET), face recognition grand challenge (FRGC), and labeled faces in the wild (LFW), which have different variations of lighting, expression, pose, and occlusions, demonstrating the promising performance of the proposed method.
Buck, Christoph; Kneib, Thomas; Tkaczick, Tobias; Konstabel, Kenn; Pigeot, Iris
2015-12-22
Built environment studies provide broad evidence that urban characteristics influence physical activity (PA). However, findings are still difficult to compare, due to inconsistent measures assessing urban point characteristics and varying definitions of spatial scale. Both were found to influence the strength of the association between the built environment and PA. We simultaneously evaluated the effect of kernel approaches and network-distances to investigate the association between urban characteristics and physical activity depending on spatial scale and intensity measure. We assessed urban measures of point characteristics such as intersections, public transit stations, and public open spaces in ego-centered network-dependent neighborhoods based on geographical data of one German study region of the IDEFICS study. We calculated point intensities using the simple intensity and kernel approaches based on fixed bandwidths, cross-validated bandwidths including isotropic and anisotropic kernel functions and considering adaptive bandwidths that adjust for residential density. We distinguished six network-distances from 500 m up to 2 km to calculate each intensity measure. A log-gamma regression model was used to investigate the effect of each urban measure on moderate-to-vigorous physical activity (MVPA) of 400 2- to 9.9-year old children who participated in the IDEFICS study. Models were stratified by sex and age groups, i.e. pre-school children (2 to <6 years) and school children (6-9.9 years), and were adjusted for age, body mass index (BMI), education and safety concerns of parents, season and valid weartime of accelerometers. Association between intensity measures and MVPA strongly differed by network-distance, with stronger effects found for larger network-distances. Simple intensity revealed smaller effect estimates and smaller goodness-of-fit compared to kernel approaches. Smallest variation in effect estimates over network-distances was found for kernel intensity measures based on isotropic and anisotropic cross-validated bandwidth selection. We found a strong variation in the association between the built environment and PA of children based on the choice of intensity measure and network-distance. Kernel intensity measures provided stable results over various scales and improved the assessment compared to the simple intensity measure. Considering different spatial scales and kernel intensity methods might reduce methodological limitations in assessing opportunities for PA in the built environment.
Mohammadi Moghaddam, Toktam; Razavi, Seyed M A; Taghizadeh, Masoud; Sazgarnia, Ameneh
2016-01-01
Roasting is an important step in the processing of pistachio nuts. The effect of hot air roasting temperature (90, 120 and 150 °C), time (20, 35 and 50 min) and air velocity (0.5, 1.5 and 2.5 m/s) on textural and sensory characteristics of pistachio nuts and kernels were investigated. The results showed that increasing the roasting temperature decreased the fracture force (82-25.54 N), instrumental hardness (82.76-37.59 N), apparent modulus of elasticity (47-21.22 N/s), compressive energy (280.73-101.18 N.s) and increased amount of bitterness (1-2.5) and the hardness score (6-8.40) of pistachio kernels. Higher roasting time improved the flavor of samples. The results of the consumer test showed that the roasted pistachio kernels have good acceptability for flavor (score 5.83-8.40), color (score 7.20-8.40) and hardness (score 6-8.40) acceptance. Moreover, Partial Least Square (PLS) analysis of instrumental and sensory data provided important information for the correlation of objective and subjective properties. The univariate analysis showed that over 93.87 % of the variation in sensory hardness and almost 87 % of the variation in sensory acceptability could be explained by instrumental texture properties.
NASA Astrophysics Data System (ADS)
Popov, Aleksey
2013-04-01
The magnetic field of the Earth has global meaning for a life on the Earth. The world geophysical science explains: - occurrence of a magnetic field of the Earth it is transformation of kinetic energy of movements of the fused iron in the liquid core of Earth - into the magnetic energy; - the warming up of a kernel of the Earth occurs due to radioactive disintegration of elements, with excretion of thermal energy. The world science does not define the reasons: - drift of a magnetic dipole on 0,2 a year to the West; - drift of lithospheric slabs and continents. The author offers: an alternative variant existing in a world science the theories "Geodynamo" - it is the theory « the Magnetic field of the Earth », created on the basis of physical laws. Education of a magnetic field of the Earth occurs at moving the electric charge located in a liquid kernel, at rotation of the Earth. At calculation of a magnetic field is used law the Bio Savara for a ring electric current: dB = . Magnetic induction in a kernel of the Earth: B = 2,58 Gs. According to the law of electromagnetic induction the Faradey, rotation of a iron kernel of the Earth in magnetic field causes occurrence of an electric field Emf which moves electrons from the center of a kernel towards the mantle. So of arise the radial electric currents. The magnetic field amplifies the iron of mantle and a kernel of the Earth. As a result of action of a radial electric field the electrons will flow from the center of a kernel in a layer of an electric charge. The central part of a kernel represents the field with a positive electric charge, which creates inverse magnetic field Binv and Emfinv When ?mfinv = ?mf ; ?inv = B, there will be an inversion a magnetic field of the Earth. It is a fact: drift of a magnetic dipole of the Earth in the western direction approximately 0,2 longitude, into a year. Radial electric currents a actions with the basic magnetic field of a Earth - it turn a kernel. It coincides with laws of electromagnetism. According to a rule of the left hand: if the magnetic field in a kernel is directed to drawing, electric current are directed to an axis of rotation of the Earth, - a action of force clockwise (to West). Definition of the force causing drift a kernel according to the law of Ampere F = IBlsin. Powerful force 3,5 × 1012 Nyton, what makes drift of the central part of a kernel of the Earth on 0,2 the longitude in year to West, and also it is engine of the mechanism of movement of slabs together with continents. Movement of a core of the Earth carry out around of a terrestrial axis one circulation in the western direction in 2000 of years. Linear speed of rotation of a kernel concerning a mantle on border the mantle a kernel: V = × 3,471 × 10 = 3,818 × 10 m/s = 33 m/day = 12 km/years. Considering greater viscosity of a mantle, the powerful energy at rotation of a kernel seize a mantle and lithospheric slabs and makes their collisions as a result of which there are earthquakes and volcano. Continents Northern and Southern America every year separate from the Europe and Africa on several centimeters. Atlantic ocean as a result of movement of these slabs with such speed was formed for 200 million years, that in comparison with the age of the Earth - several billions years, not so long time. Drift of a kernel in the western direction is a principal cause of delay of speed of rotation of the Earth. Flow of radial electric currents allot according to the law of Joule - Lenz, the quantity of warmth : Q = I2Rt = IUt, of thermal energy 6,92 × 1017 calories/year. This defines heating of a kernel and the Earth as a whole. In the valley of the median-Atlantic ridge having numerous volcanos, the lava flow constantly thus warm up waters of Atlantic ocean. It is a fact the warm current Gulf Stream. Thawing of a permafrost and ices of Arctic ocean, of glaciers of Greenland and Antarctica is acknowledgement: the warmth of earth defines character of thawing of glaciers and a permafrost. This is a global warming. The version of the author: the periods of inversion of a magnetic field of the Earth determine cycles of the Ice Age. At inversions of a magnetic field when B=0, radial electric currents are small or are absent, excretion of thermal energy minimally or an equal to zero,it is the beginning of the cooling the Earth and offensive of the Ice Age. Disappearance warm current Gulf Stream warming the north of the Europe and Canada. Drift of a magnetic dipole of the Earth in a rotation the opposite to rotation of the Earth, is acknowledgement of drift of a kernel of the Earth in a rotation the opposite to rotation of the Earth and is acknowledgement of the theory « the Magnetic field of the Earth ». The author continues to develop the theory « the Magnetic field of the Earth » and invites geophysicists to accept in it participation in it.
Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors.
Woodard, Dawn B; Crainiceanu, Ciprian; Ruppert, David
2013-01-01
We propose a new method for regression using a parsimonious and scientifically interpretable representation of functional predictors. Our approach is designed for data that exhibit features such as spikes, dips, and plateaus whose frequency, location, size, and shape varies stochastically across subjects. We propose Bayesian inference of the joint functional and exposure models, and give a method for efficient computation. We contrast our approach with existing state-of-the-art methods for regression with functional predictors, and show that our method is more effective and efficient for data that include features occurring at varying locations. We apply our methodology to a large and complex dataset from the Sleep Heart Health Study, to quantify the association between sleep characteristics and health outcomes. Software and technical appendices are provided in online supplemental materials.
Quantifying predictive capability of electronic health records for the most harmful breast cancer
NASA Astrophysics Data System (ADS)
Wu, Yirong; Fan, Jun; Peissig, Peggy; Berg, Richard; Tafti, Ahmad Pahlavan; Yin, Jie; Yuan, Ming; Page, David; Cox, Jennifer; Burnside, Elizabeth S.
2018-03-01
Improved prediction of the "most harmful" breast cancers that cause the most substantive morbidity and mortality would enable physicians to target more intense screening and preventive measures at those women who have the highest risk; however, such prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source that has great research and clinical potential. Our goal was to quantify the value of EHR variables in the "most harmful" breast cancer risk prediction. We identified 794 subjects who had breast cancer with primary non-benign tumors with their earliest diagnosis on or after 1/1/2004 from an existing personalized medicine data repository, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprised of 6 components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. We observed that the AUCs of Ridge-LR and Lasso-LR models were 0.818 and 0.839 respectively. For both the Ridge-LR and LassoLR models, the predictive performance of the whole EHR variables was significantly higher than that of each individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, providing the possibility to personalize care for those women at the highest risk in clinical practice.
Quantifying predictive capability of electronic health records for the most harmful breast cancer.
Wu, Yirong; Fan, Jun; Peissig, Peggy; Berg, Richard; Tafti, Ahmad Pahlavan; Yin, Jie; Yuan, Ming; Page, David; Cox, Jennifer; Burnside, Elizabeth S
2018-02-01
Improved prediction of the "most harmful" breast cancers that cause the most substantive morbidity and mortality would enable physicians to target more intense screening and preventive measures at those women who have the highest risk; however, such prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source that has great research and clinical potential. Our goal was to quantify the value of EHR variables in the "most harmful" breast cancer risk prediction. We identified 794 subjects who had breast cancer with primary non-benign tumors with their earliest diagnosis on or after 1/1/2004 from an existing personalized medicine data repository, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprised of 6 components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. We observed that the AUCs of Ridge-LR and Lasso-LR models were 0.818 and 0.839 respectively. For both the Ridge-LR and Lasso-LR models, the predictive performance of the whole EHR variables was significantly higher than that of each individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, providing the possibility to personalize care for those women at the highest risk in clinical practice.
Characterization of Microbiota in Children with Chronic Functional Constipation.
de Meij, Tim G J; de Groot, Evelien F J; Eck, Anat; Budding, Andries E; Kneepkens, C M Frank; Benninga, Marc A; van Bodegraven, Adriaan A; Savelkoul, Paul H M
2016-01-01
Disruption of the intestinal microbiota is considered an etiological factor in pediatric functional constipation. Scientifically based selection of potential beneficial probiotic strains in functional constipation therapy is not feasible due to insufficient knowledge of microbiota composition in affected subjects. The aim of this study was to describe microbial composition and diversity in children with functional constipation, compared to healthy controls. Fecal samples from 76 children diagnosed with functional constipation according to the Rome III criteria (median age 8.0 years; range 4.2-17.8) were analyzed by IS-pro, a PCR-based microbiota profiling method. Outcome was compared with intestinal microbiota profiles of 61 healthy children (median 8.6 years; range 4.1-17.9). Microbiota dissimilarity was depicted by principal coordinate analysis (PCoA), diversity was calculated by Shannon diversity index. To determine the most discriminative species, cross validated logistic ridge regression was performed. Applying total microbiota profiles (all phyla together) or per phylum analysis, no disease-specific separation was observed by PCoA and by calculation of diversity indices. By ridge regression, however, functional constipation and controls could be discriminated with 82% accuracy. Most discriminative species were Bacteroides fragilis, Bacteroides ovatus, Bifidobacterium longum, Parabacteroides species (increased in functional constipation) and Alistipes finegoldii (decreased in functional constipation). None of the commonly used unsupervised statistical methods allowed for microbiota-based discrimination of children with functional constipation and controls. By ridge regression, however, both groups could be discriminated with 82% accuracy. Optimization of microbiota-based interventions in constipated children warrants further characterization of microbial signatures linked to clinical subgroups of functional constipation.
Constant size descriptors for accurate machine learning models of molecular properties
NASA Astrophysics Data System (ADS)
Collins, Christopher R.; Gordon, Geoffrey J.; von Lilienfeld, O. Anatole; Yaron, David J.
2018-06-01
Two different classes of molecular representations for use in machine learning of thermodynamic and electronic properties are studied. The representations are evaluated by monitoring the performance of linear and kernel ridge regression models on well-studied data sets of small organic molecules. One class of representations studied here counts the occurrence of bonding patterns in the molecule. These require only the connectivity of atoms in the molecule as may be obtained from a line diagram or a SMILES string. The second class utilizes the three-dimensional structure of the molecule. These include the Coulomb matrix and Bag of Bonds, which list the inter-atomic distances present in the molecule, and Encoded Bonds, which encode such lists into a feature vector whose length is independent of molecular size. Encoded Bonds' features introduced here have the advantage of leading to models that may be trained on smaller molecules and then used successfully on larger molecules. A wide range of feature sets are constructed by selecting, at each rank, either a graph or geometry-based feature. Here, rank refers to the number of atoms involved in the feature, e.g., atom counts are rank 1, while Encoded Bonds are rank 2. For atomization energies in the QM7 data set, the best graph-based feature set gives a mean absolute error of 3.4 kcal/mol. Inclusion of 3D geometry substantially enhances the performance, with Encoded Bonds giving 2.4 kcal/mol, when used alone, and 1.19 kcal/mol, when combined with graph features.
Estimating Mixture of Gaussian Processes by Kernel Smoothing
Huang, Mian; Li, Runze; Wang, Hansheng; Yao, Weixin
2014-01-01
When the functional data are not homogeneous, e.g., there exist multiple classes of functional curves in the dataset, traditional estimation methods may fail. In this paper, we propose a new estimation procedure for the Mixture of Gaussian Processes, to incorporate both functional and inhomogeneous properties of the data. Our method can be viewed as a natural extension of high-dimensional normal mixtures. However, the key difference is that smoothed structures are imposed for both the mean and covariance functions. The model is shown to be identifiable, and can be estimated efficiently by a combination of the ideas from EM algorithm, kernel regression, and functional principal component analysis. Our methodology is empirically justified by Monte Carlo simulations and illustrated by an analysis of a supermarket dataset. PMID:24976675
Studying Canonical Analysis: A Reply to Thorndike's Comments
ERIC Educational Resources Information Center
Barcikowski, Robert S.; Stevens, James P.
1976-01-01
This article is a rejoinder to TM 502 249. Each of Thorndike's comments are examined. A possible solution to the large number of subjects necessary for stable weights and variate-variable correlations using ridge regression procedures is suggested. (RC)
NASA Astrophysics Data System (ADS)
Cheng, Jun-Hu; Jin, Huali; Liu, Zhiwei
2018-01-01
The feasibility of developing a multispectral imaging method using important wavelengths from hyperspectral images selected by genetic algorithm (GA), successive projection algorithm (SPA) and regression coefficient (RC) methods for modeling and predicting protein content in peanut kernel was investigated for the first time. Partial least squares regression (PLSR) calibration model was established between the spectral data from the selected optimal wavelengths and the reference measured protein content ranged from 23.46% to 28.43%. The RC-PLSR model established using eight key wavelengths (1153, 1567, 1972, 2143, 2288, 2339, 2389 and 2446 nm) showed the best predictive results with the coefficient of determination of prediction (R2P) of 0.901, and root mean square error of prediction (RMSEP) of 0.108 and residual predictive deviation (RPD) of 2.32. Based on the obtained best model and image processing algorithms, the distribution maps of protein content were generated. The overall results of this study indicated that developing a rapid and online multispectral imaging system using the feature wavelengths and PLSR analysis is potential and feasible for determination of the protein content in peanut kernels.
NASA Astrophysics Data System (ADS)
Elnasir, Selma; Shamsuddin, Siti Mariyam; Farokhi, Sajad
2015-01-01
Palm vein recognition (PVR) is a promising new biometric that has been applied successfully as a method of access control by many organizations, which has even further potential in the field of forensics. The palm vein pattern has highly discriminative features that are difficult to forge because of its subcutaneous position in the palm. Despite considerable progress and a few practical issues, providing accurate palm vein readings has remained an unsolved issue in biometrics. We propose a robust and more accurate PVR method based on the combination of wavelet scattering (WS) with spectral regression kernel discriminant analysis (SRKDA). As the dimension of WS generated features is quite large, SRKDA is required to reduce the extracted features to enhance the discrimination. The results based on two public databases-PolyU Hyper Spectral Palmprint public database and PolyU Multi Spectral Palmprint-show the high performance of the proposed scheme in comparison with state-of-the-art methods. The proposed approach scored a 99.44% identification rate and a 99.90% verification rate [equal error rate (EER)=0.1%] for the hyperspectral database and a 99.97% identification rate and a 99.98% verification rate (EER=0.019%) for the multispectral database.
Asymptotics of nonparametric L-1 regression models with dependent data
ZHAO, ZHIBIAO; WEI, YING; LIN, DENNIS K.J.
2013-01-01
We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile estimates. The obtained Bahadur representations provide deep insights into the asymptotic behavior of the estimates. Our main theoretical development is based on studying the modulus of continuity of kernel weighted empirical process through a coupling argument. Progesterone data is used for an illustration. PMID:24955016
Regression-assisted deconvolution.
McIntyre, Julie; Stefanski, Leonard A
2011-06-30
We present a semi-parametric deconvolution estimator for the density function of a random variable biX that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean-variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length. Copyright © 2011 John Wiley & Sons, Ltd.
Encoding Dissimilarity Data for Statistical Model Building.
Wahba, Grace
2010-12-01
We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332-1233.G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128-8133F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison.
Yip, T C-F; Ma, A J; Wong, V W-S; Tse, Y-K; Chan, H L-Y; Yuen, P-C; Wong, G L-H
2017-08-01
Non-alcoholic fatty liver disease (NAFLD) affects 20%-40% of the general population in developed countries and is an increasingly important cause of hepatocellular carcinoma. Electronic medical records facilitate large-scale epidemiological studies, existing NAFLD scores often require clinical and anthropometric parameters that may not be captured in those databases. To develop and validate a laboratory parameter-based machine learning model to detect NAFLD for the general population. We randomly divided 922 subjects from a population screening study into training and validation groups; NAFLD was diagnosed by proton-magnetic resonance spectroscopy. On the basis of machine learning from 23 routine clinical and laboratory parameters after elastic net regulation, we evaluated the logistic regression, ridge regression, AdaBoost and decision tree models. The areas under receiver-operating characteristic curve (AUROC) of models in validation group were compared. Six predictors including alanine aminotransferase, high-density lipoprotein cholesterol, triglyceride, haemoglobin A 1c , white blood cell count and the presence of hypertension were selected. The NAFLD ridge score achieved AUROC of 0.87 (95% CI 0.83-0.90) and 0.88 (0.84-0.91) in the training and validation groups respectively. Using dual cut-offs of 0.24 and 0.44, NAFLD ridge score achieved 92% (86%-96%) sensitivity and 90% (86%-93%) specificity with corresponding negative and positive predictive values of 96% (91%-98%) and 69% (59%-78%), and 87% of overall accuracy among 70% of classifiable subjects in the validation group; 30% of subjects remained indeterminate. NAFLD ridge score is a simple and robust reference comparable to existing NAFLD scores to exclude NAFLD patients in epidemiological studies. © 2017 John Wiley & Sons Ltd.
Comparative decision models for anticipating shortage of food grain production in India
NASA Astrophysics Data System (ADS)
Chattopadhyay, Manojit; Mitra, Subrata Kumar
2018-01-01
This paper attempts to predict food shortages in advance from the analysis of rainfall during the monsoon months along with other inputs used for crop production, such as land used for cereal production, percentage of area covered under irrigation and fertiliser use. We used six binary classification data mining models viz., logistic regression, Multilayer Perceptron, kernel lab-Support Vector Machines, linear discriminant analysis, quadratic discriminant analysis and k-Nearest Neighbors Network, and found that linear discriminant analysis and kernel lab-Support Vector Machines are equally suitable for predicting per capita food shortage with 89.69 % accuracy in overall prediction and 92.06 % accuracy in predicting food shortage ( true negative rate). Advance information of food shortage can help policy makers to take remedial measures in order to prevent devastating consequences arising out of food non-availability.
Characterization of Microbiota in Children with Chronic Functional Constipation
de Meij, Tim G. J.; de Groot, Evelien F. J.; Eck, Anat; Budding, Andries E.; Kneepkens, C. M. Frank; Benninga, Marc A.; van Bodegraven, Adriaan A.; Savelkoul, Paul H. M.
2016-01-01
Objectives Disruption of the intestinal microbiota is considered an etiological factor in pediatric functional constipation. Scientifically based selection of potential beneficial probiotic strains in functional constipation therapy is not feasible due to insufficient knowledge of microbiota composition in affected subjects. The aim of this study was to describe microbial composition and diversity in children with functional constipation, compared to healthy controls. Study Design Fecal samples from 76 children diagnosed with functional constipation according to the Rome III criteria (median age 8.0 years; range 4.2–17.8) were analyzed by IS-pro, a PCR-based microbiota profiling method. Outcome was compared with intestinal microbiota profiles of 61 healthy children (median 8.6 years; range 4.1–17.9). Microbiota dissimilarity was depicted by principal coordinate analysis (PCoA), diversity was calculated by Shannon diversity index. To determine the most discriminative species, cross validated logistic ridge regression was performed. Results Applying total microbiota profiles (all phyla together) or per phylum analysis, no disease-specific separation was observed by PCoA and by calculation of diversity indices. By ridge regression, however, functional constipation and controls could be discriminated with 82% accuracy. Most discriminative species were Bacteroides fragilis, Bacteroides ovatus, Bifidobacterium longum, Parabacteroides species (increased in functional constipation) and Alistipes finegoldii (decreased in functional constipation). Conclusions None of the commonly used unsupervised statistical methods allowed for microbiota-based discrimination of children with functional constipation and controls. By ridge regression, however, both groups could be discriminated with 82% accuracy. Optimization of microbiota-based interventions in constipated children warrants further characterization of microbial signatures linked to clinical subgroups of functional constipation. PMID:27760208
A robust background regression based score estimation algorithm for hyperspectral anomaly detection
NASA Astrophysics Data System (ADS)
Zhao, Rui; Du, Bo; Zhang, Liangpei; Zhang, Lefei
2016-12-01
Anomaly detection has become a hot topic in the hyperspectral image analysis and processing fields in recent years. The most important issue for hyperspectral anomaly detection is the background estimation and suppression. Unreasonable or non-robust background estimation usually leads to unsatisfactory anomaly detection results. Furthermore, the inherent nonlinearity of hyperspectral images may cover up the intrinsic data structure in the anomaly detection. In order to implement robust background estimation, as well as to explore the intrinsic data structure of the hyperspectral image, we propose a robust background regression based score estimation algorithm (RBRSE) for hyperspectral anomaly detection. The Robust Background Regression (RBR) is actually a label assignment procedure which segments the hyperspectral data into a robust background dataset and a potential anomaly dataset with an intersection boundary. In the RBR, a kernel expansion technique, which explores the nonlinear structure of the hyperspectral data in a reproducing kernel Hilbert space, is utilized to formulate the data as a density feature representation. A minimum squared loss relationship is constructed between the data density feature and the corresponding assigned labels of the hyperspectral data, to formulate the foundation of the regression. Furthermore, a manifold regularization term which explores the manifold smoothness of the hyperspectral data, and a maximization term of the robust background average density, which suppresses the bias caused by the potential anomalies, are jointly appended in the RBR procedure. After this, a paired-dataset based k-nn score estimation method is undertaken on the robust background and potential anomaly datasets, to implement the detection output. The experimental results show that RBRSE achieves superior ROC curves, AUC values, and background-anomaly separation than some of the other state-of-the-art anomaly detection methods, and is easy to implement in practice.
ERIC Educational Resources Information Center
Moses, Tim; Miao, Jing; Dorans, Neil
2010-01-01
This study compared the accuracies of four differential item functioning (DIF) estimation methods, where each method makes use of only one of the following: raw data, logistic regression, loglinear models, or kernel smoothing. The major focus was on the estimation strategies' potential for estimating score-level, conditional DIF. A secondary focus…
(im)Balance of Forces in the Corona
NASA Astrophysics Data System (ADS)
Sylwester, J.; Sylwester, B.
Observed pattern of variability of solar atmosphere plasma structures, often accompanied by respective measured Doppler shifts, provides a direct evidence of imbalanced forces acting in this environment. Observed motions have been studied in various energy bands, extending from radio to hard X-rays using ground and space-borne instruments. Here, we present the results of a dedicated study of present observational databases in selected energy ranges with a special interest focused on TRACE movies. In our search we included also recently released wavelet-processed EIT and LASCO movies (from SOHO) as they provide additional support to the conclusions of this study. The main outcome of the work performed is our better understanding of a basic role played by plasma kernels in every ``layer'' of the solar atmosphere. These kernels appear to be present, and rapidly evolve at the locations of violent (intense) energy release locations. Subsequent formation of a more stable coronal magnetic structures seen in the form of ``spiders'' or ``scorpions'' is due to self-reorganization of plasma kernels. It comes out that the spider structure represents a basic, quasi-equilibrium building block of the solar atmosphere. When observed in a particular image, within a limited energy band, i.e. optical, EUV, soft or hard X-rays, only a part of this spider plasma structure can usually be seen, noticeably resembling a loop-like structure with a brighter top, or an arcade of loops connected along the ridge of summit kernels, or seemingly isolated oval source. This energy-dependent visibility effects caused a general confusion present in solar physics and led to proliferation of simple fluxtube scenarios. In our study presented herewith, we used the images obtained with the best available resolution, being enhanced numerically where possible. For the first time we enhanced the TRACE image data cube in a systematic way for a particular flare. Based on the results of analysis of a large number of images, we push forward a qualitative toy model of atmospheric connectivity pattern (Sylwester, J. and Sylwester, B., 2004). This hierarchic model is able to handle in a natural way observed complexity of atmospheric phenomena. Here, we discuss to some extent verifiable predictions of the hierarchical model outlining a number of new studies which might prove the concept. These predictions arise concurrently with the first data coming down from new missions being recently launched into orbit: the Hinode and the Stereo.
Elsyad, Moustafa Abdou; Mohamed, Shahinaz Sayed; Shawky, Ahmad Fathalla
This retrospective study compared posterior mandibular residual ridge resorption with two different retentive mechanisms for overdentures after 7 years. A convenience sample of 18 edentulous men was assigned to one of two equal groups. Two implants were placed in the mandibular canine areas for each patient using the conventional two-stage surgical protocol, and the implants were splinted with a round bar 3 months later. New mandibular overdentures were then connected to the bars with clips (clip-retained overdentures, CR group) or resilient liners (resilient liner-retained overdentures, RR group). Posterior mandibular ridge resorption (PMRR) was recorded using proportional measurements and posterior area index (PAI) on panoramic radiographs taken immediately after overdenture insertion (T 0 ) and 7 years later (T 7 ). A linear regression model was used to verify the relationship between PAI and the following considerations: attachment type, age, initial mandibular ridge height, period of mandibular edentulism, number of previously worn dentures, and relining events. After 7 years, the RR group demonstrated a significantly (P = .014) higher change in PAI (0.11 ± 0.02) than the CR group (0.06 ± 0.04). The average PMRR for each mm of posterior mandibular ridge was 0.79 mm (0.11 mm/year) in the CR group and 1.4 mm (0.2 mm/year) in the RR group. Attachment type, initial mandibular ridge height, and relining times were significantly correlated with change in the PAI (P = .004, P = .035, and P = .045, respectively). Within the limitations of this preliminary study's design, it was observed that following a 7-year period of use, resilient liner attachments for bar/implant-retained overdentures appear to be associated with greater posterior mandibular ridge resorption when compared to clip attachments.
Factor regression for interpreting genotype-environment interaction in bread-wheat trials.
Baril, C P
1992-05-01
The French INRA wheat (Triticum aestivum L. em Thell.) breeding program is based on multilocation trials to produce high-yielding, adapted lines for a wide range of environments. Differential genotypic responses to variable environment conditions limit the accuracy of yield estimations. Factor regression was used to partition the genotype-environment (GE) interaction into four biologically interpretable terms. Yield data were analyzed from 34 wheat genotypes grown in four environments using 12 auxiliary agronomic traits as genotypic and environmental covariates. Most of the GE interaction (91%) was explained by the combination of only three traits: 1,000-kernel weight, lodging susceptibility and spike length. These traits are easily measured in breeding programs, therefore factor regression model can provide a convenient and useful prediction method of yield.
Connell, J.F.; Bailey, Z.C.
1989-01-01
A total of 338 single-well aquifer tests from Bear Creek and Melton Valley, Tennessee were statistically grouped to estimate hydraulic conductivities for the geologic formations in the valleys. A cross-sectional simulation model linked to a regression model was used to further refine the statistical estimates for each of the formations and to improve understanding of ground-water flow in Bear Creek Valley. Median hydraulic-conductivity values were used as initial values in the model. Model-calculated estimates of hydraulic conductivity were generally lower than the statistical estimates. Simulations indicate that (1) the Pumpkin Valley Shale controls groundwater flow between Pine Ridge and Bear Creek; (2) all the recharge on Chestnut Ridge discharges to the Maynardville Limestone; (3) the formations having smaller hydraulic gradients may have a greater tendency for flow along strike; (4) local hydraulic conditions in the Maynardville Limestone cause inaccurate model-calculated estimates of hydraulic conductivity; and (5) the conductivity of deep bedrock neither affects the results of the model nor does it add information on the flow system. Improved model performance would require: (1) more water level data for the Copper Ridge Dolomite; (2) improved estimates of hydraulic conductivity in the Copper Ridge Dolomite and Maynardville Limestone; and (3) more water level data and aquifer tests in deep bedrock. (USGS)
NASA Astrophysics Data System (ADS)
Jia, Weile; Wang, Jue; Chi, Xuebin; Wang, Lin-Wang
2017-02-01
LS3DF, namely linear scaling three-dimensional fragment method, is an efficient linear scaling ab initio total energy electronic structure calculation code based on a divide-and-conquer strategy. In this paper, we present our GPU implementation of the LS3DF code. Our test results show that the GPU code can calculate systems with about ten thousand atoms fully self-consistently in the order of 10 min using thousands of computing nodes. This makes the electronic structure calculations of 10,000-atom nanosystems routine work. This speed is 4.5-6 times faster than the CPU calculations using the same number of nodes on the Titan machine in the Oak Ridge leadership computing facility (OLCF). Such speedup is achieved by (a) carefully re-designing of the computationally heavy kernels; (b) redesign of the communication pattern for heterogeneous supercomputers.
Drug-target interaction prediction using ensemble learning and dimensionality reduction.
Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong
2017-10-01
Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction. Copyright © 2017 Elsevier Inc. All rights reserved.
Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E
2017-07-01
The leaf spotting diseases in wheat that include Septoria tritici blotch (STB) caused by , Stagonospora nodorum blotch (SNB) caused by , and tan spot (TS) caused by pose challenges to breeding programs in selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection that uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of least-squares (LS) approach with genomic-enabled prediction models including genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces markers (RKHS-M), a pedigree-based model (RKHS-P) and RKHS markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP, the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling) and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches: genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq) for prediction. While, GBS markers performed slightly better than DArTseq, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection. Copyright © 2017 Crop Science Society of America.
Hot Swapping Protocol Implementations in the OPNET Modeler Development Environment
2008-03-01
components. Unfortunately, this style is not efficient or particularly human–readable. Even purely pedagogical scenarios consisting of a client and a...definition provided by the mock object. sion of this kernel procedure steers all packets sent with op pk deliver() to the unit testing’s specialized...forms of development. Moreover, batteries of unit tests could ship with the accompanying process models and serve as robust regression tests
Robinson, M.M.; McBride, R.A.
2008-01-01
Certain details regarding the origin and evolution of shelf sand ridges remain elusive. Knowledge of their internal stratigraphy and microfossil distribution is necessary to define the origin and to determine the processes that modify sand ridges. Fourteen vibracores from False Cape Shoal A, a well-developed shoreface-attached sand ridge on the Virginia/North Carolina inner continental shelf, were examined to document the internal stratigraphy and benthic foraminiferal assemblages, as well as to reconstruct the depositional environments recorded in down-core sediments. Seven sedimentary and foraminiferal facies correspond to the following stratigraphic units: fossiliferous silt, barren sand, clay to sandy clay, laminated and bioturbated sand, poorly sorted massive sand, fine clean sand, and poorly sorted clay to gravel. The units represent a Pleistocene estuary and shoreface, a Holocene estuary, ebb tidal delta, modern shelf, modern shoreface, and swale fill, respectively. The succession of depositional environments reflects a Pleistocene sea-level highstand and subsequent regression followed by the Holocene transgression in which barrier island/spit systems formed along the Virginia/North Carolina inner shelf ???5.2 ka and migrated landward and an ebb tidal delta that was deposited, reworked, and covered by shelf sand.
Robinson, Marci M.; McBride, Randolph A.
2008-01-01
Certain details regarding the origin and evolution of shelf sand ridges remain elusive. Knowledge of their internal stratigraphy and microfossil distribution is necessary to define the origin and to determine the processes that modify sand ridges. Fourteen vibracores from False Cape Shoal A, a well-developed shoreface-attached sand ridge on the Virginia/North Carolina inner continental shelf, were examined to document the internal stratigraphy and benthic foraminiferal assemblages, as well as to reconstruct the depositional environments recorded in down-core sediments. Seven sedimentary and foraminiferal facies correspond to the following stratigraphic units: fossiliferous silt, barren sand, clay to sandy clay, laminated and bioturbated sand, poorly sorted massive sand, fine clean sand, and poorly sorted clay to gravel. The units represent a Pleistocene estuary and shoreface, a Holocene estuary, ebb tidal delta, modern shelf, modern shoreface, and swale fill, respectively. The succession of depositional environments reflects a Pleistocene sea-level highstand and subsequent regression followed by the Holocene transgression in which barrier island/spit systems formed along the Virginia/North Carolina inner shelf not, vert, ~5.2 ka and migrated landward and an ebb tidal delta that was deposited, reworked, and covered by shelf sand.
[Determination of acidity and vitamin C in apples using portable NIR analyzer].
Yang, Fan; Li, Ya-Ting; Gu, Xuan; Ma, Jiang; Fan, Xing; Wang, Xiao-Xuan; Zhang, Zhuo-Yong
2011-09-01
Near infrared (NIR) spectroscopy technology based on a portable NIR analyzer, combined with kernel Isomap algorithm and generalized regression neural network (GRNN) has been applied to establishing quantitative models for prediction of acidity and vitamin C in six kinds of apple samples. The obtained results demonstrated that the fitting and the predictive accuracy of the models with kernel Isomap algorithm were satisfactory. The correlation between actual and predicted values of calibration samples (R(c)) obtained by the acidity model was 0.999 4, and for prediction samples (R(p)) was 0.979 9. The root mean square error of prediction set (RMSEP) was 0.055 8. For the vitamin C model, R(c) was 0.989 1, R(p) was 0.927 2, and RMSEP was 4.043 1. Results proved that the portable NIR analyzer can be a feasible tool for the determination of acidity and vitamin C in apples.
Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J
2017-05-01
Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.
Using kernel density estimation to understand the influence of neighbourhood destinations on BMI
King, Tania L; Bentley, Rebecca J; Thornton, Lukar E; Kavanagh, Anne M
2016-01-01
Objectives Little is known about how the distribution of destinations in the local neighbourhood is related to body mass index (BMI). Kernel density estimation (KDE) is a spatial analysis technique that accounts for the location of features relative to each other. Using KDE, this study investigated whether individuals living near destinations (shops and service facilities) that are more intensely distributed rather than dispersed, have lower BMIs. Study design and setting A cross-sectional study of 2349 residents of 50 urban areas in metropolitan Melbourne, Australia. Methods Destinations were geocoded, and kernel density estimates of destination intensity were created using kernels of 400, 800 and 1200 m. Using multilevel linear regression, the association between destination intensity (classified in quintiles Q1(least)–Q5(most)) and BMI was estimated in models that adjusted for the following confounders: age, sex, country of birth, education, dominant household occupation, household type, disability/injury and area disadvantage. Separate models included a physical activity variable. Results For kernels of 800 and 1200 m, there was an inverse relationship between BMI and more intensely distributed destinations (compared to areas with least destination intensity). Effects were significant at 1200 m: Q4, β −0.86, 95% CI −1.58 to −0.13, p=0.022; Q5, β −1.03 95% CI −1.65 to −0.41, p=0.001. Inclusion of physical activity in the models attenuated effects, although effects remained marginally significant for Q5 at 1200 m: β −0.77 95% CI −1.52, −0.02, p=0.045. Conclusions This study conducted within urban Melbourne, Australia, found that participants living in areas of greater destination intensity within 1200 m of home had lower BMIs. Effects were partly explained by physical activity. The results suggest that increasing the intensity of destination distribution could reduce BMI levels by encouraging higher levels of physical activity. PMID:26883235
Raz, Gal; Svanera, Michele; Singer, Neomi; Gilam, Gadi; Cohen, Maya Bleich; Lin, Tamar; Admon, Roee; Gonen, Tal; Thaler, Avner; Granot, Roni Y; Goebel, Rainer; Benini, Sergio; Valente, Giancarlo
2017-12-01
Major methodological advancements have been recently made in the field of neural decoding, which is concerned with the reconstruction of mental content from neuroimaging measures. However, in the absence of a large-scale examination of the validity of the decoding models across subjects and content, the extent to which these models can be generalized is not clear. This study addresses the challenge of producing generalizable decoding models, which allow the reconstruction of perceived audiovisual features from human magnetic resonance imaging (fMRI) data without prior training of the algorithm on the decoded content. We applied an adapted version of kernel ridge regression combined with temporal optimization on data acquired during film viewing (234 runs) to generate standardized brain models for sound loudness, speech presence, perceived motion, face-to-frame ratio, lightness, and color brightness. The prediction accuracies were tested on data collected from different subjects watching other movies mainly in another scanner. Substantial and significant (Q FDR <0.05) correlations between the reconstructed and the original descriptors were found for the first three features (loudness, speech, and motion) in all of the 9 test movies (R¯=0.62, R¯ = 0.60, R¯ = 0.60, respectively) with high reproducibility of the predictors across subjects. The face ratio model produced significant correlations in 7 out of 8 movies (R¯=0.56). The lightness and brightness models did not show robustness (R¯=0.23, R¯ = 0). Further analysis of additional data (95 runs) indicated that loudness reconstruction veridicality can consistently reveal relevant group differences in musical experience. The findings point to the validity and generalizability of our loudness, speech, motion, and face ratio models for complex cinematic stimuli (as well as for music in the case of loudness). While future research should further validate these models using controlled stimuli and explore the feasibility of extracting more complex models via this method, the reliability of our results indicates the potential usefulness of the approach and the resulting models in basic scientific and diagnostic contexts. Copyright © 2017 Elsevier Inc. All rights reserved.
A New Technique for Personality Scale Construction. Preliminary Findings.
ERIC Educational Resources Information Center
Schaffner, Paul E.; Darlington, Richard B.
Most methods of personality scale construction have clear statistical disadvantages. A hybrid method (Darlington and Bishop, 1966) was found to increase scale validity more than any other method, with large item pools. A simple modification of the Darlington-Bishop method (algebraically and conceptually similar to ridge regression, but…
A Distributed Learning Method for ℓ1-Regularized Kernel Machine over Wireless Sensor Networks
Ji, Xinrong; Hou, Cuiqin; Hou, Yibin; Gao, Fang; Wang, Shulong
2016-01-01
In wireless sensor networks, centralized learning methods have very high communication costs and energy consumption. These are caused by the need to transmit scattered training examples from various sensor nodes to the central fusion center where a classifier or a regression machine is trained. To reduce the communication cost, a distributed learning method for a kernel machine that incorporates ℓ1 norm regularization (ℓ1-regularized) is investigated, and a novel distributed learning algorithm for the ℓ1-regularized kernel minimum mean squared error (KMSE) machine is proposed. The proposed algorithm relies on in-network processing and a collaboration that transmits the sparse model only between single-hop neighboring nodes. This paper evaluates the proposed algorithm with respect to the prediction accuracy, the sparse rate of model, the communication cost and the number of iterations on synthetic and real datasets. The simulation results show that the proposed algorithm can obtain approximately the same prediction accuracy as that obtained by the batch learning method. Moreover, it is significantly superior in terms of the sparse rate of model and communication cost, and it can converge with fewer iterations. Finally, an experiment conducted on a wireless sensor network (WSN) test platform further shows the advantages of the proposed algorithm with respect to communication cost. PMID:27376298
Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2015-01-01
Blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR study. For a successful SVM model, the kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been proven that they could affect each other. We designed and implemented genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to the BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, to optimize both SVM parameters and feature subset simultaneously with genetic algorithm is a better approach than other methods that treat the two problems separately. Analysis of our log BB model suggests that carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important role in BBB penetration. Among those properties relevant to BBB penetration, lipophilicity could enhance the BBB penetration while all the others are negatively correlated with BBB penetration. PMID:26504797
NASA Astrophysics Data System (ADS)
Aghayari, M.; Pahlavani, P.; Bigdeli, B.
2017-09-01
Based on world health organization (WHO) report, driving incidents are counted as one of the eight initial reasons for death in the world. The purpose of this paper is to develop a method for regression on effective parameters of highway crashes. In the traditional methods, it was assumed that the data are completely independent and environment is homogenous while the crashes are spatial events which are occurring in geographic space and crashes have spatial data. Spatial data have spatial features such as spatial autocorrelation and spatial non-stationarity in a way working with them is going to be a bit difficult. The proposed method has implemented on a set of records of fatal crashes that have been occurred in highways connecting eight east states of US. This data have been recorded between the years 2007 and 2009. In this study, we have used GWR method with two Gaussian and Tricube kernels. The Number of casualties has been considered as dependent variable and number of persons in crash, road alignment, number of lanes, pavement type, surface condition, road fence, light condition, vehicle type, weather, drunk driver, speed limitation, harmful event, road profile, and junction type have been considered as explanatory variables according to previous studies in using GWR method. We have compered the results of implementation with OLS method. Results showed that R2 for OLS method is 0.0654 and for the proposed method is 0.9196 that implies the proposed GWR is better method for regression in rural highway crashes.
SNPs selection using support vector regression and genetic algorithms in GWAS
2014-01-01
Introduction This paper proposes a new methodology to simultaneously select the most relevant SNPs markers for the characterization of any measurable phenotype described by a continuous variable using Support Vector Regression with Pearson Universal kernel as fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute towards considering several markers simultaneously to explain the phenotype and is based jointly on statistical tools, machine learning and computational intelligence. Results The suggested method has shown potential in the simulated database 1, with additive effects only, and real database. In this simulated database, with a total of 1,000 markers, and 7 with major effect on the phenotype and the other 993 SNPs representing the noise, the method identified 21 markers. Of this total, 5 are relevant SNPs between the 7 but 16 are false positives. In real database, initially with 50,752 SNPs, we have reduced to 3,073 markers, increasing the accuracy of the model. In the simulated database 2, with additive effects and interactions (epistasis), the proposed method matched to the methodology most commonly used in GWAS. Conclusions The method suggested in this paper demonstrates the effectiveness in explaining the real phenotype (PTA for milk), because with the application of the wrapper based on genetic algorithm and Support Vector Regression with Pearson Universal, many redundant markers were eliminated, increasing the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. PMID:25573332
Weighted functional linear regression models for gene-based association analysis.
Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I
2018-01-01
Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10-6), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
Cooley, Richard L.
1983-01-01
This paper investigates factors influencing the degree of improvement in estimates of parameters of a nonlinear regression groundwater flow model by incorporating prior information of unknown reliability. Consideration of expected behavior of the regression solutions and results of a hypothetical modeling problem lead to several general conclusions. First, if the parameters are properly scaled, linearized expressions for the mean square error (MSE) in parameter estimates of a nonlinear model will often behave very nearly as if the model were linear. Second, by using prior information, the MSE in properly scaled parameters can be reduced greatly over the MSE of ordinary least squares estimates of parameters. Third, plots of estimated MSE and the estimated standard deviation of MSE versus an auxiliary parameter (the ridge parameter) specifying the degree of influence of the prior information on regression results can help determine the potential for improvement of parameter estimates. Fourth, proposed criteria can be used to make appropriate choices for the ridge parameter and another parameter expressing degree of overall bias in the prior information. Results of a case study of Truckee Meadows, Reno-Sparks area, Washoe County, Nevada, conform closely to the results of the hypothetical problem. In the Truckee Meadows case, incorporation of prior information did not greatly change the parameter estimates from those obtained by ordinary least squares. However, the analysis showed that both sets of estimates are more reliable than suggested by the standard errors from ordinary least squares.
Multi-model ensemble combinations of the water budget in the East/Japan Sea
NASA Astrophysics Data System (ADS)
HAN, S.; Hirose, N.; Usui, N.; Miyazawa, Y.
2016-02-01
The water balance of East/Japan Sea is determined mainly by inflow and outflow through the Korea/Tsushima, Tsugaru and Soya/La Perouse Straits. However, the volume transports measured at three straits remain quantitatively unbalanced. This study examined the seasonal variation of the volume transport using the multiple linear regression and ridge regression of multi-model ensemble (MME) methods to estimate physically consistent circulation in East/Japan Sea by using four different data assimilation models. The MME outperformed all of the single models by reducing uncertainties, especially the multicollinearity problem with the ridge regression. However, the regression constants turned out to be inconsistent with each other if the MME was applied separately for each strait. The MME for a connected system was thus performed to find common constants for these straits. The estimation of this MME was found to be similar to the MME result of sea level difference (SLD). The estimated mean transport (2.42 Sv) was smaller than the measurement data at the Korea/Tsushima Strait, but the calibrated transport of the Tsugaru Strait (1.63 Sv) was larger than the observed data. The MME results of transport and SLD also suggested that the standard deviation (STD) of the Korea/Tsushima Strait is larger than the STD of the observation, whereas the estimated results were almost identical to that observed for the Tsugaru and Soya/La Perouse Straits. The similarity between MME results enhances the reliability of the present MME estimation.
Multi-model ensemble estimation of volume transport through the straits of the East/Japan Sea
NASA Astrophysics Data System (ADS)
Han, Sooyeon; Hirose, Naoki; Usui, Norihisa; Miyazawa, Yasumasa
2016-01-01
The volume transports measured at the Korea/Tsushima, Tsugaru, and Soya/La Perouse Straits remain quantitatively inconsistent. However, data assimilation models at least provide a self-consistent budget despite subtle differences among the models. This study examined the seasonal variation of the volume transport using the multiple linear regression and ridge regression of multi-model ensemble (MME) methods to estimate more accurately transport at these straits by using four different data assimilation models. The MME outperformed all of the single models by reducing uncertainties, especially the multicollinearity problem with the ridge regression. However, the regression constants turned out to be inconsistent with each other if the MME was applied separately for each strait. The MME for a connected system was thus performed to find common constants for these straits. The estimation of this MME was found to be similar to the MME result of sea level difference (SLD). The estimated mean transport (2.43 Sv) was smaller than the measurement data at the Korea/Tsushima Strait, but the calibrated transport of the Tsugaru Strait (1.63 Sv) was larger than the observed data. The MME results of transport and SLD also suggested that the standard deviation (STD) of the Korea/Tsushima Strait is larger than the STD of the observation, whereas the estimated results were almost identical to that observed for the Tsugaru and Soya/La Perouse Straits. The similarity between MME results enhances the reliability of the present MME estimation.
[Study on objectively evaluating skin aging according to areas of skin texture].
Shan, Gaixin; Gan, Ping; He, Ling; Sun, Lu; Li, Qiannan; Jiang, Zheng; He, Xiangqian
2015-02-01
Skin aging principles play important roles in skin disease diagnosis, the evaluation of skin cosmetic effect, forensic identification and age identification in sports competition, etc. This paper proposes a new method to evaluate the skin aging objectively and quantitatively by skin texture area. Firstly, the enlarged skin image was acquired. Then, the skin texture image was segmented by using the iterative threshold method, and the skin ridge image was extracted according to the watershed algorithm. Finally, the skin ridge areas of the skin texture were extracted. The experiment data showed that the average areas of skin ridges, of both men and women, had a good correlation with age (the correlation coefficient r of male was 0.938, and the correlation coefficient r of female was 0.922), and skin texture area and age regression curve showed that the skin texture area increased with age. Therefore, it is effective to evaluate skin aging objectively by the new method presented in this paper.
L2-norm multiple kernel learning and its application to biomedical data fusion
2010-01-01
Background This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, which is different from the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have more advantages over sparse integration method for thoroughly combining complementary information in heterogeneous data sources. Results We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem with the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view on MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for large scale data sets processing. Conclusions This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid a "winner-takes-all" effect seen in L∞ MKL, which can be detrimental to the performance in prospective studies. The notion of optimizing L2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has comparable performance as the conventional SVM MKL algorithms. Moreover, large scale numerical experiments indicate that when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL. Availability The MATLAB code of algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html. PMID:20529363
Cooley, Richard L.
1982-01-01
Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: (1) prior information having known reliability (that is, bias and random error structure) and (2) prior information consisting of best available estimates of unknown reliability. A regression method that incorporates the second scale of prior information assumes the prior information to be fixed for any particular analysis to produce improved, although biased, parameter estimates. Approximate optimization of two auxiliary parameters of the formulation is used to help minimize the bias, which is almost always much smaller than that resulting from standard ridge regression. It is shown that if both scales of prior information are available, then a combined regression analysis may be made.
Geodesic regression for image time-series.
Niethammer, Marc; Huang, Yang; Vialard, François-Xavier
2011-01-01
Registration of image-time series has so far been accomplished (i) by concatenating registrations between image pairs, (ii) by solving a joint estimation problem resulting in piecewise geodesic paths between image pairs, (iii) by kernel based local averaging or (iv) by augmenting the joint estimation with additional temporal irregularity penalties. Here, we propose a generative model extending least squares linear regression to the space of images by using a second-order dynamic formulation for image registration. Unlike previous approaches, the formulation allows for a compact representation of an approximation to the full spatio-temporal trajectory through its initial values. The method also opens up possibilities to design image-based approximation algorithms. The resulting optimization problem is solved using an adjoint method.
Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.
ERIC Educational Resources Information Center
Bulcock, J. W.
The use of the two-stage least squares (2 SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2 SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance reduced procedure…
Local Prediction Models on Mid-Atlantic Ridge MORB by Principal Component Regression
NASA Astrophysics Data System (ADS)
Ling, X.; Snow, J. E.; Chin, W.
2017-12-01
The isotopic compositions of the daughter isotopes of long-lived radioactive systems (Sr, Nd, Hf and Pb ) can be used to map the scale and history of mantle heterogeneities beneath mid-ocean ridges. Our goal is to relate the multidimensional structure in the existing isotopic dataset with an underlying physical reality of mantle sources. The numerical technique of Principal Component Analysis is useful to reduce the linear dependence of the data to a minimum set of orthogonal eigenvectors encapsulating the information contained (cf Agranier et al 2005). The dataset used for this study covers almost all the MORBs along mid-Atlantic Ridge (MAR), from 54oS to 77oN and 8.8oW to -46.7oW, including replicating the dataset of Agranier et al., 2005 published plus 53 basalt samples dredged and analyzed since then (data from PetDB). The principal components PC1 and PC2 account for 61.56% and 29.21%, respectively, of the total isotope ratios variability. The samples with similar compositions to HIMU and EM and DM are identified to better understand the PCs. PC1 and PC2 are accountable for HIMU and EM whereas PC2 has limited control over the DM source. PC3 is more strongly controlled by the depleted mantle source than PC2. What this means is that all three principal components have a high degree of significance relevant to the established mantle sources. We also tested the relationship between mantle heterogeneity and sample locality. K-means clustering algorithm is a type of unsupervised learning to find groups in the data based on feature similarity. The PC factor scores of each sample are clustered into three groups. Cluster one and three are alternating on the north and south MAR. Cluster two exhibits on 45.18oN to 0.79oN and -27.9oW to -30.40oW alternating with cluster one. The ridge has been preliminarily divided into 16 sections considering both the clusters and ridge segments. The principal component regression models the section based on 6 isotope ratios and PCs. The prediction residual is about 1-2km. It means that the combined 5 isotopes are a strong predictor of geographic location along the ridge, a slightly surprising result. PCR is a robust and powerful method for both visualizing and manipulating the multidimensional representation of isotope data.
Hanft, J M; Jones, R J
1986-06-01
Kernels cultured in vitro were induced to abort by high temperature (35 degrees C) and by culturing six kernels/cob piece. Aborting kernels failed to enter a linear phase of dry mass accumulation and had a final mass that was less than 6% of nonaborting field-grown kernels. Kernels induced to abort by high temperature failed to synthesize starch in the endosperm and had elevated sucrose concentrations and low fructose and glucose concentrations in the pedicel during early growth compared to nonaborting kernels. Kernels induced to abort by high temperature also had much lower pedicel soluble acid invertase activities than did nonaborting kernels. These results suggest that high temperature during the lag phase of kernel growth may impair the process of sucrose unloading in the pedicel by indirectly inhibiting soluble acid invertase activity and prevent starch synthesis in the endosperm. Kernels induced to abort by culturing six kernels/cob piece had reduced pedicel fructose, glucose, and sucrose concentrations compared to kernels from field-grown ears. These aborting kernels also had a lower pedicel soluble acid invertase activity compared to nonaborting kernels from the same cob piece and from field-grown ears. The low invertase activity in pedicel tissue of the aborting kernels was probably caused by a lack of substrate (sucrose) for the invertase to cleave due to the intense competition for available assimilates. In contrast to kernels cultured at 35 degrees C, aborting kernels from cob pieces containing all six kernels accumulated starch in a linear fashion. These results indicate that kernels cultured six/cob piece abort because of an inadequate supply of sugar and are similar to apical kernels from field-grown ears that often abort prior to the onset of linear growth.
Mutual information estimation for irregularly sampled time series
NASA Astrophysics Data System (ADS)
Rehfeld, K.; Marwan, N.; Heitzig, J.; Kurths, J.
2012-04-01
For the automated, objective and joint analysis of time series, similarity measures are crucial. Used in the analysis of climate records, they allow for a complimentary, unbiased view onto sparse datasets. The irregular sampling of many of these time series, however, makes it necessary to either perform signal reconstruction (e.g. interpolation) or to develop and use adapted measures. Standard linear interpolation comes with an inevitable loss of information and bias effects. We have recently developed a Gaussian kernel-based correlation algorithm with which the interpolation error can be substantially lowered, but this would not work should the functional relationship in a bivariate setting be non-linear. We therefore propose an algorithm to estimate lagged auto and cross mutual information from irregularly sampled time series. We have extended the standard and adaptive binning histogram estimators and use Gaussian distributed weights in the estimation of the (joint) probabilities. To test our method we have simulated linear and nonlinear auto-regressive processes with Gamma-distributed inter-sampling intervals. We have then performed a sensitivity analysis for the estimation of actual coupling length, the lag of coupling and the decorrelation time in the synthetic time series and contrast our results to the performance of a signal reconstruction scheme. Finally we applied our estimator to speleothem records. We compare the estimated memory (or decorrelation time) to that from a least-squares estimator based on fitting an auto-regressive process of order 1. The calculated (cross) mutual information results are compared for the different estimators (standard or adaptive binning) and contrasted with results from signal reconstruction. We find that the kernel-based estimator has a significantly lower root mean square error and less systematic sampling bias than the interpolation-based method. It is possible that these encouraging results could be further improved by using non-histogram mutual information estimators, like k-Nearest Neighbor or Kernel-Density estimators, but for short (<1000 points) and irregularly sampled datasets the proposed algorithm is already a great improvement.
2014-01-01
Background Support vector regression (SVR) and Gaussian process regression (GPR) were used for the analysis of electroanalytical experimental data to estimate diffusion coefficients. Results For simulated cyclic voltammograms based on the EC, Eqr, and EqrC mechanisms these regression algorithms in combination with nonlinear kernel/covariance functions yielded diffusion coefficients with higher accuracy as compared to the standard approach of calculating diffusion coefficients relying on the Nicholson-Shain equation. The level of accuracy achieved by SVR and GPR is virtually independent of the rate constants governing the respective reaction steps. Further, the reduction of high-dimensional voltammetric signals by manual selection of typical voltammetric peak features decreased the performance of both regression algorithms compared to a reduction by downsampling or principal component analysis. After training on simulated data sets, diffusion coefficients were estimated by the regression algorithms for experimental data comprising voltammetric signals for three organometallic complexes. Conclusions Estimated diffusion coefficients closely matched the values determined by the parameter fitting method, but reduced the required computational time considerably for one of the reaction mechanisms. The automated processing of voltammograms according to the regression algorithms yields better results than the conventional analysis of peak-related data. PMID:24987463
Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)
NASA Astrophysics Data System (ADS)
Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul
2018-05-01
The Geographically Weighted Regression (GWR) model has been widely applied to many practical fields for exploring spatial heterogenity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One of solution to handle the outliers in the regression model is to use the robust models. So this model was called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy making process related to air pollution mitigation by developing a standard index model for air polluter (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level, which are the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significance differences between Regression and RGWR in this case, but Basic GWR using the Gaussian kernel is the best model to modeling APSI because it has smallest AIC.
7 CFR 810.602 - Definition of other terms.
Code of Federal Regulations, 2010 CFR
2010-01-01
...) Damaged kernels. Kernels and pieces of flaxseed kernels that are badly ground-damaged, badly weather... instructions. Also, underdeveloped, shriveled, and small pieces of flaxseed kernels removed in properly... recleaning. (c) Heat-damaged kernels. Kernels and pieces of flaxseed kernels that are materially discolored...
Hanft, Jonathan M.; Jones, Robert J.
1986-01-01
Kernels cultured in vitro were induced to abort by high temperature (35°C) and by culturing six kernels/cob piece. Aborting kernels failed to enter a linear phase of dry mass accumulation and had a final mass that was less than 6% of nonaborting field-grown kernels. Kernels induced to abort by high temperature failed to synthesize starch in the endosperm and had elevated sucrose concentrations and low fructose and glucose concentrations in the pedicel during early growth compared to nonaborting kernels. Kernels induced to abort by high temperature also had much lower pedicel soluble acid invertase activities than did nonaborting kernels. These results suggest that high temperature during the lag phase of kernel growth may impair the process of sucrose unloading in the pedicel by indirectly inhibiting soluble acid invertase activity and prevent starch synthesis in the endosperm. Kernels induced to abort by culturing six kernels/cob piece had reduced pedicel fructose, glucose, and sucrose concentrations compared to kernels from field-grown ears. These aborting kernels also had a lower pedicel soluble acid invertase activity compared to nonaborting kernels from the same cob piece and from field-grown ears. The low invertase activity in pedicel tissue of the aborting kernels was probably caused by a lack of substrate (sucrose) for the invertase to cleave due to the intense competition for available assimilates. In contrast to kernels cultured at 35°C, aborting kernels from cob pieces containing all six kernels accumulated starch in a linear fashion. These results indicate that kernels cultured six/cob piece abort because of an inadequate supply of sugar and are similar to apical kernels from field-grown ears that often abort prior to the onset of linear growth. PMID:16664846
7 CFR 810.1202 - Definition of other terms.
Code of Federal Regulations, 2010 CFR
2010-01-01
... kernels. Kernels, pieces of rye kernels, and other grains that are badly ground-damaged, badly weather.... Also, underdeveloped, shriveled, and small pieces of rye kernels removed in properly separating the...-damaged kernels. Kernels, pieces of rye kernels, and other grains that are materially discolored and...
Chen, Jiafa; Zhang, Luyan; Liu, Songtao; Li, Zhimin; Huang, Rongrong; Li, Yongming; Cheng, Hongliang; Li, Xiantang; Zhou, Bo; Wu, Suowei; Chen, Wei; Wu, Jianyu; Ding, Junqiang
2016-01-01
Kernel size is an important component of grain yield in maize breeding programs. To extend the understanding on the genetic basis of kernel size traits (i.e., kernel length, kernel width and kernel thickness), we developed a set of four-way cross mapping population derived from four maize inbred lines with varied kernel sizes. In the present study, we investigated the genetic basis of natural variation in seed size and other components of maize yield (e.g., hundred kernel weight, number of rows per ear, number of kernels per row). In total, ten QTL affecting kernel size were identified, three of which (two for kernel length and one for kernel width) had stable expression in other components of maize yield. The possible genetic mechanism behind the trade-off of kernel size and yield components was discussed.
Liu, Songtao; Li, Zhimin; Huang, Rongrong; Li, Yongming; Cheng, Hongliang; Li, Xiantang; Zhou, Bo; Wu, Suowei; Chen, Wei; Wu, Jianyu; Ding, Junqiang
2016-01-01
Kernel size is an important component of grain yield in maize breeding programs. To extend the understanding on the genetic basis of kernel size traits (i.e., kernel length, kernel width and kernel thickness), we developed a set of four-way cross mapping population derived from four maize inbred lines with varied kernel sizes. In the present study, we investigated the genetic basis of natural variation in seed size and other components of maize yield (e.g., hundred kernel weight, number of rows per ear, number of kernels per row). In total, ten QTL affecting kernel size were identified, three of which (two for kernel length and one for kernel width) had stable expression in other components of maize yield. The possible genetic mechanism behind the trade-off of kernel size and yield components was discussed. PMID:27070143
Real estate value prediction using multivariate regression models
NASA Astrophysics Data System (ADS)
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.
7 CFR 810.802 - Definition of other terms.
Code of Federal Regulations, 2010 CFR
2010-01-01
...) Damaged kernels. Kernels and pieces of grain kernels for which standards have been established under the.... (d) Heat-damaged kernels. Kernels and pieces of grain kernels for which standards have been...
Regionalization of precipitation characteristics in Montana using L-moments
Parrett, C.
1998-01-01
Dimensionless precipitation-frequency curves for estimating precipitation depths having small exceedance probabilities were developed for 2-, 6-, and 24-hour storm durations for three homogeneous regions in Montana. L-moment statistics were used to help define the homogeneous regions. The generalized extreme value distribution was used to construct the frequency curves for each duration within each region. The effective record length for each duration in each region was estimated using a graphical method and was found to range from 500 years for 6-hour duration data in Region 2 to 5,100 years for 24-hour duration data in Region 3. The temporal characteristics of storms were analyzed, and methods for estimating synthetic storm hyetographs were developed. Dimensionless depth-duration data were grouped by independent duration (2,6, and 24 hours) and by region, and the beta distribution was fit to dimensionless depth data for various incremental time intervals. Ordinary least-squares regression was used to develop relations between dimensionless depths for a key, short duration - termed the kernel duration - and dimensionless depths for other durations. The regression relations were used, together with the probabilistic dimensionless depth data for the kernel duration, to calculate dimensionless depth-duration curves for exceedance probabilities from .1 to .9. Dimensionless storm hyetographs for each independent duration in each region were constructed for median value conditions based on an exceedance probability of .5.
7 CFR 981.408 - Inedible kernel.
Code of Federal Regulations, 2014 CFR
2014-01-01
... kernel is modified to mean a kernel, piece, or particle of almond kernel with any defect scored as... purposes of determining inedible kernels, pieces, or particles of almond kernels. [59 FR 39419, Aug. 3...
7 CFR 981.408 - Inedible kernel.
Code of Federal Regulations, 2011 CFR
2011-01-01
... kernel is modified to mean a kernel, piece, or particle of almond kernel with any defect scored as... purposes of determining inedible kernels, pieces, or particles of almond kernels. [59 FR 39419, Aug. 3...
7 CFR 981.408 - Inedible kernel.
Code of Federal Regulations, 2012 CFR
2012-01-01
... kernel is modified to mean a kernel, piece, or particle of almond kernel with any defect scored as... purposes of determining inedible kernels, pieces, or particles of almond kernels. [59 FR 39419, Aug. 3...
7 CFR 981.408 - Inedible kernel.
Code of Federal Regulations, 2013 CFR
2013-01-01
... kernel is modified to mean a kernel, piece, or particle of almond kernel with any defect scored as... purposes of determining inedible kernels, pieces, or particles of almond kernels. [59 FR 39419, Aug. 3...
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-06-19
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.
Genome-wide regression and prediction with the BGLR statistical package.
Pérez, Paulino; de los Campos, Gustavo
2014-10-01
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
Classification With Truncated Distance Kernel.
Huang, Xiaolin; Suykens, Johan A K; Wang, Shuning; Hornegger, Joachim; Maier, Andreas
2018-05-01
This brief proposes a truncated distance (TL1) kernel, which results in a classifier that is nonlinear in the global region but is linear in each subregion. With this kernel, the subregion structure can be trained using all the training data and local linear classifiers can be established simultaneously. The TL1 kernel has good adaptiveness to nonlinearity and is suitable for problems which require different nonlinearities in different areas. Though the TL1 kernel is not positive semidefinite, some classical kernel learning methods are still applicable which means that the TL1 kernel can be directly used in standard toolboxes by replacing the kernel evaluation. In numerical experiments, the TL1 kernel with a pregiven parameter achieves similar or better performance than the radial basis function kernel with the parameter tuned by cross validation, implying the TL1 kernel a promising nonlinear kernel for classification tasks.
Shimizu, Yu; Yoshimoto, Junichiro; Takamura, Masahiro; Okada, Go; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area. PMID:28700672
Reduced kernel recursive least squares algorithm for aero-engine degradation prediction
NASA Astrophysics Data System (ADS)
Zhou, Haowen; Huang, Jinquan; Lu, Feng
2017-10-01
Kernel adaptive filters (KAFs) generate a linear growing radial basis function (RBF) network with the number of training samples, thereby lacking sparseness. To deal with this drawback, traditional sparsification techniques select a subset of original training data based on a certain criterion to train the network and discard the redundant data directly. Although these methods curb the growth of the network effectively, it should be noted that information conveyed by these redundant samples is omitted, which may lead to accuracy degradation. In this paper, we present a novel online sparsification method which requires much less training time without sacrificing the accuracy performance. Specifically, a reduced kernel recursive least squares (RKRLS) algorithm is developed based on the reduced technique and the linear independency. Unlike conventional methods, our novel methodology employs these redundant data to update the coefficients of the existing network. Due to the effective utilization of the redundant data, the novel algorithm achieves a better accuracy performance, although the network size is significantly reduced. Experiments on time series prediction and online regression demonstrate that RKRLS algorithm requires much less computational consumption and maintains the satisfactory accuracy performance. Finally, we propose an enhanced multi-sensor prognostic model based on RKRLS and Hidden Markov Model (HMM) for remaining useful life (RUL) estimation. A case study in a turbofan degradation dataset is performed to evaluate the performance of the novel prognostic approach.
Kernelized rank learning for personalized drug recommendation.
He, Xiao; Folkman, Lukas; Borgwardt, Karsten
2018-03-08
Large-scale screenings of cancer cell lines with detailed molecular profiles against libraries of pharmacological compounds are currently being performed in order to gain a better understanding of the genetic component of drug response and to enhance our ability to recommend therapies given a patient's molecular profile. These comprehensive screens differ from the clinical setting in which (1) medical records only contain the response of a patient to very few drugs, (2) drugs are recommended by doctors based on their expert judgment, and (3) selecting the most promising therapy is often more important than accurately predicting the sensitivity to all potential drugs. Current regression models for drug sensitivity prediction fail to account for these three properties. We present a machine learning approach, named Kernelized Rank Learning (KRL), that ranks drugs based on their predicted effect per cell line (patient), circumventing the difficult problem of precisely predicting the sensitivity to the given drug. Our approach outperforms several state-of-the-art predictors in drug recommendation, particularly if the training dataset is sparse, and generalizes to patient data. Our work phrases personalized drug recommendation as a new type of machine learning problem with translational potential to the clinic. The Python implementation of KRL and scripts for running our experiments are available at https://github.com/BorgwardtLab/Kernelized-Rank-Learning. xiao.he@bsse.ethz.ch, lukas.folkman@bsse.ethz.ch. Supplementary data are available at Bioinformatics online.
Contour-Driven Atlas-Based Segmentation
Wachinger, Christian; Fritscher, Karl; Sharp, Greg; Golland, Polina
2016-01-01
We propose new methods for automatic segmentation of images based on an atlas of manually labeled scans and contours in the image. First, we introduce a Bayesian framework for creating initial label maps from manually annotated training images. Within this framework, we model various registration- and patch-based segmentation techniques by changing the deformation field prior. Second, we perform contour-driven regression on the created label maps to refine the segmentation. Image contours and image parcellations give rise to non-stationary kernel functions that model the relationship between image locations. Setting the kernel to the covariance function in a Gaussian process establishes a distribution over label maps supported by image structures. Maximum a posteriori estimation of the distribution over label maps conditioned on the outcome of the atlas-based segmentation yields the refined segmentation. We evaluate the segmentation in two clinical applications: the segmentation of parotid glands in head and neck CT scans and the segmentation of the left atrium in cardiac MR angiography images. PMID:26068202
Wang, Yonghua; Li, Yan; Wang, Bin
2007-01-01
Nicotine and a variety of other drugs and toxins are metabolized by cytochrome P450 (CYP) 2A6. The aim of the present study was to build a quantitative structure-activity relationship (QSAR) model to predict the activities of nicotine analogues on CYP2A6. Kernel partial least squares (K-PLS) regression was employed with the electro-topological descriptors to build the computational models. Both the internal and external predictabilities of the models were evaluated with test sets to ensure their validity and reliability. As a comparison to K-PLS, a standard PLS algorithm was also applied on the same training and test sets. Our results show that the K-PLS produced reasonable results that outperformed the PLS model on the datasets. The obtained K-PLS model will be helpful for the design of novel nicotine-like selective CYP2A6 inhibitors.
Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong
2017-01-01
A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification. PMID:28629202
Gabor-based kernel PCA with fractional power polynomial models for face recognition.
Liu, Chengjun
2004-05-01
This paper presents a novel Gabor-based kernel Principal Component Analysis (PCA) method by integrating the Gabor wavelet representation of face images and the kernel PCA method for face recognition. Gabor wavelets first derive desirable facial features characterized by spatial frequency, spatial locality, and orientation selectivity to cope with the variations due to illumination and facial expression changes. The kernel PCA method is then extended to include fractional power polynomial models for enhanced face recognition performance. A fractional power polynomial, however, does not necessarily define a kernel function, as it might not define a positive semidefinite Gram matrix. Note that the sigmoid kernels, one of the three classes of widely used kernel functions (polynomial kernels, Gaussian kernels, and sigmoid kernels), do not actually define a positive semidefinite Gram matrix either. Nevertheless, the sigmoid kernels have been successfully used in practice, such as in building support vector machines. In order to derive real kernel PCA features, we apply only those kernel PCA eigenvectors that are associated with positive eigenvalues. The feasibility of the Gabor-based kernel PCA method with fractional power polynomial models has been successfully tested on both frontal and pose-angled face recognition, using two data sets from the FERET database and the CMU PIE database, respectively. The FERET data set contains 600 frontal face images of 200 subjects, while the PIE data set consists of 680 images across five poses (left and right profiles, left and right half profiles, and frontal view) with two different facial expressions (neutral and smiling) of 68 subjects. The effectiveness of the Gabor-based kernel PCA method with fractional power polynomial models is shown in terms of both absolute performance indices and comparative performance against the PCA method, the kernel PCA method with polynomial kernels, the kernel PCA method with fractional power polynomial models, the Gabor wavelet-based PCA method, and the Gabor wavelet-based kernel PCA method with polynomial kernels.
A multi-label learning based kernel automatic recommendation method for support vector machine.
Zhang, Xueying; Song, Qinbao
2015-01-01
Choosing an appropriate kernel is very important and critical when classifying a new problem with Support Vector Machine. So far, more attention has been paid on constructing new kernels and choosing suitable parameter values for a specific kernel function, but less on kernel selection. Furthermore, most of current kernel selection methods focus on seeking a best kernel with the highest classification accuracy via cross-validation, they are time consuming and ignore the differences among the number of support vectors and the CPU time of SVM with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions performing equally well on the same classification problem. Aiming to automatically select those appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge data base with the multi-label classification method. Finally, the appropriate kernel functions are recommended to a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance.
A Multi-Label Learning Based Kernel Automatic Recommendation Method for Support Vector Machine
Zhang, Xueying; Song, Qinbao
2015-01-01
Choosing an appropriate kernel is very important and critical when classifying a new problem with Support Vector Machine. So far, more attention has been paid on constructing new kernels and choosing suitable parameter values for a specific kernel function, but less on kernel selection. Furthermore, most of current kernel selection methods focus on seeking a best kernel with the highest classification accuracy via cross-validation, they are time consuming and ignore the differences among the number of support vectors and the CPU time of SVM with different kernels. Considering the tradeoff between classification success ratio and CPU time, there may be multiple kernel functions performing equally well on the same classification problem. Aiming to automatically select those appropriate kernel functions for a given data set, we propose a multi-label learning based kernel recommendation method built on the data characteristics. For each data set, the meta-knowledge data base is first created by extracting the feature vector of data characteristics and identifying the corresponding applicable kernel set. Then the kernel recommendation model is constructed on the generated meta-knowledge data base with the multi-label classification method. Finally, the appropriate kernel functions are recommended to a new data set by the recommendation model according to the characteristics of the new data set. Extensive experiments over 132 UCI benchmark data sets, with five different types of data set characteristics, eleven typical kernels (Linear, Polynomial, Radial Basis Function, Sigmoidal function, Laplace, Multiquadric, Rational Quadratic, Spherical, Spline, Wave and Circular), and five multi-label classification methods demonstrate that, compared with the existing kernel selection methods and the most widely used RBF kernel function, SVM with the kernel function recommended by our proposed method achieved the highest classification performance. PMID:25893896
NASA Astrophysics Data System (ADS)
Setiyorini, Anis; Suprijadi, Jadi; Handoko, Budhi
2017-03-01
Geographically Weighted Regression (GWR) is a regression model that takes into account the spatial heterogeneity effect. In the application of the GWR, inference on regression coefficients is often of interest, as is estimation and prediction of the response variable. Empirical research and studies have demonstrated that local correlation between explanatory variables can lead to estimated regression coefficients in GWR that are strongly correlated, a condition named multicollinearity. It later results on a large standard error on estimated regression coefficients, and, hence, problematic for inference on relationships between variables. Geographically Weighted Lasso (GWL) is a method which capable to deal with spatial heterogeneity and local multicollinearity in spatial data sets. GWL is a further development of GWR method, which adds a LASSO (Least Absolute Shrinkage and Selection Operator) constraint in parameter estimation. In this study, GWL will be applied by using fixed exponential kernel weights matrix to establish a poverty modeling of Java Island, Indonesia. The results of applying the GWL to poverty datasets show that this method stabilizes regression coefficients in the presence of multicollinearity and produces lower prediction and estimation error of the response variable than GWR does.
Censored quantile regression with recursive partitioning-based weights
Wey, Andrew; Wang, Lan; Rudser, Kyle
2014-01-01
Censored quantile regression provides a useful alternative to the Cox proportional hazards model for analyzing survival data. It directly models the conditional quantile of the survival time and hence is easy to interpret. Moreover, it relaxes the proportionality constraint on the hazard function associated with the popular Cox model and is natural for modeling heterogeneity of the data. Recently, Wang and Wang (2009. Locally weighted censored quantile regression. Journal of the American Statistical Association 103, 1117–1128) proposed a locally weighted censored quantile regression approach that allows for covariate-dependent censoring and is less restrictive than other censored quantile regression methods. However, their kernel smoothing-based weighting scheme requires all covariates to be continuous and encounters practical difficulty with even a moderate number of covariates. We propose a new weighting approach that uses recursive partitioning, e.g. survival trees, that offers greater flexibility in handling covariate-dependent censoring in moderately high dimensions and can incorporate both continuous and discrete covariates. We prove that this new weighting scheme leads to consistent estimation of the quantile regression coefficients and demonstrate its effectiveness via Monte Carlo simulations. We also illustrate the new method using a widely recognized data set from a clinical trial on primary biliary cirrhosis. PMID:23975800
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 8 2010-01-01 2010-01-01 false Edible kernel. 981.7 Section 981.7 Agriculture... Regulating Handling Definitions § 981.7 Edible kernel. Edible kernel means a kernel, piece, or particle of almond kernel that is not inedible. [41 FR 26852, June 30, 1976] ...
Kernel K-Means Sampling for Nyström Approximation.
He, Li; Zhang, Hong
2018-05-01
A fundamental problem in Nyström-based kernel matrix approximation is the sampling method by which training set is built. In this paper, we suggest to use kernel -means sampling, which is shown in our works to minimize the upper bound of a matrix approximation error. We first propose a unified kernel matrix approximation framework, which is able to describe most existing Nyström approximations under many popular kernels, including Gaussian kernel and polynomial kernel. We then show that, the matrix approximation error upper bound, in terms of the Frobenius norm, is equal to the -means error of data points in kernel space plus a constant. Thus, the -means centers of data in kernel space, or the kernel -means centers, are the optimal representative points with respect to the Frobenius norm error upper bound. Experimental results, with both Gaussian kernel and polynomial kernel, on real-world data sets and image segmentation tasks show the superiority of the proposed method over the state-of-the-art methods.
KMgene: a unified R package for gene-based association analysis for complex traits.
Yan, Qi; Fang, Zhou; Chen, Wei; Stegle, Oliver
2018-02-09
In this report, we introduce an R package KMgene for performing gene-based association tests for familial, multivariate or longitudinal traits using kernel machine (KM) regression under a generalized linear mixed model (GLMM) framework. Extensive simulations were performed to evaluate the validity of the approaches implemented in KMgene. http://cran.r-project.org/web/packages/KMgene. qi.yan@chp.edu or wei.chen@chp.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.
fRMSDPred: Predicting Local RMSD Between Structural Fragments Using Sequence Information
2007-04-04
machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel
A Regression Design Approach to Optimal and Robust Spacing Selection.
1981-07-01
Hassanein (1968, 1969a, 1969b, 1971, 1972, 1977), Kulldorf (1963), Kulldorf and Vannman (1973), Rhodin (1976), Sarhan and Greenberg (1958, 1962) and...of d0 and Q0 1 d 0 "Q0 ’ are in the reproducing kernel Hilbert space (RKHS) generated by R, the techniques developed by Parzen (1961a, 1961b) may be... Greenberg , B.G. (1958). Estimation problems in the exponential distribution using order statistics. Proceedings of the Statistical Techniques in Missile
Predicting Error Bars for QSAR Models
NASA Astrophysics Data System (ADS)
Schroeter, Timon; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-09-01
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the last months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and ensemble and distance based techniques for the other modelling approaches.
Exploiting graph kernels for high performance biomedical relation extraction.
Panyam, Nagesh C; Verspoor, Karin; Cohn, Trevor; Ramamohanarao, Kotagiri
2018-01-30
Relation extraction from biomedical publications is an important task in the area of semantic mining of text. Kernel methods for supervised relation extraction are often preferred over manual feature engineering methods, when classifying highly ordered structures such as trees and graphs obtained from syntactic parsing of a sentence. Tree kernels such as the Subset Tree Kernel and Partial Tree Kernel have been shown to be effective for classifying constituency parse trees and basic dependency parse graphs of a sentence. Graph kernels such as the All Path Graph kernel (APG) and Approximate Subgraph Matching (ASM) kernel have been shown to be suitable for classifying general graphs with cycles, such as the enhanced dependency parse graph of a sentence. In this work, we present a high performance Chemical-Induced Disease (CID) relation extraction system. We present a comparative study of kernel methods for the CID task and also extend our study to the Protein-Protein Interaction (PPI) extraction task, an important biomedical relation extraction task. We discuss novel modifications to the ASM kernel to boost its performance and a method to apply graph kernels for extracting relations expressed in multiple sentences. Our system for CID relation extraction attains an F-score of 60%, without using external knowledge sources or task specific heuristic or rules. In comparison, the state of the art Chemical-Disease Relation Extraction system achieves an F-score of 56% using an ensemble of multiple machine learning methods, which is then boosted to 61% with a rule based system employing task specific post processing rules. For the CID task, graph kernels outperform tree kernels substantially, and the best performance is obtained with APG kernel that attains an F-score of 60%, followed by the ASM kernel at 57%. The performance difference between the ASM and APG kernels for CID sentence level relation extraction is not significant. In our evaluation of ASM for the PPI task, ASM performed better than APG kernel for the BioInfer dataset, in the Area Under Curve (AUC) measure (74% vs 69%). However, for all the other PPI datasets, namely AIMed, HPRD50, IEPA and LLL, ASM is substantially outperformed by the APG kernel in F-score and AUC measures. We demonstrate a high performance Chemical Induced Disease relation extraction, without employing external knowledge sources or task specific heuristics. Our work shows that graph kernels are effective in extracting relations that are expressed in multiple sentences. We also show that the graph kernels, namely the ASM and APG kernels, substantially outperform the tree kernels. Among the graph kernels, we showed the ASM kernel as effective for biomedical relation extraction, with comparable performance to the APG kernel for datasets such as the CID-sentence level relation extraction and BioInfer in PPI. Overall, the APG kernel is shown to be significantly more accurate than the ASM kernel, achieving better performance on most datasets.
7 CFR 810.2202 - Definition of other terms.
Code of Federal Regulations, 2014 CFR
2014-01-01
... kernels, foreign material, and shrunken and broken kernels. The sum of these three factors may not exceed... the removal of dockage and shrunken and broken kernels. (g) Heat-damaged kernels. Kernels, pieces of... sample after the removal of dockage and shrunken and broken kernels. (h) Other grains. Barley, corn...
7 CFR 981.8 - Inedible kernel.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 8 2010-01-01 2010-01-01 false Inedible kernel. 981.8 Section 981.8 Agriculture... Regulating Handling Definitions § 981.8 Inedible kernel. Inedible kernel means a kernel, piece, or particle of almond kernel with any defect scored as serious damage, or damage due to mold, gum, shrivel, or...
7 CFR 51.1415 - Inedible kernels.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Inedible kernels. 51.1415 Section 51.1415 Agriculture... Standards for Grades of Pecans in the Shell 1 Definitions § 51.1415 Inedible kernels. Inedible kernels means that the kernel or pieces of kernels are rancid, moldy, decayed, injured by insects or otherwise...
Coupling individual kernel-filling processes with source-sink interactions into GREENLAB-Maize.
Ma, Yuntao; Chen, Youjia; Zhu, Jinyu; Meng, Lei; Guo, Yan; Li, Baoguo; Hoogenboom, Gerrit
2018-02-13
Failure to account for the variation of kernel growth in a cereal crop simulation model may cause serious deviations in the estimates of crop yield. The goal of this research was to revise the GREENLAB-Maize model to incorporate source- and sink-limited allocation approaches to simulate the dry matter accumulation of individual kernels of an ear (GREENLAB-Maize-Kernel). The model used potential individual kernel growth rates to characterize the individual potential sink demand. The remobilization of non-structural carbohydrates from reserve organs to kernels was also incorporated. Two years of field experiments were conducted to determine the model parameter values and to evaluate the model using two maize hybrids with different plant densities and pollination treatments. Detailed observations were made on the dimensions and dry weights of individual kernels and other above-ground plant organs throughout the seasons. Three basic traits characterizing an individual kernel were compared on simulated and measured individual kernels: (1) final kernel size; (2) kernel growth rate; and (3) duration of kernel filling. Simulations of individual kernel growth closely corresponded to experimental data. The model was able to reproduce the observed dry weight of plant organs well. Then, the source-sink dynamics and the remobilization of carbohydrates for kernel growth were quantified to show that remobilization processes accompanied source-sink dynamics during the kernel-filling process. We conclude that the model may be used to explore options for optimizing plant kernel yield by matching maize management to the environment, taking into account responses at the level of individual kernels. © The Author(s) 2018. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Unconventional protein sources: apricot seed kernels.
Gabrial, G N; El-Nahry, F I; Awadalla, M Z; Girgis, S M
1981-09-01
Hamawy apricot seed kernels (sweet), Amar apricot seed kernels (bitter) and treated Amar apricot kernels (bitterness removed) were evaluated biochemically. All kernels were found to be high in fat (42.2--50.91%), protein (23.74--25.70%) and fiber (15.08--18.02%). Phosphorus, calcium, and iron were determined in all experimental samples. The three different apricot seed kernels were used for extensive study including the qualitative determination of the amino acid constituents by acid hydrolysis, quantitative determination of some amino acids, and biological evaluation of the kernel proteins in order to use them as new protein sources. Weanling albino rats failed to grow on diets containing the Amar apricot seed kernels due to low food consumption because of its bitterness. There was no loss in weight in that case. The Protein Efficiency Ratio data and blood analysis results showed the Hamawy apricot seed kernels to be higher in biological value than treated apricot seed kernels. The Net Protein Ratio data which accounts for both weight, maintenance and growth showed the treated apricot seed kernels to be higher in biological value than both Hamawy and Amar kernels. The Net Protein Ratio for the last two kernels were nearly equal.
Effective dose rate coefficients for exposure to contaminated soil
Veinot, Kenneth G.; Eckerman, Keith F.; Bellamy, Michael B.; ...
2017-05-10
The Oak Ridge National Laboratory Center for Radiation Protection Knowledge has undertaken calculations related to various environmental exposure scenarios. A previous paper reported the results for submersion in radioactive air and immersion in water using age-specific mathematical phantoms. This paper presents age-specific effective dose rate coefficients derived using stylized mathematical phantoms for exposure to contaminated soils. Dose rate coefficients for photon, electron, and positrons of discrete energies were calculated and folded with emissions of 1252 radionuclides addressed in ICRP Publication 107 to determine equivalent and effective dose rate coefficients. The MCNP6 radiation transport code was used for organ dose ratemore » calculations for photons and the contribution of electrons to skin dose rate was derived using point-kernels. Bremsstrahlung and annihilation photons of positron emission were evaluated as discrete photons. As a result, the coefficients calculated in this work compare favorably to those reported in the US Federal Guidance Report 12 as well as by other authors who employed voxel phantoms for similar exposure scenarios.« less
Effective dose rate coefficients for exposure to contaminated soil
DOE Office of Scientific and Technical Information (OSTI.GOV)
Veinot, Kenneth G.; Eckerman, Keith F.; Bellamy, Michael B.
The Oak Ridge National Laboratory Center for Radiation Protection Knowledge has undertaken calculations related to various environmental exposure scenarios. A previous paper reported the results for submersion in radioactive air and immersion in water using age-specific mathematical phantoms. This paper presents age-specific effective dose rate coefficients derived using stylized mathematical phantoms for exposure to contaminated soils. Dose rate coefficients for photon, electron, and positrons of discrete energies were calculated and folded with emissions of 1252 radionuclides addressed in ICRP Publication 107 to determine equivalent and effective dose rate coefficients. The MCNP6 radiation transport code was used for organ dose ratemore » calculations for photons and the contribution of electrons to skin dose rate was derived using point-kernels. Bremsstrahlung and annihilation photons of positron emission were evaluated as discrete photons. As a result, the coefficients calculated in this work compare favorably to those reported in the US Federal Guidance Report 12 as well as by other authors who employed voxel phantoms for similar exposure scenarios.« less
An introduction to kernel-based learning algorithms.
Müller, K R; Mika, S; Rätsch, G; Tsuda, K; Schölkopf, B
2001-01-01
This paper provides an introduction to support vector machines, kernel Fisher discriminant analysis, and kernel principal component analysis, as examples for successful kernel-based learning methods. We first give a short background about Vapnik-Chervonenkis theory and kernel feature spaces and then proceed to kernel based learning in supervised and unsupervised scenarios including practical and algorithmic considerations. We illustrate the usefulness of kernel algorithms by discussing applications such as optical character recognition and DNA analysis.
7 CFR 981.408 - Inedible kernel.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 8 2010-01-01 2010-01-01 false Inedible kernel. 981.408 Section 981.408 Agriculture... Administrative Rules and Regulations § 981.408 Inedible kernel. Pursuant to § 981.8, the definition of inedible kernel is modified to mean a kernel, piece, or particle of almond kernel with any defect scored as...
Design of CT reconstruction kernel specifically for clinical lung imaging
NASA Astrophysics Data System (ADS)
Cody, Dianna D.; Hsieh, Jiang; Gladish, Gregory W.
2005-04-01
In this study we developed a new reconstruction kernel specifically for chest CT imaging. An experimental flat-panel CT scanner was used on large dogs to produce 'ground-truth" reference chest CT images. These dogs were also examined using a clinical 16-slice CT scanner. We concluded from the dog images acquired on the clinical scanner that the loss of subtle lung structures was due mostly to the presence of the background noise texture when using currently available reconstruction kernels. This qualitative evaluation of the dog CT images prompted the design of a new recon kernel. This new kernel consisted of the combination of a low-pass and a high-pass kernel to produce a new reconstruction kernel, called the 'Hybrid" kernel. The performance of this Hybrid kernel fell between the two kernels on which it was based, as expected. This Hybrid kernel was also applied to a set of 50 patient data sets; the analysis of these clinical images is underway. We are hopeful that this Hybrid kernel will produce clinical images with an acceptable tradeoff of lung detail, reliable HU, and image noise.
Quality changes in macadamia kernel between harvest and farm-gate.
Walton, David A; Wallace, Helen M
2011-02-01
Macadamia integrifolia, Macadamia tetraphylla and their hybrids are cultivated for their edible kernels. After harvest, nuts-in-shell are partially dried on-farm and sorted to eliminate poor-quality kernels before consignment to a processor. During these operations, kernel quality may be lost. In this study, macadamia nuts-in-shell were sampled at five points of an on-farm postharvest handling chain from dehusking to the final storage silo to assess quality loss prior to consignment. Shoulder damage, weight of pieces and unsound kernel were assessed for raw kernels, and colour, mottled colour and surface damage for roasted kernels. Shoulder damage, weight of pieces and unsound kernel for raw kernels increased significantly between the dehusker and the final silo. Roasted kernels displayed a significant increase in dark colour, mottled colour and surface damage during on-farm handling. Significant loss of macadamia kernel quality occurred on a commercial farm during sorting and storage of nuts-in-shell before nuts were consigned to a processor. Nuts-in-shell should be dried as quickly as possible and on-farm handling minimised to maintain optimum kernel quality. 2010 Society of Chemical Industry.
A new discriminative kernel from probabilistic models.
Tsuda, Koji; Kawanabe, Motoaki; Rätsch, Gunnar; Sonnenburg, Sören; Müller, Klaus-Robert
2002-10-01
Recently, Jaakkola and Haussler (1999) proposed a method for constructing kernel functions from probabilistic models. Their so-called Fisher kernel has been combined with discriminative classifiers such as support vector machines and applied successfully in, for example, DNA and protein analysis. Whereas the Fisher kernel is calculated from the marginal log-likelihood, we propose the TOP kernel derived; from tangent vectors of posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments, our new discriminative TOP kernel compares favorably to the Fisher kernel.
Implementing Kernel Methods Incrementally by Incremental Nonlinear Projection Trick.
Kwak, Nojun
2016-05-20
Recently, the nonlinear projection trick (NPT) was introduced enabling direct computation of coordinates of samples in a reproducing kernel Hilbert space. With NPT, any machine learning algorithm can be extended to a kernel version without relying on the so called kernel trick. However, NPT is inherently difficult to be implemented incrementally because an ever increasing kernel matrix should be treated as additional training samples are introduced. In this paper, an incremental version of the NPT (INPT) is proposed based on the observation that the centerization step in NPT is unnecessary. Because the proposed INPT does not change the coordinates of the old data, the coordinates obtained by INPT can directly be used in any incremental methods to implement a kernel version of the incremental methods. The effectiveness of the INPT is shown by applying it to implement incremental versions of kernel methods such as, kernel singular value decomposition, kernel principal component analysis, and kernel discriminant analysis which are utilized for problems of kernel matrix reconstruction, letter classification, and face image retrieval, respectively.
Increasing accuracy of dispersal kernels in grid-based population models
Slone, D.H.
2011-01-01
Dispersal kernels in grid-based population models specify the proportion, distance and direction of movements within the model landscape. Spatial errors in dispersal kernels can have large compounding effects on model accuracy. Circular Gaussian and Laplacian dispersal kernels at a range of spatial resolutions were investigated, and methods for minimizing errors caused by the discretizing process were explored. Kernels of progressively smaller sizes relative to the landscape grid size were calculated using cell-integration and cell-center methods. These kernels were convolved repeatedly, and the final distribution was compared with a reference analytical solution. For large Gaussian kernels (σ > 10 cells), the total kernel error was <10 &sup-11; compared to analytical results. Using an invasion model that tracked the time a population took to reach a defined goal, the discrete model results were comparable to the analytical reference. With Gaussian kernels that had σ ≤ 0.12 using the cell integration method, or σ ≤ 0.22 using the cell center method, the kernel error was greater than 10%, which resulted in invasion times that were orders of magnitude different than theoretical results. A goal-seeking routine was developed to adjust the kernels to minimize overall error. With this, corrections for small kernels were found that decreased overall kernel error to <10-11 and invasion time error to <5%.
Anthraquinones isolated from the browned Chinese chestnut kernels (Castanea mollissima blume)
NASA Astrophysics Data System (ADS)
Zhang, Y. L.; Qi, J. H.; Qin, L.; Wang, F.; Pang, M. X.
2016-08-01
Anthraquinones (AQS) represent a group of secondary metallic products in plants. AQS are often naturally occurring in plants and microorganisms. In a previous study, we found that AQS were produced by enzymatic browning reaction in Chinese chestnut kernels. To find out whether non-enzymatic browning reaction in the kernels could produce AQS too, AQS were extracted from three groups of chestnut kernels: fresh kernels, non-enzymatic browned kernels, and browned kernels, and the contents of AQS were determined. High performance liquid chromatography (HPLC) and nuclear magnetic resonance (NMR) methods were used to identify two compounds of AQS, rehein(1) and emodin(2). AQS were barely exists in the fresh kernels, while both browned kernel groups sample contained a high amount of AQS. Thus, we comfirmed that AQS could be produced during both enzymatic and non-enzymatic browning process. Rhein and emodin were the main components of AQS in the browned kernels.
Broken rice kernels and the kinetics of rice hydration and texture during cooking.
Saleh, Mohammed; Meullenet, Jean-Francois
2013-05-01
During rice milling and processing, broken kernels are inevitably present, although to date it has been unclear as to how the presence of broken kernels affects rice hydration and cooked rice texture. Therefore, this work intended to study the effect of broken kernels in a rice sample on rice hydration and texture during cooking. Two medium-grain and two long-grain rice cultivars were harvested, dried and milled, and the broken kernels were separated from unbroken kernels. Broken rice kernels were subsequently combined with unbroken rice kernels forming treatments of 0, 40, 150, 350 or 1000 g kg(-1) broken kernels ratio. Rice samples were then cooked and the moisture content of the cooked rice, the moisture uptake rate, and rice hardness and stickiness were measured. As the amount of broken rice kernels increased, rice sample texture became increasingly softer (P < 0.05) but the unbroken kernels became significantly harder. Moisture content and moisture uptake rate were positively correlated, and cooked rice hardness was negatively correlated to the percentage of broken kernels in rice samples. Differences in the proportions of broken rice in a milled rice sample play a major role in determining the texture properties of cooked rice. Variations in the moisture migration kinetics between broken and unbroken kernels caused faster hydration of the cores of broken rice kernels, with greater starch leach-out during cooking affecting the texture of the cooked rice. The texture of cooked rice can be controlled, to some extent, by varying the proportion of broken kernels in milled rice. © 2012 Society of Chemical Industry.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu Qin, E-mail: zhuqin@fudan.edu.cn; Peng Xizhe, E-mail: xzpeng@fudan.edu.cn
This study examines the impacts of population size, population structure, and consumption level on carbon emissions in China from 1978 to 2008. To this end, we expanded the stochastic impacts by regression on population, affluence, and technology model and used the ridge regression method, which overcomes the negative influences of multicollinearity among independent variables under acceptable bias. Results reveal that changes in consumption level and population structure were the major impact factors, not changes in population size. Consumption level and carbon emissions were highly correlated. In terms of population structure, urbanization, population age, and household size had distinct effects onmore » carbon emissions. Urbanization increased carbon emissions, while the effect of age acted primarily through the expansion of the labor force and consequent overall economic growth. Shrinking household size increased residential consumption, resulting in higher carbon emissions. Households, rather than individuals, are a more reasonable explanation for the demographic impact on carbon emissions. Potential social policies for low carbon development are also discussed. - Highlights: Black-Right-Pointing-Pointer We examine the impacts of population change on carbon emissions in China. Black-Right-Pointing-Pointer We expand the STIRPAT model by containing population structure factors in the model. Black-Right-Pointing-Pointer The population structure includes age structure, urbanization level, and household size. Black-Right-Pointing-Pointer The ridge regression method is used to estimate the model with multicollinearity. Black-Right-Pointing-Pointer The population structure plays a more important role compared with the population size.« less
Estimation of Electrically-Evoked Knee Torque from Mechanomyography Using Support Vector Regression.
Ibitoye, Morufu Olusola; Hamzaid, Nur Azah; Abdul Wahab, Ahmad Khairi; Hasnan, Nazirah; Olatunji, Sunday Olusanya; Davis, Glen M
2016-07-19
The difficulty of real-time muscle force or joint torque estimation during neuromuscular electrical stimulation (NMES) in physical therapy and exercise science has motivated recent research interest in torque estimation from other muscle characteristics. This study investigated the accuracy of a computational intelligence technique for estimating NMES-evoked knee extension torque based on the Mechanomyographic signals (MMG) of contracting muscles that were recorded from eight healthy males. Simulation of the knee torque was modelled via Support Vector Regression (SVR) due to its good generalization ability in related fields. Inputs to the proposed model were MMG amplitude characteristics, the level of electrical stimulation or contraction intensity, and knee angle. Gaussian kernel function, as well as its optimal parameters were identified with the best performance measure and were applied as the SVR kernel function to build an effective knee torque estimation model. To train and test the model, the data were partitioned into training (70%) and testing (30%) subsets, respectively. The SVR estimation accuracy, based on the coefficient of determination (R²) between the actual and the estimated torque values was up to 94% and 89% during the training and testing cases, with root mean square errors (RMSE) of 9.48 and 12.95, respectively. The knee torque estimations obtained using SVR modelling agreed well with the experimental data from an isokinetic dynamometer. These findings support the realization of a closed-loop NMES system for functional tasks using MMG as the feedback signal source and an SVR algorithm for joint torque estimation.
NASA Astrophysics Data System (ADS)
Lawal, S. A.; Choudhury, I. A.; Nukman, Y.
2015-01-01
The understanding of cutting fluids performance in turning process is very important in order to improve the efficiency of the process. This efficiency can be determined based on certain process parameters such as flank wear, cutting forces developed, temperature developed at the tool chip interface, surface roughness on the work piece, etc. In this study, the objective is to determine the influence of cutting fluids on flank wear during turning of AISI 4340 with coated carbide inserts. The performances of three types of cutting fluids were compared using Taguchi experimental method. The results show that palm kernel oil based cutting fluids performed better than the other two cutting fluids in reducing flank wear. Mathematical models for cutting parameters such as cutting speed, feed rate, depth of cut and cutting fluids were obtained from regression analysis using MINITAB 14 software to predict flank wear. Experiments were conducted based on the optimized values to validate the regression equations for flank wear and 5.82 % error was obtained. The optimal cutting parameters for the flank wear using S/N ratio were 160 m/min of cutting speed (level 1), 0.18 mm/rev of feed (level 1), 1.75 mm of depth of cut (level 2) and 2.97 mm2/s palm kernel oil based cutting fluid (level 3). ANOVA shows cutting speed of 85.36 %; and feed rate 4.81 %) as significant factors.
Nonlinear Deep Kernel Learning for Image Annotation.
Jiu, Mingyuan; Sahbi, Hichem
2017-02-08
Multiple kernel learning (MKL) is a widely used technique for kernel design. Its principle consists in learning, for a given support vector classifier, the most suitable convex (or sparse) linear combination of standard elementary kernels. However, these combinations are shallow and often powerless to capture the actual similarity between highly semantic data, especially for challenging classification tasks such as image annotation. In this paper, we redefine multiple kernels using deep multi-layer networks. In this new contribution, a deep multiple kernel is recursively defined as a multi-layered combination of nonlinear activation functions, each one involves a combination of several elementary or intermediate kernels, and results into a positive semi-definite deep kernel. We propose four different frameworks in order to learn the weights of these networks: supervised, unsupervised, kernel-based semisupervised and Laplacian-based semi-supervised. When plugged into support vector machines (SVMs), the resulting deep kernel networks show clear gain, compared to several shallow kernels for the task of image annotation. Extensive experiments and analysis on the challenging ImageCLEF photo annotation benchmark, the COREL5k database and the Banana dataset validate the effectiveness of the proposed method.
NASA Astrophysics Data System (ADS)
Haryanto, B.; Bukit, R. Br; Situmeang, E. M.; Christina, E. P.; Pandiangan, F.
2018-02-01
The purpose of this study was to determine the performance, productivity and feasibility of the operation of palm kernel processing plant based on Energy Productivity Ratio (EPR). EPR is expressed as the ratio of output to input energy and by-product. Palm Kernel plan is process in palm kernel to become palm kernel oil. The procedure started from collecting data needed as energy input such as: palm kernel prices, energy demand and depreciation of the factory. The energy output and its by-product comprise the whole production price such as: palm kernel oil price and the remaining products such as shells and pulp price. Calculation the equality of energy of palm kernel oil is to analyze the value of Energy Productivity Ratio (EPR) bases on processing capacity per year. The investigation has been done in Kernel Oil Processing Plant PT-X at Sumatera Utara plantation. The value of EPR was 1.54 (EPR > 1), which indicated that the processing of palm kernel into palm kernel oil is feasible to be operated based on the energy productivity.
Total Ambient Dose Equivalent Buildup Factor Determination for Nbs04 Concrete.
Duckic, Paulina; Hayes, Robert B
2018-06-01
Buildup factors are dimensionless multiplicative factors required by the point kernel method to account for scattered radiation through a shielding material. The accuracy of the point kernel method is strongly affected by the correspondence of analyzed parameters to experimental configurations, which is attempted to be simplified here. The point kernel method has not been found to have widespread practical use for neutron shielding calculations due to the complex neutron transport behavior through shielding materials (i.e. the variety of interaction mechanisms that neutrons may undergo while traversing the shield) as well as non-linear neutron total cross section energy dependence. In this work, total ambient dose buildup factors for NBS04 concrete are calculated in terms of neutron and secondary gamma ray transmission factors. The neutron and secondary gamma ray transmission factors are calculated using MCNP6™ code with updated cross sections. Both transmission factors and buildup factors are given in a tabulated form. Practical use of neutron transmission and buildup factors warrants rigorously calculated results with all associated uncertainties. In this work, sensitivity analysis of neutron transmission factors and total buildup factors with varying water content has been conducted. The analysis showed significant impact of varying water content in concrete on both neutron transmission factors and total buildup factors. Finally, support vector regression, a machine learning technique, has been engaged to make a model based on the calculated data for calculation of the buildup factors. The developed model can predict most of the data with 20% relative error.
Variable importance in nonlinear kernels (VINK): classification of digitized histopathology.
Ginsburg, Shoshana; Ali, Sahirzeeshan; Lee, George; Basavanhally, Ajay; Madabhushi, Anant
2013-01-01
Quantitative histomorphometry is the process of modeling appearance of disease morphology on digitized histopathology images via image-based features (e.g., texture, graphs). Due to the curse of dimensionality, building classifiers with large numbers of features requires feature selection (which may require a large training set) or dimensionality reduction (DR). DR methods map the original high-dimensional features in terms of eigenvectors and eigenvalues, which limits the potential for feature transparency or interpretability. Although methods exist for variable selection and ranking on embeddings obtained via linear DR schemes (e.g., principal components analysis (PCA)), similar methods do not yet exist for nonlinear DR (NLDR) methods. In this work we present a simple yet elegant method for approximating the mapping between the data in the original feature space and the transformed data in the kernel PCA (KPCA) embedding space; this mapping provides the basis for quantification of variable importance in nonlinear kernels (VINK). We show how VINK can be implemented in conjunction with the popular Isomap and Laplacian eigenmap algorithms. VINK is evaluated in the contexts of three different problems in digital pathology: (1) predicting five year PSA failure following radical prostatectomy, (2) predicting Oncotype DX recurrence risk scores for ER+ breast cancers, and (3) distinguishing good and poor outcome p16+ oropharyngeal tumors. We demonstrate that subsets of features identified by VINK provide similar or better classification or regression performance compared to the original high dimensional feature sets.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 8 2010-01-01 2010-01-01 false Kernel weight. 981.9 Section 981.9 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (Marketing Agreements... Regulating Handling Definitions § 981.9 Kernel weight. Kernel weight means the weight of kernels, including...
An SVM model with hybrid kernels for hydrological time series
NASA Astrophysics Data System (ADS)
Wang, C.; Wang, H.; Zhao, X.; Xie, Q.
2017-12-01
Support Vector Machine (SVM) models have been widely applied to the forecast of climate/weather and its impact on other environmental variables such as hydrologic response to climate/weather. When using SVM, the choice of the kernel function plays the key role. Conventional SVM models mostly use one single type of kernel function, e.g., radial basis kernel function. Provided that there are several featured kernel functions available, each having its own advantages and drawbacks, a combination of these kernel functions may give more flexibility and robustness to SVM approach, making it suitable for a wide range of application scenarios. This paper presents such a linear combination of radial basis kernel and polynomial kernel for the forecast of monthly flowrate in two gaging stations using SVM approach. The results indicate significant improvement in the accuracy of predicted series compared to the approach with either individual kernel function, thus demonstrating the feasibility and advantages of such hybrid kernel approach for SVM applications.
Approximate kernel competitive learning.
Wu, Jian-Sheng; Zheng, Wei-Shi; Lai, Jian-Huang
2015-03-01
Kernel competitive learning has been successfully used to achieve robust clustering. However, kernel competitive learning (KCL) is not scalable for large scale data processing, because (1) it has to calculate and store the full kernel matrix that is too large to be calculated and kept in the memory and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large scale dataset. The proposed framework consists of two parts. First, it derives an approximate kernel competitive learning (AKCL), which learns kernel competitive learning in a subspace via sampling. We provide solid theoretical analysis on why the proposed approximation modelling would work for kernel competitive learning, and furthermore, we show that the computational complexity of AKCL is largely reduced. Second, we propose a pseudo-parallelled approximate kernel competitive learning (PAKCL) based on a set-based kernel competitive learning strategy, which overcomes the obstacle of using parallel programming in kernel competitive learning and significantly accelerates the approximate kernel competitive learning for large scale clustering. The empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL can perform comparably as KCL, with a large reduction on computational cost. Also, the proposed methods achieve more effective clustering performance in terms of clustering precision against related approximate clustering approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.
Multiple kernels learning-based biological entity relationship extraction method.
Dongliang, Xu; Jingchang, Pan; Bailing, Wang
2017-09-20
Automatic extracting protein entity interaction information from biomedical literature can help to build protein relation network and design new drugs. There are more than 20 million literature abstracts included in MEDLINE, which is the most authoritative textual database in the field of biomedicine, and follow an exponential growth over time. This frantic expansion of the biomedical literature can often be difficult to absorb or manually analyze. Thus efficient and automated search engines are necessary to efficiently explore the biomedical literature using text mining techniques. The P, R, and F value of tag graph method in Aimed corpus are 50.82, 69.76, and 58.61%, respectively. The P, R, and F value of tag graph kernel method in other four evaluation corpuses are 2-5% higher than that of all-paths graph kernel. And The P, R and F value of feature kernel and tag graph kernel fuse methods is 53.43, 71.62 and 61.30%, respectively. The P, R and F value of feature kernel and tag graph kernel fuse methods is 55.47, 70.29 and 60.37%, respectively. It indicated that the performance of the two kinds of kernel fusion methods is better than that of simple kernel. In comparison with the all-paths graph kernel method, the tag graph kernel method is superior in terms of overall performance. Experiments show that the performance of the multi-kernels method is better than that of the three separate single-kernel method and the dual-mutually fused kernel method used hereof in five corpus sets.
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) inMultiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
De Smet, F; De Brabanter, J; Van den Bosch, T; Pochet, N; Amant, F; Van Holsbeke, C; Moerman, P; De Moor, B; Vergote, I; Timmerman, D
2006-06-01
Preoperative knowledge of the depth of myometrial infiltration is important in patients with endometrial carcinoma. This study aimed at assessing the value of histopathological parameters obtained from an endometrial biopsy (Pipelle de Cornier; results available preoperatively) and ultrasound measurements obtained after transvaginal sonography with color Doppler imaging in the preoperative prediction of the depth of myometrial invasion, as determined by the final histopathological examination of the hysterectomy specimen (the gold standard). We first collected ultrasound and histopathological data from 97 consecutive women with endometrial carcinoma and divided them into two groups according to surgical stage (Stages Ia and Ib vs. Stages Ic and higher). The areas (AUC) under the receiver-operating characteristics curves of the subjective assessment of depth of invasion by an experienced gynecologist and of the individual ultrasound parameters were calculated. Subsequently, we used these variables to train a logistic regression model and least squares support vector machines (LS-SVM) with linear and RBF (radial basis function) kernels. Finally, these models were validated prospectively on data from 76 new patients in order to make a preoperative prediction of the depth of invasion. Of all ultrasound parameters, the ratio of the endometrial and uterine volumes had the largest AUC (78%), while that of the subjective assessment was 79%. The AUCs of the blood flow indices were low (range, 51-64%). Stepwise logistic regression selected the degree of differentiation, the number of fibroids, the endometrial thickness and the volume of the tumor. Compared with the AUC of the subjective assessment (72%), prospective evaluation of the mathematical models resulted in a higher AUC for the LS-SVM model with an RBF kernel (77%), but this difference was not significant. Single morphological parameters do not improve the predictive power when compared with the subjective assessment of depth of myometrial invasion of endometrial cancer, and blood flow indices do not contribute to the prediction of stage. In this study an LS-SVM model with an RBF kernel gave the best prediction; while this might be more reliable than subjective assessment, confirmation by larger prospective studies is required. Copyright 2006 ISUOG. Published by John Wiley & Sons, Ltd.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Half kernel. 51.2295 Section 51.2295 Agriculture... Standards for Shelled English Walnuts (Juglans Regia) Definitions § 51.2295 Half kernel. Half kernel means the separated half of a kernel with not more than one-eighth broken off. ...
7 CFR 810.206 - Grades and grade requirements for barley.
Code of Federal Regulations, 2010 CFR
2010-01-01
... weight per bushel (pounds) Sound barley (percent) Maximum Limits of— Damaged kernels 1 (percent) Heat damaged kernels (percent) Foreign material (percent) Broken kernels (percent) Thin barley (percent) U.S... or otherwise of distinctly low quality. 1 Includes heat-damaged kernels. Injured-by-frost kernels and...
DOE Office of Scientific and Technical Information (OSTI.GOV)
IJ van Rooyen; DE Janney; BD Miller
2014-05-01
Post-irradiation examination of coated particle fuel from the AGR-1 experiment is in progress at Idaho National Laboratory and Oak Ridge National Laboratory. In this paper a brief summary of results from characterization of microstructures in the coating layers of selected irradiated fuel particles with burnup of 11.3% and 19.3% FIMA will be given. The main objectives of the characterization were to study irradiation effects, fuel kernel porosity, layer debonding, layer degradation or corrosion, fission-product precipitation, grain sizes, and transport of fission products from the kernels across the TRISO layers. Characterization techniques such as scanning electron microscopy, transmission electron microscopy, energymore » dispersive spectroscopy, and wavelength dispersive spectroscopy were used. A new approach to microscopic quantification of fission-product precipitates is also briefly demonstrated. Microstructural characterization focused on fission-product precipitates in the SiC-IPyC interface, the SiC layer and the fuel-buffer interlayer. The results provide significant new insights into mechanisms of fission-product transport. Although Pd-rich precipitates were identified at the SiC-IPyC interlayer, no significant SiC-layer thinning was observed for the particles investigated. Characterization of these precipitates highlighted the difficulty of measuring low concentrations of Ag in precipitates with significantly higher concentrations of Pd and U. Different approaches to resolving this problem are discussed. An initial hypothesis is provided to explain fission-product precipitate compositions and locations. No SiC phase transformations were observed and no debonding of the SiC-IPyC interlayer as a result of irradiation was observed for the samples investigated. Lessons learned from the post-irradiation examination are described and future actions are recommended.« less
Mapping Fire Severity Using Imaging Spectroscopy and Kernel Based Image Analysis
NASA Astrophysics Data System (ADS)
Prasad, S.; Cui, M.; Zhang, Y.; Veraverbeke, S.
2014-12-01
Improved spatial representation of within-burn heterogeneity after wildfires is paramount to effective land management decisions and more accurate fire emissions estimates. In this work, we demonstrate feasibility and efficacy of airborne imaging spectroscopy (hyperspectral imagery) for quantifying wildfire burn severity, using kernel based image analysis techniques. Two different airborne hyperspectral datasets, acquired over the 2011 Canyon and 2013 Rim fire in California using the Airborne Visible InfraRed Imaging Spectrometer (AVIRIS) sensor, were used in this study. The Rim Fire, covering parts of the Yosemite National Park started on August 17, 2013, and was the third largest fire in California's history. Canyon Fire occurred in the Tehachapi mountains, and started on September 4, 2011. In addition to post-fire data for both fires, half of the Rim fire was also covered with pre-fire images. Fire severity was measured in the field using Geo Composite Burn Index (GeoCBI). The field data was utilized to train and validate our models, wherein the trained models, in conjunction with imaging spectroscopy data were used for GeoCBI estimation wide geographical regions. This work presents an approach for using remotely sensed imagery combined with GeoCBI field data to map fire scars based on a non-linear (kernel based) epsilon-Support Vector Regression (e-SVR), which was used to learn the relationship between spectra and GeoCBI in a kernel-induced feature space. Classification of healthy vegetation versus fire-affected areas based on morphological multi-attribute profiles was also studied. The availability of pre- and post-fire imaging spectroscopy data over the Rim Fire provided a unique opportunity to evaluate the performance of bi-temporal imaging spectroscopy for assessing post-fire effects. This type of data is currently constrained because of limited airborne acquisitions before a fire, but will become widespread with future spaceborne sensors such as those on the planned NASA HyspIRI mission.
Predicting Error Bars for QSAR Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schroeter, Timon; Technische Universitaet Berlin, Department of Computer Science, Franklinstrasse 28/29, 10587 Berlin; Schwaighofer, Anton
2007-09-18
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D{sub 7} models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the last months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and ensemble and distance based techniquesmore » for the other modelling approaches.« less
Code of Federal Regulations, 2014 CFR
2014-01-01
...) Kernel which is “dark amber” or darker color; (e) Kernel having more than one dark kernel spot, or one dark kernel spot more than one-eighth inch in greatest dimension; (f) Shriveling when the surface of the kernel is very conspicuously wrinkled; (g) Internal flesh discoloration of a medium shade of gray...
Code of Federal Regulations, 2013 CFR
2013-01-01
...) Kernel which is “dark amber” or darker color; (e) Kernel having more than one dark kernel spot, or one dark kernel spot more than one-eighth inch in greatest dimension; (f) Shriveling when the surface of the kernel is very conspicuously wrinkled; (g) Internal flesh discoloration of a medium shade of gray...
7 CFR 51.2125 - Split or broken kernels.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Split or broken kernels. 51.2125 Section 51.2125 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards... kernels. Split or broken kernels means seven-eighths or less of complete whole kernels but which will not...
7 CFR 51.2296 - Three-fourths half kernel.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Three-fourths half kernel. 51.2296 Section 51.2296 Agriculture Regulations of the Department of Agriculture AGRICULTURAL MARKETING SERVICE (Standards...-fourths half kernel. Three-fourths half kernel means a portion of a half of a kernel which has more than...
The Classification of Diabetes Mellitus Using Kernel k-means
NASA Astrophysics Data System (ADS)
Alamsyah, M.; Nafisah, Z.; Prayitno, E.; Afida, A. M.; Imah, E. M.
2018-01-01
Diabetes Mellitus is a metabolic disorder which is characterized by chronicle hypertensive glucose. Automatics detection of diabetes mellitus is still challenging. This study detected diabetes mellitus by using kernel k-Means algorithm. Kernel k-means is an algorithm which was developed from k-means algorithm. Kernel k-means used kernel learning that is able to handle non linear separable data; where it differs with a common k-means. The performance of kernel k-means in detecting diabetes mellitus is also compared with SOM algorithms. The experiment result shows that kernel k-means has good performance and a way much better than SOM.
UNICOS Kernel Internals Application Development
NASA Technical Reports Server (NTRS)
Caredo, Nicholas; Craw, James M. (Technical Monitor)
1995-01-01
Having an understanding of UNICOS Kernel Internals is valuable information. However, having the knowledge is only half the value. The second half comes with knowing how to use this information and apply it to the development of tools. The kernel contains vast amounts of useful information that can be utilized. This paper discusses the intricacies of developing utilities that utilize kernel information. In addition, algorithms, logic, and code will be discussed for accessing kernel information. Code segments will be provided that demonstrate how to locate and read kernel structures. Types of applications that can utilize kernel information will also be discussed.
Detection of maize kernels breakage rate based on K-means clustering
NASA Astrophysics Data System (ADS)
Yang, Liang; Wang, Zhuo; Gao, Lei; Bai, Xiaoping
2017-04-01
In order to optimize the recognition accuracy of maize kernels breakage detection and improve the detection efficiency of maize kernels breakage, this paper using computer vision technology and detecting of the maize kernels breakage based on K-means clustering algorithm. First, the collected RGB images are converted into Lab images, then the original images clarity evaluation are evaluated by the energy function of Sobel 8 gradient. Finally, the detection of maize kernels breakage using different pixel acquisition equipments and different shooting angles. In this paper, the broken maize kernels are identified by the color difference between integrity kernels and broken kernels. The original images clarity evaluation and different shooting angles are taken to verify that the clarity and shooting angles of the images have a direct influence on the feature extraction. The results show that K-means clustering algorithm can distinguish the broken maize kernels effectively.
Modeling adaptive kernels from probabilistic phylogenetic trees.
Nicotra, Luca; Micheli, Alessio
2009-01-01
Modeling phylogenetic interactions is an open issue in many computational biology problems. In the context of gene function prediction we introduce a class of kernels for structured data leveraging on a hierarchical probabilistic modeling of phylogeny among species. We derive three kernels belonging to this setting: a sufficient statistics kernel, a Fisher kernel, and a probability product kernel. The new kernels are used in the context of support vector machine learning. The kernels adaptivity is obtained through the estimation of the parameters of a tree structured model of evolution using as observed data phylogenetic profiles encoding the presence or absence of specific genes in a set of fully sequenced genomes. We report results obtained in the prediction of the functional class of the proteins of the budding yeast Saccharomyces cerevisae which favorably compare to a standard vector based kernel and to a non-adaptive tree kernel function. A further comparative analysis is performed in order to assess the impact of the different components of the proposed approach. We show that the key features of the proposed kernels are the adaptivity to the input domain and the ability to deal with structured data interpreted through a graphical model representation.
Aflatoxin and nutrient contents of peanut collected from local market and their processed foods
NASA Astrophysics Data System (ADS)
Ginting, E.; Rahmianna, A. A.; Yusnawan, E.
2018-01-01
Peanut is succeptable to aflatoxin contamination and the sources of peanut as well as processing methods considerably affect aflatoxin content of the products. Therefore, the study on aflatoxin and nutrient contents of peanut collected from local market and their processed foods were performed. Good kernels of peanut were prepared into fried peanut, pressed-fried peanut, peanut sauce, peanut press cake, fermented peanut press cake (tempe) and fried tempe, while blended kernels (good and poor kernels) were processed into peanut sauce and tempe and poor kernels were only processed into tempe. The results showed that good and blended kernels which had high number of sound/intact kernels (82,46% and 62,09%), contained 9.8-9.9 ppb of aflatoxin B1, while slightly higher level was seen in poor kernels (12.1 ppb). However, the moisture, ash, protein, and fat contents of the kernels were similar as well as the products. Peanut tempe and fried tempe showed the highest increase in protein content, while decreased fat contents were seen in all products. The increase in aflatoxin B1 of peanut tempe prepared from poor kernels > blended kernels > good kernels. However, it averagely decreased by 61.2% after deep-fried. Excluding peanut tempe and fried tempe, aflatoxin B1 levels in all products derived from good kernels were below the permitted level (15 ppb). This suggests that sorting peanut kernels as ingredients and followed by heat processing would decrease the aflatoxin content in the products.
Partial Deconvolution with Inaccurate Blur Kernel.
Ren, Dongwei; Zuo, Wangmeng; Zhang, David; Xu, Jun; Zhang, Lei
2017-10-17
Most non-blind deconvolution methods are developed under the error-free kernel assumption, and are not robust to inaccurate blur kernel. Unfortunately, despite the great progress in blind deconvolution, estimation error remains inevitable during blur kernel estimation. Consequently, severe artifacts such as ringing effects and distortions are likely to be introduced in the non-blind deconvolution stage. In this paper, we tackle this issue by suggesting: (i) a partial map in the Fourier domain for modeling kernel estimation error, and (ii) a partial deconvolution model for robust deblurring with inaccurate blur kernel. The partial map is constructed by detecting the reliable Fourier entries of estimated blur kernel. And partial deconvolution is applied to wavelet-based and learning-based models to suppress the adverse effect of kernel estimation error. Furthermore, an E-M algorithm is developed for estimating the partial map and recovering the latent sharp image alternatively. Experimental results show that our partial deconvolution model is effective in relieving artifacts caused by inaccurate blur kernel, and can achieve favorable deblurring quality on synthetic and real blurry images.Most non-blind deconvolution methods are developed under the error-free kernel assumption, and are not robust to inaccurate blur kernel. Unfortunately, despite the great progress in blind deconvolution, estimation error remains inevitable during blur kernel estimation. Consequently, severe artifacts such as ringing effects and distortions are likely to be introduced in the non-blind deconvolution stage. In this paper, we tackle this issue by suggesting: (i) a partial map in the Fourier domain for modeling kernel estimation error, and (ii) a partial deconvolution model for robust deblurring with inaccurate blur kernel. The partial map is constructed by detecting the reliable Fourier entries of estimated blur kernel. And partial deconvolution is applied to wavelet-based and learning-based models to suppress the adverse effect of kernel estimation error. Furthermore, an E-M algorithm is developed for estimating the partial map and recovering the latent sharp image alternatively. Experimental results show that our partial deconvolution model is effective in relieving artifacts caused by inaccurate blur kernel, and can achieve favorable deblurring quality on synthetic and real blurry images.
Lien, Tonje G; Borgan, Ørnulf; Reppe, Sjur; Gautvik, Kaare; Glad, Ingrid Kristine
2018-03-07
Using high-dimensional penalized regression we studied genome-wide DNA-methylation in bone biopsies of 80 postmenopausal women in relation to their bone mineral density (BMD). The women showed BMD varying from severely osteoporotic to normal. Global gene expression data from the same individuals was available, and since DNA-methylation often affects gene expression, the overall aim of this paper was to include both of these omics data sets into an integrated analysis. The classical penalized regression uses one penalty, but we incorporated individual penalties for each of the DNA-methylation sites. These individual penalties were guided by the strength of association between DNA-methylations and gene transcript levels. DNA-methylations that were highly associated to one or more transcripts got lower penalties and were therefore favored compared to DNA-methylations showing less association to expression. Because of the complex pathways and interactions among genes, we investigated both the association between DNA-methylations and their corresponding cis gene, as well as the association between DNA-methylations and trans-located genes. Two integrating penalized methods were used: first, an adaptive group-regularized ridge regression, and secondly, variable selection was performed through a modified version of the weighted lasso. When information from gene expressions was integrated, predictive performance was considerably improved, in terms of predictive mean square error, compared to classical penalized regression without data integration. We found a 14.7% improvement in the ridge regression case and a 17% improvement for the lasso case. Our version of the weighted lasso with data integration found a list of 22 interesting methylation sites. Several corresponded to genes that are known to be important in bone formation. Using BMD as response and these 22 methylation sites as covariates, least square regression analyses resulted in R 2 =0.726, comparable to an average R 2 =0.438 for 10000 randomly selected groups of DNA-methylations with group size 22. Two recent types of penalized regression methods were adapted to integrate DNA-methylation and their association to gene expression in the analysis of bone mineral density. In both cases predictions clearly benefit from including the additional information on gene expressions.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
Seismic imaging beneath southwest Africa based on finite-frequency body wave tomography
NASA Astrophysics Data System (ADS)
Youssof, Mohammad; Yuan, Xiaohui; Tilmann, Frederik; Heit, Benjamin; Weber, Michael; Jokat, Wilfried; Geissler, Wolfram; Laske, Gabi
2016-04-01
We present a seismic model of southwest Africa from teleseismic tomographic inversion of the P- and S- wave data recorded by an amphibious temporary seismic network. The area of study is located at the intersection of the Walvis Ridge with the continental margin of northern Namibia, and extends into the Congo craton. Utilizing 3D finite-frequency sensitivity kernels, we invert traveltime residuals of the teleseismic body waves to image seismic structures in the upper mantle. To test the robustness of our tomographic imaging, we employed various resolution assessments that allow us to inspect the extent of smearing effects and to evaluate the optimum regularization weights (i.e., damping and smoothness). These tests include applying different (ir)regular parameterizations, classical checkerboard and anomaly tests and squeezing modeling. Furthermore, we performed different kinds of weighing schemes for the traveltime dataset. These schemes account for balancing between the picks data amount with their corresponding events directions. Our assessment procedure involves also a detailed investigation of the effect of the crustal correction on the final velocity image, which strongly influenced the image resolution for the mantle structures. Our model can resolve horizontal structures of 1° x 1° below the array down to 300-350 km depth. The resulting model is mainly dominated by the difference in the oceanic and continental mantle lithosphere beneath the study area, with second-order features related to their respective internal structures. The fast lithospheric keel of the Congo Craton reaches a depth of ~250 km. The orogenic Damara Belt and continental flood basalt areas are characterized by low velocity perturbations down to a depth of ~150 km, indicating a normal fertile mantle. High velocities in the oceanic lithosphere beneath the Walvis Ridge appear to show signatures of chemical depletion. A pronounced anomaly of fast velocity is imaged underneath continental NW Namibia and is separated from the high velocity anomaly of the Congo Craton. We interpret this positive perturbation as depleted mantle materials. The depletion event is most probably related to the emplacement of the Parana-Etendeka flood basalts at about 132 Ma triggered by a mantle plume, which has left traces on the Walvis Ridge as well.
7 CFR 981.401 - Adjusted kernel weight.
Code of Federal Regulations, 2012 CFR
2012-01-01
... based on the analysis of a 1,000 gram sample taken from a lot of almonds weighing 10,000 pounds with less than 95 percent kernels, and a 1,000 gram sample taken from a lot of almonds weighing 10,000... percent kernels containing the following: Edible kernels, 530 grams; inedible kernels, 120 grams; foreign...
7 CFR 981.401 - Adjusted kernel weight.
Code of Federal Regulations, 2011 CFR
2011-01-01
... based on the analysis of a 1,000 gram sample taken from a lot of almonds weighing 10,000 pounds with less than 95 percent kernels, and a 1,000 gram sample taken from a lot of almonds weighing 10,000... percent kernels containing the following: Edible kernels, 530 grams; inedible kernels, 120 grams; foreign...
7 CFR 981.401 - Adjusted kernel weight.
Code of Federal Regulations, 2013 CFR
2013-01-01
... based on the analysis of a 1,000 gram sample taken from a lot of almonds weighing 10,000 pounds with less than 95 percent kernels, and a 1,000 gram sample taken from a lot of almonds weighing 10,000... percent kernels containing the following: Edible kernels, 530 grams; inedible kernels, 120 grams; foreign...
7 CFR 981.401 - Adjusted kernel weight.
Code of Federal Regulations, 2010 CFR
2010-01-01
... based on the analysis of a 1,000 gram sample taken from a lot of almonds weighing 10,000 pounds with less than 95 percent kernels, and a 1,000 gram sample taken from a lot of almonds weighing 10,000... percent kernels containing the following: Edible kernels, 530 grams; inedible kernels, 120 grams; foreign...
7 CFR 981.401 - Adjusted kernel weight.
Code of Federal Regulations, 2014 CFR
2014-01-01
... based on the analysis of a 1,000 gram sample taken from a lot of almonds weighing 10,000 pounds with less than 95 percent kernels, and a 1,000 gram sample taken from a lot of almonds weighing 10,000... percent kernels containing the following: Edible kernels, 530 grams; inedible kernels, 120 grams; foreign...
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Half-kernel. 51.1441 Section 51.1441 Agriculture... Standards for Grades of Shelled Pecans Definitions § 51.1441 Half-kernel. Half-kernel means one of the separated halves of an entire pecan kernel with not more than one-eighth of its original volume missing...
7 CFR 51.1403 - Kernel color classification.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Kernel color classification. 51.1403 Section 51.1403... STANDARDS) United States Standards for Grades of Pecans in the Shell 1 Kernel Color Classification § 51.1403 Kernel color classification. (a) The skin color of pecan kernels may be described in terms of the color...
7 CFR 51.1450 - Serious damage.
Code of Federal Regulations, 2010 CFR
2010-01-01
...; (c) Decay affecting any portion of the kernel; (d) Insects, web, or frass or any distinct evidence of insect feeding on the kernel; (e) Internal discoloration which is dark gray, dark brown, or black and...) Dark kernel spots when more than three are on the kernel, or when any dark kernel spot or the aggregate...
7 CFR 51.1450 - Serious damage.
Code of Federal Regulations, 2011 CFR
2011-01-01
...; (c) Decay affecting any portion of the kernel; (d) Insects, web, or frass or any distinct evidence of insect feeding on the kernel; (e) Internal discoloration which is dark gray, dark brown, or black and...) Dark kernel spots when more than three are on the kernel, or when any dark kernel spot or the aggregate...
7 CFR 51.1450 - Serious damage.
Code of Federal Regulations, 2012 CFR
2012-01-01
...; (c) Decay affecting any portion of the kernel; (d) Insects, web, or frass or any distinct evidence of insect feeding on the kernel; (e) Internal discoloration which is dark gray, dark brown, or black and...) Dark kernel spots when more than three are on the kernel, or when any dark kernel spot or the aggregate...
Taking Lessons Learned from a Proxy Application to a Full Application for SNAP and PARTISN
Womeldorff, Geoffrey Alan; Payne, Joshua Estes; Bergen, Benjamin Karl
2017-06-09
SNAP is a proxy application which simulates the computational motion of a neutral particle transport code, PARTISN. Here in this work, we have adapted parts of SNAP separately; we have re-implemented the iterative shell of SNAP in the task-model runtime Legion, showing an improvement to the original schedule, and we have created multiple Kokkos implementations of the computational kernel of SNAP, displaying similar performance to the native Fortran. We then translate our Kokkos experiments in SNAP to PARTISN, necessitating engineering development, regression testing, and further thought.
Taking Lessons Learned from a Proxy Application to a Full Application for SNAP and PARTISN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Womeldorff, Geoffrey Alan; Payne, Joshua Estes; Bergen, Benjamin Karl
SNAP is a proxy application which simulates the computational motion of a neutral particle transport code, PARTISN. Here in this work, we have adapted parts of SNAP separately; we have re-implemented the iterative shell of SNAP in the task-model runtime Legion, showing an improvement to the original schedule, and we have created multiple Kokkos implementations of the computational kernel of SNAP, displaying similar performance to the native Fortran. We then translate our Kokkos experiments in SNAP to PARTISN, necessitating engineering development, regression testing, and further thought.
NASA Astrophysics Data System (ADS)
Du, Peijun; Tan, Kun; Xing, Xiaoshi
2010-12-01
Combining Support Vector Machine (SVM) with wavelet analysis, we constructed wavelet SVM (WSVM) classifier based on wavelet kernel functions in Reproducing Kernel Hilbert Space (RKHS). In conventional kernel theory, SVM is faced with the bottleneck of kernel parameter selection which further results in time-consuming and low classification accuracy. The wavelet kernel in RKHS is a kind of multidimensional wavelet function that can approximate arbitrary nonlinear functions. Implications on semiparametric estimation are proposed in this paper. Airborne Operational Modular Imaging Spectrometer II (OMIS II) hyperspectral remote sensing image with 64 bands and Reflective Optics System Imaging Spectrometer (ROSIS) data with 115 bands were used to experiment the performance and accuracy of the proposed WSVM classifier. The experimental results indicate that the WSVM classifier can obtain the highest accuracy when using the Coiflet Kernel function in wavelet transform. In contrast with some traditional classifiers, including Spectral Angle Mapping (SAM) and Minimum Distance Classification (MDC), and SVM classifier using Radial Basis Function kernel, the proposed wavelet SVM classifier using the wavelet kernel function in Reproducing Kernel Hilbert Space is capable of improving classification accuracy obviously.
A trace ratio maximization approach to multiple kernel-based dimensionality reduction.
Jiang, Wenhao; Chung, Fu-lai
2014-01-01
Most dimensionality reduction techniques are based on one metric or one kernel, hence it is necessary to select an appropriate kernel for kernel-based dimensionality reduction. Multiple kernel learning for dimensionality reduction (MKL-DR) has been recently proposed to learn a kernel from a set of base kernels which are seen as different descriptions of data. As MKL-DR does not involve regularization, it might be ill-posed under some conditions and consequently its applications are hindered. This paper proposes a multiple kernel learning framework for dimensionality reduction based on regularized trace ratio, termed as MKL-TR. Our method aims at learning a transformation into a space of lower dimension and a corresponding kernel from the given base kernels among which some may not be suitable for the given data. The solutions for the proposed framework can be found based on trace ratio maximization. The experimental results demonstrate its effectiveness in benchmark datasets, which include text, image and sound datasets, for supervised, unsupervised as well as semi-supervised settings. Copyright © 2013 Elsevier Ltd. All rights reserved.
Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar
2017-01-01
Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel based approaches such as linear kernel, tree kernel, graph kernel and combination of multiple kernels has achieved promising results in PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction which exploits both syntactic (structural) and semantic vectors information known as Distributed Smoothed Tree kernel (DSTK). DSTK comprises of distributed trees with syntactic information along with distributional semantic vectors representing semantic information of the sentences or phrases. To generate robust machine learning model composition of feature based kernel and DSTK were combined using ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves better f-score with five different corpora compared to other state-of-the-art systems. PMID:29099838
Hadamard Kernel SVM with applications for breast cancer outcome predictions.
Jiang, Hao; Ching, Wai-Ki; Cheung, Wai-Shun; Hou, Wenpin; Yin, Hong
2017-12-21
Breast cancer is one of the leading causes of deaths for women. It is of great necessity to develop effective methods for breast cancer detection and diagnosis. Recent studies have focused on gene-based signatures for outcome predictions. Kernel SVM for its discriminative power in dealing with small sample pattern recognition problems has attracted a lot attention. But how to select or construct an appropriate kernel for a specified problem still needs further investigation. Here we propose a novel kernel (Hadamard Kernel) in conjunction with Support Vector Machines (SVMs) to address the problem of breast cancer outcome prediction using gene expression data. Hadamard Kernel outperform the classical kernels and correlation kernel in terms of Area under the ROC Curve (AUC) values where a number of real-world data sets are adopted to test the performance of different methods. Hadamard Kernel SVM is effective for breast cancer predictions, either in terms of prognosis or diagnosis. It may benefit patients by guiding therapeutic options. Apart from that, it would be a valuable addition to the current SVM kernel families. We hope it will contribute to the wider biology and related communities.
Murugesan, Gurusamy; Abdulkadhar, Sabenabanu; Natarajan, Jeyakumar
2017-01-01
Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel based approaches such as linear kernel, tree kernel, graph kernel and combination of multiple kernels has achieved promising results in PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction which exploits both syntactic (structural) and semantic vectors information known as Distributed Smoothed Tree kernel (DSTK). DSTK comprises of distributed trees with syntactic information along with distributional semantic vectors representing semantic information of the sentences or phrases. To generate robust machine learning model composition of feature based kernel and DSTK were combined using ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves better f-score with five different corpora compared to other state-of-the-art systems.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press. With a CD: data, software, guides. (2009). 2. Kanevski M. Spatial Predictions of Soil Contamination Using General Regression Neural Networks. Systems Research and Information Systems, Volume 8, number 4, 1999. 3. Robert S., Foresti L., Kanevski M. Spatial prediction of monthly wind speeds in complex terrain with adaptive general regression neural networks. International Journal of Climatology, 33 pp. 1793-1804, 2013.
Nonparametric Stochastic Model for Uncertainty Quantifi cation of Short-term Wind Speed Forecasts
NASA Astrophysics Data System (ADS)
AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.
2014-12-01
Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of wind turbine optimal operation, wind energy dispatch and scheduling, efficient energy harvesting etc. It is also considered during planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed carries a particular importance and plays significant roles in the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using it for future prediction. The methods mainly include statistical models such as ARMA, ARIMA model, physical models for instance numerical weather prediction and artificial Intelligence techniques for example support vector machine and neural networks. In this paper, we are interested in estimating hourly wind speed measures in United Arab Emirates (UAE). More precisely, we predict hourly wind speed using a nonparametric kernel estimation of the regression and volatility functions pertaining to nonlinear autoregressive model with ARCH model, which includes unknown nonlinear regression function and volatility function already discussed in the literature. The unknown nonlinear regression function describe the dependence between the value of the wind speed at time t and its historical data at time t -1, t - 2, … , t - d. This function plays a key role to predict hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated to this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using nonparametric kernel methods. In addition, to the pointwise hourly wind speed forecasts, a confidence interval is also provided which allows to quantify the uncertainty around the forecasts.
Filatov, Gleb; Bauwens, Bruno; Kertész-Farkas, Attila
2018-05-07
Bioinformatics studies often rely on similarity measures between sequence pairs, which often pose a bottleneck in large-scale sequence analysis. Here, we present a new convolutional kernel function for protein sequences called the LZW-Kernel. It is based on code words identified with the Lempel-Ziv-Welch (LZW) universal text compressor. The LZW-Kernel is an alignment-free method, it is always symmetric, is positive, always provides 1.0 for self-similarity and it can directly be used with Support Vector Machines (SVMs) in classification problems, contrary to normalized compression distance (NCD), which often violates the distance metric properties in practice and requires further techniques to be used with SVMs. The LZW-Kernel is a one-pass algorithm, which makes it particularly plausible for big data applications. Our experimental studies on remote protein homology detection and protein classification tasks reveal that the LZW-Kernel closely approaches the performance of the Local Alignment Kernel (LAK) and the SVM-pairwise method combined with Smith-Waterman (SW) scoring at a fraction of the time. Moreover, the LZW-Kernel outperforms the SVM-pairwise method when combined with BLAST scores, which indicates that the LZW code words might be a better basis for similarity measures than local alignment approximations found with BLAST. In addition, the LZW-Kernel outperforms n-gram based mismatch kernels, hidden Markov model based SAM and Fisher kernel, and protein family based PSI-BLAST, among others. Further advantages include the LZW-Kernel's reliance on a simple idea, its ease of implementation, and its high speed, three times faster than BLAST and several magnitudes faster than SW or LAK in our tests. LZW-Kernel is implemented as a standalone C code and is a free open-source program distributed under GPLv3 license and can be downloaded from https://github.com/kfattila/LZW-Kernel. akerteszfarkas@hse.ru. Supplementary data are available at Bioinformatics Online.
Geddes, C.A.; Brown, D.G.; Fagre, D.B.
2005-01-01
We derived and implemented two spatial models of May snow water equivalent (SWE) at Lee Ridge in Glacier National Park, Montana. We used the models to test the hypothesis that vegetation structure is a control on snow redistribution at the alpine treeline ecotone (ATE). The statistical models were derived using stepwise and "best" subsets regression techniques. The first model was derived from field measurements of SWE, topography, and vegetation taken at 27 sample points. The second model was derived using GIS-based measures of topography and vegetation. Both the field- (R² = 0.93) and GIS-based models (R² = 0.69) of May SWE included the following variables: site type (based on vegetation), elevation, maximum slope, and general slope aspect. Site type was identified as the most important predictor of SWE in both models, accounting for 74.0% and 29.5% of the variation, respectively. The GIS-based model was applied to create a predictive map of SWE across Lee Ridge, predicting little snow accumulation on the top of the ridge where vegetation is scarce. The GIS model failed in large depressions, including ephemeral stream channels. The models supported the hypothesis that upright vegetation has a positive effect on accumulation of SWE above and beyond the effects of topography. Vegetation, therefore, creates a positive feedback in which it modifies its, environment and could affect the ability of additional vegetation to become established.
A framework for optimal kernel-based manifold embedding of medical image data.
Zimmer, Veronika A; Lekadir, Karim; Hoogendoorn, Corné; Frangi, Alejandro F; Piella, Gemma
2015-04-01
Kernel-based dimensionality reduction is a widely used technique in medical image analysis. To fully unravel the underlying nonlinear manifold the selection of an adequate kernel function and of its free parameters is critical. In practice, however, the kernel function is generally chosen as Gaussian or polynomial and such standard kernels might not always be optimal for a given image dataset or application. In this paper, we present a study on the effect of the kernel functions in nonlinear manifold embedding of medical image data. To this end, we first carry out a literature review on existing advanced kernels developed in the statistics, machine learning, and signal processing communities. In addition, we implement kernel-based formulations of well-known nonlinear dimensional reduction techniques such as Isomap and Locally Linear Embedding, thus obtaining a unified framework for manifold embedding using kernels. Subsequently, we present a method to automatically choose a kernel function and its associated parameters from a pool of kernel candidates, with the aim to generate the most optimal manifold embeddings. Furthermore, we show how the calculated selection measures can be extended to take into account the spatial relationships in images, or used to combine several kernels to further improve the embedding results. Experiments are then carried out on various synthetic and phantom datasets for numerical assessment of the methods. Furthermore, the workflow is applied to real data that include brain manifolds and multispectral images to demonstrate the importance of the kernel selection in the analysis of high-dimensional medical images. Copyright © 2014 Elsevier Ltd. All rights reserved.
Evaluating the Gradient of the Thin Wire Kernel
NASA Technical Reports Server (NTRS)
Wilton, Donald R.; Champagne, Nathan J.
2008-01-01
Recently, a formulation for evaluating the thin wire kernel was developed that employed a change of variable to smooth the kernel integrand, canceling the singularity in the integrand. Hence, the typical expansion of the wire kernel in a series for use in the potential integrals is avoided. The new expression for the kernel is exact and may be used directly to determine the gradient of the wire kernel, which consists of components that are parallel and radial to the wire axis.
Kernel Machine SNP-set Testing under Multiple Candidate Kernels
Wu, Michael C.; Maity, Arnab; Lee, Seunggeun; Simmons, Elizabeth M.; Harmon, Quaker E.; Lin, Xinyi; Engel, Stephanie M.; Molldrem, Jeffrey J.; Armistead, Paul M.
2013-01-01
Joint testing for the cumulative effect of multiple single nucleotide polymorphisms grouped on the basis of prior biological knowledge has become a popular and powerful strategy for the analysis of large scale genetic association studies. The kernel machine (KM) testing framework is a useful approach that has been proposed for testing associations between multiple genetic variants and many different types of complex traits by comparing pairwise similarity in phenotype between subjects to pairwise similarity in genotype, with similarity in genotype defined via a kernel function. An advantage of the KM framework is its flexibility: choosing different kernel functions allows for different assumptions concerning the underlying model and can allow for improved power. In practice, it is difficult to know which kernel to use a priori since this depends on the unknown underlying trait architecture and selecting the kernel which gives the lowest p-value can lead to inflated type I error. Therefore, we propose practical strategies for KM testing when multiple candidate kernels are present based on constructing composite kernels and based on efficient perturbation procedures. We demonstrate through simulations and real data applications that the procedures protect the type I error rate and can lead to substantially improved power over poor choices of kernels and only modest differences in power versus using the best candidate kernel. PMID:23471868
Takagi, Satoshi; Nagase, Hiroyuki; Hayashi, Tatsuya; Kita, Tamotsu; Hayashi, Katsumi; Sanada, Shigeru; Koike, Masayuki
2014-01-01
The hybrid convolution kernel technique for computed tomography (CT) is known to enable the depiction of an image set using different window settings. Our purpose was to decrease the number of artifacts in the hybrid convolution kernel technique for head CT and to determine whether our improved combined multi-kernel head CT images enabled diagnosis as a substitute for both brain (low-pass kernel-reconstructed) and bone (high-pass kernel-reconstructed) images. Forty-four patients with nondisplaced skull fractures were included. Our improved multi-kernel images were generated so that pixels of >100 Hounsfield unit in both brain and bone images were composed of CT values of bone images and other pixels were composed of CT values of brain images. Three radiologists compared the improved multi-kernel images with bone images. The improved multi-kernel images and brain images were identically displayed on the brain window settings. All three radiologists agreed that the improved multi-kernel images on the bone window settings were sufficient for diagnosing skull fractures in all patients. This improved multi-kernel technique has a simple algorithm and is practical for clinical use. Thus, simplified head CT examinations and fewer images that need to be stored can be expected.
7 CFR 810.202 - Definition of other terms.
Code of Federal Regulations, 2014 CFR
2014-01-01
... barley kernels, other grains, and wild oats that are badly shrunken and distinctly discolored black or... kernels. Kernels and pieces of barley kernels that are distinctly indented, immature or shrunken in...
7 CFR 810.202 - Definition of other terms.
Code of Federal Regulations, 2013 CFR
2013-01-01
... barley kernels, other grains, and wild oats that are badly shrunken and distinctly discolored black or... kernels. Kernels and pieces of barley kernels that are distinctly indented, immature or shrunken in...
7 CFR 810.202 - Definition of other terms.
Code of Federal Regulations, 2012 CFR
2012-01-01
... barley kernels, other grains, and wild oats that are badly shrunken and distinctly discolored black or... kernels. Kernels and pieces of barley kernels that are distinctly indented, immature or shrunken in...
Aflatoxin variability in pistachios.
Mahoney, N E; Rodriguez, S B
1996-01-01
Pistachio fruit components, including hulls (mesocarps and epicarps), seed coats (testas), and kernels (seeds), all contribute to variable aflatoxin content in pistachios. Fresh pistachio kernels were individually inoculated with Aspergillus flavus and incubated 7 or 10 days. Hulled, shelled kernels were either left intact or wounded prior to inoculation. Wounded kernels, with or without the seed coat, were readily colonized by A. flavus and after 10 days of incubation contained 37 times more aflatoxin than similarly treated unwounded kernels. The aflatoxin levels in the individual wounded pistachios were highly variable. Neither fungal colonization nor aflatoxin was detected in intact kernels without seed coats. Intact kernels with seed coats had limited fungal colonization and low aflatoxin concentrations compared with their wounded counterparts. Despite substantial fungal colonization of wounded hulls, aflatoxin was not detected in hulls. Aflatoxin levels were significantly lower in wounded kernels with hulls than in kernels of hulled pistachios. Both the seed coat and a water-soluble extract of hulls suppressed aflatoxin production by A. flavus. PMID:8919781
Huang, Jessie Y.; Eklund, David; Childress, Nathan L.; Howell, Rebecca M.; Mirkovic, Dragan; Followill, David S.; Kry, Stephen F.
2013-01-01
Purpose: Several simplifications used in clinical implementations of the convolution/superposition (C/S) method, specifically, density scaling of water kernels for heterogeneous media and use of a single polyenergetic kernel, lead to dose calculation inaccuracies. Although these weaknesses of the C/S method are known, it is not well known which of these simplifications has the largest effect on dose calculation accuracy in clinical situations. The purpose of this study was to generate and characterize high-resolution, polyenergetic, and material-specific energy deposition kernels (EDKs), as well as to investigate the dosimetric impact of implementing spatially variant polyenergetic and material-specific kernels in a collapsed cone C/S algorithm. Methods: High-resolution, monoenergetic water EDKs and various material-specific EDKs were simulated using the EGSnrc Monte Carlo code. Polyenergetic kernels, reflecting the primary spectrum of a clinical 6 MV photon beam at different locations in a water phantom, were calculated for different depths, field sizes, and off-axis distances. To investigate the dosimetric impact of implementing spatially variant polyenergetic kernels, depth dose curves in water were calculated using two different implementations of the collapsed cone C/S method. The first method uses a single polyenergetic kernel, while the second method fully takes into account spectral changes in the convolution calculation. To investigate the dosimetric impact of implementing material-specific kernels, depth dose curves were calculated for a simplified titanium implant geometry using both a traditional C/S implementation that performs density scaling of water kernels and a novel implementation using material-specific kernels. Results: For our high-resolution kernels, we found good agreement with the Mackie et al. kernels, with some differences near the interaction site for low photon energies (<500 keV). For our spatially variant polyenergetic kernels, we found that depth was the most dominant factor affecting the pattern of energy deposition; however, the effects of field size and off-axis distance were not negligible. For the material-specific kernels, we found that as the density of the material increased, more energy was deposited laterally by charged particles, as opposed to in the forward direction. Thus, density scaling of water kernels becomes a worse approximation as the density and the effective atomic number of the material differ more from water. Implementation of spatially variant, polyenergetic kernels increased the percent depth dose value at 25 cm depth by 2.1%–5.8% depending on the field size, while implementation of titanium kernels gave 4.9% higher dose upstream of the metal cavity (i.e., higher backscatter dose) and 8.2% lower dose downstream of the cavity. Conclusions: Of the various kernel refinements investigated, inclusion of depth-dependent and metal-specific kernels into the C/S method has the greatest potential to improve dose calculation accuracy. Implementation of spatially variant polyenergetic kernels resulted in a harder depth dose curve and thus has the potential to affect beam modeling parameters obtained in the commissioning process. For metal implants, the C/S algorithms generally underestimate the dose upstream and overestimate the dose downstream of the implant. Implementation of a metal-specific kernel mitigated both of these errors. PMID:24320507
ERIC Educational Resources Information Center
Lee, Yi-Hsuan; von Davier, Alina A.
2008-01-01
The kernel equating method (von Davier, Holland, & Thayer, 2004) is based on a flexible family of equipercentile-like equating functions that use a Gaussian kernel to continuize the discrete score distributions. While the classical equipercentile, or percentile-rank, equating method carries out the continuization step by linear interpolation,…
Code of Federal Regulations, 2010 CFR
2010-01-01
...— Damaged kernels 1 (percent) Foreign material (percent) Other grains (percent) Skinned and broken kernels....0 10.0 15.0 1 Injured-by-frost kernels and injured-by-mold kernels are not considered damaged kernels or considered against sound barley. Notes: Malting barley shall not be infested in accordance with...
Code of Federal Regulations, 2013 CFR
2013-01-01
... well cured; (e) Poorly developed kernels; (f) Kernels which are dark amber in color; (g) Kernel spots when more than one dark spot is present on either half of the kernel, or when any such spot is more...
Code of Federal Regulations, 2014 CFR
2014-01-01
... well cured; (e) Poorly developed kernels; (f) Kernels which are dark amber in color; (g) Kernel spots when more than one dark spot is present on either half of the kernel, or when any such spot is more...
7 CFR 810.205 - Grades and grade requirements for Two-rowed Malting barley.
Code of Federal Regulations, 2010 CFR
2010-01-01
... (percent) Maximum limits of— Wild oats (percent) Foreign material (percent) Skinned and broken kernels... Injured-by-frost kernels and injured-by-mold kernels are not considered damaged kernels or considered...
GRMDA: Graph Regression for MiRNA-Disease Association Prediction
Chen, Xing; Yang, Jing-Ru; Guan, Na-Na; Li, Jian-Qiang
2018-01-01
Nowadays, as more and more associations between microRNAs (miRNAs) and diseases have been discovered, miRNA has gradually become a hot topic in the biological field. Because of the high consumption of time and money on carrying out biological experiments, computational method which can help scientists choose the most likely associations between miRNAs and diseases for further experimental studies is desperately needed. In this study, we proposed a method of Graph Regression for MiRNA-Disease Association prediction (GRMDA) which combines known miRNA-disease associations, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity. We used Gaussian interaction profile kernel similarity to supplement the shortage of miRNA functional similarity and disease semantic similarity. Furthermore, the graph regression was synchronously performed in three latent spaces, including association space, miRNA similarity space, and disease similarity space, by using two matrix factorization approaches called Singular Value Decomposition and Partial Least-Squares to extract important related attributes and filter the noise. In the leave-one-out cross validation and five-fold cross validation, GRMDA obtained the AUCs of 0.8272 and 0.8080 ± 0.0024, respectively. Thus, its performance is better than some previous models. In the case study of Lymphoma using the recorded miRNA-disease associations in HMDD V2.0 database, 88% of top 50 predicted miRNAs were verified by experimental literatures. In order to test the performance of GRMDA on new diseases with no known related miRNAs, we took Breast Neoplasms as an example by regarding all the known related miRNAs as unknown ones. We found that 100% of top 50 predicted miRNAs were verified. Moreover, 84% of top 50 predicted miRNAs in case study for Esophageal Neoplasms based on HMDD V1.0 were verified to have known associations. In conclusion, GRMDA is an effective and practical method for miRNA-disease association prediction. PMID:29515453
GRMDA: Graph Regression for MiRNA-Disease Association Prediction.
Chen, Xing; Yang, Jing-Ru; Guan, Na-Na; Li, Jian-Qiang
2018-01-01
Nowadays, as more and more associations between microRNAs (miRNAs) and diseases have been discovered, miRNA has gradually become a hot topic in the biological field. Because of the high consumption of time and money on carrying out biological experiments, computational method which can help scientists choose the most likely associations between miRNAs and diseases for further experimental studies is desperately needed. In this study, we proposed a method of Graph Regression for MiRNA-Disease Association prediction (GRMDA) which combines known miRNA-disease associations, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity. We used Gaussian interaction profile kernel similarity to supplement the shortage of miRNA functional similarity and disease semantic similarity. Furthermore, the graph regression was synchronously performed in three latent spaces, including association space, miRNA similarity space, and disease similarity space, by using two matrix factorization approaches called Singular Value Decomposition and Partial Least-Squares to extract important related attributes and filter the noise. In the leave-one-out cross validation and five-fold cross validation, GRMDA obtained the AUCs of 0.8272 and 0.8080 ± 0.0024, respectively. Thus, its performance is better than some previous models. In the case study of Lymphoma using the recorded miRNA-disease associations in HMDD V2.0 database, 88% of top 50 predicted miRNAs were verified by experimental literatures. In order to test the performance of GRMDA on new diseases with no known related miRNAs, we took Breast Neoplasms as an example by regarding all the known related miRNAs as unknown ones. We found that 100% of top 50 predicted miRNAs were verified. Moreover, 84% of top 50 predicted miRNAs in case study for Esophageal Neoplasms based on HMDD V1.0 were verified to have known associations. In conclusion, GRMDA is an effective and practical method for miRNA-disease association prediction.
Detection of ochratoxin A contamination in stored wheat using near-infrared hyperspectral imaging
NASA Astrophysics Data System (ADS)
Senthilkumar, T.; Jayas, D. S.; White, N. D. G.; Fields, P. G.; Gräfenhan, T.
2017-03-01
Near-infrared (NIR) hyperspectral imaging system was used to detect five concentration levels of ochratoxin A (OTA) in contaminated wheat kernels. The wheat kernels artificially inoculated with two different OTA producing Penicillium verrucosum strains, two different non-toxigenic P. verrucosum strains, and sterile control wheat kernels were subjected to NIR hyperspectral imaging. The acquired three-dimensional data were reshaped into readable two-dimensional data. Principal Component Analysis (PCA) was applied to the two dimensional data to identify the key wavelengths which had greater significance in detecting OTA contamination in wheat. Statistical and histogram features extracted at the key wavelengths were used in the linear, quadratic and Mahalanobis statistical discriminant models to differentiate between sterile control, five concentration levels of OTA contamination in wheat kernels, and five infection levels of non-OTA producing P. verrucosum inoculated wheat kernels. The classification models differentiated sterile control samples from OTA contaminated wheat kernels and non-OTA producing P. verrucosum inoculated wheat kernels with a 100% accuracy. The classification models also differentiated between five concentration levels of OTA contaminated wheat kernels and between five infection levels of non-OTA producing P. verrucosum inoculated wheat kernels with a correct classification of more than 98%. The non-OTA producing P. verrucosum inoculated wheat kernels and OTA contaminated wheat kernels subjected to hyperspectral imaging provided different spectral patterns.
Application of kernel method in fluorescence molecular tomography
NASA Astrophysics Data System (ADS)
Zhao, Yue; Baikejiang, Reheman; Li, Changqing
2017-02-01
Reconstruction of fluorescence molecular tomography (FMT) is an ill-posed inverse problem. Anatomical guidance in the FMT reconstruction can improve FMT reconstruction efficiently. We have developed a kernel method to introduce the anatomical guidance into FMT robustly and easily. The kernel method is from machine learning for pattern analysis and is an efficient way to represent anatomical features. For the finite element method based FMT reconstruction, we calculate a kernel function for each finite element node from an anatomical image, such as a micro-CT image. Then the fluorophore concentration at each node is represented by a kernel coefficient vector and the corresponding kernel function. In the FMT forward model, we have a new system matrix by multiplying the sensitivity matrix with the kernel matrix. Thus, the kernel coefficient vector is the unknown to be reconstructed following a standard iterative reconstruction process. We convert the FMT reconstruction problem into the kernel coefficient reconstruction problem. The desired fluorophore concentration at each node can be calculated accordingly. Numerical simulation studies have demonstrated that the proposed kernel-based algorithm can improve the spatial resolution of the reconstructed FMT images. In the proposed kernel method, the anatomical guidance can be obtained directly from the anatomical image and is included in the forward modeling. One of the advantages is that we do not need to segment the anatomical image for the targets and background.
Evaluation of trends in wheat yield models
NASA Technical Reports Server (NTRS)
Ferguson, M. C.
1982-01-01
Trend terms in models for wheat yield in the U.S. Great Plains for the years 1932 to 1976 are evaluated. The subset of meteorological variables yielding the largest adjusted R(2) is selected using the method of leaps and bounds. Latent root regression is used to eliminate multicollinearities, and generalized ridge regression is used to introduce bias to provide stability in the data matrix. The regression model used provides for two trends in each of two models: a dependent model in which the trend line is piece-wise continuous, and an independent model in which the trend line is discontinuous at the year of the slope change. It was found that the trend lines best describing the wheat yields consisted of combinations of increasing, decreasing, and constant trend: four combinations for the dependent model and seven for the independent model.
Credit scoring analysis using kernel discriminant
NASA Astrophysics Data System (ADS)
Widiharih, T.; Mukid, M. A.; Mustafid
2018-05-01
Credit scoring model is an important tool for reducing the risk of wrong decisions when granting credit facilities to applicants. This paper investigate the performance of kernel discriminant model in assessing customer credit risk. Kernel discriminant analysis is a non- parametric method which means that it does not require any assumptions about the probability distribution of the input. The main ingredient is a kernel that allows an efficient computation of Fisher discriminant. We use several kernel such as normal, epanechnikov, biweight, and triweight. The models accuracy was compared each other using data from a financial institution in Indonesia. The results show that kernel discriminant can be an alternative method that can be used to determine who is eligible for a credit loan. In the data we use, it shows that a normal kernel is relevant to be selected for credit scoring using kernel discriminant model. Sensitivity and specificity reach to 0.5556 and 0.5488 respectively.
Classification of Phylogenetic Profiles for Protein Function Prediction: An SVM Approach
NASA Astrophysics Data System (ADS)
Kotaru, Appala Raju; Joshi, Ramesh C.
Predicting the function of an uncharacterized protein is a major challenge in post-genomic era due to problems complexity and scale. Having knowledge of protein function is a crucial link in the development of new drugs, better crops, and even the development of biochemicals such as biofuels. Recently numerous high-throughput experimental procedures have been invented to investigate the mechanisms leading to the accomplishment of a protein’s function and Phylogenetic profile is one of them. Phylogenetic profile is a way of representing a protein which encodes evolutionary history of proteins. In this paper we proposed a method for classification of phylogenetic profiles using supervised machine learning method, support vector machine classification along with radial basis function as kernel for identifying functionally linked proteins. We experimentally evaluated the performance of the classifier with the linear kernel, polynomial kernel and compared the results with the existing tree kernel. In our study we have used proteins of the budding yeast saccharomyces cerevisiae genome. We generated the phylogenetic profiles of 2465 yeast genes and for our study we used the functional annotations that are available in the MIPS database. Our experiments show that the performance of the radial basis kernel is similar to polynomial kernel is some functional classes together are better than linear, tree kernel and over all radial basis kernel outperformed the polynomial kernel, linear kernel and tree kernel. In analyzing these results we show that it will be feasible to make use of SVM classifier with radial basis function as kernel to predict the gene functionality using phylogenetic profiles.
Steckel, S; Stewart, S D
2015-06-01
Ear-feeding larvae, such as corn earworm, Helicoverpa zea Boddie (Lepidoptera: Noctuidae), can be important insect pests of field corn, Zea mays L., by feeding on kernels. Recently introduced, stacked Bacillus thuringiensis (Bt) traits provide improved protection from ear-feeding larvae. Thus, our objective was to evaluate how injury to kernels in the ear tip might affect yield when this injury was inflicted at the blister and milk stages. In 2010, simulated corn earworm injury reduced total kernel weight (i.e., yield) at both the blister and milk stage. In 2011, injury to ear tips at the milk stage affected total kernel weight. No differences in total kernel weight were found in 2013, regardless of when or how much injury was inflicted. Our data suggested that kernels within the same ear could compensate for injury to ear tips by increasing in size, but this increase was not always statistically significant or sufficient to overcome high levels of kernel injury. For naturally occurring injury observed on multiple corn hybrids during 2011 and 2012, our analyses showed either no or a minimal relationship between number of kernels injured by ear-feeding larvae and the total number of kernels per ear, total kernel weight, or the size of individual kernels. The results indicate that intraear compensation for kernel injury to ear tips can occur under at least some conditions. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Web document ranking via active learning and kernel principal component analysis
NASA Astrophysics Data System (ADS)
Cai, Fei; Chen, Honghui; Shu, Zhen
2015-09-01
Web document ranking arises in many information retrieval (IR) applications, such as the search engine, recommendation system and online advertising. A challenging issue is how to select the representative query-document pairs and informative features as well for better learning and exploring new ranking models to produce an acceptable ranking list of candidate documents of each query. In this study, we propose an active sampling (AS) plus kernel principal component analysis (KPCA) based ranking model, viz. AS-KPCA Regression, to study the document ranking for a retrieval system, i.e. how to choose the representative query-document pairs and features for learning. More precisely, we fill those documents gradually into the training set by AS such that each of which will incur the highest expected DCG loss if unselected. Then, the KPCA is performed via projecting the selected query-document pairs onto p-principal components in the feature space to complete the regression. Hence, we can cut down the computational overhead and depress the impact incurred by noise simultaneously. To the best of our knowledge, we are the first to perform the document ranking via dimension reductions in two dimensions, namely, the number of documents and features simultaneously. Our experiments demonstrate that the performance of our approach is better than that of the baseline methods on the public LETOR 4.0 datasets. Our approach brings an improvement against RankBoost as well as other baselines near 20% in terms of MAP metric and less improvements using P@K and NDCG@K, respectively. Moreover, our approach is particularly suitable for document ranking on the noisy dataset in practice.
NASA Astrophysics Data System (ADS)
Hein, C. J.; Billy, J.; Robin, N.; FitzGerald, D.; Certain, R.
2017-12-01
The internal architecture of beach-ridge systems can provide insight into processes ongoing during its period of formation, such as changing relative sea-level (RSL). The paraglacial beach-ridge plain at Miquelon-Langlade (south of Newfoundland - NW Atlantic) is an example of a well-preserved regressive barrier. Initiation of this plain correlates with a decrease in the rate of RSL rise (from +4.4 mm/yr to 1.3 mm/yr) at around 3000 years ago. The combination of stratigraphic (ground-penetrating radar and sediment cores), topographic (RTK-GPS) and chronologic (optically stimulated luminescence, OSL) data provide a detailed understanding of the constructional history of the plain. The internal architecture of individual beach ridges are characterized by sigmoidal configurations with seaward-dipping (2.3-4.7°) beds. Field mapping data reveal the processes associated with development of individual ridges in relation to sea level elevation. First, wave-built facies (sand-and-gravel) are deposited as beach berms, likely by fair-weather waves, with their elevations controlled by sea level and the swash height of constructive waves. This is followed by the accretion of aeolian sand deposits (foredunes) on the previous relict ridge, and then colonization by pioneer grasses. The well-defined contact between coarse-grained, wave-built facies and overlying aeolian deposits is used to demonstrate the dominant influence of RSL change in the development of the barrier system and, with chronology provided by OSL dating, produce a RSL curve for the 2500-year period of its formation. A net increase of 2.4 m in the surface elevation of wave-built facies is observed across the plain, corresponding to an overall increase in mean sea level through time. Three distinct periods can be distinguished: (1) an increase from 2.4 to 1 m below modern MSL between 2400 and 1500 years (rate: +1.3 mm/yr); (2) relatively stable or slowly rising RSL (<+0.2 mm/yr) from 1400 to 700 years; and (3) a rise of ca. 0.7 m during the past 700 years (+1.1 mm/yr). This study presents a moderate-resolution RSL curve for southern Newfoundland over the last 2500 years and a field demonstration of the utility of wave-built/aeolian stratigraphic contacts in beach ridges for sea-level reconstructions in mixed clastic systems.
Evidence-based Kernels: Fundamental Units of Behavioral Influence
Biglan, Anthony
2008-01-01
This paper describes evidence-based kernels, fundamental units of behavioral influence that appear to underlie effective prevention and treatment for children, adults, and families. A kernel is a behavior–influence procedure shown through experimental analysis to affect a specific behavior and that is indivisible in the sense that removing any of its components would render it inert. Existing evidence shows that a variety of kernels can influence behavior in context, and some evidence suggests that frequent use or sufficient use of some kernels may produce longer lasting behavioral shifts. The analysis of kernels could contribute to an empirically based theory of behavioral influence, augment existing prevention or treatment efforts, facilitate the dissemination of effective prevention and treatment practices, clarify the active ingredients in existing interventions, and contribute to efficiently developing interventions that are more effective. Kernels involve one or more of the following mechanisms of behavior influence: reinforcement, altering antecedents, changing verbal relational responding, or changing physiological states directly. The paper describes 52 of these kernels, and details practical, theoretical, and research implications, including calling for a national database of kernels that influence human behavior. PMID:18712600
Integrating the Gradient of the Thin Wire Kernel
NASA Technical Reports Server (NTRS)
Champagne, Nathan J.; Wilton, Donald R.
2008-01-01
A formulation for integrating the gradient of the thin wire kernel is presented. This approach employs a new expression for the gradient of the thin wire kernel derived from a recent technique for numerically evaluating the exact thin wire kernel. This approach should provide essentially arbitrary accuracy and may be used with higher-order elements and basis functions using the procedure described in [4].When the source and observation points are close, the potential integrals over wire segments involving the wire kernel are split into parts to handle the singular behavior of the integrand [1]. The singularity characteristics of the gradient of the wire kernel are different than those of the wire kernel, and the axial and radial components have different singularities. The characteristics of the gradient of the wire kernel are discussed in [2]. To evaluate the near electric and magnetic fields of a wire, the integration of the gradient of the wire kernel needs to be calculated over the source wire. Since the vector bases for current have constant direction on linear wire segments, these integrals reduce to integrals of the form
Ranking Support Vector Machine with Kernel Approximation
Dou, Yong
2017-01-01
Learning to rank algorithm has become important in recent years due to its successful application in information retrieval, recommender system, and computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problem. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. Primal truncated Newton method is used to optimize the pairwise L2-loss (squared Hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method gets a much faster training speed than kernel RankSVM and achieves comparable or better performance over state-of-the-art ranking algorithms. PMID:28293256
Ranking Support Vector Machine with Kernel Approximation.
Chen, Kai; Li, Rongchun; Dou, Yong; Liang, Zhengfa; Lv, Qi
2017-01-01
Learning to rank algorithm has become important in recent years due to its successful application in information retrieval, recommender system, and computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problem. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. Primal truncated Newton method is used to optimize the pairwise L2-loss (squared Hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method gets a much faster training speed than kernel RankSVM and achieves comparable or better performance over state-of-the-art ranking algorithms.
Novel applications of the temporal kernel method: Historical and future radiative forcing
NASA Astrophysics Data System (ADS)
Portmann, R. W.; Larson, E.; Solomon, S.; Murphy, D. M.
2017-12-01
We present a new estimate of the historical radiative forcing derived from the observed global mean surface temperature and a model derived kernel function. Current estimates of historical radiative forcing are usually derived from climate models. Despite large variability in these models, the multi-model mean tends to do a reasonable job of representing the Earth system and climate. One method of diagnosing the transient radiative forcing in these models requires model output of top of the atmosphere radiative imbalance and global mean temperature anomaly. It is difficult to apply this method to historical observations due to the lack of TOA radiative measurements before CERES. We apply the temporal kernel method (TKM) of calculating radiative forcing to the historical global mean temperature anomaly. This novel approach is compared against the current regression based methods using model outputs and shown to produce consistent forcing estimates giving confidence in the forcing derived from the historical temperature record. The derived TKM radiative forcing provides an estimate of the forcing time series that the average climate model needs to produce the observed temperature record. This forcing time series is found to be in good overall agreement with previous estimates but includes significant differences that will be discussed. The historical anthropogenic aerosol forcing is estimated as a residual from the TKM and found to be consistent with earlier moderate forcing estimates. In addition, this method is applied to future temperature projections to estimate the radiative forcing required to achieve those temperature goals, such as those set in the Paris agreement.
Code of Federal Regulations, 2011 CFR
2011-04-01
... source Apricot kernel (persic oil) Prunus armeniaca L. Peach kernel (persic oil) Prunus persica Sieb. et Zucc. Peanut stearine Arachis hypogaea L. Persic oil (see apricot kernel and peach kernel) Quince seed...
Code of Federal Regulations, 2013 CFR
2013-04-01
... source Apricot kernel (persic oil) Prunus armeniaca L. Peach kernel (persic oil) Prunus persica Sieb. et Zucc. Peanut stearine Arachis hypogaea L. Persic oil (see apricot kernel and peach kernel) Quince seed...
Code of Federal Regulations, 2012 CFR
2012-04-01
... source Apricot kernel (persic oil) Prunus armeniaca L. Peach kernel (persic oil) Prunus persica Sieb. et Zucc. Peanut stearine Arachis hypogaea L. Persic oil (see apricot kernel and peach kernel) Quince seed...
Wigner functions defined with Laplace transform kernels.
Oh, Se Baek; Petruccelli, Jonathan C; Tian, Lei; Barbastathis, George
2011-10-24
We propose a new Wigner-type phase-space function using Laplace transform kernels--Laplace kernel Wigner function. Whereas momentum variables are real in the traditional Wigner function, the Laplace kernel Wigner function may have complex momentum variables. Due to the property of the Laplace transform, a broader range of signals can be represented in complex phase-space. We show that the Laplace kernel Wigner function exhibits similar properties in the marginals as the traditional Wigner function. As an example, we use the Laplace kernel Wigner function to analyze evanescent waves supported by surface plasmon polariton. © 2011 Optical Society of America
Online learning control using adaptive critic designs with sparse kernel machines.
Xu, Xin; Hou, Zhongsheng; Lian, Chuanqiang; He, Haibo
2013-05-01
In the past decade, adaptive critic designs (ACDs), including heuristic dynamic programming (HDP), dual heuristic programming (DHP), and their action-dependent ones, have been widely studied to realize online learning control of dynamical systems. However, because neural networks with manually designed features are commonly used to deal with continuous state and action spaces, the generalization capability and learning efficiency of previous ACDs still need to be improved. In this paper, a novel framework of ACDs with sparse kernel machines is presented by integrating kernel methods into the critic of ACDs. To improve the generalization capability as well as the computational efficiency of kernel machines, a sparsification method based on the approximately linear dependence analysis is used. Using the sparse kernel machines, two kernel-based ACD algorithms, that is, kernel HDP (KHDP) and kernel DHP (KDHP), are proposed and their performance is analyzed both theoretically and empirically. Because of the representation learning and generalization capability of sparse kernel machines, KHDP and KDHP can obtain much better performance than previous HDP and DHP with manually designed neural networks. Simulation and experimental results of two nonlinear control problems, that is, a continuous-action inverted pendulum problem and a ball and plate control problem, demonstrate the effectiveness of the proposed kernel ACD methods.
NASA Astrophysics Data System (ADS)
Basak, Subhash C.; Mills, Denise; Hawkins, Douglas M.
2008-06-01
A hierarchical classification study was carried out based on a set of 70 chemicals—35 which produce allergic contact dermatitis (ACD) and 35 which do not. This approach was implemented using a regular ridge regression computer code, followed by conversion of regression output to binary data values. The hierarchical descriptor classes used in the modeling include topostructural (TS), topochemical (TC), and quantum chemical (QC), all of which are based solely on chemical structure. The concordance, sensitivity, and specificity are reported. The model based on the TC descriptors was found to be the best, while the TS model was extremely poor.
Relationship between processing score and kernel-fraction particle size in whole-plant corn silage.
Dias Junior, G S; Ferraretto, L F; Salvati, G G S; de Resende, L C; Hoffman, P C; Pereira, M N; Shaver, R D
2016-04-01
Kernel processing increases starch digestibility in whole-plant corn silage (WPCS). Corn silage processing score (CSPS), the percentage of starch passing through a 4.75-mm sieve, is widely used to assess degree of kernel breakage in WPCS. However, the geometric mean particle size (GMPS) of the kernel-fraction that passes through the 4.75-mm sieve has not been well described. Therefore, the objectives of this study were (1) to evaluate particle size distribution and digestibility of kernels cut in varied particle sizes; (2) to propose a method to measure GMPS in WPCS kernels; and (3) to evaluate the relationship between CSPS and GMPS of the kernel fraction in WPCS. Composite samples of unfermented, dried kernels from 110 corn hybrids commonly used for silage production were kept whole (WH) or manually cut in 2, 4, 8, 16, 32 or 64 pieces (2P, 4P, 8P, 16P, 32P, and 64P, respectively). Dry sieving to determine GMPS, surface area, and particle size distribution using 9 sieves with nominal square apertures of 9.50, 6.70, 4.75, 3.35, 2.36, 1.70, 1.18, and 0.59 mm and pan, as well as ruminal in situ dry matter (DM) digestibilities were performed for each kernel particle number treatment. Incubation times were 0, 3, 6, 12, and 24 h. The ruminal in situ DM disappearance of unfermented kernels increased with the reduction in particle size of corn kernels. Kernels kept whole had the lowest ruminal DM disappearance for all time points with maximum DM disappearance of 6.9% at 24 h and the greatest disappearance was observed for 64P, followed by 32P and 16P. Samples of WPCS (n=80) from 3 studies representing varied theoretical length of cut settings and processor types and settings were also evaluated. Each WPCS sample was divided in 2 and then dried at 60 °C for 48 h. The CSPS was determined in duplicate on 1 of the split samples, whereas on the other split sample the kernel and stover fractions were separated using a hydrodynamic separation procedure. After separation, the kernel fraction was redried at 60°C for 48 h in a forced-air oven and dry sieved to determine GMPS and surface area. Linear relationships between CSPS from WPCS (n=80) and kernel fraction GMPS, surface area, and proportion passing through the 4.75-mm screen were poor. Strong quadratic relationships between proportion of kernel fraction passing through the 4.75-mm screen and kernel fraction GMPS and surface area were observed. These findings suggest that hydrodynamic separation and dry sieving of the kernel fraction may provide a better assessment of kernel breakage in WPCS than CSPS. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhu, Fengle; Yao, Haibo; Hruska, Zuzana; Kincaid, Russell; Brown, Robert; Bhatnagar, Deepak; Cleveland, Thomas
2015-05-01
Aflatoxins are secondary metabolites produced by certain fungal species of the Aspergillus genus. Aflatoxin contamination remains a problem in agricultural products due to its toxic and carcinogenic properties. Conventional chemical methods for aflatoxin detection are time-consuming and destructive. This study employed fluorescence and reflectance visible near-infrared (VNIR) hyperspectral images to classify aflatoxin contaminated corn kernels rapidly and non-destructively. Corn ears were artificially inoculated in the field with toxigenic A. flavus spores at the early dough stage of kernel development. After harvest, a total of 300 kernels were collected from the inoculated ears. Fluorescence hyperspectral imagery with UV excitation and reflectance hyperspectral imagery with halogen illumination were acquired on both endosperm and germ sides of kernels. All kernels were then subjected to chemical analysis individually to determine aflatoxin concentrations. A region of interest (ROI) was created for each kernel to extract averaged spectra. Compared with healthy kernels, fluorescence spectral peaks for contaminated kernels shifted to longer wavelengths with lower intensity, and reflectance values for contaminated kernels were lower with a different spectral shape in 700-800 nm region. Principal component analysis was applied for data compression before classifying kernels into contaminated and healthy based on a 20 ppb threshold utilizing the K-nearest neighbors algorithm. The best overall accuracy achieved was 92.67% for germ side in the fluorescence data analysis. The germ side generally performed better than endosperm side. Fluorescence and reflectance image data achieved similar accuracy.
Influence of Kernel Age on Fumonisin B1 Production in Maize by Fusarium moniliforme
Warfield, Colleen Y.; Gilchrist, David G.
1999-01-01
Production of fumonisins by Fusarium moniliforme on naturally infected maize ears is an important food safety concern due to the toxic nature of this class of mycotoxins. Assessing the potential risk of fumonisin production in developing maize ears prior to harvest requires an understanding of the regulation of toxin biosynthesis during kernel maturation. We investigated the developmental-stage-dependent relationship between maize kernels and fumonisin B1 production by using kernels collected at the blister (R2), milk (R3), dough (R4), and dent (R5) stages following inoculation in culture at their respective field moisture contents with F. moniliforme. Highly significant differences (P ≤ 0.001) in fumonisin B1 production were found among kernels at the different developmental stages. The highest levels of fumonisin B1 were produced on the dent stage kernels, and the lowest levels were produced on the blister stage kernels. The differences in fumonisin B1 production among kernels at the different developmental stages remained significant (P ≤ 0.001) when the moisture contents of the kernels were adjusted to the same level prior to inoculation. We concluded that toxin production is affected by substrate composition as well as by moisture content. Our study also demonstrated that fumonisin B1 biosynthesis on maize kernels is influenced by factors which vary with the developmental age of the tissue. The risk of fumonisin contamination may begin early in maize ear development and increases as the kernels reach physiological maturity. PMID:10388675
A framework for longitudinal data analysis via shape regression
NASA Astrophysics Data System (ADS)
Fishbaugh, James; Durrleman, Stanley; Piven, Joseph; Gerig, Guido
2012-02-01
Traditional longitudinal analysis begins by extracting desired clinical measurements, such as volume or head circumference, from discrete imaging data. Typically, the continuous evolution of a scalar measurement is estimated by choosing a 1D regression model, such as kernel regression or fitting a polynomial of fixed degree. This type of analysis not only leads to separate models for each measurement, but there is no clear anatomical or biological interpretation to aid in the selection of the appropriate paradigm. In this paper, we propose a consistent framework for the analysis of longitudinal data by estimating the continuous evolution of shape over time as twice differentiable flows of deformations. In contrast to 1D regression models, one model is chosen to realistically capture the growth of anatomical structures. From the continuous evolution of shape, we can simply extract any clinical measurements of interest. We demonstrate on real anatomical surfaces that volume extracted from a continuous shape evolution is consistent with a 1D regression performed on the discrete measurements. We further show how the visualization of shape progression can aid in the search for significant measurements. Finally, we present an example on a shape complex of the brain (left hemisphere, right hemisphere, cerebellum) that demonstrates a potential clinical application for our framework.
Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.
Gijsberts, Arjan; Metta, Giorgio
2013-05-01
Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Binol, Hamidullah; Bal, Abdullah; Cukur, Huseyin
2015-10-01
The performance of the kernel based techniques depends on the selection of kernel parameters. That's why; suitable parameter selection is an important problem for many kernel based techniques. This article presents a novel technique to learn the kernel parameters in kernel Fukunaga-Koontz Transform based (KFKT) classifier. The proposed approach determines the appropriate values of kernel parameters through optimizing an objective function constructed based on discrimination ability of KFKT. For this purpose we have utilized differential evolution algorithm (DEA). The new technique overcomes some disadvantages such as high time consumption existing in the traditional cross-validation method, and it can be utilized in any type of data. The experiments for target detection applications on the hyperspectral images verify the effectiveness of the proposed method.
Design of a multiple kernel learning algorithm for LS-SVM by convex programming.
Jian, Ling; Xia, Zhonghang; Liang, Xijun; Gao, Chuanhou
2011-06-01
As a kernel based method, the performance of least squares support vector machine (LS-SVM) depends on the selection of the kernel as well as the regularization parameter (Duan, Keerthi, & Poo, 2003). Cross-validation is efficient in selecting a single kernel and the regularization parameter; however, it suffers from heavy computational cost and is not flexible to deal with multiple kernels. In this paper, we address the issue of multiple kernel learning for LS-SVM by formulating it as semidefinite programming (SDP). Furthermore, we show that the regularization parameter can be optimized in a unified framework with the kernel, which leads to an automatic process for model selection. Extensive experimental validations are performed and analyzed. Copyright © 2011 Elsevier Ltd. All rights reserved.
Novel near-infrared sampling apparatus for single kernel analysis of oil content in maize.
Janni, James; Weinstock, B André; Hagen, Lisa; Wright, Steve
2008-04-01
A method of rapid, nondestructive chemical and physical analysis of individual maize (Zea mays L.) kernels is needed for the development of high value food, feed, and fuel traits. Near-infrared (NIR) spectroscopy offers a robust nondestructive method of trait determination. However, traditional NIR bulk sampling techniques cannot be applied successfully to individual kernels. Obtaining optimized single kernel NIR spectra for applied chemometric predictive analysis requires a novel sampling technique that can account for the heterogeneous forms, morphologies, and opacities exhibited in individual maize kernels. In this study such a novel technique is described and compared to less effective means of single kernel NIR analysis. Results of the application of a partial least squares (PLS) derived model for predictive determination of percent oil content per individual kernel are shown.
Zhou, Qijing; Jiang, Biao; Dong, Fei; Huang, Peiyu; Liu, Hongtao; Zhang, Minming
2014-01-01
To evaluate the improvement of iterative reconstruction in image space (IRIS) technique in computed tomographic (CT) coronary stent imaging with sharp kernel, and to make a trade-off analysis. Fifty-six patients with 105 stents were examined by 128-slice dual-source CT coronary angiography (CTCA). Images were reconstructed using standard filtered back projection (FBP) and IRIS with both medium kernel and sharp kernel applied. Image noise and the stent diameter were investigated. Image noise was measured both in background vessel and in-stent lumen as objective image evaluation. Image noise score and stent score were performed as subjective image evaluation. The CTCA images reconstructed with IRIS were associated with significant noise reduction compared to that of CTCA images reconstructed using FBP technique in both of background vessel and in-stent lumen (the background noise decreased by approximately 25.4% ± 8.2% in medium kernel (P
Zhang, Guoqing; Sun, Huaijiang; Xia, Guiyu; Sun, Quansen
2016-07-07
Sparse representation based classification (SRC) has been developed and shown great potential for real-world application. Based on SRC, Yang et al. [10] devised a SRC steered discriminative projection (SRC-DP) method. However, as a linear algorithm, SRC-DP cannot handle the data with highly nonlinear distribution. Kernel sparse representation-based classifier (KSRC) is a non-linear extension of SRC and can remedy the drawback of SRC. KSRC requires the use of a predetermined kernel function and selection of the kernel function and its parameters is difficult. Recently, multiple kernel learning for SRC (MKL-SRC) [22] has been proposed to learn a kernel from a set of base kernels. However, MKL-SRC only considers the within-class reconstruction residual while ignoring the between-class relationship, when learning the kernel weights. In this paper, we propose a novel multiple kernel sparse representation-based classifier (MKSRC), and then we use it as a criterion to design a multiple kernel sparse representation based orthogonal discriminative projection method (MK-SR-ODP). The proposed algorithm aims at learning a projection matrix and a corresponding kernel from the given base kernels such that in the low dimension subspace the between-class reconstruction residual is maximized and the within-class reconstruction residual is minimized. Furthermore, to achieve a minimum overall loss by performing recognition in the learned low-dimensional subspace, we introduce cost information into the dimensionality reduction method. The solutions for the proposed method can be efficiently found based on trace ratio optimization method [33]. Extensive experimental results demonstrate the superiority of the proposed algorithm when compared with the state-of-the-art methods.
Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.
Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe
2018-02-19
Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) network. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalization forms of Min kernel. Then, we combine Min kernel or its normalization form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improved the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on informations extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.
Mapping QTLs controlling kernel dimensions in a wheat inter-varietal RIL mapping population.
Cheng, Ruiru; Kong, Zhongxin; Zhang, Liwei; Xie, Quan; Jia, Haiyan; Yu, Dong; Huang, Yulong; Ma, Zhengqiang
2017-07-01
Seven kernel dimension QTLs were identified in wheat, and kernel thickness was found to be the most important dimension for grain weight improvement. Kernel morphology and weight of wheat (Triticum aestivum L.) affect both yield and quality; however, the genetic basis of these traits and their interactions has not been fully understood. In this study, to investigate the genetic factors affecting kernel morphology and the association of kernel morphology traits with kernel weight, kernel length (KL), width (KW) and thickness (KT) were evaluated, together with hundred-grain weight (HGW), in a recombinant inbred line population derived from Nanda2419 × Wangshuibai, with data from five trials (two different locations over 3 years). The results showed that HGW was more closely correlated with KT and KW than with KL. A whole genome scan revealed four QTLs for KL, one for KW and two for KT, distributed on five different chromosomes. Of them, QKl.nau-2D for KL, and QKt.nau-4B and QKt.nau-5A for KT were newly identified major QTLs for the respective traits, explaining up to 32.6 and 41.5% of the phenotypic variations, respectively. Increase of KW and KT and reduction of KL/KT and KW/KT ratios always resulted in significant higher grain weight. Lines combining the Nanda 2419 alleles of the 4B and 5A intervals had wider, thicker, rounder kernels and a 14% higher grain weight in the genotype-based analysis. A strong, negative linear relationship of the KW/KT ratio with grain weight was observed. It thus appears that kernel thickness is the most important kernel dimension factor in wheat improvement for higher yield. Mapping and marker identification of the kernel dimension-related QTLs definitely help realize the breeding goals.
Kernel learning at the first level of inference.
Cawley, Gavin C; Talbot, Nicola L C
2014-05-01
Kernel learning methods, whether Bayesian or frequentist, typically involve multiple levels of inference, with the coefficients of the kernel expansion being determined at the first level and the kernel and regularisation parameters carefully tuned at the second level, a process known as model selection. Model selection for kernel machines is commonly performed via optimisation of a suitable model selection criterion, often based on cross-validation or theoretical performance bounds. However, if there are a large number of kernel parameters, as for instance in the case of automatic relevance determination (ARD), there is a substantial risk of over-fitting the model selection criterion, resulting in poor generalisation performance. In this paper we investigate the possibility of learning the kernel, for the Least-Squares Support Vector Machine (LS-SVM) classifier, at the first level of inference, i.e. parameter optimisation. The kernel parameters and the coefficients of the kernel expansion are jointly optimised at the first level of inference, minimising a training criterion with an additional regularisation term acting on the kernel parameters. The key advantage of this approach is that the values of only two regularisation parameters need be determined in model selection, substantially alleviating the problem of over-fitting the model selection criterion. The benefits of this approach are demonstrated using a suite of synthetic and real-world binary classification benchmark problems, where kernel learning at the first level of inference is shown to be statistically superior to the conventional approach, improves on our previous work (Cawley and Talbot, 2007) and is competitive with Multiple Kernel Learning approaches, but with reduced computational expense. Copyright © 2014 Elsevier Ltd. All rights reserved.
Workflow Management Systems for Molecular Dynamics on Leadership Computers
NASA Astrophysics Data System (ADS)
Wells, Jack; Panitkin, Sergey; Oleynik, Danila; Jha, Shantenu
Molecular Dynamics (MD) simulations play an important role in a range of disciplines from Material Science to Biophysical systems and account for a large fraction of cycles consumed on computing resources. Increasingly science problems require the successful execution of ''many'' MD simulations as opposed to a single MD simulation. There is a need to provide scalable and flexible approaches to the execution of the workload. We present preliminary results on the Titan computer at the Oak Ridge Leadership Computing Facility that demonstrate a general capability to manage workload execution agnostic of a specific MD simulation kernel or execution pattern, and in a manner that integrates disparate grid-based and supercomputing resources. Our results build upon our extensive experience of distributed workload management in the high-energy physics ATLAS project using PanDA (Production and Distributed Analysis System), coupled with recent conceptual advances in our understanding of workload management on heterogeneous resources. We will discuss how we will generalize these initial capabilities towards a more production level service on DOE leadership resources. This research is sponsored by US DOE/ASCR and used resources of the OLCF computing facility.
Adaptive kernel function using line transect sampling
NASA Astrophysics Data System (ADS)
Albadareen, Baker; Ismail, Noriszura
2018-04-01
The estimation of f(0) is crucial in the line transect method which is used for estimating population abundance in wildlife survey's. The classical kernel estimator of f(0) has a high negative bias. Our study proposes an adaptation in the kernel function which is shown to be more efficient than the usual kernel estimator. A simulation study is adopted to compare the performance of the proposed estimators with the classical kernel estimators.
Vidyasagar, Mathukumalli
2015-01-01
This article reviews several techniques from machine learning that can be used to study the problem of identifying a small number of features, from among tens of thousands of measured features, that can accurately predict a drug response. Prediction problems are divided into two categories: sparse classification and sparse regression. In classification, the clinical parameter to be predicted is binary, whereas in regression, the parameter is a real number. Well-known methods for both classes of problems are briefly discussed. These include the SVM (support vector machine) for classification and various algorithms such as ridge regression, LASSO (least absolute shrinkage and selection operator), and EN (elastic net) for regression. In addition, several well-established methods that do not directly fall into machine learning theory are also reviewed, including neural networks, PAM (pattern analysis for microarrays), SAM (significance analysis for microarrays), GSEA (gene set enrichment analysis), and k-means clustering. Several references indicative of the application of these methods to cancer biology are discussed.
Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon
2015-01-01
Owing to recent improvement of genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci and this successful finding has substantially improved our understanding of complex diseases. However, in spite of these successes, most of the genetic effects for many complex diseases were found to be very small, which have been a big hurdle to build disease prediction model. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions including least absolute selection and shrinkage operator (LASSO) and ridge regression limit the space of parameters, and this constraint enables the estimation of effects for very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods for at least diseases under consideration.
Statistical downscaling of precipitation using long short-term memory recurrent neural networks
NASA Astrophysics Data System (ADS)
Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra
2017-11-01
Hydrological impacts of global climate change on regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables like precipitation, temperature, etc. In this study, we propose a new statistical downscaling model based on Recurrent Neural Network with Long Short-Term Memory which captures the spatio-temporal dependencies in local rainfall. The previous studies have used several other methods such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising in modeling complex and highly non-linear relationships between input and output variables in different domains and hence we investigated their performance in the task of statistical downscaling. We have tested this model on two datasets—one on precipitation in Mahanadi basin in India and the second on precipitation in Campbell River basin in Canada. Our autoencoder coupled long short-term memory recurrent neural network model performs the best compared to other existing methods on both the datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
Pollen source effects on growth of kernel structures and embryo chemical compounds in maize.
Tanaka, W; Mantese, A I; Maddonni, G A
2009-08-01
Previous studies have reported effects of pollen source on the oil concentration of maize (Zea mays) kernels through modifications to both the embryo/kernel ratio and embryo oil concentration. The present study expands upon previous analyses by addressing pollen source effects on the growth of kernel structures (i.e. pericarp, endosperm and embryo), allocation of embryo chemical constituents (i.e. oil, protein, starch and soluble sugars), and the anatomy and histology of the embryos. Maize kernels with different oil concentration were obtained from pollinations with two parental genotypes of contrasting oil concentration. The dynamics of the growth of kernel structures and allocation of embryo chemical constituents were analysed during the post-flowering period. Mature kernels were dissected to study the anatomy (embryonic axis and scutellum) and histology [cell number and cell size of the scutellums, presence of sub-cellular structures in scutellum tissue (starch granules, oil and protein bodies)] of the embryos. Plants of all crosses exhibited a similar kernel number and kernel weight. Pollen source modified neither the growth period of kernel structures, nor pericarp growth rate. By contrast, pollen source determined a trade-off between embryo and endosperm growth rates, which impacted on the embryo/kernel ratio of mature kernels. Modifications to the embryo size were mediated by scutellum cell number. Pollen source also affected (P < 0.01) allocation of embryo chemical compounds. Negative correlations among embryo oil concentration and those of starch (r = 0.98, P < 0.01) and soluble sugars (r = 0.95, P < 0.05) were found. Coincidently, embryos with low oil concentration had an increased (P < 0.05-0.10) scutellum cell area occupied by starch granules and fewer oil bodies. The effects of pollen source on both embryo/kernel ratio and allocation of embryo chemicals seems to be related to the early established sink strength (i.e. sink size and sink activity) of the embryos.
7 CFR 868.254 - Broken kernels determination.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 7 2010-01-01 2010-01-01 false Broken kernels determination. 868.254 Section 868.254 Agriculture Regulations of the Department of Agriculture (Continued) GRAIN INSPECTION, PACKERS AND STOCKYARD... Governing Application of Standards § 868.254 Broken kernels determination. Broken kernels shall be...
7 CFR 51.2090 - Serious damage.
Code of Federal Regulations, 2010 CFR
2010-01-01
... defect which makes a kernel or piece of kernel unsuitable for human consumption, and includes decay...: Shriveling when the kernel is seriously withered, shrunken, leathery, tough or only partially developed: Provided, that partially developed kernels are not considered seriously damaged if more than one-fourth of...
Anisotropic hydrodynamics with a scalar collisional kernel
NASA Astrophysics Data System (ADS)
Almaalol, Dekrayat; Strickland, Michael
2018-04-01
Prior studies of nonequilibrium dynamics using anisotropic hydrodynamics have used the relativistic Anderson-Witting scattering kernel or some variant thereof. In this paper, we make the first study of the impact of using a more realistic scattering kernel. For this purpose, we consider a conformal system undergoing transversally homogenous and boost-invariant Bjorken expansion and take the collisional kernel to be given by the leading order 2 ↔2 scattering kernel in scalar λ ϕ4 . We consider both classical and quantum statistics to assess the impact of Bose enhancement on the dynamics. We also determine the anisotropic nonequilibrium attractor of a system subject to this collisional kernel. We find that, when the near-equilibrium relaxation-times in the Anderson-Witting and scalar collisional kernels are matched, the scalar kernel results in a higher degree of momentum-space anisotropy during the system's evolution, given the same initial conditions. Additionally, we find that taking into account Bose enhancement further increases the dynamically generated momentum-space anisotropy.
Ideal regularization for learning kernels from labels.
Pan, Binbin; Lai, Jianhuang; Shen, Lixin
2014-08-01
In this paper, we propose a new form of regularization that is able to utilize the label information of a data set for learning kernels. The proposed regularization, referred to as ideal regularization, is a linear function of the kernel matrix to be learned. The ideal regularization allows us to develop efficient algorithms to exploit labels. Three applications of the ideal regularization are considered. Firstly, we use the ideal regularization to incorporate the labels into a standard kernel, making the resulting kernel more appropriate for learning tasks. Next, we employ the ideal regularization to learn a data-dependent kernel matrix from an initial kernel matrix (which contains prior similarity information, geometric structures, and labels of the data). Finally, we incorporate the ideal regularization to some state-of-the-art kernel learning problems. With this regularization, these learning problems can be formulated as simpler ones which permit more efficient solvers. Empirical results show that the ideal regularization exploits the labels effectively and efficiently. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Baker, M. P.; King, J. C.; Gorman, B. P.; Braley, J. C.
2015-03-01
Current methods of TRISO fuel kernel production in the United States use a sol-gel process with trichloroethylene (TCE) as the forming fluid. After contact with radioactive materials, the spent TCE becomes a mixed hazardous waste, and high costs are associated with its recycling or disposal. Reducing or eliminating this mixed waste stream would not only benefit the environment, but would also enhance the economics of kernel production. Previous research yielded three candidates for testing as alternatives to TCE: 1-bromotetradecane, 1-chlorooctadecane, and 1-iodododecane. This study considers the production of yttria-stabilized zirconia (YSZ) kernels in silicone oil and the three chosen alternative formation fluids, with subsequent characterization of the produced kernels and used forming fluid. Kernels formed in silicone oil and bromotetradecane were comparable to those produced by previous kernel production efforts, while those produced in chlorooctadecane and iodododecane experienced gelation issues leading to poor kernel formation and geometry.
NASA Astrophysics Data System (ADS)
Jaravel, Thomas; Labahn, Jeffrey; Ihme, Matthias
2017-11-01
The reliable initiation of flame ignition by high-energy spark kernels is critical for the operability of aviation gas turbines. The evolution of a spark kernel ejected by an igniter into a turbulent stratified environment is investigated using detailed numerical simulations with complex chemistry. At early times post ejection, comparisons of simulation results with high-speed Schlieren data show that the initial trajectory of the kernel is well reproduced, with a significant amount of air entrainment from the surrounding flow that is induced by the kernel ejection. After transiting in a non-flammable mixture, the kernel reaches a second stream of flammable methane-air mixture, where the successful of the kernel ignition was found to depend on the local flow state and operating conditions. By performing parametric studies, the probability of kernel ignition was identified, and compared with experimental observations. The ignition behavior is characterized by analyzing the local chemical structure, and its stochastic variability is also investigated.
The site, size, spatial stability, and energetics of an X-ray flare kernel
NASA Technical Reports Server (NTRS)
Petrasso, R.; Gerassimenko, M.; Nolte, J.
1979-01-01
The site, size evolution, and energetics of an X-ray kernel that dominated a solar flare during its rise and somewhat during its peak are investigated. The position of the kernel remained stationary to within about 3 arc sec over the 30-min interval of observations, despite pulsations in the kernel X-ray brightness in excess of a factor of 10. This suggests a tightly bound, deeply rooted magnetic structure, more plausibly associated with the near chromosphere or low corona rather than with the high corona. The H-alpha flare onset coincided with the appearance of the kernel, again suggesting a close spatial and temporal coupling between the chromospheric H-alpha event and the X-ray kernel. At the first kernel brightness peak its size was no larger than about 2 arc sec, when it accounted for about 40% of the total flare flux. In the second rise phase of the kernel, a source power input of order 2 times 10 to the 24th ergs/sec is minimally required.
The pre-image problem in kernel methods.
Kwok, James Tin-yau; Tsang, Ivor Wai-hung
2004-11-01
In this paper, we address the problem of finding the pre-image of a feature vector in the feature space induced by a kernel. This is of central importance in some kernel applications, such as on using kernel principal component analysis (PCA) for image denoising. Unlike the traditional method which relies on nonlinear optimization, our proposed method directly finds the location of the pre-image based on distance constraints in the feature space. It is noniterative, involves only linear algebra and does not suffer from numerical instability or local minimum problems. Evaluations on performing kernel PCA and kernel clustering on the USPS data set show much improved performance.
Effects of Amygdaline from Apricot Kernel on Transplanted Tumors in Mice.
Yamshanov, V A; Kovan'ko, E G; Pustovalov, Yu I
2016-03-01
The effects of amygdaline from apricot kernel added to fodder on the growth of transplanted LYO-1 and Ehrlich carcinoma were studied in mice. Apricot kernels inhibited the growth of both tumors. Apricot kernels, raw and after thermal processing, given 2 days before transplantation produced a pronounced antitumor effect. Heat-processed apricot kernels given in 3 days after transplantation modified the tumor growth and prolonged animal lifespan. Thermal treatment did not considerably reduce the antitumor effect of apricot kernels. It was hypothesized that the antitumor effect of amygdaline on Ehrlich carcinoma and LYO-1 lymphosarcoma was associated with the presence of bacterial genome in the tumor.
Development of a kernel function for clinical data.
Daemen, Anneleen; De Moor, Bart
2009-01-01
For most diseases and examinations, clinical data such as age, gender and medical history guides clinical management, despite the rise of high-throughput technologies. To fully exploit such clinical information, appropriate modeling of relevant parameters is required. As the widely used linear kernel function has several disadvantages when applied to clinical data, we propose a new kernel function specifically developed for this data. This "clinical kernel function" more accurately represents similarities between patients. Evidently, three data sets were studied and significantly better performances were obtained with a Least Squares Support Vector Machine when based on the clinical kernel function compared to the linear kernel function.
Manycore Performance-Portability: Kokkos Multidimensional Array Library
Edwards, H. Carter; Sunderland, Daniel; Porter, Vicki; ...
2012-01-01
Large, complex scientific and engineering application code have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides library-based approach to implement computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices each with its own memory space, (2) data parallel kernels and (3) multidimensional arrays. Kernel executionmore » performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. Optimal data access pattern can be different for different manycore devices – potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introduce device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].« less
Wang, Shunfang; Nie, Bing; Yue, Kun; Fei, Yu; Li, Wenjia; Xu, Dongshu
2017-12-15
Kernel discriminant analysis (KDA) is a dimension reduction and classification algorithm based on nonlinear kernel trick, which can be novelly used to treat high-dimensional and complex biological data before undergoing classification processes such as protein subcellular localization. Kernel parameters make a great impact on the performance of the KDA model. Specifically, for KDA with the popular Gaussian kernel, to select the scale parameter is still a challenging problem. Thus, this paper introduces the KDA method and proposes a new method for Gaussian kernel parameter selection depending on the fact that the differences between reconstruction errors of edge normal samples and those of interior normal samples should be maximized for certain suitable kernel parameters. Experiments with various standard data sets of protein subcellular localization show that the overall accuracy of protein classification prediction with KDA is much higher than that without KDA. Meanwhile, the kernel parameter of KDA has a great impact on the efficiency, and the proposed method can produce an optimum parameter, which makes the new algorithm not only perform as effectively as the traditional ones, but also reduce the computational time and thus improve efficiency.
NASA Astrophysics Data System (ADS)
Jin, Hyeongmin; Heo, Changyong; Kim, Jong Hyo
2018-02-01
Differing reconstruction kernels are known to strongly affect the variability of imaging biomarkers and thus remain as a barrier in translating the computer aided quantification techniques into clinical practice. This study presents a deep learning application to CT kernel conversion which converts a CT image of sharp kernel to that of standard kernel and evaluates its impact on variability reduction of a pulmonary imaging biomarker, the emphysema index (EI). Forty cases of low-dose chest CT exams obtained with 120kVp, 40mAs, 1mm thickness, of 2 reconstruction kernels (B30f, B50f) were selected from the low dose lung cancer screening database of our institution. A Fully convolutional network was implemented with Keras deep learning library. The model consisted of symmetric layers to capture the context and fine structure characteristics of CT images from the standard and sharp reconstruction kernels. Pairs of the full-resolution CT data set were fed to input and output nodes to train the convolutional network to learn the appropriate filter kernels for converting the CT images of sharp kernel to standard kernel with a criterion of measuring the mean squared error between the input and target images. EIs (RA950 and Perc15) were measured with a software package (ImagePrism Pulmo, Seoul, South Korea) and compared for the data sets of B50f, B30f, and the converted B50f. The effect of kernel conversion was evaluated with the mean and standard deviation of pair-wise differences in EI. The population mean of RA950 was 27.65 +/- 7.28% for B50f data set, 10.82 +/- 6.71% for the B30f data set, and 8.87 +/- 6.20% for the converted B50f data set. The mean of pair-wise absolute differences in RA950 between B30f and B50f is reduced from 16.83% to 1.95% using kernel conversion. Our study demonstrates the feasibility of applying the deep learning technique for CT kernel conversion and reducing the kernel-induced variability of EI quantification. The deep learning model has a potential to improve the reliability of imaging biomarker, especially in evaluating the longitudinal changes of EI even when the patient CT scans were performed with different kernels.
Metabolic network prediction through pairwise rational kernels.
Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian
2014-09-26
Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. This propagates error accumulation when the pathways are predicted by incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pair of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amount of sequence information such as protein essentiality, natural language processing and machine translations. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then developed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster in comparison with other pairwise kernels. Also, when we use PRKs combined with other simple kernels that include evolutionary information, the accuracy values have been improved, while maintaining lower construction and execution times. The power of using kernels is that almost any sort of data can be represented using kernels. Therefore, completely disparate types of data can be combined to add power to kernel-based machine learning methods. When we compared our proposal using PRKs with other similar kernel, the execution times were decreased, with no compromise of accuracy. We also proved that by combining PRKs with other kernels that include evolutionary information, the accuracy can also also be improved. As our proposal can use any type of sequence data, genes do not need to be properly annotated, avoiding accumulation errors because of incorrect previous annotations.
Iceberg Ahead: The Effect of Bands and Ridges During Chaos Formation on Europa.
NASA Astrophysics Data System (ADS)
Hedgepeth, J. E.; Schmidt, B. E.
2016-12-01
Europa presents a dynamic and varied surface, but the most enticing component is arguably its chaos structures. With it, the surface and subsurface can interact, but in order to fully understand if this is occurring we have to properly parameterize the surface structural integrity. We consider the Schmidt et al. (2011) method of classifying icebergs by feature type to study what features remained intact in the chaos matrix. In this work we expand on this idea. We hypothesize that the ice that forms ridges and bands exhibit higher structural strengths than plains. Subsequently, this ice is more likely to remain during chaos formation in the form of icebergs. We begin by mapping the surface around Murias chaos and other prominent chaos features. Maps are used to infer what paleo-topographic features existed before chaos formation by using the features surrounding the chaos regions as blueprints for what existed before. We perform a multivariate regression to correlate the amount of icebergs present to the amount of surface that was covered by either bands, plains, or ridges. We find ridges play the biggest role in the production of icebergs with a weighted value of 40%. Bands may play a smaller role (13%), but plains show little to no correlation (5%). Further mapping will better reveal if this trend holds true in other regions. This statistical analysis supports our hypothesis, and further work will better quantify what is occurring. We will address the energy expended in the chaos regions via movement and rotation of icebergs during the formation event and through ice-melt.
Differential metabolome analysis of field-grown maize kernels in response to drought stress
USDA-ARS?s Scientific Manuscript database
Drought stress constrains maize kernel development and can exacerbate aflatoxin contamination. In order to identify drought responsive metabolites and explore pathways involved in kernel responses, a metabolomics analysis was conducted on kernels from a drought tolerant line, Lo964, and a sensitive ...
Occurrence of 'super soft' wheat kernel texture in hexaploid and tetraploid wheats
USDA-ARS?s Scientific Manuscript database
Wheat kernel texture is a key trait that governs milling performance, flour starch damage, flour particle size, flour hydration properties, and baking quality. Kernel texture is commonly measured using the Perten Single Kernel Characterization System (SKCS). The SKCS returns texture values (Hardness...
7 CFR 868.203 - Basis of determination.
Code of Federal Regulations, 2010 CFR
2010-01-01
... FOR CERTAIN AGRICULTURAL COMMODITIES United States Standards for Rough Rice Principles Governing..., heat-damaged kernels, red rice and damaged kernels, chalky kernels, other types, color, and the special grade Parboiled rough rice shall be on the basis of the whole and large broken kernels of milled rice...
7 CFR 868.203 - Basis of determination.
Code of Federal Regulations, 2011 CFR
2011-01-01
... FOR CERTAIN AGRICULTURAL COMMODITIES United States Standards for Rough Rice Principles Governing..., heat-damaged kernels, red rice and damaged kernels, chalky kernels, other types, color, and the special grade Parboiled rough rice shall be on the basis of the whole and large broken kernels of milled rice...
7 CFR 868.304 - Broken kernels determination.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 7 2011-01-01 2011-01-01 false Broken kernels determination. 868.304 Section 868.304 Agriculture Regulations of the Department of Agriculture (Continued) GRAIN INSPECTION, PACKERS AND STOCKYARD... Application of Standards § 868.304 Broken kernels determination. Broken kernels shall be determined by the use...
7 CFR 868.304 - Broken kernels determination.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 7 2010-01-01 2010-01-01 false Broken kernels determination. 868.304 Section 868.304 Agriculture Regulations of the Department of Agriculture (Continued) GRAIN INSPECTION, PACKERS AND STOCKYARD... Application of Standards § 868.304 Broken kernels determination. Broken kernels shall be determined by the use...
Biasing anisotropic scattering kernels for deep-penetration Monte Carlo calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carter, L.L.; Hendricks, J.S.
1983-01-01
The exponential transform is often used to improve the efficiency of deep-penetration Monte Carlo calculations. This technique is usually implemented by biasing the distance-to-collision kernel of the transport equation, but leaving the scattering kernel unchanged. Dwivedi obtained significant improvements in efficiency by biasing an isotropic scattering kernel as well as the distance-to-collision kernel. This idea is extended to anisotropic scattering, particularly the highly forward Klein-Nishina scattering of gamma rays.
Performance Characteristics of a Kernel-Space Packet Capture Module
2010-03-01
Defense, or the United States Government . AFIT/GCO/ENG/10-03 PERFORMANCE CHARACTERISTICS OF A KERNEL-SPACE PACKET CAPTURE MODULE THESIS Presented to the...3.1.2.3 Prototype. The proof of concept for this research is the design, development, and comparative performance analysis of a kernel level N2d capture...changes to kernel code 5. Can be used for both user-space and kernel-space capture applications in order to control comparative performance analysis to
Makanza, R; Zaman-Allah, M; Cairns, J E; Eyre, J; Burgueño, J; Pacheco, Ángela; Diepenbrock, C; Magorokosho, C; Tarekegne, A; Olsen, M; Prasanna, B M
2018-01-01
Grain yield, ear and kernel attributes can assist to understand the performance of maize plant under different environmental conditions and can be used in the variety development process to address farmer's preferences. These parameters are however still laborious and expensive to measure. A low-cost ear digital imaging method was developed that provides estimates of ear and kernel attributes i.e., ear number and size, kernel number and size as well as kernel weight from photos of ears harvested from field trial plots. The image processing method uses a script that runs in a batch mode on ImageJ; an open source software. Kernel weight was estimated using the total kernel number derived from the number of kernels visible on the image and the average kernel size. Data showed a good agreement in terms of accuracy and precision between ground truth measurements and data generated through image processing. Broad-sense heritability of the estimated parameters was in the range or higher than that for measured grain weight. Limitation of the method for kernel weight estimation is discussed. The method developed in this work provides an opportunity to significantly reduce the cost of selection in the breeding process, especially for resource constrained crop improvement programs and can be used to learn more about the genetic bases of grain yield determinants.
A Kernel-based Lagrangian method for imperfectly-mixed chemical reactions
NASA Astrophysics Data System (ADS)
Schmidt, Michael J.; Pankavich, Stephen; Benson, David A.
2017-05-01
Current Lagrangian (particle-tracking) algorithms used to simulate diffusion-reaction equations must employ a certain number of particles to properly emulate the system dynamics-particularly for imperfectly-mixed systems. The number of particles is tied to the statistics of the initial concentration fields of the system at hand. Systems with shorter-range correlation and/or smaller concentration variance require more particles, potentially limiting the computational feasibility of the method. For the well-known problem of bimolecular reaction, we show that using kernel-based, rather than Dirac delta, particles can significantly reduce the required number of particles. We derive the fixed width of a Gaussian kernel for a given reduced number of particles that analytically eliminates the error between kernel and Dirac solutions at any specified time. We also show how to solve for the fixed kernel size by minimizing the squared differences between solutions over any given time interval. Numerical results show that the width of the kernel should be kept below about 12% of the domain size, and that the analytic equations used to derive kernel width suffer significantly from the neglect of higher-order moments. The simulations with a kernel width given by least squares minimization perform better than those made to match at one specific time. A heuristic time-variable kernel size, based on the previous results, performs on par with the least squares fixed kernel size.
Optimized Kernel Entropy Components.
Izquierdo-Verdiguier, Emma; Laparra, Valero; Jenssen, Robert; Gomez-Chova, Luis; Camps-Valls, Gustau
2017-06-01
This brief addresses two main issues of the standard kernel entropy component analysis (KECA) algorithm: the optimization of the kernel decomposition and the optimization of the Gaussian kernel parameter. KECA roughly reduces to a sorting of the importance of kernel eigenvectors by entropy instead of variance, as in the kernel principal components analysis. In this brief, we propose an extension of the KECA method, named optimized KECA (OKECA), that directly extracts the optimal features retaining most of the data entropy by means of compacting the information in very few features (often in just one or two). The proposed method produces features which have higher expressive power. In particular, it is based on the independent component analysis framework, and introduces an extra rotation to the eigen decomposition, which is optimized via gradient-ascent search. This maximum entropy preservation suggests that OKECA features are more efficient than KECA features for density estimation. In addition, a critical issue in both the methods is the selection of the kernel parameter, since it critically affects the resulting performance. Here, we analyze the most common kernel length-scale selection criteria. The results of both the methods are illustrated in different synthetic and real problems. Results show that OKECA returns projections with more expressive power than KECA, the most successful rule for estimating the kernel parameter is based on maximum likelihood, and OKECA is more robust to the selection of the length-scale parameter in kernel density estimation.
Brain tumor image segmentation using kernel dictionary learning.
Jeon Lee; Seung-Jun Kim; Rong Chen; Herskovits, Edward H
2015-08-01
Automated brain tumor image segmentation with high accuracy and reproducibility holds a big potential to enhance the current clinical practice. Dictionary learning (DL) techniques have been applied successfully to various image processing tasks recently. In this work, kernel extensions of the DL approach are adopted. Both reconstructive and discriminative versions of the kernel DL technique are considered, which can efficiently incorporate multi-modal nonlinear feature mappings based on the kernel trick. Our novel discriminative kernel DL formulation allows joint learning of a task-driven kernel-based dictionary and a linear classifier using a K-SVD-type algorithm. The proposed approaches were tested using real brain magnetic resonance (MR) images of patients with high-grade glioma. The obtained preliminary performances are competitive with the state of the art. The discriminative kernel DL approach is seen to reduce computational burden without much sacrifice in performance.
SEMI-SUPERVISED OBJECT RECOGNITION USING STRUCTURE KERNEL
Wang, Botao; Xiong, Hongkai; Jiang, Xiaoqian; Ling, Fan
2013-01-01
Object recognition is a fundamental problem in computer vision. Part-based models offer a sparse, flexible representation of objects, but suffer from difficulties in training and often use standard kernels. In this paper, we propose a positive definite kernel called “structure kernel”, which measures the similarity of two part-based represented objects. The structure kernel has three terms: 1) the global term that measures the global visual similarity of two objects; 2) the part term that measures the visual similarity of corresponding parts; 3) the spatial term that measures the spatial similarity of geometric configuration of parts. The contribution of this paper is to generalize the discriminant capability of local kernels to complex part-based object models. Experimental results show that the proposed kernel exhibit higher accuracy than state-of-art approaches using standard kernels. PMID:23666108
Chapin, Jay W; Thomas, James S
2003-08-01
Pitfall traps placed in South Carolina peanut, Arachis hypogaea (L.), fields collected three species of burrower bugs (Cydnidae): Cyrtomenus ciliatus (Palisot de Beauvois), Sehirus cinctus cinctus (Palisot de Beauvois), and Pangaeus bilineatus (Say). Cyrtomenus ciliatus was rarely collected. Sehirus cinctus produced a nymphal cohort in peanut during May and June, probably because of abundant henbit seeds, Lamium amplexicaule L., in strip-till production systems. No S. cinctus were present during peanut pod formation. Pangaeus bilineatus was the most abundant species collected and the only species associated with peanut kernel feeding injury. Overwintering P. bilineatus adults were present in a conservation tillage peanut field before planting and two to three subsequent generations were observed. Few nymphs were collected until the R6 (full seed) growth stage. Tillage and choice of cover crop affected P. bilineatus populations. Peanuts strip-tilled into corn or wheat residue had greater P. bilineatus populations and kernel-feeding than conventional tillage or strip-tillage into rye residue. Fall tillage before planting a wheat cover crop also reduced burrower bug feeding on peanut. At-pegging (early July) granular chlorpyrifos treatments were most consistent in suppressing kernel feeding. Kernels fed on by P. bilineatus were on average 10% lighter than unfed on kernels. Pangaeus bilineatus feeding reduced peanut grade by reducing individual kernel weight, and increasing the percentage damaged kernels. Each 10% increase in kernels fed on by P. bilineatus was associated with a 1.7% decrease in total sound mature kernels, and kernel feeding levels above 30% increase the risk of damaged kernel grade penalties.
Toews, Michael D; Pearson, Tom C; Campbell, James F
2006-04-01
Computed tomography, an imaging technique commonly used for diagnosing internal human health ailments, uses multiple x-rays and sophisticated software to recreate a cross-sectional representation of a subject. The use of this technique to image hard red winter wheat, Triticum aestivm L., samples infested with pupae of Sitophilus oryzae (L.) was investigated. A software program was developed to rapidly recognize and quantify the infested kernels. Samples were imaged in a 7.6-cm (o.d.) plastic tube containing 0, 50, or 100 infested kernels per kg of wheat. Interkernel spaces were filled with corn oil so as to increase the contrast between voids inside kernels and voids among kernels. Automated image processing, using a custom C language software program, was conducted separately on each 100 g portion of the prepared samples. The average detection accuracy in the five infested kernels per 100-g samples was 94.4 +/- 7.3% (mean +/- SD, n = 10), whereas the average detection accuracy in the 10 infested kernels per 100-g sample was 87.3 +/- 7.9% (n = 10). Detection accuracy in the 10 infested kernels per 100-g samples was slightly less than the five infested kernels per 100-g samples because of some infested kernels overlapping with each other or air bubbles in the oil. A mean of 1.2 +/- 0.9 (n = 10) bubbles (per tube) was incorrectly classed as infested kernels in replicates containing no infested kernels. In light of these positive results, future studies should be conducted using additional grains, insect species, and life stages.
Relationship of source and sink in determining kernel composition of maize
Seebauer, Juliann R.; Singletary, George W.; Krumpelman, Paulette M.; Ruffo, Matías L.; Below, Frederick E.
2010-01-01
The relative role of the maternal source and the filial sink in controlling the composition of maize (Zea mays L.) kernels is unclear and may be influenced by the genotype and the N supply. The objective of this study was to determine the influence of assimilate supply from the vegetative source and utilization of assimilates by the grain sink on the final composition of maize kernels. Intermated B73×Mo17 recombinant inbred lines (IBM RILs) which displayed contrasting concentrations of endosperm starch were grown in the field with deficient or sufficient N, and the source supply altered by ear truncation (45% reduction) at 15 d after pollination (DAP). The assimilate supply into the kernels was determined at 19 DAP using the agar trap technique, and the final kernel composition was measured. The influence of N supply and kernel ear position on final kernel composition was also determined for a commercial hybrid. Concentrations of kernel protein and starch could be altered by genotype or the N supply, but remained fairly constant along the length of the ear. Ear truncation also produced a range of variation in endosperm starch and protein concentrations. The C/N ratio of the assimilate supply at 19 DAP was directly related to the final kernel composition, with an inverse relationship between the concentrations of starch and protein in the mature endosperm. The accumulation of kernel starch and protein in maize is uniform along the ear, yet adaptable within genotypic limits, suggesting that kernel composition is source limited in maize. PMID:19917600
Zhang, Zhanhui; Wu, Xiangyuan; Shi, Chaonan; Wang, Rongna; Li, Shengfei; Wang, Zhaohui; Liu, Zonghua; Xue, Yadong; Tang, Guiliang; Tang, Jihua
2016-02-01
Kernel development is an important dynamic trait that determines the final grain yield in maize. To dissect the genetic basis of maize kernel development process, a conditional quantitative trait locus (QTL) analysis was conducted using an immortalized F2 (IF2) population comprising 243 single crosses at two locations over 2 years. Volume (KV) and density (KD) of dried developing kernels, together with kernel weight (KW) at different developmental stages, were used to describe dynamic changes during kernel development. Phenotypic analysis revealed that final KW and KD were determined at DAP22 and KV at DAP29. Unconditional QTL mapping for KW, KV and KD uncovered 97 QTLs at different kernel development stages, of which qKW6b, qKW7a, qKW7b, qKW10b, qKW10c, qKV10a, qKV10b and qKV7 were identified under multiple kernel developmental stages and environments. Among the 26 QTLs detected by conditional QTL mapping, conqKW7a, conqKV7a, conqKV10a, conqKD2, conqKD7 and conqKD8a were conserved between the two mapping methodologies. Furthermore, most of these QTLs were consistent with QTLs and genes for kernel development/grain filling reported in previous studies. These QTLs probably contain major genes associated with the kernel development process, and can be used to improve grain yield and quality through marker-assisted selection.
Image quality of mixed convolution kernel in thoracic computed tomography.
Neubauer, Jakob; Spira, Eva Maria; Strube, Juliane; Langer, Mathias; Voss, Christian; Kotter, Elmar
2016-11-01
The mixed convolution kernel alters his properties geographically according to the depicted organ structure, especially for the lung. Therefore, we compared the image quality of the mixed convolution kernel to standard soft and hard kernel reconstructions for different organ structures in thoracic computed tomography (CT) images.Our Ethics Committee approved this prospective study. In total, 31 patients who underwent contrast-enhanced thoracic CT studies were included after informed consent. Axial reconstructions were performed with hard, soft, and mixed convolution kernel. Three independent and blinded observers rated the image quality according to the European Guidelines for Quality Criteria of Thoracic CT for 13 organ structures. The observers rated the depiction of the structures in all reconstructions on a 5-point Likert scale. Statistical analysis was performed with the Friedman Test and post hoc analysis with the Wilcoxon rank-sum test.Compared to the soft convolution kernel, the mixed convolution kernel was rated with a higher image quality for lung parenchyma, segmental bronchi, and the border between the pleura and the thoracic wall (P < 0.03). Compared to the hard convolution kernel, the mixed convolution kernel was rated with a higher image quality for aorta, anterior mediastinal structures, paratracheal soft tissue, hilar lymph nodes, esophagus, pleuromediastinal border, large and medium sized pulmonary vessels and abdomen (P < 0.004) but a lower image quality for trachea, segmental bronchi, lung parenchyma, and skeleton (P < 0.001).The mixed convolution kernel cannot fully substitute the standard CT reconstructions. Hard and soft convolution kernel reconstructions still seem to be mandatory for thoracic CT.
Nonparametric methods for doubly robust estimation of continuous treatment effects.
Kennedy, Edward H; Ma, Zongming; McHugh, Matthew D; Small, Dylan S
2017-09-01
Continuous treatments (e.g., doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve, and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.
21 CFR 176.350 - Tamarind seed kernel powder.
Code of Federal Regulations, 2014 CFR
2014-04-01
... 21 Food and Drugs 3 2014-04-01 2014-04-01 false Tamarind seed kernel powder. 176.350 Section 176... Paperboard § 176.350 Tamarind seed kernel powder. Tamarind seed kernel powder may be safely used as a component of articles intended for use in producing, manufacturing, packing, processing, preparing, treating...
Local Observed-Score Kernel Equating
ERIC Educational Resources Information Center
Wiberg, Marie; van der Linden, Wim J.; von Davier, Alina A.
2014-01-01
Three local observed-score kernel equating methods that integrate methods from the local equating and kernel equating frameworks are proposed. The new methods were compared with their earlier counterparts with respect to such measures as bias--as defined by Lord's criterion of equity--and percent relative error. The local kernel item response…
Code of Federal Regulations, 2010 CFR
2010-01-01
... which have been broken to the extent that the kernel within is plainly visible without minute... discoloration beneath, but the peanut shall be judged as it appears with the talc. (c) Kernels which are rancid or decayed. (d) Moldy kernels. (e) Kernels showing sprouts extending more than one-eighth inch from...
7 CFR 981.61 - Redetermination of kernel weight.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 8 2010-01-01 2010-01-01 false Redetermination of kernel weight. 981.61 Section 981... GROWN IN CALIFORNIA Order Regulating Handling Volume Regulation § 981.61 Redetermination of kernel weight. The Board, on the basis of reports by handlers, shall redetermine the kernel weight of almonds...
7 CFR 981.60 - Determination of kernel weight.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 8 2010-01-01 2010-01-01 false Determination of kernel weight. 981.60 Section 981.60... Regulating Handling Volume Regulation § 981.60 Determination of kernel weight. (a) Almonds for which settlement is made on kernel weight. All lots of almonds, whether shelled or unshelled, for which settlement...
Genome-wide Association Analysis of Kernel Weight in Hard Winter Wheat
USDA-ARS?s Scientific Manuscript database
Wheat kernel weight is an important and heritable component of wheat grain yield and a key predictor of flour extraction. Genome-wide association analysis was conducted to identify genomic regions associated with kernel weight and kernel weight environmental response in 8 trials of 299 hard winter ...
7 CFR 999.400 - Regulation governing the importation of filberts.
Code of Federal Regulations, 2010 CFR
2010-01-01
...) Definitions. (1) Filberts means filberts or hazelnuts. (2) Inshell filberts means filberts, the kernels or edible portions of which are contained in the shell. (3) Shelled filberts means the kernels of filberts... Filbert kernels or portions of filbert kernels shall meet the following requirements: (1) Well dried and...
Code of Federal Regulations, 2010 CFR
2010-01-01
.... (2) For kernel defects, by count. (i) 12 percent for pecans with kernels which fail to meet the... kernels which are seriously damaged: Provided, That not more than six-sevenths of this amount, or 6 percent, shall be allowed for kernels which are rancid, moldy, decayed or injured by insects: And provided...
Enhanced gluten properties in soft kernel durum wheat
USDA-ARS?s Scientific Manuscript database
Soft kernel durum wheat is a relatively recent development (Morris et al. 2011 Crop Sci. 51:114). The soft kernel trait exerts profound effects on kernel texture, flour milling including break flour yield, milling energy, and starch damage, and dough water absorption (DWA). With the caveat of reduce...
End-use quality of soft kernel durum wheat
USDA-ARS?s Scientific Manuscript database
Kernel texture is a major determinant of end-use quality of wheat. Durum wheat has very hard kernels. We developed soft kernel durum wheat via Ph1b-mediated homoeologous recombination. The Hardness locus was transferred from Chinese Spring to Svevo durum wheat via back-crossing. ‘Soft Svevo’ had SKC...
Code of Federal Regulations, 2014 CFR
2014-01-01
... are excessively thin kernels and can have black, brown or gray surface with a dark interior color and the immaturity has adversely affected the flavor of the kernel. (2) Kernel spotting refers to dark brown or dark gray spots aggregating more than one-eighth of the surface of the kernel. (g) Serious...
Code of Federal Regulations, 2013 CFR
2013-01-01
... are excessively thin kernels and can have black, brown or gray surface with a dark interior color and the immaturity has adversely affected the flavor of the kernel. (2) Kernel spotting refers to dark brown or dark gray spots aggregating more than one-eighth of the surface of the kernel. (g) Serious...
7 CFR 51.1416 - Optional determinations.
Code of Federal Regulations, 2010 CFR
2010-01-01
... throughout the lot. (a) Edible kernel content. A minimum sample of at least 500 grams of in-shell pecans shall be used for determination of edible kernel content. After the sample is weighed and shelled... determine edible kernel content for the lot. (b) Poorly developed kernel content. A minimum sample of at...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bihani, Abhishek; Daigle, Hugh; Cook, Ann
Coexistence of three methane phases (liquid (L), gas (G), hydrate (H)) in marine gas hydrate systems may occur according to in-situ pressure, temperature, salinity and pore size. In sediments with salinity close to seawater, a discrete zone of three-phase (3P) equilibrium may occur near the base of the regional hydrate stability zone (RHSZ) due to capillary effects. The existence of a 3P zone influences the location of the bottom-simulating reflection (BSR) and has implications for methane fluxes at the base of the RHSZ. We studied hydrate stability conditions in two wells, WR313-G and WR313-H, at Walker Ridge Block 313 inmore » the northern Gulf of Mexico. We determined pore size distributions (PSD) by constructing a synthetic nuclear magnetic resonance (NMR) relaxation time distribution. Correlations were obtained by non-linear regression on NMR, gamma ray, and bulk density logs from well KC-151 at Keathley Canyon. The correlations enabled construction of relaxation time distributions for WR313-G and WR313-H, which were used to predict PSD through comparison with mercury injection capillary pressure measurements. With the computed PSD, L+H and L+G methane solubility was determined from in-situ pressure and temperature. The intersection of the L+G and L+H curves for various pore sizes allowed calculation of the depth range of the 3P equilibrium zone. As in previous studies at Blake Ridge and Hydrate Ridge, the top of the 3P zone moves upwards with increasing water depth and overlies the bulk 3P equilibrium depth. In clays at Walker Ridge, the predicted thickness of the 3P zone is approximately 35 m, but in coarse sands it is only a few meters due to the difference in absolute pore sizes and the width of the PSD. The thick 3P zone in the clays may explain in part why the BSR is only observed in the sand layers at Walker Ridge, although other factors may influence the presence or absence of a BSR.« less
NASA Technical Reports Server (NTRS)
Lickly, Ben
2005-01-01
Data from all current JPL missions are stored in files called SPICE kernels. At present, animators who want to use data from these kernels have to either read through the kernels looking for the desired data, or write programs themselves to retrieve information about all the needed objects for their animations. In this project, methods of automating the process of importing the data from the SPICE kernels were researched. In particular, tools were developed for creating basic scenes in Maya, a 3D computer graphics software package, from SPICE kernels.
Generalization Performance of Regularized Ranking With Multiscale Kernels.
Zhou, Yicong; Chen, Hong; Lan, Rushi; Pan, Zhibin
2016-05-01
The regularized kernel method for the ranking problem has attracted increasing attentions in machine learning. The previous regularized ranking algorithms are usually based on reproducing kernel Hilbert spaces with a single kernel. In this paper, we go beyond this framework by investigating the generalization performance of the regularized ranking with multiscale kernels. A novel ranking algorithm with multiscale kernels is proposed and its representer theorem is proved. We establish the upper bound of the generalization error in terms of the complexity of hypothesis spaces. It shows that the multiscale ranking algorithm can achieve satisfactory learning rates under mild conditions. Experiments demonstrate the effectiveness of the proposed method for drug discovery and recommendation tasks.
NASA Astrophysics Data System (ADS)
Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui
2015-12-01
This paper realizes the automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy. First, the candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by an adaptive threshold segmentation based on the adaptive scale Gaussian filter. The scale of the Gaussian filter is determined according to the color model of the bacillus objects. Then the candidate objects are extracted integrally after region merging and contaminations elimination. Second, the shape features of the bacillus objects are characterized by the Hu moments, compactness, eccentricity, and roughness, which are used to classify the single, touching and non-bacillus objects. We evaluated the logistic regression, random forest, and intersection kernel support vector machines classifiers in classifying the bacillus objects respectively. Experimental results demonstrate that the proposed method yields to high robustness and accuracy. The logistic regression classifier performs best with an accuracy of 91.68%.
Regression analysis of sparse asynchronous longitudinal data.
Cao, Hongyuan; Zeng, Donglin; Fine, Jason P
2015-09-01
We consider estimation of regression models for sparse asynchronous longitudinal observations, where time-dependent responses and covariates are observed intermittently within subjects. Unlike with synchronous data, where the response and covariates are observed at the same time point, with asynchronous data, the observation times are mismatched. Simple kernel-weighted estimating equations are proposed for generalized linear models with either time invariant or time-dependent coefficients under smoothness assumptions for the covariate processes which are similar to those for synchronous data. For models with either time invariant or time-dependent coefficients, the estimators are consistent and asymptotically normal but converge at slower rates than those achieved with synchronous data. Simulation studies evidence that the methods perform well with realistic sample sizes and may be superior to a naive application of methods for synchronous data based on an ad hoc last value carried forward approach. The practical utility of the methods is illustrated on data from a study on human immunodeficiency virus.
A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.
Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung
2016-03-01
With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the multicollinearity between explanatory variables were also discussed. By including a specific congestion indicator, the model performance significantly improved. When comparing models with and without ridge regression, the magnitude of the coefficients was altered in the existence of multicollinearity. These conclusions suggest that the use of appropriate congestion measure and consideration of multicolilnearity among the variables would improve the models and our understanding about the effects of congestion on traffic safety. Copyright © 2015 Elsevier Ltd. All rights reserved.
Graph wavelet alignment kernels for drug virtual screening.
Smalter, Aaron; Huan, Jun; Lushington, Gerald
2009-06-01
In this paper, we introduce a novel statistical modeling technique for target property prediction, with applications to virtual screening and drug design. In our method, we use graphs to model chemical structures and apply a wavelet analysis of graphs to summarize features capturing graph local topology. We design a novel graph kernel function to utilize the topology features to build predictive models for chemicals via Support Vector Machine classifier. We call the new graph kernel a graph wavelet-alignment kernel. We have evaluated the efficacy of the wavelet-alignment kernel using a set of chemical structure-activity prediction benchmarks. Our results indicate that the use of the kernel function yields performance profiles comparable to, and sometimes exceeding that of the existing state-of-the-art chemical classification approaches. In addition, our results also show that the use of wavelet functions significantly decreases the computational costs for graph kernel computation with more than ten fold speedup.
Anato, F M; Sinzogan, A A C; Offenberg, J; Adandonon, A; Wargui, R B; Deguenon, J M; Ayelo, P M; Vayssières, J-F; Kossou, D K
2017-06-01
Weaver ants, Oecophylla spp., are known to positively affect cashew, Anacardium occidentale L., raw nut yield, but their effects on the kernels have not been reported. We compared nut size and the proportion of marketable kernels between raw nuts collected from trees with and without ants. Raw nuts collected from trees with weaver ants were 2.9% larger than nuts from control trees (i.e., without weaver ants), leading to 14% higher proportion of marketable kernels. On trees with ants, the kernel: raw nut ratio from nuts damaged by formic acid was 4.8% lower compared with nondamaged nuts from the same trees. Weaver ants provided three benefits to cashew production by increasing yields, yielding larger nuts, and by producing greater proportions of marketable kernel mass. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Kernel-aligned multi-view canonical correlation analysis for image recognition
NASA Astrophysics Data System (ADS)
Su, Shuzhi; Ge, Hongwei; Yuan, Yun-Hao
2016-09-01
Existing kernel-based correlation analysis methods mainly adopt a single kernel in each view. However, only a single kernel is usually insufficient to characterize nonlinear distribution information of a view. To solve the problem, we transform each original feature vector into a 2-dimensional feature matrix by means of kernel alignment, and then propose a novel kernel-aligned multi-view canonical correlation analysis (KAMCCA) method on the basis of the feature matrices. Our proposed method can simultaneously employ multiple kernels to better capture the nonlinear distribution information of each view, so that correlation features learned by KAMCCA can have well discriminating power in real-world image recognition. Extensive experiments are designed on five real-world image datasets, including NIR face images, thermal face images, visible face images, handwritten digit images, and object images. Promising experimental results on the datasets have manifested the effectiveness of our proposed method.
Small convolution kernels for high-fidelity image restoration
NASA Technical Reports Server (NTRS)
Reichenbach, Stephen E.; Park, Stephen K.
1991-01-01
An algorithm is developed for computing the mean-square-optimal values for small, image-restoration kernels. The algorithm is based on a comprehensive, end-to-end imaging system model that accounts for the important components of the imaging process: the statistics of the scene, the point-spread function of the image-gathering device, sampling effects, noise, and display reconstruction. Subject to constraints on the spatial support of the kernel, the algorithm generates the kernel values that restore the image with maximum fidelity, that is, the kernel minimizes the expected mean-square restoration error. The algorithm is consistent with the derivation of the spatially unconstrained Wiener filter, but leads to a small, spatially constrained kernel that, unlike the unconstrained filter, can be efficiently implemented by convolution. Simulation experiments demonstrate that for a wide range of imaging systems these small kernels can restore images with fidelity comparable to images restored with the unconstrained Wiener filter.
Kernels, Degrees of Freedom, and Power Properties of Quadratic Distance Goodness-of-Fit Tests
Lindsay, Bruce G.; Markatou, Marianthi; Ray, Surajit
2014-01-01
In this article, we study the power properties of quadratic-distance-based goodness-of-fit tests. First, we introduce the concept of a root kernel and discuss the considerations that enter the selection of this kernel. We derive an easy to use normal approximation to the power of quadratic distance goodness-of-fit tests and base the construction of a noncentrality index, an analogue of the traditional noncentrality parameter, on it. This leads to a method akin to the Neyman-Pearson lemma for constructing optimal kernels for specific alternatives. We then introduce a midpower analysis as a device for choosing optimal degrees of freedom for a family of alternatives of interest. Finally, we introduce a new diffusion kernel, called the Pearson-normal kernel, and study the extent to which the normal approximation to the power of tests based on this kernel is valid. Supplementary materials for this article are available online. PMID:24764609
NASA Technical Reports Server (NTRS)
Kahler, S. W.; Petrasso, R. D.; Kane, S. R.
1976-01-01
The physical parameters for the kernels of three solar X-ray flare events have been deduced using photographic data from the S-054 X-ray telescope on Skylab as the primary data source and 1-8 and 8-20 A fluxes from Solrad 9 as the secondary data source. The kernels had diameters of about 5-7 seconds of arc and in two cases electron densities at least as high as 0.3 trillion per cu cm. The lifetimes of the kernels were 5-10 min. The presence of thermal conduction during the decay phases is used to argue: (1) that kernels are entire, not small portions of, coronal loop structures, and (2) that flare heating must continue during the decay phase. We suggest a simple geometric model to explain the role of kernels in flares in which kernels are identified with emerging flux regions.
21 CFR 176.350 - Tamarind seed kernel powder.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 21 Food and Drugs 3 2011-04-01 2011-04-01 false Tamarind seed kernel powder. 176.350 Section 176... Substances for Use Only as Components of Paper and Paperboard § 176.350 Tamarind seed kernel powder. Tamarind seed kernel powder may be safely used as a component of articles intended for use in producing...
21 CFR 176.350 - Tamarind seed kernel powder.
Code of Federal Regulations, 2012 CFR
2012-04-01
... 21 Food and Drugs 3 2012-04-01 2012-04-01 false Tamarind seed kernel powder. 176.350 Section 176... Substances for Use Only as Components of Paper and Paperboard § 176.350 Tamarind seed kernel powder. Tamarind seed kernel powder may be safely used as a component of articles intended for use in producing...
21 CFR 176.350 - Tamarind seed kernel powder.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 21 Food and Drugs 3 2010-04-01 2009-04-01 true Tamarind seed kernel powder. 176.350 Section 176... Substances for Use Only as Components of Paper and Paperboard § 176.350 Tamarind seed kernel powder. Tamarind seed kernel powder may be safely used as a component of articles intended for use in producing...
21 CFR 176.350 - Tamarind seed kernel powder.
Code of Federal Regulations, 2013 CFR
2013-04-01
... 21 Food and Drugs 3 2013-04-01 2013-04-01 false Tamarind seed kernel powder. 176.350 Section 176... Substances for Use Only as Components of Paper and Paperboard § 176.350 Tamarind seed kernel powder. Tamarind seed kernel powder may be safely used as a component of articles intended for use in producing...
7 CFR 51.1403 - Kernel color classification.
Code of Federal Regulations, 2013 CFR
2013-01-01
... generally conforms to the “light” or “light amber” classification, that color classification may be used to... 7 Agriculture 2 2013-01-01 2013-01-01 false Kernel color classification. 51.1403 Section 51.1403... Color Classification § 51.1403 Kernel color classification. (a) The skin color of pecan kernels may be...
7 CFR 51.1403 - Kernel color classification.
Code of Federal Regulations, 2014 CFR
2014-01-01
... generally conforms to the “light” or “light amber” classification, that color classification may be used to... 7 Agriculture 2 2014-01-01 2014-01-01 false Kernel color classification. 51.1403 Section 51.1403... Color Classification § 51.1403 Kernel color classification. (a) The skin color of pecan kernels may be...
Nutrition quality of extraction mannan residue from palm kernel cake on brolier chicken
NASA Astrophysics Data System (ADS)
Tafsin, M.; Hanafi, N. D.; Kejora, E.; Yusraini, E.
2018-02-01
This study aims to find out the nutrient residue of palm kernel cake from mannan extraction on broiler chicken by evaluating physical quality (specific gravity, bulk density and compacted bulk density), chemical quality (proximate analysis and Van Soest Test) and biological test (metabolizable energy). Treatment composed of T0 : palm kernel cake extracted aquadest (control), T1 : palm kernel cake extracted acetic acid (CH3COOH) 1%, T2 : palm kernel cake extracted aquadest + mannanase enzyme 100 u/l and T3 : palm kernel cake extracted acetic acid (CH3COOH) 1% + enzyme mannanase 100 u/l. The results showed that mannan extraction had significant effect (P<0.05) in improving the quality of physical and numerically increase the value of crude protein and decrease the value of NDF (Neutral Detergent Fiber). Treatments had highly significant influence (P<0.01) on the metabolizable energy value of palm kernel cake residue in broiler chickens. It can be concluded that extraction with aquadest + enzyme mannanase 100 u/l yields the best nutrient quality of palm kernel cake residue for broiler chicken.
Oil point and mechanical behaviour of oil palm kernels in linear compression
NASA Astrophysics Data System (ADS)
Kabutey, Abraham; Herak, David; Choteborsky, Rostislav; Mizera, Čestmír; Sigalingging, Riswanti; Akangbe, Olaosebikan Layi
2017-07-01
The study described the oil point and mechanical properties of roasted and unroasted bulk oil palm kernels under compression loading. The literature information available is very limited. A universal compression testing machine and vessel diameter of 60 mm with a plunger were used by applying maximum force of 100 kN and speed ranging from 5 to 25 mm min-1. The initial pressing height of the bulk kernels was measured at 40 mm. The oil point was determined by a litmus test for each deformation level of 5, 10, 15, 20, and 25 mm at a minimum speed of 5 mmmin-1. The measured parameters were the deformation, deformation energy, oil yield, oil point strain and oil point pressure. Clearly, the roasted bulk kernels required less deformation energy compared to the unroasted kernels for recovering the kernel oil. However, both kernels were not permanently deformed. The average oil point strain was determined at 0.57. The study is an essential contribution to pursuing innovative methods for processing palm kernel oil in rural areas of developing countries.
Jia, Xiaodong; Luo, Huiting; Xu, Mengyang; Zhai, Min; Guo, Zhongren; Qiao, Yushan; Wang, Liangju
2018-02-16
Pecan ( Carya illinoinensis ) kernels have a high phenolics content and a high antioxidant capacity compared to other nuts-traits that have attracted great interest of late. Changes in the total phenolic content (TPC), condensed tannins (CT), total flavonoid content (TFC), five individual phenolics, and antioxidant capacity of five pecan cultivars were investigated during the process of kernel ripening. Ultra-performance liquid chromatography coupled with quadruple time-of-flight mass (UPLC-Q/TOF-MS) was also used to analyze the phenolics profiles in mixed pecan kernels. TPC, CT, TFC, individual phenolics, and antioxidant capacity were changed in similar patterns, with values highest at the water or milk stages, lowest at milk or dough stages, and slightly varied at kernel stages. Forty phenolics were tentatively identified in pecan kernels, of which two were first reported in the genus Carya , six were first reported in Carya illinoinensis , and one was first reported in its kernel. The findings on these new phenolic compounds provide proof of the high antioxidant capacity of pecan kernels.
Lu, Zhao; Sun, Jing; Butts, Kenneth
2016-02-03
A giant leap has been made in the past couple of decades with the introduction of kernel-based learning as a mainstay for designing effective nonlinear computational learning algorithms. In view of the geometric interpretation of conditional expectation and the ubiquity of multiscale characteristics in highly complex nonlinear dynamic systems [1]-[3], this paper presents a new orthogonal projection operator wavelet kernel, aiming at developing an efficient computational learning approach for nonlinear dynamical system identification. In the framework of multiresolution analysis, the proposed projection operator wavelet kernel can fulfill the multiscale, multidimensional learning to estimate complex dependencies. The special advantage of the projection operator wavelet kernel developed in this paper lies in the fact that it has a closed-form expression, which greatly facilitates its application in kernel learning. To the best of our knowledge, it is the first closed-form orthogonal projection wavelet kernel reported in the literature. It provides a link between grid-based wavelets and mesh-free kernel-based methods. Simulation studies for identifying the parallel models of two benchmark nonlinear dynamical systems confirm its superiority in model accuracy and sparsity.
Novel characterization method of impedance cardiography signals using time-frequency distributions.
Escrivá Muñoz, Jesús; Pan, Y; Ge, S; Jensen, E W; Vallverdú, M
2018-03-16
The purpose of this document is to describe a methodology to select the most adequate time-frequency distribution (TFD) kernel for the characterization of impedance cardiography signals (ICG). The predominant ICG beat was extracted from a patient and was synthetized using time-frequency variant Fourier approximations. These synthetized signals were used to optimize several TFD kernels according to a performance maximization. The optimized kernels were tested for noise resistance on a clinical database. The resulting optimized TFD kernels are presented with their performance calculated using newly proposed methods. The procedure explained in this work showcases a new method to select an appropriate kernel for ICG signals and compares the performance of different time-frequency kernels found in the literature for the case of ICG signals. We conclude that, for ICG signals, the performance (P) of the spectrogram with either Hanning or Hamming windows (P = 0.780) and the extended modified beta distribution (P = 0.765) provided similar results, higher than the rest of analyzed kernels. Graphical abstract Flowchart for the optimization of time-frequency distribution kernels for impedance cardiography signals.
New Fukui, dual and hyper-dual kernels as bond reactivity descriptors.
Franco-Pérez, Marco; Polanco-Ramírez, Carlos-A; Ayers, Paul W; Gázquez, José L; Vela, Alberto
2017-06-21
We define three new linear response indices with promising applications for bond reactivity using the mathematical framework of τ-CRT (finite temperature chemical reactivity theory). The τ-Fukui kernel is defined as the ratio between the fluctuations of the average electron density at two different points in the space and the fluctuations in the average electron number and is designed to integrate to the finite-temperature definition of the electronic Fukui function. When this kernel is condensed, it can be interpreted as a site-reactivity descriptor of the boundary region between two atoms. The τ-dual kernel corresponds to the first order response of the Fukui kernel and is designed to integrate to the finite temperature definition of the dual descriptor; it indicates the ambiphilic reactivity of a specific bond and enriches the traditional dual descriptor by allowing one to distinguish between the electron-accepting and electron-donating processes. Finally, the τ-hyper dual kernel is defined as the second-order derivative of the Fukui kernel and is proposed as a measure of the strength of ambiphilic bonding interactions. Although these quantities have never been proposed, our results for the τ-Fukui kernel and for τ-dual kernel can be derived in zero-temperature formulation of the chemical reactivity theory with, among other things, the widely-used parabolic interpolation model.
Urrutia, Eugene; Lee, Seunggeun; Maity, Arnab; Zhao, Ni; Shen, Judong; Li, Yun; Wu, Michael C
Analysis of rare genetic variants has focused on region-based analysis wherein a subset of the variants within a genomic region is tested for association with a complex trait. Two important practical challenges have emerged. First, it is difficult to choose which test to use. Second, it is unclear which group of variants within a region should be tested. Both depend on the unknown true state of nature. Therefore, we develop the Multi-Kernel SKAT (MK-SKAT) which tests across a range of rare variant tests and groupings. Specifically, we demonstrate that several popular rare variant tests are special cases of the sequence kernel association test which compares pair-wise similarity in trait value to similarity in the rare variant genotypes between subjects as measured through a kernel function. Choosing a particular test is equivalent to choosing a kernel. Similarly, choosing which group of variants to test also reduces to choosing a kernel. Thus, MK-SKAT uses perturbation to test across a range of kernels. Simulations and real data analyses show that our framework controls type I error while maintaining high power across settings: MK-SKAT loses power when compared to the kernel for a particular scenario but has much greater power than poor choices.
Cid, Jaime A; von Davier, Alina A
2015-05-01
Test equating is a method of making the test scores from different test forms of the same assessment comparable. In the equating process, an important step involves continuizing the discrete score distributions. In traditional observed-score equating, this step is achieved using linear interpolation (or an unscaled uniform kernel). In the kernel equating (KE) process, this continuization process involves Gaussian kernel smoothing. It has been suggested that the choice of bandwidth in kernel smoothing controls the trade-off between variance and bias. In the literature on estimating density functions using kernels, it has also been suggested that the weight of the kernel depends on the sample size, and therefore, the resulting continuous distribution exhibits bias at the endpoints, where the samples are usually smaller. The purpose of this article is (a) to explore the potential effects of atypical scores (spikes) at the extreme ends (high and low) on the KE method in distributions with different degrees of asymmetry using the randomly equivalent groups equating design (Study I), and (b) to introduce the Epanechnikov and adaptive kernels as potential alternative approaches to reducing boundary bias in smoothing (Study II). The beta-binomial model is used to simulate observed scores reflecting a range of different skewed shapes.
Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies
Manitz, Juliane; Burger, Patricia; Amos, Christopher I.; Chang-Claude, Jenny; Wichmann, Heinz-Erich; Kneib, Thomas; Bickeböller, Heike
2017-01-01
The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility. PMID:28785300
Pathway-Based Kernel Boosting for the Analysis of Genome-Wide Association Studies.
Friedrichs, Stefanie; Manitz, Juliane; Burger, Patricia; Amos, Christopher I; Risch, Angela; Chang-Claude, Jenny; Wichmann, Heinz-Erich; Kneib, Thomas; Bickeböller, Heike; Hofner, Benjamin
2017-01-01
The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility.
Antioxidant and antimicrobial activities of bitter and sweet apricot (Prunus armeniaca L.) kernels.
Yiğit, D; Yiğit, N; Mavi, A
2009-04-01
The present study describes the in vitro antimicrobial and antioxidant activity of methanol and water extracts of sweet and bitter apricot (Prunus armeniaca L.) kernels. The antioxidant properties of apricot kernels were evaluated by determining radical scavenging power, lipid peroxidation inhibition activity and total phenol content measured with a DPPH test, the thiocyanate method and the Folin method, respectively. In contrast to extracts of the bitter kernels, both the water and methanol extracts of sweet kernels have antioxidant potential. The highest percent inhibition of lipid peroxidation (69%) and total phenolic content (7.9 +/- 0.2 microg/mL) were detected in the methanol extract of sweet kernels (Hasanbey) and in the water extract of the same cultivar, respectively. The antimicrobial activities of the above extracts were also tested against human pathogenic microorganisms using a disc-diffusion method, and the minimal inhibitory concentration (MIC) values of each active extract were determined. The most effective antibacterial activity was observed in the methanol and water extracts of bitter kernels and in the methanol extract of sweet kernels against the Gram-positive bacteria Staphylococcus aureus. Additionally, the methanol extracts of the bitter kernels were very potent against the Gram-negative bacteria Escherichia coli (0.312 mg/mL MIC value). Significant anti-candida activity was also observed with the methanol extract of bitter apricot kernels against Candida albicans, consisting of a 14 mm in diameter of inhibition zone and a 0.625 mg/mL MIC value.
Michalski, Andrew S; Edwards, W Brent; Boyd, Steven K
2017-10-17
Quantitative computed tomography has been posed as an alternative imaging modality to investigate osteoporosis. We examined the influence of computed tomography convolution back-projection reconstruction kernels on the analysis of bone quantity and estimated mechanical properties in the proximal femur. Eighteen computed tomography scans of the proximal femur were reconstructed using both a standard smoothing reconstruction kernel and a bone-sharpening reconstruction kernel. Following phantom-based density calibration, we calculated typical bone quantity outcomes of integral volumetric bone mineral density, bone volume, and bone mineral content. Additionally, we performed finite element analysis in a standard sideways fall on the hip loading configuration. Significant differences for all outcome measures, except integral bone volume, were observed between the 2 reconstruction kernels. Volumetric bone mineral density measured using images reconstructed by the standard kernel was significantly lower (6.7%, p < 0.001) when compared with images reconstructed using the bone-sharpening kernel. Furthermore, the whole-bone stiffness and the failure load measured in images reconstructed by the standard kernel were significantly lower (16.5%, p < 0.001, and 18.2%, p < 0.001, respectively) when compared with the image reconstructed by the bone-sharpening kernel. These data suggest that for future quantitative computed tomography studies, a standardized reconstruction kernel will maximize reproducibility, independent of the use of a quantitative calibration phantom. Copyright © 2017 The International Society for Clinical Densitometry. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Xing, Fuguo; Yao, Haibo; Hruska, Zuzana; Kincaid, Russell; Zhu, Fengle; Brown, Robert L.; Bhatnagar, Deepak; Liu, Yang
2017-05-01
Aflatoxin contamination in peanut products has been an important and long-standing problem around the world. Produced mainly by Aspergillus flavus and Aspergillus parasiticus, aflatoxins are the most toxic and carcinogenic compounds among toxins. This study investigated the application of fluorescence visible near-infrared (VNIR) hyperspectral images to assess the spectral difference between peanut kernels inoculated with toxigenic and atoxigenic inocula of A. flavus and healthy kernels. Peanut kernels were inoculated with NRRL3357, a toxigenic strain of A. flavus, and AF36, an atoxigenic strain of A. flavus, respectively. Fluorescence hyperspectral images under ultraviolet (UV) excitation were recorded on peanut kernels with and without skin. Contaminated kernels exhibited different fluorescence features compared with healthy kernels. For the kernels without skin, the inoculated kernels had a fluorescence peaks shifted to longer wavelengths with lower intensity than healthy kernels. In addition, the fluorescence intensity of peanuts without skin was higher than that of peanuts with skin (10 times). The fluorescence spectra of kernels with skin are significantly different from that of the control group (p<0.001). Furthermore, the fluorescence intensity of the toxigenic, AF3357 peanuts with skin was lower than that of the atoxigenic AF36 group. Discriminate analysis showed that the inoculation group can be separated from the controls with 100% accuracy. However, the two inoculation groups (AF3357 vis AF36) can be separated with only ∼80% accuracy. This study demonstrated the potential of fluorescence hyperspectral imaging techniques for screening of peanut kernels contaminated with A. flavus, which could potentially lead to the production of rapid and non-destructive scanning-based detection technology for the peanut industry.
Amin, Furheen; Masoodi, F A; Baba, Waqas N; Khan, Asma Ashraf; Ganie, Bashir Ahmad
2017-11-01
Packing tissue between and around the kernel halves just turning brown (PTB) is a phenological indicator of kernel ripening at harvest in walnuts. The effect of three ripening stages (Pre-PTB, PTB and Post-PTB) on kernel quality characteristics, mineral composition, lipid characterization, sensory analysis, antioxidant and antibacterial activity were investigated in fresh kernels of indigenous numbered walnut selection of Kashmir valley "SKAU-02". Proximate composition, physical properties and sensory analysis of walnut kernels showed better results for Pre-PTB and PTB while higher mineral content was seen for kernels at Post-PTB stage in comparison to other stages of ripening. Kernels showed significantly higher levels of Omega-3 PUFA (C18:3 n3 ) and low n6/n3 ratio when harvested at Pre-PTB and PTB stages. The highest phenolic content and antioxidant activity was observed at the first stage of ripening and a steady decrease was observed at later stages. TBARS values increased as ripening advanced but did not show any significant difference in malonaldehyde formation during early ripening stages whereas it showed marked increase in walnut kernels at post-PTB stage. Walnut extracts inhibited growth of Gram-positive bacteria ( B. cereus, B. subtilis, and S. aureus ) with respective MICs of 1, 1 and 5 mg/mL and gram negative bacteria ( E. coli, P. and K. pneumonia ) with MIC of 100 mg/mL. Zone of inhibition obtained against all the bacterial strains from walnut kernel extracts increased with increase in the stage of ripening. It is concluded that Pre-PTB harvest stage with higher antioxidant activities, better fatty acid profile and consumer acceptability could be preferred harvesting stage for obtaining functionally superior walnut kernels.
Salt stress reduces kernel number of corn by inhibiting plasma membrane H+-ATPase activity.
Jung, Stephan; Hütsch, Birgit W; Schubert, Sven
2017-04-01
Salt stress affects yield formation of corn (Zea mays L.) at various physiological levels resulting in an overall grain yield decrease. In this study we investigated how salt stress affects kernel development of two corn cultivars (cvs. Pioneer 3906 and Fabregas) at and shortly after pollination. In an earlier study, we found an accumulation of hexoses in the kernel tissue. Therefore, it was hypothesized that hexose uptake into developing endosperm and embryo might be inhibited. Hexoses are transported into the developing endosperm by carriers localized in the plasma membrane (PM). The transport is driven by the pH gradient which is built up by the PM H + -ATPase. It was investigated whether the PM H + -ATPase activity in developing corn kernels was inhibited by salt stress, which would cause a lower pH gradient resulting in impaired hexose import and finally in kernel abortion. Corn grown under control and salt stress conditions was harvested 0 and 2 days after pollination (DAP). Under salt stress sucrose and hexose concentrations in kernel tissue were higher 0 and 2 DAP. Kernel PM H + -ATPase activity was not affected at 0 DAP, but it was reduced at 2 DAP. This is in agreement with the finding, that kernel growth and thus kernel setting was not affected in the salt stress treatment at pollination, but it was reduced 2 days later. It is concluded that inhibition of PM H + -ATPase under salt stress impaired the energization of hexose transporters into the cells, resulting in lower kernel growth and finally in kernel abortion. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Three-Dimensional Sensitivity Kernels of Z/H Amplitude Ratios of Surface and Body Waves
NASA Astrophysics Data System (ADS)
Bao, X.; Shen, Y.
2017-12-01
The ellipticity of Rayleigh wave particle motion, or Z/H amplitude ratio, has received increasing attention in inversion for shallow Earth structures. Previous studies of the Z/H ratio assumed one-dimensional (1D) velocity structures beneath the receiver, ignoring the effects of three-dimensional (3D) heterogeneities on wave amplitudes. This simplification may introduce bias in the resulting models. Here we present 3D sensitivity kernels of the Z/H ratio to Vs, Vp, and density perturbations, based on finite-difference modeling of wave propagation in 3D structures and the scattering-integral method. Our full-wave approach overcomes two main issues in previous studies of Rayleigh wave ellipticity: (1) the finite-frequency effects of wave propagation in 3D Earth structures, and (2) isolation of the fundamental mode Rayleigh waves from Rayleigh wave overtones and converted Love waves. In contrast to the 1D depth sensitivity kernels in previous studies, our 3D sensitivity kernels exhibit patterns that vary with azimuths and distances to the receiver. The laterally-summed 3D sensitivity kernels and 1D depth sensitivity kernels, based on the same homogeneous reference model, are nearly identical with small differences that are attributable to the single period of the 1D kernels and a finite period range of the 3D kernels. We further verify the 3D sensitivity kernels by comparing the predictions from the kernels with the measurements from numerical simulations of wave propagation for models with various small-scale perturbations. We also calculate and verify the amplitude kernels for P waves. This study shows that both Rayleigh and body wave Z/H ratios provide vertical and lateral constraints on the structure near the receiver. With seismic arrays, the 3D kernels afford a powerful tool to use the Z/H ratios to obtain accurate and high-resolution Earth models.
Considering causal genes in the genetic dissection of kernel traits in common wheat.
Mohler, Volker; Albrecht, Theresa; Castell, Adelheid; Diethelm, Manuela; Schweizer, Günther; Hartl, Lorenz
2016-11-01
Genetic factors controlling thousand-kernel weight (TKW) were characterized for their association with other seed traits, including kernel width, kernel length, ratio of kernel width to kernel length (KW/KL), kernel area, and spike number per m 2 (SN). For this purpose, a genetic map was established utilizing a doubled haploid population derived from a cross between German winter wheat cultivars Pamier and Format. Association studies in a diversity panel of elite cultivars supplemented genetic analysis of kernel traits. In both populations, genomic signatures of 13 candidate genes for TKW and kernel size were analyzed. Major quantitative trait loci (QTL) for TKW were identified on chromosomes 1B, 2A, 2D, and 4D, and their locations coincided with major QTL for kernel size traits, supporting the common belief that TKW is a function of other kernel traits. The QTL on chromosome 2A was associated with TKW candidate gene TaCwi-A1 and the QTL on chromosome 4D was associated with dwarfing gene Rht-D1. A minor QTL for TKW on chromosome 6B coincided with TaGW2-6B. The QTL for kernel dimensions that did not affect TKW were detected on eight chromosomes. A major QTL for KW/KL located at the distal tip of chromosome arm 5AS is being reported for the first time. TaSus1-7A and TaSAP-A1, closely linked to each other on chromosome 7A, could be related to a minor QTL for KW/KL. Genetic analysis of SN confirmed its negative correlation with TKW in this cross. In the diversity panel, TaSus1-7A was associated with TKW. Compared to the Pamier/Format bi-parental population where TaCwi-A1a was associated with higher TKW, the same allele reduced grain yield in the diversity panel, suggesting opposite effects of TaCwi-A1 on these two traits.
Guo, Zhiqing; Döll, Katharina; Dastjerdi, Raana; Karlovsky, Petr; Dehne, Heinz-Wilhelm; Altincicek, Boran
2014-01-01
Species of Fusarium have significant agro-economical and human health-related impact by infecting diverse crop plants and synthesizing diverse mycotoxins. Here, we investigated interactions of grain-feeding Tenebrio molitor larvae with four grain-colonizing Fusarium species on wheat kernels. Since numerous metabolites produced by Fusarium spp. are toxic to insects, we tested the hypothesis that the insect senses and avoids Fusarium-colonized grains. We found that only kernels colonized with F. avenaceum or Beauveria bassiana (an insect-pathogenic fungal control) were avoided by the larvae as expected. Kernels colonized with F. proliferatum, F. poae or F. culmorum attracted T. molitor larvae significantly more than control kernels. The avoidance/preference correlated with larval feeding behaviors and weight gain. Interestingly, larvae that had consumed F. proliferatum- or F. poae-colonized kernels had similar survival rates as control. Larvae fed on F. culmorum-, F. avenaceum- or B. bassiana-colonized kernels had elevated mortality rates. HPLC analyses confirmed the following mycotoxins produced by the fungal strains on the kernels: fumonisins, enniatins and beauvericin by F. proliferatum, enniatins and beauvericin by F. poae, enniatins by F. avenaceum, and deoxynivalenol and zearalenone by F. culmorum. Our results indicate that T. molitor larvae have the ability to sense potential survival threats of kernels colonized with F. avenaceum or B. bassiana, but not with F. culmorum. Volatiles potentially along with gustatory cues produced by these fungi may represent survival threat signals for the larvae resulting in their avoidance. Although F. proliferatum or F. poae produced fumonisins, enniatins and beauvericin during kernel colonization, the larvae were able to use those kernels as diet without exhibiting increased mortality. Consumption of F. avenaceum-colonized kernels, however, increased larval mortality; these kernels had higher enniatin levels than F. proliferatum or F. poae-colonized ones suggesting that T. molitor can tolerate or metabolize those toxins. PMID:24932485
NASA Astrophysics Data System (ADS)
Pan, F.; Huang, X.; Chen, X.
2015-12-01
Radiative kernel method has been validated and widely used in the study of climate feedbacks. This study uses spectrally resolved longwave radiative kernels to examine the short-term water vapor feedbacks associated with the ENSO cycles. Using a 500-year GFDL CM3 and a 100-year NCAR CCSM4 pre-industry control simulation, we have constructed two sets of longwave spectral radiative kernels. We then composite El Niño, La Niña and ENSO-neutral states and estimate the water vapor feedbacks associated with the El Niño and La Niña phases of ENSO cycles in both simulations. Similar analysis is also applied to 35-year (1979-2014) ECMWF ERA-interim reanalysis data, which is deemed as observational results here. When modeled and observed broadband feedbacks are compared to each other, they show similar geographic patterns but with noticeable discrepancies in the contrast between the tropics and extra-tropics. Especially, in El Niño phase, the feedback estimated from reanalysis is much greater than those from the model simulations. Considering the observational data span, we carry out a sensitivity test to explore the variability of feedback-deriving using 35-year data. To do so, we calculate the water vapor feedback within every 35-year segment of the GFDL CM3 control run by two methods: one is to composite El Nino or La Nina phases as mentioned above and the other is to regressing the TOA flux perturbation caused by water vapor change (δR_H2O) against the global-mean surface temperature anomaly. We find that the short-term feedback strengths derived from composite method can change considerably from one segment to another segment, while the feedbacks by regression method are less sensitive to the choice of segment and their strengths are also much smaller than those from composite analysis. This study suggests that caution is warranted in order to infer long-term feedbacks from a few decades of observations. When spectral details of the global-mean feedbacks are examined, more inconsistencies can be revealed in many spectral bands, especially H2O continuum absorption bands and window regions. These discrepancies can be attributed back to differences in observed and modeled water vapor profiles in responses to tropical SST.
Bayesian Optimization for Neuroimaging Pre-processing in Brain Age Classification and Prediction
Lancaster, Jenessa; Lorenz, Romy; Leech, Rob; Cole, James H.
2018-01-01
Neuroimaging-based age prediction using machine learning is proposed as a biomarker of brain aging, relating to cognitive performance, health outcomes and progression of neurodegenerative disease. However, even leading age-prediction algorithms contain measurement error, motivating efforts to improve experimental pipelines. T1-weighted MRI is commonly used for age prediction, and the pre-processing of these scans involves normalization to a common template and resampling to a common voxel size, followed by spatial smoothing. Resampling parameters are often selected arbitrarily. Here, we sought to improve brain-age prediction accuracy by optimizing resampling parameters using Bayesian optimization. Using data on N = 2003 healthy individuals (aged 16–90 years) we trained support vector machines to (i) distinguish between young (<22 years) and old (>50 years) brains (classification) and (ii) predict chronological age (regression). We also evaluated generalisability of the age-regression model to an independent dataset (CamCAN, N = 648, aged 18–88 years). Bayesian optimization was used to identify optimal voxel size and smoothing kernel size for each task. This procedure adaptively samples the parameter space to evaluate accuracy across a range of possible parameters, using independent sub-samples to iteratively assess different parameter combinations to arrive at optimal values. When distinguishing between young and old brains a classification accuracy of 88.1% was achieved, (optimal voxel size = 11.5 mm3, smoothing kernel = 2.3 mm). For predicting chronological age, a mean absolute error (MAE) of 5.08 years was achieved, (optimal voxel size = 3.73 mm3, smoothing kernel = 3.68 mm). This was compared to performance using default values of 1.5 mm3 and 4mm respectively, resulting in MAE = 5.48 years, though this 7.3% improvement was not statistically significant. When assessing generalisability, best performance was achieved when applying the entire Bayesian optimization framework to the new dataset, out-performing the parameters optimized for the initial training dataset. Our study outlines the proof-of-principle that neuroimaging models for brain-age prediction can use Bayesian optimization to derive case-specific pre-processing parameters. Our results suggest that different pre-processing parameters are selected when optimization is conducted in specific contexts. This potentially motivates use of optimization techniques at many different points during the experimental process, which may improve statistical sensitivity and reduce opportunities for experimenter-led bias. PMID:29483870
Tan, Stéphanie; Soulez, Gilles; Diez Martinez, Patricia; Larrivée, Sandra; Stevens, Louis-Mathieu; Goussard, Yves; Mansour, Samer; Chartrand-Lefebvre, Carl
2016-01-01
Metallic artifacts can result in an artificial thickening of the coronary stent wall which can significantly impair computed tomography (CT) imaging in patients with coronary stents. The objective of this study is to assess in vivo visualization of coronary stent wall and lumen with an edge-enhancing CT reconstruction kernel, as compared to a standard kernel. This is a prospective cross-sectional study involving the assessment of 71 coronary stents (24 patients), with blinded observers. After 256-slice CT angiography, image reconstruction was done with medium-smooth and edge-enhancing kernels. Stent wall thickness was measured with both orthogonal and circumference methods, averaging thickness from diameter and circumference measurements, respectively. Image quality was assessed quantitatively using objective parameters (noise, signal to noise (SNR) and contrast to noise (CNR) ratios), as well as visually using a 5-point Likert scale. Stent wall thickness was decreased with the edge-enhancing kernel in comparison to the standard kernel, either with the orthogonal (0.97 ± 0.02 versus 1.09 ± 0.03 mm, respectively; p<0.001) or the circumference method (1.13 ± 0.02 versus 1.21 ± 0.02 mm, respectively; p = 0.001). The edge-enhancing kernel generated less overestimation from nominal thickness compared to the standard kernel, both with the orthogonal (0.89 ± 0.19 versus 1.00 ± 0.26 mm, respectively; p<0.001) and the circumference (1.06 ± 0.26 versus 1.13 ± 0.31 mm, respectively; p = 0.005) methods. The edge-enhancing kernel was associated with lower SNR and CNR, as well as higher background noise (all p < 0.001), in comparison to the medium-smooth kernel. Stent visual scores were higher with the edge-enhancing kernel (p<0.001). In vivo 256-slice CT assessment of coronary stents shows that the edge-enhancing CT reconstruction kernel generates thinner stent walls, less overestimation from nominal thickness, and better image quality scores than the standard kernel.
Estimating top-of-atmosphere thermal infrared radiance using MERRA-2 atmospheric data
NASA Astrophysics Data System (ADS)
Kleynhans, Tania; Montanaro, Matthew; Gerace, Aaron; Kanan, Christopher
2017-05-01
Thermal infrared satellite images have been widely used in environmental studies. However, satellites have limited temporal resolution, e.g., 16 day Landsat or 1 to 2 day Terra MODIS. This paper investigates the use of the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) reanalysis data product, produced by NASA's Global Modeling and Assimilation Office (GMAO) to predict global topof-atmosphere (TOA) thermal infrared radiance. The high temporal resolution of the MERRA-2 data product presents opportunities for novel research and applications. Various methods were applied to estimate TOA radiance from MERRA-2 variables namely (1) a parameterized physics based method, (2) Linear regression models and (3) non-linear Support Vector Regression. Model prediction accuracy was evaluated using temporally and spatially coincident Moderate Resolution Imaging Spectroradiometer (MODIS) thermal infrared data as reference data. This research found that Support Vector Regression with a radial basis function kernel produced the lowest error rates. Sources of errors are discussed and defined. Further research is currently being conducted to train deep learning models to predict TOA thermal radiance
Clifford support vector machines for classification, regression, and recurrence.
Bayro-Corrochano, Eduardo Jose; Arana-Daniel, Nancy
2010-11-01
This paper introduces the Clifford support vector machines (CSVM) as a generalization of the real and complex-valued support vector machines using the Clifford geometric algebra. In this framework, we handle the design of kernels involving the Clifford or geometric product. In this approach, one redefines the optimization variables as multivectors. This allows us to have a multivector as output. Therefore, we can represent multiple classes according to the dimension of the geometric algebra in which we work. We show that one can apply CSVM for classification and regression and also to build a recurrent CSVM. The CSVM is an attractive approach for the multiple input multiple output processing of high-dimensional geometric entities. We carried out comparisons between CSVM and the current approaches to solve multiclass classification and regression. We also study the performance of the recurrent CSVM with experiments involving time series. The authors believe that this paper can be of great use for researchers and practitioners interested in multiclass hypercomplex computing, particularly for applications in complex and quaternion signal and image processing, satellite control, neurocomputation, pattern recognition, computer vision, augmented virtual reality, robotics, and humanoids.
Hruska, Zuzana; Yao, Haibo; Kincaid, Russell; Brown, Robert L; Bhatnagar, Deepak; Cleveland, Thomas E
2017-01-01
Non-invasive, easy to use and cost-effective technology offers a valuable alternative for rapid detection of carcinogenic fungal metabolites, namely aflatoxins, in commodities. One relatively recent development in this area is the use of spectral technology. Fluorescence hyperspectral imaging, in particular, offers a potential rapid and non-invasive method for detecting the presence of aflatoxins in maize infected with the toxigenic fungus Aspergillus flavus . Earlier studies have shown that whole maize kernels contaminated with aflatoxins exhibit different spectral signatures from uncontaminated kernels based on the external fluorescence emission of the whole kernels. Here, the effect of time on the internal fluorescence spectral emissions from cross-sections of kernels infected with toxigenic and atoxigenic A. flavus , were examined in order to elucidate the interaction between the fluorescence signals emitted by some aflatoxin contaminated maize kernels and the fungal invasion resulting in the production of aflatoxins. First, the difference in internal fluorescence emissions between cross-sections of kernels incubated in toxigenic and atoxigenic inoculum was assessed. Kernels were inoculated with each strain for 5, 7, and 9 days before cross-sectioning and imaging. There were 270 kernels (540 halves) imaged, including controls. Second, in a different set of kernels (15 kernels/group; 135 total), the germ of each kernel was separated from the endosperm to determine the major areas of aflatoxin accumulation and progression over nine growth days. Kernels were inoculated with toxigenic and atoxigenic fungal strains for 5, 7, and 9 days before the endosperm and germ were separated, followed by fluorescence hyperspectral imaging and chemical aflatoxin determination. A marked difference in fluorescence intensity was shown between the toxigenic and atoxigenic strains on day nine post-inoculation, which may be a useful indicator of the location of aflatoxin contamination. This finding suggests that both, the fluorescence peak shift and intensity as well as timing, may be essential in distinguishing toxigenic and atoxigenic fungi based on spectral features. Results also reveal a possible preferential difference in the internal colonization of maize kernels between the toxigenic and atoxigenic strains of A. flavus suggesting a potential window for differentiating the strains based on fluorescence spectra at specific time points.
Hruska, Zuzana; Yao, Haibo; Kincaid, Russell; Brown, Robert L.; Bhatnagar, Deepak; Cleveland, Thomas E.
2017-01-01
Non-invasive, easy to use and cost-effective technology offers a valuable alternative for rapid detection of carcinogenic fungal metabolites, namely aflatoxins, in commodities. One relatively recent development in this area is the use of spectral technology. Fluorescence hyperspectral imaging, in particular, offers a potential rapid and non-invasive method for detecting the presence of aflatoxins in maize infected with the toxigenic fungus Aspergillus flavus. Earlier studies have shown that whole maize kernels contaminated with aflatoxins exhibit different spectral signatures from uncontaminated kernels based on the external fluorescence emission of the whole kernels. Here, the effect of time on the internal fluorescence spectral emissions from cross-sections of kernels infected with toxigenic and atoxigenic A. flavus, were examined in order to elucidate the interaction between the fluorescence signals emitted by some aflatoxin contaminated maize kernels and the fungal invasion resulting in the production of aflatoxins. First, the difference in internal fluorescence emissions between cross-sections of kernels incubated in toxigenic and atoxigenic inoculum was assessed. Kernels were inoculated with each strain for 5, 7, and 9 days before cross-sectioning and imaging. There were 270 kernels (540 halves) imaged, including controls. Second, in a different set of kernels (15 kernels/group; 135 total), the germ of each kernel was separated from the endosperm to determine the major areas of aflatoxin accumulation and progression over nine growth days. Kernels were inoculated with toxigenic and atoxigenic fungal strains for 5, 7, and 9 days before the endosperm and germ were separated, followed by fluorescence hyperspectral imaging and chemical aflatoxin determination. A marked difference in fluorescence intensity was shown between the toxigenic and atoxigenic strains on day nine post-inoculation, which may be a useful indicator of the location of aflatoxin contamination. This finding suggests that both, the fluorescence peak shift and intensity as well as timing, may be essential in distinguishing toxigenic and atoxigenic fungi based on spectral features. Results also reveal a possible preferential difference in the internal colonization of maize kernels between the toxigenic and atoxigenic strains of A. flavus suggesting a potential window for differentiating the strains based on fluorescence spectra at specific time points. PMID:28966606
Partial Müllerian Duct Retention in Smad4 Conditional Mutant Male Mice.
Petit, Fabrice G; Deng, Chuxia; Jamin, Soazik P
2016-01-01
Müllerian duct regression is a complex process which involves the AMH signalling pathway. We have previously demonstrated that besides AMH and its specific type II receptor (AMHRII), BMPR-IA and Smad5 are two essential factors implicated in this mechanism. Mothers against decapentaplegic homolog 4 (Smad4) is a transcription factor and the common Smad (co-Smad) involved in transforming growth factor beta (TGF-β) signalling pathway superfamily. Since Smad4 null mutants die early during gastrulation, we have inactivated Smad4 in the Müllerian duct mesenchyme. Specific inactivation of Smad4 in the urogenital ridge leads to the partial persistence of the Müllerian duct in adult male mice. Careful examination of the urogenital tract reveals that the Müllerian duct retention is randomly distributed either on one side or both sides. Histological analysis shows a uterus-like structure, which is confirmed by the expression of estrogen receptor α. As previously described in a β-catenin conditional mutant mouse model, β-catenin contributes to Müllerian duct regression. In our mutant male embryos, it appears that β-catenin expression is locally reduced along the urogenital ridge as compared to control mice. Moreover, the expression pattern is similar to those observed in control female mice. This study shows that reduced Smad4 expression disrupts the Wnt/β-catenin signalling leading to the partial persistence of Müllerian duct.
Using the Intel Math Kernel Library on Peregrine | High-Performance
Computing | NREL the Intel Math Kernel Library on Peregrine Using the Intel Math Kernel Library on Peregrine Learn how to use the Intel Math Kernel Library (MKL) with Peregrine system software. MKL architectures. Core math functions in MKL include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier
Code of Federal Regulations, 2014 CFR
2014-04-01
... the Act, are as follows: Common name Botanical name of plant source Apricot kernel (persic oil) Prunus armeniaca L. Peach kernel (persic oil) Prunus persica Sieb. et Zucc. Peanut stearine Arachis hypogaea L. Persic oil (see apricot kernel and peach kernel) Quince seed Cydonia oblonga Miller. [42 FR 14640, Mar...
7 CFR 51.2954 - Tolerances for grade defects.
Code of Federal Regulations, 2010 CFR
2010-01-01
... chart. Tolerances for Grade Defects Grade External (shell) defects Internal (kernel) defects Color of kernel U.S. No. 1. 10 pct, by count for splits. 5 pct. by count, for other shell defects, including not... tolerance to reduce the required 70 pct of “light amber” kernels or the required 40 pct of “light” kernels...
7 CFR 51.2284 - Size classification.
Code of Federal Regulations, 2010 CFR
2010-01-01
...: “Halves”, “Pieces and Halves”, “Pieces” or “Small Pieces”. The size of portions of kernels in the lot... consists of 85 percent or more, by weight, half kernels, and the remainder three-fourths half kernels. (See § 51.2285.) (b) Pieces and halves. Lot consists of 20 percent or more, by weight, half kernels, and the...
USDA-ARS?s Scientific Manuscript database
Normal oleic peanuts are often found within commercial lots of high oleic peanuts when sampling among individual kernels. Kernels not meeting high oleic threshold could be true contamination with normal oleic peanuts introduced via poor handling, or kernels not meeting threshold could be immature a...
THERMOS. 30-Group ENDF/B Scattered Kernels
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCrosson, F.J.; Finch, D.R.
1973-12-01
These data are 30-group THERMOS thermal scattering kernels for P0 to P5 Legendre orders for every temperature of every material from s(alpha,beta) data stored in the ENDF/B library. These scattering kernels were generated using the FLANGE2 computer code. To test the kernels, the integral properties of each set of kernels were determined by a precision integration of the diffusion length equation and compared to experimental measurements of these properties. In general, the agreement was very good. Details of the methods used and results obtained are contained in the reference. The scattering kernels are organized into a two volume magnetic tapemore » library from which they may be retrieved easily for use in any 30-group THERMOS library.« less
Jankowska, Marta M; Natarajan, Loki; Godbole, Suneeta; Meseck, Kristin; Sears, Dorothy D; Patterson, Ruth E; Kerr, Jacqueline
2017-07-01
Background: Environmental factors may influence breast cancer; however, most studies have measured environmental exposure in neighborhoods around home residences (static exposure). We hypothesize that tracking environmental exposures over time and space (dynamic exposure) is key to assessing total exposure. This study compares breast cancer survivors' exposure to walkable and recreation-promoting environments using dynamic Global Positioning System (GPS) and static home-based measures of exposure in relation to insulin resistance. Methods: GPS data from 249 breast cancer survivors living in San Diego County were collected for one week along with fasting blood draw. Exposure to recreation spaces and walkability was measured for each woman's home address within an 800 m buffer (static), and using a kernel density weight of GPS tracks (dynamic). Participants' exposure estimates were related to insulin resistance (using the homeostatic model assessment of insulin resistance, HOMA-IR) controlled by age and body mass index (BMI) in linear regression models. Results: The dynamic measurement method resulted in greater variability in built environment exposure values than did the static method. Regression results showed no association between HOMA-IR and home-based, static measures of walkability and recreation area exposure. GPS-based dynamic measures of both walkability and recreation area were significantly associated with lower HOMA-IR ( P < 0.05). Conclusions: Dynamic exposure measurements may provide important evidence for community- and individual-level interventions that can address cancer risk inequities arising from environments wherein breast cancer survivors live and engage. Impact: This is the first study to compare associations of dynamic versus static built environment exposure measures with insulin outcomes in breast cancer survivors. Cancer Epidemiol Biomarkers Prev; 26(7); 1078-84. ©2017 AACR . ©2017 American Association for Cancer Research.
Gaussian process regression for tool wear prediction
NASA Astrophysics Data System (ADS)
Kong, Dongdong; Chen, Yongjie; Li, Ning
2018-05-01
To realize and accelerate the pace of intelligent manufacturing, this paper presents a novel tool wear assessment technique based on the integrated radial basis function based kernel principal component analysis (KPCA_IRBF) and Gaussian process regression (GPR) for real-timely and accurately monitoring the in-process tool wear parameters (flank wear width). The KPCA_IRBF is a kind of new nonlinear dimension-increment technique and firstly proposed for feature fusion. The tool wear predictive value and the corresponding confidence interval are both provided by utilizing the GPR model. Besides, GPR performs better than artificial neural networks (ANN) and support vector machines (SVM) in prediction accuracy since the Gaussian noises can be modeled quantitatively in the GPR model. However, the existence of noises will affect the stability of the confidence interval seriously. In this work, the proposed KPCA_IRBF technique helps to remove the noises and weaken its negative effects so as to make the confidence interval compressed greatly and more smoothed, which is conducive for monitoring the tool wear accurately. Moreover, the selection of kernel parameter in KPCA_IRBF can be easily carried out in a much larger selectable region in comparison with the conventional KPCA_RBF technique, which helps to improve the efficiency of model construction. Ten sets of cutting tests are conducted to validate the effectiveness of the presented tool wear assessment technique. The experimental results show that the in-process flank wear width of tool inserts can be monitored accurately by utilizing the presented tool wear assessment technique which is robust under a variety of cutting conditions. This study lays the foundation for tool wear monitoring in real industrial settings.
Droplet Image Super Resolution Based on Sparse Representation and Kernel Regression
NASA Astrophysics Data System (ADS)
Zou, Zhenzhen; Luo, Xinghong; Yu, Qiang
2018-02-01
Microgravity and containerless conditions, which are produced via electrostatic levitation combined with a drop tube, are important when studying the intrinsic properties of new metastable materials. Generally, temperature and image sensors can be used to measure the changes of sample temperature, morphology and volume. Then, the specific heat, surface tension, viscosity changes and sample density can be obtained. Considering that the falling speed of the material sample droplet is approximately 31.3 m/s when it reaches the bottom of a 50-meter-high drop tube, a high-speed camera with a collection rate of up to 106 frames/s is required to image the falling droplet. However, at the high-speed mode, very few pixels, approximately 48-120, will be obtained in each exposure time, which results in low image quality. Super-resolution image reconstruction is an algorithm that provides finer details than the sampling grid of a given imaging device by increasing the number of pixels per unit area in the image. In this work, we demonstrate the application of single image-resolution reconstruction in the microgravity and electrostatic levitation for the first time. Here, using the image super-resolution method based on sparse representation, a low-resolution droplet image can be reconstructed. Employed Yang's related dictionary model, high- and low-resolution image patches were combined with dictionary training, and high- and low-resolution-related dictionaries were obtained. The online double-sparse dictionary training algorithm was used in the study of related dictionaries and overcome the shortcomings of the traditional training algorithm with small image patch. During the stage of image reconstruction, the algorithm of kernel regression is added, which effectively overcomes the shortcomings of the Yang image's edge blurs.
NASA Astrophysics Data System (ADS)
Lee, Minho; Kim, Namkug; Lee, Sang Min; Seo, Joon Beom; Oh, Sang Young
2015-03-01
To quantify low attenuation area (LAA) of emphysematous regions according to cluster size in 3D volumetric CT data of chronic obstructive pulmonary disease (COPD) patients and to compare these indices with their pulmonary functional test (PFT). Sixty patients with COPD were scanned by a more than 16-multi detector row CT scanner (Siemens Sensation 16 and 64) within 0.75mm collimation. Based on these LAA masks, a length scale analysis to estimate each emphysema LAA's size was performed as follows. At first, Gaussian low pass filter from 30mm to 1mm kernel size with 1mm interval on the mask was performed from large to small size, iteratively. Centroid voxels resistant to the each filter were selected and dilated by the size of the kernel, which was regarded as the specific size emphysema mask. The slopes of area and number of size based LAA (slope of semi-log plot) were analyzed and compared with PFT. PFT parameters including DLco, FEV1, and FEV1/FVC were significantly (all p-value< 0.002) correlated with the slopes (r-values; -0.73, 0.54, 0.69, respectively) and EI (r-values; -0.84, -0.60, -0.68, respectively). In addition, the D independently contributed regression for FEV1 and FEV1/FVC (adjust R sq. of regression study: EI only, 0.70, 0.45; EI and D, 0.71, 0.51, respectively). By the size based LAA segmentation and analysis, we evaluated the Ds of area, number, and distribution of size based LAA, which would be independent factors for predictor of PFT parameters.
Kernel Regression Estimation of Fiber Orientation Mixtures in Diffusion MRI
Cabeen, Ryan P.; Bastin, Mark E.; Laidlaw, David H.
2016-01-01
We present and evaluate a method for kernel regression estimation of fiber orientations and associated volume fractions for diffusion MR tractography and population-based atlas construction in clinical imaging studies of brain white matter. This is a model-based image processing technique in which representative fiber models are estimated from collections of component fiber models in model-valued image data. This extends prior work in nonparametric image processing and multi-compartment processing to provide computational tools for image interpolation, smoothing, and fusion with fiber orientation mixtures. In contrast to related work on multi-compartment processing, this approach is based on directional measures of divergence and includes data-adaptive extensions for model selection and bilateral filtering. This is useful for reconstructing complex anatomical features in clinical datasets analyzed with the ball-and-sticks model, and our framework’s data-adaptive extensions are potentially useful for general multi-compartment image processing. We experimentally evaluate our approach with both synthetic data from computational phantoms and in vivo clinical data from human subjects. With synthetic data experiments, we evaluate performance based on errors in fiber orientation, volume fraction, compartment count, and tractography-based connectivity. With in vivo data experiments, we first show improved scan-rescan reproducibility and reliability of quantitative fiber bundle metrics, including mean length, volume, streamline count, and mean volume fraction. We then demonstrate the creation of a multi-fiber tractography atlas from a population of 80 human subjects. In comparison to single tensor atlasing, our multi-fiber atlas shows more complete features of known fiber bundles and includes reconstructions of the lateral projections of the corpus callosum and complex fronto-parietal connections of the superior longitudinal fasciculus I, II, and III. PMID:26691524
Droplet Image Super Resolution Based on Sparse Representation and Kernel Regression
NASA Astrophysics Data System (ADS)
Zou, Zhenzhen; Luo, Xinghong; Yu, Qiang
2018-05-01
Microgravity and containerless conditions, which are produced via electrostatic levitation combined with a drop tube, are important when studying the intrinsic properties of new metastable materials. Generally, temperature and image sensors can be used to measure the changes of sample temperature, morphology and volume. Then, the specific heat, surface tension, viscosity changes and sample density can be obtained. Considering that the falling speed of the material sample droplet is approximately 31.3 m/s when it reaches the bottom of a 50-meter-high drop tube, a high-speed camera with a collection rate of up to 106 frames/s is required to image the falling droplet. However, at the high-speed mode, very few pixels, approximately 48-120, will be obtained in each exposure time, which results in low image quality. Super-resolution image reconstruction is an algorithm that provides finer details than the sampling grid of a given imaging device by increasing the number of pixels per unit area in the image. In this work, we demonstrate the application of single image-resolution reconstruction in the microgravity and electrostatic levitation for the first time. Here, using the image super-resolution method based on sparse representation, a low-resolution droplet image can be reconstructed. Employed Yang's related dictionary model, high- and low-resolution image patches were combined with dictionary training, and high- and low-resolution-related dictionaries were obtained. The online double-sparse dictionary training algorithm was used in the study of related dictionaries and overcome the shortcomings of the traditional training algorithm with small image patch. During the stage of image reconstruction, the algorithm of kernel regression is added, which effectively overcomes the shortcomings of the Yang image's edge blurs.
NASA Astrophysics Data System (ADS)
Süveges, Maria; Anderson, Richard I.
2018-04-01
Detailed knowledge of the variability of classical Cepheids, in particular their modulations and mode composition, provides crucial insight into stellar structure and pulsation. However, tiny modulations of the dominant radial-mode pulsation were recently found to be very frequent, possibly ubiquitous in Cepheids, which makes secondary modes difficult to detect and analyse, since these modulations can easily mask the potentially weak secondary modes. The aim of this study is to re-investigate the secondary mode content in the sample of OGLE-III and -IV single-mode classical Cepheids using kernel regression with adaptive kernel width for pre-whitening, instead of using a constant-parameter model. This leads to a more precise removal of the modulated dominant pulsation, and enables a more complete survey of secondary modes with frequencies outside a narrow range around the primary. Our analysis reveals that significant secondary modes occur more frequently among first overtone Cepheids than previously thought. The mode composition appears significantly different in the Large and Small Magellanic Clouds, suggesting a possible dependence on chemical composition. In addition to the formerly identified non-radial mode at P2 ≈ 0.6…0.65P1 (0.62-mode), and a cluster of modes with near-primary frequency, we find two more candidate non-radial modes. One is a numerous group of secondary modes with P2 ≈ 1.25P1, which may represent the fundamental of the 0.62-mode, supposed to be the first harmonic of an l ∈ {7, 8, 9} non-radial mode. The other new mode is at P2 ≈ 1.46P1, possibly analogous to a similar, rare mode recently discovered among first overtone RR Lyrae stars.
Karri, Rama Rao; Sahu, J N
2018-01-15
Zn (II) is one the common pollutant among heavy metals found in industrial effluents. Removal of pollutant from industrial effluents can be accomplished by various techniques, out of which adsorption was found to be an efficient method. Applications of adsorption limits itself due to high cost of adsorbent. In this regard, a low cost adsorbent produced from palm oil kernel shell based agricultural waste is examined for its efficiency to remove Zn (II) from waste water and aqueous solution. The influence of independent process variables like initial concentration, pH, residence time, activated carbon (AC) dosage and process temperature on the removal of Zn (II) by palm kernel shell based AC from batch adsorption process are studied systematically. Based on the design of experimental matrix, 50 experimental runs are performed with each process variable in the experimental range. The optimal values of process variables to achieve maximum removal efficiency is studied using response surface methodology (RSM) and artificial neural network (ANN) approaches. A quadratic model, which consists of first order and second order degree regressive model is developed using the analysis of variance and RSM - CCD framework. The particle swarm optimization which is a meta-heuristic optimization is embedded on the ANN architecture to optimize the search space of neural network. The optimized trained neural network well depicts the testing data and validation data with R 2 equal to 0.9106 and 0.9279 respectively. The outcomes indicates that the superiority of ANN-PSO based model predictions over the quadratic model predictions provided by RSM. Copyright © 2017 Elsevier Ltd. All rights reserved.
Predicting radiotherapy outcomes using statistical learning techniques
NASA Astrophysics Data System (ADS)
El Naqa, Issam; Bradley, Jeffrey D.; Lindsay, Patricia E.; Hope, Andrew J.; Deasy, Joseph O.
2009-09-01
Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data before. In this work, several types of linear and nonlinear kernels to generate interaction terms and approximate the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for 'generalizabilty' validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principle components analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizabilty beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model variables. These models have the capacity to predict on unseen data. Part of this work was first presented at the Seventh International Conference on Machine Learning and Applications, San Diego, CA, USA, 11-13 December 2008.
7 CFR 810.2003 - Basis of determination.
Code of Federal Regulations, 2010 CFR
2010-01-01
... Basis of determination. Each determination of heat-damaged kernels, damaged kernels, material other than... shrunken and broken kernels. Other determinations not specifically provided for under the general...
DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.
Ma, Wenxiu; Yang, Lin; Rohs, Remo; Noble, William Stafford
2017-10-01
Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites. We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values. The software is available at https://bitbucket.org/wenxiu/sequence-shape.git. rohs@usc.edu or william-noble@uw.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels
Maulik, Ujjwal; Sarkar, Anasua
2013-01-01
Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also show the superior performance scores provided by the proposed kernels. Similar performance improvement also is found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439
Fine-mapping of qGW4.05, a major QTL for kernel weight and size in maize.
Chen, Lin; Li, Yong-xiang; Li, Chunhui; Wu, Xun; Qin, Weiwei; Li, Xin; Jiao, Fuchao; Zhang, Xiaojing; Zhang, Dengfeng; Shi, Yunsu; Song, Yanchun; Li, Yu; Wang, Tianyu
2016-04-12
Kernel weight and size are important components of grain yield in cereals. Although some information is available concerning the map positions of quantitative trait loci (QTL) for kernel weight and size in maize, little is known about the molecular mechanisms of these QTLs. qGW4.05 is a major QTL that is associated with kernel weight and size in maize. We combined linkage analysis and association mapping to fine-map and identify candidate gene(s) at qGW4.05. QTL qGW4.05 was fine-mapped to a 279.6-kb interval in a segregating population derived from a cross of Huangzaosi with LV28. By combining the results of regional association mapping and linkage analysis, we identified GRMZM2G039934 as a candidate gene responsible for qGW4.05. Candidate gene-based association mapping was conducted using a panel of 184 inbred lines with variable kernel weights and kernel sizes. Six polymorphic sites in the gene GRMZM2G039934 were significantly associated with kernel weight and kernel size. The results of linkage analysis and association mapping revealed that GRMZM2G039934 is the most likely candidate gene for qGW4.05. These results will improve our understanding of the genetic architecture and molecular mechanisms underlying kernel development in maize.
Kandianis, Catherine B.; Michenfelder, Abigail S.; Simmons, Susan J.; Grusak, Michael A.; Stapleton, Ann E.
2013-01-01
The improvement of grain nutrient profiles for essential minerals and vitamins through breeding strategies is a target important for agricultural regions where nutrient poor crops like maize contribute a large proportion of the daily caloric intake. Kernel iron concentration in maize exhibits a broad range. However, the magnitude of genotype by environment (GxE) effects on this trait reduces the efficacy and predictability of selection programs, particularly when challenged with abiotic stress such as water and nitrogen limitations. Selection has also been limited by an inverse correlation between kernel iron concentration and the yield component of kernel size in target environments. Using 25 maize inbred lines for which extensive genome sequence data is publicly available, we evaluated the response of kernel iron density and kernel mass to water and nitrogen limitation in a managed field stress experiment using a factorial design. To further understand GxE interactions we used partition analysis to characterize response of kernel iron and weight to abiotic stressors among all genotypes, and observed two patterns: one characterized by higher kernel iron concentrations in control over stress conditions, and another with higher kernel iron concentration under drought and combined stress conditions. Breeding efforts for this nutritional trait could exploit these complementary responses through combinations of favorable allelic variation from these already well-characterized genetic stocks. PMID:24363659
Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.
Maulik, Ujjwal; Sarkar, Anasua
2013-01-01
Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also show the superior performance scores provided by the proposed kernels. Similar performance improvement also is found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. sarkar@labri.fr.
USDA-ARS?s Scientific Manuscript database
The ionome, or elemental profile, of a maize kernel represents at least two distinct ideas. First, the collection of elements within the kernel are food, feed and feedstocks for people, animals and industrial processes. Second, the ionome of the kernel represents a developmental end point that can s...