Partial Least Squares Regression Models for the Analysis of Kinase Signaling.
Bourgeois, Danielle L; Kreeger, Pamela K
2017-01-01
Partial least squares regression (PLSR) is a data-driven modeling approach that can be used to analyze multivariate relationships between kinase networks and cellular decisions or patient outcomes. In PLSR, a linear model relating an X matrix of independent variables and a Y matrix of dependent variables is generated by extracting the factors with the strongest covariation. While the identified relationship is correlative, PLSR models can be used to generate quantitative predictions for new conditions or perturbations to the network, allowing for mechanisms to be identified. This chapter will provide a brief explanation of PLSR and provide an instructive example to demonstrate the use of PLSR to analyze kinase signaling.
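The factor extraction described in this abstract can be illustrated with a minimal sketch of the single-response (PLS1) recursion in NumPy. This is not the chapter's own example: the function names, the synthetic data and the choice of five components are assumptions made for illustration, with X used to predict y.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS-style PLS1 recursion."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean                      # centered predictors, deflated in each pass
    yk = y - y_mean                      # centered response, deflated in each pass
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # weight vector: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                       # scores (the extracted factor)
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        c = yk @ t / tt                  # y loading
        Xk = Xk - np.outer(t, p)         # remove the part of X explained by this factor
        yk = yk - c * t                  # remove the part of y explained by this factor
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in the original X space
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    """Predict the response for new rows of X."""
    return (X - x_mean) @ coef + y_mean

# Synthetic data (illustrative only): 40 conditions, 5 measured signals.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=40)
coef, x_mean, y_mean = pls1_fit(X, y, n_components=5)
y_hat = pls1_predict(X, coef, x_mean, y_mean)
```

In practice fewer components than predictors are retained, usually chosen by cross-validation; with as many components as predictors the model coincides with ordinary least squares, which makes a convenient sanity check.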
Perez-Guaita, David; Kuligowski, Julia; Quintás, Guillermo; Garrigues, Salvador; Guardia, Miguel de la
2013-03-30
Locally weighted partial least squares regression (LW-PLSR) has been applied to the determination of four clinical parameters in human serum samples (total protein, triglyceride, glucose and urea contents) by Fourier transform infrared (FTIR) spectroscopy. Classical LW-PLSR models were constructed using different spectral regions. For the selection of parameters by LW-PLSR modeling, a multi-parametric study was carried out employing the minimum root-mean square error of cross validation (RMSCV) as objective function. In order to overcome the effect of strong matrix interferences on the predictive accuracy of LW-PLSR models, this work focuses on sample selection. Accordingly, a novel strategy for the development of local models is proposed. It is based on the use of: (i) principal component analysis (PCA) performed on an analyte-specific spectral region to identify the most similar sample spectra and (ii) partial least squares regression (PLSR) constructed using the whole spectrum. Results obtained with this strategy were compared to those provided by PLSR using the same spectral intervals as for LW-PLSR. Both classical and modified LW-PLSR gave lower prediction errors than PLSR. Hence, both proposed approaches were useful for the determination of analytes present in a complex matrix such as human serum. Copyright © 2013 Elsevier B.V. All rights reserved.
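The sample-selection step (i) of this strategy can be sketched as follows. The abstract does not say how similarity was measured, so Euclidean distance in PCA score space is used here as one plausible choice; the function name and the parameters `region`, `n_pcs` and `k` are hypothetical, and the spectra are synthetic.

```python
import numpy as np

def nearest_calibration_samples(cal_spectra, query_spectrum, region, n_pcs, k):
    """Return indices of the k calibration spectra closest to the query in PCA score space.

    The PCA is computed only on `region` (an analyte-specific slice of channels).
    """
    sub = cal_spectra[:, region]
    mean = sub.mean(axis=0)
    # PCA via SVD of the mean-centered sub-spectra; rows of vt are the loadings.
    _, _, vt = np.linalg.svd(sub - mean, full_matrices=False)
    loadings = vt[:n_pcs].T
    cal_scores = (sub - mean) @ loadings
    query_scores = (query_spectrum[region] - mean) @ loadings
    dist = np.linalg.norm(cal_scores - query_scores, axis=1)
    return np.argsort(dist)[:k]

# Synthetic spectra (illustrative only): 20 calibration samples, 100 channels.
rng = np.random.default_rng(0)
cal = rng.normal(size=(20, 100))
query = cal[3].copy()                      # a query identical to calibration sample 3
idx = nearest_calibration_samples(cal, query, region=slice(40, 60), n_pcs=3, k=5)
```

A local model for step (ii) would then be fitted on `cal[idx]` using all channels rather than only the selected region.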
Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald
2011-06-01
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. 
HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems. PMID:21627852
Rapid Detection of Volatile Oil in Mentha haplocalyx by Near-Infrared Spectroscopy and Chemometrics.
Yan, Hui; Guo, Cheng; Shao, Yang; Ouyang, Zhen
2017-01-01
Near-infrared spectroscopy combined with partial least squares regression (PLSR) and support vector machine (SVM) was applied for the rapid determination of the volatile oil content in Mentha haplocalyx. The effects of data pre-processing methods on the accuracy of the PLSR calibration models were investigated. The performance of the final model was evaluated according to the correlation coefficient (R) and root mean square error of prediction (RMSEP). For the PLSR model, the best preprocessing combination was first-order derivative, standard normal variate transformation (SNV), and mean centering, which gave correlation coefficients of 0.8805 and 0.8719, an RMSEC of 0.091, and an RMSEP of 0.097. Analysis of the loading weights and variable importance in projection (VIP) scores showed that the wavenumber variables linked to volatile oil lie between 5500 and 4000 cm⁻¹. For the SVM model, six LVs were adopted (fewer than the seven LVs in the PLSR model), and the result was better than that of the PLSR model. The corresponding correlation coefficients were 0.9232 and 0.9202, with RMSEC and RMSEP of 0.084 and 0.082, respectively, indicating that the predicted values were accurate and reliable. This work demonstrated that near infrared reflectance spectroscopy with chemometrics could be used to rapidly detect the volatile oil content in M. haplocalyx. The quality of medicine directly links to clinical efficacy; thus, it is important to control the quality of Mentha haplocalyx.
Abbreviations used: 1st der: First-order derivative; 2nd der: Second-order derivative; LOO: Leave-one-out; LVs: Latent variables; MC: Mean centering; NIR: Near-infrared; NIRS: Near infrared spectroscopy; PCR: Principal component regression; PLSR: Partial least squares regression; RBF: Radial basis function; RMSECV: Root mean square error of cross validation; RMSEC: Root mean square error of calibration; RMSEP: Root mean square error of prediction; SNV: Standard normal variate transformation; SVM: Support vector machine; VIP: Variable importance in projection.
Hao, Yong; Sun, Xu-Dong; Yang, Qiang
2012-12-01
A variable selection strategy combined with local linear embedding (LLE) was introduced for the analysis of complex samples by near infrared spectroscopy (NIRS). Three methods, namely Monte Carlo uninformative variable elimination (MCUVE), the successive projections algorithm (SPA), and MCUVE combined with SPA, were used to eliminate redundant spectral variables. Partial least squares regression (PLSR) and LLE-PLSR were used for modeling complex samples. The results showed that MCUVE can both extract informative variables and improve the precision of models. Compared with PLSR models, LLE-PLSR models can achieve more accurate analysis results. MCUVE combined with LLE-PLSR is an effective modeling method for NIRS quantitative analysis.
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen
2016-02-01
Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation-partial least squares regression (PLSR) method effectively solves the information loss problem of correlation-multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R² = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.
NASA Astrophysics Data System (ADS)
Mai, W.; Zhang, J.-F.; Zhao, X.-M.; Li, Z.; Xu, Z.-W.
2017-11-01
Wastewater from the dye industry is typically analyzed using a standard method for measurement of chemical oxygen demand (COD) or by a single-wavelength spectroscopic method. To overcome the disadvantages of these methods, ultraviolet-visible (UV-Vis) spectroscopy was combined with principal component regression (PCR) and partial least squares regression (PLSR) in this study. Unlike the standard method, this method does not require digestion of the samples for preparation. Experiments showed that the PLSR model offered high prediction performance for COD, with a mean relative error of about 5% for two dyes. This error is similar to that obtained with the standard method. In this study, the precision of the PLSR model decreased with the number of dye compounds present. It is likely that multiple models will be required in reality, and the complexity of a COD monitoring system would be greatly reduced if the PLSR model is used because it can include several dyes. UV-Vis spectroscopy with PLSR successfully enhanced the performance of COD prediction for dye wastewater and showed good potential for application in on-line water quality monitoring.
A deep belief network with PLSR for nonlinear system modeling.
Qiao, Junfei; Wang, Gongming; Li, Wenjing; Li, Xiaoli
2018-08-01
Nonlinear system modeling plays an important role in practical engineering, and the deep learning-based deep belief network (DBN) is now popular in nonlinear system modeling and identification because of its strong learning ability. However, the existing weights optimization for DBN is based on gradient, which always leads to a local optimum and a poor training result. In this paper, a DBN with partial least squares regression (PLSR-DBN) is proposed for nonlinear system modeling, which focuses on the problem of weights optimization for DBN using PLSR. Firstly, the unsupervised contrastive divergence (CD) algorithm is used for weights initialization. Secondly, the initial weights derived from the CD algorithm are optimized through layer-by-layer PLSR modeling from the top layer to the bottom layer. Instead of a gradient method, PLSR-DBN determines the optimal weights using several PLSR models, so that better performance is achieved. Then, a theoretical convergence analysis is given to guarantee the effectiveness of the proposed PLSR-DBN model. Finally, the proposed PLSR-DBN is tested on two benchmark nonlinear systems and an actual wastewater treatment system, as well as on handwritten digit recognition (nonlinear mapping and modeling) with high-dimension input data. The experimental results show that the proposed PLSR-DBN is faster and more accurate in nonlinear system modeling than other methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
Quantitative Analysis of Single and Mix Food Antiseptics Basing on SERS Spectra with PLSR Method
Hou, Mengjing; Huang, Yu; Ma, Lingwei; Zhang, Zhengjun
2016-06-01
The usage and dosage of food antiseptics are of great concern because of their decisive influence on food safety. The surface-enhanced Raman scattering (SERS) effect was employed in this research to detect trace potassium sorbate (PS) and sodium benzoate (SB). An HfO2 ultrathin film-coated Ag NR array was fabricated as the SERS substrate. Protected by the HfO2 film, the SERS substrate possesses good acid resistance, which makes it applicable in the acidic environments where PS and SB work. The regression relationship between the SERS spectra of 0.3-10 mg/L PS solutions and their concentrations was calibrated by the partial least squares regression (PLSR) method, and the concentration prediction performance was quite satisfactory. Furthermore, mixture solutions of PS and SB were also quantitatively analyzed by the PLSR method. Spectral data of the characteristic peak sections corresponding to PS and SB were used to establish the regression models of these two solutes, respectively, and their concentrations were determined accurately despite the overlap of their characteristic peak sections. It is possible that the unique modeling process of the PLSR method prevented the overlapped Raman signal from reducing the model accuracy.
Wang, Hai-Xia; Suo, Tong-Chuan; Yu, He-Shui; Li, Zheng
2016-10-01
The manufacture of traditional Chinese medicine (TCM) products is always accompanied by processing complex raw materials and real-time monitoring of the manufacturing process. In this study, we investigated different modeling strategies for the extraction process of licorice. Near-infrared spectra associated with the extraction time were used to determine the states of the extraction process. Three modeling approaches, i.e., principal component analysis (PCA), partial least squares regression (PLSR) and parallel factor analysis-PLSR (PARAFAC-PLSR), were adopted for the prediction of the real-time status of the process. The overall results indicated that PCA, PLSR and PARAFAC-PLSR can effectively detect errors in the extraction procedure and predict the process trajectories, which is of great significance for monitoring and controlling extraction processes. Copyright© by the Chinese Pharmaceutical Association.
Li, Jiangtong; Luo, Yongdao; Dai, Honglin
2018-01-01
Water is the source of life and the essential foundation of all life. With the development of industrialization, water pollution is becoming more and more frequent, which directly affects human survival and development. Water quality detection is one of the necessary measures to protect water resources. Ultraviolet (UV) spectral analysis is an important research method in the field of water quality detection, in which partial least squares regression (PLSR) is becoming the predominant technique; however, in some special cases, PLSR analysis produces considerable errors. To solve this problem, the traditional principal component regression (PCR) analysis method was improved in this paper by using the principle of PLSR. The experimental results show that, for some special experimental data sets, the improved PCR method performs better than PLSR. PCR and PLSR are the focus of this paper. Firstly, principal component analysis (PCA) is performed in MATLAB to reduce the dimensionality of the spectral data; on the basis of a large number of experiments, the optimized principal components, which carry most of the original data information, are extracted by using the principle of PLSR. Secondly, linear regression analysis of the principal components is carried out with the Statistical Package for the Social Sciences (SPSS), from which the coefficients and relations of the principal components are obtained. Finally, the same water spectral data set is analyzed by PLSR and by the improved PCR, and the two results are compared: the improved PCR and PLSR give similar results for most data, but the improved PCR is better than PLSR for data near the detection limit. Both PLSR and the improved PCR can be used in UV spectral analysis of water, but for data near the detection limit, the improved PCR gives better results than PLSR.
Soil salt content estimation in the Yellow River delta with satellite hyperspectral data
Weng, Yongling; Gong, Peng; Zhu, Zhi-Liang
2008-01-01
Soil salinization is one of the most common land degradation processes and is a severe environmental hazard. The primary objective of this study is to investigate the potential of predicting salt content in soils with hyperspectral data acquired with EO-1 Hyperion. Both partial least-squares regression (PLSR) and conventional multiple linear regression (MLR), such as stepwise regression (SWR), were tested as the prediction model. PLSR is commonly used to overcome the problem caused by high-dimensional and correlated predictors. Chemical analysis of 95 samples collected from the top layer of soils in the Yellow River delta area shows that salt content was high on average, and the dominant chemicals in the saline soil were NaCl and MgCl2. Multivariate models were established between soil contents and hyperspectral data. Our results indicate that the PLSR technique with laboratory spectral data has a strong prediction capacity. Spectral bands at 1487-1527, 1971-1991, 2032-2092, and 2163-2355 nm possessed large absolute values of regression coefficients, with the largest coefficient at 2203 nm. We obtained a root mean squared error (RMSE) for calibration (with 61 samples) of RMSEC = 0.753 (R2 = 0.893) and a root mean squared error for validation (with 30 samples) of RMSEV = 0.574. The prediction model was applied on a pixel-by-pixel basis to a Hyperion reflectance image to yield a quantitative surface distribution map of soil salt content. The result was validated successfully from 38 sampling points. We obtained an RMSE estimate of 1.037 (R2 = 0.784) for the soil salt content map derived by the PLSR model. The salinity map derived from the SWR model shows that the predicted value is higher than the true value. These results demonstrate that the PLSR method is a more suitable technique than stepwise regression for quantitative estimation of soil salt content in a large area. © 2008 CASI.
Bian, Xihui; Li, Shujuan; Lin, Ligang; Tan, Xiaoyao; Fan, Qingjie; Li, Ming
2016-06-21
Accurate model prediction is fundamental to the successful analysis of complex samples. To utilize the abundant information embedded in the frequency and time domains, a novel regression model is presented for the quantitative analysis of hydrocarbon contents in fuel oil samples. The proposed method, named high and low frequency unfolded PLSR (HLUPLSR), integrates empirical mode decomposition (EMD) and an unfolding strategy with partial least squares regression (PLSR). In the proposed method, the original signals are first decomposed into a finite number of intrinsic mode functions (IMFs) and a residue by EMD. Second, the former high frequency IMFs are summed as a high frequency matrix, and the latter IMFs and the residue are summed as a low frequency matrix. Finally, the two matrices are unfolded to an extended matrix in the variable dimension, and the PLSR model is built between the extended matrix and the target values. Coupled with ultraviolet (UV) spectroscopy, HLUPLSR has been applied to determine the hydrocarbon contents of light gas oil and diesel fuel samples. Compared with single PLSR and other signal processing techniques, the proposed method shows superior prediction ability and better model interpretation. Therefore, the HLUPLSR method provides a promising tool for the quantitative analysis of complex samples. Copyright © 2016 Elsevier B.V. All rights reserved.
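The summing and unfolding steps of this abstract can be sketched as below. The sketch assumes the EMD has already been run and its output stored as an array ordered from the highest-frequency IMF to the residue; the EMD itself is not implemented, random numbers stand in for the decomposed components, and the function name and the split point `n_high` are hypothetical.

```python
import numpy as np

def unfold_high_low(components, n_high):
    """Build the extended matrix from decomposed spectra.

    components: array of shape (n_components, n_samples, n_vars), ordered from the
                highest-frequency IMF to the residue (last entry).
    n_high:     number of leading IMFs summed into the high frequency matrix.
    """
    high = components[:n_high].sum(axis=0)     # high frequency matrix
    low = components[n_high:].sum(axis=0)      # remaining IMFs plus residue
    return np.hstack([high, low])              # unfold along the variable dimension

# Stand-in for EMD output (illustrative only): 5 IMFs + residue, 10 spectra, 50 channels.
rng = np.random.default_rng(1)
components = rng.normal(size=(6, 10, 50))
X_ext = unfold_high_low(components, n_high=2)
```

The extended matrix has twice as many columns as the original spectra, and a PLSR model would then be fitted between `X_ext` and the target values.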
Hao, Z Q; Li, C M; Shen, M; Yang, X Y; Li, K H; Guo, L B; Li, X Y; Lu, Y F; Zeng, X Y
2015-03-23
Laser-induced breakdown spectroscopy (LIBS) with partial least squares regression (PLSR) has been applied to measure the acidity of iron ore, which can be defined by the concentrations of the oxides CaO, MgO, Al₂O₃, and SiO₂. With conventional internal standard calibration, it is difficult to establish calibration curves for CaO, MgO, Al₂O₃, and SiO₂ in iron ore because of serious matrix effects. PLSR is effective in addressing this problem owing to its excellent performance in compensating for matrix effects. In this work, fifty samples were used to construct the PLSR calibration models for the above-mentioned oxides. These calibration models were validated by 10-fold cross-validation with the minimum root-mean-square errors (RMSE). Another ten samples were used as a test set. The acidities were calculated from the concentrations of CaO, MgO, Al₂O₃, and SiO₂ estimated by the PLSR models. The average relative error (ARE) and RMSE of the acidity were 3.65% and 0.0048, respectively, for the test samples.
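The 10-fold cross-validation used to validate the calibration models can be sketched generically. The helper below accepts any fit-and-predict function; ordinary least squares is used only as a stand-in model so that the example stays self-contained, where the paper used PLSR. All names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def kfold_rmse(X, y, fit_predict, n_folds=10, seed=0):
    """Pooled root-mean-square error of k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = 0.0
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False             # hold out the current fold
        pred = fit_predict(X[train_mask], y[train_mask], X[test_idx])
        sq_err += np.sum((pred - y[test_idx]) ** 2)
    return np.sqrt(sq_err / len(y))

def ols_fit_predict(X_train, y_train, X_test):
    """Stand-in model (ordinary least squares with intercept), not the paper's PLSR."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta = np.linalg.lstsq(A, y_train, rcond=None)[0]
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

# Synthetic, noise-free data (illustrative only): 50 samples, 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 + X @ np.array([1.0, -1.0, 0.5])
rmse_cv = kfold_rmse(X, y, ols_fit_predict)
```

Repeating the call for models with different numbers of latent variables and keeping the one with the smallest cross-validated RMSE is the usual way such a minimum-RMSE criterion is applied.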
Peterson, K. T.; Wulamu, A.
2017-12-01
Water, essential to all living organisms, is one of the Earth's most precious resources. Remote sensing offers an ideal approach to monitoring water quality compared with traditional in-situ techniques, which are highly time and resource consuming. Using a multi-scale approach, data from handheld spectroscopy, UAS-based hyperspectral imaging, and satellite multispectral imagery were collected in coordination with in-situ water quality samples for two midwestern watersheds. The remote sensing data were modeled and correlated with the in-situ water quality variables, including chlorophyll content (Chl), turbidity, and total dissolved solids (TDS), using Normalized Difference Spectral Indices (NDSI) and Partial Least Squares Regression (PLSR). The results supported the original hypothesis that correlating water quality variables with remotely sensed data benefits greatly from more complex modeling and regression techniques such as PLSR. The PLSR analysis yielded much higher R2 values for all variables than NDSI. The combination of NDSI and PLSR analysis also identified key wavelengths that aligned with the findings of previous studies. This research demonstrates the advantages and future potential of complex modeling and machine learning techniques for improving the estimation of water quality variables from spectral data.
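An NDSI screening of the kind compared against PLSR here can be sketched as an exhaustive search over band pairs, using the usual definition NDSI(i, j) = (R_i - R_j) / (R_i + R_j). The function name and the synthetic reflectance data are assumptions; the target variable is constructed from bands 4 and 1 so that the search has a known answer.

```python
import numpy as np

def ndsi_correlation_map(reflectance, target):
    """Squared correlation between the target and NDSI(i, j) for every band pair."""
    n_bands = reflectance.shape[1]
    r2 = np.zeros((n_bands, n_bands))
    tc = target - target.mean()
    for i in range(n_bands):
        for j in range(n_bands):
            if i == j:
                continue                      # NDSI of a band with itself is always zero
            ri, rj = reflectance[:, i], reflectance[:, j]
            ndsi = (ri - rj) / (ri + rj)
            nc = ndsi - ndsi.mean()
            denom = np.sqrt((nc @ nc) * (tc @ tc))
            r2[i, j] = (nc @ tc / denom) ** 2 if denom > 0 else 0.0
    return r2

# Synthetic data (illustrative only): 30 samples, 6 bands, strictly positive reflectance.
rng = np.random.default_rng(0)
refl = rng.uniform(0.1, 0.9, size=(30, 6))
chl = (refl[:, 4] - refl[:, 1]) / (refl[:, 4] + refl[:, 1])   # target built from bands 4 and 1
r2_map = ndsi_correlation_map(refl, chl)
best_pair = np.unravel_index(np.argmax(r2_map), r2_map.shape)
```

Because an NDSI uses only two bands at a time, it cannot combine information across the whole spectrum the way a PLSR model does, which is consistent with the higher R2 values reported for PLSR.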
Marabel, Miguel; Alvarez-Taboada, Flor
2013-01-01
Aboveground biomass (AGB) is one of the strategic biophysical variables of interest in vegetation studies. The main objective of this study was to evaluate the Support Vector Machine (SVM) and Partial Least Squares Regression (PLSR) for estimating the AGB of grasslands from field spectrometer data and to find out which data pre-processing approach was the most suitable. The most accurate model to predict the total AGB involved PLSR and the Maximum Band Depth index derived from the continuum removed reflectance in the absorption features between 916–1,120 nm and 1,079–1,297 nm (R2 = 0.939, RMSE = 7.120 g/m2). Regarding the green fraction of the AGB, the Area Over the Minimum index derived from the continuum removed spectra provided the most accurate model overall (R2 = 0.939, RMSE = 3.172 g/m2). Identifying the appropriate absorption features was proved to be crucial to improve the performance of PLSR to estimate the total and green aboveground biomass, by using the indices derived from those spectral regions. Ordinary Least Square Regression could be used as a surrogate for the PLSR approach with the Area Over the Minimum index as the independent variable, although the resulting model would not be as accurate. PMID:23925082
Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Kim, Moon S; Chao, Kuanglin; Qin, Jianwei; Fu, Xiaping; Baek, Insuck; Cho, Byoung-Kwan
2016-05-01
Illegal use of nitrogen-rich melamine (C3H6N6) to boost perceived protein content of food products such as milk, infant formula, frozen yogurt, pet food, biscuits, and coffee drinks has caused serious food safety problems. Conventional methods to detect melamine in foods, such as enzyme-linked immunosorbent assay (ELISA), high-performance liquid chromatography (HPLC), and gas chromatography-mass spectrometry (GC-MS), are sensitive but time-consuming, expensive, and labor-intensive. In this research, a near-infrared (NIR) hyperspectral imaging technique combined with the regression coefficients of a partial least squares regression (PLSR) model was used to detect melamine particles in milk powders easily and quickly. NIR hyperspectral reflectance imaging data in the spectral range of 990-1700 nm were acquired from melamine-milk powder mixture samples prepared at various concentrations ranging from 0.02% to 1%. PLSR models were developed to correlate the spectral data (independent variables) with melamine concentration (dependent variable) in melamine-milk powder mixture samples. PLSR models applying various pretreatment methods were used to reconstruct the two-dimensional PLS images. PLS images were converted to binary images to detect the suspected melamine pixels in milk powder. As the melamine concentration increased, the number of suspected melamine pixels in the binary images also increased. These results suggest that the NIR hyperspectral imaging technique and the PLSR model can be regarded as an effective tool to detect melamine particles in milk powders. Copyright © 2016 Elsevier B.V. All rights reserved.
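Reconstructing a PLS image from regression coefficients and converting it to a binary image can be sketched as follows. The abstract does not state how the binary threshold was chosen, so `threshold` is a hypothetical parameter, and the coefficient vector and the tiny data cube are made up for illustration rather than taken from a fitted model.

```python
import numpy as np

def pls_image(cube, coef, intercept=0.0):
    """Apply a fitted linear model to every pixel of a (rows, cols, bands) cube."""
    return cube @ coef + intercept             # result has shape (rows, cols)

def binary_image(image, threshold):
    """Flag pixels whose predicted value exceeds the threshold."""
    return image > threshold

# Illustrative data only: a 4 x 5 image with 3 bands and one 'contaminated' pixel.
cube = np.zeros((4, 5, 3))
cube[1, 2] = [1.0, 1.0, 1.0]
coef = np.array([0.2, 0.3, 0.5])               # stand-in for PLSR regression coefficients
image = pls_image(cube, coef)
mask = binary_image(image, threshold=0.5)
n_suspected = int(mask.sum())                  # number of suspected pixels
```

Counting the flagged pixels per sample gives the quantity that the abstract reports as increasing with melamine concentration.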
Application of near-infrared spectroscopy in the detection of fat-soluble vitamins in premix feed
Jia, Lian Ping; Tian, Shu Li; Zheng, Xue Cong; Jiao, Peng; Jiang, Xun Peng
2018-02-01
Vitamins are organic compounds necessary for animal physiological maintenance. The rapid determination of the content of different vitamins in premix feed can help to achieve accurate diets and efficient feeding. Compared with high-performance liquid chromatography and other wet chemical methods, near-infrared spectroscopy is a fast, non-destructive, non-polluting method. A total of 168 samples of premix feed were collected, and the contents of vitamin A, vitamin E and vitamin D3 were determined by the standard method. The near-infrared spectra of the samples over the range 10 000 to 4 000 cm⁻¹ were obtained. Partial least squares regression (PLSR) and support vector machine regression (SVMR) were used to construct the quantitative models. The results showed that the RMSEP values of the PLSR models for vitamin A, vitamin E and vitamin D3 were 0.43×10⁷ IU/kg, 0.09×10⁵ IU/kg and 0.17×10⁷ IU/kg, respectively. The corresponding RMSEP values of the SVMR models were 0.45×10⁷ IU/kg, 0.11×10⁵ IU/kg and 0.18×10⁷ IU/kg. Compared with the nonlinear regression method (SVMR), the linear regression method (PLSR) is more suitable for the quantitative analysis of vitamins in premix feed.
Quantitative determination of Auramine O by terahertz spectroscopy with 2DCOS-PLSR model
Zhang, Huo; Li, Zhi; Chen, Tao; Qin, Binyi
2017-09-01
Residues of harmful dyes such as Auramine O (AO) in herb and food products threaten human health, so fast and sensitive techniques for detecting these residues are needed. As a powerful tool for substance detection, terahertz (THz) spectroscopy was used in this paper for the quantitative determination of AO in combination with an improved partial least-squares regression (PLSR) model. The absorbance of herbal samples with different concentrations was obtained by THz-TDS in the band between 0.2 THz and 1.6 THz. We applied two-dimensional correlation spectroscopy (2DCOS) to improve the PLSR model. This method highlighted the spectral differences between concentrations, provided a clear criterion for selecting the input interval, and improved the accuracy of the detection result. The experimental results indicated that the combination of THz spectroscopy and 2DCOS-PLSR is an excellent quantitative analysis method.
Kandpal, Lalit Mohan; Lee, Hoonsoo; Kim, Moon S.; Mo, Changyeun; Cho, Byoung-Kwan
2013-01-01
Spectroscopy has proven to be an efficient tool for measuring the properties of meat. In this article, hyperspectral imaging (HSI) techniques are used to determine the moisture content in cooked chicken breast over the VIS/NIR (400–1,000 nm) spectral range. Moisture measurements were performed using an oven drying method. A partial least squares regression (PLSR) model was developed to extract a relationship between the HSI spectra and the moisture content. In the full wavelength range, the PLSR model possessed a maximum R2p of 0.90 and an SEP of 0.74%. For the NIR range, the PLSR model yielded an R2p of 0.94 and an SEP of 0.71%. The majority of the absorption peaks occurred around 760 and 970 nm, representing the water content in the samples. Finally, PLSR images were constructed to visualize the dehydration and water distribution within different sample regions. The high correlation coefficient and low prediction error from the PLSR analysis validates that HSI is an effective tool for visualizing the chemical properties of meat. PMID:24084119
Genkawa, Takuma; Shinzawa, Hideyuki; Kato, Hideaki; Ishikawa, Daitaro; Murayama, Kodai; Komiyama, Makoto; Ozaki, Yukihiro
2015-12-01
An alternative baseline correction method for diffuse reflection near-infrared (NIR) spectra, searching region standard normal variate (SRSNV), was proposed. Standard normal variate (SNV) is an effective pretreatment method for baseline correction of diffuse reflection NIR spectra of powder and granular samples; however, its baseline correction performance depends on the NIR region used for SNV calculation. To search for an optimal NIR region for baseline correction using SNV, SRSNV employs moving window partial least squares regression (MWPLSR), and an optimal NIR region is identified based on the root mean square error (RMSE) of cross-validation of the partial least squares regression (PLSR) models with the first latent variable (LV). The performance of SRSNV was evaluated using diffuse reflection NIR spectra of mixture samples consisting of wheat flour and granular glucose (0-100% glucose at 5% intervals). From the obtained NIR spectra of the mixture in the 10 000-4000 cm(-1) region at 4 cm(-1) intervals (1501 spectral channels), a series of spectral windows consisting of 80 spectral channels was constructed, and then SNV spectra were calculated for each spectral window. Using these SNV spectra, a series of PLSR models with the first LV for glucose concentration was built. A plot of RMSE versus the spectral window position obtained using the PLSR models revealed that the 8680–8364 cm(-1) region was optimal for baseline correction using SNV. In the SNV spectra calculated using the 8680–8364 cm(-1) region (SRSNV spectra), a remarkable relative intensity change between a band due to wheat flour at 8500 cm(-1) and that due to glucose at 8364 cm(-1) was observed owing to successful baseline correction using SNV.
A PLSR model with the first LV based on the SRSNV spectra yielded a determination coefficient (R2) of 0.999 and an RMSE of 0.70%, while a PLSR model with three LVs based on SNV spectra calculated in the full spectral region gave an R2 of 0.995 and an RMSE of 2.29%. Additional evaluation of SRSNV was carried out using diffuse reflection NIR spectra of marzipan and corn samples, and PLSR models based on SRSNV spectra showed good prediction results. These evaluation results indicate that SRSNV is effective in baseline correction of diffuse reflection NIR spectra and provides regression models with good prediction accuracy.
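The SNV step described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the toy spectrum and the window width are hypothetical, and the abstract does not say exactly how the window statistics are applied, so the sketch simply computes SNV inside each moving window.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre a spectrum and scale it by its own standard deviation."""
    mean = statistics.fmean(spectrum)
    std = statistics.stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(value - mean) / std for value in spectrum]


def snv_windows(spectrum, width):
    """Yield (start_index, SNV-corrected window) for every moving window of `width` channels."""
    for start in range(len(spectrum) - width + 1):
        yield start, snv(spectrum[start:start + width])


# Toy spectrum with a sloping baseline (made-up numbers, for illustration only).
toy_spectrum = [0.10, 0.12, 0.15, 0.21, 0.30, 0.28, 0.26, 0.31]
windows = list(snv_windows(toy_spectrum, width=4))
```

In a moving-window search of the kind the abstract describes, each window would then feed a one-LV PLSR model, and the window with the lowest cross-validated RMSE would be kept.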
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tripathi, Markandey M.; Krishnan, Sundar R.; Srinivasan, Kalyan K.
Chemiluminescence emissions from OH*, CH*, C2, and CO2 formed within the reaction zone of premixed flames depend upon the fuel-air equivalence ratio in the burning mixture. In the present paper, a new partial least square regression (PLS-R) based multivariate sensing methodology is investigated and compared with an OH*/CH* intensity ratio-based calibration model for sensing equivalence ratio in atmospheric methane-air premixed flames. Five replications of spectral data at nine different equivalence ratios ranging from 0.73 to 1.48 were used in the calibration of both models. During model development, the PLS-R model was initially validated with the calibration data set using the leave-one-out cross validation technique. Since the PLS-R model used the entire raw spectral intensities, it did not need the nonlinear background subtraction of CO2 emission that is required for typical OH*/CH* intensity ratio calibrations. An unbiased spectral data set (not used in the PLS-R model development), for 28 different equivalence ratio conditions ranging from 0.71 to 1.67, was used to predict equivalence ratios using the PLS-R and the intensity ratio calibration models. It was found that the equivalence ratios predicted with the PLS-R based multivariate calibration model matched the experimentally measured equivalence ratios within 7%, whereas the OH*/CH* intensity ratio calibration grossly underpredicted equivalence ratios in comparison to measured equivalence ratios, especially under rich conditions (equivalence ratio > 1.2). The practical implications of the chemiluminescence-based multivariate equivalence ratio sensing methodology are also discussed.
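Leave-one-out cross-validation of a PLS calibration, as mentioned in the abstract above, can be illustrated with a deliberately small sketch. It assumes a single response and a single latent variable, and all names and numbers are hypothetical; the study itself used full spectra and an unspecified number of components.

```python
import math


def fit_pls1_one_lv(X, y):
    """Fit a one-latent-variable PLS1 model; returns (x_mean, y_mean, weights, q)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    # The weight vector is proportional to X^T y (covariance of each variable with y).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))  # assumes y is not constant
    w = [v / norm for v in w]
    # Scores t = X w, then regress y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict(model, x):
    """Predict the response for one new observation x."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score


def rmsecv_loo(X, y):
    """Root mean square error of leave-one-out cross-validation."""
    squared_error = 0.0
    for i in range(len(X)):
        model = fit_pls1_one_lv(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        squared_error += (predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(squared_error / len(X))


# Toy data: three "channels" that scale with one underlying level (made-up numbers).
levels = [0.7, 0.9, 1.1, 1.3, 1.5]
X_toy = [[c, 2 * c, 3 * c] for c in levels]
y_toy = [2 * c + 1 for c in levels]
toy_rmsecv = rmsecv_loo(X_toy, y_toy)
```

Because the toy response is an exact linear function of a single latent direction, the cross-validated error is essentially zero; real spectra would give a non-zero RMSECV that can be compared across numbers of latent variables.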
How to predict the sugariness and hardness of melons: A near-infrared hyperspectral imaging method.
Sun, Meijun; Zhang, Dong; Liu, Li; Wang, Zheng
2017-03-01
Hyperspectral imaging (HSI) in the near-infrared (NIR) region (900-1700nm) was used for non-intrusive quality measurements (of sweetness and texture) in melons. First, HSI data from melon samples were acquired to extract the spectral signatures. The corresponding sample sweetness and hardness values were recorded using traditional intrusive methods. Partial least squares regression (PLSR), principal component analysis (PCA), support vector machine (SVM), and artificial neural network (ANN) models were created to predict melon sweetness and hardness values from the hyperspectral data. Experimental results for the three types of melons show that PLSR produces the most accurate results. To reduce the high dimensionality of the hyperspectral data, the weighted regression coefficients of the resulting PLSR models were used to identify the most important wavelengths. On the basis of these wavelengths, each image pixel was used to visualize the sweetness and hardness in all the portions of each sample. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Cheng, Jun-Hu; Jin, Huali; Liu, Zhiwei
2018-01-01
The feasibility of developing a multispectral imaging method using important wavelengths from hyperspectral images selected by genetic algorithm (GA), successive projection algorithm (SPA) and regression coefficient (RC) methods for modeling and predicting protein content in peanut kernels was investigated for the first time. A partial least squares regression (PLSR) calibration model was established between the spectral data from the selected optimal wavelengths and the reference measured protein content, which ranged from 23.46% to 28.43%. The RC-PLSR model established using eight key wavelengths (1153, 1567, 1972, 2143, 2288, 2339, 2389 and 2446 nm) showed the best predictive results, with a coefficient of determination of prediction (R2P) of 0.901, a root mean square error of prediction (RMSEP) of 0.108 and a residual predictive deviation (RPD) of 2.32. Based on the best model obtained and image processing algorithms, distribution maps of protein content were generated. The overall results of this study indicated that developing a rapid and online multispectral imaging system using the feature wavelengths and PLSR analysis is feasible for determining the protein content in peanut kernels.
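The three figures of merit quoted in the abstract above (R2P, RMSEP and RPD) can be computed as in the following sketch. The definitions are common chemometric conventions taken as assumptions here; in particular, RPD is computed as the sample standard deviation of the reference values divided by RMSEP, and some authors use slightly different variants. The data are made up.

```python
import math
import statistics


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SSE / SST."""
    mean = statistics.fmean(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - sse / sst


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of the reference values over RMSEP (one common definition)."""
    return statistics.stdev(y_true) / rmsep(y_true, y_pred)


# Made-up reference and predicted values, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = (rmsep(reference, predicted), r2(reference, predicted), rpd(reference, predicted))
```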
NASA Astrophysics Data System (ADS)
Sarkar, Arnab; Karki, Vijay; Aggarwal, Suresh K.; Maurya, Gulab S.; Kumar, Rohit; Rai, Awadhesh K.; Mao, Xianglei; Russo, Richard E.
2015-06-01
Laser induced breakdown spectroscopy (LIBS) was applied for elemental characterization of high alloy steel using partial least squares regression (PLSR) with the objective of evaluating the analytical performance of this multivariate approach. The optimization of the number of principal components for minimizing error in the PLSR algorithm was investigated. The effect of different pre-treatment procedures on the raw spectral data before PLSR analysis was evaluated based on several statistical parameters (standard error of prediction, percentage relative error of prediction, etc.). The pre-treatment with the "NORM" parameter gave the optimum statistical results. The analytical performance of the PLSR model improved when the number of laser pulses accumulated per spectrum was increased and when the spectrum was truncated to an appropriate wavelength region. It was found that the statistical benefit of truncating the spectrum can also be accomplished by increasing the number of laser pulses per accumulation without spectral truncation. The constituents (Co and Mo) present at hundreds of ppm were determined with a relative precision of 4-9% (2σ), whereas the major constituents Cr and Ni (present at a few percent levels) were determined with a relative precision of ~2% (2σ).
NASA Astrophysics Data System (ADS)
Nawar, Said; Buddenbaum, Henning; Hill, Joachim
2014-05-01
A rapid and inexpensive soil analytical technique is needed for soil quality assessment and accurate mapping. This study investigated a method for improved estimation of soil clay (SC) and organic matter (OM) using reflectance spectroscopy. Seventy soil samples were collected from Sinai peninsula in Egypt to estimate the soil clay and organic matter relative to the soil spectra. Soil samples were scanned with an Analytical Spectral Devices (ASD) spectrometer (350-2500 nm). Three spectral formats were used in the calibration models derived from the spectra and the soil properties: (1) original reflectance spectra (OR), (2) first-derivative spectra smoothened using the Savitzky-Golay technique (FD-SG) and (3) continuum-removed reflectance (CR). Partial least-squares regression (PLSR) models using the CR of the 400-2500 nm spectral region resulted in R2 = 0.76 and 0.57, and RPD = 2.1 and 1.5 for estimating SC and OM, respectively, indicating better performance than that obtained using OR and SG. The multivariate adaptive regression splines (MARS) calibration model with the CR spectra resulted in an improved performance (R2 = 0.89 and 0.83, RPD = 3.1 and 2.4) for estimating SC and OM, respectively. The results show that the MARS models have a great potential for estimating SC and OM compared with PLSR models. The results obtained in this study have potential value in the field of soil spectroscopy because they can be applied directly to the mapping of soil properties using remote sensing imagery in arid environment conditions. Key Words: soil clay, organic matter, PLSR, MARS, reflectance spectroscopy.
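The first-derivative Savitzky-Golay pretreatment named in the abstract above can be sketched as follows. The abstract does not report the window size or polynomial order, so the 5-point quadratic filter used here (coefficients -2, -1, 0, 1, 2 divided by 10) is an assumption chosen only because it is the smallest standard case; the function name and data are hypothetical.

```python
def savgol_first_derivative_5pt(values, spacing=1.0):
    """First derivative from a 5-point quadratic Savitzky-Golay filter.

    Only interior points are returned, so the output is 4 samples shorter than the input.
    """
    coefficients = (-2, -1, 0, 1, 2)
    derivative = []
    for i in range(2, len(values) - 2):
        acc = sum(c * values[i + k] for c, k in zip(coefficients, range(-2, 3)))
        derivative.append(acc / (10.0 * spacing))
    return derivative


# Made-up "reflectance" curve following x**2, whose exact derivative is 2 * x.
toy_curve = [float(x * x) for x in range(8)]
toy_derivative = savgol_first_derivative_5pt(toy_curve)
```

Because the filter fits a local quadratic, it reproduces the derivative of the toy parabola exactly while smoothing noise in real spectra.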
NASA Astrophysics Data System (ADS)
Jiang, Hao; Lu, Jiangang
2018-05-01
Corn starch is an important material which has been traditionally used in the fields of food and chemical industry. In order to enhance the rapidness and reliability of the determination for starch content in corn, a methodology is proposed in this work, using an optimal CC-PLSR-RBFNN calibration model and near-infrared (NIR) spectroscopy. The proposed model was developed based on the optimal selection of crucial parameters and the combination of correlation coefficient method (CC), partial least squares regression (PLSR) and radial basis function neural network (RBFNN). To test the performance of the model, a standard NIR spectroscopy data set was introduced, containing spectral information and chemical reference measurements of 80 corn samples. For comparison, several other models based on the identical data set were also briefly discussed. In this process, the root mean square error of prediction (RMSEP) and coefficient of determination (Rp2) in the prediction set were used to make evaluations. As a result, the proposed model presented the best predictive performance with the smallest RMSEP (0.0497%) and the highest Rp2 (0.9968). Therefore, the proposed method combining NIR spectroscopy with the optimal CC-PLSR-RBFNN model can be helpful to determine starch content in corn.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and inflates their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are used. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to demonstrate the use of the RPCR and RSIMPLS methods on an econometric data set by comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
NASA Astrophysics Data System (ADS)
Chen, Pengfei; Jing, Qi
2017-02-01
This study proposed and tested the assumption that a non-linear method is more reasonable than a linear method when canopy reflectance is used to establish a yield prediction model. For this purpose, partial least squares regression (PLSR) and artificial neural networks (ANN), representing linear and non-linear analysis methods respectively, were applied and compared for wheat yield prediction. Multi-period Landsat-8 OLI images were collected at two different wheat growth stages, and a field campaign was conducted to obtain grain yields at selected sampling sites in 2014. The field data were divided into a calibration database and a testing database. Using the calibration data, cross-validation was applied during PLSR and ANN model construction to prevent over-fitting. All models were tested using the test data. The ANN yield-prediction model produced R2, RMSE and RMSE% values of 0.61, 979 kg ha-1, and 10.38%, respectively, in the testing phase, performing better than the PLSR yield-prediction model, which produced R2, RMSE, and RMSE% values of 0.39, 1211 kg ha-1, and 12.84%, respectively. The non-linear method was therefore suggested as the better method for yield prediction.
Feng, Chao-Hui; Makino, Yoshio; Yoshimura, Masatoshi; Thuyet, Dang Quoc; García-Martín, Juan Francisco
2018-02-01
Hyperspectral imaging with wavelengths of 380 to 1000 nm was used to determine the pH of cooked sausages after different storage conditions (4 °C for 1 d, 35 °C for 1, 3, and 5 d). The mean spectra of the sausages were extracted from the hyperspectral images and a partial least squares regression (PLSR) model was developed to relate spectral profiles to the pH of the cooked sausages. Eleven important wavelengths were selected based on the regression coefficient values. The PLSR model established using the optimal wavelengths showed good precision, with a prediction coefficient of determination (R2p) of 0.909 and a root mean square error of prediction of 0.035. The prediction map illustrating pH indices in sausages was developed for the first time using R statistics. The overall results suggested that hyperspectral imaging combined with PLSR and R statistics can quantify and visualize the evolution of sausage pH under different storage conditions. In this paper, hyperspectral imaging is used for the first time to detect pH in cooked sausages using R statistics, which provides a useful alternative for researchers who do not have access to Matlab. Eleven optimal wavelengths were successfully selected and used to simplify the PLSR model established on the full wavelength range. This simplified model achieved a high R2p (0.909) and a low root mean square error of prediction (0.035), which can be useful for the design of multispectral imaging systems. © 2017 Institute of Food Technologists®.
Barmeier, Gero; Schmidhalter, Urs
2017-01-01
To optimize plant architecture (e.g., photosynthetic active leaf area, leaf-stem ratio), plant physiologists and plant breeders rely on destructively and tediously harvested biomass samples. A fast and non-destructive method for obtaining information about different plant organs could be vehicle-based spectral proximal sensing. In this 3-year study, the mobile phenotyping platform PhenoTrac 4 was used to compare the measurements from active and passive spectral proximal sensors of leaves, leaf sheaths, culms and ears of 34 spring barley cultivars at anthesis and dough ripeness. Published vegetation indices (VI), partial least square regression (PLSR) models and contour map analysis were compared to assess these traits. Contour maps are matrices consisting of coefficients of determination for all of the binary combinations of wavelengths and the biomass parameters. The PLSR models of leaves, leaf sheaths and culms showed strong correlations (R2 = 0.61–0.76). Published vegetation indices depicted similar coefficients of determination; however, their RMSEs were higher. No wavelength combination could be found by the contour map analysis to improve the results of the PLSR or published VIs. The best results were obtained for the dry weight and N uptake of leaves and culms. The PLSR models yielded satisfactory relationships for leaf sheaths at anthesis (R2 = 0.69), whereas only a low performance for all of sensors and methods was observed at dough ripeness. No relationships with ears were observed. Active and passive sensors performed comparably, with slight advantages observed for the passive spectrometer. The results indicate that tractor-based proximal sensing in combination with optimized spectral indices or PLSR models may represent a suitable tool for plant breeders to assess relevant morphological traits, allowing for a better understanding of plant architecture, which is closely linked to the physiological performance. 
Further validation of PLSR models is required in independent studies. Organ specific phenotyping represents a first step toward breeding by design. PMID:29163629
Li, Jie; Sun, Jin; He, Zhonggui
2007-01-26
We aimed to establish quantitative structure-retention relationship (QSRR) with immobilized artificial membrane (IAM) chromatography using easily understood and obtained physicochemical molecular descriptors and to elucidate which descriptors are critical to affect the interaction process between solutes and immobilized phospholipid membranes. The retention indices (logk(IAM)) of 55 structurally diverse drugs were determined on an immobilized artificial membrane column (IAM.PC.DD2) directly or obtained by extrapolation method for highly hydrophobic compounds. Ten simple physicochemical property descriptors (clogP, rings, rotatory bond, hydro-bond counting, etc.) of these drugs were collected and used to establish QSRR and predict the retention data by partial least squares regression (PLSR). Five descriptors, clogP, rotatory bond (RotB), rings, molecular weight (MW) and total surface area (TSA), were reserved by using the Variable Importance for Projection (VIP) values as criterion to build the final PLSR model. An external test set was employed to verify the QSRR based on the training set with the five variables, and QSRR by PLSR exhibited a satisfying predictive ability with R(p)=0.902 and RMSE(p)=0.400. Comparison of coefficients of centered and scaled variables by PLSR demonstrated that, for the descriptors studied, clogP and TSA have the most significant positive effect but the rotatable bond has significant negative effect on drug IAM chromatographic retention.
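The Variable Importance for Projection (VIP) criterion mentioned in the abstract above is usually computed from the PLS weight vectors and the response variance explained by each latent variable. The sketch below uses the common textbook formula as an assumption, since the abstract does not give the exact expression or cutoff used; the weights and sums of squares are made-up numbers.

```python
import math


def vip_scores(weights, explained_ss):
    """VIP per descriptor from PLS weights.

    weights: one weight vector per latent variable, each of length p.
    explained_ss: sum of squares of the response explained by each latent variable.
    """
    p = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(p):
        acc = 0.0
        for w, ss in zip(weights, explained_ss):
            norm_sq = sum(v * v for v in w)
            acc += ss * (w[j] ** 2) / norm_sq  # share of this latent variable carried by descriptor j
        scores.append(math.sqrt(p * acc / total_ss))
    return scores


# Two hypothetical latent variables over three hypothetical descriptors.
toy_weights = [[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]]
toy_explained = [3.0, 1.0]
toy_vip = vip_scores(toy_weights, toy_explained)
```

With this definition the squared VIP values sum to the number of descriptors, so their mean square is 1, which is why values near or above 1 are often read as above-average importance.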
Douglas, R K; Nawar, S; Alamar, M C; Mouazen, A M; Coulon, F
2018-03-01
Visible and near infrared spectrometry (vis-NIRS) coupled with data mining techniques can offer fast and cost-effective quantitative measurement of total petroleum hydrocarbons (TPH) in contaminated soils. The literature, however, shows significant differences in the performance of vis-NIRS between linear and non-linear calibration methods. This study compared the performance of linear partial least squares regression (PLSR) with a nonlinear random forest (RF) regression for the calibration of vis-NIRS when analysing TPH in soils. 88 soil samples (3 uncontaminated and 85 contaminated) collected from three sites located in the Niger Delta were scanned using an analytical spectral device (ASD) spectrophotometer (350-2500 nm) in diffuse reflectance mode. Sequential ultrasonic solvent extraction-gas chromatography (SUSE-GC) was used as the reference quantification method for TPH, which equals the sum of aliphatic and aromatic fractions ranging between C10 and C35. Prior to model development, spectra were subjected to pre-processing including noise cut, maximum normalization, first derivative and smoothing. Then 65 samples were selected as the calibration set and the remaining 20 samples as the validation set. Both vis-NIR spectrometry and gas chromatography profiles of the 85 soil samples were subjected to RF and PLSR with leave-one-out cross-validation (LOOCV) for the calibration models. Results showed that the RF calibration model, with a coefficient of determination (R2) of 0.85, a root mean square error of prediction (RMSEP) of 68.43 mg kg-1, and a residual prediction deviation (RPD) of 2.61, outperformed PLSR (R2 = 0.63, RMSEP = 107.54 mg kg-1 and RPD = 2.55) in cross-validation. These results indicate that the RF modelling approach accounts for the nonlinearity of the soil spectral responses, hence providing significantly higher prediction accuracy compared to the linear PLSR.
It is recommended to adopt the vis-NIRS coupled with RF modelling approach as a portable and cost effective method for the rapid quantification of TPH in soils. Copyright © 2017 Elsevier B.V. All rights reserved.
USDA-ARS's Scientific Manuscript database
Purpose: The aim of this study was to develop a technique for the non-destructive and rapid prediction of the moisture content in red pepper powder using near-infrared (NIR) spectroscopy and a partial least squares regression (PLSR) model. Methods: Three red pepper powder products were separated in...
[Measurement of soil organic matter and available K based on SPA-LS-SVM].
Zhang, Hai-Liang; Liu, Xue-Mei; He, Yong
2014-05-01
Visible and short wave infrared spectroscopy (Vis/SW-NIRS) was investigated in the present study for measurement of soil organic matter (OM) and available potassium (K). Four types of pretreatments including smoothing, SNV, MSC and SG smoothing+first derivative were adopted to eliminate the system noises and external disturbances. Then partial least squares regression (PLSR) and least squares-support vector machine (LS-SVM) models were implemented for calibration models. The LS-SVM model was built by using characteristic wavelengths based on successive projections algorithm (SPA). Simultaneously, the performance of LS-SVM models was compared with PLSR models. The results indicated that LS-SVM models using characteristic wavelengths as inputs based on SPA outperformed PLSR models. The optimal SPA-LS-SVM models were achieved, and the correlation coefficient (r) and RMSEP were 0.8602 and 2.98 for OM and 0.7305 and 15.78 for K, respectively. The results indicated that visible and short wave near infrared spectroscopy (Vis/SW-NIRS) (325-1075 nm) combined with LS-SVM based on SPA could be utilized as a precision method for the determination of soil properties.
Lee, Byeong-Ju; Zhou, Yaoyao; Lee, Jae Soung; Shin, Byeung Kon; Seo, Jeong-Ah; Lee, Doyup; Kim, Young-Suk
2018-01-01
The ability to determine the origin of soybeans is an important issue following the inclusion of this information in the labeling of agricultural food products becoming mandatory in South Korea in 2017. This study was carried out to construct a prediction model for discriminating Chinese and Korean soybeans using Fourier-transform infrared (FT-IR) spectroscopy and multivariate statistical analysis. The optimal prediction models for discriminating soybean samples were obtained by selecting appropriate scaling methods, normalization methods, variable influence on projection (VIP) cutoff values, and wave-number regions. The factors for constructing the optimal partial-least-squares regression (PLSR) prediction model were using second derivatives, vector normalization, unit variance scaling, and the 4000–400 cm–1 region (excluding water vapor and carbon dioxide). The PLSR model for discriminating Chinese and Korean soybean samples had the best predictability when a VIP cutoff value was not applied. When Chinese soybean samples were identified, a PLSR model that has the lowest root-mean-square error of the prediction value was obtained using a VIP cutoff value of 1.5. The optimal PLSR prediction model for discriminating Korean soybean samples was also obtained using a VIP cutoff value of 1.5. This is the first study that has combined FT-IR spectroscopy with normalization methods, VIP cutoff values, and selected wave-number regions for discriminating Chinese and Korean soybeans. PMID:29689113
The prediction of food additives in the fruit juice based on electronic nose with chemometrics.
Qiu, Shanshan; Wang, Jun
2017-09-01
Food additives are added to products to enhance their taste and to preserve flavor or appearance. While their use should be restricted to achieving a technological benefit, the contents of food additives should also be strictly controlled. In this study, an electronic nose (E-nose) was applied as an alternative to traditional monitoring technologies for determining two food additives, namely benzoic acid and chitosan. For quantitative monitoring, support vector machine (SVM), random forest (RF), extreme learning machine (ELM) and partial least squares regression (PLSR) were applied to establish regression models between E-nose signals and the amount of food additives in fruit juices. The monitoring models based on ELM and RF reached higher correlation coefficients (R2s) and lower root mean square errors (RMSEs) than models based on PLSR and SVM. This work indicates that E-nose combined with RF or ELM can be a cost-effective, easy-to-build and rapid detection system for food additive monitoring. Copyright © 2017 Elsevier Ltd. All rights reserved.
USDA-ARS's Scientific Manuscript database
Spectral scattering is useful for nondestructive sensing of fruit firmness. Prediction models, however, are typically built using multivariate statistical methods such as partial least squares regression (PLSR), whose performance generally depends on the characteristics of the data. The aim of this ...
Jackman, Patrick; Sun, Da-Wen; Elmasry, Gamal
2012-08-01
A new algorithm for the conversion of device dependent RGB colour data into device independent L*a*b* colour data without introducing noticeable error has been developed. By combining a linear colour space transform and advanced multiple regression methodologies it was possible to predict L*a*b* colour data with less than 2.2 colour units of error (CIE 1976). By transforming the red, green and blue colour components into new variables that better reflect the structure of the L*a*b* colour space, a low colour calibration error was immediately achieved (ΔE(CAL) = 14.1). Application of a range of regression models on the data further reduced the colour calibration error substantially (multilinear regression ΔE(CAL) = 5.4; response surface ΔE(CAL) = 2.9; PLSR ΔE(CAL) = 2.6; LASSO regression ΔE(CAL) = 2.1). Only the PLSR models deteriorated substantially under cross validation. The algorithm is adaptable and can be easily recalibrated to any working computer vision system. The algorithm was tested on a typical working laboratory computer vision system and delivered only a very marginal loss of colour information ΔE(CAL) = 2.35. Colour features derived on this system were able to safely discriminate between three classes of ham with 100% correct classification whereas colour features measured on a conventional colourimeter were not. Copyright © 2012 Elsevier Ltd. All rights reserved.
Xia, Qing; Liu, Changhong; Liu, Jinxia; Pan, Wenjuan; Lu, Xuzhong; Yang, Jianbo; Chen, Wei; Zheng, Lei
2016-03-30
Rancidity is an important attribute for quality assessment of butter cookies, while traditional methods for rancidity measurement are usually laborious, destructive and prone to operational error. In the present paper, the potential of applying multi-spectral imaging (MSI) technology with 19 wavelengths in the range of 405-970 nm to evaluate the rancidity in butter cookies was investigated. Moisture content, acid value and peroxide value were determined by traditional methods and then related with the spectral information by partial least squares regression (PLSR) and back-propagation artificial neural network (BP-ANN). The optimal models for predicting moisture content, acid value and peroxide value were obtained by PLSR. The correlation coefficient (r) obtained by PLSR models revealed that MSI had a perfect ability to predict moisture content (r = 0.909), acid value (r = 0.944) and peroxide value (r = 0.971). The study demonstrated that the rancidity level of butter cookies can be continuously monitored and evaluated in real-time by the multi-spectral imaging, which is of great significance for developing online food safety monitoring solutions. © 2015 Society of Chemical Industry.
Kuriakose, Saji; Joe, I Hubert
2013-11-01
Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC=0.00009% v/v). The lowest root mean square error of prediction (RMSEP=0.00016% v/v) in the test set and the highest coefficient of determination (R(2)=0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Arshad, Muhammad; Ullah, Saleem; Khurshid, Khurram; Ali, Asad
2017-10-01
Leaf Water Content (LWC) is an essential constituent of plant leaves that determines vegetation health and its productivity. An accurate and on-time measurement of water content is crucial for planning irrigation, forecasting drought and predicting woodland fire. The retrieval of LWC from Visible to Shortwave Infrared (VSWIR: 0.4-2.5 μm) has been extensively investigated, but little has been done in the Mid and Thermal Infrared (MIR and TIR: 2.50-14.0 μm) windows of the electromagnetic spectrum. This study is mainly focused on retrieval of LWC from Mid and Thermal Infrared, using a Genetic Algorithm integrated with Partial Least Square Regression (PLSR). The Genetic Algorithm fused with PLSR selects spectral wavebands with high predictive performance, i.e., bands that yield high adjusted-R2 and low RMSE. In our case, GA-PLSR selected eight variables (bands) and yielded highly accurate models with adjusted-R2 of 0.93 and RMSEcv equal to 7.1%. The study also demonstrated that MIR is more sensitive to the variation in LWC as compared to TIR. However, the combined use of MIR and TIR spectra enhances the predictive performance in retrieval of LWC. The integration of Genetic Algorithm and PLSR not only increases the estimation precision by selecting the most sensitive spectral bands but also helps in identifying the important spectral regions for quantifying water stresses in vegetation. The findings of this study will allow future space missions (like HyspIRI) to position wavebands at sensitive regions for characterizing vegetation stresses.
Soil-Bacterium Compatibility Model as a Decision-Making Tool for Soil Bioremediation.
Horemans, Benjamin; Breugelmans, Philip; Saeys, Wouter; Springael, Dirk
2017-02-07
Bioremediation of organic pollutant contaminated soil involving bioaugmentation with dedicated bacteria specialized in degrading the pollutant is suggested as a green and economically sound alternative to physico-chemical treatment. However, intrinsic soil characteristics impact the success of bioaugmentation. The feasibility of using partial least-squares regression (PLSR) to predict the success of bioaugmentation in contaminated soil based on the intrinsic physico-chemical soil characteristics and, hence, to improve the success of bioaugmentation, was examined. As a proof of principle, PLSR was used to build soil-bacterium compatibility models to predict the bioaugmentation success of the phenanthrene-degrading Novosphingobium sp. LH128. The survival and biodegradation activity of strain LH128 were measured in 20 soils and correlated with the soil characteristics. PLSR was able to predict the strain's survival using 12 variables or less while the PAH-degrading activity of strain LH128 in soils that show survival was predicted using 9 variables. A three-step approach using the developed soil-bacterium compatibility models is proposed as a decision making tool and first estimation to select compatible soils and organisms and increase the chance of success of bioaugmentation.
NASA Astrophysics Data System (ADS)
Paul, Andrea; Meyer, Klas; Ruiken, Jan-Paul; Illner, Markus; Müller, David-Nicolas; Esche, Erik; Wozny, Günther; Westad, Frank; Maiwald, Michael
2017-03-01
A major industrial reaction based on homogeneous catalysis is hydroformylation for the production of aldehydes from alkenes and syngas. Hydroformylation in microemulsions, which is currently under investigation at Technische Universität Berlin on a mini-plant scale, was identified as a cost efficient approach which also enhances product selectivity. Herein, we present the application of online Raman spectroscopy on the reaction of 1-dodecene to 1-tridecanal within a microemulsion. To achieve a good representation of the operation range in the mini-plant with regard to concentrations of the reactants a design of experiments was used. Based on initial Raman spectra partial least squares regression (PLSR) models were calibrated for the prediction of 1-dodecene and 1-tridecanal. Limits of predictions arise from nonlinear correlations between Raman intensity and mass fractions of compounds in the microemulsion system. Furthermore, the prediction power of PLSR models becomes limited due to unexpected by-product formation. Application of the lab-scale derived calibration spectra and PLSR models on online spectra from a mini-plant operation yielded promising estimations of 1-tridecanal and acceptable predictions of 1-dodecene mass fractions suggesting Raman spectroscopy as a suitable technique for process analytics in microemulsions.
Hyperspectral imaging using a color camera and its application for pathogen detection
NASA Astrophysics Data System (ADS)
Yoon, Seung-Chul; Shin, Tae-Sung; Heitschmidt, Gerald W.; Lawrence, Kurt C.; Park, Bosoon; Gamble, Gary
2015-02-01
This paper reports the results of a feasibility study for the development of a hyperspectral image recovery (reconstruction) technique using a RGB color camera and regression analysis in order to detect and classify colonies of foodborne pathogens. The target bacterial pathogens were the six representative non-O157 Shiga-toxin producing Escherichia coli (STEC) serogroups (O26, O45, O103, O111, O121, and O145) grown in Petri dishes of Rainbow agar. The purpose of the feasibility study was to evaluate whether a DSLR camera (Nikon D700) could be used to predict hyperspectral images in the wavelength range from 400 to 1,000 nm and even to predict the types of pathogens using a hyperspectral STEC classification algorithm that was previously developed. Unlike many other studies using color charts with known and noise-free spectra for training reconstruction models, this work used hyperspectral and color images, separately measured by a hyperspectral imaging spectrometer and the DSLR color camera. The color images were calibrated (i.e. normalized) to relative reflectance, subsampled and spatially registered to match with counterpart pixels in hyperspectral images that were also calibrated to relative reflectance. Polynomial multivariate least-squares regression (PMLR) was previously developed with simulated color images. In this study, partial least squares regression (PLSR) was also evaluated as a spectral recovery technique to minimize multicollinearity and overfitting. The two spectral recovery models (PMLR and PLSR) and their parameters were evaluated by cross-validation. The QR decomposition was used to find a numerically more stable solution of the regression equation. The preliminary results showed that PLSR was more effective especially with higher order polynomial regressions than PMLR. The best classification accuracy measured with an independent test set was about 90%. 
The results suggest the potential of cost-effective color imaging using hyperspectral image classification algorithms for rapidly differentiating pathogens in agar plates.
Liu, Wei; Wang, Zhen-Zhong; Qing, Jian-Ping; Li, Hong-Juan; Xiao, Wei
2014-01-01
Background: Peach kernels, which contain various fatty acids, play an important role in the regulation of a variety of physiological and biological functions. Objective: To establish an innovative and rapid diffuse reflectance near-infrared spectroscopy (DR-NIR) analysis method along with chemometric techniques for the qualitative and quantitative determination of a peach kernel. Materials and Methods: Peach kernel samples from nine different origins were analyzed with high-performance liquid chromatography (HPLC) as a reference method. The DR-NIR spectra covered the range 1100-2300 nm. Principal component analysis (PCA) and partial least squares regression (PLSR) algorithms were applied to obtain prediction models. The Savitzky-Golay derivative and first derivative were adopted for the spectral pre-processing, and PCA was applied to classify the varieties of those samples. For the quantitative calibration, the models of linoleic and oleinic acids were established with the PLSR algorithm and the optimal principal component (PC) numbers were selected with leave-one-out (LOO) cross-validation. The established models were evaluated with the root mean square error of deviation (RMSED) and corresponding correlation coefficients (R2). Results: The PCA results of DR-NIR spectra yield clear classification of the two varieties of peach kernel. PLSR had a better predictive ability. The correlation coefficients of the two calibration models were above 0.99, and the RMSED of linoleic and oleinic acids were 1.266% and 1.412%, respectively. Conclusion: DR-NIR combined with PCA and the PLSR algorithm could be used efficiently to identify and quantify peach kernels and also help to solve the variety problem. PMID:25422544
Feature reconstruction of LFP signals based on PLSR in the neural information decoding study.
Yonghui Dong; Zhigang Shang; Mengmeng Li; Xinyu Liu; Hong Wan
2017-07-01
To solve the problems of Signal-to-Noise Ratio (SNR) and multicollinearity when Local Field Potential (LFP) signals are used for the decoding of animal motion intention, a feature reconstruction of LFP signals based on partial least squares regression (PLSR) in the neural information decoding study is proposed in this paper. Firstly, the feature information of the LFP coding band is extracted based on wavelet transform. Then the PLSR model is constructed from the extracted LFP coding features. According to the multicollinearity characteristics among the coding features, several latent variables which contribute greatly to the steering behavior are obtained, and the new LFP coding features are reconstructed. Finally, the K-Nearest Neighbor (KNN) method is used to classify the reconstructed coding features to verify the decoding performance. The results show that the proposed method can achieve the highest accuracy compared to the other three methods and the decoding effect of the proposed method is robust.
Spatial assessment of soluble solid contents on apple slices using hyperspectral imaging
USDA-ARS's Scientific Manuscript database
A partial least squares regression (PLSR) model to map internal soluble solids content (SSC) of apples using visible/near-infrared (VNIR) hyperspectral imaging was developed. The reflectance spectra of sliced apples were extracted from hyperspectral absorbance images obtained in the 400-1000 nm rang...
Rapid analysis of pharmaceutical drugs using LIBS coupled with multivariate analysis.
Tiwari, P K; Awasthi, S; Kumar, R; Anand, R K; Rai, P K; Rai, A K
2018-02-01
Type 2 diabetes drug tablets containing voglibose having dose strengths of 0.2 and 0.3 mg of various brands have been examined, using laser-induced breakdown spectroscopy (LIBS) technique. The statistical methods such as the principal component analysis (PCA) and the partial least square regression analysis (PLSR) have been employed on LIBS spectral data for classifying and developing the calibration models of drug samples. We have developed the ratio-based calibration model applying PLSR in which relative spectral intensity ratios H/C, H/N and O/N are used. Further, the developed model has been employed to predict the relative concentration of element in unknown drug samples. The experiment has been performed in air and argon atmosphere, respectively, and the obtained results have been compared. The present model provides rapid spectroscopic method for drug analysis with high statistical significance for online control and measurement process in a wide variety of pharmaceutical industrial applications.
NASA Astrophysics Data System (ADS)
Vasat, Radim; Klement, Ales; Jaksik, Ondrej; Kodesova, Radka; Drabek, Ondrej; Boruvka, Lubos
2014-05-01
Visible and near-infrared diffuse reflectance spectroscopy (VNIR-DRS) provides a rapid and inexpensive tool for simultaneous prediction of a variety of soil properties. Usually, some sophisticated multivariate mathematical or statistical methods are employed in order to extract the required information from the raw spectra measurement. For this purpose especially the Partial least squares regression (PLSR) and Support vector machines (SVM) are the most frequently used. These methods generally benefit from the complexity with which the soil spectra are treated. But it is interesting that also techniques that focus only on a single spectral feature, such as a simple linear regression with selected continuum-removed spectra (CRS) characteristic (e.g. peak depth), can often provide competitive results. Therefore, we decided to enhance the potential of CRS taking into account all possible CRS peak parameters (area, width and depth) and develop a comprehensive methodology based on multiple linear regression approach. The eight considered soil properties were oxidizable carbon content (Cox), exchangeable (pHex) and active soil pH (pHa), particle and bulk density, CaCO3 content, crystalline and amorphous (Fed) and amorphous Fe (Feox) forms. In four cases (pHa, bulk density, Fed and Feox), of which two (Fed and Feox) were predicted reliably accurately (0.50 < R2cv < 0.80) and the other two (pHa and bulk density) only poorly (R2cv < 0.50), we obtained slightly better results than with PLSR and SVM. In one case (pHex) we achieved a significantly higher, although just reliable, accuracy (R2cv = 0.601) than with PLSR and SVM (R2cv = 0.448 and 0.442, resp.). But most interestingly, in the case of particle density, the presented approach outperformed the PLSR and SVM dramatically offering a fairly accurate prediction (R2cv = 0.827) against two failures (R2cv = 0.034 and 0.121 for PLSR and SVM, resp.). 
In the last two cases (Cox and CaCO3), slightly worse results were achieved than with PLSR and SVM, with overall fairly accurate prediction (R2cv > 0.80). Acknowledgment: Authors acknowledge the financial support of the Ministry of Agriculture of the Czech Republic (grant No. QJ1230319).
Estimation of water quality by UV/Vis spectrometry in the framework of treated wastewater reuse.
Carré, Erwan; Pérot, Jean; Jauzein, Vincent; Lin, Liming; Lopez-Ferber, Miguel
2017-07-01
The aim of this study is to investigate the potential of ultraviolet/visible (UV/Vis) spectrometry as a complementary method for routine monitoring of reclaimed water production. Robustness of the models and compliance of their sensitivity with current quality limits are investigated. The following indicators are studied: total suspended solids (TSS), turbidity, chemical oxygen demand (COD) and nitrate. Partial least squares regression (PLSR) is used to find linear correlations between absorbances and indicators of interest. Artificial samples are made by simulating a sludge leak on the wastewater treatment plant and added to the original dataset, then divided into calibration and prediction datasets. The models are built on the calibration set, and then tested on the prediction set. The best models are developed with: PLSR for COD (R²pred = 0.80), TSS (R²pred = 0.86) and turbidity (R²pred = 0.96), and with a simple linear regression from absorbance at 208 nm (R²pred = 0.95) for nitrate concentration. The input of artificial data significantly enhances the robustness of the models. The sensitivity of the UV/Vis spectrometry monitoring system developed is compatible with quality requirements of reclaimed water production processes.
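The calibration/prediction workflow described above, in its simplest form (a straight line fitted to a single absorbance and scored on held-out samples), might look like the sketch below. The absorbance and nitrate values are invented, the function names are hypothetical, and the abstract does not state how the authors implemented the regression.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(measured, predicted):
    """Coefficient of determination evaluated on the given samples."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: absorbance at a single wavelength vs. nitrate (mg/L).
cal_abs = [0.1, 0.2, 0.3, 0.4]          # calibration set
cal_nitrate = [5.0, 10.0, 15.0, 20.0]
pred_abs = [0.15, 0.25, 0.35]           # prediction set, never used for fitting
pred_nitrate = [7.0, 13.0, 17.5]

slope, intercept = fit_line(cal_abs, cal_nitrate)
predicted = [slope * a + intercept for a in pred_abs]
r2_pred = r2_score(pred_nitrate, predicted)
```

Keeping the prediction set out of the fit is what makes the prediction-set R² an estimate of performance on new samples rather than a measure of goodness of fit.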
Hattori, Yusuke; Otsuka, Makoto
2017-05-30
In the pharmaceutical industry, the implementation of continuous manufacturing has been widely promoted in lieu of the traditional batch manufacturing approach. More specifically, in recent years, the innovative concept of feed-forward control has been introduced in relation to process analytical technology. In the present study, we successfully developed a feed-forward control model for the tablet compression process by integrating data obtained from near-infrared (NIR) spectra and the physical properties of granules. In the pharmaceutical industry, batch manufacturing routinely allows for the preparation of granules with the desired properties through the manual control of process parameters. On the other hand, continuous manufacturing demands the automatic determination of these process parameters. Here, we proposed the development of a control model using the partial least squares regression (PLSR) method. The most significant feature of this method is the use of a dataset integrating both the NIR spectra and the physical properties of the granules. Using our model, we determined that the properties of products, such as tablet weight and thickness, need to be included as independent variables in the PLSR analysis in order to predict unknown process parameters. Copyright © 2017 Elsevier B.V. All rights reserved.
Qiu, Shanshan; Wang, Jun; Gao, Liping
2014-07-09
An electronic nose (E-nose) and an electronic tongue (E-tongue) have been used to characterize five types of strawberry juices based on processing approaches (i.e., microwave pasteurization, steam blanching, high temperature short time pasteurization, frozen-thawed, and freshly squeezed). Juice quality parameters (vitamin C, pH, total soluble solid, total acid, and sugar/acid ratio) were detected by traditional measuring methods. Multivariate statistical methods (linear discriminant analysis (LDA) and partial least squares regression (PLSR)) and neural networks (Random Forest (RF) and Support Vector Machines) were employed for qualitative classification and quantitative regression. The E-tongue system reached higher accuracy rates than the E-nose did, and the simultaneous utilization did have an advantage in LDA classification and PLSR regression. According to cross-validation, RF has shown outstanding and indisputable performances in the qualitative and quantitative analysis. This work indicates that the simultaneous utilization of E-nose and E-tongue can discriminate processed fruit juices and predict quality parameters successfully for the beverage industry.
NASA Astrophysics Data System (ADS)
Qu, Yonghua; Jiao, Siong; Lin, Xudong
2008-10-01
The Hetao Irrigation District, located in Inner Mongolia, is one of the three largest irrigated areas in China. In this irrigated agricultural region, more effort has been put into irrigation than into drainage; as a result, much salt that is usually dissolved in water has been deposited in the surface soil. Soil salinity has therefore become a chief factor causing land degradation in the district. Remote sensing is an efficient way to map salinity at the regional scale. In remote sensing, the soil spectrum is one of the most important indicators of the status of soil salinity. In past decades, many efforts have been made to reveal the spectral characteristics of salinized soil, for example with traditional statistical regression methods. However, traditional regression methods cannot handle hyperspectral reflectance data, because such data usually have too many spectral bands. In this paper, a partial least squares regression (PLSR) model was established based on statistical analysis of soil salinity and hyperspectral reflectance. The dataset was built from field soil samples collected in the Hetao irrigation region from the end of July to the beginning of August. Independent validation using data not included in the calibration model reveals that the proposed model can predict the main soil components, such as the content of total ions (S%) and pH, with determination coefficients (R2) of 0.728 and 0.715, respectively. The rate of prediction to deviation (RPD) of these predicted values is larger than 1.6, which indicates that the calibrated PLSR model can be used as a tool to retrieve soil salinity with accurate results. 
When the PLSR model's regression coefficients were aggregated according to the wavelengths of the visible (blue, green, red) and near-infrared bands of the Landsat Thematic Mapper (TM) sensor, some significant response values were observed, which indicates that the proposed method can be used to analyze remotely sensed data from spaceborne platforms.
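The abstract quotes an RPD above 1.6 without giving the formula. As the term is commonly used in chemometrics, RPD is the standard deviation of the reference values in the validation set divided by the RMSEP; the sketch below assumes that definition (with the sample standard deviation) and uses invented numbers.

```python
import math
import statistics


def rpd(measured, predicted):
    """Standard deviation of the reference values divided by the RMSEP.

    Assumption: sample standard deviation (n - 1 denominator); some
    authors use the population form instead.
    """
    n = len(measured)
    rmsep = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    return statistics.stdev(measured) / rmsep


# Hypothetical validation values, invented for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.5, 1.5, 3.5, 3.5, 5.0]
rpd_value = rpd(reference, estimated)
```

Under this definition an RPD of 1 means the model predicts no better than the mean of the reference values, which is why thresholds above 1 are used to judge usefulness.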
Zhang, Ni; Liu, Xu; Jin, Xiaoduo; Li, Chen; Wu, Xuan; Yang, Shuqin; Ning, Jifeng; Yanne, Paul
2017-12-15
Phenolics contents in wine grapes are key indicators for assessing ripeness. Near-infrared hyperspectral images during ripening have been explored to achieve an effective method for predicting phenolics contents. Principal component regression (PCR), partial least squares regression (PLSR) and support vector regression (SVR) models were built, respectively. The results show that SVR behaves globally better than PLSR and PCR, except in predicting tannins content of seeds. For the best prediction results, the squared correlation coefficient and root mean square error reached 0.8960 and 0.1069 g/L (+)-catechin equivalents (CE), respectively, for tannins in skins, 0.9065 and 0.1776 (g/L CE) for total iron-reactive phenolics (TIRP) in skins, 0.8789 and 0.1442 (g/L M3G) for anthocyanins in skins, 0.9243 and 0.2401 (g/L CE) for tannins in seeds, and 0.8790 and 0.5190 (g/L CE) for TIRP in seeds. Our results indicated that NIR hyperspectral imaging has good prospects for evaluation of phenolics in wine grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Princz, S.; Wenzel, U.; Miller, R.; Hessling, M.
2014-11-01
One aerobic and four anaerobic batch fermentations of the yeast Saccharomyces cerevisiae were conducted in a stirred bioreactor and monitored inline by NIR spectroscopy and a transflectance dip probe. From the acquired NIR spectra, chemometric partial least squares regression (PLSR) models for predicting biomass, glucose and ethanol were constructed. The spectra were directly measured in the fermentation broth and successfully inspected for adulteration using our novel data pre-processing method. These adulterations manifested as strong fluctuations in the shape and offset of the absorption spectra. They resulted from cells, cell clusters, or gas bubbles intercepting the optical path of the dip probe. In the proposed data pre-processing method, adulterated signals are removed by passing the time-scanned non-averaged spectra through two filter algorithms with a 5% quantile cutoff. The filtered spectra containing meaningful data are then averaged. A second step checks whether the whole time scan is analyzable. If true, the average is calculated and used to prepare the PLSR models. This new method distinctly improved the prediction results. To dissociate possible correlations between analyte concentrations, such as glucose and ethanol, the feeding analytes were alternately supplied at different concentrations (spiking) at the end of the four anaerobic fermentations. This procedure yielded low-error (anaerobic) PLSR models for predicting analyte concentrations of 0.31 g/l for biomass, 3.41 g/l for glucose, and 2.17 g/l for ethanol. The maximum concentrations were 14 g/l biomass, 167 g/l glucose, and 80 g/l ethanol. Data from the aerobic fermentation, carried out under high agitation and high aeration, were incorporated to realize combined PLSR models, which have not been previously reported to our knowledge.
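The abstract does not specify the two filter algorithms, so the following is only one plausible reading of "two filters with a 5% quantile cutoff followed by averaging". It assumes one filter on spectral offset and one on spectral shape, each discarding the most deviant 5% of the non-averaged scans; all function names and the toy scans are hypothetical, and the second-step check on whether a whole time scan is analyzable is not modelled.

```python
import statistics


def drop_worst(spectra, scores, percent):
    """Remove the spectra with the highest scores (top `percent` %, at least one)."""
    n_drop = max(1, (len(spectra) * percent + 99) // 100)  # integer ceiling
    order = sorted(range(len(spectra)), key=lambda i: scores[i])
    keep = sorted(order[:len(spectra) - n_drop])
    return [spectra[i] for i in keep]


def offset_scores(spectra):
    """Distance of each spectrum's mean absorbance from the median of those means."""
    means = [statistics.fmean(s) for s in spectra]
    centre = statistics.median(means)
    return [abs(m - centre) for m in means]


def shape_scores(spectra):
    """Summed distance from the channel-wise median shape after removing each offset."""
    centred = [[v - statistics.fmean(s) for v in s] for s in spectra]
    median_shape = [statistics.median(col) for col in zip(*centred)]
    return [sum(abs(v - m) for v, m in zip(s, median_shape)) for s in centred]


def filtered_average(spectra, percent=5):
    """Apply the offset filter, then the shape filter, then average what is left."""
    kept = drop_worst(spectra, offset_scores(spectra), percent)
    kept = drop_worst(kept, shape_scores(kept), percent)
    return [statistics.fmean(col) for col in zip(*kept)]


# Ten toy scans with three channels: eight clean, one offset jump, one distorted shape.
scans = [[1.0, 2.0, 3.0] for _ in range(8)] + [[6.0, 7.0, 8.0], [1.0, 5.0, 0.0]]
clean_mean = filtered_average(scans)
```

A fixed-fraction cutoff always discards some scans, even from an undisturbed measurement, which is a trade-off of any quantile-based rejection rule.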
Naguib, Ibrahim A; Abdelrahman, Maha M; El Ghobashy, Mohamed R; Ali, Nesma A
2016-01-01
Two accurate, sensitive, and selective stability-indicating methods are developed and validated for simultaneous quantitative determination of agomelatine (AGM) and its forced degradation products (Deg I and Deg II), whether in pure forms or in pharmaceutical formulations. Partial least-squares regression (PLSR) and spectral residual augmented classical least-squares (SRACLS) are two chemometric models that are being subjected to a comparative study through handling UV spectral data in range (215-350 nm). For proper analysis, a three-factor, four-level experimental design was established, resulting in a training set consisting of 16 mixtures containing different ratios of interfering species. An independent test set consisting of eight mixtures was used to validate the prediction ability of the suggested models. The results presented indicate the ability of mentioned multivariate calibration models to analyze AGM, Deg I, and Deg II with high selectivity and accuracy. The analysis results of the pharmaceutical formulations were statistically compared to the reference HPLC method, with no significant differences observed regarding accuracy and precision. The SRACLS model gives comparable results to the PLSR model; however, it keeps the qualitative spectral information of the classical least-squares algorithm for analyzed components.
Chen, Baisheng; Wu, Huanan; Li, Sam Fong Yau
2014-03-01
To overcome the challenging task of selecting an appropriate pathlength for wastewater chemical oxygen demand (COD) monitoring with high accuracy by UV-vis spectroscopy in the wastewater treatment process, a variable pathlength approach combined with partial least squares regression (PLSR) was developed in this study. Two new strategies were proposed to extract relevant information of UV-vis spectral data from variable pathlength measurements. The first strategy was by data fusion with two data fusion levels: low-level data fusion (LLDF) and mid-level data fusion (MLDF). Predictive accuracy was found to improve, indicated by the lower root-mean-square errors of prediction (RMSEP) compared with those obtained for single pathlength measurements. Both fusion levels were found to deliver very robust PLSR models with residual predictive deviations (RPD) greater than 3 (i.e. 3.22 and 3.29, respectively). The second strategy involved calculating the slopes of absorbance against pathlength at each wavelength to generate slope-derived spectra. Without the requirement to select the optimal pathlength, the predictive accuracy (RMSEP) was improved by 20-43% as compared to single pathlength spectroscopy. Compared to the nine-factor models from the fusion strategy, the PLSR model from slope-derived spectroscopy was found to be more parsimonious with only five factors and more robust with a residual predictive deviation (RPD) of 3.72. It also offered excellent correlation of predicted and measured COD values with R² of 0.936. In sum, variable pathlength spectroscopy with the two proposed data analysis strategies proved to be successful in enhancing prediction performance of COD in wastewater and showed high potential to be applied in on-line water quality monitoring. Copyright © 2013 Elsevier B.V. All rights reserved.
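The slope-derived spectrum in the second strategy can be illustrated with a per-wavelength least-squares fit of absorbance against pathlength. This is a sketch under the assumption of an ordinary straight-line fit; the pathlengths, absorbances and function name are invented, and the abstract does not say how the slopes were actually computed.

```python
def slope_spectrum(pathlengths, absorbance_by_pathlength):
    """Least-squares slope of absorbance versus pathlength at each wavelength.

    `absorbance_by_pathlength` holds one spectrum (list of absorbances)
    per pathlength, all sampled at the same wavelengths.
    """
    n = len(pathlengths)
    mean_l = sum(pathlengths) / n
    sxx = sum((l - mean_l) ** 2 for l in pathlengths)
    slopes = []
    for column in zip(*absorbance_by_pathlength):  # one column per wavelength
        mean_a = sum(column) / n
        sxy = sum((l - mean_l) * (a - mean_a) for l, a in zip(pathlengths, column))
        slopes.append(sxy / sxx)
    return slopes


# Hypothetical measurement: three pathlengths (mm), two wavelengths.
# Each wavelength follows A = offset + slope * pathlength.
pathlengths_mm = [1.0, 2.0, 4.0]
spectra = [
    [0.15, 0.27],  # 1 mm
    [0.25, 0.52],  # 2 mm
    [0.45, 1.02],  # 4 mm
]
slopes = slope_spectrum(pathlengths_mm, spectra)
```

Because the Beer-Lambert law makes absorbance proportional to pathlength, the slope carries the concentration information while a constant baseline offset drops out of the fit.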
Tchabo, William; Ma, Yongkun; Kwaw, Emmanuel; Zhang, Haining; Xiao, Lulu; Tahir, Haroon Elrasheid
2017-10-01
The present study was undertaken to assess accelerating aging effects of high pressure, ultrasound and manosonication on the aromatic profile and sensorial attributes of aged mulberry wines (AMW). A total of 166 volatile compounds were found amongst the AMW. The outcomes of the investigation were presented by means of geometric mean (GM), cluster analysis (CA), principal component analysis (PCA), partial least squares regressions (PLSR) and principal component regression (PCR). GM highlighted 24 organoleptic attributes responsible for the sensorial profile of the AMW. Moreover, CA revealed that the volatile composition of the non-thermal accelerated aged wines differs from that of the conventional aged wines. Besides, PCA discriminated the AMW on the basis of their main sensorial characteristics. Furthermore, PLSR identified 75 aroma compounds which were mainly responsible for the olfactory notes of the AMW. Finally, the overall quality of the AMW was noted to be better predicted by PLSR than PCR. Copyright © 2017 Elsevier Ltd. All rights reserved.
Oberg, Tomas
2004-01-01
Halogenated aliphatic compounds have many technical uses, but substances within this group are also ubiquitous environmental pollutants that can affect the ozone layer and contribute to global warming. The establishment of quantitative structure-property relationships is of interest not only to fill in gaps in the available database but also to validate experimental data already acquired. The three-dimensional structures of 240 compounds were modeled with molecular mechanics prior to the generation of empirical descriptors. Two bilinear projection methods, principal component analysis (PCA) and partial-least-squares regression (PLSR), were used to identify outliers. PLSR was subsequently used to build a multivariate calibration model by extracting the latent variables that describe most of the covariation between the molecular structure and the boiling point. Boiling points were also estimated with an extension of the group contribution method of Stein and Brown.
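The extraction of latent variables that capture the covariation between descriptors and a single response can be sketched with a NIPALS-style PLS1 routine. This is a generic textbook formulation written for illustration, not the software used in the study; the function names and the toy descriptor matrix are assumptions.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response) by successive deflation of centred X and y."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residual
    f = [v - y_mean for v in y]                                # y residual
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in X that covaries with y
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample by replaying the deflation."""
    x_mean, y_mean, components = model
    e = [xi - mi for xi, mi in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ei * wi for ei, wi in zip(e, w))
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat


# Toy data: two descriptors, response y = 2*x1 - x2 + 1 (invented).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [1.0, 4.0, 3.0, 6.0, 5.0]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [6.0, 5.0])
```

With as many components as X has independent columns, PLS1 reproduces the ordinary least-squares fit; in practice fewer components are kept, chosen by cross-validation, which is what gives the method its tolerance to collinear descriptors.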
Computerized pigment design based on property hypersurfaces
NASA Astrophysics Data System (ADS)
Halova, Jaroslava; Sulcova, Petra; Kupka, Karel
2007-05-01
Competition is tough in the pigment market. Rational pigment design has therefore a competitive advantage, saving time and money. The aim of this work is to provide methods that can assist in designing pigments with defined properties. These methods include partial least squares regression (PLSR), neural network (NN) and generalized regression ANOVA model. Authors show how PLS bi-plot can be used to identify market gaps poorly covered by pigment manufacturers, thus giving an opportunity to develop pigments with potentially profitable properties.
NASA Astrophysics Data System (ADS)
Yan, B.; Fang, N. F.; Zhang, P. C.; Shi, Z. H.
2013-03-01
Understanding how changes in individual land use types influence the dynamics of streamflow and sediment yield would greatly improve the predictability of the hydrological consequences of land use changes and could thus help stakeholders to make better decisions. Multivariate statistics are commonly used to compare individual land use types to control the dynamics of streamflow or sediment yields. However, one issue with the use of conventional statistical methods to address relationships between land use types and streamflow or sediment yield is multicollinearity. In this study, an integrated approach involving hydrological modelling and partial least squares regression (PLSR) was used to quantify the contributions of changes in individual land use types to changes in streamflow and sediment yield. In a case study, hydrological modelling was conducted using land use maps from four time periods (1978, 1987, 1999, and 2007) for the Upper Du watershed (8973 km²) in China using the Soil and Water Assessment Tool (SWAT). Changes in streamflow and sediment yield across the two simulations conducted using the land use maps from 2007 and 1978 were found to be related to land use changes according to a PLSR, which was used to quantify the effect of this influence at the sub-basin scale. The major land use changes that affected streamflow in the studied catchment areas were related to changes in the farmland, forest and urban areas between 1978 and 2007; the corresponding regression coefficients were 0.232, -0.147 and 1.256, respectively, and the Variable Influence on Projection (VIP) was greater than 1. The dominant first-order factors affecting the changes in sediment yield in our study were: farmland (the VIP and regression coefficient were 1.762 and 14.343, respectively) and forest (the VIP and regression coefficient were 1.517 and -7.746, respectively). 
The PLSR methodology presented in this paper is beneficial and novel, as it partially eliminates the co-dependency of the variables and facilitates a more unbiased view of the contribution of the changes in individual land use types to changes in streamflow and sediment yield. This practicable and simple approach could be applied to a variety of other watersheds for which time-sequenced digital land use maps are available.
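The VIP statistic quoted in the abstract above can be illustrated with a minimal sketch. It assumes a single response (PLS1) fitted with the NIPALS algorithm and the usual VIP definition based on the y-variance explained by each component. The function name `pls1_vip` and the toy data are hypothetical and are not taken from the study, which does not state its implementation.

```python
import math

def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS; return (weight vectors, VIP score per predictor).

    X is a list of rows (samples x predictors), y a list of responses.
    Hypothetical helper for illustration only.
    """
    n, p = len(X), len(X[0])
    # Mean-centre the predictors and the response.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]

    W, ss = [], []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        # Scores, X loadings and y loading for this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        ss.append(q * q * tt)  # y sum of squares explained by this component

    total = sum(ss)
    vip = [
        math.sqrt(p * sum(ss[a] * W[a][j] ** 2 for a in range(len(W))) / total)
        for j in range(p)
    ]
    return W, vip

# Toy data (made up): columns 0 and 1 are nearly collinear, column 2 is unrelated.
X = [[1, 1.1, 2], [2, 1.9, 1], [3, 3.1, 2], [4, 3.9, 1], [5, 5.1, 2], [6, 5.9, 1]]
y = [2, 4, 6, 8, 10, 12]
weights, vip = pls1_vip(X, y, n_components=2)
print([round(v, 3) for v in vip])
```

Because each weight vector has unit length, the squared VIP scores sum to the number of predictors, which is why a VIP above 1 is commonly read as above-average importance.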
Estimation of Nitrogen Vertical Distribution by Bi-Directional Canopy Reflectance in Winter Wheat
Huang, Wenjiang; Yang, Qinying; Pu, Ruiliang; Yang, Shaoyuan
2014-01-01
Timely measurement of vertical foliage nitrogen distribution is critical for increasing crop yield and reducing environmental impact. In this study, a novel method with partial least square regression (PLSR) and vegetation indices was developed to determine optimal models for extracting vertical foliage nitrogen distribution of winter wheat by using bi-directional reflectance distribution function (BRDF) data. The BRDF data were collected from ground-based hyperspectral reflectance measurements recorded at the Xiaotangshan Precision Agriculture Experimental Base in 2003, 2004 and 2007. The view zenith angles (1) at nadir, 40° and 50°; (2) at nadir, 30° and 40°; and (3) at nadir, 20° and 30° were selected as optical view angles to estimate foliage nitrogen density (FND) at an upper, middle and bottom layer, respectively. For each layer, three optimal PLSR analysis models with FND as a dependent variable and two vegetation indices (nitrogen reflectance index (NRI), normalized pigment chlorophyll index (NPCI) or a combination of NRI and NPCI) at corresponding angles as explanatory variables were established. The experimental results from an independent model verification demonstrated that the PLSR analysis models with the combination of NRI and NPCI as the explanatory variables were the most accurate in estimating FND for each layer. The coefficients of determination (R2) of this model between upper layer-, middle layer- and bottom layer-derived and laboratory-measured foliage nitrogen density were 0.7335, 0.7336, 0.6746, respectively. PMID:25353983
Estimation of nitrogen vertical distribution by bi-directional canopy reflectance in winter wheat.
Huang, Wenjiang; Yang, Qinying; Pu, Ruiliang; Yang, Shaoyuan
2014-10-28
Timely measurement of vertical foliage nitrogen distribution is critical for increasing crop yield and reducing environmental impact. In this study, a novel method with partial least square regression (PLSR) and vegetation indices was developed to determine optimal models for extracting vertical foliage nitrogen distribution of winter wheat by using bi-directional reflectance distribution function (BRDF) data. The BRDF data were collected from ground-based hyperspectral reflectance measurements recorded at the Xiaotangshan Precision Agriculture Experimental Base in 2003, 2004 and 2007. The view zenith angles (1) at nadir, 40° and 50°; (2) at nadir, 30° and 40°; and (3) at nadir, 20° and 30° were selected as optical view angles to estimate foliage nitrogen density (FND) at an upper, middle and bottom layer, respectively. For each layer, three optimal PLSR analysis models with FND as a dependent variable and two vegetation indices (nitrogen reflectance index (NRI), normalized pigment chlorophyll index (NPCI) or a combination of NRI and NPCI) at corresponding angles as explanatory variables were established. The experimental results from an independent model verification demonstrated that the PLSR analysis models with the combination of NRI and NPCI as the explanatory variables were the most accurate in estimating FND for each layer. The coefficients of determination (R2) of this model between upper layer-, middle layer- and bottom layer-derived and laboratory-measured foliage nitrogen density were 0.7335, 0.7336, 0.6746, respectively.
Liang, Ningjian; Lu, Xiaonan; Hu, Yaxi; Kitts, David D
2016-01-27
The chlorogenic acid isomer profile and antioxidant activity of both green and roasted coffee beans are reported herein using ATR-FTIR spectroscopy combined with chemometric analyses. High-performance liquid chromatography (HPLC) quantified different chlorogenic acid isomer contents for reference, whereas ORAC, ABTS, and DPPH were used to determine the antioxidant activity of the same coffee bean extracts. FTIR spectral data and reference data of 42 coffee bean samples were processed to build optimized PLSR models, and 18 samples were used for external validation of constructed PLSR models. In total, six PLSR models were constructed for six chlorogenic acid isomers to predict content, with three PLSR models constructed to forecast the free radical scavenging activities, obtained using different chemical assays. In conclusion, FTIR spectroscopy, coupled with PLSR, serves as a reliable, nondestructive, and rapid analytical method to quantify chlorogenic acids and to assess different free radical-scavenging capacities in coffee beans.
NASA Astrophysics Data System (ADS)
Braga, Jez Willian Batista; Trevizan, Lilian Cristina; Nunes, Lidiane Cristina; Rufini, Iolanda Aparecida; Santos, Dário, Jr.; Krug, Francisco José
2010-01-01
The application of laser induced breakdown spectrometry (LIBS) to the direct analysis of plant materials is a great challenge that still needs efforts for its development and validation. To this end, a series of experimental approaches has been carried out in order to show that LIBS can be used as an alternative to methods based on wet acid digestion for the analysis of agricultural and environmental samples. The large amount of information provided by LIBS spectra for these complex samples increases the difficulty of selecting the most appropriate wavelengths for each analyte. Some applications have suggested that improvements in both accuracy and precision can be achieved by applying multivariate calibration to LIBS data when compared to univariate regression developed with line emission intensities. In the present work, the performance of univariate and multivariate calibration, based on partial least squares regression (PLSR), was compared for analysis of pellets of plant materials made from an appropriate mixture of cryogenically ground samples with cellulose as the binding agent. The development of a specific PLSR model for each analyte and the selection of spectral regions containing only lines of the analyte of interest were the best conditions for the analysis. In this particular application, these models showed a similar performance, but PLSR seemed to be more robust due to a lower occurrence of outliers in comparison to the univariate method. The data suggest that efforts dealing with sample presentation and fitness of standards for LIBS analysis must be made in order to fulfill the boundary conditions for matrix independent development and validation.
Multivariate analysis of ATR-FTIR spectra for assessment of oil shale organic geochemical properties
Washburn, Kathryn E.; Birdwell, Justin E.
2013-01-01
In this study, attenuated total reflectance (ATR) Fourier transform infrared spectroscopy (FTIR) was coupled with partial least squares regression (PLSR) analysis to relate spectral data to parameters from total organic carbon (TOC) analysis and programmed pyrolysis to assess the feasibility of developing predictive models to estimate important organic geochemical parameters. The advantage of ATR-FTIR over traditional analytical methods is that source rocks can be analyzed in the laboratory or field in seconds, facilitating more rapid and thorough screening than would be possible using other tools. ATR-FTIR spectra, TOC concentrations and Rock–Eval parameters were measured for a set of oil shales from deposits around the world and several pyrolyzed oil shale samples. PLSR models were developed to predict the measured geochemical parameters from infrared spectra. Application of the resulting models to a set of test spectra excluded from the training set generated accurate predictions of TOC and most Rock–Eval parameters. The critical region of the infrared spectrum for assessing S1, S2, Hydrogen Index and TOC consisted of aliphatic organic moieties (2800–3000 cm−1) and the models generated a better correlation with measured values of TOC and S2 than did integrated aliphatic peak areas. The results suggest that combining ATR-FTIR with PLSR is a reliable approach for estimating useful geochemical parameters of oil shales that is faster and requires less sample preparation than current screening methods.
NASA Astrophysics Data System (ADS)
Arantes Camargo, Livia; Marques, José, Jr.
2015-04-01
The prediction of erodibility using indirect methods such as diffuse reflectance spectroscopy could facilitate the characterization of spatial variability over large areas and optimize the implementation of conservation practices. The aim of this study was to evaluate the prediction of interrill erodibility (Ki) and rill erodibility (Kr) from iron oxide content and soil color using multiple linear regression, and from diffuse reflectance spectroscopy (DRS) using partial least squares regression (PLSR). The soils were collected from three geomorphic surfaces, analyzed for chemical, physical and mineralogical properties, and scanned in the visible and infrared spectral range. Maps of the spatial distribution of Ki and Kr were built using geostatistics with the values calculated by the calibrated models that obtained the best accuracy. Interrill and rill erodibility were negatively correlated with iron extracted by dithionite-citrate-bicarbonate, hematite, and chroma, confirming the influence of iron oxides on soil structural stability. Hematite and hue were the attributes that contributed most to the multiple linear regression calibration models for the prediction of Ki (R² = 0.55) and Kr (R² = 0.53). Diffuse reflectance spectroscopy via PLSR predicted interrill and rill erodibility with high accuracy (adjusted R² = 0.76 and 0.81, respectively, and RPD > 2.0) in the visible range of the spectrum (380-800 nm) and allowed the spatial variability of these attributes to be characterized by geostatistics.
Radioecological modelling of Polonium-210 and Caesium-137 in lichen-reindeer-man and top predators.
Persson, Bertil R R; Gjelsvik, Runhild; Holm, Elis
2018-06-01
This work deals with analysis and modelling of the radionuclides ²¹⁰Pb and ²¹⁰Po in the food-chain lichen-reindeer-man, in addition to ²¹⁰Po and ¹³⁷Cs in top predators. By using the methods of Partial Least Square Regression (PLSR), the atmospheric deposition of ²¹⁰Pb and ²¹⁰Po is predicted at the sample locations. Dynamic modelling of the activity concentration with differential equations is fitted to the sample data. Reindeer lichen consumption, gastrointestinal absorption, organ distribution and elimination are derived from information in the literature. Dynamic modelling of the transfer of ²¹⁰Pb and ²¹⁰Po to reindeer meat, liver and bone from lichen consumption fitted well with data from Sweden and Finland from 1966 to 1971. The activity concentration of ²¹⁰Pb in the human skeleton is modelled by using the results of studying the kinetics of lead in skeleton and blood in lead-workers after the end of occupational exposure. The result of modelling ²¹⁰Pb and ²¹⁰Po activity in the skeleton matched well with concentrations of ²¹⁰Pb and ²¹⁰Po in teeth from reindeer-breeders and autopsy bone samples in Finland. The previously published results for ²¹⁰Po and ¹³⁷Cs in different tissues of wolf, wolverine and lynx are analysed with multivariate data processing methods such as Principal Component Analysis (PCA), and modelled with the method of Projection to Latent Structures (PLS), or Partial Least Square Regression (PLSR). Copyright © 2017 Elsevier Ltd. All rights reserved.
Fusing face-verification algorithms and humans.
O'Toole, Alice J; Abdi, Hervé; Jiang, Fang; Phillips, P Jonathon
2007-10-01
It has been demonstrated recently that state-of-the-art face-recognition algorithms can surpass human accuracy at matching faces over changes in illumination. The ranking of algorithms and humans by accuracy, however, does not provide information about whether algorithms and humans perform the task comparably or whether algorithms and humans can be fused to improve performance. In this paper, we fused humans and algorithms using partial least square regression (PLSR). In the first experiment, we applied PLSR to face-pair similarity scores generated by seven algorithms participating in the Face Recognition Grand Challenge. The PLSR produced an optimal weighting of the similarity scores, which we tested for generality with a jackknife procedure. Fusing the algorithms' similarity scores using the optimal weights produced a twofold reduction of error rate over the most accurate algorithm. Next, human-subject-generated similarity scores were added to the PLSR analysis. Fusing humans and algorithms increased the performance to near-perfect classification accuracy. These results are discussed in terms of maximizing face-verification accuracy with hybrid systems consisting of multiple algorithms and humans.
Palomba, M. Lia; Piersanti, Kelly; Ziegler, Carly G. K.; Decker, Hugo; Cotari, Jesse W.; Bantilan, Kurt; Rijo, Ivelise; Gardner, Jeff R.; Heaney, Mark; Bemis, Debra; Balderas, Robert; Malek, Sami N.; Seymour, Erlene; Zelenetz, Andrew D.
2014-01-01
Purpose: Chronic Lymphocytic Leukemia (CLL) is defined by a perturbed B-cell receptor-mediated signaling machinery. We aimed to model differential signaling behavior between B cells from CLL and healthy individuals to pinpoint modes of dysregulation. Experimental Design: We developed an experimental methodology combining immunophenotyping, multiplexed phosphospecific flow cytometry, and multifactorial statistical modeling. Utilizing patterns of signaling network covariance, we modeled BCR signaling in 67 CLL patients using Partial Least Squares Regression (PLSR). Results from multidimensional modeling were validated using an independent test cohort of 38 patients. Results: We identified a dynamic and variable imbalance between proximal (pSYK, pBTK) and distal (pPLCγ2, pBLNK, ppERK) phosphoresponses. PLSR identified the relationship between upstream tyrosine kinase SYK and its target, PLCγ2, as maximally predictive and sufficient to distinguish CLL from healthy samples, pointing to this juncture in the signaling pathway as a hallmark of CLL B cells. Specific BCR pathway signaling signatures that correlate with the disease and its degree of aggressiveness were identified. Heterogeneity in the PLSR response variable within the B cell population is both a characteristic mark of healthy samples and predictive of disease aggressiveness. Conclusion: Single-cell multidimensional analysis of BCR signaling permitted focused analysis of the variability and heterogeneity of signaling behavior from patient-to-patient, and from cell-to-cell. Disruption of the pSYK/pPLCγ2 relationship is uncovered as a robust hallmark of CLL B cell signaling behavior. Together, these observations implicate novel elements of the BCR signal transduction as potential therapeutic targets. PMID:24489640
2009-01-01
Background: Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle. Methods: Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls. Results: For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy. All methods resulted in biased MBV predictions for ASI; for PPT, only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree based predictions gave 1.05 - 1.34 times higher accuracies compared to predictions based on pedigree alone.
Some methods have largely different computational requirements, with PLSR and RR-BLUP requiring the least computing time. Conclusions: The four methods which use information from all SNP, namely RR-BLUP, Bayes-R, PLSR and SVR, generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended. PMID:20043835
NASA Astrophysics Data System (ADS)
Chen, Hui; Tan, Chao; Lin, Zan; Wu, Tong
2018-01-01
Milk is among the most popular nutrient sources worldwide and is of great interest due to its beneficial medicinal properties. The feasibility of classifying milk powder samples with respect to their brands and of determining protein concentration is investigated by NIR spectroscopy along with chemometrics. Two datasets were prepared for the experiments. One contains 179 samples of four brands for classification and the other contains 30 samples for quantitative analysis. Principal component analysis (PCA) was used for exploratory analysis. Based on an effective model-independent variable selection method, i.e., minimal-redundancy maximal-relevance (MRMR), only 18 variables were selected to construct a partial least-square discriminant analysis (PLS-DA) model. On the test set, the PLS-DA model based on the selected variable set was compared with the full-spectrum PLS-DA model, both of which achieved 100% accuracy. In quantitative analysis, the partial least-square regression (PLSR) model constructed from the selected subset of 260 variables significantly outperforms the full-spectrum model. It seems that the combination of NIR spectroscopy, MRMR and PLS-DA or PLSR is a powerful tool for classifying different brands of milk and determining the protein content.
NASA Astrophysics Data System (ADS)
Lorenzi, Marco; Simpson, Ivor J.; Mendelson, Alex F.; Vos, Sjoerd B.; Cardoso, M. Jorge; Modat, Marc; Schott, Jonathan M.; Ourselin, Sebastien
2016-04-01
The joint analysis of brain atrophy measured with magnetic resonance imaging (MRI) and hypometabolism measured with positron emission tomography with fluorodeoxyglucose (FDG-PET) is of primary importance in developing models of pathological changes in Alzheimer’s disease (AD). Most of the current multimodal analyses in AD assume a local (spatially overlapping) relationship between MR and FDG-PET intensities. However, it is well known that atrophy and hypometabolism are prominent in different anatomical areas. The aim of this work is to describe the relationship between atrophy and hypometabolism by means of a data-driven statistical model of non-overlapping intensity correlations. For this purpose, FDG-PET and MRI signals are jointly analyzed through a computationally tractable formulation of partial least squares regression (PLSR). The PLSR model is estimated and validated on a large clinical cohort of 1049 individuals from the ADNI dataset. Results show that the proposed non-local analysis outperforms classical local approaches in terms of predictive accuracy while providing a plausible description of disease dynamics: early AD is characterised by non-overlapping temporal atrophy and temporo-parietal hypometabolism, while the later disease stages show overlapping brain atrophy and hypometabolism spread in temporal, parietal and cortical areas.
Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston
2016-10-28
Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
Madhavan, Dinesh B; Baldock, Jeff A; Read, Zoe J; Murphy, Simon C; Cunningham, Shaun C; Perring, Michael P; Herrmann, Tim; Lewis, Tom; Cavagnaro, Timothy R; England, Jacqueline R; Paul, Keryn I; Weston, Christopher J; Baker, Thomas G
2017-05-15
Reforestation of agricultural lands with mixed-species environmental plantings can effectively sequester C. While accurate and efficient methods for predicting soil organic C content and composition have recently been developed for soils under agricultural land uses, such methods under forested land uses are currently lacking. This study aimed to develop a method using infrared spectroscopy for accurately predicting total organic C (TOC) and its fractions (particulate, POC; humus, HOC; and resistant, ROC organic C) in soils under environmental plantings. Soils were collected from 117 paired agricultural-reforestation sites across Australia. TOC fractions were determined in a subset of 38 reforested soils using physical fractionation by automated wet-sieving and ¹³C nuclear magnetic resonance (NMR) spectroscopy. Mid- and near-infrared spectra (MNIRS, 6000-450 cm⁻¹) were acquired from finely-ground soils from environmental plantings and agricultural land. Satisfactory prediction models based on MNIRS and partial least squares regression (PLSR) were developed for TOC and its fractions. Leave-one-out cross-validations of MNIRS-PLSR models indicated accurate predictions (R² > 0.90, negligible bias, ratio of performance to deviation > 3) and fraction-specific functional group contributions to beta coefficients in the models. TOC and its fractions were predicted using the cross-validated models and soil spectra for 3109 reforested and agricultural soils. The reliability of predictions determined using k-nearest neighbour score distance indicated that >80% of predictions were within the satisfactory inlier limit. The study demonstrated the utility of infrared spectroscopy (MNIRS-PLSR) to rapidly and economically determine TOC and its fractions and thereby accurately describe the effects of land use change such as reforestation on agricultural soils. Copyright © 2017 Elsevier Ltd. All rights reserved.
Discrimination of serum Raman spectroscopy between normal and colorectal cancer
NASA Astrophysics Data System (ADS)
Li, Xiaozhou; Yang, Tianyue; Yu, Ting; Li, Siqi
2011-07-01
Raman spectroscopy of tissues has been widely studied for the diagnosis of various cancers, but biofluids have seldom been used as the analyte because of their low concentration. Herein, Raman spectra of serum from 30 normal people, 46 colon cancer patients, and 44 rectum cancer patients were measured and analyzed. The information from the Raman peaks (intensity and width) and from the fluorescence background (baseline function coefficients) was selected as parameters for statistical analysis. Principal component regression (PCR) and partial least square regression (PLSR) were applied separately to the selected parameters to assess their performance. PCR performed better than PLSR on our spectral data. Linear discriminant analysis (LDA) was then applied to the principal components (PCs) obtained by the two regression methods on the selected parameters, and diagnostic accuracies of 88% and 83% were obtained. The conclusion is that the selected features retain the information of the original spectra well and that Raman spectroscopy of serum has potential for the diagnosis of colorectal cancer.
Pérez-Castaño, Estefanía; Sánchez-Viñas, Mercedes; Gázquez-Evangelista, Domingo; Bagur-González, M Gracia
2018-01-15
This paper describes and discusses the application of chromatographic fingerprints of trimethylsilyl (TMS)-4,4'-desmethylsterol derivatives (obtained from an off-line HPLC-GC-FID system) to the quantification of extra virgin olive oil in commercial vinaigrettes, salad dressings and in-house reference materials (i-HRM) using two different Partial Least Square-Regression (PLS-R) multivariate quantification methods. Different data pre-processing strategies were carried out, the complete sequence being: (i) internal normalization; (ii) sampling based on the Nyquist theorem; (iii) interval correlation optimized shifting, icoshift; (iv) baseline correction; (v) mean centering; and (vi) selecting zones. The first model corresponds to a matrix of dimensions 'n×911' variables and the second one to a matrix of dimensions 'n×431' variables. It should be highlighted that the two proposed PLS-R models allow the quantification of extra virgin olive oil in binary blends, foodstuffs, etc., when the provided percentage is greater than 25%. Copyright © 2017 Elsevier Ltd. All rights reserved.
Zhang, Yong-Hong; Xia, Zhi-Ning; Qin, Li-Tang; Liu, Shu-Shen
2010-09-01
The objective of this paper is to build a reliable model based on the molecular electronegativity distance vector (MEDV) descriptors for predicting the blood-brain barrier (BBB) permeability and to reveal the effects of the molecular structural segments on the BBB permeability. Using 70 structurally diverse compounds, partial least squares regression (PLSR) models between the BBB permeability and the MEDV descriptors were developed and validated by the variable selection and modeling based on prediction (VSMP) technique. The estimation ability, stability, and predictive power of a model are evaluated by the estimated correlation coefficient (r), leave-one-out (LOO) cross-validation correlation coefficient (q), and predictive correlation coefficient (Rp). The PLSR model was found to be of good quality, with r = 0.9202, q = 0.7956, and Rp = 0.6649 for the M1 model based on the training set of 57 samples. To identify the most important structural factors affecting the BBB permeability of compounds, we performed a variable importance in projection (VIP) analysis of the MEDV descriptors. It was found that some structural fragments in compounds, such as -CH3, -CH2-, =CH-, =C, ≡C-, -CH<, =C<, =N-, -NH-, =O, and -OH, are the most important factors affecting the BBB permeability. © 2010. Published by Elsevier Inc.
Canopy Spectral Reflectance as a Predictor of Soil Water Potential in Rice
NASA Astrophysics Data System (ADS)
Panigrahi, N.; Das, B. S.
2018-04-01
Soil water potential (SWP) is a key parameter for characterizing water stress. Typically, a tensiometer is used to measure SWP. However, the measurement range for commercially available tensiometers is limited to -90 kPa and a tensiometer can only provide estimate of SWP at a single location. In this study, a new approach was developed for estimating SWP from spectral reflectance data of a standing rice crop over the visible to shortwave-infrared region (wavelength: 350-2,500 nm). Five water stress treatments corresponding to targeted SWP of -30, -50, -70, -120, and -140 kPa were examined by withholding irrigation during the vegetative growth stage of three rice varieties. Tensiometers and mechanistic water flow model were used for monitoring SWP. Spectral models for SWP were developed using partial-least-squares regression (PLSR), support vector regression (SVR), and coupled PLSR and feature selection (PLSRFS) approaches. Results showed that the SVR approach was the best model for estimating SWP from spectral reflectance data with the coefficient of determination values of 0.71 and 0.55 for the calibration and validation data sets, respectively. Observed root-mean-squared residuals for the predicted SWPs were in the range of -7 to -19 kPa. A new spectral water stress index was also developed using the reflectance values at 745 and 2,002 nm, which showed strong correlation with relative water contents and electrolyte leakage. This new approach is rapid and noninvasive and may be used for estimating SWP over large areas.
NASA Astrophysics Data System (ADS)
Yan, Ling; Liu, Changhong; Qu, Hao; Liu, Wei; Zhang, Yan; Yang, Jianbo; Zheng, Lei
2018-03-01
The terahertz (THz) technique, a recently developed spectral method, has been researched and used for the rapid discrimination and measurement of food compositions due to its low-energy and non-ionizing characteristics. In this study, THz spectroscopy combined with chemometrics has been utilized for qualitative and quantitative analysis of myricetin, quercetin, and kaempferol at concentrations of 0.025, 0.05, and 0.1 mg/mL. The qualitative discrimination was achieved by KNN, ELM, and RF models with spectral pre-treatments. An excellent discrimination (100% CCR in the prediction set) could be achieved using the RF model. Furthermore, the quantitative analyses were performed by partial least square regression (PLSR) and least squares support vector machine (LS-SVM). Compared to the PLSR models, the LS-SVM yielded better results, with lower RMSEP (0.0044, 0.0039, and 0.0048), higher Rp (0.9601, 0.9688, and 0.9359), and higher RPD (8.6272, 9.6333, and 7.9083) for myricetin, quercetin, and kaempferol, respectively. Our results demonstrate that THz spectroscopy is a powerful tool for the identification of three flavonols with similar chemical structures and the quantitative determination of their concentrations.
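The RMSEP and RPD figures reported above can be illustrated with a short sketch. It assumes RPD is defined as the sample standard deviation of the reference values divided by RMSEP; definitions vary between papers, and the abstract does not state which one was used. The predicted values below are invented for illustration; only the three concentration levels come from the abstract.

```python
import math

def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def rpd(observed, predicted):
    """Ratio of performance to deviation: SD(reference) / RMSEP (assumed definition)."""
    n = len(observed)
    mean = sum(observed) / n
    sd = math.sqrt(sum((o - mean) ** 2 for o in observed) / (n - 1))
    return sd / rmsep(observed, predicted)

# Reference concentrations in mg/mL (levels from the abstract) and made-up predictions.
observed = [0.025, 0.05, 0.1, 0.025, 0.05, 0.1]
predicted = [0.03, 0.045, 0.095, 0.02, 0.055, 0.105]
print(rmsep(observed, predicted), rpd(observed, predicted))
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why it is often reported alongside RMSEP.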
Tan, Jin; Li, Rong; Jiang, Zi-Tao; Tang, Shu-Hua; Wang, Ying; Shi, Meng; Xiao, Yi-Qian; Jia, Bin; Lu, Tian-Xiang; Wang, Hao
2017-02-15
Synchronous front-face fluorescence spectroscopy has been developed for the discrimination of used frying oil (UFO) from edible vegetable oil (EVO), the estimation of the usage time of UFO, and the determination of the adulteration of EVO with UFO. Both the heating time of laboratory prepared UFO and the adulteration of EVO with UFO could be determined by partial least squares regression (PLSR). To simulate EVO adulteration with UFO, fifty adulterated samples with adulterant amounts in the range of 1-50% were prepared for each kind of oil. PLSR was then adopted to build the model, and both full (leave-one-out) cross-validation and external validation were performed to evaluate the predictive ability. Under the optimum condition, the plots of observed versus predicted values exhibited high linearity (R² > 0.96). The root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP) were both lower than 3%. Copyright © 2016 Elsevier Ltd. All rights reserved.
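The full (leave-one-out) cross-validation mentioned above can be sketched as follows. To stay self-contained, the sketch swaps in a one-predictor least-squares line as a stand-in for the PLSR model; the resampling loop and the RMSECV calculation are the part being illustrated. The function names and the synthetic signal are hypothetical, and only the 1-50% adulteration range comes from the abstract.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (stand-in for PLSR)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def loo_rmsecv(x, y):
    """Leave each sample out once, refit on the rest, and pool the squared errors."""
    squared_error = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        squared_error += (y[i] - (slope * x[i] + intercept)) ** 2
    return math.sqrt(squared_error / len(x))

# Fifty samples with 1-50 % adulterant; the "signal" is synthetic, not measured data.
percent = [float(p) for p in range(1, 51)]
signal = [0.02 * p + 0.01 * ((p % 3) - 1) for p in range(1, 51)]
rmsecv = loo_rmsecv(signal, percent)
print(rmsecv)
```

Each prediction is made by a model that never saw the sample being predicted, so RMSECV is a more conservative error estimate than the calibration fit error.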
Kang, Bo-Sik; Lee, Jang-Eun; Park, Hyun-Jin
2014-05-15
A commercial electronic tongue was used to discriminate Korean rice wines (makgeolli) brewed from nine cultivars of rice with different amino acid and fatty acid compositions. The E-tongue was applied to establish prediction models with sensory evaluation or LC-MS/MS by partial least squares regression (PLSR). All makgeollis were classified into three groups by principal components analysis, and the separation pattern was affected by rice qualities and yeast fermentation. Makgeolli taste changed from complicated, comprising sweetness, saltiness, and umami, to uncomplicated, such as bitterness and then sourness, with a decrease of amino acids and fatty acids in the rice. The quantitative correlation between E-tongue and sensory scores or LC-MS/MS by PLSR demonstrated that the E-tongue could predict most of the sensory attributes with relatively acceptable r², except for bitterness, but could not predict most of the chemical compounds responsible for taste attributes, except for ribose, lactate, succinate, and tryptophan. Copyright © 2013 Elsevier Ltd. All rights reserved.
Liu, Xuesong; Wu, Chunyan; Geng, Shu; Jin, Ye; Luan, Lianjun; Chen, Yong; Wu, Yongjiang
2015-01-01
This paper used near-infrared (NIR) spectroscopy for the on-line quantitative monitoring of water precipitation during Danhong injection. For these NIR measurements, two fiber optic probes designed to transmit NIR radiation through a 2 mm flow cell were used to collect spectra in real time. Partial least squares regression (PLSR) was adopted as the preferred chemometric method for quantitative analysis of the critical intermediate quality attributes: the danshensu (DSS, (R)-3,4-dihydroxyphenyllactic acid), protocatechuic aldehyde (PA), rosmarinic acid (RA), and salvianolic acid B (SAB) concentrations. Optimized PLSR models were successfully built and used for on-line detection of the concentrations of DSS, PA, RA, and SAB in water precipitation during Danhong injection. In addition, the DSS, PA, RA, and SAB concentrations could be fed back instantly to on-site technical personnel for timely control and adjustment. The verification experiments determined that the predicted values agreed with the corresponding actual values.
Mercury and water level fluctuations in lakes of northern Minnesota
Larson, James H.; Maki, Ryan P; Christensen, Victoria G.; Sandheinrich, Mark B.; LeDuc, Jaime F.; Kissane, Claire; Knights, Brent C.
2017-01-01
Large lake ecosystems support a variety of ecosystem services in surrounding communities, including recreational and commercial fishing. However, many northern temperate fisheries are contaminated by mercury. Annual variation in mercury accumulation in fish has previously been linked to water level (WL) fluctuations, opening the possibility of regulating water levels in a manner that minimizes or reduces mercury contamination in fisheries. Here, we compiled a long-term dataset (1997-2015) of mercury content in young-of-year Yellow Perch (Perca flavescens) from six lakes on the border between the U.S. and Canada and examined whether mercury content appeared to be related to several metrics of WL fluctuation (e.g., spring WL rise, annual maximum WL, and year-to-year change in maximum WL). Using simple correlation analysis, several WL metrics appear to be strongly correlated to Yellow Perch mercury content, although the strength of these correlations varies by lake. We also used many WL metrics, water quality measurements, temperature and annual deposition data to build predictive models using partial least squares regression (PLSR) analysis for each lake. These PLSR models showed some variation among lakes, but also supported strong associations between WL fluctuations and annual variation in Yellow Perch mercury content. The study lakes underwent a modest change in WL management in 2000, when winter WL minimums were increased by about 1 m in five of the six study lakes. Using the PLSR models, we estimated how this change in WL management would have affected Yellow Perch mercury content. For four of the study lakes, the change in WL management that occurred in 2000 likely reduced Yellow Perch mercury content, relative to the previous WL management regime.
NASA Astrophysics Data System (ADS)
Lespinats, S.; Meyer-Bäse, Anke; He, Huan; Marshall, Alan G.; Conrad, Charles A.; Emmett, Mark R.
2009-05-01
Partial Least Square Regression (PLSR) and Data-Driven High Dimensional Scaling (DD-HDS) are employed for the prediction and the visualization of changes in polar lipid expression induced by different combinations of wild-type (wt) p53 gene therapy and SN38 chemotherapy of U87 MG glioblastoma cells. A very detailed analysis of the gangliosides reveals that certain gangliosides of GM3 or GD1-type have unique properties not shared by the others. In summary, this preliminary work shows that data mining techniques are able to determine the modulation of gangliosides by different treatment combinations.
Wenjun, Ji; Zhou, Shi; Jingyi, Huang; Shuo, Li
2014-01-01
In situ measurements with visible and near-infrared spectroscopy (vis-NIR) provide an efficient way of acquiring soil information for paddy soils in the short time gap between the harvest and the following rotation. The aim of this study was to evaluate the feasibility of this approach for predicting a series of soil properties, including organic matter (OM), organic carbon (OC), total nitrogen (TN), available nitrogen (AN), available phosphorus (AP), available potassium (AK) and pH, of paddy soils in Zhejiang province, China. Firstly, linear partial least squares regression (PLSR) was performed on the in situ spectra and the predictions were compared to those obtained with laboratory-based spectra. Then, the non-linear least-squares support vector machine (LS-SVM) algorithm was applied, aiming to extract more useful information from the in situ spectra and improve predictions. Results show that (i) in terms of OC, OM, TN, AN and pH, the predictions were worse using in situ spectra than laboratory-based spectra with the PLSR algorithm; (ii) for the same properties, the prediction accuracy using LS-SVM (R² > 0.75, RPD > 1.90) with in situ vis-NIR spectra was clearly improved compared to the PLSR algorithm, and was comparable to or even better than the results generated using laboratory-based spectra with PLSR; (iii) in terms of AP and AK, poor predictions were obtained with in situ spectra (R² < 0.5, RPD < 1.50) using either PLSR or LS-SVM. The results highlight the use of LS-SVM for in situ vis-NIR spectroscopic estimation of soil properties of paddy soils. PMID:25153132
Wang, Jie; Shen, Changwei; Liu, Na; Jin, Xin; Fan, Xueshan; Dong, Caixia; Xu, Yangchun
2017-03-08
Non-destructive and timely determination of leaf nitrogen (N) concentration is urgently needed for N management in pear orchards. A two-year field experiment was conducted in a commercial pear orchard with five N application rates: 0 (N0), 165 (N1), 330 (N2), 660 (N3), and 990 (N4) kg N ha⁻¹. The mid-portion leaves on the year's shoot were selected first for spectral measurement and then for N concentration determination in the laboratory at 50 and 80 days after full bloom (DAB). Three methods of in-field spectral measurement (25° bare fibre under solar conditions, black background attached to plant probe, and white background attached to plant probe) were compared. We also investigated the modelling performances of four chemometric techniques (principal components regression, PCR; partial least squares regression, PLSR; stepwise multiple linear regression, SMLR; and back propagation neural network, BPNN) and three vegetation indices (difference spectral index, normalized difference spectral index, and ratio spectral index). Due to the low correlation of reflectance obtained by the 25° field of view method, all of the modelling was performed on two spectral datasets, both acquired by a plant probe. Results showed that the best modelling and prediction accuracy were found in the model established by PLSR and spectra measured with a black background. The randomly-separated subsets of calibration (n = 1000) and validation (n = 420) of this model resulted in high R² values of 0.86 and 0.85, respectively, as well as a low mean relative error (<6%). Furthermore, a higher coefficient of determination between the leaf N concentration and fruit yield was found at 50 DAB samplings in both 2015 (R² = 0.77) and 2014 (R² = 0.59). Thus, the leaf N concentration was suggested to be determined at 50 DAB by visible/near-infrared spectroscopy and the threshold should be 24-27 g/kg.
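The three two-band vegetation indices named in this abstract are commonly defined from reflectance at two wavelengths. A minimal Python sketch of those usual definitions follows; the abstract does not say which band pairs were selected, so the wavelengths and reflectance values below are hypothetical, as are the function names.

```python
def dsi(r_a, r_b):
    """Difference spectral index: R_a - R_b."""
    return r_a - r_b


def ndsi(r_a, r_b):
    """Normalized difference spectral index: (R_a - R_b) / (R_a + R_b)."""
    return (r_a - r_b) / (r_a + r_b)


def rsi(r_a, r_b):
    """Ratio spectral index: R_a / R_b."""
    return r_a / r_b


# Hypothetical leaf reflectance at two hypothetical wavelengths (nm).
reflectance = {800: 0.45, 670: 0.05}
r_nir, r_red = reflectance[800], reflectance[670]
print(dsi(r_nir, r_red), ndsi(r_nir, r_red), rsi(r_nir, r_red))
```

Each index reduces a spectrum to a single number, whereas PLSR uses the whole spectrum, which is one reason the two families of methods are compared in studies of this kind.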
Han, Fu Liang; Li, Zheng; Xu, Yan
2015-12-01
Monomeric anthocyanin contributions to young red wine color were investigated using partial least square regression (PLSR) and aqueous alcohol solutions in this study. Results showed that the correlation between the anthocyanin concentration and the solution color fitted in a quadratic regression rather than linear or cubic regression. Malvidin-3-O-glucoside was estimated to show the highest contribution to young red wine color according to its concentration in wine, whereas peonidin-3-O-glucoside in its concentration contributed the least. The PLSR suggested that delphinidin-3-O-glucoside and peonidin-3-O-glucoside under the same concentration resulted in a stronger color of young red wine compared with malvidin-3-O-glucoside. These estimates were further confirmed by their color in aqueous alcohol solutions. These results suggested that delphinidin-3-O-glucoside and peonidin-3-O-glucoside were primary anthocyanins to enhance young red wine color by increasing their concentrations. This study could provide an alternative approach to improve young red wine color by adjusting anthocyanin composition and concentration. © 2015 Institute of Food Technologists®
NASA Astrophysics Data System (ADS)
Pullanagari, R. R.; Kereszturi, Gábor; Yule, I. J.
2016-07-01
On-farm assessment of mixed pasture nutrient concentrations is important for animal production and pasture management. Hyperspectral imaging is recognized as a potential tool to quantify the nutrient content of vegetation. However, it is a great challenge to estimate macro and micro nutrients in heterogeneous mixed pastures. In this study, canopy reflectance data were measured using a high-resolution airborne visible-to-shortwave infrared (Vis-SWIR) imaging spectrometer covering the wavelength region 380-2500 nm to predict the concentrations of the nutrients nitrogen (N), phosphorus (P), potassium (K), sulfur (S), zinc (Zn), sodium (Na), manganese (Mn), copper (Cu) and magnesium (Mg) in heterogeneous mixed pastures across a sheep and beef farm in hill country in New Zealand. Prediction models were developed using four different methods, namely partial least squares regression (PLSR), kernel PLSR, support vector regression (SVR), and random forest regression (RFR), and their performance was compared using the test data. The results from the study revealed that RFR produced the highest accuracy (0.55 ⩽ R²CV ⩽ 0.78; 6.68% ⩽ nRMSECV ⩽ 26.47%) compared to all other algorithms for the majority of nutrients (N, P, K, Zn, Na, Cu and Mg) described, and the remaining nutrients (S and Mn) were predicted with high accuracy (0.68 ⩽ R²CV ⩽ 0.86; 13.00% ⩽ nRMSECV ⩽ 14.64%) using SVR. The best training models were used to extrapolate over the whole farm in order to predict those pasture nutrients, and the results were expressed as pixel-based spatial maps. These spatially registered nutrient maps demonstrate the range and geographical location of often large differences in pasture nutrient values, which are normally not measured and therefore not included in decision making when considering more effective ways to utilize pasture.
Cao, Xueren; Luo, Yong; Zhou, Yilin; Fan, Jieru; Xu, Xiangming; West, Jonathan S.; Duan, Xiayu; Cheng, Dengfa
2015-01-01
To determine the influence of plant density and powdery mildew infection of winter wheat and to predict grain yield, hyperspectral canopy reflectance of winter wheat was measured for two plant densities at Feekes growth stage (GS) 10.5.3, 10.5.4, and 11.1 in the 2009–2010 and 2010–2011 seasons. Reflectance in near infrared (NIR) regions was significantly correlated with disease index at GS 10.5.3, 10.5.4, and 11.1 at two plant densities in both seasons. For the two plant densities, the area of the red edge peak (Σdr 680–760 nm), difference vegetation index (DVI), and triangular vegetation index (TVI) were significantly correlated negatively with disease index at three GSs in two seasons. Compared with other parameters Σdr 680–760 nm was the most sensitive parameter for detecting powdery mildew. Linear regression models relating mildew severity to Σdr 680–760 nm were constructed at three GSs in two seasons for the two plant densities, demonstrating no significant difference in the slope estimates between the two plant densities at three GSs. Σdr 680–760 nm was correlated with grain yield at three GSs in two seasons. The accuracies of partial least square regression (PLSR) models were consistently higher than those of models based on Σdr 680–760 nm for disease index and grain yield. PLSR can, therefore, provide more accurate estimation of disease index of wheat powdery mildew and grain yield using canopy reflectance. PMID:25815468
Sakudo, Akikazu; Kato, Yukiko Hakariya; Kuratsune, Hirohiko; Ikuta, Kazuyoshi
2009-10-01
After blood donation, in some individuals having polycythemia, dehydration causes anemia. Although the hematocrit (Ht) level is closely related to anemia, the current method of measuring Ht is performed after blood drawing. Furthermore, the monitoring of Ht levels contributes to a healthy life. Therefore, a non-invasive test for Ht is warranted for the safe donation of blood and good quality of life. A non-invasive procedure for the prediction of hematocrit levels was developed on the basis of a chemometric analysis of visible and near-infrared (Vis-NIR) spectra of the thumbs using a portable spectrophotometer. Transmittance spectra in the 600- to 1100-nm region from thumbs of Japanese volunteers were subjected to a partial least squares regression (PLSR) analysis and leave-out cross-validation to develop chemometric models for predicting Ht levels. Ht levels of masked samples predicted by this model from Vis-NIR spectra provided a coefficient of determination in prediction of 0.6349 with a standard error of prediction of 3.704% and a detection limit in prediction of 17.14%, indicating that the model is applicable to normal and abnormal Ht values. These results suggest that a portable Vis-NIR spectrophotometer combined with PLSR analysis has potential for the non-invasive measurement of Ht levels.
Nondestructive detection of pork quality based on dual-band VIS/NIR spectroscopy
NASA Astrophysics Data System (ADS)
Wang, Wenxiu; Peng, Yankun; Li, Yongyu; Tang, Xiuying; Liu, Yuanyuan
2015-05-01
With the continuous development of living standards and the relative change of dietary structure, consumers' rising and persistent demand for better quality of meat is emphasized. Colour, pH value, and cooking loss are important quality attributes when evaluating meat. Simultaneous nondestructive detection of multiple meat quality parameters is popular in the production and processing of meat and meat products. The objective of this research was to compare the effectiveness of two bands for rapid, nondestructive and simultaneous detection of pork quality attributes. Reflectance spectra of 60 chilled pork samples were collected from a dual-band visible/near-infrared spectroscopy system which covered 350-1100 nm and 1000-2600 nm. Then colour, pH value and cooking loss were determined by standard methods as reference values. Standard normal variate transform (SNVT) was employed to eliminate the spectral noise. A spectrum connection method was put forward for effective integration of the dual-band spectrum to make full use of all the useful information. Partial least squares regression (PLSR) and principal component analysis (PCA) were applied to establish prediction models based on the single-band spectrum and the dual-band spectrum, respectively. The experimental results showed that the PLSR model based on dual-band spectral information was superior to the models based on single-band spectral information, with lower root mean square error (RMSE) and higher accuracy. The PLSR model based on the dual-band spectrum (using the overlapping part of the first band) yielded the best prediction result, with correlation coefficients of validation (Rv) of 0.9469, 0.9495, 0.9180, 0.9054 and 0.8789 for L*, a*, b*, pH value and cooking loss, respectively. This is mainly because the dual-band spectrum can provide sufficient and comprehensive information reflecting the quality attributes. 
Data fusion from dual-band spectrum could significantly improve pork quality parameters prediction performance. The research also indicated that multi-band spectral information fusion has potential to comprehensively evaluate other quality and safety attributes of pork.
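The standard normal variate pre-treatment named in this abstract is usually applied to each spectrum separately: the spectrum's own mean is subtracted and the result is divided by the spectrum's own standard deviation. A minimal Python sketch of that usual definition follows. It assumes the sample (n - 1) standard deviation, which is a common but not universal choice; the function name and the example spectra are hypothetical, and the authors' spectrum connection step is not reproduced because the abstract does not describe it.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1 denominator), an assumption
    if sd == 0:
        raise ValueError("spectrum is constant; SNV is undefined")
    return [(value - mean) / sd for value in spectrum]


# Two hypothetical spectra that differ only by a gain and an offset.
spectrum_a = [1.0, 2.0, 3.0]
spectrum_b = [12.0, 14.0, 16.0]  # 2 * spectrum_a + 10
print(snv(spectrum_a))
print(snv(spectrum_b))
```

After the transform the two example spectra coincide, which illustrates why the pre-treatment is applied before regression: per-sample gain and offset differences no longer reach the model.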
Li, Shuifang; Zhang, Xin; Shan, Yang; Su, Donglin; Ma, Qiang; Wen, Ruizhi; Li, Jiaojuan
2017-03-01
Near-infrared spectroscopy (NIR) was used for qualitative and quantitative detection of honey adulterated with high-fructose corn syrup (HFCS) or maltose syrup (MS). Competitive adaptive reweighted sampling (CARS) was employed to select key variables. Partial least squares linear discriminant analysis (PLS-LDA) was adopted to classify the adulterated honey samples. The CARS-PLS-LDA models showed an accuracy of 86.3% (honey vs. adulterated honey with HFCS) and 96.1% (honey vs. adulterated honey with MS), respectively. PLS regression (PLSR) was used to predict the extent of adulteration in the honeys. The results showed that NIR combined with PLSR could not be used to quantify adulteration with HFCS, but could be used to quantify adulteration with MS: the coefficient of determination of prediction (Rp²) and root mean square error of prediction (RMSEP) were 0.901 and 4.041 for MS-adulterated samples from different floral origins, and 0.981 and 1.786 for MS-adulterated samples from the same floral origin (Brassica spp.), respectively. Copyright © 2016. Published by Elsevier Ltd.
Tahir, Haroon Elrasheid; Xiaobo, Zou; Zhihua, Li; Jiyong, Shi; Zhai, Xiaodong; Wang, Sheng; Mariod, Abdalbasit Adam
2017-07-01
Fourier transform infrared with attenuated total reflectance (FTIR-ATR) and Raman spectroscopy combined with partial least square regression (PLSR) were applied for the prediction of phenolic compounds and antioxidant activity in honey. Standards of catechin, syringic, vanillic, and chlorogenic acids were used for the identification and quantification of the individual phenolic compounds in six honey varieties using HPLC-DAD. Total antioxidant activity (TAC) and ferrous chelating capacity were measured spectrophotometrically. For the establishment of the PLSR model, Raman spectra with Savitzky-Golay smoothing in the wavenumber region 1500-400 cm⁻¹ were used, while for FTIR-ATR the wavenumber regions of 1800-700 and 3000-2800 cm⁻¹ with multiplicative scattering correction (MSC) and Savitzky-Golay smoothing were used. The determination coefficients (R²) ranged from 0.9272 to 0.9992 for Raman and from 0.9461 to 0.9988 for FTIR-ATR. FTIR-ATR and Raman were demonstrated to be simple, rapid and nondestructive methods to quantify phenolic compounds and antioxidant activities in honey. Copyright © 2017 Elsevier Ltd. All rights reserved.
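Savitzky-Golay smoothing, used as a pre-treatment in this abstract, replaces each point by the value of a low-order polynomial fitted to a small window around it. A minimal Python sketch follows, assuming a 5-point window and a quadratic polynomial, for which the classical convolution weights are (-3, 12, 17, 12, -3)/35. The window size and polynomial order used by the authors are not stated, so both are assumptions here, and the function name is hypothetical.

```python
# Classical Savitzky-Golay weights for a 5-point window and a quadratic fit.
SG5_WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(signal):
    """Smooth interior points with the 5-point quadratic Savitzky-Golay filter.

    The first and last two points are returned unchanged, which is a
    simplification; library implementations offer other edge treatments.
    """
    smoothed = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return smoothed


# Hypothetical noisy intensity values.
intensities = [0.10, 0.12, 0.35, 0.14, 0.16, 0.18, 0.17, 0.20]
print(savgol5(intensities))
```

A quadratic signal passes through this filter unchanged at interior points, which is the defining property of the weights. In practice a library routine such as `scipy.signal.savgol_filter` would normally be used instead of hand-written weights.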
Liu, Jinbao; Han, Jichang; Zhang, Yang; Wang, Huanyuan; Kong, Hui; Shi, Lei
2018-06-05
The storage of soil organic carbon (SOC) should improve soil fertility. Conventional determination of SOC is expensive and tedious. Visible-near infrared reflectance spectroscopy is a practical and cost-effective approach that has been successfully used to estimate SOC concentration. A soil spectral inversion model can determine SOC content quickly and efficiently. This paper presents a study of SOC estimation through the combination of soil spectroscopy with stepwise multiple linear regression (SMLR), partial least squares regression (PLSR), and principal component regression (PCR). Spectral measurements for 106 soil samples were acquired using an ASD FieldSpec 4 standard-res spectroradiometer (350-2500 nm). Six types of transformations and three regression methods were applied to build quantification models for soils developed from different parent materials. The results show that (1) for soil developed from basaltic volcanic clastics, the SOC spectral response bands were located at 500 nm and 800 nm; for soil developed from trachytic volcanic clastics, they were located at 405 nm, 465 nm, 575 nm, and 1105 nm. (2) For soil developed from basaltic volcanic debris, the maximum correlation coefficient, obtained with the first derivative, was 0.8898; for soil developed from trachytic volcanic debris, the maximum correlation coefficient, obtained with the first derivative of the logarithm of reflectance, was 0.9029. (3) For soil developed from basaltic volcanic clastics, the optimal prediction model of soil organic matter content was SMLR based on the first derivative of the logarithm of the reciprocal of spectral reflectance, with 7 independent variables, Rv² = 0.9720, RMSEP = 2.0590, and sig = 0.003. For soil developed from trachytic volcanic clastics, the optimal prediction model of soil organic matter content was PLSR based on the first derivative of the logarithm of the reciprocal of spectral reflectance. 
The number of independent variables in this model was Pc = 5, with Rc = 0.9872, Rc² = 0.9745, RMSEC = 0.4821, and SEC = 0.4906; the determination coefficient of prediction was Rv² = 0.9702, with RMSEP = 0.9563, SEP = 0.9711, and Bias = 0.0637. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.
2014-12-01
Accurate estimation of grassland biomass at peak productivity can provide crucial information regarding the functioning and productivity of rangelands. Hyperspectral remote sensing has proved to be valuable for the estimation of vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem due to the large number of correlated hyperspectral reflectance measurements. The aim of this study was to examine the prospect of above ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and least-squares support vector machine (LS-SVM) were used to estimate the above ground biomass. The prediction accuracy of the multivariate calibration methods was assessed using cross-validated R² and RMSE. The best model performance was obtained using LS-SVM and then PLSR, both calibrated with the first derivative reflectance dataset, with R²cv = 0.88 and 0.86 and RMSEcv = 1.15 and 1.07, respectively. The weakest prediction accuracy was obtained when PCR was used (R²cv = 0.31 and RMSEcv = 2.48). The obtained results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.
Nie, Pengcheng; Wu, Di; Sun, Da-Wen; Cao, Fang; Bao, Yidan; He, Yong
2013-01-01
Notoginseng is a classical traditional Chinese medical herb, which is of high economic and medical value. Notoginseng powder (NP) could be easily adulterated with Sophora flavescens powder (SFP) or corn flour (CF), because of their similar tastes and appearances and much lower cost for these adulterants. The objective of this study is to quantify the NP content in adulterated NP by using a rapid and non-destructive visible and near infrared (Vis-NIR) spectroscopy method. Three wavelength ranges of visible spectra, short-wave near infrared spectra (SNIR) and long-wave near infrared spectra (LNIR) were separately used to establish the model based on two calibration methods of partial least square regression (PLSR) and least-squares support vector machines (LS-SVM), respectively. Competitive adaptive reweighted sampling (CARS) was conducted to identify the most important wavelengths/variables that had the greatest influence on the adulterant quantification throughout the whole wavelength range. The CARS-PLSR models based on LNIR were determined as the best models for the quantification of NP adulterated with SFP, CF, and their mixtures, in which the rP values were 0.940, 0.939, and 0.867 for the three models respectively. The research demonstrated the potential of the Vis-NIR spectroscopy technique for the rapid and non-destructive quantification of NP containing adulterants. PMID:24129019
Qualitative Analysis of Dairy and Powder Milk Using Laser-Induced Breakdown Spectroscopy (LIBS).
Alfarraj, Bader A; Sanghapi, Herve K; Bhatt, Chet R; Yueh, Fang Y; Singh, Jagdish P
2018-01-01
Laser-induced breakdown spectroscopy (LIBS) technique was used to compare various types of commercial milk products. Laser-induced breakdown spectroscopy spectra were investigated for the determination of the elemental composition of soy and rice milk powder, dairy milk, and lactose-free dairy milk. The analysis was performed using radiative transitions. Atomic emissions from Ca, K, Na, and Mg lines observed in LIBS spectra of dairy milk were compared. In addition, proteins and fat level in milks can be determined using molecular emissions such as CN bands. Ca concentrations were calculated to be 2.165 ± 0.203 g/L in 1% of dairy milk fat samples and 2.809 ± 0.172 g/L in 2% of dairy milk fat samples using the standard addition method (SAM) with LIBS spectra. Univariate and multivariate statistical analysis methods showed that the contents of major mineral elements were higher in lactose-free dairy milk than those in dairy milk. The principal component analysis (PCA) method was used to discriminate four milk samples depending on their mineral elements concentration. In addition, proteins and fat level in dairy milks were determined using molecular emissions such as CN band. We applied partial least squares regression (PLSR) and simple linear regression (SLR) models to predict levels of milk fat in dairy milk samples. The PLSR model was successfully used to predict levels of milk fat in dairy milk sample with the relative accuracy (RA%) less than 6.62% using CN (0,0) band.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xi; Tang, Jianwu; Mustard, John F.; ...
2016-04-02
Understanding the temporal patterns of leaf traits is critical in determining the seasonality and magnitude of terrestrial carbon, water, and energy fluxes. However, we lack robust and efficient ways to monitor the temporal dynamics of leaf traits. Here we assessed the potential of leaf spectroscopy to predict and monitor leaf traits across their entire life cycle at different forest sites and light environments (sunlit vs. shaded) using a weekly sampled dataset across the entire growing season at two temperate deciduous forests. In addition, the dataset includes field measured leaf-level directional-hemispherical reflectance/transmittance together with seven important leaf traits [total chlorophyll (chlorophyll a and b), carotenoids, mass-based nitrogen concentration (N mass), mass-based carbon concentration (C mass), and leaf mass per area (LMA)]. All leaf traits varied significantly throughout the growing season, and displayed trait-specific temporal patterns. We used a Partial Least Square Regression (PLSR) modeling approach to estimate leaf traits from spectra, and found that PLSR was able to capture the variability across time, sites, and light environments of all leaf traits investigated (R² = 0.6–0.8 for temporal variability; R² = 0.3–0.7 for cross-site variability; R² = 0.4–0.8 for variability from light environments). We also tested alternative field sampling designs and found that for most leaf traits, biweekly leaf sampling throughout the growing season enabled accurate characterization of the seasonal patterns. Compared with the estimation of foliar pigments, the performance of N mass, C mass and LMA PLSR models improved more significantly with sampling frequency. Our results demonstrate that leaf spectra-trait relationships vary with time, and thus tracking the seasonality of leaf traits requires statistical models calibrated with data sampled throughout the growing season. 
In conclusion, our results have broad implications for future research that uses vegetation spectra to infer leaf traits at different growing stages.
Liu, Mingyue; Du, Baojia; Zhang, Bai
2018-01-01
Soil salinity and sodicity can significantly reduce the value and the productivity of affected lands, causing degradation and posing threats to the sustainable development of natural resources on earth. This research attempted to map soil salinity/sodicity by disentangling the relationships between Landsat 8 Operational Land Imager (OLI) imagery and in-situ measurements (EC, pH) over the west of Jilin, China. We established the retrieval models for soil salinity and sodicity using Partial Least Square Regression (PLSR). The spatial distribution of the soils that were subjected to hybridized salinity and sodicity (HSS) was obtained by overlay analysis using maps of soil salinity and sodicity in a geographical information system (GIS) environment. We analyzed the severity and extent of soil salinity, sodicity, and HSS with regard to specified soil types and land cover. Results indicated that the models' accuracy was improved by combining the reflectance bands and spectral indices that were mathematically transformed. Therefore, our results indicated that the OLI imagery and PLSR method can be applied to mapping soil salinity and sodicity in the region. The mapping results revealed that the areas of soil salinity, sodicity, and HSS were 1.61 × 10⁶ hm², 1.46 × 10⁶ hm², and 1.36 × 10⁶ hm², respectively. Also, the area of moderate and intensive sodicity was larger than that of salinity. This research may underpin efficient mapping of regional salinity/sodicity occurrences and understanding of the linkages between spectral reflectance and ground measurements of soil salinity and sodicity, and may provide tools for soil salinity monitoring and the sustainable utilization of land resources. PMID:29614727
Janik, Leslie J; Forrester, Sean T; Soriano-Disla, José M; Kirby, Jason K; McLaughlin, Michael J; Reimann, Clemens
2015-02-01
The authors' aim was to develop rapid and inexpensive regression models for the prediction of partitioning coefficients (Kd), defined as the ratio of the total or surface-bound metal/metalloid concentration of the solid phase to the total concentration in the solution phase. Values of Kd were measured for boric acid (B[OH]3(0)) and selected added soluble oxoanions: molybdate (MoO4(2-)), antimonate (Sb[OH](6-)), selenate (SeO4(2-)), tellurate (TeO4(2-)) and vanadate (VO4(3-)). Models were developed using approximately 500 spectrally representative soils of the Geochemical Mapping of Agricultural Soils of Europe (GEMAS) program. These calibration soils represented the major properties of the entire 4813 soils of the GEMAS project. Multiple linear regression (MLR) from soil properties, partial least-squares regression (PLSR) using mid-infrared diffuse reflectance Fourier-transformed (DRIFT) spectra, and models using DRIFT spectra plus analytical pH values (DRIFT + pH), were compared with predicted log K(d + 1) values. Apart from selenate (R² = 0.43), the DRIFT + pH calibrations resulted in marginally better models to predict log K(d + 1) values (R² = 0.62-0.79), compared with those from PLSR-DRIFT (R² = 0.61-0.72) and MLR (R² = 0.54-0.79). The DRIFT + pH calibrations were applied to the prediction of log K(d + 1) values in the remaining 4313 soils. An example map of predicted log K(d + 1) values for added soluble MoO4(2-) in soils across Europe is presented. The DRIFT + pH PLSR models provided a rapid and inexpensive tool to assess the risk of mobility and potential availability of boric acid and selected oxoanions in European soils. For these models to be used in the prediction of log K(d + 1) values in soils globally, additional research will be needed to determine whether soil variability is accounted for in the calibration. © 2014 SETAC.
NASA Astrophysics Data System (ADS)
Liu, Ronghua; Sun, Qiaofeng; Hu, Tian; Li, Lian; Nie, Lei; Wang, Jiayue; Zhou, Wanhui; Zang, Hengchang
2018-03-01
As a powerful process analytical technology (PAT) tool, near infrared (NIR) spectroscopy has been widely used in real-time monitoring. In this study, NIR spectroscopy was applied to monitor multiple parameters of the traditional Chinese medicine (TCM) Shenzhiling oral liquid during the concentration process to guarantee the quality of products. Five lab scale batches were employed to construct quantitative models to determine five chemical ingredients and one physical property (sample density) during the concentration process. Paeoniflorin, albiflorin, liquiritin and sample density were modeled by partial least square regression (PLSR), while the contents of glycyrrhizic acid and cinnamic acid were modeled by support vector machine regression (SVMR). Standard normal variate (SNV) and/or Savitzky-Golay (SG) smoothing with derivative methods were adopted for spectra pretreatment. Variable selection methods including correlation coefficient (CC), competitive adaptive reweighted sampling (CARS) and interval partial least squares regression (iPLS) were performed for optimizing the models. The results indicated that NIR spectroscopy was an effective tool for successfully monitoring the concentration process of Shenzhiling oral liquid.
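The SNV and Savitzky-Golay pretreatments named in this abstract can be sketched as follows. This is a minimal illustration on synthetic spectra, not the authors' pipeline: the window length, polynomial order, derivative order, and the function names `snv` and `sg_derivative` are assumptions chosen for the example.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) separately."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def sg_derivative(spectra, window_length=11, polyorder=2, deriv=1):
    """Savitzky-Golay smoothing combined with a derivative along the wavelength axis.

    window_length, polyorder and deriv are illustrative defaults (assumed).
    """
    return savgol_filter(spectra, window_length=window_length,
                         polyorder=polyorder, deriv=deriv, axis=1)


# Synthetic spectra: one absorption band with different gains and offsets,
# mimicking multiplicative and additive scatter effects.
wavelengths = np.linspace(1000, 2500, 200)
base = np.exp(-((wavelengths - 1900) / 150) ** 2)
raw = np.array([a * base + b for a, b in [(1.0, 0.0), (1.5, 0.2), (0.7, -0.1)]])

corrected = snv(raw)                  # removes per-spectrum gain and offset
pretreated = sg_derivative(corrected) # smoothed first derivative
```

Because SNV removes a per-spectrum offset and positive gain, the three synthetic spectra become identical after correction, which is the behaviour the pretreatment is meant to provide before PLSR or SVMR calibration.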
Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers
2010-01-01
Background At the current price, the use of high-density single nucleotide polymorphisms (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). Methods Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subset of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. Results RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls. 
Conclusions Accurate genomic evaluation of the broader bull and cow population can be achieved with a single genotyping assay containing ~ 3,000 to 5,000 evenly spaced SNP. PMID:20950478
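The ranking step described in the Methods (fit ridge regression, then rank SNP by the absolute value of their regression coefficients) can be sketched as below. This is a toy example on simulated genotypes, not the study's data or software: the penalty `lam`, the marker indices, the effect sizes, and the helper names are all assumptions.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution on centred data: (X'X + lam*I)^-1 X'y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def top_markers(beta, k):
    """Indices of the k markers with the largest absolute coefficients."""
    return np.argsort(-np.abs(beta))[:k]


# Simulated genotypes coded 0/1/2 for 300 animals and 50 markers (assumed sizes).
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(300, 50))
beta_true = np.zeros(50)
beta_true[3], beta_true[17] = 1.0, -0.8      # two hypothetical causal markers
y = X @ beta_true + rng.normal(scale=0.3, size=300)

beta = ridge_coefficients(X, y, lam=10.0)
selected = top_markers(beta, k=2)            # low-density subset of markers
```

In practice the penalty would be tuned and the subset size would be in the thousands; the sketch only shows the mechanics of coefficient-based marker ranking.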
Structure-activity relationships between sterols and their thermal stability in oil matrix.
Hu, Yinzhou; Xu, Junli; Huang, Weisu; Zhao, Yajing; Li, Maiquan; Wang, Mengmeng; Zheng, Lufei; Lu, Baiyi
2018-08-30
Structure-activity relationships between 20 sterols and their thermal stabilities were studied in a model oil system. All sterol degradations were found to be consistent with a first-order kinetic model, with coefficients of determination (R²) higher than 0.9444. The number of double bonds in the sterol structure was negatively correlated with the thermal stability of the sterol, whereas the length of the branch chain was positively correlated with it. A quantitative structure-activity relationship (QSAR) model to predict the thermal stability of sterols was developed by using partial least squares regression (PLSR) combined with a genetic algorithm (GA). A regression model was built with an R² of 0.806. Almost all sterol degradation constants could be predicted accurately, with a cross-validation R² of 0.680. Four important variables were selected in the optimal QSAR model, and the selected variables were observed to be related to information indices, RDF descriptors, and 3D-MoRSE descriptors. Copyright © 2018 Elsevier Ltd. All rights reserved.
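A first-order kinetic model of the kind referred to here has the form C(t) = C0·exp(−k·t), so ln C is linear in t with slope −k. The sketch below estimates k and R² by a straight-line fit on the log scale. The time points, initial concentration, and rate constant are made-up illustrative values, and `fit_first_order` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def fit_first_order(t, c):
    """Fit ln(c) = ln(c0) - k*t by least squares; return (k, c0, R^2)."""
    t = np.asarray(t, dtype=float)
    log_c = np.log(np.asarray(c, dtype=float))
    slope, intercept = np.polyfit(t, log_c, 1)
    predicted = slope * t + intercept
    ss_res = np.sum((log_c - predicted) ** 2)
    ss_tot = np.sum((log_c - log_c.mean()) ** 2)
    return -slope, np.exp(intercept), 1.0 - ss_res / ss_tot


# Hypothetical heating times (min) and noise-free concentrations
# generated with k = 0.05 per minute and c0 = 100.
t = np.array([0.0, 10.0, 20.0, 30.0, 60.0, 90.0, 120.0])
c = 100.0 * np.exp(-0.05 * t)

k_est, c0_est, r2 = fit_first_order(t, c)
```

The fitted degradation constants (one per sterol) would then serve as the response variable for a QSAR model such as PLSR.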
NASA Astrophysics Data System (ADS)
Tuukkanen, T.; Marttila, H.; Kløve, B.
2017-07-01
Organic matter and nutrient export from drained peatlands is affected by complex hydrological and biogeochemical interactions. Here partial least squares regression (PLSR) was used to relate various soil and catchment characteristics to variations in chemical oxygen demand (COD), total nitrogen (TN), and total phosphorus (TP) concentrations in runoff. Peat core samples and water quality data were collected from 15 peat extraction sites in Finland. PLSR models constructed by cross-validation and variable selection routines predicted 92, 88, and 95% of the variation in mean COD, TN, and TP concentration in runoff, respectively. The results showed that variations in COD were mainly related to net production (temperature and water-extractable dissolved organic carbon (DOC)), hydrology (topographical relief), and solubility of dissolved organic matter (peat sulfur (S) and calcium (Ca) concentrations). Negative correlations for peat S and runoff COD indicated that acidity from oxidation of organic S stored in peat may be an important mechanism suppressing organic matter leaching. Moreover, runoff COD was associated with peat aluminum (Al), P, and sodium (Na) concentrations. Hydrological controls on TN and COD were similar (i.e., related to topography), whereas degree of humification, bulk density, and water-extractable COD and Al provided additional explanations for TN concentration. Variations in runoff TP concentration were attributed to erosion of particulate P, as indicated by a positive correlation with suspended sediment concentration (SSC), and factors associated with metal-humic complexation and P adsorption (peat Al, water-extractable P, and water-extractable iron (Fe)).
NASA Astrophysics Data System (ADS)
Liu, Yande; Ying, Yibin; Lu, Huishan; Fu, Xiaping
2005-11-01
A new method is proposed to eliminate the varying background and noise simultaneously for multivariate calibration of Fourier transform near infrared (FT-NIR) spectral signals. An ideal spectrum signal prototype was constructed based on the FT-NIR spectrum of fruit sugar content measurement. The performances of wavelet based threshold de-noising approaches via different combinations of wavelet base functions were compared. Three families of wavelet base function (Daubechies, Symlets and Coiflets) were applied to estimate the performance of those wavelet bases and threshold selection rules by a series of experiments. The experimental results show that the best de-noising performance is reached via the combinations of Daubechies 4 or Symlet 4 wavelet base function. Based on the optimization parameter, wavelet regression models for sugar content of pear were also developed and resulted in a smaller prediction error than a traditional Partial Least Squares Regression (PLSR) model.
Yao, Mingyin; Yang, Hui; Huang, Lin; Chen, Tianbing; Rao, Gangfu; Liu, Muhua
2017-05-10
In seeking a novel method with the ability of green analysis in monitoring toxic heavy metals residue in fresh leafy vegetables, laser-induced breakdown spectroscopy (LIBS) was applied to prove its capability in performing this work. The spectra of fresh vegetable samples polluted in the lab were collected by optimized LIBS experimental setup, and the reference concentrations of cadmium (Cd) from samples were obtained by conventional atomic absorption spectroscopy after wet digestion. The direct calibration employing intensity of single Cd line and Cd concentration exposed the weakness of this calibration method. Furthermore, the accuracy of linear calibration can be improved a little by triple Cd lines as characteristic variables, especially after the spectra were pretreated. However, it is not enough in predicting Cd in samples. Therefore, partial least-squares regression (PLSR) was utilized to enhance the robustness of quantitative analysis. The results of the PLSR model showed that the prediction accuracy of the Cd target can meet the requirement of determination in food safety. This investigation presented that LIBS is a promising and emerging method in analyzing toxic compositions in agricultural products, especially combined with suitable chemometrics.
Kocaoglu-Vurma, N A; Eliardi, A; Drake, M A; Rodriguez-Saona, L E; Harper, W J
2009-08-01
The acceptability of cheese depends largely on the flavor formed during ripening. The flavor profiles of cheeses are complex and region- or manufacturer-specific which have made it challenging to understand the chemistry of flavor development and its correlation with sensory properties. Infrared spectroscopy is an attractive technology for the rapid, sensitive, and high-throughput analysis of foods, providing information related to its composition and conformation of food components from the spectra. Our objectives were to establish infrared spectral profiles to discriminate Swiss cheeses produced by different manufacturers in the United States and to develop predictive models for determination of sensory attributes based on infrared spectra. Fifteen samples from 3 Swiss cheese manufacturers were received and analyzed using attenuated total reflectance infrared spectroscopy (ATR-IR). The spectra were analyzed using soft independent modeling of class analogy (SIMCA) to build a classification model. The cheeses were profiled by a trained sensory panel using descriptive sensory analysis. The relationship between the descriptive sensory scores and ATR-IR spectra was assessed using partial least square regression (PLSR) analysis. SIMCA discriminated the Swiss cheeses based on manufacturer and production region. PLSR analysis generated prediction models with correlation coefficients of validation (rVal) between 0.69 and 0.96 with standard error of cross-validation (SECV) ranging from 0.04 to 0.29. Implementation of rapid infrared analysis by the Swiss cheese industry would help to streamline quality assurance.
Lee, Byeong-Ju; Kim, Hye-Youn; Lim, Sa Rang; Huang, Linfang; Choi, Hyung-Kyoon
2017-01-01
Panax ginseng C.A. Meyer is a herb used for medicinal purposes, and its discrimination according to cultivation age has been an important and practical issue. This study employed Fourier-transform infrared (FT-IR) spectroscopy with multivariate statistical analysis to obtain a prediction model for discriminating cultivation ages (5 and 6 years) and three different parts (rhizome, tap root, and lateral root) of P. ginseng. The optimal partial-least-squares regression (PLSR) models for discriminating ginseng samples were determined by selecting normalization methods, number of partial-least-squares (PLS) components, and variable influence on projection (VIP) cutoff values. The best prediction model for discriminating 5- and 6-year-old ginseng was developed using tap root, vector normalization applied after the second differentiation, one PLS component, and a VIP cutoff of 1.0 (based on the lowest root-mean-square error of prediction value). In addition, for discriminating among the three parts of P. ginseng, optimized PLSR models were established using data sets obtained from vector normalization, two PLS components, and VIP cutoff values of 1.5 (for 5-year-old ginseng) and 1.3 (for 6-year-old ginseng). To our knowledge, this is the first study to provide a novel strategy for rapidly discriminating the cultivation ages and parts of P. ginseng using FT-IR by selected normalization methods, number of PLS components, and VIP cutoff values. PMID:29049369
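Variable influence on projection (VIP) scores, used above with cutoffs such as 1.0, can be computed from a fitted PLS model. The sketch below is a minimal single-response PLS (NIPALS-style) in NumPy on synthetic data, using the common VIP definition in which the squared scores average to one. It is an assumed illustration, not the authors' implementation, and `pls1_fit` and `vip_scores` are hypothetical helper names.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS by deflation; returns weights W, scores T, y-loadings q."""
    Xr = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    yr = np.asarray(y, dtype=float) - np.mean(y)
    n, p = Xr.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = Xr @ w                      # scores for this component
        tt = t @ t
        p_load = Xr.T @ t / tt          # X loadings, used for deflation
        q[a] = yr @ t / tt              # y loading
        Xr = Xr - np.outer(t, p_load)   # deflate X
        yr = yr - q[a] * t              # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a)."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)            # y-variance explained per component
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w_sq @ ss) / ss.sum())


# Synthetic data: only the first two of six variables carry signal (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

W, T, q = pls1_fit(X, y, n_components=2)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip >= 1.0)               # variables passing a 1.0 cutoff
```

Raising the cutoff (for example to 1.3 or 1.5) keeps fewer variables, which is the trade-off the abstract reports tuning alongside the normalization method and the number of PLS components.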
Fanesi, Andrea; Wagner, Heiko; Wilhelm, Christian
2017-02-08
Climate change has a strong impact on phytoplankton communities and water quality. However, the development of robust techniques to assess phytoplankton growth is still in progress. In this study, the growth rate of phytoplankton cells grown at different temperatures was modelled based on conventional physiological traits (e.g. chlorophyll, carbon and photosynthetic parameters) using the partial least square regression (PLSR) algorithm and compared with a new approach combining Fourier transform infrared-spectroscopy and PLSR. In this second model, it is assumed that the macromolecular composition of phytoplankton cells represents an intracellular marker for growth. The models have comparable high predictive power (R² > 0.8) and low error in predicting new observations. Interestingly, not all of the predictors present the same weight in the modelling of growth rate. A set of specific parameters, such as non-photochemical fluorescence quenching (NPQ) and the quantum yield of carbon production in the first model, and lipid, protein and carbohydrate contents for the second one, strongly covary with cell growth rate regardless of the taxonomic position of the phytoplankton species investigated. This reflects a set of specific physiological adjustments covarying with growth rate, conserved among taxonomically distant algal species that might be used as guidelines for the improvement of modern primary production models. The high predictive power of both sets of cellular traits for growth rate is of great importance for applied phycological studies. Our approach may find application as a quality control tool for the monitoring of phytoplankton populations in natural communities or in photobioreactors. © 2017 The Author(s).
NASA Astrophysics Data System (ADS)
Qiao, T.; Ren, J.; Craigie, C.; Zabalza, J.; Maltin, Ch.; Marshall, S.
2015-03-01
It is well known that the eating quality of beef has a significant influence on the repurchase behavior of consumers. There are several key factors that affect the perception of quality, including color, tenderness, juiciness, and flavor. To support consumer repurchase choices, there is a need for an objective measurement of quality that could be applied to meat prior to its sale. Objective approaches such as offered by spectral technologies may be useful, but the analytical algorithms used remain to be optimized. For visible and near infrared (VISNIR) spectroscopy, Partial Least Squares Regression (PLSR) is a widely used technique for meat related quality modeling and prediction. In this paper, a Support Vector Machine (SVM) based machine learning approach is presented to predict beef eating quality traits. Although SVM has been successfully used in various disciplines, it has not been applied extensively to the analysis of meat quality parameters. To this end, the performance of PLSR and SVM as tools for the analysis of meat tenderness is evaluated, using a large dataset acquired under industrial conditions. The spectral dataset was collected using VISNIR spectroscopy with the wavelength ranging from 350 to 1800 nm on 234 beef M. longissimus thoracis steaks from heifers, steers, and young bulls. As the dimensionality with the VISNIR data is very high (over 1600 spectral bands), the Principal Component Analysis (PCA) technique was applied for feature extraction and data reduction. The extracted principal components (less than 100) were then used for data modeling and prediction. The prediction results showed that SVM has a greater potential to predict beef eating quality than PLSR, especially for the prediction of tenderness. The influence of animal gender on beef quality prediction was also investigated, and it was found that beef quality traits were predicted most accurately in beef from young bulls.
Retrieval and Mapping of Heavy Metal Concentration in Soil Using Time Series Landsat 8 Imagery
NASA Astrophysics Data System (ADS)
Fang, Y.; Xu, L.; Peng, J.; Wang, H.; Wong, A.; Clausi, D. A.
2018-04-01
Heavy metal pollution is a critical global environmental problem and a long-standing concern. The traditional approach to obtaining heavy metal concentrations, which relies on field sampling and lab testing, is expensive and time consuming. Although many related studies use spectrometer data to build a relational model between heavy metal concentration and spectral information, and then apply the model to hyperspectral imagery for prediction, this approach can hardly map the soil metal concentration of an area quickly and accurately because of the discrepancies between spectrometer data and remote sensing imagery. Taking advantage of the easy accessibility of Landsat 8 data, this study utilizes Landsat 8 imagery to retrieve soil Cu concentration and map its distribution in the study area. To enlarge the spectral information for more accurate retrieval and mapping, 11 single-date Landsat 8 images from 2013-2017 are selected to form a time series. Three regression methods, partial least square regression (PLSR), artificial neural network (ANN) and support vector regression (SVR), are used for model construction. After an unbiased comparison of these models, the best model is selected to map the Cu concentration distribution. The produced distribution map shows good spatial autocorrelation and consistency with the mining area locations.
Egg embryo development detection with hyperspectral imaging
NASA Astrophysics Data System (ADS)
Lawrence, Kurt C.; Smith, Douglas P.; Windham, William R.; Heitschmidt, Gerald W.; Park, Bosoon
2006-10-01
In the U. S. egg industry, anywhere from 130 million to over one billion infertile eggs are incubated each year. Some of these infertile eggs explode in the hatching cabinet and can potentially spread molds or bacteria to all the eggs in the cabinet. A method to detect the embryo development of incubated eggs was developed. Twelve brown-shell hatching eggs from two replicates (n=24) were incubated and imaged to identify embryo development. A hyperspectral imaging system was used to collect transmission images from 420 to 840 nm of brown-shell eggs positioned with the air cell vertical and normal to the camera lens. Raw transmission images from about 400 to 900 nm were collected for every egg on days 0, 1, 2, and 3 of incubation. A total of 96 images were collected and eggs were broken out on day 6 to determine fertility. After breakout, all eggs were found to be fertile. Therefore, this paper presents results for egg embryo development, not fertility. The original hyperspectral data and spectral means for each egg were both used to create embryo development models. With the hyperspectral data range reduced to about 500 to 700 nm, a minimum noise fraction transformation was used, along with a Mahalanobis Distance classification model, to predict development. Days 2 and 3 were all correctly classified (100%), while day 0 and day 1 were classified at 95.8% and 91.7%, respectively. Alternatively, the mean spectra from each egg were used to develop a partial least squares regression (PLSR) model. First, a PLSR model was developed with all eggs and all days. The data were multiplicative scatter corrected, spectrally smoothed, and the wavelength range was reduced to 539 - 770 nm. With a one-out cross validation, all eggs for all days were correctly classified (100%). Second, a PLSR model was developed with data from day 0 and day 3, and the model was validated with data from day 1 and 2. 
For day 1, 22 of 24 eggs were correctly classified (91.7%) and for day 2, all eggs were correctly classified (100%). Although the results are based on relatively small sample sizes, they are encouraging. However, larger sample sizes, from multiple flocks, will be needed to fully validate and verify these models. Additionally, future experiments must also include non-fertile eggs so the fertile / non-fertile effect can be determined.
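Multiplicative scatter correction (MSC), one of the pretreatments applied before the PLSR model above, regresses each spectrum on a reference spectrum and removes the fitted offset and slope. The sketch below uses the mean spectrum as the reference, which is a common choice but an assumption here. The synthetic band shape and the `msc` helper are illustrative only.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum s is modelled as s = intercept + slope * reference and
    corrected to (s - intercept) / slope. The mean spectrum is used when
    no reference is supplied (assumed default).
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        slope, intercept = np.polyfit(ref, s, 1)   # least-squares line s ~ ref
        corrected[i] = (s - intercept) / slope
    return corrected


# Synthetic transmission-like spectra over 539-770 nm that differ only by
# a gain and an offset (pure scatter effects).
wavelengths = np.linspace(539, 770, 120)
base = 0.5 + 0.4 * np.exp(-((wavelengths - 650) / 40) ** 2)
raw = np.array([a * base + b for a, b in [(1.0, 0.00), (1.3, 0.05), (0.8, -0.02)]])

corrected = msc(raw)
```

When spectra differ only by gain and offset, MSC maps all of them onto the reference, so the remaining variation passed to the regression model reflects chemical or structural differences rather than scatter.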
New type of dry substances content meter using microwaves for application in biogas plants.
Nacke, Thomas; Brückner, Kathleen; Göller, Arndt; Kaufhold, Sebastian; Nakos, Xenia; Noack, Stephan; Stöber, Heinrich; Beckmann, Dieter
2005-11-01
Dry substances (DS) are an important index for monitoring and controlling anaerobic co-digestion in biogas plants. We have developed and tested an online meter that measures suspended solids by means of the reflection coefficient of an exiting microwave signal, which is dependent on the dielectric properties of the suspensions. Intelligent models based on partial least squares regression (PLSR) and artificial neural network (ANN) for calibration allow exact and reproducible measurements under different circumstances. This measuring method is appropriate for contactless and online measurements of dry substance contents in biogas plants in a large range from 2-14%.
Hyperspectral imaging technique for determination of pork freshness attributes
NASA Astrophysics Data System (ADS)
Li, Yongyu; Zhang, Leilei; Peng, Yankun; Tang, Xiuying; Chao, Kuanglin; Dhakal, Sagar
2011-06-01
Freshness of pork is an important quality attribute, which can vary greatly in storage and logistics. The specific objectives of this research were to develop a hyperspectral imaging system to predict pork freshness based on quality attributes such as total volatile basic-nitrogen (TVB-N), pH value and color parameters (L*,a*,b*). Pork samples were packed in sealed plastic bags and then stored at 4°C. Every 12 hours, hyperspectral scattering images were collected from the pork surface over the range of 400 nm to 1100 nm. Two different methods were used to extract scattering feature spectra from the hyperspectral scattering images. First, the spectral scattering profiles at individual wavelengths were fitted accurately by a three-parameter Lorentzian distribution (LD) function; second, reflectance spectra were extracted from the scattering images. The Partial Least Square Regression (PLSR) method was used to establish prediction models for pork freshness. The results showed that the PLSR models based on reflectance spectra were better than those based on combinations of LD "parameter spectra" in the prediction of TVB-N, with a correlation coefficient (r) = 0.90 and a standard error of prediction (SEP) = 7.80 mg/100g. Moreover, a prediction model for pork freshness was established by using a combination of TVB-N, pH and color parameters. It gave good prediction results, with r = 0.91 for pork freshness. The research demonstrated that the hyperspectral scattering technique is a valid tool for real-time and nondestructive detection of pork freshness.
Oxidative Stress in Wild Boars Naturally and Experimentally Infected with Mycobacterium bovis
Gassó, Diana; Vicente, Joaquín; Mentaberre, Gregorio; Soriguer, Ramón; Jiménez Rodríguez, Rocío; Navarro-González, Nora; Tvarijonaviciute, Asta; Lavín, Santiago; Fernández-Llario, Pedro; Segalés, Joaquim; Serrano, Emmanuel
2016-01-01
Reactive oxygen and nitrogen species (ROS-RNS) are important defence substances involved in the immune response against pathogens. An excessive increase in ROS-RNS, however, can damage the organism causing oxidative stress (OS). The organism is able to neutralise OS by the production of antioxidant enzymes (AE); hence, tissue damage is the result of an imbalance between oxidant and antioxidant status. Though some work has been carried out in humans, there is a lack of information about the oxidant/antioxidant status in the presence of tuberculosis (TB) in wild reservoirs. In the Mediterranean Basin, wild boar (Sus scrofa) is the main reservoir of TB. Wild boar showing severe TB have an increased risk of Mycobacterium spp. shedding, leading to pathogen spreading and persistence. If OS is greater in these individuals, oxidant/antioxidant balance in TB-affected boars could be used as a biomarker of disease severity. The present work had a two-fold objective: i) to study the effects of bovine TB on different OS biomarkers (namely superoxide dismutase (SOD), catalase (CAT), glutathione peroxidase (GPX), glutathione reductase (GR) and thiobarbituric acid reactive substances (TBARS)) in wild boar experimentally challenged with Mycobacterium bovis, and ii) to explore the role of body weight, sex, population and season in explaining the observed variability of OS indicators in two populations of free-ranging wild boar where TB is common. For the first objective, a partial least squares regression (PLSR) approach was used, whereas recursive partitioning with regression tree models (RTM) was applied for the second. A negative relationship between antioxidant enzymes and bovine TB (the more severe lesions, the lower the concentration of antioxidant biomarkers) was observed in experimentally infected animals. 
The final PLSR model retained the GPX, SOD and GR biomarkers and showed that 17.6% of the observed variability of antioxidant capacity was significantly correlated with the PLSR X’s component represented by both disease status and the age of boars. In the samples from free-ranging wild boar, however, the environmental factors were more relevant to the observed variability of the OS biomarkers than the TB itself. For each OS biomarker, each RTM was defined as a maximum by one node due to the population effect. Along the same lines, the ad hoc tree regression on boars from the population with a higher prevalence of severe TB confirmed that disease status was not the main factor explaining the observed variability in OS biomarkers. It was concluded that oxidative damage caused by TB is significant, but can only be detected in the absence of environmental variation in wild boar. PMID:27682987
Katseanes, Chelsea K; Chappell, Mark A; Hopkins, Bryan G; Durham, Brian D; Price, Cynthia L; Porter, Beth E; Miller, Lesley F
2016-11-01
After nearly a century of use in numerous munition platforms, TNT and RDX contamination has turned up largely in the environment due to ammunition manufacturing or as part of releases from low-order detonations during training activities. Although the basic principles governing the environmental fate of TNT and RDX are known, accurate predictions of TNT and RDX persistence in soil remain elusive, particularly given the universal heterogeneity of pedomorphic soil types. In this work, we proposed a new solution for modeling the sorption and persistence of these munition constituents as multivariate mathematical functions correlating soil attribute data over a variety of taxonomically distinct soil types to contaminant behavior, instead of a single constant or parameter of a specific absolute value. To test this idea, we conducted experiments measuring the sorption of TNT and RDX on taxonomically different soil types that were extensively physically and chemically characterized. Statistical decomposition of the log-transformed and auto-scaled soil characterization data using the dimension-reduction technique PCA (principal component analysis) revealed a strong latent structure based on the multiple pairwise correlations among the soil properties. TNT and RDX sorption partitioning coefficients (KD-TNT and KD-RDX) were regressed against this latent structure using partial least squares regression (PLSR), generating 3-factor, multivariate linear functions. Here, PLSR models predicted KD-TNT and KD-RDX values based on attributes contributing to endogenous alkaline/calcareous and soil fertility criteria, respectively, exhibited among the different soil types. We hypothesized that the latent structure arising from the strong covariance of the full multivariate geochemical matrix describing taxonomically distinguished soil types may provide the means for potentially predicting complex phenomena in soils. 
The development of predictive multivariate models tuned to a local soil's taxonomic designation would have direct benefit to military range managers seeking to anticipate the environmental risks of training activities on impact sites. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Shi, Tiezhu; Wang, Junjie; Chen, Yiyun; Wu, Guofeng
2016-10-01
Visible and near-infrared reflectance spectroscopy provides a beneficial tool for investigating soil heavy metal contamination. This study aimed to investigate mechanisms of soil arsenic prediction using laboratory based soil and leaf spectra, compare the prediction of arsenic content using soil spectra with that using rice plant spectra, and determine whether the combination of both could improve the prediction of soil arsenic content. A total of 100 samples were collected and the reflectance spectra of soils and rice plants were measured using a FieldSpec3 portable spectroradiometer (350-2500 nm). After eliminating spectral outliers, the reflectance spectra were divided into calibration (n = 62) and validation (n = 32) data sets using the Kennard-Stone algorithm. Genetic algorithm (GA) was used to select useful spectral variables for soil arsenic prediction. Thereafter, the GA-selected spectral variables of the soil and leaf spectra were individually and jointly employed to calibrate the partial least squares regression (PLSR) models using the calibration data set. The regression models were validated and compared using independent validation data set. Furthermore, the correlation coefficients of soil arsenic against soil organic matter, leaf arsenic and leaf chlorophyll were calculated, and the important wavelengths for PLSR modeling were extracted. Results showed that arsenic prediction using the leaf spectra (coefficient of determination in validation, Rv2 = 0.54; root mean square error in validation, RMSEv = 12.99 mg kg-1; and residual prediction deviation in validation, RPDv = 1.35) was slightly better than using the soil spectra (Rv2 = 0.42, RMSEv = 13.35 mg kg-1, and RPDv = 1.31). However, results also showed that the combinational use of soil and leaf spectra resulted in higher arsenic prediction (Rv2 = 0.63, RMSEv = 11.94 mg kg-1, RPDv = 1.47) compared with either soil or leaf spectra alone. 
Soil spectral bands near 480, 600, 670, 810, 1980, 2050 and 2290 nm, leaf spectral bands near 700, 890 and 900 nm in PLSR models were important wavelengths for soil arsenic prediction. Moreover, soil arsenic showed significantly positive correlations with soil organic matter (r = 0.62, p < 0.01) and leaf arsenic (r = 0.77, p < 0.01), and a significantly negative correlation with leaf chlorophyll (r = -0.67, p < 0.01). The results showed that the prediction of arsenic contents using soil and leaf spectra may be based on their relationships with soil organic matter and leaf chlorophyll contents, respectively. Although RPD of 1.47 was below the recommended RPD of >2 for soil analysis, arsenic prediction in agricultural soils can be improved by combining the leaf and soil spectra.
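The validation statistics quoted in this and several later abstracts (R2, RMSE, RPD) can be computed from observed and predicted values as sketched below. This is a minimal sketch under stated assumptions: RPD is taken as the sample standard deviation of the reference values divided by the RMSE, which is a common definition but may differ in detail from the one each study used, and the function names are illustrative.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Residual prediction deviation: SD of reference values / RMSE (assumed definition)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    return sd / rmse(observed, predicted)
```

Under this definition, an RPD below 2 means the prediction error is more than half the spread of the reference values, which is why the abstract treats 1.47 as falling short of the recommended level.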
Quantitative determination and classification of energy drinks using near-infrared spectroscopy.
Rácz, Anita; Héberger, Károly; Fodor, Marietta
2016-09-01
Almost a hundred commercially available energy drink samples from Hungary, Slovakia, and Greece were collected for the quantitative determination of their caffeine and sugar content with FT-NIR spectroscopy and high-performance liquid chromatography (HPLC). Calibration models were built with partial least-squares regression (PLSR). An HPLC-UV method was used to measure the reference values for caffeine content, while sugar contents were measured with the Schoorl method. Both the nominal sugar content (as indicated on the cans) and the measured sugar concentration were used as references. Although the Schoorl method has larger error and bias, appropriate models could be developed using both references. The validation of the models was based on sevenfold cross-validation and external validation. FT-NIR analysis is a good candidate to replace the HPLC-UV method, because it is much cheaper than any chromatographic method, while it is also more time-efficient. The combination of FT-NIR with multidimensional chemometric techniques like PLSR can be a good option for the detection of low caffeine concentrations in energy drinks. Moreover, three types of energy drinks that contain (i) taurine, (ii) arginine, and (iii) none of these two components were classified correctly using principal component analysis and linear discriminant analysis. Such classifications are important for the detection of adulterated samples and for quality control, as well. In this case, more than a hundred samples were used for the evaluation. The classification was validated with cross-validation and several randomization tests (X-scrambling). Graphical abstract: the path of energy drinks from cans to appropriate chemometric models.
Dawson, Neil; Thompson, Rhiannon J.; McVie, Allan; Thomson, David M.; Morris, Brian J.; Pratt, Judith A.
2012-01-01
Objective: In the present study, we employ mathematical modeling (partial least squares regression, PLSR) to elucidate the functional connectivity signatures of discrete brain regions in order to identify the functional networks subserving PCP-induced disruption of distinct cognitive functions and their restoration by the procognitive drug modafinil. Methods: We examine the functional connectivity signatures of discrete brain regions that show overt alterations in metabolism, as measured by semiquantitative 2-deoxyglucose autoradiography, in an animal model (subchronic phencyclidine [PCP] treatment), which shows cognitive inflexibility with relevance to the cognitive deficits seen in schizophrenia. Results: We identify the specific components of functional connectivity that contribute to the rescue of this cognitive inflexibility and to the restoration of overt cerebral metabolism by modafinil. We demonstrate that modafinil reversed both the PCP-induced deficit in the ability to switch attentional set and the PCP-induced hypometabolism in the prefrontal (anterior prelimbic) and retrosplenial cortices. Furthermore, modafinil selectively enhanced metabolism in the medial prelimbic cortex. The functional connectivity signatures of these regions identified a unifying functional subsystem underlying the influence of modafinil on cerebral metabolism and cognitive flexibility that included the nucleus accumbens core and locus coeruleus. In addition, these functional connectivity signatures identified coupling events specific to each brain region, which relate to known anatomical connectivity. Conclusions: These data support clinical evidence that modafinil may alleviate cognitive deficits in schizophrenia and also demonstrate the benefit of applying PLSR modeling to characterize functional brain networks in translational models relevant to central nervous system dysfunction. PMID:20810469
Wei, Zhebo; Xiao, Xize
2017-01-01
In this study, a portable electronic nose (E-nose) was developed in-house to identify rice wines with different marked ages; all operations of the E-nose were controlled by a dedicated smartphone application. The sensor array of the E-nose comprised 12 MOS sensors, and the obtained response values were transmitted to the smartphone through a wireless communication module. Aliyun served as the cloud storage platform for the responses and identification models. The measurement of the E-nose was composed of the taste information obtained phase (TIOP) and the aftertaste information obtained phase (AIOP). The area feature data obtained from the TIOP and the feature data obtained from the TIOP-AIOP were applied to identify rice wines by using pattern recognition methods. Principal component analysis (PCA), locally linear embedding (LLE) and linear discriminant analysis (LDA) were applied for the classification of those wine samples. LDA based on the area feature data obtained from the TIOP-AIOP proved a powerful tool and showed the best classification results. Partial least-squares regression (PLSR) and support vector machine (SVM) were applied for the predictions of marked ages, and SVM (R2 = 0.9942) worked much better than PLSR. PMID:29088076
Use of Standing Gold Nanorods for Detection of Malachite Green and Crystal Violet in Fish by SERS.
Chen, Xiaowei; Nguyen, Trang H D; Gu, Liqun; Lin, Mengshi
2017-07-01
With growing consumption of aquaculture products, there is increasing demand for rapid and sensitive techniques that can detect prohibited substances in seafood products. This study aimed to develop a novel surface-enhanced Raman spectroscopy (SERS) method coupled with a simplified extraction protocol and novel gold nanorod (AuNR) substrates to detect banned aquaculture substances (malachite green [MG] and crystal violet [CV]) and their mixture (1:1) in aqueous solution and fish samples. Multivariate statistical tools such as principal component analysis (PCA) and partial least squares regression (PLSR) were used in data analysis. PCA results demonstrate that SERS can distinguish MG, CV and their mixture (1:1) in aqueous solution and in fish samples. The detection limit of SERS coupled with standing AuNR substrates is 1 ppb for both MG and CV in fish samples. A good linear relationship between the actual and predicted concentrations of analytes was obtained from the PLSR models, with R2 values from 0.87 to 0.99, indicating satisfactory quantification results for this method. These results demonstrate that the SERS method coupled with AuNR substrates can be used for rapid and accurate detection of MG and CV in fish samples. © 2017 Institute of Food Technologists®.
Wei, Zhebo; Xiao, Xize; Wang, Jun; Wang, Hui
2017-10-31
In this study, a portable electronic nose (E-nose) was developed in-house to identify rice wines with different marked ages; all operations of the E-nose were controlled by a dedicated smartphone application. The sensor array of the E-nose comprised 12 MOS sensors, and the obtained response values were transmitted to the smartphone through a wireless communication module. Aliyun served as the cloud storage platform for the responses and identification models. The measurement of the E-nose was composed of the taste information obtained phase (TIOP) and the aftertaste information obtained phase (AIOP). The area feature data obtained from the TIOP and the feature data obtained from the TIOP-AIOP were applied to identify rice wines by using pattern recognition methods. Principal component analysis (PCA), locally linear embedding (LLE) and linear discriminant analysis (LDA) were applied for the classification of those wine samples. LDA based on the area feature data obtained from the TIOP-AIOP proved a powerful tool and showed the best classification results. Partial least-squares regression (PLSR) and support vector machine (SVM) were applied for the predictions of marked ages, and SVM (R² = 0.9942) worked much better than PLSR.
NASA Astrophysics Data System (ADS)
Singh, A.; Serbin, S. P.; Kingdon, C.; Townsend, P. A.
2013-12-01
A major goal of remote sensing, and imaging spectroscopy in particular, is the development of generalizable algorithms to repeatedly and accurately map ecosystem properties such as canopy chemistry across space and time. Existing methods must therefore be tested across a range of measurement approaches to identify and overcome limits to the consistent retrieval of such properties from spectroscopic imagery. Here we illustrate a general approach for the estimation of key foliar biochemical and morphological traits from spectroscopic imagery derived from the AVIRIS instrument and the propagation of errors from the leaf to the image scale using partial least squares regression (PLSR) techniques. Our method involves the integration of three types of data representing different scales of observation: At the image scale, the images were normalized for atmospheric, illumination and BRDF effects. Spectra from field plot locations were extracted from the 51 AVIRIS images and were averaged when the field plot was larger than a single pixel. At the plot level, the scaling was conducted using multiple replicates (1000) derived from the leaf-level uncertainty estimates to generate plot-level estimates with their associated uncertainties. Leaf-level estimates of foliar traits (%N, %C, %Fiber, %Cellulose, %Lignin, LMA) were scaled to the canopy based on relative species composition of each plot. Image spectra were iteratively split into 50/50 randomized calibration-validation datasets and multiple (500) trait-predictive PLSR models were generated, this time sampling from within the plot-level uncertainty distribution. This allowed the propagation of uncertainty from the leaf-level dependent variables to the plot level, and finally to models built using AVIRIS image spectra. Moreover, this method allows us to generate spatially explicit maps of uncertainty in our sampled traits.
Both the LMA and %N PLSR models had an R2 greater than 0.8, and root mean square errors (RMSEs) for both variables were less than 6% of the range of data. Fiber and lignin were predicted with R2 > 0.65, and carbon and cellulose with R2 greater than 0.5. Although the R2 values of these variables were lower than those for LMA and %N, their RMSE values were below 9% of the range of data. The comparatively lower R2 values for %C and cellulose in particular were related to the low amount of natural variability in these constituents. Further, coefficients from the randomized set of PLSR models were applied to imagery and aggregated to obtain pixel-wise predicted means and uncertainty estimates for each foliar trait. The resulting maps of nutritional and morphological properties together with their overall uncertainties represent a first-of-its-kind data product for examining the spatio-temporal patterns of forest functioning and nutrient cycling. These data are now being used to relate foliar traits with ecosystem processes such as streamwater nutrient export and insect herbivory. In addition, the ability to assign a retrieval uncertainty enables more efficient assimilation of these data products into ecosystem models to help constrain carbon and nutrient cycling projections.
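The idea of building many models on randomized 50/50 calibration splits and aggregating their predictions into a mean and an uncertainty can be sketched as follows. This is only an illustration of the resampling scheme: a one-variable least-squares line stands in for PLSR to keep the example self-contained, the additional sampling from the plot-level uncertainty distribution described above is omitted, and `fit_line` and `split_ensemble_predict` are hypothetical names.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line; a simple stand-in for a PLSR model."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def split_ensemble_predict(xs, ys, x_new, n_models=500, seed=0):
    """Fit one model per random 50/50 split; return mean and SD of predictions."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    predictions = []
    for _ in range(n_models):
        rng.shuffle(indices)
        calibration = indices[: len(indices) // 2]  # the other half is held out
        slope, intercept = fit_line(
            [xs[i] for i in calibration], [ys[i] for i in calibration]
        )
        predictions.append(intercept + slope * x_new)
    return statistics.mean(predictions), statistics.stdev(predictions)
```

Applied per pixel, the mean gives the predicted trait value and the standard deviation gives a spatially explicit uncertainty estimate, in the spirit of the maps described in the abstract.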
Niu, Yunwei; Zhang, Xiaoming; Xiao, Zuobing; Song, Shiqing; Jia, Chengsheng; Yu, Haiyan; Fang, Lingling; Xu, Chunhua
2012-08-01
Five cherry wines exhibiting marked differences in taste and mouthfeel were selected for the study. The taste and mouthfeel of cherry wines were described by four sensory terms: sour, sweet, bitter and astringent. Eight organic acids, seventeen amino acids, three sugars and tannic acid were determined by high performance liquid chromatography (HPLC). Five phenolic acids were determined by ultra performance liquid chromatography coupled with mass spectrometry (UPLC-MS). The relationship between these taste-active compounds, wine samples and sensory attributes was modeled by partial least squares regression (PLSR). The regression analysis indicated that tartaric acid, methionine, proline, sucrose, glucose, fructose, asparagine, serine, glycine, threonine, phenylalanine, leucine, gallic acid, chlorogenic acid, vanillic acid, arginine and tannic acid made a great contribution to the characteristic taste or mouthfeel of cherry wines. Copyright © 2012 Elsevier B.V. All rights reserved.
Bhatt, Chet R; Jain, Jinesh C; Goueguel, Christian L; McIntyre, Dustin L; Singh, Jagdish P
2018-01-01
Laser-induced breakdown spectroscopy (LIBS) was used to detect rare earth elements (REEs) in natural geological samples. Low and high intensity emission lines of Ce, La, Nd, Y, Pr, Sm, Eu, Gd, and Dy were identified in the spectra recorded from the samples to claim the presence of these REEs. Multivariate analysis was executed by developing partial least squares regression (PLS-R) models for the quantification of Ce, La, and Nd. Analysis of unknown samples indicated that the prediction results of these samples were found comparable to those obtained by inductively coupled plasma mass spectrometry analysis. Data support that LIBS has potential to quantify REEs in geological minerals/ores.
Petersen, Nanna; Stocks, Stuart; Gernaey, Krist V
2008-05-01
The main purpose of this article is to demonstrate that principal component analysis (PCA) and partial least squares regression (PLSR) can be used to extract information from particle size distribution data and predict rheological properties. Samples from commercially relevant Aspergillus oryzae fermentations conducted in 550 L pilot scale tanks were characterized with respect to particle size distribution, biomass concentration, and rheological properties. The rheological properties were described using the Herschel-Bulkley model. Estimation of all three parameters in the Herschel-Bulkley model (yield stress (tau(y)), consistency index (K), and flow behavior index (n)) resulted in a large standard deviation of the parameter estimates. The flow behavior index was not found to be correlated with any of the other measured variables and previous studies have suggested a constant value of the flow behavior index in filamentous fermentations. It was therefore chosen to fix this parameter to the average value thereby decreasing the standard deviation of the estimates of the remaining rheological parameters significantly. Using a PLSR model, a reasonable prediction of apparent viscosity (mu(app)), yield stress (tau(y)), and consistency index (K), could be made from the size distributions, biomass concentration, and process information. This provides a predictive method with a high predictive power for the rheology of fermentation broth, and with the advantages over previous models that tau(y) and K can be predicted as well as mu(app). Validation on an independent test set yielded a root mean square error of 1.21 Pa for tau(y), 0.209 Pa s(n) for K, and 0.0288 Pa s for mu(app), corresponding to R(2) = 0.95, R(2) = 0.94, and R(2) = 0.95 respectively. Copyright 2007 Wiley Periodicals, Inc.
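For readers unfamiliar with the Herschel-Bulkley model named above, the sketch below writes out its standard form, shear stress = tau(y) + K * (shear rate)^n, and the apparent viscosity as shear stress divided by shear rate. The function names and the example parameter values are illustrative assumptions and are not taken from the study.

```python
def herschel_bulkley_stress(shear_rate, tau_y, K, n):
    """Shear stress (Pa) from yield stress tau_y, consistency index K, flow index n."""
    return tau_y + K * shear_rate ** n


def apparent_viscosity(shear_rate, tau_y, K, n):
    """Apparent viscosity (Pa s): shear stress divided by shear rate (1/s)."""
    if shear_rate <= 0:
        raise ValueError("shear_rate must be positive")
    return herschel_bulkley_stress(shear_rate, tau_y, K, n) / shear_rate


# Hypothetical parameter values, chosen only to show the call pattern.
example_viscosity = apparent_viscosity(shear_rate=4.0, tau_y=1.0, K=2.0, n=0.5)
```

Because K carries units of Pa s^n, fixing n to a constant (as the study does) makes K values comparable across samples, which is part of why the remaining parameter estimates become more stable.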
Wang, Jingzhe; Abulimiti, Aerzuna; Cai, Lianghong
2018-01-01
Soil salinization is one of the most common forms of land degradation. The detection and assessment of soil salinity is critical for the prevention of environmental deterioration especially in arid and semi-arid areas. This study introduced the fractional derivative in the pretreatment of visible and near infrared (VIS–NIR) spectroscopy. The soil samples (n = 400) collected from the Ebinur Lake Wetland, Xinjiang Uyghur Autonomous Region (XUAR), China, were used as the dataset. After measuring the spectral reflectance and salinity in the laboratory, the raw spectral reflectance was preprocessed by means of the absorbance and the fractional derivative order in the range of 0.0–2.0 order with an interval of 0.1. Two different modeling methods, namely, partial least squares regression (PLSR) and random forest (RF) with preprocessed reflectance were used for quantifying soil salinity. The results showed that more spectral characteristics were refined for the spectrum reflectance treated via fractional derivative. The validation accuracies showed that RF models performed better than those of PLSR. The most effective model was established based on RF with the 1.5 order derivative of absorbance with the optimal values of R2 (0.93), RMSE (4.57 dS m−1), and RPD (2.78 ≥ 2.50). The developed RF model was stable and accurate in the application of spectral reflectance for determining the soil salinity of the Ebinur Lake wetland. The pretreatment of fractional derivative could be useful for monitoring multiple soil parameters with higher accuracy, which could effectively help to analyze the soil salinity. PMID:29736341
Wang, Jingzhe; Ding, Jianli; Abulimiti, Aerzuna; Cai, Lianghong
2018-01-01
Soil salinization is one of the most common forms of land degradation. The detection and assessment of soil salinity is critical for the prevention of environmental deterioration especially in arid and semi-arid areas. This study introduced the fractional derivative in the pretreatment of visible and near infrared (VIS-NIR) spectroscopy. The soil samples (n = 400) collected from the Ebinur Lake Wetland, Xinjiang Uyghur Autonomous Region (XUAR), China, were used as the dataset. After measuring the spectral reflectance and salinity in the laboratory, the raw spectral reflectance was preprocessed by means of the absorbance and the fractional derivative order in the range of 0.0-2.0 order with an interval of 0.1. Two different modeling methods, namely, partial least squares regression (PLSR) and random forest (RF) with preprocessed reflectance were used for quantifying soil salinity. The results showed that more spectral characteristics were refined for the spectrum reflectance treated via fractional derivative. The validation accuracies showed that RF models performed better than those of PLSR. The most effective model was established based on RF with the 1.5 order derivative of absorbance with the optimal values of R2 (0.93), RMSE (4.57 dS m−1), and RPD (2.78 ≥ 2.50). The developed RF model was stable and accurate in the application of spectral reflectance for determining the soil salinity of the Ebinur Lake wetland. The pretreatment of fractional derivative could be useful for monitoring multiple soil parameters with higher accuracy, which could effectively help to analyze the soil salinity.
Tiyip, Tashpolat; Ding, Jianli; Zhang, Dong; Liu, Wei; Wang, Fei; Tashpolat, Nigara
2017-01-01
Effective pretreatment of spectral reflectance is vital to model accuracy in soil parameter estimation. However, the classic integer derivative has some disadvantages, including spectral information loss and the introduction of high-frequency noise. In this paper, the fractional order derivative algorithm was applied to the pretreatment and partial least squares regression (PLSR) was used to assess the clay content of desert soils. Overall, 103 soil samples were collected from the Ebinur Lake basin in the Xinjiang Uighur Autonomous Region of China, and used as data sets for calibration and validation. Following laboratory measurements of spectral reflectance and clay content, the raw spectral reflectance and absorbance data were treated using the fractional derivative order from the 0.0 to the 2.0 order (order interval: 0.2). The ratio of performance to deviation (RPD), determinant coefficients of calibration (Rc2), root mean square errors of calibration (RMSEC), determinant coefficients of prediction (Rp2), and root mean square errors of prediction (RMSEP) were applied to assess the performance of predicting models. The results showed that models built on the fractional derivative order performed better than when using the classic integer derivative. Comparison of the predictive effects of 22 models for estimating clay content, calibrated by PLSR, showed that those models based on the fractional derivative 1.8 order of spectral reflectance (Rc2 = 0.907, RMSEC = 0.425%, Rp2 = 0.916, RMSEP = 0.364%, and RPD = 2.484 ≥ 2.000) and absorbance (Rc2 = 0.888, RMSEC = 0.446%, Rp2 = 0.918, RMSEP = 0.383% and RPD = 2.511 ≥ 2.000) were most effective. Furthermore, they performed well in quantitative estimations of the clay content of soils in the study area. PMID:28934274
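A fractional-order derivative of a sampled spectrum can be approximated with the Grünwald-Letnikov difference formula, sketched below. The abstract does not say which definition the authors used, so the choice of Grünwald-Letnikov is an assumption, as are the function names and the unit sampling step.

```python
def gl_coefficients(alpha, count):
    """Grünwald-Letnikov weights c_k = (-1)^k * binom(alpha, k), built by recurrence."""
    coeffs = [1.0]
    for k in range(1, count):
        coeffs.append(coeffs[-1] * (1.0 - (alpha + 1.0) / k))
    return coeffs


def fractional_derivative(signal, alpha, step=1.0):
    """Order-alpha derivative of an evenly sampled signal (e.g. reflectance per band)."""
    coeffs = gl_coefficients(alpha, len(signal))
    result = []
    for i in range(len(signal)):
        # Weighted sum over the current sample and all earlier samples.
        acc = sum(coeffs[k] * signal[i - k] for k in range(i + 1))
        result.append(acc / step ** alpha)
    return result
```

Order 0 returns the spectrum unchanged and order 1 reduces to the ordinary backward difference, so intermediate orders such as 1.8 interpolate between the familiar integer-order pretreatments.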
Peng, Jiyu; He, Yong; Ye, Lanhan; Shen, Tingting; Liu, Fei; Kong, Wenwen; Liu, Xiaodan; Zhao, Yun
2017-07-18
Fast detection of heavy metals in plant materials is crucial for environmental remediation and ensuring food safety. However, most plant materials contain high moisture content, the influence of which cannot be simply ignored. Hence, we proposed a method for reducing the influence of moisture in the fast detection of heavy metals using laser-induced breakdown spectroscopy (LIBS). First, we investigated the effect of moisture content on signal intensity, stability, and plasma parameters (temperature and electron density) and determined the main factors (experimental parameters F and the change of analyte concentration) influencing the variations of signal. For chromium content detection, the rice leaves underwent a quick drying procedure, and two strategies were further used to reduce the effect of moisture content and shot-to-shot fluctuation. An exponential model based on the intensity of background was used to correct the actual element concentration in analyte. Also, the ratio of signal-to-background for univariable calibration and partial least squares regression (PLSR) for multivariable calibration were used to compensate the prediction deviations. The PLSR calibration model obtained the best result, with a correlation coefficient of 0.9669 and a root-mean-square error of 4.75 mg/kg in the prediction set. The preliminary results indicated that the proposed method allowed for the detection of heavy metals in plant materials using LIBS, and it could possibly be used for element mapping in future work.
Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. 
The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969
Rapid Isolation and Detection for RNA Biomarkers for TBI Diagnostics
2015-10-01
V., Grape and wine sensory attributes correlate with pattern-based discrimination of Cabernet Sauvignon wines by a peptidic sensor array, Tetrahedron... wine samples. Partial Least Squares Regression (PLSR) was used for the correlation of wine sensory attributes to the peptide-based receptor... responses. Data analysis was done using the software XLSTAT (Addinsoft, New York) and R. Absorbance values due to wine without the sensing ensembles were
Hyperspectral sensing to detect the impact of herbicide drift on cotton growth and yield
NASA Astrophysics Data System (ADS)
Suarez, L. A.; Apan, A.; Werth, J.
2016-10-01
Yield loss in crops is often associated with plant disease or external factors such as environment, water supply and nutrient availability. Improper agricultural practices can also introduce risks into the equation. Herbicide drift can be a combination of improper practices and environmental conditions which can create a potential yield loss. As traditional assessment of plant damage is often imprecise and time consuming, the ability of remote and proximal sensing techniques to monitor various bio-chemical alterations in the plant may offer a faster, non-destructive and reliable approach to predict yield loss caused by herbicide drift. This paper examines the prediction capabilities of partial least squares regression (PLS-R) models for estimating yield. Models were constructed with hyperspectral data of a cotton crop sprayed with three simulated doses of the phenoxy herbicide 2,4-D at three different growth stages. Fibre quality, photosynthesis, conductance, and two main hormones, indole acetic acid (IAA) and abscisic acid (ABA) were also analysed. Except for fibre quality and ABA, Spearman correlations have shown that these variables were highly affected by the chemical. Four PLS-R models for predicting yield were developed according to four timings of data collection: 2, 7, 14 and 28 days after the exposure (DAE). As indicated by the model performance, the analysis revealed that 7 DAE was the best time for data collection purposes (RMSEP = 2.6 and R2 = 0.88), followed by 28 DAE (RMSEP = 3.2 and R2 = 0.84). In summary, the results of this study show that it is possible to accurately predict yield after a simulated herbicide drift of 2,4-D on a cotton crop, through the analysis of hyperspectral data, thereby providing a reliable, effective and non-destructive alternative based on the internal response of the cotton leaves.
Raman microspectroscopy for in situ examination of carbon-microbe-mineral interactions
NASA Astrophysics Data System (ADS)
Creamer, C.; Foster, A. L.; Lawrence, C. R.; Mcfarland, J. W.; Waldrop, M. P.
2016-12-01
The changing paradigm of soil organic matter formation and turnover is focused at the nexus of microbe-carbon-mineral interactions. However, visualizing biotic and abiotic stabilization of C on mineral surfaces is difficult given our current techniques. Therefore we investigated Raman microspectroscopy as a potential tool to examine microbially mediated organo-mineral associations. Raman microspectroscopy is a non-destructive technique that has been used to identify microorganisms and minerals, and to quantify microbial assimilation of 13C labeled substrates in culture. We developed a partial least squares regression (PLSR) model to accurately quantify (within 5%) adsorption of four model 12C substrates (glucose, glutamic acid, oxalic acid, p-hydroxybenzoic acid) on a range of soil minerals. We also developed a PLSR model to quantify the incorporation of 13C into E. coli cells. Using these two models, along with measures of the 13C content of respired CO2, we determined the allocation of glucose-derived C into mineral-associated microbial biomass and respired CO2 in situ and through time. We observed progressive 13C enrichment of microbial biomass with incubation time, as well as 13C enrichment of CO2 indicating preferential decomposition of glucose-derived C. We will also present results on the application of our in situ chamber to quantify the formation of organo-mineral associations under both abiotic and biotic conditions with a variety of C and mineral substrates, as well as the rate of turnover and stabilization of microbial residues. Application of Raman microspectroscopy to microbial-mineral interactions represents a novel method to quantify microbial transformation of C substrates and subsequent mineral stabilization without destructive sampling, and has the potential to provide new insights to our conceptual understanding of carbon-microbe-mineral interactions.
Rapid determination of sugar level in snack products using infrared spectroscopy.
Wang, Ting; Rodriguez-Saona, Luis E
2012-08-01
Real-time spectroscopic methods can provide a valuable window into food manufacturing to permit optimization of production rate, quality and safety. There is a need for cutting edge sensor technology directed at improving efficiency, throughput and reliability of critical processes. The aim of the research was to evaluate the feasibility of infrared systems combined with chemometric analysis to develop rapid methods for determination of sugars in cereal products. Samples were ground and spectra were collected using a mid-infrared (MIR) spectrometer equipped with a triple-bounce ZnSe MIRacle attenuated total reflectance accessory or Fourier transform near infrared (NIR) system equipped with a diffuse reflection-integrating sphere. Sugar contents were determined using a reference HPLC method. Partial least squares regression (PLSR) was used to create cross-validated calibration models. The predictability of the models was evaluated on an independent set of samples and compared with reference techniques. MIR and NIR spectra showed characteristic absorption bands for sugars, and generated excellent PLSR models (sucrose: SEP < 1.7% and r > 0.96). Multivariate models accurately and precisely predicted sugar level in snacks allowing for rapid analysis. This simple technique allows for reliable prediction of quality parameters, and automation enabling food manufacturers for early corrective actions that will ultimately save time and money while establishing a uniform quality. The U.S. snack food industry generates billions of dollars in revenue each year and vibrational spectroscopic methods combined with pattern recognition analysis could permit optimization of production rate, quality, and safety of many food products. This research showed that infrared spectroscopy is a powerful technique for near real-time (approximately 1 min) assessment of sugar content in various cereal products. © 2012 Institute of Food Technologists®
NASA Astrophysics Data System (ADS)
Vaudour, E.; Gilliot, J. M.; Bel, L.; Lefevre, J.; Chehdi, K.
2016-07-01
This study aimed at identifying the potential of Vis-NIR airborne hyperspectral AISA-Eagle data for predicting the topsoil organic carbon (SOC) content of bare cultivated soils over a large peri-urban area (221 km2) with both contrasted soils and SOC contents, located in the western region of Paris, France. Soil types comprised haplic luvisols, calcaric cambisols and colluvic cambisols. Airborne AISA-Eagle data (400-1000 nm, 126 bands) with 1 m-resolution were acquired on 17 April 2013 over 13 tracks. Tracks were atmospherically corrected then mosaicked at a 2 m-resolution using a set of 24 synchronous field spectra of bare soils, black and white targets and impervious surfaces. The land use identification system layer (RPG) of 2012 was used to mask non-agricultural areas, then calculation and thresholding of NDVI from an atmospherically corrected SPOT image acquired the same day made it possible to map agricultural fields with bare soil. A total of 101 sites sampled either in 2013 or in the 3 previous years and in 2015 were identified as bare by means of this map. Predictions were made from the mosaic AISA spectra which were related to topsoil SOC contents by means of partial least squares regression (PLSR). Regression robustness was evaluated through a series of 1000 bootstrap data sets of calibration-validation samples, considering 74 sites outside cloud shadows only, and different sampling strategies for selecting calibration samples. Validation root-mean-square errors (RMSE) ranged between 3.73 and 4.49 g kg-1, with a median of ∼4 g kg-1. The best-performing models in terms of coefficient of determination (R2) and Residual Prediction Deviation (RPD) values were the calibration models derived either from Kennard-Stone or conditioned Latin Hypercube sampling on smoothed spectra. The most generalizable model, which gave the lowest RMSE (3.73 g kg-1 at the regional scale and 1.44 g kg-1 at the within-field scale) and low bias, was the leave-one-out cross-validated PLSR model constructed with the 28 near-synchronous samples and raw spectra.
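Kennard-Stone selection, one of the calibration sampling strategies named in the abstract above, can be sketched as follows. This is a minimal illustration that assumes Euclidean distance between spectra; the function name `kennard_stone` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by Kennard-Stone.

    Starts from the two most distant samples, then repeatedly adds the
    sample whose nearest already-selected neighbour is farthest away.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Full pairwise Euclidean distance matrix (fine for a few hundred samples).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Seed with the most distant pair.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # For every sample, distance to its closest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < n_select:
        min_dist[selected] = -1.0          # never pick a sample twice
        k = int(np.argmax(min_dist))
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
    return selected

# Toy example: four one-band "spectra".
toy = np.array([[0.0], [1.0], [2.0], [10.0]])
calibration_idx = kennard_stone(toy, 3)
```

The samples not returned by the function would form the validation set; the abstract does not state the calibration/validation split sizes, so none is assumed here.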
[Research on Oil Sands Spectral Characteristics and Oil Content by Remote Sensing Estimation].
You, Jin-feng; Xing, Li-xin; Pan, Jun; Shan, Xuan-long; Liang, Li-heng; Fan, Rui-xue
2015-04-01
Visible and near-infrared spectroscopy is a proven technology that is widely used in the identification and exploration of hydrocarbon energy sources, because its high spectral resolution captures the detailed diagnostic absorption characteristics of hydrocarbon groups. In the reflectance spectra of oil sands samples, the most prominent hydrocarbon absorption bands are at 1,740-1,780, 2,300-2,340 and 2,340-2,360 nm. These spectral ranges are dominated by various overlapping C-H overtones and combination bands. Meanwhile, absorption in the region from 1,700 to 1,730 nm is relatively weak or even absent in the spectra of oil sands samples with low bitumen content. As oil content increases, obvious hydrocarbon absorption begins to appear in the 1,700-1,730 nm range. The bitumen content is the critical parameter for oil sands reserves estimation. The absorption depth was used to depict the response intensity of the absorption bands controlled by first-order overtones and combinations of the various C-H stretching and bending fundamentals. Based on the Pearson and partial correlations between oil content and the absorption depth of hydrocarbon groups in the 1,740-1,780, 2,300-2,340 and 2,340-2,360 nm wavelength ranges, an association was established between the intensity of the spectral response and bitumen content, and then unary linear regression (ULR) and partial least squares regression (PLSR) were employed to model the relationship between the absorption depth attributed to the various C-H bonds and bitumen content. Two calibration equations were obtained: the ULR method modeled the relationship between absorption depth near the 2,350 nm region and bitumen content, and the PLSR method modeled the relationship between absorption depth in the 1,758, 2,310 and 2,350 nm regions and oil content. The calibration models had good predictive ability and high robustness, and they could provide a scientific basis for rapid estimation of oil content in oil sands in the future.
NASA Astrophysics Data System (ADS)
Hussain, Javid; Mabood, Fazal; Al-Harrasi, Ahmed; Ali, Liaqat; Rizvi, Tania Shamim; Jabeen, Farah; Gilani, Syed Abdullah; Shinwari, Shehla; Ahmad, Mushtaq; Alabri, Zahra Khalfan; Al Ghawi, Said Hamood Salim
2018-04-01
Flavonoids are natural antioxidants derived from plants and commonly found in a variety of foods to sequester free radicals. Quercetin, belonging to the flavonol subclass of flavonoids, has received considerable attention because of its wide use as a nutritional supplement as well as a phytochemical remedy for a number of diseases. In the current study, quantification of quercetin was carried out in two medicinally important flavonoid-rich plants, Ziziphus mucronata and Ziziphus sativa. Emission spectroscopy was utilized as a new method coupled with Partial Least Squares Regression (PLSR) and the cross validation was done by UV-Visible spectroscopy. The results indicated a higher quercetin content in Z. mucronata (1.50 ± 0.034%) than in Z. sativa (1.21 ± 0.052%), and were further verified through the Folin-Ciocalteu colorimetric method (Z. mucronata: 1.41 ± 0.26%; Z. sativa: 1.13 ± 0.136%). In this study the sensitivity was expressed in terms of the slope, i.e. slope = 0.9973.
Xu, Shengxiang; Shi, Xuezheng; Wang, Meiyan; Zhao, Yongcun
2016-01-01
Assessment and monitoring of soil organic matter (SOM) quality are important for understanding SOM dynamics and developing management practices that will enhance and maintain the productivity of agricultural soils. Visible and near-infrared (Vis–NIR) diffuse reflectance spectroscopy (350–2500 nm) has received increasing attention over the recent decades as a promising technique for SOM analysis. While heterogeneity of sample sets is one critical factor that complicates the prediction of soil properties from Vis–NIR spectra, a spectral library representing the local soil diversity needs to be constructed. The study area, covering a surface of 927 km2 and located in Yujiang County of Jiangsu Province, is characterized by a hilly area with different soil parent materials (e.g., red sandstone, shale, Quaternary red clay, and river alluvium). In total, 232 topsoil (0–20 cm) samples were collected for SOM analysis and scanned with a Vis–NIR spectrometer in the laboratory. Reflectance data were related to surface SOM content by means of a partial least square regression (PLSR) method and several data pre-processing techniques, such as first and second derivatives with a smoothing filter. The performance of the PLSR model was tested under different combinations of calibration/validation sets (global and local calibrations stratified according to parent materials). The results showed that the models based on the global calibrations can only make approximate predictions for SOM content (RMSE (root mean squared error) = 4.23–4.69 g kg−1; R2 (coefficient of determination) = 0.80–0.84; RPD (ratio of standard deviation to RMSE) = 2.19–2.44; RPIQ (ratio of performance to inter-quartile distance) = 2.88–3.08). Under the local calibrations, the individual PLSR models for each parent material improved SOM predictions (RMSE = 2.55–3.49 g kg−1; R2 = 0.87–0.93; RPD = 2.67–3.12; RPIQ = 3.15–4.02). 
Among the four different parent materials, the largest R2 and the smallest RMSE were observed for the shale soils, which had the lowest coefficient of variation (CV) values for clay (18.95%), free iron oxides (15.93%), and pH (1.04%). This demonstrates the importance of a practical subsetting strategy for the continued improvement of SOM prediction with Vis–NIR spectroscopy. PMID:26974821
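The validation statistics quoted in the abstract above (RMSE, R2, RPD and RPIQ) can be computed from reference and predicted values as in the sketch below. It follows the definitions given in the abstract itself (RPD as standard deviation over RMSE, RPIQ as inter-quartile distance over RMSE); the function name and the use of the sample standard deviation are assumptions, and the numbers are toy values rather than data from the study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, R2, RPD and RPIQ for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    sd = np.std(y_true, ddof=1)                # sample standard deviation (assumed)
    q1, q3 = np.percentile(y_true, [25, 75])   # inter-quartile distance = q3 - q1

    return {
        "rmse": rmse,
        "r2": r2,
        "rpd": sd / rmse,
        "rpiq": (q3 - q1) / rmse,
    }

# Toy reference values and predictions.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]
metrics = regression_metrics(reference, predicted)
```

Because RPD and RPIQ both divide a spread measure by the RMSE, they rise when a subset is more variable as well as when the error falls, which is worth keeping in mind when comparing local calibrations on differently distributed subsets.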
Value of Information Analysis for Time-lapse Seismic Data by Simulation-Regression
NASA Astrophysics Data System (ADS)
Dutta, G.; Mukerji, T.; Eidsvik, J.
2016-12-01
A novel method to estimate the Value of Information (VOI) of time-lapse seismic data in the context of reservoir development is proposed. VOI is a decision analytic metric quantifying the incremental value that would be created by collecting information prior to making a decision under uncertainty. The VOI has to be computed before collecting the information and can be used to justify its collection. Previous work on estimating the VOI of geophysical data has involved explicit approximation of the posterior distribution of reservoir properties given the data and then evaluating the prospect values for that posterior distribution of reservoir properties. Here, we propose to directly estimate the prospect values given the data by building a statistical relationship between them using regression. Various regression techniques such as Partial Least Squares Regression (PLSR), Multivariate Adaptive Regression Splines (MARS) and k-Nearest Neighbors (k-NN) are used to estimate the VOI, and the results compared. For a univariate Gaussian case, the VOI obtained from simulation-regression has been shown to be close to the analytical solution. Estimating VOI by simulation-regression is much less computationally expensive since the posterior distribution of reservoir properties given each possible dataset need not be modeled and the prospect values need not be evaluated for each such posterior distribution of reservoir properties. This method is flexible, since it does not require rigid model specification of posterior but rather fits conditional expectations non-parametrically from samples of values and data.
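For the univariate Gaussian case mentioned in the abstract above, the simulation-regression idea can be sketched and checked against the closed-form answer. The sketch assumes a prospect value v ~ N(mu, sigma^2), data d = v + noise with noise ~ N(0, tau^2), and a decision to develop only when the expected value is positive. Ordinary linear regression is used here as a simple stand-in for the PLSR, MARS and k-NN regressors named in the abstract; the function names and parameter values are illustrative assumptions.

```python
import math
import numpy as np

def voi_analytical(mu, sigma, tau):
    """Closed-form VOI for the Gaussian model described above."""
    s = sigma ** 2 / math.sqrt(sigma ** 2 + tau ** 2)   # std of E[v | d]
    z = mu / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    posterior_value = mu * cdf + s * pdf                # E[max(0, E[v | d])]
    prior_value = max(0.0, mu)
    return posterior_value - prior_value

def voi_simulation_regression(mu, sigma, tau, n_samples, seed=0):
    """Estimate VOI by regressing simulated values on simulated data."""
    rng = np.random.default_rng(seed)
    v = rng.normal(mu, sigma, n_samples)                # simulated prospect values
    d = v + rng.normal(0.0, tau, n_samples)             # simulated data
    slope, intercept = np.polyfit(d, v, 1)              # fitted E[v | d]
    v_hat = slope * d + intercept
    posterior_value = np.mean(np.maximum(0.0, v_hat))
    prior_value = max(0.0, float(np.mean(v)))
    return posterior_value - prior_value

exact = voi_analytical(0.2, 1.0, 1.0)
estimate = voi_simulation_regression(0.2, 1.0, 1.0, n_samples=200_000)
```

No posterior distribution is modeled explicitly: only the conditional expectation is fitted from samples, which is the source of the computational saving described in the abstract.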
Multisensor on-the-go mapping of readily dispersible clay, particle size and soil organic matter
NASA Astrophysics Data System (ADS)
Debaene, Guillaume; Niedźwiecki, Jacek; Papierowska, Ewa
2016-04-01
Particle size fractions strongly affect the physical and chemical properties of soil. Readily dispersible clay (RDC) is the part of the clay fraction in soils that is easily or potentially dispersible in water when small amounts of mechanical energy are applied to soil. The amount of RDC in the soil is of significant importance for agriculture and environment because clay dispersion is a cause of poor soil stability in water which in turn contributes to soil erodibility, mud flows, and cementation. To obtain a detailed map of soil texture, many samples are needed. Moreover, RDC determination is time consuming. The use of a mobile visible and near-infrared (VIS-NIR) platform is proposed here to map those soil properties and obtain the first detailed map of RDC at field level. Soil property prediction was based on a calibration model developed with 10 representative samples selected by a fuzzy logic algorithm. Calibration samples were analysed for soil texture (clay, silt and sand), RDC and soil organic carbon (SOC) using conventional wet chemistry analysis. Moreover, the Veris mobile sensor platform also collects electrical conductivity (EC) data (deep and shallow) and soil temperature. These auxiliary data were combined with VIS-NIR measurements (data fusion) to improve prediction results. EC maps were also produced to help interpret the RDC data. The resulting maps were visually compared with an orthophotograph of the field taken at the beginning of the plant growing season. Models were developed with partial least square regression (PLSR) and support vector machine regression (SVMR). There were no significant differences between calibration using PLSR or SVMR. Nevertheless, the best models were obtained with PLSR and standard normal variate (SNV) pretreatment and the fusion with deep EC data (e.g. for RDC and clay content: RMSECV = 0.35% and R2 = 0.71; RMSECV = 0.32% and R2 = 0.73 respectively). The best models were used to predict soil properties from the field spectra collected with the VIS-NIR platform. Maps of soil properties were generated using natural neighbour (NN) interpolation. Calibration results were satisfactory for all soil properties and allowed for the generation of detailed maps. The spatial variability of RDC was in accordance with the field orthophotograph. Areas of high RDC content corresponded to areas of poor plant development. Soil texture has been correctly predicted by VIS-NIR spectroscopy (laboratory or on-the-go) before. However, readily dispersible clay (an important parameter for soil stability) has never been investigated before. This study introduces the possibility of using VIS-NIR for predicting readily dispersible clay at field level. The results obtained could be used in preventing soil erosion. Acknowledgement: This research was financed by a National Science Centre grant (NCN - Poland) with decision number UMO-2012/07/B/ST10/04387
Feng, Yao-Ze; Elmasry, Gamal; Sun, Da-Wen; Scannell, Amalia G M; Walsh, Des; Morcy, Noha
2013-06-01
Bacterial pathogens are the main culprits for outbreaks of food-borne illnesses. This study aimed to use the hyperspectral imaging technique as a non-destructive tool for quantitative and direct determination of Enterobacteriaceae loads on chicken fillets. Partial least squares regression (PLSR) models were established and the best model using full wavelengths was obtained in the spectral range 930-1450 nm with coefficients of determination R2 ≥ 0.82 and root mean squared errors (RMSEs) ≤ 0.47 log10 CFU g-1. In further development of simplified models, second derivative spectra and weighted PLS regression coefficients (BW) were utilised to select important wavelengths. However, the three wavelengths (930, 1121 and 1345 nm) selected from BW were competent and more preferred for predicting Enterobacteriaceae loads with R2 of 0.89, 0.86 and 0.87 and RMSEs of 0.33, 0.40 and 0.45 log10 CFU g-1 for calibration, cross-validation and prediction, respectively. Besides, the constructed prediction map provided the distribution of Enterobacteriaceae bacteria on chicken fillets, which cannot be achieved by conventional methods. It was demonstrated that hyperspectral imaging is a potential tool for determining food sanitation and detecting bacterial pathogens on food matrix without using complicated laboratory regimes. Copyright © 2012 Elsevier Ltd. All rights reserved.
Sirisomboon, Panmanas; Chowbankrang, Rawiphan; Williams, Phil
2012-05-01
Near-infrared spectroscopy in diffuse reflection mode was used to evaluate the apparent viscosity of Para rubber field latex and concentrated latex over the wavelength range of 1100 to 2500 nm, using partial least square regression (PLSR). The model with ten principal components (PCs) developed using the raw spectra accurately predicted the apparent viscosity with correlation coefficient (r), standard error of prediction (SEP), and bias of 0.974, 8.6 cP, and -0.4 cP, respectively. The ratio of the standard deviation to the SEP (RPD) and the ratio of the range to the SEP (RER) for the prediction were 4.4 and 16.7, respectively. Therefore, the model can be used for measurement of the apparent viscosity of field latex and concentrated latex in quality assurance and process control in the factory.
Liu, Huiyu; Zhang, Mingyang; Lin, Zhenshan
2017-10-05
Climate changes are considered to significantly impact net primary productivity (NPP). However, there are few studies on how climate changes at multiple time scales impact NPP. With the MODIS NPP product and station-based observations of sunshine duration, annual average temperature and annual precipitation, the impacts of climate changes at different time scales on annual NPP were studied with the EEMD (ensemble empirical mode decomposition) method in the Karst area of northwest Guangxi, China, during 2000-2013. Moreover, with a partial least squares regression (PLSR) model, the relative importance of climatic variables for annual NPP was explored. The results show that (1) only at the quasi 3-year time scale do sunshine duration and temperature have significantly positive relations with NPP; (2) annual precipitation has no significant relation to NPP by direct comparison, but a significantly positive relation at the 5-year time scale, which is because the 5-year time scale is not the dominant scale of precipitation; (3) the changes of NPP may be dominated by inter-annual variabilities; (4) multiple time scales analysis will greatly improve the performance of the PLSR model for estimating NPP. The variable importance in projection (VIP) scores of sunshine duration and temperature at the quasi 3-year time scale, and precipitation at the quasi 5-year time scale, are greater than 0.8, indicating that these variables were important for NPP during 2000-2013. However, sunshine duration and temperature at the quasi 3-year time scale are much more important. Our results underscore the importance of multiple time scales analysis for revealing the relations of NPP to changing climate.
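The VIP scores referred to in the abstract above can be derived from a fitted PLS model. The sketch below fits a single-response PLS model (PLS1) with the NIPALS algorithm in plain NumPy and applies the commonly used VIP formula; the function name, the synthetic data and the choice of two components are assumptions for illustration, not details from the study.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS and return one VIP score per column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xk = X - X.mean(axis=0)           # centred predictors (deflated in the loop)
    yk = y - y.mean()                 # centred response (deflated in the loop)
    n_vars = X.shape[1]

    weights = np.zeros((n_vars, n_components))
    explained = np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        q = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - q * t               # deflate y
        weights[:, a] = w
        explained[a] = q ** 2 * tt    # y sum of squares explained by component a

    # VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a))
    return np.sqrt(n_vars * (weights ** 2 @ explained) / explained.sum())

# Synthetic example: only the first of five predictors drives the response.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=100)
vip_demo = pls1_vip(X_demo, y_demo, n_components=2)
important = vip_demo > 0.8            # threshold used in the abstract
```

With unit-length weight vectors the squared VIP scores average to 1 across variables, which is why cut-offs such as 0.8 or 1 are used to flag influential predictors.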
Determination of yolk contamination in liquid egg white using Raman spectroscopy.
Cluff, K; Konda Naganathan, G; Jonnalagada, D; Mortensen, I; Wehling, R; Subbiah, J
2016-07-01
Purified egg white is an important ingredient in a number of baked and confectionery foods because of its foaming properties. However, yolk contamination in amounts as low as 0.01% can impede the foaming ability of egg white. In this study, we used Raman spectroscopy to evaluate the hypothesis that yolk contamination in egg white could be detected based on its molecular optical properties. Yolk contaminated egg white samples (n = 115) with contamination levels ranging from 0% to 0.25% (on weight basis) were prepared. The samples were excited with a 785 nm laser and Raman spectra from 250 to 3,200 cm-1 were recorded. The Raman spectra were baseline corrected using an optimized piecewise cubic interpolation on each spectrum and then normalized with a standard normal variate transformation. Samples were randomly divided into calibration (n = 77) and validation (n = 38) data sets. A partial least squares regression (PLSR) model was developed to predict yolk contamination levels, based on the Raman spectral fingerprint. Raman spectral peaks, in the spectral region of 1,080 and 1,666 cm-1, had the largest influence on detecting yolk contamination in egg white. The PLSR model was able to correctly predict yolk contamination levels with an R2 = 0.90 in the validation data set. These results demonstrate the capability of Raman spectroscopy for detection of yolk contamination at very low levels in egg white and present a strong case for development of an on-line system to be deployed in egg processing plants. © 2016 Poultry Science Association Inc.
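The standard normal variate (SNV) transformation mentioned in the abstract above centres and scales each spectrum individually. A minimal sketch follows, assuming spectra are stored one per row; the function name and the toy spectra are illustrative, and the baseline-correction step from the abstract is not reproduced.

```python
import numpy as np

def snv(spectra):
    """Apply SNV row-wise: subtract each spectrum's mean, divide by its std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Two toy spectra that differ only by an offset and a multiplicative factor.
base = np.array([1.0, 2.0, 4.0, 3.0, 1.5])
raw = np.vstack([base, 2.5 * base + 0.7])
normalized = snv(raw)
```

After SNV the two rows coincide, which illustrates why the transform is used to suppress additive and multiplicative intensity differences between measurements before regression.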
NASA Astrophysics Data System (ADS)
Zheng, Xiaochun; Peng, Yankun; Li, Yongyu; Chao, Kuanglin; Qin, Jianwei
2017-05-01
The plate count method, which is commonly used to determine the total viable count (TVC) of bacteria in pork, is time-consuming and destructive. It has also been used to study the changes of the TVC in pork under different storage conditions. In recent years, many scholars have explored non-destructive methods for detecting TVC by using visible near infrared (VIS/NIR) technology and hyperspectral technology. In this study, the TVC in chilled pork was monitored under high oxygen condition by using hyperspectral technology in order to evaluate the changes of total bacterial count during storage, and then evaluate advantages and disadvantages of the storage condition. The VIS/NIR hyperspectral images of samples stored in high oxygen condition were acquired by a hyperspectral system in the range of 400-1100 nm. The actual reference value of total bacteria was measured by the standard plate count method, and the results were obtained in 48 hours. The reflection spectra of the samples were extracted and used to establish a prediction model for TVC. The spectral preprocessing methods of standard normal variate transformation (SNV), multiple scatter correction (MSC) and derivatives were applied to the original reflectance spectra of samples. Partial least squares regression (PLSR) of TVC was performed and optimized to be the prediction model. The results show that the near infrared hyperspectral technology based on 400-1100 nm combined with the PLSR model can describe the growth pattern of the total bacteria count of the chilled pork under the condition of high oxygen vividly and rapidly. The results obtained in this study demonstrate that the nondestructive method of TVC determination based on NIR hyperspectral imaging has great potential in monitoring of edible safety in processing and storage of meat.
Characterization of the biosolids composting process by hyperspectral analysis.
Ilani, Talli; Herrmann, Ittai; Karnieli, Arnon; Arye, Gilboa
2016-02-01
Composted biosolids are widely used as a soil supplement to improve soil quality. However, the application of immature or unstable compost can cause the opposite effect. To date, compost maturation determination is time consuming and cannot be done at the composting site. Hyperspectral spectroscopy was suggested as a simple tool for assessing compost maturity and quality. Nevertheless, there is still a gap in knowledge regarding several compost maturation characteristics, such as dissolved organic carbon, NO3, and NH4 contents. In addition, this approach has not yet been tested on a sample at its natural water content. Therefore, in the current study, hyperspectral analysis was employed in order to characterize the biosolids composting process as a function of composting time. This goal was achieved by correlating the reflectance spectra in the range of 400-2400nm, using the partial least squares-regression (PLS-R) model, with the chemical properties of wet and oven-dried biosolid samples. The results showed that the proposed method can be used as a reliable means to evaluate compost maturity and stability. Specifically, the PLS-R model was found to be an adequate tool to evaluate the biosolids' total carbon and dissolved organic carbon, total nitrogen and dissolved nitrogen, and nitrate content, as well as the absorbance ratio of 254/365nm (E2/E3) and C/N ratios in the dry and wet samples. It failed, however, to predict the ammonium content in the dry samples since the ammonium evaporated during the drying process. It was found that in contrast to what is commonly assumed, the spectral analysis of the wet samples can also be successfully used to build a model for predicting the biosolids' compost maturity. Copyright © 2015 Elsevier Ltd. All rights reserved.
Hejri-Zarifi, Sudiyeh; Ahmadian-Kouchaksaraei, Zahra; Pourfarzad, Amir; Khodaparast, Mohammad Hossein Haddad
2014-12-01
Germinated palm date seeds were milled into two fractions: germ and residue. Dough rheological characteristics, baking (specific volume and sensory evaluation), and textural properties (at first day and during storage for 5 days) were determined in Barbari flat bread. Germ and residue fractions were incorporated at various levels ranging from 0.5 to 3 g/100 g of wheat flour. Water absorption, arrival time and gelatinization temperature were decreased by the germ fraction, accompanied by an increasing effect on the mixing tolerance index and degree of softening at most levels. Although an improvement in dough stability was observed, the specific volume of bread was not affected by either fraction. Texture analysis of bread samples during 5 days of storage indicated that both fractions of germinated date seeds were able to diminish bread staling. The Avrami non-linear regression equation was chosen as a useful mathematical model to properly study bread hardening kinetics. In addition, principal component analysis (PCA) allowed discriminating among dough and bread specialties. Partial least squares regression (PLSR) models were applied to determine the relationships between sensory and instrumental data.
Kang, Bo-Sik; Lee, Jang-Eun; Park, Hyun-Jin
2014-06-01
In a Korean rice wine (makgeolli) model, we tried to develop a prediction model capable of eliciting a quantitative relationship between the initial amino acids in makgeolli mash and the major aromatic compounds, such as fusel alcohols, their acetate esters, and ethyl esters of fatty acids, in the brewed makgeolli. Mass-spectrometry-based electronic nose (MS-EN) was used to qualitatively discriminate between makgeollis made from makgeolli mashes with different amino acid compositions. Following this measurement, headspace solid-phase microextraction coupled to gas chromatography-mass spectrometry (GC-MS) combined with partial least-squares regression (PLSR) method was employed to quantitatively correlate amino acid composition of makgeolli mash with major aromatic compounds evolved during makgeolli fermentation. In qualitative prediction with MS-EN analysis, the makgeollis were well discriminated according to the volatile compounds derived from amino acids of makgeolli mash. Twenty-seven ion fragments with mass-to-charge ratio (m/z) of 55 to 98 amu were responsible for the discrimination. In GC-MS combined with PLSR method, a quantitative approach between the initial amino acids of makgeolli mash and the fusel compounds of makgeolli demonstrated that the coefficient of determination (R2) of most of the fusel compounds ranged from 0.77 to 0.94, indicating good correlation, except for 2-phenylethanol (R2 = 0.21), whereas R2 for ethyl esters of MCFAs including ethyl caproate, ethyl caprylate, and ethyl caprate was 0.17 to 0.40, indicating poor correlation. The amino acids have been known to affect the aroma in alcoholic beverages.
In this study, we demonstrated that an electronic nose qualitatively differentiated Korean rice wines (makgeollis) by their volatile compounds evolved from amino acids with rapidity and reproducibility and successively, a quantitative correlation with acceptable R2 between amino acids and fusel compounds could be established via HS-SPME GC-MS combined with partial least-squares regression. Our approach for predicting the quantities of volatile compounds in the finished product from initial condition of fermentation will give an insight to food researchers to modify and optimize the qualities of the corresponding products. © 2014 Institute of Food Technologists®
On the prediction of threshold friction velocity of wind erosion using soil reflectance spectroscopy
NASA Astrophysics Data System (ADS)
Li, Junran; Flagg, Cody; Okin, Gregory S.; Painter, Thomas H.; Dintwe, Kebonye; Belnap, Jayne
2015-12-01
Current approaches to estimate threshold friction velocity (TFV) of soil particle movement, including both experimental and empirical methods, suffer from various disadvantages, and they are particularly not effective to estimate TFVs at regional to global scales. Reflectance spectroscopy has been widely used to obtain TFV-related soil properties (e.g., moisture, texture, crust, etc.), however, no studies have attempted to directly relate soil TFV to their spectral reflectance. The objective of this study was to investigate the relationship between soil TFV and soil reflectance in the visible and near infrared (VIS-NIR, 350-2500 nm) spectral region, and to identify the best range of wavelengths or combinations of wavelengths to predict TFV. Threshold friction velocity of 31 soils, along with their reflectance spectra and texture were measured in the Mojave Desert, California and Moab, Utah. A correlation analysis between TFV and soil reflectance identified a number of isolated, narrow spectral domains that largely fell into two spectral regions, the VIS area (400-700 nm) and the short-wavelength infrared (SWIR) area (1100-2500 nm). A partial least squares regression analysis (PLSR) confirmed the significant bands that were identified by correlation analysis. The PLSR further identified the strong relationship between the first-difference transformation and TFV at several narrow regions around 1400, 1900, and 2200 nm. The use of PLSR allowed us to identify a total of 17 key wavelengths in the investigated spectrum range, which may be used as the optimal spectral settings for estimating TFV in the laboratory and field, or mapping of TFV using airborne/satellite sensors.
NASA Astrophysics Data System (ADS)
Das, Bappa; Sahoo, Rabi N.; Pargal, Sourabh; Krishna, Gopal; Verma, Rakesh; Chinnusamy, Viswanathan; Sehgal, Vinay K.; Gupta, Vinod K.; Dash, Sushanta K.; Swain, Padmini
2018-03-01
In the present investigation, the changes in sucrose, reducing and total sugar content due to water-deficit stress in rice leaves were modeled using visible, near infrared (VNIR) and shortwave infrared (SWIR) spectroscopy. The objectives of the study were to identify the best vegetation indices and suitable multivariate technique based on precise analysis of hyperspectral data (350 to 2500 nm) and sucrose, reducing sugar and total sugar content measured at different stress levels from 16 different rice genotypes. Spectral data analysis was done to identify suitable spectral indices and models for sucrose estimation. Novel spectral indices in near infrared (NIR) range viz. ratio spectral index (RSI) and normalised difference spectral indices (NDSI) sensitive to sucrose, reducing sugar and total sugar content were identified which were subsequently calibrated and validated. The RSI and NDSI models had R2 values of 0.65, 0.71 and 0.67; RPD values of 1.68, 1.95 and 1.66 for sucrose, reducing sugar and total sugar, respectively for validation dataset. Different multivariate spectral models such as artificial neural network (ANN), multivariate adaptive regression splines (MARS), multiple linear regression (MLR), partial least square regression (PLSR), random forest regression (RFR) and support vector machine regression (SVMR) were also evaluated. The best performing multivariate models for sucrose, reducing sugars and total sugars were found to be, MARS, ANN and MARS, respectively with respect to RPD values of 2.08, 2.44, and 1.93. Results indicated that VNIR and SWIR spectroscopy combined with multivariate calibration can be used as a reliable alternative to conventional methods for measurement of sucrose, reducing sugars and total sugars of rice under water-deficit stress as this technique is fast, economic, and noninvasive.
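The two index families named in the abstract above have simple two-band forms: a ratio spectral index RSI = R_i / R_j and a normalised difference spectral index NDSI = (R_i - R_j) / (R_i + R_j). The sketch below computes both from a reflectance spectrum; the band positions and the toy spectrum are hypothetical, since the abstract does not report which wavelengths were selected.

```python
import numpy as np

def nearest_band(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    return int(np.argmin(np.abs(wavelengths - target_nm)))

def rsi(reflectance, wavelengths, band_i_nm, band_j_nm):
    """Ratio spectral index R_i / R_j (last axis of reflectance is wavelength)."""
    r = np.asarray(reflectance, dtype=float)
    r_i = r[..., nearest_band(wavelengths, band_i_nm)]
    r_j = r[..., nearest_band(wavelengths, band_j_nm)]
    return r_i / r_j

def ndsi(reflectance, wavelengths, band_i_nm, band_j_nm):
    """Normalised difference spectral index (R_i - R_j) / (R_i + R_j)."""
    r = np.asarray(reflectance, dtype=float)
    r_i = r[..., nearest_band(wavelengths, band_i_nm)]
    r_j = r[..., nearest_band(wavelengths, band_j_nm)]
    return (r_i - r_j) / (r_i + r_j)

# Toy spectrum on a 1 nm grid covering 350 to 2500 nm.
grid = np.arange(350, 2501)
toy_spectrum = grid / 1000.0          # reflectance rising linearly with wavelength
ratio = rsi(toy_spectrum, grid, 1000, 2000)
norm_diff = ndsi(toy_spectrum, grid, 1000, 2000)
```

A search for the best index would evaluate every band pair against the measured sugar content and keep the pair with the strongest relationship; that search loop is omitted here.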
Maurer, Natalie E; Hatta-Sakoda, Beatriz; Pascual-Chagman, Gloria; Rodriguez-Saona, Luis E
2012-09-15
Consumption of omega-3 fatty acids (ω-3's), whether from fish oils, flax or supplements, can protect against cardiovascular disease. Finding plant-based sources of the essential ω-3's could provide a sustainable, renewable and inexpensive source of ω-3's, compared to fish oils. Our objective was to develop a rapid test to characterize and detect adulteration in sacha inchi oils, a Peruvian seed containing higher levels of ω-3's in comparison to other oleaginous seeds. A temperature-controlled ZnSe ATR mid-infrared benchtop and diamond ATR mid-infrared portable handheld spectrometers were used to characterize sacha inchi oil and evaluate its oxidative stability compared to commercial oils. A soft independent model of class analogy (SIMCA) and partial least squares regression (PLSR) analyzed the spectral data. Fatty acid profiles showed that sacha inchi oil (44% linolenic acid) had levels of PUFA similar to those of flax oils. PLSR showed good correlation coefficients (R(2)>0.9) between reference tests and spectra from infrared devices, allowing for rapid determination of fatty acid composition and prediction of oxidative stability. Oils formed distinct clusters, allowing the evaluation of commercial sacha inchi oils from Peruvian markets and showed some prevalence of adulteration. Determining oil adulteration and quality parameters, by using the ATR-MIR portable handheld spectrometer, allowed for portability and ease-of-use, making it a great alternative to traditional testing methods. Copyright © 2012 Elsevier Ltd. All rights reserved.
Gas Chromatography Data Classification Based on Complex Coefficients of an Autoregressive Model
Zhao, Weixiang; Morgan, Joshua T.; Davis, Cristina E.
2008-01-01
This paper introduces autoregressive (AR) modeling as a novel method to classify outputs from gas chromatography (GC). The inverse Fourier transformation was applied to the original sensor data, and then an AR model was applied to transform data to generate AR model complex coefficients. This series of coefficients effectively contains a compressed version of all of the information in the original GC signal output. We applied this method to chromatograms resulting from proliferating bacteria species grown in culture. Three types of neural networks were used to classify the AR coefficients: backward propagating neural network (BPNN), radial basis function-principal component analysis (RBF-PCA) approach, and radial basis function-partial least squares regression (RBF-PLSR) approach. This exploratory study demonstrates the feasibility of using complex root coefficient patterns to distinguish various classes of experimental data, such as those from the different bacteria species. This cognition approach also proved to be robust and potentially useful for freeing us from time alignment of GC signals.
Open-target sparse sensing of biological agents using DNA microarray
2011-01-01
Background: Current biosensors are designed to target and react to specific nucleic acid sequences or structural epitopes. These 'target-specific' platforms require creation of new physical capture reagents when new organisms are targeted. An 'open-target' approach to DNA microarray biosensing is proposed and substantiated using laboratory generated data. The microarray consisted of 12,900 25 bp oligonucleotide capture probes derived from a statistical model trained on randomly selected genomic segments of pathogenic prokaryotic organisms. Open-target detection of organisms was accomplished using a reference library of hybridization patterns for three test organisms whose DNA sequences were not included in the design of the microarray probes. Results: A multivariate mathematical model based on the partial least squares regression (PLSR) was developed to detect the presence of three test organisms in mixed samples. When all 12,900 probes were used, the model correctly detected the signature of three test organisms in all mixed samples (mean R2 = 0.76, CI = 0.95), with a 6% false positive rate. A sampling algorithm was then developed to sparsely sample the probe space for a minimal number of probes required to capture the hybridization imprints of the test organisms. The PLSR detection model was capable of correctly identifying the presence of the three test organisms in all mixed samples using only 47 probes (mean R2 = 0.77, CI = 0.95) with nearly 100% specificity. Conclusions: We conceived an 'open-target' approach to biosensing, and hypothesized that a relatively small, non-specifically designed, DNA microarray is capable of identifying the presence of multiple organisms in mixed samples. Coupled with a mathematical model applied to laboratory generated data, and sparse sampling of capture probes, the prototype microarray platform was able to capture the signature of each organism in all mixed samples with high sensitivity and specificity.
It was demonstrated that this new approach to biosensing closely follows the principles of sparse sensing. PMID:21801424
Multispectral Imaging for Determination of Astaxanthin Concentration in Salmonids
Dissing, Bjørn S.; Nielsen, Michael E.; Ersbøll, Bjarne K.; Frosch, Stina
2011-01-01
Multispectral imaging has been evaluated for characterization of the concentration of a specific carotenoid pigment, astaxanthin. 59 fillets of rainbow trout, Oncorhynchus mykiss, were filleted and imaged using a rapid multispectral imaging device for quantitative analysis. The multispectral imaging device captures reflection properties in 19 distinct wavelength bands, prior to determination of the true concentration of astaxanthin. The samples ranged from 0.20 to 4.34 g per g fish. A PLSR model was calibrated to predict astaxanthin concentration from novel images, and showed good results with a RMSEP of 0.27. For comparison, a similar model was built for normal color images, which yielded a RMSEP of 0.45. The acquisition speed of the multispectral imaging system and the accuracy of the PLSR model obtained suggest this method as a promising technique for rapid in-line estimation of astaxanthin concentration in rainbow trout fillets. PMID:21573000
Zhao, Ming; Nian, Yingqun; Allen, Paul; Downey, Gerard; Kerry, Joseph P; O'Donnell, Colm P
2018-05-01
This work aims to develop a rapid analytical technique to predict beef sensory attributes using Raman spectroscopy (RS) and to investigate correlations between sensory attributes using chemometric analysis. Beef samples (n = 72) were obtained from young dairy bulls (Holstein-Friesian and Jersey×Holstein-Friesian) slaughtered at 15 and 19 months old. Trained sensory panel evaluation and Raman spectral data acquisition were both carried out on the same longissimus thoracis muscles after ageing for 21 days. The best prediction results were obtained using a Raman frequency range of 1300-2800 cm-1. Prediction performance of partial least squares regression (PLSR) models developed using all samples was moderate to high for all sensory attributes (R2CV values of 0.50-0.84 and RMSECV values of 1.31-9.07) and was particularly high for desirable flavour attributes (R2CVs of 0.80-0.84, RMSECVs of 4.21-4.65). For PLSR models developed on subsets of beef samples, i.e. beef of an identical age or breed type, significant improvements in prediction performance were achieved for overall sensory attributes (R2CVs of 0.63-0.89 and RMSECVs of 0.38-6.88 for each breed type; R2CVs of 0.52-0.89 and RMSECVs of 0.96-6.36 for each age group). Chemometric analysis revealed strong correlations between sensory attributes. Raman spectroscopy combined with chemometric analysis was demonstrated to have high potential as a rapid and non-destructive technique to predict the sensory quality traits of young dairy bull beef. Copyright © 2018. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Maimaitijiang, Maitiniyazi; Ghulam, Abduwasit; Sidike, Paheding; Hartling, Sean; Maimaitiyiming, Matthew; Peterson, Kyle; Shavers, Ethan; Fishman, Jack; Peterson, Jim; Kadam, Suhas; Burken, Joel; Fritschi, Felix
2017-12-01
Estimating crop biophysical and biochemical parameters with high accuracy at low cost is imperative for high-throughput phenotyping in precision agriculture. Although fusion of data from multiple sensors is a common application in remote sensing, less is known about the contribution of low-cost RGB, multispectral and thermal sensors to rapid crop phenotyping. This is due to the fact that (1) simultaneous collection of multi-sensor data using satellites is rare and (2) multi-sensor data collected during a single flight have not been accessible until recent developments in Unmanned Aerial Systems (UASs) and UAS-friendly sensors that allow efficient information fusion. The objective of this study was to evaluate the power of high spatial resolution RGB, multispectral and thermal data fusion to estimate soybean (Glycine max) biochemical parameters including chlorophyll content and nitrogen concentration, and biophysical parameters including Leaf Area Index (LAI), above ground fresh and dry biomass. Multiple low-cost sensors integrated on UASs were used to collect RGB, multispectral, and thermal images throughout the growing season at a site established near Columbia, Missouri, USA. From these images, vegetation indices were extracted, a Crop Surface Model (CSM) was advanced, and a model to extract the vegetation fraction was developed. Then, spectral indices/features were combined to model and predict crop biophysical and biochemical parameters using Partial Least Squares Regression (PLSR), Support Vector Regression (SVR), and Extreme Learning Machine based Regression (ELR) techniques.
Results showed that: (1) For biochemical variable estimation, multispectral and thermal data fusion provided the best estimate for nitrogen concentration and chlorophyll (Chl) a content (RMSE of 9.9% and 17.1%, respectively) and RGB color information based indices and multispectral data fusion exhibited the largest RMSE 22.6%; the highest accuracy for Chl a + b content estimation was obtained by fusion of information from all three sensors with an RMSE of 11.6%. (2) Among the plant biophysical variables, LAI was best predicted by RGB and thermal data fusion while multispectral and thermal data fusion was found to be best for biomass estimation. (3) For estimation of the above mentioned plant traits of soybean from multi-sensor data fusion, ELR yields promising results compared to PLSR and SVR in this study. This research indicates that fusion of low-cost multiple sensor data within a machine learning framework can provide relatively accurate estimation of plant traits and provide valuable insight for high spatial precision in agriculture and plant stress assessment.
Predicting heavy metal concentrations in soils and plants using field spectrophotometry
NASA Astrophysics Data System (ADS)
Muradyan, V.; Tepanosyan, G.; Asmaryan, Sh.; Sahakyan, L.; Saghatelyan, A.; Warner, T. A.
2017-09-01
The aim of this study is to predict heavy metal (HM) concentrations in soils and plants using field remote sensing methods. The studied sites were the industrial town of Kajaran and the city of Yerevan. The research also included sampling of soils and leaves of two tree species exposed to different pollution levels and determination of contents of HM in lab conditions. The obtained spectral values were then collated with contents of HM in Kajaran soils and the tree leaves sampled in Yerevan, and statistical analysis was done. Consequently, Zn and Pb have a negative correlation coefficient (p < 0.01) in a 2498 nm spectral range for soils. Pb has a significantly higher correlation at red edge for plants. Regression models and an artificial neural network (ANN) for HM prediction were developed. Good results were obtained for the best stress sensitive spectral band ANN (R2 0.9, RPD 2.0), Simple Linear Regression (SLR) and Partial Least Squares Regression (PLSR) (R2 0.7, RPD 1.4) models. The Multiple Linear Regression (MLR) model was not applicable to predict Pb and Zn concentrations in soils in this research. Almost all full spectrum PLS models provide good calibration and validation results (RPD>1.4). Full spectrum ANN models are characterized by excellent calibration R2, rRMSE and RPD (0.9; 0.1 and >2.5 respectively). For prediction of Pb and Ni contents in plants, SLR and PLS models were used. The latter provide almost the same results. Our findings indicate that it is possible to make coarse direct estimation of HM content in soils and plants using rapid and economic reflectance spectroscopy.
Liu, Fei; Ye, Lanhan; Peng, Jiyu; Song, Kunlin; Shen, Tingting; Zhang, Chu; He, Yong
2018-02-27
Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R2 more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc2 and Rp2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice.
Ye, Lanhan; Song, Kunlin; Shen, Tingting
2018-01-01
Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R2 more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc2 and Rp2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice. PMID:29495445
On the prediction of threshold friction velocity of wind erosion using soil reflectance spectroscopy
Li, Junran; Flagg, Cody B.; Okin, Gregory S.; Painter, Thomas H.; Dintwe, Kebonye; Belnap, Jayne
2015-01-01
Current approaches to estimate threshold friction velocity (TFV) of soil particle movement, including both experimental and empirical methods, suffer from various disadvantages, and they are particularly not effective to estimate TFVs at regional to global scales. Reflectance spectroscopy has been widely used to obtain TFV-related soil properties (e.g., moisture, texture, crust, etc.), however, no studies have attempted to directly relate soil TFV to their spectral reflectance. The objective of this study was to investigate the relationship between soil TFV and soil reflectance in the visible and near infrared (VIS–NIR, 350–2500 nm) spectral region, and to identify the best range of wavelengths or combinations of wavelengths to predict TFV. Threshold friction velocity of 31 soils, along with their reflectance spectra and texture were measured in the Mojave Desert, California and Moab, Utah. A correlation analysis between TFV and soil reflectance identified a number of isolated, narrow spectral domains that largely fell into two spectral regions, the VIS area (400–700 nm) and the short-wavelength infrared (SWIR) area (1100–2500 nm). A partial least squares regression analysis (PLSR) confirmed the significant bands that were identified by correlation analysis. The PLSR further identified the strong relationship between the first-difference transformation and TFV at several narrow regions around 1400, 1900, and 2200 nm. The use of PLSR allowed us to identify a total of 17 key wavelengths in the investigated spectrum range, which may be used as the optimal spectral settings for estimating TFV in the laboratory and field, or mapping of TFV using airborne/satellite sensors.
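The two preprocessing steps named in this abstract, a first-difference transformation of each spectrum and a band-by-band correlation with TFV, can be sketched as follows. This is a minimal illustration assuming evenly spaced bands and plain Pearson correlation; the function names and the list-of-lists layout are hypothetical and are not taken from the study.

```python
import math


def first_difference(reflectance):
    """Difference between each band and the one before it (one value shorter)."""
    return [b - a for a, b in zip(reflectance, reflectance[1:])]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def band_correlations(spectra, tfv):
    """Correlate every band (column of `spectra`) with the measured TFV values."""
    return [pearson(list(band), tfv) for band in zip(*spectra)]
```

Applying `band_correlations` to the first-differenced spectra rather than the raw ones gives the kind of per-wavelength screening the abstract describes before PLSR is run.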
Ding, Guoyu; Li, Baiqing; Han, Yanqi; Liu, Aina; Zhang, Jingru; Peng, Jiamin; Jiang, Min; Hou, Yuanyuan; Bai, Gang
2016-11-30
For quality control of herbal medicines or functional foods, integral activity evaluation has become more popular in recent studies. The majority of researchers focus on the relationship between chromatography/mass spectroscopy and bioactivity, but the connection with spectrum-activity is easily ignored. In this paper, the near infrared reflection spectra (NIRS) of Flos Chrysanthemi samples were collected as a representative spectrum technology, and corresponding anti-inflammation activities were utilized to illustrate the spectrum-activity study. HPLC/Q-TOF-MS identification and heat map clustering were used to select the quality markers (Q-marker) from five cultivars of Flos Chrysanthemi. Using boxplot analysis and the interval limits of detection (LODs) theory, six crucial markers, namely, chlorogenic acid, 3,5-dicaffeoylquinic acid, 1,5-dicaffeoylquinic acid, luteoloside, apigenin-7-O-β-d-glucoside, and luteolin-7-O-6-malonylglucoside were screened out. Then partial least squares regression (PLSR) calibration models combined with synergy interval partial least squares (siPLS) and 12 different spectral pretreatment methods were developed for the parameters optimization of these Q-markers in Flos Chrysanthemi powder. After comparing the relationship between Q-marker contents and anti-inflammation activity via three machine learning approaches and PLSR, back-propagation neural network (BP-ANN) displayed a more excellent non-linear fitting effect, as its R for new batches reached 0.89. These results indicated that the integrated NIRS and bioactive strategy was suitable for fast quality management in Flos Chrysanthemi, and also applied to other botanical food quality control. Copyright © 2016 Elsevier B.V. All rights reserved.
Spatially dense morphometrics of craniofacial sexual dimorphism in 1-year-olds.
Matthews, Harold; Penington, Tony; Saey, Ine; Halliday, Jane; Muggli, Evelyn; Claes, Peter
2016-10-01
Recent advances in the field of geometric morphometrics allow for powerful statistical hypothesis testing for effects of biological and environmental variables on anatomical shape. This study used partial least-squares regression (PLSR) and the recently developed bootstrapped response-based imputation modelling (BRIM) algorithm to test for sexual dimorphism in the craniofacial shape of 1-year-old humans. We observed a recession of the forehead in boys relative to girls, and differences in the nose, consistent with adult dimorphism. Results also suggest that the degree to which individuals express dimorphic traits is continuous throughout the population. This is also seen in adult dimorphism but in 1-year-olds the amount of overlap between groups is much higher, indicating the strength of dimorphism between sexes is lower. Our results demonstrate early sexual dimorphism that is not attributable to the influx of sex hormones at puberty. This highlights the need to look at very early ontogeny for the origins of sexual dimorphism. We suggest that future work look at potential mediating effects of this early dimorphism on the later impact of puberty. The subtle shape differences we have detected, may also be applied to sexing fossilised crania. A common artefact in 3D images of faces of young children is that they often have their mouths open to varying degrees, introducing variability in the data unrelated to anatomy. We describe two PLSR-based methods of correcting this. These methods may facilitate surgical planning and assessment of young children based on 3D images. © 2016 Anatomical Society.
NASA Astrophysics Data System (ADS)
Harris, C. D.; Profeta, Luisa T. M.; Akpovo, Codjo A.; Johnson, Lewis; Stowe, Ashley C.
2017-05-01
A calibration model was created to illustrate the detection capabilities of laser ablation molecular isotopic spectroscopy (LAMIS) discrimination in isotopic analysis. The sample set contained boric acid pellets that varied in isotopic concentrations of 10B and 11B. Each sample set was interrogated with a Q-switched Nd:YAG ablation laser operating at 532 nm. A minimum of four band heads of the β system B2Σ → X2Σ transitions were identified and verified with previous literature on BO molecular emission lines. Isotopic shifts were observed in the spectra for each transition and used as the predictors in the calibration model. The spectra along with their respective 10/11B isotopic ratios were analyzed using Partial Least Squares Regression (PLSR). An IUPAC novel approach for determining a multivariate Limit of Detection (LOD) interval was used to predict the detection of the desired isotopic ratios. The predicted multivariate LOD is dependent on the variation of the instrumental signal and other composites in the calibration model space.
Liu, Jinxia; Cao, Yue; Wang, Qiu; Pan, Wenjuan; Ma, Fei; Liu, Changhong; Chen, Wei; Yang, Jianbo; Zheng, Lei
2016-01-01
Water-injected beef has aroused public concern as a major food-safety issue in meat products. In this study, the potential of multispectral imaging analysis in the visible and near-infrared (405-970 nm) regions was evaluated for identifying water-injected beef. A multispectral vision system was used to acquire images of beef injected with up to 21% content of water, and a partial least squares regression (PLSR) algorithm was employed to establish a prediction model, leading to quantitative estimations of actual water increase with a correlation coefficient (r) of 0.923. Subsequently, an optimized model was achieved by integrating spectral data with feature information extracted from ordinary RGB data, yielding better predictions (r = 0.946). Moreover, the prediction equation was transferred to each pixel within the images for visualizing the distribution of actual water increase. These results demonstrate the capability of multispectral imaging technology as a rapid and non-destructive tool for the identification of water-injected beef. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Kim, Dae-Yong; Cho, Byoung-Kwan
2015-11-01
The quality parameters of the Korean traditional rice wine "Makgeolli" were monitored using Fourier transform near-infrared (FT-NIR) spectroscopy with multivariate statistical analysis (MSA) during fermentation. Alcohol, reducing sugar, and titratable acid were the parameters assessed to determine the quality index of fermentation substrates and products. The acquired spectra were analyzed with partial least squares regression (PLSR). The best prediction model for alcohol was obtained with maximum normalization, showing a coefficient of determination (Rp2) of 0.973 and a standard error of prediction (SEP) of 0.760%. In addition, the best prediction model for reducing sugar was obtained with no data preprocessing, with a Rp2 value of 0.945 and a SEP of 1.233%. The prediction of titratable acidity was best with mean normalization, showing a Rp2 value of 0.882 and a SEP of 0.045%. These results demonstrate that FT-NIR spectroscopy can be used for rapid measurements of quality parameters during Makgeolli fermentation.
Li, Bin; Shin, Hyunjin; Gulbekyan, Georgy; Pustovalova, Olga; Nikolsky, Yuri; Hope, Andrew; Bessarabova, Marina; Schu, Matthew; Kolpakova-Hart, Elona; Merberg, David; Dorner, Andrew; Trepicchio, William L.
2015-01-01
Development of drug responsive biomarkers from pre-clinical data is a critical step in drug discovery, as it enables patient stratification in clinical trial design. Such translational biomarkers can be validated in early clinical trial phases and utilized as a patient inclusion parameter in later stage trials. Here we present a study on building accurate and selective drug sensitivity models for Erlotinib or Sorafenib from pre-clinical in vitro data, followed by validation of individual models on corresponding treatment arms from patient data generated in the BATTLE clinical trial. A Partial Least Squares Regression (PLSR) based modeling framework was designed and implemented, using a special splitting strategy and canonical pathways to capture robust information for model building. Erlotinib and Sorafenib predictive models could be used to identify a sub-group of patients that respond better to the corresponding treatment, and these models are specific to the corresponding drugs. The model derived signature genes reflect each drug’s known mechanism of action. Also, the models predict each drug’s potential cancer indications consistent with clinical trial results from a selection of globally normalized GEO expression datasets. PMID:26107615
NASA Astrophysics Data System (ADS)
Vaudour, Emmanuelle; Gilliot, Jean-Marc; Bel, Liliane; Lefevre, Josias; Chehdi, Kacem
2016-04-01
This study was carried out in the framework of the TOSCA-PLEIADES-CO of the French Space Agency and benefited from data from the earlier PROSTOCK-Gessol3 project supported by the French Environment and Energy Management Agency (ADEME). It aimed at identifying the potential of airborne hyperspectral visible near-infrared AISA-Eagle data for predicting the topsoil organic carbon (SOC) content of bare cultivated soils over a large peri-urban area (221 km2) with intensive annual crop cultivation and both contrasted soils and SOC contents, located in the western region of Paris, France. Soils comprise hortic or glossic luvisols, calcaric, rendzic cambisols and colluvic cambisols. Airborne AISA-Eagle images (400-1000 nm, 126 bands) with 1 m-resolution were acquired on 17 April 2013 over 13 tracks. Tracks were atmospherically corrected then mosaicked at a 2 m-resolution using a set of 24 synchronous field spectra of bare soils, black and white targets and impervious surfaces. The land use identification system layer (RPG) of 2012 was used to mask non-agricultural areas, then calculation and thresholding of NDVI from an atmospherically corrected SPOT4 image acquired the same day made it possible to map agricultural fields with bare soil. A total of 101 sites, which were sampled either at the regional scale or within one field, were identified as bare by means of this map. Predictions were made from the mosaic AISA spectra which were related to SOC contents by means of partial least squares regression (PLSR). Regression robustness was evaluated through a series of 1000 bootstrap data sets of calibration-validation samples, considering those 75 sites outside cloud shadows only, and different sampling strategies for selecting calibration samples. Validation root-mean-square errors (RMSE) ranged between 3.73 and 4.49 g kg-1, with a median of ~4 g kg-1.
The best-performing models in terms of coefficient of determination (R²) and Residual Prediction Deviation (RPD) values were the calibration models derived either from Kennard-Stone or conditioned Latin Hypercube sampling on smoothed spectra. However, the most generalizable model, leading to the lowest RMSE value of 3.73 g kg-1 at the regional scale and 1.44 g kg-1 at the within-field scale and a low validation bias, was the cross-validated leave-one-out PLSR model constructed with the 28 near-synchronous samples and raw spectra.
Detection and quantification of adulteration in sandalwood oil through near infrared spectroscopy.
Kuriakose, Saji; Thankappan, Xavier; Joe, Hubert; Venkataraman, Venkateswaran
2010-10-01
The confirmation of authenticity of essential oils and the detection of adulteration are problems of increasing importance in the perfumes, pharmaceutical, flavor and fragrance industries. This is especially true for 'value added' products like sandalwood oil. A methodical study is conducted here to demonstrate the potential use of Near Infrared (NIR) spectroscopy along with multivariate calibration models like principal component regression (PCR) and partial least square regression (PLSR) as rapid analytical techniques for the qualitative and quantitative determination of adulterants in sandalwood oil. After suitable pre-processing of the NIR raw spectral data, the models are built-up by cross-validation. The lowest Root Mean Square Error of Cross-Validation and Calibration (RMSECV and RMSEC % v/v) are used as a decision supporting system to fix the optimal number of factors. The coefficient of determination (R(2)) and the Root Mean Square Error of Prediction (RMSEP % v/v) in the prediction sets are used as the evaluation parameters (R(2) = 0.9999 and RMSEP = 0.01355). The overall result leads to the conclusion that NIR spectroscopy with chemometric techniques could be successfully used as a rapid, simple, instant and non-destructive method for the detection of adulterants, even 1% of the low-grade oils, in the high quality form of sandalwood oil.
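The factor-selection step mentioned in this abstract (picking the number of PLSR factors with the lowest RMSECV) can be illustrated with a small single-response PLS (PLS1, NIPALS-style) written in plain Python. This is a sketch under stated assumptions, not the authors' procedure: it uses leave-one-out cross-validation, the toy data are invented for illustration, and all function names are hypothetical.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 by sequential deflation; returns means and per-factor (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - mu for v, mu in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in Xc]  # scores
        tt = sum(v * v for v in t)
        p = [sum(row[j] * ti for row, ti in zip(Xc, t)) / tt for j in range(m)]
        q = sum(yi * ti for yi, ti in zip(yc, t)) / tt
        # Deflate X and y before extracting the next factor.
        Xc = [[a - ti * pj for a, pj in zip(row, p)] for row, ti in zip(Xc, t)]
        yc = [yi - q * ti for yi, ti in zip(yc, t)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict one sample by replaying the same deflation on it."""
    x_mean, y_mean, comps = model
    r = [a - mu for a, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(r, w))
        y_hat += q * t
        r = [a - t * pj for a, pj in zip(r, p)]
    return y_hat


def rmsecv(X, y, n_components):
    """Leave-one-out root mean square error of cross-validation."""
    sq = 0.0
    for i in range(len(y)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        sq += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(sq / len(y))


# Invented toy data: y = 2*x1 - x2 + 1, so two factors should suffice.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y = [2 * a - b + 1 for a, b in X]
scores = {a: rmsecv(X, y, a) for a in (1, 2)}
best = min(scores, key=scores.get)  # factor count with the lowest RMSECV
```

With real spectra the RMSECV curve is usually flatter than in this toy case, so the minimum is often combined with a parsimony rule rather than taken at face value.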
Estimating Biochemical Parameters of Tea (Camellia sinensis (L.)) Using Hyperspectral Techniques
NASA Astrophysics Data System (ADS)
Bian, M.; Skidmore, A. K.; Schlerf, M.; Liu, Y.; Wang, T.
2012-07-01
Tea (Camellia sinensis (L.)) is an important economic crop and the market price of tea depends largely on its quality. This research aims to explore the potential of hyperspectral remote sensing for predicting the concentration of biochemical components, namely total tea polyphenols, as indicators of tea quality at canopy scale. Experiments were carried out for tea plants growing in the field and greenhouse. Partial least squares regression (PLSR), which has proven to be one of the most successful empirical approaches, was performed to establish the relationship between reflectance and biochemical concentration across six tea varieties in the field. Moreover, a novel integrated approach involving the successive projections algorithm as a band selection method and neural networks was developed and applied to detect the concentration of total tea polyphenols for one tea variety, in order to explore and model complex nonlinear relationships between independent (wavebands) and dependent (biochemicals) variables. The good prediction accuracies (r2 > 0.8 and relative RMSEP < 10 %) achieved for tea plants using both linear (partial least squares regression) and nonlinear (artificial neural networks) modelling approaches in this study demonstrate the feasibility of using airborne and spaceborne sensors to cover wide areas of tea plantation for in situ monitoring of tea quality cheaply and rapidly.
NASA Astrophysics Data System (ADS)
Li, Wenlong; Cheng, Zhiwei; Wang, Yuefei; Qu, Haibin
2013-01-01
In this paper we describe the strategy used in the development and validation of a near infrared spectroscopy method for the rapid determination of baicalin, chlorogenic acid, ursodeoxycholic acid (UDCA), chenodeoxycholic acid (CDCA), and the total solid contents (TSCs) in the Tanreqing injection. To increase the representativeness of the calibration sample set, a concentrating-diluting method was adopted to artificially prepare samples. Partial least square regression (PLSR) was used to establish calibration models, with which the five quality indicators can be determined with satisfactory accuracy and repeatability. In addition, the slope/bias (S/B) method was used for model transfer between two different types of NIR instruments from the same manufacturer, which helps to broaden the application range of the established models. With the presented method, a great deal of time, effort and money can be saved when large amounts of Tanreqing injection samples need to be analyzed in a relatively short period of time, which is of great significance to the traditional Chinese medicine (TCM) industries.
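The slope/bias (S/B) transfer mentioned in this abstract can be illustrated with a short sketch. It assumes the common form of the method, in which predictions obtained on the secondary instrument are regressed against reference values and the fitted line is then used to correct later predictions; the function names and the numbers are hypothetical, not taken from the paper.

```python
def fit_slope_bias(secondary_predictions, reference_values):
    """Least-squares line: reference = slope * prediction + bias."""
    n = len(secondary_predictions)
    mean_x = sum(secondary_predictions) / n
    mean_y = sum(reference_values) / n
    sxx = sum((x - mean_x) ** 2 for x in secondary_predictions)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(secondary_predictions, reference_values))
    slope = sxy / sxx
    bias = mean_y - slope * mean_x
    return slope, bias

def apply_slope_bias(predictions, slope, bias):
    """Correct new secondary-instrument predictions with the fitted line."""
    return [slope * p + bias for p in predictions]

# Invented transfer samples: secondary instrument reads systematically low.
slope, bias = fit_slope_bias([1.0, 2.0, 3.0, 4.0], [2.5, 4.5, 6.5, 8.5])
corrected = apply_slope_bias([5.0], slope, bias)
```

The correction only removes a linear, instrument-wide offset; differences in spectral shape between instruments would need a different transfer approach.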
Yulia, Meinilwita
2017-01-01
Asian palm civet coffee or kopi luwak (Indonesian words for coffee and palm civet) is well known as the world's priciest and rarest coffee. To protect the authenticity of luwak coffee and protect consumers from luwak coffee adulteration, it is very important to develop a robust and simple method for determining the adulteration of luwak coffee. In this research, the use of UV-Visible spectra combined with PLSR was evaluated to establish rapid and simple methods for quantification of adulteration in luwak-arabica coffee blend. Several preprocessing methods were tested, and the results show that most of the preprocessed spectra were effective in improving the quality of the calibration models. The best PLS calibration model was obtained with Savitzky-Golay smoothed spectra, which had the lowest RMSECV (0.039) and the highest RPDcal value (4.64). Using this PLS model, the luwak content was predicted with satisfactory performance, with high values of both RPDp and RER. PMID:28913348
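The figures of merit quoted in this abstract can be computed as in the sketch below. It assumes the commonly used definitions, which the abstract itself does not spell out: RPD is the standard deviation of the reference values divided by the RMSE, and RER is the range of the reference values divided by the RMSE. The function names and the data are invented for illustration.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)

def rpd(reference, predicted):
    """Assumed definition: sample standard deviation of reference / RMSE."""
    n = len(reference)
    mean = sum(reference) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in reference) / (n - 1))
    return sd / rmse(reference, predicted)

def rer(reference, predicted):
    """Assumed definition: range of reference values / RMSE."""
    return (max(reference) - min(reference)) / rmse(reference, predicted)

# Invented example values (not from the paper).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
scores = (rmse(reference, predicted), rpd(reference, predicted), rer(reference, predicted))
```

Both ratios grow as the error shrinks relative to the spread of the reference data, which is why higher values are read as better models.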
Prediction of warmed-over flavour development in cooked chicken by colorimetric sensor array.
Kim, Su-Yeon; Li, Jinglei; Lim, Na-Ri; Kang, Bo-Sik; Park, Hyun-Jin
2016-11-15
The aim of this study was to develop a simple and rapid method based on colorimetric sensor array (CSA) for evaluation of warmed-over flavour (WOF) in cooked chicken. All samples were classified according to storage time by CSA coupled with principal component analysis (PCA) or hierarchical cluster analysis (HCA). The CSA data were used to establish prediction models with thiobarbituric acid reactive substances (TBARS), pentanal, hexanal, or heptanal associated with WOF by partial least square regression (PLSR). For the TBARS model, the coefficient of determination (rp(2)) was 0.9997 in the prediction range of 0.28-0.69 mg/kg. In each of the models for pentanal, hexanal, and heptanal, all rp(2) were higher than 0.960 in the range of 0.58-2.10 mg/kg, 5.50-11.69 mg/kg, and 0.09-0.16 mg/kg, respectively. These results demonstrate that the CSA was able to predict WOF development and to distinguish between each storage time. Copyright © 2016 Elsevier Ltd. All rights reserved.
Hirnschall, Nino; Norrby, Sverker; Weber, Maria; Maedel, Sophie; Amir-Asgari, Sahand; Findl, Oliver
2015-01-01
To include intraoperative measurements of the anterior lens capsule of the aphakic eye into the intraocular lens power calculation (IPC) process and to compare the refractive outcome with conventional IPC formulae. In this prospective study, a prototype operating microscope with an integrated continuous optical coherence tomography (OCT) device (Visante attached to OPMI VISU 200, Carl Zeiss Meditec AG, Germany) was used to measure the anterior lens capsule position after implanting a capsular tension ring (CTR). Optical biometry (intraocular lens (IOL) Master 500) and ACMaster measurements (Carl Zeiss Meditec AG, Germany) were performed before surgery. Autorefraction and subjective refraction were performed 3 months after surgery. Conventional IPC formulae were compared with a new intraoperatively measured anterior chamber depth (ACD) (ACDIntraOP) partial least squares regression (PLSR) model for prediction of the postoperative refractive outcome. In total, 70 eyes of 70 patients were included. Mean axial eye length (AL) was 23.3 mm (range: 20.6-29.5 mm). Predictive power of the intraoperative measurements was found to be slightly better compared to conventional IOL power calculations. Refractive error dependency on AL for Holladay I, HofferQ, SRK/T, Haigis and ACDintraOP PLSR was r(2)=-0.42 (p<0.0001), r(2)=-0.5 (p<0.0001), r(2)=-0.34 (p=0.010), r(2)=-0.28 (p=0.049) and r(2)<0.001 (p=0.866), respectively. ACDIntraOP measurements help to better predict the refractive outcome and could be useful if implemented in fourth-generation IPC formulae. Published by the BMJ Publishing Group Limited.
Lu, Xiaonan; Rasco, Barbara A.; Jabal, Jamie M. F.; Aston, D. Eric; Lin, Mengshi; Konkel, Michael E.
2011-01-01
Fourier transform infrared (FT-IR) spectroscopy and Raman spectroscopy were used to study the cell injury and inactivation of Campylobacter jejuni from exposure to antioxidants from garlic. C. jejuni was treated with various concentrations of garlic concentrate and garlic-derived organosulfur compounds in growth media and saline at 4, 22, and 35°C. The antimicrobial activities of the diallyl sulfides increased with the number of sulfur atoms (diallyl sulfide < diallyl disulfide < diallyl trisulfide). FT-IR spectroscopy confirmed that organosulfur compounds are responsible for the substantial antimicrobial activity of garlic, much greater than those of garlic phenolic compounds, as indicated by changes in the spectral features of proteins, lipids, and polysaccharides in the bacterial cell membranes. Confocal Raman microscopy (532-nm-gold-particle substrate) and Raman mapping of a single bacterium confirmed the intracellular uptake of sulfur and phenolic components. Scanning electron microscopy (SEM) and transmission electron microscopy (TEM) were employed to verify cell damage. Principal-component analysis (PCA), discriminant function analysis (DFA), and soft independent modeling of class analogs (SIMCA) were performed, and results were cross validated to differentiate bacteria based upon the degree of cell injury. Partial least-squares regression (PLSR) was employed to quantify and predict actual numbers of healthy and injured bacterial cells remaining following treatment. PLSR-based loading plots were investigated to further verify the changes in the cell membrane of C. jejuni treated with organosulfur compounds. We demonstrated that bacterial injury and inactivation could be accurately investigated by complementary infrared and Raman spectroscopies using a chemical-based, “whole-organism fingerprint” with the aid of chemometrics and electron microscopy. PMID:21642409
Toziou, Peristera-Maria; Barmpalexis, Panagiotis; Boukouvala, Paraskevi; Verghese, Susan; Nikolakakis, Ioannis
2018-05-30
Since culture-based methods are costly and time consuming, alternative methods are investigated for the quantification of probiotics in commercial products. In this work ATR-FTIR vibration spectroscopy was applied for the differentiation and quantification of live Lactobacillus (La 5) in mixed populations of live and killed La 5, in the absence and in the presence of enteric polymer Eudragit® L 100-55. Suspensions of live (La 5_L) and killed in acidic environment bacillus (La 5_K) were prepared and binary mixtures of different percentages were used to grow cell cultures for colony counting and spectral analysis. The increase in the number of colonies with added %La 5_L to the mixture was log-linear (r² = 0.926). Differentiation of La 5_L from La 5_K was possible directly from the peak area at 1635 cm⁻¹ (amides of proteins and peptides) and a linear relationship between %La 5_L and peak area in the range 0-95% was obtained. Application of partial least squares regression (PLSR) gave reasonable prediction of %La 5_L (RMSEp = 6.48) in binary mixtures of live and killed La 5 but poor prediction (RMSEp = 11.75) when polymer was added to the La 5 mixture. Application of artificial neural networks (ANNs) improved greatly the predictive ability for %La 5_L both in the absence and in the presence of polymer (RMSEp = 8.11 × 10⁻⁸ for La 5 only mixtures and RMSEp = 8.77 × 10⁻⁸ with added polymer) due to their ability to express in the calibration models more hidden spectral information than PLSR. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Dube, Timothy; Sibanda, Mbulisi; Shoko, Cletah; Mutanga, Onisimo
2017-10-01
Forest stand volume is one of the crucial stand parameters, which influences the ability of these forests to provide ecosystem goods and services. This study thus aimed at examining the potential of integrating multispectral SPOT 5 image, with ancillary data (forest age and rainfall metrics) in estimating stand volume between coppiced and planted Eucalyptus spp. in KwaZulu-Natal, South Africa. To achieve this objective, the Partial Least Squares Regression (PLSR) algorithm was used. The PLSR algorithm was implemented in three analysis stages: stage I: using ancillary data as an independent dataset, stage II: SPOT 5 spectral bands as an independent dataset and stage III: combined SPOT 5 spectral bands and ancillary data. The results of the study showed that the use of an independent ancillary dataset better explained the volume of Eucalyptus spp. growing from coppices (adjusted R2 (R2Adj) = 0.54, RMSEP = 44.08 m3/ha), when compared with those that were planted (R2Adj = 0.43, RMSEP = 53.29 m3/ha). Similar results were also observed when SPOT 5 spectral bands were applied as an independent dataset, whereas improved volume estimates were produced when using the combined dataset. For instance, planted Eucalyptus spp. were better predicted (R2 = 0.77, R2Adj = 0.59, RMSEP = 36.02 m3/ha) when compared with those that grow from coppices (R2 = 0.76, R2Adj = 0.46, RMSEP = 40.63 m3/ha). Overall, the findings of this study demonstrated the relevance of multi-source data in ecosystems modelling.
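Because this abstract reports both R2 and adjusted R2, a short sketch of the usual adjustment may help in reading the numbers. It assumes the standard formula, which penalizes R2 for the number of predictors relative to the sample size; the abstract does not state the sample size or predictor count, so the values used below are purely hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Standard adjustment: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical inputs chosen only to show the direction of the correction.
example = adjusted_r2(0.5, n_samples=11, n_predictors=2)
```

The gap between R2 and adjusted R2 widens as predictors are added without a matching increase in samples, which is one reason combined datasets can show a larger difference between the two.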
Kamruzzaman, Mohammed; Sun, Da-Wen; ElMasry, Gamal; Allen, Paul
2013-01-15
Many studies have been carried out in developing non-destructive technologies for predicting meat adulteration, but there has been no attempt so far at non-destructive detection and quantification of adulteration in minced lamb meat. The main goal of this study was to develop and optimize a rapid analytical technique based on near-infrared (NIR) hyperspectral imaging to detect the level of adulteration in minced lamb. Initial investigation was carried out using principal component analysis (PCA) to identify the most likely adulterant in minced lamb. Minced lamb meat samples were then adulterated with minced pork in the range 2-40% (w/w) at approximately 2% increments. Spectral data were used to develop a partial least squares regression (PLSR) model to predict the level of adulteration in minced lamb. A good prediction model was obtained using the whole spectral range (910-1700 nm) with a coefficient of determination (R(2)(cv)) of 0.99 and root-mean-square errors estimated by cross validation (RMSECV) of 1.37%. Four important wavelengths (940, 1067, 1144 and 1217 nm) were selected using weighted regression coefficients (Bw) and a multiple linear regression (MLR) model was then established using these important wavelengths to predict adulteration. The MLR model resulted in a coefficient of determination (R(2)(cv)) of 0.98 and RMSECV of 1.45%. The developed MLR model was then applied to each pixel in the image to obtain prediction maps to visualize the distribution of adulteration of the tested samples. The results demonstrated that the laborious and time-consuming traditional analytical techniques could be replaced by spectral data in order to provide a rapid, low-cost and non-destructive testing technique for adulterant detection in minced lamb meat. Copyright © 2012 Elsevier B.V. All rights reserved.
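The last modelling step in this abstract, applying a fitted MLR model to every pixel to produce a prediction map, can be sketched as below. The intercept, coefficients and the tiny image are invented for illustration; only the idea of evaluating a linear model on the reflectance at a few selected wavelengths per pixel comes from the abstract.

```python
def prediction_map(cube, intercept, coefficients):
    """Evaluate a linear model at every pixel.

    cube[row][col] holds the reflectance values at the selected wavelengths,
    in the same order as `coefficients`.
    """
    result = []
    for row in cube:
        out_row = []
        for pixel in row:
            if len(pixel) != len(coefficients):
                raise ValueError("pixel and coefficient lengths differ")
            out_row.append(intercept + sum(c * v for c, v in zip(coefficients, pixel)))
        result.append(out_row)
    return result

# Hypothetical 2 x 2 image with two selected wavelengths per pixel.
toy_cube = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [0.0, 0.0]]]
toy_map = prediction_map(toy_cube, intercept=1.0, coefficients=[2.0, 3.0])
```

Using only a handful of wavelengths keeps the per-pixel cost low, which matters when the model is evaluated across a full hyperspectral image.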
Quality Detection of Litchi Stored in Different Environments Using an Electronic Nose
Xu, Sai; Lü, Enli; Lu, Huazhong; Zhou, Zhiyan; Wang, Yu; Yang, Jing; Wang, Yajuan
2016-01-01
The purpose of this paper was to explore the utility of an electronic nose to detect the quality of litchi fruit stored in different environments. In this study, a PEN3 electronic nose was adopted to test the storage time and hardness of litchi that were stored in three different types of environment (room temperature, refrigerator and controlled-atmosphere). After acquiring data about the hardness of the sample and from the electronic nose, linear discriminant analysis (LDA), canonical correlation analysis (CCA), BP neural network (BPNN) and BP neural network-partial least squares regression (BPNN-PLSR) were employed for data processing. The experimental results showed that the hardness of litchi fruits stored in all three environments decreased during storage. The litchi stored at room temperature had the fastest rate of decrease in hardness, followed by those stored in a refrigerator environment and under a controlled-atmosphere. LDA has a poor ability to classify the storage time of the three environments in which litchi was stored. BPNN can effectively recognize the storage time of litchi stored in a refrigerator and a controlled-atmosphere environment. However, the BPNN classification of the effect of room temperature storage on litchi was poor. CCA results show a significant correlation between electronic nose data and hardness data at room temperature, and the correlation is more obvious for those under the refrigerator environment and controlled-atmosphere environment. The BPNN-PLSR can effectively predict the hardness of litchi under refrigerator storage conditions and a controlled-atmosphere environment. However, the BPNN-PLSR prediction of the effect of room temperature storage on litchi and global environment storage on litchi were poor. Thus, this experiment proved that an electronic nose can detect the quality of litchi under refrigerated storage and a controlled-atmosphere environment. 
These results provide a useful reference for future studies on nondestructive and intelligent monitoring of fruit quality. PMID:27338391
Liu, Xue-song; Sun, Fen-fang; Jin, Ye; Wu, Yong-jiang; Gu, Zhi-xin; Zhu, Li; Yan, Dong-lan
2015-12-01
A novel method was developed for the rapid determination of multi-indicators in corni fructus by means of near infrared (NIR) spectroscopy. Particle swarm optimization (PSO) based least squares support vector machine was investigated to increase the levels of quality control. The calibration models of moisture, extractum, morroniside and loganin were established using the PSO-LS-SVM algorithm. The performance of PSO-LS-SVM models was compared with partial least squares regression (PLSR) and back propagation artificial neural network (BP-ANN). The calibration and validation results of PSO-LS-SVM were superior to both PLS and BP-ANN. For PSO-LS-SVM models, the correlation coefficients (r) of calibrations were all above 0.942. The optimal prediction results were also achieved by PSO-LS-SVM models with the RMSEP (root mean square error of prediction) and RSEP (relative standard errors of prediction) less than 1.176 and 15.5% respectively. The results suggest that PSO-LS-SVM algorithm has a good model performance and high prediction accuracy. NIR has a potential value for rapid determination of multi-indicators in Corni Fructus.
Lin, Ping; Chen, Yong-ming; Yao, Zhi-lei
2015-11-01
A novel method combining chemometrics and hyperspectral imaging techniques was presented to detect the temperatures of Ethylene-Vinyl Acetate copolymer (EVA) films in photovoltaic cells during the thermal encapsulation process. Four varieties of the EVA films which had been heated at the temperatures of 128, 132, 142 and 148 °C during the photovoltaic cells production process were used for investigation in this paper. These copolymer encapsulation films were firstly scanned by the hyperspectral imaging equipment (Spectral Imaging Ltd. Oulu, Finland). The scanning band range of the hyperspectral equipment was set between 904.58 and 1700.01 nm. The hyperspectral dataset of copolymer films was randomly divided into two parts for the training and test purpose. Each type of the training set and test set contained 90 and 10 instances, respectively. The obtained hyperspectral images of EVA films were processed using the ENVI (Exelis Visual Information Solutions, USA) software. The size of region of interest (ROI) of each obtained hyperspectral image of EVA film was set as 150 x 150 pixels. The average of the reflectance spectra of all the pixels in the ROI was used as the characteristic curve to represent the instance. Three kinds of chemometrics methods including partial least squares regression (PLSR), multi-class support vector machine (SVM) and large margin nearest neighbor (LMNN) were used to correlate the characteristic hyper spectra with the encapsulation temperatures of copolymer films. The plot of weighted regression coefficients illustrated that both bands of short- and long-wave near infrared hyperspectral data contributed to enhancing the prediction accuracy of the forecast model. Because the attained reflectance hyperspectral data of EVA materials displayed strong nonlinearity, the prediction performance of the linear modeling method of PLSR declined and the prediction precision only reached 95%. 
The kernel-based forecast models were introduced to eliminate the impact of nonlinear hyperspectral data to some extent through mapping the original nonlinear hyperspectral data to the high dimensional linear feature space, so the relationship between the nonlinear hyperspectral data and the encapsulation temperatures of EVA films was fully disclosed finally. Compared with the prediction results of three proposed models, the prediction performance of LMNN was superior to the other two, whose final recognition accuracy achieved 100%. The results indicated that the methods of combination of LMNN model with the hyperspectral imaging techniques was the best one for accurately and rapidly determining the encapsulation temperatures of EVA films of photovoltaic cells. In addition, this paper had created the ideal conditions for automatically monitoring and effectively controlling the encapsulation temperatures of EVA films in the photovoltaic cells production process.
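The feature-extraction step described in this abstract, averaging the reflectance spectra of all pixels in a region of interest to obtain one characteristic curve per sample, can be sketched as follows. The function name, the cube layout and the toy numbers are assumptions made for illustration; the study itself used a 150 x 150 pixel ROI.

```python
def roi_mean_spectrum(cube, row0, col0, height, width):
    """Average the spectra of all pixels inside a rectangular ROI.

    cube[row][col] is the list of reflectance values (one per band) of a pixel.
    """
    pixels = [cube[r][c]
              for r in range(row0, row0 + height)
              for c in range(col0, col0 + width)]
    n_bands = len(pixels[0])
    return [sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)]

# Hypothetical 2 x 2 image with two bands per pixel.
toy_cube = [[[0.2, 0.4], [0.4, 0.6]],
            [[0.6, 0.8], [0.8, 1.0]]]
mean_spectrum = roi_mean_spectrum(toy_cube, row0=0, col0=0, height=2, width=2)
```

Averaging over the ROI trades spatial detail for a less noisy spectrum, which suits a task where each film sample has a single temperature label.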
Wang, Pei; Zhang, Hui; Yang, Hailong; Nie, Lei; Zang, Hengchang
2015-02-25
Near-infrared (NIR) spectroscopy has been developed into an indispensable tool for both academic research and industrial quality control in a wide field of applications. The feasibility of NIR spectroscopy to monitor the concentration of puerarin, daidzin, daidzein and total isoflavonoid (TIF) during the extraction process of kudzu (Pueraria lobata) was verified in this work. NIR spectra were collected in transmission mode and pretreated with smoothing and derivative. Partial least square regression (PLSR) was used to establish calibration models. Three different variable selection methods, including correlation coefficient method, interval partial least squares (iPLS), and successive projections algorithm (SPA) were performed and compared with models based on all of the variables. The results showed that the approach was very efficient and environmentally friendly for rapid determination of the four quality indices (QIs) in the kudzu extraction process. This method established may have the potential to be used as a process analytical technological (PAT) tool in the future. Copyright © 2014 Elsevier B.V. All rights reserved.
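The pretreatment mentioned in this abstract (smoothing and derivative) is often done with Savitzky-Golay filters, although the abstract does not say which filter was used. The sketch below assumes a 5-point quadratic window, for which the well-known convolution weights are (-3, 12, 17, 12, -3)/35 for smoothing and (-2, -1, 0, 1, 2)/10 for the first derivative. Function names are hypothetical, and edge points are simply dropped.

```python
SMOOTH_5 = (-3, 12, 17, 12, -3)   # divide by 35
DERIV_5 = (-2, -1, 0, 1, 2)       # divide by 10

def _convolve_interior(signal, weights, norm):
    """Apply a symmetric window to interior points only (edges are dropped)."""
    half = len(weights) // 2
    out = []
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        out.append(sum(w * v for w, v in zip(weights, window)) / norm)
    return out

def sg_smooth(signal):
    """5-point quadratic Savitzky-Golay smoothing."""
    return _convolve_interior(signal, SMOOTH_5, 35.0)

def sg_first_derivative(signal, spacing=1.0):
    """5-point quadratic Savitzky-Golay first derivative for evenly spaced data."""
    return [v / spacing for v in _convolve_interior(signal, DERIV_5, 10.0)]

# Invented test signal: y = x^2 sampled at x = 0..6.
signal = [float(x * x) for x in range(7)]
smoothed = sg_smooth(signal)
derivative = sg_first_derivative(signal)
```

A quadratic signal passes through the smoothing filter unchanged and yields its exact slope from the derivative filter, which makes it a convenient sanity check before applying the filters to real spectra.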
NASA Astrophysics Data System (ADS)
Kusumo, B. H.; Sukartono, S.; Bustan, B.
2018-02-01
Measuring soil organic carbon (C) using conventional analysis is a tedious, time-consuming and expensive procedure. A simple procedure that is cheap and saves time is needed. Near infrared technology offers a rapid procedure because it works on the basis of soil spectral reflectance and requires no chemicals. The aim of this research is to test whether this technology is able to rapidly measure soil organic C in rice paddy fields. Soil samples were collected from rice paddy fields of Lombok Island, Indonesia, and the coordinates of the samples were recorded. Parts of the samples were analysed using conventional analysis (Walkley and Black) and other parts were scanned using near infrared spectroscopy (NIRS) for soil spectral collection. Partial Least Square Regression (PLSR) models were developed using the soil C data from conventional analysis and the soil spectral reflectance data. The models were moderately successful in measuring soil C in rice paddy fields of Lombok Island. This shows that NIR technology can be further used to monitor C change in rice paddy soil.
NASA Astrophysics Data System (ADS)
Gavilan, C.; Grunwald, S.; Quiroz, R.
2017-12-01
The Andes represent the largest and highest mountain range in the tropics and are considered an important reserve of biodiversity, water provision and soil organic carbon (SOC) stocks. Nevertheless, limited attention has been given to estimating these stocks due to the lack of recent soil data, the poor accessibility and the wide range of coexistent ecosystems. In addition, conventional methods to determine SOC are usually time consuming and expensive to use in large-scale studies, hindering the possibility of an accurate SOC assessment in the region. Proximal soil sensing techniques, such as visible near infrared (VNIR) and mid infrared (MIR) spectroscopy, have proven to be useful as an alternative to conventional methods for characterizing SOC but have not been tested in Andean soils. The aim of this study was to evaluate the potential of using VNIR and MIR spectroscopy to predict SOC content in the Central Andean region, using multivariate methods. Three study areas were selected across the Peruvian Central Andes. A total of 400 topsoil samples (0-30 cm) were collected and analyzed for SOC. The VNIR and MIR reflectance of the soil samples was measured in the laboratory. Three modeling approaches, partial least squares regression (PLSR), random forest (RF) and support vector machine (SVM), were used to predict SOC from VNIR and MIR spectra in the study areas. The data were preprocessed in order to minimize the noise and optimize the accuracy of predictions. The models, for each study area, were assessed using 10-fold cross validation. Independent validation was implemented in the whole dataset (400 observations) by splitting it into calibration (70%) and validation (30%) sets. Overall, the results indicate potential for both VNIR and MIR spectra to predict SOC content in the Andean soils. SOC content predictions from MIR spectra outperformed those from VNIR spectra. 
The evaluation of model performance shows that RF and SVM provide more accurate SOC predictions compared to PLSR. These findings suggest that integrating VNIR and MIR spectroscopy with machine learning algorithms constitutes a promising approach for assessing SOC content in high-Andean ecosystems.
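The two validation schemes named in this abstract, 10-fold cross validation and a 70/30 calibration/validation split of the 400 observations, can be sketched with index bookkeeping alone. Random shuffling with a fixed seed is an assumption here; the abstract does not say how samples were assigned, and the function names are hypothetical.

```python
import random

def split_calibration_validation(n_samples, calibration_fraction=0.7, seed=0):
    """Randomly split sample indices into calibration and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_cal = round(n_samples * calibration_fraction)
    return sorted(indices[:n_cal]), sorted(indices[n_cal:])

def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        pairs.append((train, folds[i]))
    return pairs

calibration, validation = split_calibration_validation(400)
folds = k_fold_indices(400, k=10)
```

Each model (PLSR, RF or SVM) would then be fitted on the training indices of a fold and scored on the held-out indices, with the same folds reused across models so that the comparison is like for like.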
Lykomitros, Dimitrios; Fogliano, Vincenzo; Capuano, Edoardo
2018-04-01
Roasted peanuts are a popular snack in Europe, but their drivers of liking and perceived freshness have not been previously studied with European consumers. Consumer research to date has been focused on U.S. consumers, and only on specific peanut cultivars. In this study, 26 unique samples were produced from peanuts of different types, cultivars, origins, and with different process technologies (including baking, frying, and maceration). The peanut samples were subjected to sensory (expert panel, Spectrum™) and instrumental analysis (color, headspace volatiles, sugar profile, large deformation compression tests, and graded by size) and were hedonically rated by consumers in The Netherlands, Spain, and Turkey (n > 200 each). Preference Mapping (PREFMAP) on mean liking models revealed that the drivers of liking are similar across the three countries. Sweet taste, roasted peanut, dark roast, and sweet aromas and the color b* value were related to increased liking, and raw bean aroma and bitter taste with decreased liking. Further partial least square regression (PLSR) modeling of liking and perceived freshness against instrumental attributes showed that the color coordinates in combination with sucrose content and a select few headspace volatiles were strong predictors of both preference and perceived freshness. Finally, additional PLSR models focusing on the headspace volatiles only showed that liking and "fresh" attributes were correlated with the presence of several pyrroles in the volatile fraction, and inversely related to "stale" and to hexanal and 2-heptanone. This study provides insight into which flavor, taste, and appearance attributes drive liking and disliking of roasted peanuts for European consumers. The drivers are linked back to analytical attributes that can be measured instrumentally, thereby reducing the reliance on costly sensory panels. 
Particular emphasis is placed on color as a predictor of preference because the measuring equipment is inexpensive and therefore available even to smaller producers. In addition to preference, the study also examines whether product attributes that drive perceived freshness exist. The results can be used to design products with high acceptability across several countries within Europe. © 2018 Institute of Food Technologists®.
NASA Astrophysics Data System (ADS)
Chen, Sanming; Lin, Gang; Yin, Xianyang; Sun, Xiaolin; Xu, Jiasheng; Liu, Zhiying
2015-12-01
Sedimentary manganese deposits are widely distributed in North Guangxi and are characteristically accompanied by Celosia argentea, a plant with a strong ability to accumulate manganese. In order to study the relationship between the hyperspectral characteristics of Celosia argentea and the concentration effect of manganese in the soil, we used soil of the B layer in the mining area, background soil and soil with added MnCl4 reagent to make up experimental sample soil with 10 levels of manganese content for the same batch of Celosia argentea. The levels are 0mg/kg, 4500mg/kg, 9000mg/kg, 13500mg/kg, 18000mg/kg, 18020mg/kg, 18040mg/kg, 18080mg/kg, 18160mg/kg. An ASD FieldSpec-4 was used to measure the abnormal spectra of these Celosia argentea plants through a whole growth cycle. After pretreating the spectral data, we used the Successive Projections Algorithm (SPA) to extract the characteristic variables, reducing 1603 bands to 8 bands. Finally, the relationship between the spectral variables and the concentration of manganese was modeled by Partial Least Squares Regression (PLSR). The results show that the coefficients of determination (r2) are 0.8714 and 0.9141 for the two sets of data. The prediction results are satisfactory, but the first 5 groups are closer to the regression line than the last 5 groups.
Cheng, Jun-Hu; Sun, Da-Wen; Pu, Hong-Bin; Wang, Qi-Jun; Chen, Yu-Nan
2015-03-15
The suitability of hyperspectral imaging technique (400-1000 nm) was investigated to determine the thiobarbituric acid (TBA) value for monitoring lipid oxidation in fish fillets during cold storage at 4°C for 0, 2, 5, and 8 days. The PLSR calibration model was established with full spectral region between the spectral data extracted from the hyperspectral images and the reference TBA values and showed good performance for predicting TBA value with determination coefficients (R(2)P) of 0.8325 and root-mean-square errors of prediction (RMSEP) of 0.1172 mg MDA/kg flesh. Two simplified PLSR and MLR models were built and compared using the selected ten most important wavelengths. The optimised MLR model yielded satisfactory results with R(2)P of 0.8395 and RMSEP of 0.1147 mg MDA/kg flesh, which was used to visualise the TBA values distribution in fish fillets. The whole results confirmed that using hyperspectral imaging technique as a rapid and non-destructive tool is suitable for the determination of TBA values for monitoring lipid oxidation and evaluation of fish freshness. Copyright © 2014 Elsevier Ltd. All rights reserved.
Multivariate analysis relating oil shale geochemical properties to NMR relaxometry
Birdwell, Justin E.; Washburn, Kathryn E.
2015-01-01
Low-field nuclear magnetic resonance (NMR) relaxometry has been used to provide insight into shale composition by separating relaxation responses from the various hydrogen-bearing phases present in shales in a noninvasive way. Previous low-field NMR work using solid-echo methods provided qualitative information on organic constituents associated with raw and pyrolyzed oil shale samples, but uncertainty in the interpretation of longitudinal-transverse (T1–T2) relaxometry correlation results indicated further study was required. Qualitative confirmation of peaks attributed to kerogen in oil shale was achieved by comparing T1–T2 correlation measurements made on oil shale samples to measurements made on kerogen isolated from those shales. Quantitative relationships between T1–T2 correlation data and organic geochemical properties of raw and pyrolyzed oil shales were determined using partial least-squares regression (PLSR). Relaxometry results were also compared to infrared spectra, and the results not only provided further confidence in the organic matter peak interpretations but also confirmed attribution of T1–T2 peaks to clay hydroxyls. In addition, PLSR analysis was applied to correlate relaxometry data to trace element concentrations with good success. The results of this work show that NMR relaxometry measurements using the solid-echo approach produce T1–T2 peak distributions that correlate well with geochemical properties of raw and pyrolyzed oil shales.
[Determination of Carbaryl in Rice by Using FT Far-IR and THz-TDS Techniques].
Sun, Tong; Zhang, Zhuo-yong; Xiang, Yu-hong; Zhu, Ruo-hua
2016-02-01
Determination of carbaryl in rice by Fourier transform far-infrared (FT-Far-IR) spectroscopy and terahertz time-domain spectroscopy (THz-TDS) combined with chemometrics was studied, and the spectral characteristics of carbaryl in the terahertz region were investigated. Samples were prepared by mixing different amounts of carbaryl with rice powder; each mixture was then compressed, with polyethylene (PE) as the matrix, under a pressure of 5-7 tons into a pellet 13 mm in diameter and about 1 mm thick. Terahertz time-domain spectra of the pellets were measured at 0.5~1.5 THz, and the absorption spectra at 1.6. 3 THz were acquired with Fourier transform far-IR spectroscopy. Sample preparation is simple and requires no separation or enrichment. Far-IR found absorption peaks at 3.2 and 5.2 THz in the frequency range of 1.8-6.3 THz. THz-TDS found several weak absorption peaks in the range of 0.5-1.5 THz. Each of these two kinds of characteristic absorption spectra was randomly divided into a calibration set and a prediction set by leave-N-out cross-validation. Finally, the partial least squares regression (PLSR) method was used to establish two quantitative analysis models. The root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP) and the correlation coefficient of prediction (R) were used to evaluate model performance. For R, a higher value is better; for RMSECV and RMSEP, lower is better. The results demonstrated that the predictive accuracy of the two PLSR models was satisfactory. For the FT-Far-IR model, the correlation between actual and predicted values of the prediction samples (Rv) was 0.99, the RMSEP was 0.0086, and the RMSECV for the calibration set was 0.0077. For the THz-TDS model, Rv was 0.98, RMSEP was 0.0044, and RMSECV was 0.0025. These results show that FT-Far-IR and THz-TDS can be feasible tools for the quantitative determination of carbaryl in rice. This paper provides a new method for the quantitative determination of pesticides in other grain samples.
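The evaluation criteria named in this abstract (a root mean square error and a correlation coefficient between actual and predicted values) can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the paper; the same RMSE formula yields RMSECV or RMSEP depending on whether the predictions come from cross-validation or from the prediction set.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)

def pearson_r(reference, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(reference, predicted))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_ref * var_pred)

# Hypothetical reference concentrations and model predictions (illustration only)
reference = [0.10, 0.20, 0.30, 0.40]
predicted = [0.11, 0.19, 0.31, 0.39]
print(rmse(reference, predicted), pearson_r(reference, predicted))
```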
NASA Astrophysics Data System (ADS)
McMillan, N. J.; Chavez, A.; Chanover, N.; Voelz, D.; Uckert, K.; Tawalbeh, R.; Gariano, J.; Dragulin, I.; Xiao, X.; Hull, R.
2014-12-01
Rapid, in-situ methods for identification of biologic and non-biologic mineral precipitation sites permit mapping of biological hot spots. Two portable spectrometers, Laser-Induced Breakdown Spectroscopy (LIBS) and Acousto-Optic Tunable Filter Reflectance Spectroscopy (AOTFRS), were used to differentiate between bacterially influenced and inorganically precipitated calcite specimens from Fort Stanton Cave, NM, USA. LIBS collects light emitted from the decay of excited electrons in a laser ablation plasma; the spectrum is a chemical fingerprint of the analyte. AOTFRS collects light reflected from the surface of a specimen and provides structural information about the material (i.e., the presence of O-H bonds). These orthogonal data sets provide a rigorous method to determine the origin of calcite in cave deposits. This study used a set of 48 calcite samples collected from Fort Stanton Cave. Samples were examined in SEM for the presence of biologic markers; these data were used to separate the samples into biologic and non-biologic groups. Spectra were modeled using the multivariate technique Partial Least Squares Regression (PLSR). Half of the spectra were used to train a PLSR model, in which biologic samples were assigned the independent variable value "0" and non-biologic samples the value "1". Values of the independent variable were calculated for each of the training samples; they were close to 0 for the biologic samples (-0.09 to 0.23) and close to 1 for the non-biologic samples (0.57 to 1.14). A Value of Apparent Distinction (VAD) of 0.55 was used to numerically distinguish between the two groups: any sample with an independent variable value < 0.55 was classified as having a biologic origin, and a sample with a value > 0.55 was determined to be non-biologic in origin. After the model was trained, independent variable values for the remaining half of the samples were calculated. Biologic or non-biologic origin was assigned by comparison to the VAD. 
Using LIBS data alone, the model has a 92% success rate, correctly identifying 23 of 25 samples. Modeling of AOTFRS spectra and the combined LIBS-AOTFRS data set have similar success rates. This study demonstrates that rapid, portable LIBS and AOTFRS instruments can be used to map the spatial distribution of biologic precipitation in caves.
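The thresholding step described in this abstract can be sketched as follows. Only the VAD of 0.55 and the two class labels come from the abstract; the function names, the example values, and the treatment of a value exactly equal to 0.55 (which the abstract does not specify) are assumptions made for illustration.

```python
VAD = 0.55  # Value of Apparent Distinction reported in the abstract

def classify(predicted_value, threshold=VAD):
    """Assign an origin from a PLSR-predicted value.

    Assumption: a value exactly at the threshold is treated as
    non-biologic; the abstract only covers < 0.55 and > 0.55.
    """
    return "biologic" if predicted_value < threshold else "non-biologic"

def success_rate(predicted_values, true_labels, threshold=VAD):
    """Fraction of samples whose thresholded class matches the known label."""
    correct = sum(
        classify(value, threshold) == label
        for value, label in zip(predicted_values, true_labels)
    )
    return correct / len(true_labels)

# Hypothetical predicted values and SEM-based labels (illustration only)
values = [-0.09, 0.23, 0.57, 1.14, 0.60]
labels = ["biologic", "biologic", "non-biologic", "non-biologic", "biologic"]
print(success_rate(values, labels))
```

Applied to the counts in the abstract, 23 correct out of 25 samples gives 23/25 = 0.92.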
Determination of persimmon leaf chloride contents using near-infrared spectroscopy (NIRS).
de Paz, José Miguel; Visconti, Fernando; Chiaravalle, Mara; Quiñones, Ana
2016-05-01
Early diagnosis of specific chloride toxicity in persimmon trees requires the reliable and fast determination of the leaf chloride content, which is usually performed by means of a cumbersome, expensive and time-consuming wet analysis. A methodology has been developed in this study as an alternative to determine chloride in persimmon leaves using near-infrared spectroscopy (NIRS) in combination with multivariate calibration techniques. Based on a training dataset of 134 samples, a predictive model was developed from their NIR spectral data. For modelling, the partial least squares regression (PLSR) method was used. The best model was obtained with the first derivative of the apparent absorbance and using just 10 latent components. In the subsequent external validation carried out with 35 external data this model reached r(2) = 0.93, RMSE = 0.16% and RPD = 3.6, with standard error of 0.026% and bias of -0.05%. From these results, the model based on NIR spectral readings can be used for speeding up the laboratory determination of chloride in persimmon leaves with only a modest loss of precision. The intermolecular interaction between chloride ions and the peptide bonds in leaf proteins through hydrogen bonding, i.e. N-H···Cl, explains the ability for chloride determinations on the basis of NIR spectra.
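The external-validation statistics quoted here include a bias and an RPD. A minimal sketch of how such figures are commonly computed is given below; the abstract does not state its exact formulas, so the sign convention for bias (predicted minus reference) and the RPD definition (sample standard deviation of the reference values divided by the RMSE) are assumptions, and the data are hypothetical.

```python
import math

def bias(reference, predicted):
    """Mean of (predicted - reference); the sign convention is an assumption."""
    return sum(p - r for r, p in zip(reference, predicted)) / len(reference)

def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSE.

    Assumption: sample SD (n - 1 denominator) and RMSE over all n samples.
    Some authors use the bias-corrected SEP in place of the RMSE.
    """
    n = len(reference)
    mean_ref = sum(reference) / n
    sd = math.sqrt(sum((r - mean_ref) ** 2 for r in reference) / (n - 1))
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)
    return sd / rmse

# Hypothetical leaf chloride contents (%) and NIR predictions (illustration only)
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
print(bias(reference, predicted), rpd(reference, predicted))
```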
Elsohaby, Ibrahim; Windeyer, M Claire; Haines, Deborah M; Homerosky, Elizabeth R; Pearson, Jennifer M; McClure, J Trenton; Keefe, Greg P
2018-03-06
The objective of this study was to explore the potential of transmission infrared (TIR) spectroscopy in combination with partial least squares regression (PLSR) for quantification of dairy and beef cow colostral immunoglobulin G (IgG) concentration and assessment of colostrum quality. A total of 430 colostrum samples were collected from dairy (n = 235) and beef (n = 195) cows and tested by a radial immunodiffusion (RID) assay and TIR spectroscopy. Colostral IgG concentrations obtained by the RID assay were linked to the preprocessed spectra and divided into combined and prediction data sets. Three PLSR calibration models were built: one for the dairy cow colostrum only, the second for beef cow colostrum only, and the third for the merged dairy and beef cow colostrum. The predictive performance of each model was evaluated separately using the independent prediction data set. The Pearson correlation coefficients between IgG concentrations as determined by the TIR-based assay and the RID assay were 0.84 for dairy cow colostrum, 0.88 for beef cow colostrum, and 0.92 for the merged set of dairy and beef cow colostrum. The average of the differences between colostral IgG concentrations obtained by the RID- and TIR-based assays were -3.5, 2.7, and 1.4 g/L for dairy, beef, and merged colostrum samples, respectively. Further, the average relative error of the colostral IgG predicted by the TIR spectroscopy from the RID assay was 5% for dairy cow, 1.2% for beef cow, and 0.8% for the merged data set. 
The average intra-assay CV% values of the IgG concentration predicted by the TIR-based method were 3.2%, 2.5%, and 6.9% for the dairy cow, beef cow, and merged data sets, respectively. The utility of the TIR method for assessment of colostrum quality was evaluated using the entire data set and showed that TIR spectroscopy accurately identified the quality status of 91% of dairy cow colostrum, 95% of beef cow colostrum, and 89% and 93% of the merged dairy and beef cow colostrum samples, respectively. The results showed that TIR spectroscopy has potential as a simple, rapid, and cost-efficient method for estimating IgG concentration in dairy and beef cow colostrum samples and for assessing colostrum quality. The results also showed that merging the dairy and beef cow colostrum sample data sets improved the predictive ability of the TIR spectroscopy.
Sensory characteristics and consumer preference for chicken meat in Guinea.
Sow, T M A; Grongnet, J F
2010-10-01
This study identified the sensory characteristics and consumer preference for chicken meat in Guinea. Five chicken samples [live village chicken, live broiler, live spent laying hen, ready-to-cook broiler, and ready-to-cook broiler (imported)] bought from different locations were assessed by 10 trained panelists using 19 sensory attributes. The ANOVA results showed that 3 chicken appearance attributes (brown, yellow, and white), 5 chicken odor attributes (oily, intense, medicine smell, roasted, and mouth persistent), 3 chicken flavor attributes (sweet, bitter, and astringent), and 8 chicken texture attributes (firm, tender, juicy, chew, smooth, springy, hard, and fibrous) were significantly discriminating between the chicken samples (P<0.05). Principal component analysis of the sensory data showed that the first 2 principal components explained 84% of the sensory data variance. The principal component analysis results showed that the live village chicken, the live spent laying hen, and the ready-to-cook broiler (imported) were very well represented and clearly distinguished from the live broiler and the ready-to-cook broiler. One hundred twenty consumers expressed their preferences for the chicken samples using a 5-point Likert scale. The hierarchical cluster analysis of the preference data identified 4 homogenous consumer clusters. The hierarchical cluster analysis results showed that the live village chicken was the most preferred chicken sample, whereas the ready-to-cook broiler was the least preferred one. The partial least squares regression (PLSR) type 1 showed that 72% of the sensory data for the first 2 principal components explained 83% of the chicken preference. The PLSR1 identified that the sensory characteristics juicy, oily, sweet, hard, mouth persistent, and yellow were the most relevant sensory drivers of the Guinean chicken preference. 
The PLSR2 (with multiple responses) identified the relationship between the chicken samples, their sensory attributes, and the consumer clusters. Our results showed that there was not a chicken category that was exclusively preferred from the other chicken samples and therefore highlight the existence of place for development of all chicken categories in the local market.
Balabin, Roman M; Smirnov, Sergey V
2011-04-29
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. 
The results obtained with other spectroscopic techniques, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopy, can also be greatly improved by an appropriate choice of feature selection method. Copyright © 2011 Elsevier B.V. All rights reserved.
Garriga, Miguel; Romero-Bravo, Sebastián; Estrada, Félix; Escobar, Alejandro; Matus, Iván A.; del Pozo, Alejandro; Astudillo, Cesar A.; Lobos, Gustavo A.
2017-01-01
Phenotyping, via remote and proximal sensing techniques, of the agronomic and physiological traits associated with yield potential and drought adaptation could contribute to improvements in breeding programs. In the present study, 384 genotypes of wheat (Triticum aestivum L.) were tested under fully irrigated (FI) and water stress (WS) conditions. The following traits were evaluated and assessed via spectral reflectance: Grain yield (GY), spikes per square meter (SM2), kernels per spike (KPS), thousand-kernel weight (TKW), chlorophyll content (SPAD), stem water soluble carbohydrate concentration and content (WSC and WSCC, respectively), carbon isotope discrimination (Δ13C), and leaf area index (LAI). The performances of spectral reflectance indices (SRIs), four regression algorithms (PCR, PLSR, ridge regression RR, and SVR), and three classification methods (PCA-LDA, PLS-DA, and kNN) were evaluated for the prediction of each trait. For the classification approaches, two classes were established for each trait: The lower 80% of the trait variability range (Class 1) and the remaining 20% (Class 2 or elite genotypes). Both the SRIs and regression methods performed better when data from FI and WS were combined. The traits that were best estimated by SRIs and regression methods were GY and Δ13C. For most traits and conditions, the estimations provided by RR and SVR were the same, or better than, those provided by the SRIs. PLS-DA showed the best performance among the categorical methods and, unlike the SRI and regression models, most traits were relatively well-classified within a specific hydric condition (FI or WS), proving that classification approach is an effective tool to be explored in future studies related to genotype selection. PMID:28337210
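The two-class scheme used for the classification methods (lower 80% of the trait variability range versus the remaining 20%) can be sketched as below. The abstract does not define the range or the handling of boundary values, so this sketch assumes the range runs from the minimum to the maximum observed value, that higher values are the "elite" end, and that a value exactly at the cutoff falls in Class 1; the function name and data are hypothetical.

```python
def assign_classes(values, fraction=0.80):
    """Label each genotype 1 (lower part of the range) or 2 (elite).

    Assumptions: range = max - min of the observed trait values, and a
    value exactly at the cutoff is placed in Class 1.
    """
    lowest, highest = min(values), max(values)
    cutoff = lowest + fraction * (highest - lowest)
    return [1 if value <= cutoff else 2 for value in values]

# Hypothetical grain yields (t/ha), illustration only
yields = [2.0, 4.0, 6.0, 8.0, 9.5, 12.0]
print(assign_classes(yields))
```

Because the cutoff is set on the value range and not on the number of genotypes, the two classes will generally be unbalanced.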
Wang, Meng; Ellsworth, Patrick Z; Zhou, Jianfeng; Cousins, Asaph B; Sankaran, Sindhuja
2016-05-15
Water limitations decrease stomatal conductance (g(s)) and, in turn, photosynthetic rate (A(net)), resulting in decreased crop productivity. The current techniques for evaluating these physiological responses are limited to leaf-level measures acquired by measuring leaf-level gas exchange. In this regard, proximal sensing techniques can be a useful tool in studying plant biology as they can be used to acquire plant-level measures in a high-throughput manner. However, to confidently utilize the proximal sensing technique for high-throughput physiological monitoring, it is important to assess the relationship between plant physiological parameters and the sensor data. Therefore, in this study, the application of rapid sensing techniques based on thermal imaging and visual-near infrared spectroscopy for assessing water-use efficiency (WUE) in foxtail millet (Setaria italica (L.) P. Beauv) was evaluated. The visible-near infrared spectral reflectance (350-2500 nm) and thermal (7.5-14 µm) data were collected at regular intervals from well-watered and drought-stressed plants in combination with other leaf physiological parameters (transpiration rate-E, A(net), g(s), leaf carbon isotopic signature-δ(13)C(leaf), WUE). Partial least squares regression (PLSR) analysis was used to predict leaf physiological measures based on the spectral data. The PLSR modeling on the hyperspectral data yielded accurate and precise estimates of leaf E, gs, δ(13)C(leaf), and WUE with coefficient of determination in a range of 0.85-0.91. Additionally, significant differences in average leaf temperatures (~1°C) measured with a thermal camera were observed between well-watered plants and drought-stressed plants. In summary, the visible-near infrared reflectance data, and thermal images can be used as a potential rapid technique for evaluating plant physiological responses such as WUE. Copyright © 2016 Elsevier B.V. All rights reserved.
Aznar, Margarita; López, Ricardo; Cacho, Juan; Ferreira, Vicente
2003-04-23
Partial least squares regression (PLSR) models able to predict some of the wine aroma nuances from its chemical composition have been developed. The aromatic sensory characteristics of 57 Spanish aged red wines were determined by 51 experts from the wine industry. The individual descriptions given by the experts were recorded, and the frequency with which a sensory term was used to define a given wine was taken as a measurement of its intensity. The aromatic chemical composition of the wines was determined by already published gas chromatography (GC)-flame ionization detector and GC-mass spectrometry methods. In the whole, 69 odorants were analyzed. Both matrixes, the sensory and chemical data, were simplified by grouping and rearranging correlated sensory terms or chemical compounds and by the exclusion of secondary aroma terms or of weak aroma chemicals. Finally, models were developed for 18 sensory terms and 27 chemicals or groups of chemicals. Satisfactory models, explaining more than 45% of the original variance, could be found for nine of the most important sensory terms (wood-vanillin-cinnamon, animal-leather-phenolic, toasted-coffee, old wood-reduction, vegetal-pepper, raisin-flowery, sweet-candy-cacao, fruity, and berry fruit). For this set of terms, the correlation coefficients between the measured and predicted Y (determined by cross-validation) ranged from 0.62 to 0.81. Models confirmed the existence of complex multivariate relationships between chemicals and odors. In general, pleasant descriptors were positively correlated to chemicals with pleasant aroma, such as vanillin, beta damascenone, or (E)-beta-methyl-gamma-octalactone, and negatively correlated to compounds showing less favorable odor properties, such as 4-ethyl and vinyl phenols, 3-(methylthio)-1-propanol, or phenylacetaldehyde.
Habibi Najafi, Mohammad B; Pourfarzad, Amir; Zahedi, Hoda; Ahmadian-Kouchaksaraie, Zahra; Haddad Khodaparast, Mohammad H
2016-01-01
The aim of this work was to study the effects of a novel sourdough system prepared by wheat flour supplemented by combination of pulverized date seed, Lactobacillus plantarum, and/or Lactobacillus brevis as well as Saccharomyces cerevisiae on the sourdough characteristics, quality, sensory, texture, shelf life and image properties of Barbari flat bread. The highest sourdough acidity and bread specific volume was obtained with co-culture of Lb. plantarum + Lb. brevis + S. cerevisiae. The results suggest that fermentation is a potential bioprocessing technology for improving sensory aspects of bread supplemented with pulverized date seed, as a dietary fiber resource. Texture analysis of bread samples during 7 days of storage indicated that the presence of pulverized date seed in sourdough was able to diminish bread staling. The interaction of baker's yeast and lactic acid bacteria (LAB) has led to increase the particle average size of bread crumb and decrease the area fraction than the LAB samples. It was observed that all treatments of sourdough Barbari breads had higher cell wall thickness than the control Barbari bread. Avrami non-linear regression equation was chosen as useful mathematical model to properly study bread hardening kinetics. In addition, principal component analysis (PCA) allowed discriminating among sourdough and bread specialties. Partial least squares regression (PLSR) models were applied to determine the relationships between sensory and instrumental data.
Wang, Yan-Cang; Gu, Xiao-He; Zhu, Jin-Shan; Long, Hui-Ling; Xu, Peng; Liao, Qin-Hong
2014-01-01
The present study assesses the feasibility of using multi-spectral data to monitor soil organic matter content. The data comprise hyperspectral reflectance measured under laboratory conditions and multi-spectral data simulated from the hyperspectral measurements. According to the spectral response functions of Landsat TM and HJ-CCD (the Environment and Disaster Reduction Small Satellites, HJ), the hyperspectra were resampled to the corresponding bands of the multi-spectral sensors. The correlation of the hyperspectral and simulated reflectance spectra with organic matter content was calculated and used to extract the bands sensitive to organic matter in the northern fluvo-aquic soil. The partial least squares regression (PLSR) method was used to establish empirical models to estimate soil organic matter content. Both root mean squared error (RMSE) and the coefficient of determination (R2) were used to test the precision and stability of the models. Results demonstrate that, compared with the hyperspectral data, the best model established from simulated multi-spectral data gives a good result for organic matter content, with R2=0.586 and RMSE=0.280. Therefore, using multi-spectral data to predict fluvo-aquic soil organic matter content is feasible.
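Resampling a hyperspectral curve to a broad sensor band is commonly done as a response-weighted average of the narrow-band reflectances. The sketch below illustrates that idea only; it assumes the reflectance and the band response are sampled on the same uniformly spaced wavelength grid, and the function name and numbers are hypothetical, not the Landsat TM or HJ-CCD response functions.

```python
def simulate_band(reflectance, response):
    """Response-weighted mean reflectance for one broad band.

    Assumption: both lists are sampled at the same, uniformly spaced
    wavelengths, so no explicit wavelength step is needed.
    """
    weighted_sum = sum(r * s for r, s in zip(reflectance, response))
    return weighted_sum / sum(response)

# Hypothetical narrow-band reflectances and relative band response
reflectance = [0.2, 0.4, 0.6]
response = [0.0, 1.0, 1.0]
print(simulate_band(reflectance, response))
```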
Özdemir, İbrahim Sani; Öztürk, Bülent; Çelik, Belgin; Sarıtepe, Yüksel; Aksoy, Hatice
2018-08-15
The potential of using FT-NIR spectroscopy for the rapid and non-destructive measurement of the moisture, water activity, firmness and SO2 content of the intact sulphured-dried apricots (SDA) was investigated for the first time in the literature. The partial least squares regression (PLS-R) models constructed using FT-NIR spectra were very successful in predicting the moisture content (R2p = 0.986, RMSEP = 1.22%, RPD = 9.15) and water activity (R2p = 0.987, RMSEP = 0.016, RPD = 9.37) of SDAs. Satisfactory results were also obtained for the models developed for the prediction of the firmness (R2p = 0.845, RMSEP = 0.445, RPD = 2.55) and SO2 content (R2p = 0.804, RMSEP = 349 mg kg-1, RPD = 2.27). These results clearly demonstrate that the major quality parameters of SDA can be simultaneously measured in a short time by FT-NIR spectroscopy without any need for the sample preparation or skilled laboratory personnel. Copyright © 2018 Elsevier B.V. All rights reserved.
Subaihi, Abdu; Muhamadali, Howbeer; Mutter, Shaun T; Blanch, Ewan; Ellis, David I; Goodacre, Royston
2017-03-27
In this study surface enhanced Raman scattering (SERS) combined with the isotopic labelling (IL) principle has been used for the quantification of codeine spiked into both water and human plasma. Multivariate statistical approaches were employed for the analysis of these SERS spectral data, particularly partial least squares regression (PLSR) which was used to generate models using the full SERS spectral data for quantification of codeine with, and without, an internal isotopic labelled standard. The PLSR models provided accurate codeine quantification in water and human plasma with high prediction accuracy (Q2). In addition, the employment of codeine-d6 as the internal standard further improved the accuracy of the model, by increasing the Q2 from 0.89 to 0.94 and decreasing the low root-mean-square error of predictions (RMSEP) from 11.36 to 8.44. Using the peak area at 1281 cm-1 assigned to C-N stretching, C-H wagging and ring breathing, the limit of detection was calculated in both water and human plasma to be 0.7 μM (209.55 ng mL-1) and 1.39 μM (416.12 ng mL-1), respectively. Due to a lack of definitive codeine vibrational assignments, density functional theory (DFT) calculations have also been used to assign the spectral bands with their corresponding vibrational modes, which were in excellent agreement with our experimental Raman and SERS findings. Thus, we have successfully demonstrated the application of SERS with isotope labelling for the absolute quantification of codeine in human plasma for the first time with a high degree of accuracy and reproducibility. 
The use of the IL principle which employs an isotopolog (that is to say, a molecule which is only different by the substitution of atoms by isotopes) improves quantification and reproducibility because the competition of the codeine and codeine-d6 for the metal surface used for SERS is equal and this will offset any difference in the number of particles under analysis or any fluctuations in laser fluence. It is our belief that this may open up new exciting opportunities for testing SERS in real-world samples and applications which would be an area of potential future studies.
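The two units quoted for the limit of detection are related by the molar mass of the analyte. As a small worked check, the sketch below converts micromolar to ng/mL; the molar mass of codeine (about 299.36 g/mol) is an assumption supplied here, not a figure stated in the abstract, and the function name is hypothetical.

```python
CODEINE_MOLAR_MASS = 299.36  # g/mol; assumed value, not given in the abstract

def micromolar_to_ng_per_ml(concentration_um, molar_mass):
    """Convert a concentration in µM to ng/mL.

    1 µM = 1e-6 mol/L; multiplying by g/mol gives 1e-6 g/L,
    which is 1 µg/L, which equals 1 ng/mL.
    """
    return concentration_um * molar_mass

print(micromolar_to_ng_per_ml(0.7, CODEINE_MOLAR_MASS))   # LOD in water
print(micromolar_to_ng_per_ml(1.39, CODEINE_MOLAR_MASS))  # LOD in plasma
```

Under that assumption the first conversion reproduces the reported 209.55 ng/mL to two decimals, and the second comes out near 416.1 ng/mL, close to the reported 416.12; the small gap would be consistent with the 1.39 μM figure having been rounded.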
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Aditya; Serbin, Shawn P.; McNeil, Brenden E.; ...
2015-12-01
A major goal of remote sensing is the development of generalizable algorithms to repeatedly and accurately map ecosystem properties across space and time. Imaging spectroscopy has great potential to map vegetation traits that cannot be retrieved from broadband spectral data, but rarely have such methods been tested across broad regions. Here we illustrate a general approach for estimating key foliar chemical and morphological traits through space and time using NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS-Classic). We apply partial least squares regression (PLSR) to data from 237 field plots within 51 images acquired between 2008 and 2011. Using a series of 500 randomized 50/50 subsets of the original data, we generated spatially explicit maps of seven traits (leaf mass per area (Marea), percentage nitrogen, carbon, fiber, lignin, and cellulose, and isotopic nitrogen concentration, δ15N) as well as pixel-wise uncertainties in their estimates based on error propagation in the analytical methods. Both Marea and %N PLSR models had an R2 > 0.85. Root mean square errors (RMSEs) for both variables were less than 9% of the range of data. Fiber and lignin were predicted with R2 > 0.65 and carbon and cellulose with R2 > 0.45. Although R2 of %C and cellulose were lower than Marea and %N, the measured variability of these constituents (especially %C) was also lower, and their RMSE values were beneath 12% of the range in overall variability. Model performance for δ15N was the lowest (R2 = 0.48, RMSE = 0.95‰), but within 15% of the observed range. The resulting maps of chemical and morphological traits, together with their overall uncertainties, represent a first-of-its-kind approach for examining the spatiotemporal patterns of forest functioning and nutrient cycling across a broad range of temperate and sub-boreal ecosystems. These results offer an alternative to categorical maps of functional or physiognomic types by providing non-discrete maps (i.e., on a continuum) of traits that define those functional types. A key contribution of this work is the ability to assign retrieval uncertainties by pixel, a requirement to enable assimilation of these data products into ecosystem modeling frameworks to constrain carbon and nutrient cycling projections.
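The idea of fitting many models on randomized 50/50 subsets and summarizing their spread can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: a one-variable least-squares line stands in for the PLSR model, the spread of the ensemble predictions is used as a simple uncertainty estimate (the abstract refers to error propagation in the analytical methods), and all names and data are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the PLSR model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def ensemble_predict(xs, ys, x_new, n_models=500, seed=0):
    """Mean and SD of predictions from models fit on random 50% subsets."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    predictions = []
    for _ in range(n_models):
        rng.shuffle(indices)
        half = indices[: len(indices) // 2]
        slope, intercept = fit_line([xs[i] for i in half], [ys[i] for i in half])
        predictions.append(slope * x_new + intercept)
    return statistics.mean(predictions), statistics.stdev(predictions)

# Hypothetical spectral index (x) and trait value (y) for ten plots
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 18.9]
mean_prediction, spread = ensemble_predict(xs, ys, x_new=4.5)
print(mean_prediction, spread)
```

In an image-based workflow, `x_new` would be the spectrum of one pixel, and the mean and spread would be stored as the trait map and its per-pixel uncertainty.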
Lipiäinen, Tiina; Fraser-Miller, Sara J; Gordon, Keith C; Strachan, Clare J
2018-02-05
This study considers the potential of low-frequency (terahertz) Raman spectroscopy in the quantitative analysis of ternary mixtures of solid-state forms. Direct comparison between low-frequency and mid-frequency spectral regions for quantitative analysis of crystal form mixtures, without confounding sampling and instrumental variations, is reported for the first time. Piroxicam was used as a model drug, and the low-frequency spectra of piroxicam forms β, α2 and monohydrate are presented for the first time. These forms show clear spectral differences in both the low- and mid-frequency regions. Both spectral regions provided quantitative models suitable for predicting the mixture compositions using partial least squares regression (PLSR), but the low-frequency data gave better models, based on lower errors of prediction (2.7, 3.1 and 3.2% root-mean-square errors of prediction [RMSEP] values for the β, α2 and monohydrate forms, respectively) than the mid-frequency data (6.3, 5.4 and 4.8%, for the β, α2 and monohydrate forms, respectively). The better performance of low-frequency Raman analysis was attributed to larger spectral differences between the solid-state forms, combined with a higher signal-to-noise ratio. Copyright © 2017 Elsevier B.V. All rights reserved.
Mapping The Temporal and Spatial Variability of Soil Moisture Content Using Proximal Soil Sensing
NASA Astrophysics Data System (ADS)
Virgawati, S.; Mawardi, M.; Sutiarso, L.; Shibusawa, S.; Segah, H.; Kodaira, M.
2018-05-01
In studies related to soil optical properties, it has been shown that visible and NIR soil spectral response can predict soil moisture content (SMC) when proper data analysis techniques are used. SMC is one of the most important soil properties, influencing most physical, chemical, and biological soil processes. The problem is how to provide reliable, fast and inexpensive information on subsurface SMC from numerous soil samples and repeated measurements. Spectroscopy has emerged as a rapid and low-cost tool for extensive investigation of soil properties. The objective of this research was to develop calibration models based on laboratory Vis-NIR spectroscopy to estimate SMC at four different growth stages of the soybean crop in Yogyakarta Province. An ASD Field-spectrophotoradiometer was used to measure the reflectance of soil samples. Partial least square regression (PLSR) was performed to establish the relationship between SMC and the Vis-NIR soil reflectance spectra. The selected calibration model was used to predict SMC for new samples. The temporal and spatial variability of SMC was presented in digital maps. The results revealed that the calibration model was excellent for SMC prediction. Vis-NIR spectroscopy was a reliable tool for the prediction of SMC.
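The calibrate-then-predict workflow described above can be sketched with a small single-response PLS (PLS1) routine. The abstract does not say which algorithm or software was used, so the NIPALS-style deflation below, the function names `pls1_fit` and `pls1_predict`, and the tiny data set are all assumptions made for illustration.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model (one response) by successive deflation of X and y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample x."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


# Hypothetical calibration set: two "bands" per sample, y = 1 + 2*x1 + 3*x2.
X_cal = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y_cal = [9.0, 8.0, 19.0, 18.0, 29.0]
model = pls1_fit(X_cal, y_cal, n_components=2)
print(pls1_predict(model, [6.0, 5.0]))
```

In real spectral calibrations the number of components is far smaller than the number of bands and is normally chosen by cross-validation; here two components on two predictors simply reproduce the exact linear relationship.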
Tanaka, Ryoma; Takahashi, Naoyuki; Nakamura, Yasuaki; Hattori, Yusuke; Ashizawa, Kazuhide; Otsuka, Makoto
2017-01-01
Resonant acoustic® mixing (RAM) technology is a system that performs high-speed mixing by vibration through the control of acceleration and frequency. In recent years, real-time process monitoring and prediction has become of increasing interest, and process analytical technology (PAT) systems will be increasingly introduced into actual manufacturing processes. This study examined the application of PAT with the combination of RAM, near-infrared spectroscopy, and chemometric technology as a set of PAT tools for introduction into actual pharmaceutical powder blending processes. Content uniformity was based on a robust partial least squares regression (PLSR) model constructed to manage the RAM configuration parameters and the changing concentration of the components. As a result, real-time monitoring may be possible and could be successfully demonstrated for in-line real-time prediction of active pharmaceutical ingredients and other additives using chemometric technology. This system is expected to be applicable to the RAM method for the risk management of quality.
Luo, Yu; Li, Wen-Long; Huang, Wen-Hua; Liu, Xue-Hua; Song, Yan-Gang; Qu, Hai-Bin
2017-05-01
A near infrared spectroscopy (NIRS) approach was established for quality control of the alcohol precipitation liquid in the manufacture of Codonopsis Radix. By applying NIRS with multivariate analysis, it was possible to build variation into the calibration sample set, and the Plackett-Burman design, Box-Behnken design, and a concentrating-diluting method were used to obtain the sample set covered with sufficient fluctuation of process parameters and extended concentration information. NIR data were calibrated to predict the four quality indicators using partial least squares regression (PLSR). In the four calibration models, the root mean squares errors of prediction (RMSEPs) were 1.22 μg/ml, 10.5 μg/ml, 1.43 μg/ml, and 0.433% for lobetyolin, total flavonoids, pigments, and total solid contents, respectively. The results indicated that multi-components quantification of the alcohol precipitation liquid of Codonopsis Radix could be achieved with an NIRS-based method, which offers a useful tool for real-time release testing (RTRT) of intermediates in the manufacture of Codonopsis Radix.
Ayvaz, Huseyin; Rodriguez-Saona, Luis E
2015-05-01
The most common methods for acrylamide analysis in foods require the use of LC-MS/MS and GC-MS. Although these methods have great analytical performance, they need intensive sample preparation, highly specialised instrumentation, and are time consuming. In this study, portable and handheld infrared spectrometers were evaluated as rapid methods for screening acrylamide in potato chips and their performances were compared to those of benchtop infrared systems. The acrylamide content of 64 commercial potato chips (169-2453 μg/kg) was determined by LC-MS/MS. Spectral data were collected using mid-infrared (MIR) and near-infrared (NIR) spectrometers. Partial least squares regression (PLSR) calibration models were developed to predict acrylamide levels. Overall, good linear correlation was found between the predicted acrylamide levels and actual measured acrylamide concentrations by LC-MS/MS (rPred > 0.90 and SEP < 100 μg/kg). Our results indicate that portable and handheld spectrometers can be used as simple and rapid alternatives for acrylamide analysis in potato chips. Copyright © 2014 Elsevier Ltd. All rights reserved.
Towards decadal soil salinity mapping using Landsat time series data
NASA Astrophysics Data System (ADS)
Fan, Xingwang; Weng, Yongling; Tao, Jinmei
2016-10-01
Salinization is one of the major soil problems around the world. However, decadal variation in soil salinization has not yet been extensively reported. This study exploited thirty years (1985-2015) of Landsat sensor data, including Landsat-4/5 TM (Thematic Mapper), Landsat-7 ETM+ (Enhanced Thematic Mapper Plus) and Landsat-8 OLI (Operational Land Imager), for monitoring soil salinity of the Yellow River Delta, China. The data were initially corrected for atmospheric effects, and then matched the spectral bands of EO-1 (Earth Observing One) ALI (Advanced Land Imager). Subsequently, soil salinity maps were derived with a previously developed PLSR (Partial Least Square Regression) model. On intra-annual scale, the retrievals showed that soil salinity increased in February, stabilized in March, and decreased in April. On inter-annual scale, soil salinity decreased within 1985-2000 (-0.74 g kg-1/10a, p < 0.001), and increased within 2000-2015 (0.79 g kg-1/10a, p < 0.001). Our study presents a new perspective for use of multiple Landsat data in soil salinity retrieval, and further the understanding of soil salinization development over the Yellow River Delta.
Hyperspectral estimation of soil heavy metals in Guanzhong area, Shaanxi province
NASA Astrophysics Data System (ADS)
Liu, Jinbao; Cheng, Jie; Wang, Huanyuan; Tong, Wei; Ma, Zenghui
2017-10-01
In this study, the contents of Cr, Mn, Ni, Cu, Zn, As, Cd, Hg and Pb in 44 soil samples collected from Fufeng County, Yangling County and Wugong County, Shaanxi Province, were used as data sources. Reflectance was measured with an ASD Field Spec HR (350-2500 nm) and pretreated with NOR, MSC and SNV; first-derivative, second-derivative and reciprocal-logarithm transformations of the reflectance were then carried out. The optimal hyperspectral estimation model for each of the nine heavy metal elements (Cr, Mn, Ni, Cu, Zn, As, Cd, Hg and Pb) was established by regression. The reflection characteristics of different heavy metal contents and the effect of different pretreatment methods on the soil heavy metal spectral inversion models were compared. The results show that: (1) the NOR, MSC and SNV transformations improve the signal-to-noise ratio of the reflectance spectrum. Combining them with differential transformation can enhance the information on heavy metal elements in the soil, and using the correlated bands can significantly improve the stability and predictability of the model. (2) The modeling accuracies of the optimal PLSR models for Cr, Mn, Ni, Cu, Zn, As, Cd, Hg and Pb were 0.7002, 0.7852, 0.687, 0.8036, 0.8619, 0.5765, 0.5451, 0.9912, and 0.6182, respectively.
Alamar, Priscila D; Caramês, Elem T S; Poppi, Ronei J; Pallone, Juliana A L
2016-07-01
The present study investigated the application of near infrared spectroscopy as a green, quick, and efficient alternative to analytical methods currently used to evaluate the quality (moisture, total sugars, acidity, soluble solids, pH and ascorbic acid) of frozen guava and passion fruit pulps. Fifty samples were analyzed by near infrared spectroscopy (NIR) and reference methods. Partial least square regression (PLSR) was used to develop calibration models to relate the NIR spectra and the reference values. Reference methods indicated adulteration by water addition in 58% of guava pulp samples and 44% of yellow passion fruit pulp samples. The PLS models produced lower values of root mean squares error of calibration (RMSEC), root mean squares error of prediction (RMSEP), and coefficient of determination above 0.7. Moisture and total sugars presented the best calibration models (RMSEP of 0.240 and 0.269, respectively, for guava pulp; RMSEP of 0.401 and 0.413, respectively, for passion fruit pulp) which enables the application of these models to determine adulteration in guava and yellow passion fruit pulp by water or sugar addition. The models constructed for calibration of quality parameters of frozen fruit pulps in this study indicate that NIR spectroscopy coupled with the multivariate calibration technique could be applied to determine the quality of guava and yellow passion fruit pulp. Copyright © 2016 Elsevier Ltd. All rights reserved.
Improving the prediction of African savanna vegetation variables using time series of MODIS products
NASA Astrophysics Data System (ADS)
Tsalyuk, Miriam; Kelly, Maggi; Getz, Wayne M.
2017-09-01
African savanna vegetation is subject to extensive degradation as a result of rapid climate and land use change. To better understand these changes detailed assessment of vegetation structure is needed across an extensive spatial scale and at a fine temporal resolution. Applying remote sensing techniques to savanna vegetation is challenging due to sparse cover, high background soil signal, and difficulty to differentiate between spectral signals of bare soil and dry vegetation. In this paper, we attempt to resolve these challenges by analyzing time series of four MODIS Vegetation Products (VPs): Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Leaf Area Index (LAI), and Fraction of Photosynthetically Active Radiation (FPAR) for Etosha National Park, a semiarid savanna in north-central Namibia. We create models to predict the density, cover, and biomass of the main savanna vegetation forms: grass, shrubs, and trees. To calibrate remote sensing data we developed an extensive and relatively rapid field methodology and measured herbaceous and woody vegetation during both the dry and wet seasons. We compared the efficacy of the four MODIS-derived VPs in predicting vegetation field measured variables. We then compared the optimal time span of VP time series to predict ground-measured vegetation. We found that Multiyear Partial Least Square Regression (PLSR) models were superior to single year or single date models. Our results show that NDVI-based PLSR models yield robust prediction of tree density (R2 = 0.79, relative Root Mean Square Error, rRMSE = 1.9%) and tree cover (R2 = 0.78, rRMSE = 0.3%). EVI provided the best model for shrub density (R2 = 0.82) and shrub cover (R2 = 0.83), but was only marginally superior over models based on other VPs. FPAR was the best predictor of vegetation biomass of trees (R2 = 0.76), shrubs (R2 = 0.83), and grass (R2 = 0.91). 
Finally, we addressed an enduring challenge in the remote sensing of semiarid vegetation by examining the transferability of predictive models through space and time. Our results show that models created in the wetter part of Etosha could accurately predict trees' and shrubs' variables in the drier part of the reserve and vice versa. Moreover, our results demonstrate that models created for vegetation variables in the dry season of 2011 could be successfully applied to predict vegetation in the wet season of 2012. We conclude that extensive field data combined with multiyear time series of MODIS vegetation products can produce robust predictive models for multiple vegetation forms in the African savanna. These methods advance the monitoring of savanna vegetation dynamics and contribute to improved management and conservation of these valuable ecosystems.
Yang, Teng; Adams, Jonathan M; Shi, Yu; He, Jin-Sheng; Jing, Xin; Chen, Litong; Tedersoo, Leho; Chu, Haiyan
2017-07-01
Previous studies have revealed inconsistent correlations between fungal diversity and plant diversity from local to global scales, and there is a lack of information about the diversity-diversity and productivity-diversity relationships for fungi in alpine regions. Here we investigated the internal relationships between soil fungal diversity, plant diversity and productivity across 60 grassland sites on the Tibetan Plateau, using Illumina sequencing of the internal transcribed spacer 2 (ITS2) region for fungal identification. Fungal alpha and beta diversities were best explained by plant alpha and beta diversities, respectively, when accounting for environmental drivers and geographic distance. The best ordinary least squares (OLS) multiple regression models, partial least squares regression (PLSR) and variation partitioning analysis (VPA) indicated that plant richness was positively correlated with fungal richness. However, no correlation between plant richness and fungal richness was evident for fungal functional guilds when analyzed individually. Plant productivity showed a weaker relationship to fungal diversity which was intercorrelated with other factors such as plant diversity, and was thus excluded as a main driver. Our study points to a predominant effect of plant diversity, along with other factors such as carbon : nitrogen (C : N) ratio, soil phosphorus and dissolved organic carbon, on soil fungal richness. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Using LUCAS topsoil database to estimate soil organic carbon content in local spectral libraries
NASA Astrophysics Data System (ADS)
Castaldi, Fabio; van Wesemael, Bas; Chabrillat, Sabine; Chartin, Caroline
2017-04-01
The quantification of the soil organic carbon (SOC) content over large areas is mandatory to obtain accurate soil characterization and classification, which can improve site specific management at local or regional scale exploiting the strong relationship between SOC and crop growth. The estimation of the SOC is not only important for agricultural purposes: in recent years, the increasing attention towards global warming highlighted the crucial role of the soil in the global carbon cycle. In this context, soil spectroscopy is a well consolidated and widespread method to estimate soil variables exploiting the interaction between chromophores and electromagnetic radiation. The importance of spectroscopy in soil science is reflected by the increasing number of large soil spectral libraries collected in the world. These large libraries contain soil samples derived from a consistent number of pedological regions and thus from different parent material and soil types; this heterogeneity entails, in turn, a large variability in terms of mineralogical and organic composition. In the light of the huge variability of the spectral responses to SOC content and composition, a rigorous classification process is necessary to subset large spectral libraries and to avoid the calibration of global models failing to predict local variation in SOC content. In this regard, this study proposes a method to subset the European LUCAS topsoil database into soil classes using a clustering analysis based on a large number of soil properties. The LUCAS database was chosen to apply a standardized multivariate calibration approach valid for large areas without the need for extensive field and laboratory work for calibration of local models. 
Seven soil classes were detected by the clustering analyses and the samples belonging to each class were used to calibrate specific partial least square regression (PLSR) models to estimate SOC content of three local libraries collected in Belgium (Loam belt and Wallonia) and Luxembourg. The three local libraries only consist of spectral data (199 samples) acquired using the same protocol as the one used for the LUCAS database. SOC was estimated with a good accuracy both within each local library (RMSE: 1.2–5.4 g kg-1; RPD: 1.41–2.06) and for the samples of the three libraries together (RMSE: 3.9 g kg-1; RPD: 2.47). The proposed approach could allow SOC to be estimated anywhere in Europe by collecting spectra only, without the need for chemical laboratory analyses, exploiting the potential of the LUCAS database and specific PLSR models.
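RPD is commonly computed as the standard deviation of the reference values divided by the RMSE of prediction. The abstract does not state which variant the authors used, so the sample standard deviation below is an assumption, as are the function name `rpd` and the SOC values, which are invented for illustration.

```python
import math
import statistics


def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSE."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need at least two paired values")
    rmse = math.sqrt(
        sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    )
    if rmse == 0.0:
        raise ValueError("RMSE is zero; RPD is undefined")
    return statistics.stdev(reference) / rmse


# Hypothetical SOC values in g/kg: laboratory reference vs. spectral prediction.
soc_reference = [10.0, 20.0, 30.0, 40.0]
soc_predicted = [12.0, 18.0, 33.0, 39.0]
print(round(rpd(soc_reference, soc_predicted), 2))
```

Because RPD scales the error by the spread of the reference data, a library with a narrow SOC range can show a low RMSE and still a modest RPD.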
NASA Astrophysics Data System (ADS)
Rosero-Vlasova, O.; Borini Alves, D.; Vlassova, L.; Perez-Cabello, F.; Montorio Lloveria, R.
2017-10-01
Deforestation in Amazon basin due, among other factors, to frequent wildfires demands continuous post-fire monitoring of soil and vegetation. Thus, the study posed two objectives: (1) evaluate the capacity of Visible - Near InfraRed - ShortWave InfraRed (VIS-NIR-SWIR) spectroscopy to estimate soil organic matter (SOM) in fire-affected soils, and (2) assess the feasibility of SOM mapping from satellite images. For this purpose, 30 soil samples (surface layer) were collected in 2016 in areas of grass and riparian vegetation of Campos Amazonicos National Park, Brazil, repeatedly affected by wildfires. Standard laboratory procedures were applied to determine SOM. Reflectance spectra of soils were obtained in controlled laboratory conditions using Fieldspec4 spectroradiometer (spectral range 350nm- 2500nm). Measured spectra were resampled to simulate reflectances for Landsat-8, Sentinel-2 and EnMap spectral bands, used as predictors in SOM models developed using Partial Least Squares regression and step-down variable selection algorithm (PLSR-SD). The best fit was achieved with models based on reflectances simulated for EnMap bands (R2=0.93; R2cv=0.82 and NMSE=0.07; NMSEcv=0.19). The model uses only 8 out of 244 predictors (bands) chosen by the step-down variable selection algorithm. The least reliable estimates (R2=0.55 and R2cv=0.40 and NMSE=0.43; NMSEcv=0.60) resulted from Landsat model, while Sentinel-2 model showed R2=0.68 and R2cv=0.63; NMSE=0.31 and NMSEcv=0.38. The results confirm high potential of VIS-NIR-SWIR spectroscopy for SOM estimation. Application of step-down produces sparser and better-fit models. Finally, SOM can be estimated with an acceptable accuracy (NMSE 0.35) from EnMap and Sentinel-2 data enabling mapping and analysis of impacts of repeated wildfires on soils in the study area.
Capote, F Priego; Jiménez, J Ruiz; de Castro, M D Luque
2007-08-01
An analytical method for the sequential detection, identification and quantitation of extra virgin olive oil adulteration with four edible vegetable oils--sunflower, corn, peanut and coconut oils--is proposed. The only data required for this method are the results obtained from an analysis of the lipid fraction by gas chromatography-mass spectrometry. A total number of 566 samples (pure oils and samples of adulterated olive oil) were used to develop the chemometric models, which were designed to accomplish, step-by-step, the three aims of the method: to detect whether an olive oil sample is adulterated, to identify the type of adulterant used in the fraud, and to determine how much adulterant is in the sample. Qualitative analysis was carried out via two chemometric approaches--soft independent modelling of class analogy (SIMCA) and K nearest neighbours (KNN)--both approaches exhibited prediction abilities that were always higher than 91% for adulterant detection and 88% for type of adulterant identification. Quantitative analysis was based on partial least squares regression (PLSR), which yielded R2 values of >0.90 for calibration and validation sets and thus made it possible to determine adulteration with excellent precision according to the Shenk criteria.
Kern, Simon; Meyer, Klas; Guhl, Svetlana; Gräßer, Patrick; Paul, Andrea; King, Rudibert; Maiwald, Michael
2018-05-01
Monitoring specific chemical properties is the key to chemical process control. Today, mainly optical online methods are applied, which require time- and cost-intensive calibration effort. NMR spectroscopy, with its advantage being a direct comparison method without need for calibration, has a high potential for enabling closed-loop process control while exhibiting short set-up times. Compact NMR instruments make NMR spectroscopy accessible in industrial and rough environments for process monitoring and advanced process control strategies. We present a fully automated data analysis approach which is completely based on physically motivated spectral models as first principles information (indirect hard modeling-IHM) and applied it to a given pharmaceutical lithiation reaction in the framework of the European Union's Horizon 2020 project CONSENS. Online low-field NMR (LF NMR) data was analyzed by IHM with low calibration effort, compared to a multivariate PLS-R (partial least squares regression) approach, and both validated using online high-field NMR (HF NMR) spectroscopy. Graphical abstract NMR sensor module for monitoring of the aromatic coupling of 1-fluoro-2-nitrobenzene (FNB) with aniline to 2-nitrodiphenylamine (NDPA) using lithium-bis(trimethylsilyl) amide (Li-HMDS) in continuous operation. Online 43.5 MHz low-field NMR (LF) was compared to 500 MHz high-field NMR spectroscopy (HF) as reference method.
Tamburini, Elena; Mamolini, Elisabetta; De Bastiani, Morena; Marchetti, Maria Gabriella
2016-07-15
Fusarium proliferatum is considered to be a pathogen of many economically important plants, including garlic. The objective of this research was to apply near-infrared spectroscopy (NIRS) to rapidly determine fungal concentration in intact garlic cloves, avoiding the laborious and time-consuming procedures of traditional assays. Preventive detection of infection before seeding is of great interest for farmers, because it could avoid serious losses of yield during harvesting and storage. Spectra were collected on 95 garlic cloves, divided in five classes of infection (from 1-healthy to 5-very highly infected) in the range of fungal concentration 0.34-7231.15 ppb. Calibration and cross validation models were developed with partial least squares regression (PLSR) on pretreated spectra (standard normal variate, SNV, and derivatives), providing good accuracy in prediction, with a coefficient of determination (R²) of 0.829 and 0.774, respectively, a standard error of calibration (SEC) of 615.17 ppb, and a standard error of cross validation (SECV) of 717.41 ppb. The calibration model was then used to predict fungal concentration in unknown samples, peeled and unpeeled. The results showed that NIRS could be used as a reliable tool to directly detect and quantify F. proliferatum infection in peeled intact garlic cloves, but the presence of the external peel strongly affected the prediction reliability.
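The standard normal variate (SNV) pretreatment mentioned above centres each spectrum on its own mean and scales it by its own standard deviation, which removes additive offsets and multiplicative scaling between samples. A minimal sketch follows; the function name `snv` and the four-point "spectra" are illustrative assumptions, and the sample standard deviation is used by convention.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: (x - mean) / sd, computed per spectrum."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    if sd == 0.0:
        raise ValueError("flat spectrum: standard deviation is zero")
    return [(value - mean) / sd for value in spectrum]


# Two hypothetical spectra that differ only by a gain (x2) and an offset (+0.5).
raw = [0.10, 0.20, 0.40, 0.30]
shifted = [2.0 * v + 0.5 for v in raw]
print([round(v, 3) for v in snv(raw)])
print([round(v, 3) for v in snv(shifted)])
```

Both printed rows are the same, which is the point of the transform: scatter-like differences between cloves are removed before the PLSR calibration sees the data.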
The rapid measurement of soil carbon stock using near-infrared technology
NASA Astrophysics Data System (ADS)
Kusumo, B. H.; Sukartono; Bustan
2018-03-01
Because the soil pool stores three times as much carbon (C) as the atmospheric pool, depletion of the soil C stock will significantly increase the concentration of CO2 in the atmosphere, causing global warming. However, monitoring or measuring soil C stock using conventional procedures is time-consuming and expensive. A rapid, non-destructive technique that is simple and does not need chemical substances is therefore required. This research is aimed at testing whether near-infrared (NIR) technology is able to rapidly measure C stock in the soil. Soil samples were collected from agricultural land in the sub-district of Kayangan, North Lombok, Indonesia. The coordinates of the samples were recorded. Some of the samples were analyzed using the conventional procedure (Walkley and Black) and others were scanned using near-infrared spectroscopy (NIRS) for soil spectral collection. Partial Least Square Regression (PLSR) was used to develop models from soil C data measured by conventional analysis and from spectral data scanned by NIRS. The best model was moderately successful in measuring soil C stock in the study area in North Lombok. This indicates that NIR technology can be further used to monitor changes in soil C stock.
Spectroscopic sensitivity of real-time, rapidly induced phytochemical change in response to damage.
Couture, John J; Serbin, Shawn P; Townsend, Philip A
2013-04-01
An ecological consequence of plant-herbivore interactions is the phytochemical induction of defenses in response to insect damage. Here, we used reflectance spectroscopy to characterize the foliar induction profile of cardenolides in Asclepias syriaca in response to damage, tracked in vivo changes and examined the influence of multiple plant traits on cardenolide concentrations. Foliar cardenolide concentrations were measured at specific time points following damage to capture their induction profile. Partial least-squares regression (PLSR) modeling was employed to calibrate cardenolide concentrations to reflectance spectroscopy. In addition, subsets of plants were either repeatedly sampled to track in vivo changes or modified to reduce latex flow to damaged areas. Cardenolide concentrations and the induction profile of A. syriaca were well predicted using models derived from reflectance spectroscopy, and this held true for repeatedly sampled plants. Correlations between cardenolides and other foliar-related variables were weak or not significant. Plant modification for latex reduction inhibited an induced cardenolide response. Our findings show that reflectance spectroscopy can characterize rapid phytochemical changes in vivo. We used reflectance spectroscopy to identify the mechanisms behind the production of plant secondary metabolites, simultaneously characterizing multiple foliar constituents. In this case, cardenolide induction appears to be largely driven by enhanced latex delivery to leaves following damage. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
NASA Astrophysics Data System (ADS)
Suo, Lizhu; Huang, Mingbin; Zhang, Yongkun; Duan, Liangxia; Shan, Yan
2018-07-01
Soil moisture dynamics plays an active role in ecological and hydrological processes, and it depends on a large number of environmental factors, such as topographic attributes, soil properties, land use types, and precipitation. However, studies must still clarify the relative significance of these environmental factors at different soil depths and at different spatial scales. This study aimed: (1) to characterize temporal and spatial variations in soil moisture content (SMC) at four soil layers (0-40, 40-100, 100-200, and 200-500 cm) and three spatial scales (plot, hillslope, and region); and (2) to determine their dominant controls in diverse soil layers at different spatial scales over semiarid and semi-humid areas of the Loess Plateau, China. Given the high co-dependence of environmental factors, partial least squares regression (PLSR) was used to detect relative significance among 15 selected environmental factors that affect SMC. Temporal variation in SMC decreased with increasing soil depth, and vertical changes in the 0-500 cm soil profile were divided into a fast-changing layer (0-40 cm), an active layer (40-100 cm), a sub-active layer (100-200 cm), and a relatively stable layer (200-500 cm). PLSR models simulated SMC accurately in diverse soil layers at different scales; almost all values for variation in response (R2) and goodness of prediction (Q2) were >0.5 and >0.0975, respectively. Upper and lower layer SMCs were the two most important factors that influenced diverse soil layers at three scales, and these SMC variables exhibited the highest importance in projection (VIP) values. The 7-day antecedent precipitation and 7-day antecedent potential evapotranspiration contributed significantly to SMC only at the 0-40 cm soil layer. VIP of soil properties, especially sand and silt content, which influenced SMC strongly, increased significantly after increasing the measured scale. 
Mean annual precipitation and potential evapotranspiration also influenced SMC at the regional scale significantly. Overall, this study indicated that dominant controls of SMC varied among three spatial scales on the Loess Plateau, and VIP was a function of spatial scale and soil depth.
Elsohaby, Ibrahim; Burns, Jennifer B.; Riley, Christopher B.; Shaw, R. Anthony; McClure, J. Trenton
2017-01-01
The objective of this study was to develop and compare the performance of laboratory grade and portable attenuated total reflectance infrared (ATR-IR) spectroscopic approaches in combination with partial least squares regression (PLSR) for the rapid quantification of alpaca serum IgG concentration, and the identification of low IgG (<1000 mg/dL), which is consistent with the diagnosis of failure of transfer of passive immunity (FTPI) in neonates. Serum samples (n = 175) collected from privately owned, healthy alpacas were tested by the reference method of radial immunodiffusion (RID) assay, and laboratory grade and portable ATR-IR spectrometers. Various pre-processing strategies were applied to the ATR-IR spectra that were linked to corresponding RID-IgG concentrations, and then randomly split into two sets: calibration (training) and test sets. PLSR was applied to the calibration set and calibration models were developed, and the test set was used to assess the accuracy of the analytical method. For the test set, the Pearson correlation coefficients between the IgG measured by RID and predicted by both laboratory grade and portable ATR-IR spectrometers was 0.91. The average differences between reference serum IgG concentrations and the two IR-based methods were 120.5 mg/dL and 71 mg/dL for the laboratory and portable ATR-IR-based assays, respectively. Adopting an IgG concentration <1000 mg/dL as the cut-point for FTPI cases, the sensitivity, specificity, and accuracy for identifying serum samples below this cut point by laboratory ATR-IR assay were 86, 100 and 98%, respectively (within the entire data set). Corresponding values for the portable ATR-IR assay were 95, 99 and 99%, respectively. 
These results suggest that the two different ATR-IR assays performed similarly for rapid qualitative evaluation of alpaca serum IgG and for diagnosis of IgG <1000 mg/dL; the portable ATR-IR spectrometer performed slightly better and provides more flexibility for potential application in the field. PMID:28651006
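The sensitivity, specificity and accuracy figures above come from treating IgG < 1000 mg/dL by the reference assay as the positive (FTPI) class and checking whether the spectroscopic prediction falls on the same side of the cut-point. A minimal sketch of that bookkeeping is shown here; the function name `screening_metrics` and the serum values are made up for illustration.

```python
def screening_metrics(reference, predicted, cut_point=1000.0):
    """Sensitivity, specificity and accuracy for detecting values below cut_point."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        truly_low = ref < cut_point      # positive by the reference method
        flagged_low = pred < cut_point   # positive by the test method
        if truly_low and flagged_low:
            tp += 1
        elif truly_low:
            fn += 1
        elif flagged_low:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical IgG concentrations in mg/dL: RID reference vs. ATR-IR prediction.
rid_igg = [800.0, 950.0, 1200.0, 1500.0, 990.0, 2000.0]
atr_igg = [850.0, 1020.0, 1100.0, 1400.0, 900.0, 980.0]
print(screening_metrics(rid_igg, atr_igg))
```

Samples close to the cut-point dominate the misclassifications, so a small average bias can still shift sensitivity noticeably.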
Elsohaby, Ibrahim; Burns, Jennifer B; Riley, Christopher B; Shaw, R Anthony; McClure, J Trenton
2017-01-01
The objective of this study was to develop and compare the performance of laboratory-grade and portable attenuated total reflectance infrared (ATR-IR) spectroscopic approaches in combination with partial least squares regression (PLSR) for the rapid quantification of alpaca serum IgG concentration, and the identification of low IgG (<1000 mg/dL), which is consistent with the diagnosis of failure of transfer of passive immunity (FTPI) in neonates. Serum samples (n = 175) collected from privately owned, healthy alpacas were tested by the reference method of radial immunodiffusion (RID) assay, and by laboratory-grade and portable ATR-IR spectrometers. Various pre-processing strategies were applied to the ATR-IR spectra, which were linked to the corresponding RID-IgG concentrations and then randomly split into two sets: calibration (training) and test sets. PLSR was applied to the calibration set to develop calibration models, and the test set was used to assess the accuracy of the analytical method. For the test set, the Pearson correlation coefficient between the IgG measured by RID and that predicted by ATR-IR was 0.91 for both the laboratory-grade and the portable spectrometer. The average differences between reference serum IgG concentrations and the two IR-based methods were 120.5 mg/dL and 71 mg/dL for the laboratory and portable ATR-IR-based assays, respectively. Adopting an IgG concentration <1000 mg/dL as the cut-point for FTPI cases, the sensitivity, specificity, and accuracy for identifying serum samples below this cut-point by the laboratory ATR-IR assay were 86, 100 and 98%, respectively (within the entire data set). Corresponding values for the portable ATR-IR assay were 95, 99 and 99%, respectively. These results suggest that the two different ATR-IR assays performed similarly for rapid qualitative evaluation of alpaca serum IgG and for diagnosis of IgG <1000 mg/dL; the portable ATR-IR spectrometer performed slightly better and provides more flexibility for potential application in the field.
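The cut-point evaluation in the abstract above (sensitivity, specificity and accuracy for IgG <1000 mg/dL) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `diagnostic_metrics` and the concentrations are made up for the example, and a sample is treated as positive when its value falls below the cut-point.

```python
def diagnostic_metrics(reference, predicted, cut_point=1000.0):
    """Sensitivity, specificity and accuracy for detecting values below cut_point.

    A sample counts as positive (low IgG) when its concentration is < cut_point.
    `reference` plays the role of the RID values, `predicted` the IR-based values.
    """
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        truly_low = ref < cut_point
        called_low = pred < cut_point
        if truly_low and called_low:
            tp += 1
        elif truly_low:
            fn += 1
        elif called_low:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical concentrations in mg/dL (illustrative only, not study data).
rid_igg = [650.0, 900.0, 1200.0, 2500.0, 3100.0, 980.0]
ir_igg = [700.0, 1050.0, 1150.0, 2400.0, 3000.0, 940.0]
sens, spec, acc = diagnostic_metrics(rid_igg, ir_igg)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

In this toy data one truly low sample is predicted just above the cut-point, which lowers sensitivity while leaving specificity untouched.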
Liu, Ya; Pan, Xianzhang; Wang, Changkun; Li, Yanli; Shi, Rongjie
2015-01-01
Robust models for predicting soil salinity that use visible and near-infrared (vis–NIR) reflectance spectroscopy are needed to better quantify soil salinity in agricultural fields. Currently available models are not sufficiently robust for variable soil moisture contents. Thus, we used external parameter orthogonalization (EPO), which effectively projects spectra onto the subspace orthogonal to unwanted variation, to remove the variations caused by an external factor, e.g., the influences of soil moisture on spectral reflectance. In this study, 570 spectra between 380 and 2400 nm were obtained from soils with various soil moisture contents and salt concentrations in the laboratory; 3 soil types × 10 salt concentrations × 19 soil moisture levels were used. To examine the effectiveness of EPO, we compared the partial least squares regression (PLSR) results established from spectra with and without EPO correction. The EPO method effectively removed the effects of moisture, and the accuracy and robustness of the soil salt contents (SSCs) prediction model, which was built using the EPO-corrected spectra under various soil moisture conditions, were significantly improved relative to the spectra without EPO correction. This study contributes to the removal of soil moisture effects from soil salinity estimations when using vis–NIR reflectance spectroscopy and can assist others in quantifying soil salinity in the future. PMID:26468645
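The projection step behind EPO can be illustrated with a short sketch. It assumes the commonly described formulation in which the unwanted directions are taken from the leading right singular vectors of a matrix of difference spectra (moist minus dry spectra of the same samples); the abstract does not give these details, and all data below are synthetic.

```python
import numpy as np


def epo_projection(difference_spectra, n_components):
    """Return P = I - V V^T, where the columns of V span the main directions
    of the unwanted (here: moisture-induced) spectral variation."""
    d = np.asarray(difference_spectra, dtype=float)
    # Right singular vectors of D are the principal directions of D^T D.
    _, _, vt = np.linalg.svd(d, full_matrices=False)
    v = vt[:n_components].T  # shape: (n_wavelengths, n_components)
    return np.eye(d.shape[1]) - v @ v.T


# Synthetic example: every "wet" spectrum is a dry spectrum plus a scaled
# copy of one hypothetical moisture signature.
rng = np.random.default_rng(0)
n_bands = 50
moisture_shape = np.linspace(0.0, 1.0, n_bands)
moisture_shape /= np.linalg.norm(moisture_shape)
dry = rng.normal(size=(20, n_bands))
levels = rng.uniform(0.5, 2.0, size=(20, 1))
wet = dry + levels * moisture_shape

P = epo_projection(wet - dry, n_components=1)
wet_corrected = wet @ P
dry_corrected = dry @ P
print(np.abs(wet_corrected - dry_corrected).max())
```

After projection the wet and dry spectra coincide in this idealized case because the moisture effect lies entirely along one direction; real spectra would need the number of components tuned, for example by cross-validation.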
Jiménez-Carvelo, Ana M; González-Casado, Antonio; Cuadros-Rodríguez, Luis
2017-03-01
A new analytical method was developed for the quantification of olive oil and palm oil in blends with other edible vegetable oils (canola, safflower, corn, peanut, seeds, grapeseed, linseed, sesame and soybean) using normal-phase liquid chromatography and chemometric tools. The procedure for obtaining the chromatographic fingerprint from the methyl-transesterified fraction of each blend is described. The multivariate quantification methods used were partial least squares regression (PLS-R) and support vector regression (SVR). The quantification results were evaluated using several parameters, such as the root mean square error of validation (RMSEV), mean absolute error of validation (MAEV) and median absolute error of validation (MdAEV). Notably, the chromatographic analysis in the proposed method takes only eight minutes, and the results showed the potential of the method and allowed quantification of mixtures of olive oil and palm oil with other vegetable oils. Copyright © 2016 Elsevier B.V. All rights reserved.
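The three validation statistics named in this abstract can be computed as in the sketch below. The function name and the percentages are hypothetical, and the definitions are the usual ones (errors taken as predicted minus reference on a validation set).

```python
import math
import statistics


def validation_errors(reference, predicted):
    """Return (RMSEV, MAEV, MdAEV) for paired reference/predicted values."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    abs_res = [abs(e) for e in residuals]
    rmsev = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    maev = sum(abs_res) / len(abs_res)
    mdaev = statistics.median(abs_res)
    return rmsev, maev, mdaev


# Hypothetical olive-oil percentages in four validation blends.
true_pct = [10.0, 20.0, 30.0, 40.0]
pred_pct = [11.0, 18.0, 30.0, 43.0]
rmsev, maev, mdaev = validation_errors(true_pct, pred_pct)
print(rmsev, maev, mdaev)
```

RMSEV penalizes large errors more heavily than MAEV, while MdAEV is robust to a few badly predicted blends, which is one reason to report all three.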
NASA Astrophysics Data System (ADS)
Pal, I.; Lall, U.; Robertson, A. W.; Cane, M. A.; Bansal, R.
2013-06-01
Snowmelt-dominated streamflow of the Western Himalayan rivers is an important water resource during the dry pre-monsoon spring months to meet the irrigation and hydropower needs in northern India. Here we study the seasonal prediction of melt-dominated total inflow into the Bhakra Dam in northern India based on statistical relationships with meteorological variables during the preceding winter. Total inflow into the Bhakra Dam includes the Satluj River flow together with a flow diversion from its tributary, the Beas River. Both are tributaries of the Indus River that originate from the Western Himalayas, which is an under-studied region. Average measured winter snow volume at the upper-elevation stations and corresponding lower-elevation rainfall and temperature of the Satluj River basin were considered as empirical predictors. Akaike information criteria (AIC) and Bayesian information criteria (BIC) were used to select the best subset of inputs from all the possible combinations of predictors for a multiple linear regression framework. To test for potential issues arising due to multicollinearity of the predictor variables, cross-validated prediction skills of the best subset were also compared with the prediction skills of principal component regression (PCR) and partial least squares regression (PLSR) techniques, which yielded broadly similar results. As a whole, the forecasts of the melt season at the end of winter and as the melt season commences were shown to have potential skill for guiding the development of stochastic optimization models to manage the trade-off between irrigation and hydropower releases versus flood control during the annual fill cycle of the Bhakra Reservoir, a major energy and irrigation source in the region.
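The subset-selection step described above can be sketched with an exhaustive search scored by AIC or BIC. This is an illustration under stated assumptions: Gaussian errors, the common forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln n, and synthetic predictors rather than the snow, rainfall and temperature series used in the study.

```python
import itertools

import numpy as np


def best_subset(X, y, criterion="aic"):
    """Exhaustively search predictor subsets of a linear model with intercept
    and return (score, columns) of the subset with the lowest AIC or BIC."""
    n, p = X.shape
    best = (np.inf, ())
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            n_params = k + 1  # slopes plus intercept
            penalty = 2 * n_params if criterion == "aic" else n_params * np.log(n)
            score = n * np.log(rss / n) + penalty
            if score < best[0]:
                best = (score, cols)
    return best


# Synthetic data: only columns 0 and 2 actually drive the response.
rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.1 * rng.normal(size=n)
aic_score, aic_cols = best_subset(X, y, "aic")
bic_score, bic_cols = best_subset(X, y, "bic")
print("AIC subset:", aic_cols, "BIC subset:", bic_cols)
```

Exhaustive search is only practical for a handful of candidate predictors; BIC's heavier penalty tends to favor smaller subsets than AIC.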
Early detection of germinated wheat grains using terahertz image and chemometrics
NASA Astrophysics Data System (ADS)
Jiang, Yuying; Ge, Hongyi; Lian, Feiyu; Zhang, Yuan; Xia, Shanhong
2016-02-01
In this paper, we propose a feasible tool that uses a terahertz (THz) imaging system for identifying wheat grains at different stages of germination. The THz spectra of the main changed components of wheat grains, maltose and starch, which were obtained by THz time spectroscopy, were distinctly different. Used for original data compression and feature extraction, principal component analysis (PCA) revealed the changes that occurred in the inner chemical structure during germination. Two thresholds, one indicating the start of the release of α-amylase and the second when it reaches the steady state, were obtained through the first five score images. Thus, the first five PCs were input for the partial least-squares regression (PLSR), least-squares support vector machine (LS-SVM), and back-propagation neural network (BPNN) models, which were used to classify seven different germination times between 0 and 48 h, with a prediction accuracy of 92.85%, 93.57%, and 90.71%, respectively. The experimental results indicated that the combination of THz imaging technology and chemometrics could be a new effective way to discriminate wheat grains at the early germination stage of approximately 6 h.
Nicolaou, Nicoletta; Goodacre, Royston
2008-10-01
Microbiological safety plays a very significant part in the quality control of milk and dairy products worldwide. Current methods used in the detection and enumeration of spoilage bacteria in pasteurized milk in the dairy industry, although accurate and sensitive, are time-consuming. FT-IR spectroscopy is a metabolic fingerprinting technique that can potentially be used to deliver results with the same accuracy and sensitivity, within minutes after minimal sample preparation. We tested this hypothesis using attenuated total reflectance (ATR), and high throughput (HT) FT-IR techniques. Three main types of pasteurized milk - whole, semi-skimmed and skimmed - were used and milk was allowed to spoil naturally by incubation at 15 °C. Samples for FT-IR were obtained at frequent, fixed time intervals and pH and total viable counts were also recorded. Multivariate statistical methods, including principal components-discriminant function analysis and partial least squares regression (PLSR), were then used to investigate the relationship between metabolic fingerprints and the total viable counts. FT-IR ATR data for all milks showed reasonable results for bacterial loads above 10⁵ cfu ml⁻¹. By contrast, FT-IR HT provided more accurate results for lower viable bacterial counts down to 10³ cfu ml⁻¹ for whole milk and 4 × 10² cfu ml⁻¹ for semi-skimmed and skimmed milk. Using FT-IR with PLSR we were able to acquire a metabolic fingerprint rapidly and quantify the microbial load of milk samples accurately, with very little sample preparation. We believe that metabolic fingerprinting using FT-IR has very good potential for future use in the dairy industry as a rapid method of detection and enumeration.
Miloudi, Lynda; Bonnier, Franck; Bertrand, Dominique; Byrne, Hugh J; Perse, Xavier; Chourpa, Igor; Munnier, Emilie
2017-07-01
Core-shell nanocarriers are increasingly being adapted in cosmetic and dermatological fields, aiming to provide an increased penetration of the active pharmaceutical or cosmetic ingredients (API and ACI) through the skin. In the final form, the nanocarriers (NC) are usually prepared in hydrogels, conferring desired viscous properties for topical application. Combined with the high chemical complexity of the encapsulating system itself, involving numerous ingredients to form a stable core, quantifying the NC and/or the encapsulated active without labor-intensive and destructive methods remains challenging. In this respect, the specific molecular fingerprint obtained from vibrational spectroscopy analysis could unambiguously overcome current obstacles in the development of fast and cost-effective quality control tools for NC-based products. The present study demonstrates the feasibility of accurately quantifying the concentrations of curcumin (ACI)-loaded alginate nanocarriers in hydrogel matrices by coupling partial least squares regression (PLSR) to infrared (IR) absorption and Raman spectroscopic analyses. With respective root mean square errors of 0.1469 ± 0.0175% w/w and 0.4462 ± 0.0631% w/w, both approaches offer acceptable precision. Further investigation of the PLSR results highlighted the different selectivity of each approach, indicating that only IR analysis delivers direct monitoring of the NC through the quantification of Labrafac®, the main NC ingredient. Raman analyses are instead dominated by the contribution of the ACI, which opens numerous perspectives for quantifying the active molecules without interference from the complex core-shell encapsulating systems, thus positioning the technique as a powerful analytical tool for industrial screening of cosmetic and pharmaceutical products. Graphical abstract: Quantitative analysis of encapsulated active molecules in hydrogel-based samples by means of infrared and Raman spectroscopy.
Mansoor, J. K.; Schelegle, Edward S.; Davis, Cristina E.; Walby, William F.; Zhao, Weixiang; Aksenov, Alexander A.; Pasamontes, Alberto; Figueroa, Jennifer; Allen, Roblee
2014-01-01
Background: An important challenge to pulmonary arterial hypertension (PAH) diagnosis and treatment is early detection of occult pulmonary vascular pathology. Symptoms are frequently confused with other disease entities that lead to inappropriate interventions and allow for progression to advanced states of disease. There is a significant need to develop new markers for early disease detection and management of PAH. Methodology and Findings: Exhaled breath condensate (EBC) samples were compared from 30 age-matched normal healthy individuals and 27 New York Heart Association functional class III and IV idiopathic pulmonary arterial hypertension (IPAH) patients, a subgroup of PAH. Volatile organic compounds (VOC) in EBC samples were analyzed using gas chromatography/mass spectrometry (GC/MS). Individual peaks in GC profiles were identified in both groups and correlated with pulmonary hemodynamic and clinical endpoints in the IPAH group. Additionally, GC/MS data were analyzed using autoregression followed by partial least squares regression (AR/PLSR) analysis to discriminate between the IPAH and control groups. After correcting for medications, there were 62 unique compounds in the control group, 32 unique compounds in the IPAH group, and 14 in-common compounds between groups. Peak-by-peak analysis of GC profiles of IPAH group EBC samples identified 6 compounds significantly correlated with pulmonary hemodynamic variables important in IPAH diagnosis. AR/PLSR analysis of GC/MS data resulted in a distinct and identifiable metabolic signature for IPAH patients. Conclusions: These findings indicate the utility of EBC VOC analysis to discriminate between severe IPAH and a healthy population; additionally, we identified potential novel biomarkers that correlated with IPAH pulmonary hemodynamic variables that may be important in screening for less severe forms of IPAH. PMID:24748102
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bekiaris, Georgios; Lindedam, Jane; Peltre, Clément; ...
2015-06-18
Complexity and high cost are the main limitations for high-throughput screening methods for the estimation of the sugar release from plant materials during bioethanol production. In addition, it is important that we improve our understanding of the mechanisms by which different chemical components are affecting the degradability of plant material. In this study, Fourier transform infrared photoacoustic spectroscopy (FTIR-PAS) was combined with advanced chemometrics to develop calibration models predicting the amount of sugars released after pretreatment and enzymatic hydrolysis of wheat straw during bioethanol production, and the spectra were analysed to identify components associated with recalcitrance. A total of 1122 wheat straw samples from nine different locations in Denmark and one location in the United Kingdom, spanning a large variation in genetic material and environmental conditions during growth, were analysed. The FTIR-PAS spectra of non-pretreated wheat straw were correlated with the measured sugar release, determined by a high-throughput pretreatment and enzymatic hydrolysis (HTPH) assay. A partial least square regression (PLSR) calibration model predicting the glucose and xylose release was developed. The interpretation of the regression coefficients revealed a positive correlation between the released glucose and xylose with easily hydrolysable compounds, such as amorphous cellulose and hemicellulose. Additionally, we observed a negative correlation with crystalline cellulose and lignin, which inhibits cellulose and hemicellulose hydrolysis. FTIR-PAS was used as a reliable method for the rapid estimation of sugar release during bioethanol production. The spectra revealed that lignin inhibited the hydrolysis of polysaccharides into monomers, while the crystallinity of cellulose retarded its hydrolysis into glucose. Amorphous cellulose and xylans were found to contribute significantly to the released amounts of glucose and xylose, respectively.
Hyperspectral Remote Sensing of Terrestrial Ecosystem Productivity from ISS
NASA Astrophysics Data System (ADS)
Huemmrich, K. F.; Campbell, P. K. E.; Gao, B. C.; Flanagan, L. B.; Goulden, M.
2017-12-01
Data from the Hyperspectral Imager for Coastal Ocean (HICO), mounted on the International Space Station (ISS), were used to develop and test algorithms for remotely retrieving ecosystem productivity. The ISS orbit introduces both limitations and opportunities for observing ecosystem dynamics. Twenty-six HICO images were used from four study sites representing different vegetation types: grasslands, shrubland, and forest. Gross ecosystem production (GEP) data from eddy covariance were matched with HICO-derived spectra. Multiple algorithms were successful in relating spectral reflectance to GEP, including: Spectral Vegetation Indices (SVI), SVI in a light use efficiency model framework, spectral shape characteristics through spectral derivatives and absorption feature analysis, and statistical models leading to Multiband Hyperspectral Indices (MHI) from stepwise regressions and Partial Least Squares Regression (PLSR). Algorithms were able to achieve r² better than 0.7 for both GEP at the overpass time and daily GEP. These algorithms were successful using a diverse set of observations combining data from multiple years, multiple times during the growing season, different times of day, different view angles, and different vegetation types. The demonstrated robustness of the algorithms presented in this study over these conditions provides some confidence in mapping spatial patterns of GEP, describing variability within fields as well as regional patterns based only on spectral reflectance information. The ISS orbit provides periods with multiple observations collected at different times of the day within a period of a few days. Diurnal GEP patterns were estimated by comparing the half-hourly average GEP from the flux tower against HICO estimates of GEP (r² = 0.87) when morning, midday, and afternoon observations were available for average fluxes in the time period.
NASA Astrophysics Data System (ADS)
Jurasinski, Gerald; Scharnweber, Tobias; Schröder, Christian; Lennartz, Bernd; Bauwe, Andreas
2017-04-01
Tree growth depends, among other factors, largely on the prevailing climatic conditions. Therefore, changes in tree growth patterns are to be expected under climate change. Here, we analyze the tree-ring growth response of three major European tree species to projected future climate across a climatic (mostly precipitation) gradient in northeastern Germany. We used monthly data for temperature, precipitation, and the standardized precipitation evapotranspiration index (SPEI) over multiple time scales (1, 3, 6, 12, and 24 months) to construct models of tree-ring growth for Scots pine (Pinus sylvestris L.) at three pure stands, and for Common beech (Fagus sylvatica L.) and Pedunculate oak (Quercus robur L.) at three mature mixed stands. The regression models were derived using a two-step approach based on partial least squares regression (PLSR) to extract potentially well-explaining variables, followed by ordinary least squares regression (OLSR) to consolidate the models to the least number of variables while retaining high explanatory power. The stability of the models was tested with a comprehensive calibration-verification scheme. All models were successfully verified, with R² values ranging from 0.21 for the western pine stand to 0.62 for the beech stand in the east. For growth prediction, climate data forecasted until 2100 by the regional climate model WETTREG2010 based on the A1B Intergovernmental Panel on Climate Change (IPCC) emission scenario were used. For beech and oak, growth rates will likely decrease until the end of the 21st century. For pine, modeled growth trends vary and range from a slight growth increase to a weak decrease in growth rates depending on the position along the climatic gradient. The climatic gradient across the study area will possibly affect the future growth of oak, with larger growth reductions towards the drier east. For beech, site-specific adaptations seem to override the influence of the climatic gradient. We conclude that in northeastern Germany Scots pine has great potential to remain resilient to projected climate change without any greater impairment, whereas Common beech and Pedunculate oak will likely face reduced growth under the expected warmer and drier climate conditions. The results call for an adaptation of forest management to mitigate the negative effects of climate change for beech and oak in the region.
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
NASA Astrophysics Data System (ADS)
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management, especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information from multivariate datasets. Principal component analysis (PCA) has been mainly used; however, supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data were derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean-centered and variance-scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP) statistics were used to quantitatively assess the predictors most relevant for response variable estimation and then for variable selection (Andersen and Bro, 2010). PCA and SDA returned TOC and RFC as influential variables both on the sets of chemical and physical data analyzed separately and on the whole dataset (Stellacci et al., 2016). Highly weighted variables in PCA were also TEC, followed by K, and AC, followed by Pmac and BD, in the first PC (41.2% of total variance); Olsen P and HA-FA in the second PC (12.6%); and Ca in the third (10.6%) component. Variables enabling maximum discrimination among treatments for SDA were WEOC, on the whole dataset, and humic substances, followed by Olsen P, EC and clay, in the separate data analyses. The highest PLS-VIP statistics were recorded for Olsen P and Pmac, followed by TOC, TEC, pH and Mg for the chemical variables and clay, RFC and AC for the physical variables. The results show that different methods may provide different rankings of the selected variables and that the presence of a response variable, in regressive techniques, may affect variable selection. Further investigation with different response variables and with multi-year datasets would allow the advantages and limits of single or combined approaches to be better defined.
Acknowledgment: The work was supported by the projects "BIOTILLAGE, approcci innovative per il miglioramento delle performances ambientali e produttive dei sistemi cerealicoli no-tillage", financed by PSR-Basilicata 2007-2013, and "DESERT, Low-cost water desalination and sensor technology compact module", financed by ERANET-WATERWORKS 2014.
References:
Andersen C.M. and Bro R., 2010. Variable selection in regression - a tutorial. Journal of Chemometrics, 24:728-737.
Armenise et al., 2013. Developing a soil quality index to compare soil fitness for agricultural use under different managements in the Mediterranean environment. Soil and Tillage Research, 130:91-98.
de Paul Obade et al., 2016. A standardized soil quality index for diverse field conditions. Sci. Total Env., 541:424-434.
Pulido Moncada et al., 2014. Data-driven analysis of soil quality indicators using limited data. Geoderma, 235:271-278.
Stellacci et al., 2016. Comparison of different multivariate methods to select key soil variables for soil quality indices computation. XLV Congress of the Italian Society of Agronomy (SIA), Sassari, 20-22 September 2016.
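The VIP statistic used in this abstract can be illustrated with a small single-response PLS model. The sketch below is not the SAS/STAT PLS procedure used by the authors: it assumes a plain NIPALS PLS1 fit on mean-centered, variance-scaled data and the usual VIP definition, and the function name and data are synthetic.

```python
import numpy as np


def pls1_vip(X, y, n_components):
    """Fit a PLS1 model with NIPALS and return the VIP score of each predictor."""
    # Mean-center and variance-scale predictors and response.
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    y = (y - y.mean()) / y.std(ddof=1)
    n_vars = X.shape[1]
    weights, explained = [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = X @ w                     # scores
        tt = t @ t
        p = X.T @ t / tt              # X loadings
        q = (y @ t) / tt              # y loading
        X = X - np.outer(t, p)        # deflate X
        y = y - q * t                 # deflate y
        weights.append(w)
        explained.append(q * q * tt)  # y sum of squares explained by this component
    W = np.array(weights).T           # shape: (n_vars, n_components)
    ssy = np.array(explained)
    # VIP_j = sqrt(n_vars * sum_a(ssy_a * w_ja^2) / sum_a(ssy_a))
    return np.sqrt(n_vars * ((W ** 2) @ ssy) / ssy.sum())


# Synthetic data: only the first of five predictors drives the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=40)
vip = pls1_vip(X, y, n_components=2)
print(np.round(vip, 2))
```

Because the squared VIP scores average to one, a threshold near 1 is a common rule of thumb for retaining a predictor; whether the study applied such a cutoff is not stated in the abstract.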
Wang, Yan-Cang; Yang, Gui-Jun; Zhu, Jin-Shan; Gu, Xiao-He; Xu, Peng; Liao, Qin-Hong
2014-07-01
To improve the estimation accuracy of soil organic matter content in the northern fluvo-aquic soil, wavelet transform technology was introduced. Soil samples were collected from the Tongzhou and Shunyi districts of Beijing, and soil hyperspectral data were obtained under laboratory conditions. First, the discrete wavelet transform was used to decompose the hyperspectral data into approximate coefficients and detail coefficients. Then, the correlation between the approximate coefficients, the detail coefficients and organic matter content was analyzed, and the bands sensitive to organic matter were screened. Finally, models were established to estimate the soil organic matter content using partial least squares regression (PLSR). The results show that the NIR bands contributed more than the visible bands to the organic matter estimation models; that the approximate coefficients estimated organic matter content better than the detail coefficients; that the estimation precision of the detail coefficients for soil organic matter content decreased as the spectral resolution became lower; and that, compared with three commonly used types of soil spectral reflectance transforms, the wavelet transform improved the ability of soil spectra to estimate organic matter content. The best models established from the approximate and detail coefficients achieved higher accuracy: the coefficient of determination (R²) and the root mean square error (RMSE) of the best model for approximate coefficients were 0.722 and 0.221, respectively, and those of the best model for detail coefficients were 0.670 and 0.255, respectively.
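The decomposition into approximate and detail coefficients can be sketched with the simplest discrete wavelet. The abstract does not name the wavelet family or the number of levels, so the Haar wavelet, the two-level depth and the reflectance values below are all assumptions made for illustration.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Repeatedly split the approximation; return it with the details of each level."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details


# Hypothetical reflectance values at eight adjacent bands.
reflectance = [0.20, 0.22, 0.25, 0.24, 0.30, 0.34, 0.33, 0.31]
approx, details = haar_decompose(reflectance, levels=2)
print(approx, details)
```

Each level halves the spectral resolution of the approximation, which matches the abstract's observation that coefficient behavior changes as resolution drops; the resulting coefficients would then serve as PLSR predictors.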
Tamburini, Elena; Mamolini, Elisabetta; De Bastiani, Morena; Marchetti, Maria Gabriella
2016-01-01
Fusarium proliferatum is considered to be a pathogen of many economically important plants, including garlic. The objective of this research was to apply near-infrared spectroscopy (NIRS) to rapidly determine fungal concentration in intact garlic cloves, avoiding the laborious and time-consuming procedures of traditional assays. Preventive detection of infection before seeding is of great interest for farmers, because it could avoid serious losses of yield during harvesting and storage. Spectra were collected on 95 garlic cloves, divided into five classes of infection (from 1-healthy to 5-very highly infected) in the range of fungal concentration 0.34–7231.15 ppb. Calibration and cross-validation models were developed with partial least squares regression (PLSR) on pretreated spectra (standard normal variate, SNV, and derivatives), providing good accuracy in prediction, with a coefficient of determination (R²) of 0.829 and 0.774, respectively, a standard error of calibration (SEC) of 615.17 ppb, and a standard error of cross validation (SECV) of 717.41 ppb. The calibration model was then used to predict fungal concentration in unknown samples, peeled and unpeeled. The results showed that NIRS could be used as a reliable tool to directly detect and quantify F. proliferatum infection in peeled intact garlic cloves, but the presence of the external peel strongly affected the prediction reliability. PMID:27428978
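The standard normal variate (SNV) pretreatment mentioned above is a per-spectrum centering and scaling. A minimal sketch with synthetic spectra follows; the derivative step named in the abstract is not shown, and the function name is an illustrative choice.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own
    mean and standard deviation to reduce scatter-related offsets and gains."""
    x = np.asarray(spectra, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / std


# Synthetic spectra: the second set is the first with a gain and an offset applied.
rng = np.random.default_rng(0)
raw = rng.uniform(0.1, 0.9, size=(3, 100))
scattered = 2.0 * raw + 3.0
snv_raw = snv(raw)
snv_scattered = snv(scattered)
print(np.abs(snv_raw - snv_scattered).max())
```

The two pretreated sets coincide because SNV removes any positive per-spectrum gain and any constant offset, which is why it is often applied before PLSR on diffuse-reflectance NIR data.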
Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Kim, Moon S
2015-10-29
This research aims to design and fabricate a system to measure the capsaicinoid content of red pepper powder non-destructively and rapidly using visible and near-infrared (VNIR) spectroscopy. The developed system scans a well-leveled powder surface continuously to minimize the influence of the placenta distribution, thus acquiring stable and representative reflectance spectra. The system incorporates a sample input hopper, flat belts driven by a stepping motor, a powder surface leveler, a charge-coupled device (CCD) image sensor-embedded VNIR spectrometer, a fiber optic probe, a tungsten halogen lamp, and an automated reference measuring unit with a reference panel to measure the standard spectrum. The operation program includes device interface, standard reflectivity measurement, and a graphical user interface to measure the capsaicinoid content. A partial least squares regression (PLSR) model was developed to predict the capsaicinoid content using 44 red pepper powder samples, whose capsaicinoid content measured by high-performance liquid chromatography (HPLC) ranged from 13.45 to 159.48 mg/100 g, and 1242 VNIR absorbance spectra acquired by the pungency measurement system. The determination coefficient of validation (R_V²) and standard error of prediction (SEP) for the model with the first-order derivative pretreatment method for Korean red pepper powder were 0.8484 and ±13.6388 mg/100 g, respectively.
Henn, Raphael; Kirchler, Christian G; Grossgut, Maria-Elisabeth; Huck, Christian W
2017-05-01
This study compared three commercially available spectrometers, two of them miniaturized, in terms of their ability to predict melamine in milk powder (infant formula). All spectra were split into calibration and validation sets using the Kennard-Stone and Duplex algorithms for comparison. For each instrument the three best performing PLSR models were constructed using SNV and Savitzky-Golay derivatives. The best RMSEP values were 0.28 g/100 g, 0.33 g/100 g and 0.27 g/100 g for the NIRFlex N-500, the microPHAZIR and the microNIR2200, respectively. Furthermore, the multivariate LOD interval [LODmin, LODmax] was calculated for all the PLSR models, unveiling significant differences among the spectrometers, with values of 0.20 g/100 g - 0.27 g/100 g, 0.28 g/100 g - 0.54 g/100 g and 0.44 g/100 g - 1.01 g/100 g for the NIRFlex N-500, the microPHAZIR and the microNIR2200, respectively. To assess the robustness of all models, artificial white noise, baseline shift, multiplicative effect, spectral shrink and stretch, stray light and spectral shift were introduced. Monitoring the RMSEP as a function of the perturbation gave an indication of the robustness of the models and helped to compare the performances of the spectrometers. Without the additional information from the LOD calculations, one could falsely assume that all the spectrometers perform equally well, which is not the case when the multivariate evaluation and robustness data are considered. Copyright © 2017 Elsevier B.V. All rights reserved.
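The Kennard-Stone split mentioned above can be sketched as follows. This is a generic textbook form of the algorithm using Euclidean distance, not the authors' implementation; the function name and toy data are hypothetical, and the Duplex variant is not shown.

```python
from math import dist

def kennard_stone(samples, n_select):
    """Return (selected, remaining) index lists. Assumes n_select >= 2.
    Step 1: take the two mutually most distant samples.
    Step 2: repeatedly add the sample whose nearest selected neighbour is farthest."""
    n = len(samples)
    first, second = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                        key=lambda ij: dist(samples[ij[0]], samples[ij[1]]))
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select and remaining:
        nxt = max(remaining,
                  key=lambda k: min(dist(samples[k], samples[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining

# Hypothetical one-variable "spectra"; real use would pass full spectra or PCA scores.
calibration_idx, validation_idx = kennard_stone(
    [(0.0,), (1.0,), (2.0,), (5.0,), (10.0,)], n_select=3)
```

The selected indices form the calibration set and cover the data space uniformly; the remaining indices serve as the validation set.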
Tahir, Haroon Elrasheid; Xiaobo, Zou; Xiaowei, Huang; Jiyong, Shi; Mariod, Abdalbasit Adam
2016-09-01
Aroma profiles of six honey varieties of different botanical origins were investigated using colorimetric sensor array, gas chromatography-mass spectrometry (GC-MS) and descriptive sensory analysis. Fifty-eight aroma compounds were identified, including 2 norisoprenoids, 5 hydrocarbons, 4 terpenes, 6 phenols, 7 ketones, 9 acids, 12 aldehydes and 13 alcohols. Twenty abundant or active compounds were chosen as key compounds to characterize honey aroma. Discrimination of the honeys was subsequently implemented using multivariate analysis, including hierarchical clustering analysis (HCA) and principal component analysis (PCA). Honeys of the same botanical origin were grouped together in the PCA score plot and HCA dendrogram. SPME-GC/MS and colorimetric sensor array were able to discriminate the honeys effectively with the advantages of being rapid, simple and low-cost. Moreover, partial least squares regression (PLSR) was applied to indicate the relationship between sensory descriptors and aroma compounds. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Arantes Camargo, Livia; Marques Júnior, José; Reynaldo Ferracciú Alleoni, Luís; Tadeu Pereira, Gener; De Bortoli Teixeira, Daniel; Santos Rabelo de Souza Bahia, Angélica
2017-04-01
Environmental impact assessments may be assisted by spatial characterization of potentially toxic elements (PTEs). Diffuse reflectance spectroscopy (DRS) and X-ray fluorescence spectroscopy (XRF) are rapid, non-destructive, low-cost prediction tools for the simultaneous characterization of different soil attributes. Although low concentrations of PTEs might preclude the observation of spectral features, their contents can be predicted using spectroscopy by exploring the existing relationship between the PTEs and soil attributes with spectral features. This study aimed to evaluate, in three geomorphic surfaces of Oxisols, the capacity for predicting PTEs (Ba, Co, and Ni) and their spatial variability by means of DRS and XRF. To that end, soil samples were collected from three geomorphic surfaces and analyzed for chemical, physical, and mineralogical properties, and then analyzed with DRS (visible + near infrared, VIS+NIR, and mid-infrared, MIR) and XRF equipment. PTE prediction models were calibrated using partial least squares regression (PLSR). PTE spatial distribution maps were built by geostatistics from the values calculated by the calibrated models that reached the best accuracy. PTE prediction models were satisfactorily calibrated using MIR DRS for Ba and Co (residual prediction deviation, RPD > 3.0), Vis DRS for Ni (RPD > 2.0) and XRF for all the studied PTEs (RPD > 1.8). DRS- and XRF-predicted values allowed the characterization and the understanding of spatial variability of the studied PTEs.
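The residual prediction deviation (RPD) used above as the accuracy criterion is, in its usual form, the standard deviation of the reference values divided by the prediction error. The sketch below assumes that definition with RMSEP in the denominator (some authors use SEP instead); the function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import stdev

def rpd(reference, predicted):
    """RPD = SD of reference values / RMSEP (assumed convention)."""
    rmsep = sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted))
                 / len(reference))
    return stdev(reference) / rmsep

# Hypothetical laboratory and spectrally predicted concentrations (mg/kg).
example_rpd = rpd([10.0, 20.0, 30.0, 40.0, 50.0],
                  [12.0, 18.0, 32.0, 38.0, 52.0])
```

An RPD well above 1 means the model error is small relative to the natural spread of the property, which is why thresholds such as those quoted in the abstract are used to grade calibrations.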
NASA Astrophysics Data System (ADS)
Shi, Z. H.
2014-12-01
There are strong ties between land use and sediment yield in watersheds. Many studies have used multivariate regression techniques to explore the response of sediment yield to land-use compositions and spatial configurations in watersheds. However, one issue with the use of conventional statistical methods to address relationships between land-use compositions and spatial configurations and sediment yield is multicollinearity. This paper examines the combined effects of land-use compositions and land-use spatial configurations of the watershed on the specific sediment yield of the Upper Du River watershed (8,973 km2) in China using the Soil and Water Assessment Tool (SWAT) and partial least-squares regression (PLSR). The land-use compositions and spatial configurations of the watershed were calculated at the sub-watershed scale. The sediment yields from sub-watershed were evaluated using SWAT model. The first-order factors were identified by calculating the variable importance for the projection (VIP). The results revealed that the land-use compositions exerted the largest effects on the specific sediment yield and explained 61.2% of the variation in the specific sediment yield. Land-use spatial configurations were also found to have a large effect on the specific sediment yield and explained 21.7% of the observed variation in the specific sediment yield. The following are the dominant first-order factors of the specific sediment yield at the sub-watershed scale: the areal percentages of agriculture and forest, patch density, value of the Shannon's diversity index, contagion. The VIP values suggested that the Shannon's diversity index and contagion are important factors for sediment delivery.
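The variable importance for the projection (VIP) used above to rank factors can be computed from a fitted single-response PLSR model as in the following sketch. It uses the standard VIP formula; the argument layout (one list per component) and the toy numbers are assumptions for illustration, not values from the study.

```python
from math import sqrt

def vip_scores(weights, scores, y_loadings):
    """VIP for each predictor of a single-response PLS model.
    weights:    A lists of length p (X-weight vector w_a per component)
    scores:     A lists of length n (X-score vector t_a per component)
    y_loadings: A floats (q_a per component)"""
    p = len(weights[0])
    # Y-variance captured by each component: q_a^2 * t_a' t_a
    ss = [q * q * sum(t * t for t in t_a) for q, t_a in zip(y_loadings, scores)]
    total = sum(ss)
    vips = []
    for j in range(p):
        acc = 0.0
        for w_a, ss_a in zip(weights, ss):
            norm_sq = sum(w * w for w in w_a)
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        vips.append(sqrt(p * acc / total))
    return vips

# Hypothetical two-component model with three predictors and four samples.
vips = vip_scores(weights=[[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]],
                  scores=[[1.0, -1.0, 2.0, -2.0], [0.5, -0.5, 0.5, -0.5]],
                  y_loadings=[2.0, 1.0])
```

By construction the squared VIP values average to 1 across predictors, which is why VIP > 1 is a common, if informal, cut-off for calling a variable important.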
NASA Astrophysics Data System (ADS)
Poggio, Matteo; Brown, David J.; Gasch, Caley K.; Brooks, Erin S.; Yourek, Matt A.
2015-04-01
In the Palouse region of eastern Washington and northern Idaho (USA), spatially discontinuous restrictive layers impede root growth and water infiltration. Consequently, accurate maps showing the depth and spatial extent of these restrictive layers are essential for watershed hydrologic modeling appropriate for precision agriculture. In this presentation, we report on the use of a Visible and Near-Infrared (VisNIR) penetrometer fore optic to construct detailed maps of three wheat fields in the Palouse region. The VisNIR penetrometer was used to deliver in situ soil reflectance to an Analytical Spectral Devices (ASD, Boulder, CO, USA) spectrometer and simultaneously acquire insertion force. With a hydraulic push-type soil coring system for insertion (e.g. Giddings), we collected soil spectra and insertion force data along 41 m x 41 m grid points (2 fields) and 50 m x 50 m grid points (1 field) to ≈80 cm depth, in addition to interrogation points at 36 representative instrumented locations per field. At each of the 36 instrumented locations, two soil cores were extracted for laboratory determination of clay content and bulk density. We developed calibration models of soil clay content and bulk density with spectra and insertion force collected in situ, using partial least squares regression 2 (PLSR2). Applying spline functions, we delineated clay and bulk density profiles at each point (grid and 24 locations). The soil profiles were then used as inputs in a regression-kriging model with terrain indexes and ECa data (derived from an EM38 field survey, Geonics, Mississauga, Ontario, Canada) as covariates to generate 3D soil maps. Preliminary results show that the VisNIR penetrometer can capture the spatial patterns of restrictive layers. Work is ongoing to evaluate the prediction accuracy of penetrometer-derived 3D clay content and restriction layer maps.
Jia, Shengyao; Li, Hongyang; Wang, Yanjie; Tong, Renyuan; Li, Qing
2017-01-01
Soil is an important environment for crop growth. Quick and accurate access to soil nutrient content information is a prerequisite for scientific fertilization. In this work, hyperspectral imaging (HSI) technology was applied for the classification of soil types and the measurement of soil total nitrogen (TN) content. A total of 183 soil samples collected from Shangyu City (People’s Republic of China) were scanned by a near-infrared hyperspectral imaging system with a wavelength range of 874–1734 nm. The soil samples belonged to three major soil types typical of this area: paddy soil, red soil and seashore saline soil. The successive projections algorithm (SPA) method was utilized to select effective wavelengths from the full spectrum. Pattern texture features (energy, contrast, homogeneity and entropy) were extracted from the gray-scale images at the effective wavelengths. The support vector machines (SVM) and partial least squares regression (PLSR) methods were used to establish classification and prediction models, respectively. The results showed that an optimal correct classification rate of 91.8% could be achieved by using the combined data sets of effective wavelengths and texture features for modelling. The soil samples were first classified, then local models were established for soil TN according to soil type, which achieved better prediction results than the general models. The overall results indicated that hyperspectral imaging technology could be used for soil type classification and soil TN determination, and that data fusion combining spectral and image texture information showed advantages for the classification of soil types. PMID:28974005
Oberg, T
2007-01-01
The vapour pressure is the most important property of an anthropogenic organic compound in determining its partitioning between the atmosphere and the other environmental media. The enthalpy of vaporisation quantifies the temperature dependence of the vapour pressure and its value around 298 K is needed for environmental modelling. The enthalpy of vaporisation can be determined by different experimental methods, but estimation methods are needed to extend the current database and several approaches are available from the literature. However, these methods have limitations, such as a need for other experimental results as input data, a limited applicability domain, a lack of domain definition, and a lack of predictive validation. Here we have attempted to develop a quantitative structure-property relationship (QSPR) that has general applicability and is thoroughly validated. Enthalpies of vaporisation at 298 K were collected from the literature for 1835 pure compounds. The three-dimensional (3D) structures were optimised and each compound was described by a set of computationally derived descriptors. The compounds were randomly assigned into a calibration set and a prediction set. Partial least squares regression (PLSR) was used to estimate a low-dimensional QSPR model with 12 latent variables. The predictive performance of this model, within the domain of application, was estimated at n=560, q2Ext=0.968 and s=0.028 (log transformed values). The QSPR model was subsequently applied to a database of 100,000+ structures, after a similar 3D optimisation and descriptor generation. Reliable predictions can be reported for compounds within the previously defined applicability domain.
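A PLSR model with a chosen number of latent variables, calibrated on one set and applied to another as described above, can be sketched with the classical NIPALS algorithm for a single response (PLS1). This is a generic illustration in plain Python, not the software used in the study; the function names and the tiny data set are hypothetical, and real descriptor matrices would normally also be scaled.

```python
from math import sqrt

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS on mean-centred data; returns the model as a tuple."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between E and f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_a = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p_a[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w)
        P.append(p_a)
        q.append(q_a)
    return x_mean, y_mean, W, P, q

def pls1_predict(model, x_new):
    """Predict one sample by replaying the score/deflation steps."""
    x_mean, y_mean, W, P, q = model
    e = [v - m for v, m in zip(x_new, x_mean)]
    y_hat = y_mean
    for w, p_a, q_a in zip(W, P, q):
        t = sum(ej * wj for ej, wj in zip(e, w))
        e = [ej - t * pj for ej, pj in zip(e, p_a)]
        y_hat += q_a * t
    return y_hat

# Hypothetical calibration set: y = 2*x1 - x2 + 3 exactly.
X_cal = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 4.0]]
y_cal = [3.0, 6.0, 4.0, 8.0, 9.0]
model = pls1_fit(X_cal, y_cal, n_components=2)
prediction = pls1_predict(model, [6.0, 2.0])
```

With as many components as the rank of X the PLS1 solution coincides with ordinary least squares; using fewer components, chosen by validation, is what gives the low-dimensional model the abstract refers to.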
What’s Wrong with the Murals at the Mogao Grottoes: A Near-Infrared Hyperspectral Imaging Method
Sun, Meijun; Zhang, Dong; Wang, Zheng; Ren, Jinchang; Chai, Bolong; Sun, Jizhou
2015-01-01
Although a significant amount of work has been performed to preserve the ancient murals in the Mogao Grottoes by Dunhuang Cultural Research, non-contact methods need to be developed to effectively evaluate the degree of flaking of the murals. In this study, we propose to evaluate the flaking by automatically analyzing hyperspectral images that were scanned at the site. Murals with various degrees of flaking were scanned in the 126th cave using a near-infrared (NIR) hyperspectral camera with a spectral range of approximately 900 to 1700 nm. The regions of interest (ROIs) of the murals were manually labeled and grouped into four levels: normal, slight, moderate, and severe. The average spectral data from each ROI and its group label were used to train our classification model. To predict the degree of flaking, we adopted four algorithms: deep belief networks (DBNs), partial least squares regression (PLSR), principal component analysis with a support vector machine (PCA + SVM) and principal component analysis with an artificial neural network (PCA + ANN). The experimental results show the effectiveness of our method. In particular, better results are obtained using DBNs when the training data contain a significant amount of striping noise. PMID:26394926
Rapid non-destructive assessment of pork edible quality by using VIS/NIR spectroscopic technique
NASA Astrophysics Data System (ADS)
Zhang, Leilei; Peng, Yankun; Dhakal, Sagar; Song, Yulin; Zhao, Juan; Zhao, Songwei
2013-05-01
The objective of this research was to develop a rapid non-destructive method to evaluate the edible quality of chilled pork. A total of 42 samples were packed in sealed plastic bags and stored at 4°C for 1 to 21 days. Reflectance spectra were collected with a visible/near-infrared spectroscopy system in the range of 400 nm to 1100 nm. Microbiological, physicochemical and organoleptic characteristics such as the total viable counts (TVC), total volatile basic-nitrogen (TVB-N), pH value and color parameter L* were determined to appraise pork edible quality. Savitzky-Golay (SG) smoothing based on five and eleven points, multiplicative scatter correction (MSC) and first derivative pre-processing methods were employed to eliminate spectral noise. Support vector machines (SVM) and partial least squares regression (PLSR) were applied to establish prediction models using the de-noised spectra. A linear correlation was developed between the VIS/NIR spectra and TVC, TVB-N, pH and color parameter L*, yielding prediction results with Rv of 0.931, 0.844, 0.805 and 0.852, respectively. The results demonstrated that the VIS/NIR spectroscopy technique combined with SVM possesses a powerful assessment capability. It can provide a potential tool for detecting pork edible quality rapidly and non-destructively.
Wu, Yongjiang; Jin, Ye; Ding, Haiying; Luan, Lianjun; Chen, Yong; Liu, Xuesong
2011-09-01
The application of near-infrared (NIR) spectroscopy for in-line monitoring of the extraction process of scutellarein from Erigeron breviscapus (vant.) Hand-Mazz was investigated. For NIR measurements, two fiber optic probes designed to transmit NIR radiation through a 2 mm pathlength flow cell were utilized to collect spectra in real time. High performance liquid chromatography (HPLC) was used as a reference method to determine scutellarein in the extract solution. A partial least squares regression (PLSR) calibration model built on Savitzky-Golay smoothed NIR spectra in the 5450-10,000 cm⁻¹ region gave satisfactory predictive results for scutellarein. The results showed that the correlation coefficients of calibration and cross validation were 0.9967 and 0.9811, respectively, and the root mean square errors of calibration and cross validation were 0.044 and 0.105, respectively. Furthermore, both the moving block standard deviation (MBSD) method and a conformity test were used to identify the end point of the extraction process, providing real-time data and instant feedback about the extraction course. The results obtained in this study indicated that the NIR spectroscopy technique provides an efficient and environmentally friendly approach for fast determination of scutellarein and end point control of the extraction process. Copyright © 2011 Elsevier B.V. All rights reserved.
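The moving block standard deviation (MBSD) end-point criterion mentioned above can be sketched as follows, using the commonly described form: within each block of consecutive spectra, take the standard deviation at every wavelength and average over wavelengths. The block size, threshold and toy spectra are hypothetical; the abstract does not report the settings used.

```python
from statistics import mean, stdev

def mbsd(spectra, block):
    """One MBSD value per position of a sliding block of `block` consecutive spectra."""
    values = []
    for start in range(len(spectra) - block + 1):
        window = spectra[start:start + block]
        # zip(*window) yields, per wavelength, the values across the block.
        per_wavelength_sd = [stdev(col) for col in zip(*window)]
        values.append(mean(per_wavelength_sd))
    return values

def end_point(mbsd_values, threshold):
    """Index of the first block whose MBSD falls below the threshold, else None."""
    for i, v in enumerate(mbsd_values):
        if v < threshold:
            return i
    return None

# Hypothetical in-line spectra (two wavelengths) that stop changing over time.
series = [[1.0, 2.0], [2.0, 3.0], [2.5, 3.5], [2.5, 3.5], [2.5, 3.5], [2.5, 3.5]]
mbsd_values = mbsd(series, block=3)
stop_block = end_point(mbsd_values, threshold=0.01)
```

A falling MBSD indicates that successive spectra have stopped changing, which is taken as a sign that the extraction has reached a steady state.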
Prediction of soil organic carbon in a coal mining area by Vis-NIR spectroscopy.
Sun, Wenjuan; Li, Xinju; Niu, Beibei
2018-01-01
Coal mining has led to increasingly serious land subsidence, and the reclamation of the subsided land has become a hot topic of concern for governments and scholars. Soil quality of reclaimed land is the key indicator to the evaluation of the reclamation effect; hence, rapid monitoring and evaluation of reclaimed land is of great significance. Visible-near infrared (Vis-NIR) spectroscopy has been shown to be a rapid, timely and efficient tool for the prediction of soil organic carbon (SOC). In this study, 104 soil samples were collected from the Baodian mining area of Shandong province. Vis-NIR reflectance spectra and soil organic carbon content were then measured under laboratory conditions. The spectral data were first denoised using the Savitzky-Golay (SG) convolution smoothing method or the multiple scattering correction (MSC) method, after which the spectral reflectance (R) was subjected to reciprocal, reciprocal logarithm and differential transformations to improve spectral sensitivity. Finally, regression models for estimating the SOC content by the spectral data were constructed using partial least squares regression (PLSR). The results showed that: (1) The SOC content in the mining area was generally low (at the below-average level) and exhibited great variability. (2) The spectral reflectance increased with the decrease of soil organic carbon content. In addition, the sensitivity of the spectrum to the change in SOC content, especially that in the near-infrared band of the original reflectance, decreased when the SOC content was low. (3) The modeling results performed best when the spectral reflectance was preprocessed by Savitzky-Golay (SG) smoothing coupled with multiple scattering correction (MSC) and first-order differential transformation (modeling R2 = 0.86, RMSE = 2.00 g/kg, verification R2 = 0.78, RMSE = 1.81 g/kg, and RPD = 2.69). 
In addition, the first-order differential of R combined with SG, MSC with R, SG together with MSC and R also produced better modeling results than other pretreatment combinations. Vis-NIR modeling with specific spectral preprocessing methods could predict SOC content effectively.
NASA Astrophysics Data System (ADS)
Fang, N. F.; Shi, Z. H.; Chen, F. X.; Zhang, H. Y.; Wang, Y. X.
2015-09-01
Understanding and quantifying sediment loads is important in watersheds with highly erodible materials, which will eventually cause environmental and ecological problems. Within this context, suspended sediment (SS) transport and its temporal dynamics were studied in a small mountainous watershed with sloping lands containing rock fragments in subtropical China. Soils containing rock fragments with many macro-pores have a high permeability rate. Over a 7-year period, the mean runoff coefficient of this watershed was 0.65. Overall, 30 flood events were monitored and accounted for 95.5%, 27.3%, 17.1% of the total SS load, precipitation and total discharge, respectively, over a 5-year period. The presence of rock fragments in soils can affect soil loss. When comparing the soil loss in the studied watershed with that of other watersheds under similar climatic conditions, rock fragments negatively affect soil loss. However, an extreme event occurred on 14 August 1990, and the sediment load exhibited a phenomenon called "small deposits towards lump withdrawal", which resulted in a soil loss of 20,499 t (4.6 times the mean yearly soil loss). This event exhausted most of the SSs stored by the rock fragments on the slope and channel. Following this event, the mean SS concentration (SSC) of the 11 events was 1.05 kg m-3, and the mean SSC of the 18 previous events was 1.75 kg m-3. Twelve variables were separated using the classical hydrograph separation method. Partial least-squares regression (PLSR) was used to determine the highly co-related variables of the discharge. The results indicated that PLSR could explain runoff well. The relationship between discharge and SSC was highly scattered. During 24 flood events, three types of hysteresis loops were observed: clockwise (17 events), figure-eight (3 events), and complex (4 events).
Leaf aging of Amazonian canopy trees as revealed by spectral and physiochemical measurements.
Chavana-Bryant, Cecilia; Malhi, Yadvinder; Wu, Jin; Asner, Gregory P; Anastasiou, Athanasios; Enquist, Brian J; Cosio Caravasi, Eric G; Doughty, Christopher E; Saleska, Scott R; Martin, Roberta E; Gerard, France F
2017-05-01
Leaf aging is a fundamental driver of changes in leaf traits, thereby regulating ecosystem processes and remotely sensed canopy dynamics. We explore leaf reflectance as a tool to monitor leaf age and develop a spectra-based partial least squares regression (PLSR) model to predict age using data from a phenological study of 1099 leaves from 12 lowland Amazonian canopy trees in southern Peru. Results demonstrated monotonic decreases in leaf water (LWC) and phosphorus (Pmass) contents and an increase in leaf mass per unit area (LMA) with age across trees; leaf nitrogen (Nmass) and carbon (Cmass) contents showed monotonic but tree-specific age responses. We observed large age-related variation in leaf spectra across trees. A spectra-based model was more accurate in predicting leaf age (R² = 0.86; percent root mean square error (%RMSE) = 33) compared with trait-based models using single (R² = 0.07-0.73; %RMSE = 7-38) and multiple (R² = 0.76; %RMSE = 28) predictors. Spectra- and trait-based models established a physiochemical basis for the spectral age model. Vegetation indices (VIs) including the normalized difference vegetation index (NDVI), enhanced vegetation index 2 (EVI2), normalized difference water index (NDWI) and photosynthetic reflectance index (PRI) were all age-dependent. This study highlights the importance of leaf age as a mediator of leaf traits, provides evidence of age-related leaf reflectance changes that have important impacts on VIs used to monitor canopy dynamics and productivity and proposes a new approach to predicting and monitoring leaf age with important implications for remote sensing. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
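The vegetation indices listed above are simple band ratios. The sketch below uses widely cited formulations; the exact bands and variants used in this study are not given in the abstract, so the band choices (for example the NIR/SWIR form of NDWI and the 531/570 nm form of PRI) are assumptions, as are the reflectance values.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def evi2(nir, red):
    """Two-band enhanced vegetation index (no blue band required)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)

def ndwi(nir, swir):
    """Normalized difference water index, NIR/SWIR form (assumed variant)."""
    return (nir - swir) / (nir + swir)

def pri(r531, r570):
    """Photochemical/photosynthetic reflectance index from 531 and 570 nm bands."""
    return (r531 - r570) / (r531 + r570)

# Hypothetical leaf reflectances (fraction of incident light).
young_leaf = {"ndvi": ndvi(0.50, 0.10), "evi2": evi2(0.50, 0.10)}
old_leaf = {"ndvi": ndvi(0.45, 0.06), "evi2": evi2(0.45, 0.06)}
```

Because every index depends on reflectance in only two bands, any age-related change in those bands shifts the index, which is the effect the abstract reports.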
NASA Astrophysics Data System (ADS)
Bornemann, L.; Welp, G.; Amelung, W.
2009-04-01
Comprising more than 60 % of the terrestrial carbon pool, soil organic carbon (SOC) is one of the principal factors regulating the global C-cycle. Against the background of worldwide increasing CO2 emissions, much effort has been put to the modelling of soil-C turnover in order to evaluate its potential for mitigation of climate change. Soil organic matter is an ever changing assemblage of various organic components that interact with the mineral matrix and in dependence of its ecological environment. Carbon storage is thereby assumed to propagate by hierarchical saturation of different carbon pools. A homogeneous distribution of the respective pools within natural environments is unlikely as the controlling soil parameters are subject to spatial and temporal heterogeneity. Several attempts to operationalize this complex soil compartment have been proposed, most of them resting upon a concept of pools with different stability and varying turnover times. Among these pools, particulate organic matter (POM) is considered to be most sensitive to environmental changes and has been shown to explain major parts of the SOC variations. Until today, rather laborious physical and physico-chemical fractionation procedures are most commonly applied for the initialization and validation of POM in C-turnover models. Mid-infrared spectroscopy (MIRS) in combination with partial least squares regression (PLSR) could overcome this problem. The technique is fast, cheap, and requires little sample preparation. All the same, it is an appropriate technique not only for the determination of gross parameters like total soil organic carbon contents, but also for the determination and characterization of minor constituents like black carbon in soils. Basically, the infrared radiation is absorbed by molecules that express a dipole-moment during vibration. 
As virtually all constituents of soil organic matter and also a multitude of inorganic soil constituents express such a dipole-moment, plentiful chemical information can be extracted from absorption spectra of soil samples. In this work we present the development of calibration models for POM quantification via MIRS-PLSR, and the compilation of a raster data set including SOC and POM of three size classes for the testsite of the SFB-TR32 at Selhausen near Jülich (Germany). The studied test site is an orthic luvisol which has been sampled in a ten times ten meter raster from 0-30 cm depth (n=131). For POM fractionation samples were gently sonicated and material from 2000-250 µm was gained by wet sieving. After a second, more intense sonication, intermediate (250-53 µm) and fine (53-20 µm) material was also gained by wet sieving. All fractions were dried at 40 °C, carbon contents were determined by elemental analysis. For calibration of MIRS-PLSR, SOC contents of 87 bulk soil samples were determined by elemental analysis. Contributions of the different POM fractions to bulk SOC as well as the SOC contents within the particular POM fraction were determined for 36 soil samples by physical particle size fractionation as described above. MIRS-PLSR based predictions for the contribution of POM fractions to bulk soil proved to be satisfactory (R² >0.77) and improved with decreasing particle size. For the predictions of SOC contents in bulk soil and the different POM fractions R² even reached values ≥0.97. Root mean squared errors of the cross validations were in the range of standard deviations of the lab analysis or smaller. As physical fractionation methods are intrinsically susceptible to measurement errors, determination of POM fractions by MIRS analysis may even improve data sets for modelling. 
Apart from the generally convincing statistical parameters, further evidence for reliable predictions of the contributions of the different POM fractions to bulk SOC could be drawn from the spectral information itself. The spectral features utilized for the determination of the contribution of the different POM fractions to bulk SOC were matching the features for the prediction of the absolute SOC concentrations within the particular fractions. As these predictions were conducted with independent sample sets (bulk soil for the POM contribution and soil fractions for the SOC content within the fraction) the matching structural information for both features of the individual POM fraction indirectly validates the prediction for the POM pools. The latter is especially true as the observed features coincide with the actual knowledge on chemistry and stabilization of POM in soils. For the compilation of a complete raster data-set, the developed calibrations were applied to all of the 131 topsoil samples taken at the SFB-TR32 testsite. Correlation analysis indicated that the coarse and the intermediate POM fractions are related to each other, to bulk SOC content and textural parameters respectively, while the fine POM fraction seems to be independent from these factors. The observed coherences and the applicability of a C-saturation concept will be discussed by visual map-comparison and geostatistical analysis of the determined parameters.
Development of VIS/NIR spectroscopic system for real-time prediction of fresh pork quality
NASA Astrophysics Data System (ADS)
Zhang, Haiyun; Peng, Yankun; Zhao, Songwei; Sasao, Akira
2013-05-01
Quality attributes of fresh meat influence its nutritional value and consumers' purchasing decisions. The aim of the research was to develop a prototype for real-time detection of meat quality. It consisted of a hardware system and a software system. A VIS/NIR spectrograph in the range of 350 to 1100 nm was used to collect the spectral data. In order to acquire more information from the sample, an optical fiber multiplexer was used. A portable cylindrical device was designed and fabricated to hold the optical fibers from the multiplexer. A high-power tungsten halogen lamp was selected as the light source. The spectral data were obtained from the surface of the sample with an exposure time of 2.17 ms by pressing down the trigger switch on the self-developed system. The system could automatically acquire, process, display and save the data. Moreover, the quality could be predicted on-line. A total of 55 fresh pork samples were used to develop the prediction model for real-time detection. The spectral data were pretreated with standard normal variate (SNV), and partial least squares regression (PLSR) was used to develop the prediction model. The correlation coefficient and root mean square error of the validation set for water content and pH were 0.810, 0.653, and 0.803, 0.098 respectively. The research shows that the real-time non-destructive detection system based on VIS/NIR spectroscopy can efficiently predict the quality of fresh meat.
Park, Hyunjin; Yang, Jin-ju; Seo, Jongbum; Choi, Yu-yong; Lee, Kun-ho; Lee, Jong-min
2014-04-01
Cortical features derived from magnetic resonance imaging (MRI) provide important information to account for human intelligence. Cortical thickness, surface area, sulcal depth, and mean curvature were considered to explain human intelligence. One region of interest (ROI) of a cortical structure consisting of thousands of vertices contained thousands of measurements, and typically, one mean value (first order moment), was used to represent a chosen ROI, which led to a potentially significant loss of information. We proposed a technological improvement to account for human intelligence in which a second moment (variance) in addition to the mean value was adopted to represent a chosen ROI, so that the loss of information would be less severe. Two computed moments for the chosen ROIs were analyzed with partial least squares regression (PLSR). Cortical features for 78 adults were measured and analyzed in conjunction with the full-scale intelligence quotient (FSIQ). Our results showed that 45% of the variance of the FSIQ could be explained using the combination of four cortical features using two moments per chosen ROI. Our results showed improvement over using a mean value for each ROI, which explained 37% of the variance of FSIQ using the same set of cortical measurements. Our results suggest that using additional second order moments is potentially better than using mean values of chosen ROIs for regression analysis to account for human intelligence. Copyright © 2014 Elsevier Ltd. All rights reserved.
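The two-moment ROI representation proposed above can be illustrated with a short sketch: each ROI contributes its mean and its variance to the predictor vector that is later passed to PLSR. The function names, the use of the population variance and the toy values are assumptions; the abstract does not specify these details.

```python
from statistics import mean, pvariance

def roi_moments(vertex_values):
    """First and second moment (mean, population variance) of one ROI."""
    return mean(vertex_values), pvariance(vertex_values)

def subject_features(rois):
    """Flatten {roi_name: vertex values} into [mean_1, var_1, mean_2, var_2, ...]
    in sorted ROI-name order so that every subject gets the same column layout."""
    features = []
    for name in sorted(rois):
        m, v = roi_moments(rois[name])
        features.extend([m, v])
    return features

# Hypothetical cortical-thickness values (mm) at a handful of vertices per ROI.
subject = {"roi_a": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
           "roi_b": [5.0, 5.0, 5.0, 5.0]}
row = subject_features(subject)
```

In this toy case the two ROIs share the same mean and differ only in variance, so a mean-only representation could not tell them apart; that is the information loss the second moment is meant to reduce.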
Determination of elemental composition of shale rocks by laser induced breakdown spectroscopy
NASA Astrophysics Data System (ADS)
Sanghapi, Hervé K.; Jain, Jinesh; Bol'shakov, Alexander; Lopano, Christina; McIntyre, Dustin; Russo, Richard
2016-08-01
In this study laser induced breakdown spectroscopy (LIBS) is used for elemental characterization of outcrop samples from the Marcellus Shale. Powdered samples were pressed to form pellets and used for LIBS analysis. Partial least squares regression (PLS-R) and univariate calibration curves were used for quantification of analytes. The matrix effect is substantially reduced using the partial least squares calibration method. Predicted results with LIBS are compared to ICP-OES results for Si, Al, Ti, Mg, and Ca. As for C, its results are compared to those obtained by a carbon analyzer. Relative errors of the LIBS measurements are in the range of 1.7 to 12.6%. The limits of detection (LODs) obtained for Si, Al, Ti, Mg and Ca are 60.9, 33.0, 15.6, 4.2 and 0.03 ppm, respectively. An LOD of 0.4 wt.% was obtained for carbon. This study shows that the LIBS method can provide a rapid analysis of shale samples and can potentially benefit depleted gas shale carbon storage research.
Chen, Tao; Chang, Qingrui; Clevers, J G P W; Kooistra, L
2015-11-01
Soil heavy metal pollution due to long-term sewage irrigation is a serious environmental problem in many irrigation areas in northern China. Quickly identifying its pollution status is an important basis for remediation. Visible-near-infrared reflectance spectroscopy (VNIRS) provides a useful tool. In a case study, 76 soil samples were collected and their reflectance spectra were used to estimate cadmium (Cd) concentration by partial least squares regression (PLSR) and back propagation neural network (BPNN). To reduce noise, six pre-treatments were compared, in which orthogonal signal correction (OSC) was first used in soil Cd estimation. Spectral analysis and geostatistics were combined to identify Cd pollution hotspots. Results showed that Cd was accumulated in topsoil at the study area. OSC can effectively remove irrelevant information to improve prediction accuracy. More accurate estimation was achieved by applying a BPNN. Soil Cd pollution hotspots could be identified by interpolating the predicted values obtained from spectral estimates. Copyright © 2015 Elsevier Ltd. All rights reserved.
Khorasani, Milad; Amigo, José M; Sun, Changquan Calvin; Bertelsen, Poul; Rantanen, Jukka
2015-06-01
In the present study, the application of near-infrared chemical imaging (NIR-CI) supported by chemometric modeling as a non-destructive tool for monitoring and assessing the roller compaction and tableting processes was investigated. Based on preliminary risk assessment, discussion with experts and current work from the literature, the critical process parameters (roll pressure and roll speed) and critical quality attributes (ribbon porosity, granule size, amount of fines, tablet tensile strength) were identified and a design space was established. Five experimental runs with different process settings were carried out, which produced intermediates (ribbons, granules) and final products (tablets) with different properties. A principal component analysis (PCA) based model of the NIR images was applied to map the ribbon porosity distribution. The ribbon porosity distribution gained from the PCA based NIR-CI was used to develop predictive models for granule size fractions. Predictive methods with acceptable R² values could be used to predict the granule particle size. A partial least squares regression (PLS-R) based model of the NIR-CI was used to map and predict the chemical distribution and content of active compound for both roller compacted ribbons and corresponding tablets. In order to select the optimal process setting, the standard deviation of tablet tensile strength and tablet weight for each tablet batch was considered. Strong linear correlations were established between tablet tensile strength and amount of fines, and between tablet tensile strength and granule size. These approaches are considered to have a potentially large impact on quality monitoring and control of continuously operating manufacturing lines, such as roller compaction and tableting processes. Copyright © 2015 Elsevier B.V. All rights reserved.
Modeling pollen time series using seasonal-trend decomposition procedure based on LOESS smoothing
NASA Astrophysics Data System (ADS)
Rojo, Jesús; Rivero, Rosario; Romero-Morte, Jorge; Fernández-González, Federico; Pérez-Badia, Rosa
2017-02-01
Analysis of airborne pollen concentrations provides valuable information on plant phenology and is thus a useful tool in agriculture—for predicting harvests in crops such as the olive and for deciding when to apply phytosanitary treatments—as well as in medicine and the environmental sciences. Variations in airborne pollen concentrations, moreover, are indicators of changing plant life cycles. By modeling pollen time series, we can not only identify the variables influencing pollen levels but also predict future pollen concentrations. In this study, airborne pollen time series were modeled using a seasonal-trend decomposition procedure based on LOcally wEighted Scatterplot Smoothing (LOESS) smoothing (STL). The data series—daily Poaceae pollen concentrations over the period 2006-2014—was broken up into seasonal and residual (stochastic) components. The seasonal component was compared with data on Poaceae flowering phenology obtained by field sampling. Residuals were fitted to a model generated from daily temperature and rainfall values, and daily pollen concentrations, using partial least squares regression (PLSR). This method was then applied to predict daily pollen concentrations for 2014 (independent validation data) using results for the seasonal component of the time series and estimates of the residual component for the period 2006-2013. Correlation between predicted and observed values was r = 0.79 (correlation coefficient) for the pre-peak period (i.e., the period prior to the peak pollen concentration) and r = 0.63 for the post-peak period. Separate analysis of each of the components of the pollen data series enables the sources of variability to be identified more accurately than by analysis of the original non-decomposed data series, and for this reason, this procedure has proved to be a suitable technique for analyzing the main environmental factors influencing airborne pollen concentrations.
Mabood, Fazal; Abbas, Ghulam; Jabeen, Farah; Naureen, Zakira; Al-Harrasi, Ahmed; Hamaed, Ahmad M; Hussain, Javid; Al-Nabhani, Mahmood; Al Shukaili, Maryam S; Khan, Alamgir; Manzoor, Suryyia
2018-03-01
Cows' butterfat may be adulterated with animal fat materials such as tallow, which causes increased serum cholesterol and triglyceride levels upon consumption. There is no reliable and feasible technique to detect and quantify tallow adulteration in butter samples. In this study, a highly sensitive near-infrared (NIR) spectroscopy method combined with chemometrics was developed to detect as well as quantify the level of tallow adulterant in clarified butter samples. For this investigation, the pure clarified butter samples were intentionally adulterated with tallow at the following percentage levels: 1%, 3%, 5%, 7%, 9%, 11%, 13%, 15%, 17% and 20% (wt/wt). Altogether 99 clarified butter samples were used, including nine pure samples (un-adulterated clarified butter) and 90 clarified butter samples adulterated with tallow. Each sample was analysed using NIR spectroscopy in the reflection mode in the range 10,000-4000 cm⁻¹, at 2 cm⁻¹ resolution, using the transflectance sample accessory, which provided a total path length of 0.5 mm. Chemometric models including principal components analysis (PCA), partial least-squares discriminant analysis (PLSDA), and partial least-squares regression (PLSR) were applied for statistical treatment of the obtained NIR spectral data. The PLSDA model was employed to differentiate pure butter samples from those adulterated with tallow. The employed model was then externally cross-validated using a test set which included 30% of the total butter samples. The excellent performance of the model was shown by the low RMSEP value of 1.537% and the high correlation factor of 0.95. This newly developed method is robust, non-destructive, highly sensitive, and economical, with very minor sample preparation and good ability to quantify less than 1.5% of tallow adulteration in clarified butter samples.
NASA Astrophysics Data System (ADS)
Araya, F. Z.; Abdul-Aziz, O. I.
2017-12-01
This study utilized a systematic data analytics approach to determine the relative linkages of stream dissolved oxygen (DO) with the hydro-climatic and biogeochemical drivers across the U.S. Pacific Coast. Multivariate statistical techniques of Pearson correlation matrix, principal component analysis, and factor analysis were applied to a complex water quality dataset (1998-2015) at 35 water quality monitoring stations of USGS NWIS and EPA STORET. Power-law based partial least squares regression (PLSR) models with a bootstrap Monte Carlo procedure (1000 iterations) were developed to reliably estimate the relative linkages by resolving multicollinearity (Nash-Sutcliffe Efficiency, NSE = 0.50-0.94). Based on the dominant drivers, four environmental regimes have been identified and adequately described the system-data variances. In Pacific North West and Southern California, water temperature was the most dominant driver of DO in majority of the streams. However, in Central and Northern California, stream DO was controlled by multiple drivers (i.e., water temperature, pH, stream flow, and total phosphorus), exhibiting a transitional environmental regime. Further, total phosphorus (TP) appeared to be the limiting nutrient for most streams. The estimated linkages and insights would be useful to identify management priorities to achieve healthy coastal stream ecosystems across the Pacific Coast of U.S.A. and similar regions around the world. Keywords: Data analytics, water quality, coastal streams, dissolved oxygen, environmental regimes, Pacific Coast, United States.
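The Nash-Sutcliffe Efficiency quoted above as a goodness-of-fit measure has a standard definition, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))², which can be sketched as follows. The function name and the dissolved-oxygen numbers are hypothetical and serve only to illustrate the formula.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 means the model
    is no better than predicting the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    obs_mean = sum(observed) / len(observed)
    residual_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    total_ss = sum((o - obs_mean) ** 2 for o in observed)
    if total_ss == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - residual_ss / total_ss


# Hypothetical dissolved-oxygen observations and model output (mg/L).
do_observed = [8.0, 9.0, 10.0, 11.0]
do_simulated = [8.5, 9.0, 10.0, 10.5]
nse = nash_sutcliffe(do_observed, do_simulated)
```

For these toy values the residual sum of squares is 0.5 and the total sum of squares is 5.0, giving an NSE of 0.9, which would sit inside the 0.50-0.94 range reported in the abstract.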
NASA Astrophysics Data System (ADS)
Beganović, Anel; Beć, Krzysztof B.; Henn, Raphael; Huck, Christian W.
2018-05-01
The applicability of two elimination techniques for interferences occurring in measurements with cells of short pathlength using Fourier transform near-infrared (FT-NIR) spectroscopy was evaluated. Due to the growing interest in the field of vibrational spectroscopy in aqueous biological fluids (e.g. glucose in blood), aqueous solutions of D-(+)-glucose were prepared and split into a calibration set and an independent validation set. All samples were measured with two FT-NIR spectrometers at various spectral resolutions. Moving average smoothing (MAS) and fast Fourier transform filter (FFT filter) were applied to the interference affected FT-NIR spectra in order to eliminate the interference pattern. After data pre-treatment, partial least squares regression (PLSR) models using different NIR regions were constructed using untreated (interference affected) spectra and spectra treated with MAS and FFT filter. The prediction of the independent validation set revealed information about the performance of the utilized interference elimination techniques, as well as the different NIR regions. The results showed that the combination band of water at approx. 5200 cm⁻¹ is of great importance since its performance was superior to that of the so-called first overtone of water at approx. 6800 cm⁻¹. Furthermore, this work demonstrated that MAS and FFT filter are fast and easy-to-use techniques for the elimination of interference fringes in FT-NIR transmittance spectroscopy.
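Moving average smoothing, one of the two fringe-elimination techniques named above, can be sketched in a few lines. This is an illustrative version only: the centred window, the shrinking window at the spectrum edges, and the toy "fringed" signal are assumptions, and the abstract does not state the window widths actually used.

```python
def moving_average(signal, window=5):
    """Centred moving average; the window is truncated at both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        segment = signal[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical absorbance baseline of 10 with a +/-1 interference fringe.
raw = [11.0, 9.0, 11.0, 9.0, 11.0, 9.0]
smooth = moving_average(raw, window=3)
raw_ripple = max(raw) - min(raw)
smooth_ripple = max(smooth) - min(smooth)
```

On the toy signal the peak-to-peak ripple drops from 2.0 to roughly 0.67; a wider window suppresses the fringe further at the cost of blurring genuine spectral features, which is the trade-off any such pre-treatment has to balance before calibration.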
Yang, Ling Yu; Gao, Xiao Hong; Zhang, Wei; Shi, Fei Fei; He, Lin Hua; Jia, Wei
2016-06-01
In this study, we explored the feasibility of estimating soil heavy metal concentrations using hyperspectral satellite images. The concentrations of As, Pb, Zn and Cd in 48 topsoil samples collected from the field in Yushu County of the Sanjiangyuan region were measured in the laboratory. We then extracted 176 vegetation spectral reflectance bands for the 48 soil samples, as well as five vegetation indices, from two Hyperion images. Following that, the partial least squares regression (PLSR) method was employed to estimate the soil heavy metal concentrations using these two independent sets of Hyperion-derived variables separately: one estimation model was constructed between the 176 vegetation spectral reflectance bands and the soil heavy metal concentrations (called the vegetation spectral reflectance-based estimation model), and another between the five vegetation indices, used as independent variables, and the soil heavy metal concentrations (called the synthetic vegetation index-based estimation model). Using RPD (the ratio of the standard deviation of the measured values of the validation samples to the RMSE) as the validation criterion, the RPDs of As and Pb concentrations from the two models were both less than 1.4, which suggested that both models were incapable of even roughly estimating As and Pb concentrations; whereas the RPDs of Zn and Cd were 1.53, 1.46 and 1.46, 1.42, respectively, which implied that both models had the ability for rough estimation of Zn and Cd concentrations. Based on those results, the vegetation spectral-based estimation model was selected to obtain the spatial distribution map of Zn concentration in combination with the Hyperion image. The estimated Zn map showed that the zones with high Zn concentrations were distributed near provincial road 308, national road 214 and towns, which could be influenced by human activities.
Our study proved that the spectral reflectance of Hyperion image was useful in estimating the soil concentrations of Zn and Cd.
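The RPD criterion used above (standard deviation of the measured validation values divided by the RMSE of prediction) can be sketched as follows. The function names and the toy concentrations are hypothetical, and the use of the sample standard deviation is an assumption; the 1.4 cut-off for "rough estimation" is the one stated in the abstract.

```python
from math import sqrt
from statistics import stdev


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def rpd(measured, predicted):
    """Ratio of the SD of measured validation values to the RMSE.

    Sample standard deviation (n - 1) is assumed here.
    """
    return stdev(measured) / rmse(measured, predicted)


def rough_estimation_possible(rpd_value, threshold=1.4):
    """Apply the RPD cut-off quoted in the abstract."""
    return rpd_value >= threshold


# Hypothetical validation set (mg/kg): every prediction is off by 0.5.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 2.5, 4.5, 4.5]
score = rpd(measured, predicted)
```

Applied to the reported values, an RPD of 1.42 for Cd passes the 1.4 cut-off while the As and Pb models, with RPDs below 1.4, do not.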
Liu, Jinbao; Zhang, Yang; Wang, Huanyuan; Du, Yichun
2018-06-15
Estimating soil heavy metal content can reflect the state of the surrounding surface environment and lays a theoretical foundation for using vegetation cover to monitor the environment and investigate resources. In this study, 44 soil samples were collected from Fufeng County, Yangling County and Wugong County, Shaanxi Province, and their contents of Cr, Mn, Ni, Cu, Zn, As, Cd, Hg and Pb were used as data sources. Reflectance spectra were acquired with an ASD FieldSpec HR (350-2500 nm) and pretreated with NOR, MSC and SNV, after which first differential, second differential and reciprocal logarithmic transformations of the reflectance were carried out. The optimal spectroscopic estimation model for each of the nine heavy metal elements (Cr, Mn, Ni, Cu, Zn, As, Cd, Hg and Pb) was established by regression. The diffuse reflectance characteristics of soils with different heavy metal contents were compared, as was the effect of different pretreatment methods on the establishment of the soil heavy metal spectral inversion model. The results of chemical analysis show that there was serious Hg pollution in the study area, and the Cd content was close to the critical value. The results show that: (1) NOR, MSC and SNV were adopted for pretreating the visible near-infrared spectra; combining them with differential transformation can enhance the information on heavy metal elements in the soil, and using the correlated bands can significantly improve the stability and predictability of the model. (2) The modeling accuracies of the optimal PLSR models for Cr, Mn, Ni, Cu, Zn, As, Cd, Hg and Pb were 0.70, 0.79, 0.69, 0.81, 0.86, 0.58, 0.55, 0.99 and 0.62, respectively. (3) The optimal estimation models, which use different treatment methods for different elements, have better stability and higher precision, and can realize the rapid prediction of the nine heavy metal elements in this region. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Kriegs, Stefanie; Buddenbaum, Henning; Rogge, Derek; Steffens, Markus
2015-04-01
Laboratory imaging Vis-NIR spectroscopy of soil profiles is a novel technique in soil science that can determine the quantity and quality of various chemical soil properties with a hitherto unreached spatial resolution in undisturbed soil profiles. We have applied this technique to soil cores in order to obtain quantitative proof of redoximorphic processes under two different tree species and to prove tree-soil interactions at the microscale. Due to the imaging capabilities of Vis-NIR spectroscopy, a spatially explicit understanding of soil processes and properties can be achieved. Spatial heterogeneity of the soil profile can be taken into account. We took six 30 cm long rectangular soil columns of adjacent Luvisols derived from quaternary aeolian sediments (Loess) in a forest soil near Freising/Bavaria using stainless steel boxes (100×100×300 mm). Three profiles were sampled under Norway spruce and three under European beech. A hyperspectral camera (VNIR, 400-1000 nm in 160 spectral bands) with a spatial resolution of 63×63 µm² per pixel was used for data acquisition. Reference samples were taken at representative spots and analysed for organic carbon (OC) quantity and quality with a CN elemental analyser and for iron oxides (Fe) content using dithionite extraction followed by ICP-OES measurement. We compared two supervised classification algorithms, Spectral Angle Mapper and Maximum Likelihood, using different sets of training areas and spectral libraries. As established in chemometrics, we used multivariate analysis such as partial least-squares regression (PLSR) in addition to multivariate adaptive regression splines (MARS) to correlate chemical data with Vis-NIR spectra. As a result, elemental mapping of Fe and OC within the soil core at high spatial resolution has been achieved. The regression model was validated by a new set of reference samples for chemical analysis. Digital soil classification easily visualizes soil properties within the soil profiles. 
By combining both techniques, detailed soil maps, elemental balances and a deeper understanding of soil forming processes at the microscale become feasible for complete soil profiles.
Nhouchi, Zeineb; Karoui, Romdhane
2018-06-30
The aim of the present study was to investigate the ability of MIR and texture analyzer to evaluate the quality of pound cake samples produced with palm oil and rapeseed oil throughout storage. The MIR spectra analyzed by using principal component analysis (PCA) showed a clear separation of pound cakes as a function of the storage time and the nature of the used oil in the recipe. By applying partial least square regression (PLSR), excellent prediction was obtained for hardness (R² = 0.91; RPD = 2.26), while an approximate qualitative prediction was found for springiness (R² = 0.73; RPD = 2.07), cohesiveness (R² = 0.67; RPD = 1.31) and resilience (R² = 0.65; RPD = 1.24). It could be concluded that the MIR spectroscopy could be used as a rapid and non-destructive technique for monitoring texture of pound cakes throughout storage as well as for the prediction of their hardness. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Jensen, D.; Cavanaugh, K. C.; Simard, M.
2016-12-01
Coastal wetlands provide a wealth of ecosystem services, including improved water quality, protection from storm surges, and wildlife habitat. Louisiana's wetlands, however, are threatened by development, pollution, and relative sea level rise (RSLR)—the combination of sea level rise and subsidence rates. Beyond causing land loss, RSLR impacts Louisiana's wetland ecosystems by altering salinity, nutrient availability, flood duration, and flood frequency in the region. Despite widespread wetland loss, areas such as the Wax Lake and Atchafalaya river deltas are in fact growing due to their sediment loads, resulting in a complex of both degradation and aggradation along the Louisiana coast. In order to understand and model how coastal wetlands are responding to RSLR, there is a need for improved vegetation distribution mapping, biomass estimation, and ecosystem change modeling. To this end, high-resolution imaging spectroscopy offers the ability to accurately develop species-level distribution maps and predictive aboveground biomass (AGB) models. AVIRIS-NG data collected over the Atchafalaya River Delta were calibrated to reduce Bidirectional Reflectance Distribution Function (BRDF) effects and mosaicked, along with other scenes that coincided with field observations. Multiple Endmember Spectral Mixture Analysis (MESMA) was used to map salt marsh at the species level across our study area. Field observations were used to parameterize and validate our MESMA based approach. AGB was then mapped for this region using a partial least squares regression (PLSR) model developed from the same imagery and field measurements. Last, the Sea Level Affecting Marshes Model was applied to predict wetland loss and changes in marsh composition due to sea level rise, which was then paired with the AGB map to estimate carbon storage change. In doing so, this study addresses key concerns for coastal regions and demonstrates the ability of imaging spectroscopy to predict those impacts.
Spectroscopic determination of leaf traits using infrared spectra
NASA Astrophysics Data System (ADS)
Buitrago, Maria F.; Groen, Thomas A.; Hecker, Christoph A.; Skidmore, Andrew K.
2018-07-01
Leaf traits characterise and differentiate single species but can also be used for monitoring vegetation structure and function. Conventional methods to measure leaf traits, especially at the molecular level (e.g. water, lignin and cellulose content), are expensive and time-consuming. Spectroscopic methods to estimate leaf traits can provide an alternative approach. In this study, we investigated high spectral resolution (6612 bands) emissivity measurements from the short to the long wave infrared (1.4-16.0 μm) of leaves from 19 different plant species ranging from herbaceous to woody, and from temperate to tropical types. At the same time, we measured 14 leaf traits to characterise a leaf, including chemical (e.g., leaf water content, nitrogen, cellulose) and physical features (e.g., leaf area and leaf thickness). We fitted partial least squares regression (PLSR) models across the SWIR, MWIR and LWIR for each leaf trait. Then, reduced models (PLSRred) were derived by iteratively reducing the number of bands in the model (using a modified Jackknife resampling method with a Martens and Martens uncertainty test) down to a few bands (4-10 bands) that contribute the most to the variation of the trait. Most leaf traits could be determined from infrared data with a moderate accuracy (65% < R²cv < 77% for observed versus predicted plots) based on PLSRred models, while the models using the whole infrared range (6612 bands) showed higher accuracies (74% < R²cv < 90%). Using the full SWIR range (1.4-2.5 μm) shows similarly high accuracies compared to the whole infrared. Leaf thickness, leaf water content, cellulose, lignin and stomata density are the traits that could be estimated most accurately from infrared data (with R²cv above 0.80 for the full range models). Leaf thickness, cellulose and lignin were predicted with reasonable accuracy from a combination of single infrared bands. 
Nevertheless, for all leaf traits, a combination of a few bands yields moderate to accurate estimations.
Kinoshita, Rintaro; Moebius-Clune, Bianca N.; van Es, Harold M.; Hively, W. Dean; Bilgilis, A. Volkan
2012-01-01
Visible and near-infrared reflectance spectroscopy (VNIRS) is a rapid and nondestructive method that can predict multiple soil properties simultaneously, but its application in multidimensional soil quality (SQ) assessment in the tropics still needs to be further assessed. In this study, VNIRS (350–2500 nm) was employed to analyze 227 air-dried soil samples of Ultisols from a soil chronosequence in western Kenya and assess 16 SQ indicators. Partial least squares regression (PLSR) was validated using the full-site cross-validation method by grouping samples from each farm or forest site. Most suitable models successfully predicted SQ indicators (R2 ≥ 0.80; ratio of performance to deviation [RPD] ≥ 2.00) including soil organic matter (OMLOI), active C, Ca, cation exchange capacity (CEC), and clay. Moderately-well predicted indicators (0.50 ≤ R2 pwp), and field capacity (Θfc). Poorly predicted indicators (R2 < 0.50; RPD < 1.40) were EC, S, P, available water capacity (AWC), K, Zn, and penetration resistance. Combining VNIRS with selected field- and laboratory-measured SQ indicator values increased predictability. Furthermore, VNIRS showed moderate to substantial agreement in predicting interpretive SQ scores and a composite soil quality index (CSQI) especially when combined with directly measured SQ indicator values. In conclusion, VNIRS has good potential for low cost, rapid assessment of physical and biological SQ indicators but conventional soil chemical tests may need to be retained to provide comprehensive SQ assessments.
NASA Astrophysics Data System (ADS)
Li, Can; Wang, Fei; Zang, Lixuan; Zang, Hengchang; Alcalà, Manel; Nie, Lei; Wang, Mingyu; Li, Lian
2017-03-01
Nowadays, as a powerful process analytical tool, near infrared spectroscopy (NIRS) has been widely applied in process monitoring. In the present work, NIRS combined with multivariate analysis was used to monitor the ethanol precipitation process of fraction I + II + III (FI + II + III) supernatant in human albumin (HA) separation, to achieve qualitative and quantitative monitoring at the same time and assure the product's quality. First, a qualitative model was established by principal component analysis (PCA) using samples from 6 of 8 normal batches, and evaluated with the remaining 2 normal batches and 3 abnormal batches. The results showed that the first principal component (PC1) score chart could be successfully used for fault detection and diagnosis. Then, two quantitative models were built with 6 of 8 normal batches to determine the content of total protein (TP) and HA separately using partial least squares regression (PLS-R), and the models were validated with the 2 remaining normal batches. The determination coefficient of validation (R²p), root mean square error of cross validation (RMSECV), root mean square error of prediction (RMSEP) and ratio of performance deviation (RPD) were 0.975, 0.501 g/L, 0.465 g/L and 5.57 for TP, and 0.969, 0.530 g/L, 0.341 g/L and 5.47 for HA, respectively. The results showed that the established models could give a rapid and accurate measurement of the content of TP and HA. The results of this study indicated that NIRS is an effective tool and could be successfully used for simultaneous qualitative and quantitative monitoring of the ethanol precipitation process of FI + II + III supernatant. This research has significant reference value for assuring the quality and improving the recovery ratio of HA at industrial scale by using NIRS.
Serbin, Shawn P.; Singh, Aditya; Desai, Ankur R.; ...
2015-06-11
To date, the utility of ecosystem and Earth system models (EESMs) has been limited by poor spatial and temporal representation of critical input parameters. For example, EESMs often rely on leaf-scale or literature-derived estimates for a key determinant of canopy photosynthesis, the maximum velocity of RuBP carboxylation (Vcmax, μmol m⁻² s⁻¹). Our recent work (Ainsworth et al., 2014; Serbin et al., 2012) showed that reflectance spectroscopy could be used to estimate Vcmax at the leaf level. Here, we present evidence that imaging spectroscopy data can be used to simultaneously predict Vcmax and its sensitivity to temperature (EV) at the canopy scale. In 2013 and 2014, high-altitude Airborne Visible/Infrared Imaging Spectroscopy (AVIRIS) imagery and contemporaneous ground-based assessments of canopy structure and leaf photosynthesis were acquired across an array of monospecific agroecosystems in central and southern California, USA. A partial least-squares regression (PLSR) modeling approach was employed to characterize the pixel-level variation in canopy Vcmax (at a standardized canopy temperature of 30 °C) and EV, based on visible and shortwave infrared AVIRIS spectra (414–2447 nm). Our approach yielded parsimonious models with strong predictive capability for Vcmax (at 30 °C) and EV (R² of withheld data = 0.94 and 0.92, respectively), both of which varied substantially in the field (≥ 1.7 fold) across the sampled crop types. The models were applied to additional AVIRIS imagery to generate maps of Vcmax and EV, as well as their uncertainties, for agricultural landscapes in California. The spatial patterns exhibited in the maps were consistent with our in-situ observations. As a result, these findings highlight the considerable promise of airborne and, by implication, space-borne imaging spectroscopy, such as the proposed HyspIRI mission, to map spatial and temporal variation in key drivers of photosynthetic metabolism in terrestrial vegetation.
NASA Astrophysics Data System (ADS)
Anggraeni, Anni; Arianto, Fernando; Mutalib, Abdul; Pratomo, Uji; Bahti, Husein H.
2017-05-01
Rare earth elements (REE) serve many functions in daily life, for example in metallurgy, optical devices, and the manufacture of electronic devices. REE occur together in minerals, and the individual elements have similar properties. Currently, REE content is determined with instruments such as ICP-OES, ICP-MS, XRF, and HPLC, but each of these has weaknesses. An alternative analytical method for determining rare earth metal content is therefore needed; one option is a combination of UV-Visible spectrophotometry and multivariate analysis, including Principal Component Analysis (PCA), Principal Component Regression (PCR), and Partial Least Squares Regression (PLSR). The purpose of this experiment was to determine the content of light and medium rare earth elements in the mineral monazite without chemical separation, using a combination of multivariate analysis and UV-Visible spectrophotometry. A training set with 22 variations of concentration was prepared and its absorbance was measured with a UV-Vis spectrophotometer; the data were then processed by PCA, PCR, and PLSR. The results were compared and validated to obtain the mathematical equation with the smallest percent error. After validation, the equation obtained with PLSR performed better than that obtained with PCR, with RMSE values for La, Ce, Pr, Nd, Gd, Sm, Eu, and Tb of 0.095, 0.573, 0.538, 0.440, 3.387, 1.240, 1.870, and 0.639, respectively.
Lin, Shunshun; Zhang, Xiaoming; Song, Shiqing; Hayat, Khizar; Eric, Karangwa; Majeed, Hamid
2016-03-01
Based on encouraged development of potential reduced-exposure products (PREPs) by the US Institute of Medicine, casings (glucose and peptides) added treatments (CAT) and enzymatic (protease and xylanase) hydrolysis treatments (EHT) were developed to study their effect on alkaloids reduction in tobacco and cigarette mainstream smoke (MS) and further investigate the correlation between sensory attributes and alkaloids. Results showed that the developed treatments reduced nicotine by 14.5% and 24.4% in tobacco and cigarette MS, respectively, indicating that both CAT and EHT are potentially effective for developing lower-risk cigarettes. Sensory and electronic nose analysis confirmed the significant influence of treatments on sensory and cigarette MS components. PLSR analysis demonstrated that tobacco alkaloids were positively correlated to the off-taste, irritation and impact attributes, and negatively correlated to the aroma and softness attributes. Additionally, nicotine and anabasine from tobacco leaves positively contributed to the impact attribute, while they negatively contributed to the aroma attribute (P<0.05). Meanwhile, most alkaloids in cigarette MS positively contributed to the impact and irritation attributes (P<0.05). Hence, this study paved a way to better understand the correlation between tobacco alkaloids and sensory attributes. Copyright © 2015 Elsevier Inc. All rights reserved.
Soil Organic Carbon Estimation and Mapping Using "on-the-go" VisNIR Spectroscopy
NASA Astrophysics Data System (ADS)
Brown, D. J.; Bricklemyer, R. S.; Christy, C.
2007-12-01
Soil organic carbon (SOC) and other soil properties related to carbon sequestration (e.g., soil clay content and mineralogy) vary spatially across landscapes. To cost-effectively capture this variability, new technologies, such as Visible and Near Infrared (VisNIR) spectroscopy, have been applied to soils for rapid, accurate, and inexpensive estimation of SOC and other soil properties. For this study, we evaluated an "on the go" VisNIR sensor developed by Veris Technologies, Inc. (Salinas, KS) for mapping SOC, soil clay content and mineralogy. The Veris spectrometer spanned 350 to 2224 nm with 8 nm spectral resolution, and 25 spectra were integrated every 2 seconds, resulting in 3-5 m scanning distances on the ground. The unit was mounted to a mobile sensor platform pulled by a tractor, and scanned soils at an average depth of 10 cm through a quartz-sapphire window. We scanned eight 16.2 ha (40 ac) wheat fields in north central Montana (USA), with 15 m transect intervals. Using random sampling with spatial inhibition, 100 soil samples from 0-10 cm depths were extracted along scanned transects from each field and were analyzed for SOC. Neat, sieved (<2 mm) soil sample materials were also scanned in the lab using an Analytical Spectral Devices (ASD, Boulder, CO, USA) Fieldspec Pro FR spectroradiometer with a spectral range of 350-2500 nm and spectral resolution of 2-10 nm. The analyzed samples were used to calibrate and validate a number of partial least squares regression (PLSR) VisNIR models to compare on-the-go scanning vs. higher spectral resolution laboratory spectroscopy vs. standard SOC measurement methods.
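The calibration step named in this abstract can be illustrated with a minimal single-response PLSR (PLS1) sketch using NIPALS-style deflation. This is not the authors' implementation: the function names `pls1_fit` and `pls1_predict`, the two-band toy data, and the choice of two components are all assumptions made for illustration only.

```python
from math import sqrt

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model; returns (x_mean, y_mean, [(w, p, q), ...])."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [dot([row[j] for row in E], f) for j in range(m)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        p = [dot([row[j] for row in E], t) / tt for j in range(m)]  # X loadings
        q = dot(f, t) / tt                                     # y loading
        # Deflate X and y by the part explained by this component.
        E = [[row[j] - t[i] * p[j] for j in range(m)] for i, row in enumerate(E)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict y for one new sample by replaying the deflation steps."""
    x_mean, y_mean, comps = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * pj for v, pj in zip(e, p)]
    return y_hat

# Hypothetical toy data: two "bands", response y = 2*x1 - x2 + 1.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [1.0, 4.0, 3.0, 6.0]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [5.0, 5.0])
```

With as many components as the rank of the centered X matrix, PLS1 coincides with ordinary least squares, which is why the toy response is recovered exactly here; with real spectra the number of components would normally be chosen by cross-validation.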
Gashaw, Temesgen; Tulu, Taffa; Argaw, Mekuria; Worqlul, Abeyou W
2018-04-01
Understanding the hydrological response of a watershed to land use/land cover (LULC) changes is imperative for water resources management planning. The objective of this study was to analyze the hydrological impacts of LULC changes in the Andassa watershed for a period of 1985-2015 and to predict the LULC change impact on the hydrological status in year 2045. The hybrid land use classification technique for classifying Landsat images (1985, 2000 and 2015); Cellular-Automata Markov (CA-Markov) for prediction of the 2030 and 2045 LULC states; the Soil and Water Assessment Tool (SWAT) for hydrological modeling were employed in the analyses. In order to isolate the impacts of LULC changes, the LULC maps were used independently while keeping the other SWAT inputs constant. The contribution of each of the LULC classes was examined with the Partial Least Squares Regression (PLSR) model. The results showed that there was a continuous expansion of cultivated land and built-up area, and withdrawing of forest, shrubland and grassland during the 1985-2015 periods, which are expected to continue in the 2030 and 2045 periods. The LULC changes, which had occurred during the period of 1985 to 2015, had increased the annual flow (2.2%), wet seasonal flow (4.6%), surface runoff (9.3%) and water yield (2.4%). Conversely, the observed changes had reduced dry season flow (2.8%), lateral flow (5.7%), groundwater flow (7.8%) and ET (0.3%). The 2030 and 2045 LULC states are expected to further increase the annual and wet season flow, surface runoff and water yield, and reduce dry season flow, groundwater flow, lateral flow and ET. The change in hydrological components is a direct result of the significant transition from the vegetation to non-vegetation cover in the watershed. This suggests an urgent need to regulate the LULC in order to maintain the hydrological balance. Copyright © 2017 Elsevier B.V. All rights reserved.
Sarcoptic mange breaks up bottom-up regulation of body condition in a large herbivore population.
Carvalho, João; Granados, José E; López-Olvera, Jorge R; Cano-Manuel, Francisco Javier; Pérez, Jesús M; Fandos, Paulino; Soriguer, Ramón C; Velarde, Roser; Fonseca, Carlos; Ráez, Arian; Espinosa, José; Pettorelli, Nathalie; Serrano, Emmanuel
2015-11-06
Both parasitic load and resource availability can impact individual fitness, yet little is known about the interplay between these parameters in shaping body condition, a key determinant of fitness in wild mammals inhabiting seasonal environments. Using partial least square regressions (PLSR), we explored how temporal variation in climatic conditions, vegetation dynamics and sarcoptic mange (Sarcoptes scabiei) severity impacted body condition of 473 Iberian ibexes (Capra pyrenaica) harvested between 1995 and 2008 in the highly seasonal Alpine ecosystem of Sierra Nevada Natural Space (SNNS), southern Spain. Bottom-up regulation was found to only occur in healthy ibexes; the condition of infected ibexes was independent of primary productivity and snow cover. No link between ibex abundance and ibex body condition could be established when only considering infected individuals. The pernicious effects of mange on Iberian ibexes overcome the benefits of favorable environmental conditions. Even though the increase in primary production exerts a positive effect on the body condition of healthy ibexes, the scabietic individuals do not derive any advantage from increased resource availability. Further applied research coupled with continuous sanitary surveillance are needed to address remaining knowledge gaps associated with the transmission dynamics and management of sarcoptic mange in free-living populations.
Investigating the Moisture Content of Polyamide 6 by Raman-Microscopy and Multivariate Data Analysis
NASA Astrophysics Data System (ADS)
Lechner, Tobias; Noack, Kristina; Thöne, Manuel; Amend, Philipp; Schmidt, Michael; Will, Stefan
Thermal malleability of thermoplastics results in a high product diversity in various industry sectors. However, industrial applications require a constant and high component quality. Hence, material processing such as laser welding has to consider that, e.g., the moisture content of thermoplastics influences the mechanical properties such as the tensile strength. Moreover, water evaporates during laser welding and can form pores and defects. Thus, there is a large need for non-invasive material inspection before processing. To that end, we developed a methodology based on Raman-microscopy and multivariate data analysis (MVD) to determine the moisture content of polyamide (MCP). Further, the impact of the MCP on the mechanical properties was verified. For samples with a defined variation of the MCP, xyz-Raman-scans were carried out and analysed using MVD. For reference purposes, the samples were weighed and tensile tests were performed. An evaluation by means of partial least squares regression analysis (PLSR) resulted in a prediction of the MCP with a correlation coefficient >98%. Consequently, Raman-microscopy shows large potential for developing new techniques for inspection and quality control of plastics before processing. Dedicated to Professor Alfred Leipertz on the occasion of his 70th birthday.
Tian, Huaixiang; Li, Fenghua; Qin, Lan; Yu, Haiyan; Ma, Xia
2014-11-01
This study examines the feasibility of electronic nose as a method to discriminate chicken and beef seasonings and to predict sensory attributes. Sensory evaluation showed that 8 chicken seasonings and 4 beef seasonings could be well discriminated and classified based on 8 sensory attributes. The sensory attributes including chicken/beef, gamey, garlic, spicy, onion, soy sauce, retention, and overall aroma intensity were generated by a trained evaluation panel. Principal component analysis (PCA), discriminant factor analysis (DFA), and cluster analysis (CA) combined with electronic nose were used to discriminate seasoning samples based on the difference of the sensor response signals of chicken and beef seasonings. The correlation between sensory attributes and electronic nose sensors signal was established using partial least squares regression (PLSR) method. The results showed that the seasoning samples were all correctly classified by the electronic nose combined with PCA, DFA, and CA. The electronic nose gave good prediction results for all the sensory attributes with correlation coefficient (r) higher than 0.8. The work indicated that electronic nose is an effective method for discriminating different seasonings and predicting sensory attributes. © 2014 Institute of Food Technologists®
Taradolsirithitikul, Panchita; Sirisomboon, Panmanas; Dachoupakan Sirisomboon, Cheewanun
2017-03-01
Ochratoxin A (OTA) contamination is highly prevalent in a variety of agricultural products including the commercially important coffee bean. As such, rapid and accurate detection methods are considered necessary for the identification of OTA in green coffee beans. The goal of this research was to apply Fourier transform near infrared spectroscopy to detect and classify OTA contamination in green coffee beans in both a quantitative and qualitative manner. PLSR models were generated using pretreated spectroscopic data to predict the OTA concentration. The best model displayed a correlation coefficient (r) of 0.814, and a standard error of prediction (SEP) and bias of 1.965 µg kg⁻¹ and 0.358 µg kg⁻¹, respectively. Additionally, a PLS-DA model was also generated, displaying a classification accuracy of 96.83% for a non-OTA contaminated model and 80.95% for an OTA contaminated model, with an overall classification accuracy of 88.89%. The results demonstrate that the developed model could be used for detecting OTA contamination in green coffee beans in either a quantitative or qualitative manner. © 2016 Society of Chemical Industry.
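The abstract reports a bias and a standard error of prediction (SEP) without defining them. The sketch below assumes the common chemometrics convention, in which bias is the mean residual and SEP is the bias-corrected standard deviation of the residuals; the function name and the numbers are hypothetical, not the study's data.

```python
from math import sqrt

def bias_and_sep(reference, predicted):
    """Return (bias, SEP) for paired reference and predicted values.

    bias = mean of (predicted - reference)
    SEP  = sample standard deviation of the residuals around the bias
    (assumed convention; the abstract does not state its formula).
    """
    residuals = [p - r for p, r in zip(predicted, reference)]
    n = len(residuals)
    bias = sum(residuals) / n
    sep = sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    return bias, sep

# Hypothetical concentrations in µg/kg.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 4.0, 4.0]
bias, sep = bias_and_sep(reference, predicted)
```

Under this convention a model can have a large bias and a small SEP (a constant offset), which is why the two figures are usually reported together.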
NASA Astrophysics Data System (ADS)
Wang, Wenxiu; Peng, Yankun; Wang, Fan; Sun, Hongwei
2017-05-01
The improvement of living standards has urged consumers to pay more attention to the quality and nutrition of meat, so the development of a nondestructive detection device for quality and nutritional parameters is undoubtedly of commercial value. In this research, a portable device equipped with visible (Vis) and near-infrared (NIR) spectrometers, tungsten halogen lamp, optical fiber, ring light guide and embedded computer was developed to realize simultaneous and fast detection of color (L*, a*, b*), pH, total volatile basic nitrogen (TVB-N), intramuscular fat (IF), protein and water content in pork. The wavelength ranges of the dual-band spectrometers were 400-1100 nm and 940-1650 nm, respectively, and the tungsten halogen lamp cooperated with the ring light guide to form a ring light source and provide appropriate illumination intensity for the sample. Software was self-developed to control the functionality of the dual-band spectrometers, set spectrometer parameters, acquire and process Vis/NIR spectra and display the prediction results in real time. In order to obtain a robust and accurate prediction model, fresh longissimus dorsi meat was bought and placed in the refrigerator for 12 days to get pork samples with different freshness degrees. Besides, pork meat from three different parts including longissimus dorsi, haunch and lean meat was collected for the determination of IF, protein and water to give the reference values a wider distribution range. After acquisition of Vis/NIR spectra, data from 400-1100 nm were pretreated with a Savitzky-Golay (S-G) filter and standard normal variables transform (SNVT), and spectral data from 940-1650 nm were preprocessed with SNVT. Anomalous samples were eliminated by a Monte Carlo method based on model cluster analysis, and then partial least square regression (PLSR) models based on single band (400-1100 nm or 940-1650 nm) and dual-band were established and compared. 
The results showed the optimal models for each parameter were built with correlation coefficients in prediction set of 0.9101, 0.9121, 0.8873, 0.9094, 0.9378, 0.9348, 0.9342 and 0.8882, respectively. It indicated this innovative and practical device can be a promising technology for nondestructive, fast and accurate detection of nutritional parameters in meat.
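The standard normal variate transform mentioned in this abstract rescales each spectrum by its own mean and standard deviation. A minimal sketch follows, assuming the usual definition with the sample (n - 1) standard deviation; the function name `snv` and the absorbance values are hypothetical.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: center one spectrum, scale by its own SD."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(v - m) / s for v in spectrum]

# Hypothetical absorbance values at five wavelengths.
raw = [0.42, 0.45, 0.51, 0.48, 0.44]
scaled = snv(raw)

# The same spectrum with a multiplicative gain and an additive offset
# (e.g. from scatter or path-length differences) maps to the same result.
shifted = snv([2.0 * v + 0.1 for v in raw])
```

Because gain and offset cancel, SNV is typically applied per sample before the regression step rather than per wavelength.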
Mohamadi Monavar, H; Afseth, N K; Lozano, J; Alimardani, R; Omid, M; Wold, J P
2013-07-15
The purpose of this study was to evaluate the feasibility of Raman spectroscopy for predicting purity of caviars. The 93 wild caviar samples of three different types, namely; Beluga, Asetra and Sevruga were analysed by Raman spectroscopy in the range 1995 cm(-1) to 545 cm(-1). Also, 60 samples from combinations of every two types were examined. The chemical origin of the samples was identified by reference measurements on pure samples. Linear chemometric methods like Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were used for data visualisation and classification which permitted clear distinction between different caviars. Non-linear methods like Artificial Neural Networks (ANN) were used to classify caviar samples. Two different networks were tested in the classification: Probabilistic Neural Network with Radial-Basis Function (PNN) and Multilayer Feed Forward Networks with Back Propagation (BP-NN). In both cases, scores of principal components (PCs) were chosen as input nodes for the input layer in PC-ANN models in order to reduce the redundancy of data and time of training. Leave One Out (LOO) cross validation was applied in order to check the performance of the networks. Results of PCA indicated that, features like type and purity can be used to discriminate different caviar samples. These findings were also supported by LDA with efficiency between 83.77% and 100%. These results were confirmed with the results obtained by developed PC-ANN models, able to classify pure caviar samples with 93.55% and 71.00% accuracy in BP network and PNN, respectively. In comparison, LDA, PNN and BP-NN models for predicting caviar types have 90.3%, 73.1% and 91.4% accuracy. 
Partial least squares regression (PLSR) models were built under cross validation and tested with different independent data sets, yielding determination coefficients (R²) of 0.86, 0.83, 0.92 and 0.91 with root mean square error (RMSE) of validation of 0.32, 0.11, 0.03 and 0.09 for the fatty acids 16:0, 20:5 and 22:6 and for fat, respectively. Crown Copyright © 2013. Published by Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Moore, T. S.; Sanderman, J.; Baldock, J.; Plante, A. F.
2016-12-01
National-scale inventories typically include soil organic carbon (SOC) content, but not chemical composition or biogeochemical stability. Australia's Soil Carbon Research Programme (SCaRP) represents a national inventory of SOC content and composition in agricultural systems. The program used physical fractionation followed by 13C nuclear magnetic resonance (NMR) spectroscopy. While these techniques are highly effective, they are typically too expensive and time consuming for use in large-scale SOC monitoring. We seek to understand if analytical thermal analysis is a viable alternative. Coupled differential scanning calorimetry (DSC) and evolved gas analysis (CO2- and H2O-EGA) yields valuable data on SOC composition and stability via ramped combustion. The technique requires little training to use, and does not require fractionation or other sample pre-treatment. We analyzed 300 agricultural samples collected by SCaRP, divided into four fractions: whole soil, coarse particulates (POM), untreated mineral associated (HUM), and hydrofluoric acid (HF)-treated HUM. All samples were analyzed by DSC-EGA, but only the POM and HF-HUM fractions were analyzed by NMR. Multivariate statistical analyses were used to explore natural clustering in SOC composition and stability based on DSC-EGA data. A partial least-squares regression (PLSR) model was used to explore correlations among the NMR and DSC-EGA data. Correlations demonstrated regions of combustion attributable to specific functional groups, which may relate to SOC stability. We are increasingly challenged with developing an efficient technique to assess SOC composition and stability at large spatial and temporal scales. Correlations between NMR and DSC-EGA may demonstrate the viability of using thermal analysis in lieu of more demanding methods in future large-scale surveys, and may provide data that goes beyond chemical composition to better approach quantification of biogeochemical stability.
A portable device for detecting fruit quality by diffuse reflectance Vis/NIR spectroscopy
NASA Astrophysics Data System (ADS)
Sun, Hongwei; Peng, Yankun; Li, Peng; Wang, Wenxiu
2017-05-01
Soluble solid content (SSC) is a major quality parameter of fruit and influences its flavor and texture. Some studies on on-line, non-invasive detection of fruit quality have been published; however, consumers currently desire portable devices. This study aimed to develop a portable device for accurate, real-time and nondestructive determination of fruit quality factors based on diffuse reflectance Vis/NIR spectroscopy (520-950 nm). The hardware of the device consisted of four units: a light source unit, a spectral acquisition unit, a central processing unit, and a display unit. A halogen lamp was chosen as the light source. In operation, the hand-held probe was placed in contact with the surface of the fruit sample, forming a dark environment that shielded the measurement from interfering outside light. Diffusely reflected light was collected and measured by a spectrometer (USB4000). An ARM (Advanced RISC Machines) processor, as the central processing unit, controlled all parts of the device and analyzed the spectral data. A Liquid Crystal Display (LCD) touch screen was used to interface with users. To validate the reliability and stability of the device, 63 apples were tested, 47 of which were chosen as the calibration set and the others as the prediction set. Their SSC reference values were measured with a refractometer. The spectral data acquired by the portable device were processed by standard normal variate (SNV) and a Savitzky-Golay filter (S-G) to eliminate spectral noise. Partial least squares regression (PLSR) was then applied to build prediction models, and the best prediction results were achieved with a correlation coefficient (r) of 0.855 and a standard error of 0.6033 °Brix. The results demonstrated that the device is feasible for quantitatively analyzing the soluble solid content of apples.
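The Savitzky-Golay smoothing step named in this abstract fits a low-order polynomial in a sliding window. The abstract does not give the window size or polynomial order, so the sketch below assumes a 5-point quadratic window, whose convolution coefficients are (-3, 12, 17, 12, -3)/35, and leaves the two points at each edge unchanged; the function name and data are hypothetical.

```python
# 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

def savgol5(values):
    """Smooth interior points with a 5-point quadratic S-G filter.

    The first and last two points are copied through unchanged, which is
    one simple edge policy among several.
    """
    out = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5, window))
    return out

# A pure quadratic passes through the filter unchanged ...
quadratic = [float(x * x) for x in range(7)]
smoothed_quadratic = savgol5(quadratic)

# ... while an isolated spike on a flat baseline is damped.
spiky = [1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0]
smoothed_spiky = savgol5(spiky)
```

Preserving low-order polynomial shape while damping point-to-point noise is the reason this filter is often preferred over a plain moving average for spectra with narrow absorption features.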
Estimating soil zinc concentrations using reflectance spectroscopy
NASA Astrophysics Data System (ADS)
Sun, Weichao; Zhang, Xia
2017-06-01
Soil contamination by heavy metals is an increasingly severe threat to the natural environment and human health. Efficient investigation of contamination status is essential to soil protection and remediation. Visible and near-infrared reflectance spectroscopy (VNIRS) has been regarded as an alternative for monitoring soil contamination by heavy metals. Generally, all VNIR spectral bands are employed to estimate heavy metal concentration, which lacks interpretability and requires much calculation. In this study, 74 soil samples were collected from Hunan Province, China, and their reflectance spectra were used to estimate zinc (Zn) concentration in soil. Organic matter and clay minerals strongly adsorb Zn in soil. Spectral bands associated with organic matter and clay minerals were used for estimation with genetic algorithm based partial least square regression (GA-PLSR). The entire VNIR spectral range, the bands associated with organic matter alone and the bands associated with clay minerals alone were included as comparisons. Root mean square error of prediction, residual prediction deviation, and coefficient of determination (R²) for the model developed using the combined bands of organic matter and clay minerals were 329.65 mg kg⁻¹, 1.96 and 0.73, which is better than 341.88 mg kg⁻¹, 1.89 and 0.71 for the entire VNIR spectral range, 492.65 mg kg⁻¹, 1.31 and 0.40 for the organic matter bands, and 430.26 mg kg⁻¹, 1.50 and 0.54 for the clay mineral bands. Additionally, in consideration of atmospheric water vapor absorption in field spectra measurement, the combined bands of organic matter and the absorption around 2200 nm were used for estimation and achieved high prediction accuracy, with R² reaching 0.640. The results indicate the considerable potential of soil reflectance spectroscopy for estimating Zn concentrations in soil.
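Two of the figures of merit quoted in this abstract can be computed as sketched below. The abstract does not give its formulas, so this assumes the common definitions: RMSEP as the root mean squared prediction error, and residual prediction deviation (RPD) as the standard deviation of the reference values divided by RMSEP. Function names and concentrations are hypothetical.

```python
from math import sqrt
from statistics import stdev

def rmsep(observed, predicted):
    """Root mean square error of prediction."""
    sq = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(sq) / len(sq))

def rpd(observed, predicted):
    """Residual prediction deviation: SD of reference values / RMSEP."""
    return stdev(observed) / rmsep(observed, predicted)

# Hypothetical Zn concentrations in mg/kg.
observed = [100.0, 200.0, 300.0, 400.0, 500.0]
predicted = [110.0, 190.0, 310.0, 390.0, 510.0]

error = rmsep(observed, predicted)
ratio = rpd(observed, predicted)
```

RPD expresses the prediction error relative to the spread of the data set, so it allows models built on samples with different concentration ranges to be compared more fairly than RMSEP alone.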
Zhu, JianCai; Chen, Feng; Wang, LingYing; Niu, YunWei; Chen, HeXing; Wang, HongLin; Xiao, ZuoBing
2016-06-22
The volatile compounds of cranberries obtained from four cultivars (Early Black, Y1; Howes, Y2; Searles, Y3; and McFarlin, Y4) were analyzed by gas chromatography-olfactometry (GC-O), gas chromatography-mass spectrometry (GC-MS), and GC-flame photometric detection (FPD). The results showed that a total of thirty-three, thirty-four, thirty-four, and thirty-six odor-active compounds were identified by GC-O in Y1, Y2, Y3, and Y4, respectively. In addition, twenty-two, twenty-two, thirty, and twenty-seven quantified compounds were shown to be important odorants according to their odor activity values (OAVs > 1). Among these compounds, hexanal (OAV: 27-60), pentanal (OAV: 31-51), (E)-2-heptenal (OAV: 17-66), (E)-2-hexenal (OAV: 18-63), (E)-2-octenal (OAV: 10-28), (E)-2-nonenal (OAV: 8-77), ethyl 2-methylbutyrate (OAV: 10-33), β-ionone (OAV: 8-73), 2-methylbutyric acid (OAV: 18-37), and octanal (OAV: 4-24) contributed greatly to the aroma of cranberry. Partial least-squares regression (PLSR) was used to process the mean data accumulated from sensory evaluation by the panelists, odor-active aroma compounds (OAVs > 1), and samples. Sample Y3 was highly correlated with the sensory descriptors "floral" and "fruity". Sample Y4 was strongly related to the sensory descriptors "mellow" and "green and grass". Finally, an aroma reconstitution (Model A) was prepared by mixing the odor-active aroma compounds (OAVs > 1) based on their measured concentrations in the Y1 sample; the aroma profile of the reconstitution was very similar to that of the original sample.
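The OAV > 1 screening used in this abstract follows from the usual definition of an odor activity value as concentration divided by odor threshold. A small sketch under that assumption is shown below; the compound names, concentrations and thresholds are placeholders, not values from the study.

```python
def odor_activity_values(concentrations, thresholds):
    """OAV = concentration / odor threshold, in matching units."""
    return {name: concentrations[name] / thresholds[name]
            for name in concentrations}

# Placeholder data in µg/kg (hypothetical, for illustration only).
concentrations = {"compound_a": 135.0, "compound_b": 2.0, "compound_c": 48.0}
thresholds = {"compound_a": 4.5, "compound_b": 10.0, "compound_c": 16.0}

oav = odor_activity_values(concentrations, thresholds)

# Compounds present above their threshold are treated as odor-active.
odor_active = sorted(name for name, value in oav.items() if value > 1)
```

Only compounds that pass this filter would be carried forward as predictors, mirroring the way the abstract restricts the PLSR input to odorants with OAVs above 1.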
NASA Astrophysics Data System (ADS)
Ahmed, M. H.; Abdul-Aziz, O. I.
2017-12-01
Chlorophyll-a (Chl-a) is a key indicator for stream water quality and ecological health. The characterization of interplay between Chl-a and its numerous hydroclimatic and biogeochemical drivers is complex, and often involves multicollinear datasets. A systematic data analytics methodology was employed to determine the relative linkages of stream Chl-a with its dynamic environmental drivers at 50 stream water quality monitoring stations across the continental U.S. Multivariate statistical techniques of principal component analysis (PCA) and factor analysis (FA), in concert with Pearson correlation analysis, were applied to evaluate interrelationships among hydroclimatic, biogeochemical, and biological variables. Power-law based partial least square regression (PLSR) models were developed with a bootstrap Monte Carlo procedure (1000 iterations) to reliably estimate the comparative linkages of Chl-a by resolving multicollinearity in the data matrices (Nash-Sutcliffe efficiency = 0.50-0.87). The data analytics suggested four environmental regimes of stream Chl-a, as dominated by nutrient, climate, redox, and hydro-atmospheric contributions, respectively. Total phosphorus (TP) was the most dominant driver of stream Chl-a in the nutrient controlled regime. Water temperature demonstrated the strongest control of Chl-a in the climate-dominated regime. Furthermore, pH and stream flow were found to be the most important drivers of Chl-a in the redox and hydro-atmospheric component dominated regimes, respectively. The research led to a significant reduction of dimensionality in the large data matrices, providing quantitative and qualitative insights on the dynamics of stream Chl-a. The findings would be useful to manage stream water quality and ecosystem health in the continental U.S. and around the world under a changing climate and environment.
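A power-law model becomes linear after a log transform, which is what allows a linear method such as PLSR to be applied to it. The sketch below illustrates this with a single driver and ordinary least squares as a stand-in for PLSR (an assumption made for brevity), and scores the fit with the Nash-Sutcliffe efficiency. All names and numbers are hypothetical.

```python
from math import exp, log

def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on ln(y) = ln(a) + b * ln(x)."""
    lx = [log(v) for v in x]
    ly = [log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = exp(my - b * mx)
    return a, b

def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum of squared errors / variance of observations about their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical driver and response following y = 3 * x**0.5 exactly.
driver = [1.0, 2.0, 4.0, 8.0]
response = [3.0 * v ** 0.5 for v in driver]

a, b = fit_power_law(driver, response)
simulated = [a * v ** b for v in driver]
nse = nash_sutcliffe(response, simulated)
```

An NSE of 1 indicates a perfect match and 0 indicates a model no better than the observed mean, so the range reported in the abstract describes moderate to good fits.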
Elsohaby, Ibrahim; McClure, J Trenton; Riley, Christopher B; Bryanton, Janet; Bigsby, Kathryn; Shaw, R Anthony
2018-02-20
Attenuated total reflectance infrared (ATR-IR) spectroscopy is a simple, rapid and cost-effective method for the analysis of serum. However, the complex nature of serum remains a limiting factor to the reliability of this method. We investigated the benefits of coupling centrifugal ultrafiltration with ATR-IR spectroscopy for quantification of human serum IgA concentration. Human serum samples (n = 196) were analyzed for IgA using an immunoturbidimetric assay. ATR-IR spectra were acquired for whole serum samples and for the retentate (residue) reconstituted with saline following 300 kDa centrifugal ultrafiltration. IR-based analytical methods were developed for each of the two spectroscopic datasets, and the accuracy of the two methods was compared. Analytical methods were based upon partial least squares regression (PLSR) calibration models - one with 5 PLS factors (for whole serum) and the second with 9 PLS factors (for the reconstituted retentate). Comparison of the two sets of IR-based analytical results to reference IgA values revealed improvements in the Pearson correlation coefficient (from 0.66 to 0.76), and the root mean squared error of prediction in IR-based IgA concentrations (from 102 to 79 mg/dL) for the ultrafiltration retentate-based method as compared to the method built upon whole serum spectra. Depleting human serum low molecular weight proteins using a 300 kDa centrifugal filter thus enhances the accuracy of IgA quantification by ATR-IR spectroscopy. Further evaluation and optimization of this general approach may ultimately lead to routine analysis of a range of high molecular-weight analytical targets that are otherwise unsuitable for IR-based analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Gilliot, Jean-Marc; Vaudour, Emmanuelle; Michelin, Joël
2016-04-01
This study was carried out in the framework of the PROSTOCK-Gessol3 project supported by the French Environment and Energy Management Agency (ADEME), the TOSCA-PLEIADES-CO project of the French Space Agency (CNES) and the SOERE PRO network working on environmental impacts of Organic Waste Products recycling on field crops at long time scale. The organic matter is an important soil fertility parameter and previous studies have shown the potential of spectral information measured in the laboratory or directly in the field using field spectro-radiometer or satellite imagery to predict the soil organic carbon (SOC) content. This work proposes a method for a spatial prediction of bare cultivated topsoil SOC content, from Unmanned Aerial Vehicle (UAV) multispectral imagery. An agricultural plot of 13 ha, located in the western region of Paris France, was analysed in April 2013, shortly before sowing while it was still bare soil. Soils comprised haplic luvisols, rendzic cambisols and calcaric or colluvic cambisols. The UAV platform used was a fixed wing provided by Airinov® flying at an altitude of 150m and was equipped with a four channels multispectral visible near-infrared camera MultiSPEC 4C® (550nm, 660nm, 735 nm and 790 nm). Twenty three ground control points (GCP) were sampled within the plot according to soils descriptions. GCP positions were determined with a centimetric DGPS. Different observations and measurements were made synchronously with the drone flight: soil surface description, spectral measurements (with ASD FieldSpec 3® spectroradiometer), roughness measurements by a photogrammetric method. Each of these locations was sampled for both soil standard physico-chemical analysis and soil water content. A Structure From Motion (SFM) processing was done from the UAV imagery to produce a 15 cm resolution multispectral mosaic using the Agisoft Photoscan® software. 
The SOC content was modelled by partial least squares regression (PLSR) between the laboratory analyses and the multispectral information for the 23 sampled locations. The root mean squared error of cross validation (RMSECV) by the leave-one-out (LOO) method was 1.97 g of OC per kg of soil. A second version of the model, corrected for the effects of moisture and roughness on reflectance, improved the quality of the prediction by 18%, giving an RMSECV of 1.61 g/kg. The model was finally spatialized over the whole plot using ArcGIS® by applying the regression formula to all mosaic pixels. Results are discussed in the light of an additional sampling campaign carried out in October 2015, providing 34 independent samples.
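The leave-one-out RMSECV reported here can be sketched as follows. To keep the example short, a one-band least-squares line stands in for the PLSR model, which is an assumption and not the authors' setup; the function names and the reflectance and SOC values are hypothetical. The last line only re-derives the relative improvement from the two RMSECV values quoted in the abstract.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def loo_rmsecv(xs, ys):
    """Hold out each sample once, refit on the rest, pool squared errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical reflectance in one band and SOC in g/kg.
reflectance = [0.10, 0.15, 0.20, 0.25, 0.30]
soc_exact = [20.0, 18.0, 16.0, 14.0, 12.0]   # perfectly linear
soc_noisy = [20.0, 17.0, 16.0, 15.0, 12.0]   # with scatter

rmsecv_exact = loo_rmsecv(reflectance, soc_exact)
rmsecv_noisy = loo_rmsecv(reflectance, soc_noisy)

# Relative improvement implied by the two RMSECV values in the abstract.
improvement = (1.97 - 1.61) / 1.97
```

With only 23 samples, leave-one-out is a natural choice because every sample is used for both training and validation, although it tends to give more optimistic error estimates than a fully independent test set.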
Prediction of iron oxide contents using diffuse reflectance spectroscopy
NASA Astrophysics Data System (ADS)
Marques, José, Jr.; Arantes Camargo, Livia
2015-04-01
Determining soil iron oxides using conventional analysis is relatively unfeasible when large areas are mapped with the aim of characterizing spatial variability. Diffuse reflectance spectroscopy (DRS) is rapid, less expensive, non-destructive and sometimes more accurate than conventional analysis. Furthermore, this technique allows the simultaneous characterization of many soil attributes with agronomic and environmental relevance. This study aims to assess the capability of DRS to predict the content of the iron oxides hematite and goethite and to characterize their spatial variability in soils of Brazil. Soil samples collected from an 800-hectare area were scanned in the visible and near-infrared spectral range. Chemometric calibration was obtained through partial least-squares regression (PLSR). Spatial distribution maps of the attributes were then constructed from the values predicted by the calibrated models using geostatistical methods. The studied area presented soils with varied contents of iron oxides, for example Oxisols and Entisols. The spectra show that reflectance decreases as the content of iron oxides in the soil increases. Soils with a high content of iron oxides show more pronounced concavities between 380 and 1100 nm, which are characteristic of the presence of these oxides. Soils with higher reflectance show concavities characteristic of kaolinite, in agreement with the low iron contents of those soils. The best accuracy of the prediction models [residual prediction deviation (RPD) = 1.7] was obtained for goethite within the visible region (380-800 nm), and for hematite (RPD = 2.0) within the visible and near-infrared region (380-2300 nm). The maps of predicted goethite and hematite showed a spatial distribution pattern similar to the maps of clay and of iron extracted by dithionite-citrate-bicarbonate, consistent with the iron oxide contents of the soils present in the study area. 
These results confirm the value of DRS in the mapping of iron oxides in large areas at detailed scale.
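The abstract reports RPD values without stating the formula. A minimal sketch, assuming the common definition (standard deviation of the reference values divided by the root-mean-square error of prediction); the function name and the numbers are hypothetical and not taken from the study:

```python
import math
import statistics


def rpd(reference, predicted):
    """Residual prediction deviation, assumed here to be SD(reference) / RMSEP."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    sd = statistics.stdev(reference)  # sample standard deviation of lab values
    rmsep = math.sqrt(
        sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    )
    return sd / rmsep


# Hypothetical validation data (illustrative only, not from the study).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]
rpd_value = rpd(reference, predicted)
print(round(rpd_value, 2))
```

Under this definition a larger RPD means the prediction error is small relative to the natural spread of the measured property.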
NASA Astrophysics Data System (ADS)
Yang, Yue; Wu, Yongjiang; Li, Weili; Liu, Xuesong; Zheng, Jiyu; Zhang, Wentao; Chen, Yong
2018-02-01
Near infrared (NIR) spectroscopy coupled with chemometrics was used to discriminate the geographical origin of Herba Epimedii in this work. Four different classification models, namely discriminant analysis (DA), back propagation neural network (BPNN), K-nearest neighbor (KNN), and support vector machine (SVM), were constructed, and their performances in terms of recognition accuracy were compared. The results indicated that the SVM model was superior to the other models in identifying the geographical origin of Herba Epimedii. The recognition rates of the optimum SVM model were up to 100% for the calibration set and 94.44% for the prediction set. In addition, the feasibility of NIR spectroscopy with the CARS-PLSR calibration model for predicting the icariin content of Herba Epimedii was also investigated. The determination coefficient (R²P) and root-mean-square error (RMSEP) for the prediction set were 0.9269 and 0.0480, respectively. It can be concluded that the NIR spectroscopy technique in combination with chemometrics has great potential for determining the geographical origin and icariin content of Herba Epimedii. This study can provide a valuable reference for rapid quality control of food products.
Vongsvivut, Jitraporn; Heraud, Philip; Gupta, Adarsha; Puri, Munish; McNaughton, Don; Barrow, Colin J
2013-10-21
The increase in polyunsaturated fatty acid (PUFA) consumption has prompted research into alternative resources other than fish oil. In this study, a new approach based on focal-plane-array Fourier transform infrared (FPA-FTIR) microspectroscopy and multivariate data analysis was developed for the characterisation of some marine microorganisms. Cell and lipid compositions in lipid-rich marine yeasts collected from the Australian coast were characterised in comparison to a commercially available PUFA-producing marine fungoid protist, thraustochytrid. Multivariate classification methods provided good discriminative accuracy evidenced from (i) separation of the yeasts from thraustochytrids and distinct spectral clusters among the yeasts that conformed well to their biological identities, and (ii) correct classification of yeasts from a totally independent set using cross-validation testing. The findings further indicated additional capability of the developed FPA-FTIR methodology, when combined with partial least squares regression (PLSR) analysis, for rapid monitoring of lipid production in one of the yeasts during the growth period, which was achieved at a high accuracy compared to the results obtained from the traditional lipid analysis based on gas chromatography. The developed FTIR-based approach when coupled to programmable withdrawal devices and a cytocentrifugation module would have strong potential as a novel online monitoring technology suited for bioprocessing applications and large-scale production.
Profiling Taste and Aroma Compound Metabolism during Apricot Fruit Development and Ripening
Xi, Wanpeng; Zheng, Huiwen; Zhang, Qiuyun; Li, Wenhui
2016-01-01
Sugars, organic acids and volatiles of apricot were determined by HPLC and GC-MS during fruit development and ripening, and the key taste and aroma components were identified by integrating flavor compound contents with consumers’ evaluation. Sucrose and glucose were the major sugars in apricot fruit. The contents of all sugars increased rapidly, and the accumulation pattern of sugars shifted from glucose-predominated to sucrose-predominated during fruit development and ripening. Sucrose synthase (SS), sorbitol oxidase (SO) and sorbitol dehydrogenase (SDH) are under tight developmental control and they might play important roles in sugar accumulation. Almost all organic acids identified increased during early development and then decreased rapidly. During early development, fruit mainly accumulated quinate and malate, with citrate increasing after maturation, and quinate, malate and citrate were the predominant organic acids at the ripening stage. The odor activity values (OAV) of aroma volatiles showed that 18 aroma compounds were the characteristic components of apricot fruit. Aldehydes and terpenes decreased significantly during the whole development period, whereas lactones and apocarotenoids significantly increased with fruit ripening. The partial least squares regression (PLSR) results revealed that β-ionone, γ-decalactone, sucrose and citrate are the key characteristic flavor factors contributing to consumer acceptance. Carotenoid cleavage dioxygenases (CCD) may be involved in β-ionone formation in apricot fruit. PMID:27347931
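The odor activity value mentioned in the abstract is conventionally the ratio of a compound's concentration to its odor threshold. The sketch below assumes that definition and the common OAV ≥ 1 cutoff for "characteristic" compounds, neither of which is spelled out in the abstract; compound names, concentrations and thresholds are placeholders:

```python
def odor_activity_values(concentrations, thresholds):
    """Return OAV = concentration / odor threshold for each compound."""
    return {name: concentrations[name] / thresholds[name] for name in concentrations}


# Placeholder values in the same (arbitrary) concentration unit; not measured data.
concentrations = {"compound_A": 120.0, "compound_B": 3.0, "compound_C": 50.0}
thresholds = {"compound_A": 10.0, "compound_B": 6.0, "compound_C": 50.0}

oav = odor_activity_values(concentrations, thresholds)
# Assumed convention: compounds with OAV >= 1 contribute to the aroma.
active = sorted(name for name, value in oav.items() if value >= 1.0)
print(active)
```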
NASA Astrophysics Data System (ADS)
Oropeza, D.
2016-12-01
A highly innovative laser ablation sampling instrument (J200 Tandem LA - LIBS) that combines the capabilities and analytical benefits of LIBS, LA-ICP-MS and LA-ICP-OES was used for micrometer-scale, spatially-resolved, elemental analysis of a wide variety of samples of geological interest. Data were collected using ablation systems consisting of a nanosecond laser (Nd:YAG operated at 266 nm) and femtosecond lasers (1030 and 343 nm). An ICCD LIBS detector and a quadrupole-based mass spectrometer were selected for LIBS and ICP-MS detection, respectively. This tandem instrument allows simultaneous determination of major and minor elements (for example, Si, Ca, Na, and Al) and trace elements (such as Li, Ce, Cr, Sr, Y, Zn, and Zr, among others). The research also focused on elemental mapping and calibration strategies, specifically the use of emission and mass spectra for multivariate data analysis. Partial Least Square Regression (PLSR) is shown to minimize and compensate for matrix effects in the emission and mass spectra, improving quantitative analysis by LIBS and LA-ICP-MS, respectively. The study provides a benchmark to evaluate analytical results for more complex geological sample matrices.
Farrés, Mireia; Piña, Benjamí; Tauler, Romà
2016-08-01
Copper-containing fungicides are used to protect vineyards from fungal infections. High copper residues in grapes are potentially toxic and affect the microorganisms living in vineyards, such as Saccharomyces cerevisiae. In this study, the response of the metabolic profiles of S. cerevisiae at different concentrations of copper sulphate (control, 1 mM, 3 mM and 6 mM) was analysed by liquid chromatography coupled to mass spectrometry (LC-MS) and multivariate curve resolution-alternating least squares (MCR-ALS) using an untargeted metabolomics approach. Peak areas of the MCR-ALS resolved elution profiles in control and in Cu(II)-treated samples were compared using partial least squares regression (PLSR) and PLS-discriminant analysis (PLS-DA), and the intracellular metabolites contributing most to sample discrimination were selected and identified. Fourteen metabolites showed significant concentration changes upon Cu(II) exposure, following a dose-response effect. The observed changes were consistent with the expected effects of Cu(II) toxicity, including oxidative stress and DNA damage. This research confirmed that LC-MS based metabolomics coupled to chemometric methods is a powerful approach for discerning metabolic changes in S. cerevisiae and for elucidating modes of toxicity of environmental stressors, including heavy metals like Cu(II).
NASA Astrophysics Data System (ADS)
Will, R. M.; Li, A.; Glenn, N. F.; Benner, S. G.; Spaete, L.; Ilangakoon, N. T.
2015-12-01
Soil organic carbon distribution and the factors influencing this distribution are important for understanding carbon stores, vegetation dynamics, and the overall carbon cycle. Linking soil organic carbon (SOC) with aboveground vegetation biomass may provide a method to better understand SOC distribution in semiarid ecosystems. The Reynolds Creek Critical Zone Observatory (RC CZO) in Idaho, USA, is approximately 240 square kilometers and is situated in the semiarid Great Basin of the sagebrush-steppe ecosystem. Full waveform airborne lidar data and Next-Generation Airborne Visible/Infrared Imaging Spectrometer (AVIRIS-ng) data collected in 2014 across the RC CZO are used to map vegetation biomass and SOC and then explore the relationships between them. Vegetation biomass is estimated by identifying vegetation species, quantifying their distribution and structure with lidar, and integrating the field-measured biomass. Spectral data from AVIRIS-ng are used to differentiate non-photosynthetic vegetation (NPV) and soil, which are commonly confused in semiarid ecosystems. The information from lidar and AVIRIS-ng is then used to predict SOC by partial least squares regression (PLSR). An uncertainty analysis is provided, demonstrating the applicability of these approaches to improving our understanding of the distribution and patterns of SOC across the landscape.
Moreira, Maria João
2018-01-01
The aim of this study was to evaluate the potential of Fourier transform infrared (FTIR) spectroscopy coupled with chemometric methods to detect fish adulteration. Muscles of Atlantic salmon (Salmo salar) (SS) and salmon trout (Oncorhynchus mykiss) (OM) were mixed in different percentages and transformed into mini-burgers. These were stored at 3 °C, then examined at 0, 72, 160, and 240 h for deteriorative microorganisms. Mini-burgers were submitted to Soxhlet extraction, following which lipid extracts were analyzed by FTIR. The principal component analysis (PCA) described the studied adulteration using four principal components with an explained variance of 95.60%. PCA showed that the absorbance at 721, 1097, 1370, 1464, 1655, 2805, 2935, and 3009 cm−1 may be attributed to biochemical fingerprints related to differences between SS and OM. The partial least squares regression (PLS-R) predicted the presence/absence of adulteration in fish samples of an external set with high accuracy. The proposed methods have the advantage of allowing quick measurements, regardless of the storage time of the adulterated fish. FTIR combined with chemometrics showed that a methodology to identify the adulteration of SS with OM can be established, even when samples are stored for different periods of time. PMID:29621135
Visible luminescence of Dy3+ doped PbF2-Li2O-SrO-ZnO-B2O3 glasses for yellow light applications
NASA Astrophysics Data System (ADS)
Anjaiah, G.; Sasikala, T.; Kistaiah, P.
2018-05-01
The present studies on PLSrZFB glasses doped with various concentrations of Dy3+ ions were carried out through optical absorption, photoluminescence and decay time measurements. The Judd-Ofelt (JO) intensity parameters Ωλ (λ = 2, 4, 6) can be utilized to evaluate the emission properties. The decay curves for the 4F9/2 level were measured; they become non-exponential at higher concentrations (> 0.1 mol%) owing to dipole-dipole energy transfer between Dy3+ ions through cross-relaxation channels. The CIE chromaticity color coordinates were calculated, and all were located in the vicinity of the white region of the chromaticity diagram. The Inokuti-Hirayama model was used to analyze the energy transfer process, and the energy transfer parameters were calculated and discussed.
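For readers unfamiliar with the chromaticity calculation mentioned above: CIE 1931 (x, y) coordinates are obtained by normalizing the tristimulus values X, Y, Z. The sketch below shows only that final normalization step; how the authors integrated their emission spectra to obtain X, Y, Z is not described in the abstract, and the input used here is the equal-energy point rather than measured data:

```python
def chromaticity(X, Y, Z):
    """CIE 1931 chromaticity coordinates from tristimulus values."""
    total = X + Y + Z
    if total <= 0:
        raise ValueError("tristimulus values must sum to a positive number")
    return X / total, Y / total


# Equal tristimulus values give the equal-energy white point (1/3, 1/3).
x, y = chromaticity(1.0, 1.0, 1.0)
print(round(x, 4), round(y, 4))
```

Coordinates close to (1/3, 1/3) lie in the white region of the diagram, which is the sense in which the abstract describes the emission color.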
Perception and Modeling of Affective Qualities of Musical Instrument Sounds across Pitch Registers.
McAdams, Stephen; Douglas, Chelsea; Vempala, Naresh N
2017-01-01
Composers often pick specific instruments to convey a given emotional tone in their music, partly due to their expressive possibilities, but also due to their timbres in specific registers and at given dynamic markings. Of interest to both music psychology and music informatics from a computational point of view is the relation between the acoustic properties that give rise to the timbre at a given pitch and the perceived emotional quality of the tone. Musician and nonmusician listeners were presented with 137 tones produced at a fixed dynamic marking (forte) playing tones at pitch class D# across each instrument's entire pitch range and with different playing techniques for standard orchestral instruments drawn from the brass, woodwind, string, and pitched percussion families. They rated each tone on six analogical-categorical scales in terms of emotional valence (positive/negative and pleasant/unpleasant), energy arousal (awake/tired), tension arousal (excited/calm), preference (like/dislike), and familiarity. Linear mixed models revealed interactive effects of musical training, instrument family, and pitch register, with non-linear relations between pitch register and several dependent variables. Twenty-three audio descriptors from the Timbre Toolbox were computed for each sound and analyzed in two ways: linear partial least squares regression (PLSR) and nonlinear artificial neural net modeling. These two analyses converged in terms of the importance of various spectral, temporal, and spectrotemporal audio descriptors in explaining the emotion ratings, but some differences also emerged. Different combinations of audio descriptors make major contributions to the three emotion dimensions, suggesting that they are carried by distinct acoustic properties. Valence is more positive with lower spectral slopes, a greater emergence of strong partials, and an amplitude envelope with a sharper attack and earlier decay. 
Higher tension arousal is carried by brighter sounds, more spectral variation and more gentle attacks. Greater energy arousal is associated with brighter sounds, with higher spectral centroids and slower decrease of the spectral slope, as well as with greater spectral emergence. The divergences between linear and nonlinear approaches are discussed. PMID:28228741
Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.
2008-01-01
Tectonic geochemical paleolatitude (TGP) models were developed to predict the paleolatitude of petroleum source rock from the geochemical composition of crude oil. The results validate studies designed to reconstruct ancient source rock depositional environments using oil chemistry and tectonic reconstruction of paleogeography from coordinates of the present day collection site. TGP models can also be used to corroborate tectonic paleolatitude in cases where the predicted paleogeography conflicts with the depositional setting predicted by the oil chemistry, or to predict paleolatitude when the present day collection locality is far removed from the source rock, as might occur due to long distance subsurface migration or transport of tarballs by ocean currents. Biomarker and stable carbon isotope ratios were measured for 496 crude oil samples inferred to originate from Upper Jurassic source rock in West Siberia, the North Sea and offshore Labrador. First, a unique, multi-tiered chemometric (multivariate statistics) decision tree was used to classify these samples into seven oil families and infer the type of organic matter, lithology and depositional environment of each organofacies of source rock [Peters, K.E., Ramos, L.S., Zumberge, J.E., Valin, Z.C., Scotese, C.R., Gautier, D.L., 2007. Circum-Arctic petroleum systems identified using decision-tree chemometrics. American Association of Petroleum Geologists Bulletin 91, 877-913]. Second, present day geographic locations for each sample were used to restore the tectonic paleolatitude of the source rock during Late Jurassic time (~150 Ma). Third, partial least squares regression (PLSR) was used to construct linear TGP models that relate tectonic and geochemical paleolatitude, where the latter is based on 19 source-related biomarker and isotope ratios for each oil family. The TGP models were calibrated using 70% of the samples in each family and the remaining 30% of samples were used for model validation. 
Positive relationships exist between tectonic and geochemical paleolatitude for each family. Standard error of prediction for geochemical paleolatitude ranges from 0.9° to 2.6° of tectonic paleolatitude, which translates to a relative standard error of prediction in the range 1.5-4.8%. The results suggest that the observed effect of source rock paleolatitude on crude oil composition is caused by (i) stable carbon isotope fractionation during photosynthetic fixation of carbon and (ii) species diversity at different latitudes during Late Jurassic time. © 2008 Elsevier Ltd. All rights reserved.
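The calibrate-on-70%, validate-on-30% workflow described above can be sketched with a small single-response PLS (PLS1) routine. This is an illustrative pure-Python implementation on synthetic, noise-free data with three made-up predictors, not the authors' software or their 19 biomarker ratios; the prediction error is computed as a plain root-mean-square error, which is an assumption because the abstract does not give its exact formula:

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns means and per-component (w, p, q)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred predictors
    f = [v - y_mean for v in y]                                  # centred response
    comps = []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]  # covariance direction
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                                # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # X loadings
        q = dot(f, t) / tt                                            # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    x_mean, y_mean, comps = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]  # deflate the new sample as well
    return y_hat


def rmsep(model, X, y):
    return (sum((pls1_predict(model, x) - v) ** 2 for x, v in zip(X, y)) / len(y)) ** 0.5


# Synthetic data: 50 samples, 3 hypothetical predictors, exact linear response.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(50)]
y = [2.0 * a - 1.0 * b + 0.5 * c + 3.0 for a, b, c in X]

idx = list(range(50))
random.shuffle(idx)
cal, val = idx[:35], idx[35:]  # 70% calibration, 30% validation

X_cal, y_cal = [X[i] for i in cal], [y[i] for i in cal]
X_val, y_val = [X[i] for i in val], [y[i] for i in val]

err_1 = rmsep(pls1_fit(X_cal, y_cal, 1), X_val, y_val)
err_3 = rmsep(pls1_fit(X_cal, y_cal, 3), X_val, y_val)
print(err_1 > err_3)
```

With as many components as predictors, PLS1 reproduces ordinary least squares, so the validation error on this noise-free example collapses to rounding error; with real data the number of components would normally be chosen by cross-validation.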
NASA Astrophysics Data System (ADS)
Li, Yao-Wang; Li, Bo; He, Jiguo; Qian, Ping
2011-07-01
A database consisting of 214 tripeptides which contain either a His or a Tyr residue was applied to study quantitative structure-activity relationships (QSAR) of antioxidative tripeptides. Partial Least-Squares Regression analysis (PLSR) was conducted using the parameters of each amino acid descriptor set individually, including Divided Physico-chemical Property Scores (DPPS), Hydrophobic, Electronic, Steric, and Hydrogen (HESH), Vectors of Hydrophobic, Steric, and Electronic properties (VHSE), Molecular Surface-Weighted Holistic Invariant Molecular (MS-WHIM), isotropic surface area-electronic charge index (ISA-ECI) and Z-scale, to describe antioxidative tripeptides as X-variables, with antioxidant activities measured by the ferric thiocyanate method as the Y-variable. After elimination of outliers by Hotelling's T² method and residual analysis, six significant models were obtained describing the entire data set. According to cumulative squared multiple correlation coefficients (R²), cumulative cross-validation coefficients (Q²) and relative standard deviation for the calibration set (RSDc), the qualities of models using DPPS, HESH, ISA-ECI, and VHSE descriptors are better (R² > 0.6, Q² > 0.5, RSDc < 0.39) than those of models using MS-WHIM and Z-scale descriptors (R² < 0.6, Q² < 0.5, RSDc > 0.44). Furthermore, the predictive ability of models using the DPPS descriptor is the best among the six descriptor systems (cumulative multiple correlation coefficient for the prediction set (R²ext) > 0.7). It was concluded that DPPS describes the amino acids of antioxidative tripeptides best. The results for the DPPS descriptor reveal that the center amino acid and the N-terminal amino acid are far more important than the C-terminal amino acid for antioxidative tripeptides. 
The hydrophobic (positively related to activity) and electronic (negatively related to activity) properties of the N-terminal amino acid are suggested to be the most important for activity, followed by the hydrogen bond property (positively related to activity) of the center amino acid. The N-terminal amino acid should be a highly hydrophobic and low electronic amino acid (such as Ala, Gly, Val, and Leu); the center amino acid should be an amino acid with a high hydrogen bond property (such as the basic amino acids Arg, Lys, and His). The structural characteristics of antioxidative peptides found in this paper may contribute to further research on the antioxidative mechanism.
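The R² and Q² statistics used above to rank the descriptor sets can be illustrated on a toy problem. In this sketch a one-variable least-squares line stands in for the PLSR model (a deliberate simplification), Q² is computed by leave-one-out cross-validation as 1 − PRESS/SS_tot with SS_tot taken about the overall mean (one common convention; the abstract does not state which was used), and the data are invented:

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r2_and_q2(xs, ys):
    """R² from the full fit and Q² from leave-one-out predictions."""
    n = len(ys)
    my = sum(ys) / n
    ss_tot = sum((y - my) ** 2 for y in ys)
    slope, icpt = fit_line(xs, ys)
    rss = sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict it.
        s, c = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (s * xs[i] + c)) ** 2
    return 1.0 - rss / ss_tot, 1.0 - press / ss_tot


# Invented data with a little scatter around a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
r2, q2 = r2_and_q2(xs, ys)
print(round(r2, 3), round(q2, 3))
```

Q² never exceeds R² for a least-squares fit, which is why a model is usually judged on both: a high R² with a much lower Q² suggests overfitting.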
Herrero, Paula; Sáenz-Navajas, Pilar; Culleré, Laura; Ferreira, Vicente; Chatin, Amelie; Chaperon, Vincent; Litoux-Desrues, François; Escudero, Ana
2016-09-15
Five different methodologies were applied for the quantitative analysis of 86 volatile molecules in 32 Chardonnay and 30 Pinot Noir Champagne white base wines. Sensory characterization was carried out by descriptive analysis. Pinot Noir wines had more constitutive compounds while Chardonnay wines had more discriminant compounds. Only four compounds predominated in Chardonnay wines: 4-vinylphenol, guaiacol, sotolon and 4-methyl-4-mercapto-2-pentanone. Correlation studies and PLSR models were calculated with sensory and chemical variables. For Pinot Noir wines, they were not as revealing as for Chardonnay base wines. Sulfur-related compounds were suggested to be involved in tropical fruit, dried fruit and citric sensory notes. This family of compounds seemed to be responsible for discriminant sensory terms in Champagne base wines. Fermentative compounds (aromatic buffer) were found at significantly higher levels in Pinot Noir wines, which would explain the fact that these wines were more difficult to describe in comparison with Chardonnay base wines. Copyright © 2016 Elsevier Ltd. All rights reserved.
Markiewicz-Keszycka, Maria; Casado-Gavalda, Maria P; Cama-Moncunill, Xavier; Cama-Moncunill, Raquel; Dixit, Yash; Cullen, Patrick J; Sullivan, Carl
2018-04-01
Gluten free (GF) diets are prone to mineral deficiency, thus effective monitoring of the elemental composition of GF products is important to ensure a balanced micronutrient diet. The objective of this study was to test the potential of laser-induced breakdown spectroscopy (LIBS) analysis combined with chemometrics for at-line monitoring of ash, potassium and magnesium content of GF flours: tapioca, potato, maize, buckwheat, brown rice and a GF flour mixture. Concentrations of ash, potassium and magnesium were determined with reference methods and LIBS. PCA was performed and showed potential for discrimination of the six GF flours. For the quantification analysis, PLSR models were developed; R²cal was 0.99 for magnesium and potassium and 0.97 for ash. The study revealed that LIBS combined with chemometrics is a convenient method to quantify concentrations of ash, potassium and magnesium and shows potential for classifying different types of flours. Copyright © 2017 Elsevier Ltd. All rights reserved.
Salter, Ian
2018-01-01
Environmental DNA (eDNA) can be defined as the DNA pool recovered from an environmental sample that includes both extracellular and intracellular DNA. There has been a significant increase in the number of recent studies that have demonstrated the possibility to detect macroorganisms using eDNA. Despite the enormous potential of eDNA to serve as a biomonitoring and conservation tool in aquatic systems, there remain some important limitations concerning its application. One significant factor is the variable persistence of eDNA over natural environmental gradients, which imposes a critical constraint on the temporal and spatial scales of species detection. In the present study, a radiotracer bioassay approach was used to quantify the kinetic parameters of dissolved eDNA (d-eDNA), a component of extracellular DNA, over an annual cycle in the coastal Northwest Mediterranean. Significant seasonal variability in the biological uptake and turnover of d-eDNA was observed, the latter ranging from several hours to over one month. Maximum uptake rates of d-eDNA occurred in summer during a period of intense phosphate limitation (turnover <5 hrs). Corresponding increases in bacterial production and uptake of adenosine triphosphate (ATP) demonstrated the microbial utilization of d-eDNA as an organic phosphorus substrate. Higher temperatures during summer may amplify this effect through a general enhancement of microbial metabolism. A partial least squares regression (PLSR) model was able to reproduce the seasonal cycle in d-eDNA persistence and explained 60% of the variance in the observations. Rapid phosphate turnover and low concentrations of bioavailable phosphate, both indicative of phosphate limitation, were the most important parameters in the model. Abiotic factors such as pH, salinity and oxygen exerted minimal influence. 
The present study demonstrates significant seasonal variability in the persistence of d-eDNA in a natural marine environment that can be linked to the metabolic response of microbial communities to nutrient limitation. Future studies should consider the effect of natural environmental gradients on the seasonal persistence of eDNA, which will be of particular relevance for time-series biomonitoring programs. PMID:29474423
Mendoza, Fernando A; Cichy, Karen A; Sprague, Christy; Goffnett, Amanda; Lu, Renfu; Kelly, James D
2018-01-01
Texture is a major quality parameter for the acceptability of canned whole beans. Prior knowledge of this quality trait before processing would be useful to guide variety development by bean breeders and optimize handling protocols by processors. The objective of this study was to evaluate and compare the predictive power of visible and near infrared reflectance spectroscopy (visible/NIRS, 400-2498 nm) and hyperspectral imaging (HYPERS, 400-1000 nm) techniques for predicting texture of canned black beans from intact dry seeds. Black beans were grown in Michigan (USA) over three field seasons. The samples exhibited phenotypic variability for canned bean texture due to genetic variability and processing practice. Spectral preprocessing methods (i.e. smoothing, first and second derivatives, continuous wavelet transform, and two-band ratios), coupled with a feature selection method, were tested for optimizing the prediction accuracy in both techniques based on partial least squares regression (PLSR) models. Visible/NIRS and HYPERS were effective in predicting texture of canned beans using intact dry seeds, as indicated by their correlation coefficients for prediction (Rpred) and standard errors of prediction (SEP). Visible/NIRS was superior (Rpred = 0.546-0.923, SEP = 7.5-1.9 kg 100 g−1) to HYPERS (Rpred = 0.401-0.883, SEP = 7.6-2.4 kg 100 g−1), which is likely due to the wider wavelength range collected in visible/NIRS. However, a significant improvement was reached in both techniques when the two-band ratios preprocessing method was applied to the data, reducing SEP by at least 10.4% and 16.2% for visible/NIRS and HYPERS, respectively. Moreover, results from using the combination of the three-season data sets based on the two-band ratios showed that visible/NIRS (Rpred = 0.886, SEP = 4.0 kg 100 g−1) and HYPERS (Rpred = 0.844, SEP = 4.6 kg 100 g−1) models were consistently successful in predicting texture over a wide range of measurements. 
Visible/NIRS and HYPERS have great potential for predicting the texture of canned beans; the robustness of the models is impacted by genotypic diversity, planting year and phenotypic variability for canned bean texture used for model building, and hence, robust models can be built based on data sets with high phenotypic diversity in textural properties, and periodically maintained and updated with new data. © 2017 Society of Chemical Industry.
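The two-band ratio preprocessing named above is not defined in the abstract. One plausible reading, sketched here as an assumption, is that every pair of wavelengths contributes one feature equal to the ratio of the reflectances at those two bands; the function name, wavelengths and reflectance values are hypothetical:

```python
from itertools import combinations


def two_band_ratios(wavelengths, reflectance):
    """Return {(wl_i, wl_j): R_i / R_j} for every band pair with i before j."""
    features = {}
    for (i, wi), (j, wj) in combinations(enumerate(wavelengths), 2):
        if reflectance[j] != 0:  # skip undefined ratios
            features[(wi, wj)] = reflectance[i] / reflectance[j]
    return features


# Hypothetical three-band spectrum (wavelengths in nm).
wavelengths = [450, 650, 850]
reflectance = [0.2, 0.4, 0.8]
ratios = two_band_ratios(wavelengths, reflectance)
print(len(ratios))
```

Because the number of pairs grows quadratically with the number of bands, a preprocessing step like this is normally followed by feature selection before regression, which matches the abstract's mention of a feature selection method.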
Validating the accuracy of SO2 gas retrievals in the thermal infrared (8-14 μm)
NASA Astrophysics Data System (ADS)
Gabrieli, Andrea; Porter, John N.; Wright, Robert; Lucey, Paul G.
2017-11-01
Quantifying sulfur dioxide (SO2) in volcanic plumes is important for eruption predictions and public health. Ground-based remote sensing of spectral radiance of plumes contains information on the path-concentration of SO2. However, reliable inversion algorithms are needed to convert plume spectral radiance measurements into SO2 path-concentrations. Various techniques have been used for this purpose. Recent approaches have employed thermal infrared (TIR) imaging between 8 μm and 14 μm to provide two-dimensional mapping of plume SO2 path-concentration, using what might be described as "dual-view" techniques. In this case, the radiance (or its surrogate brightness temperature) is computed for portions of the image that correspond to the plume and compared with spectral radiance obtained for adjacent regions of the image that do not (i.e., "clear sky"). In this way, the contribution that the plume makes to the measured radiance can be isolated from the background atmospheric contribution, this residual signal being converted to an estimate of gas path-concentration via radiative transfer modeling. These dual-view approaches suffer from several issues, mainly the assumption of clear sky background conditions. At this time, the various inversion algorithms remain poorly validated. This paper makes two contributions. Firstly, it validates the aforementioned dual-view approaches, using hyperspectral TIR imaging data. Secondly, it introduces a new method to derive SO2 path-concentrations, which allows for single point SO2 path-concentration retrievals, suitable for hyperspectral imaging with clear or cloudy background conditions. The SO2 amenable lookup table algorithm (SO2-ALTA) uses the MODTRAN5 radiative transfer model to compute radiance for a variety (millions) of plume and atmospheric conditions. Rather than searching this lookup table to find the best fit for each measured spectrum, the lookup table was used to train a partial least square regression (PLSR) model. 
The coefficients of this model are used to invert measured radiance spectra to path-concentration on a pixel-by-pixel basis. In order to validate the algorithms, TIR hyperspectral measurements were carried out by measuring sky radiance when looking through gas cells filled with known amounts of SO2. SO2-ALTA was also tested on retrieving SO2 path-concentrations from the Kīlauea volcano, Hawai'i. For cloud-free conditions, all three techniques worked well. In cases where background clouds were present, then only SO2-ALTA was found to provide good results, but only under low atmospheric water vapor column amounts.
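As an illustration of training a regression on a lookup table and then applying its coefficients to a measured spectrum, the following is a minimal sketch in plain Python. It is not SO2-ALTA: the four-channel "spectra", the absorption signature, the background level and all function names are hypothetical, the lookup table is a toy linear one rather than MODTRAN5 output, and only a single PLS component is extracted.

```python
import math


def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, coef)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores along that direction, then regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    # With one component the regression coefficients reduce to q * w.
    coef = [q * wj for wj in w]
    return x_mean, y_mean, coef


def predict(model, x):
    """Apply the stored coefficients to one spectrum (one pixel)."""
    x_mean, y_mean, coef = model
    return y_mean + sum(c * (xj - xm) for c, xj, xm in zip(coef, x, x_mean))


# Hypothetical toy lookup table: radiance drops linearly with path-concentration.
signature = [0.0, 0.5, 1.0, 0.5]
background = [2.0, 2.0, 2.0, 2.0]
table_y = [0.0, 100.0, 200.0, 300.0, 400.0]
table_X = [[b - 0.001 * c * s for b, s in zip(background, signature)]
           for c in table_y]

model = fit_pls1_one_component(table_X, table_y)
measured = [b - 0.001 * 250.0 * s for b, s in zip(background, signature)]
estimate = predict(model, measured)
```

Because the toy table is exactly linear, one component is enough to recover the concentration used to build `measured`; real radiance spectra would need more components and a proper cross-validation to choose their number.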
Li, Chunhui; Yu, Chuanhua
2013-01-01
To provide a reference for evaluating public non-profit hospitals in the new environment of medical reform, we established a performance evaluation system for public non-profit hospitals. The new “input-output” performance model for public non-profit hospitals is based on four primary indexes (input, process, output and effect) that include 11 sub-indexes and 41 items. The indicator weights were determined using the analytic hierarchy process (AHP) and the entropy weight method. The BP neural network was applied to evaluate the performance of 14 level-3 public non-profit hospitals located in Hubei Province. The most stable BP neural network was produced by comparing different numbers of neurons in the hidden layer and using the “Leave-one-out” Cross Validation method. The performance evaluation system we established for public non-profit hospitals could reflect the basic goal of the new medical health system reform in China. Compared with PLSR, the results indicated that the BP neural network could be used effectively for evaluating the performance of public non-profit hospitals. PMID:23955238
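The entropy weight method mentioned above can be sketched as follows. This is the textbook form of the method, not necessarily the exact variant used in the study: the function name, the sum-to-one column normalization and the example scores are assumptions, and the combination with AHP weights is not shown.

```python
import math


def entropy_weights(matrix):
    """matrix[i][j] is the non-negative score of unit i on indicator j.

    Indicators whose scores differ more across units carry more information
    (lower entropy) and therefore receive a larger weight.
    Assumes at least one indicator is not constant across units.
    """
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            share = value / total
            if share > 0:
                entropy -= k * share * math.log(share)
        divergences.append(1.0 - entropy)
    scale = sum(divergences)
    return [d / scale for d in divergences]


# Hypothetical scores for three hospitals on three indicators.
scores = [
    [5.0, 1.0, 3.0],
    [5.0, 2.0, 3.5],
    [5.0, 7.0, 2.5],
]
weights = entropy_weights(scores)
```

In this toy example the first indicator is identical for every hospital, so its weight is essentially zero, while the widely spread second indicator dominates.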
Contribution of low molecular weight phenols to bitter taste and mouthfeel properties in red wines.
Gonzalo-Diago, Ana; Dizy, Marta; Fernández-Zurbano, Purificación
2014-07-01
The aim of this study was to explore the relationship between low molecular weight compounds present in wines and their sensory contribution. Six young red wines were fractionated by gel permeation chromatography and subsequently each fraction obtained was separated from sugars and acids by solid phase extraction. Wines and both fractions were in-mouth evaluated by a trained sensory panel and UPLC-MS analyses were performed. The lack of ethanol and proanthocyanidins greatly increased the acidity perceived. The elimination of organic acids enabled the description of the samples, which were evaluated as bitter, persistent and slightly astringent. Coutaric acid and quercetin-3-O-rutinoside appear to be relevant astringent compounds in the absence of proanthocyanidins. Bitter taste was highly correlated with the in-mouth persistence. A significant predictive model for bitter taste was built by means of PLSR. Further research must be carried out to validate the sensory contribution of the compounds involved in bitterness and astringency and to verify the sensory interactions observed. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Nousratpour, A.
2011-12-01
The annual CO2 emission from soils corresponds to a large portion of the global carbon cycle and equals 10 percent of the total atmospheric carbon pool. The total forest soil CO2 loss equals the sum of contributions from autotrophic and heterotrophic organisms. The autotrophic respiration is derived from recent photosynthates from the forest canopy and exudates via the roots. The heterotrophic respiration is less directly dependent on root presence and recently assimilated photosynthates, which points to the possibility of separate mechanisms governing the CO2 emissions. The variation of the CO2 flux from these somewhat overlapping sources in the soil, i.e., rhizospheric and non-rhizospheric, is still not fully understood. Soil temperature and water availability in particular have often been used to explain the variation of soil CO2 efflux by using regression methods. In this experiment around 1000 hours of soil CO2-emission rates from a drained spruce forest were collected from 6 plots, among which 3 were previously root excluded. The emission rates were collected during 5 campaigns throughout the growing season along with continuous above ground and below ground temperature and water properties such as precipitation and VPD (vapor pressure deficit). The resulting matrix was analyzed using the multivariate statistical model PLSR (Partial Least Squares regression). This operation reduces the dimensionality of large datasets with probable multicollinearity and helps clarify the dependence of a response factor on x-variables. In addition, a time series analysis is applied to the dataset to address the time lag between below ground temperature and water properties and the above ground weather conditions such as VPD and air temperature. Mean carbon emission from the control plots (428 mg Carbon m-2 hr-1) was significantly larger than that from the root excluded plots (136 mg Carbon m-2 hr-1). 
During the growing season more than 2/3 of the total CO2 release was estimated to be root contribution. The results show that the activity in the rhizosphere increased with rising soil temperature, VPD and ground water depletion until a certain point. When the level of ground water depth was deeper than about 0.5 m the dependence was reversed. This effect was either the opposite or lacking in the root excluded plots, which reflects the involvement of the tree roots and the separate factors controlling the different sources of CO2.
NASA Astrophysics Data System (ADS)
Eisele, Andreas; Chabrillat, Sabine; Lau, Ian; Hecker, Christoph; Hewson, Robert; Carter, Dan; Wheaton, Buddy; Ong, Cindy; Cudahy, Thomas John; Kaufmann, Hermann
2014-05-01
Digital soil mapping by means of passive remote sensing basically relies on the soils' spectral characteristics and an appropriate atmospheric window, where electromagnetic radiation transmits without significant attenuation. Traditionally the atmospheric window in the solar-reflective wavelength region (visible, VIS: 0.4 - 0.7 μm; near infrared, NIR: 0.7 - 1.1 μm; shortwave infrared, SWIR: 1.1 - 2.5 μm) has been used to quantify soil surface properties. However, semi-arid soils typically have a coarse, quartz-rich texture and iron coatings that can limit the prediction of soil surface properties. In this study we investigated the potential of the atmospheric window in the thermal wavelength region (long wave infrared, LWIR: 8 - 14 μm) to predict soil surface properties such as the grain size distribution (texture) and the organic carbon content (SOC) for coarse-textured soils from the Australian wheat belt region. This region suffers soil loss due to wind erosion processes, and large scale monitoring techniques, such as remote sensing, are urgently required to observe the dynamic changes of such soil properties. The coarse textured sandy soils of the investigated area require methods that can measure the special spectral response of the quartz dominated mineralogy with iron oxide enriched grain coatings. By comparison, spectroscopy using the solar-reflective region is limited in its ability to discriminate such arid soil mineralogy and associated coatings. Such monitoring is important for observing potential desertification trends associated with coarsening of topsoil texture and reduction in SOC. In this laboratory study we identified the relevant LWIR wavelengths to predict these soil surface properties. 
The results showed the ability of multivariate analysis methods (PLSR) to predict these soil properties from the soil's spectral signature, where the texture parameters (clay and sand content) could be predicted well in the models using the LWIR window (sand content: R2 = 0.84 and RMSECV = 1.09 %, and for clay content: R2 = 0.77 and RMSECV = 1.0 %, both with 3 factor models). In comparison, the quantification from the solar-reflective window showed its limitations in its relatively complex PLSR models and lower prediction accuracy (sand content: R2 = 0.69 and RMSECV = 1.5 % with 7 factors, and for clay content: R2 = 0.64 and RMSECV = 1.26 % with 9 factors). The prediction of the SOC content, on the other hand, showed minor disparity between the two atmospheric windows (LWIR: R2 = 0.73 and RMSECV = 0.1 % with 6 factors, VNIR-SWIR: R2 = 0.69 and RMSECV = 0.11 %, with 9 factors). The prospect of the LWIR for determining soil texture was demonstrated to be even more impressive when reduced to the spectral band specifications of airborne (TASI-600) and spaceborne (ASTER) sensors. The results demonstrate the high potential of the LWIR to detect and quantify soil surface properties in future monitoring via LWIR hyperspectral remote sensing.
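The RMSECV figures quoted above are root-mean-square errors of cross-validation. A minimal sketch of how such a number is computed is given below, assuming leave-one-out cross-validation (the abstract does not state which scheme was used) and using a one-variable least-squares line as a stand-in for the PLSR model; the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def rmsecv_leave_one_out(xs, ys):
    """Refit without each sample in turn and predict the sample left out."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(ys[i] - (slope * xs[i] + intercept))
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Each sample is predicted by a model that never saw it, so RMSECV is expressed in the units of the predicted property and is usually larger than the calibration error.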
NASA Astrophysics Data System (ADS)
Tsakiridis, Nikolaos L.; Tziolas, Nikolaos; Dimitrakos, Agathoklis; Galanis, Georgios; Ntonou, Eleftheria; Tsirika, Anastasia; Terzopoulou, Evangelia; Kalopesa, Eleni; Zalidis, George C.
2017-09-01
Soil Spectral Libraries facilitate agricultural production taking into account the principles of a low-input sustainable agriculture and provide more valuable knowledge to environmental policy makers, enabling improved decision making and effective management of natural resources in the region. In this paper, the predictive performance of two state-of-the-art algorithms employed in soil spectroscopy, one linear (Partial Least Squares Regression) and one non-linear (Cubist), is compared. The comparison was carried out in a regional Soil Spectral Library developed in the Eastern Macedonia and Thrace region of Northern Greece, comprised of roughly 450 Entisol soil samples from soil horizons A (0-30 cm) and B (30-60 cm). The soil spectra were acquired in the visible-near infrared region (vis-NIR, 350-2500 nm) using a standard protocol in the laboratory. Three soil properties, which are essential for agriculture, were analyzed and taken into account for the comparison. These were the Organic Matter, the Clay content and the concentration of nitrate-N. Additionally, three different spectral pre-processing techniques were utilized, namely the continuum removal, the absorbance transformation, and the first derivative. Following the removal of outliers using the Mahalanobis distance in the first 5 principal components of the spectra (accounting for 99.8% of the variance), a five-fold cross-validation experiment was considered for all 12 datasets. Statistical comparisons were conducted on the results, which indicate that the Cubist algorithm outperforms PLSR, while the most informative transformation is the first derivative.
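Two of the pre-processing steps named above, the absorbance transformation and the first derivative, can be sketched in a few lines. This is only an illustration: the function names are hypothetical, absorbance is taken as log10(1/R), and the derivative is a plain finite difference, whereas the study may have used a smoothed derivative; continuum removal and the Mahalanobis outlier step are not shown.

```python
import math


def absorbance(reflectance):
    """Apparent absorbance A = log10(1 / R) for reflectance values in (0, 1]."""
    return [math.log10(1.0 / r) for r in reflectance]


def first_derivative(values, wavelengths):
    """Finite-difference derivative between neighbouring wavelengths."""
    return [
        (values[i + 1] - values[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(values) - 1)
    ]
```

The derivative removes additive baseline offsets between spectra, which is one common explanation for why derivative spectra often model better than raw reflectance.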
Chang, Xing; Jia, Hongmei; Zhou, Chao; Zhang, Hongwu; Yu, Meng; Yang, Junshan; Zou, Zhongmei
2015-12-01
Chaihu-Shu-Gan-San (CSGS) is a classical traditional Chinese medicine formula for the treatment of depression. As one of the single herbs in CSGS, Bai-Shao displayed an antidepressant effect. In order to explore the role of Bai-Shao in the antidepressant effect of CSGS, the metabolic regulation and chemical profiles of CSGS with and without Bai-Shao (QBS) were investigated using metabonomics integrated with chemical fingerprinting. At first, partial least squares regression (PLSR) analysis was applied to characterize the potential biomarkers associated with chronic unpredictable mild stress (CUMS)-induced depression. Among 46 differential metabolites found in the ultra-performance liquid chromatography quadrupole time of flight mass spectrometry (UPLC-Q-TOF/MS) and 1H NMR-based urinary metabonomics, 20 were significantly correlated with the preferred sucrose consumption observed in behavior experiments and were considered as biomarkers to evaluate the antidepressant effect of CSGS. Based on the differential regulation of CUMS-induced metabolic disturbances by CSGS and QBS treatments, we concluded that Bai-Shao made a crucial contribution to CSGS in the improvement of the metabolic deviations of six biomarkers (i.e., glutamate, acetoacetic acid, creatinine, xanthurenic acid, kynurenic acid, and N-acetylserotonin) disturbed in CUMS-induced depression. The chemical constituents that Bai-Shao contributed to CSGS were paeoniflorin, albiflorin, isomaltopaeoniflorin, and benzoylpaeoniflorin, based on the multivariate analysis of the UPLC-Q-TOF/MS chemical profiles from CSGS and QBS extracts. These findings suggested that Bai-Shao played an indispensable role in the antidepressant effect of CSGS. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Vaudour, Emmanuelle; Gomez, Cécile; Fouad, Youssef; Gilliot, Jean-Marc; Lagacherie, Philippe
2017-04-01
This study aimed at exploring the potential of SENTINEL-2 (S2A) multispectral satellite images for predicting several topsoil properties in two contrasted environments: a temperate region marked by intensive annual crop cultivation and soils derived from either loess or colluvium and/or marine limestone or chalk for one part (Versailles Plain, 221 km2), and a Mediterranean region marked by vineyard cultivation and soils derived from either lacustrine limestone, calcareous sandstones, colluvium, or alluvial deposits (La Peyne catchment, 48 km2) for the other part. Two S2A images (acquired in mid-March 2016 over each site) were atmospherically corrected. Then NDVI was computed and thresholded (0.35) in order to extract bare soils. Prediction models of soil properties based on partial least squares regressions (PLSR) were built from S2A spectra of 72 and 143 sampling locations in the Versailles Plain and La Peyne catchment, respectively. Ten soil properties were investigated in both regions: pH, cation exchange capacity (CEC), five texture fractions (clay, coarse silt, fine silt, coarse sand and fine sand), iron, calcium carbonate and soil organic carbon (SOC) in the tilled horizon. Predictive abilities were studied according to R_cv2 and ratio of performance to deviation (RPD). Intermediate to near intermediate performances of prediction (R_cv2 and RPD between 0.28-0.70 and 1.19-1.85 respectively) were obtained for 6 topsoil properties: clay, iron, SOC, CEC, pH, coarse silt. In the Versailles Plain, 5 out of these properties could be predicted (by decreasing performance, CEC, SOC, pH, clay, coarse silt), while there were 4 predictable properties for La Peyne catchment (Iron, clay, CEC, coarse silt). The amount in coarse fragment content appeared to impact prediction error for iron content over La Peyne, while it influenced prediction error for SOC content over the Versailles Plain along with calcium carbonate content. 
A spatial structure of the estimated soil properties for bare soils pixels was highlighted, which promises further improvements in spatial prediction models for these properties. This work was carried out in the framework of both the TOSCA-CES "Cartographie Numérique des sols" and the PLEIADES-CO projects of the French Space Agency (CNES).
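The bare-soil extraction step described above (NDVI computed, then thresholded at 0.35) can be sketched as follows. The function names are hypothetical, and treating pixels with NDVI below the threshold as bare soil is an assumption about the direction of the test, which the abstract does not spell out.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def bare_soil_mask(nir_band, red_band, threshold=0.35):
    """True where a pixel is treated as bare soil (NDVI below the threshold)."""
    return [ndvi(n, r) < threshold for n, r in zip(nir_band, red_band)]


# Hypothetical reflectances: a soil-like pixel and a vegetated pixel.
mask = bare_soil_mask([0.30, 0.50], [0.25, 0.08])
```

Only the pixels flagged `True` would then contribute spectra to the PLSR models of topsoil properties.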
Nissen, Lise R; Byrne, Derek V; Bertelsen, Grete; Skibsted, Leif H
2004-11-01
Antioxidative efficiency of extracts of rosemary, green tea, coffee and grape skin in precooked pork patties was investigated during storage under retail conditions (10 days, 4 °C, atmospheric air), using descriptive sensory profiling following reheating and quantitative measurements of hexanal, thiobarbituric acid reactive substances (TBARS) and vitamin E as indicators of lipid oxidation. The initial oxidative status of pork patties (evaluated by ANOVA) showed a significantly lower level of secondary oxidation products and higher levels of vitamin E in patties with extracts incorporated, indicating that the extracts retarded lipid oxidation during processing of the meat. Data analysis for the storage study was based on a qualitative overview of sensory/chemical variation by principal component analysis (PCA), quantitative ANOVA-PLSR for determination of the relationship between design variables (days of chill-storage, extract treatment) and sensory-chemical variables, and PLSR for elucidating the predictive ability of the chemical methods for sensory terms. Lipid oxidation was seen to involve a decrease in perception of meat flavour/odour and a concomitant increase in the off-flavours/odours linseed and rancid. TBARS, hexanal and vitamin E were all significant predictive indices (P<0.05) for the majority of the sensory terms, while vitamin E, through negative correlation with TBARS and hexanal, displayed its antioxidative effect and thus its ability to preserve sensory fresh meat flavour/odour. The effect of the various extracts incorporated in the product was clearly related to the degree of lipid oxidation and an overall ranking of the antioxidative efficiency of extracts in declining order became apparent: Rosemary>Grape skin>Tea>Coffee>Reference. Furthermore, the relation between extracts and vitamin E indicated that the extracts, to some extent, interacted with the vitamin and prevented it from degrading. 
In conclusion, the rosemary extract displayed potential for maintaining sensory eating quality in processed pork products.
Difficulties of biomass estimation over natural grassland
NASA Astrophysics Data System (ADS)
Kertész, Péter; Gecse, Bernadett; Pintér, Krisztina; Fóti, Szilvia; Nagy, Zoltán
2017-04-01
Estimation of biomass amount in grasslands using remote sensing is a challenge due to the high diversity and different phenologies of the constituting plant species. The aim of this study was to estimate the biomass amount (dry weight per area) during the vegetation period of a diverse semi-natural grassland with remote sensing. A multispectral camera (Tetracam Mini-MCA 6) was used with 3 cm ground resolution. The pre-processing method includes noise reduction, the correction for the vignetting effect and the calculation of the reflectance using an Incident Light Sensor (ILS). Calibration was made with an ASD spectrophotometer as reference. To estimate biomass, the Partial Least Squares Regression (PLSR) method was used with 5 bands and NDVI as input variables. Above-ground biomass was cut in 15 quadrats (50×50 cm) as reference. The best prediction was attained in spring (r2=0.94, RMSE: 26.37 g m-2). The average biomass amount was 167 g m-2. The variability of the biomass is mainly determined by the relief, which causes the high and low biomass patches to be stable. The reliability of biomass estimation was negatively affected by the appearance of flowers and by the senescent plant parts during the summer. To determine the effects of the presence of flowers on the biomass estimation, 20 dominant species with visually dominant flowers in the area were selected and the cover of flowers (%) was estimated in permanent plots during measurement campaigns. If the cover of flowers was low (<25%), the biomass amount estimation was successful (r2 > 0.9), while at higher cover of flowers (>30%), the estimation failed (r2 < 0.2). This effect restricts the usage of the remote sensing method to the spring - early summer period in diverse grasslands.
Saraiva, C; Vasconcelos, H; de Almeida, José M M M
2017-01-16
The aim of this work was to investigate the potential of Fourier transform infrared spectroscopy (FTIR) to detect and predict the bacterial load of salmon fillets (Salmo salar) stored at 3, 8 and 30°C under three packaging conditions: air packaging (AP) and two modified atmospheres constituted by a mixture of 50% N2/40% CO2/10% O2 with lemon juice (MAPL) and without lemon juice (MAP). Fresh salmon samples were periodically examined for total viable counts (TVC), specific spoilage organisms (SSO) counts, pH, FTIR and sensory assessment of freshness. Principal components analysis (PCA) allowed identification of the wavenumbers potentially correlated with the spoilage process. Linear discriminant analysis (LDA) of infrared spectral data was performed to support sensory data and to accurately identify samples freshness. The effect of the packaging atmospheres was assessed by microbial enumeration and LDA was used to determine sample packaging from the measured infrared spectra. It was verified that modified atmospheres can decrease significantly the bacterial load of fresh salmon. Lemon juice combined with MAP showed a more pronounced delay in the growth of Brochothrix thermosphacta, Photobacterium phosphoreum, psychrotrophs and H2S producers. Partial least squares regression (PLS-R) allowed estimates of TVC and psychrotrophs, lactic acid bacteria, molds and yeasts, Brochothrix thermosphacta, Enterobacteriaceae, Pseudomonas spp. and H2S producer counts from the infrared spectral data. For TVC, the root mean square error of prediction (RMSEP) value was 0.78 log cfu g-1 for an external set of samples. According to the results, FTIR can be used as a reliable, accurate and fast method for real time freshness evaluation of salmon fillets stored under different temperatures and packaging atmospheres. Copyright © 2016 Elsevier B.V. All rights reserved.
Regional prediction of soil organic carbon content over croplands using airborne hyperspectral data
NASA Astrophysics Data System (ADS)
Vaudour, Emmanuelle; Gilliot, Jean-Marc; Bel, Liliane; Lefebvre, Josias; Chehdi, Kacem
2015-04-01
This study was carried out in the framework of the Prostock-Gessol3 and the BASC-SOCSENSIT projects, dedicated to the spatial monitoring of the effects of exogenous organic matter land application on soil organic carbon storage. It aims at identifying the potential of airborne hyperspectral AISA-Eagle data for predicting the topsoil organic carbon (SOC) content of bare cultivated soils over a large peri-urban area (221 km2) with both contrasted soils and SOC contents, located in the western region of Paris, France. Soils comprise hortic or glossic luvisols, calcaric, rendzic cambisols and colluvic cambisols. Airborne AISA-Eagle data (400-1000 nm, 126 bands) with 1 m-resolution were acquired on 17 April 2013 over 13 tracks which were georeferenced. Tracks were atmospherically corrected using a set of 22 synchronous field spectra of both bare soils, black and white targets and impervious surfaces. Atmospherically corrected track tiles were mosaicked at a 2 m-resolution resulting in a 66 Gb image. A SPOT4 satellite image was acquired the same day in the framework of the SPOT4-Take Five program of the French Space Agency (CNES) which provided it with atmospheric correction. The land use identification system layer (RPG) of 2012 was used to mask non-agricultural areas, then NDVI calculation and thresholding enabled to map agricultural fields with bare soil. All 18 sampled sites known to be bare at this very date were correctly included in this map. A total of 85 sites sampled in 2013 or in the 3 previous years were identified as bare by means of this map. Predictions were made from the mosaic spectra which were related to topsoil SOC contents by means of partial least squares regression (PLSR). Regression robustness was evaluated through a series of 1000 bootstrap data sets of calibration-validation samples. The use of the total sample including 27 sites under cloud shadows led to non-significant results. 
Considering 43 sites outside cloud shadows only, median validation root-mean-square errors (RMSE) were ~4-4.5 g. kg-1. An additional set of 15 samples with bare soils led to similar RMSE values. Such results are only slightly better than those resulting from an earlier study with multispectral satellite images (Vaudour et al., 2013). The influence of soil surface condition and particularly soil roughness is discussed.
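The robustness check described above (1000 bootstrap calibration-validation data sets, summarized by the median validation RMSE) can be sketched as follows. The abstract does not say how the validation samples were drawn, so this sketch assumes out-of-bag validation; a one-variable least-squares line stands in for the PLSR model, and the function names are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_median_rmse(xs, ys, n_rounds=1000, seed=0):
    """Median validation RMSE over bootstrap calibration sets."""
    rng = random.Random(seed)
    n = len(xs)
    rmses = []
    for _ in range(n_rounds):
        # Calibration set: n samples drawn with replacement.
        picked = [rng.randrange(n) for _ in range(n)]
        picked_set = set(picked)
        # Validation set: the samples that were never drawn (out-of-bag).
        held_out = [i for i in range(n) if i not in picked_set]
        # Skip degenerate draws: nothing held out, or one distinct x value.
        if not held_out or len({xs[i] for i in picked}) < 2:
            continue
        slope, intercept = fit_line([xs[i] for i in picked],
                                    [ys[i] for i in picked])
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        rmses.append(math.sqrt(sum(squared) / len(squared)))
    return statistics.median(rmses)
```

Reporting the median over many resamples, rather than a single split, makes the error estimate less sensitive to which sites happen to fall in the validation set.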
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
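The clustering idea behind a Poisson mixture can be illustrated with a small sketch: each component has its own rate, and an observed count is assigned to a component through its posterior probability. The rates, weights and function names below are hypothetical, and the regression part (rates, or in the concomitant variable case the weights, depending on covariates) is not shown.

```python
import math


def poisson_pmf(k, rate):
    """Probability of observing count k under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def mixture_pmf(k, weights, rates):
    """Probability of count k under a finite Poisson mixture."""
    return sum(w * poisson_pmf(k, r) for w, r in zip(weights, rates))


def responsibilities(k, weights, rates):
    """Posterior probability that count k came from each component."""
    joint = [w * poisson_pmf(k, r) for w, r in zip(weights, rates)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component mixture: a low-risk and a high-risk group.
weights, rates = [0.5, 0.5], [1.0, 5.0]
low_count = responsibilities(0, weights, rates)
high_count = responsibilities(8, weights, rates)
```

A count of 0 is attributed mostly to the low-rate component and a count of 8 mostly to the high-rate one, which is the sense in which such a model sorts individuals into low and high risk groups.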
The potential of UAS imagery for soil mapping at the agricultural plot scale
NASA Astrophysics Data System (ADS)
Gilliot, Jean-Marc; Michelin, Joël; Becu, Maxime; Cissé, Moustapha; Hadjar, Dalila; Vaudour, Emmanuelle
2017-04-01
Soil mapping is expensive and time consuming. Airborne and satellite remote sensing data have already been used to predict some soil properties, but Unmanned Aerial Systems (UAS) now make it possible to acquire many images under various field conditions, which favours the development of better prediction models. This study proposes an operational method for spatial prediction of soil properties (organic carbon, clay) at the scale of the agricultural plot by using UAS imagery. An agricultural plot of 28 ha, located in the western region of Paris, France, was studied from March to May 2016. An area of 3.6 ha was delimited within the plot and a total of 16 flights were completed. The UAS platforms used were the eBee fixed wing provided by Sensefly® flying at an altitude from 60m to 130m and the iris+ 3DR® Quadcopter (from 30m to 100m). Two multispectral visible near-infrared cameras were used: the AirInov® MultiSPEC 4C® and the Micasense® RedEdge®. 42 ground control points (GCP) were sampled within the 3.6 ha plot. A centimetric Trimble Geo 7x DGPS was used to determine precise GCP positions. On each GCP the soil horizons were described and the topsoil was sampled for standard physico-chemical analysis. Ground spectral measurements with a Spectral Evolution® SR-3500 spectroradiometer were made synchronously with the drone flights. 22 additional GCP were placed around the 3.6 ha area in order to achieve precise georeferencing. The multispectral mosaics were calculated using the Agisoft Photoscan® software and all mapping processing was done with the ESRI ArcGIS® 10.3 software. The soil properties were estimated by partial least squares regression (PLSR) between the laboratory analyses and the multispectral information of the UAS images, with the PLS package of the R software. The objective was to establish a model that would achieve an acceptable prediction quality using a minimum number of points. 
For this, we tested 5 models with a decreasing number of calibration points: 20, 15, 10, 5 and 3 points. The remaining points were used to validate the models. The point positions were determined on the basis of a soil brightness index map calculated from the UAS image, in order to distribute the points in areas of contrasted brightness. Root Mean Squared Error Prediction (RMSEP) obtained by cross-validation were 1.6 g.kg-1 and 28 g.kg-1 for organic carbon and clay respectively, with 20 points. Results showed ability to obtain acceptable precision (2 g.kg-1 and 48 g.kg-1) with only 3 points. This work was supported by the SolFIT research network of the BASC LabEx (Laboratory of Excellence) and by the TOSCA-PLEIADES-CO project of the French Space Agency (CNES).
Parametric regression model for survival data: Weibull regression model as an example
2016-01-01
The Weibull regression model is one of the most popular parametric regression models because it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature compared with the semi-parametric proportional hazards model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge of the model and then illustrates how to fit it with R software. The SurvRegCensCov package is useful for converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by a categorical variable. The eha package provides an alternative method for fitting the Weibull regression model. The check.dist() function helps to assess goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way to develop the model. Visualizing the Weibull regression model after model development is worthwhile because it provides another way to report findings. PMID:28149846
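The conversion from fitted coefficients to a hazard ratio and an event time ratio can be written out directly. The sketch below assumes the accelerated failure time parameterization log(T) = intercept + coefficient * x + scale * W that R's survreg reports; the function name and the example numbers are hypothetical, and the code is plain arithmetic rather than a call into any R package.

```python
import math


def convert_weibull_aft(intercept, coefficient, scale):
    """Convert AFT-form Weibull estimates to proportional hazards quantities."""
    return {
        # Weibull shape and rate of the baseline hazard.
        "shape": 1.0 / scale,
        "lambda": math.exp(-intercept / scale),
        # Hazard ratio per unit increase in the covariate.
        "hazard_ratio": math.exp(-coefficient / scale),
        # Event time ratio: multiplicative change in time to event.
        "event_time_ratio": math.exp(coefficient),
    }


# Hypothetical fit: the covariate shortens survival time (negative coefficient).
result = convert_weibull_aft(intercept=6.0, coefficient=-0.2, scale=0.5)
```

A negative AFT coefficient gives an event time ratio below 1 and a hazard ratio above 1, so the two statistics describe the same effect from opposite directions.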
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
NASA Astrophysics Data System (ADS)
Tao, Feifei; Mba, Ogan; Liu, Li; Ngadi, Michael
2017-04-01
Polyunsaturated fatty acids (PUFAs) are important nutrients present in Salmon. However, current methods for quantifying the fatty acids (FAs) contents in foods are generally based on gas chromatography (GC) technique, which is time-consuming, laborious and destructive to the tested samples. Therefore, the capability of near-infrared (NIR) hyperspectral imaging to predict the PUFAs contents of C20:2 n-6, C20:3 n-6, C20:5 n-3, C22:5 n-3 and C22:6 n-3 in Salmon fillets in a rapid and non-destructive way was investigated in this work. Mean reflectance spectra were first extracted from the region of interests (ROIs), and then the spectral pre-processing methods of 2nd derivative and Savitzky-Golay (SG) smoothing were performed on the original spectra. Based on the original and the pre-processed spectra, PLSR technique was employed to develop the quantitative models for predicting each PUFA content in Salmon fillets. The results showed that for all the studied PUFAs, the quantitative models developed using the pre-processed reflectance spectra by "2nd derivative + SG smoothing" could improve their modeling results. Good prediction results were achieved with RP and RMSEP of 0.91 and 0.75 mg/g dry weight, 0.86 and 1.44 mg/g dry weight, 0.82 and 3.01 mg/g dry weight for C20:3 n-6, C22:5 n-3 and C20:5 n-3, respectively after pre-processing by "2nd derivative + SG smoothing". The work demonstrated that NIR hyperspectral imaging could be a useful tool for rapid and non-destructive determination of the PUFA contents in fish fillets.
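The idea behind a PLSR calibration like the one described above can be sketched with a single-response, one-component PLS fit. The abstract does not state the algorithm or the number of components used, so everything here (function names, the toy data, the choice of one component) is an assumption for illustration only.

```python
import math

def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, coef)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center predictors and response.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores, then regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    # With one component the regression vector is simply q * w.
    coef = [q * wj for wj in w]
    return x_mean, y_mean, coef

def predict(model, row):
    x_mean, y_mean, coef = model
    return y_mean + sum(c * (x - m) for c, x, m in zip(coef, row, x_mean))

def rmse(actual, predicted):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, predicted)) / len(actual))

# Toy "spectra" with two perfectly collinear bands; y depends on the first band.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [2.0, 4.0, 6.0, 8.0]
model = fit_pls1_one_component(X, y)
fitted = [predict(model, row) for row in X]
print(rmse(y, fitted), predict(model, [5.0, 10.0]))
```

The collinear toy data show why PLSR suits spectra: ordinary least squares would fail on these two bands, while the latent-variable fit still recovers the relationship.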
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
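For orientation, the two fixed-exponent corrections named in this abstract can be written in a few lines. This sketch uses the standard textbook forms (QT divided by the square root or cube root of RR, with RR in seconds); the function names and example values are illustrative and are not taken from the study.

```python
import math

def qtc_bazett(qt_ms, rr_s):
    """Bazett correction: QTc = QT / sqrt(RR), RR in seconds."""
    return qt_ms / math.sqrt(rr_s)

def qtc_fridericia(qt_ms, rr_s):
    """Fridericia correction: QTc = QT / RR**(1/3), RR in seconds."""
    return qt_ms / rr_s ** (1.0 / 3.0)

# Illustrative values only: the same QT at a normal and at a fast heart rate.
for rr in (1.0, 0.64):
    print(rr, qtc_bazett(400.0, rr), qtc_fridericia(400.0, rr))
```

Both formulas impose a fixed functional form on the QT-RR relationship, which is the rigidity the nonparametric smoother in the abstract is meant to avoid.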
Modified Regression Correlation Coefficient for Poisson Regression Model
NASA Astrophysics Data System (ADS)
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study examines widely used indicators of the predictive power of the Generalized Linear Model (GLM), which often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
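The traditional regression correlation coefficient described here, the correlation between Y and E(Y|X), can be sketched as below. The abstract does not give the proposed modification, so this covers only the traditional measure; the log link, coefficient values and data are assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def poisson_mean(beta0, betas, row):
    """E(Y|X) under a Poisson model with log link."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, row)))

def regression_correlation(y, X, beta0, betas):
    """Correlation between observed counts and their fitted means."""
    fitted = [poisson_mean(beta0, betas, row) for row in X]
    return pearson(y, fitted)

# Hypothetical counts and already-estimated coefficients (not from the paper).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1, 3, 7, 20]
r = regression_correlation(y, X, beta0=0.0, betas=[1.0])
print(r)
```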
NASA Astrophysics Data System (ADS)
Chavana-Bryant, C.; Malhi, Y.; Gerard, F.
2015-12-01
Leaf aging is a fundamental driver of changes in leaf traits, thereby regulating ecosystem processes and remotely-sensed canopy dynamics. Leaf age is particularly important for carbon-rich tropical evergreen forests, as leaf demography (leaf age distribution) has been proposed as a major driver of seasonal productivity in these forests. We explore leaf reflectance as a tool to monitor leaf age and develop a novel spectra-based (PLSR) model to predict age using data from a phenological study of 1,072 leaves from 12 lowland Amazonian canopy tree species in southern Peru. Our results demonstrate monotonic decreases in LWC and Pmass and an increase in LMA with age across species; Nmass and Cmass showed monotonic but species-specific age responses. Spectrally, we observed large age-related variation across species, with the most age-sensitive spectral domains found to be: green peak (550 nm), red edge (680-750 nm), NIR (700-850 nm), and around the main water absorption features (~1450 and ~1940 nm). A spectra-based model was more accurate in predicting leaf age (R2 = 0.86; %RMSE = 33) compared to trait-based models using single (R2 = 0.07 to 0.73; %RMSE = 7 to 38) and multiple predictors (step-wise analysis; R2 = 0.76; %RMSE = 28). Spectral and trait-based models established a physiochemical basis for the spectral age model. The relative importance of the traits modifying the leaf spectra of aging leaves was: LWC > LMA > Nmass > Pmass and Cmass. Vegetation indices (VIs), including NDVI, EVI2, NDWI and PRI, were all age-dependent. This study highlights the importance of leaf age as a mediator of leaf traits, provides evidence of age-related leaf reflectance changes that have important impacts on VIs used to monitor canopy dynamics and productivity, and proposes a new approach to predicting and monitoring leaf age with important implications for remote sensing.
Land-use versus natural controls on soil fertility in the Subandean Amazon, Peru.
Lindell, Lina; Aström, Mats; Oberg, Tomas
2010-01-15
Deforestation to amplify the agricultural frontier is a serious threat to the Amazon forest. Strategies to attain and maintain satisfactory soil fertility, which requires knowledge of spatial and temporal changes caused by land-use, are important for reaching sustainable development. This study highlights these issues by evaluating the relative effects of agricultural land-use and natural factors on chemical fertility of Inceptisols on redbed lithologies in the Subandean Amazon. Macro and micronutrients were determined in topsoil and subsoil in the vicinity of two villages at a total of 80 sites including pastures, coffee plantations, swidden fields, secondary forest and, as a reference, adjacent primary forest. Differences in soil fertility between the land cover classes were investigated by principal component analysis (PCA) and partial least squares regression (PLSR). Primary forest soil was found to be chemically similar to that of coffee plantations, pastures and secondary forests. There were no significant differences between soils of these land cover types in terms of plant nutrients (e.g. N, P, K, Ca, Mg, Mo, Mn, Zn, Cu and Co) or other fertility indicators (OM, pH, BS, EC, CECe and exchangeable acidity). The parent material (as indicated by texture and sample geographical origin) and the slope of the sampled sites were stronger controls on soil fertility than land cover type. Elevated concentrations of a few nutrients (NO(3) and K) were, however detected in soils of swidden fields. Despite being fertile (higher CECe, Ca and P) compared to Oxisols and Ultisols in the Amazon lowland, the Subandean soils frequently showed deficiencies in several nutrients (e.g. P, K, NO(3), Cu and Zn), and high levels of free Al at acidic sites. This paper concludes that deforestation and agricultural land-use has not introduced lasting chemical changes in the studied Subandean soils that are significant in comparison to the natural variability. Copyright 2009 Elsevier B.V. 
All rights reserved.
Zhang, Jingcheng; Pu, Ruiliang; Yuan, Lin; Wang, Jihua; Huang, Wenjiang; Yang, Guijun
2014-01-01
Powdery mildew is one of the most serious diseases that have a significant impact on the production of winter wheat. As an effective alternative to traditional sampling methods, remote sensing can be a useful tool in disease detection. This study attempted to use multi-temporal moderate resolution satellite-based data of surface reflectances in blue (B), green (G), red (R) and near infrared (NIR) bands from HJ-CCD (CCD sensor on Huanjing satellite) to monitor disease at a regional scale. In a suburban area in Beijing, China, an extensive field campaign for disease intensity survey was conducted at key growth stages of winter wheat in 2010. Meanwhile, corresponding time series of HJ-CCD images were acquired over the study area. In this study, a number of single-stage and multi-stage spectral features, which were sensitive to powdery mildew, were selected by using an independent t-test. With the selected spectral features, four advanced methods: mahalanobis distance, maximum likelihood classifier, partial least square regression and mixture tuned matched filtering were tested and evaluated for their performances in disease mapping. The experimental results showed that all four algorithms could generate disease maps with a generally correct distribution pattern of powdery mildew at the grain filling stage (Zadoks 72). However, by comparing these disease maps with ground survey data (validation samples), all of the four algorithms also produced a variable degree of error in estimating the disease occurrence and severity. Further, we found that the integration of MTMF and PLSR algorithms could result in a significant accuracy improvement of identifying and determining the disease intensity (overall accuracy of 72% increased to 78% and kappa coefficient of 0.49 increased to 0.59). The experimental results also demonstrated that the multi-temporal satellite images have a great potential in crop diseases mapping at a regional scale. PMID:24691435
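The two map-accuracy figures quoted in this abstract, overall accuracy and the kappa coefficient, are both derived from a confusion matrix. A minimal sketch follows; the matrix is invented for illustration and does not reproduce the study's validation data.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of reference class i mapped to class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals for each class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class map (healthy vs. diseased), 100 validation samples.
acc, kappa = accuracy_and_kappa([[40, 10], [10, 40]])
print(acc, kappa)
```

Kappa discounts the agreement expected by chance, which is why it is lower than overall accuracy for the same map.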
Liu, Yingchun; Liu, Zhongbo; Sun, Guoxiang; Wang, Yan; Ling, Junhong; Gao, Jiayue; Huang, Jiahao
2015-01-01
A combination method of multi-wavelength fingerprinting and multi-component quantification by high performance liquid chromatography (HPLC) coupled with diode array detector (DAD) was developed and validated to monitor and evaluate the quality consistency of herbal medicines (HM) in the classical preparation Compound Bismuth Aluminate tablets (CBAT). The validation results demonstrated that our method met the requirements of fingerprint analysis and quantification analysis with suitable linearity, precision, accuracy, limits of detection (LOD) and limits of quantification (LOQ). In the fingerprint assessments, rather than using conventional qualitative "Similarity" as a criterion, the simple quantified ratio fingerprint method (SQRFM) was recommended, which has an important quantified fingerprint advantage over the "Similarity" approach. SQRFM qualitatively and quantitatively offers the scientific criteria for traditional Chinese medicines (TCM)/HM quality pyramid and warning gate in terms of three parameters. In order to combine the comprehensive characterization of multi-wavelength fingerprints, an integrated fingerprint assessment strategy based on information entropy was set up involving a super-information characteristic digitized parameter of fingerprints, which reveals the total entropy value and absolute information amount about the fingerprints and, thus, offers an excellent method for fingerprint integration. The correlation results between quantified fingerprints and quantitative determination of 5 marker compounds, including glycyrrhizic acid (GLY), liquiritin (LQ), isoliquiritigenin (ILG), isoliquiritin (ILQ) and isoliquiritin apioside (ILA), indicated that multi-component quantification could be replaced by quantified fingerprints. The Fenton reaction was employed to determine the antioxidant activities of CBAT samples in vitro, and they were correlated with HPLC fingerprint components using the partial least squares regression (PLSR) method. 
In summary, the method of multi-wavelength fingerprints combined with antioxidant activities has been proved to be a feasible and scientific procedure for monitoring and evaluating the quality consistency of CBAT.
Tran, Chieu D; Mututuvari, Tamutsiwa M
2016-03-07
A method was developed in which cellulose (CEL) and/or chitosan (CS) were added to keratin (KER) to give the resulting [CEL/CS+KER] composites better mechanical strength and wider utilization. Butylmethylimidazolium chloride ([BMIm+Cl-]), an ionic liquid, was used as the sole solvent, and because the majority of the [BMIm+Cl-] used (at least 88%) was recovered, the method is green and recyclable. FTIR, XRD, 13C CP-MAS NMR and SEM results confirm that KER, CS and CEL remain chemically intact and are distributed homogeneously in the composites. We successfully demonstrate that the widely used method based on the deconvolution of the FTIR bands of amide bonds to determine the secondary structure of proteins is relatively subjective, as the conformation obtained is strongly dependent on the choice of parameters selected for curve fitting. A new method, based on partial least squares regression (PLSR) analysis of the amide bands, was developed and proven to be objective and to provide more accurate information. Results obtained with this method agree well with those from XRD; namely, they indicate that although KER retains its secondary structure when incorporated into the [CEL+CS] composites, it has relatively lower α-helix and higher β-turn and random form content compared to KER in native wool. It seems that during dissolution by [BMIm+Cl-], the inter- and intramolecular forces in KER were broken, thereby destroying its secondary structure. During regeneration, these interactions were reestablished to partially reform the secondary structure. However, in the presence of either CEL or CS, the chains seem to prefer the extended form, thereby hindering reformation of the α-helix. Consequently, the KER in these matrices may adopt structures with a lower content of α-helix and a higher content of β-sheet. As anticipated, results of tensile strength and TGA confirm that adding CEL or CS to KER substantially increases the mechanical strength and thermal stability of the [CS/CEL+KER] composites.
NASA Astrophysics Data System (ADS)
Gabrieli, A.; Wright, R.; Lucey, P. G.; Porter, J. N.
2017-12-01
Detecting and quantifying volcanic carbon dioxide (CO2) and sulfur dioxide (SO2) emissions is of relevance to volcanologists. Changes in the amount and composition of gases that volcanoes emit are related to subsurface magma movements and the probability of eruptions. Volcanic gases and related acidic aerosols are also an important source of atmospheric pollution that creates environmental health hazards for people, animals, plants, and infrastructure. For these reasons, it is important to measure emissions from volcanic plumes during both day and night. We present image measurements of the volcanic plume at Kīlauea volcano, HI, and flux derivation, using a newly developed 8-14 µm hyperspectral imaging spectrometer, the Thermal Hyperspectral Imager (THI). THI acquires images of the scene it views, and a spectrum can be derived for each pixel. Each spectrum contains 50 wavelength samples between 8 and 14 µm, where CO2 and SO2 volcanic gases have diagnostic absorption/emission features respectively at 8.6 and 14 µm. Plume radiance measurements were carried out during both the day and the night by using both the lava lake in the Halema'uma'u crater as a hot source and the sky as a cold background to detect respectively the spectral signatures of volcanic CO2 and SO2 gases. CO2 and SO2 path-concentrations were then obtained from the spectral radiance measurements using a new Partial Least Squares Regression (PLSR)-based inversion algorithm, which was developed as part of this project. Volcanic emission fluxes were determined by combining the path measurements with wind observations derived directly from the images. Time series of volcanic emission fluxes several hours long will be presented, and the rates of SO2 conversion into aerosols will be discussed. The imaging and inversion techniques discussed here are novel, allowing for continuous CO2 and SO2 plume mapping during both day and night.
Porto-Figueira, Priscilla; Figueira, José A; Berenguer, Pedro; Câmara, José S
2018-04-15
The effect of ripening on the evolution of the volatomic pattern from endemic Vaccinium padifolium L. (Uveira) berries was investigated using headspace-solid phase microextraction (HS-SPME) followed by gas chromatography/quadrupole-mass spectrometry (GC-qMS) and multivariate statistical analysis (MVA). The most significant HS-SPME parameters, namely fibre polymer, ionic strength and extraction time, were optimized in order to improve extraction efficiency. Under optimal experimental conditions (DVB/CAR/PDMS fibre coating, 40 °C, 30 min extraction time and 5 g of sample), a total of 72 volatiles of different functionalities were isolated and identified. Terpenes, followed by higher alcohols and esters, were the predominant classes in the ripening stages: green, break and ripe. Although significant differences in the volatomic profiles at the three stages were obtained, cis-β-ocimene (2.0-40.0%), trans-2-hexenol (2.4-19.4%), cis-3-hexenol (2.5-16.4%), β-myrcene (1.9-13.8%), 1-hexanol (1.7-13.6%), 2-hexenal (0.7-8.0%), 2-heptanone (0.7-7.7%), and linalool (1.9-6.1%) were the main volatile compounds identified. Higher alcohols, carboxylic acids and ketones gradually increased during ripening, whereas monoterpenes significantly decreased. These trends were dominated by the higher alcohols (1-hexanol, cis-3-hexenol, trans-2-hexenol) and monoterpenes (β-myrcene, cis-β-ocimene and trans-β-ocimene). Partial least squares regression (PLSR) revealed that ethyl caprylate (1.000), trans-geraniol (0.995), ethyl isovalerate (-0.994) and benzyl carbinol (0.993) are the key variables that contributed most to the successful differentiation of Uveira berries according to ripening stage. To the best of our knowledge, no study has been carried out on the volatomic composition of berries from endemic Uveira. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations
Bollegala, Danushka; Kontonatsios, Georgios; Ananiadou, Sophia
2015-01-01
Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)—a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. 
Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks. PMID:26030738
Spatial distribution of heterocyclic organic matter compounds at macropore surfaces in Bt-horizons
NASA Astrophysics Data System (ADS)
Leue, Martin; Eckhardt, Kai-Uwe; Gerke, Horst H.; Ellerbrock, Ruth H.; Leinweber, Peter
2017-04-01
The illuvial Bt-horizon of Luvisols is characterized by coatings of clay and organic matter (OM) at the surfaces of cracks, biopores and inter-aggregate spaces. The OM composition of the coatings that originate from preferential transport of suspended matter in macropores determines the physico-chemical properties of the macropore surfaces. The analysis of the spatial distribution of specific OM components such as heterocyclic N-compounds (NCOMP) and benzonitrile and naphthalene (BN+NA) could enlighten the effect of macropore coatings on the transport of colloids and reactive solutes during preferential flow and on OM turnover processes in subsoils. The objective was to characterize the mm-to-cm scale spatial distribution of NCOMP and BN+NA at intact macropore surfaces from the Bt-horizons of two Luvisols developed on loess and glacial till. In material manually separated from macropore surfaces the proportions of NCOMP and BN+NA were determined by pyrolysis-field ionization mass spectrometry (Py-FIMS). These OM compounds, likely originating from combustion residues, were found increased in crack coatings and pinhole fillings but decreased in biopore walls (worm burrows and root channels). The Py-FIMS data were correlated with signals from C=O and C=C groups and with signals from O-H groups of clay minerals as determined by Fourier transform infrared spectroscopy in diffuse reflectance mode (DRIFT). Intensive signals of C15 to C17 alkanes from long-chain alkenes as main components of diesel and diesel exhaust particulates substantiated the assumption that burning residues were prominent in the subsoil OM. The spatial distribution of NCOMP and BN+NA along the macropores was predicted by partial least squares regression (PLSR) using DRIFT mapping spectra from intact surfaces and was found closely related to the distribution of crack coatings and pinholes. The results emphasize the importance of clay coatings in the subsoil to OM sorption and stabilization. 
Differences between biopores and cracks suggest differences in the mass transport and OM turnover between these macropore types in Luvisols.
Deng, Qingqiong; Zhou, Mingquan; Wu, Zhongke; Shui, Wuyang; Ji, Yuan; Wang, Xingce; Liu, Ching Yiu Jessica; Huang, Youliang; Jiang, Haiyan
2016-02-01
Craniofacial reconstruction recreates a facial appearance from the cranium, based on the relationship between the face and the skull, to assist identification. However, craniofacial structures are very complex, and this relationship is not the same in different craniofacial regions. Several regional methods have recently been proposed. These methods segment the face and skull into regions and learn the relationship of each region independently; facial regions for a given skull are then estimated and finally glued together to generate a face. Most of these regional methods use vertex coordinates to represent the regions, and they define a uniform coordinate system for all of the regions. Consequently, the inconsistency in the positions of regions between different individuals is not eliminated before learning the relationships between the face and skull regions, and this reduces the accuracy of the craniofacial reconstruction. To solve this problem, an improved regional method is proposed in this paper involving two types of coordinate adjustments. One is a global coordinate adjustment performed on the skulls and faces to eliminate the inconsistency in the position and pose of the heads; the other is a local coordinate adjustment performed on the skull and face regions to eliminate the inconsistency in the position of these regions. After these two coordinate adjustments, partial least squares regression (PLSR) is used to estimate the relationship between the face region and the skull region. To obtain a more accurate reconstruction, a new fusion strategy is also proposed to preserve the reconstructed feature regions when gluing the facial regions together. This is based on the observation that the feature regions usually have smaller reconstruction errors than the rest of the face.
The results demonstrate that the coordinate adjustments and the new fusion strategy can significantly improve the craniofacial reconstructions. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Moderation analysis using a two-level regression model.
Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott
2014-10-01
Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
Li, Jian; Ma, Guowei; Ma, Lin; Bao, Xiaolin; Li, Liping; Zhao, Qian
2018-01-01
Effects of 1-methylcyclopropene (1-MCP) and vacuum precooling on quality and antioxidant properties of blackberries (Rubus spp.) were evaluated using one-way analysis of variance, principal component analysis (PCA), partial least squares (PLS), and path analysis. Results showed that the activities of antioxidant enzymes were enhanced by both 1-MCP treatment and vacuum precooling. PCA could discriminate between 1-MCP treated fruit and vacuum precooled fruit and showed that the radical-scavenging activities in vacuum precooled fruit were higher than those in 1-MCP treated fruit. The PCA scores showed that H2O2 content was the most important variable of blackberry fruit. PLSR results showed that peroxidase (POD) activity negatively correlated with H2O2 content. The results of path coefficient analysis indicated that glutathione (GSH) also had an indirect effect on H2O2 content. PMID:29487622
A portable device for rapid nondestructive detection of fresh meat quality
NASA Astrophysics Data System (ADS)
Lin, Wan; Peng, Yankun
2014-05-01
Quality attributes of fresh meat influence nutritional value and consumers' purchasing power. To meet the demand of inspection departments for a portable device, a rapid and nondestructive detection device for fresh meat quality based on an ARM (Advanced RISC Machines) processor and VIS/NIR technology was designed. The working principle, hardware composition, software system and functional test are introduced. The hardware system consisted of an ARM processing unit, light source unit, detection probe unit, spectral data acquisition unit, LCD (Liquid Crystal Display) touch screen display unit, power unit and cooling unit. A Linux operating system and a quality parameter acquisition and processing application were designed. The system integrates spectral signal collection, storage, display and processing in a device weighing 3.5 kg. Forty pieces of beef were used in the experiment to validate stability and reliability. The results indicated that the prediction model developed using the PLSR method with SNV pre-processing performed well, with a correlation coefficient of 0.90 and root mean square error of 1.56 for the validation set for L*, 0.95 and 1.74 for a*, 0.94 and 0.59 for b*, 0.88 and 0.13 for pH, 0.79 and 12.46 for tenderness, and 0.89 and 0.91 for water content, respectively. The experimental results show that this device can be a useful tool for detecting meat quality.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
NASA Astrophysics Data System (ADS)
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
[Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].
Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L
2017-03-10
To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence in relation to caregivers' recognition of risk signs of diarrhea in their infants, using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of their infants' risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. We also compared the point and interval estimates of this PR, and the convergence of three models (model 1: not adjusting for covariates; model 2: adjusting for duration of caregivers' education; model 3: model 2 further adjusted for distance between village and township and child age in months), between the Bayesian and the conventional log-binomial regression models. All three Bayesian log-binomial regression models converged, and the estimated PRs were 1.130 (95% CI: 1.005-1.265), 1.128 (95% CI: 1.001-1.264) and 1.132 (95% CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95% CI: 1.055-1.206) and 1.126 (95% CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95% CI: 1.051-1.200). The point and interval estimates of the PRs from the three Bayesian models differed slightly from those of the conventional models, but the two approaches were in good agreement. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with fewer convergence failures and has advantages in application over the conventional log-binomial regression model.
Evaluation of weighted regression and sample size in developing a taper model for loblolly pine
Kenneth L. Cormier; Robin M. Reich; Raymond L. Czaplewski; William A. Bechtold
1992-01-01
A stem profile model, fit using pseudo-likelihood weighted regression, was used to estimate merchantable volume of loblolly pine (Pinus taeda L.) in the southeast. The weighted regression increased model fit marginally, but did not substantially increase model performance. In all cases, the unweighted regression models performed as well as the...
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate the odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model that is influenced by the geographical location of the observation site. Parameter estimation in the model is needed to determine population values based on a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probabilities of the categories of the number of dengue fever patients.
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression imposes strict assumptions, whereas nonparametric regression does not require an assumed model form. Time series data are observations of a variable recorded over time, so if time series data are to be modeled by regression, the response and predictor variables must be determined first. The response variable in a time series is the variable at time t (yt), while the predictor variables are its significant lags. In nonparametric regression modeling, one developing approach is the Fourier series approach. One advantage of nonparametric regression using Fourier series is its ability to handle data with a trigonometric pattern. Modeling with Fourier series requires a parameter K, whose value can be determined using the Generalized Cross Validation method. Modeling inflation for the transportation, communication and financial services sector using Fourier series yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
Applying Kaplan-Meier to Item Response Data
ERIC Educational Resources Information Center
McNeish, Daniel
2018-01-01
Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…
ERIC Educational Resources Information Center
Liou, Pey-Yan
2009-01-01
The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G
2011-06-28
We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. 
With random regression models, the highest gains in accuracy were obtained at ages with a low number of weight records. The results indicate that random regression models provide more accurate expected breeding values than the traditionally finite multi-trait models. Thus, higher genetic responses are expected for beef cattle growth traits by replacing a multi-trait model with random regression models for genetic evaluation. B-spline functions could be applied as an alternative to Legendre polynomials to model covariance functions for weights from birth to mature age.
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing results to those obtained using an interaction term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Modeling absolute differences in life expectancy with a censored skew-normal regression approach
Clough-Gorr, Kerri; Zwahlen, Marcel
2015-01-01
Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negative skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
2018-06-29
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)
NASA Astrophysics Data System (ADS)
Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul
2018-05-01
The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields to explore the spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in the regression model is to use robust models; the resulting model is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy making process related to air pollution mitigation by developing a standard index model for air polluters (Air Polluter Standard Index - APSI) based on the RGWR approach. We consider seven variables that are directly related to the air pollution level: traffic velocity, population density, the business center aspect, air humidity, wind velocity, air temperature, and the area of urban forest. The best model is determined by the smallest AIC value. There are significant differences between regression and RGWR in this case, but basic GWR using the Gaussian kernel is the best model for APSI because it has the smallest AIC.
Bayesian Unimodal Density Regression for Causal Inference
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2011-01-01
Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…
Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace
ERIC Educational Resources Information Center
Culpepper, Steven Andrew; Park, Trevor
2017-01-01
A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
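The equivalent two-sample construction described above for logistic regression can be sketched as follows. The sketch assumes one continuous covariate with slope `beta` (log-odds per unit) and standard deviation `sd_x`, and an overall response probability `p_overall`. It finds two group probabilities whose log-odds differ by 2 * beta * sd_x while their average stays at `p_overall`, then applies a common normal-approximation power formula for comparing two proportions. That power formula, the function names, and the example values are assumptions made for illustration; the paper may use a different two-sample formula.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def expit(v):
    return 1.0 / (1.0 + math.exp(-v))


def equivalent_two_sample(beta, sd_x, p_overall):
    """Return (p1, p2) with logit(p2) - logit(p1) = 2*beta*sd_x
    and (p1 + p2) / 2 = p_overall, found by bisection on p1."""
    delta = 2.0 * beta * sd_x
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        p2 = expit(logit(mid) + delta)
        # The average of the two probabilities increases with p1.
        if 0.5 * (mid + p2) < p_overall:
            lo = mid
        else:
            hi = mid
    p1 = 0.5 * (lo + hi)
    return p1, expit(logit(p1) + delta)


def two_sample_power(p1, p2, n_total, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions
    with n_total / 2 observations per group (unpooled normal approximation)."""
    n = n_total / 2.0
    se = math.sqrt(p1 * (1.0 - p1) / n + p2 * (1.0 - p2) / n)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(abs(p2 - p1) / se - z_crit)


# Hypothetical inputs: slope 0.5 per unit, SD 1.0, overall event probability 0.3.
p1, p2 = equivalent_two_sample(beta=0.5, sd_x=1.0, p_overall=0.3)
power_small = two_sample_power(p1, p2, n_total=100)
power_large = two_sample_power(p1, p2, n_total=400)
```

Keeping the average of the two probabilities equal to `p_overall` is how this sketch encodes the requirement that the overall expected number of events is unchanged.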
Comparative evaluation of urban storm water quality models
NASA Astrophysics Data System (ADS)
Vaze, J.; Chiew, Francis H. S.
2003-10-01
The estimation of urban storm water pollutant loads is required for the development of mitigation and management strategies to minimize impacts to receiving environments. Event pollutant loads are typically estimated using either regression equations or "process-based" water quality models. The relative merit of using regression models compared to process-based models is not clear. A modeling study is carried out here to evaluate the comparative ability of the regression equations and process-based water quality models to estimate event diffuse pollutant loads from impervious surfaces. The results indicate that, once calibrated, both the regression equations and the process-based model can estimate event pollutant loads satisfactorily. In fact, the loads estimated using the regression equation as a function of rainfall intensity and runoff rate are better than the loads estimated using the process-based model. Therefore, if only estimates of event loads are required, regression models should be used because they are simpler and require less data compared to process-based models.
A generalized right truncated bivariate Poisson regression model with applications to health data.
Islam, M Ataharul; Chowdhury, Rafiqul I
2017-01-01
A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.
A generalized right truncated bivariate Poisson regression model with applications to health data
Islam, M. Ataharul; Chowdhury, Rafiqul I.
2017-01-01
A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model. PMID:28586344
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. This research involved the relationship between 20 topsoil variates, analyzed prior to planting, and paddy yields at standard fertilizer rates. The data were from the multi-location rice trials carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed without multicollinearity among the independent variables. Fuzzy c-means analysis clusters the paddy yield into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model, with a lower mean square error.
NASA Astrophysics Data System (ADS)
Klement, Ales; Jaksik, Ondrej; Kodesova, Radka; Drabek, Ondrej; Boruvka, Lubos
2013-04-01
Visible and near-infrared (VNIR) diffuse reflectance spectroscopy is a progressive method used for prediction of soil properties. The study was performed on soils from agricultural land in the south Moravian municipality of Brumovice. The studied area is characterized by a relatively flat upper part, a tributary valley in the middle and a colluvial fan at the bottom. Haplic Chernozem remained on the flat upper part of the area. Regosols were formed on steep parts of the valley. Colluvial Chernozem and Colluvial soils were formed in the bottom parts of the valley and in the bottom part of the studied field. The goal of the study was to evaluate the relationship between soil spectral curves and organic matter content and the content of different forms of iron and manganese (Mehlich III extract, ammonium oxalate extract and dithionite-citrate extract). Samples (87) were taken from the topsoil on a regular grid covering the studied area. The spectra of air-dried soil, sieved through a 2 mm sieve, were measured in the laboratory using a FieldSpec®3 spectrometer (350 - 2 500 nm). The Fe and Mn contents in the different extracts were measured using ICP-OES (with an iCAP 6500 Radial ICP Emission spectrometer; Thermo Scientific, UK) under standard analytical conditions. Partial least squares regression (PLSR) was used to model the relationship between spectra and measured soil properties. Prediction ability was evaluated using R2, root mean square error (RMSE) and normalized root mean square deviation (NRMSD). The results showed the best prediction for Mn (R2 = 0.86, RMSE = 29, NRMSD = 0.11), Fe in ammonium oxalate extract (R2 = 0.82, RMSE = 171, NRMSD = 0.12) and organic matter content (R2 = 0.84, RMSE = 0.13, NRMSD = 0.09). Slightly worse prediction was obtained for Mn and Fe in citrate extract (R2 = 0.82, RMSE = 21, NRMSD = 0.10; R2 = 0.77, RMSE = 522, NRMSD = 0.23). 
Poor prediction was obtained for Mn and Fe in Mehlich III extract (R2 = 0.43, RMSE = 13, NRMSD = 0.17; R2 = 0.39, RMSE = 13, NRMSD = 0.26). In general, the results confirmed that the measurement of soil spectral characteristics is a promising technology for digital soil mapping and for predicting the studied soil properties. Acknowledgment: Authors acknowledge the financial support of the Ministry of Agriculture of the Czech Republic (grant No. QJ1230319) and the Czech Science Foundation (grant No. GA526/09/1762).
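The three evaluation statistics named above (R2, RMSE, NRMSD) can be computed as in the following minimal sketch. The abstract does not state the exact definitions used, so two assumptions are made here: R2 is taken as 1 - SS_res / SS_tot, and NRMSD is taken as RMSE divided by the range of the observed values. The observed and predicted values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmsd(observed, predicted):
    """RMSE normalized by the observed range (an assumed normalization)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))


# Toy values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

fit_rmse = rmse(observed, predicted)
fit_r2 = r_squared(observed, predicted)
fit_nrmsd = nrmsd(observed, predicted)
```

Because RMSE carries the units of the measured property, the large differences in RMSE between extracts in the abstract are not directly comparable; a normalized statistic such as NRMSD is what allows comparison across properties.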
NASA Astrophysics Data System (ADS)
Camargo, Livia; Marques, José, Jr.
2014-05-01
Traditional methods for measuring adsorbed phosphorus (Pads) and other soil attributes of agronomic importance are relatively impractical when the aim is to map large areas by characterizing the spatial variability of soil attributes. Such mapping needs a large number of samples, which makes it expensive at detailed mapping scales. This creates a need in the scientific community for methods that can assess these attributes across the landscape quickly, nondestructively, and inexpensively. Diffuse reflectance spectroscopy (DRS) has been used to aid the characterization of soil attributes in view of these requirements. In this sense, the objective of this study was to evaluate the ability of DRS to estimate Pads, clay, Fe extracted by dithionite-citrate-bicarbonate (Fedcb), contents of goethite (Gt) and hematite (Hm) and the ratio Gt/(Gt + Hm) in Oxisols in northeastern São Paulo State. Soil samples were collected along transects every 25 m (100 samples). Geomorphic surfaces (GSs) were mapped in detail to support soil mapping. The soil in GS I was a Typic Hapludox, that in GS II a Typic Hapludox and Typic Eutrudox, and that in GS III a Typic Eutrudox. The soil samples were taken to the laboratory for chemical, physical and mineralogical analysis, and DRS spectra were obtained over 380-2300 nm. Chemometric calibration and validation (using a leave-one-out cross-validation procedure) were done on absorbance measurements [Log10 (1/Reflectance)] by partial least-squares regression (PLSR) analysis. The calibration accuracy was evaluated via the determination coefficient (R2), RMSE and the ratio of performance to deviation (RPD). A Variable Importance in the Projection (VIP) plot was built for Pads. 
DRS was effective in predicting the attributes studied. The most accurate models (RPD > 1.4) for clay, Fedcb and Gt were calibrated in the visible range (380-800 nm), whereas those for Pads, the ratio Gt/(Gt + Hm) and Hm were calibrated in the visible + near infrared (801-2300 nm). The highest VIP peaks for Pads were found at wavelengths of 480-580 nm and 780-980 nm, which are assigned to crystalline iron oxides, mainly Gt and Hm. This result demonstrates the influence of these oxides on P adsorption. In weathered soils, P adsorption is mainly correlated with the iron and aluminum oxides of the clay fraction because phosphate interacts with the functional groups of these oxides.
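The leave-one-out cross-validation and RPD statistic mentioned above can be sketched as follows. To stay self-contained, the sketch substitutes a one-variable straight-line fit for PLSR, so it illustrates only the validation loop and not the chemometric model itself. RPD is taken here as the sample standard deviation of the reference values divided by the cross-validated RMSE, which is a common definition but an assumption with respect to this study. The data are made up.

```python
import math
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def loo_rmsecv(x, y):
    """RMSE of leave-one-out predictions: refit without sample i, predict i."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        squared_errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return math.sqrt(mean(squared_errors))


def rpd(y, rmsecv):
    """Ratio of performance to deviation: SD of reference values / RMSECV."""
    return stdev(y) / rmsecv


# Toy predictor (e.g. one spectral feature) and reference values, made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]

rmsecv = loo_rmsecv(x, y)
toy_rpd = rpd(y, rmsecv)
```

A model whose cross-validated error is no smaller than the spread of the reference values has an RPD near 1, which is why thresholds such as the 1.4 quoted in the abstract are used to separate usable from unusable calibrations.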
Spatial Assessment of Model Errors from Four Regression Techniques
Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove
2005-01-01
Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?
NASA Astrophysics Data System (ADS)
Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.
2016-12-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 10^6 km^2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. The performance is reduced for water-scarce regions, and further research should focus on improving this aspect of regression-based global hydrological models.
Developing a predictive tropospheric ozone model for Tabriz
NASA Astrophysics Data System (ADS)
Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi
2013-04-01
Predictive ozone models are becoming indispensable tools by providing a capability for pollution alerts to serve people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, by using the following five modeling strategies: three regression-type methods: Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models: Nonlinear Local Prediction (NLP) to implement chaos theory and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type modeling strategies explain the data in terms of temperature, solar radiation, dew point temperature, and wind speed, by regressing present ozone values on their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for the MLR, ANN and GEP models are not particularly good, but those produced by NLP and ARIMA are promising for establishing a forecasting capability.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Sudarno
2018-05-01
The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country's economic development. A high infant mortality rate will disrupt the stability of a country as it relates to the sustainability of the population in the country. One regression model that can be used to analyze the relationship between a dependent variable Y in the form of discrete data and an independent variable X is the Poisson regression model. Regression models recently used for data with a discrete dependent variable include, among others, Poisson regression, negative binomial regression and generalized Poisson regression. In this research, generalized Poisson regression modeling gives a better AIC value than Poisson regression. The most significant variable is the number of health facilities (X1), while the variable with the most influence on the infant mortality rate is the average breastfeeding (X9).
[From clinical judgment to linear regression model].
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "x" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R^2) indicates the importance of the independent variables in the outcome.
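The line Y = a + bx described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the function name `fit_line` and the sample data are assumptions, and R^2 is computed here as the share of the variance in Y explained by the fitted line.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope: change in y per unit change in x
    a = mean_y - b * mean_x    # intercept: fitted y when x equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot   # coefficient of determination
    return a, b, r2


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, r2 = fit_line(x, y)
print(f"a={a:.2f}, b={b:.2f}, R^2={r2:.4f}")
```

With more than one predictor the same idea applies, but the coefficients are usually obtained with a linear algebra routine rather than by hand.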
Impact of multicollinearity on small sample hydrologic regression models
NASA Astrophysics Data System (ADS)
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R^2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
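As a rough illustration of the variance inflation factor mentioned above, the sketch below handles the special case of exactly two explanatory variables, where the VIF of either one reduces to 1 / (1 - r^2), with r their Pearson correlation. The function names and data are assumptions made for this example; the study's own screening procedure is not described in the abstract.

```python
from math import sqrt


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors: x2 nearly duplicates x1, x3 is uncorrelated with x1.
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
x3 = [2, 1, 3, 1, 2]
print(vif_two_predictors(x1, x2))  # large value: strong multicollinearity
print(vif_two_predictors(x1, x3))  # equals 1 when the correlation is zero
```

A common rule of thumb treats a VIF above 10 as a sign of problematic multicollinearity; with more than two predictors, each VIF comes from regressing that predictor on all the others.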
Real estate value prediction using multivariate regression models
NASA Astrophysics Data System (ADS)
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly with many factors; hence it is a prime field in which to apply machine learning concepts to optimize and predict prices with high accuracy. In this paper, we therefore present various important features to use when predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. Using features in a regression model requires some feature engineering for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers of the features) is used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
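How ridge regression counters overfitting can be seen in the simplest case of one centered predictor, where the penalized slope is Sxy / (Sxx + lambda). The sketch below is an illustration under that assumption only; the function name, the penalty values and the data are hypothetical and do not come from the paper.

```python
def ridge_slope(x, y, lam):
    """Slope of a one-predictor ridge fit on centered data.

    lam is the L2 penalty strength; lam = 0 gives the ordinary
    least squares slope.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / (sxx + lam)


# Hypothetical data: floor area (arbitrary units) against price.
area = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [1.2, 1.9, 3.2, 3.8, 5.1]
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_slope(area, price, lam))
```

As the penalty grows, the slope shrinks toward zero, which trades a little bias for lower variance in the fitted coefficients.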
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Gurnani, Ashita S; John, Samantha E; Gavett, Brandon E
2015-05-01
The current study developed regression-based normative adjustments for a bi-factor model of the Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation, alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Panel regressions to estimate low-flow response to rainfall variability in ungaged basins
Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.
2016-01-01
Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
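The fixed-effects structure named above can be sketched with the within (demeaning) transformation: each stream's own mean is subtracted from its rainfall and flow before the slope is estimated, so unobserved stream-level differences drop out. This is a generic illustration, not the authors' model; the function names and the toy data are assumptions.

```python
from collections import defaultdict


def pooled_slope(x, y):
    """OLS slope that ignores which group each observation belongs to."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


def within_slope(groups, x, y):
    """Fixed-effects slope: demean x and y within each group, then fit."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(groups, x, y):
        totals[g][0] += xi
        totals[g][1] += yi
        totals[g][2] += 1
    num = den = 0.0
    for g, xi, yi in zip(groups, x, y):
        sx, sy, n = totals[g]
        dx, dy = xi - sx / n, yi - sy / n
        num += dx * dy
        den += dx * dx
    return num / den


# Hypothetical panel: two streams with the same rainfall response (0.5)
# but very different baseline flows.
stream = ["A", "A", "A", "B", "B", "B"]
rain = [1, 2, 3, 11, 12, 13]
flow = [10.5, 11.0, 11.5, 5.5, 6.0, 6.5]
print(pooled_slope(rain, flow))          # negative: confounded by baselines
print(within_slope(stream, rain, flow))  # recovers the common slope
```

In this toy example the pooled fit even gets the sign wrong, which is the kind of omitted-variable bias the abstract refers to.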
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric models. The regression model comprises several models, each with two components, parametric and nonparametric. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to the effect of regional socio-economic conditions on the use of information technology. More specifically, the response variables are the percentage of households with internet access and the percentage of households with a personal computer. The predictor variables are the percentage of literate people, the percentage of electrification and the percentage of economic growth. Based on the identified relationship between the response and predictor variables, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The results show that multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination of 90 percent.
Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
Weiss, Brandi A.; Dardick, William
2015-01-01
This article introduces an entropy-based measure of data–model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data–model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data–model fit to assess how well logistic regression models classify cases into observed categories. PMID:29795897
Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree
de los Campos, Gustavo; Naya, Hugo; Gianola, Daniel; Crossa, José; Legarra, Andrés; Manfredi, Eduardo; Weigel, Kent; Cotes, José Miguel
2009-01-01
The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available. PMID:19293140
Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K
2011-10-01
To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. We compare SVM-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints.
Our experiments show a comparable performance for methods including only regression or both regression and ranking constraints on clinical data. On high dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included. Copyright © 2011 Elsevier B.V. All rights reserved.
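The concordance index used above as a performance measure can be sketched as follows. This is a plain pairwise version for right-censored data, written for illustration; the function name, the convention that a higher score means higher risk (shorter expected survival), and the data are assumptions rather than details taken from the paper.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs ordered correctly by the risk score.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the observation is censored
    risk_scores: model output, higher meaning shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when the earlier time is an
            # observed event; censored subjects cannot be the earlier one.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in score count as half
    return concordant / comparable


# Hypothetical follow-up data; the third subject is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # perfect ordering
print(concordance_index(times, events, [1, 2, 3, 4]))  # reversed ordering
```

A value of 0.5 corresponds to random ordering, so useful models should score clearly above it.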
Construction of mathematical model for measuring material concentration by colorimetric method
NASA Astrophysics Data System (ADS)
Liu, Bing; Gao, Lingceng; Yu, Kairong; Tan, Xianghua
2018-06-01
This paper uses multiple linear regression to analyze the data from problem C of mathematical modeling in 2017. First, we established a regression model for the concentrations of 5 substances, but only the regression model for the concentration of urea in milk passed the significance test. The regression model established from the second set of data passed the significance test, but it suffers from serious multicollinearity. We improved the model by principal component analysis. The improved model is used to control the system so that the concentration of a material can be measured by the direct colorimetric method.
Developing and testing a global-scale regression model to quantify mean annual streamflow
NASA Astrophysics Data System (ADS)
Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.
2017-01-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 10^6 km^2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
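The two performance measures quoted above can be sketched as below. The abstract does not give formulas, so this assumes that the modified index of agreement is the absolute-value form, d = 1 - sum|P - O| / sum(|P - mean(O)| + |O - mean(O)|); the function names and data are hypothetical, and any transformation of the flows applied in the study is not reproduced here.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    n = len(observed)
    return sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def modified_index_of_agreement(observed, predicted):
    """Absolute-value index of agreement (assumed form, see lead-in)."""
    mean_obs = sum(observed) / len(observed)
    num = sum(abs(p - o) for o, p in zip(observed, predicted))
    den = sum(abs(p - mean_obs) + abs(o - mean_obs)
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


# Hypothetical observed and predicted flows, arbitrary units.
obs = [1.0, 2.0, 3.0]
print(rmse(obs, obs), modified_index_of_agreement(obs, obs))
flat = [2.0, 2.0, 2.0]  # always predicts the observed mean
print(rmse(obs, flat), modified_index_of_agreement(obs, flat))
```

Lower RMSE and a higher index of agreement both indicate a closer match to the independent measurements.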
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.
NASA Technical Reports Server (NTRS)
Parsons, Vickie S.
2009-01-01
The request to conduct an independent review of regression models, developed for determining the expected Launch Commit Criteria (LCC) External Tank (ET)-04 cycle count for the Space Shuttle ET tanking process, was submitted to the NASA Engineering and Safety Center (NESC) on September 20, 2005. The NESC team performed an independent review of regression models documented in Prepress Regression Analysis, Tom Clark and Angela Krenn, 10/27/05. This consultation consisted of a peer review by statistical experts of the proposed regression models provided in the Prepress Regression Analysis. This document is the consultation's final report.
Stochastic Approximation Methods for Latent Regression Item Response Models
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Using Weighted Least Squares Regression for Obtaining Langmuir Sorption Constants
USDA-ARS's Scientific Manuscript database
One of the most commonly used models for describing phosphorus (P) sorption to soils is the Langmuir model. To obtain model parameters, the Langmuir model is fit to measured sorption data using least squares regression. Least squares regression is based on several assumptions including normally dist...
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan
2011-11-01
To explore the application of negative binomial regression and modified Poisson regression in analyzing the factors that influence injury frequency and the risk factors that increase it. A total of 2917 primary and secondary school students were selected from Hefei by cluster random sampling and surveyed by questionnaire. Modified Poisson regression and negative binomial regression models were fitted to the count data on injury events. The risk factors that increase the frequency of unintentional injury among juvenile students were explored, so as to assess the efficiency of these two models in studying the factors that influence injury frequency. A Lagrange multiplier test showed that the Poisson model was over-dispersed (P < 0.0001). The over-dispersed data were therefore better fitted by the modified Poisson regression and negative binomial regression models, respectively. Both models showed that male gender, younger age, a father working outside the hometown, a guardian educated above junior high school level, and smoking might be associated with higher injury frequencies. For injury-event frequency data that tend to cluster, both modified Poisson regression and negative binomial regression can be used. However, based on our data, the modified Poisson regression fitted better and could give a more accurate interpretation of the relevant factors affecting injury frequency.
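Over-dispersion means that the counts vary more than a Poisson model allows, since a Poisson variable has variance equal to its mean. The sketch below is only a crude descriptive check, the ratio of sample variance to sample mean; it is not the Lagrange multiplier test used in the study, and the function name and data are assumptions.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean of the counts.

    Values well above 1 suggest over-dispersion relative to a Poisson
    distribution; this ignores covariates and is not a formal test.
    """
    return variance(counts) / mean(counts)


# Hypothetical injury counts per student, invented for illustration.
injuries = [0, 0, 0, 1, 1, 2, 9]
print(dispersion_index(injuries))
```

When the ratio is far above 1, models that relax the equal mean and variance assumption, such as the negative binomial, are the usual next step.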
Geodesic least squares regression on information manifolds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be
We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply this to scaling laws in magnetic confinement fusion.
Background stratified Poisson regression analysis of cohort data.
Richardson, David B; Langholz, Bryan
2012-03-01
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as 'nuisance' variables and avoids the need to explicitly estimate the coefficients for these stratum-specific parameters. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this 'conditional' regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction, P (termed MAP-1F-P); regression against P (termed MAP-R-P); regression against P and additional local variables (termed MAP-R-P+nV); and a weighted combination of P and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAPs for the verification data set decreased as the calibration data-set size decreased, but predictive accuracy was not as sensitive for the MAPs as it was for the local regression models.
Accounting for measurement error in log regression models with applications to accelerated testing.
Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M
2018-01-01
In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on a covariate, it may also depend on a function of this covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
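For readers unfamiliar with the pool-adjacent-violators algorithm mentioned above, the following is a minimal sketch of the standard complete-data PAVA for a non-decreasing least-squares fit. It does not implement the empirical likelihood method proposed in the paper; the function name `pava`, the optional weights argument, and the sample values are assumptions made for illustration.

```python
def pava(y, w=None):
    """Weighted least-squares non-decreasing fit to y (pool-adjacent-violators)."""
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical responses, already ordered by the covariate.
print(pava([1.0, 3.0, 2.0, 4.0, 3.5, 5.0]))
```

The input is assumed to be sorted by the covariate beforehand; a non-increasing fit can be obtained by negating the responses, fitting, and negating the result.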
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
ERIC Educational Resources Information Center
Laird, Robert D.; Weems, Carl F.
2011-01-01
Research on informant discrepancies has increasingly utilized difference scores. This article demonstrates the statistical equivalence of regression models using difference scores (raw or standardized) and regression models using separate scores for each informant to show that interpretations should be consistent with both models. First,…
Slavchev, Aleksandar; Kovacs, Zoltan; Koshiba, Haruki; Nagai, Airi; Bázár, György; Krastanov, Albert; Kubota, Yousuke; Tsenkova, Roumiana
2015-01-01
Development of an efficient screening method coupled with cell functionality evaluation is highly needed in contemporary microbiology. The presented novel concept and fast non-destructive method brings into play the water spectral pattern of the solution as a molecular fingerprint of the cell culture system. To elucidate the concept, NIR spectroscopy with Aquaphotomics was applied to monitor the growth of sixteen Lactobacillus bulgaricus, one Lactobacillus pentosus and one Lactobacillus gasseri bacteria strains. Their growth rate, maximal optical density, low pH and bile tolerances were measured and further used as reference data for analysis of the simultaneously acquired spectral data. The acquired spectral data in the region of 1100-1850 nm were subjected to various multivariate data analyses: PCA, OPLS-DA, PLSR. The results showed high accuracy of bacteria strain classification according to their probiotic strength. The most informative spectral fingerprints covered the first overtone of water, emphasizing the relation of the water molecular system to cell functionality.
VenuGopal, K S; Cherita, Chris; Anu-Appaiah, K A
2018-03-01
The role of grape seed tannins in improving organoleptic properties and their involvement in color stabilization in red wine are well established. The addition of grape seeds as the source of condensed tannins in fruit wine may provide a solution for its color instability and improvement of sensory attributes. Syzygium cumini is traditionally known for its therapeutic properties. In the current study, the influence of yeasts and grape seed addition during fermentation on the chromatic, phenolic and sensory attributes of the wine was assessed. Grape seed addition improved the color characteristics of wine and increased overall phenolic composition. Analysis by HPLC revealed 6 major anthocyanins, among which the 3,5-diglucoside forms of delphinidin and petunidin were found to be the major components. Cluster and PLSR analysis explained the impact of seed addition on the yeasts, as well as on the perception of panelists, with bitterness and astringency as the dominating attributes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
Population heterogeneity in the salience of multiple risk factors for adolescent delinquency.
Lanza, Stephanie T; Cooper, Brittany R; Bray, Bethany C
2014-03-01
To present mixture regression analysis as an alternative to more standard regression analysis for predicting adolescent delinquency. We demonstrate how mixture regression analysis allows for the identification of population subgroups defined by the salience of multiple risk factors. We identified population subgroups (i.e., latent classes) of individuals based on their coefficients in a regression model predicting adolescent delinquency from eight previously established risk indices drawn from the community, school, family, peer, and individual levels. The study included N = 37,763 10th-grade adolescents who participated in the Communities That Care Youth Survey. Standard, zero-inflated, and mixture Poisson and negative binomial regression models were considered. Standard and mixture negative binomial regression models were selected as optimal. The five-class regression model was interpreted based on the class-specific regression coefficients, indicating that risk factors had varying salience across classes of adolescents. Standard regression showed that all risk factors were significantly associated with delinquency. Mixture regression provided more nuanced information, suggesting a unique set of risk factors that were salient for different subgroups of adolescents. Implications for the design of subgroup-specific interventions are discussed. Copyright © 2014 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Suhartono, Lee, Muhammad Hisyam; Prastyo, Dedy Dwi
2015-12-01
The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two levels ARIMAX and regression methods. Two levels ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as case study. In general, two levels of calendar variation model yields two models, namely the first model to reconstruct the sales pattern that already occurred, and the second model to forecast the effect of increasing sales due to Eid ul-Fitr that affected sales at the same and the previous months. The results show that the proposed two level calendar variation model based on ARIMAX and regression methods yields better forecast compared to the seasonal ARIMA model and Neural Networks.
Regression Models for Identifying Noise Sources in Magnetic Resonance Images
Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.
2009-01-01
Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478
NASA Astrophysics Data System (ADS)
Bae, Gihyun; Huh, Hoon; Park, Sungho
This paper deals with a regression model for light weight and crashworthiness enhancement design of automotive parts in frontal car crash. The ULSAB-AVC model is employed for the crash analysis and effective parts are selected based on the amount of energy absorption during the crash behavior. Finite element analyses are carried out for designated design cases in order to investigate the crashworthiness and weight according to the material and thickness of main energy absorption parts. Based on simulations results, a regression analysis is performed to construct a regression model utilized for light weight and crashworthiness enhancement design of automotive parts. An example for weight reduction of main energy absorption parts demonstrates the validity of a regression model constructed.
Army College Fund Cost-Effectiveness Study
1990-11-01
Section A.2 presents a theory of enlistment supply to provide a basis for specifying the regression model. The model is specified in Section A.3, which...Supplementary materials are included in the final four sections. Section A.6 provides annual trends in the regression model variables. Estimates of the model...millions, A.S. ESTIMATION OF A YOUTH EARNINGS FORECASTING MODEL Civilian pay is an important explanatory variable in the regression model. Previous
RRegrs: an R package for computer-aided model selection with multiple regression models.
Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L
2015-01-01
Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, and therefore, raising model reproducibility and comparison issues. Cheminformatics and bioinformatics are extensively using predictive modelling and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespectively of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produce unified reports, and offer the option to be integrated into more extensive studies. Additionally, such methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending on the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. 
Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance as well as its adaptability in terms of parameter optimization could make RRegrs a popular framework to assist the initial exploration of predictive models, and with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; this is a fully validated procedure with application to QSAR modelling.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655
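The area under the ROC curve used above to compare the three classifiers can be computed directly from predicted scores. The sketch below uses the pairwise (Mann-Whitney) formulation: the AUC is the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; this is not the SPSS procedure used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive case ranked above negative case
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example: 1 = event, 0 = no event, scores = predicted probabilities.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The double loop is quadratic in sample size, which is acceptable for a study of a few hundred subjects; a rank-based formula would be preferable for large data sets.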
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
NASA Technical Reports Server (NTRS)
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
ERIC Educational Resources Information Center
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
ERIC Educational Resources Information Center
Weiss, Brandi A.; Dardick, William
2016-01-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…
Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models
ERIC Educational Resources Information Center
Shieh, Gwowen
2009-01-01
In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…
Building Regression Models: The Importance of Graphics.
ERIC Educational Resources Information Center
Dunn, Richard
1989-01-01
Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)
Testing Different Model Building Procedures Using Multiple Regression.
ERIC Educational Resources Information Center
Thayer, Jerome D.
The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…
A consistent framework for Horton regression statistics that leads to a modified Hack's law
Furey, P.R.; Troutman, B.M.
2008-01-01
A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
Ardoino, Ilaria; Lanzoni, Monica; Marano, Giuseppe; Boracchi, Patrizia; Sagrini, Elisabetta; Gianstefani, Alice; Piscaglia, Fabio; Biganzoli, Elia M
2017-04-01
The interpretation of regression models results can often benefit from the generation of nomograms, 'user friendly' graphical devices especially useful for assisting the decision-making processes. However, in the case of multinomial regression models, whenever categorical responses with more than two classes are involved, nomograms cannot be drawn in the conventional way. Such a difficulty in managing and interpreting the outcome could often result in a limitation of the use of multinomial regression in decision-making support. In the present paper, we illustrate the derivation of a non-conventional nomogram for multinomial regression models, intended to overcome this issue. Although it may appear less straightforward at first sight, the proposed methodology allows an easy interpretation of the results of multinomial regression models and makes them more accessible for clinicians and general practitioners too. Development of prediction model based on multinomial logistic regression and of the pertinent graphical tool is illustrated by means of an example involving the prediction of the extent of liver fibrosis in hepatitis C patients by routinely available markers.
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Sumantari, Y. D.; Slamet, I.; Sugiyanto
2017-06-01
Semiparametric regression is a statistical analysis method that combines parametric and nonparametric regression. There are various approaches to nonparametric regression, one of which is the spline. Central Java is one of the most densely populated provinces in Indonesia. Population density in this province can be modeled by semiparametric regression because it consists of parametric and nonparametric components. Therefore, the purpose of this paper is to determine the factors that influence population density in Central Java using the semiparametric spline regression model. The result shows that the factors which influence population density in Central Java are Family Planning (FP) active participants and district minimum wage.
Ham, Joo-ho; Park, Hun-Young; Kim, Youn-ho; Bae, Sang-kon; Ko, Byung-hoon
2017-01-01
[Purpose] The purpose of this study was to develop a regression model to estimate the heart rate at the lactate threshold (HRLT) and the heart rate at the ventilatory threshold (HRVT) using the heart rate threshold (HRT), and to test the validity of the regression model. [Methods] We performed a graded exercise test with a treadmill in 220 normal individuals (men: 112, women: 108) aged 20–59 years. HRT, HRLT, and HRVT were measured in all subjects. A regression model was developed to estimate HRLT and HRVT using HRT with 70% of the data (men: 79, women: 76) through randomization (7:3), with the Bernoulli trial. The validity of the regression model developed with the remaining 30% of the data (men: 33, women: 32) was also examined. [Results] Based on the regression coefficient, we found that the independent variable HRT was a significant variable in all regression models. The adjusted R2 of the developed regression models averaged about 70%, and the standard error of estimation of the validity test results was 11 bpm, which is similar to that of the developed model. [Conclusion] These results suggest that HRT is a useful parameter for predicting HRLT and HRVT. PMID:29036765
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Ham, Joo-Ho; Park, Hun-Young; Kim, Youn-Ho; Bae, Sang-Kon; Ko, Byung-Hoon; Nam, Sang-Seok
2017-09-30
The purpose of this study was to develop a regression model to estimate the heart rate at the lactate threshold (HRLT) and the heart rate at the ventilatory threshold (HRVT) using the heart rate threshold (HRT), and to test the validity of the regression model. We performed a graded exercise test with a treadmill in 220 normal individuals (men: 112, women: 108) aged 20-59 years. HRT, HRLT, and HRVT were measured in all subjects. A regression model was developed to estimate HRLT and HRVT using HRT with 70% of the data (men: 79, women: 76) through randomization (7:3), with the Bernoulli trial. The validity of the regression model developed with the remaining 30% of the data (men: 33, women: 32) was also examined. Based on the regression coefficient, we found that the independent variable HRT was a significant variable in all regression models. The adjusted R2 of the developed regression models averaged about 70%, and the standard error of estimation of the validity test results was 11 bpm, which is similar to that of the developed model. These results suggest that HRT is a useful parameter for predicting HRLT and HRVT. ©2017 The Korean Society for Exercise Nutrition
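The 7:3 randomization described in this abstract can be sketched as follows. This is a minimal sketch, assuming each subject is assigned independently to the development set with probability 0.7 (one Bernoulli trial per subject); the function name, the seed, and the integer subject IDs are hypothetical and not taken from the study.

```python
import random

def bernoulli_split(subjects, p_train=0.7, seed=0):
    """Assign each subject to the development set with probability p_train.

    One independent Bernoulli trial per subject, so the realized split is
    only approximately 70/30 (assumption: this mirrors the '7:3' scheme).
    """
    rng = random.Random(seed)
    train, test = [], []
    for s in subjects:
        (train if rng.random() < p_train else test).append(s)
    return train, test

subjects = list(range(220))          # hypothetical subject IDs
train, test = bernoulli_split(subjects)
print(len(train), len(test))
```

Because the assignment is random per subject, the group sizes vary from run to run unless the seed is fixed, which is consistent with the abstract reporting a split that is close to, but not exactly, 70/30.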
Hu, Wenbiao; Tong, Shilu; Mengersen, Kerrie; Connell, Des
2007-09-01
Few studies have examined the relationship between weather variables and cryptosporidiosis in Australia. This paper examines the potential impact of weather variability on the transmission of cryptosporidiosis and explores the possibility of developing an empirical forecast system. Data on weather variables, notified cryptosporidiosis cases, and population size in Brisbane were supplied by the Australian Bureau of Meteorology, Queensland Department of Health, and Australian Bureau of Statistics for the period of January 1, 1996-December 31, 2004, respectively. Time series Poisson regression and seasonal auto-regression integrated moving average (SARIMA) models were performed to examine the potential impact of weather variability on the transmission of cryptosporidiosis. Both the time series Poisson regression and SARIMA models show that seasonal and monthly maximum temperature at a prior moving average of 1 and 3 months were significantly associated with cryptosporidiosis disease. It suggests that there may be 50 more cases a year for an increase of 1 degrees C maximum temperature on average in Brisbane. Model assessments indicated that the SARIMA model had better predictive ability than the Poisson regression model (SARIMA: root mean square error (RMSE): 0.40, Akaike information criterion (AIC): -12.53; Poisson regression: RMSE: 0.54, AIC: -2.84). Furthermore, the analysis of residuals shows that the time series Poisson regression appeared to violate a modeling assumption, in that residual autocorrelation persisted. The results of this study suggest that weather variability (particularly maximum temperature) may have played a significant role in the transmission of cryptosporidiosis. A SARIMA model may be a better predictive model than a Poisson regression model in the assessment of the relationship between weather variability and the incidence of cryptosporidiosis.
ERIC Educational Resources Information Center
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
ERIC Educational Resources Information Center
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
ℓ(p)-Norm multikernel learning approach for stock market price forecasting.
Shao, Xigao; Wu, Kun; Liao, Bifeng
2012-01-01
Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ(1)-norm multiple support vector regression model.
Multiple-Instance Regression with Structured Data
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Lane, Terran; Roper, Alex
2008-01-01
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.
Christensen, A L; Lundbye-Christensen, S; Dethlefsen, C
2011-12-01
Several statistical methods of assessing seasonal variation are available. Brookhart and Rothman [3] proposed a second-order moment-based estimator based on the geometrical model derived by Edwards [1], and reported that this estimator is superior in estimating the peak-to-trough ratio of seasonal variation compared with Edwards' estimator with respect to bias and mean squared error. Alternatively, seasonal variation may be modelled using a Poisson regression model, which provides flexibility in modelling the pattern of seasonal variation and adjustments for covariates. Based on a Monte Carlo simulation study three estimators, one based on the geometrical model, and two based on log-linear Poisson regression models, were evaluated in regards to bias and standard deviation (SD). We evaluated the estimators on data simulated according to schemes varying in seasonal variation and presence of a secular trend. All methods and analyses in this paper are available in the R package Peak2Trough[13]. Applying a Poisson regression model resulted in lower absolute bias and SD for data simulated according to the corresponding model assumptions. Poisson regression models had lower bias and SD for data simulated to deviate from the corresponding model assumptions than the geometrical model. This simulation study encourages the use of Poisson regression models in estimating the peak-to-trough ratio of seasonal variation as opposed to the geometrical model. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
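To illustrate how a peak-to-trough ratio can be read off a log-linear Poisson model, here is a minimal sketch. It assumes a single-harmonic model, log(mu_t) = b0 + b_cos*cos(2*pi*t/12) + b_sin*sin(2*pi*t/12); this model form and the function name are assumptions for illustration, not the Peak2Trough package's API.

```python
import math

def peak_to_trough_ratio(b_cos, b_sin):
    """Peak-to-trough ratio implied by a single-harmonic log-linear model.

    The seasonal term b_cos*cos(w t) + b_sin*sin(w t) has amplitude
    A = sqrt(b_cos^2 + b_sin^2) on the log scale, so the expected count
    ranges from exp(b0 - A) to exp(b0 + A) and the ratio is exp(2A).
    """
    amplitude = math.hypot(b_cos, b_sin)
    return math.exp(2.0 * amplitude)

# Hypothetical fitted coefficients
ratio = peak_to_trough_ratio(0.3, 0.4)
print(ratio)
```

The intercept cancels in the ratio, so only the two seasonal coefficients are needed.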
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
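The remission rule stated in this abstract can be written as a small predicate. This is a sketch under one assumption: "a reduction of seven points" is read as a drop of at least seven points from baseline. The function and parameter names are hypothetical.

```python
def is_remission(baseline, score, cutoff=13, min_drop=7):
    """Return True if a Y-BOCS measurement counts as remission.

    Rule from the abstract: score lower than 13, combined with a reduction
    of seven points from baseline (assumed here to mean 'at least seven').
    """
    return score < cutoff and (baseline - score) >= min_drop

print(is_remission(baseline=25, score=12))  # below cutoff, dropped 13 points
print(is_remission(baseline=18, score=12))  # below cutoff, but dropped only 6
```

Applying the predicate at every measurement occasion yields the sequence of remission and relapse events that a recurrent events model would use.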
NASA Astrophysics Data System (ADS)
Baptiste Barré, Jean; Bourrier, Franck; Bertrand, David; Rey, Freddy
2015-04-01
Ecological engineering corresponds to the design of efficient solutions for protection against natural hazards such as shallow landslides and soil erosion. In particular, bioengineering structures can be composed of a living part, made of plants, cuttings or seeds, and an inert part, a timber logs structure. As wood is not treated by preservatives, fungal degradation can occur from the start of the construction. It results in wood strength loss, which practitioners try to evaluate with non-destructive tools (NDT). Classical NDT are mainly based on density measurements. However, the fungal activity reduces the mechanical properties (modulus of elasticity - MOE) well before a density change could be measured. In this context, it would be useful to provide a tool for assessing the residual mechanical strength at different decay stages due to a fungal community. Near-infrared spectroscopy (NIRS) can be used for that purpose, as it can allow evaluating wood mechanical properties as well as wood chemical changes due to brown and white rots. We monitored 160 silver fir samples (30x30x6000mm) from green state to different levels of decay. The degradation process took place in a greenhouse and samples were inoculated with silver fir decayed debris in order to accelerate the process. For each sample, we calculated the normalized bending modulus of elasticity loss (Dw moe) and defined it as decay extent. Near infrared spectra collected from both green and decayed ground samples were corrected by the subtraction of baseline offset. Spectra of green samples were averaged into one mean spectrum and decayed spectra were subtracted from the mean spectrum to calculate the absorption loss. Partial least squares regression (PLSR) has been performed between the normalized MOE loss Dw moe (0 < Dw moe < 1) and the absorption loss, with a correlation coefficient R² equal to 0.85. Finally, the prediction of silver fir biodegradation rate by NIRS was significant (RMSEP = 0.13).
This tool improves the evaluation accuracy of wood decay extent in the context of ecological engineering structures used for natural hazard mitigation.
Floating Data and the Problem with Illustrating Multiple Regression.
ERIC Educational Resources Information Center
Sachau, Daniel A.
2000-01-01
Discusses how to introduce basic concepts of multiple regression by creating a large-scale, three-dimensional regression model using the classroom walls and floor. Addresses teaching points that should be covered and reveals student reaction to the model. Finds that the greatest benefit of the model is the low fear, walk-through, nonmathematical…
Regression Model Optimization for the Analysis of Experimental Data
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
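The PRESS-based search metric mentioned in this abstract can be illustrated on the simplest case. The sketch below assumes a one-regressor straight-line fit and uses the standard identity that the leave-one-out (PRESS) residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the leverage. The sample standard deviation used as the metric, the function names, and the data are assumptions for illustration, not the Ames algorithm itself.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def press_residuals(x, y):
    """PRESS residuals e_i / (1 - h_ii) without refitting n times."""
    n = len(x)
    a, b = fit_line(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    out = []
    for xi, yi in zip(x, y):
        leverage = 1.0 / n + (xi - mx) ** 2 / sxx
        out.append((yi - (a + b * xi)) / (1.0 - leverage))
    return out

def press_std(x, y):
    """Sample standard deviation of the PRESS residuals (assumed metric)."""
    r = press_residuals(x, y)
    m = sum(r) / len(r)
    return math.sqrt(sum((ri - m) ** 2 for ri in r) / (len(r) - 1))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical regressor values
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2]   # hypothetical responses
print(press_std(x, y))
```

A model search of the kind described would compute this metric for each candidate term combination and keep the candidate with the smallest value, subject to the constraints listed in the abstract.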
Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard
2016-10-01
In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x_i are equal is strong and may fail to account for overdispersion given the variability of the rate parameter (the variance exceeds the mean). Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion including a quasi-likelihood, robust standard errors estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value <0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2 versus 21.3 for non-flexible piecewise exponential models). We showed that there were no major differences between methods. However, using a flexible piecewise regression modelling, with either a quasi-likelihood or robust standard errors, was the best approach as it deals with both overdispersion due to model misspecification and true or inherent overdispersion.
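As a quick diagnostic related to the overdispersion discussed here, the Pearson dispersion statistic can be computed from observed counts and fitted Poisson means. Note that this is a simpler, commonly used check and not the regression-based score test used in the paper; the function name and the example data are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Under a correctly specified Poisson model the value is close to 1;
    values well above 1 suggest overdispersion (variance exceeds the mean).
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)

observed = [1, 9, 2, 12, 0, 14]            # hypothetical counts
fitted = [5.0, 6.0, 5.0, 7.0, 6.0, 7.0]    # hypothetical fitted means
phi = pearson_dispersion(observed, fitted, n_params=2)
print(phi)
```

In a quasi-likelihood correction of the kind the abstract mentions, model-based standard errors are typically multiplied by the square root of such a dispersion estimate.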
Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul
2015-11-04
Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
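The effect of an imperfect diagnostic test on a logistic model can be sketched with the standard misclassification identity: P(test positive) = Se * p + (1 - Sp) * (1 - p), where p is the true infection probability from the logistic model. This is a minimal illustration of that identity, not the authors' three Bayesian models; all names and numbers are hypothetical.

```python
import math

def true_probability(linear_predictor):
    """Logistic link: probability of true infection given x'beta."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def prob_test_positive(linear_predictor, sensitivity, specificity):
    """Probability of a positive test result under misclassification.

    True positives contribute sensitivity * p and false positives
    contribute (1 - specificity) * (1 - p).
    """
    p = true_probability(linear_predictor)
    return sensitivity * p + (1.0 - specificity) * (1.0 - p)

# Hypothetical test with 80% sensitivity and 95% specificity
print(prob_test_positive(0.0, sensitivity=0.8, specificity=0.95))
```

With a perfect test (sensitivity and specificity both 1) the expression reduces to the ordinary logistic regression probability, which is why ignoring test error can bias the estimated odds ratios.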
An Expert System for the Evaluation of Cost Models
1990-09-01
contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) Click Here to continue -> Autocorrelation Click Here for the index - Index...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (REFERENCE: Applied Linear Regression Models by John
1974-01-01
REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans
Variable selection and model choice in geoadditive regression models.
Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard
2009-06-01
Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
Bayesian isotonic density regression
Wang, Lianming; Dunson, David B.
2011-01-01
Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered. PMID:22822259
Linear regression crash prediction models : issues and proposed solutions.
DOT National Transportation Integrated Search
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Should metacognition be measured by logistic regression?
Rausch, Manuel; Zehetleitner, Michael
2017-03-01
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
Modelling of capital asset pricing by considering the lagged effects
NASA Astrophysics Data System (ADS)
Sukono; Hidayat, Y.; Bon, A. Talib bin; Supian, S.
2017-01-01
This paper discusses the problem of modelling the Capital Asset Pricing Model (CAPM) with lagged effects. Asset returns are assumed to be influenced by the market return and the return of risk-free assets. The relationship between asset returns, the market return, and the return of risk-free assets is analysed using a CAPM regression equation and a distributed-lag CAPM regression equation. In connection with the distributed-lag CAPM regression equation, this paper also develops a Koyck-transformed CAPM regression equation. The results show that the Koyck-transformed CAPM regression equation has the advantage of simplicity, as it requires only three parameters, compared with the distributed-lag CAPM regression equation.
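The reduction to three parameters can be seen from the textbook Koyck transformation. The derivation below is a generic sketch: the symbols $R_t$ (asset excess return), $X_t$ (market excess return), $\alpha$, $\beta$, $\lambda$ and the geometric lag weights are assumptions for illustration and are not taken from the paper.

```latex
% Distributed-lag model with geometrically declining weights, 0 < \lambda < 1
R_t = \alpha + \beta \sum_{i=0}^{\infty} \lambda^{i} X_{t-i} + \varepsilon_t

% Lag by one period and multiply by \lambda
\lambda R_{t-1} = \lambda\alpha + \beta \sum_{i=1}^{\infty} \lambda^{i} X_{t-i} + \lambda \varepsilon_{t-1}

% Subtract: the infinite sum collapses to the current-period term
R_t = \alpha(1-\lambda) + \beta X_t + \lambda R_{t-1} + \left(\varepsilon_t - \lambda \varepsilon_{t-1}\right)
```

The transformed equation involves only $\alpha$, $\beta$ and $\lambda$, in contrast to one coefficient per lag in the untransformed distributed-lag equation.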
Zhu, Yu; Xia, Jie-lai; Wang, Jing
2009-09-01
Application of the 'single auto regressive integrated moving average (ARIMA) model' and the 'ARIMA-generalized regression neural network (GRNN) combination model' in the research of the incidence of scarlet fever. An ARIMA model was established based on the monthly incidence of scarlet fever in one city from 2000 to 2006. The fitted values of the ARIMA model were used as input of the GRNN, and the actual values were used as output of the GRNN. After training the GRNN, the performance of the single ARIMA model and the ARIMA-GRNN combination model was compared. The mean error rates (MER) of the single ARIMA model and the ARIMA-GRNN combination model were 31.6% and 28.7%, respectively, and the determination coefficients (R(2)) of the two models were 0.801 and 0.872, respectively. The fitting efficacy of the ARIMA-GRNN combination model was better than that of the single ARIMA model, which has practical value in the research on time series data such as the incidence of scarlet fever.
Mixed conditional logistic regression for habitat selection studies.
Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas
2010-05-01
1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. 
Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.
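The IIA property referred to above follows directly from the form of the conditional logit model. As a sketch in assumed notation ($x_i$ is the covariate vector of habitat unit $i$, $C$ the set of available units, $\beta$ the selection coefficients; none of these symbols are taken from the paper):

```latex
\begin{align}
P(i \mid C) &= \frac{\exp(x_i^{\top}\beta)}{\sum_{j \in C} \exp(x_j^{\top}\beta)}, \\
\frac{P(A \mid C)}{P(B \mid C)} &= \exp\!\big((x_A - x_B)^{\top}\beta\big).
\end{align}
```

The ratio in the second line does not involve any other member of $C$, which is exactly the IIA statement. If the coefficients vary between individuals, for example $\beta_n = \beta + b_n$ with random $b_n$, the ratio still holds for each animal separately but no longer holds after averaging over the population, which is one way a mixed-effects model can relax IIA.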
Chai, Rui; Xu, Li-Sheng; Yao, Yang; Hao, Li-Ling; Qi, Lin
2017-01-01
This study analyzed ascending branch slope (A_slope), dicrotic notch height (Hn), diastolic area (Ad), systolic area (As), diastolic blood pressure (DBP), systolic blood pressure (SBP), pulse pressure (PP), subendocardial viability ratio (SEVR), waveform parameter (k), stroke volume (SV), cardiac output (CO), and peripheral resistance (RS) of the central pulse wave, measured invasively and non-invasively. Invasively measured parameters were compared with parameters estimated from brachial pulse waves by a regression model and a transfer function model. The accuracy of the parameters estimated by the regression model and the transfer function model was also compared. Findings showed that the k value and the invasively measured central and brachial pulse wave parameters correlated positively. The regression model parameters, including A_slope, DBP, and SEVR, and the transfer function model parameters were consistent with the invasively measured parameters, and to the same degree. SBP, PP, SV, and CO could be calculated through the regression model, but less accurately than with the transfer function model.
ℓp-Norm Multikernel Learning Approach for Stock Market Price Forecasting
Shao, Xigao; Wu, Kun; Liao, Bifeng
2012-01-01
Linear multiple kernel learning models have been used for predicting financial time series. However, ℓ1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓp-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and an interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily closing prices of the Shanghai Stock Index in China. Experimental results show that our proposed model performs better than the ℓ1-norm multiple support vector regression model. PMID:23365561
Analyzing hospitalization data: potential limitations of Poisson regression.
Weaver, Colin G; Ravani, Pietro; Oliver, Matthew J; Austin, Peter C; Quinn, Robert R
2015-08-01
Poisson regression is commonly used to analyze hospitalization data when outcomes are expressed as counts (e.g. number of days in hospital). However, data often violate the assumptions on which Poisson regression is based. More appropriate extensions of this model, while available, are rarely used. We compared hospitalization data between 206 patients treated with hemodialysis (HD) and 107 treated with peritoneal dialysis (PD) using Poisson regression and compared results from standard Poisson regression with those obtained using three other approaches for modeling count data: negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression. We examined the appropriateness of each model and compared the results obtained with each approach. During a mean 1.9 years of follow-up, 183 of 313 patients (58%) were never hospitalized (indicating an excess of 'zeros'). The data also displayed overdispersion (variance greater than mean), violating another assumption of the Poisson model. Using four criteria, we determined that the NB and ZINB models performed best. According to these two models, patients treated with HD experienced similar hospitalization rates as those receiving PD {NB rate ratio (RR): 1.04 [bootstrapped 95% confidence interval (CI): 0.49-2.20]; ZINB summary RR: 1.21 (bootstrapped 95% CI 0.60-2.46)}. Poisson and ZIP models fit the data poorly and had much larger point estimates than the NB and ZINB models [Poisson RR: 1.93 (bootstrapped 95% CI 0.88-4.23); ZIP summary RR: 1.84 (bootstrapped 95% CI 0.88-3.84)]. We found substantially different results when modeling hospitalization data, depending on the approach used. Our results argue strongly for a sound model selection process and improved reporting around statistical methods used for modeling count data. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
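The two assumption violations named above (excess zeros and overdispersion) can be checked with simple descriptive diagnostics before any model is fitted. The sketch below uses hypothetical count data and a hypothetical helper `poisson_diagnostics`; it is not the authors' model selection procedure, which relied on four criteria not reproduced here.

```python
import math
import statistics

def poisson_diagnostics(counts):
    """Compare sample moments and zero fraction with what a Poisson model implies.

    Assumes at least two observations and a positive sample mean.
    """
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)
    return {
        "mean": mean,
        "variance": variance,
        # A Poisson variable has variance equal to its mean, so a ratio
        # well above 1 points to overdispersion.
        "dispersion_ratio": variance / mean,
        "observed_zero_fraction": observed_zero,
        # P(Y = 0) under a Poisson distribution with the same mean.
        "poisson_zero_fraction": math.exp(-mean),
    }

# Hypothetical days-in-hospital counts for 20 patients (not the study data).
days = [0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 12, 0, 0, 7, 0, 25, 0, 2, 0, 0]
diag = poisson_diagnostics(days)
```

When the dispersion ratio is far above 1 and the observed zero fraction far exceeds the Poisson-implied one, negative binomial or zero-inflated models are the natural candidates to compare.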
Predicting School Enrollments Using the Modified Regression Technique.
ERIC Educational Resources Information Center
Grip, Richard S.; Young, John W.
This report is based on a study in which a regression model was constructed to increase accuracy in enrollment predictions. A model, known as the Modified Regression Technique (MRT), was used to examine K-12 enrollment over the past 20 years in 2 New Jersey school districts of similar size and ethnicity. To test the model's accuracy, MRT was…
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert M.
2013-01-01
A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.
Chai Rui; Li Si-Man; Xu Li-Sheng; Yao Yang; Hao Li-Ling
2017-07-01
This study mainly analyzed parameters such as ascending branch slope (A_slope), dicrotic notch height (Hn), diastolic area (Ad), systolic area (As), diastolic blood pressure (DBP), systolic blood pressure (SBP), pulse pressure (PP), subendocardial viability ratio (SEVR), waveform parameter (k), stroke volume (SV), cardiac output (CO) and peripheral resistance (RS) of the central pulse wave, measured invasively and non-invasively. The parameters extracted from the invasively measured central pulse wave were compared with the parameters estimated from the brachial pulse waves by a regression model and a transfer function model. The accuracy of the parameters estimated by the regression model and the transfer function model was also compared. Our findings showed that, in addition to the k value, the above parameters of the invasively measured central pulse wave and brachial pulse wave were positively correlated. Both the regression model parameters, including A_slope, DBP and SEVR, and the transfer function model parameters were consistent with the invasively measured parameters, and to the same degree. The regression equations of the three parameters were expressed by Y'=a+bx. The SBP, PP, SV and CO of the central pulse wave could be calculated through the regression model, but less accurately than with the transfer function model.
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.
Ferrari, Alberto; Comelli, Mario
2016-12-01
In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. These clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods are available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions, namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; in addition, we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Mitra, Ashis; Majumdar, Prabal Kumar; Bannerjee, Debamalya
2013-03-01
This paper presents a comparative analysis of two modeling methodologies for the prediction of air permeability of plain woven handloom cotton fabrics. Four basic fabric constructional parameters namely ends per inch, picks per inch, warp count and weft count have been used as inputs for artificial neural network (ANN) and regression models. Out of the four regression models tried, interaction model showed very good prediction performance with a meager mean absolute error of 2.017 %. However, ANN models demonstrated superiority over the regression models both in terms of correlation coefficient and mean absolute error. The ANN model with 10 nodes in the single hidden layer showed very good correlation coefficient of 0.982 and 0.929 and mean absolute error of only 0.923 and 2.043 % for training and testing data respectively.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nimbalkar, Sachin U.; Wenning, Thomas J.; Guo, Wei
In the United States, manufacturing facilities accounted for about 32% of total domestic energy consumption in 2014. Robust energy tracking methodologies are critical to understanding energy performance in manufacturing facilities. Due to its simplicity and intuitiveness, the classic energy intensity method (i.e. the ratio of total energy use over total production) is the most widely adopted. However, the classic energy intensity method does not take into account the variation of other relevant parameters (e.g. product type, feed stock type, weather). Furthermore, the energy intensity method assumes that the facilities’ base energy consumption (energy use at zero production) is zero, which rarely holds true. Therefore, it is commonly recommended to utilize regression models rather than the energy intensity approach for tracking improvements at the facility level. Unfortunately, many energy managers have difficulties understanding why regression models are statistically better than the classic energy intensity method. While anecdotes and qualitative information may convince some, many have major reservations about the accuracy of regression models and whether it is worth the time and effort to gather data and build quality regression models. This paper will explain why regression models are theoretically and quantitatively more accurate for tracking energy performance improvements. Based on the analysis of data from 114 manufacturing plants over 12 years, this paper will present quantitative results on the importance of utilizing regression models over the energy intensity methodology. This paper will also document scenarios where regression models offer no significant advantage over the energy intensity method.
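The zero-base-load assumption can be made concrete with a small numerical sketch. The data below are hypothetical (a plant with a base load of 500 energy units and 2 units per unit of production) and the helper `fit_line` is an illustrative ordinary least squares fit, not the paper's model.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical monthly data: energy = 500 (base load) + 2 * production.
production = [100, 150, 200, 250, 300]
energy = [500 + 2 * p for p in production]

# Classic energy intensity: total energy over total production.
# This implicitly forces the line through the origin (zero base load).
intensity = sum(energy) / sum(production)

# Regression: estimates the base load and the marginal energy use separately.
base_load, marginal = fit_line(production, energy)

# Expected energy use at a production level outside the baseline range.
new_production = 400
pred_intensity = intensity * new_production
pred_regression = base_load + marginal * new_production
```

Here the intensity method expects 1800 units at a production of 400 while the regression expects 1300, so a plant performing exactly as before would appear to have improved by roughly 28% under the intensity method purely because production rose.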
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. 
This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
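As an illustration of item (1), the sketch below estimates the standard error of an unweighted regression slope by bootstrap resampling of (x, y) pairs. The data, function names and the choice of 500 resamples are hypothetical; this is a generic pairs bootstrap, not the specific procedures developed in the paper.

```python
import random
import statistics

def ols_slope(x, y):
    """Unweighted least-squares slope of y on x; None if x has no spread."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return None
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx

def bootstrap_slope_se(x, y, n_boot=500, seed=0):
    """Standard deviation of the slope over resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample pairs with replacement
        slope = ols_slope([x[i] for i in idx], [y[i] for i in idx])
        if slope is not None:  # skip degenerate resamples with identical x values
            slopes.append(slope)
    return statistics.stdev(slopes)

# Hypothetical data lying close to y = 2x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
slope_hat = ols_slope(x, y)
slope_se = bootstrap_slope_se(x, y)
```

Fixing the seed keeps the estimate reproducible; in practice the number of resamples would be increased until the standard error stabilises.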
Retargeted Least Squares Regression Algorithm.
Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin
2015-09-01
This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to learn the regression targets directly from data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large margin constraint for the requirement of correct classification for each data point. Compared with traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model, hence there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure including regression and retargeting as substeps. The experimental evaluation over a range of databases demonstrates the validity of our method.
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
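Linear quantile regression, as referred to above, minimises the check (pinball) loss rather than squared error. The sketch below shows that loss and, as the simplest special case with no covariates, recovers a sample quantile by minimising it over the observed values. The function names and data are hypothetical, and the single-index and local polynomial steps of the paper are not reproduced.

```python
def check_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0}) for a quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def sample_quantile_by_check_loss(values, tau):
    """Observed value that minimises the total check loss (intercept-only model)."""
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))

# Hypothetical data with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
median = sample_quantile_by_check_loss(data, 0.5)
```

With `tau = 0.5` the loss is half the absolute error, so the minimiser is the median and is unaffected by the outlier; other values of `tau` weight positive and negative residuals asymmetrically.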
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE) and a linear pseudo model for the nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. The present paper, however, introduces an innovative method to compute the NLSE using principles of multivariate calculus. This study is concerned with new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to obtain a linear pseudo model for a nonlinear regression model. In this article a new technique is developed to obtain the linear pseudo model for the nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] is explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of least-squares estimation (LSE) for the fitting of nonlinear regression functions in 2006. In Jae Myung [13] provided a good conceptual explanation of maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation”.
Quantitative detection of settled coal dust over green canopy
NASA Astrophysics Data System (ADS)
Brook, Anna; Sahar, Nir
2017-04-01
The main task of environmental and geoscience applications is efficient and accurate quantitative classification of earth surfaces and spatial phenomena. In the past decade, there has been significant interest in employing spectral unmixing in order to retrieve accurate quantitative information latent in in situ data. Recently, ground-truth and laboratory-measured spectral signatures, supported by advanced algorithms, have been proposed as a new path toward solving the unmixing problem in a semi-supervised fashion. This study presents a practical implementation of field spectroscopy as a quantitative tool to detect settled coal dust over green canopy in a free/open environment. Coal dust is a fine powdered form of coal, which is created by the crushing, grinding, and pulverizing of coal. Because of the inelastic nature of coal, coal dust can be created during transportation or by mechanical handling of coal. Coal dust, categorized at silt-clay particle size, is of particular concern due to heavy metals (lead, mercury, nickel, tin, cadmium, antimony, arsenic, isotopes of thorium and strontium), which are toxic even at low concentrations. This hazard poses a risk to both the environment and public health. It has been identified by medical scientists around the world as causing a range of diseases and health problems, mainly heart and respiratory diseases such as asthma and lung cancer. This is because the fine invisible coal dust particles (less than 2.5 microns) lodge in the lungs for a long time and are not naturally expelled, so long-term exposure increases the risk of health problems. Numerous studies have reported that the data needed to study the geographic distribution of very fine coal dust (smaller than PM 2.5) and the related health impacts of coal exports are not being collected. Sediment dust load in an indoor environment can be spectrally assessed using reflectance spectroscopy (Chudnovsky and Ben-Dor, 2009). 
Small amounts of particulate pollution that may carry a signature of a forthcoming environmental hazard are of key interest when considering the effects of pollution. According to the most basic distribution dynamics, dust consists of suspended particulate matter in a fine state of subdivision that is raised and carried by wind. In this context, it is increasingly important first to understand the distribution dynamics of pollutants, and subsequently to develop dedicated tools and measures to control and monitor pollutants in the free environment. The earliest effect of settled polluted dust particles is not always reflected in poor conditions of vegetation or soils, or in any visible damage. In most cases, there is a rather long accumulation process that progresses from a polluted condition to a long-term environmental and health-related hazard. Although experiments conducted with pollutant analog powders under controlled conditions have tended to confirm the findings from field studies (Brook, 2014; Brook and Ben-Dor 2016; Brook, 2016), a major criticism of all these experiments is their short duration. The resulting conclusion is that it is difficult, if not impossible, to determine the implications of long-term exposure to realistic concentrations of pollutants from such short-term studies. In general, the task of unmixing is to decompose the reflectance spectrum into a set of endmembers or principal combined spectra and their corresponding abundances (Bioucas-Dias et al., 2012). This study suggests that the sensitivity of sparse unmixing techniques provides an ideal approach to extract and identify coal dust settled over/upon green vegetation canopy using in situ spectral data collected by a portable spectrometer. The optimal NMF algorithms, such as ALS and LPG, are assumed to be the simplest methods that achieve the minimum error. The suggested practical approach includes the following stages: 1. In situ spectral measurements, 2. 
Near-real-time spectral data analysis, 3. Estimated concentration of coal dust reported as mg/sq m. Stage 2 is completed by: 1. Unmixing the green canopy and the settled dust to extract only the coal dust fraction, 2. Converting the spectral feature of coal dust to concentration via a PLSR spectral model. The spectral model was a PLSR model trained and validated in the laboratory using spectra across the MIR (FTIR reflectance spectra) and NIR regions and XRD analysis. The obtained RMSE was satisfactory for both spectral regions. Thus, it was concluded that field spectroscopy can be used for this purpose, and that it can provide fully quantitative measures of settled coal dust. This approach (both spectrometer and algorithm) has now been accepted as a practical operational tool for environmental monitoring near the Orot Rabin power station in Hadera and will be used by the Sharon-Carmel Districts Municipal Association for Environmental Protection, Israel, as a regulatory tool. In summary, this work shows that coal dust can be assessed using in situ spectroscopy, making it a potentially powerful tool for environmental studies. References Chudnovsky, A., & Ben-Dor, E. (2009). Reflectance spectroscopy as a tool for settled dust monitoring in office environment. International Journal of Environment and Waste Management, 4(1), 32-49. Brook, A. (2014). Quantitative Detection of Settled dust over Green Canopy using Sparse Unmixing of Airborne Hyperspectral Data. IEEE-Whispers 6th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, 2014, Switzerland, 4-8. Brook, A. and Ben-Dor, E. (2016). Quantitative detection of settled dust over Green Canopy using sparse unmixing of airborne hyperspectral data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 9(2), 884-897. Brook, A. (2016). Quantitative Detection and Long-Term Monitoring of Settle Dust Using Semisupervised Learning for Spectral Data. Water, Air, & Soil Pollution, 227(3), 1-9. Bioucas-Dias, J.M., Plaza, A., Dobigeon, N., Parente, M., Du, Q., Gader, P. and Chanussot, J. (2012). Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(2), 354-379. Keshava, N., Mustard, J. (2002). Spectral unmixing. IEEE Signal Process. Mag., 19(1), 44-57.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. 
Furthermore, the developed CLR models tend to achieve sensitivity more than 10% better than the standard classification models, which translates to the correct labeling of an additional 400-500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise awareness of the need to collect data on additional markers and to develop the necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Prediction models for clustered data: comparison of a random intercept and standard regression model
2013-01-01
Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. 
Conclusion The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436
Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne
2013-02-15
Functional mixture regression.
Yao, Fang; Fu, Yuejiao; Lee, Thomas C M
2011-04-01
In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach named as functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR along with simulation experiments illustrating its empirical properties are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and accounts for the intrinsic complexity of the data. We start with standard cubic spline regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compare cubic regression splines with linear piecewise splines, varying the number and positions of knots. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect model with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95% CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. 
We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves than on the coefficients. Moreover, use of cubic regression splines provides biologically meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
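As a minimal sketch of the technique described in this abstract, the following fits a multiple linear regression by solving the normal equations with the Python standard library only. The function names and the toy data are illustrative assumptions, not taken from the article; the singular-matrix check shows where perfect multicollinearity among predictors makes the coefficients non-unique.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * beta = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # A zero pivot means the predictors are (numerically) perfectly collinear.
            raise ValueError("singular system: predictors may be perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_multiple_regression(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the residual sum of squares."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column first
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise).
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
outcome = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]
coefficients = fit_multiple_regression(predictors, outcome)
```

Each coefficient is the expected change in the outcome per unit change in one predictor with the others held fixed, which is why the estimates depend on which predictors are included.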
Categorical regression dose-response modeling
The goal of this training is to provide participants with training on the use of the U.S. EPA’s Categorical Regression software (CatReg) and its application to risk assessment. Categorical regression fits mathematical models to toxicity data that have been assigned ord...
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
ERIC Educational Resources Information Center
Berenson, Mark L.
2013-01-01
There is consensus in the statistical literature that severe departures from its assumptions invalidate the use of regression modeling for purposes of inference. The assumptions of regression modeling are usually evaluated subjectively through visual, graphic displays in a residual analysis but such an approach, taken alone, may be insufficient…
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers
NASA Technical Reports Server (NTRS)
Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.
2010-01-01
This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.
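The idea of augmenting a parametric fit with a portion of a nonparametric fit to its residuals can be sketched as follows. This is only an illustration under stated assumptions: a straight line stands in for the predetermined parametric model, a Gaussian-kernel (Nadaraya-Watson) smoother stands in for the nonparametric technique, and the names `portion` and `bandwidth` are hypothetical. The abstract does not specify how the mixing portion or the smoother are chosen.

```python
import math


def fit_line(x, y):
    """Ordinary least squares straight line; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kernel_smooth(x, values, x0, bandwidth):
    """Gaussian-kernel weighted average of `values` around x0 (assumed smoother)."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def augmented_prediction(x, y, x0, portion, bandwidth):
    """Parametric prediction plus `portion` (0 to 1) of the smoothed residuals."""
    slope, intercept = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    parametric = intercept + slope * x0
    return parametric + portion * kernel_smooth(x, residuals, x0, bandwidth)


# Hypothetical calibration data with curvature that a straight line misses.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [xi ** 2 for xi in xs]
line_only = augmented_prediction(xs, ys, 2.0, portion=0.0, bandwidth=0.5)
augmented = augmented_prediction(xs, ys, 2.0, portion=1.0, bandwidth=0.5)
```

With `portion=0` the result is the purely parametric fit; larger values let the residual fit correct for model misspecification, which is how the reliance on the predetermined model is reduced.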
Kaneko, Hiromasa; Funatsu, Kimito
2013-09-23
We propose predictive performance criteria for nonlinear regression models without cross-validation. The proposed criteria are the determination coefficient and the root-mean-square error for the midpoints between k-nearest-neighbor data points. These criteria can be used to evaluate predictive ability after the regression models are updated, whereas cross-validation cannot be performed in such a situation. The proposed method is effective and helpful in handling big data when cross-validation cannot be applied. By analyzing data from numerical simulations and quantitative structural relationships, we confirm that the proposed criteria enable the predictive ability of the nonlinear regression models to be appropriately quantified.
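A rough sketch of how such midpoint-based criteria could be computed is given below. It assumes that the reference response at a midpoint is the average of the two neighbors' responses, which the abstract does not state explicitly; the function names and the toy model are likewise illustrative.

```python
import math


def knn_midpoints(features, targets, k):
    """Midpoints between each point and its k nearest neighbors (each pair used once)."""
    midpoints = []
    seen_pairs = set()
    for i, xi in enumerate(features):
        neighbors = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(features) if j != i
        )
        for _, j in neighbors[:k]:
            pair = (min(i, j), max(i, j))
            if pair in seen_pairs:
                continue
            seen_pairs.add(pair)
            mid_x = tuple((a + b) / 2 for a, b in zip(xi, features[j]))
            # Assumption: the midpoint's reference response is the mean of both responses.
            mid_y = (targets[i] + targets[j]) / 2
            midpoints.append((mid_x, mid_y))
    return midpoints


def midpoint_criteria(model, features, targets, k):
    """Return (r2, rmse) of `model` evaluated at the k-NN midpoints."""
    midpoints = knn_midpoints(features, targets, k)
    reference = [t for _, t in midpoints]
    predicted = [model(x) for x, _ in midpoints]
    sse = sum((t - p) ** 2 for t, p in zip(reference, predicted))
    mean_ref = sum(reference) / len(reference)
    sst = sum((t - mean_ref) ** 2 for t in reference)
    return 1 - sse / sst, math.sqrt(sse / len(reference))


# Hypothetical one-dimensional data and an already-fitted model y = 2x.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 2.0, 4.0, 6.0]
r2, rmse = midpoint_criteria(lambda x: 2 * x[0], X, y, k=1)
```

Because the midpoints are derived from the training data alone, no model refitting is needed, which is the property that makes the criteria usable where cross-validation is not.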
Robust inference under the beta regression model with application to health care studies.
Ghosh, Abhik
2017-01-01
Data on rates, percentages, or proportions arise frequently in many different applied disciplines like medical biology, health care, psychology, and several others. In this paper, we develop a robust inference procedure for the beta regression model, which is used to describe such response variables taking values in (0, 1) through some related explanatory variables. In relation to the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference suffers from a serious lack of robustness against outliers in data and generates drastically different (erroneous) inference in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model along with several applications. We derive their asymptotic properties and describe their robustness theoretically through influence function analyses. Finite sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on beta regression models with a fixed dispersion parameter, some indications are also provided for extension to variable dispersion beta regression models with an application.
Differentially private distributed logistic regression using private and public data.
Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang; Xiong, Li; Ohno-Machado, Lucila
2014-01-01
Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee.
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.
Nonlinear-regression flow model of the Gulf Coast aquifer systems in the south-central United States
Kuiper, L.K.
1994-01-01
A multiple-regression methodology was used to help answer questions concerning model reliability, and to calibrate a time-dependent variable-density ground-water flow model of the Gulf Coast aquifer systems in the south-central United States. More than 40 regression models with 2 to 31 regression parameters are used and detailed results are presented for 12 of the models. More than 3,000 values for grid-element volume-averaged head and hydraulic conductivity are used for the regression model observations. Calculated prediction interval half widths, though perhaps inaccurate due to a lack of normality of the residuals, are the smallest for models with only four regression parameters. In addition, the root-mean weighted residual decreases very little with an increase in the number of regression parameters. The various models showed considerable overlap between the prediction intervals for shallow head and hydraulic conductivity. Approximate 95-percent prediction interval half widths for volume-averaged freshwater head exceed 108 feet; for volume-averaged base 10 logarithm hydraulic conductivity, they exceed 0.89. All of the models are unreliable for the prediction of head and ground-water flow in the deeper parts of the aquifer systems, including the amount of flow coming from the underlying geopressured zone. Truncating the domain of solution of one model to exclude that part of the system having a ground-water density greater than 1.005 grams per cubic centimeter or to exclude that part of the system below a depth of 3,000 feet, and setting the density to that of freshwater does not appreciably change the results for head and ground-water flow, except for locations close to the truncation surface.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
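For illustration, the ordinary least products (OLP) point estimates mentioned in point 4 can be computed in a few lines. The sketch below uses the standard geometric-mean formulation, in which the slope is the ratio of the standard deviations of y and x carrying the sign of their covariance; the function name is an assumption, and no confidence intervals are attempted here since the abstract notes that simple approaches give inaccurate ones.

```python
import math


def ordinary_least_products(x, y):
    """Return (slope, intercept) of the OLP (geometric mean) regression line."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("slope is undefined for constant or uncorrelated data")
    # |slope| = sd(y) / sd(x); the sign follows the covariance.
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements, both subject to error.
method_a = [1.0, 2.0, 3.0, 4.0, 5.0]
method_b = [1.2, 1.9, 3.2, 3.8, 5.1]
slope_ab, intercept_ab = ordinary_least_products(method_a, method_b)
slope_ba, intercept_ba = ordinary_least_products(method_b, method_a)
```

Unlike Model I least squares, the OLP line is symmetric: regressing x on y gives the reciprocal slope, so the fitted line does not depend on which variable is treated as the predictor.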
NASA Astrophysics Data System (ADS)
Mei, Zhixiong; Wu, Hao; Li, Shiyun
2018-06-01
The Conversion of Land Use and its Effects at Small regional extent (CLUE-S), which is a widely used model for land-use simulation, utilizes logistic regression to estimate the relationships between land use and its drivers, and thus, predict land-use change probabilities. However, logistic regression disregards possible spatial autocorrelation and self-organization in land-use data. Autologistic regression can depict spatial autocorrelation but cannot address self-organization, while logistic regression that considers only self-organization (NE-logistic regression) fails to capture spatial autocorrelation. Therefore, this study developed a regression method (NE-autologistic regression), which incorporated both spatial autocorrelation and self-organization, to improve CLUE-S. The Zengcheng District of Guangzhou, China was selected as the study area. The land-use data of 2001, 2005, and 2009, as well as 10 typical driving factors, were used to validate the proposed regression method and the improved CLUE-S model. Then, three future land-use scenarios for 2020 (the natural growth scenario, ecological protection scenario, and economic development scenario) were simulated using the improved model. Validation results showed that NE-autologistic regression performed better than logistic regression, autologistic regression, and NE-logistic regression in predicting land-use change probabilities. The spatial allocation accuracy and kappa values of NE-autologistic-CLUE-S were higher than those of logistic-CLUE-S, autologistic-CLUE-S, and NE-logistic-CLUE-S for the simulations of two periods, 2001-2009 and 2005-2009, which proved that the improved CLUE-S model achieved the best simulation and was thereby effective to a certain extent. The scenario simulation results indicated that under all three scenarios, traffic land and residential/industrial land would increase, whereas arable land and unused land would decrease during 2009-2020. 
Apparent differences also existed in the simulated change sizes and locations of each land-use type under different scenarios. The results not only demonstrate the validity of the improved model but also provide a valuable reference for relevant policy-makers.
Time series modeling by a regression approach based on a latent process.
Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice
2009-01-01
Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.
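The role of the hidden logistic process can be illustrated with a small sketch of the model's mean curve only; the EM and IRLS estimation steps are not shown. It assumes the regime probabilities are a softmax of linear functions of time, and all names and parameter values are hypothetical rather than taken from the paper.

```python
import math


def regime_probabilities(t, logistic_weights):
    """Softmax over regimes of (w0 + w1 * t); one (w0, w1) pair per regime."""
    scores = [w0 + w1 * t for w0, w1 in logistic_weights]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def polynomial(coefficients, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... at t."""
    return sum(c * t ** power for power, c in enumerate(coefficients))


def expected_curve(t, logistic_weights, polynomials):
    """Mean of the series at time t: regime polynomials mixed by their probabilities."""
    probs = regime_probabilities(t, logistic_weights)
    return sum(p * polynomial(c, t) for p, c in zip(probs, polynomials))


# Two hypothetical regimes: a constant level, then a linear ramp taking over near t = 5.
weights = [(0.0, 0.0), (-50.0, 10.0)]
regimes = [[1.0], [0.0, 2.0]]
curve = [expected_curve(t, weights, regimes) for t in (0.0, 5.0, 10.0)]
```

The magnitude of the time coefficient controls how fast one regime replaces another: large values give an abrupt switch, small values a smooth transition, which corresponds to the smooth or abrupt activation described in the abstract.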