Zollanvari, Amin; Dougherty, Edward R
2014-06-01
The most important aspect of any classifier is its error rate, because this quantifies its predictive capacity. Thus, the accuracy of error estimation is critical. Error estimation is problematic in small-sample classifier design because the error must be estimated using the same data from which the classifier has been designed. Use of prior knowledge, in the form of a prior distribution on an uncertainty class of feature-label distributions to which the true, but unknown, feature-distribution belongs, can facilitate accurate error estimation (in the mean-square sense) in circumstances where accurate completely model-free error estimation is impossible. This paper provides analytic asymptotically exact finite-sample approximations for various performance metrics of the resulting Bayesian Minimum Mean-Square-Error (MMSE) error estimator in the case of linear discriminant analysis (LDA) in the multivariate Gaussian model. These performance metrics include the first, second, and cross moments of the Bayesian MMSE error estimator with the true error of LDA, and therefore, the Root-Mean-Square (RMS) error of the estimator. We lay down the theoretical groundwork for Kolmogorov double-asymptotics in a Bayesian setting, which enables us to derive asymptotic expressions of the desired performance metrics. From these we produce analytic finite-sample approximations and demonstrate their accuracy via numerical examples. Various examples illustrate the behavior of these approximations and their use in determining the necessary sample size to achieve a desired RMS. The Supplementary Material contains derivations for some equations and added figures.
Optimum nonparametric estimation of population density based on ordered distances
Patil, S.A.; Kovner, J.L.; Burnham, Kenneth P.
1982-01-01
The asymptotic mean and error mean square are determined for the nonparametric estimator of plant density by distance sampling proposed by Patil, Burnham and Kovner (1979, Biometrics 35, 597-604). On the basis of these formulae, a bias-reduced version of this estimator is given, and its specific form is determined which gives minimum mean square error under varying assumptions about the true probability density function of the sampled data. Extension is given to line-transect sampling.
Medina, K.D.; Tasker, Gary D.
1987-01-01
This report documents the results of an analysis of the surface-water data network in Kansas for its effectiveness in providing regional streamflow information. The network was analyzed using generalized least squares regression. The correlation and time-sampling error of the streamflow characteristic are considered in the generalized least squares method. Unregulated medium-, low-, and high-flow characteristics were selected to be representative of the regional information that can be obtained from streamflow-gaging-station records for use in evaluating the effectiveness of continuing the present network stations, discontinuing some stations, and (or) adding new stations. The analysis used streamflow records for all currently operated stations that were not affected by regulation and for discontinued stations for which unregulated flow characteristics, as well as physical and climatic characteristics, were available. The State was divided into three network areas, western, northeastern, and southeastern Kansas, and analysis was made for the three streamflow characteristics in each area, using three planning horizons. The analysis showed that the maximum reduction of sampling mean-square error for each cost level could be obtained by adding new stations and discontinuing some current network stations. Large reductions in sampling mean-square error for low-flow information could be achieved in all three network areas, the reduction in western Kansas being the most dramatic. The addition of new stations would be most beneficial for mean-flow information in western Kansas. The reduction of sampling mean-square error for high-flow information would benefit most from the addition of new stations in western Kansas. Southeastern Kansas showed the smallest error reduction in high-flow information. 
A comparison among all three network areas indicated that funding resources could be most effectively used by discontinuing more stations in northeastern and southeastern Kansas and establishing more new stations in western Kansas.
Chemical library subset selection algorithms: a unified derivation using spatial statistics.
Hamprecht, Fred A; Thiel, Walter; van Gunsteren, Wilfred F
2002-01-01
If similar compounds have similar activity, rational subset selection becomes superior to random selection in screening for pharmacological lead discovery programs. Traditional approaches to this experimental design problem fall into two classes: (i) a linear or quadratic response function is assumed, or (ii) some space-filling criterion is optimized. The assumptions underlying the first approach are clear but not always defensible; the second approach yields more intuitive designs but lacks a clear theoretical foundation. We model activity in a bioassay as a realization of a stochastic process and use the best linear unbiased estimator to construct spatial sampling designs that optimize the integrated mean square prediction error, the maximum mean square prediction error, or the entropy. We argue that our approach constitutes a unifying framework encompassing most proposed techniques as limiting cases and sheds light on their underlying assumptions. In particular, vector quantization is obtained, in dimensions up to eight, in the limiting case of very smooth response surfaces for the integrated mean square error criterion. Closest packing is obtained for very rough surfaces under the integrated mean square error and entropy criteria. We suggest using either the integrated mean square prediction error or the entropy as optimization criteria rather than approximations thereof and propose a scheme for direct iterative minimization of the integrated mean square prediction error. Finally, we discuss how the quality of chemical descriptors manifests itself and clarify the assumptions underlying the selection of diverse or representative subsets.
A method of bias correction for maximal reliability with dichotomous measures.
Penev, Spiridon; Raykov, Tenko
2010-02-01
This paper is concerned with the reliability of weighted combinations of a given set of dichotomous measures. Maximal reliability for such measures has been discussed in the past, but the pertinent estimator exhibits a considerable bias and mean squared error for moderate sample sizes. We examine this bias, propose a procedure for bias correction, and develop a more accurate asymptotic confidence interval for the resulting estimator. In most empirically relevant cases, the bias correction and mean squared error correction can be performed simultaneously. We propose an approximate (asymptotic) confidence interval for the maximal reliability coefficient, discuss the implementation of this estimator, and investigate the mean squared error of the associated asymptotic approximation. We illustrate the proposed methods using a numerical example.
Parastar, Hadi; Mostafapour, Sara; Azimi, Gholamhasan
2016-01-01
Comprehensive two-dimensional gas chromatography and flame ionization detection combined with unfolded-partial least squares is proposed as a simple, fast and reliable method to assess the quality of gasoline and to detect its potential adulterants. The data for the calibration set are first baseline corrected using a two-dimensional asymmetric least squares algorithm. The number of significant partial least squares components used to build the model, 4, is determined from the minimum root-mean-square error of leave-one-out cross-validation. Blends of gasoline with kerosene, white spirit and paint thinner, which are frequently used adulterants, serve as calibration samples. A regression coefficient of 0.996-0.998, a root-mean-square error of prediction of 0.005-0.010 and a relative error of prediction of 1.54-3.82% for the calibration set show the reliability of the developed method. In addition, the developed method is externally validated with three samples in the validation set (with a relative error of prediction below 10.0%). Finally, to test the applicability of the proposed strategy to real samples, five gasoline samples collected from gas stations are analyzed; their gasoline proportions are in the range of 70-85%. The relative standard deviations are below 8.5% for the different samples in the prediction set. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
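The leave-one-out criterion described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: a one-variable ordinary least-squares line stands in for the partial least squares model, and the names `fit_line`, `rmsecv_leave_one_out` and `best_component_count` are hypothetical, not taken from the paper.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (stand-in for a PLS model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmsecv_leave_one_out(xs, ys):
    """Root-mean-square error of leave-one-out cross-validation."""
    squared_errors = []
    for i in range(len(xs)):
        # Refit the model with sample i held out, then predict sample i.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((ys[i] - (intercept + slope * xs[i])) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def best_component_count(rmsecv_by_count):
    """Pick the model size whose cross-validation error is smallest."""
    return min(rmsecv_by_count, key=rmsecv_by_count.get)
```

In a real calibration, the cross-validation error would be computed once per candidate number of components and the resulting mapping passed to `best_component_count`.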
Medina, K.D.; Tasker, Gary D.
1985-01-01
The surface water data network in Kansas was analyzed using generalized least squares regression for its effectiveness in providing regional streamflow information. The correlation and time-sampling error of the streamflow characteristic are considered in the generalized least squares method. Unregulated medium-flow, low-flow and high-flow characteristics were selected to be representative of the regional information that can be obtained from streamflow gaging station records for use in evaluating the effectiveness of continuing the present network stations, discontinuing some stations, and/or adding new stations. The analysis used streamflow records for all currently operated stations that were not affected by regulation and discontinued stations for which unregulated flow characteristics, as well as physical and climatic characteristics, were available. The state was divided into three network areas, western, northeastern, and southeastern Kansas, and analysis was made for three streamflow characteristics in each area, using three planning horizons. The analysis showed that the maximum reduction of sampling mean square error for each cost level could be obtained by adding new stations and discontinuing some of the present network stations. Large reductions in sampling mean square error for low-flow information could be accomplished in all three network areas, with western Kansas having the most dramatic reduction. The addition of new stations would be most beneficial for mean-flow information in western Kansas, and to lesser degrees in the other two areas. The reduction of sampling mean square error for high-flow information would benefit most from the addition of new stations in western Kansas, and the effect diminishes to lesser degrees in the other two areas. Southeastern Kansas showed the smallest error reduction in high-flow information.
A comparison among all three network areas indicated that funding resources could be most effectively used by discontinuing more stations in northeastern and southeastern Kansas and establishing more new stations in western Kansas. (Author's abstract)
Synthetic Aperture Sonar Processing with MMSE Estimation of Image Sample Values
2016-12-01
MMSE (minimum mean-square error) target sample estimation using non-orthogonal basis...orthogonal, they can still be used in a minimum mean-square error (MMSE) estimator that models the object echo as a weighted sum of the multi-aspect basis...problem. Minimum mean-square error (MMSE) estimation is applied to target imaging with synthetic aperture
Joshi, Shuchi N; Srinivas, Nuggehally R; Parmar, Deven V
2018-03-01
Our aim was to develop and validate the extrapolative performance of a regression model using a limited sampling strategy for accurate estimation of the area under the plasma concentration versus time curve for saroglitazar. Healthy subject pharmacokinetic data from a well-powered food-effect study (fasted vs fed treatments; n = 50) were used in this work. The first 25 subjects' serial plasma concentration data up to 72 hours and corresponding AUC0-t (ie, 72 hours) from the fasting group comprised a training dataset to develop the limited sampling model. The internal datasets for prediction included the remaining 25 subjects from the fasting group and all 50 subjects from the fed condition of the same study. The external datasets included pharmacokinetic data for saroglitazar from previous single-dose clinical studies. Limited sampling models were composed of 1-, 2-, and 3-concentration-time points' correlation with AUC0-t of saroglitazar. Only models with regression coefficients (R²) >0.90 were screened for further evaluation. The best R² model was validated for its utility based on mean prediction error, mean absolute prediction error, and root mean square error. Both correlations between predicted and observed AUC0-t of saroglitazar and verification of precision and bias using Bland-Altman plot were carried out. None of the evaluated 1- and 2-concentration-time points models achieved R² > 0.90. Among the various 3-concentration-time points models, only 4 equations passed the predefined criterion of R² > 0.90. Limited sampling models with time points 0.5, 2, and 8 hours (R² = 0.9323) and 0.75, 2, and 8 hours (R² = 0.9375) were validated. Mean prediction error, mean absolute prediction error, and root mean square error were <30% (predefined criterion) and correlation (r) was at least 0.7950 for the consolidated internal and external datasets of 102 healthy subjects for the AUC0-t prediction of saroglitazar.
The same models, when applied to the AUC0-t prediction of saroglitazar sulfoxide, showed mean prediction error, mean absolute prediction error, and root mean square error <30% and correlation (r) was at least 0.9339 in the same pool of healthy subjects. A 3-concentration-time points limited sampling model predicts the exposure of saroglitazar (ie, AUC0-t) within predefined acceptable bias and imprecision limit. The same model was also used to predict AUC0-∞. The same limited sampling model was found to predict the exposure of saroglitazar sulfoxide within predefined criteria. This model can find utility during late-phase clinical development of saroglitazar in the patient population. Copyright © 2018 Elsevier HS Journals, Inc. All rights reserved.
Koláčková, Pavla; Růžičková, Gabriela; Gregor, Tomáš; Šišperová, Eliška
2015-08-30
Calibration models for the Fourier transform-near infrared (FT-NIR) instrument were developed for quick and non-destructive determination of oil and fatty acids in whole achenes of milk thistle. Samples with a range of oil and fatty acid levels were collected and their transmittance spectra were obtained by the FT-NIR instrument. Based on these spectra and data gained by means of the reference method - Soxhlet extraction and gas chromatography (GC) - calibration models were created by means of partial least squares (PLS) regression analysis. Precision and accuracy of the calibration models were verified via the cross-validation of validation samples whose spectra were not part of the calibration model and also according to the root mean square error of prediction (RMSEP), root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV) and the validation coefficient of determination (R²). The R² values for whole seeds were 0.96, 0.96, 0.83 and 0.67 and the RMSEP values were 0.76, 1.68, 1.24 and 0.54 for oil, linoleic (C18:2), oleic (C18:1) and palmitic (C16:0) acids, respectively. The calibration models are appropriate for the non-destructive determination of oil and fatty acid levels in whole seeds of milk thistle. © 2014 Society of Chemical Industry.
Hypothesis Testing Using Factor Score Regression
Devlieger, Ines; Mayer, Axel; Rosseel, Yves
2015-01-01
In this article, an overview is given of four methods to perform factor score regression (FSR), namely regression FSR, Bartlett FSR, the bias avoiding method of Skrondal and Laake, and the bias correcting method of Croon. The bias correcting method is extended to include a reliable standard error. The four methods are compared with each other and with structural equation modeling (SEM) by using analytic calculations and two Monte Carlo simulation studies to examine their finite sample characteristics. Several performance criteria are used, such as the bias using the unstandardized and standardized parameterization, efficiency, mean square error, standard error bias, type I error rate, and power. The results show that the bias correcting method, with the newly developed standard error, is the only suitable alternative for SEM. While it has a higher standard error bias than SEM, it has a comparable bias, efficiency, mean square error, power, and type I error rate. PMID:29795886
The theory precision analyse of RFM localization of satellite remote sensing imagery
NASA Astrophysics Data System (ADS)
Zhang, Jianqing; Xv, Biao
2009-11-01
The traditional method of assessing the precision of the Rational Function Model (RFM) uses a large number of check points and calculates the mean square error by comparing computed coordinates with known coordinates. This method is grounded in probability theory: the mean square error is estimated statistically from a large number of samples, and the estimate can be assumed to approach the true value when the sample is large enough. This paper takes the viewpoint of survey adjustment, uses the law of error propagation as its theoretical basis, and calculates the theoretical precision of RFM localization. SPOT5 three-array imagery is then used as experimental data, and the results of the traditional method and of the method described in this paper are compared; the comparison confirms that the traditional method is feasible and answers the question of its theoretical precision from the viewpoint of survey adjustment.
Pappas, Christos; Kyraleou, Maria; Voskidi, Eleni; Kotseridis, Yorgos; Taranilis, Petros A; Kallithraka, Stamatina
2015-02-01
The mean degree of polymerization (mDP) and the degree of galloylation (%G) in grape seeds were determined directly and simultaneously using diffuse reflectance infrared Fourier transform spectroscopy and partial least squares (PLS). The results were compared with those obtained using the conventional analysis employing phloroglucinolysis as pretreatment followed by high performance liquid chromatography-UV and mass spectrometry detection. Infrared spectra were recorded in solid state samples after freeze drying. The 2nd derivative of the 1832 to 1416 and 918 to 739 cm⁻¹ spectral regions was used for the quantification of mDP, and the 2nd derivative of the 1813 to 607 cm⁻¹ spectral region for the determination of %G, both with PLS regression. The determination coefficients (R²) of mDP and %G were 0.99 and 0.98, respectively. The corresponding values of the root-mean-square error of calibration were 0.506 and 0.692, of the root-mean-square error of cross validation 0.811 and 0.921, and of the root-mean-square error of prediction 0.612 and 0.801. The proposed method in comparison with the conventional method is simpler, less time consuming, more economical, and requires reduced quantities of chemical reagents and fewer sample pretreatment steps. It could be a starting point for the design of more specific models according to the requirements of the wineries. © 2015 Institute of Food Technologists®
Durakli Velioglu, Serap; Ercioglu, Elif; Boyaci, Ismail Hakki
2017-05-01
This research paper describes the potential of synchronous fluorescence (SF) spectroscopy for authentication of buffalo milk, a favourable raw material in the production of some premium dairy products. Buffalo milk is subjected to fraudulent activities like many other high priced foodstuffs. The current methods widely used for the detection of adulteration of buffalo milk have various disadvantages making them unattractive for routine analysis. Thus, the aim of the present study was to assess the potential of SF spectroscopy in combination with multivariate methods for rapid discrimination between buffalo and cow milk and detection of the adulteration of buffalo milk with cow milk. SF spectra of cow and buffalo milk samples were recorded over the 400-550 nm excitation range with Δλ of 10-100 nm, in steps of 10 nm. The data obtained for Δλ = 10 nm were utilised to classify the samples using principal component analysis (PCA), and to detect the adulteration level of buffalo milk with cow milk using partial least squares (PLS) methods. Successful discrimination of samples and detection of adulteration of buffalo milk with a limit of detection (LOD) of 6% were achieved with models having root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP) values of 2, 7, and 4%, respectively. The results reveal the potential of SF spectroscopy for rapid authentication of buffalo milk.
A Comparison of Normal and Elliptical Estimation Methods in Structural Equation Models.
ERIC Educational Resources Information Center
Schumacker, Randall E.; Cheevatanarak, Suchittra
Monte Carlo simulation compared chi-square statistics, parameter estimates, and root mean square error of approximation values using normal and elliptical estimation methods. Three research conditions were imposed on the simulated data: sample size, population contamination percent, and kurtosis. A Bentler-Weeks structural model established the…
Application of near-infrared spectroscopy for the rapid quality assessment of Radix Paeoniae Rubra
NASA Astrophysics Data System (ADS)
Zhan, Hao; Fang, Jing; Tang, Liying; Yang, Hongjun; Li, Hua; Wang, Zhuju; Yang, Bin; Wu, Hongwei; Fu, Meihong
2017-08-01
Near-infrared (NIR) spectroscopy with multivariate analysis was used to quantify gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra, and the feasibility of classifying samples originating from different areas was investigated. A new high-performance liquid chromatography method was developed and validated to analyze gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra as the reference. Partial least squares (PLS), principal component regression (PCR), and stepwise multivariate linear regression (SMLR) were performed to calibrate the regression model. Different data pretreatments such as derivatives (1st and 2nd), multiplicative scatter correction, standard normal variate, Savitzky-Golay filter, and Norris derivative filter were applied to remove the systematic errors. The performance of the model was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and correlation coefficient (r). The results show that compared to PCR and SMLR, PLS had a lower RMSEC, RMSECV, and RMSEP and higher r for all four analytes. PLS coupled with proper pretreatments showed good performance in both the fitting and predicting results. Furthermore, the areas of origin of the Radix Paeoniae Rubra samples were partly distinguished by principal component analysis. This study shows that NIR with PLS is a reliable, inexpensive, and rapid tool for the quality assessment of Radix Paeoniae Rubra.
Increasing point-count duration increases standard error
Smith, W.P.; Twedt, D.J.; Hamel, P.B.; Ford, R.P.; Wiedenfeld, D.A.; Cooper, R.J.
1998-01-01
We examined data from point counts of varying duration in bottomland forests of west Tennessee and the Mississippi Alluvial Valley to determine if counting interval influenced sampling efficiency. Estimates of standard error increased as point count duration increased both for cumulative number of individuals and species in both locations. Although point counts appear to yield data with standard errors proportional to means, a square root transformation of the data may stabilize the variance. Using long (>10 min) point counts may reduce sample size and increase sampling error, both of which diminish statistical power and thereby the ability to detect meaningful changes in avian populations.
Lu, Xinjiang; Liu, Wenbo; Zhou, Chuang; Huang, Minghui
2017-06-13
The least-squares support vector machine (LS-SVM) is a popular data-driven modeling method and has been successfully applied to a wide range of applications. However, it has some disadvantages, including being ineffective at handling non-Gaussian noise as well as being sensitive to outliers. In this paper, a robust LS-SVM method is proposed and is shown to have more reliable performance when modeling a nonlinear system under conditions where Gaussian or non-Gaussian noise is present. The construction of a new objective function allows for a reduction of the mean of the modeling error as well as the minimization of its variance, and it does not constrain the mean of the modeling error to zero. This differs from the traditional LS-SVM, which uses a worst-case scenario approach in order to minimize the modeling error and constrains the mean of the modeling error to zero. In doing so, the proposed method takes the modeling error distribution information into consideration and is thus less conservative and more robust in regards to random noise. A solving method is then developed in order to determine the optimal parameters for the proposed robust LS-SVM. An additional analysis indicates that the proposed LS-SVM gives a smaller weight to a large-error training sample and a larger weight to a small-error training sample, and is thus more robust than the traditional LS-SVM. The effectiveness of the proposed robust LS-SVM is demonstrated using both artificial and real life cases.
Demand Forecasting: An Evaluation of DODs Accuracy Metric and Navys Procedures
2016-06-01
inventory management improvement plan, mean of absolute scaled error, lead time adjusted squared error, forecast accuracy, benchmarking, naïve method...Manager JASA Journal of the American Statistical Association LASE Lead-time Adjusted Squared Error LCI Life Cycle Indicator MA Moving Average MAE...Mean Squared Error NAVSUP Naval Supply Systems Command NDAA National Defense Authorization Act NIIN National Individual Identification Number
NASA Astrophysics Data System (ADS)
Zakiyatussariroh, W. H. Wan; Said, Z. Mohammad; Norazan, M. R.
2014-12-01
This study investigated the performance of the Lee-Carter (LC) method and its variants in modeling and forecasting Malaysian mortality. These include the original LC, the Lee-Miller (LM) variant and the Booth-Maindonald-Smith (BMS) variant. The methods were evaluated using Malaysia's mortality data, measured as age-specific death rates (ASDR): data for 1971 to 2009 were used for the overall population, while data for 1980-2009 were used in separate models for the male and female populations. The performance of the variants was examined in terms of the goodness of fit of the models and forecasting accuracy. Comparison was made based on several criteria, namely mean square error (MSE), root mean square error (RMSE), mean absolute deviation (MAD) and mean absolute percentage error (MAPE). The results indicate that the BMS method performed best in in-sample fitting, both for the overall population and when the models were fitted separately for the male and female populations. However, for out-of-sample forecast accuracy, the BMS method was best only when the data were fitted to the overall population. When the data were fitted separately for males and females, LCnone performed better for the male population and the LM method performed well for the female population.
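The four comparison criteria named in the abstract above have simple closed forms. The sketch below assumes that MAD is meant in the forecasting sense, the mean of the absolute forecast errors, and that no observed value is zero (MAPE is undefined otherwise); the function name `forecast_errors` is hypothetical.

```python
import math


def forecast_errors(actual, forecast):
    """Return MSE, RMSE, MAD and MAPE (in percent) for paired series."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Mean absolute deviation of the forecasts from the observations.
        "MAD": sum(abs(e) for e in errors) / n,
        # Percentage error is taken relative to each observed value.
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }
```

MSE and RMSE penalize large errors more heavily than MAD, while MAPE is scale-free, which is one reason studies of this kind report several criteria side by side.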
Nonparametric probability density estimation by optimization theoretic techniques
NASA Technical Reports Server (NTRS)
Scott, D. W.
1976-01-01
Two nonparametric probability density estimators are considered. The first is the kernel estimator. The problem of choosing the kernel scaling factor based solely on a random sample is addressed. An interactive mode is discussed and an algorithm proposed to choose the scaling factor automatically. The second nonparametric probability estimate uses penalty function techniques with the maximum likelihood criterion. A discrete maximum penalized likelihood estimator is proposed and is shown to be consistent in the mean square error. A numerical implementation technique for the discrete solution is discussed and examples displayed. An extensive simulation study compares the integrated mean square error of the discrete and kernel estimators. The robustness of the discrete estimator is demonstrated graphically.
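For readers unfamiliar with the first estimator in the abstract above, a kernel density estimate can be sketched as follows. A Gaussian kernel is an assumption here, and the scaling factor `h` is simply supplied by the caller; the report's algorithm for choosing it automatically is not reproduced. The name `gaussian_kde` is hypothetical.

```python
import math


def gaussian_kde(sample, h):
    """Return a function x -> kernel density estimate with scaling factor h."""
    n = len(sample)
    # Normalizing constant so that the estimate integrates to one.
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density
```

A small `h` gives a spiky estimate that follows individual observations, and a large `h` oversmooths, which is why the choice of scaling factor is the central problem.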
NASA Astrophysics Data System (ADS)
Gidey, Amanuel
2018-06-01
Determining the suitability and vulnerability of groundwater quality for irrigation use is a key early warning and first step toward careful management of groundwater resources to diminish the impacts on irrigation. This study was conducted to determine the overall suitability of groundwater quality for irrigation use and to generate spatial distribution maps of it in the Elala catchment, Northern Ethiopia. Thirty-nine groundwater samples were collected to analyze and map the water quality variables. Atomic absorption spectrophotometer, ultraviolet spectrophotometer, titration and calculation methods were used for laboratory groundwater quality analysis. ArcGIS, geospatial analysis tools, semivariogram model types and interpolation methods were used to generate geospatial distribution maps. Twelve and eight water quality variables were used to produce weighted overlay and irrigation water quality index models, respectively. Root-mean-square error, mean square error, absolute square error, mean error, root-mean-square standardized error, and measured versus predicted values were used for cross-validation. The overall weighted overlay model result showed that 146 km² is highly suitable, 135 km² moderately suitable and 60 km² unsuitable for irrigation use. The irrigation water quality index classifies 10.26% of samples as having no restriction, 23.08% low restriction, 20.51% moderate restriction, 15.38% high restriction and 30.76% severe restriction for irrigation use. GIS and the irrigation water quality index are useful methods for irrigation water resources management, helping to achieve full-yield irrigation production, to improve and sustain food security over a long period, and to avoid increasing environmental problems for future generations.
Estimation of population mean under systematic sampling
NASA Astrophysics Data System (ADS)
Noor-ul-amin, Muhammad; Javaid, Amjad
2017-11-01
In this study we propose a generalized ratio estimator under non-response for systematic random sampling. We also generate a class of estimators through special cases of the generalized estimator using different combinations of the coefficients of correlation, kurtosis and variation. The mean square errors and the mathematical conditions under which the proposed estimators are more efficient are derived. A numerical illustration using three populations is included to support the results.
Static Scene Statistical Non-Uniformity Correction
2015-03-01
Error NUC Non-Uniformity Correction RMSE Root Mean Squared Error RSD Relative Standard Deviation S3NUC Static Scene Statistical Non-Uniformity...Deviation (RSD) which normalizes the standard deviation, σ, to the mean estimated value, µ, using the equation RSD = (σ / µ) × 100. The RSD plot of the gain...estimates is shown in Figure 4.1(b). The RSD plot shows that after a sample size of approximately 10, the different photocount values and the inclusion
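The relative standard deviation in the fragment above is the standard deviation expressed as a percentage of the mean. A minimal sketch follows; whether the report uses the population or the sample standard deviation is not stated, so the population form used here is an assumption, and the function name is hypothetical.

```python
import statistics


def relative_standard_deviation(values):
    """RSD in percent: standard deviation normalized by the mean."""
    mu = statistics.mean(values)
    # Population standard deviation (assumption; the source does not say).
    sigma = statistics.pstdev(values)
    return sigma / mu * 100.0
```

Because it is normalized by the mean, the RSD allows the spread of estimates at different signal levels to be compared on a common scale.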
Noncontact analysis of the fiber weight per unit area in prepreg by near-infrared spectroscopy.
Jiang, B; Huang, Y D
2008-05-26
The fiber weight per unit area in prepreg is an important factor in ensuring the quality of composite products. Near-infrared spectroscopy (NIRS) together with a noncontact reflectance source was applied to the quality analysis of the fiber weight per unit area. The fiber weight per unit area ranged from 13.39 to 14.14 mg cm⁻². Partial least squares (PLS) and principal components regression (PCR) were used as regression methods. The calibration model for determining the fiber weight per unit area in prepreg was developed from 55 samples. The determination coefficient (R²), root mean square error of calibration (RMSEC) and root mean square error of prediction (RMSEP) were 0.82, 0.092 and 0.099, respectively. The predicted values of the fiber weight per unit area in prepreg measured by NIRS were comparable to the values obtained by the reference method. With this technology, the noncontact reflectance source is focused directly on the sample, with neither prior treatment nor manipulation. The results of the paired t-test revealed no significant difference between the NIR method and the reference method. In addition, a prepreg sample could be analyzed within 20 s without being destroyed.
Four Types of Pulse Oximeters Accurately Detect Hypoxia during Low Perfusion and Motion.
Louie, Aaron; Feiner, John R; Bickler, Philip E; Rhodes, Laura; Bernstein, Michael; Lucero, Jennifer
2018-03-01
Pulse oximeter performance is degraded by motion artifacts and low perfusion. Manufacturers developed algorithms to improve instrument performance during these challenges. There have been no independent comparisons of these devices. We evaluated the performance of four pulse oximeters (Masimo Radical-7, USA; Nihon Kohden OxyPal Neo, Japan; Nellcor N-600, USA; and Philips Intellivue MP5, USA) in 10 healthy adult volunteers. Three motions were evaluated: tapping, pseudorandom, and volunteer-generated rubbing, adjusted to produce photoplethysmogram disturbance similar to arterial pulsation amplitude. During motion, inspired gases were adjusted to achieve stable target plateaus of arterial oxygen saturation (SaO2) at 75%, 88%, and 100%. Pulse oximeter readings were compared with simultaneous arterial blood samples to calculate bias (oxygen saturation measured by pulse oximetry [SpO2] - SaO2), mean, SD, 95% limits of agreement, and root mean square error. Receiver operating characteristic curves were determined to detect mild (SaO2 < 90%) and severe (SaO2 < 80%) hypoxemia. Pulse oximeter readings corresponding to 190 blood samples were analyzed. All oximeters detected hypoxia but motion and low perfusion degraded performance. Three of four oximeters (Masimo, Nellcor, and Philips) had root mean square error greater than 3% for SaO2 70 to 100% during any motion, compared to a root mean square error of 1.8% for the stationary control. A low perfusion index increased error. All oximeters detected hypoxemia during motion and low-perfusion conditions, but motion impaired performance at all ranges, with less accuracy at lower SaO2. Lower perfusion degraded performance in all but the Nihon Kohden instrument. We conclude that different types of pulse oximeters can be similarly effective in preserving sensitivity to clinically relevant hypoxia.
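The agreement statistics named in this abstract (bias, SD, 95% limits of agreement, root mean square error) can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function name `agreement_metrics` is hypothetical, the limits of agreement are assumed to be the usual bias ± 1.96 SD, and any adjustment for repeated measurements per volunteer is ignored.

```python
import math
import statistics


def agreement_metrics(spo2, sao2):
    """Bias, SD, 95% limits of agreement and RMSE for paired readings.

    Hypothetical helper: assumes limits of agreement = bias +/- 1.96 * SD
    and treats every pair as independent.
    """
    if len(spo2) != len(sao2) or len(spo2) < 2:
        raise ValueError("need at least two paired readings")
    # Difference defined as SpO2 - SaO2, matching the abstract's bias.
    diffs = [p - a for p, a in zip(spo2, sao2)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    rmse = math.sqrt(statistics.fmean([d * d for d in diffs]))
    return {
        "bias": bias,
        "sd": sd,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "rmse": rmse,
    }


if __name__ == "__main__":
    # Made-up readings for illustration only.
    print(agreement_metrics([90, 92, 88, 95], [91, 90, 89, 93]))
```

Unlike the SD, the RMSE includes the bias, so a device with a small SD but a consistent offset can still show a large RMSE.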
An Investigation of the Sample Performance of Two Nonnormality Corrections for RMSEA
ERIC Educational Resources Information Center
Brosseau-Liard, Patricia E.; Savalei, Victoria; Li, Libo
2012-01-01
The root mean square error of approximation (RMSEA) is a popular fit index in structural equation modeling (SEM). Typically, RMSEA is computed using the normal theory maximum likelihood (ML) fit function. Under nonnormality, the uncorrected sample estimate of the ML RMSEA tends to be inflated. Two robust corrections to the sample ML RMSEA have…
NASA Astrophysics Data System (ADS)
Yan, Hong; Song, Xiangzhong; Tian, Kuangda; Chen, Yilin; Xiong, Yanmei; Min, Shungeng
2018-02-01
A novel method based on mid-infrared (MIR) spectroscopy, which enables the determination of Chlorantraniliprole in Abamectin within minutes, is proposed. We further evaluate the prediction ability of four wavelength selection methods: bootstrapping soft shrinkage approach (BOSS), Monte Carlo uninformative variable elimination (MCUVE), genetic algorithm partial least squares (GA-PLS) and competitive adaptive reweighted sampling (CARS). The results showed that the BOSS method obtained the lowest root mean squared error of cross validation (RMSECV) (0.0245) and root mean squared error of prediction (RMSEP) (0.0271), as well as the highest coefficient of determination of cross-validation (Q²cv) (0.9998) and coefficient of determination of the test set (Q²test) (0.9989), which demonstrated that mid-infrared spectroscopy can be used to detect Chlorantraniliprole in Abamectin conveniently. Meanwhile, a suitable wavelength selection method (BOSS) is essential to conducting a component spectral analysis.
Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping
2013-10-01
Coffee is the most heavily consumed beverage in the world after water, and its quality is a key consideration in commercial trade. Therefore, caffeine content, which has a significant effect on the final quality of coffee products, needs to be determined quickly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near infrared spectroscopy (NIRS) and chemometrics for quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples covering a wide range of roast levels were analyzed by NIR, and their caffeine contents were quantitatively determined by the most commonly used HPLC-UV method to provide reference values. Calibration models based on chemometric analyses of the NIR spectral data and reference concentrations of the coffee samples were then developed. Partial least squares (PLS) regression was used to construct the models. Furthermore, diverse spectral pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Comparing the quality of the different models constructed, the application of second derivative pretreatment and stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with a root mean square error of cross validation (RMSECV) of 0.375 mg/g and a correlation coefficient (R) of 0.918 at 7 PLS factors. An independent test set was used to assess the model, giving a root mean square error of prediction (RMSEP) of 0.378 mg/g, a mean relative error of 1.976% and a mean relative standard deviation (RSD) of 1.707%. 
Thus, the results provided by the high-quality calibration model revealed the feasibility of NIR spectroscopy for at-line application to predict the caffeine content of unknown roasted coffee samples, thanks to the short analysis time of a few seconds and non-destructive advantages of NIRS. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Mikhailova, E. A.; Stiglitz, R. Y.; Post, C. J.; Schlautman, M. A.; Sharp, J. L.; Gerard, P. D.
2017-12-01
Color sensor technologies offer opportunities for affordable and rapid assessment of soil organic carbon (SOC) and total nitrogen (TN) in the field, but the applicability of these technologies may vary by soil type. The objective of this study was to use an inexpensive color sensor to develop SOC and TN prediction models for the Russian Chernozem (Haplic Chernozem) in the Kursk region of Russia. Twenty-one dried soil samples were analyzed using a Nix Pro™ color sensor that is controlled through a mobile application and Bluetooth to collect CIEL*a*b* (darkness to lightness, green to red, and blue to yellow) color data. Eleven samples were randomly selected to be used to construct prediction models and the remaining ten samples were set aside for cross validation. The root mean squared error (RMSE) was calculated to determine each model's prediction error. The data from the eleven soil samples were used to develop the natural log of SOC (lnSOC) and TN (lnTN) prediction models using depth, L*, a*, and b* for each sample as predictor variables in regression analyses. Resulting residual plots, root mean square errors (RMSE), mean squared prediction error (MSPE) and coefficients of determination ( R 2, adjusted R 2) were used to assess model fit for each of the SOC and total N prediction models. Final models were fit using all soil samples, which included depth and color variables, for lnSOC ( R 2 = 0.987, Adj. R 2 = 0.981, RMSE = 0.003, p-value < 0.001, MSPE = 0.182) and lnTN ( R 2 = 0.980 Adj. R 2 = 0.972, RMSE = 0.004, p-value < 0.001, MSPE = 0.001). Additionally, final models were fit for all soil samples, which included only color variables, for lnSOC ( R 2 = 0.959 Adj. R 2 = 0.949, RMSE = 0.007, p-value < 0.001, MSPE = 0.536) and lnTN ( R 2 = 0.912 Adj. R 2 = 0.890, RMSE = 0.015, p-value < 0.001, MSPE = 0.001). The results suggest that soil color may be used for rapid assessment of SOC and TN in these agriculturally important soils.
Performance metrics for the assessment of satellite data products: an ocean color case study
Seegers, Bridget N.; Stumpf, Richard P.; Schaeffer, Blake A.; Loftin, Keith A.; Werdell, P. Jeremy
2018-01-01
Performance assessment of ocean color satellite data has generally relied on statistical metrics chosen for their common usage and the rationale for selecting certain metrics is infrequently explained. Commonly reported statistics based on mean squared errors, such as the coefficient of determination (r2), root mean square error, and regression slopes, are most appropriate for Gaussian distributions without outliers and, therefore, are often not ideal for ocean color algorithm performance assessment, which is often limited by sample availability. In contrast, metrics based on simple deviations, such as bias and mean absolute error, as well as pair-wise comparisons, often provide more robust and straightforward quantities for evaluating ocean color algorithms with non-Gaussian distributions and outliers. This study uses a SeaWiFS chlorophyll-a validation data set to demonstrate a framework for satellite data product assessment and recommends a multi-metric and user-dependent approach that can be applied within science, modeling, and resource management communities. PMID:29609296
Li, Wen-bing; Yao, Lin-tao; Liu, Mu-hua; Huang, Lin; Yao, Ming-yin; Chen, Tian-bing; He, Xiu-wen; Yang, Ping; Hu, Hui-qin; Nie, Jiang-hui
2015-05-01
Cu in navel orange was detected rapidly by laser-induced breakdown spectroscopy (LIBS) combined with partial least squares (PLS) for quantitative analysis, and the effect of different spectral data pretreatment methods on the detection accuracy of the model was explored. Spectral data for the 52 Gannan navel orange samples were pretreated by different data smoothing, mean centering and standard normal variate transform methods. The 319~338 nm wavelength section containing the characteristic spectral lines of Cu was then selected to build PLS models, and the main evaluation indexes of the models, namely the regression coefficient (r), root mean square error of cross validation (RMSECV) and root mean square error of prediction (RMSEP), were compared and analyzed. For the PLS model built after 13-point smoothing and mean centering, these three indicators reached 0.9928, 3.43 and 3.4, respectively, and the average relative error of the prediction model was only 5.55%; this model gave the best calibration and prediction quality. The results show that selecting an appropriate data pre-processing method can effectively improve the prediction accuracy of PLS quantitative models for fruits and vegetables detected by LIBS, providing a new method for fast and accurate detection of fruits and vegetables by LIBS.
Performance of statistical models to predict mental health and substance abuse cost.
Montez-Rath, Maria; Christiansen, Cindy L; Ettner, Susan L; Loveland, Susan; Rosen, Amy K
2006-10-26
Providers use risk-adjustment systems to help manage healthcare costs. Typically, ordinary least squares (OLS) models on either untransformed or log-transformed cost are used. We examine the predictive ability of several statistical models, demonstrate how model choice depends on the goal for the predictive model, and examine whether building models on samples of the data affects model choice. Our sample consisted of 525,620 Veterans Health Administration patients with mental health (MH) or substance abuse (SA) diagnoses who incurred costs during fiscal year 1999. We tested two models on a transformation of cost: a Log Normal model and a Square-root Normal model, and three generalized linear models on untransformed cost, defined by distributional assumption and link function: Normal with identity link (OLS); Gamma with log link; and Gamma with square-root link. Risk-adjusters included age, sex, and 12 MH/SA categories. To determine the best model among the entire dataset, predictive ability was evaluated using root mean square error (RMSE), mean absolute prediction error (MAPE), and predictive ratios of predicted to observed cost (PR) among deciles of predicted cost, by comparing point estimates and 95% bias-corrected bootstrap confidence intervals. To study the effect of analyzing a random sample of the population on model choice, we re-computed these statistics using random samples beginning with 5,000 patients and ending with the entire sample. The Square-root Normal model had the lowest estimates of the RMSE and MAPE, with bootstrap confidence intervals that were always lower than those for the other models. The Gamma with square-root link was best as measured by the PRs. The choice of best model could vary if smaller samples were used and the Gamma with square-root link model had convergence problems with small samples. Models with square-root transformation or link fit the data best. 
This function (whether used as transformation or as a link) seems to help deal with the high comorbidity of this population by introducing a form of interaction. The Gamma distribution helps with the long tail of the distribution. However, the Normal distribution is suitable if the correct transformation of the outcome is used.
2016-09-01
mean-square (RMS) error of 0.29° at ə° resolution. For a P4 coded signal, the RMS error in estimating the AOA is 0.32° at 1° resolution. 14...FMCW signal, it was demonstrated that the system is capable of estimating the AOA with a root-mean-square (RMS) error of 0.29° at ə° resolution. For a...Modulator PCB printed circuit board PD photodetector RF radio frequency RMS root-mean-square
Using Least Squares for Error Propagation
ERIC Educational Resources Information Center
Tellinghuisen, Joel
2015-01-01
The method of least-squares (LS) has a built-in procedure for estimating the standard errors (SEs) of the adjustable parameters in the fit model: They are the square roots of the diagonal elements of the covariance matrix. This means that one can use least-squares to obtain numerical values of propagated errors by defining the target quantities as…
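The statement that parameter standard errors are the square roots of the diagonal elements of the covariance matrix can be illustrated for the simplest case, an unweighted straight-line fit. The sketch below is an assumption-laden example, not the article's procedure: the function name `line_fit_with_se` is hypothetical, and the covariance matrix is taken as s²(XᵀX)⁻¹ with s² estimated from the residuals on n − 2 degrees of freedom.

```python
import math


def line_fit_with_se(x, y):
    """Unweighted fit of y = intercept + slope * x with standard errors.

    Hypothetical helper. The two returned SEs are the square roots of the
    diagonal elements of the covariance matrix s^2 (X^T X)^-1, written out
    in closed form for a straight line.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual variance on n - 2 degrees of freedom.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ss_res / (n - 2)
    var_slope = s2 / sxx                              # diagonal element for slope
    var_intercept = s2 * (1.0 / n + xbar ** 2 / sxx)  # diagonal element for intercept
    return slope, intercept, math.sqrt(var_slope), math.sqrt(var_intercept)


if __name__ == "__main__":
    # Made-up data for illustration only.
    print(line_fit_with_se([0, 1, 2, 3], [0, 1, 2, 4]))
```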
De Girolamo, A; Lippolis, V; Nordkvist, E; Visconti, A
2009-06-01
Fourier transform near-infrared spectroscopy (FT-NIR) was used for rapid and non-invasive analysis of deoxynivalenol (DON) in durum and common wheat. The relevance of using ground wheat samples with a homogeneous particle size distribution to minimize measurement variations and avoid DON segregation among particles of different sizes was established. Calibration models for durum wheat, common wheat and durum + common wheat samples, with particle size <500 microm, were obtained by using partial least squares (PLS) regression with an external validation technique. Values of root mean square error of prediction (RMSEP, 306-379 microg kg(-1)) were comparable and not too far from values of root mean square error of cross-validation (RMSECV, 470-555 microg kg(-1)). Coefficients of determination (r(2)) indicated an "approximate to good" level of prediction of the DON content by FT-NIR spectroscopy in the PLS calibration models (r(2) = 0.71-0.83), and a "good" discrimination between low and high DON contents in the PLS validation models (r(2) = 0.58-0.63). A "limited to good" practical utility of the models was ascertained by range error ratio (RER) values higher than 6. A qualitative model, based on 197 calibration samples, was developed to discriminate between blank and naturally contaminated wheat samples by setting a cut-off at 300 microg kg(-1) DON to separate the two classes. The model correctly classified 69% of the 65 validation samples with most misclassified samples (16 of 20) showing DON contamination levels quite close to the cut-off level. These findings suggest that FT-NIR analysis is suitable for the determination of DON in unprocessed wheat at levels far below the maximum permitted limits set by the European Commission.
Building on crossvalidation for increasing the quality of geostatistical modeling
Olea, R.A.
2012-01-01
The random function is a mathematical model commonly used in the assessment of uncertainty associated with a spatially correlated attribute that has been partially sampled. There are multiple algorithms for modeling such random functions, all sharing the requirement of specifying various parameters that have critical influence on the results. The importance of finding ways to compare the methods and setting parameters to obtain results that better model uncertainty has increased as these algorithms have grown in number and complexity. Crossvalidation has been used in spatial statistics, mostly in kriging, for the analysis of mean square errors. An appeal of this approach is its ability to work with the same empirical sample available for running the algorithms. This paper goes beyond checking estimates by formulating a function sensitive to conditional bias. Under ideal conditions, such function turns into a straight line, which can be used as a reference for preparing measures of performance. Applied to kriging, deviations from the ideal line provide sensitivity to the semivariogram lacking in crossvalidation of kriging errors and are more sensitive to conditional bias than analyses of errors. In terms of stochastic simulation, in addition to finding better parameters, the deviations allow comparison of the realizations resulting from the applications of different methods. Examples show improvements of about 30% in the deviations and approximately 10% in the square root of mean square errors between reasonable starting modelling and the solutions according to the new criteria. © 2011 US Government.
Wang, L; Qin, X C; Lin, H C; Deng, K F; Luo, Y W; Sun, Q R; Du, Q X; Wang, Z Y; Tuo, Y; Sun, J H
2018-02-01
To analyse the relationship between Fourier transform infrared (FTIR) spectrum of rat's spleen tissue and postmortem interval (PMI) for PMI estimation using FTIR spectroscopy combined with data mining method. Rats were sacrificed by cervical dislocation, and the cadavers were placed at 20 ℃. The FTIR spectrum data of rats' spleen tissues were taken and measured at different time points. After pretreatment, the data was analysed by data mining method. The absorption peak intensity of rat's spleen tissue spectrum changed with the PMI, while the absorption peak position was unchanged. The results of principal component analysis (PCA) showed that the cumulative contribution rate of the first three principal components was 96%. There was an obvious clustering tendency for the spectrum sample at each time point. The methods of partial least squares discriminant analysis (PLS-DA) and support vector machine classification (SVMC) effectively divided the spectrum samples with different PMI into four categories (0-24 h, 48-72 h, 96-120 h and 144-168 h). The determination coefficient ( R ²) of the PMI estimation model established by PLS regression analysis was 0.96, and the root mean square error of calibration (RMSEC) and root mean square error of cross validation (RMSECV) were 9.90 h and 11.39 h respectively. In prediction set, the R ² was 0.97, and the root mean square error of prediction (RMSEP) was 10.49 h. The FTIR spectrum of the rat's spleen tissue can be effectively analyzed qualitatively and quantitatively by the combination of FTIR spectroscopy and data mining method, and the classification and PLS regression models can be established for PMI estimation. Copyright© by the Editorial Department of Journal of Forensic Medicine.
Forecasting Error Calculation with Mean Absolute Deviation and Mean Absolute Percentage Error
NASA Astrophysics Data System (ADS)
Khair, Ummul; Fahmi, Hasanul; Hakim, Sarudin Al; Rahim, Robbi
2017-12-01
Prediction using a forecasting method is one of the most important activities for an organization. Selecting an appropriate forecasting method is important, but the percentage error of a method is more important in order for decision makers to adopt the right culture. Using the Mean Absolute Deviation and Mean Absolute Percentage Error to calculate the percentage error of the least squares method resulted in an error of 9.77%, and it was decided that the least squares method is suitable for time series and trend data.
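The two error measures named in this abstract can be sketched with their standard definitions. The function names and the sample numbers below are illustrative assumptions; the abstract does not give the paper's data or code.

```python
def mean_absolute_deviation(actual, forecast):
    """Mean of |actual - forecast| over all periods (standard MAD definition)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_percentage_error(actual, forecast):
    """Mean of |actual - forecast| / |actual|, expressed in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up demand figures for illustration only.
    actual = [100, 200, 400]
    forecast = [110, 180, 400]
    print(mean_absolute_deviation(actual, forecast))
    print(mean_absolute_percentage_error(actual, forecast))
```

MAD is in the units of the data, while MAPE is scale-free, which is why a single percentage such as the 9.77% quoted above can be compared across series.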
Tian, Hai-Qing; Wang, Chun-Guang; Zhang, Hai-Jun; Yu, Zhi-Hong; Li, Jian-Kang
2012-11-01
Outlier samples strongly influence the precision of the calibration model in soluble solids content measurement of melons using NIR spectra. According to the possible sources of outlier samples, three methods (predicted concentration residual test; Chauvenet test; leverage and studentized residual test) were used to identify these outliers. Nine suspicious outliers were detected in the calibration set, which included 85 fruit samples. Because the 9 suspicious outlier samples might include some non-outlier samples, they were returned to the model one by one to see whether they influenced the model and its prediction precision. In this way, 5 samples that were helpful to the model rejoined the calibration set, and a new model was developed with a correlation coefficient (r) of 0.889 and a root mean square error of calibration (RMSEC) of 0.601 degrees Brix. For 35 unknown samples, the root mean square error of prediction (RMSEP) was 0.854 degrees Brix. This model performed better than the model developed with no outliers eliminated from the calibration set (r = 0.797, RMSEC = 0.849 degrees Brix, RMSEP = 1.19 degrees Brix), and was more representative and stable than the model with all 9 samples eliminated from the calibration set (r = 0.892, RMSEC = 0.605 degrees Brix, RMSEP = 0.862 degrees Brix).
Adaptive control of theophylline therapy: importance of blood sampling times.
D'Argenio, D Z; Khakmahd, K
1983-10-01
A two-observation protocol for estimating theophylline clearance during a constant-rate intravenous infusion is used to examine the importance of blood sampling schedules with regard to the information content of resulting concentration data. Guided by a theory for calculating maximally informative sample times, population simulations are used to assess the effect of specific sampling times on the precision of resulting clearance estimates and subsequent predictions of theophylline plasma concentrations. The simulations incorporated noise terms for intersubject variability, dosing errors, sample collection errors, and assay error. Clearance was estimated using Chiou's method, least squares, and a Bayesian estimation procedure. The results of these simulations suggest that clinically significant estimation and prediction errors may result when using the above two-point protocol for estimating theophylline clearance if the time separating the two blood samples is less than one population mean elimination half-life.
Obstacle Detection in Indoor Environment for Visually Impaired Using Mobile Camera
NASA Astrophysics Data System (ADS)
Rahman, Samiur; Ullah, Sana; Ullah, Sehat
2018-01-01
Obstacle detection can improve the mobility as well as the safety of visually impaired people. In this paper, we present a system that uses a mobile camera to assist visually impaired people. The proposed algorithm works in indoor environments and uses a very simple technique based on a few pre-stored floor images. All unique floor types in the indoor environment are considered, and a single image is stored for each unique floor type. These floor images serve as reference images. The algorithm acquires an input image frame, selects a region of interest, and scans it for obstacles using the pre-stored floor images. The algorithm compares the present frame with the next frame and computes the mean square error of the two frames. If the mean square error is less than a threshold value α, there is no obstacle in the next frame. If the mean square error is greater than α, there are two possibilities: either there is an obstacle or the floor type has changed. To check whether the floor has changed, the algorithm computes the mean square error between the next frame and each stored floor type. If the minimum of these mean square errors is less than the threshold value α, the floor has changed; otherwise an obstacle exists. The proposed algorithm works in real time, and 96% accuracy has been achieved.
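The decision rule described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are represented as flat lists of grayscale intensities already cropped to the region of interest, the names `mse` and `classify_next_frame` are hypothetical, and the case where the error exactly equals α (which the abstract leaves open) is treated as "not below the threshold".

```python
def mse(frame_a, frame_b):
    """Mean square error between two equally sized frames (flat pixel lists)."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)


def classify_next_frame(current, nxt, floor_refs, alpha):
    """Return 'clear', 'floor_changed' or 'obstacle' for the next frame.

    Assumption: an error exactly equal to alpha counts as not below it.
    """
    # Step 1: consecutive frames that look alike mean no obstacle.
    if mse(current, nxt) < alpha:
        return "clear"
    # Step 2: otherwise check whether the next frame matches any stored floor.
    best = min((mse(nxt, ref) for ref in floor_refs), default=float("inf"))
    if best < alpha:
        return "floor_changed"
    # Step 3: no stored floor matches, so report an obstacle.
    return "obstacle"


if __name__ == "__main__":
    # Tiny made-up 4-pixel frames for illustration only.
    tiles = [10, 10, 10, 10]
    carpet_ref = [50, 51, 50, 50]
    print(classify_next_frame(tiles, [10, 11, 10, 10], [carpet_ref], alpha=5))
    print(classify_next_frame(tiles, [50, 50, 50, 50], [carpet_ref], alpha=5))
    print(classify_next_frame(tiles, [200, 10, 200, 10], [carpet_ref], alpha=5))
```

Using the same α for both comparisons follows the abstract's wording; a real system might tune the two thresholds separately.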
Evaluation of Bayesian Sequential Proportion Estimation Using Analyst Labels
NASA Technical Reports Server (NTRS)
Lennington, R. K.; Abotteen, K. M. (Principal Investigator)
1980-01-01
The author has identified the following significant results. A total of ten Large Area Crop Inventory Experiment Phase 3 blind sites and analyst-interpreter labels were used in a study to compare proportional estimates obtained by the Bayes sequential procedure with estimates obtained from simple random sampling and from Procedure 1. The analyst error rate using the Bayes technique was shown to be no greater than that for the simple random sampling. Also, the segment proportion estimates produced using this technique had smaller bias and mean squared errors than the estimates produced using either simple random sampling or Procedure 1.
Mapping from disease-specific measures to health-state utility values in individuals with migraine.
Gillard, Patrick J; Devine, Beth; Varon, Sepideh F; Liu, Lei; Sullivan, Sean D
2012-05-01
The objective of this study was to develop empirical algorithms that estimate health-state utility values from disease-specific quality-of-life scores in individuals with migraine. Data from a cross-sectional, multicountry study were used. Individuals with episodic and chronic migraine were randomly assigned to training or validation samples. Spearman's correlation coefficients between paired EuroQol five-dimensional (EQ-5D) questionnaire utility values and both Headache Impact Test (HIT-6) scores and Migraine-Specific Quality-of-Life Questionnaire version 2.1 (MSQ) domain scores (role restrictive, role preventive, and emotional function) were examined. Regression models were constructed to estimate EQ-5D questionnaire utility values from the HIT-6 score or the MSQ domain scores. Preferred algorithms were confirmed in the validation samples. In episodic migraine, the preferred HIT-6 and MSQ algorithms explained 22% and 25% of the variance (R(2)) in the training samples, respectively, and had similar prediction errors (root mean square errors of 0.30). In chronic migraine, the preferred HIT-6 and MSQ algorithms explained 36% and 45% of the variance in the training samples, respectively, and had similar prediction errors (root mean square errors 0.31 and 0.29). In episodic and chronic migraine, no statistically significant differences were observed between the mean observed and the mean estimated EQ-5D questionnaire utility values for the preferred HIT-6 and MSQ algorithms in the validation samples. The relationship between the EQ-5D questionnaire and the HIT-6 or the MSQ is adequate to use regression equations to estimate EQ-5D questionnaire utility values. The preferred HIT-6 and MSQ algorithms will be useful in estimating health-state utilities in migraine trials in which no preference-based measure is present. Copyright © 2012 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Determination of suitable drying curve model for bread moisture loss during baking
NASA Astrophysics Data System (ADS)
Soleimani Pour-Damanab, A. R.; Jafary, A.; Rafiee, S.
2013-03-01
This study presents mathematical modelling of bread moisture loss or drying during baking in a conventional bread baking process. In order to estimate and select the appropriate moisture loss curve equation, 11 different models, semi-theoretical and empirical, were applied to the experimental data and compared according to their correlation coefficients, chi-squared test and root mean square error, which were obtained by nonlinear regression analysis. Consequently, of all the drying models, the Page model was selected as the best one, according to the correlation coefficients, chi-squared test, and root mean square error values and its simplicity. The mean absolute estimation error of the proposed model by linear regression analysis for the natural and forced convection modes was 2.43% and 4.74%, respectively.
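The Page model selected in this abstract is commonly written as MR = exp(-k tⁿ), where MR is the moisture ratio. The sketch below fits k and n and reports the RMSE. Note the swapped-in technique: the abstract reports nonlinear regression, whereas this example uses the log-linearized form ln(-ln MR) = ln k + n ln t with ordinary least squares, which keeps it dependency-free but weights the data differently. All names and numbers are illustrative assumptions.

```python
import math


def page_model(t, k, n):
    """Moisture ratio predicted by the Page model, MR = exp(-k * t**n)."""
    return math.exp(-k * t ** n)


def fit_page_loglinear(times, moisture_ratios):
    """Estimate (k, n) from ln(-ln MR) = ln k + n ln t by least squares.

    Requires t > 0 and 0 < MR < 1 for every point. This is a linearized
    stand-in for the nonlinear regression mentioned in the abstract.
    """
    if len(times) != len(moisture_ratios) or len(times) < 2:
        raise ValueError("need at least two (t, MR) pairs")
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(mr)) for mr in moisture_ratios]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    n = sxy / sxx
    k = math.exp(ybar - n * xbar)
    return k, n


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    sq = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    # Synthetic data generated from assumed parameters k = 0.05, n = 1.2.
    times = [1.0, 2.0, 5.0, 10.0, 20.0]
    observed = [page_model(t, 0.05, 1.2) for t in times]
    k, n = fit_page_loglinear(times, observed)
    predicted = [page_model(t, k, n) for t in times]
    print(k, n, rmse(observed, predicted))
```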
NASA Astrophysics Data System (ADS)
Sun, Dongliang; Huang, Guangtuan; Jiang, Juncheng; Zhang, Mingguang; Wang, Zhirong
2013-04-01
Overpressure is an important cause of the domino effect in accidents involving chemical process equipment. Models that account for the propagation probability and threshold values of the domino effect caused by overpressure have been proposed in previous studies. To test the rationality and validity of the models reported in the literature, the two boundary values separating the three reported damage degrees were each treated as random variables in the interval [0, 100%]. Using the overpressure data for equipment damage and damage state, together with the calculation method reported in the references, the mean square errors of the four categories of overpressure damage probability models were calculated with random boundary values. This gave the mean square error as a function of the two boundary values, from which the minimum mean square error was obtained. Compared with the result of the present work, the mean square error decreases by about 3%. The error is therefore within the acceptable range for engineering applications, and the reported models can be considered reasonable and valid.
Tan, Jin; Li, Rong; Jiang, Zi-Tao; Tang, Shu-Hua; Wang, Ying; Shi, Meng; Xiao, Yi-Qian; Jia, Bin; Lu, Tian-Xiang; Wang, Hao
2017-02-15
Synchronous front-face fluorescence spectroscopy has been developed for the discrimination of used frying oil (UFO) from edible vegetable oil (EVO), the estimation of the using time of UFO, and the determination of the adulteration of EVO with UFO. Both the heating time of laboratory prepared UFO and the adulteration of EVO with UFO could be determined by partial least squares regression (PLSR). To simulate the EVO adulteration with UFO, for each kind of oil, fifty adulterated samples at the adulterant amounts range of 1-50% were prepared. PLSR was then adopted to build the model and both full (leave-one-out) cross-validation and external validation were performed to evaluate the predictive ability. Under the optimum condition, the plots of observed versus predicted values exhibited high linearity (R(2)>0.96). The root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP) were both lower than 3%. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kuriakose, Saji; Joe, I Hubert
2013-11-01
Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC=0.00009% v/v). The lowest root mean square error of prediction (RMSEP=0.00016% v/v) in the test set and the highest coefficient of determination (R(2)=0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model. Copyright © 2013 Elsevier B.V. All rights reserved.
Analysis of tractable distortion metrics for EEG compression applications.
Bazán-Prieto, Carlos; Blanco-Velasco, Manuel; Cárdenas-Barrera, Julián; Cruz-Roldán, Fernando
2012-07-01
Coding distortion in lossy electroencephalographic (EEG) signal compression methods is evaluated through tractable objective criteria. The percentage root-mean-square difference, which is a global and relative indicator of the quality held by reconstructed waveforms, is the most widely used criterion. However, this parameter does not ensure compliance with clinical standard guidelines that specify limits to allowable noise in EEG recordings. As a result, expert clinicians may have difficulties interpreting the resulting distortion of the EEG for a given value of this parameter. Conversely, the root-mean-square error is an alternative criterion that quantifies distortion in understandable units. In this paper, we demonstrate that the root-mean-square error is better suited to control and to assess the distortion introduced by compression methods. The experiments conducted in this paper show that the use of the root-mean-square error as target parameter in EEG compression allows both clinicians and scientists to infer whether coding error is clinically acceptable or not at no cost for the compression ratio.
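The contrast between the two criteria can be illustrated with a short sketch. It assumes the commonly used definition of the percentage root-mean-square difference (PRD) without mean removal; the signal values are hypothetical and stand in for EEG amplitudes in microvolts.

```python
import math

def prd(x, xr):
    """Percentage root-mean-square difference (relative, unitless)."""
    num = sum((a - b) ** 2 for a, b in zip(x, xr))
    den = sum(a ** 2 for a in x)
    return 100.0 * math.sqrt(num / den)

def rmse(x, xr):
    """Root-mean-square error, in the same units as the signal."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xr)) / len(x))

# Two hypothetical segments with the same absolute coding error (2 units)
# but different amplitudes.
low = [10.0, -10.0, 10.0, -10.0]
high = [100.0, -100.0, 100.0, -100.0]
low_rec = [v + 2.0 for v in low]
high_rec = [v + 2.0 for v in high]

print(rmse(low, low_rec), rmse(high, high_rec))  # identical absolute error
print(prd(low, low_rec), prd(high, high_rec))    # PRD differs with amplitude
```

In this sketch the same absolute distortion gives very different PRD values depending on signal amplitude, whereas the RMSE stays in signal units and can be compared directly against a noise limit.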
Hao, Z Q; Li, C M; Shen, M; Yang, X Y; Li, K H; Guo, L B; Li, X Y; Lu, Y F; Zeng, X Y
2015-03-23
Laser-induced breakdown spectroscopy (LIBS) with partial least squares regression (PLSR) has been applied to measuring the acidity of iron ore, which can be defined by the concentrations of oxides: CaO, MgO, Al₂O₃, and SiO₂. With the conventional internal standard calibration, it is difficult to establish the calibration curves of CaO, MgO, Al₂O₃, and SiO₂ in iron ore due to the serious matrix effects. PLSR is effective to address this problem due to its excellent performance in compensating the matrix effects. In this work, fifty samples were used to construct the PLSR calibration models for the above-mentioned oxides. These calibration models were validated by the 10-fold cross-validation method with the minimum root-mean-square errors (RMSE). Another ten samples were used as a test set. The acidities were calculated according to the estimated concentrations of CaO, MgO, Al₂O₃, and SiO₂ using the PLSR models. The average relative error (ARE) and RMSE of the acidity achieved 3.65% and 0.0048, respectively, for the test samples.
Modeling of surface dust concentrations using neural networks and kriging
NASA Astrophysics Data System (ADS)
Buevich, Alexander G.; Medvedev, Alexander N.; Sergeev, Alexander P.; Tarasov, Dmitry A.; Shichkin, Andrey V.; Sergeeva, Marina V.; Atanasova, T. B.
2016-12-01
Creating models that can accurately predict the distribution of pollutants from a limited set of input data is an important task in environmental studies. In this paper, two neural approaches (multilayer perceptron (MLP) and generalized regression neural network (GRNN)) and two geostatistical approaches (kriging and cokriging) are used for modeling and forecasting dust concentrations in snow cover. The study area is under the influence of dust emissions from a copper quarry and several industrial companies. The two groups of approaches are compared. Three indices are used as indicators of model accuracy: the mean absolute error (MAE), root mean square error (RMSE), and relative root mean square error (RRMSE). Models based on artificial neural networks (ANN) showed better accuracy. Considering all indices, the most accurate model was the GRNN, which uses the coordinates of the sampling points and the distance to the probable emission source as input parameters. The results confirm that a trained ANN may be a more suitable tool for modeling dust concentrations in snow cover.
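The three accuracy indices can be sketched as below. The abstract does not define RRMSE, so this sketch assumes one common convention (RMSE divided by the mean of the observed values); the function name and data are hypothetical.

```python
import math

def accuracy_indices(observed, predicted):
    """Return (MAE, RMSE, RRMSE) for paired observations and predictions.

    RRMSE is taken here as RMSE / mean(observed), which is an assumption;
    other normalizations are also in use.
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)
    rrmse = rmse / (sum(observed) / n)
    return mae, rmse, rrmse

# Hypothetical dust concentrations (arbitrary units), illustration only.
observed = [2.0, 4.0, 6.0]
predicted = [3.0, 4.0, 5.0]
print(accuracy_indices(observed, predicted))
```

Because RMSE squares the errors before averaging, it penalizes large misses more heavily than MAE, which is one reason studies often report both.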
Combining forecast weights: Why and how?
NASA Astrophysics Data System (ADS)
Yin, Yip Chee; Kok-Haur, Ng; Hock-Eam, Lim
2012-09-01
This paper proposes a procedure called forecast weight averaging, a specific combination of forecast weights obtained from different methods of constructing forecast weights, for the purpose of improving the accuracy of pseudo out-of-sample forecasting. It is found that, under certain specified conditions, forecast weight averaging can lower the mean squared forecast error obtained from model averaging. In addition, we show that in a linear and homoskedastic environment, this superior predictive ability of forecast weight averaging holds irrespective of whether the coefficients are tested by the t statistic or the z statistic, provided the significance level is within the 10% range. By theoretical proofs and a simulation study, we show that model averaging methods such as variance model averaging, simple model averaging, and standard error model averaging each produce a mean squared forecast error larger than that of forecast weight averaging. Finally, this result also holds marginally when applied to business and economic empirical data sets: Gross Domestic Product (GDP) growth rate, Consumer Price Index (CPI), and Average Lending Rate (ALR) of Malaysia.
Perez-Guaita, David; Kuligowski, Julia; Quintás, Guillermo; Garrigues, Salvador; Guardia, Miguel de la
2013-03-30
Locally weighted partial least squares regression (LW-PLSR) has been applied to the determination of four clinical parameters in human serum samples (total protein, triglyceride, glucose and urea contents) by Fourier transform infrared (FTIR) spectroscopy. Classical LW-PLSR models were constructed using different spectral regions. For the selection of parameters by LW-PLSR modeling, a multi-parametric study was carried out employing the minimum root-mean square error of cross validation (RMSCV) as objective function. In order to overcome the effect of strong matrix interferences on the predictive accuracy of LW-PLSR models, this work focuses on sample selection. Accordingly, a novel strategy for the development of local models is proposed. It was based on the use of: (i) principal component analysis (PCA) performed on an analyte specific spectral region for identifying most similar sample spectra and (ii) partial least squares regression (PLSR) constructed using the whole spectrum. Results found by using this strategy were compared to those provided by PLSR using the same spectral intervals as for LW-PLSR. Prediction errors found by both, classical and modified LW-PLSR improved those obtained by PLSR. Hence, both proposed approaches were useful for the determination of analytes present in a complex matrix as in the case of human serum samples. Copyright © 2013 Elsevier B.V. All rights reserved.
Hazard Function Estimation with Cause-of-Death Data Missing at Random.
Wang, Qihua; Dinse, Gregg E; Liu, Chunling
2012-04-01
Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.
Optical diagnosis of malaria infection in human plasma using Raman spectroscopy
NASA Astrophysics Data System (ADS)
Bilal, Muhammad; Saleem, Muhammad; Amanat, Samina Tufail; Shakoor, Huma Abdul; Rashid, Rashad; Mahmood, Arshad; Ahmed, Mushtaq
2015-01-01
We present the prediction of malaria infection in human plasma using Raman spectroscopy. Raman spectra of malaria-infected samples are compared with those of healthy and dengue virus infected ones for disease recognition. Raman spectra were acquired using a laser at 532 nm as an excitation source and 10 distinct spectral signatures that statistically differentiated malaria from healthy and dengue-infected cases were found. A multivariate regression model has been developed that utilized Raman spectra of 20 malaria-infected, 10 non-malarial with fever, 10 healthy, and 6 dengue-infected samples to optically predict the malaria infection. The model yields the correlation coefficient r2 value of 0.981 between the predicted values and clinically known results of trainee samples, and the root mean square error in cross validation was found to be 0.09; both these parameters validated the model. The model was further blindly tested for 30 unknown suspected samples and found to be 86% accurate compared with the clinical results, with the inaccuracy due to three samples which were predicted in the gray region. Standard deviation and root mean square error in prediction for unknown samples were found to be 0.150 and 0.149, which are accepted for the clinical validation of the model.
Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai
2016-01-01
Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP). PMID:26829639
Liao, Xiang; Wang, Qing; Fu, Ji-hong; Tang, Jun
2015-09-01
This work was undertaken to establish a quantitative analysis model that can rapidly determine the content of linalool and linalyl acetate in Xinjiang lavender essential oil. A total of 165 lavender essential oil samples were measured by near infrared (NIR) absorption spectroscopy. Analysis of the NIR absorption peaks of all samples showed that lavender essential oil carries abundant chemical information, and that interference from random noise may be relatively low, in the spectral interval 7100~4500 cm(-1). The PLS models were therefore constructed using this interval for further analysis. Eight abnormal samples were eliminated. Through the clustering method, the remaining 157 lavender essential oil samples were divided into 105 calibration set samples and 52 validation set samples. Gas chromatography mass spectrometry (GC-MS) was used to determine the content of linalool and linalyl acetate in lavender essential oil. The matrix was then established from the GC-MS raw data of the two compounds in combination with the original NIR data. To optimize the model, different pretreatment methods were used to preprocess the raw NIR spectra and compare their spectral filtering effects. Analysis of the quantitative model results for linalool and linalyl acetate showed that the root mean square errors of prediction (RMSEP) with orthogonal signal correction (OSC) were 0.226 and 0.558, respectively, making it the optimum pretreatment method. In addition, the forward interval partial least squares (FiPLS) method was used to exclude wavelength points that are unrelated to the determined components or show nonlinear correlation; finally, 8 spectral intervals totaling 160 wavelength points were obtained as the dataset.
The data sets optimized by OSC-FiPLS were combined with partial least squares (PLS) to establish a rapid quantitative analysis model for determining the content of linalool and linalyl acetate in Xinjiang lavender essential oil; the number of latent variables for both components was 8 in the model. The performance of the model was evaluated according to the root mean square error of cross-validation (RMSECV) and the root mean square error of prediction (RMSEP). In the model, the RMSECV of linalool and linalyl acetate were 0.170 and 0.416, respectively; the RMSEP were 0.188 and 0.364. The results indicated that, with the raw data pretreated by OSC and FiPLS, the NIR-PLS quantitative analysis model has good robustness and high measurement precision, and can quickly determine the content of linalool and linalyl acetate in lavender essential oil. In addition, the model has favorable prediction ability. The study also provides a new, effective method for rapid quantitative analysis of the major components of Xinjiang lavender essential oil.
Alamar, Priscila D; Caramês, Elem T S; Poppi, Ronei J; Pallone, Juliana A L
2016-07-01
The present study investigated the application of near infrared spectroscopy as a green, quick, and efficient alternative to analytical methods currently used to evaluate the quality (moisture, total sugars, acidity, soluble solids, pH and ascorbic acid) of frozen guava and passion fruit pulps. Fifty samples were analyzed by near infrared spectroscopy (NIR) and reference methods. Partial least square regression (PLSR) was used to develop calibration models to relate the NIR spectra and the reference values. Reference methods indicated adulteration by water addition in 58% of guava pulp samples and 44% of yellow passion fruit pulp samples. The PLS models produced lower values of root mean squares error of calibration (RMSEC), root mean squares error of prediction (RMSEP), and coefficient of determination above 0.7. Moisture and total sugars presented the best calibration models (RMSEP of 0.240 and 0.269, respectively, for guava pulp; RMSEP of 0.401 and 0.413, respectively, for passion fruit pulp) which enables the application of these models to determine adulteration in guava and yellow passion fruit pulp by water or sugar addition. The models constructed for calibration of quality parameters of frozen fruit pulps in this study indicate that NIR spectroscopy coupled with the multivariate calibration technique could be applied to determine the quality of guava and yellow passion fruit pulp. Copyright © 2016 Elsevier Ltd. All rights reserved.
Sampling for mercury at subnanogram per litre concentrations for load estimation in rivers
Colman, J.A.; Breault, R.F.
2000-01-01
Estimation of constituent loads in streams requires collection of stream samples that are representative of constituent concentrations, that is, composites of isokinetic multiple verticals collected along a stream transect. An all-Teflon isokinetic sampler (DH-81) cleaned in 75 °C, 4 N HCl was tested using blank, split, and replicate samples to assess systematic and random sample contamination by mercury species. Mean mercury concentrations in field-equipment blanks were low: 0.135 ng·L-1 for total mercury (ΣHg) and 0.0086 ng·L-1 for monomethyl mercury (MeHg). Mean square errors (MSE) for ΣHg and MeHg duplicate samples collected at eight sampling stations were not statistically different from MSE of samples split in the laboratory, which represent the analytical and splitting error. Low field-blank concentrations and statistically equal duplicate- and split-sample MSE values indicate that no measurable contamination was occurring during sampling. Standard deviations associated with example mercury load estimations were four to five times larger, on a relative basis, than standard deviations calculated from duplicate samples, indicating that error of the load determination was primarily a function of the loading model used, not of sampling or analytical methods.
Two Enhancements of the Logarithmic Least-Squares Method for Analyzing Subjective Comparisons
1989-03-25
error term. For this model, the total sum of squares (SSTO), defined as $SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2$, can be partitioned into error and regression sums...of the regression line around the mean value. Mathematically, for the model given by equation A.4, $SSTO = SSE + SSR$ (A.6), where SSTO is the total...sum of squares (i.e., the variance of the $y_i$'s), SSE is error sum of squares, and SSR is the regression sum of squares. SSTO, SSE, and SSR are given
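The partition SSTO = SSE + SSR can be checked numerically. The sketch below assumes a simple straight-line least-squares fit with an intercept (the excerpt does not show the report's model in equation A.4, so this is an illustrative stand-in), and the data are hypothetical. Note that SSTO as computed here is the sum of squared deviations from the mean, which is the sample variance multiplied by n - 1 rather than the variance itself.

```python
def sums_of_squares(xs, ys):
    """Fit y = a + b*x by least squares and return (SSTO, SSE, SSR)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    fitted = [a + b * x for x in xs]
    ssto = sum((y - my) ** 2 for y in ys)                 # total
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # error (residual)
    ssr = sum((f - my) ** 2 for f in fitted)              # regression
    return ssto, sse, ssr

# Hypothetical data, illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ssto, sse, ssr = sums_of_squares(xs, ys)
print(ssto, sse + ssr)  # the two values agree up to rounding
```

The identity relies on the fit including an intercept and being a least-squares fit; for other fitting procedures the cross term does not vanish and the partition need not hold.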
da Silva, Fabiana E B; Flores, Érico M M; Parisotto, Graciele; Müller, Edson I; Ferrão, Marco F
2016-03-01
An alternative method for the quantification of sulphamethoxazole (SMZ) and trimethoprim (TMP) using diffuse reflectance infrared Fourier-transform spectroscopy (DRIFTS) and partial least square regression (PLS) was developed. Interval Partial Least Square (iPLS) and Synergy Partial Least Square (siPLS) were applied to select a spectral range that provided the lowest prediction error in comparison to the full-spectrum model. Fifteen commercial tablet formulations and forty-nine synthetic samples were used. The ranges of concentration considered were 400 to 900 mg g-1 SMZ and 80 to 240 mg g-1 TMP. Spectral data were recorded between 600 and 4000 cm-1 with a 4 cm-1 resolution by DRIFTS. The proposed procedure was compared to high performance liquid chromatography (HPLC). The results obtained from the root mean square error of prediction (RMSEP), during the validation of the models for samples of SMZ and TMP using siPLS, demonstrate that this approach is a valid technique for use in quantitative analysis of pharmaceutical formulations. The selected interval algorithm allowed building regression models with smaller errors than the full-spectrum PLS model. An RMSEP of 13.03 mg g-1 for SMZ and 4.88 mg g-1 for TMP was obtained after selection of the best spectral regions by siPLS.
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.
Lin, Johnny; Bentler, Peter M
2012-01-01
Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.
Sando, Roy; Chase, Katherine J.
2017-03-23
A common statistical procedure for estimating streamflow statistics at ungaged locations is to develop a relational model between streamflow and drainage basin characteristics at gaged locations using least squares regression analysis; however, least squares regression methods are parametric and make constraining assumptions about the data distribution. The random forest regression method provides an alternative nonparametric method for estimating streamflow characteristics at ungaged sites and requires that the data meet fewer statistical conditions than least squares regression methods. Random forest regression analysis was used to develop predictive models for 89 streamflow characteristics using Precipitation-Runoff Modeling System simulated streamflow data and drainage basin characteristics at 179 sites in central and eastern Montana. The predictive models were developed from streamflow data simulated for current (baseline, water years 1982–99) conditions and three future periods (water years 2021–38, 2046–63, and 2071–88) under three different climate-change scenarios. These predictive models were then used to predict streamflow characteristics for baseline conditions and three future periods at 1,707 fish sampling sites in central and eastern Montana. The average root mean square error for all predictive models was about 50 percent. When streamflow predictions at 23 fish sampling sites were compared to nearby locations with simulated data, the mean relative percent difference was about 43 percent. When predictions were compared to streamflow data recorded at 21 U.S. Geological Survey streamflow-gaging stations outside of the calibration basins, the average mean absolute percent error was about 73 percent.
NASA Astrophysics Data System (ADS)
Sergeev, A. P.; Tarasov, D. A.; Buevich, A. G.; Shichkin, A. V.; Tyagunov, A. G.; Medvedev, A. N.
2017-06-01
Modeling of spatial distribution of pollutants in the urbanized territories is difficult, especially if there are multiple emission sources. When monitoring such territories, it is often impossible to arrange the necessary detailed sampling. Because of this, the usual methods of analysis and forecasting based on geostatistics are often less effective. Approaches based on artificial neural networks (ANNs) demonstrate the best results under these circumstances. This study compares two models based on ANNs, which are multilayer perceptron (MLP) and generalized regression neural networks (GRNNs) with the base geostatistical method - kriging. Models of the spatial dust distribution in the snow cover around the existing copper quarry and in the area of emissions of a nickel factory were created. To assess the effectiveness of the models three indices were used: the mean absolute error (MAE), the root-mean-square error (RMSE), and the relative root-mean-square error (RRMSE). Taking into account all indices the model of GRNN proved to be the most accurate which included coordinates of the sampling points and the distance to the likely emission source as input parameters for the modeling. Maps of spatial dust distribution in the snow cover were created in the study area. It has been shown that the models based on ANNs were more accurate than the kriging, particularly in the context of a limited data set.
Analyzing Hydraulic Conductivity Sampling Schemes in an Idealized Meandering Stream Model
NASA Astrophysics Data System (ADS)
Stonedahl, S. H.; Stonedahl, F.
2017-12-01
Hydraulic conductivity (K) is an important parameter affecting the flow of water through sediments under streams, which can vary by orders of magnitude within a stream reach. Measuring heterogeneous K distributions in the field is limited by time and resources. This study investigates hypothetical sampling practices within a modeling framework on a highly idealized meandering stream. We generated three sets of 100 hydraulic conductivity grids containing two sands with connectivity values of 0.02, 0.08, and 0.32. We investigated systems with twice as much fast (K=0.1 cm/s) sand as slow sand (K=0.01 cm/s) and the reverse ratio on the same grids. The K values did not vary with depth. For these 600 cases, we calculated the homogeneous K value, Keq, that would yield the same flux into the sediments as the corresponding heterogeneous grid. We then investigated sampling schemes with six weighted probability distributions derived from the homogeneous case: uniform, flow-paths, velocity, in-stream, flux-in, and flux-out. For each grid, we selected locations from these distributions and compared the arithmetic, geometric, and harmonic means of these lists to the corresponding Keq using the root-mean-square deviation. We found that arithmetic averaging of samples outperformed geometric or harmonic means for all sampling schemes. Of the sampling schemes, flux-in (sampling inside the stream in an inward flux-weighted manner) yielded the least error and flux-out yielded the most error. All three sampling schemes outside of the stream yielded very similar results. Grids with lower connectivity values (fewer and larger clusters) showed the most sensitivity to the choice of sampling scheme, and thus improved the most with the flux-in sampling. We also explored the relationship between the number of samples taken and the resulting error.
Increasing the number of sampling points reduced error for the arithmetic mean with diminishing returns, but did not substantially reduce error associated with geometric and harmonic means.
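The comparison of averaging methods can be sketched with Python's standard `statistics` module. The sample lists and Keq values below are hypothetical placeholders, not results from the study; only the two sand conductivities (0.1 and 0.01 cm/s) come from the abstract.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

def rmsd(estimates, targets):
    """Root-mean-square deviation between paired estimates and targets."""
    return math.sqrt(fmean([(e - t) ** 2 for e, t in zip(estimates, targets)]))

# Hypothetical K samples (cm/s) drawn from three grids, and hypothetical
# equivalent homogeneous conductivities Keq for the same grids.
samples_per_grid = [
    [0.1, 0.1, 0.01],
    [0.1, 0.01, 0.01],
    [0.1, 0.1, 0.1],
]
keq = [0.06, 0.03, 0.1]

for name, mean_fn in [("arithmetic", fmean),
                      ("geometric", geometric_mean),
                      ("harmonic", harmonic_mean)]:
    estimates = [mean_fn(s) for s in samples_per_grid]
    print(name, rmsd(estimates, keq))
```

For positive values the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the three estimators diverge most when a sample list mixes fast and slow sand.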
NASA Astrophysics Data System (ADS)
Saatkamp, Cassiano Junior; de Almeida, Maurício Liberal; Bispo, Jeyse Aliana Martins; Pinheiro, Antonio Luiz Barbosa; Fernandes, Adriana Barrinha; Silveira, Landulfo, Jr.
2016-03-01
Due to their importance in the regulation of metabolites, the kidneys need continuous monitoring to check for correct functioning, mainly by urea and creatinine urinalysis. This study aimed to develop a model to estimate the concentrations of urea and creatinine in urine by means of Raman spectroscopy (RS) that could be used to diagnose kidney disease. Midstream urine samples were obtained from 54 volunteers with no kidney complaints. Samples were subjected to a standard colorimetric assay of urea and creatinine and submitted to spectroscopic analysis by means of a dispersive Raman spectrometer (830 nm, 350 mW, 30 s). The Raman spectra of urine showed peaks related mainly to urea and creatinine. Partial least squares models were developed using selected Raman bands related to urea and creatinine and the biochemical concentrations in urine measured by the colorimetric method, resulting in r=0.90 and 0.91 for urea and creatinine, respectively, with root mean square error of cross-validation (RMSEcv) of 312 and 25.2 mg/dL, respectively. RS may become a technique for rapid urinalysis, with concentration errors suitable for population screening aimed at the prevention of renal diseases.
Functional Mixed Effects Model for Small Area Estimation.
Maiti, Tapabrata; Sinha, Samiran; Zhong, Ping-Shou
2016-09-01
Functional data analysis has become an important area of research due to its ability of handling high dimensional and complex data structures. However, the development is limited in the context of linear mixed effect models, and in particular, for small area estimation. The linear mixed effect models are the backbone of small area estimation. In this article, we consider area level data, and fit a varying coefficient linear mixed effect model where the varying coefficients are semi-parametrically modeled via B-splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using a standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors, and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.
Hazard Function Estimation with Cause-of-Death Data Missing at Random
Wang, Qihua; Dinse, Gregg E.; Liu, Chunling
2010-01-01
Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data. PMID:22267874
Niazi, Ali; Zolgharnein, Javad; Afiuni-Zadeh, Somaie
2007-11-01
Ternary mixtures of thiamin, riboflavin and pyridoxal have been simultaneously determined in synthetic and real samples by applying spectrophotometry and least-squares support vector machines (LS-SVM). The calibration graphs were linear in the ranges of 1.0 - 20.0, 1.0 - 10.0 and 1.0 - 20.0 µg ml(-1) with detection limits of 0.6, 0.5 and 0.7 µg ml(-1) for thiamin, riboflavin and pyridoxal, respectively. The experimental calibration matrix was designed with 21 mixtures of these chemicals. The concentrations were varied within the calibration ranges of the vitamins. The simultaneous determination of these vitamin mixtures by spectrophotometric methods is a difficult problem because of spectral interferences. Partial least squares (PLS) modeling and LS-SVM were used for the multivariate calibration of the spectrophotometric data. An excellent model was built using LS-SVM, with low prediction errors and superior performance in relation to PLS. The root mean square errors of prediction (RMSEP) for thiamin, riboflavin and pyridoxal were 0.6926, 0.3755 and 0.4322 with PLS and 0.0421, 0.0318 and 0.0457 with LS-SVM, respectively. The proposed method was satisfactorily applied to the rapid simultaneous determination of thiamin, riboflavin and pyridoxal in commercial pharmaceutical preparations and human plasma samples.
NASA Astrophysics Data System (ADS)
Zhang, Ling; Cai, Yunlong; Li, Chunguang; de Lamare, Rodrigo C.
2017-12-01
In this work, we present low-complexity variable forgetting factor (VFF) techniques for diffusion recursive least squares (DRLS) algorithms. In particular, we propose low-complexity VFF-DRLS algorithms for distributed parameter and spectrum estimation in sensor networks. The proposed algorithms can adjust the forgetting factor automatically according to the a posteriori error signal. We develop detailed analyses of the mean and mean square performance of the proposed algorithms and derive mathematical expressions for the mean square deviation (MSD) and the excess mean square error (EMSE). The simulation results show that the proposed low-complexity VFF-DRLS algorithms achieve superior performance to the existing DRLS algorithm with a fixed forgetting factor when applied to scenarios of distributed parameter and spectrum estimation. The simulation results also show a good match with the proposed analytical expressions.
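For orientation, the fixed-forgetting-factor baseline mentioned in this abstract can be sketched for a single node as standard exponentially weighted RLS. This is only an illustrative sketch: it does not reproduce the authors' VFF rule or the diffusion (network cooperation) step, and the function name `rls_fit`, the parameter names and the toy data are assumptions introduced here.

```python
import math


def rls_fit(samples, n_taps, lam=0.98, delta=1e3):
    """Single-node exponentially weighted RLS with a FIXED forgetting factor.

    samples: iterable of (x, d) pairs; x is a list of n_taps floats, d a float.
    lam:     forgetting factor, 0 < lam <= 1 (hypothetical default).
    delta:   initial scale of the inverse correlation matrix P (hypothetical default).
    """
    w = [0.0] * n_taps
    P = [[delta if i == j else 0.0 for j in range(n_taps)] for i in range(n_taps)]
    for x, d in samples:
        # Px = P x
        Px = [sum(P[i][j] * x[j] for j in range(n_taps)) for i in range(n_taps)]
        denom = lam + sum(x[i] * Px[i] for i in range(n_taps))
        k = [v / denom for v in Px]                       # gain vector
        e = d - sum(w[i] * x[i] for i in range(n_taps))   # a priori error
        w = [w[i] + k[i] * e for i in range(n_taps)]
        # P stays symmetric, so x^T P equals (P x)^T and the update is
        # P <- (P - k (P x)^T) / lam
        P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n_taps)]
             for i in range(n_taps)]
    return w


# Toy noiseless example: recover the weights [2, -1] from 200 samples.
true_w = [2.0, -1.0]
data = []
for n in range(200):
    x = [math.sin(0.3 * n), math.cos(0.7 * n)]
    data.append((x, true_w[0] * x[0] + true_w[1] * x[1]))
w_hat = rls_fit(data, n_taps=2)
```

A smaller `lam` discounts old samples faster, which improves tracking at the cost of noisier estimates; a variable forgetting factor scheme adjusts this trade-off online instead of fixing it in advance.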
Some Results on Mean Square Error for Factor Score Prediction
ERIC Educational Resources Information Center
Krijnen, Wim P.
2006-01-01
For the confirmatory factor model a series of inequalities is given with respect to the mean square error (MSE) of three main factor score predictors. The eigenvalues of these MSE matrices are a monotonic function of the eigenvalues of the matrix gamma[subscript rho] = theta[superscript 1/2] lambda[subscript rho] 'psi[subscript rho] [superscript…
ERIC Educational Resources Information Center
Savalei, Victoria
2012-01-01
The fit index root mean square error of approximation (RMSEA) is extremely popular in structural equation modeling. However, its behavior under different scenarios remains poorly understood. The present study generates continuous curves where possible to capture the full relationship between RMSEA and various "incidental parameters," such as…
Liu, Yan-de; Ying, Yi-bin; Fu, Xia-ping
2005-11-01
A nondestructive method for quantifying the sugar content (SC) and available acid (VA) of intact apples using diffuse near infrared reflectance and optical fiber sensing techniques was explored in the present research. The standard sample sets and prediction models were established by partial least squares (PLS) analysis. A total of 120 Shandong Fuji apples were tested in the wave number range of 12,500 - 4000 cm(-1) using Fourier transform near infrared spectroscopy. The results indicated that the nondestructive quantification of SC and VA gave high correlation coefficients of 0.970 and 0.906, low root mean square errors of prediction (RMSEP) of 0.272 and 0.0562, low root mean square errors of calibration (RMSEC) of 0.261 and 0.0677, and small differences between RMSEP and RMSEC of 0.011 and 0.0115. It was suggested that the diffuse near infrared reflectance technique is feasible for nondestructive determination of apple sugar content in the wave number range of 10,341 - 5461 cm(-1) and of available acid in the wave number range of 10,341 - 3818 cm(-1).
Waskitho, Dri; Lukitaningsih, Endang; Sudjadi; Rohman, Abdul
2016-01-01
Analysis of lard extracted from lipstick formulations containing castor oil has been performed using FTIR spectroscopy combined with multivariate calibration. Three different extraction methods were compared, namely saponification followed by liquid/liquid extraction with hexane/dichloromethane/ethanol/water, saponification followed by liquid/liquid extraction with dichloromethane/ethanol/water, and the Bligh & Dyer method using chloroform/methanol/water as extracting solvent. Qualitative and quantitative analyses of lard were performed using principal component analysis (PCA) and partial least square (PLS) analysis, respectively. The results showed that, in all samples prepared by the three extraction methods, PCA was capable of identifying lard in the wavenumber region of 1200-800 cm(-1), with the best result obtained by the Bligh & Dyer method. Furthermore, PLS analysis in the same wavenumber region used for qualification showed that Bligh & Dyer was the most suitable extraction method, with the highest determination coefficient (R(2)) and the lowest root mean square error of calibration (RMSEC) as well as root mean square error of prediction (RMSEP) values.
Fadzillah, Nurrulhidayah Ahmad; Man, Yaakob bin Che; Rohman, Abdul; Rosman, Arieff Salleh; Ismail, Amin; Mustafa, Shuhaimi; Khatib, Alfi
2015-01-01
The authentication of food products with respect to the presence of components not allowed by certain religions, such as lard, is very important. In this study, we used proton Nuclear Magnetic Resonance ((1)H-NMR) spectroscopy for the analysis of butter adulterated with lard by simultaneous quantification of all proton bearing compounds, and consequently all relevant sample classes. Since the spectra obtained were too complex to be analyzed visually, the classification of spectra was carried out. Multivariate calibration by partial least square (PLS) regression was used for modelling the relationship between the actual and predicted values of lard. The model yielded the highest regression coefficient (R(2)) of 0.998, the lowest root mean square error of calibration (RMSEC) of 0.0091%, and a root mean square error of prediction (RMSEP) of 0.0090. Cross validation testing evaluates the predictive power of the model. The PLS model was shown to be a good model, as the intercepts of R(2)Y and Q(2)Y were 0.0853 and -0.309, respectively.
Wiener-matrix image restoration beyond the sampling passband
NASA Technical Reports Server (NTRS)
Rahman, Zia-Ur; Alter-Gartenberg, Rachel; Fales, Carl L.; Huck, Friedrich O.
1991-01-01
A finer-than-sampling-lattice resolution image can be obtained using multiresponse image gathering and Wiener-matrix restoration. The multiresponse image gathering weighs the within-passband and aliased signal components differently, allowing the Wiener-matrix restoration filter to unscramble these signal components and restore spatial frequencies beyond the sampling passband of the photodetector array. A set of A multiresponse images can be reassembled into a single minimum mean square error image with a resolution that is sqrt(A) times finer than the photodetector-array sampling lattice.
Prediction of ethanol in bottled Chinese rice wine by NIR spectroscopy
NASA Astrophysics Data System (ADS)
Ying, Yibin; Yu, Haiyan; Pan, Xingxiang; Lin, Tao
2006-10-01
To evaluate the applicability of non-invasive visible and near infrared (VIS-NIR) spectroscopy for determining the ethanol concentration of Chinese rice wine in square brown glass bottles, transmission spectra of 100 bottled Chinese rice wine samples were collected in the spectral range of 350-1200 nm. Statistical equations were established between the reference data and the VIS-NIR spectra by the partial least squares (PLS) regression method. The performance of three kinds of mathematical treatment of the spectra (original spectra, first derivative spectra and second derivative spectra) was also discussed. The PLS models of the original spectra gave better results, with a higher correlation coefficient in calibration (R cal) of 0.89, a lower root mean standard error of calibration (RMSEC) of 0.165, and a lower root mean standard error of cross validation (RMSECV) of 0.179. Using the original spectra, PLS models for ethanol concentration prediction were developed. The R cal and the correlation coefficient in validation (R val) were 0.928 and 0.875, respectively, and the RMSEC and the root mean standard error of validation (RMSEP) were 0.135 (%, v/v) and 0.177 (%, v/v), respectively. The results demonstrated that VIS-NIR spectroscopy could be used to predict ethanol concentration in bottled Chinese rice wine.
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen
2016-02-01
Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation--partial least squares regression (PLSR) method effectively solves the information loss problem of correlation--multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R(2) = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.
The influence of the uplink noise on the performance of satellite data transmission systems
NASA Astrophysics Data System (ADS)
Dewal, Vrinda P.
The problem of transmission of binary phase shift keying (BPSK) modulated digital data through a bandlimited nonlinear satellite channel in the presence of uplink and downlink Gaussian noise and intersymbol interference (ISI) is examined. The satellite transponder is represented by a zero memory bandpass nonlinearity with AM/AM conversion. The proposed optimum linear receiver structure consists of tapped-delay lines followed by a decision device. The linear receiver is designed to minimize the mean square error, which is a function of the ISI and of the uplink and downlink noise. The minimum mean square error (MMSE) equalizer is derived using the Wiener-Kolmogorov theory. In this receiver, the decision about the transmitted signal is made by taking into account the received sequence of the present sample and the interfering past and future samples, which represent the ISI. Illustrative examples of the receiver structures are considered for nonlinear channels with symmetrical and asymmetrical frequency responses of the transmitter filter. The transponder nonlinearity is simulated by a polynomial using only the first- and third-order terms. A computer simulation determines the tap gain coefficients of the MMSE equalizer that adapt to the various uplink and downlink noise levels. The performance of the MMSE equalizer is evaluated in terms of an estimate of the average probability of error.
Quantitative Modelling of Trace Elements in Hard Coal
Smoliński, Adam; Howaniec, Natalia
2016-01-01
The significance of coal in the world economy has remained unquestionable for decades. It is also expected to be the dominant fossil fuel in the foreseeable future. The increased awareness of sustainable development reflected in the relevant regulations implies, however, the need for the development and implementation of clean coal technologies on the one hand, and adequate analytical tools on the other. The paper presents the application of the quantitative Partial Least Squares method in modeling the concentrations of trace elements (As, Ba, Cd, Co, Cr, Cu, Mn, Ni, Pb, Rb, Sr, V and Zn) in hard coal based on the physical and chemical parameters of coal, and coal ash components. The study was focused on trace elements potentially hazardous to the environment when emitted from coal processing systems. The studied data included 24 parameters determined for 132 coal samples provided by 17 coal mines of the Upper Silesian Coal Basin, Poland. Since the data set contained outliers, the construction of robust Partial Least Squares models for a contaminated data set and the correct identification of outlying objects based on the robust scales were required. These enabled the development of correct Partial Least Squares models, characterized by good fit and prediction abilities. The root mean square error was below 10% for all but one of the final Partial Least Squares models constructed, and the prediction error (root mean square error of cross-validation) exceeded 10% for only three of the models constructed. The study is of both cognitive and applicative importance. It presents a unique application of the chemometric methods of data exploration in modeling the content of trace elements in coal. In this way it contributes to the development of useful tools for coal quality assessment. PMID:27438794
Spiral tracing on a touchscreen is influenced by age, hand, implement, and friction.
Heintz, Brittany D; Keenan, Kevin G
2018-01-01
Dexterity impairments are well documented in older adults, though it is unclear how these influence touchscreen manipulation. This study examined age-related differences while tracing on high- and low-friction touchscreens using the finger or stylus. 26 young and 24 older adults completed an Archimedes spiral tracing task on a touchscreen mounted on a force sensor. Root mean square error was calculated to quantify performance. Root mean square error increased by 29.9% for older vs. young adults using the fingertip, but was similar to young adults when using the stylus. Although other variables (e.g., touchscreen usage, sensation, and reaction time) differed between age groups, these variables were not related to increased error in older adults while using their fingertip. Root mean square error also increased on the low-friction surface for all subjects. These findings suggest that utilizing a stylus and increasing surface friction may improve touchscreen use in older adults.
Predicting the random drift of MEMS gyroscope based on K-means clustering and OLS RBF Neural Network
NASA Astrophysics Data System (ADS)
Wang, Zhen-yu; Zhang, Li-jie
2017-10-01
The measurement error of a sensor can be effectively compensated with prediction. Aiming at the large random drift error of MEMS (Micro Electro Mechanical System) gyroscopes, an improved learning algorithm for a Radial Basis Function (RBF) Neural Network (NN) based on K-means clustering and Orthogonal Least-Squares (OLS) is proposed in this paper. The algorithm first selects typical samples as the initial cluster centers of the RBF NN, then obtains candidate centers with the K-means algorithm, and finally optimizes the candidate centers with the OLS algorithm, which makes the network structure simpler and the prediction performance better. Experimental results show that the proposed K-means clustering OLS learning algorithm can predict the random drift of a MEMS gyroscope effectively, with a prediction error of 9.8019e-007°/s and a prediction time of 2.4169e-006 s.
NASA Astrophysics Data System (ADS)
Li, Xiongwei; Wang, Zhe; Lui, Siu-Lung; Fu, Yangting; Li, Zheng; Liu, Jianming; Ni, Weidou
2013-10-01
A bottleneck of the wide commercial application of laser-induced breakdown spectroscopy (LIBS) technology is its relatively high measurement uncertainty. A partial least squares (PLS) based normalization method was proposed to improve pulse-to-pulse measurement precision for LIBS based on our previous spectrum standardization method. The proposed model utilized multi-line spectral information of the measured element and characterized the signal fluctuations due to the variation of plasma characteristic parameters (plasma temperature, electron number density, and total number density) for signal uncertainty reduction. The model was validated by the application of copper concentration prediction in 29 brass alloy samples. The results demonstrated an improvement on both measurement precision and accuracy over the generally applied normalization as well as our previously proposed simplified spectrum standardization method. The average relative standard deviation (RSD), average of the standard error (error bar), the coefficient of determination (R2), the root-mean-square error of prediction (RMSEP), and average value of the maximum relative error (MRE) were 1.80%, 0.23%, 0.992, 1.30%, and 5.23%, respectively, while those for the generally applied spectral area normalization were 3.72%, 0.71%, 0.973, 1.98%, and 14.92%, respectively.
NASA Technical Reports Server (NTRS)
Amling, G. E.; Holms, A. G.
1973-01-01
A computer program is described that performs a statistical multiple-decision procedure called chain pooling. It uses a number of mean squares assigned to error variance that is conditioned on the relative magnitudes of the mean squares. The model selection is done according to user-specified levels of type 1 or type 2 error probabilities.
[Determination of Carbaryl in Rice by Using FT Far-IR and THz-TDS Techniques].
Sun, Tong; Zhang, Zhuo-yong; Xiang, Yu-hong; Zhu, Ruo-hua
2016-02-01
Determination of carbaryl in rice by using Fourier transform far-infrared (FT-Far-IR) and terahertz time-domain spectroscopy (THz-TDS) combined with chemometrics was studied, and the spectral characteristics of carbaryl in the terahertz region were investigated. Samples were prepared by mixing carbaryl at different amounts with rice powder, and then a pellet 13 mm in diameter and about 1 mm thick, with polyethylene (PE) as matrix, was compressed under a pressure of 5-7 tons. Terahertz time domain spectra of the pellets were measured at 0.5~1.5 THz, and the absorption spectra at 1.6. 3 THz were acquired with Fourier transform far-IR spectroscopy. The method of sample preparation is simple and needs no separation or enrichment. Absorption peaks in the frequency range of 1.8-6.3 THz were found at 3.2 and 5.2 THz by Far-IR. There are several weak absorption peaks in the range of 0.5-1.5 THz by THz-TDS. These two kinds of characteristic absorption spectra were each randomly divided into a calibration set and a prediction set by leave-N-out cross-validation. Finally, the partial least squares regression (PLSR) method was used to establish two quantitative analysis models. The root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP) and the correlation coefficient of prediction (Rv) were used as the basis for evaluating model performance. For Rv, a higher value is better; for RMSECV and RMSEP, lower is better. The results demonstrated that the predictive accuracy of the two PLSR models was satisfactory. For the FT-Far-IR model, the correlation between actual and predicted values of the prediction samples (Rv) was 0.99, the root mean square error for the prediction set (RMSEP) was 0.0086, and that for the calibration set (RMSECV) was 0.0077. For the THz-TDS model, Rv was 0.98, RMSEP was 0.0044, and RMSECV was 0.0025. Results proved that FT-Far-IR and THz-TDS can be feasible tools for the quantitative determination of carbaryl in rice. This paper provides a new method for the quantitative determination of pesticides in other grain samples.
NASA Astrophysics Data System (ADS)
Reis, D. S.; Stedinger, J. R.; Martins, E. S.
2005-10-01
This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.
Dabkiewicz, Vanessa Emídio; de Mello Pereira Abrantes, Shirley; Cassella, Ricardo Jorgensen
2018-08-05
Near infrared spectroscopy (NIR) with diffuse reflectance associated with multivariate calibration has as its main advantage the replacement of the physical separation of interferents by the mathematical separation of their signals, rapidly and with no need for reagent consumption, chemical waste production or sample manipulation. Seeking to optimize quality control analyses, this spectroscopic analytical method was shown to be a viable alternative to the classical Kjeldahl method for the determination of protein nitrogen in yellow fever vaccine. The most suitable multivariate calibration was achieved by the partial least squares method (PLS) with multiplicative signal correction (MSC) treatment and data mean centering (MC), using a minimum number of latent variables (LV) equal to 1, with the lowest value of the square root of the mean squared prediction error (0.00330) associated with the highest percentage value (91%) of samples. Accuracy ranged from 95 to 105% recovery in the 4000-5184 cm(-1) region. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Yan, Wen-juan; Yang, Ming; He, Guo-quan; Qin, Lin; Li, Gang
2014-11-01
In order to identify diabetic patients by using the tongue near-infrared (NIR) spectrum, a spectral classification model of the NIR reflectivity of the tongue tip is proposed, based on the partial least square (PLS) method. 39 sample data of tongue tip NIR spectra are collected from healthy people and from diabetic patients, respectively. After pretreatment of the reflectivity, the spectral data are set as the independent variable matrix and the classification information as the dependent variable matrix. The samples are divided into two groups, i.e. 53 samples as the calibration set and 25 as the prediction set, and PLS is then used to build the classification model. The model constructed from the 53 samples has a correlation of 0.9614 and a root mean square error of cross-validation (RMSECV) of 0.1387. The predictions for the 25 samples have a correlation of 0.9146 and an RMSECV of 0.2122. The experimental results show that the PLS method can achieve good classification of the features of healthy people and diabetic patients.
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis
Lin, Johnny; Bentler, Peter M.
2012-01-01
Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511
Classification and Evaluation of Coherent Synchronous Sampled-Data Telemetry Systems
NASA Technical Reports Server (NTRS)
Viterbi, Andrew
1961-01-01
This paper analyzes the various types of continuous wave and pulse modulation for the transmission of sampled data over channels perturbed by white gaussian noise. Optimal coherent synchronous detection schemes for all the different modulation methods are shown to belong to one of two general classes: linear synchronous detection and correlation detection. The figures of merit, mean-square signal-to-error ratio and bandwidth occupancy, are determined for each system and compared.
Yang, Shun-hua; Zhang, Hai-tao; Guo, Long; Ren, Yan
2015-06-01
Relative elevation and stream power index were selected as auxiliary variables based on correlation analysis for mapping soil organic matter. Geographically weighted regression Kriging (GWRK) and regression Kriging (RK) were used for spatial interpolation of soil organic matter and compared with ordinary Kriging (OK), which acts as a control. The results indicated that soil organic matter was significantly positively correlated with relative elevation whilst it had a significantly negative correlation with stream power index. Semivariance analysis showed that both soil organic matter content and its residuals (including ordinary least square regression residual and GWR residual) had strong spatial autocorrelation. Interpolation accuracies by different methods were estimated based on a data set of 98 validation samples. Results showed that the mean error (ME), mean absolute error (MAE) and root mean square error (RMSE) of RK were respectively 39.2%, 17.7% and 20.6% lower than the corresponding values of OK, with a relative-improvement (RI) of 20.63. GWRK showed a similar tendency, having its ME, MAE and RMSE to be respectively 60.6%, 23.7% and 27.6% lower than those of OK, with a RI of 59.79. Therefore, both RK and GWRK significantly improved the accuracy of OK interpolation of soil organic matter due to their incorporation of auxiliary variables. In addition, GWRK performed obviously better than RK did in this study, and its improved performance should be attributed to the consideration of sample spatial locations.
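The three validation metrics named in this abstract (ME, MAE, RMSE) can be computed from paired observed and predicted values as in the minimal sketch below. The function names, the sign convention for ME (predicted minus observed) and the toy numbers are assumptions made here for illustration; the abstract does not state how its relative-improvement (RI) figure is defined, so `percent_reduction` is only a generic "percent lower than baseline" helper and not the authors' RI.

```python
import math


def interpolation_errors(observed, predicted):
    """Return (ME, MAE, RMSE) for paired validation values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    # Assumed sign convention: residual = predicted - observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    n = len(residuals)
    me = sum(residuals) / n                              # mean error (bias)
    mae = sum(abs(r) for r in residuals) / n             # mean absolute error
    rmse = math.sqrt(sum(r * r for r in residuals) / n)  # root mean square error
    return me, mae, rmse


def percent_reduction(baseline, candidate):
    """Percent by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# Toy values, not data from the study.
me, mae, rmse = interpolation_errors([10.0, 20.0, 30.0, 40.0],
                                     [12.0, 18.0, 33.0, 39.0])
```

Because RMSE squares the residuals before averaging, it penalizes a few large misses more heavily than MAE does, which is why the two metrics can rank interpolation methods differently.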
ERIC Educational Resources Information Center
Pan, Tianshu; Yin, Yue
2012-01-01
In the discussion of mean square difference (MSD) and standard error of measurement (SEM), Barchard (2012) concluded that the MSD between 2 sets of test scores is greater than 2(SEM)[superscript 2] and SEM underestimates the score difference between 2 tests when the 2 tests are not parallel. This conclusion has limitations for 2 reasons. First,…
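As background for the quantity 2(SEM)^2 discussed in this abstract, the parallel-test case can be written out under standard classical test theory assumptions. This is a textbook sketch, not a derivation taken from either paper; the symbols T, E_1, E_2 and sigma_E are introduced here for illustration.

```latex
% Assumptions (classical test theory, parallel tests):
%   X_1 = T + E_1, \quad X_2 = T + E_2  (same true score T),
%   E[E_1] = E[E_2] = 0, \quad \mathrm{Cov}(E_1, E_2) = 0,
%   \mathrm{Var}(E_1) = \mathrm{Var}(E_2) = \sigma_E^2 = \mathrm{SEM}^2.
\begin{align*}
\mathrm{MSD} &= E\big[(X_1 - X_2)^2\big] = E\big[(E_1 - E_2)^2\big] \\
             &= \mathrm{Var}(E_1) + \mathrm{Var}(E_2) - 2\,\mathrm{Cov}(E_1, E_2) \\
             &= 2\,\sigma_E^2 = 2\,\mathrm{SEM}^2 .
\end{align*}
```

When the two tests are not parallel, one or more of these assumptions fails, and the equality no longer has to hold.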
ERIC Educational Resources Information Center
Li, Libo; Bentler, Peter M.
2011-01-01
MacCallum, Browne, and Cai (2006) proposed a new framework for evaluation and power analysis of small differences between nested structural equation models (SEMs). In their framework, the null and alternative hypotheses for testing a small difference in fit and its related power analyses were defined by some chosen root-mean-square error of…
Park, Sun-Young; Park, Eun-Ja; Suh, Hae Sun; Ha, Dongmun; Lee, Eui-Kyung
2017-08-01
Although nonpreference-based disease-specific measures are widely used in clinical studies, they cannot generate utilities for economic evaluation. A solution to this problem is to estimate utilities from disease-specific instruments using the mapping function. This study aimed to develop a transformation model for mapping the pruritus-visual analog scale (VAS) to the EuroQol 5-Dimension 3-Level (EQ-5D-3L) utility index in pruritus. A cross-sectional survey was conducted with a sample (n = 268) drawn from the general population of South Korea. Data were randomly divided into 2 groups, one for estimating and the other for validating mapping models. To select the best model, we developed and compared 3 separate models using demographic information and the pruritus-VAS as independent variables. The predictive performance was assessed using the mean absolute deviation and root mean square error in a separate dataset. Among the 3 models, model 2 using age, age squared, sex, and the pruritus-VAS as independent variables had the best performance based on the goodness of fit and model simplicity, with a log likelihood of 187.13. The 3 models had similar precision errors based on mean absolute deviation and root mean square error in the validation dataset. No statistically significant difference was observed between the mean observed and predicted values in all models. In conclusion, model 2 was chosen as the preferred mapping model. Outcomes measured as the pruritus-VAS can be transformed into the EQ-5D-3L utility index using this mapping model, which makes an economic evaluation possible when only pruritus-VAS data are available. © 2017 John Wiley & Sons, Ltd.
Zhang, Ya-Fei; Zuo, Xiang-Yun; Bi, Yu-An; Wu, Jian-Xiong; Wang, Zhen-Zhong; L, Ping; Xiao, Wei
2014-08-01
To establish a rapid quantitative analysis method for the chlorogenic acid content and the solid content of the extraction liquid during its concentration in the production of Reduning injection, using near-infrared (NIR) spectroscopy, in order to reflect the concentration state in real time and achieve quality control of the concentration process. Samples were collected during the concentration of the Jinqing extraction liquid. After removal of abnormal samples, spectral pretreatment, and wave band selection, quantitative calibration models between the NIR spectra and the chlorogenic acid HPLC analytical value and the solid content were established using the PLS algorithm, and unknown samples were predicted. The correlation coefficients for the chlorogenic acid content and the solid content were 0.9921 and 0.9940, respectively, and the correlation coefficients of the verification model were 0.9944 and 0.9984, with root mean square errors of calibration (RMSEC) of 0.8146 and 2.6561, root mean square errors of prediction (RMSEP) of 0.7046 and 1.8767, and relative standard errors of prediction (RSEP) of 6.01% and 2.93%, respectively. The method is simple, rapid, nondestructive, accurate and reliable, and could thus be adopted for fast monitoring of the chlorogenic acid content and the solid content during the concentration of the Reduning injection extraction liquid.
Marchetti, Bárbara V; Candotti, Cláudia T; Raupp, Eduardo G; Oliveira, Eduardo B C; Furlanetto, Tássia S; Loss, Jefferson F
The purpose of this study was to assess a radiographic method for spinal curvature evaluation in children, based on spinous processes, and identify its normality limits. The sample consisted of 90 radiographic examinations of the spines of children in the sagittal plane. Thoracic and lumbar curvatures were evaluated using angular (apex angle [AA]) and linear (sagittal arrow [SA]) measurements based on the spinous processes. The same curvatures were also evaluated using the Cobb angle (CA) method, which is considered the gold standard. For concurrent validity (AA vs CA), Pearson's product-moment correlation coefficient, root-mean-square error, Pitman-Morgan test, and Bland-Altman analysis were used. For reproducibility (AA, SA, and CA), the intraclass correlation coefficient, standard error of measurement, and minimal detectable change measurements were used. A significant correlation was found between CA and AA measurements, as was a low root-mean-square error. The mean difference between the measurements was 0° for thoracic and lumbar curvatures, and the mean standard deviations of the differences were ±5.9° and ±6.9°, respectively. The intraclass correlation coefficients of AA and SA were similar to or higher than the gold standard (CA). The standard error of measurement and minimal detectable change of the AA were always lower than the CA. This study determined the concurrent validity, as well as intra- and interrater reproducibility, of the radiographic measurements of kyphosis and lordosis in children. Copyright © 2017. Published by Elsevier Inc.
Analysis of pork adulteration in beef meatball using Fourier transform infrared (FTIR) spectroscopy.
Rohman, A; Sismindari; Erwanto, Y; Che Man, Yaakob B
2011-05-01
Meatball is one of the favorite foods in Indonesia. The adulteration of beef meatball with pork occurs frequently. This study aimed to develop a fast and nondestructive technique for the detection and quantification of pork in beef meatball using Fourier transform infrared (FTIR) spectroscopy and partial least squares (PLS) calibration. The spectral bands associated with pork fat (PF), beef fat (BF), and their mixtures in meatball formulation were scanned, interpreted, and identified by relating them to those spectroscopically representative of pure PF and BF. For quantitative analysis, PLS regression was used to develop a calibration model at the selected fingerprint region of 1200-1000 cm⁻¹. The equation obtained for the relationship between actual PF value and FTIR predicted values in the PLS calibration model was y = 0.999x + 0.004, with a coefficient of determination (R²) of 0.999 and a root mean square error of calibration of 0.442. The PLS calibration model was subsequently used for the prediction of independent samples using laboratory-made meatball samples containing mixtures of BF and PF. Using 4 principal components, the root mean square error of prediction was 0.742. The results showed that FTIR spectroscopy can be used for the detection and quantification of pork in beef meatball formulation for Halal verification purposes. Copyright © 2010 The American Meat Science Association. Published by Elsevier Ltd. All rights reserved.
Mathematical modeling of wastewater-derived biodegradable dissolved organic nitrogen.
Simsek, Halis
2016-11-01
Wastewater-derived dissolved organic nitrogen (DON) typically constitutes the majority of total dissolved nitrogen (TDN) discharged to surface waters from advanced wastewater treatment plants (WWTPs). When considering the stringent regulations on nitrogen discharge limits in sensitive receiving waters, DON becomes problematic and needs to be reduced. Biodegradable DON (BDON) is a portion of DON that is biologically degradable by bacteria when the optimum environmental conditions are met. BDON in a two-stage trickling filter WWTP was estimated using artificial intelligence techniques, such as adaptive neuro-fuzzy inference systems, multilayer perceptron, radial basis neural networks (RBNN), and generalized regression neural networks. Nitrite, nitrate, ammonium, TDN, and DON data were used as input neurons. Wastewater samples were collected from four different locations in the plant. Model performances were evaluated using root mean square error, mean absolute error, mean bias error, and coefficient of determination statistics. Modeling results showed that the R² values were higher than 0.85 in all four models for all wastewater samples, except for RBNN modeling of the final effluent sample, where R² was low (0.52). Overall, it was found that all four computing techniques could be employed successfully to predict BDON.
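The four evaluation statistics named in this abstract can be illustrated with a minimal Python sketch. The function name `regression_metrics`, the sign convention for the bias (predicted minus observed), and the sample numbers are assumptions made for illustration and are not taken from the study. R² is computed here as 1 - SS_res/SS_tot; some papers report the squared Pearson correlation instead.

```python
import math

def regression_metrics(observed, predicted):
    """Return RMSE, MAE, MBE and R^2 for paired observed/predicted values."""
    n = len(observed)
    # Error convention assumed here: predicted minus observed.
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(ss_res / n),          # root mean square error
        "mae": sum(abs(e) for e in errors) / n,  # mean absolute error
        "mbe": sum(errors) / n,                  # mean bias error
        "r2": 1.0 - ss_res / ss_tot,             # coefficient of determination
    }

# Hypothetical values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.5])
```

The same RMSE formula gives RMSEC when applied to calibration samples and RMSEP when applied to held-out prediction samples.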
Detection of Tetracycline in Milk using NIR Spectroscopy and Partial Least Squares
NASA Astrophysics Data System (ADS)
Wu, Nan; Xu, Chenshan; Yang, Renjie; Ji, Xinning; Liu, Xinyuan; Yang, Fan; Zeng, Ming
2018-02-01
The feasibility of measuring tetracycline in milk was investigated by the near infrared (NIR) spectroscopic technique combined with the partial least squares (PLS) method. The NIR transmittance spectra of 40 pure milk samples and 40 tetracycline adulterated milk samples with different concentrations (from 0.005 to 40 mg/L) were obtained. The pure milk and tetracycline adulterated milk samples were properly assigned to their categories with 100% accuracy in the calibration set, and a correct classification rate of 96.3% was obtained in the prediction set. For the quantitation of tetracycline in adulterated milk, the root mean square errors for the calibration and prediction models were 0.61 mg/L and 4.22 mg/L, respectively. The PLS model fitted the calibration set well; however, its predictive ability was limited, especially for samples with low tetracycline concentration. Overall, this approach can be considered a promising tool for discrimination of tetracycline adulterated milk, as a supplement to high performance liquid chromatography.
Wood, Clive; Alwati, Abdolati; Halsey, Sheelagh; Gough, Tim; Brown, Elaine; Kelly, Adrian; Paradkar, Anant
2016-09-10
The use of near infra red spectroscopy to predict the concentration of two pharmaceutical co-crystals; 1:1 ibuprofen-nicotinamide (IBU-NIC) and 1:1 carbamazepine-nicotinamide (CBZ-NIC) has been evaluated. A partial least squares (PLS) regression model was developed for both co-crystal pairs using sets of standard samples to create calibration and validation data sets with which to build and validate the models. Parameters such as the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP) and correlation coefficient were used to assess the accuracy and linearity of the models. Accurate PLS regression models were created for both co-crystal pairs which can be used to predict the co-crystal concentration in a powder mixture of the co-crystal and the active pharmaceutical ingredient (API). The IBU-NIC model had smaller errors than the CBZ-NIC model, possibly due to the complex CBZ-NIC spectra which could reflect the different arrangement of hydrogen bonding associated with the co-crystal compared to the IBU-NIC co-crystal. These results suggest that NIR spectroscopy can be used as a PAT tool during a variety of pharmaceutical co-crystal manufacturing methods and the presented data will facilitate future offline and in-line NIR studies involving pharmaceutical co-crystals. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Doble, Brett; Lorgelly, Paula
2016-04-01
To determine the external validity of existing mapping algorithms for predicting EQ-5D-3L utility values from EORTC QLQ-C30 responses and to establish their generalizability in different types of cancer. A main analysis (pooled) sample of 3560 observations (1727 patients) and two disease severity patient samples (496 and 93 patients) with repeated observations over time from Cancer 2015 were used to validate the existing algorithms. Errors were calculated between observed and predicted EQ-5D-3L utility values using a single pooled sample and ten pooled tumour type-specific samples. Predictive accuracy was assessed using mean absolute error (MAE) and standardized root-mean-squared error (RMSE). The association between observed and predicted EQ-5D utility values and other covariates across the distribution was tested using quantile regression. Quality-adjusted life years (QALYs) were calculated using observed and predicted values to test responsiveness. Ten 'preferred' mapping algorithms were identified. Two algorithms estimated via response mapping and ordinary least-squares regression using dummy variables performed well on a number of validation criteria, including accurate prediction of the best and worst QLQ-C30 health states, predicted values within the EQ-5D tariff range, relatively small MAEs and RMSEs, and minimal differences between estimated QALYs. Comparison of predictive accuracy across ten tumour type-specific samples highlighted that algorithms are relatively insensitive to grouping by tumour type and affected more by differences in disease severity. Two of the 'preferred' mapping algorithms suggest more accurate predictions, but limitations exist. We recommend extensive scenario analyses if mapped utilities are used in cost-utility analyses.
K.P. Poudel; H. Temesgen
2016-01-01
Estimating aboveground biomass and its components requires sound statistical formulation and evaluation. Using data collected from 55 destructively sampled trees in different parts of Oregon, we evaluated the performance of three groups of methods to estimate total aboveground biomass and (or) its components based on the bias and root mean squared error (RMSE) that...
USDA-ARS's Scientific Manuscript database
For any analytical system the population mean (mu) number of entities (e.g., cells or molecules) per tested volume, surface area, or mass also defines the population standard deviation (sigma = square root of mu). For a preponderance of analytical methods, sigma is very small relative to mu due to...
Comparison of estimators of standard deviation for hydrologic time series
Tasker, Gary D.; Gilroy, Edward J.
1982-01-01
Unbiasing factors as a function of serial correlation, ρ, and sample size, n, for the sample standard deviation of a lag one autoregressive model were generated by random number simulation. Monte Carlo experiments were used to compare the performance of several alternative methods for estimating the standard deviation σ of a lag one autoregressive model in terms of bias, root mean square error, probability of underestimation, and expected opportunity design loss. Three methods provided estimates of σ which were much less biased but had greater mean square errors than the usual estimate of σ: $s = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{1/2}$. The three methods may be briefly characterized as (1) a method using a maximum likelihood estimate of the unbiasing factor, (2) a method using an empirical Bayes estimate of the unbiasing factor, and (3) a robust nonparametric estimate of σ suggested by Quenouille. Because s tends to underestimate σ, its use as an estimate of a model parameter results in a tendency to underdesign. If underdesign losses are considered more serious than overdesign losses, then the choice of one of the less biased methods may be wise.
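The tendency of the usual estimate s to underestimate σ under serial correlation can be illustrated with a small Monte Carlo sketch in Python. This is not the authors' simulation: the function names, the Gaussian innovations, and the settings (n = 10, ρ = 0.9, 2000 replications) are assumptions chosen only to make the effect visible.

```python
import math
import random

def ar1_series(n, rho, sigma, rng):
    """Stationary lag-one autoregressive series with marginal standard deviation sigma."""
    innov_sd = sigma * math.sqrt(1.0 - rho ** 2)  # keeps the marginal variance at sigma^2
    x = [rng.gauss(0.0, sigma)]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, innov_sd))
    return x

def sample_sd(x):
    """Usual estimate s = sqrt(sum((x_i - mean)^2) / (n - 1))."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

def mean_sample_sd(n, rho, sigma, reps, seed=0):
    """Average of s over many simulated AR(1) series."""
    rng = random.Random(seed)
    return sum(sample_sd(ar1_series(n, rho, sigma, rng)) for _ in range(reps)) / reps

# With strong positive serial correlation the average s falls well below sigma = 1.
avg_s = mean_sample_sd(n=10, rho=0.9, sigma=1.0, reps=2000)
```

Setting `rho=0.0` in the same sketch gives an average much closer to σ, which is one way to see that the serial correlation, rather than the small sample alone, drives most of the bias.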
Radon-222 concentrations in ground water and soil gas on Indian reservations in Wisconsin
DeWild, John F.; Krohelski, James T.
1995-01-01
For sites with wells finished in the sand and gravel aquifer, the coefficient of determination (R²) of the regression of concentration of radon-222 in ground water as a function of well depth is 0.003 and the significance level is 0.32, which indicates that there is not a statistically significant relation between radon-222 concentrations in ground water and well depth. The coefficient of determination of the regression of radon-222 in ground water and soil gas is 0.19 and the root mean square error of the regression line is 271 picocuries per liter. Even though the significance level (0.036) indicates a statistical relation, the root mean square error of the regression is so large that the regression equation would not give reliable predictions. Because of an inadequate number of samples, similar statistical analyses could not be performed for sites with wells finished in the crystalline and sedimentary bedrock aquifers.
Zhang, Bing-Fang; Yuan, Li-Bo; Kong, Qing-Ming; Shen, Wei-Zheng; Zhang, Bing-Xiu; Liu, Cheng-Hai
2014-10-01
In the present study, a new method using near infrared spectroscopy combined with optical fiber sensing technology was applied to the analysis of hogwash oil in blended oil. The 50 samples were blends of frying oil and "nine three" soybean oil at given volume ratios. The near infrared transmission spectra were collected and the quantitative analysis model of frying oil was established by partial least squares (PLS) and a BP artificial neural network. The coefficients of determination of the calibration sets were 0.908 and 0.934, respectively. The coefficients of determination of the validation sets were 0.961 and 0.952, the root mean square errors of calibration (RMSEC) were 0.184 and 0.136, and the root mean square errors of prediction (RMSEP) were both 0.1116. They conform to the model application requirement. At the same time, frying oil and qualified edible oil were identified with principal component analysis (PCA), and the accuracy rate was 100%. The experiment showed that near infrared spectral technology can not only quickly and accurately identify hogwash oil, but also quantitatively detect it. This method has wide application prospects in the detection of oil.
An empirical model for estimating solar radiation in the Algerian Sahara
NASA Astrophysics Data System (ADS)
Benatiallah, Djelloul; Benatiallah, Ali; Bouchouicha, Kada; Hamouda, Messaoud; Nasri, Bahous
2018-05-01
The present work aims to determine the empirical model R.sun that will allow us to evaluate the solar radiation fluxes on a horizontal plane under clear sky for the city of Adrar (27°18′ N, 0°11′ W), Algeria, and to compare them with the results measured at the site. The expected results of this comparison are of importance for the investment study of solar systems (solar power plants for electricity production, CSP) and also for the design and performance analysis of any system using solar energy. The statistical indicators used to evaluate the accuracy of the model were the mean bias error (MBE), root mean square error (RMSE) and coefficient of determination. The results show that for global radiation, the daily correlation coefficient is 0.9984. The mean absolute percentage error is 9.44 %. The daily mean bias error is -7.94 %. The daily root mean square error is 12.31 %.
The Influence of Dimensionality on Estimation in the Partial Credit Model.
ERIC Educational Resources Information Center
De Ayala, R. J.
1995-01-01
The effect of multidimensionality on partial credit model parameter estimation was studied with noncompensatory and compensatory data. Analysis results, consisting of root mean square error, bias, Pearson product-moment correlations, standardized root mean squared differences, standardized differences between means, and descriptive statistics…
Visualizing the Sample Standard Deviation
ERIC Educational Resources Information Center
Sarkar, Jyotirmoy; Rashid, Mamunur
2017-01-01
The standard deviation (SD) of a random sample is defined as the square-root of the sample variance, which is the "mean" squared deviation of the sample observations from the sample mean. Here, we interpret the sample SD as the square-root of twice the mean square of all pairwise half deviations between any two sample observations. This…
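The pairwise interpretation stated in this abstract can be checked numerically. The following Python sketch compares the usual sample SD with the square root of twice the mean square of all pairwise half deviations; the function names and the data values are hypothetical and serve only as an illustration.

```python
import math
from itertools import combinations

def sd_from_mean(x):
    """Sample SD from squared deviations about the sample mean (divisor n - 1)."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

def sd_from_pairs(x):
    """Sample SD as sqrt(2 * mean square of pairwise half deviations (x_i - x_j) / 2)."""
    half_devs_sq = [((a - b) / 2.0) ** 2 for a, b in combinations(x, 2)]
    return math.sqrt(2.0 * sum(half_devs_sq) / len(half_devs_sq))

# Hypothetical sample; both routes give the same value up to rounding.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
```

The agreement follows from the identity that the sum of (x_i - x_j)² over all pairs i < j equals n times the sum of squared deviations from the mean, so the pairwise form needs no explicit reference to the sample mean.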
Application of Least Mean Square Algorithms to Spacecraft Vibration Compensation
NASA Technical Reports Server (NTRS)
Woodard, Stanley E.; Nagchaudhuri, Abhijit
1998-01-01
This paper describes the application of the Least Mean Square (LMS) algorithm in tandem with the Filtered-X Least Mean Square algorithm for controlling a science instrument's line-of-sight pointing. Pointing error is caused by a periodic disturbance and spacecraft vibration. A least mean square algorithm is used on-orbit to produce the transfer function between the instrument's servo-mechanism and error sensor. The result is a set of adaptive transversal filter weights tuned to the transfer function. The Filtered-X LMS algorithm, which is an extension of the LMS, tunes a set of transversal filter weights to the transfer function between the disturbance source and the servo-mechanism's actuation signal. The servo-mechanism's resulting actuation counters the disturbance response and thus maintains accurate science instrumental pointing. A simulation model of the Upper Atmosphere Research Satellite is used to demonstrate the algorithms.
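The basic LMS weight update that underlies the identification step described in this abstract can be sketched in Python as follows. This is a generic, noise-free system-identification example and not the paper's implementation: it omits the Filtered-X extension, and the function names, the three-tap "unknown" response, the step size, and the update convention w ← w + μ·e·x are all assumptions made for illustration.

```python
import random

def fir_filter(x, h):
    """Output of an FIR system with impulse response h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def lms_identify(x, d, num_taps, mu):
    """Adapt transversal filter weights so the filter output tracks d given input x."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))      # filter output
        e = dn - y                                       # instantaneous error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # LMS update
    return w

# Hypothetical unknown system and white-noise excitation.
rng = random.Random(0)
excitation = [rng.gauss(0.0, 1.0) for _ in range(3000)]
true_response = [0.5, -0.3, 0.1]
desired = fir_filter(excitation, true_response)
weights = lms_identify(excitation, desired, num_taps=3, mu=0.05)
```

After adaptation the weights approximate the assumed impulse response; a larger step size converges faster but, in the presence of measurement noise, leaves a larger steady-state misadjustment.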
Validation of the Family Inpatient Communication Survey.
Torke, Alexia M; Monahan, Patrick; Callahan, Christopher M; Helft, Paul R; Sachs, Greg A; Wocial, Lucia D; Slaven, James E; Montz, Kianna; Inger, Lev; Burke, Emily S
2017-01-01
Although many family members who make surrogate decisions report problems with communication, there is no validated instrument to accurately measure surrogate/clinician communication for older adults in the acute hospital setting. The objective of this study was to validate a survey of surrogate-rated communication quality in the hospital that would be useful to clinicians, researchers, and health systems. After expert review and cognitive interviewing (n = 10 surrogates), we enrolled 350 surrogates (250 development sample and 100 validation sample) of hospitalized adults aged 65 years and older from three hospitals in one metropolitan area. The communication survey and a measure of decision quality were administered within hospital days 3 and 10. Mental health and satisfaction measures were administered six to eight weeks later. Factor analysis showed support for both one-factor (Total Communication) and two-factor models (Information and Emotional Support). Item reduction led to a final 30-item scale. For the validation sample, internal reliability (Cronbach's alpha) was 0.96 (total), 0.94 (Information), and 0.90 (Emotional Support). Confirmatory factor analysis fit statistics were adequate (one-factor model, comparative fit index = 0.981, root mean square error of approximation = 0.62, weighted root mean square residual = 1.011; two-factor model comparative fit index = 0.984, root mean square error of approximation = 0.055, weighted root mean square residual = 0.930). Total score and subscales showed significant associations with the Decision Conflict Scale (Pearson correlation -0.43, P < 0.001 for total score). Emotional Support was associated with improved mental health outcomes at six to eight weeks, such as anxiety (-0.19 P < 0.001), and Information was associated with satisfaction with the hospital stay (0.49, P < 0.001). The survey shows high reliability and validity in measuring communication experiences for hospital surrogates. 
The scale has promise for measurement of communication quality and is predictive of important outcomes, such as surrogate satisfaction and well-being. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
PLS-LS-SVM based modeling of ATR-IR as a robust method in detection and qualification of alprazolam
NASA Astrophysics Data System (ADS)
Parhizkar, Elahehnaz; Ghazali, Mohammad; Ahmadi, Fatemeh; Sakhteman, Amirhossein
2017-02-01
According to the United States Pharmacopeia (USP), the gold standard technique for alprazolam determination in dosage forms is HPLC, an expensive and time-consuming method that is not easily accessible. In this study, chemometrics-assisted ATR-IR was introduced as an alternative method that produces similar results with less time and energy. Fifty-eight samples containing different concentrations of commercial alprazolam were evaluated by the HPLC and ATR-IR methods. A preprocessing approach was applied to convert raw data obtained from ATR-IR spectra to a normal matrix. Finally, a relationship between alprazolam concentrations obtained by HPLC and the ATR-IR data was established using PLS-LS-SVM (partial least squares least squares support vector machines). Consequently, the validity of the method was verified, yielding a model with low error values (root mean square error of cross validation equal to 0.98). The model was able to predict about 99% of the samples according to the R² of the prediction set. A response permutation test was also applied to confirm that the model did not result from chance correlations. In conclusion, ATR-IR can be a reliable method for detection and quantification of alprazolam content in the manufacturing process.
Reconstruction of regional mean temperature for East Asia since 1900s and its uncertainties
NASA Astrophysics Data System (ADS)
Hua, W.
2017-12-01
Regional average surface air temperature (SAT) is one of the key variables often used to investigate climate change. Unfortunately, because of the limited observations over East Asia, there were also some gaps in the observation data sampling for regional mean SAT analysis, which was important to estimate past climate change. In this study, the regional average temperature of East Asia since the 1900s is calculated by the Empirical Orthogonal Function (EOF)-based optimal interpolation (OA) method, taking the data errors into account. The results show that our estimate is more precise and robust than the results from a simple average, which provides a better way for past climate reconstruction. In addition to the reconstructed regional average SAT anomaly time series, we also estimated the uncertainties of the reconstruction. The root mean square error (RMSE) results show that the error decreases with time and is not sufficiently large to alter the conclusions on the persistent warming in East Asia during the twenty-first century. Moreover, the test of the influence of data error on the reconstruction clearly shows the sensitivity of the reconstruction to the size of the data error.
The in vivo wear resistance of 12 composite resins.
Lang, B R; Bloem, T J; Powers, J M; Wang, R F
1992-09-01
The in vivo wear resistance of 12 composite resins was compared with an amalgam control using the Latin Square experimental design. Sixteen edentulous patients wearing specially designed complete dentures formed the experimental population. The Michigan Computer Graphics Measurement System was used to digitize the surface of the control and composite resin samples before and after 3-month test periods to obtain wear data. The 12 composite resins selected for this investigation based on their published composite classification types were seven fine particle composites, three blends, and two microfilled composite resins. The Latin Square experimental design was found to be valid with the factor of material being statistically different at the 5% level of significance. Wear was computed as volume loss (mm³/mm²), and all of the composites studied had more wear than the amalgam control (P = .001). After 3 months, the mean (error) of wear of the amalgam was 0.028 (0.006). Means (error) of wear for the 12 composites were ranked from most to least wear by mean wear volume loss. The absence of any relationship between mean wear volume loss and the volume percentage filler was confirmed by the correlation coefficient r = -0.158.
Evaluating the utility of mid-infrared spectral subspaces for predicting soil properties.
Sila, Andrew M; Shepherd, Keith D; Pokhariyal, Ganesh P
2016-04-15
We propose four methods for finding local subspaces in large spectral libraries. The proposed four methods include (a) cosine angle spectral matching; (b) hit quality index spectral matching; (c) self-organizing maps and (d) archetypal analysis methods. We then evaluate prediction accuracies for global and subspace calibration models. These methods were tested on a mid-infrared spectral library containing 1907 soil samples collected from 19 different countries under the Africa Soil Information Service project. Calibration models for pH, Mehlich-3 Ca, Mehlich-3 Al, total carbon and clay soil properties were developed for the whole library and for the subspace. Root mean square error of prediction was used to evaluate predictive performance of subspace and global models. The root mean square error of prediction was computed using a one-third-holdout validation set. The effect of pretreating spectra with different methods was tested for the 1st and 2nd derivative Savitzky-Golay algorithm, multiplicative scatter correction, standard normal variate, and standard normal variate followed by detrending. In summary, the results show that global models outperformed the subspace models. We, therefore, conclude that global models are more accurate than the local models except in a few cases. For instance, sand and clay root mean square error values from local models from the archetypal analysis method were 50% poorer than the global models, except for subspace models obtained using multiplicative scatter corrected spectra, which were 12% better. However, the subspace approach provides novel methods for discovering data patterns that may exist in large spectral libraries.
NASA Technical Reports Server (NTRS)
Bell, Thomas L.; Kundu, Prasun K.; Kummerow, Christian D.; Einaudi, Franco (Technical Monitor)
2000-01-01
Quantitative use of satellite-derived maps of monthly rainfall requires some measure of the accuracy of the satellite estimates. The rainfall estimate for a given map grid box is subject to both remote-sensing error and, in the case of low-orbiting satellites, sampling error due to the limited number of observations of the grid box provided by the satellite. A simple model of rain behavior predicts that Root-mean-square (RMS) random error in grid-box averages should depend in a simple way on the local average rain rate, and the predicted behavior has been seen in simulations using surface rain-gauge and radar data. This relationship was examined using satellite SSM/I data obtained over the western equatorial Pacific during TOGA COARE. RMS error inferred directly from SSM/I rainfall estimates was found to be larger than predicted from surface data, and to depend less on local rain rate than was predicted. Preliminary examination of TRMM microwave estimates shows better agreement with surface data. A simple method of estimating rms error in satellite rainfall estimates is suggested, based on quantities that can be directly computed from the satellite data.
Estimating Model Prediction Error: Should You Treat Predictions as Fixed or Random?
NASA Technical Reports Server (NTRS)
Wallach, Daniel; Thorburn, Peter; Asseng, Senthold; Challinor, Andrew J.; Ewert, Frank; Jones, James W.; Rotter, Reimund; Ruane, Alexander
2016-01-01
Crop models are important tools for impact assessment of climate change, as well as for exploring management options under current climate. It is essential to evaluate the uncertainty associated with predictions of these models. We compare two criteria of prediction error: MSEP fixed, which evaluates mean squared error of prediction for a model with fixed structure, parameters and inputs, and MSEP uncertain(X), which evaluates mean squared error averaged over the distributions of model structure, inputs and parameters. Comparison of model outputs with data can be used to estimate the former. The latter has a squared bias term, which can be estimated using hindcasts, and a model variance term, which can be estimated from a simulation experiment. The separate contributions to MSEP uncertain(X) can be estimated using a random effects ANOVA. It is argued that MSEP uncertain(X) is the more informative uncertainty criterion, because it is specific to each prediction situation.
Measures of precision for dissimilarity-based multivariate analysis of ecological communities
Anderson, Marti J; Santana-Garcon, Julia
2015-01-01
Ecological studies require key decisions regarding the appropriate size and number of sampling units. No methods currently exist to measure precision for multivariate assemblage data when dissimilarity-based analyses are intended to follow. Here, we propose a pseudo multivariate dissimilarity-based standard error (MultSE) as a useful quantity for assessing sample-size adequacy in studies of ecological communities. Based on sums of squared dissimilarities, MultSE measures variability in the position of the centroid in the space of a chosen dissimilarity measure under repeated sampling for a given sample size. We describe a novel double resampling method to quantify uncertainty in MultSE values with increasing sample size. For more complex designs, values of MultSE can be calculated from the pseudo residual mean square of a permanova model, with the double resampling done within appropriate cells in the design. R code functions for implementing these techniques, along with ecological examples, are provided. PMID:25438826
NASA Astrophysics Data System (ADS)
Hu, Chia-Chang; Lin, Hsuan-Yu; Chen, Yu-Fan; Wen, Jyh-Horng
2006-12-01
An adaptive minimum mean-square error (MMSE) array receiver based on the fuzzy-logic recursive least-squares (RLS) algorithm is developed for asynchronous DS-CDMA interference suppression in the presence of frequency-selective multipath fading. This receiver employs a fuzzy-logic control mechanism to perform the nonlinear mapping of the squared error and squared error variation, denoted by ([InlineEquation not available: see fulltext.],[InlineEquation not available: see fulltext.]), into a forgetting factor[InlineEquation not available: see fulltext.]. For the real-time applicability, a computationally efficient version of the proposed receiver is derived based on the least-mean-square (LMS) algorithm using the fuzzy-inference-controlled step-size[InlineEquation not available: see fulltext.]. This receiver is capable of providing both fast convergence/tracking capability as well as small steady-state misadjustment as compared with conventional LMS- and RLS-based MMSE DS-CDMA receivers. Simulations show that the fuzzy-logic LMS and RLS algorithms outperform, respectively, other variable step-size LMS (VSS-LMS) and variable forgetting factor RLS (VFF-RLS) algorithms at least 3 dB and 1.5 dB in bit-error-rate (BER) for multipath fading channels.
Estimating random errors due to shot noise in backscatter lidar observations.
Liu, Zhaoyan; Hunt, William; Vaughan, Mark; Hostetler, Chris; McGill, Matthew; Powell, Kathleen; Winker, David; Hu, Yongxiang
2006-06-20
We discuss the estimation of random errors due to shot noise in backscatter lidar observations that use either photomultiplier tube (PMT) or avalanche photodiode (APD) detectors. The statistical characteristics of photodetection are reviewed, and photon count distributions of solar background signals and laser backscatter signals are examined using airborne lidar observations at 532 nm using a photon-counting mode APD. Both distributions appear to be Poisson, indicating that the arrival at the photodetector of photons for these signals is a Poisson stochastic process. For Poisson-distributed signals, a proportional, one-to-one relationship is known to exist between the mean of a distribution and its variance. Although the multiplied photocurrent no longer follows a strict Poisson distribution in analog-mode APD and PMT detectors, the proportionality still exists between the mean and the variance of the multiplied photocurrent. We make use of this relationship by introducing the noise scale factor (NSF), which quantifies the constant of proportionality that exists between the root mean square of the random noise in a measurement and the square root of the mean signal. Using the NSF to estimate random errors in lidar measurements due to shot noise provides a significant advantage over the conventional error estimation techniques, in that with the NSF, uncertainties can be reliably calculated from or for a single data sample. Methods for evaluating the NSF are presented. Algorithms to compute the NSF are developed for the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations lidar and tested using data from the Lidar In-space Technology Experiment.
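The relationship stated above (RMS noise = NSF × square root of the mean signal) can be illustrated with a short sketch. It is not the CALIPSO algorithm: the function names and the sample values are hypothetical, and it assumes repeated measurements of a constant signal are available for estimating the NSF.

```python
import math
import statistics


def noise_scale_factor(samples):
    """Estimate the NSF as (RMS of the random noise) / sqrt(mean signal).

    Assumes `samples` are repeated measurements of a constant signal,
    so the fluctuation about the mean is pure random noise.
    """
    mean_signal = statistics.fmean(samples)
    if mean_signal <= 0:
        raise ValueError("mean signal must be positive")
    rms_noise = statistics.pstdev(samples)  # RMS deviation about the mean
    return rms_noise / math.sqrt(mean_signal)


def shot_noise_uncertainty(signal, nsf):
    """Random error of a single sample, given a previously estimated NSF."""
    return nsf * math.sqrt(signal)


calibration = [14.0, 18.0, 14.0, 18.0]  # hypothetical readings, mean 16
nsf = noise_scale_factor(calibration)
print(nsf, shot_noise_uncertainty(100.0, nsf))
```

For ideal Poisson photon counts the variance equals the mean, so this estimate would approach 1; analog-mode detectors would generally give a different constant.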
Estimating Random Errors Due to Shot Noise in Backscatter Lidar Observations
NASA Technical Reports Server (NTRS)
Liu, Zhaoyan; Hunt, William; Vaughan, Mark A.; Hostetler, Chris A.; McGill, Matthew J.; Powell, Kathy; Winker, David M.; Hu, Yongxiang
2006-01-01
In this paper, we discuss the estimation of random errors due to shot noise in backscatter lidar observations that use either photomultiplier tube (PMT) or avalanche photodiode (APD) detectors. The statistical characteristics of photodetection are reviewed, and photon count distributions of solar background signals and laser backscatter signals are examined using airborne lidar observations at 532 nm using a photon-counting mode APD. Both distributions appear to be Poisson, indicating that the arrival at the photodetector of photons for these signals is a Poisson stochastic process. For Poisson-distributed signals, a proportional, one-to-one relationship is known to exist between the mean of a distribution and its variance. Although the multiplied photocurrent no longer follows a strict Poisson distribution in analog-mode APD and PMT detectors, the proportionality still exists between the mean and the variance of the multiplied photocurrent. We make use of this relationship by introducing the noise scale factor (NSF), which quantifies the constant of proportionality that exists between the root-mean-square of the random noise in a measurement and the square root of the mean signal. Using the NSF to estimate random errors in lidar measurements due to shot noise provides a significant advantage over the conventional error estimation techniques, in that with the NSF uncertainties can be reliably calculated from/for a single data sample. Methods for evaluating the NSF are presented. Algorithms to compute the NSF are developed for the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) lidar and tested using data from the Lidar In-space Technology Experiment (LITE).
Retrieval of the aerosol optical thickness from UV global irradiance measurements
NASA Astrophysics Data System (ADS)
Costa, M. J.; Salgueiro, V.; Bortoli, D.; Obregón, M. A.; Antón, M.; Silva, A. M.
2015-12-01
UV irradiance has been measured at Évora for several years, where a CIMEL sunphotometer integrated in AERONET is also installed. In the present work, measurements of UVA (315-400 nm) irradiance taken with Kipp & Zonen radiometers, together with satellite data on total ozone column, are combined with radiative transfer calculations to estimate the aerosol optical thickness (AOT) in the UV. The retrieved UV AOT in Évora is compared with AERONET AOT (at 340 and 380 nm), and fairly good agreement is found, with a root mean square error of 0.05 (normalized root mean square error of 8.3%) and a mean absolute error of 0.04 (mean percentage error of 2.9%). The methodology is then used to estimate the UV AOT in Sines, an industrialized site on the Atlantic western coast, where UV irradiance has been monitored since 2013 but no aerosol information is available.
Estimation of the simple correlation coefficient.
Shieh, Gwowen
2010-11-01
This article investigates some unfamiliar properties of the Pearson product-moment correlation coefficient for the estimation of the simple correlation coefficient. Although Pearson's r is biased, except for limited situations, and the minimum variance unbiased estimator has been proposed in the literature, researchers routinely employ the sample correlation coefficient in their practical applications, because of its simplicity and popularity. In order to support such practice, this study examines the mean squared errors of r and several prominent formulas. The results reveal specific situations in which the sample correlation coefficient performs better than the unbiased and nearly unbiased estimators, facilitating recommendation of r as an effect size index for the strength of linear association between two variables. In addition, related issues of estimating the squared simple correlation coefficient are also considered.
Discordance between net analyte signal theory and practical multivariate calibration.
Brown, Christopher D
2004-08-01
Lorber's concept of net analyte signal is reviewed in the context of classical and inverse least-squares approaches to multivariate calibration. It is shown that, in the presence of device measurement error, the classical and inverse calibration procedures have radically different theoretical prediction objectives, and the assertion that the popular inverse least-squares procedures (including partial least squares, principal components regression) approximate Lorber's net analyte signal vector in the limit is disproved. Exact theoretical expressions for the prediction error bias, variance, and mean-squared error are given under general measurement error conditions, which reinforce the very discrepant behavior between these two predictive approaches, and Lorber's net analyte signal theory. Implications for multivariate figures of merit and numerous recently proposed preprocessing treatments involving orthogonal projections are also discussed.
NASA Technical Reports Server (NTRS)
Long, S. A. T.
1974-01-01
Formulas are derived for the root-mean-square (rms) displacement, slope, and curvature errors in an azimuth-elevation image trace of an elongated object in space, as functions of the number and spacing of the input data points and the rms elevation error in the individual input data points from a single observation station. Also, formulas are derived for the total rms displacement, slope, and curvature error vectors in the triangulation solution of an elongated object in space due to the rms displacement, slope, and curvature errors, respectively, in the azimuth-elevation image traces from different observation stations. The total rms displacement, slope, and curvature error vectors provide useful measure numbers for determining the relative merits of two or more different triangulation procedures applicable to elongated objects in space.
Crop/weed discrimination using near-infrared reflectance spectroscopy (NIRS)
NASA Astrophysics Data System (ADS)
Zhang, Yun; He, Yong
2006-09-01
Traditional uniform herbicide application often leaves excess chemical residues on soil, crop plants and agricultural produce, which imperils the environment and food security. Near-infrared reflectance spectroscopy (NIRS) offers a promising means for weed detection and site-specific herbicide application. In the laboratory, a total of 90 samples (30 for each species) of detached leaves of two weeds, i.e., threeseeded mercury (Acalypha australis L.) and fourleafed duckweed (Marsilea quadrifolia L.), and one crop, soybean (Glycine max), were investigated by NIRS over 325-1075 nm using a field spectroradiometer. Twenty absorbance samples of each species were exported after pretreatment, and the missing Y variables were assigned independent values for partial least squares (PLS) analysis. In the combined principal component analysis (PCA) over 400-1000 nm, PC1 and PC2 together explained over 91% of the total variance and distinguished the three plant species with 98.3% accuracy. The full cross-validation results of PLS, i.e., standard error of prediction (SEP) 0.247, correlation coefficient (r) 0.954 and root mean square error of prediction (RMSEP) 0.245, indicated an optimum model for weed identification. When the remaining 10 samples of each species were predicted with the PLS model, the results with deviation presented a 100% crop/weed detection rate. Thus, it could be concluded that PLS is a viable alternative for qualitative weed discrimination using NIRS.
A comparison of abundance estimates from extended batch-marking and Jolly–Seber-type experiments
Cowen, Laura L E; Besbeas, Panagiotis; Morgan, Byron J T; Schwarz, Carl J
2014-01-01
Little attention has been paid to the use of multi-sample batch-marking studies, as it is generally assumed that an individual's capture history is necessary for fully efficient estimates. However, recently, Huggins et al. (2010) present a pseudo-likelihood for a multi-sample batch-marking study where they used estimating equations to solve for survival and capture probabilities and then derived abundance estimates using a Horvitz–Thompson-type estimator. We have developed and maximized the likelihood for batch-marking studies. We use data simulated from a Jolly–Seber-type study and convert this to what would have been obtained from an extended batch-marking study. We compare our abundance estimates obtained from the Crosbie–Manly–Arnason–Schwarz (CMAS) model with those of the extended batch-marking model to determine the efficiency of collecting and analyzing batch-marking data. We found that estimates of abundance were similar for all three estimators: CMAS, Huggins, and our likelihood. Gains are made when using unique identifiers and employing the CMAS model in terms of precision; however, the likelihood typically had lower mean square error than the pseudo-likelihood method of Huggins et al. (2010). When faced with designing a batch-marking study, researchers can be confident in obtaining unbiased abundance estimators. Furthermore, they can design studies in order to reduce mean square error by manipulating capture probabilities and sample size. PMID:24558576
Peña, Juan A; Corral, Victoria; Martínez, Miguel A; Peña, Estefanía
2018-01-01
In this paper, we hypothesize that the biaxial mechanical properties of the aorta may be dependent on arterial location. To demonstrate any possible position-related difference, our study analyzed and compared the biaxial mechanical properties of the ascending thoracic aorta, descending thoracic aorta and infrarenal abdominal aorta stemming from the same porcine subjects, and reported values of constitutive parameters for well-known strain energy functions, showing how these mechanical properties are affected by location along the aorta. When comparing ascending thoracic aorta, descending thoracic aorta and infrarenal abdominal aorta, abdominal tissues were found to be stiffer and highly anisotropic. We found that the aorta changed from a more isotropic to a more anisotropic tissue and became progressively less compliant and stiffer with the distance to the heart. We observed substantial differences in the anisotropy parameter between aortic samples where abdominal samples were more anisotropic and nonlinear than the thoracic samples. The phenomenological model was not able to capture the passive biaxial properties of each specific porcine aorta over a wide range of biaxial deformations, showing the best prediction root mean square error ε=0.2621 for ascending thoracic samples and, especially, the worst for the infrarenal abdominal samples ε=0.3780. The micro-structured model with Bingham orientation density function was able to better predict biaxial deformations (ε=0.1372 for ascending thoracic aorta samples). The root mean square error of the micro-structural model and the micro-structured model with von Mises orientation density function were similar for all positions. Copyright © 2017 Elsevier Ltd. All rights reserved.
An affordable cuff-less blood pressure estimation solution.
Jain, Monika; Kumar, Niranjan; Deb, Sujay
2016-08-01
This paper presents a cuff-less hypertension pre-screening device that non-invasively and continuously monitors Blood Pressure (BP) and Heart Rate (HR). The proposed device simultaneously records two clinically significant and highly correlated biomedical signals, viz., Electrocardiogram (ECG) and Photoplethysmogram (PPG). The device provides a common data acquisition platform that can interface with a PC/laptop, smartphone/tablet, Raspberry Pi, etc. The hardware stores and processes the recorded ECG and PPG in order to extract the real-time BP and HR using a kernel regression approach. The BP and HR estimation error is measured in terms of normalized mean square error, Error Standard Deviation (ESD) and Mean Absolute Error (MAE), with respect to a clinically proven digital BP monitor (OMRON HBP1300). The computed error falls under the maximum standard allowable error specified by the Association for the Advancement of Medical Instrumentation: MAE < 5 mmHg and ESD < 8 mmHg. The results are also validated using a two-tailed dependent-sample t-test. The proposed device is a portable, low-cost, home- and clinic-based solution for continuous health monitoring.
Simple Forest Canopy Thermal Exitance Model
NASA Technical Reports Server (NTRS)
Smith, J. A.; Goltz, S. M.
1999-01-01
We describe a model to calculate brightness temperature and surface energy balance for a forest canopy system. The model is an extension of an earlier vegetation-only model by inclusion of a simple soil layer. The root mean square error in brightness temperature for a dense forest canopy was 2.5 °C. Surface energy balance predictions were also in good agreement. The corresponding root mean square errors for net radiation, latent, and sensible heat were 38.9, 30.7, and 41.4 W/m², respectively.
Jeyasingh, Suganthi; Veluchamy, Malathi
2017-05-01
Early diagnosis of breast cancer is essential to save the lives of patients. Usually, medical datasets include a large variety of data that can lead to confusion during diagnosis. The Knowledge Discovery on Database (KDD) process helps to improve efficiency. It requires elimination of inappropriate and repeated data from the dataset before final diagnosis. This can be done using any of the feature selection algorithms available in data mining. Feature selection is considered a vital step to increase the classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection to eliminate irrelevant features from an original dataset. The Bat algorithm was modified using simple random sampling to select random instances from the dataset. Features were ranked against the global best features to identify the predominant features available in the dataset. The selected features are used to train a Random Forest (RF) classification algorithm. The MBA feature selection algorithm enhanced the classification accuracy of RF in identifying the occurrence of breast cancer. The Wisconsin Diagnosis Breast Cancer Dataset (WDBC) was used to evaluate the performance of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of Kappa statistic, Matthews Correlation Coefficient, Precision, F-measure, Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE).
Heuristic-driven graph wavelet modeling of complex terrain
NASA Astrophysics Data System (ADS)
Cioacǎ, Teodor; Dumitrescu, Bogdan; Stupariu, Mihai-Sorin; Pǎtru-Stupariu, Ileana; Nǎpǎrus, Magdalena; Stoicescu, Ioana; Peringer, Alexander; Buttler, Alexandre; Golay, François
2015-03-01
We present a novel method for building a multi-resolution representation of large digital surface models. The surface points coincide with the nodes of a planar graph which can be processed using a critically sampled, invertible lifting scheme. To drive the lazy wavelet node partitioning, we employ an attribute aware cost function based on the generalized quadric error metric. The resulting algorithm can be applied to multivariate data by storing additional attributes at the graph's nodes. We discuss how the cost computation mechanism can be coupled with the lifting scheme and examine the results by evaluating the root mean square error. The algorithm is experimentally tested using two multivariate LiDAR sets representing terrain surface and vegetation structure with different sampling densities.
Applications and Comparisons of Four Time Series Models in Epidemiological Surveillance Data
Young, Alistair A.; Li, Xiaosong
2014-01-01
Public health surveillance systems provide valuable data for reliable prediction of future epidemic events. This paper describes a study that used nine types of infectious disease data collected through a national public health surveillance system in mainland China to evaluate and compare the performances of four time series methods, namely, two decomposition methods (regression and exponential smoothing), autoregressive integrated moving average (ARIMA) and support vector machine (SVM). The data obtained from 2005 to 2011 and in 2012 were used as modeling and forecasting samples, respectively. The performances were evaluated based on three metrics: mean absolute error (MAE), mean absolute percentage error (MAPE), and mean square error (MSE). The accuracy of the statistical models in forecasting future epidemic disease proved their effectiveness in epidemiological surveillance. Although the comparisons found that no single method is completely superior to the others, the present study highlighted that SVM outperforms the ARIMA model and decomposition methods in most cases. PMID:24505382
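The three evaluation metrics named above have standard definitions, sketched here for reference. The function name `forecast_errors` and the case counts are hypothetical and are not data from the study.

```python
def forecast_errors(actual, predicted):
    """Return (MAE, MAPE in percent, MSE) for paired observations.

    MAPE divides by each observed value, so it is undefined when any
    observation is zero (a ZeroDivisionError is raised in that case).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mse = sum(e * e for e in errors) / n
    return mae, mape, mse


# Hypothetical monthly case counts and forecasts
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(forecast_errors(observed, forecast))
```

MAE and MSE depend on the scale of the series while MAPE does not, which is one reason a study comparing several diseases with very different incidence might report all three.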
Optimal dental age estimation practice in United Arab Emirates' children.
Altalie, Salem; Thevissen, Patrick; Fieuws, Steffen; Willems, Guy
2014-03-01
The aim of the study was to determine whether the Willems model, developed on a Belgian reference sample, can be used for age estimation in United Arab Emirates (UAE) children. Furthermore, it was verified whether adding third molar development information in children provided more accurate age predictions. On 1900 panoramic radiographs, the development of left mandibular permanent teeth (PT) and third molars (TM) was registered according to the Demirjian and the Kohler technique, respectively. The PT data were used to verify the Willems model and to develop and verify a UAE model. Multiple regression models with PT, TM, and PT + TM scores as independent factors and age as the dependent factor were developed. Comparing the verified Willems and UAE models revealed differences in mean error of -0.01 year, mean absolute error of 0.01 year and root mean squared error of 0.90 year. A negligible overall decrease in RMSE was detected when combining PT and TM developmental information. © 2013 American Academy of Forensic Sciences.
de Godoy, Luiz Antonio Fonseca; Hantao, Leandro Wang; Pedroso, Marcio Pozzobon; Poppi, Ronei Jesus; Augusto, Fabio
2011-08-05
The use of multivariate curve resolution (MCR) to build multivariate quantitative models using data obtained from comprehensive two-dimensional gas chromatography with flame ionization detection (GC×GC-FID) is presented and evaluated. The MCR algorithm presents some important features, such as second order advantage and the recovery of the instrumental response for each pure component after optimization by an alternating least squares (ALS) procedure. A model to quantify the essential oil of rosemary was built using a calibration set containing only known concentrations of the essential oil and cereal alcohol as solvent. A calibration curve correlating the concentration of the essential oil of rosemary and the instrumental response obtained from the MCR-ALS algorithm was obtained, and this calibration model was applied to predict the concentration of the oil in complex samples (mixtures of the essential oil, pineapple essence and commercial perfume). The values of the root mean square error of prediction (RMSEP) and of the root mean square error of the percentage deviation (RMSPD) obtained were 0.4% (v/v) and 7.2%, respectively. Additionally, a second model was built and used to evaluate the accuracy of the method. A model to quantify the essential oil of lemon grass was built and its concentration was predicted in the validation set and real perfume samples. The RMSEP and RMSPD obtained were 0.5% (v/v) and 6.9%, respectively, and the concentration of the essential oil of lemon grass in perfume agreed to the value informed by the manufacturer. The result indicates that the MCR algorithm is adequate to resolve the target chromatogram from the complex sample and to build multivariate models of GC×GC-FID data. Copyright © 2011 Elsevier B.V. All rights reserved.
Geostatistical modeling of riparian forest microclimate and its implications for sampling
Eskelson, B.N.I.; Anderson, P.D.; Hagar, J.C.; Temesgen, H.
2011-01-01
Predictive models of microclimate under various site conditions in forested headwater stream - riparian areas are poorly developed, and sampling designs for characterizing underlying riparian microclimate gradients are sparse. We used riparian microclimate data collected at eight headwater streams in the Oregon Coast Range to compare ordinary kriging (OK), universal kriging (UK), and kriging with external drift (KED) for point prediction of mean maximum air temperature (Tair). Several topographic and forest structure characteristics were considered as site-specific parameters. Height above stream and distance to stream were the most important covariates in the KED models, which outperformed OK and UK in terms of root mean square error. Sample patterns were optimized based on the kriging variance and the weighted means of shortest distance criterion using the simulated annealing algorithm. The optimized sample patterns outperformed systematic sample patterns in terms of mean kriging variance mainly for small sample sizes. These findings suggest methods for increasing efficiency of microclimate monitoring in riparian areas.
Super-linear Precision in Simple Neural Population Codes
NASA Astrophysics Data System (ADS)
Schwab, David; Fiete, Ila
2015-03-01
A widely used tool for quantifying the precision with which a population of noisy sensory neurons encodes the value of an external stimulus is the Fisher Information (FI). Maximizing the FI is also a commonly used objective for constructing optimal neural codes. The primary utility and importance of the FI arises because it gives, through the Cramer-Rao bound, the smallest mean-squared error achievable by any unbiased stimulus estimator. However, it is well-known that when neural firing is sparse, optimizing the FI can result in codes that perform very poorly when considering the resulting mean-squared error, a measure with direct biological relevance. Here we construct optimal population codes by minimizing mean-squared error directly and study the scaling properties of the resulting network, focusing on the optimal tuning curve width. We then extend our results to continuous attractor networks that maintain short-term memory of external stimuli in their dynamics. Here we find similar scaling properties in the structure of the interactions that minimize diffusive information loss.
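For reference, the bound mentioned above can be written out explicitly. The notation is an assumption made here and is not taken from the abstract: $s$ is the stimulus, $\mathbf{r}$ the population response, and $\hat{s}$ an unbiased estimator. The second expression is the standard special case of $N$ independent Poisson neurons with tuning curves $f_i(s)$ observed over a counting window $T$.

```latex
% Cramér-Rao bound for an unbiased estimator \hat{s}(\mathbf{r}) of the stimulus s
\mathbb{E}\!\left[\bigl(\hat{s}(\mathbf{r}) - s\bigr)^{2}\right] \;\ge\; \frac{1}{I_F(s)},
\qquad
I_F(s) = \mathbb{E}\!\left[\left(\frac{\partial}{\partial s}\,\ln p(\mathbf{r}\mid s)\right)^{2}\right]

% Special case (assumed model): independent Poisson spike counts with mean T f_i(s)
I_F(s) = T \sum_{i=1}^{N} \frac{f_i'(s)^{2}}{f_i(s)}
```

The bound constrains only unbiased estimators and is generally attained only asymptotically, which is consistent with the abstract's point that maximizing $I_F$ need not minimize the mean-squared error actually achieved when firing is sparse.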
NASA Astrophysics Data System (ADS)
Degaudenzi, Riccardo; Vanghi, Vieri
1994-02-01
An all-digital Trellis-Coded 8PSK (TC-8PSK) demodulator well suited for VLSI implementation, including maximum likelihood estimation decision-directed (MLE-DD) carrier phase and clock timing recovery, is introduced and analyzed. By simply removing the trellis decoder the demodulator can efficiently cope with uncoded 8PSK signals. The proposed MLE-DD synchronization algorithm requires one sample for the phase and two samples per symbol for the timing loop. The joint phase and timing discriminator characteristics are analytically derived and numerical results checked by means of computer simulations. An approximated expression for steady-state carrier phase and clock timing mean square error has been derived and successfully checked with simulation findings. Synchronizer deviation from the Cramer Rao bound is also discussed. Mean acquisition time for the digital synchronizer has also been computed and checked, using the Monte Carlo simulation technique. Finally, TC-8PSK digital demodulator performance in terms of bit error rate and mean time to lose lock, including digital interpolators and synchronization loops, is presented.
McHugh Power, Joanna; Carney, Sile; Hannigan, Caoimhe; Brennan, Sabina; Wolfe, Hannah; Lynch, Marina; Kee, Frank; Lawlor, Brian
2016-11-01
Potential associations between systemic inflammation and social support received by a sample of 120 older adults were examined here. Inflammatory markers, cognitive function, social support and psychosocial wellbeing were evaluated. A structural equation modelling approach was used to analyse the data. The model was a good fit ([Formula: see text], p < 0.001; comparative fit index = 0.973; Tucker-Lewis Index = 0.962; root mean square error of approximation = 0.021; standardised root mean-square residual = 0.074). Chemokine levels were associated with increased age (β = 0.276), receipt of less social support from friends (β = -0.256) and body mass index (β = -0.256). Results are discussed in relation to social signal transduction theory.
An Empirical State Error Covariance Matrix for the Weighted Least Squares Estimation Method
NASA Technical Reports Server (NTRS)
Frisbee, Joseph H., Jr.
2011-01-01
State estimation techniques effectively provide mean state estimates. However, the theoretical state error covariance matrices provided as part of these techniques often suffer from a lack of confidence in their ability to describe the uncertainty in the estimated states. By a reinterpretation of the equations involved in the weighted least squares algorithm, it is possible to directly arrive at an empirical state error covariance matrix. This proposed empirical state error covariance matrix will contain the effect of all error sources, known or not. Results based on the proposed technique will be presented for a simple, two observer, measurement error only problem.
Robust Mean and Covariance Structure Analysis through Iteratively Reweighted Least Squares.
ERIC Educational Resources Information Center
Yuan, Ke-Hai; Bentler, Peter M.
2000-01-01
Adapts robust schemes to mean and covariance structures, providing an iteratively reweighted least squares approach to robust structural equation modeling. Each case is weighted according to its distance, based on first and second order moments. Test statistics and standard error estimators are given. (SLD)
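The general idea of iteratively reweighted least squares, in which each case is down-weighted according to its distance from the current fit, can be shown on the simplest possible case. The sketch below is a univariate robust mean with Huber-type weights; it is not the authors' procedure for mean and covariance structures. The function name, the tuning constant `c = 1.345`, and the fixed MAD-based scale are assumptions made for this example.

```python
import statistics


def irls_robust_mean(data, c=1.345, tol=1e-10, max_iter=100):
    """Robust location estimate via iteratively reweighted least squares.

    Each case gets weight 1 if its scaled distance from the current
    estimate is at most c, and c / distance otherwise (Huber weights).
    The scale is held fixed at the normalized median absolute deviation.
    """
    mu = statistics.median(data)
    mad = statistics.median([abs(x - mu) for x in data])
    scale = 1.4826 * mad
    if scale == 0:
        return mu  # more than half the cases coincide; nothing to reweight
    for _ in range(max_iter):
        weights = []
        for x in data:
            d = abs(x - mu) / scale
            weights.append(1.0 if d <= c else c / d)
        # Weighted least-squares update of the location estimate
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier
print(statistics.fmean(sample), irls_robust_mean(sample))
```

The outlier pulls the arithmetic mean far from the bulk of the data, while the reweighted estimate stays close to the median of the uncontaminated cases.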
Guelpa, Anina; Bevilacqua, Marta; Marini, Federico; O'Kennedy, Kim; Geladi, Paul; Manley, Marena
2015-04-15
It has been established in this study that the Rapid Visco Analyser (RVA) can describe maize hardness, irrespective of the RVA profile, when used in association with appropriate multivariate data analysis techniques. Therefore, the RVA can complement or replace current and/or conventional methods as a hardness descriptor. Hardness modelling based on RVA viscograms was carried out using seven conventional hardness methods (hectoliter mass (HLM), hundred kernel mass (HKM), particle size index (PSI), percentage vitreous endosperm (%VE), protein content, percentage chop (%chop) and near infrared (NIR) spectroscopy) as references and three different RVA profiles (hard, soft and standard) as predictors. An approach using locally weighted partial least squares (LW-PLS) was followed to build the regression models. The resulting prediction errors (root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP)) for the quantification of hardness values were always lower than or of the same order as the laboratory error of the reference method. Copyright © 2014 Elsevier Ltd. All rights reserved.
Li, Wei; Zhang, Xuan; Zheng, Kaiyi; Du, Yiping; Cap, Peng; Sui, Tao; Geng, Jinpei
2015-01-01
A fluidized bed enrichment technique was developed to improve the sensitivity of near infrared (NIR) spectroscopy, offering rapid processing of large solution volumes. D301 resin was used as an adsorption material to preconcentrate β-naphthalenesulfonic acid in solutions in a concentration range of 2.0-100.0 μg/mL, and NIR spectra were measured directly on the β-naphthalenesulfonic acid adsorbed on the material. An improved partial least squares (PLS) model was attained with the aid of multiplicative scatter correction pretreatment and the stability competitive adaptive reweighted sampling wavenumber selection method. The root mean square error of cross validation was 1.87 μg/mL at a PLS factor of 7. An independent test set was used to assess the model, with the relative error (RE) in an acceptable range of 0.46 to 10.03% and a mean RE of 3.72%. This study confirmed the viability of the proposed method for the measurement of a low content of β-naphthalenesulfonic acid in water.
[Application of genetic algorithm in blending technology for extractions of Cortex Fraxini].
Yang, Ming; Zhou, Yinmin; Chen, Jialei; Yu, Minying; Shi, Xiufeng; Gu, Xijun
2009-10-01
To explore the feasibility of a genetic algorithm (GA) for multi-objective blending of extracts of Cortex Fraxini. With the optimization objective defined as the combination of fingerprint similarity and the root-mean-square error of multiple key constituents, a new multi-objective optimization model for 10 batches of Cortex Fraxini extracts was built. The blending coefficients were obtained by the genetic algorithm. The quality of the 10 batches of extracts after blending was evaluated using fingerprint similarity and root-mean-square error as indexes. The quality of the 10 batches after blending was clearly improved: compared with the fingerprint of the control sample, the similarity increased while the degree of variation decreased. The relative deviation of the key constituents was less than 10%. The genetic algorithm works well for multi-objective blending of Cortex Fraxini extracts. This method can serve as a reference for controlling the quality of Cortex Fraxini extracts, and genetic algorithms are advisable in blending technology for extracts of Chinese medicines.
Stevens, Antoine; Nocita, Marco; Tóth, Gergely; Montanarella, Luca; van Wesemael, Bas
2013-01-01
Soil organic carbon is a key soil property related to soil fertility, aggregate stability and the exchange of CO2 with the atmosphere. Existing soil maps and inventories can rarely be used to monitor the state and evolution in soil organic carbon content due to their poor spatial resolution, lack of consistency and high updating costs. Visible and Near Infrared diffuse reflectance spectroscopy is an alternative method to provide cheap and high-density soil data. However, there are still some uncertainties on its capacity to produce reliable predictions for areas characterized by large soil diversity. Using a large-scale EU soil survey of about 20,000 samples and covering 23 countries, we assessed the performance of reflectance spectroscopy for the prediction of soil organic carbon content. The best calibrations achieved a root mean square error ranging from 4 to 15 g C kg−1 for mineral soils and a root mean square error of 50 g C kg−1 for organic soil materials. Model errors are shown to be related to the levels of soil organic carbon and variations in other soil properties such as sand and clay content. Although errors are ∼5 times larger than the reproducibility error of the laboratory method, reflectance spectroscopy provides unbiased predictions of the soil organic carbon content. Such estimates could be used for assessing the mean soil organic carbon content of large geographical entities or countries. This study is a first step towards providing uniform continental-scale spectroscopic estimations of soil organic carbon, meeting an increasing demand for information on the state of the soil that can be used in biogeochemical models and the monitoring of soil degradation. PMID:23840459
[Near infrared spectroscopy study on water content in turbine oil].
Chen, Bin; Liu, Ge; Zhang, Xian-Ming
2013-11-01
Near infrared (NIR) spectroscopy combined with the successive projections algorithm (SPA) was investigated for the determination of water content in turbine oil. NIR spectra were acquired for 57 turbine oil samples with water contents of 0-0.156%. Different pretreatments, namely the original spectra, first-derivative spectra and Savitzky-Golay (SG) polynomial least-squares fitting, were compared, and SPA was applied to extract the effective wavelengths. The correlation coefficient (R) and root mean square error (RMSE) were used as the model evaluation indices. The original spectra were pretreated with first derivative + SG, and the selected effective wavelengths were then used as the inputs of a least squares support vector machine (LS-SVM). A total of 16 variables selected by SPA were employed to construct the SPA-LS-SVM model. With this model, the correlation coefficient was 0.9759 and the root mean square error of the validation set was 2.6558 × 10^-3. It is therefore feasible to determine the water content in oil using near infrared spectroscopy and SPA-LS-SVM, and excellent prediction precision was obtained. This study supplies a new and alternative approach to the further application of near infrared spectroscopy in on-line monitoring of contamination such as water content in oil.
Robust Timing Synchronization in Aeronautical Mobile Communication Systems
NASA Technical Reports Server (NTRS)
Xiong, Fu-Qin; Pinchak, Stanley
2004-01-01
This work details a study of robust synchronization schemes suitable for satellite to mobile aeronautical applications. A new scheme, the Modified Sliding Window Synchronizer (MSWS), is devised and compared with existing schemes, including the traditional Early-Late Gate Synchronizer (ELGS), the Gardner Zero-Crossing Detector (GZCD), and the Sliding Window Synchronizer (SWS). Performance of the synchronization schemes is evaluated by a set of metrics that indicate performance in digital communications systems. The metrics are convergence time, mean square phase error (or root mean-square phase error), lowest SNR for locking, initial frequency offset performance, midstream frequency offset performance, and system complexity. The performance of the synchronizers is evaluated by means of Matlab simulation models. A simulation platform is devised to model the satellite to mobile aeronautical channel, consisting of a Quadrature Phase Shift Keying modulator, an additive white Gaussian noise channel, and a demodulator front end. Simulation results show that the MSWS provides the most robust performance at the cost of system complexity. The GZCD provides a good tradeoff between robustness and system complexity for communication systems that require high symbol rates or low overall system costs. The ELGS has a high system complexity despite its average performance. Overall, the SWS, originally designed for multi-carrier systems, performs very poorly in single-carrier communications systems. Table 5.1 in Section 5 provides a ranking of each of the synchronization schemes in terms of the metrics set forth in Section 4.1. Details of comparison are given in Section 5. Based on the results presented in Table 5, it is safe to say that the most robust synchronization scheme examined in this work is the high-sample-rate Modified Sliding Window Synchronizer. A close second is its low-sample-rate cousin. 
The tradeoff between complexity and lowest mean-square phase error determines the rankings of the Gardner Zero-Crossing Detector and both versions of the Early-Late Gate Synchronizer. The least robust models are the high and low-sample-rate Sliding Window Synchronizers. Consequently, the recommended replacement synchronizer for NASA's Advanced Air Transportation Technologies mobile aeronautical communications system is the high-sample-rate Modified Sliding Window Synchronizer. By incorporating this synchronizer into their system, NASA can be assured that their system will be operational in extremely adverse conditions. The quick convergence time of the MSWS should allow the use of high-level protocols. However, if NASA feels that reduced system complexity is the most important aspect of their replacement synchronizer, the Gardner Zero-Crossing Detector would be the best choice.
Li, Xiaomeng; Fang, Dansi; Cong, Xiaodong; Cao, Gang; Cai, Hao; Cai, Baochang
2012-12-01
A method is described that uses rapid and sensitive Fourier transform near-infrared spectroscopy combined with high-performance liquid chromatography-diode array detection for the simultaneous identification and determination of four bioactive compounds in crude Radix Scrophulariae samples. Partial least squares regression was selected as the analysis type, and multiplicative scatter correction, second derivative, and Savitzky-Golay filtering were adopted for the spectral pretreatment. The correlation coefficients (R) of the calibration models were above 0.96 and the root mean square errors of prediction were under 0.028. The developed models were applied to unknown samples with satisfactory results. The established method was validated and can be applied to the intrinsic quality control of crude Radix Scrophulariae.
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Xi, Xiuxiu
2015-07-23
The measurement of soil total nitrogen (TN) by hyperspectral remote sensing provides an important tool for soil restoration programs in areas with subsided land caused by the extraction of natural resources. This study used the local correlation maximization-complementary superiority (LCMCS) method to establish TN prediction models by considering the relationship between spectral reflectance (measured by an ASD FieldSpec 3 spectroradiometer) and TN, based on spectral reflectance curves of soil samples collected from subsided land identified by synthetic aperture radar interferometry (InSAR) technology. Based on the 1655 selected effective bands of the optimal spectrum (OSP) of the first derivative of the reciprocal logarithm ([log{1/R}]') (correlation coefficients, p < 0.01), the optimal model of the LCMCS method was obtained to determine the final model, which produced lower prediction errors (root mean square error of validation [RMSEV] = 0.89, mean relative error of validation [MREV] = 5.93%) than models built by the local correlation maximization (LCM), complementary superiority (CS) and partial least squares regression (PLS) methods. The predictive effect of the LCMCS model was optimal in Cangzhou, Renqiu and Fengfeng District. Results indicate that the LCMCS method has great potential to monitor TN in subsided lands caused by the extraction of natural resources including groundwater, oil and coal.
An Empirical State Error Covariance Matrix for Batch State Estimation
NASA Technical Reports Server (NTRS)
Frisbee, Joseph H., Jr.
2011-01-01
State estimation techniques serve effectively to provide mean state estimates. However, the state error covariance matrices provided as part of these techniques suffer from some degree of lack of confidence in their ability to adequately describe the uncertainty in the estimated states. A specific problem with the traditional form of state error covariance matrices is that they represent only a mapping of the assumed observation error characteristics into the state space. Any errors that arise from other sources (environment modeling, precision, etc.) are not directly represented in a traditional, theoretical state error covariance matrix. Consider that an actual observation contains only measurement error and that an estimated observation contains all other errors, known and unknown. It then follows that a measurement residual (the difference between expected and observed measurements) contains all errors for that measurement. Therefore, a direct and appropriate inclusion of the actual measurement residuals in the state error covariance matrix will result in an empirical state error covariance matrix. This empirical state error covariance matrix will fully account for the error in the state estimate. By way of a literal reinterpretation of the equations involved in the weighted least squares estimation algorithm, it is possible to arrive at an appropriate, and formally correct, empirical state error covariance matrix. The first specific step of the method is to use the average form of the weighted measurement residual variance performance index rather than its usual total weighted residual form. Next it is helpful to interpret the solution to the normal equations as the average of a collection of sample vectors drawn from a hypothetical parent population. From here, using a standard statistical analysis approach, it directly follows as to how to determine the standard empirical state error covariance matrix. 
This matrix will contain the total uncertainty in the state estimate, regardless of the source of the uncertainty. Also, in its most straightforward form, the technique only requires supplemental calculations to be added to existing batch algorithms. The generation of this direct, empirical form of the state error covariance matrix is independent of the dimensionality of the observations. Mixed degrees of freedom for an observation set are allowed. As is the case with any simple, empirical sample variance problem, the presented approach offers an opportunity (at least in the case of weighted least squares) to investigate confidence interval estimates for the error covariance matrix elements. The diagonal or variance terms of the error covariance matrix have a particularly simple form to associate with either a multiple degree of freedom chi-square distribution (more approximate) or with a gamma distribution (less approximate). The off-diagonal or covariance terms of the matrix are less clear in their statistical behavior. However, the off-diagonal covariance matrix elements still lend themselves to standard confidence interval error analysis. The distributional forms associated with the off-diagonal terms are more varied and, perhaps, more approximate than those associated with the diagonal terms. Using a simple weighted least squares sample problem, results obtained through use of the proposed technique are presented. The example consists of a simple, two-observer triangulation problem with range-only measurements. Variations of this problem reflect an ideal case (perfect knowledge of the range errors) and a mismodeled case (incorrect knowledge of the range errors).
Arroz, Erin; Jordan, Michael; Dumancas, Gerard G
2017-07-01
An ultraviolet-visible (UV-Vis) spectrophotometric and partial least squares (PLS) chemometric method was developed for the simultaneous determination of erythrosine B (red), Brilliant Blue, and tartrazine (yellow) dyes. A training set (n = 64) was generated using a full factorial design and its accuracy was tested on a test set (n = 13) generated using a Box-Behnken design. The test set gave a root mean square error (RMSE) of 1.79 × 10^-7 for blue, 4.59 × 10^-7 for red, and 1.13 × 10^-6 for yellow dyes. The relatively small RMSE suggests only a small difference between predicted and measured concentrations, demonstrating the accuracy of our model. The relative errors of prediction (REP) for the test set were 11.73%, 19.52%, and 19.38% for blue, red, and yellow dyes, respectively. A comparable overlay between the actual candy samples and their replicated synthetic spectra was also obtained, indicating that the model is a potentially accurate method for determining concentrations of dyes in food samples.
Gaspardo, B; Del Zotto, S; Torelli, E; Cividino, S R; Firrao, G; Della Riccia, G; Stefanon, B
2012-12-01
Fourier transform near infrared (FT-NIR) spectroscopy is an analytical procedure generally used to detect organic compounds in food. In this work the ability to predict fumonisin B(1)+B(2) contents in corn meal using an FT-NIR spectrophotometer, equipped with an integration sphere, was assessed. A total of 143 corn meal samples were collected in the Friuli Venezia Giulia Region (Italy) and used to define a regression model with 15 principal components, applying the partial least squares regression algorithm with full cross validation as internal validation. External validation was performed on 25 unknown samples. The coefficient of correlation, root mean square error and standard error of calibration were 0.964, 0.630 and 0.632, respectively, and the external validation confirmed a fair potential of the model in predicting FB(1)+FB(2) concentration. Results suggest that FT-NIR analysis is a suitable method to detect FB(1)+FB(2) in corn meal and to discriminate safe meals from those contaminated. Copyright © 2012 Elsevier Ltd. All rights reserved.
Water quality management using statistical analysis and time-series prediction model
NASA Astrophysics Data System (ADS)
Parmar, Kulwinder Singh; Bhardwaj, Rashmi
2014-12-01
This paper deals with water quality management using statistical analysis and a time-series prediction model. The monthly variation of water quality standards has been used to compare the statistical mean, median, mode, standard deviation, kurtosis, skewness and coefficient of variation at the Yamuna River. The model was validated using R-squared, root mean square error, mean absolute percentage error, maximum absolute percentage error, mean absolute error, maximum absolute error, normalized Bayesian information criterion, Ljung-Box analysis, predicted values and confidence limits. Using an autoregressive integrated moving average model, future values of the water quality parameters have been estimated. It is observed that the predictive model is useful at 95% confidence limits and that the curve is platykurtic for potential of hydrogen (pH), free ammonia, total Kjeldahl nitrogen, dissolved oxygen and water temperature (WT), and leptokurtic for chemical oxygen demand and biochemical oxygen demand. It is also observed that the predicted series is close to the original series, which indicates a close fit. All parameters except pH and WT cross the prescribed limits of the World Health Organization/United States Environmental Protection Agency, and thus the water is not fit for drinking, agriculture or industrial use.
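The error measures listed in this abstract can be sketched in a few lines. The block below is a minimal illustration, not the authors' procedure: the function name `fit_metrics`, the sample numbers, and the choice of R-squared as 1 - SS_res/SS_tot are assumptions made here, and the paper may define or compute these quantities differently.

```python
import math

def fit_metrics(observed, predicted):
    """Return common goodness-of-fit measures for two paired series.

    Hypothetical helper for illustration; R-squared is taken as
    1 - SS_res / SS_tot, which is one common convention.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    abs_errors = [abs(e) for e in errors]
    # Percentage errors are undefined where the observation is zero, so skip those.
    pct_errors = [100.0 * abs(e) / abs(o)
                  for o, e in zip(observed, errors) if o != 0]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    nan = float("nan")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs_errors) / n,
        "max_ae": max(abs_errors),
        "mape": sum(pct_errors) / len(pct_errors) if pct_errors else nan,
        "max_ape": max(pct_errors) if pct_errors else nan,
        "r_squared": 1.0 - ss_res / ss_tot if ss_tot > 0 else nan,
    }

# Made-up numbers, used only to exercise the function.
m = fit_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
print(m)
```

Returning all measures from one function keeps them computed on the same paired data, which matters when a model is ranked on several criteria at once.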
Validating Clusters with the Lower Bound for Sum-of-Squares Error
ERIC Educational Resources Information Center
Steinley, Douglas
2007-01-01
Given that a minor condition holds (e.g., the number of variables is greater than the number of clusters), a nontrivial lower bound for the sum-of-squares error criterion in K-means clustering is derived. By calculating the lower bound for several different situations, a method is developed to determine the adequacy of cluster solution based on…
Zhang, Chu; Liu, Fei; Kong, Wenwen; He, Yong
2015-01-01
Visible and near-infrared hyperspectral imaging covering the spectral range of 380–1030 nm was applied as a rapid and non-destructive method to estimate the soluble protein content of oilseed rape leaves. The average spectrum (500–900 nm) of the region of interest (ROI) of each sample was extracted, and four of the 128 samples were identified as outliers by Monte Carlo-partial least squares (MCPLS). A partial least squares (PLS) model using full spectra obtained dependable performance, with a correlation coefficient (rp) of 0.9441, root mean square error of prediction (RMSEP) of 0.1658 mg/g and residual prediction deviation (RPD) of 2.98. The weighted regression coefficient (Bw), successive projections algorithm (SPA) and genetic algorithm-partial least squares (GAPLS) selected 18, 15, and 16 sensitive wavelengths, respectively. The SPA-PLS model obtained the best performance, with rp of 0.9554, RMSEP of 0.1538 mg/g and RPD of 3.25. The distribution of protein content within the rape leaves was visualized and mapped on the basis of the SPA-PLS model. The overall results indicated that hyperspectral imaging could be used to determine and visualize the soluble protein content of rape leaves. PMID:26184198
Selective Weighted Least Squares Method for Fourier Transform Infrared Quantitative Analysis.
Wang, Xin; Li, Yan; Wei, Haoyun; Chen, Xia
2017-06-01
Classical least squares (CLS) regression is a popular multivariate statistical method used frequently for quantitative analysis using Fourier transform infrared (FT-IR) spectrometry. Classical least squares provides the best unbiased estimator for uncorrelated residual errors with zero mean and equal variance. However, the noise in FT-IR spectra, which accounts for a large portion of the residual errors, is heteroscedastic. Thus, if this noise with zero mean dominates in the residual errors, the weighted least squares (WLS) regression method described in this paper is a better estimator than CLS. However, if bias errors, such as the residual baseline error, are significant, WLS may perform worse than CLS. In this paper, we compare the effect of noise and bias error in using CLS and WLS in quantitative analysis. Results indicated that for wavenumbers with low absorbance, the bias error significantly affected the error, such that the performance of CLS is better than that of WLS. However, for wavenumbers with high absorbance, the noise significantly affected the error, and WLS proves to be better than CLS. Thus, we propose a selective weighted least squares (SWLS) regression that processes data with different wavenumbers using either CLS or WLS based on a selection criterion, i.e., lower or higher than an absorbance threshold. The effects of various factors on the optimal threshold value (OTV) for SWLS have been studied through numerical simulations. These studies reported that: (1) the concentration and the analyte type had minimal effect on OTV; and (2) the major factor that influences OTV is the ratio between the bias error and the standard deviation of the noise. The last part of this paper is dedicated to quantitative analysis of methane gas spectra, and methane/toluene mixtures gas spectra as measured using FT-IR spectrometry and CLS, WLS, and SWLS. 
The standard error of prediction (SEP), bias of prediction (bias), and the residual sum of squares of the errors (RSS) from the three quantitative analyses were compared. In methane gas analysis, SWLS yielded the lowest SEP and RSS among the three methods. In methane/toluene mixture gas analysis, a modification of the SWLS has been presented to tackle the bias error from other components. The SWLS without modification presents the lowest SEP in all cases but not bias and RSS. The modification of SWLS reduced the bias, which showed a lower RSS than CLS, especially for small components.
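The idea of switching between CLS-style and WLS-style treatment by absorbance can be sketched for a single analyte under Beer's law, a_i = k_i c + e_i. This is only one possible reading of the selective scheme, not the authors' algorithm: the function names, the pooled-variance weighting of the low-absorbance channels, and the sample numbers are all assumptions introduced here.

```python
def wls_concentration(absorbance, absorptivity, weights):
    """Weighted least-squares estimate of c in the model a_i = k_i * c + e_i."""
    num = sum(w * k * a for w, k, a in zip(weights, absorptivity, absorbance))
    den = sum(w * k * k for w, k in zip(weights, absorptivity))
    if den == 0:
        raise ValueError("absorptivities and weights give a singular fit")
    return num / den

def selective_weights(absorbance, noise_sd, threshold):
    """Assumed selection rule, for illustration only.

    Channels below the absorbance threshold share one pooled weight
    (equal-variance, CLS-like treatment); channels at or above it get
    the usual WLS weight 1 / sigma_i**2. noise_sd must be positive.
    """
    low_var = [s * s for a, s in zip(absorbance, noise_sd) if a < threshold]
    pooled = sum(low_var) / len(low_var) if low_var else None
    return [1.0 / pooled if a < threshold else 1.0 / (s * s)
            for a, s in zip(absorbance, noise_sd)]

# Made-up, noise-free spectrum with true concentration 3.0.
k = [0.1, 0.5, 1.0, 2.0]
a = [ki * 3.0 for ki in k]
sd = [0.01, 0.01, 0.02, 0.05]

c_cls = wls_concentration(a, k, [1.0] * len(k))           # unit weights = CLS
c_sel = wls_concentration(a, k, selective_weights(a, sd, threshold=2.0))
print(c_cls, c_sel)
```

On noise-free data every weighting recovers the same concentration; the choice of weights only matters once noise or baseline bias is present, which is the trade-off the abstract describes.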
A nonlinear model of gold production in Malaysia
NASA Astrophysics Data System (ADS)
Ramli, Norashikin; Muda, Nora; Umor, Mohd Rozi
2014-06-01
Malaysia is rich in natural resources, one of which is gold, and gold has become an important national commodity. This study was conducted to determine a model that fits gold production in Malaysia for the years 1995-2010. Five nonlinear models are presented: the Logistic, Gompertz, Richard, Weibull and Chapman-Richard models. These models are used to fit the cumulative gold production in Malaysia, and the best model is then selected based on model performance. The performance of each fitted model is measured by the sum of squared errors, root mean square error, coefficient of determination, mean relative error, mean absolute error and mean absolute percentage error. The study found that the Weibull model significantly outperforms the other models. To confirm that the Weibull model is the best, the latest data were fitted to the model; once again, it gave the lowest values for all types of error measure. We conclude that future gold production in Malaysia can be predicted with the Weibull model, which could be an important finding for Malaysia in planning its economic activities.
Rakkiyappan, R; Sakthivel, N; Cao, Jinde
2015-06-01
This study examines the exponential synchronization of complex dynamical networks with control packet loss and additive time-varying delays. Additionally, a sampled-data controller with a time-varying sampling period is considered, and the period is assumed to switch between m different values in a random way with given probability. A novel Lyapunov-Krasovskii functional (LKF) with triple integral terms is then constructed and, by using Jensen's inequality and the reciprocally convex approach, sufficient conditions under which the dynamical network is exponentially mean-square stable are derived. When applying Jensen's inequality to partition double integral terms in the derivation of linear matrix inequality (LMI) conditions, a new kind of linear combination of positive functions weighted by the inverses of squared convex parameters appears. In order to handle such a combination, an effective method is introduced by extending the lower bound lemma. To design the sampled-data controller, the synchronization error system is represented as a switched system. Based on the derived LMI conditions and the average dwell-time method, sufficient conditions for the synchronization of the switched error system are derived in terms of LMIs. Finally, a numerical example is employed to show the effectiveness of the proposed methods. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Bayesian approach to parameter and reliability estimation in the Poisson distribution.
NASA Technical Reports Server (NTRS)
Canavos, G. C.
1972-01-01
For life testing procedures, a Bayesian analysis is developed with respect to a random intensity parameter in the Poisson distribution. Bayes estimators are derived for the Poisson parameter and the reliability function based on uniform and gamma prior distributions of that parameter. A Monte Carlo procedure is implemented to make possible an empirical mean-squared error comparison between Bayes and existing minimum variance unbiased, as well as maximum likelihood, estimators. As expected, the Bayes estimators have mean-squared errors that are appreciably smaller than those of the other two.
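The kind of Monte Carlo comparison described in this abstract can be sketched as follows. This is a simplified illustration rather than the report's procedure: it assumes counts observed over unit intervals, a Gamma(shape, rate) prior from which the true intensity is drawn, and the posterior mean (shape + Σx)/(rate + n) as the Bayes estimator under squared-error loss. The function names, default parameters, and seed are hypothetical.

```python
import math
import random

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def bayes_estimate(counts, shape, rate):
    """Posterior mean of the Poisson rate under a Gamma(shape, rate) prior."""
    return (shape + sum(counts)) / (rate + len(counts))

def mle_estimate(counts):
    """Maximum likelihood estimate of the Poisson rate (the sample mean)."""
    return sum(counts) / len(counts)

def compare_mse(shape=2.0, rate=2.0, n=3, trials=4000, seed=0):
    """Empirical mean-squared errors of the Bayes and ML estimators."""
    rng = random.Random(seed)
    se_bayes = se_mle = 0.0
    for _ in range(trials):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam = rng.gammavariate(shape, 1.0 / rate)
        counts = [poisson_sample(rng, lam) for _ in range(n)]
        se_bayes += (bayes_estimate(counts, shape, rate) - lam) ** 2
        se_mle += (mle_estimate(counts) - lam) ** 2
    return se_bayes / trials, se_mle / trials

mse_bayes, mse_mle = compare_mse()
print(mse_bayes, mse_mle)
```

Because the true intensity is drawn from the same prior the estimator assumes, the Bayes estimator is expected to show the smaller empirical mean-squared error, consistent with the abstract's conclusion; a misspecified prior would not carry that guarantee.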
Weisberg, Arel; Lakis, Rollin E; Simpson, Michael F; Horowitz, Leo; Craparo, Joseph
2014-01-01
The versatility of laser-induced breakdown spectroscopy (LIBS) as an analytical method for high-temperature applications was demonstrated through measurement of the concentrations of the lanthanide elements europium (Eu) and praseodymium (Pr) in molten eutectic lithium chloride-potassium chloride (LiCl-KCl) salts at a temperature of 500 °C. Laser pulses (1064 nm, 7 ns, 120 mJ/pulse) were focused on the top surface of the molten salt samples in a laboratory furnace under an argon atmosphere, and the resulting LIBS signals were collected using a broadband Echelle-type spectrometer. Partial least squares (PLS) regression using leave-one-sample-out cross-validation was used to quantify the concentrations of Eu and Pr in the samples. The root mean square error of prediction (RMSEP) for Eu was 0.13% (absolute) over a concentration range of 0-3.01%, and for Pr was 0.13% (absolute) over a concentration range of 0-1.04%.
NASA Astrophysics Data System (ADS)
Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; Mello, Paola de Azevedo; Ferrão, Marco Flores; dos Santos, Maria de Fátima Pereira; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes
2012-04-01
Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection methods: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of models. The pre-treatment based on multiplicative scatter correction (MSC) and mean-centered data were selected for model construction. The use of siPLS as the variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by the PLS model using all variables. The best model was obtained using the siPLS algorithm with spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm^-1). This model produced an RMSECV of 400 mg kg^-1 S and an RMSEP of 420 mg kg^-1 S, with a correlation coefficient of 0.990.
NASA Astrophysics Data System (ADS)
Liu, Fei; He, Yong
2008-02-01
Visible and near infrared (Vis/NIR) transmission spectroscopy and chemometric methods were utilized to predict the pH values of cola beverages. Five varieties of cola were prepared; 225 samples (45 for each variety) were selected for the calibration set and 75 samples (15 for each variety) for the validation set. Savitzky-Golay smoothing and standard normal variate (SNV) followed by first derivative were used as the pre-processing methods. Partial least squares (PLS) analysis was employed to extract the principal components (PCs), which were used as the inputs of a least squares-support vector machine (LS-SVM) model according to their accumulative reliabilities. LS-SVM with a radial basis function (RBF) kernel and a two-step grid search technique was then applied to build the regression model, which was compared with PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias were 0.961, 0.040 and 0.012 for PLS, and 0.975, 0.031 and 4.697 × 10^-3 for LS-SVM, respectively. Both methods obtained satisfactory precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be applied as an alternative way to predict the pH of cola beverages.
Measures of precision for dissimilarity-based multivariate analysis of ecological communities.
Anderson, Marti J; Santana-Garcon, Julia
2015-01-01
Ecological studies require key decisions regarding the appropriate size and number of sampling units. No methods currently exist to measure precision for multivariate assemblage data when dissimilarity-based analyses are intended to follow. Here, we propose a pseudo multivariate dissimilarity-based standard error (MultSE) as a useful quantity for assessing sample-size adequacy in studies of ecological communities. Based on sums of squared dissimilarities, MultSE measures variability in the position of the centroid in the space of a chosen dissimilarity measure under repeated sampling for a given sample size. We describe a novel double resampling method to quantify uncertainty in MultSE values with increasing sample size. For more complex designs, values of MultSE can be calculated from the pseudo residual mean square of a permanova model, with the double resampling done within appropriate cells in the design. R code functions for implementing these techniques, along with ecological examples, are provided. © 2014 The Authors. Ecology Letters published by John Wiley & Sons Ltd and CNRS.
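A quantity of this kind can be sketched from a dissimilarity matrix for a single group of n sampling units. The form used below is an assumption about the definition, stated here for illustration: SS = (1/n) Σ_{i<j} d_ij², V = SS/(n − 1), and MultSE = √(V/n). The function name and the sample values are hypothetical, and the sketch is in Python rather than the R functions the authors provide; the double resampling step is not shown.

```python
import math

def mult_se(dist):
    """Pseudo multivariate standard error from a square dissimilarity matrix.

    Assumed form: SS = (1/n) * sum over i<j of d_ij**2,
    V = SS / (n - 1), MultSE = sqrt(V / n).
    """
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two sampling units")
    ss = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    pseudo_variance = ss / (n - 1)
    return math.sqrt(pseudo_variance / n)

# Sanity check on made-up univariate data with Euclidean distance, where the
# result should match the classical standard error of the mean.
values = [1.0, 2.0, 3.0, 4.0]
d = [[abs(x - y) for y in values] for x in values]
se = mult_se(d)
print(se)
```

With a single variable and Euclidean distance the sum of squared dissimilarities divided by n equals the usual sum of squared deviations from the mean, which is why the sketch reduces to the familiar standard error in that case.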
Region of influence regression for estimating the 50-year flood at ungaged sites
Tasker, Gary D.; Hodge, S.A.; Barks, C.S.
1996-01-01
Five methods of developing regional regression models to estimate flood characteristics at ungaged sites in Arkansas are examined. The methods differ in the manner in which the State is divided into subregions. Each successive method (A to E) is computationally more complex than the previous method. Method A makes no subdivision. Methods B and C define two and four geographic subregions, respectively. Method D uses cluster/discriminant analysis to define subregions on the basis of similarities in watershed characteristics. Method E, the new region of influence method, defines a unique subregion for each ungaged site. Split-sample results indicate that, in terms of root-mean-square error, method E (38 percent error) is best. Methods C and D (42 and 41 percent error) were in a virtual tie for second, and methods B (44 percent error) and A (49 percent error) were fourth and fifth best.
Linhart, S. Mike; Nania, Jon F.; Sanders, Curtis L.; Archfield, Stacey A.
2012-01-01
The U.S. Geological Survey (USGS) maintains approximately 148 real-time streamgages in Iowa for which daily mean streamflow information is available, but daily mean streamflow data commonly are needed at locations where no streamgages are present. Therefore, the USGS conducted a study as part of a larger project in cooperation with the Iowa Department of Natural Resources to develop methods to estimate daily mean streamflow at locations in ungaged watersheds in Iowa by using two regression-based statistical methods. The regression equations for the statistical methods were developed from historical daily mean streamflow and basin characteristics from streamgages within the study area, which includes the entire State of Iowa and adjacent areas within a 50-mile buffer of Iowa in neighboring states. Results of this study can be used with other techniques to determine the best method for application in Iowa and can be used to produce a Web-based geographic information system tool to compute streamflow estimates automatically. The Flow Anywhere statistical method is a variation of the drainage-area-ratio method, which transfers same-day streamflow information from a reference streamgage to another location by using the daily mean streamflow at the reference streamgage and the drainage-area ratio of the two locations. The Flow Anywhere method modifies the drainage-area-ratio method in order to regionalize the equations for Iowa and determine the best reference streamgage from which to transfer same-day streamflow information to an ungaged location. Data used for the Flow Anywhere method were retrieved for 123 continuous-record streamgages located in Iowa and within a 50-mile buffer of Iowa. 
The final regression equations were computed by using either left-censored regression techniques with a low limit threshold set at 0.1 cubic feet per second (ft3/s) and the daily mean streamflow for the 15th day of every other month, or by using an ordinary-least-squares multiple linear regression method and the daily mean streamflow for the 15th day of every other month. The Flow Duration Curve Transfer method was used to estimate unregulated daily mean streamflow from the physical and climatic characteristics of gaged basins. For the Flow Duration Curve Transfer method, daily mean streamflow quantiles at the ungaged site were estimated with the parameter-based regression model, which results in a continuous daily flow-duration curve (the relation between exceedance probability and streamflow for each day of observed streamflow) at the ungaged site. By the use of a reference streamgage, the Flow Duration Curve Transfer is converted to a time series. Data used in the Flow Duration Curve Transfer method were retrieved for 113 continuous-record streamgages in Iowa and within a 50-mile buffer of Iowa. The final statewide regression equations for Iowa were computed by using a weighted-least-squares multiple linear regression method and were computed for the 0.01-, 0.05-, 0.10-, 0.15-, 0.20-, 0.30-, 0.40-, 0.50-, 0.60-, 0.70-, 0.80-, 0.85-, 0.90-, and 0.95-exceedance probability statistics determined from the daily mean streamflow with a reporting limit set at 0.1 ft3/s. The final statewide regression equation for Iowa computed by using left-censored regression techniques was computed for the 0.99-exceedance probability statistic determined from the daily mean streamflow with a low limit threshold and a reporting limit set at 0.1 ft3/s. 
For the Flow Anywhere method, results of the validation study conducted by using six streamgages show that differences between the root-mean-square error and the mean absolute error ranged from 1,016 to 138 ft3/s, with the larger value signifying a greater occurrence of outliers between observed and estimated streamflows. Root-mean-square-error values ranged from 1,690 to 237 ft3/s. Values of the percent root-mean-square error ranged from 115 percent to 26.2 percent. The logarithm (base 10) streamflow percent root-mean-square error ranged from 13.0 to 5.3 percent. Root-mean-square-error observations standard-deviation-ratio values ranged from 0.80 to 0.40. Percent-bias values ranged from 25.4 to 4.0 percent. Untransformed streamflow Nash-Sutcliffe efficiency values ranged from 0.84 to 0.35. The logarithm (base 10) streamflow Nash-Sutcliffe efficiency values ranged from 0.86 to 0.56. For the streamgage with the best agreement between observed and estimated streamflow, higher streamflows appear to be underestimated. For the streamgage with the worst agreement between observed and estimated streamflow, low flows appear to be overestimated whereas higher flows seem to be underestimated. Estimated cumulative streamflows for the period October 1, 2004, to September 30, 2009, are underestimated by -25.8 and -7.4 percent for the closest and poorest comparisons, respectively. For the Flow Duration Curve Transfer method, results of the validation study conducted by using the same six streamgages show that differences between the root-mean-square error and the mean absolute error ranged from 437 to 93.9 ft3/s, with the larger value signifying a greater occurrence of outliers between observed and estimated streamflows. Root-mean-square-error values ranged from 906 to 169 ft3/s. Values of the percent root-mean-square-error ranged from 67.0 to 25.6 percent. The logarithm (base 10) streamflow percent root-mean-square error ranged from 12.5 to 4.4 percent. 
Root-mean-square-error observations standard-deviation-ratio values ranged from 0.79 to 0.40. Percent-bias values ranged from 22.7 to 0.94 percent. Untransformed streamflow Nash-Sutcliffe efficiency values ranged from 0.84 to 0.38. The logarithm (base 10) streamflow Nash-Sutcliffe efficiency values ranged from 0.89 to 0.48. For the streamgage with the closest agreement between observed and estimated streamflow, there is relatively good agreement between observed and estimated streamflows. For the streamgage with the poorest agreement between observed and estimated streamflow, streamflows appear to be substantially underestimated for much of the time period. Estimated cumulative streamflow for the period October 1, 2004, to September 30, 2009, are underestimated by -9.3 and -22.7 percent for the closest and poorest comparisons, respectively.
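The drainage-area-ratio transfer and the goodness-of-fit statistics named in the abstract above can be sketched as follows. This is a minimal illustration, not the report's regionalized equations: the function names are hypothetical, the plain (unregionalized) area ratio is assumed, and the sign convention for percent bias (positive means overestimation) is an assumption because the abstract does not state one.

```python
import math

def drainage_area_ratio(q_ref, area_ref, area_ungaged):
    """Scale same-day flows at a reference gage by the ratio of drainage areas."""
    ratio = area_ungaged / area_ref
    return [q * ratio for q in q_ref]

def fit_metrics(observed, estimated):
    """Return RMSE, MAE, Nash-Sutcliffe efficiency, RSR and percent bias."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    sse = sum(d * d for d in errors)                 # sum of squared errors
    mean_obs = sum(observed) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(d) for d in errors) / n,
        "nse": 1.0 - sse / ss_obs,                   # 1 is a perfect fit
        "rsr": math.sqrt(sse / ss_obs),              # RMSE / std. dev. of observations
        "pbias": 100.0 * sum(errors) / sum(observed),  # assumed sign convention
    }

# Hypothetical flows in ft3/s; the ungaged basin has half the reference area.
estimated = drainage_area_ratio([20.0, 40.0, 60.0], area_ref=100.0, area_ungaged=50.0)
metrics = fit_metrics([12.0, 18.0, 30.0], estimated)
```

A large gap between `rmse` and `mae` indicates that a few large errors dominate, which is how the abstract interprets the difference between the two.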
Voss, Frank D.; Curran, Christopher A.; Mastin, Mark C.
2008-01-01
A mechanistic water-temperature model was constructed by the U.S. Geological Survey for use by the Bureau of Reclamation for studying the effect of potential water management decisions on water temperature in the Yakima River between Roza and Prosser, Washington. Flow and water temperature data for model input were obtained from the Bureau of Reclamation Hydromet database and from measurements collected by the U.S. Geological Survey during field trips in autumn 2005. Shading data for the model were collected by the U.S. Geological Survey in autumn 2006. The model was calibrated with data collected from April 1 through October 31, 2005, and tested with data collected from April 1 through October 31, 2006. Sensitivity analysis results showed that for the parameters tested, daily maximum water temperature was most sensitive to changes in air temperature and solar radiation. Root mean squared error for the five sites used for model calibration ranged from 1.3 to 1.9 degrees Celsius (°C) and mean error ranged from -1.3 to 1.6°C. The root mean squared error for the five sites used for testing simulation ranged from 1.6 to 2.2°C and mean error ranged from 0.1 to 1.3°C. The accuracy of the stream temperatures estimated by the model is limited by four errors (model error, data error, parameter error, and user error).
Rapid Detection of Volatile Oil in Mentha haplocalyx by Near-Infrared Spectroscopy and Chemometrics.
Yan, Hui; Guo, Cheng; Shao, Yang; Ouyang, Zhen
2017-01-01
Near-infrared spectroscopy combined with partial least squares regression (PLSR) and support vector machine (SVM) was applied for the rapid determination of volatile oil content in Mentha haplocalyx. The effects of data pre-processing methods on the accuracy of the PLSR calibration models were investigated. The performance of the final model was evaluated according to the correlation coefficient (R) and root mean square error of prediction (RMSEP). For the PLSR model, the best combination of preprocessing methods was first-order derivative, standard normal variate transformation (SNV), and mean centering, which gave correlation coefficients of 0.8805 and 0.8719, RMSEC of 0.091, and RMSEP of 0.097. Analysis of the loading weights and variable importance in projection (VIP) scores showed that the wavenumber variables linked to volatile oil lie between 5500 and 4000 cm⁻¹. For the SVM model, six LVs (fewer than the seven LVs in the PLSR model) were adopted, and the result was better than that of the PLSR model. The correlation coefficients were 0.9232 and 0.9202, with RMSEC and RMSEP of 0.084 and 0.082, respectively, which indicated that the predicted values were accurate and reliable. This work demonstrated that near-infrared reflectance spectroscopy with chemometrics could be used to rapidly determine the volatile oil content of M. haplocalyx. The quality of a medicine is directly linked to its clinical efficacy; thus, it is important to control the quality of Mentha haplocalyx.
Abbreviations used: 1st der: First-order derivative; 2nd der: Second-order derivative; LOO: Leave-one-out; LVs: Latent variables; MC: Mean centering; NIR: Near-infrared; NIRS: Near-infrared spectroscopy; PCR: Principal component regression; PLSR: Partial least squares regression; RBF: Radial basis function; RMSECV: Root mean square error of cross validation; RMSEC: Root mean square error of calibration; RMSEP: Root mean square error of prediction; SNV: Standard normal variate transformation; SVM: Support vector machine; VIP: Variable importance in projection.
A Canonical Ensemble Correlation Prediction Model for Seasonal Precipitation Anomaly
NASA Technical Reports Server (NTRS)
Shen, Samuel S. P.; Lau, William K. M.; Kim, Kyu-Myong; Li, Guilong
2001-01-01
This report describes an optimal ensemble forecasting model for seasonal precipitation and its error estimation. Each individual forecast is based on the canonical correlation analysis (CCA) in the spectral spaces whose bases are empirical orthogonal functions (EOF). The optimal weights in the ensemble forecasting crucially depend on the mean square error of each individual forecast. An estimate of the mean square error of a CCA prediction is made also using the spectral method. The error is decomposed onto EOFs of the predictand and decreases linearly according to the correlation between the predictor and predictand. This new CCA model includes the following features: (1) the use of area-factor, (2) the estimation of prediction error, and (3) the optimal ensemble of multiple forecasts. The new CCA model is applied to the seasonal forecasting of the United States precipitation field. The predictor is the sea surface temperature.
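The dependence of ensemble weights on each forecast's mean square error can be illustrated with inverse-MSE weighting. This is a sketch under a stated assumption, not the report's scheme: if the individual forecast errors are unbiased and mutually uncorrelated, the minimum-MSE linear combination weights each forecast in proportion to the reciprocal of its MSE. The function names and numbers are hypothetical.

```python
def inverse_mse_weights(mse_values):
    """Weights proportional to 1/MSE, normalized to sum to one.

    Assumes unbiased forecasts with uncorrelated errors (an assumption of
    this sketch; correlated errors would need the full error covariance).
    """
    inverse = [1.0 / m for m in mse_values]
    total = sum(inverse)
    return [v / total for v in inverse]

def ensemble_forecast(forecasts, mse_values):
    """Combine individual forecasts of one quantity into a weighted ensemble."""
    weights = inverse_mse_weights(mse_values)
    return sum(w * f for w, f in zip(weights, forecasts))

# Two hypothetical forecasts of a precipitation anomaly with estimated MSEs.
combined = ensemble_forecast([2.0, 6.0], [1.0, 3.0])
```

Under the same assumption the ensemble MSE is `1 / sum(1 / mse_i)`, which is never larger than the smallest individual MSE.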
Assessing and calibrating the ATR-FTIR approach as a carbonate rock characterization tool
NASA Astrophysics Data System (ADS)
Henry, Delano G.; Watson, Jonathan S.; John, Cédric M.
2017-01-01
ATR-FTIR (attenuated total reflectance Fourier transform infrared) spectroscopy can be used as a rapid and economical tool for qualitative identification of carbonates, calcium sulphates, oxides and silicates, as well as quantitatively estimating the concentration of minerals. Over 200 powdered samples with known concentrations of two, three, four and five phase mixtures were made, then a suite of calibration curves were derived that can be used to quantify the minerals. The calibration curves in this study have an R2 that range from 0.93-0.99, a RMSE (root mean square error) of 1-5 wt.% and a maximum error of 3-10 wt.%. The calibration curves were used on 35 geological samples that have previously been studied using XRD (X-ray diffraction). The identification of the minerals using ATR-FTIR is comparable with XRD and the quantitative results have a RMSD (root mean square deviation) of 14% and 12% for calcite and dolomite respectively when compared to XRD results. ATR-FTIR is a rapid technique (identification and quantification takes < 5 min) that involves virtually no cost if the machine is available. It is a common tool in most analytical laboratories, but it also has the potential to be deployed on a rig for real-time data acquisition of the mineralogy of cores and rock chips at the surface as there is no need for special sample preparation, rapid data collection and easy analysis.
Levin, Gregory P; Emerson, Sarah C; Emerson, Scott S
2014-09-01
Many papers have introduced adaptive clinical trial methods that allow modifications to the sample size based on interim estimates of treatment effect. There has been extensive commentary on type I error control and efficiency considerations, but little research on estimation after an adaptive hypothesis test. We evaluate the reliability and precision of different inferential procedures in the presence of an adaptive design with pre-specified rules for modifying the sampling plan. We extend group sequential orderings of the outcome space based on the stage at stopping, likelihood ratio statistic, and sample mean to the adaptive setting in order to compute median-unbiased point estimates, exact confidence intervals, and P-values uniformly distributed under the null hypothesis. The likelihood ratio ordering is found to average shorter confidence intervals and produce higher probabilities of P-values below important thresholds than alternative approaches. The bias adjusted mean demonstrates the lowest mean squared error among candidate point estimates. A conditional error-based approach in the literature has the benefit of being the only method that accommodates unplanned adaptations. We compare the performance of this and other methods in order to quantify the cost of failing to plan ahead in settings where adaptations could realistically be pre-specified at the design stage. We find the cost to be meaningful for all designs and treatment effects considered, and to be substantial for designs frequently proposed in the literature. © 2014, The International Biometric Society.
Fiyadh, Seef Saadi; AlSaadi, Mohammed Abdulhakim; AlOmar, Mohamed Khalid; Fayaed, Sabah Saadi; Hama, Ako R; Bee, Sharifah; El-Shafie, Ahmed
2017-11-01
The main challenge in the lead removal simulation is the behaviour of non-linearity relationships between the process parameters. The conventional modelling technique usually deals with this problem by a linear method. The substitute modelling technique is an artificial neural network (ANN) system, and it is selected to reflect the non-linearity in the interaction among the variables in the function. Herein, synthesized deep eutectic solvents were used as a functionalized agent with carbon nanotubes as adsorbents of Pb 2+ . Different parameters were used in the adsorption study including pH (2.7 to 7), adsorbent dosage (5 to 20 mg), contact time (3 to 900 min) and Pb 2+ initial concentration (3 to 60 mg/l). The number of experimental trials to feed and train the system was 158 runs conveyed in laboratory scale. Two ANN types were designed in this work, the feed-forward back-propagation and layer recurrent; both methods are compared based on their predictive proficiency in terms of the mean square error (MSE), root mean square error, relative root mean square error, mean absolute percentage error and determination coefficient (R 2 ) based on the testing dataset. The ANN model of lead removal was subjected to accuracy determination and the results showed R 2 of 0.9956 with MSE of 1.66 × 10 -4 . The maximum relative error is 14.93% for the feed-forward back-propagation neural network model.
Wang, Rong
2015-01-01
In real-world applications, the image of faces varies with illumination, facial expression, and poses. It seems that more training samples are able to reveal possible images of the faces. Though minimum squared error classification (MSEC) is a widely used method, its applications on face recognition usually suffer from the problem of a limited number of training samples. In this paper, we improve MSEC by using the mirror faces as virtual training samples. We obtained the mirror faces generated from original training samples and put these two kinds of samples into a new set. The face recognition experiments show that our method does obtain high accuracy performance in classification.
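The augmentation step described above, in which mirror faces are added as virtual training samples, can be sketched as follows. Only the construction of the enlarged training set is shown; the minimum squared error classifier itself is omitted. The image representation (a list of pixel rows) and the function names are assumptions made for illustration.

```python
def mirror_face(image):
    """Return the left-right mirror of an image given as a list of pixel rows."""
    return [list(reversed(row)) for row in image]

def add_mirror_faces(images, labels):
    """Append the mirror of every training image, keeping its class label."""
    mirrored = [mirror_face(img) for img in images]
    return images + mirrored, labels + labels

# Two tiny hypothetical 2x3 "faces" with class labels 0 and 1.
train_images = [[[1, 2, 3], [4, 5, 6]], [[9, 8, 7], [6, 5, 4]]]
train_labels = [0, 1]
aug_images, aug_labels = add_mirror_faces(train_images, train_labels)
```

The augmented set holds twice as many samples, and any classifier can then be trained on `aug_images` and `aug_labels` in place of the originals.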
Soil salt content estimation in the Yellow River delta with satellite hyperspectral data
Weng, Yongling; Gong, Peng; Zhu, Zhi-Liang
2008-01-01
Soil salinization is one of the most common land degradation processes and is a severe environmental hazard. The primary objective of this study is to investigate the potential of predicting salt content in soils with hyperspectral data acquired with EO-1 Hyperion. Both partial least-squares regression (PLSR) and conventional multiple linear regression (MLR), such as stepwise regression (SWR), were tested as the prediction model. PLSR is commonly used to overcome the problem caused by high-dimensional and correlated predictors. Chemical analysis of 95 samples collected from the top layer of soils in the Yellow River delta area shows that salt content was high on average, and the dominant chemicals in the saline soil were NaCl and MgCl2. Multivariate models were established between soil contents and hyperspectral data. Our results indicate that the PLSR technique with laboratory spectral data has a strong prediction capacity. Spectral bands at 1487-1527, 1971-1991, 2032-2092, and 2163-2355 nm possessed large absolute values of regression coefficients, with the largest coefficient at 2203 nm. We obtained a root mean squared error (RMSE) for calibration (with 61 samples) of RMSEC = 0.753 (R2 = 0.893) and a root mean squared error for validation (with 30 samples) of RMSEV = 0.574. The prediction model was applied on a pixel-by-pixel basis to a Hyperion reflectance image to yield a quantitative surface distribution map of soil salt content. The result was validated successfully from 38 sampling points. We obtained an RMSE estimate of 1.037 (R2 = 0.784) for the soil salt content map derived by the PLSR model. The salinity map derived from the SWR model shows that the predicted value is higher than the true value. These results demonstrate that the PLSR method is a more suitable technique than stepwise regression for quantitative estimation of soil salt content in a large area. © 2008 CASI.
Estimating accuracy of land-cover composition from two-stage cluster sampling
Stehman, S.V.; Wickham, J.D.; Fattorini, L.; Wade, T.D.; Baffetta, F.; Smith, J.H.
2009-01-01
Land-cover maps are often used to compute land-cover composition (i.e., the proportion or percent of area covered by each class), for each unit in a spatial partition of the region mapped. We derive design-based estimators of mean deviation (MD), mean absolute deviation (MAD), root mean square error (RMSE), and correlation (CORR) to quantify accuracy of land-cover composition for a general two-stage cluster sampling design, and for the special case of simple random sampling without replacement (SRSWOR) at each stage. The bias of the estimators for the two-stage SRSWOR design is evaluated via a simulation study. The estimators of RMSE and CORR have small bias except when sample size is small and the land-cover class is rare. The estimator of MAD is biased for both rare and common land-cover classes except when sample size is large. A general recommendation is that rare land-cover classes require large sample sizes to ensure that the accuracy estimators have small bias. © 2009 Elsevier Inc.
Reducing representativeness and sampling errors in radio occultation-radiosonde comparisons
NASA Astrophysics Data System (ADS)
Gilpin, Shay; Rieckh, Therese; Anthes, Richard
2018-05-01
Radio occultation (RO) and radiosonde (RS) comparisons provide a means of analyzing errors associated with both observational systems. Since RO and RS observations are not taken at the exact same time or location, temporal and spatial sampling errors resulting from atmospheric variability can be significant and inhibit error analysis of the observational systems. In addition, the vertical resolutions of RO and RS profiles vary and vertical representativeness errors may also affect the comparison. In RO-RS comparisons, RO observations are co-located with RS profiles within a fixed time window and distance, i.e. within 3-6 h and circles of radii ranging between 100 and 500 km. In this study, we first show that vertical filtering of RO and RS profiles to a common vertical resolution reduces representativeness errors. We then test two methods of reducing horizontal sampling errors during RO-RS comparisons: restricting co-location pairs to within ellipses oriented along the direction of wind flow rather than circles and applying a spatial-temporal sampling correction based on model data. Using data from 2011 to 2014, we compare RO and RS differences at four GCOS Reference Upper-Air Network (GRUAN) RS stations in different climatic locations, in which co-location pairs were constrained to a large circle (~666 km radius), small circle (~300 km radius), and ellipse parallel to the wind direction (~666 km semi-major axis, ~133 km semi-minor axis). We also apply a spatial-temporal sampling correction using European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERA-Interim) gridded data. Restricting co-locations to within the ellipse reduces root mean square (RMS) refractivity, temperature, and water vapor pressure differences relative to RMS differences within the large circle and produces differences that are comparable to or less than the RMS differences within circles of similar area.
Applying the sampling correction shows the most significant reduction in RMS differences, such that RMS differences are nearly identical to the sampling correction regardless of the geometric constraints. We conclude that implementing the spatial-temporal sampling correction using a reliable model will most effectively reduce sampling errors during RO-RS comparisons; however, if a reliable model is not available, restricting spatial comparisons to within an ellipse parallel to the wind flow will reduce sampling errors caused by horizontal atmospheric variability.
Evrendilek, Fatih
2007-12-12
This study aims at quantifying spatio-temporal dynamics of monthly mean daily incident photosynthetically active radiation (PAR) over a vast and complex terrain such as Turkey. The spatial interpolation method of universal kriging, and the combination of multiple linear regression (MLR) models and map algebra techniques were implemented to generate surface maps of PAR with a grid resolution of 500 x 500 m as a function of five geographical and 14 climatic variables. Performance of the geostatistical and MLR models was compared using mean prediction error (MPE), root-mean-square prediction error (RMSPE), average standard prediction error (ASE), mean standardized prediction error (MSPE), root-mean-square standardized prediction error (RMSSPE), and adjusted coefficient of determination (R² adj.). The best-fit MLR- and universal kriging-generated models of monthly mean daily PAR were validated against an independent 37-year observed dataset of 35 climate stations derived from 160 stations across Turkey by the Jackknifing method. The spatial variability patterns of monthly mean daily incident PAR were more accurately reflected in the surface maps created by the MLR-based models than in those created by the universal kriging method, in particular, for spring (May) and autumn (November). The MLR-based spatial interpolation algorithms of PAR described in this study indicated the significance of the multifactor approach to understanding and mapping spatio-temporal dynamics of PAR for a complex terrain over meso-scales.
Geometric Quality Assessment of LIDAR Data Based on Swath Overlap
NASA Astrophysics Data System (ADS)
Sampath, A.; Heidemann, H. K.; Stensaas, G. L.
2016-06-01
This paper provides guidelines on quantifying the relative horizontal and vertical errors observed between conjugate features in the overlapping regions of lidar data. The quantification of these errors is important because their presence quantifies the geometric quality of the data. A data set can be said to have good geometric quality if measurements of identical features, regardless of their position or orientation, yield identical results. Good geometric quality indicates that the data are produced using sensor models that are working as they are mathematically designed, and data acquisition processes are not introducing any unforeseen distortion in the data. High geometric quality also leads to high geolocation accuracy of the data when the data acquisition process includes coupling the sensor with geopositioning systems. Current specifications (e.g. Heidemann 2014) do not provide adequate means to quantitatively measure these errors, even though they are required to be reported. Current accuracy measurement and reporting practices followed in the industry and as recommended by data specification documents also potentially underestimate the inter-swath errors, including the presence of systematic errors in lidar data. Hence they pose a risk to the user in terms of data acceptance (i.e. a higher potential for Type II error indicating risk of accepting potentially unsuitable data). For example, if the overlap area is too small or if the sampled locations are close to the center of overlap, or if the errors are sampled in flat regions when there are residual pitch errors in the data, the resultant Root Mean Square Differences (RMSD) can still be small. 
To avoid this, the following are suggested as criteria for defining the inter-swath quality of data: a) Median Discrepancy Angle; b) Mean and RMSD of Horizontal Errors using DQM measured on sloping surfaces; c) RMSD for sampled locations from flat areas (defined as areas with less than 5 degrees of slope). It is suggested that 4000-5000 points, depending on the surface roughness, be uniformly sampled in the overlapping regions of the point cloud to measure the discrepancy between swaths. Care must be taken to sample only areas of single return points. Point-to-Plane distance based data quality measures are determined for each sample point. These measurements are used to determine the above mentioned parameters. This paper details the measurements and analysis of measurements required to determine these metrics, i.e. Discrepancy Angle, Mean and RMSD of errors in flat regions and horizontal errors obtained using measurements extracted from sloping regions (slope greater than 10 degrees). The research is a result of an ad-hoc joint working group of the US Geological Survey and the American Society for Photogrammetry and Remote Sensing (ASPRS) Airborne Lidar Committee.
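The point-to-plane measurement and the RMSD summary described above can be sketched as follows. This is an illustration only: it assumes that a local plane in one swath is already available as a point on the plane plus a normal vector (how that plane is fitted is not covered here), and the function names are hypothetical.

```python
import math

def point_to_plane_distance(point, plane_point, normal):
    """Signed distance from a 3D point to a plane given by a point and a normal."""
    length = math.sqrt(sum(c * c for c in normal))
    unit = [c / length for c in normal]          # normalize so distance is in map units
    return sum((p - q) * u for p, q, u in zip(point, plane_point, unit))

def mean_and_rmsd(distances):
    """Mean and root mean square of a list of signed discrepancies."""
    n = len(distances)
    mean = sum(distances) / n
    rmsd = math.sqrt(sum(d * d for d in distances) / n)
    return mean, rmsd

# Hypothetical sample points from swath B measured against a flat plane from swath A.
plane_point, normal = (0.0, 0.0, 100.0), (0.0, 0.0, 2.0)
samples = [(1.0, 2.0, 100.3), (5.0, 1.0, 99.7), (3.0, 3.0, 100.0)]
distances = [point_to_plane_distance(p, plane_point, normal) for p in samples]
mean_error, rmsd = mean_and_rmsd(distances)
```

A mean near zero with a non-trivial RMSD suggests scatter without a systematic vertical offset between the two swaths.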
Pang, Yuan-Ping
2016-09-01
Predicting crystallographic B-factors of a protein from a conventional molecular dynamics simulation is challenging, in part because the B-factors calculated through sampling the atomic positional fluctuations in a picosecond molecular dynamics simulation are unreliable, and the sampling of a longer simulation yields overly large root mean square deviations between calculated and experimental B-factors. This article reports improved B-factor prediction achieved by sampling the atomic positional fluctuations in multiple picosecond molecular dynamics simulations that use uniformly increased atomic masses by 100-fold to increase time resolution. Using the third immunoglobulin-binding domain of protein G, bovine pancreatic trypsin inhibitor, ubiquitin, and lysozyme as model systems, the B-factor root mean square deviations (mean ± standard error) of these proteins were 3.1 ± 0.2 to 9 ± 1 Å² for Cα and 7.3 ± 0.9 to 9.6 ± 0.2 Å² for Cγ, when the sampling was done for each of these proteins over 20 distinct, independent, and 50-picosecond high-mass molecular dynamics simulations with AMBER forcefield FF12MC or FF14SB. These results suggest that sampling the atomic positional fluctuations in multiple picosecond high-mass molecular dynamics simulations may be conducive to a priori prediction of crystallographic B-factors of a folded globular protein.
Ignjatovic, Anita Rakic; Miljkovic, Branislava; Todorovic, Dejan; Timotijevic, Ivana; Pokrajac, Milena
2011-05-01
Because moclobemide pharmacokinetics vary considerably among individuals, monitoring of plasma concentrations lends insight into its pharmacokinetic behavior and enhances its rational use in clinical practice. The aim of this study was to evaluate whether single concentration-time points could adequately predict moclobemide systemic exposure. Pharmacokinetic data (full 7-point pharmacokinetic profiles), obtained from 21 depressive inpatients receiving moclobemide (150 mg 3 times daily), were randomly split into development (n = 18) and validation (n = 16) sets. Correlations between the single concentration-time points and the area under the concentration-time curve within a 6-hour dosing interval at steady-state (AUC(0-6)) were assessed by linear regression analyses. The predictive performance of single-point sampling strategies was evaluated in the validation set by mean prediction error, mean absolute error, and root mean square error. Plasma concentrations in the absorption phase yielded unsatisfactory predictions of moclobemide AUC(0-6). The best estimation of AUC(0-6) was achieved from concentrations at 4 and 6 hours following dosing. As the most reliable surrogate for moclobemide systemic exposure, concentrations at 4 and 6 hours should be used instead of predose trough concentrations as an indicator of between-patient variability and a guide for dose adjustments in specific clinical situations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernardis, F. De; Aiola, S.; Vavagiakis, E. M.
Here, we present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernardis, F. De; Vavagiakis, E.M.; Niemack, M.D.
We present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
NASA Technical Reports Server (NTRS)
De Bernardis, F.; Aiola, S.; Vavagiakis, E. M.; Battaglia, N.; Niemack, M. D.; Beall, J.; Becker, D. T.; Bond, J. R.; Calabrese, E.; Cho, H.;
2017-01-01
We present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
NASA Astrophysics Data System (ADS)
De Bernardis, F.; Aiola, S.; Vavagiakis, E. M.; Battaglia, N.; Niemack, M. D.; Beall, J.; Becker, D. T.; Bond, J. R.; Calabrese, E.; Cho, H.; Coughlin, K.; Datta, R.; Devlin, M.; Dunkley, J.; Dunner, R.; Ferraro, S.; Fox, A.; Gallardo, P. A.; Halpern, M.; Hand, N.; Hasselfield, M.; Henderson, S. W.; Hill, J. C.; Hilton, G. C.; Hilton, M.; Hincks, A. D.; Hlozek, R.; Hubmayr, J.; Huffenberger, K.; Hughes, J. P.; Irwin, K. D.; Koopman, B. J.; Kosowsky, A.; Li, D.; Louis, T.; Lungu, M.; Madhavacheril, M. S.; Maurin, L.; McMahon, J.; Moodley, K.; Naess, S.; Nati, F.; Newburgh, L.; Nibarger, J. P.; Page, L. A.; Partridge, B.; Schaan, E.; Schmitt, B. L.; Sehgal, N.; Sievers, J.; Simon, S. M.; Spergel, D. N.; Staggs, S. T.; Stevens, J. R.; Thornton, R. J.; van Engelen, A.; Van Lanen, J.; Wollack, E. J.
2017-03-01
We present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fits an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
Bernardis, F. De; Aiola, S.; Vavagiakis, E. M.; ...
2017-03-07
Here, we present a new measurement of the kinematic Sunyaev-Zel'dovich effect using data from the Atacama Cosmology Telescope (ACT) and the Baryon Oscillation Spectroscopic Survey (BOSS). Using 600 square degrees of overlapping sky area, we evaluate the mean pairwise baryon momentum associated with the positions of 50,000 bright galaxies in the BOSS DR11 Large Scale Structure catalog. A non-zero signal arises from the large-scale motions of halos containing the sample galaxies. The data fit an analytical signal model well, with the optical depth to microwave photon scattering as a free parameter determining the overall signal amplitude. We estimate the covariance matrix of the mean pairwise momentum as a function of galaxy separation, using microwave sky simulations, jackknife evaluation, and bootstrap estimates. The most conservative simulation-based errors give signal-to-noise estimates between 3.6 and 4.1 for varying galaxy luminosity cuts. We discuss how the other error determinations can lead to higher signal-to-noise values, and consider the impact of several possible systematic errors. Estimates of the optical depth from the average thermal Sunyaev-Zel'dovich signal at the sample galaxy positions are broadly consistent with those obtained from the mean pairwise momentum signal.
Ouyang, Qin; Zhao, Jiewen; Chen, Quansheng
2015-01-01
The non-sugar solids (NSS) content is one of the most important nutrition indicators of Chinese rice wine. This study proposed a rapid method for the measurement of NSS content in Chinese rice wine using near infrared (NIR) spectroscopy. We also systematically studied efficient algorithms for selecting the spectral variables used in modeling. A new algorithm, synergy interval partial least squares with competitive adaptive reweighted sampling (Si-CARS-PLS), was proposed for modeling. The performance of the final model was back-evaluated using the root mean square error of calibration (RMSEC) and correlation coefficient (Rc) in the calibration set and similarly tested by the root mean square error of prediction (RMSEP) and correlation coefficient (Rp) in the prediction set. The optimum model by the Si-CARS-PLS algorithm was achieved when 7 PLS factors and 18 variables were included, and the results were as follows: Rc=0.95 and RMSEC=1.12 in the calibration set, Rp=0.95 and RMSEP=1.22 in the prediction set. In addition, the Si-CARS-PLS algorithm showed its superiority when compared with the commonly used algorithms in multivariate calibration. This work demonstrated that the NIR spectroscopy technique combined with a suitable multivariate calibration algorithm has a high potential in rapid measurement of NSS content in Chinese rice wine. Copyright © 2015 Elsevier B.V. All rights reserved.
Flanders, W Dana; Kirkland, Kimberly H; Shelton, Brian G
2014-10-01
Outbreaks of Legionnaires' disease require environmental testing of water samples from potentially implicated building water systems to identify the source of exposure. A previous study reports a large impact on Legionella sample results due to shipping and delays in sample processing. Specifically, this same study, without accounting for measurement error, reports more than half of shipped samples tested had Legionella levels that arbitrarily changed up or down by one or more logs, and the authors attribute this result to shipping time. Accordingly, we conducted a study to determine the effects of sample holding/shipping time on Legionella sample results while taking into account measurement error, which has previously not been addressed. We analyzed 159 samples, each split into 16 aliquots, of which one-half (8) were processed promptly after collection. The remaining half (8) were processed the following day to assess impact of holding/shipping time. A total of 2544 samples were analyzed including replicates. After accounting for inherent measurement error, we found that the effect of holding time on observed Legionella counts was small and should have no practical impact on interpretation of results. Holding samples increased the root mean squared error by only about 3-8%. Notably, for only one of 159 samples did the average of the 8 replicate counts change by 1 log. Thus, our findings do not support the hypothesis of frequent, significant (≥ 1 log10 unit) Legionella colony count changes due to holding. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Measurement invariance via multigroup SEM: Issues and solutions with chi-square-difference tests.
Yuan, Ke-Hai; Chan, Wai
2016-09-01
Multigroup structural equation modeling (SEM) plays a key role in studying measurement invariance and in group comparison. When population covariance matrices are deemed not equal across groups, the next step to substantiate measurement invariance is to see whether the sample covariance matrices in all the groups can be adequately fitted by the same factor model, called configural invariance. After configural invariance is established, cross-group equalities of factor loadings, error variances, and factor variances-covariances are then examined in sequence. With mean structures, cross-group equalities of intercepts and factor means are also examined. The established rule is that if the statistic at the current model is not significant at the level of .05, one then moves on to testing the next more restricted model using a chi-square-difference statistic. This article argues that such an established rule is unable to control either Type I or Type II errors. Analysis, an example, and Monte Carlo results show why and how chi-square-difference tests are easily misused. The fundamental issue is that chi-square-difference tests are developed under the assumption that the base model is sufficiently close to the population, and a nonsignificant chi-square statistic tells little about how good the model is. To overcome this issue, this article further proposes that null hypothesis testing in multigroup SEM be replaced by equivalence testing, which allows researchers to effectively control the size of misspecification before moving on to testing a more restricted model. R code is also provided to facilitate the applications of equivalence testing for multigroup SEM. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Model error in covariance structure models: Some implications for power and Type I error
Coffman, Donna L.
2010-01-01
The present study investigated the degree to which violation of the parameter drift assumption affects the Type I error rate for the test of close fit and power analysis procedures proposed by MacCallum, Browne, and Sugawara (1996) for both the test of close fit and the test of exact fit. The parameter drift assumption states that as sample size increases both sampling error and model error (i.e. the degree to which the model is an approximation in the population) decrease. Model error was introduced using a procedure proposed by Cudeck and Browne (1992). The empirical power for both the test of close fit, in which the null hypothesis specifies that the Root Mean Square Error of Approximation (RMSEA) ≤ .05, and the test of exact fit, in which the null hypothesis specifies that RMSEA = 0, is compared with the theoretical power computed using the MacCallum et al. (1996) procedure. The empirical power and theoretical power for both the test of close fit and the test of exact fit are nearly identical under violations of the assumption. The results also indicated that the test of close fit maintains the nominal Type I error rate under violations of the assumption. PMID:21331302
NASA Astrophysics Data System (ADS)
Zhang, Xiaodong; Chen, Long; Sun, Yangbo; Bai, Yu; Huang, Bisheng; Chen, Keli
2018-03-01
Near-infrared (NIR) spectroscopy has been widely used in the analysis fields of traditional Chinese medicine. It has the advantages of fast analysis, no damage to samples and no pollution. In this research, a fast quantitative model for zinc oxide (ZnO) content in the mineral medicine calamine was explored based on NIR spectroscopy. NIR spectra of 57 batches of calamine samples were collected and the first derivative (FD) method was adopted for spectral pretreatment. The content of ZnO in each calamine sample was determined using ethylenediaminetetraacetic acid (EDTA) titration and taken as the reference value for NIR spectroscopy. The 57 batches of calamine samples were divided into calibration and prediction sets using the Kennard-Stone (K-S) algorithm. Firstly, in the calibration set, the correlation coefficient (r) between the absorbance value and the ZnO content of the corresponding samples was calculated at each wave number. Next, the top 50 wave numbers by squared correlation coefficient (r2) were selected to compose the characteristic spectral bands (4081.8-4096.3, 4188.9-4274.7, 4335.4, 4763.6, 4794.4-4802.1, 4809.9, 4817.6-4875.4 cm-1), which were used to establish the quantitative model of ZnO content using the back propagation artificial neural network (BP-ANN) algorithm. Then, the mean impact value (MIV) algorithm was applied to the 50 wave numbers to choose those whose absolute MIV was greater than or equal to 25, giving the optimal characteristic spectral bands (4875.4-4836.9, 4223.6-4080.9 cm-1). Both internal cross validation and external validation were then used to screen the number of hidden layer nodes of the BP-ANN, and 4 hidden layer nodes was chosen as the best. Finally, the BP-ANN model was found to have high accuracy and strong forecasting capacity for analyzing ZnO content in calamine samples ranging within 42.05-69.98%, with relative mean square error of cross validation (RMSECV) of 1.66% and coefficient of determination (R2) of 95.75% in internal cross validation, and relative mean square error of prediction (RMSEP) of 1.98%, R2 of 97.94% and ratio of performance to deviation (RPD) of 6.11 in external validation.
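The Kennard-Stone split into calibration and prediction sets named in this abstract can be sketched as follows. This is a generic illustration of the classic procedure (start from the two most distant samples, then repeatedly add the sample whose nearest selected neighbour is farthest away) using Euclidean distance; the function name, the toy data, and the choice of distance are assumptions, not details reported by the authors.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule.

    points: list of equal-length feature vectors (e.g. spectra).
    Illustrative sketch only; Euclidean distance is an assumption.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

    # Seed the selection with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}

    # Add the sample whose minimum distance to the selected set is largest.
    while len(selected) < n_select and remaining:
        candidate = max(
            sorted(remaining),
            key=lambda k: min(dist[k][s] for s in selected),
        )
        selected.append(candidate)
        remaining.remove(candidate)
    return selected


if __name__ == "__main__":
    toy_spectra = [[0.0], [1.0], [2.0], [10.0], [5.0]]
    calibration = kennard_stone(toy_spectra, 4)
    prediction = [i for i in range(len(toy_spectra)) if i not in calibration]
    print("calibration:", calibration, "prediction:", prediction)
```

The selected indices would form the calibration set and the leftover samples the prediction set, so that the calibration set spans the spread of the measured spectra.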
A network application for modeling a centrifugal compressor performance map
NASA Astrophysics Data System (ADS)
Nikiforov, A.; Popova, D.; Soldatova, K.
2017-08-01
The approximation of aerodynamic performance of a centrifugal compressor stage and vaneless diffuser by neural networks is presented. Advantages, difficulties and specific features of the method are described. An example of a neural network and its structure is shown. The performances in terms of efficiency, pressure ratio and work coefficient of 39 model stages within the range of flow coefficient from 0.01 to 0.08 were modeled with mean squared error 1.5 %. In addition, the loss and friction coefficients of vaneless diffusers of relative widths 0.014-0.10 are modeled with mean squared error 2.45 %.
Bär, David; Debus, Heiko; Brzenczek, Sina; Fischer, Wolfgang; Imming, Peter
2018-03-20
Near-infrared spectroscopy is frequently used by the pharmaceutical industry to monitor and optimize several production processes. In combination with chemometrics, a mathematical-statistical technique, the following advantages of near-infrared spectroscopy can be applied: It is a fast, non-destructive, non-invasive, and economical analytical method. One of the most advanced and popular chemometric techniques is the partial least square algorithm with its best applicability in routine and its results. The required reference analytic enables the analysis of various parameters of interest, for example, moisture content, particle size, and many others. Parameters like the correlation coefficient, root mean square error of prediction, root mean square error of calibration, and root mean square error of validation have been used for evaluating the applicability and robustness of these analytical methods developed. This study deals with investigating a Naproxen Sodium granulation process using near-infrared spectroscopy and the development of water content and particle-size methods. For the water content method, one should consider a maximum water content of about 21% in the granulation process, which must be confirmed by the loss on drying. Further influences to be considered are the constantly changing product temperature, rising to about 54 °C, the creation of hydrated states of Naproxen Sodium when using a maximum of about 21% water content, and the large quantity of about 87% Naproxen Sodium in the formulation. It was considered to use a combination of these influences in developing the near-infrared spectroscopy method for the water content of Naproxen Sodium granules. The "Root Mean Square Error" was 0.25% for the calibration dataset and 0.30% for the validation dataset, which was obtained after different stages of optimization by multiplicative scatter correction and the first derivative. Using laser diffraction, the granules were analyzed for particle size, yielding the summary sieve sizes of >63 μm and >100 μm. The following influences should be considered for application in routine production: constant changes in water content up to 21% and a product temperature up to 54 °C. The different stages of optimization result in a "Root Mean Square Error" of 2.54% for the calibration data set and 3.53% for the validation set by using the Kubelka-Munk conversion and first derivative for the near-infrared spectroscopy method for a particle size >63 μm. For the near-infrared spectroscopy method using a particle size >100 μm, the "Root Mean Square Error" was 3.47% for the calibration data set and 4.51% for the validation set, while using the same pre-treatments. The robustness and suitability of this methodology has already been demonstrated by its recent successful implementation in a routine granulate production process. Copyright © 2018 Elsevier B.V. All rights reserved.
Comparative study of four time series methods in forecasting typhoid fever incidence in China.
Zhang, Xingyu; Liu, Yuanyuan; Yang, Min; Zhang, Tao; Young, Alistair A; Li, Xiaosong
2013-01-01
Accurate incidence forecasting of infectious disease is critical for early prevention and for better government strategic planning. In this paper, we present a comprehensive study of different forecasting methods based on the monthly incidence of typhoid fever. The seasonal autoregressive integrated moving average (SARIMA) model and three different models inspired by neural networks, namely, back propagation neural networks (BPNN), radial basis function neural networks (RBFNN), and Elman recurrent neural networks (ERNN) were compared. The differences as well as the advantages and disadvantages, among the SARIMA model and the neural networks were summarized and discussed. The data obtained for 2005 to 2009 and for 2010 from the Chinese Center for Disease Control and Prevention were used as modeling and forecasting samples, respectively. The performances were evaluated based on three metrics: mean absolute error (MAE), mean absolute percentage error (MAPE), and mean square error (MSE). The results showed that RBFNN obtained the smallest MAE, MAPE and MSE in both the modeling and forecasting processes. The performances of the four models ranked in descending order were: RBFNN, ERNN, BPNN and the SARIMA model.
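The three evaluation metrics named in this abstract (MAE, MAPE, and MSE) follow their standard definitions, sketched below for reference. The function names and the toy incidence figures are illustrative assumptions and are not data from the study.

```python
def mean_absolute_error(actual, forecast):
    """MAE: average absolute difference between observed and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_percentage_error(actual, forecast):
    """MAPE in percent; assumes no observed value is zero."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)


def mean_square_error(actual, forecast):
    """MSE: average squared difference between observed and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    # Hypothetical monthly case counts and model forecasts.
    observed = [100.0, 200.0, 400.0]
    predicted = [110.0, 190.0, 400.0]
    print("MAE :", mean_absolute_error(observed, predicted))
    print("MAPE:", mean_absolute_percentage_error(observed, predicted))
    print("MSE :", mean_square_error(observed, predicted))
```

Because MSE squares each error it penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the observed incidence, which is why comparisons of this kind typically report all three.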
Comparative Study of Four Time Series Methods in Forecasting Typhoid Fever Incidence in China
Zhang, Xingyu; Liu, Yuanyuan; Yang, Min; Zhang, Tao; Young, Alistair A.; Li, Xiaosong
2013-01-01
Accurate incidence forecasting of infectious disease is critical for early prevention and for better government strategic planning. In this paper, we present a comprehensive study of different forecasting methods based on the monthly incidence of typhoid fever. The seasonal autoregressive integrated moving average (SARIMA) model and three different models inspired by neural networks, namely, back propagation neural networks (BPNN), radial basis function neural networks (RBFNN), and Elman recurrent neural networks (ERNN) were compared. The differences as well as the advantages and disadvantages, among the SARIMA model and the neural networks were summarized and discussed. The data obtained for 2005 to 2009 and for 2010 from the Chinese Center for Disease Control and Prevention were used as modeling and forecasting samples, respectively. The performances were evaluated based on three metrics: mean absolute error (MAE), mean absolute percentage error (MAPE), and mean square error (MSE). The results showed that RBFNN obtained the smallest MAE, MAPE and MSE in both the modeling and forecasting processes. The performances of the four models ranked in descending order were: RBFNN, ERNN, BPNN and the SARIMA model. PMID:23650546
[NIR Assignment of Magnolol by 2D-COS Technology and Model Application Huoxiangzhengqi Oral Liduid].
Pei, Yan-ling; Wu, Zhi-sheng; Shi, Xin-yuan; Pan, Xiao-ning; Peng, Yan-fang; Qiao, Yan-jiang
2015-08-01
Near infrared (NIR) spectroscopy assignment of Magnolol was performed using deuterated chloroform solvent and two-dimensional correlation spectroscopy (2D-COS) technology. According to the synchronous spectra of deuterated chloroform solvent and Magnolol, 1365~1455, 1600~1720, 2000~2181 and 2275~2465 nm were the characteristic absorption bands of Magnolol. In relation to the structure of Magnolol, 1440 nm was the stretching vibration of the phenolic O-H group, 1679 nm was the stretching vibration of aryl and of methyl connected to aryl, 2117, 2304, 2339 and 2370 nm were combinations of the stretching, bending and deformation vibrations of aryl C-H, and 2445 nm was the bending vibration of methyl linked to the aryl group; these bands are attributed to the characteristics of Magnolol. Huoxiangzhengqi Oral Liquid was adopted to study Magnolol: the characteristic band from spectral assignment and the bands from interval Partial Least Squares (iPLS) and Synergy interval Partial Least Squares (SiPLS) were used to establish a Partial Least Squares (PLS) quantitative model. The coefficients of determination R2cal and R2pre were greater than 0.99, and the Root Mean Square Error of Calibration (RMSEC), Root Mean Square Error of Cross Validation (RMSECV) and Root Mean Square Error of Prediction (RMSEP) were very small. This indicated that the characteristic band from spectral assignment gives the same results as the chemometric band selection in the PLS model. It provided a reference for NIR spectral assignment of chemical compositions in Chinese Materia Medica, and the band filters of NIR were interpreted.
August Median Streamflow on Ungaged Streams in Eastern Aroostook County, Maine
Lombard, Pamela J.; Tasker, Gary D.; Nielsen, Martha G.
2003-01-01
Methods for estimating August median streamflow were developed for ungaged, unregulated streams in the eastern part of Aroostook County, Maine, with drainage areas from 0.38 to 43 square miles and mean basin elevations from 437 to 1,024 feet. Few long-term, continuous-record streamflow-gaging stations with small drainage areas were available from which to develop the equations; therefore, 24 partial-record gaging stations were established in this investigation. A mathematical technique for estimating a standard low-flow statistic, August median streamflow, at partial-record stations was applied by relating base-flow measurements at these stations to concurrent daily flows at nearby long-term, continuous-record streamflow- gaging stations (index stations). Generalized least-squares regression analysis (GLS) was used to relate estimates of August median streamflow at gaging stations to basin characteristics at these same stations to develop equations that can be applied to estimate August median streamflow on ungaged streams. GLS accounts for varying periods of record at the gaging stations and the cross correlation of concurrent streamflows among gaging stations. Twenty-three partial-record stations and one continuous-record station were used for the final regression equations. The basin characteristics of drainage area and mean basin elevation are used in the calculated regression equation for ungaged streams to estimate August median flow. The equation has an average standard error of prediction from -38 to 62 percent. A one-variable equation uses only drainage area to estimate August median streamflow when less accuracy is acceptable. This equation has an average standard error of prediction from -40 to 67 percent. Model error is larger than sampling error for both equations, indicating that additional basin characteristics could be important to improved estimates of low-flow statistics. 
Weighted estimates of August median streamflow, which can be used when making estimates at partial-record or continuous-record gaging stations, range from 0.03 to 11.7 cubic feet per second or from 0.1 to 0.4 cubic feet per second per square mile. Estimates of August median streamflow on ungaged streams in the eastern part of Aroostook County, within the range of acceptable explanatory variables, range from 0.03 to 30 cubic feet per second or 0.1 to 0.7 cubic feet per second per square mile. Estimates of August median streamflow per square mile of drainage area generally increase as mean elevation and drainage area increase.
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Xi, Xiuxiu
2015-01-01
The measurement of soil total nitrogen (TN) by hyperspectral remote sensing provides an important tool for soil restoration programs in areas with subsided land caused by the extraction of natural resources. This study used the local correlation maximization-complementary superiority method (LCMCS) to establish TN prediction models by considering the relationship between spectral reflectance (measured by an ASD FieldSpec 3 spectroradiometer) and TN, based on spectral reflectance curves of soil samples collected from subsided land identified by synthetic aperture radar interferometry (InSAR) technology. Based on the 1655 selected effective bands of the optimal spectrum (OSP) of the first derivative of the reciprocal logarithm ([log{1/R}]′) (correlation coefficients, p < 0.01), the optimal model of the LCMCS method was obtained to determine the final model, which produced lower prediction errors (root mean square error of validation [RMSEV] = 0.89, mean relative error of validation [MREV] = 5.93%) when compared with models built by the local correlation maximization (LCM), complementary superiority (CS) and partial least squares regression (PLS) methods. The predictive effect of the LCMCS model was optimal in Cangzhou, Renqiu and Fengfeng District. Results indicate that the LCMCS method has great potential to monitor TN in subsided lands caused by the extraction of natural resources including groundwater, oil and coal. PMID:26213935
Ruangsetakit, Varee
2015-11-01
To re-examine the relative accuracy of intraocular lens (IOL) power calculation by immersion ultrasound biometry (IUB) and partial coherence interferometry (PCI) using a new approach that restricts attention to cases in which the IUB and PCI IOL assignments disagree. Prospective observational study of 108 eyes that underwent cataract surgery at Taksin Hospital. Two halves of the randomly chosen sample eyes were implanted with the IUB- and PCI-assigned lens. Postoperative refractive errors were measured in the fifth week. The more accurate calculation was identified by significantly smaller mean absolute errors (MAEs) and root mean squared errors (RMSEs) away from emmetropia. The distributions of the errors were examined to ensure that the higher accuracy was significant clinically as well. The (MAE, RMSE) values were smaller for PCI (0.5106 diopter (D), 0.6037 D) than for IUB (0.7000 D, 0.8062 D). The higher accuracy was principally contributed by negative errors, i.e., myopia. The MAEs and RMSEs for the (IUB, PCI) negative errors were (0.7955 D, 0.5185 D) and (0.8562 D, 0.5853 D). Their differences were significant. Of the PCI errors, 72.34% fell within a clinically accepted range of ± 0.50 D, whereas 50% of IUB errors did. PCI's higher accuracy was significant statistically and clinically, meaning that lens implantation based on PCI's assignments could improve postoperative outcomes over those based on IUB's assignments.
Hypoglycemia early alarm systems based on recursive autoregressive partial least squares models.
Bayrak, Elif Seyma; Turksoy, Kamuran; Cinar, Ali; Quinn, Lauretta; Littlejohn, Elizabeth; Rollins, Derrick
2013-01-01
Hypoglycemia caused by intensive insulin therapy is a major challenge for artificial pancreas systems. Early detection and prevention of potential hypoglycemia are essential for the acceptance of fully automated artificial pancreas systems. Many of the proposed alarm systems are based on interpretation of recent values or trends in glucose values. In the present study, subject-specific linear models are introduced to capture glucose variations and predict future blood glucose concentrations. These models can be used in early alarm systems of potential hypoglycemia. A recursive autoregressive partial least squares (RARPLS) algorithm is used to model the continuous glucose monitoring sensor data and predict future glucose concentrations for use in hypoglycemia alarm systems. The partial least squares models constructed are updated recursively at each sampling step with a moving window. An early hypoglycemia alarm algorithm using these models is proposed and evaluated. Glucose prediction models based on real-time filtered data have a root mean squared error of 7.79 and a sum of squares of glucose prediction error of 7.35% for six-step-ahead (30 min) glucose predictions. The early alarm systems based on RARPLS show good performance. A sensitivity of 86% and a false alarm rate of 0.42 false positive/day are obtained for the early alarm system based on six-step-ahead predicted glucose values with an average early detection time of 25.25 min. The RARPLS models developed provide satisfactory glucose prediction with relatively smaller error than other proposed algorithms and are good candidates to forecast and warn about potential hypoglycemia unless preventive action is taken far in advance. © 2012 Diabetes Technology Society.
Hypoglycemia Early Alarm Systems Based on Recursive Autoregressive Partial Least Squares Models
Bayrak, Elif Seyma; Turksoy, Kamuran; Cinar, Ali; Quinn, Lauretta; Littlejohn, Elizabeth; Rollins, Derrick
2013-01-01
Background Hypoglycemia caused by intensive insulin therapy is a major challenge for artificial pancreas systems. Early detection and prevention of potential hypoglycemia are essential for the acceptance of fully automated artificial pancreas systems. Many of the proposed alarm systems are based on interpretation of recent values or trends in glucose values. In the present study, subject-specific linear models are introduced to capture glucose variations and predict future blood glucose concentrations. These models can be used in early alarm systems of potential hypoglycemia. Methods A recursive autoregressive partial least squares (RARPLS) algorithm is used to model the continuous glucose monitoring sensor data and predict future glucose concentrations for use in hypoglycemia alarm systems. The partial least squares models constructed are updated recursively at each sampling step with a moving window. An early hypoglycemia alarm algorithm using these models is proposed and evaluated. Results Glucose prediction models based on real-time filtered data have a root mean squared error of 7.79 and a sum of squares of glucose prediction error of 7.35% for six-step-ahead (30 min) glucose predictions. The early alarm systems based on RARPLS show good performance. A sensitivity of 86% and a false alarm rate of 0.42 false positive/day are obtained for the early alarm system based on six-step-ahead predicted glucose values with an average early detection time of 25.25 min. Conclusions The RARPLS models developed provide satisfactory glucose prediction with relatively smaller error than other proposed algorithms and are good candidates to forecast and warn about potential hypoglycemia unless preventive action is taken far in advance. PMID:23439179
Tests of Independence in Contingency Tables with Small Samples: A Comparison of Statistical Power.
ERIC Educational Resources Information Center
Parshall, Cynthia G.; Kromrey, Jeffrey D.
1996-01-01
Power and Type I error rates were estimated for contingency tables with small sample sizes for the following four types of tests: (1) Pearson's chi-square; (2) chi-square with Yates's continuity correction; (3) the likelihood ratio test; and (4) Fisher's Exact Test. Various marginal distributions, sample sizes, and effect sizes were examined. (SLD)
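The four tests compared in this abstract can be written out for the simplest case, a 2x2 table with cells a, b (first row) and c, d (second row). The sketch below uses only the standard library and the textbook formulas; the function names are hypothetical, the chi-square p-values use the closed form valid for one degree of freedom only, and the two-sided Fisher p-value sums all tables no more probable than the observed one, which is one common convention among several.

```python
import math


def _expected(a, b, c, d):
    """Expected cell counts under independence, in the order a, b, c, d."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [rows[i] * cols[j] / n for i in (0, 1) for j in (0, 1)]


def _chi2_p_value_1df(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def pearson_chi_square(a, b, c, d, yates=False):
    """Pearson's chi-square, optionally with Yates's continuity correction."""
    stat = 0.0
    for obs, exp in zip((a, b, c, d), _expected(a, b, c, d)):
        diff = abs(obs - exp)
        if yates:
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / exp
    return stat, _chi2_p_value_1df(stat)


def likelihood_ratio(a, b, c, d):
    """Likelihood ratio (G) statistic; empty cells contribute zero."""
    stat = 2.0 * sum(obs * math.log(obs / exp)
                     for obs, exp in zip((a, b, c, d), _expected(a, b, c, d))
                     if obs > 0)
    return stat, _chi2_p_value_1df(stat)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    low, high = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    table = (3, 1, 1, 3)  # small-sample toy table
    print("Pearson:", pearson_chi_square(*table))
    print("Yates  :", pearson_chi_square(*table, yates=True))
    print("G test :", likelihood_ratio(*table))
    print("Fisher :", fisher_exact_two_sided(*table))
```

Running the four functions on the same small table shows how far the p-values can diverge at small sample sizes, which is the situation the power comparison addresses.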
A Robust Bayesian Random Effects Model for Nonlinear Calibration Problems
Fong, Y.; Wakefield, J.; De Rosa, S.; Frahm, N.
2013-01-01
Summary In the context of a bioassay or an immunoassay, calibration means fitting a curve, usually nonlinear, through the observations collected on a set of samples containing known concentrations of a target substance, and then using the fitted curve and observations collected on samples of interest to predict the concentrations of the target substance in these samples. Recent technological advances have greatly improved our ability to quantify minute amounts of substance from a tiny volume of biological sample. This has in turn led to a need to improve statistical methods for calibration. In this paper, we focus on developing calibration methods robust to dependent outliers. We introduce a novel normal mixture model with dependent error terms to model the experimental noise. In addition, we propose a re-parameterization of the five parameter logistic nonlinear regression model that allows us to better incorporate prior information. We examine the performance of our methods with simulation studies and show that they lead to a substantial increase in performance measured in terms of mean squared error of estimation and a measure of the average prediction accuracy. A real data example from the HIV Vaccine Trials Network Laboratory is used to illustrate the methods. PMID:22551415
NASA Astrophysics Data System (ADS)
Adineh-Vand, A.; Torabi, M.; Roshani, G. H.; Taghipour, M.; Feghhi, S. A. H.; Rezaei, M.; Sadati, S. M.
2013-09-01
This paper presents a soft-computing-based artificial intelligence technique, the adaptive neuro-fuzzy inference system (ANFIS), to predict the neutron production rate (NPR) of the IR-IECF device over wide discharge current and voltage ranges. A hybrid learning algorithm consisting of back-propagation and least-squares estimation is used for training the ANFIS model. The performance of the proposed ANFIS model is tested against the experimental data using four performance measures: correlation coefficient, mean absolute error, mean relative error percentage (MRE%) and root mean square error. The obtained results show that the proposed ANFIS model has achieved good agreement with the experimental results. In comparison to the experimental data, the proposed ANFIS model has MRE% below 1.53% and 2.85% for training and testing data, respectively. Therefore, this model can be used as an efficient tool to predict the NPR in the IR-IECF device.
Quantitative transmission Raman spectroscopy of pharmaceutical tablets and capsules.
Johansson, Jonas; Sparén, Anders; Svensson, Olof; Folestad, Staffan; Claybourn, Mike
2007-11-01
Quantitative analysis of pharmaceutical formulations using the new approach of transmission Raman spectroscopy has been investigated. For comparison, measurements were also made in conventional backscatter mode. The experimental setup consisted of a Raman probe-based spectrometer with 785 nm excitation for measurements in backscatter mode. In transmission mode the same system was used to detect the Raman scattered light, while an external diode laser of the same type was used as excitation source. Quantitative partial least squares models were developed for both measurement modes. The results for tablets show that the prediction error for an independent test set was lower for the transmission measurements, with a relative root mean square error of about 2.2% as compared with 2.9% for the backscatter mode. Furthermore, the models were simpler in the transmission case, for which only a single partial least squares (PLS) component was required to explain the variation. The main reason for the improvement using the transmission mode is a more representative sampling of the tablets compared with the backscatter mode. Capsules containing mixtures of pharmaceutical powders were also assessed by transmission only. The quantitative results for the capsules' contents were good, with a prediction error of 3.6% w/w for an independent test set. The advantage of transmission Raman over backscatter Raman spectroscopy has been demonstrated for quantitative analysis of pharmaceutical formulations, and the prospects for reliable, lean calibrations for pharmaceutical analysis are discussed.
Dental age estimation in Japanese individuals combining permanent teeth and third molars.
Ramanan, Namratha; Thevissen, Patrick; Fleuws, Steffen; Willems, G
2012-12-01
The study aim was, firstly, to verify the Willems et al. model on a Japanese reference sample; secondly, to develop a Japanese reference model based on the Willems et al. method and to verify it; and thirdly, to analyze the age prediction performance when adding tooth development information of third molars to that of permanent teeth. Retrospectively, 1877 panoramic radiographs were selected in the age range between 1 and 23 years (1248 children, 629 sub-adults). Dental development was registered applying Demirjian's stages to the mandibular left permanent teeth in children and Köhler stages to the third molars. The children's data were, firstly, used to validate the Willems et al. model (developed on a Belgian reference sample) and, secondly, split into a training and a test sample. On the training sample a Japanese reference model was developed based on the Willems method. The developed model and the Willems et al. model were verified on the test sample. Regression analysis was used to detect the age prediction performance when adding third molar scores to permanent tooth scores. The validated Willems et al. model provided a mean absolute error of 0.85 and 0.75 years in females and males, respectively. The mean absolute error in the verified Willems et al. and the developed Japanese reference model was 0.85, 0.77 and 0.79, 0.75 years in females and males, respectively. On average a negligible change in root mean square error values was detected when adding third molar scores to permanent teeth scores. The Belgian sample could be used as a reference model to estimate the age of the Japanese individuals. Combining information from the third molars and permanent teeth did not provide a clinically significant improvement of age predictions based on permanent teeth information alone.
NASA Astrophysics Data System (ADS)
Taasti, Vicki T.; Michalak, Gregory J.; Hansen, David C.; Deisher, Amanda J.; Kruse, Jon J.; Krauss, Bernhard; Muren, Ludvig P.; Petersen, Jørgen B. B.; McCollough, Cynthia H.
2018-01-01
Dual energy CT (DECT) has been shown, in theoretical and phantom studies, to improve the stopping power ratio (SPR) determination used for proton treatment planning compared to the use of single energy CT (SECT). However, it has not been shown that this also extends to organic tissues. The purpose of this study was therefore to investigate the accuracy of SPR estimation for fresh pork and beef tissue samples used as surrogates of human tissues. The reference SPRs for fourteen tissue samples, which included fat, muscle and femur bone, were measured using proton pencil beams. The tissue samples were subsequently CT scanned using four different scanners with different dual energy acquisition modes, giving in total six DECT-based SPR estimations for each sample. The SPR was estimated using a proprietary algorithm (syngo.via DE Rho/Z Maps, Siemens Healthcare, Forchheim, Germany) for extracting the electron density and the effective atomic number. SECT images were also acquired and SECT-based SPR estimations were performed using a clinical Hounsfield look-up table. The mean and standard deviation of the SPR over large volumes of interest were calculated. For the six different DECT acquisition methods, the root-mean-square errors (RMSEs) for the SPR estimates over all tissue samples were between 0.9% and 1.5%. For the SECT-based SPR estimation the RMSE was 2.8%. For one DECT acquisition method, a positive bias was seen in the SPR estimates, with a mean error of 1.3%. The largest errors were found in the very dense cortical bone from a beef femur. This study confirms the advantages of DECT-based SPR estimation, although good results were also obtained using SECT for most tissues.
Analysis of S-box in Image Encryption Using Root Mean Square Error Method
NASA Astrophysics Data System (ADS)
Hussain, Iqtadar; Shah, Tariq; Gondal, Muhammad Asif; Mahmood, Hasan
2012-07-01
The use of substitution boxes (S-boxes) in encryption applications has proven to be an effective nonlinear component in creating confusion and randomness. The S-box is evolving and many variants appear in the literature, which include the advanced encryption standard (AES) S-box, affine power affine (APA) S-box, Skipjack S-box, Gray S-box, Lui J S-box, residue prime number S-box, Xyi S-box, and S8 S-box. These S-boxes have algebraic and statistical properties which distinguish them from each other in terms of encryption strength. In some circumstances, the parameters from algebraic and statistical analysis yield results which do not provide clear evidence in distinguishing an S-box for an application to a particular set of data. In image encryption applications, the use of S-boxes needs special care because the visual analysis and perception of a viewer can sometimes identify artifacts embedded in the image. In addition to the existing algebraic and statistical analysis already used for image encryption applications, we propose an application of the root mean square error technique, which further elaborates the results and enables the analyst to vividly distinguish between the performances of various S-boxes. While the use of the root mean square error analysis in statistics has proven to be effective in determining the difference between original data and processed data, its use in image encryption has shown promising results in estimating the strength of the encryption method. In this paper, we show the application of the root mean square error analysis to S-box image encryption. The parameters from this analysis are used in determining the strength of S-boxes.
Multilevel sequential Monte Carlo: Mean square error bounds under verifiable conditions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Del Moral, Pierre; Jasra, Ajay; Law, Kody J. H.
2017-01-09
We consider the multilevel sequential Monte Carlo (MLSMC) method of Beskos et al. (Stoch. Proc. Appl. [to appear]). This technique is designed to approximate expectations w.r.t. probability laws associated to a discretization, for instance, in the context of inverse problems, where one discretizes the solution of a partial differential equation. The MLSMC approach is especially useful when independent, coupled sampling is not possible. Beskos et al. show that for MLSMC the computational effort to achieve a given error can be less than independent sampling. In this article we significantly weaken the assumptions of Beskos et al., extending the proofs to non-compact state-spaces. The assumptions are based upon multiplicative drift conditions as in Kontoyiannis and Meyn (Electron. J. Probab. 10 [2005]: 61–123). The assumptions are verified for an example.
Distribution of kriging errors, the implications and how to communicate them
NASA Astrophysics Data System (ADS)
Li, Hong Yi; Milne, Alice; Webster, Richard
2016-04-01
Kriging in one form or another has become perhaps the most popular method for spatial prediction in environmental science. Each prediction is unbiased and of minimum variance, which itself is estimated. The kriging variances depend on the mathematical model chosen to describe the spatial variation; different models, however plausible, give rise to different minimized variances. Practitioners often compare models by so-called cross-validation before finally choosing the most appropriate for their kriging. One proceeds as follows. One removes a unit (a sampling point) from the whole set, kriges the value there and compares the kriged value with the value observed to obtain the deviation or error. One repeats the process for each and every point in turn and for all plausible models. One then computes the mean errors (MEs) and the mean of the squared errors (MSEs). Ideally a squared error should equal the corresponding kriging variance (σ_K^2), and so one is advised to choose the model for which on average the squared errors most nearly equal the kriging variances, i.e. the ratio MSDR = MSE/σ_K^2 ≈ 1. Maximum likelihood estimation of models almost guarantees that the MSDR equals 1, and so the kriging variances are unbiased predictors of the squared error across the region. The method is based on the assumption that the errors have a normal distribution. The squared deviation ratio (SDR) should therefore be distributed as χ^2 with one degree of freedom with a median of 0.455. We have found that often the median of the SDR (MedSDR) is less, in some instances much less, than 0.455 even though the mean of the SDR is close to 1. It seems that in these cases the distributions of the errors are leptokurtic, i.e. they have an excess of predictions close to the true values, excesses near the extremes and a dearth of predictions in between. In these cases the kriging variances are poor measures of the uncertainty at individual sites. The uncertainty is typically under-estimated for the extreme observations and compensated for by overestimating for other observations. Statisticians must tell users this when they present maps of predictions. We illustrate the situation with results from mapping salinity in land reclaimed from the Yangtze delta in the Gulf of Hangzhou, China. There the apparent electrical conductivity (ECa) of the topsoil was measured at 525 points in a field of 2.3 ha. The marginal distribution of the observations was strongly positively skewed, and so the observed ECas were transformed to their logarithms to give an approximately symmetric distribution. That distribution was strongly platykurtic with short tails and no evident outliers. The logarithms were analysed as a mixed model of quadratic drift plus correlated random residuals with a spherical variogram. The kriged predictions deviated from their true values with an MSDR of 0.993, but with a MedSDR of 0.324. The coefficient of kurtosis of the deviations was 1.45, i.e. substantially larger than 0 for a normal distribution. The reasons for this behaviour are being sought. The most likely explanation is that there are spatial outliers, i.e. points at which the observed values differ markedly from those at their closest neighbours.
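The cross-validation diagnostics described in this abstract (ME, MSE, MSDR and MedSDR) can be sketched as follows. This is a minimal illustration, assuming the leave-one-out kriged values and their kriging variances have already been computed elsewhere; the function names are hypothetical, and MSDR is taken here as the mean of the per-point squared deviation ratios.

```python
import statistics


def cross_validation_diagnostics(observed, predicted, kriging_variances):
    """Summarise leave-one-out kriging errors against their kriging variances."""
    errors = [p - o for o, p in zip(observed, predicted)]
    squared = [e * e for e in errors]
    # Squared deviation ratio: squared error divided by the kriging variance
    sdr = [s / v for s, v in zip(squared, kriging_variances)]
    return {
        "ME": statistics.fmean(errors),
        "MSE": statistics.fmean(squared),
        "MSDR": statistics.fmean(sdr),
        "MedSDR": statistics.median(sdr),
    }


def chi2_1df_median():
    """Median of chi-square with 1 df: the squared 75th percentile of N(0, 1)."""
    return statistics.NormalDist().inv_cdf(0.75) ** 2
```

Under normally distributed errors the MedSDR should be near `chi2_1df_median()`, roughly 0.455; a value well below that while the MSDR stays close to 1 is the pattern the abstract attributes to leptokurtic errors.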
Distribution of kriging errors, the implications and how to communicate them
NASA Astrophysics Data System (ADS)
Li, HongYi; Milne, Alice; Webster, Richard
2015-04-01
Kriging in one form or another has become perhaps the most popular method for spatial prediction in environmental science. Each prediction is unbiased and of minimum variance, which itself is estimated. The kriging variances depend on the mathematical model chosen to describe the spatial variation; different models, however plausible, give rise to different minimized variances. Practitioners often compare models by so-called cross-validation before finally choosing the most appropriate for their kriging. One proceeds as follows. One removes a unit (a sampling point) from the whole set, kriges the value there and compares the kriged value with the value observed to obtain the deviation or error. One repeats the process for each and every point in turn and for all plausible models. One then computes the mean errors (MEs) and the mean of the squared errors (MSEs). Ideally a squared error should equal the corresponding kriging variance (σ_K^2), and so one is advised to choose the model for which on average the squared errors most nearly equal the kriging variances, i.e. the ratio MSDR = MSE/σ_K^2 ≈ 1. Maximum likelihood estimation of models almost guarantees that the MSDR equals 1, and so the kriging variances are unbiased predictors of the squared error across the region. The method is based on the assumption that the errors have a normal distribution. The squared deviation ratio (SDR) should therefore be distributed as χ^2 with one degree of freedom with a median of 0.455. We have found that often the median of the SDR (MedSDR) is less, in some instances much less, than 0.455 even though the mean of the SDR is close to 1. It seems that in these cases the distributions of the errors are leptokurtic, i.e. they have an excess of predictions close to the true values, excesses near the extremes and a dearth of predictions in between. In these cases the kriging variances are poor measures of the uncertainty at individual sites. The uncertainty is typically under-estimated for the extreme observations and compensated for by overestimating for other observations. Statisticians must tell users this when they present maps of predictions. We illustrate the situation with results from mapping salinity in land reclaimed from the Yangtze delta in the Gulf of Hangzhou, China. There the apparent electrical conductivity (EC_a) of the topsoil was measured at 525 points in a field of 2.3 ha. The marginal distribution of the observations was strongly positively skewed, and so the observed EC_a values were transformed to their logarithms to give an approximately symmetric distribution. That distribution was strongly platykurtic with short tails and no evident outliers. The logarithms were analysed as a mixed model of quadratic drift plus correlated random residuals with a spherical variogram. The kriged predictions deviated from their true values with an MSDR of 0.993, but with a MedSDR of 0.324. The coefficient of kurtosis of the deviations was 1.45, i.e. substantially larger than 0 for a normal distribution. The reasons for this behaviour are being sought. The most likely explanation is that there are spatial outliers, i.e. points at which the observed values differ markedly from those at their closest neighbours.
Spectral combination of spherical gravitational curvature boundary-value problems
NASA Astrophysics Data System (ADS)
Pitoňák, Martin; Eshagh, Mehdi; Šprlák, Michal; Tenzer, Robert; Novák, Pavel
2018-04-01
Four solutions of the spherical gravitational curvature boundary-value problems can be exploited for the determination of the Earth's gravitational potential. In this article we discuss the combination of simulated satellite gravitational curvatures, i.e., components of the third-order gravitational tensor, by merging these solutions using the spectral combination method. For this purpose, integral estimators of biased- and unbiased-types are derived. In numerical studies, we investigate the performance of the developed mathematical models for the gravitational field modelling in the area of Central Europe based on simulated satellite measurements. Firstly, we verify the correctness of the integral estimators for the spectral downward continuation by a closed-loop test. Estimated errors of the combined solution are about eight orders smaller than those from the individual solutions. Secondly, we perform a numerical experiment by considering Gaussian noise with a standard deviation of 6.5 × 10^-17 m^-1 s^-2 in the input data at the satellite altitude of 250 km above the mean Earth sphere. This value of standard deviation is equivalent to a signal-to-noise ratio of 10. Superior results with respect to the global geopotential model TIM-r5 are obtained by the spectral downward continuation of the vertical-vertical-vertical component with a standard deviation of 2.104 m^2 s^-2, but the root mean square error is the largest and reaches 9.734 m^2 s^-2. Using the spectral combination of all gravitational curvatures, the root mean square error is more than 400 times smaller but the standard deviation reaches 17.234 m^2 s^-2. The combination of more components decreases the root mean square error of the corresponding solutions while the standard deviations of the combined solutions do not improve as compared to the solution from the vertical-vertical-vertical component. The presented method represents a weighted mean in the spectral domain that minimizes the root mean square error of the combined solutions and improves the standard deviation of the solution based only on the least accurate components.
Small convolution kernels for high-fidelity image restoration
NASA Technical Reports Server (NTRS)
Reichenbach, Stephen E.; Park, Stephen K.
1991-01-01
An algorithm is developed for computing the mean-square-optimal values for small, image-restoration kernels. The algorithm is based on a comprehensive, end-to-end imaging system model that accounts for the important components of the imaging process: the statistics of the scene, the point-spread function of the image-gathering device, sampling effects, noise, and display reconstruction. Subject to constraints on the spatial support of the kernel, the algorithm generates the kernel values that restore the image with maximum fidelity, that is, the kernel minimizes the expected mean-square restoration error. The algorithm is consistent with the derivation of the spatially unconstrained Wiener filter, but leads to a small, spatially constrained kernel that, unlike the unconstrained filter, can be efficiently implemented by convolution. Simulation experiments demonstrate that for a wide range of imaging systems these small kernels can restore images with fidelity comparable to images restored with the unconstrained Wiener filter.
The multicategory case of the sequential Bayesian pixel selection and estimation procedure
NASA Technical Reports Server (NTRS)
Pore, M. D.; Dennis, T. B. (Principal Investigator)
1980-01-01
A Bayesian technique for stratified proportion estimation and a sampling based on minimizing the mean squared error of this estimator were developed and tested on LANDSAT multispectral scanner data using the beta density function to model the prior distribution in the two-class case. An extension of this procedure to the k-class case is considered. A generalization of the beta function is shown to be a density function for the general case which allows the procedure to be extended.
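The step from the two-class beta prior to the k-class case can be illustrated with a conjugate update. This is only a sketch under the assumption that the generalization mentioned in the abstract behaves like a Dirichlet prior with multinomial counts; the function name and the parameter values are hypothetical. Under squared error loss, the posterior mean is the estimator that minimizes the posterior expected squared error.

```python
def posterior_mean_proportions(prior_alpha, counts):
    """Posterior mean class proportions for a Dirichlet prior and multinomial counts.

    With two classes this reduces to the beta-binomial result
    (a + x) / (a + b + n).
    """
    if len(prior_alpha) != len(counts):
        raise ValueError("prior_alpha and counts must have the same length")
    total = sum(prior_alpha) + sum(counts)
    return [(alpha + x) / total for alpha, x in zip(prior_alpha, counts)]
```

For example, a uniform prior `[1, 1, 1]` with labelled pixel counts `[2, 3, 5]` gives proportions `[3/13, 4/13, 6/13]`, which sum to one.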
Transmuted of Rayleigh Distribution with Estimation and Application on Noise Signal
NASA Astrophysics Data System (ADS)
Ahmed, Suhad; Qasim, Zainab
2018-05-01
This paper deals with transforming the one-parameter Rayleigh distribution into a transmuted probability distribution by introducing a new parameter (λ), since the studied distribution is needed to represent signal data distributions and failure data models. The transmuted parameter, with |λ| ≤ 1, is estimated along with the original parameter (θ) by the method of moments and by maximum likelihood using different sample sizes (n = 25, 50, 75, 100), and the estimation results are compared by a statistical measure (mean square error, MSE).
Selection within households in health surveys
Alves, Maria Cecilia Goi Porto; Escuder, Maria Mercedes Loureiro; Claro, Rafael Moreira; da Silva, Nilza Nunes
2014-01-01
OBJECTIVE To compare the efficiency and accuracy of sampling designs including and excluding the sampling of individuals within sampled households in health surveys. METHODS From a population survey conducted in the Baixada Santista Metropolitan Area, SP, Southeastern Brazil, between 2006 and 2007, 1,000 samples were drawn for each design and estimates for people aged 18 to 59 and 18 and over were calculated for each sample. In the first design, 40 census tracts, 12 households per sector, and one person per household were sampled. In the second, no sampling within the household was performed and 40 census sectors and 6 households for the 18 to 59-year-old group and 5 or 6 for the 18 and over age group were sampled. Precision and bias of proportion estimates for 11 indicators were assessed in the two final sets of the 1,000 selected samples with the two types of design. They were compared by means of relative measurements: coefficient of variation, bias/mean ratio, bias/standard error ratio, and relative mean square error. Comparison of costs contrasted basic cost per person, household cost, number of people, and households. RESULTS Bias was found to be negligible for both designs. A lower precision was found in the design including individual sampling within households, and the costs were higher. CONCLUSIONS The design excluding individual sampling achieved higher levels of efficiency and accuracy and, accordingly, should be the first choice for investigators. Sampling of household dwellers should be adopted when there are reasons related to the study subject that may lead to bias in individual responses if multiple dwellers answer the proposed questionnaire. PMID:24789641
Response Surface Modeling Using Multivariate Orthogonal Functions
NASA Technical Reports Server (NTRS)
Morelli, Eugene A.; DeLoach, Richard
2001-01-01
A nonlinear modeling technique was used to characterize response surfaces for non-dimensional longitudinal aerodynamic force and moment coefficients, based on wind tunnel data from a commercial jet transport model. Data were collected using two experimental procedures - one based on modern design of experiments (MDOE), and one using a classical one factor at a time (OFAT) approach. The nonlinear modeling technique used multivariate orthogonal functions generated from the independent variable data as modeling functions in a least squares context to characterize the response surfaces. Model terms were selected automatically using a prediction error metric. Prediction error bounds computed from the modeling data alone were found to be a good measure of actual prediction error for prediction points within the inference space. Root-mean-square model fit error and prediction error were less than 4 percent of the mean response value in all cases. Efficacy and prediction performance of the response surface models identified from both MDOE and OFAT experiments were investigated.
Determining the Uncertainty of X-Ray Absorption Measurements
Wojcik, Gary S.
2004-01-01
X-ray absorption (or more properly, x-ray attenuation) techniques have been applied to study the moisture movement in and moisture content of materials like cement paste, mortar, and wood. An increase in the number of x-ray counts with time at a location in a specimen may indicate a decrease in moisture content. The uncertainty of measurements from an x-ray absorption system, which must be known to properly interpret the data, is often assumed to be the square root of the number of counts, as in a Poisson process. No detailed studies have heretofore been conducted to determine the uncertainty of x-ray absorption measurements or the effect of averaging data on the uncertainty. In this study, the Poisson estimate was found to adequately approximate normalized root mean square errors (a measure of uncertainty) of counts for point measurements and profile measurements of water specimens. The Poisson estimate, however, was not reliable in approximating the magnitude of the uncertainty when averaging data from paste and mortar specimens. Changes in uncertainty from differing averaging procedures were well-approximated by a Poisson process. The normalized root mean square errors decreased when the x-ray source intensity, integration time, collimator size, and number of scanning repetitions increased. Uncertainties in mean paste and mortar count profiles were kept below 2 % by averaging vertical profiles at horizontal spacings of 1 mm or larger with counts per point above 4000. Maximum normalized root mean square errors did not exceed 10 % in any of the tests conducted. PMID:27366627
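The Poisson estimate referred to in this abstract can be sketched numerically. This is a minimal illustration, assuming independent counts with equal expected value; the function name is hypothetical. Note that the abstract reports that this estimate was not reliable for the magnitude of uncertainty when averaging paste and mortar data, so the numbers below describe the idealized model only.

```python
import math


def poisson_relative_uncertainty(counts, n_averaged=1):
    """Relative standard uncertainty sqrt(N)/N of a Poisson count.

    Averaging n_averaged independent measurements with the same expected
    count reduces the relative uncertainty by a factor sqrt(n_averaged).
    """
    if counts <= 0 or n_averaged < 1:
        raise ValueError("counts must be positive and n_averaged at least 1")
    return 1.0 / math.sqrt(counts * n_averaged)


# 4000 counts per point gives about 1.6 % under the Poisson model
single_point = poisson_relative_uncertainty(4000)
# averaging four such points halves that figure
four_points = poisson_relative_uncertainty(4000, n_averaged=4)
```

Under this model a relative uncertainty of 2 % corresponds to 2500 counts, so counts per point above 4000 sit comfortably below that threshold.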
Huang, Xinchuan; Schwenke, David W; Lee, Timothy J
2011-01-28
In this work, we build upon our previous work on the theoretical spectroscopy of ammonia, NH3. Compared to our 2008 study, we include more physics in our rovibrational calculations and more experimental data in the refinement procedure, and these enable us to produce a potential energy surface (PES) of unprecedented accuracy. We call this the HSL-2 PES. The additional physics we include is a second-order correction for the breakdown of the Born-Oppenheimer approximation, and we find it to be critical for improved results. By including experimental data for higher rotational levels in the refinement procedure, we were able to greatly reduce our systematic errors for the rotational dependence of our predictions. These additions together lead to a significantly improved total angular momentum (J) dependence in our computed rovibrational energies. The root-mean-square error between our predictions using the HSL-2 PES and the reliable energy levels from the HITRAN database for J = 0-6 and J = 7/8 for 14NH3 is only 0.015 cm^-1 and 0.020/0.023 cm^-1, respectively. The root-mean-square errors for the characteristic inversion splittings are approximately 1/3 smaller than those for energy levels. The root-mean-square error for the 6002 J = 0-8 transition energies is 0.020 cm^-1. Overall, for J = 0-8, the spectroscopic data computed with HSL-2 is roughly an order of magnitude more accurate relative to our previous best ammonia PES (denoted HSL-1). These impressive numbers are eclipsed only by the root-mean-square error between our predictions for purely rotational transition energies of 15NH3 and the highly accurate Cologne database (CDMS): 0.00034 cm^-1 (10 MHz), in other words, 2 orders of magnitude smaller. In addition, we identify a deficiency in the 15NH3 energy levels determined from a model of the experimental data.
Quantized kernel least mean square algorithm.
Chen, Badong; Zhao, Songlin; Zhu, Pingping; Príncipe, José C
2012-01-01
In this paper, we propose a quantization approach, as an alternative of sparsification, to curb the growth of the radial basis function structure in kernel adaptive filtering. The basic idea behind this method is to quantize and hence compress the input (or feature) space. Different from sparsification, the new approach uses the "redundant" data to update the coefficient of the closest center. In particular, a quantized kernel least mean square (QKLMS) algorithm is developed, which is based on a simple online vector quantization method. The analytical study of the mean square convergence has been carried out. The energy conservation relation for QKLMS is established, and on this basis we arrive at a sufficient condition for mean square convergence, and a lower and upper bound on the theoretical value of the steady-state excess mean square error. Static function estimation and short-term chaotic time-series prediction examples are presented to demonstrate the excellent performance.
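The quantization idea described in this abstract can be sketched as an online update: a new input either updates the coefficient of its closest existing center or becomes a new center. This is a minimal sketch, not the authors' implementation; the Gaussian kernel, the Euclidean distance test, the class name and the default parameter values are assumptions made for illustration.

```python
import math


class QuantizedKLMS:
    """Sketch of a quantized kernel least mean square filter."""

    def __init__(self, step_size=0.5, kernel_width=1.0, quantization_size=0.1):
        self.step_size = step_size
        self.kernel_width = kernel_width
        self.quantization_size = quantization_size
        self.centers = []  # stored input vectors (the codebook)
        self.coeffs = []   # one expansion coefficient per center

    def _kernel(self, u, c):
        # Gaussian kernel (an assumption of this sketch)
        return math.exp(-math.dist(u, c) ** 2 / (2 * self.kernel_width ** 2))

    def predict(self, u):
        return sum(a * self._kernel(u, c) for a, c in zip(self.coeffs, self.centers))

    def update(self, u, desired):
        """Process one sample and return the a priori prediction error."""
        error = desired - self.predict(u)
        if self.centers:
            dists = [math.dist(u, c) for c in self.centers]
            nearest = min(range(len(dists)), key=dists.__getitem__)
            if dists[nearest] <= self.quantization_size:
                # "Redundant" input: update the closest center's coefficient
                self.coeffs[nearest] += self.step_size * error
                return error
        # Input is far from every center: grow the network
        self.centers.append(tuple(u))
        self.coeffs.append(self.step_size * error)
        return error
```

A larger `quantization_size` keeps the network smaller at the cost of a coarser approximation, which is the trade-off the steady-state error bounds in the abstract are concerned with.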
Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; de Azevedo Mello, Paola; Ferrão, Marco Flores; de Fátima Pereira dos Santos, Maria; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes
2012-04-01
Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection models: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of models. The pre-treatment based on multiplicative scatter correction (MSC) and the mean centered data were selected for model construction. The use of siPLS as variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by the PLS model using all variables. The best model was obtained using the siPLS algorithm with spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm^-1). This model produced a RMSECV of 400 mg kg^-1 S and RMSEP of 420 mg kg^-1 S, showing a correlation coefficient of 0.990. Copyright © 2011 Elsevier B.V. All rights reserved.
Application of a bioenergetics model for hatchery production: Largemouth bass fed commercial diets
Csargo, Isak J.; Brown, Michael L.; Chipps, Steven R.
2012-01-01
Fish bioenergetics models based on natural prey items have been widely used to address research and management questions. However, few attempts have been made to evaluate and apply bioenergetics models to hatchery-reared fish receiving commercial feeds that contain substantially higher energy densities than natural prey. In this study, we evaluated a bioenergetics model for age-0 largemouth bass Micropterus salmoides reared on four commercial feeds. Largemouth bass (n ≈ 3,504) were reared for 70 d at 25°C in sixteen 833-L circular tanks connected in parallel to a recirculation system. Model performance was evaluated using error components (mean, slope, and random) derived from decomposition of the mean square error obtained from regression of observed on predicted values. Mean predicted consumption was only 8.9% lower than mean observed consumption and was similar to error rates observed for largemouth bass consuming natural prey. Model evaluation showed that the 97.5% joint confidence region included the intercept of 0 (−0.43 ± 3.65) and slope of 1 (1.08 ± 0.20), which indicates the model accurately predicted consumption. Moreover, model error was similar among feeds (P = 0.98), and most error was probably attributable to sampling error (unconsumed feed), underestimated predator energy densities, or consumption-dependent error, which is common in bioenergetics models. This bioenergetics model could provide a valuable tool in hatchery production of largemouth bass. Furthermore, we believe that bioenergetics modeling could be useful in aquaculture production, particularly for species lacking historical hatchery constants or conventional growth models.
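The decomposition of mean square error into mean, slope and random components mentioned in this abstract can be sketched as below. This assumes the common Theil-type partition based on the regression of observed on predicted values, with population (divide-by-n) variances; whether the study used exactly this form is an assumption, and the function name is hypothetical.

```python
import statistics


def mse_components(observed, predicted):
    """Split MSE into mean, slope and random parts (they sum to the MSE)."""
    n = len(observed)
    mean_o = statistics.fmean(observed)
    mean_p = statistics.fmean(predicted)
    var_o = statistics.pvariance(observed)
    var_p = statistics.pvariance(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    slope = cov / var_p                # regression of observed on predicted
    r_squared = cov ** 2 / (var_o * var_p)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return {
        "mse": mse,
        "mean": (mean_o - mean_p) ** 2,        # systematic bias
        "slope": (1 - slope) ** 2 * var_p,     # departure of the slope from 1
        "random": (1 - r_squared) * var_o,     # unexplained scatter
    }
```

A model that predicts well should put most of the error in the random component, with the mean and slope components close to zero.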
NASA Technical Reports Server (NTRS)
Rice, R. F.
1976-01-01
The root-mean-square error performance measure is used to compare the relative performance of several widely known source coding algorithms with the RM2 image data compression system. The results demonstrate that RM2 has a uniformly significant performance advantage.
Campos, Juliana Alvares Duarte Bonini; Spexoto, Maria Cláudia Bernardes; Serrano, Sergio Vicente; Maroco, João
2016-01-13
The psychometric properties of an instrument should be evaluated routinely when using different samples. This study evaluated the psychometric properties of the Functional Assessment of Cancer Therapy-General (FACT-G) when applied to a sample of Brazilian cancer patients. The face, content, and construct (factorial, convergent, and discriminant) validities of the FACT-G were estimated. Confirmatory factor analysis (CFA) was conducted using the ratio of chi-square to degrees of freedom (χ²/df), the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA) as indices. The invariance of the best model was assessed with multi-group analysis using the difference of chi-squares method (Δχ²). Convergent validity was assessed using Average Variance Extracted (AVE) and discriminant validity was determined via correlational analysis. Internal consistency was assessed using the Cronbach's alpha (α) coefficient, and the Composite Reliability (CR) was estimated. A total of 975 cancer patients participated in the study, with a mean age of 53.3 (SD = 13.0) years. Of these participants, 61.5% were women. In CFA, five correlations between errors were included to fit the FACT-G to the sample (χ²/df = 8.611, CFI = .913, TLI = .902, RMSEA = .088). The model was not invariant across independent samples (Δχ²: μ: p < .001, i: p < .958, Cov: p < .001, Res: p < .001). While there was adequate convergent validity for the physical well-being (AVE = .54) and social and family well-being factors (AVE = .55), there was low convergent validity for the other factors. Reliability was adequate (CR = .76-.89 and α = .71-.82). Functional well-being, emotional well-being, and physical well-being were the factors that demonstrated a strong contribution to patients' health-related quality of life (β = -.99, .88, and .64, respectively).
The FACT-G was found to be a valid and reliable assessment of health-related quality of life in a Brazilian sample of patients with cancer.
Teng, C-C; Chai, H; Lai, D-M; Wang, S-F
2007-02-01
Previous research has shown that there is no significant relationship between the degree of structural degeneration of the cervical spine and neck pain. We therefore sought to investigate the potential role of sensory dysfunction in chronic neck pain. Cervicocephalic kinesthetic sensibility, expressed by how accurately an individual can reposition the head, was studied in three groups of individuals, a control group of 20 asymptomatic young adults and two groups of middle-aged adults (20 subjects in each group) with or without a history of mild neck pain. An ultrasound-based three-dimensional coordinate measuring system was used to measure the position of the head and to test the accuracy of repositioning. Constant error (indicating that the subject overshot or undershot the intended position) and root mean square errors (representing total errors of accuracy and variability) were measured during repositioning of the head to the neutral head position (Head-to-NHP) and repositioning of the head to the target (Head-to-Target) in three cardinal planes (sagittal, transverse, and frontal). Analysis of covariance (ANCOVA) was used to test the group effect, with age used as a covariate. The constant errors during repositioning from a flexed position and from an extended position to the NHP were significantly greater in the middle-aged subjects than in the control group (beta=0.30 and beta=0.60, respectively; P<0.05 for both). In addition, the root mean square errors during repositioning from a flexed or extended position to the NHP were greater in the middle-aged subjects than in the control group (beta=0.27 and beta=0.49, respectively; P<0.05 for both). The root mean square errors also increased during Head-to-Target in left rotation (beta=0.24;P<0.05), but there was no difference in the constant errors or root mean square errors during Head-to-NHP repositioning from other target positions (P>0.05). 
The results indicate that, after controlling for age as a covariate, there was no group effect. Thus, age appears to have a profound effect on an individual's ability to accurately reposition the head toward the neutral position in the sagittal plane and repositioning the head toward left rotation. A history of mild chronic neck pain alone had no significant effect on cervicocephalic kinesthetic sensibility.
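The statement that the root mean square error represents total error of accuracy and variability can be made explicit with a standard identity. The symbols below are assumptions introduced for illustration: e_i is the signed repositioning error on trial i, CE the constant error, and VE the variable error (the population standard deviation of the errors).

```latex
\mathrm{CE} = \frac{1}{n}\sum_{i=1}^{n} e_i, \qquad
\mathrm{VE}^2 = \frac{1}{n}\sum_{i=1}^{n} \left(e_i - \mathrm{CE}\right)^2, \qquad
\mathrm{RMSE}^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2 = \mathrm{CE}^2 + \mathrm{VE}^2 .
```

Under this identity, a subject can show a near-zero constant error and still have a large root mean square error if overshoots and undershoots cancel.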
Mueller, Silke C; Drewelow, Bernd
2013-05-01
The area under the concentration-time curve (AUC) after oral midazolam administration is commonly used for cytochrome P450 (CYP) 3A phenotyping studies. The aim of this investigation was to evaluate a limited sampling strategy for the prediction of AUC with oral midazolam. A total of 288 concentration-time profiles from 123 healthy volunteers who participated in four previously performed drug interaction studies with intense sampling after a single oral dose of 7.5 mg midazolam were available for evaluation. Of these, 45 profiles served for model building, which was performed by stepwise multiple linear regression, and the remaining 243 datasets served for validation. Mean prediction error (MPE), mean absolute error (MAE) and root mean squared error (RMSE) were calculated to determine bias and precision. The one- to four-sampling point models with the best coefficient of correlation were the one-sampling point model (8 h; r² = 0.84), the two-sampling point model (0.5 and 8 h; r² = 0.93), the three-sampling point model (0.5, 2, and 8 h; r² = 0.96), and the four-sampling point model (0.5, 1, 2, and 8 h; r² = 0.97). However, the one- and two-sampling point models were unable to predict the midazolam AUC due to unacceptable bias and precision. Only the four-sampling point model predicted the very low and very high midazolam AUC of the validation dataset with acceptable precision and bias. The four-sampling point model was also able to predict the geometric mean ratio of the treatment phase over the baseline (with 90% confidence interval) results of three drug interaction studies in the categories of strong, moderate, and mild induction, as well as no interaction. A four-sampling point limited sampling strategy to predict the oral midazolam AUC for CYP3A phenotyping is proposed. The one-, two- and three-sampling point models were not able to predict midazolam AUC accurately.
NASA Technical Reports Server (NTRS)
Lin, Qian; Allebach, Jan P.
1990-01-01
An adaptive vector linear minimum mean-squared error (LMMSE) filter for multichannel images with multiplicative noise is presented. It is shown theoretically that the mean-squared error in the filter output is reduced by making use of the correlation between image bands. The vector and conventional scalar LMMSE filters are applied to a three-band SIR-B SAR image, and their performance is compared. Based on a multiplicative noise model, the per-pel maximum likelihood classifier was derived. The authors extend this to the design of sequential and robust classifiers. These classifiers are also applied to the three-band SIR-B SAR image.
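For orientation, the conventional scalar LMMSE estimate that the vector filter is compared against can be sketched per pixel as below. This is not the authors' vector filter: it assumes the model y = x·n with unit-mean noise n independent of x, and the function name, arguments, and numbers are hypothetical.

```python
def scalar_lmmse_multiplicative(y, local_mean, local_var, noise_var):
    """Scalar LMMSE estimate of x from y = x * n (E[n] = 1, Var[n] = noise_var).

    local_mean and local_var are the mean and variance of y in a window
    around the pixel; how they are computed is left to the caller.
    """
    if local_var <= 0.0:
        return local_mean  # perfectly flat window: nothing to adapt to
    # Signal variance implied by Var[y] = Var[x]*(1 + noise_var) + mean^2*noise_var.
    signal_var = (local_var - local_mean ** 2 * noise_var) / (1.0 + noise_var)
    signal_var = max(signal_var, 0.0)  # clamp negative estimates from sampling noise
    gain = signal_var / local_var      # Cov(x, y) / Var(y)
    return local_mean + gain * (y - local_mean)


# Homogeneous area: all local variance is explained by noise, so the
# estimate falls back to the local mean.
print(scalar_lmmse_multiplicative(y=120.0, local_mean=100.0,
                                  local_var=100.0 ** 2 * 0.04, noise_var=0.04))
```

In textured areas the gain approaches 1 and the pixel is left nearly unchanged, which is what makes the filter adaptive.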
An empirical Bayes approach for the Poisson life distribution.
NASA Technical Reports Server (NTRS)
Canavos, G. C.
1973-01-01
A smooth empirical Bayes estimator is derived for the intensity parameter (hazard rate) in the Poisson distribution as used in life testing. The reliability function is also estimated either by using the empirical Bayes estimate of the parameter, or by obtaining the expectation of the reliability function. The behavior of the empirical Bayes procedure is studied through Monte Carlo simulation in which estimates of mean-squared errors of the empirical Bayes estimators are compared with those of conventional estimators such as minimum variance unbiased or maximum likelihood. Results indicate a significant reduction in mean-squared error of the empirical Bayes estimators over the conventional variety.
[Gaussian process regression and its application in near-infrared spectroscopy analysis].
Feng, Ai-Ming; Fang, Li-Min; Lin, Min
2011-06-01
Gaussian process (GP) is applied in the present paper as a chemometric method to explore the complicated relationship between the near infrared (NIR) spectra and ingredients. After the outliers were detected by the Monte Carlo cross validation (MCCV) method and removed from the dataset, different preprocessing methods, such as multiplicative scatter correction (MSC), smoothing and derivative, were tried for the best performance of the models. Furthermore, uninformative variable elimination (UVE) was introduced as a variable selection technique and the characteristic wavelengths obtained were further employed as input for modeling. A public dataset with 80 NIR spectra of corn was introduced as an example for evaluating the new algorithm. The optimal models for oil, starch and protein were obtained by the GP regression method. The performance of the final models was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP) and correlation coefficient (r). The models give good calibration ability with r values above 0.99 and the prediction ability is also satisfactory with r values higher than 0.96. The overall results demonstrate that the GP algorithm is an effective chemometric method and is promising for NIR analysis.
Basalekou, M.; Pappas, C.; Kotseridis, Y.; Tarantilis, P. A.; Kontaxakis, E.
2017-01-01
Color, phenolic content, and chemical age values of red wines made from Cretan grape varieties (Kotsifali, Mandilari) were evaluated over nine months of maturation in different containers for two vintages. The wines differed greatly in their anthocyanin profiles. Mid-IR spectra were also recorded with the use of a Fourier Transform Infrared Spectrophotometer in ZnSe disk mode. Analysis of Variance was used to explore the parameters' dependency on time. Determination models were developed for the chemical age indexes using Partial Least Squares (PLS) (TQ Analyst software) considering the spectral region 1830–1500 cm−1. The correlation coefficients (r) for chemical age index i were 0.86 for Kotsifali (Root Mean Square Error of Calibration (RMSEC) = 0.067, Root Mean Square Error of Prediction (RMSEP) = 0.115, and Root Mean Square Error of Cross-Validation (RMSECV) = 0.164) and 0.90 for Mandilari (RMSEC = 0.050, RMSEP = 0.040, and RMSECV = 0.089). For chemical age index ii the correlation coefficients (r) were 0.86 and 0.97 for Kotsifali (RMSEC = 0.044, RMSEP = 0.087, and RMSECV = 0.214) and Mandilari (RMSEC = 0.024, RMSEP = 0.033, and RMSECV = 0.078), respectively. The proposed method is simpler, less time consuming, and more economical and does not require chemical reagents. PMID:29225994
Assessing the external validity of algorithms to estimate EQ-5D-3L from the WOMAC.
Kiadaliri, Aliasghar A; Englund, Martin
2016-10-04
The use of mapping algorithms has been suggested as a solution to predict health utilities when no preference-based measure is included in the study. However, the validity and predictive performance of these algorithms are highly variable, and hence assessing the accuracy and validity of algorithms before using them in a new setting is important. The aim of the current study was to assess the predictive accuracy of three mapping algorithms to estimate the EQ-5D-3L from the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) among Swedish people with knee disorders. Two of these algorithms were developed using ordinary least squares (OLS) models and one was developed using a mixture model. The data from 1078 subjects, mean (SD) age 69.4 (7.2) years, with frequent knee pain and/or knee osteoarthritis from the Malmö Osteoarthritis study in Sweden were used. The algorithms' performance was assessed using mean error, mean absolute error, and root mean squared error. Two types of prediction were estimated for the mixture model: weighted average (WA), and conditional on estimated component (CEC). The overall mean was overpredicted by an OLS model and underpredicted by the two other algorithms (P < 0.001). All predictions but the CEC predictions of the mixture model had a narrower range than the observed scores (22 to 90%). All algorithms suffered from overprediction for severe health states and underprediction for mild health states, to a lesser extent for the mixture model. While the mixture model outperformed the OLS models at the extremes of the EQ-5D-3L distribution, it underperformed around the center of the distribution. While the algorithm based on the mixture model reflected the distribution of EQ-5D-3L data more accurately than the OLS models, all algorithms suffered from systematic bias. This calls for caution in applying these mapping algorithms in a new setting, particularly in samples with milder knee problems than the original sample.
Assessing the impact of the choice of these algorithms on cost-effectiveness studies through sensitivity analysis is recommended.
Demand forecasting of electricity in Indonesia with limited historical data
NASA Astrophysics Data System (ADS)
Dwi Kartikasari, Mujiati; Rohmad Prayogi, Arif
2018-03-01
Demand forecasting of electricity is an important activity that lets electrical agents anticipate future electricity demand. Electricity demand can be predicted using time series models. In this paper, the double moving average model, Holt's exponential smoothing model, and the grey model GM(1,1) are used to predict electricity demand in Indonesia under the condition of limited historical data. The results show that the grey model GM(1,1) has the smallest values of MAE (mean absolute error), MSE (mean squared error), and MAPE (mean absolute percentage error).
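The three error measures named above have standard definitions, sketched here in plain Python. The function names and the demand figures are hypothetical and are not taken from the paper; MAPE as written assumes no actual value is zero.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical demand series and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mae(actual, forecast), mse(actual, forecast), mape(actual, forecast))
```

Because MSE squares each error, it penalizes a single large miss more heavily than MAE or MAPE, so the three measures need not rank candidate models the same way.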
ERIC Educational Resources Information Center
Wilson, Celia M.
2010-01-01
Research pertaining to the distortion of the squared canonical correlation coefficient has traditionally been limited to the effects of sampling error and associated correction formulas. The purpose of this study was to compare the degree of attenuation of the squared canonical correlation coefficient under varying conditions of score reliability.…
Baird, Zachariah Steven; Oja, Vahur; Järvik, Oliver
2015-05-01
This article describes the use of Fourier transform infrared (FT-IR) spectroscopy to quantitatively measure the hydroxyl concentrations among narrow boiling shale oil cuts. Shale oil samples were from an industrial solid heat carrier retort. Reference values were measured by titration and were used to create a partial least squares regression model from FT-IR data. The model had a root mean squared error (RMSE) of 0.44 wt% OH. This method was then used to study the distribution of hydroxyl groups among more than 100 shale oil cuts, which showed that hydroxyl content increased with the average boiling point of the cut up to about 350 °C and then leveled off and decreased.
Skinner, Kenneth D.
2009-01-01
Elevation data in riverine environments can be used in various applications for which different levels of accuracy are required. The Experimental Advanced Airborne Research LiDAR (Light Detection and Ranging) - or EAARL - system was used to obtain topographic and bathymetric data along the lower Boise River, southwestern Idaho, for use in hydraulic and habitat modeling. The EAARL data were post-processed into bare earth and bathymetric raster and point datasets. Concurrently with the EAARL data collection, real-time kinematic global positioning system and total station ground-survey data were collected in three areas within the lower Boise River basin to assess the accuracy of the EAARL elevation data in different hydrogeomorphic settings. The accuracies of the EAARL-derived elevation data, determined in open, flat terrain, to provide an optimal vertical comparison surface, had root mean square errors ranging from 0.082 to 0.138 m. Accuracies for bank, floodplain, and in-stream bathymetric data had root mean square errors ranging from 0.090 to 0.583 m. The greater root mean square errors for the latter data are the result of high levels of turbidity in the downstream ground-survey area, dense tree canopy, and horizontal location discrepancies between the EAARL and ground-survey data in steeply sloping areas such as riverbanks. The EAARL point to ground-survey comparisons produced results similar to those for the EAARL raster to ground-survey comparisons, indicating that the interpolation of the EAARL points to rasters did not introduce significant additional error. The mean percent error for the wetted cross-sectional areas of the two upstream ground-survey areas was 1 percent. The mean percent error increases to -18 percent if the downstream ground-survey area is included, reflecting the influence of turbidity in that area.
Diffuse-flow conceptualization and simulation of the Edwards aquifer, San Antonio region, Texas
Lindgren, R.J.
2006-01-01
A numerical ground-water-flow model (hereinafter, the conduit-flow Edwards aquifer model) of the karstic Edwards aquifer in south-central Texas was developed for a previous study on the basis of a conceptualization emphasizing conduit development and conduit flow, and included simulating conduits as one-cell-wide, continuously connected features. Uncertainties regarding the degree to which conduits pervade the Edwards aquifer and influence ground-water flow, as well as other uncertainties inherent in simulating conduits, raised the question of whether a model based on the conduit-flow conceptualization was the optimum model for the Edwards aquifer. Accordingly, a model with an alternative hydraulic conductivity distribution without conduits was developed in a study conducted during 2004-05 by the U.S. Geological Survey, in cooperation with the San Antonio Water System. The hydraulic conductivity distribution for the modified Edwards aquifer model (hereinafter, the diffuse-flow Edwards aquifer model), based primarily on a conceptualization in which flow in the aquifer predominantly is through a network of numerous small fractures and openings, includes 38 zones, with hydraulic conductivities ranging from 3 to 50,000 feet per day. Revision of model input data for the diffuse-flow Edwards aquifer model was limited to changes in the simulated hydraulic conductivity distribution. The root-mean-square error for 144 target wells for the calibrated steady-state simulation for the diffuse-flow Edwards aquifer model is 20.9 feet. This error represents about 3 percent of the total head difference across the model area. The simulated springflows for Comal and San Marcos Springs for the calibrated steady-state simulation were within 2.4 and 15 percent of the median springflows for the two springs, respectively. 
The transient calibration period for the diffuse-flow Edwards aquifer model was 1947-2000, with 648 monthly stress periods, the same as for the conduit-flow Edwards aquifer model. The root-mean-square error for a period of drought (May-November 1956) for the calibrated transient simulation for 171 target wells is 33.4 feet, which represents about 5 percent of the total head difference across the model area. The root-mean-square error for a period of above-normal rainfall (November 1974-July 1975) for the calibrated transient simulation for 169 target wells is 25.8 feet, which represents about 4 percent of the total head difference across the model area. The root-mean-square error ranged from 6.3 to 30.4 feet in 12 target wells with long-term water-level measurements for varying periods during 1947-2000 for the calibrated transient simulation for the diffuse-flow Edwards aquifer model, and these errors represent 5.0 to 31.3 percent of the range in water-level fluctuations of each of those wells. The root-mean-square errors for the five major springs in the San Antonio segment of the aquifer for the calibrated transient simulation, as a percentage of the range of discharge fluctuations measured at the springs, varied from 7.2 percent for San Marcos Springs and 8.1 percent for Comal Springs to 28.8 percent for Leona Springs. The root-mean-square errors for hydraulic heads for the conduit-flow Edwards aquifer model are 27, 76, and 30 percent greater than those for the diffuse-flow Edwards aquifer model for the steady-state, drought, and above-normal rainfall synoptic time periods, respectively. The goodness-of-fit between measured and simulated springflows is similar for Comal, San Marcos, and Leona Springs for the diffuse-flow Edwards aquifer model and the conduit-flow Edwards aquifer model. 
The root-mean-square errors for Comal and Leona Springs were 15.6 and 21.3 percent less, respectively, whereas the root-mean-square error for San Marcos Springs was 3.3 percent greater for the diffuse-flow Edwards aquifer model compared to the conduit-flow Edwards aquifer model. The root-mean-square errors for San Antonio and San Pedro Springs were appreciably greater, 80.2 and 51.0 percent, respectively, for the diffuse-flow Edwards aquifer model. The simulated water budgets for the diffuse-flow Edwards aquifer model are similar to those for the conduit-flow Edwards aquifer model. Differences in percentage of total sources or discharges for a budget component are 2.0 percent or less for all budget components for the steady-state and transient simulations. The largest difference in terms of the magnitude of water budget components for the transient simulation for 1956 was a decrease of about 10,730 acre-feet per year (about 2 percent) in springflow for the diffuse-flow Edwards aquifer model compared to the conduit-flow Edwards aquifer model. This decrease in springflow (a water budget discharge) was largely offset by the decreased net loss of water from storage (a water budget source) of about 10,500 acre-feet per year.
NASA Astrophysics Data System (ADS)
Ying, Yibin; Liu, Yande; Tao, Yang
2005-09-01
This research evaluated the feasibility of using Fourier-transform near-infrared (FT-NIR) spectroscopy to quantify the soluble-solids content (SSC) and the available acidity (VA) in intact apples. Partial least-squares calibration models, obtained from several preprocessing techniques (smoothing, derivative, etc.) in several wave-number ranges, were compared. The best models were obtained with a high coefficient of determination (r) of 0.940 for the SSC and a moderate r of 0.801 for the VA, root-mean-square errors of prediction of 0.272% and 0.053%, and root-mean-square errors of calibration of 0.261% and 0.046%, respectively. The results indicate that FT-NIR spectroscopy yields good predictions of the SSC and also show the feasibility of using it to predict the VA of apples.
Comparison of structural and least-squares lines for estimating geologic relations
Williams, G.P.; Troutman, B.M.
1990-01-01
Two different goals in fitting straight lines to data are to estimate a "true" linear relation (physical law) and to predict values of the dependent variable with the smallest possible error. Regarding the first goal, a Monte Carlo study indicated that the structural-analysis (SA) method of fitting straight lines to data is superior to the ordinary least-squares (OLS) method for estimating "true" straight-line relations. Number of data points, slope and intercept of the true relation, and variances of the errors associated with the independent (X) and dependent (Y) variables influence the degree of agreement. For example, differences between the two line-fitting methods decrease as error in X becomes small relative to error in Y. Regarding the second goal, predicting the dependent variable, OLS is better than SA. Again, the difference diminishes as X takes on less error relative to Y. With respect to estimation of slope and intercept and prediction of Y, agreement between Monte Carlo results and large-sample theory was very good for sample sizes of 100, and fair to good for sample sizes of 20. The procedures and error measures are illustrated with two geologic examples. © 1990 International Association for Mathematical Geology.
Maritime Adaptive Optics Beam Control
2010-09-01
Liquid Crystal LMS Least Mean Square MIMO Multiple-Input Multiple-Output MMDM Micromachined Membrane Deformable Mirror MSE Mean Square Error...determine how the beam is distorted, a control computer to calculate the correction to be applied, and a corrective element, usually a deformable mirror...during this research, an overview of the system modification is provided here. Using additional mirrors and reflecting the beam to and from an
Evaluation of the depth-integration method of measuring water discharge in large rivers
Moody, J.A.; Troutman, B.M.
1992-01-01
The depth-integration method for measuring water discharge makes a continuous measurement of the water velocity from the water surface to the bottom at 20 to 40 locations or verticals across a river. It is especially practical for large rivers where river traffic makes it impractical to use boats attached to taglines strung across the river or to use current meters suspended from bridges. This method has the additional advantage over the standard two- and eight-tenths method in that a discharge-weighted suspended-sediment sample can be collected at the same time. When this method is used in large rivers such as the Missouri, Mississippi and Ohio, a microwave navigation system is used to determine the ship's position at each vertical sampling location across the river, and to make accurate velocity corrections to compensate for ship drift. An essential feature is a hydraulic winch that can lower and raise the current meter at a constant transit velocity so that the velocities at all depths are measured for equal lengths of time. Field calibration measurements show that: (1) the mean velocity measured on the upcast (bottom to surface) is within 1% of the standard mean velocity determined by 9-11 point measurements; (2) if the transit velocity is less than 25% of the mean velocity, then the average error in the mean velocity is 4% or less. The major source of bias error is a result of mounting the current meter above a sounding weight and sometimes above a suspended-sediment sampling bottle, which prevents measurement of the velocity all the way to the bottom. The measured mean velocity is slightly larger than the true mean velocity. This bias error in the discharge is largest in shallow water (approximately 8% for the Missouri River at Hermann, MO, where the mean depth was 4.3 m) and smallest in deeper water (approximately 3% for the Mississippi River at Vicksburg, MS, where the mean depth was 14.5 m).
The major source of random error in the discharge is the natural variability of river velocities, which we assumed to be independent and random at each vertical. The standard error of the estimated mean velocity, at an individual vertical sampling location, may be as large as 9%, for large sand-bed alluvial rivers. The computed discharge, however, is a weighted mean of these random velocities. Consequently the standard error of computed discharge is divided by the square root of the number of verticals, producing typical values between 1 and 2%. The discharges measured by the depth-integrated method agreed within ±5% of those measured simultaneously by the standard two- and eight-tenths, six-tenths and moving boat methods. © 1992.
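The square-root reduction described above can be checked with a short calculation. This sketch assumes independent errors and equal weights at every vertical (the abstract describes a weighted mean, so this is a simplification), and it uses the 9% upper figure and the 20 to 40 verticals quoted in the abstract; the function name is hypothetical.

```python
import math


def standard_error_of_mean(single_vertical_se_percent, n_verticals):
    """Standard error of an equally weighted mean of independent measurements."""
    return single_vertical_se_percent / math.sqrt(n_verticals)


for n in (20, 40):
    se = standard_error_of_mean(9.0, n)
    print(f"{n} verticals: about {se:.1f}% standard error in discharge")
```

Even at the 9% upper bound, 20 to 40 verticals bring the discharge standard error to roughly 2% or below, in line with the range reported.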
Szymanska-Chargot, M; Chylinska, M; Kruk, B; Zdunek, A
2015-01-22
The aim of this work was to quantitatively and qualitatively determine the composition of the cell wall material from apples during development by means of Fourier transform infrared (FT-IR) spectroscopy. The FT-IR region of 1500-800 cm⁻¹, containing characteristic bands for galacturonic acid, hemicellulose and cellulose, was examined using principal component analysis (PCA), k-means clustering and partial least squares (PLS). The samples were differentiated by development stage and cultivar using PCA and k-means clustering. PLS calibration models for galacturonic acid, hemicellulose and cellulose content from FT-IR spectra were developed and validated with the reference data. PLS models were tested using the root-mean-square errors of cross-validation for contents of galacturonic acid, hemicellulose and cellulose, which were 8.30 mg/g, 4.08% and 1.74%, respectively. It was proven that FT-IR spectroscopy combined with chemometric methods has potential for fast and reliable determination of the main constituents of fruit cell walls. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Yang, Renjie; Dong, Guimei; Sun, Xueshan; Yang, Yanrong; Yu, Yaping; Liu, Haixue; Zhang, Weiyu
2018-02-01
A new approach for quantitative determination of polycyclic aromatic hydrocarbons (PAHs) in the environment was proposed based on two-dimensional (2D) fluorescence correlation spectroscopy in conjunction with a multivariate method. Forty mixture solutions of anthracene and pyrene were prepared in the laboratory. Excitation-emission matrix (EEM) fluorescence spectra of all samples were collected, and 2D fluorescence correlation spectra were calculated under the excitation perturbation. The N-way partial least squares (N-PLS) models were developed based on 2D fluorescence correlation spectra, showing a root mean square error of calibration (RMSEC) of 3.50 μg L⁻¹ and a root mean square error of prediction (RMSEP) of 4.42 μg L⁻¹ for anthracene, and of 3.61 μg L⁻¹ and 4.29 μg L⁻¹ for pyrene, respectively. Also, the N-PLS models were developed for quantitative analysis of anthracene and pyrene using EEM fluorescence spectra. The RMSEC and RMSEP were 3.97 μg L⁻¹ and 4.63 μg L⁻¹ for anthracene, and 4.46 μg L⁻¹ and 4.52 μg L⁻¹ for pyrene, respectively. It was found that the N-PLS model using 2D fluorescence correlation spectra could provide better results compared with EEM fluorescence spectra because of its lower RMSEC and RMSEP. The methodology proposed has the potential to be an alternative method for detection of PAHs in the environment.
Boiret, Mathieu; Meunier, Loïc; Ginot, Yves-Michel
2011-02-20
A near infrared (NIR) method was developed for determination of tablet potency of active pharmaceutical ingredient (API) in a complex coated tablet matrix. The calibration set contained samples from laboratory and production scale batches. The reference values were obtained by high performance liquid chromatography (HPLC) and partial least squares (PLS) regression was used to establish a model. The model was challenged by calculating tablet potency of two external test sets. Root mean square errors of prediction were respectively equal to 2.0% and 2.7%. To use this model with a second spectrometer from the production field, a calibration transfer method called piecewise direct standardisation (PDS) was used. After the transfer, the root mean square error of prediction of the first test set was 2.4% compared to 4.0% without transferring the spectra. A statistical technique using bootstrap of PLS residuals was used to estimate confidence intervals of tablet potency calculations. This method requires an optimised PLS model, selection of the bootstrap number and determination of the risk. In the case of a chemical analysis, the tablet potency value will be included within the confidence interval calculated by the bootstrap method. An easy to use graphical interface was developed to easily determine if the predictions, surrounded by minimum and maximum values, are within the specifications defined by the regulatory organisation. Copyright © 2010 Elsevier B.V. All rights reserved.
Validation of Core Temperature Estimation Algorithm
2016-01-29
plot of observed versus estimated core temperature with the line of identity (dashed) and the least squares regression line (solid) and line equation...estimated PSI with the line of identity (dashed) and the least squares regression line (solid) and line equation in the top left corner. (b) Bland...for comparison. The root mean squared error (RMSE) was also computed, as given by Equation 2.
Wang, Junmei; Hou, Tingjun
2011-01-01
In this work, we have evaluated how well the General AMBER force field (GAFF) performs in studying the dynamic properties of liquids. Diffusion coefficients (D) have been predicted for 17 solvents, 5 organic compounds in aqueous solutions, 4 proteins in aqueous solutions, and 9 organic compounds in non-aqueous solutions. An efficient sampling strategy has been proposed and tested in the calculation of the diffusion coefficients of solutes in solutions. There are two major findings of this study. First, the diffusion coefficients of organic solutes in aqueous solution can be well predicted: the average unsigned error (AUE) and the root-mean-square error (RMSE) are 0.137 and 0.171 × 10⁻⁵ cm² s⁻¹, respectively. Second, although the absolute values of D cannot be predicted, good correlations with experimental data have been achieved for 8 organic solvents (R² = 0.784), 4 proteins in aqueous solutions (R² = 0.996) and 9 organic compounds in non-aqueous solutions (R² = 0.834). The temperature-dependent behaviors of three solvents, namely TIP3P water, dimethyl sulfoxide (DMSO) and cyclohexane, have been studied. The major MD settings, such as the sizes of simulation boxes and whether or not the coordinates of MD snapshots are wrapped into the primary simulation boxes, have been explored. We have concluded that our sampling strategy of averaging the mean square displacement (MSD) collected in multiple short MD simulations is efficient in predicting diffusion coefficients of solutes at infinite dilution. PMID:21953689
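The sampling strategy described above (averaging the MSD over several short runs, then extracting D) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the three-dimensional Einstein relation MSD(t) = 6Dt, an ordinary least-squares fit of the slope, and MSD curves already computed on a common time grid. The function names and the sample numbers are hypothetical.

```python
def average_msd(runs):
    """Average MSD curves from several short runs, point by point."""
    n_points = len(runs[0])
    if any(len(run) != n_points for run in runs):
        raise ValueError("all runs must share the same time grid")
    return [sum(run[i] for run in runs) / len(runs) for i in range(n_points)]


def least_squares_slope(times, values):
    """Slope of the ordinary least-squares line through (times, values)."""
    t_mean = sum(times) / len(times)
    v_mean = sum(values) / len(values)
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator


def diffusion_coefficient(times, runs):
    """Estimate D from the averaged MSD, assuming MSD(t) = 6 * D * t in 3D."""
    return least_squares_slope(times, average_msd(runs)) / 6.0


# Hypothetical MSD curves (in squared angstroms) from two short runs,
# sampled at 0, 1, 2 and 3 ps.
times = [0.0, 1.0, 2.0, 3.0]
runs = [[0.0, 5.0, 13.0, 17.0], [0.0, 7.0, 11.0, 19.0]]
d = diffusion_coefficient(times, runs)
print(d)  # in squared angstroms per ps; multiply by 1e-4 for cm^2/s
```

In practice the fit is usually restricted to the linear regime of the MSD curve; the sketch fits every point for brevity.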
The impact of multiple endpoint dependency on Q and I² in meta-analysis.
Thompson, Christopher Glen; Becker, Betsy Jane
2014-09-01
A common assumption in meta-analysis is that effect sizes are independent. When correlated effect sizes are analyzed using traditional univariate techniques, this assumption is violated. This research assesses the impact of dependence arising from treatment-control studies with multiple endpoints on homogeneity measures Q and I² in scenarios using the unbiased standardized-mean-difference effect size. Univariate and multivariate meta-analysis methods are examined. Conditions included different overall outcome effects, study sample sizes, numbers of studies, between-outcomes correlations, dependency structures, and ways of computing the correlation. The univariate approach used typical fixed-effects analyses whereas the multivariate approach used generalized least-squares (GLS) estimates of a fixed-effects model, weighted by the inverse variance-covariance matrix. Increased dependence among effect sizes led to increased Type I error rates from univariate models. When effect sizes were strongly dependent, error rates were drastically higher than nominal levels regardless of study sample size and number of studies. In contrast, using GLS estimation to account for multiple-endpoint dependency maintained error rates within nominal levels. Conversely, mean I² values were not greatly affected by increased amounts of dependency. Last, we point out that the between-outcomes correlation should be estimated as a pooled within-groups correlation rather than using a full-sample estimator that does not consider treatment/control group membership. Copyright © 2014 John Wiley & Sons, Ltd.
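For reference, the univariate fixed-effects homogeneity statistics discussed above can be computed as in the following sketch. It uses the standard definitions (inverse-variance weights, Q as the weighted sum of squared deviations from the pooled effect, and I² = max(0, (Q − df)/Q)), and it treats the effect sizes as independent, which is exactly the assumption the abstract reports as being violated under multiple-endpoint dependency. The function name and the sample values are hypothetical.

```python
def fixed_effects_homogeneity(effects, variances):
    """Return (pooled effect, Q, I-squared in percent) for a univariate
    fixed-effects meta-analysis with inverse-variance weights."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is truncated at zero when Q falls below its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical standardized mean differences and their sampling variances.
pooled, q, i_squared = fixed_effects_homogeneity([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(pooled, q, i_squared)
```

The GLS approach in the abstract replaces the diagonal weights used here with the inverse of the full variance-covariance matrix of the effect sizes.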
Theoretical and experimental studies of error in square-law detector circuits
NASA Technical Reports Server (NTRS)
Stanley, W. D.; Hearn, C. P.; Williams, J. B.
1984-01-01
Square-law detector circuits were investigated to determine errors from the ideal input/output characteristic function. The nonlinear circuit response is analyzed by a power series expansion containing terms through the fourth degree, from which the significant deviation from square law can be predicted. Both fixed bias current and flexible bias current configurations are considered. The latter case corresponds to the situation where the mean current can change with the application of a signal. Experimental investigations of the circuit arrangements are described. Agreement between the analytical models and the experimental results is established. Factors which contribute to differences under certain conditions are outlined.
NASA Astrophysics Data System (ADS)
Xu, B. Y.; Ye, Y.; Liao, L. C.
2016-07-01
A new method was developed to determine the methamphetamine and morphine concentrations in urine and saliva based on excitation-emission matrix fluorescence coupled to a second-order calibration algorithm. In the case of single-drug abuse, the results showed that the average recoveries of methamphetamine and morphine were 95.3 and 96.7% in urine samples, respectively, and 98.1 and 106.2% in saliva samples, respectively. The relative errors were all below 5%. The simultaneous determination of methamphetamine and morphine in urine using two second-order algorithms was also investigated. Satisfactory results were obtained with a self-weighted alternating trilinear decomposition algorithm. The root-mean-square errors of the predictions were 0.540 and 0.0382 μg/mL for methamphetamine and morphine, respectively. The limits of detection of the proposed methods were very low and sufficient for studying methamphetamine and morphine in urine.
NASA Astrophysics Data System (ADS)
Suhandy, D.; Yulia, M.; Ogawa, Y.; Kondo, N.
2018-05-01
In the present research, the use of near infrared (NIR) spectroscopy in tandem with full spectrum partial least squares (FS-PLS) regression for quantification of the degree of adulteration in civet coffee was evaluated. A total of 126 ground roasted coffee samples with a degree of adulteration of 0-51% were prepared. Spectral data were acquired using a NIR spectrometer equipped with an integrating sphere for diffuse reflectance measurement in the range of 1300-2500 nm. The samples were divided into two groups: a calibration sample set (84 samples) and a prediction sample set (42 samples). The calibration model was developed on original spectra using FS-PLS regression with the full cross-validation method. The calibration model exhibited a determination coefficient of R2=0.96 for calibration and R2=0.92 for validation. The prediction resulted in a low root mean square error of prediction (RMSEP) (4.67%) and a high ratio prediction to deviation (RPD) (3.75). In conclusion, the degree of adulteration in civet coffee has been quantified successfully by using NIR spectroscopy and FS-PLS regression in a non-destructive, economical, precise, and highly sensitive method, which uses very simple sample preparation.
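The two figures of merit quoted above can be computed as in this minimal sketch. It assumes the common chemometric definitions: RMSEP is the root mean square difference between reference and predicted values in the prediction set, and RPD is the sample standard deviation of the reference values divided by RMSEP. The abstract does not state which exact variant was used, and the sample values below are hypothetical.

```python
import math
import statistics


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(reference, predicted):
    """Standard deviation of the reference values divided by the RMSE."""
    return statistics.stdev(reference) / rmse(reference, predicted)


# Hypothetical adulteration levels (%) for a small prediction set.
reference = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [2.0, 9.0, 22.0, 28.0, 41.0, 48.0]
print(round(rmse(reference, predicted), 3))  # 1.732
print(round(rpd(reference, predicted), 3))   # 10.801
```

The same `rmse` function gives RMSEC when it is applied to the calibration set instead of the prediction set.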
Intrinsic Raman spectroscopy for quantitative biological spectroscopy Part II
Bechtel, Kate L.; Shih, Wei-Chuan; Feld, Michael S.
2009-01-01
We demonstrate the effectiveness of intrinsic Raman spectroscopy (IRS) at reducing errors caused by absorption and scattering. Physical tissue models, solutions of varying absorption and scattering coefficients with known concentrations of Raman scatterers, are studied. We show significant improvement in prediction error by implementing IRS to predict concentrations of Raman scatterers using both ordinary least squares regression (OLS) and partial least squares regression (PLS). In particular, we show that IRS provides a robust calibration model that does not increase in error when applied to samples with optical properties outside the range of calibration. PMID:18711512
Zhao, Guo; Wang, Hui; Liu, Gang; Wang, Zhiqiang
2016-09-21
An easy, but effective, method has been proposed to detect and quantify Pb(II) in the presence of Cd(II) based on a Bi/glassy carbon electrode (Bi/GCE) with the combination of a back propagation artificial neural network (BP-ANN) and square wave anodic stripping voltammetry (SWASV) without further electrode modification. The effects of Cd(II) in different concentrations on the stripping responses of Pb(II) were studied. The results indicate that the presence of Cd(II) will reduce the prediction precision of a direct calibration model. Therefore, a two-input and one-output BP-ANN was built for the optimization of a stripping voltammetric sensor, which considers the combined effects of Cd(II) and Pb(II) on the SWASV detection of Pb(II) and establishes the nonlinear relationship between the stripping peak currents of Pb(II) and Cd(II) and the concentration of Pb(II). The key parameters of the BP-ANN and the factors affecting the SWASV detection of Pb(II) were optimized. The prediction performance of the direct calibration model and the BP-ANN model was tested with regard to the mean absolute error (MAE), root mean square error (RMSE), average relative error (ARE), and correlation coefficient. The results proved that the BP-ANN model exhibited higher prediction accuracy than the direct calibration model. Finally, a real sample analysis was performed to determine trace Pb(II) in some soil specimens with satisfactory results.
Terra, Luciana A; Filgueiras, Paulo R; Tose, Lílian V; Romão, Wanderson; de Souza, Douglas D; de Castro, Eustáquio V R; de Oliveira, Mirela S L; Dias, Júlio C M; Poppi, Ronei J
2014-10-07
Negative-ion mode electrospray ionization, ESI(-), with Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was coupled to a Partial Least Squares (PLS) regression and variable selection methods to estimate the total acid number (TAN) of Brazilian crude oil samples. Generally, ESI(-)-FT-ICR mass spectra present a resolving power of ca. 500,000 and a mass accuracy better than 1 ppm, producing a data matrix containing over 5700 variables per sample. These variables correspond to heteroatom-containing species detected as deprotonated molecules, [M − H]⁻ ions, which are identified primarily as naphthenic acids, phenols and carbazole analog species. The TAN values for all samples ranged from 0.06 to 3.61 mg of KOH g⁻¹. To facilitate the spectral interpretation, three methods of variable selection were studied: variable importance in the projection (VIP), interval partial least squares (iPLS) and elimination of uninformative variables (UVE). The UVE method seems to be more appropriate for selecting important variables, reducing the number of variables to 183 and producing a root mean square error of prediction of 0.32 mg of KOH g⁻¹. By reducing the size of the data, it was possible to relate the selected variables with their corresponding molecular formulas, thus identifying the main chemical species responsible for the TAN values.
Ebtehaj, Isa; Bonakdari, Hossein
2014-01-01
The existence of sediments in wastewater greatly affects the performance of the sewer and wastewater transmission systems. Increased sedimentation in wastewater collection systems causes problems such as reduced transmission capacity and early combined sewer overflow. The article reviews the performance of the genetic algorithm (GA) and imperialist competitive algorithm (ICA) in minimizing the target function (mean square error between observed and predicted Froude number). To study the impact of bed load transport parameters, six different models built from four non-dimensional groups are presented. Moreover, the roulette wheel selection method is used to select the parents. The ICA with root mean square error (RMSE) = 0.007 and mean absolute percentage error (MAPE) = 3.5% shows better results than the GA (RMSE = 0.007, MAPE = 5.6%) for the selected model. All six models return better results than the GA. Also, the results of these two algorithms were compared with multi-layer perceptron and existing equations.
Discrete-time state estimation for stochastic polynomial systems over polynomial observations
NASA Astrophysics Data System (ADS)
Hernandez-Gonzalez, M.; Basin, M.; Stepanov, O.
2018-07-01
This paper presents a solution to the mean-square state estimation problem for stochastic nonlinear polynomial systems over polynomial observations corrupted by additive white Gaussian noise. The solution is given in two steps: (a) computing the time-update equations and (b) computing the measurement-update equations for the state estimate and error covariance matrix. A closed form of this filter is obtained by expressing conditional expectations of polynomial terms as functions of the state estimate and error covariance. As a particular case, the mean-square filtering equations are derived for a third-degree polynomial system with second-degree polynomial measurements. Numerical simulations show the effectiveness of the proposed filter compared to the extended Kalman filter.
Criterion Predictability: Identifying Differences Between r-squares
ERIC Educational Resources Information Center
Malgady, Robert G.
1976-01-01
An analysis of variance procedure for testing differences in r-squared, the coefficient of determination, across independent samples is proposed and briefly discussed. The principal advantage of the procedure is to minimize Type I error for follow-up tests of pairwise differences. (Author/JKS)
NASA Astrophysics Data System (ADS)
Weng, Yi; He, Xuan; Yao, Wang; Pacheco, Michelle C.; Wang, Junyi; Pan, Zhongqi
2017-07-01
In this paper, we explored the performance of a space-time block-coding (STBC) assisted multiple-input multiple-output (MIMO) scheme for modal dispersion and mode-dependent loss (MDL) mitigation in spatial-division multiplexed optical communication systems, in which the weight matrices of frequency-domain equalization (FDE) were updated heuristically using a decision-directed recursive least squares (RLS) algorithm for convergence and channel estimation. The proposed STBC-RLS algorithm can achieve a 43.6% enhancement in convergence rate over conventional least mean squares (LMS) for quadrature phase-shift keying (QPSK) signals with merely a 16.2% increase in hardware complexity. The overall optical signal to noise ratio (OSNR) tolerance can be improved via STBC by approximately 3.1, 4.9, and 7.8 dB for QPSK, 16-quadrature amplitude modulation (QAM) and 64-QAM with respective bit-error-rates (BER) and minimum-mean-square-error (MMSE).
RLS Channel Estimation with Adaptive Forgetting Factor for DS-CDMA Frequency-Domain Equalization
NASA Astrophysics Data System (ADS)
Kojima, Yohei; Tomeba, Hiromichi; Takeda, Kazuaki; Adachi, Fumiyuki
Frequency-domain equalization (FDE) based on the minimum mean square error (MMSE) criterion can increase the downlink bit error rate (BER) performance of DS-CDMA beyond that possible with conventional rake combining in a frequency-selective fading channel. FDE requires accurate channel estimation. Recently, we proposed a pilot-assisted channel estimation (CE) based on the MMSE criterion. Using MMSE-CE, the channel estimation accuracy is almost insensitive to the pilot chip sequence, and a good BER performance is achieved. In this paper, we propose a channel estimation scheme for DS-CDMA with FDE using a one-tap recursive least squares (RLS) algorithm, where the forgetting factor is adapted to the changing channel condition by the least mean square (LMS) algorithm. We evaluate the BER performance using RLS-CE with adaptive forgetting factor in a frequency-selective fast Rayleigh fading channel by computer simulation.
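A one-tap RLS channel estimate of the general kind mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' scheme: it uses the recursive form of the exponentially weighted least-squares solution for a single complex gain h in the model y = h·x + noise, with a fixed forgetting factor. The LMS adaptation of the forgetting factor described in the abstract is not reproduced. The class name, the default forgetting factor, and the pilot values are hypothetical.

```python
class OneTapRLS:
    """Exponentially weighted least-squares estimate of a single complex
    channel gain h in the model received = h * pilot + noise."""

    def __init__(self, forgetting=0.9):
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting factor must lie in (0, 1]")
        self.forgetting = forgetting
        self.cross = 0j    # weighted sum of conj(pilot) * received
        self.power = 0.0   # weighted sum of |pilot|^2

    def update(self, pilot, received):
        """Fold in one pilot observation and return the current estimate.
        Assumes the first pilot is nonzero so the denominator is positive."""
        pilot = complex(pilot)
        self.cross = self.forgetting * self.cross + pilot.conjugate() * received
        self.power = self.forgetting * self.power + abs(pilot) ** 2
        return self.cross / self.power


# Hypothetical noiseless example: the estimate recovers the true gain.
true_gain = 0.8 - 0.3j
estimator = OneTapRLS(forgetting=0.9)
estimate = 0j
for pilot in [1 + 0j, -1 + 0j, 1j, -1j]:
    estimate = estimator.update(pilot, true_gain * pilot)
print(estimate)
```

A smaller forgetting factor tracks a fast-fading channel more quickly but averages out less noise, which is the trade-off an adaptive forgetting factor is meant to manage.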
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ying Yibin; Liu Yande; Tao Yang
2005-09-01
This research evaluated the feasibility of using Fourier-transform near-infrared (FT-NIR) spectroscopy to quantify the soluble-solids content (SSC) and the available acidity (VA) in intact apples. Partial least-squares calibration models, obtained from several preprocessing techniques (smoothing, derivative, etc.) in several wave-number ranges, were compared. The best models were obtained with a high coefficient of determination (r²) of 0.940 for the SSC and a moderate r² of 0.801 for the VA, root-mean-square errors of prediction of 0.272% and 0.053%, and root-mean-square errors of calibration of 0.261% and 0.046%, respectively. The results indicate that FT-NIR spectroscopy yields good predictions of the SSC and also show the feasibility of using it to predict the VA of apples.
Synthesis of hover autopilots for rotary-wing VTOL aircraft
NASA Technical Reports Server (NTRS)
Hall, W. E.; Bryson, A. E., Jr.
1972-01-01
The practical situation is considered where imperfect information on only a few rotor and fuselage state variables is available. Filters are designed to estimate all the state variables from noisy measurements of fuselage pitch/roll angles and from noisy measurements of both fuselage and rotor pitch/roll angles. The mean square response of the vehicle to a very gusty, random wind is computed using various filter/controllers and is found to be quite satisfactory although, of course, not so good as when one has perfect information (idealized case). The second part of the report considers precision hover over a point on the ground. A vehicle model without rotor dynamics is used and feedback signals in position and integral of position error are added. The mean square response of the vehicle to a very gusty, random wind is computed, assuming perfect information feedback, and is found to be excellent. The integral error feedback gives zero position error for a steady wind, and smaller position error for a random wind.
An Examination of Statistical Power in Multigroup Dynamic Structural Equation Models
ERIC Educational Resources Information Center
Prindle, John J.; McArdle, John J.
2012-01-01
This study used statistical simulation to calculate differential statistical power in dynamic structural equation models with groups (as in McArdle & Prindle, 2008). Patterns of between-group differences were simulated to provide insight into how model parameters influence power approximations. Chi-square and root mean square error of…
Olson, Scott A.
2003-01-01
The stream-gaging network in New Hampshire was analyzed for its effectiveness in providing regional information on peak-flood flow, mean-flow, and low-flow frequency. The data available for analysis were from stream-gaging stations in New Hampshire and selected stations in adjacent States. The principles of generalized-least-squares regression analysis were applied to develop regional regression equations that relate streamflow-frequency characteristics to watershed characteristics. Regression equations were developed for (1) the instantaneous peak flow with a 100-year recurrence interval, (2) the mean-annual flow, and (3) the 7-day, 10-year low flow. Active and discontinued stream-gaging stations with 10 or more years of flow data were used to develop the regression equations. Each stream-gaging station in the network was evaluated and ranked on the basis of how much the data from that station contributed to the cost-weighted sampling-error component of the regression equation. The potential effect of data from proposed and new stream-gaging stations on the sampling error also was evaluated. The stream-gaging network was evaluated for conditions in water year 2000 and for estimated conditions under various network strategies if an additional 5 years and 20 years of streamflow data were collected. The effectiveness of the stream-gaging network in providing regional streamflow information could be improved for all three flow characteristics with the collection of additional flow data, both temporally and spatially. With additional years of data collection, the greatest reduction in the average sampling error of the regional regression equations was found for the peak- and low-flow characteristics. 
In general, additional data collection at stream-gaging stations with unregulated flow, relatively short-term record (less than 20 years), and drainage areas smaller than 45 square miles contributed the largest cost-weighted reduction to the average sampling error of the regional estimating equations. The results of the network analyses can be used to prioritize the continued operation of active stations, the reactivation of discontinued stations, or the activation of new stations to maximize the regional information content provided by the stream-gaging network. Final decisions regarding altering the New Hampshire stream-gaging network would require the consideration of the many uses of the streamflow data serving local, State, and Federal interests.
Gilliom, Robert J.; Helsel, Dennis R.
1986-01-01
A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.
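The log-probability regression idea described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a single censoring level with all censored observations ranked below the uncensored ones, and it uses Blom plotting positions, (rank − 0.375)/(n + 0.25), to obtain z scores, a choice the abstract does not specify. The function name and the sample concentrations are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def log_probability_regression(uncensored, n_censored):
    """Fill in censored values from a lognormal fit to the uncensored ones.

    Regresses log(concentration) on normal z scores for the uncensored
    ranks, then extrapolates the fitted line to the censored ranks.
    Returns the imputed values followed by the sorted uncensored values.
    """
    observed = sorted(uncensored)
    n = len(observed) + n_censored
    # Assumed plotting positions (Blom); censored data take the lowest ranks.
    z = [NormalDist().inv_cdf((rank - 0.375) / (n + 0.25)) for rank in range(1, n + 1)]
    z_censored, z_observed = z[:n_censored], z[n_censored:]
    logs = [math.log(value) for value in observed]
    z_bar, log_bar = mean(z_observed), mean(logs)
    slope = sum((a - z_bar) * (b - log_bar) for a, b in zip(z_observed, logs)) / sum(
        (a - z_bar) ** 2 for a in z_observed
    )
    intercept = log_bar - slope * z_bar
    imputed = [math.exp(intercept + slope * zi) for zi in z_censored]
    return imputed + observed


# Hypothetical data: five detected concentrations and three non-detects.
filled = log_probability_regression([1.2, 1.9, 2.5, 3.8, 6.0], n_censored=3)
print(mean(filled), stdev(filled))
```

Summary statistics such as the mean and standard deviation are then computed from the combined imputed and observed values, as in the last line of the sketch.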
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilliom, R.J.; Helsel, D.R.
1986-02-01
A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.
Estimation of distributional parameters for censored trace-level water-quality data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilliom, R.J.; Helsel, D.R.
1984-01-01
A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water-sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best-performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least-squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification. 6 figs., 6 tabs.
Tamburini, Elena; Tagliati, Chiara; Bonato, Tiziano; Costa, Stefania; Scapoli, Chiara; Pedrini, Paola
2016-01-01
Near-infrared spectroscopy (NIRS) has been widely used for quantitative and/or qualitative determination of a wide range of matrices. The objective of this study was to develop a NIRS method for the quantitative determination of fluorine content in polylactide (PLA)-talc blends. A blending profile was obtained by mixing different amounts of PLA granules and talc powder. The calibration model was built by correlating wet chemical data (alkali digestion method) and NIR spectra. Using the FT (Fourier Transform)-NIR technique, a Partial Least Squares (PLS) regression model was set up over a concentration interval from 0 ppm for pure PLA to 800 ppm for pure talc. Fluorine content prediction (R2cal = 0.9498; standard error of calibration, SEC = 34.77; standard error of cross-validation, SECV = 46.94) was then externally validated by means of a further 15 independent samples (R2EX.V = 0.8955; root mean square error of prediction, RMSEP = 61.08). A positive relationship between an inorganic component such as fluorine and the NIR signal has been evidenced, and used to obtain quantitative analytical information from the spectra. PMID:27490548
1992-09-01
and collecting and processing data. They were at the front line in interacting with the subjects and maintaining morale. They did an excellent job. They...second for 16 parameter channels, and the data were processed to produce a single root mean square (RMS) error value for each channel appropriate to...represented in the final analysis. Physiological data The physiological data on the VAX were processed by sampling them at 5-minute intervals throughout the
Determination of total phenolic compounds in compost by infrared spectroscopy.
Cascant, M M; Sisouane, M; Tahiri, S; Krati, M El; Cervera, M L; Garrigues, S; de la Guardia, M
2016-06-01
Middle and near infrared (MIR and NIR) spectroscopy were applied to determine the total phenolic compounds (TPC) content in compost samples based on models built by using partial least squares (PLS) regression. Multiplicative scatter correction, standard normal variate and first derivative were employed as spectral pretreatments, and the number of latent variables was optimized by leave-one-out cross-validation. The performance of the PLS-ATR-MIR and PLS-DR-NIR models was evaluated according to the root mean square error of cross validation and prediction (RMSECV and RMSEP), the coefficient of determination for prediction (R²pred) and the residual predictive deviation (RPD), with RPD values of 5.83 and 8.26 obtained for MIR and NIR, respectively. Copyright © 2016 Elsevier B.V. All rights reserved.
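Two of the pretreatments named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: standard normal variate (SNV) is taken in its usual form, where each spectrum is centred on its own mean and scaled by its own standard deviation, and the first derivative is approximated by a central difference. The abstract does not say which derivative algorithm was used (Savitzky-Golay filtering is a common alternative). The function names and the sample spectrum are hypothetical.

```python
import statistics


def standard_normal_variate(spectrum):
    """Centre a single spectrum on its mean and scale by its standard deviation."""
    centre = statistics.mean(spectrum)
    spread = statistics.stdev(spectrum)
    return [(value - centre) / spread for value in spectrum]


def first_derivative(spectrum, step=1.0):
    """Central-difference first derivative; the two end points are dropped."""
    return [
        (spectrum[i + 1] - spectrum[i - 1]) / (2.0 * step)
        for i in range(1, len(spectrum) - 1)
    ]


# Hypothetical absorbance values at evenly spaced wavenumbers.
spectrum = [0.10, 0.14, 0.22, 0.31, 0.27, 0.18]
corrected = standard_normal_variate(spectrum)
print(corrected)
print(first_derivative(corrected))
```

Because SNV removes each spectrum's own offset and scale, two spectra that differ only by an additive baseline and a multiplicative factor map to the same corrected spectrum.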
Estimating the settling velocity of bioclastic sediment using common grain-size analysis techniques
Cuttler, Michael V. W.; Lowe, Ryan J.; Falter, James L.; Buscombe, Daniel D.
2017-01-01
Most techniques for estimating settling velocities of natural particles have been developed for siliciclastic sediments. Therefore, to understand how these techniques apply to bioclastic environments, measured settling velocities of bioclastic sedimentary deposits sampled from a nearshore fringing reef in Western Australia were compared with settling velocities calculated using results from several common grain-size analysis techniques (sieve, laser diffraction and image analysis) and established models. The effects of sediment density and shape were also examined using a range of density values and three different models of settling velocity. Sediment density was found to have a significant effect on calculated settling velocity, causing a range in normalized root-mean-square error of up to 28%, depending upon settling velocity model and grain-size method. Accounting for particle shape reduced errors in predicted settling velocity by 3% to 6% and removed any velocity-dependent bias, which is particularly important for the fastest settling fractions. When shape was accounted for and measured density was used, normalized root-mean-square errors were 4%, 10% and 18% for laser diffraction, sieve and image analysis, respectively. The results of this study show that established models of settling velocity that account for particle shape can be used to estimate settling velocity of irregularly shaped, sand-sized bioclastic sediments from sieve, laser diffraction, or image analysis-derived measures of grain size with a limited amount of error. Collectively, these findings will allow for grain-size data measured with different methods to be accurately converted to settling velocity for comparison. This will facilitate greater understanding of the hydraulic properties of bioclastic sediment which can help to increase our general knowledge of sediment dynamics in these environments.
Wang, Lu; Liu, Tao; Chen, Yang; Sun, Yaqin; Xiu, Zhilong
2017-01-25
Biomass is an important parameter reflecting the fermentation dynamics. Real-time monitoring of biomass can be used to control and optimize a fermentation process. To overcome the measurement delay and manual errors of offline measurement, we designed an experimental platform for online monitoring of the biomass during a 1,3-propanediol fermentation process, based on Fourier-transform near-infrared (FT-NIR) spectral analysis. By pre-processing the real-time sampled spectra and analyzing the sensitive spectral bands, a partial least-squares algorithm was used to establish a dynamic prediction model for the biomass change during a 1,3-propanediol fermentation process. Fermentation processes with substrate glycerol concentrations of 60 g/L and 40 g/L were used as the external validation experiments. The root mean square error of prediction (RMSEP) obtained by analyzing the experimental data was 0.3416 and 0.2743, respectively. These results showed that the established model gave good prediction and could be effectively used for online monitoring of the biomass during a 1,3-propanediol fermentation process.
Program documentation: Surface heating rate of thin skin models (THNSKN)
NASA Technical Reports Server (NTRS)
Mcbryde, J. D.
1975-01-01
Program THNSKN computes the mean heating rate at a maximum of 100 locations on the surface of thin skin transient heating rate models. Output is printed in tabular form and consists of time history tabulation of temperatures, average temperatures, heat loss without conduction correction, mean heating rate, least squares heating rate, and the percent standard error of the least squares heating rates. The input tape used is produced by the program EHTS03.
Brouillette, Carl; Smith, Wayne; Shende, Chetan; Gladding, Zack; Farquharson, Stuart; Morris, Robert E; Cramer, Jeffrey A; Schmitigal, Joel
2016-05-01
The change in custody of fuel shipments at depots, pipelines, and ports could benefit from an analyzer that could rapidly verify that properties are within specifications. To meet this need, the design requirements for a fuel analyzer based on near-infrared (NIR) spectroscopy, such as spectral region and resolution, were examined. It was found that the 1000 to 1600 nm region, containing the second CH overtone and combination vibrational modes of hydrocarbons, provided the best near-infrared to fuel property correlations when path length was taken into account, whereas 4 cm(-1) resolution provided only a modest improvement compared to 16 cm(-1) resolution when four or more latent variables were used. Based on these results, a field-portable near-infrared fuel analyzer was built that employed an incandescent light source, sample compartment optics to hold 2 mL glass sample vials with ∼1 cm path length, a transmission grating, and a 256 channel InGaAs detector that measured the above stated wavelength range with 5-6 nm (∼32 cm(-1)) resolution. The analyzer produced high signal-to-noise ratio (SNR) spectra of samples in 5 s. Twenty-two property correlation models were developed for diesel, gasoline, and jet fuels with root mean squared error of correlation - cross-validated values that compared favorably to corresponding ASTM reproducibility values. The standard deviations of predicted properties for repeat measurements at 4, 24, and 38℃ were often better than ASTM documented repeatability values. The analyzer and diesel property models were tested by measuring seven diesel samples at a local ASTM certification laboratory. The standard deviations between the analyzer determined values and the ASTM measured values for these samples were generally better than the model root mean squared error of correlation-cross-validated values for each property. © The Author(s) 2016.
Srinivas, Nuggehally R; Syed, Muzeeb
2016-01-01
Limited pharmacokinetic sampling strategy may be useful for predicting the area under the curve (AUC) for triptans and may have clinical utility as a prospective tool for prediction. Using appropriate intranasal pharmacokinetic data, a Cmax vs. AUC relationship was established by linear regression models for sumatriptan and zolmitriptan. The predictions of the AUC values were performed using published mean/median Cmax data and appropriate regression lines. The quotient of observed and predicted values rendered fold-difference calculation. The mean absolute error (MAE), mean positive error (MPE), mean negative error (MNE), root mean square error (RMSE), correlation coefficient (r), and the goodness of the AUC fold prediction were used to evaluate the two triptans. Also, data from the mean concentration profiles at time points of 1 hour (sumatriptan) and 3 hours (zolmitriptan) were used for the AUC prediction. The Cmax vs. AUC models displayed excellent correlation for both sumatriptan (r = .9997; P < .001) and zolmitriptan (r = .9999; P < .001). Irrespective of the two triptans, the majority of the predicted AUCs (83%-85%) were within 0.76-1.25-fold difference using the regression model. The prediction of AUC values for sumatriptan or zolmitriptan using the concentration data that reflected the Tmax occurrence were in the proximity of the reported values. In summary, the Cmax vs. AUC models exhibited strong correlations for sumatriptan and zolmitriptan. The usefulness of the prediction of the AUC values was established by a rigorous statistical approach.
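As a rough illustration of some of the evaluation metrics listed above, the Python sketch below computes the MAE, the RMSE and the observed/predicted fold difference. The function name and the AUC values are hypothetical and are not taken from the study.

```python
import math

def prediction_metrics(observed, predicted):
    """Return (MAE, RMSE, fold differences) for paired observed/predicted values."""
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # Fold difference taken as the quotient of observed and predicted values.
    fold = [o / p for o, p in zip(observed, predicted)]
    return mae, rmse, fold

# Hypothetical AUC values, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
mae, rmse, fold = prediction_metrics(observed, predicted)
within_band = [0.76 <= f <= 1.25 for f in fold]
```

The last line checks each prediction against the 0.76-1.25-fold band quoted in the abstract.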
Smith, S. Jerrod; Lewis, Jason M.; Graves, Grant M.
2015-09-28
Generalized-least-squares multiple-linear regression analysis was used to formulate regression relations between peak-streamflow frequency statistics and basin characteristics. Contributing drainage area was the only basin characteristic determined to be statistically significant for all percentage of annual exceedance probabilities and was the only basin characteristic used in regional regression equations for estimating peak-streamflow frequency statistics on unregulated streams in and near the Oklahoma Panhandle. The regression model pseudo-coefficient of determination, converted to percent, for the Oklahoma Panhandle regional regression equations ranged from about 38 to 63 percent. The standard errors of prediction and the standard model errors for the Oklahoma Panhandle regional regression equations ranged from about 84 to 148 percent and from about 76 to 138 percent, respectively. These errors were comparable to those reported for regional peak-streamflow frequency regression equations for the High Plains areas of Texas and Colorado. The root mean square errors for the Oklahoma Panhandle regional regression equations (ranging from 3,170 to 92,000 cubic feet per second) were less than the root mean square errors for the Oklahoma statewide regression equations (ranging from 18,900 to 412,000 cubic feet per second); therefore, the Oklahoma Panhandle regional regression equations produce more accurate peak-streamflow statistic estimates for the irrigated period of record in the Oklahoma Panhandle than do the Oklahoma statewide regression equations. The regression equations developed in this report are applicable to streams that are not substantially affected by regulation, impoundment, or surface-water withdrawals. These regression equations are intended for use for stream sites with contributing drainage areas less than or equal to about 2,060 square miles, the maximum value for the independent variable used in the regression analysis.
Online measurement of urea concentration in spent dialysate during hemodialysis.
Olesberg, Jonathon T; Arnold, Mark A; Flanigan, Michael J
2004-01-01
We describe online optical measurements of urea in the effluent dialysate line during regular hemodialysis treatment of several patients. Monitoring urea removal can provide valuable information about dialysis efficiency. Spectral measurements were performed with a Fourier-transform infrared spectrometer equipped with a flow-through cell. Spectra were recorded across the 5000-4000 cm(-1) (2.0-2.5 µm) wavelength range at 1-min intervals. Savitzky-Golay filtering was used to remove baseline variations attributable to the temperature dependence of the water absorption spectrum. Urea concentrations were extracted from the filtered spectra by use of partial least-squares regression and the net analyte signal of urea. Urea concentrations predicted by partial least-squares regression matched concentrations obtained from standard chemical assays with a root mean square error of 0.30 mmol/L (0.84 mg/dL urea nitrogen) over an observed concentration range of 0-11 mmol/L. The root mean square error obtained with the net analyte signal of urea was 0.43 mmol/L with a calibration based only on a set of pure-component spectra. The error decreased to 0.23 mmol/L when a slope and offset correction were used. Urea concentrations can be continuously monitored during hemodialysis by near-infrared spectroscopy. Calibrations based on the net analyte signal of urea are particularly appealing because they do not require a training step, as do statistical multivariate calibration procedures such as partial least-squares regression.
Yu, Tzy-Chyi; Zhou, Huanxue
2015-09-01
Evaluate performance of techniques used to handle missing cost-to-charge ratio (CCR) data in the USA Healthcare Cost and Utilization Project's Nationwide Inpatient Sample. Four techniques to replace missing CCR data were evaluated: deleting discharges with missing CCRs (complete case analysis), reweighting as recommended by Healthcare Cost and Utilization Project, reweighting by adjustment cells and hot deck imputation by adjustment cells. Bias and root mean squared error of these techniques on hospital cost were evaluated in five disease cohorts. Similar mean cost estimates would be obtained with any of the four techniques when the percentage of missing data is low (<10%). When total cost is the outcome of interest, a reweighting technique to avoid underestimation from dropping observations with missing data should be adopted.
Peng, Dan; Bi, Yanlan; Ren, Xiaona; Yang, Guolong; Sun, Shangde; Wang, Xuede
2015-12-01
This study was performed to develop a hierarchical approach for detection and quantification of adulteration of sesame oil with vegetable oils using gas chromatography (GC). At first, a model was constructed to discriminate the difference between authentic sesame oils and adulterated sesame oils using support vector machine (SVM) algorithm. Then, another SVM-based model is developed to identify the type of adulterant in the mixed oil. At last, prediction models for sesame oil were built for each kind of oil using partial least square method. To validate this approach, 746 samples were prepared by mixing authentic sesame oils with five types of vegetable oil. The prediction results show that the detection limit for authentication is as low as 5% in mixing ratio and the root-mean-square errors for prediction range from 1.19% to 4.29%, meaning that this approach is a valuable tool to detect and quantify the adulteration of sesame oil. Copyright © 2015 Elsevier Ltd. All rights reserved.
Nondestructive evaluation of soluble solid content in strawberry by near infrared spectroscopy
NASA Astrophysics Data System (ADS)
Guo, Zhiming; Huang, Wenqian; Chen, Liping; Wang, Xiu; Peng, Yankun
This paper demonstrates the feasibility of using near infrared (NIR) spectroscopy combined with synergy interval partial least squares (siPLS) algorithms as a rapid nondestructive method to estimate the soluble solid content (SSC) in strawberry. Spectral preprocessing methods were selected by cross-validation during model calibration. The partial least squares (PLS) algorithm was used to calibrate the regression model. The performance of the final model was evaluated by the root mean square error of calibration (RMSEC) and correlation coefficient (R2 c) in the calibration set, and tested by the root mean square error of prediction (RMSEP) and correlation coefficient (R2 p) in the prediction set. The optimal siPLS model was obtained after first-derivative spectral preprocessing. The best model achieved the following results: RMSEC = 0.2259, R2 c = 0.9590 in the calibration set; and RMSEP = 0.2892, R2 p = 0.9390 in the prediction set. This work demonstrated that NIR spectroscopy and siPLS with efficient spectral preprocessing is a useful tool for nondestructive evaluation of SSC in strawberry.
Study of the convergence behavior of the complex kernel least mean square algorithm.
Paul, Thomas K; Ogunfunmi, Tokunbo
2013-09-01
The complex kernel least mean square (CKLMS) algorithm was recently derived and allows for online kernel adaptive learning for complex data. Kernel adaptive methods can be used in finding solutions for neural network and machine learning applications. The derivation of CKLMS involved the development of a modified Wirtinger calculus for Hilbert spaces to obtain the cost function gradient. We analyze the convergence of the CKLMS with different kernel forms for complex data. The expressions obtained enable us to generate theory-predicted mean-square error curves considering the circularity of the complex input signals and their effect on nonlinear learning. Simulations are used to verify the analysis results.
Park, Sangsoo; Spirduso, Waneen; Eakin, Tim; Abraham, Lawrence
2018-01-01
The authors investigated how varying the required low-level forces and the direction of force change affect accuracy and variability of force production in a cyclic isometric pinch force tracking task. Eighteen healthy right-handed adult volunteers performed the tracking task over 3 different force ranges. Root mean square error and coefficient of variation were higher at lower force levels and during minimum reversals compared with maximum reversals. Overall, the thumb showed greater root mean square error and coefficient of variation scores than did the index finger during maximum reversals, but not during minimum reversals. The observed impaired performance during minimum reversals might originate from history-dependent mechanisms of force production and highly coupled 2-digit performance.
Adaptive control strategies for flexible robotic arm
NASA Technical Reports Server (NTRS)
Bialasiewicz, Jan T.
1993-01-01
The motivation of this research came about when a neural network direct adaptive control scheme was applied to control the tip position of a flexible robotic arm. Satisfactory control performance was not attainable due to the inherent non-minimum phase characteristics of the flexible robotic arm tip. Most of the existing neural network control algorithms are based on the direct method and exhibit very high sensitivity if not unstable closed-loop behavior. Therefore a neural self-tuning control (NSTC) algorithm is developed and applied to this problem and showed promising results. Simulation results of the NSTC scheme and the conventional self-tuning (STR) control scheme are used to examine performance factors such as control tracking mean square error, estimation mean square error, transient response, and steady state response.
A study of image quality for radar image processing. [synthetic aperture radar imagery
NASA Technical Reports Server (NTRS)
King, R. W.; Kaupp, V. H.; Waite, W. P.; Macdonald, H. C.
1982-01-01
Methods developed for image quality metrics are reviewed with focus on basic interpretation or recognition elements including: tone or color; shape; pattern; size; shadow; texture; site; association or context; and resolution. Seven metrics are believed to show promise as a way of characterizing the quality of an image: (1) the dynamic range of intensities in the displayed image; (2) the system signal-to-noise ratio; (3) the system spatial bandwidth or bandpass; (4) the system resolution or acutance; (5) the normalized-mean-square-error as a measure of geometric fidelity; (6) the perceptual mean square error; and (7) the radar threshold quality factor. Selective levels of degradation are being applied to simulated synthetic radar images to test the validity of these metrics.
Measuring Dispersion Effects of Factors in Factorial Experiments.
1988-01-01
error is $\mathrm{MSE} = \mathrm{SSE}/(N-p)$, the sum of squares of pure error is $\mathrm{SSPE} = \sum_{i=1}^{n}\sum_{j=1}^{r}(y_{ij}-\bar{y}_{i})^{2}$, and the mean square of pure error is MSPE = (SSPE/n...the level of the factor in the ith run is 0. 3.1. First Measure We have $\mathrm{SSPE} = \sum_{i=1}^{n}\sum_{j=1}^{r}\delta_{i}(y_{ij}-\bar{y}_{i})^{2} + \sum_{i=1}^{n}\sum_{j=1}^{r}(1-\delta_{i})(y_{ij}-\bar{y}_{i})^{2}$...The first component in SSPE corresponds to level 1 of the factor and has $(\sum_{i=1}^{n}\delta_{i})(r-1)$ degrees of freedom. The second component corresponds to
Low complexity adaptive equalizers for underwater acoustic communications
NASA Astrophysics Data System (ADS)
Soflaei, Masoumeh; Azmi, Paeiz
2014-08-01
Interference caused by scattering from the surface and reflection from the bottom is one of the most important obstacles to reliable communication in shallow water channels. One of the best suggested ways to solve this problem is to use adaptive equalizers. Convergence rate and misadjustment error in adaptive algorithms play important roles in adaptive equalizer performance. In this paper, the affine projection algorithm (APA), selective regressor APA (SR-APA), the family of selective partial update (SPU) algorithms, the family of set-membership (SM) algorithms and selective partial update selective regressor APA (SPU-SR-APA) are compared with conventional algorithms such as the least mean square (LMS) in underwater acoustic communications. We apply experimental data from the Strait of Hormuz to demonstrate the efficiency of the proposed methods over a shallow water channel. We observe that the steady-state mean square error (MSE) values of the SR-APA, SPU-APA, SPU-normalized least mean square (SPU-NLMS), SPU-SR-APA, SM-APA and SM-NLMS algorithms are lower than that of the LMS algorithm. These algorithms also have better convergence rates than LMS-type algorithms.
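For orientation, the conventional LMS baseline mentioned above can be sketched in a few lines of Python. This is a minimal sketch of the textbook LMS update applied to a noise-free channel identification problem, not the APA, SPU or SM variants studied in the paper; the channel taps, step size and function name are assumptions chosen for illustration.

```python
import random

def lms_filter(inputs, desired, num_taps, step_size):
    """Textbook LMS: w <- w + mu * e * x. Returns (weights, squared errors)."""
    weights = [0.0] * num_taps
    squared_errors = []
    for n in range(len(inputs)):
        # Tap-delay line, most recent sample first, zero-padded at the start.
        x = [inputs[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        output = sum(w * xi for w, xi in zip(weights, x))
        error = desired[n] - output
        weights = [w + step_size * error * xi for w, xi in zip(weights, x)]
        squared_errors.append(error * error)
    return weights, squared_errors

# Hypothetical two-tap channel driven by a seeded +/-1 training sequence.
rng = random.Random(0)
channel = [0.5, -0.3]
inputs = [rng.choice([-1.0, 1.0]) for _ in range(2000)]
desired = [
    sum(channel[k] * inputs[n - k] for k in range(len(channel)) if n - k >= 0)
    for n in range(len(inputs))
]
weights, squared_errors = lms_filter(inputs, desired, num_taps=2, step_size=0.05)
```

The list of squared errors is the learning curve from which a convergence rate and a steady-state MSE would be read off; in this noise-free toy case it decays towards zero as the weights approach the channel taps.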
Analysis of lard in meatball broth using Fourier transform infrared spectroscopy and chemometrics.
Kurniawati, Endah; Rohman, Abdul; Triyana, Kuwat
2014-01-01
Meatball is one of the favorite foods in Indonesia. For the economic reason (due to the price difference), the substitution of beef meat with pork can occur. In this study, FTIR spectroscopy in combination with chemometrics of partial least square (PLS) and principal component analysis (PCA) was used for analysis of pork fat (lard) in meatball broth. Lard in meatball broth was quantitatively determined at wavenumber region of 1018-1284 cm(-1). The coefficient of determination (R(2)) and root mean square error of calibration (RMSEC) values obtained were 0.9975 and 1.34% (v/v), respectively. Furthermore, the classification of lard and beef fat in meatball broth as well as in commercial samples was performed at wavenumber region of 1200-1000 cm(-1). The results showed that FTIR spectroscopy coupled with chemometrics can be used for quantitative analysis and classification of lard in meatball broth for Halal verification studies. The developed method is simple in operation, rapid and not involving extensive sample preparation. © 2013.
Wang, Junmei; Hou, Tingjun
2011-12-01
In this work, we have evaluated how well the general assisted model building with energy refinement (AMBER) force field performs in studying the dynamic properties of liquids. Diffusion coefficients (D) have been predicted for 17 solvents, five organic compounds in aqueous solutions, four proteins in aqueous solutions, and nine organic compounds in nonaqueous solutions. An efficient sampling strategy has been proposed and tested in the calculation of the diffusion coefficients of solutes in solutions. There are two major findings of this study. First of all, the diffusion coefficients of organic solutes in aqueous solution can be well predicted: the average unsigned errors and the root mean square errors are 0.137 and 0.171 × 10(-5) cm(-2) s(-1), respectively. Second, although the absolute values of D cannot be predicted, good correlations have been achieved for eight organic solvents with experimental data (R(2) = 0.784), four proteins in aqueous solutions (R(2) = 0.996), and nine organic compounds in nonaqueous solutions (R(2) = 0.834). The temperature-dependent behaviors of three solvents, namely, TIP3P water, dimethyl sulfoxide, and cyclohexane, have been studied. The major molecular dynamics (MD) settings, such as the size of the simulation box and whether the coordinates of MD snapshots are wrapped into the primary simulation box, have been explored. We have concluded that our sampling strategy of averaging the mean square displacement collected in multiple short MD simulations is efficient in predicting diffusion coefficients of solutes at infinite dilution. Copyright © 2011 Wiley Periodicals, Inc.
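The sampling idea described above, averaging the mean square displacement (MSD) over several short runs before extracting D, can be sketched as follows. The sketch assumes the standard three-dimensional Einstein relation MSD(t) = 6Dt and an ordinary least-squares slope; the function name and the MSD values are hypothetical and are not the authors' data or code.

```python
def diffusion_coefficient(times, msd_runs):
    """Average MSD curves over runs, fit a line, and return D = slope / 6."""
    n_runs = len(msd_runs)
    mean_msd = [sum(run[i] for run in msd_runs) / n_runs for i in range(len(times))]
    t_mean = sum(times) / len(times)
    m_mean = sum(mean_msd) / len(mean_msd)
    # Ordinary least-squares slope of the averaged MSD against time.
    slope = sum((t - t_mean) * (m - m_mean) for t, m in zip(times, mean_msd)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    return slope / 6.0  # Einstein relation in three dimensions

# Two hypothetical short runs scattered around MSD = 3 t (arbitrary units).
times = [1.0, 2.0, 3.0, 4.0]
msd_runs = [
    [3.1, 5.9, 9.2, 11.8],
    [2.9, 6.1, 8.8, 12.2],
]
d_estimate = diffusion_coefficient(times, msd_runs)
```

Averaging first and fitting once keeps the noise of any single short run from dominating the slope.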
On-line milk spectrometry: analysis of bovine milk composition
NASA Astrophysics Data System (ADS)
Spitzer, Kyle; Kuennemeyer, Rainer; Woolford, Murray; Claycomb, Rod
2005-04-01
We present partial least squares (PLS) regressions to predict the composition of raw, unhomogenised milk using visible to near infrared spectroscopy. A total of 370 milk samples from individual quarters were collected and analysed on-line by two low cost spectrometers in the wavelength ranges 380-1100 nm and 900-1700 nm. Samples were collected from 22 Friesian, 17 Jersey, 2 Ayrshire and 3 Friesian-Jersey crossbred cows over a period of 7 consecutive days. Transmission spectra were recorded in an inline flowcell through a 0.5 mm thick milk sample. PLS models, where wavelength selection was performed using iterative PLS, were developed for fat, protein, lactose, and somatic cell content. The root mean square error of prediction (and correlation coefficient) for the NIR and visible spectrometers respectively were 0.70%(0.93) and 0.91%(0.91) for fat, 0.65%(0.5) and 0.47%(0.79) for protein, 0.36%(0.49) and 0.45%(0.43) for lactose, and 0.50(0.54) and 0.48(0.51) for log10 somatic cells.
Gonzaga, Fabiano Barbieri; Pasquini, Celio
2010-06-18
A low cost absorption spectrophotometer for the short wave near infrared spectral region (850-1050 nm) is described. The spectrophotometer is basically composed of a conventional dichroic lamp, a long-pass filter, a sample cell and a Czerny-Turner type polychromator coupled to a 1024 pixel non-cooled photodiode array. A preliminary evaluation of the spectrophotometer showed good repeatability of the first derivative of the spectra at a constant room temperature and the possibility of assigning some spectral regions to different C-H stretching third overtones. Finally, the spectrophotometer was successfully applied for the analysis of diesel samples and the determination of some of their quality parameters using partial least squares calibration models. The values found for the root mean square error of prediction using external validation were 0.5 for the cetane index and from 2.5 to 5.0 degrees C for the temperatures achieved during distillation when obtaining 10, 50, 85, and 90% (v/v) of the distilled sample, respectively. 2010 Elsevier B.V. All rights reserved.
Dankowska, A; Domagała, A; Kowalewski, W
2017-09-01
The potential of fluorescence, UV-Vis spectroscopies as well as the low- and mid-level data fusion of both spectroscopies for the quantification of concentrations of roasted Coffea arabica and Coffea canephora var. robusta in coffee blends was investigated. Principal component analysis was used to reduce data multidimensionality. To calculate the level of undeclared addition, multiple linear regression (PCA-MLR) models were used with lowest root mean square error of calibration (RMSEC) of 3.6% and root mean square error of cross-validation (RMSECV) of 7.9%. LDA analysis was applied to fluorescence intensities and UV spectra of Coffea arabica, canephora samples, and their mixtures in order to examine classification ability. The best performance of PCA-LDA analysis was observed for data fusion of UV and fluorescence intensity measurements at wavelength interval of 60nm. LDA showed that data fusion can achieve over 96% of correct classifications (sensitivity) in the test set and 100% of correct classifications in the training set, with low-level data fusion. The corresponding results for individual spectroscopies ranged from 90% (UV-Vis spectroscopy) to 77% (synchronous fluorescence) in the test set, and from 93% to 97% in the training set. The results demonstrate that fluorescence, UV, and visible spectroscopies complement each other, giving a complementary effect for the quantification of roasted Coffea arabica and Coffea canephora var. robusta concentration in blends. Copyright © 2017 Elsevier B.V. All rights reserved.
Circulation patterns in the deep Subtropical Northeast Atlantic with ARGO data
NASA Astrophysics Data System (ADS)
Calheiros, Tomas; Bashmachnikov, Igor
2014-05-01
In this work we study the dominant circulation patterns in the Subtropical Northeast Atlantic using ARGO data [25-45° N, 5-35° W]. The data were obtained from the Coriolis operational data center (ftp://ftp.ifremer.fr) for the years 1999-2013. During this period, 376 floats were available in the study area, with 15062 float-months of total time. The floats were launched in the depth range between 300 and 2000 m, but most of the floats were concentrated at 1000 m (2000 float-months) and 1500 m (3400 float-months). In the upper 400-m layer there were also about 1000 float-months, but their number and distribution did not allow analysis of the mean currents over the study region. For each float position, the Lagrangian current velocity was computed as the difference between the position when the buoy started sinking to the reference depth and the subsequent surfacing position of the float, divided by the respective time interval. This reduced the noise related to sea-surface drift of the buoys during the data-transmission periods. The mean Eulerian velocity and its error were computed in each 2°×2° square. Whenever more than 150 observations of the Lagrangian velocity were available in a 2°×2° square, the square was split into 4 smaller 1°×1° squares, in each of which the mean Eulerian velocities and their errors were estimated. Eulerian currents at 1000 m, as well as at 1500 m depth, formed an overall anticyclonic circulation pattern in the study region. The modal velocity of all buoys at the 1000 m level was 4 cm/s with an error of the mean of 1.8 cm/s. The modal velocity of all buoys at 1500 m was 3 cm/s with an error of the mean of 1.4 cm/s. The southwestward flows near Madeira Island and the further westward flow along the zonal band of 25-30° N at 1500 m depth corresponded well to the extension of the deep fraction of the Mediterranean Water salt tongue.
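The velocity estimate described above (displacement between the descent and surfacing positions divided by the elapsed time, then grouped into latitude-longitude squares) might be sketched as below. The flat-Earth approximation, the Earth radius constant, the function names and the example coordinates are all assumptions made for illustration; the abstract does not state how distances were computed.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius

def lagrangian_velocity(lat0, lon0, lat1, lon1, dt_seconds):
    """Return (eastward, northward) velocity in m/s between two positions."""
    mean_lat = math.radians((lat0 + lat1) / 2.0)
    # Local flat-Earth approximation of the displacement.
    dy = math.radians(lat1 - lat0) * EARTH_RADIUS_M
    dx = math.radians(lon1 - lon0) * EARTH_RADIUS_M * math.cos(mean_lat)
    return dx / dt_seconds, dy / dt_seconds

def bin_index(lat, lon, cell_deg=2.0):
    """Index of the cell_deg x cell_deg square containing a position."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

# Hypothetical float cycle: 1 degree northward drift over 10 days.
u, v = lagrangian_velocity(30.0, -20.0, 31.0, -20.0, 10 * 86400)
cell = bin_index(31.5, -19.5)
```

Velocities sharing the same `cell` would then be averaged to give the Eulerian mean for that square, with `cell_deg=1.0` used where a square is subdivided.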
2009-07-16
0.25 0.26 -0.85 $R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$, where $SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$ is the regression sum of squares ($\bar{Y}$: mean value; $\hat{Y}_i$: value from the fitted line)...$SSE = \sum_i (Y_i - \hat{Y}_i)^2$ is the error sum of squares, and $SSTO = SSE + SSR$ is the total sum of squares. Statistical analysis: Coefficient of correlation
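To make the relation between the regression, error and total sums of squares concrete, the following Python sketch fits a straight line by ordinary least squares and evaluates the three quantities. The function name and the data points are hypothetical and serve only as an illustration.

```python
def sums_of_squares(x, y):
    """Fit y = a + b*x by least squares and return (SSR, SSE, SSTO)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
        (xi - x_mean) ** 2 for xi in x
    )
    intercept = y_mean - slope * x_mean
    fitted = [intercept + slope * xi for xi in x]
    ssr = sum((f - y_mean) ** 2 for f in fitted)          # regression sum of squares
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # error sum of squares
    ssto = sum((yi - y_mean) ** 2 for yi in y)            # total sum of squares
    return ssr, sse, ssto

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
ssr, sse, ssto = sums_of_squares(x, y)
r_squared = ssr / ssto
```

For a least-squares line with an intercept, `ssr + sse` equals `ssto` up to rounding, so `ssr / ssto` and `1 - sse / ssto` give the same coefficient.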
Teo, Troy P; Ahmed, Syed Bilal; Kawalec, Philip; Alayoubi, Nadia; Bruce, Neil; Lyn, Ethan; Pistorius, Stephen
2018-02-01
The accurate prediction of intrafraction lung tumor motion is required to compensate for system latency in image-guided adaptive radiotherapy systems. The goal of this study was to identify an optimal prediction model that has a short learning period so that prediction and adaptation can commence soon after treatment begins, and requires minimal reoptimization for individual patients. Specifically, the feasibility of predicting tumor position using a combination of a generalized (i.e., averaged) neural network, optimized using historical patient data (i.e., tumor trajectories) obtained offline, coupled with the use of real-time online tumor positions (obtained during treatment delivery) was examined. A 3-layer perceptron neural network was implemented to predict tumor motion for a prediction horizon of 650 ms. A backpropagation algorithm and batch gradient descent approach were used to train the model. Twenty-seven 1-min lung tumor motion samples (selected from a CyberKnife patient dataset) were sampled at a rate of 7.5 Hz (0.133 s) to emulate the frame rate of an electronic portal imaging device (EPID). A sliding temporal window was used to sample the data for learning. The sliding window length was set to be equivalent to the first breathing cycle detected from each trajectory. Performing a parametric sweep, an averaged error surface of mean square errors (MSE) was obtained from the prediction responses of seven trajectories used for the training of the model (Group 1). An optimal input data size and number of hidden neurons were selected to represent the generalized model. To evaluate the prediction performance of the generalized model on unseen data, twenty tumor traces (Group 2) that were not involved in the training of the model were used for the leave-one-out cross-validation purposes. An input data size of 35 samples (4.6 s) and 20 hidden neurons were selected for the generalized neural network. An average sliding window length of 28 data samples was used. 
The average initial learning period prior to the availability of the first predicted tumor position was 8.53 ± 1.03 s. Average mean absolute errors (MAE) of 0.59 ± 0.13 mm and 0.56 ± 0.18 mm were obtained from Groups 1 and 2, respectively, giving an overall MAE of 0.57 ± 0.17 mm. The average root-mean-square error (RMSE) of 0.67 ± 0.36 mm for all the traces (0.76 ± 0.34 mm, Group 1 and 0.63 ± 0.36 mm, Group 2) is comparable to previously published results. Prediction errors are mainly due to the irregular periodicities between cycles. Since the errors from Groups 1 and 2 are within the same range, the model can generalize and predict on unseen data. This is a first attempt to use an averaged MSE error surface (obtained from the prediction of different patients' tumor trajectories) to determine the parameters of a generalized neural network. This network could be deployed as a plug-and-play predictor for tumor trajectory during treatment delivery, eliminating the need for optimizing individual networks with pretreatment patient data. © 2017 American Association of Physicists in Medicine.
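The sliding-window sampling described above can be illustrated with a short Python sketch that turns one motion trace into (input window, future target) pairs. It covers only the data preparation, not the neural network; the function name, the synthetic trace and the rounding of the 650 ms horizon to a whole number of 7.5 Hz samples are assumptions.

```python
def make_training_pairs(trace, input_size, horizon):
    """Slide a window over the trace; pair each window with a future sample."""
    pairs = []
    last_start = len(trace) - input_size - horizon
    for start in range(last_start + 1):
        window = trace[start:start + input_size]
        # Target lies `horizon` samples after the last sample in the window.
        target = trace[start + input_size - 1 + horizon]
        pairs.append((window, target))
    return pairs

SAMPLE_RATE_HZ = 7.5
# 0.650 s * 7.5 Hz = 4.875, rounded to 5 samples (assumed rounding).
HORIZON_SAMPLES = round(0.650 * SAMPLE_RATE_HZ)

# Synthetic placeholder trace; real inputs would be tumor positions in mm.
trace = [float(i) for i in range(50)]
pairs = make_training_pairs(trace, input_size=35, horizon=HORIZON_SAMPLES)
```

With the 35-sample input size reported in the abstract, each pair spans roughly 4.7 s of history plus the prediction horizon.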
Accuracy of a pulse-coherent acoustic Doppler profiler in a wave-dominated flow
Lacy, J.R.; Sherwood, C.R.
2004-01-01
The accuracy of velocities measured by a pulse-coherent acoustic Doppler profiler (PCADP) in the bottom boundary layer of a wave-dominated inner-shelf environment is evaluated. The downward-looking PCADP measured velocities in eight 10-cm cells at 1 Hz. Velocities measured by the PCADP are compared to those measured by an acoustic Doppler velocimeter for wave orbital velocities up to 95 cm s-1 and currents up to 40 cm s-1. An algorithm for correcting ambiguity errors using the resolution velocities was developed. Instrument bias, measured as the average error in burst mean speed, is -0.4 cm s-1 (standard deviation = 0.8). The accuracy (root-mean-square error) of instantaneous velocities has a mean of 8.6 cm s-1 (standard deviation = 6.5) for eastward velocities (the predominant direction of waves), 6.5 cm s-1 (standard deviation = 4.4) for northward velocities, and 2.4 cm s-1 (standard deviation = 1.6) for vertical velocities. Both burst mean and root-mean-square errors are greater for bursts with ub ≥ 50 cm s-1. Profiles of burst mean speeds from the bottom five cells were fit to logarithmic curves: 92% of bursts with mean speed ≥ 5 cm s-1 have a correlation coefficient R2 > 0.96. In cells close to the transducer, instantaneous velocities are noisy, burst mean velocities are biased low, and bottom orbital velocities are biased high. With adequate blanking distances for both the profile and resolution velocities, the PCADP provides sufficient accuracy to measure velocities in the bottom boundary layer under moderately energetic inner-shelf conditions.
Fitting a function to time-dependent ensemble averaged data.
Fogelmark, Karl; Lomholt, Michael A; Irbäck, Anders; Ambjörnsson, Tobias
2018-05-03
Time-dependent ensemble averages, i.e., trajectory-based averages of some observable, are of importance in many fields of science. A crucial objective when interpreting such data is to fit these averages (for instance, squared displacements) with a function and extract parameters (such as diffusion constants). A commonly overlooked challenge in such function fitting procedures is that fluctuations around mean values, by construction, exhibit temporal correlations. We show that the only available general purpose function fitting methods, the correlated chi-square method and the weighted least squares method (which neglects correlation), fail at either robust parameter estimation or accurate error estimation. We remedy this by deriving a new closed-form error estimation formula for weighted least squares fitting. The new formula uses the full covariance matrix, i.e., rigorously includes temporal correlations, but is free of the robustness issues inherent to the correlated chi-square method. We demonstrate its accuracy in four examples of importance in many fields: Brownian motion, damped harmonic oscillation, fractional Brownian motion and continuous time random walks. We also successfully apply our method, weighted least squares including correlation in error estimation (WLS-ICE), to particle tracking data. The WLS-ICE method is applicable to arbitrary fit functions, and we provide publicly available WLS-ICE software.
Mohammadpour, A-H; Nazemian, F; Abtahi, B; Naghibi, M; Gholami, K; Rezaee, S; Nazari, M-R A; Rajabi, O
2008-12-01
Area under the concentration curve (AUC) of mycophenolic acid (MPA) could help to optimize therapeutic drug monitoring during the early post-renal transplant period. The aim of this study was to develop a limited sampling strategy to estimate an abbreviated MPA AUC within the first month after renal transplantation. In this study we selected 19 patients in the early posttransplant period with normal renal graft function (glomerular filtration rate > 70 mL/min). Plasma MPA concentrations were measured using reverse-phase high-performance liquid chromatography. MPA AUC(0-12h) was calculated using the linear trapezoidal rule. Multiple stepwise regression analysis was used to determine the minimal and convenient time points of MPA levels that could be used to derive model equations best fitted to MPA AUC(0-12h). The regression equation for AUC estimation that gave the best performance was AUC = 14.46 C(10) + 15.547 (r(2) = .882). The validation of the method was performed using the jackknife method. Mean prediction error of this model was not different from zero (P > .05) and had a high root mean square prediction error (8.06). In conclusion, this limited sampling strategy provided an effective approach for therapeutic drug monitoring during the early posttransplant period.
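The two calculations named in this abstract, the linear trapezoidal rule and the reported single-point regression equation, can be sketched as follows. The function names are hypothetical, the sampling times in any call are up to the user, and `c10` stands for the concentration the abstract denotes C(10); units follow the original study and are not restated here.

```python
def auc_linear_trapezoid(times_h, conc):
    """AUC by the linear trapezoidal rule over paired (time, concentration) samples."""
    if len(times_h) != len(conc) or len(times_h) < 2:
        raise ValueError("need at least two paired samples")
    pairs = list(zip(times_h, conc))
    # Each interval contributes width * average of its two end concentrations.
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for (t0, c0), (t1, c1) in zip(pairs, pairs[1:]))

def auc_from_c10(c10):
    """Abbreviated AUC from the regression reported above: AUC = 14.46 * C(10) + 15.547."""
    return 14.46 * c10 + 15.547
```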
NASA Astrophysics Data System (ADS)
Hu, Yongguang; Li, Pingping; Mao, Hanping; Chen, Bin; Wang, Xi
2006-12-01
pH of the wetland soil is one of the most important indicators for aquatic vegetation and water bodies. Mount Beigu Wetland, just near the Yangtse River, is under ecological recovery. Visible and near infrared reflectance spectroscopy was adopted to estimate soil pH of the wetland. The spectroradiometer, FieldSpec 3 (ASD) with a full spectral range (350-2500 nm), was used to acquire the reflectance spectra of wetland soil, and soil pH was measured with the pH meter of IQ150 (Spectrum) and InPro 3030 (Mettler Toledo). 146 soil samples were taken with a soil sampler (Eijkelkamp) at different positions and depths, covering a wide range of pH values from 7.1 to 8.39. 133 samples were used to establish the calibration model with the methods of partial least square regression and principal component analysis regression. 13 soil samples were used to validate the model. The results show that the model is not good, but the mean error and root mean standard error of prediction are low (1.846% and 0.186, respectively). Spectral reflectance-based estimation of soil pH of the wetland is applicable, and the calibration model needs to be improved.
Zhu, Wen-Jing; Mao, Han-Ping; Li, Qing-Lin; Liu, Hong-Yu; Sun, Jun; Zuo, Zhi-Yu; Chen, Yong
2014-09-01
Using tomato samples grown under five levels (25%, 50%, 75%, 100% and 150%) of nitrogen (N), phosphorus (P) and potassium (K) nutrition stress in soilless cultivation in a Venlo-type greenhouse as the research object, polarized reflectance spectra and hyperspectral images of greenhouse tomato leaves with different nutrient deficiencies were acquired with a polarized reflectance spectroscopy system developed by our own research group and a hyperspectral imaging system, respectively. The relationship between changes in the bumps and texture of the non-smooth surface of the nutrient-stressed leaf and the level of polarized reflected radiation was clarified by scanning electron microscopy (SEM). On the one hand, the polarization spectrum was converted into the degree of polarization through the Stokes equation, and four polarization characteristics relating the polarization spectroscopy to the reference measurement values of N, P and K, respectively, were extracted. On the other hand, four characteristic wavelengths of the N, P and K hyperspectral image data were determined through principal component analysis, followed by eight hyperspectral texture features corresponding to the four characteristic wavelengths, extracted through correlation analysis. Polarization characteristics and hyperspectral texture features combined with each characteristic of N, P and K were extracted. These 12 characteristic variables were normalized by the maximum-minimum value method. Quantitative diagnostic models of N, P and K nutrient levels were established by SVR. The model results are as follows: for nitrogen, correlation coefficient r = 0.9618 and root mean square error RMSE = 0.451; for phosphorus, r = 0.9163 and RMSE = 0.620; for potassium, r = 0.9406 and RMSE = 0.494.
The results show that a high-precision nutrition prediction model for tomato leaves can be built by fusing polarized reflectance spectroscopy with hyperspectral information, and that it achieves a good diagnostic effect. This is of great significance for improving model accuracy and developing dedicated instruments. The research provides a new idea for the rapid detection of tomato nutrient content.
Mahrooghy, Majid; Yarahmadian, Shantia; Menon, Vineetha; Rezania, Vahid; Tuszynski, Jack A
2015-10-01
Microtubules (MTs) are intra-cellular cylindrical protein filaments. They exhibit a unique phenomenon of stochastic growth and shrinkage, called dynamic instability. In this paper, we introduce a theoretical framework for applying Compressive Sensing (CS) to the sampled data of the microtubule length in the process of dynamic instability. To reduce data density and reconstruct the original signal with relatively low sampling rates, we have applied CS to experimental MT filament length time series modeled as a Dichotomous Markov Noise (DMN). The results show that using CS along with the wavelet transform significantly reduces the recovery errors compared with CS without the wavelet transform, especially at low and medium sampling rates. For sampling rates ranging from 0.2 to 0.5, the Root-Mean-Squared Error (RMSE) decreases by approximately 3 times, and between 0.5 and 1, the RMSE is small. We also apply a peak detection technique to the wavelet coefficients to detect and closely approximate the growth and shrinkage of MTs for computing the essential dynamic instability parameters, i.e., transition frequencies and especially growth and shrinkage rates. The results show that using compressed sensing along with the peak detection technique and wavelet transform in sampling rates reduces the recovery errors for the parameters. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Guermoui, Mawloud; Gairaa, Kacem; Rabehi, Abdelaziz; Djafer, Djelloul; Benkaciali, Said
2018-06-01
Accurate estimation of solar radiation is the major concern in renewable energy applications. Over the past few years, many machine learning paradigms have been proposed in order to improve the estimation performance, mostly based on artificial neural networks, fuzzy logic, support vector machines and adaptive neuro-fuzzy inference systems. The aim of this work is the prediction of the daily global solar radiation received on a horizontal surface through the Gaussian process regression (GPR) methodology. A case study of the Ghardaïa region (Algeria) has been used in order to validate the above methodology. Several combinations have been tested; it was found that the GPR model based on sunshine duration, minimum air temperature and relative humidity gives the best results in terms of mean absolute bias error (MBE), root mean square error (RMSE), relative root mean square error (rRMSE), and correlation coefficient (r). The obtained values of these indicators are 0.67 MJ/m2, 1.15 MJ/m2, 5.2%, and 98.42%, respectively.
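The four indicators quoted above can be computed from paired observed and predicted values as in the sketch below. The definitions are common conventions assumed here rather than taken from the paper: MBE as the mean signed error, and rRMSE as RMSE divided by the mean observation, in percent. The function name is hypothetical.

```python
import numpy as np

def regression_scores(observed, predicted):
    """MBE, RMSE, rRMSE (%) and Pearson r for paired observations and predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "MBE": float(np.mean(err)),                 # assumed: mean signed error
        "RMSE": float(rmse),
        "rRMSE": float(100.0 * rmse / obs.mean()),  # assumed: relative to mean observation
        "r": float(np.corrcoef(obs, pred)[0, 1]),
    }
```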
Computationally efficient real-time interpolation algorithm for non-uniform sampled biosignals
Eftekhar, Amir; Kindt, Wilko; Constandinou, Timothy G.
2016-01-01
This Letter presents a novel, computationally efficient interpolation method that has been optimised for use in electrocardiogram baseline drift removal. In the authors’ previous Letter three isoelectric baseline points per heartbeat are detected, and here utilised as interpolation points. As an extension from linear interpolation, their algorithm segments the interpolation interval and utilises different piecewise linear equations. Thus, the algorithm produces a linear curvature that is computationally efficient while interpolating non-uniform samples. The proposed algorithm is tested using sinusoids with different fundamental frequencies from 0.05 to 0.7 Hz and also validated with real baseline wander data acquired from the Massachusetts Institute of Technology University and Boston's Beth Israel Hospital (MIT-BIH) Noise Stress Database. The synthetic data results show a root mean square (RMS) error of 0.9 μV (mean), 0.63 μV (median) and 0.6 μV (standard deviation) per heartbeat on a 1 mVp–p 0.1 Hz sinusoid. On real data, they obtain an RMS error of 10.9 μV (mean), 8.5 μV (median) and 9.0 μV (standard deviation) per heartbeat. Cubic spline interpolation and linear interpolation, on the other hand, show 10.7 μV and 11.6 μV (mean), 7.8 μV and 8.9 μV (median), and 9.8 μV and 9.3 μV (standard deviation) per heartbeat, respectively. PMID:27382478
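For orientation, the plain linear interpolation that the Letter takes as its starting point can be sketched as below: a baseline is interpolated through non-uniformly spaced isoelectric points and subtracted from the signal. This is not the authors' segmented algorithm; the function name and the `knot_idx` argument (sample indices of the detected baseline points) are assumptions.

```python
import numpy as np

def remove_baseline_linear(signal, knot_idx):
    """Subtract a baseline linearly interpolated through non-uniform knot samples."""
    signal = np.asarray(signal, dtype=float)
    knot_idx = np.asarray(knot_idx, dtype=int)   # must be sorted in increasing order
    n = np.arange(signal.size)
    # np.interp holds the first/last knot value constant outside the knot range.
    baseline = np.interp(n, knot_idx, signal[knot_idx])
    return signal - baseline, baseline
```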
Computationally efficient real-time interpolation algorithm for non-uniform sampled biosignals.
Guven, Onur; Eftekhar, Amir; Kindt, Wilko; Constandinou, Timothy G
2016-06-01
This Letter presents a novel, computationally efficient interpolation method that has been optimised for use in electrocardiogram baseline drift removal. In the authors' previous Letter three isoelectric baseline points per heartbeat are detected, and here utilised as interpolation points. As an extension from linear interpolation, their algorithm segments the interpolation interval and utilises different piecewise linear equations. Thus, the algorithm produces a linear curvature that is computationally efficient while interpolating non-uniform samples. The proposed algorithm is tested using sinusoids with different fundamental frequencies from 0.05 to 0.7 Hz and also validated with real baseline wander data acquired from the Massachusetts Institute of Technology University and Boston's Beth Israel Hospital (MIT-BIH) Noise Stress Database. The synthetic data results show a root mean square (RMS) error of 0.9 μV (mean), 0.63 μV (median) and 0.6 μV (standard deviation) per heartbeat on a 1 mVp-p 0.1 Hz sinusoid. On real data, they obtain an RMS error of 10.9 μV (mean), 8.5 μV (median) and 9.0 μV (standard deviation) per heartbeat. Cubic spline interpolation and linear interpolation, on the other hand, show 10.7 μV and 11.6 μV (mean), 7.8 μV and 8.9 μV (median), and 9.8 μV and 9.3 μV (standard deviation) per heartbeat, respectively.
NASA Astrophysics Data System (ADS)
Zhang, Wei; Qu, Zhengyi; Wang, Yingping; Yao, Chunlin; Bai, Xueyuan; Bian, Shuai; Zhao, Bing
2015-03-01
Ginsenosides in plant samples have been extensively studied because protopanaxadiol saponins are ubiquitous in Chinese patent medicines, in which they can be used in promoting human health as the main active ingredients. A method for rapid determination of two ginsenosides (Rg1 and Re) in Naosaitong (NST) samples using near-infrared reflectance spectroscopy (NIRS) is studied to determine the contents of ginsenoside Rg1 and Re in this work. Partial least square (PLS) regression was used for building the calibration models, and the effects of spectral preprocessing and variable selection on the models are investigated for optimization of the models. A total of 93 samples were scanned by NIRS, and also by high performance liquid chromatography coupled to a diode array detector to determine the contents of ginsenoside Rg1 and Re. The calibration models for Rg1 and Re had high values of the coefficient of determination (R2) (0.9766 and 0.9764) and low root mean square error of cross validation (RMSECV) (0.0136 and 0.0104), and the values of the standard error of prediction set (SEP) are 0.00764 and 0.0103, which indicate a good correlation between reference values and NIRS predicted values. The overall results show that NIRS could be applied for the rapid determination of the contents of ginsenosides in Ginseng byproducts for pharmaceuticals that develop high-quality Chinese patent medicines.
The Use of Neural Networks in Identifying Error Sources in Satellite-Derived Tropical SST Estimates
Lee, Yung-Hsiang; Ho, Chung-Ru; Su, Feng-Chun; Kuo, Nan-Jung; Cheng, Yu-Hsin
2011-01-01
A neural network model of data mining is used to identify error sources in satellite-derived tropical sea surface temperature (SST) estimates from thermal infrared sensors onboard the Geostationary Operational Environmental Satellite (GOES). By using the Back Propagation Network (BPN) algorithm, it is found that air temperature, relative humidity, and wind speed variation are the major factors causing the errors of GOES SST products in the tropical Pacific. The accuracy of SST estimates is also improved by the model. The root mean square error (RMSE) for the daily SST estimate is reduced from 0.58 K to 0.38 K and the mean absolute percentage error (MAPE) is 1.03%. For the hourly mean SST estimate, the RMSE is also reduced, from 0.66 K to 0.44 K, and the MAPE is 1.3%. PMID:22164030
Ejlerskov, Katrine T.; Jensen, Signe M.; Christensen, Line B.; Ritz, Christian; Michaelsen, Kim F.; Mølgaard, Christian
2014-01-01
For 3-year-old children, suitable methods to estimate body composition are sparse. We aimed to develop predictive equations for estimating fat-free mass (FFM) from bioelectrical impedance (BIA) and anthropometry using dual-energy X-ray absorptiometry (DXA) as the reference method, using data from 99 healthy 3-year-old Danish children. Predictive equations were derived from two multiple linear regression models, a comprehensive model (height²/resistance (RI), six anthropometric measurements) and a simple model (RI, height, weight). Their uncertainty was quantified by means of a 10-fold cross-validation approach. Prediction error of FFM was 3.0% for both equations (root mean square error: 360 and 356 g, respectively). The derived equations produced BIA-based predictions of FFM and FM near DXA scan results. We suggest that the predictive equations can be applied in similar population samples aged 2–4 years. The derived equations may prove useful for studies linking body composition to early risk factors and early onset of obesity. PMID:24463487
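A 10-fold cross-validated root mean square error for a multiple linear regression, the kind of uncertainty estimate described above, can be sketched as follows. The function name, the random fold assignment and the `seed` argument are assumptions; the study's actual predictors and software are not reproduced here.

```python
import numpy as np

def kfold_rmse(X, y, k=10, seed=0):
    """Cross-validated RMSE of an ordinary least squares fit with an intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)   # k disjoint test folds
    sq_err = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        sq_err.append((design[test] @ beta - y[test]) ** 2)  # held-out errors only
    return float(np.sqrt(np.concatenate(sq_err).mean()))
```

Every observation is predicted exactly once by a model that did not see it, so the pooled RMSE is an out-of-sample estimate.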
Ejlerskov, Katrine T; Jensen, Signe M; Christensen, Line B; Ritz, Christian; Michaelsen, Kim F; Mølgaard, Christian
2014-01-27
For 3-year-old children, suitable methods to estimate body composition are sparse. We aimed to develop predictive equations for estimating fat-free mass (FFM) from bioelectrical impedance (BIA) and anthropometry using dual-energy X-ray absorptiometry (DXA) as the reference method, using data from 99 healthy 3-year-old Danish children. Predictive equations were derived from two multiple linear regression models, a comprehensive model (height²/resistance (RI), six anthropometric measurements) and a simple model (RI, height, weight). Their uncertainty was quantified by means of a 10-fold cross-validation approach. Prediction error of FFM was 3.0% for both equations (root mean square error: 360 and 356 g, respectively). The derived equations produced BIA-based predictions of FFM and FM near DXA scan results. We suggest that the predictive equations can be applied in similar population samples aged 2-4 years. The derived equations may prove useful for studies linking body composition to early risk factors and early onset of obesity.
NASA Astrophysics Data System (ADS)
Ou-Yang, Mang; Jeng, Wei-De; Wu, Yin-Yi; Dung, Lan-Rong; Wu, Hsien-Ming; Weng, Ping-Kuo; Huang, Ker-Jer; Chiu, Luan-Jiau
2012-05-01
This study investigates image processing using the radial imaging capsule endoscope (RICE) system. First, an experimental environment is established in which a simulated object has a shape that is similar to a cylinder, such that a triaxial platform can be used to push the RICE into the sample and capture radial images. Then four algorithms (mean absolute error, mean square error, Pearson correlation coefficient, and deformation processing) are used to stitch the images together. The Pearson correlation coefficient method is the most effective algorithm because it yields the highest peak signal-to-noise ratio, higher than 80.69 compared to the original image. Furthermore, a living animal experiment is carried out. Finally, the Pearson correlation coefficient method and vector deformation processing are used to stitch the images that were captured in the living animal experiment. This method is very attractive because unlike the other methods, in which two lenses are required to reconstruct the geometrical image, RICE uses only one lens and one mirror.
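As a rough illustration of two quantities named in this abstract, the sketch below scores candidate overlaps between two image strips by their Pearson correlation coefficient and evaluates a result with the peak signal-to-noise ratio. It is a simplified stand-in, not the RICE stitching pipeline; `best_overlap`, its column-wise overlap model and the `peak` default of 255 are assumptions.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two equally sized image patches."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])

def best_overlap(left, right, max_overlap):
    """Hypothetical helper: overlap width (in columns) with the highest Pearson score."""
    widths = range(1, max_overlap + 1)
    scores = [pearson(left[:, -w:], right[:, :w]) for w in widths]
    return widths[int(np.argmax(scores))]

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diff = np.asarray(reference, dtype=float) - np.asarray(test, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```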
Lankford, Christopher L; Does, Mark D
2018-02-01
Quantitative MRI may require correcting for nuisance parameters which can or must be constrained to independently measured or assumed values. The noise and/or bias in these constraints propagate to fitted parameters. For example, the case of refocusing pulse flip angle constraint in multiple spin echo T2 mapping is explored. An analytical expression for the mean-squared error of a parameter of interest was derived as a function of the accuracy and precision of an independent estimate of a nuisance parameter. The expression was validated by simulations and then used to evaluate the effects of flip angle (θ) constraint on the accuracy and precision of T̂2 for a variety of multi-echo T2 mapping protocols. Constraining θ improved T̂2 precision when the θ-map signal-to-noise ratio was greater than approximately one-half that of the first spin echo image. For many practical scenarios, constrained fitting was calculated to reduce not just the variance but the full mean-squared error of T̂2, for bias in θ̂ ≲ 6%. The analytical expression derived in this work can be applied to inform experimental design in quantitative MRI. The example application to T2 mapping provided specific cases, depending on θ̂ accuracy and precision, in which θ̂ measurement and constraint would be beneficial to T̂2 variance or mean-squared error. Magn Reson Med 79:673-682, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
Medium-range Performance of the Global NWP Model
NASA Astrophysics Data System (ADS)
Kim, J.; Jang, T.; Kim, J.; Kim, Y.
2017-12-01
The medium-range performance of the global numerical weather prediction (NWP) model in the Korea Meteorological Administration (KMA) is investigated. The performance is based on the prediction of the extratropical circulation. The mean square error is expressed as the sum of the spatial variance of the discrepancy between forecasts and observations and the square of the mean error (ME). Thus, it is important to investigate the ME effect in order to understand the model performance. The ME is expressed by subtracting an anomaly from the forecast difference against the real climatology. It is found that the global model suffers from a severe systematic ME in medium-range forecasts. The systematic ME is dominant in the entire troposphere in all months. Such ME can explain at most 25% of the root mean square error. We also compare the extratropical ME distribution with that from other NWP centers. NWP models exhibit spatial ME structures similar to each other. It is found that the spatial ME pattern is highly correlated with that of an anomaly, implying that the ME varies with seasons. For example, the correlation coefficient between ME and anomaly ranges from -0.51 to -0.85 depending on the month. The pattern of the extratropical circulation also has a high correlation with an anomaly. The global model has trouble faithfully simulating extratropical cyclones and blockings in the medium-range forecast. In particular, the model has difficulty simulating an anomalous event in medium-range forecasts. If we choose an anomalous period for a test-bed experiment, we will suffer from a large error due to an anomaly.
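The decomposition stated in this abstract, mean square error equals the spatial variance of the forecast-minus-observation discrepancy plus the squared mean error, can be checked numerically with a short sketch. The function name is hypothetical, and the fields are treated as flat arrays with equal weights (no latitude weighting), which is a simplifying assumption.

```python
import numpy as np

def mse_decomposition(forecast, observed):
    """Return (MSE, variance of the discrepancy, mean error) for two fields."""
    d = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    me = d.mean()              # mean error (systematic part)
    var = d.var()              # population variance of the discrepancy (ddof=0)
    mse = np.mean(d ** 2)      # equals var + me**2 up to rounding
    return float(mse), float(var), float(me)
```

The ratio `me**2 / mse` then gives the share of the mean square error attributable to the systematic error.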
Blind prediction of cyclohexane-water distribution coefficients from the SAMPL5 challenge.
Bannan, Caitlin C; Burley, Kalistyn H; Chiu, Michael; Shirts, Michael R; Gilson, Michael K; Mobley, David L
2016-11-01
In the recent SAMPL5 challenge, participants submitted predictions for cyclohexane/water distribution coefficients for a set of 53 small molecules. Distribution coefficients (log D) replace the hydration free energies that were a central part of the past five SAMPL challenges. A wide variety of computational methods were represented by the 76 submissions from 18 participating groups. Here, we analyze submissions by a variety of error metrics and provide details for a number of reference calculations we performed. As in the SAMPL4 challenge, we assessed the ability of participants to evaluate not just their statistical uncertainty, but their model uncertainty – how well they can predict the magnitude of their model or force field error for specific predictions. Unfortunately, this remains an area where prediction and analysis need improvement. In SAMPL4 the top performing submissions achieved a root-mean-squared error (RMSE) around 1.5 kcal/mol. If we anticipate accuracy in log D predictions to be similar to the hydration free energy predictions in SAMPL4, the expected error here would be around 1.54 log units. Only a few submissions had an RMSE below 2.5 log units in their predicted log D values. However, distribution coefficients introduced complexities not present in past SAMPL challenges, including tautomer enumeration, that are likely to be important in predicting biomolecular properties of interest to drug discovery, therefore some decrease in accuracy would be expected. Overall, the SAMPL5 distribution coefficient challenge provided great insight into the importance of modeling a variety of physical effects. We believe these types of measurements will be a promising source of data for future blind challenges, especially in view of the relatively straightforward nature of the experiments and the level of insight provided.
Blind prediction of cyclohexane-water distribution coefficients from the SAMPL5 challenge
Bannan, Caitlin C.; Burley, Kalistyn H.; Chiu, Michael; Shirts, Michael R.; Gilson, Michael K.; Mobley, David L.
2016-01-01
In the recent SAMPL5 challenge, participants submitted predictions for cyclohexane/water distribution coefficients for a set of 53 small molecules. Distribution coefficients (log D) replace the hydration free energies that were a central part of the past five SAMPL challenges. A wide variety of computational methods were represented by the 76 submissions from 18 participating groups. Here, we analyze submissions by a variety of error metrics and provide details for a number of reference calculations we performed. As in the SAMPL4 challenge, we assessed the ability of participants to evaluate not just their statistical uncertainty, but their model uncertainty – how well they can predict the magnitude of their model or force field error for specific predictions. Unfortunately, this remains an area where prediction and analysis need improvement. In SAMPL4 the top performing submissions achieved a root-mean-squared error (RMSE) around 1.5 kcal/mol. If we anticipate accuracy in log D predictions to be similar to the hydration free energy predictions in SAMPL4, the expected error here would be around 1.54 log units. Only a few submissions had an RMSE below 2.5 log units in their predicted log D values. However, distribution coefficients introduced complexities not present in past SAMPL challenges, including tautomer enumeration, that are likely to be important in predicting biomolecular properties of interest to drug discovery, therefore some decrease in accuracy would be expected. Overall, the SAMPL5 distribution coefficient challenge provided great insight into the importance of modeling a variety of physical effects. We believe these types of measurements will be a promising source of data for future blind challenges, especially in view of the relatively straightforward nature of the experiments and the level of insight provided. PMID:27677750
Predicting active-layer soil thickness using topographic variables at a small watershed scale
Li, Aidi; Tan, Xing; Wu, Wei; Liu, Hongbin; Zhu, Jie
2017-01-01
Knowledge about the spatial distribution of active-layer (AL) soil thickness is indispensable for ecological modeling, precision agriculture, and land resource management. However, it is difficult to obtain the details on AL soil thickness by using conventional soil survey method. In this research, the objective is to investigate the possibility and accuracy of mapping the spatial distribution of AL soil thickness through random forest (RF) model by using terrain variables at a small watershed scale. A total of 1113 soil samples collected from the slope fields were randomly divided into calibration (770 soil samples) and validation (343 soil samples) sets. Seven terrain variables including elevation, aspect, relative slope position, valley depth, flow path length, slope height, and topographic wetness index were derived from a digital elevation map (30 m). The RF model was compared with multiple linear regression (MLR), geographically weighted regression (GWR) and support vector machines (SVM) approaches based on the validation set. Model performance was evaluated by precision criteria of mean error (ME), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2). Comparative results showed that RF outperformed MLR, GWR and SVM models. The RF gave better values of ME (0.39 cm), MAE (7.09 cm), and RMSE (10.85 cm) and higher R2 (62%). The sensitivity analysis demonstrated that the DEM had less uncertainty than the AL soil thickness. The outcome of the RF model indicated that elevation, flow path length and valley depth were the most important factors affecting the AL soil thickness variability across the watershed. These results demonstrated the RF model is a promising method for predicting spatial distribution of AL soil thickness using terrain parameters. PMID:28877196
Miaw, Carolina Sheng Whei; Assis, Camila; Silva, Alessandro Rangel Carolino Sales; Cunha, Maria Luísa; Sena, Marcelo Martins; de Souza, Scheilla Vitorino Carvalho
2018-07-15
Grape, orange, peach and passion fruit nectars were formulated and adulterated by dilution with syrup, apple and cashew juices at 10 levels for each adulterant. Attenuated total reflectance Fourier transform mid infrared (ATR-FTIR) spectra were obtained. Partial least squares (PLS) multivariate calibration models allied to different variable selection methods, such as interval partial least squares (iPLS), ordered predictors selection (OPS) and genetic algorithm (GA), were used to quantify the main fruits. PLS improved by iPLS-OPS variable selection showed the highest predictive capacity to quantify the main fruit contents. The selected variables in the final models varied from 72 to 100; the root mean square errors of prediction were estimated from 0.5 to 2.6%; the correlation coefficients of prediction ranged from 0.948 to 0.990; and, the mean relative errors of prediction varied from 3.0 to 6.7%. All of the developed models were validated. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Hanike, Yusrianti; Sadik, Kusman; Kurnia, Anang
2016-02-01
This research models the unemployment rate in Indonesia under a Poisson distribution. The rate is estimated by a modified combination of post-stratification and a Small Area Estimation (SAE) model. Post-stratification is a sampling technique in which the data are stratified after the survey data have been collected. It is used when the survey data were not designed to estimate the area of interest. The area of interest here is the education level of the unemployed, separated into seven categories. The data were obtained from the National Labour Force Survey (Sakernas), collected by Statistics Indonesia (BPS), the national survey agency. This national survey yields samples that are too small at the district level. An SAE model is one alternative for solving this problem. We therefore combined post-stratification sampling with the SAE model. This research considers two main post-stratification models. Model I treats the education category as a dummy variable, and model II treats the education category as an area random effect. Both models violated the Poisson assumption. Using the Poisson-Gamma model, the overdispersion in model I was corrected from 1.23 to 0.91 chi-square/df, and the underdispersion in model II was corrected from 0.35 to 0.94 chi-square/df. Empirical Bayes was applied to estimate the proportion of each education category among the unemployed. Using the Bayesian Information Criterion (BIC), model I has a smaller mean square error (MSE) than model II.
Castritius, Stefan; Kron, Alexander; Schäfer, Thomas; Rädle, Matthias; Harms, Diedrich
2010-12-22
A new approach of combination of near-infrared (NIR) spectroscopy and refractometry was developed in this work to determine the concentration of alcohol and real extract in various beer samples. A partial least-squares (PLS) regression, as multivariate calibration method, was used to evaluate the correlation between the data of spectroscopy/refractometry and alcohol/extract concentration. This multivariate combination of spectroscopy and refractometry enhanced the precision in the determination of alcohol, compared to single spectroscopy measurements, due to the effect of high extract concentration on the spectral data, especially of nonalcoholic beer samples. For NIR calibration, two mathematical pretreatments (first-order derivation and linear baseline correction) were applied to eliminate light scattering effects. A sample grouping of the refractometry data was also applied to increase the accuracy of the determined concentration. The root mean squared errors of validation (RMSEV) of the validation process concerning alcohol and extract concentration were 0.23 Mas% (method A), 0.12 Mas% (method B), and 0.19 Mas% (method C) and 0.11 Mas% (method A), 0.11 Mas% (method B), and 0.11 Mas% (method C), respectively.
Analysis and application of minimum variance discrete time system identification
NASA Technical Reports Server (NTRS)
Kaufman, H.; Kotob, S.
1975-01-01
An on-line minimum variance parameter identifier is developed which embodies both accuracy and computational efficiency. The formulation results in a linear estimation problem with both additive and multiplicative noise. The resulting filter which utilizes both the covariance of the parameter vector itself and the covariance of the error in identification is proven to be mean square convergent and mean square consistent. The MV parameter identification scheme is then used to construct a stable state and parameter estimation algorithm.
NASA Technical Reports Server (NTRS)
Stowe, Larry; Hucek, Richard; Ardanuy, Philip; Joyce, Robert
1994-01-01
Much of the new record of broadband earth radiation budget satellite measurements to be obtained during the late 1990s and early twenty-first century will come from the dual-radiometer Clouds and Earth's Radiant Energy System Instrument (CERES-I) flown aboard sun-synchronous polar orbiters. Simulation studies conducted in this work for an early afternoon satellite orbit indicate that spatial root-mean-square (rms) sampling errors of instantaneous CERES-I shortwave flux estimates will range from about 8.5 to 14.0 W/sq m on a 2.5 deg latitude and longitude grid resolution. Rms errors in longwave flux estimates are only about 20% as large and range from 1.5 to 3.5 W/sq m. These results are based on an optimal cross-track scanner design that includes 50% footprint overlap to eliminate gaps in the top-of-the-atmosphere coverage, and a 'smallest' footprint size to increase the ratio in the number of observations lying within to the number of observations lying on grid area boundaries. Total instantaneous measurement error also depends on the variability of anisotropic reflectance and emission patterns and on retrieval methods used to generate target area fluxes. Three retrieval procedures from both CERES-I scanners (cross-track and rotating azimuth plane) are used. (1) The baseline Earth Radiation Budget Experiment (ERBE) procedure, which assumes that errors due to the use of mean angular dependence models (ADMs) in the radiance-to-flux inversion process nearly cancel when averaged over grid areas. (2) To estimate N, instantaneous ADMs are estimated from the multiangular, collocated observations of the two scanners. These observed models replace the mean models in computation of satellite flux estimates. (3) The scene flux approach conducts separate target-area retrievals for each ERBE scene category and combines their results using area weighting by scene type.
The ERBE retrieval performs best when the simulated radiance field departs from the ERBE mean models by less than 10%. For larger perturbations, both the scene flux and collocation methods produce less error than the ERBE retrieval. The scene flux technique is preferable, however, because it involves fewer restrictive assumptions.
Arnold, Benjamin F; Galiani, Sebastian; Ram, Pavani K; Hubbard, Alan E; Briceño, Bertha; Gertler, Paul J; Colford, John M
2013-02-15
Many community-based studies of acute child illness rely on cases reported by caregivers. In prior investigations, researchers noted a reporting bias when longer illness recall periods were used. The use of recall periods longer than 2-3 days has been discouraged to minimize this reporting bias. In the present study, we sought to determine the optimal recall period for illness measurement when accounting for both bias and variance. Using data from 12,191 children less than 24 months of age collected in 2008-2009 from Himachal Pradesh in India, Madhya Pradesh in India, Indonesia, Peru, and Senegal, we calculated bias, variance, and mean squared error for estimates of the prevalence ratio between groups defined by anemia, stunting, and underweight status to identify optimal recall periods for caregiver-reported diarrhea, cough, and fever. There was little bias in the prevalence ratio when a 7-day recall period was used (<10% in 35 of 45 scenarios), and the mean squared error was usually minimized with recall periods of 6 or more days. Shortening the recall period from 7 days to 2 days required sample-size increases of 52%-92% for diarrhea, 47%-61% for cough, and 102%-206% for fever. In contrast to the current practice of using 2-day recall periods, this work suggests that studies should measure caregiver-reported illness with a 7-day recall period.
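The trade-off described in this abstract rests on the standard decomposition of mean squared error into squared bias plus variance. The following is a minimal Python sketch of that decomposition only; the function name and the replicate estimates are hypothetical and are not taken from the study.

```python
from statistics import fmean, pvariance

def bias_variance_mse(estimates, true_value):
    """Return (bias, variance, mse) for repeated estimates of one true value."""
    bias = fmean(estimates) - true_value
    variance = pvariance(estimates)  # population variance across replicates
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse

# Illustrative numbers only (hypothetical prevalence-ratio estimates).
bias, variance, mse = bias_variance_mse([1.10, 1.25, 0.95, 1.30], true_value=1.20)
```

With the population variance, the identity mse = bias**2 + variance holds exactly, which is why a longer recall period can lower the mean squared error even when it adds some bias.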
NASA Astrophysics Data System (ADS)
Liu, Tingting; Zhang, Ling; Wang, Shutao; Cui, Yaoyao; Wang, Yutian; Liu, Lingfei; Yang, Zhe
2018-03-01
Qualitative and quantitative analysis of polycyclic aromatic hydrocarbons (PAHs) was carried out by three-dimensional fluorescence spectroscopy combined with Alternating Weighted Residue Constraint Quadrilinear Decomposition (AWRCQLD). The experimental subjects were acenaphthene (ANA) and naphthalene (NAP). First, to reduce the redundant information in the three-dimensional fluorescence spectral data, the wavelet transform was used to compress the data in preprocessing. Then, four-dimensional data were constructed from the excitation-emission fluorescence spectra of PAHs at different concentrations. The sample data were obtained from three solvents: methanol, ethanol, and ultra-pure water. The four-dimensional spectral data were analyzed by AWRCQLD, and the recovery rates of PAHs obtained from the three solvents were compared. The results showed that PAHs can be measured more accurately with the higher-order data, and that the recovery rate was higher. The results also showed that AWRCQLD better reflects the superiority of the four-dimensional algorithm over second-order calibration and other third-order calibration algorithms. The recovery rate of ANA was 96.5%-103.3% and the root mean square error of prediction was 0.04 μg L⁻¹. The recovery rate of NAP was 96.7%-115.7% and the root mean square error of prediction was 0.06 μg L⁻¹.
A fast algorithm to compute precise type-2 centroids for real-time control applications.
Chakraborty, Sumantra; Konar, Amit; Ralescu, Anca; Pal, Nikhil R
2015-02-01
An interval type-2 fuzzy set (IT2 FS) is characterized by its upper and lower membership functions containing all possible embedded fuzzy sets, which together are referred to as the footprint of uncertainty (FOU). The FOU results in a span of uncertainty measured in the defuzzified space and is determined by the positional difference of the centroids of all the embedded fuzzy sets taken together. This paper provides a closed-form formula to evaluate the span of uncertainty of an IT2 FS. The closed-form formula offers a precise measurement of the degree of uncertainty in an IT2 FS with a runtime complexity less than that of the classical iterative Karnik-Mendel algorithm and other formulations employing the iterative Newton-Raphson algorithm. This paper also demonstrates a real-time control application using the proposed closed-form formula of centroids, with lower root mean square error and computational overhead than those of the existing methods. Computer simulations for this real-time control application indicate that parallel realization of the IT2 defuzzification outperforms its competitors with respect to maximum overshoot even at high sampling rates. Furthermore, in the presence of measurement noise in system (plant) states, the proposed IT2 FS based scheme outperforms its type-1 counterpart with respect to peak overshoot and root mean square error in plant response.
Comparison of in vitro and in situ methods in evaluation of forage digestibility in ruminants.
Krizsan, S J; Nyholm, L; Nousiainen, J; Südekum, K-H; Huhtanen, P
2012-09-01
The objective of this study was to compare the application of different in vitro and in situ methods in empirical and mechanistic predictions of in vivo OM digestibility (OMD) and their associations to near-infrared reflectance spectroscopy spectra for a variety of forages. Apparent in vivo OMD of silages made from alfalfa (n = 2), corn (n = 9), corn stover (n = 2), grass (n = 11), whole crops of wheat and barley (n = 8) and red clover (n = 7), and fresh alfalfa (n = 1), grass hays (n = 5), and wheat straws (n = 5) had previously been determined in sheep. Concentrations of indigestible NDF (iNDF) in all forage samples were determined by a 288-h ruminal in situ incubation. Gas production of isolated forage NDF was measured by in vitro incubations for 72 h. In vitro pepsin-cellulase OM solubility (OMS) of the forages was determined by a 2-step gravimetric digestion method. Samples were also subjected to a 2-step determination of in vitro OMD based on buffered rumen fluid and pepsin. Further, rumen fluid digestible OM was determined from a single 96-h incubation at 38°C. Digestibility of OM from the in situ and the in vitro incubations was calculated according to published empirical equations, which were either forage specific or general (1 equation for all forages) within method. Indigestible NDF was also used in a mechanistic model to predict OMD. Predictions of OMD were evaluated by residual analysis using the GLM procedure in SAS. In vitro OMS in a general prediction equation of OMD did not display a significant forage-type effect on the residuals (observed - predicted OMD; P = 0.10). Predictions of OMD within forage types were consistent between iNDF and the 2-step in vitro method based on rumen fluid. Root mean square error of OMD was least (0.032) when the prediction was based on a general forage equation of OMS. However, regenerating a simple regression for iNDF by omitting alfalfa and wheat straw reduced the root mean square error of OMD to 0.025. 
Indigestible NDF in a general forage equation predicted OMD without any bias (P ≥ 0.16), and root mean square error of prediction was smallest among all methods when alfalfa and wheat straw samples were excluded. Our study suggests that compared with the in vitro laboratory methods, iNDF used in forage-specific equations will improve overall predictions of forage in vivo OMD. The in vitro and in situ methods performed equally well in calibrations of iNDF or OMD by near-infrared reflectance spectroscopy.
Neural self-tuning adaptive control of non-minimum phase system
NASA Technical Reports Server (NTRS)
Ho, Long T.; Bialasiewicz, Jan T.; Ho, Hai T.
1993-01-01
The motivation for this research arose when a neural network direct adaptive control scheme was applied to control the tip position of a flexible robotic arm. Satisfactory control performance was not attainable due to the inherent non-minimum phase characteristics of the flexible robotic arm tip. Most of the existing neural network control algorithms are based on the direct method and exhibit very high sensitivity, if not unstable, closed-loop behavior. Therefore, a neural self-tuning control (NSTC) algorithm was developed and applied to this problem, and it showed promising results. Simulation results of the NSTC scheme and the conventional self-tuning (STR) control scheme are used to examine performance factors such as control tracking mean square error, estimation mean square error, transient response, and steady state response.
NASA Astrophysics Data System (ADS)
Zarychta, R.; Zarychta, A.
2013-12-01
Extraction of mineral resources, including rocks, usually causes significant changes to the landscape. The transformation of the relief, whose character and scale can be analysed by means of cartographic materials, seems to be the most interesting. Reconstruction of the relief from the period prior to the exploitation is a starting point for such an investigation. It can be done on the basis of archival cartographic materials, which are difficult to obtain. However, overly varied morphological material of the area can lead to erroneous conclusions, which suggests interpreting three-dimensional models of the relief. Hence, the paper deals with reconstruction and visualisation of the relief (in the period before the exploitation) of four sand fields of the old sand mine excavation "Siemonia". A geological map of Poland (Wojkowice sheet) has been used for the purpose. A geostatistical analysis by means of the programmes Surfer 8 and ArcGIS 10.1 has been performed on the map. An estimation method called ordinary kriging, which is a best linear unbiased estimator (BLUE) satisfying the unbiasedness condition (the sum of the weights is equal to 1), has been applied. The calculated values of errors (mean error, mean squared error and mean squared standardised error) obtained from the cross-validation procedure are, to a large extent, in agreement with the values of errors given by numerous authors in the scientific literature. This confirms proper "manual" fitting of two mathematical models of spherical variograms to the empirical variograms.
The generated contour map of the investigated area (based on points estimated at the nodes of the interpolation grid), together with its three-dimensional digital model, corresponds more closely (due to significant marking of the relief) to the previous state of the investigated area than the two other presented types of cartographic visualisations made without application of geostatistical methods. Hence, those latter graphic presentations can only be applied to visualise the relief, without any detailed geomorphological interpretation, due to their inaccuracy. It seems obvious that detailed analyses can be performed on the basis of a digital model of the terrain accompanied by its contour map when the relief is reconstructed by means of geostatistical methods (especially ordinary kriging).
Hard choices in assessing survival past dams — a comparison of single- and paired-release strategies
Zydlewski, Joseph D.; Stich, Daniel S.; Sigourney, Douglas B.
2017-01-01
Mark–recapture models are widely used to estimate survival of salmon smolts migrating past dams. Paired releases have been used to improve estimate accuracy by removing components of mortality not attributable to the dam. This method is accompanied by reduced precision because (i) sample size is reduced relative to a single, large release; and (ii) variance calculations inflate error. We modeled an idealized system with a single dam to assess trade-offs between accuracy and precision and compared methods using root mean squared error (RMSE). Simulations were run under predefined conditions (dam mortality, background mortality, detection probability, and sample size) to determine scenarios when the paired release was preferable to a single release. We demonstrate that a paired-release design provides a theoretical advantage over a single-release design only at large sample sizes and high probabilities of detection. At release numbers typical of many survival studies, paired release can result in overestimation of dam survival. Failures to meet model assumptions of a paired release may result in further overestimation of dam-related survival. Under most conditions, a single-release strategy was preferable.
Abreu, Patrícia B de; Cogo-Moreira, Hugo; Pose, Regina A; Laranjeira, Ronaldo; Caetano, Raul; Gaya, Carolina M; Madruga, Clarice S
2017-01-01
To perform a construct validation of the List of Threatening Events Questionnaire (LTE-Q), as well as convergence validation by identifying its association with drug use in a sample of the Brazilian population. This is a secondary analysis of the Second Brazilian National Alcohol and Drugs Survey (II BNADS), which used a cross-cultural adaptation of the LTE-Q in a probabilistic sample of 4,607 participants aged 14 years and older. Latent class analysis was used to validate the latent trait adversity (which considered the number of events from the list of 12 items in the LTE-Q experienced by the respondent in the previous year), and logistic regression was performed to find its association with binge drinking and cocaine use. The confirmatory factor analysis returned a chi-square of 108.341, weighted root mean square residual (WRMR) of 1.240, comparative fit index (CFI) of 0.970, Tucker-Lewis index (TLI) of 0.962, and root mean square error of approximation (RMSEA) score of 1.000. LTE-Q convergence validation showed that the adversity latent trait increased the chances of binge drinking by 1.31 times and doubled the chances of previous-year cocaine use (adjusted by sociodemographic variables). The use of the LTE-Q in Brazil should be encouraged in different research fields, including large epidemiological surveys, as it is also appropriate when time and budget are limited. The LTE-Q can be a useful tool in the development of targeted and more efficient prevention strategies.
Yang, Jie; Liu, Qingquan; Dai, Wei
2017-02-01
To improve the air temperature observation accuracy, a low measurement error temperature sensor is proposed. A computational fluid dynamics (CFD) method is implemented to obtain temperature errors under various environmental conditions. Then, a temperature error correction equation is obtained by fitting the CFD results using a genetic algorithm method. The low measurement error temperature sensor, a naturally ventilated radiation shield, a thermometer screen, and an aspirated temperature measurement platform are characterized in the same environment to conduct the intercomparison. The aspirated platform served as an air temperature reference. The mean temperature errors of the naturally ventilated radiation shield and the thermometer screen are 0.74 °C and 0.37 °C, respectively. In contrast, the mean temperature error of the low measurement error temperature sensor is 0.11 °C. The mean absolute error and the root mean square error between the corrected results and the measured results are 0.008 °C and 0.01 °C, respectively. The correction equation allows the temperature error of the low measurement error temperature sensor to be reduced by approximately 93.8%. The low measurement error temperature sensor proposed in this research may be helpful to provide a relatively accurate air temperature result.
Vilmin, Franck; Dussap, Claude; Coste, Nathalie
2006-06-01
In the tire industry, synthetic styrene-butadiene rubber (SBR), butadiene rubber (BR), and isoprene rubber (IR) elastomers are essential for conferring to the product its properties of grip and rolling resistance. Their physical properties depend on their chemical composition, i.e., their microstructure and styrene content, which must be accurately controlled. This paper describes a fast, robust, and highly reproducible near-infrared analytical method for the quantitative determination of the microstructure and styrene content. The quantitative models are calculated with the help of pure spectral profiles estimated from a partial least squares (PLS) regression, using ¹³C nuclear magnetic resonance (NMR) as the reference method. This versatile approach allows the models to be applied over a large range of compositions, from a single BR to an SBR-IR blend. The resulting quantitative predictions are independent of the sample path length. As a consequence, the sample preparation is solvent free and simplified with a very fast (five minutes) hot filming step of a bulk polymer piece. No precise thickness control is required. Thus, the operator effect becomes negligible and the method is easily transferable. The root mean square error of prediction, depending on the rubber composition, is between 0.7% and 1.3%. The reproducibility standard error is less than 0.2% in every case.
Precise calibration of spatial phase response nonuniformity arising in liquid crystal on silicon.
Xu, Jingquan; Qin, SiYi; Liu, Chen; Fu, Songnian; Liu, Deming
2018-06-15
In order to calibrate the spatial phase response nonuniformity of liquid crystal on silicon (LCoS), we propose to use a Twyman-Green interferometer to characterize the wavefront distortion due to the inherent curvature of the device. During the characterization, both the residual carrier frequency introduced by the Fourier transform evaluation method and the lens aberration are error sources. For the tilted phase error introduced by the residual carrier frequency, the least mean square fitting method is used to obtain the tilted phase error. Meanwhile, we use Zernike polynomial fitting based on plane mirror calibration to mitigate the lens aberration. For a typical LCoS with 1×12,288 pixels after calibration, the peak-to-valley value of the inherent wavefront distortion is approximately 0.25λ at 1550 nm, leading to a half-suppression of wavefront distortion. All efforts can suppress the root mean square value of the inherent wavefront distortion to approximately λ/34.
NASA Astrophysics Data System (ADS)
Sun, Li-wei; Ye, Xin; Fang, Wei; He, Zhen-lei; Yi, Xiao-long; Wang, Yu-peng
2017-11-01
Hyperspectral imaging spectrometers have high spatial and spectral resolution. Their radiometric calibration requires knowledge of the sources used at high spectral resolution. To satisfy this source requirement, an on-orbit radiometric calibration method is designed in this paper. The method is based on the spectral inversion accuracy of the calibration light source. We implement a genetic algorithm program to optimize the channel design of the transfer radiometer and account for the degradation of the halogen lamp, thus achieving high-accuracy inversion of the spectral curve over the whole working time. The experimental results show that the average root mean squared error is 0.396%, the maximum root mean squared error is 0.448%, and the relative errors at all wavelengths are within 1% in the spectral range from 500 nm to 900 nm during 100 h of operating time. The design lays a foundation for high-accuracy calibration of imaging spectrometers.
Research on the infiltration processes of lawn soils of the Babao River in the Qilian Mountain.
Li, GuangWen; Feng, Qi; Zhang, FuPing; Cheng, AiFang
2014-01-01
Using a Guelph Permeameter, the soil water infiltration processes were analyzed in the Babao River of the Qilian Mountain in China. The results showed that the average soil initial infiltration and the steady infiltration rates in the upstream reaches of the Babao River are 1.93 and 0.99 cm/min, whereas those of the middle area are 0.48 cm/min and 0.21 cm/min, respectively. The infiltration processes can be divided into three stages: the rapidly changing stage (0-10 min), the slowly changing stage (10-30 min) and the stabilization stage (after 30 min). We used field data collected from lawn soils and evaluated the performances of the infiltration models of Philip, Kostiakov and Horton with the sum of squared error, the root mean square error, the coefficient of determination, the mean error, the model efficiency and Willmott's index of agreement. The results indicated that the Kostiakov model was most suitable for studying the infiltration process in the alpine lawn soils.
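The abstract does not give the formulas behind its goodness-of-fit statistics, so the following Python sketch uses the commonly cited definitions: Nash-Sutcliffe for model efficiency, squared Pearson correlation for the coefficient of determination, and Willmott's original index of agreement. These choices, the function name, and the sample rates are assumptions for illustration, not the authors' implementation.

```python
from math import sqrt

def fit_metrics(observed, predicted):
    """Goodness-of-fit statistics for paired observed/predicted values.

    Assumes both sequences have equal length and are not constant.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)                      # sum of squared error
    sst = sum((o - mean_obs) ** 2 for o in observed)         # spread of observations
    spp = sum((p - mean_pred) ** 2 for p in predicted)       # spread of predictions
    sop = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    willmott_den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                       for o, p in zip(observed, predicted))
    return {
        "sse": sse,
        "rmse": sqrt(sse / n),
        "r2": sop ** 2 / (sst * spp),         # squared Pearson correlation
        "mean_error": sum(residuals) / n,     # predicted minus observed
        "efficiency": 1.0 - sse / sst,        # Nash-Sutcliffe form
        "willmott_d": 1.0 - sse / willmott_den,
    }

# Hypothetical infiltration rates (cm/min), for illustration only.
metrics = fit_metrics([1.9, 1.4, 1.1, 1.0, 0.99], [1.8, 1.5, 1.2, 1.0, 0.95])
```

Comparing candidate models such as Philip, Kostiakov and Horton then amounts to calling the function once per model on the same observations and ranking the results.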
Zhao, Haiquan; Zhang, Jiashu
2009-04-01
This paper proposes a novel, computationally efficient adaptive nonlinear equalizer based on a combination of a finite impulse response (FIR) filter and a functional link artificial neural network (CFFLANN) to compensate for linear and nonlinear distortions in nonlinear communication channels. This convex nonlinear combination improves the speed while retaining a low steady-state error. In addition, since the CFFLANN does not need the hidden layers that exist in conventional neural-network-based equalizers, it has a simpler structure than traditional neural networks (NNs) and can require less computation during training. Moreover, an appropriate adaptation algorithm for the proposed equalizer is derived from the modified least mean square (MLMS) algorithm. Simulation results clearly show that the proposed equalizer using the MLMS algorithm can effectively eliminate linear and nonlinear distortions of various intensities and provides better anti-jamming performance. Furthermore, comparisons of the mean squared error (MSE), the bit error rate (BER), and the effect of the eigenvalue ratio (EVR) of the input correlation matrix are presented.
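For orientation, the sketch below shows the textbook least mean square (LMS) update for a plain FIR filter. It is not the authors' MLMS algorithm or the CFFLANN structure, whose details the abstract does not give; the function name, step size, and the two-tap test channel are assumptions chosen for illustration.

```python
import random

def lms_filter(inputs, desired, num_taps=4, step_size=0.05):
    """Textbook LMS adaptive FIR filter; returns (weights, squared_errors)."""
    weights = [0.0] * num_taps
    window = [0.0] * num_taps                 # most recent input sample first
    squared_errors = []
    for x, d in zip(inputs, desired):
        window = [x] + window[:-1]            # shift the new sample into the delay line
        y = sum(w * u for w, u in zip(weights, window))
        e = d - y                             # instantaneous error
        weights = [w + step_size * e * u for w, u in zip(weights, window)]
        squared_errors.append(e * e)
    return weights, squared_errors

# Identify a hypothetical noise-free channel d[k] = 0.5*x[k] - 0.3*x[k-1].
rng = random.Random(0)
xs = [rng.choice([-1.0, 1.0]) for _ in range(2000)]
ds = [0.5 * xs[k] - 0.3 * (xs[k - 1] if k > 0 else 0.0) for k in range(len(xs))]
weights, squared_errors = lms_filter(xs, ds, num_taps=2, step_size=0.1)
```

Averaging `squared_errors` over a sliding window gives the kind of MSE learning curve typically used to compare equalizers.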
Efficient estimation of Pareto model: Some modified percentile estimators.
Bhatti, Sajjad Haider; Hussain, Shahzad; Ahmad, Tanvir; Aslam, Muhammad; Aftab, Muhammad; Raza, Muhammad Ali
2018-01-01
The article proposes three modified percentile estimators for parameter estimation of the Pareto distribution. These modifications are based on median, geometric mean and expectation of empirical cumulative distribution function of first-order statistic. The proposed modified estimators are compared with traditional percentile estimators through a Monte Carlo simulation for different parameter combinations with varying sample sizes. Performance of different estimators is assessed in terms of total mean square error and total relative deviation. It is determined that modified percentile estimator based on expectation of empirical cumulative distribution function of first-order statistic provides efficient and precise parameter estimates compared to other estimators considered. The simulation results were further confirmed using two real life examples where maximum likelihood and moment estimators were also considered.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ping; Wang, Chenyu; Li, Mingjie
In general, the modeling errors of a dynamic system model are a set of random variables. Traditional modeling performance indices such as mean square error (MSE) and root mean square error (RMSE) cannot fully express the connotation of modeling errors with stochastic characteristics in both the time domain and the space domain. Therefore, the probability density function (PDF) is introduced to completely describe the modeling errors in both time scales and space scales. Based on it, a novel wavelet neural network (WNN) modeling method is proposed by minimizing the two-dimensional (2D) PDF shaping of modeling errors. First, the modeling error PDF of the traditional WNN is estimated using a data-driven kernel density estimation (KDE) technique. Then, the quadratic sum of the 2D deviation between the modeling error PDF and the target PDF is utilized as the performance index to optimize the WNN model parameters by the gradient descent method. Since the WNN has strong nonlinear approximation and adaptive capability, and all the parameters are well optimized by the proposed method, the developed WNN model can eventually make the modeling error PDF track the target PDF. A simulation example and an application in a blast furnace ironmaking process show that the proposed method has a higher modeling precision and better generalization ability compared with conventional WNN modeling based on the MSE criterion. Furthermore, the proposed method gives a more desirable estimate of the modeling error PDF, which approximates a Gaussian distribution whose shape is high and narrow.
Zhou, Ping; Wang, Chenyu; Li, Mingjie; ...
2018-01-31
In general, the modeling errors of a dynamic system model are a set of random variables. Traditional modeling performance indices such as mean square error (MSE) and root mean square error (RMSE) cannot fully express the connotation of modeling errors with stochastic characteristics in both the time domain and the space domain. Therefore, the probability density function (PDF) is introduced to completely describe the modeling errors in both time scales and space scales. Based on it, a novel wavelet neural network (WNN) modeling method is proposed by minimizing the two-dimensional (2D) PDF shaping of modeling errors. First, the modeling error PDF of the traditional WNN is estimated using a data-driven kernel density estimation (KDE) technique. Then, the quadratic sum of the 2D deviation between the modeling error PDF and the target PDF is utilized as the performance index to optimize the WNN model parameters by the gradient descent method. Since the WNN has strong nonlinear approximation and adaptive capability, and all the parameters are well optimized by the proposed method, the developed WNN model can eventually make the modeling error PDF track the target PDF. A simulation example and an application in a blast furnace ironmaking process show that the proposed method has a higher modeling precision and better generalization ability compared with conventional WNN modeling based on the MSE criterion. Furthermore, the proposed method gives a more desirable estimate of the modeling error PDF, which approximates a Gaussian distribution whose shape is high and narrow.
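The two ingredients named in this abstract, a kernel density estimate of the modeling-error PDF and a squared-deviation index against a target PDF, can be sketched in one dimension as follows. The paper works with a 2D PDF and a WNN, neither of which is reproduced here; the function names, bandwidths, and error samples are hypothetical.

```python
from math import exp, pi, sqrt

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the PDF of `samples` with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))

    def pdf(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf

def squared_pdf_deviation(pdf_a, pdf_b, grid):
    """Riemann-sum approximation of the integrated squared difference of two PDFs
    on a uniformly spaced grid."""
    step = grid[1] - grid[0]
    return step * sum((pdf_a(x) - pdf_b(x)) ** 2 for x in grid)

# Hypothetical modeling errors; the target is a narrow Gaussian centred at zero,
# written here as a one-sample KDE, which is exactly N(0, 0.05**2).
errors = [-0.2, 0.0, 0.1, 0.3]
grid = [-2.0 + 0.01 * i for i in range(401)]
error_pdf = gaussian_kde(errors, bandwidth=0.1)
target_pdf = gaussian_kde([0.0], bandwidth=0.05)
index = squared_pdf_deviation(error_pdf, target_pdf, grid)
```

In a PDF-shaping scheme of the kind described, a quantity like `index` would be the objective that gradient descent drives toward zero by adjusting the model parameters.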
Waltemeyer, Scott D.
2008-01-01
Estimates of the magnitude and frequency of peak discharges are necessary for the reliable design of bridges, culverts, and open-channel hydraulic analysis, and for flood-hazard mapping in New Mexico and surrounding areas. The U.S. Geological Survey, in cooperation with the New Mexico Department of Transportation, updated estimates of peak-discharge magnitude for gaging stations in the region and updated regional equations for estimation of peak discharge and frequency at ungaged sites. Equations were developed for estimating the magnitude of peak discharges for recurrence intervals of 2, 5, 10, 25, 50, 100, and 500 years at ungaged sites by use of data collected through 2004 for 293 gaging stations on unregulated streams that have 10 or more years of record. Peak discharges for selected recurrence intervals were determined at gaging stations by fitting observed data to a log-Pearson Type III distribution with adjustments for a low-discharge threshold and a zero skew coefficient. A low-discharge threshold was applied to frequency analysis of 140 of the 293 gaging stations. This application provides an improved fit of the log-Pearson Type III frequency distribution. Use of the low-discharge threshold generally eliminated peak discharges having a recurrence interval of less than 1.4 years in the probability-density function. Within each of the nine regions, logarithms of the maximum peak discharges for selected recurrence intervals were related to logarithms of basin and climatic characteristics by using stepwise ordinary least-squares regression techniques for exploratory data analysis. Generalized least-squares regression techniques, an improved regression procedure that accounts for time and spatial sampling errors, then were applied to the same data used in the ordinary least-squares regression analyses.
The average standard error of prediction, which includes average sampling error and average standard error of regression, ranged from 38 to 93 percent (mean value is 62, and median value is 59) for the 100-year flood. The 1996 investigation standard error of prediction for the flood regions ranged from 41 to 96 percent (mean value is 67, and median value is 68) for the 100-year flood that was analyzed by using generalized least-squares regression analysis. Overall, the equations based on generalized least-squares regression techniques are more reliable than those in the 1996 report because of the increased length of record and improved geographic information system (GIS) method to determine basin and climatic characteristics. Flood-frequency estimates can be made for ungaged sites upstream or downstream from gaging stations by using a method that transfers flood-frequency data at the gaging station to the ungaged site by using a drainage-area ratio adjustment equation. The peak discharge for a given recurrence interval at the gaging station, drainage-area ratio, and the drainage-area exponent from the regional regression equation of the respective region is used to transfer the peak discharge for the recurrence interval to the ungaged site. Maximum observed peak discharge as related to drainage area was determined for New Mexico. Extreme events are commonly used in the design and appraisal of bridge crossings and other structures. Bridge-scour evaluations are commonly made by using the 500-year peak discharge for these appraisals. Peak-discharge data collected at 293 gaging stations and 367 miscellaneous sites were used to develop a maximum peak-discharge relation as an alternative method of estimating peak discharge of an extreme event such as a maximum probable flood.
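The drainage-area ratio transfer mentioned in this abstract is commonly written as Q_ungaged = Q_gaged × (A_ungaged / A_gaged)^b, where b is the drainage-area exponent of the regional regression equation. The report's exact equation is not quoted in the abstract, so the Python sketch below assumes that common form; the function name and all numbers are hypothetical.

```python
def transfer_peak_discharge(q_gaged, area_gaged, area_ungaged, exponent):
    """Scale a gaging-station peak discharge to a nearby ungaged site.

    Assumed form: Q_u = Q_g * (A_u / A_g) ** exponent, with `exponent` taken from
    the regional regression equation for the same recurrence interval.
    """
    if q_gaged < 0 or area_gaged <= 0 or area_ungaged <= 0:
        raise ValueError("discharge must be non-negative and areas positive")
    return q_gaged * (area_ungaged / area_gaged) ** exponent

# Hypothetical 100-year peak of 1000 (any consistent discharge unit) at a station
# draining 100 square miles, transferred to a site draining 400 square miles.
q_site = transfer_peak_discharge(1000.0, 100.0, 400.0, exponent=0.5)
```

An exponent below 1 means discharge grows more slowly than drainage area, so the transfer is usually limited to sites whose drainage area is reasonably close to that of the gaging station.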
Liu, Geng; Niu, Junjie; Zhang, Chao; Guo, Guanlin
2015-12-01
Data distribution is usually skewed severely by the presence of hot spots in contaminated sites. This causes difficulties for accurate geostatistical data transformation. Three types of typical normal distribution transformation methods termed the normal score, Johnson, and Box-Cox transformations were applied to compare the effects of spatial interpolation with normal distribution transformation data of benzo(b)fluoranthene in a large-scale coking plant-contaminated site in north China. Three normal transformation methods decreased the skewness and kurtosis of the benzo(b)fluoranthene, and all the transformed data passed the Kolmogorov-Smirnov test threshold. Cross validation showed that Johnson ordinary kriging has a minimum root-mean-square error of 1.17 and a mean error of 0.19, which was more accurate than the other two models. The area with fewer sampling points and that with high levels of contamination showed the largest prediction standard errors based on the Johnson ordinary kriging prediction map. We introduce an ideal normal transformation method prior to geostatistical estimation for severely skewed data, which enhances the reliability of risk estimation and improves the accuracy for determination of remediation boundaries.
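Of the three transformations compared in this abstract, the Box-Cox transform is the simplest to write down: (x^λ − 1)/λ for λ ≠ 0 and ln x for λ = 0, defined for positive data. The Python sketch below applies it to made-up concentrations containing one hot spot and compares moment skewness before and after; the function names and data are hypothetical, and λ is fixed by hand, not fitted.

```python
from math import log
from statistics import fmean

def box_cox(values, lam):
    """Box-Cox power transform of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def skewness(values):
    """Population (moment) skewness: third central moment over variance ** 1.5."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical concentrations with a single hot spot, for illustration only.
raw = [0.1, 0.2, 0.3, 0.5, 1.0, 20.0]
transformed = box_cox(raw, lam=0)             # lam = 0 is the log transform
skew_before, skew_after = skewness(raw), skewness(transformed)
```

In practice λ is usually chosen by maximum likelihood, and kriging predictions made on the transformed scale must be back-transformed before they are mapped.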
An Investigation of the Standard Errors of Expected A Posteriori Ability Estimates.
ERIC Educational Resources Information Center
De Ayala, R. J.; And Others
Expected a posteriori has a number of advantages over maximum likelihood estimation or maximum a posteriori (MAP) estimation methods. These include ability estimates (thetas) for all response patterns, less regression towards the mean than MAP ability estimates, and a lower average squared error. R. D. Bock and R. J. Mislevy (1982) state that the…
Liu, Xue-song; Sun, Fen-fang; Jin, Ye; Wu, Yong-jiang; Gu, Zhi-xin; Zhu, Li; Yan, Dong-lan
2015-12-01
A novel method was developed for the rapid determination of multi-indicators in corni fructus by means of near infrared (NIR) spectroscopy. Particle swarm optimization (PSO) based least squares support vector machine was investigated to increase the levels of quality control. The calibration models of moisture, extractum, morroniside and loganin were established using the PSO-LS-SVM algorithm. The performance of PSO-LS-SVM models was compared with partial least squares regression (PLSR) and back propagation artificial neural network (BP-ANN). The calibration and validation results of PSO-LS-SVM were superior to both PLS and BP-ANN. For PSO-LS-SVM models, the correlation coefficients (r) of calibrations were all above 0.942. The optimal prediction results were also achieved by PSO-LS-SVM models with the RMSEP (root mean square error of prediction) and RSEP (relative standard errors of prediction) less than 1.176 and 15.5% respectively. The results suggest that PSO-LS-SVM algorithm has a good model performance and high prediction accuracy. NIR has a potential value for rapid determination of multi-indicators in Corni Fructus.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Yunlong; Wang, Aiping; Guo, Lei
This paper presents an error-entropy minimization tracking control algorithm for a class of dynamic stochastic systems. The system is represented by a set of time-varying discrete nonlinear equations with non-Gaussian stochastic input, where the statistical properties of the stochastic input are unknown. By using Parzen windowing with a Gaussian kernel to estimate the probability densities of errors, recursive algorithms are then proposed to design the controller such that the tracking error can be minimized. The performance of the error-entropy minimization criterion is compared with that of mean-square-error minimization in the simulation results.
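The Parzen-window step mentioned above can be illustrated with a minimal density estimator. This is only a sketch of Gaussian-kernel Parzen windowing on synthetic errors; the bandwidth, the Laplace-distributed sample, and the function name are assumptions, and the recursive controller design from the paper is not reproduced.

```python
import numpy as np


def parzen_density(errors, grid, bandwidth):
    """Gaussian-kernel Parzen estimate of the error density on `grid`."""
    errors = np.asarray(errors, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # One kernel per observed error, evaluated at every grid point.
    diffs = (grid[:, None] - errors[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


# Synthetic non-Gaussian tracking errors (assumed for illustration).
rng = np.random.default_rng(1)
tracking_errors = rng.laplace(loc=0.0, scale=0.5, size=300)

grid = np.linspace(-6.0, 6.0, 1201)
density = parzen_density(tracking_errors, grid, bandwidth=0.2)

# A valid density should integrate to roughly one (Riemann-sum check).
area = float(np.sum(density) * (grid[1] - grid[0]))
print(area)
```

An entropy criterion can then be evaluated from such a density estimate, whereas a mean-square-error criterion only uses the second moment of the errors.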
Error-Based Design Space Windowing
NASA Technical Reports Server (NTRS)
Papila, Melih; Papila, Nilay U.; Shyy, Wei; Haftka, Raphael T.; Fitz-Coy, Norman
2002-01-01
Windowing of design space is considered in order to reduce the bias errors due to low-order polynomial response surfaces (RS). Standard design space windowing (DSW) uses a region of interest by setting a requirement on response level and checks it by global RS predictions over the design space. This approach, however, is vulnerable since RS modeling errors may lead to the wrong region to zoom in on. The approach is modified by introducing an eigenvalue error measure based on a point-to-point mean squared error criterion. Two examples are presented to demonstrate the benefit of the error-based DSW.
Comparison of Methods for Estimating Low Flow Characteristics of Streams
Tasker, Gary D.
1987-01-01
Four methods for estimating the 7-day, 10-year and 7-day, 20-year low flows for streams are compared by the bootstrap method. The bootstrap method is a Monte Carlo technique in which random samples are drawn from an unspecified sampling distribution defined from observed data. The nonparametric nature of the bootstrap makes it suitable for comparing methods based on a flow series for which the true distribution is unknown. Results show that the two methods based on hypothetical distribution (Log-Pearson III and Weibull) had lower mean square errors than did the G. E. P. Box-D. R. Cox transformation method or the Log-W. C. Boughton method which is based on a fit of plotting positions.
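A bootstrap comparison of this kind can be sketched as follows. The sketch compares only two hypothetical estimators of a low-flow quantile (an empirical quantile and a lognormal fit), uses synthetic flows, and treats the quantile of the full observed sample as the reference value; none of these choices is taken from the paper, whose four methods and resampling details are not given in the abstract.

```python
import numpy as np
from statistics import NormalDist


def empirical_quantile(sample, p):
    """Nonparametric estimate: sample quantile at probability p."""
    return float(np.quantile(sample, p))


def lognormal_quantile(sample, p):
    """Parametric estimate: quantile of a lognormal fitted by log moments."""
    logs = np.log(sample)
    z = NormalDist().inv_cdf(p)
    return float(np.exp(logs.mean() + z * logs.std(ddof=1)))


def bootstrap_mse(data, estimator, p, n_boot=500, seed=0):
    """Mean square error of `estimator` over bootstrap resamples of `data`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    reference = float(np.quantile(data, p))  # assumed stand-in for the truth
    estimates = np.array([
        estimator(rng.choice(data, size=data.size, replace=True), p)
        for _ in range(n_boot)
    ])
    return float(np.mean((estimates - reference) ** 2))


# Synthetic annual 7-day low flows; p = 0.1 corresponds to a 10-year low flow.
flows = np.random.default_rng(42).lognormal(mean=2.0, sigma=0.4, size=40)
mse_empirical = bootstrap_mse(flows, empirical_quantile, p=0.1)
mse_lognormal = bootstrap_mse(flows, lognormal_quantile, p=0.1)
print(mse_empirical, mse_lognormal)
```

Using the same seed for both estimators means they see identical resamples, so the difference in mean square error reflects the estimators rather than resampling noise.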
Tiyip, Tashpolat; Ding, Jianli; Zhang, Dong; Liu, Wei; Wang, Fei; Tashpolat, Nigara
2017-01-01
Effective pretreatment of spectral reflectance is vital to model accuracy in soil parameter estimation. However, the classic integer derivative has some disadvantages, including spectral information loss and the introduction of high-frequency noise. In this paper, the fractional order derivative algorithm was applied to the pretreatment and partial least squares regression (PLSR) was used to assess the clay content of desert soils. Overall, 103 soil samples were collected from the Ebinur Lake basin in the Xinjiang Uighur Autonomous Region of China, and used as data sets for calibration and validation. Following laboratory measurements of spectral reflectance and clay content, the raw spectral reflectance and absorbance data were treated using the fractional derivative order from the 0.0 to the 2.0 order (order interval: 0.2). The ratio of performance to deviation (RPD), determinant coefficients of calibration (Rc2), root mean square errors of calibration (RMSEC), determinant coefficients of prediction (Rp2), and root mean square errors of prediction (RMSEP) were applied to assess the performance of predicting models. The results showed that models built on the fractional derivative order performed better than when using the classic integer derivative. Comparison of the predictive effects of 22 models for estimating clay content, calibrated by PLSR, showed that those models based on the fractional derivative 1.8 order of spectral reflectance (Rc2 = 0.907, RMSEC = 0.425%, Rp2 = 0.916, RMSEP = 0.364%, and RPD = 2.484 ≥ 2.000) and absorbance (Rc2 = 0.888, RMSEC = 0.446%, Rp2 = 0.918, RMSEP = 0.383% and RPD = 2.511 ≥ 2.000) were most effective. Furthermore, they performed well in quantitative estimations of the clay content of soils in the study area. PMID:28934274
Fadzlillah, Nurrulhidayah Ahmad; Rohman, Abdul; Ismail, Amin; Mustafa, Shuhaimi; Khatib, Alfi
2013-01-01
In the dairy product sector, butter is one of the potential sources of fat-soluble vitamins, namely vitamins A, D, E, and K; consequently, butter commands a higher price than other dairy products. This fact has attracted unscrupulous market players to blend butter with other animal fats for economic profit. Animal fats like mutton fat (MF) are likely to be mixed with butter because of their similar fatty acid composition. This study focused on the application of FTIR-ATR spectroscopy in conjunction with chemometrics for classification and quantification of MF as an adulterant in butter. The FTIR spectral region of 3910-710 cm⁻¹ was used for classification between butter and butter blended with MF at various concentrations with the aid of discriminant analysis (DA). DA was able to classify butter and adulterated butter without any misclassified samples. For quantitative analysis, partial least square (PLS) regression was used to develop a calibration model at the frequency region of 3910-710 cm⁻¹. The equation obtained for the relationship between the actual value of MF and the FTIR-predicted values of MF in the PLS calibration model was y = 0.998x + 1.033, with a coefficient of determination (R²) and root mean square error of calibration of 0.998 and 0.046% (v/v), respectively. The PLS calibration model was subsequently used for the prediction of independent samples containing butter in binary mixtures with MF. Using 9 principal components, the root mean square error of prediction (RMSEP) was 1.68% (v/v). The results showed that FTIR spectroscopy can be used for the classification and quantification of MF in butter formulation for verification purposes.
Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kleijnen, J.P.C.; Helton, J.C.
1999-04-01
The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
NASA Astrophysics Data System (ADS)
Yehia, Ali M.; Mohamed, Heba M.
2016-01-01
Three advanced chemometric-assisted spectrophotometric methods, namely Concentration Residuals Augmented Classical Least Squares (CRACLS), Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS), and Principal Component Analysis-Artificial Neural Networks (PCA-ANN), were developed, validated, and benchmarked against PLS calibration to resolve the severely overlapped spectra and simultaneously determine Paracetamol (PAR), Guaifenesin (GUA), and Phenylephrine (PHE) in their ternary mixture and in the presence of p-aminophenol (AP), the main degradation product and synthesis impurity of Paracetamol. The analytical performance of the proposed methods was described by percentage recoveries, root mean square error of calibration, and standard error of prediction. The four multivariate calibration methods could be used directly without any preliminary separation step and were successfully applied to pharmaceutical formulation analysis, showing no interference from excipients.
1984-12-01
total sum of squares at the center points minus the correction factor for the mean at the center points (SSpe = Y'Y - n1·Ȳ²), where n1 is the number of...(SSlof = SSres - SSpe). The sum of squares due to pure error estimates σ² and the sum of squares due to lack-of-fit estimates σ² plus a bias term if...
Response Surface Methodology
Source        d.f.    SS             MS
Regression    n       b'X'Y          b'X'Y/n
Residual      m-n     Y'Y - b'X'Y    (Y'Y - b'X'Y)/(m-n)
Pure Error    n1-1    Y'Y - n1·Ȳ²    SSpe/(n1
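The pure-error and lack-of-fit sums of squares described in this fragment can be sketched numerically. The sketch assumes the usual reading of the damaged formulas: the correction factor for the mean is n1 times the squared mean of the center-point responses, and the lack-of-fit sum of squares is the residual sum of squares minus the pure-error sum of squares. The function names and the three replicate responses are hypothetical.

```python
import numpy as np


def pure_error_ss(center_responses):
    """Total SS at the center points minus the correction factor n1 * mean**2."""
    y = np.asarray(center_responses, dtype=float)
    n1 = y.size
    return float(y @ y - n1 * y.mean() ** 2)


def lack_of_fit_ss(ss_residual, ss_pure_error):
    """Lack-of-fit SS as the part of the residual SS not due to pure error."""
    return ss_residual - ss_pure_error


# Hypothetical replicate responses measured at the design center.
center = [10.0, 12.0, 11.0]
ss_pe = pure_error_ss(center)
print(ss_pe, lack_of_fit_ss(ss_residual=5.0, ss_pure_error=ss_pe))
```

Dividing the pure-error sum of squares by n1 - 1 gives the pure-error mean square, which is the quantity the lack-of-fit mean square is compared against.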
Kuligowski, Julia; Carrión, David; Quintás, Guillermo; Garrigues, Salvador; de la Guardia, Miguel
2011-01-01
The selection of an appropriate calibration set is a critical step in multivariate method development. In this work, the effect of using different calibration sets, based on a previous classification of unknown samples, on the partial least squares (PLS) regression model performance has been discussed. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil), with increasing polymerized triacylglyceride (PTG) content induced by a deep-frying process were employed. The use of a one-class-classifier partial least squares-discriminant analysis (PLS-DA) and a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independent of their PTG content. However, class separation of oil samples fried with foodstuff, was less evident. The combined use of double-cross model validation with permutation testing was used to validate the obtained PLS-DA classification models, confirming the results. To discuss the usefulness of the selection of an appropriate PLS calibration set, the PTG content was determined by calculating a PLS model based on the previously selected classes. In comparison to a PLS model calculated using a pooled calibration set containing samples from all classes, the root mean square error of prediction could be improved significantly using PLS models based on the selected calibration sets using PLS-DA, ranging between 1.06 and 2.91% (w/w).
NASA Technical Reports Server (NTRS)
Chelton, Dudley B.; Schlax, Michael G.
1991-01-01
The sampling error of an arbitrary linear estimate of a time-averaged quantity constructed from a time series of irregularly spaced observations at a fixed location is quantified through a formalism. The method is applied to satellite observations of chlorophyll from the coastal zone color scanner. The two specific linear estimates under consideration are the composite average formed from the simple average of all observations within the averaging period and the optimal estimate formed by minimizing the mean squared error of the temporal average based on all the observations in the time series. The resulting suboptimal estimates are shown to be more accurate than composite averages. Suboptimal estimates are also found to be nearly as accurate as optimal estimates using the correct signal and measurement error variances and correlation functions for realistic ranges of these parameters, which makes them a viable practical alternative to the composite average method generally employed at present.
Creating a Satellite-Based Record of Tropospheric Ozone
NASA Technical Reports Server (NTRS)
Oetjen, Hilke; Payne, Vivienne H.; Kulawik, Susan S.; Eldering, Annmarie; Worden, John; Edwards, David P.; Francis, Gene L.; Worden, Helen M.
2013-01-01
The TES retrieval algorithm has been applied to IASI radiances. We compare the retrieved ozone profiles with ozone sonde profiles for mid-latitudes for the year 2008. We find a positive bias in the IASI ozone profiles in the UTLS region of up to 22 %. The spatial coverage of the IASI instrument allows sampling of effectively the same air mass with several IASI scenes simultaneously. Comparisons of the root-mean-square of an ensemble of IASI profiles to theoretical errors indicate that the measurement noise and the interference of temperature and water vapour on the retrieval together mostly explain the empirically derived random errors. The total degrees of freedom for signal of the retrieval for ozone are 3.1 +/- 0.2 and the tropospheric degrees of freedom are 1.0 +/- 0.2 for the described cases. IASI ozone profiles agree within the error bars with coincident ozone profiles derived from a TES stare sequence for the ozone sonde station at Bratt's Lake (50.2 deg N, 104.7 deg W).
NASA Astrophysics Data System (ADS)
Mai, W.; Zhang, J.-F.; Zhao, X.-M.; Li, Z.; Xu, Z.-W.
2017-11-01
Wastewater from the dye industry is typically analyzed using a standard method for measurement of chemical oxygen demand (COD) or by a single-wavelength spectroscopic method. To overcome the disadvantages of these methods, ultraviolet-visible (UV-Vis) spectroscopy was combined with principal component regression (PCR) and partial least squares regression (PLSR) in this study. Unlike the standard method, this method does not require digestion of the samples for preparation. Experiments showed that the PLSR model offered high prediction performance for COD, with a mean relative error of about 5% for two dyes. This error is similar to that obtained with the standard method. In this study, the precision of the PLSR model decreased with the number of dye compounds present. It is likely that multiple models will be required in reality, and the complexity of a COD monitoring system would be greatly reduced if the PLSR model is used because it can include several dyes. UV-Vis spectroscopy with PLSR successfully enhanced the performance of COD prediction for dye wastewater and showed good potential for application in on-line water quality monitoring.
Doros, Gheorghe; Pencina, Michael; Rybin, Denis; Meisner, Allison; Fava, Maurizio
2013-07-20
Previous authors have proposed the sequential parallel comparison design (SPCD) to address the issue of high placebo response rate in clinical trials. The original use of SPCD focused on binary outcomes, but recent use has since been extended to continuous outcomes that arise more naturally in many fields, including psychiatry. Analytic methods proposed to date for analysis of SPCD trial continuous data included methods based on seemingly unrelated regression and ordinary least squares. Here, we propose a repeated measures linear model that uses all outcome data collected in the trial and accounts for data that are missing at random. An appropriate contrast formulated after the model has been fit can be used to test the primary hypothesis of no difference in treatment effects between study arms. Our extensive simulations show that when compared with the other methods, our approach preserves the type I error even for small sample sizes and offers adequate power and the smallest mean squared error under a wide variety of assumptions. We recommend consideration of our approach for analysis of data coming from SPCD trials. Copyright © 2013 John Wiley & Sons, Ltd.
Analysis of Students' Error in Learning of Quadratic Equations
ERIC Educational Resources Information Center
Zakaria, Effandi; Ibrahim; Maat, Siti Mistima
2010-01-01
The purpose of the study was to determine students' errors in learning quadratic equations. The sample comprised 30 form three students from a secondary school in Jambi, Indonesia. A diagnostic test with three components (factorization, completing the square, and the quadratic formula) was used as the instrument of this study. Diagnostic interview…
Wavelet-based multiscale performance analysis: An approach to assess and improve hydrological models
NASA Astrophysics Data System (ADS)
Rathinasamy, Maheswaran; Khosa, Rakesh; Adamowski, Jan; ch, Sudheer; Partheepan, G.; Anand, Jatin; Narsimlu, Boini
2014-12-01
The temporal dynamics of hydrological processes are spread across different time scales and, as such, the performance of hydrological models cannot be estimated reliably from global performance measures that assign a single number to the fit of a simulated time series to an observed reference series. Accordingly, it is important to analyze model performance at different time scales. Wavelets have been used extensively in the area of hydrological modeling for multiscale analysis, and have been shown to be very reliable and useful in understanding dynamics across time scales and as these evolve in time. In this paper, a wavelet-based multiscale performance measure for hydrological models is proposed and tested (i.e., Multiscale Nash-Sutcliffe Criteria and Multiscale Normalized Root Mean Square Error). The main advantage of this method is that it provides a quantitative measure of model performance across different time scales. In the proposed approach, model and observed time series are decomposed using the Discrete Wavelet Transform (known as the à trous wavelet transform), and performance measures of the model are obtained at each time scale. The applicability of the proposed method was explored using various case studies-both real as well as synthetic. The synthetic case studies included various kinds of errors (e.g., timing error, under and over prediction of high and low flows) in outputs from a hydrologic model. The real time case studies investigated in this study included simulation results of both the process-based Soil Water Assessment Tool (SWAT) model, as well as statistical models, namely the Coupled Wavelet-Volterra (WVC), Artificial Neural Network (ANN), and Auto Regressive Moving Average (ARMA) methods. For the SWAT model, data from Wainganga and Sind Basin (India) were used, while for the Wavelet Volterra, ANN and ARMA models, data from the Cauvery River Basin (India) and Fraser River (Canada) were used. 
The study also explored the effect of the choice of the wavelets in multiscale model evaluation. It was found that the proposed wavelet-based performance measures, namely the MNSC (Multiscale Nash-Sutcliffe Criteria) and MNRMSE (Multiscale Normalized Root Mean Square Error), are a more reliable measure than traditional performance measures such as the Nash-Sutcliffe Criteria (NSC), Root Mean Square Error (RMSE), and Normalized Root Mean Square Error (NRMSE). Further, the proposed methodology can be used to: i) compare different hydrological models (both physical and statistical models), and ii) help in model calibration.
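For reference, the single-scale measures that the multiscale criteria build on can be sketched as below. The Nash-Sutcliffe and RMSE formulas are the standard ones; normalizing the RMSE by the range of the observations is an assumption, since the abstract does not say which normalization the authors use. In the multiscale setting, the same functions would be applied to the observed and simulated components at each wavelet scale, a step that is not implemented here.

```python
import numpy as np


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def rmse(observed, simulated):
    """Root mean square error between observed and simulated series."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def nrmse(observed, simulated):
    """RMSE divided by the observed range (normalization is an assumption)."""
    obs = np.asarray(observed, dtype=float)
    return rmse(obs, simulated) / float(obs.max() - obs.min())


# Made-up flows for illustration only.
observed = [1.0, 2.0, 3.0]
simulated = [2.0, 3.0, 4.0]
print(nash_sutcliffe(observed, simulated), rmse(observed, simulated), nrmse(observed, simulated))
```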
Improving Arterial Spin Labeling by Using Deep Learning.
Kim, Ki Hwan; Choi, Seung Hong; Park, Sung-Hong
2018-05-01
Purpose To develop a deep learning algorithm that generates arterial spin labeling (ASL) perfusion images with higher accuracy and robustness by using a smaller number of subtraction images. Materials and Methods For ASL image generation from pair-wise subtraction, we used a convolutional neural network (CNN) as a deep learning algorithm. The ground truth perfusion images were generated by averaging six or seven pairwise subtraction images acquired with (a) conventional pseudocontinuous arterial spin labeling from seven healthy subjects or (b) Hadamard-encoded pseudocontinuous ASL from 114 patients with various diseases. CNNs were trained to generate perfusion images from a smaller number (two or three) of subtraction images and evaluated by means of cross-validation. CNNs from the patient data sets were also tested on 26 separate stroke data sets. CNNs were compared with the conventional averaging method in terms of mean square error and radiologic score by using a paired t test and/or Wilcoxon signed-rank test. Results Mean square errors were approximately 40% lower than those of the conventional averaging method for the cross-validation with the healthy subjects and patients and the separate test with the patients who had experienced a stroke (P < .001). Region-of-interest analysis in stroke regions showed that cerebral blood flow maps from CNN (mean ± standard deviation, 19.7 mL per 100 g/min ± 9.7) had smaller mean square errors than those determined with the conventional averaging method (43.2 ± 29.8) (P < .001). Radiologic scoring demonstrated that CNNs suppressed noise and motion and/or segmentation artifacts better than the conventional averaging method did (P < .001). Conclusion CNNs provided superior perfusion image quality and more accurate perfusion measurement compared with those of the conventional averaging method for generation of ASL images from pair-wise subtraction images. © RSNA, 2017.
Azeez, Adeboye; Obaromi, Davies; Odeyemi, Akinwumi; Ndege, James; Muntabayi, Ruffin
2016-07-26
Tuberculosis (TB) is a deadly infectious disease caused by Mycobacterium tuberculosis. As a chronic and highly infectious disease, tuberculosis is prevalent in almost every part of the globe. More than 95% of TB mortality occurs in low/middle income countries. In 2014, approximately 10 million people were diagnosed with active TB and two million died from the disease. In this study, our aim is to compare the predictive powers of the seasonal autoregressive integrated moving average (SARIMA) and neural network auto-regression (SARIMA-NNAR) models of TB incidence and analyse its seasonality in South Africa. TB incidence data from January 2010 to December 2015 were extracted from the Eastern Cape Health facility report of the electronic Tuberculosis Register (ERT.Net). A SARIMA model and a combined model of a SARIMA model and a neural network auto-regression (SARIMA-NNAR) model were used in analysing and predicting the TB data from 2010 to 2015. Simulation performance parameters of mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), mean percent error (MPE), mean absolute scaled error (MASE) and mean absolute percentage error (MAPE) were applied to assess which model predicted better. Although in practice both models could predict TB incidence, the combined model displayed better performance. For the combined model, the Akaike information criterion (AIC), second-order AIC (AICc) and Bayesian information criterion (BIC) are 288.56, 308.31 and 299.09 respectively, which were lower than those of the SARIMA model, with corresponding values of 329.02, 327.20 and 341.99, respectively. The SARIMA-NNAR model forecast a slightly higher seasonal TB incidence trend than the single model. The combined model indicated a better TB incidence forecasting with a lower AICc.
The model also indicates the need for resolute intervention to reduce infectious disease transmission with co-infection with HIV and other concomitant diseases, and also at festival peak periods.
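The forecast error measures listed in this abstract can be computed as in the sketch below. The definitions follow common usage; the sign convention for MPE (actual minus forecast) and the scaling of MASE by the in-sample one-step naive forecast error are assumptions, since the abstract does not define them, and the numbers are invented rather than taken from the TB register.

```python
import numpy as np


def forecast_errors(actual, forecast, train):
    """Return common forecast accuracy measures as a dictionary.

    `train` is the in-sample series used to scale MASE by the mean
    absolute error of a one-step naive forecast (an assumed convention).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    train = np.asarray(train, dtype=float)

    err = actual - forecast
    mae = np.mean(np.abs(err))
    naive_mae = np.mean(np.abs(np.diff(train)))
    return {
        "MSE": float(np.mean(err ** 2)),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(mae),
        "MPE": float(100.0 * np.mean(err / actual)),
        "MAPE": float(100.0 * np.mean(np.abs(err / actual))),
        "MASE": float(mae / naive_mae),
    }


# Invented monthly case counts, for illustration only.
metrics = forecast_errors(
    actual=[100.0, 200.0], forecast=[110.0, 190.0], train=[90.0, 100.0, 120.0]
)
print(metrics)
```

For strongly seasonal monthly data, MASE is often scaled by a seasonal naive forecast (lag 12) instead of the lag-1 difference used here.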
Implementation of neural network for color properties of polycarbonates
NASA Astrophysics Data System (ADS)
Saeed, U.; Ahmad, S.; Alsadi, J.; Ross, D.; Rizvi, G.
2014-05-01
In the present paper, the applicability of artificial neural networks (ANN) to the color properties of plastics is investigated. The neural networks toolbox of Matlab 6.5 is used to develop and test the ANN model on a personal computer. An optimal design is completed for 10, 12, 14, 16, 18, and 20 hidden neurons on a single hidden layer with five different algorithms: batch gradient descent (GD), batch variable learning rate (GDX), resilient back-propagation (RP), scaled conjugate gradient (SCG), and Levenberg-Marquardt (LM) in the feed-forward back-propagation neural network model. The training data for the ANN are obtained from experimental measurements. There were twenty-two inputs, including resins, additives, and pigments, while the three tristimulus color values L*, a*, and b* were used as the output layer. Statistical analysis in terms of Root-Mean-Squared (RMS), absolute fraction of variance (R squared), and mean square error is used to investigate the performance of the ANN. The LM algorithm with fourteen neurons on the hidden layer in the feed-forward back-propagation ANN model showed the best result in the present study. The accuracy of the ANN model in reducing errors proved acceptable in all statistical analyses, as shown in the results. It was concluded that ANN provides a feasible method for error reduction in specific color tristimulus values.
NASA Astrophysics Data System (ADS)
Luna, Aderval S.; da Silva, Arnaldo P.; Ferré, Joan; Boqué, Ricard
This research work describes two studies for the classification and characterization of edible oils and their quality parameters through Fourier transform mid infrared spectroscopy (FT-mid-IR) together with chemometric methods. The discrimination of canola, sunflower, corn and soybean oils was investigated using SVM-DA, SIMCA and PLS-DA. Using FT-mid-IR, DPLS was able to classify 100% of the samples from the validation set, but SIMCA and SVM-DA were not. The quality parameters, refraction index and relative density of edible oils, were obtained from reference methods. Prediction models for FT-mid-IR spectra were calculated for these quality parameters using partial least squares (PLS) and support vector machines (SVM). Several preprocessing alternatives (first derivative, multiplicative scatter correction, mean centering, and standard normal variate) were investigated. The best result for the refraction index was achieved with SVM, as was the best result for the relative density, except when the preprocessing combination of mean centering and first derivative was used. For both quality parameters, the best results obtained for the figures of merit expressed by the root mean square error of cross validation (RMSECV) and prediction (RMSEP) were equal to 0.0001.
NASA Technical Reports Server (NTRS)
Kummerow, Christian; Giglio, Louis
1994-01-01
A multichannel physical approach for retrieving rainfall and its vertical structure from Special Sensor Microwave/Imager (SSM/I) observations is examined. While a companion paper was devoted exclusively to the description of the algorithm, its strengths, and its limitations, the main focus of this paper is to report on the results, applicability, and expected accuracies from this algorithm. Some examples are given that compare retrieved results with ground-based radar data from different geographical regions to illustrate the performance and utility of the algorithm under distinct rainfall conditions. More quantitative validation is accomplished using two months of radar data from Darwin, Australia, and the radar network over Japan. Instantaneous comparisons at Darwin indicate that root-mean-square errors for 1.25 deg areas over water are 0.09 mm/h compared to the mean rainfall value of 0.224 mm/h while the correlation exceeds 0.9. Similar results are obtained over the Japanese validation site with rms errors of 0.615 mm/h compared to the mean of 0.0880 mm/h and a correlation of 0.9. Results are less encouraging over land with root-mean-square errors somewhat larger than the mean rain rates and correlations of only 0.71 and 0.62 for Darwin and Japan, respectively. These validation studies are further used in combination with the theoretical treatment of expected accuracies developed in the companion paper to define error estimates on a broader scale than individual radar sites from which the errors may be analyzed. Comparisons with simpler techniques that are based on either emission or scattering measurements are used to illustrate the fact that the current algorithm, while better correlated with the emission methods over water, cannot be reduced to either of these simpler methods.
NASA Astrophysics Data System (ADS)
Liu, Fei; He, Yong
2008-03-01
Three different chemometric methods were applied to the determination of the sugar content of cola soft drinks using visible and near infrared spectroscopy (Vis/NIRS). Four varieties of cola were prepared, and 180 samples (45 samples for each variety) were selected for the calibration set, while 60 samples (15 samples for each variety) were selected for the validation set. Savitzky-Golay smoothing, standard normal variate (SNV), and Savitzky-Golay first-derivative transformation were applied for the pre-processing of spectral data. The first eleven principal components (PCs) extracted by partial least squares (PLS) analysis were employed as the inputs of the BP neural network (BPNN) and least squares-support vector machine (LS-SVM) models. Then the BPNN model with the optimal structural parameters and the LS-SVM model with a radial basis function (RBF) kernel were applied to build the regression model and were compared with PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias for prediction were 0.971, 1.259 and -0.335 for PLS, 0.986, 0.763, and -0.042 for BPNN, and 0.978, 0.995 and -0.227 for LS-SVM, respectively. All three methods provided high and satisfactory precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be utilized as a high-precision way to determine the sugar content of cola soft drinks.
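Two of the building blocks named in this abstract, standard normal variate (SNV) preprocessing and the RMSEP and bias figures of merit, can be sketched as follows. SNV is implemented in its usual form (each spectrum centered and scaled by its own mean and standard deviation); taking bias as predicted minus reference is an assumed sign convention, and the spectra and values are invented for the example.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) separately."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def rmsep(reference, predicted):
    """Root mean square error of prediction on a validation set."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


def prediction_bias(reference, predicted):
    """Mean of (predicted - reference); the sign convention is an assumption."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.mean(pred - ref))


# Two invented spectra that differ only by offset and scale.
raw = np.array([[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]])
corrected = snv(raw)
print(corrected)
print(rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), prediction_bias([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Because the two invented spectra differ only by a multiplicative factor, SNV maps them onto the same corrected spectrum, which is the scatter-removal effect the method is used for.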
Monitoring of beer fermentation based on hybrid electronic tongue.
Kutyła-Olesiuk, Anna; Zaborowski, Michał; Prokaryn, Piotr; Ciosek, Patrycja
2012-10-01
Monitoring of biotechnological processes, including fermentation, is extremely important because the composition of the samples changes rapidly during production. In the case of beer, the analysis of physicochemical parameters allows for the determination of the stage of the fermentation process and the control of its possible perturbations. A sensor array composed of potentiometric and voltammetric sensors (a so-called hybrid Electronic Tongue, h-ET) can be used as a tool to control the beer production process. The aim of this study is to apply an electronic tongue system to distinguish samples obtained during alcoholic fermentation. The samples originate from a batch of homemade beer fermentation and from two stages of the process: the fermentation reaction and the maturation of beer. The applied sensor array consists of 10 miniaturized ion-selective electrodes (potentiometric ET) and silicon-based 3-electrode voltammetric transducers (voltammetric ET). The obtained results were processed using Partial Least Squares (PLS) and Partial Least Squares-Discriminant Analysis (PLS-DA). For potentiometric data, voltammetric data, and combined potentiometric and voltammetric data, the classification ability was compared based on Root Mean Squared Error (RMSE), sensitivity, specificity, and coefficient F calculation. It is shown that, in contrast to the separately used techniques, the developed hybrid system allowed for a better characterization of the beer samples. Data fusion in the hybrid ET yields better results both in qualitative analysis (RMSE, specificity, sensitivity) and in quantitative analysis (RMSE, R(2), a, b). Copyright © 2012 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Yeganeh, B.; Motlagh, M. Shafie Pour; Rashidi, Y.; Kamalan, H.
2012-08-01
Because of the health impacts of exposure to air pollutants in urban areas, monitoring and forecasting of air quality parameters have become an important topic in atmospheric and environmental research. Knowledge of the dynamics and complexity of air pollutant behavior has made artificial intelligence models a useful tool for more accurate prediction of pollutant concentrations. This paper focuses on an innovative method of daily air pollution prediction that combines a Support Vector Machine (SVM) as predictor with Partial Least Square (PLS) as a data selection tool, based on measured CO concentrations. The CO concentrations at the Rey monitoring station in the south of Tehran, from Jan. 2007 to Feb. 2011, were used to test the effectiveness of this method. The hourly CO concentrations were predicted using the SVM and the hybrid PLS-SVM models. Similarly, daily CO concentrations were predicted based on the aforementioned four years of measured data. Results demonstrated that both models have good prediction ability; however, the hybrid PLS-SVM has better accuracy. In the analysis presented in this paper, statistical estimators including relative mean errors, root mean squared errors and the mean absolute relative error were employed to compare the performance of the models. It was concluded that the errors decrease after size reduction and that coefficients of determination increase from 56-81% for the SVM model to 65-85% for the hybrid PLS-SVM model. The hybrid PLS-SVM model was also found to require less computational time than the SVM model, as expected, supporting its more accurate and faster prediction ability.
NASA Astrophysics Data System (ADS)
Wu, Kang-Hung; Su, Ching-Lun; Chu, Yen-Hsyang
2015-03-01
In this article, we use the International Reference Ionosphere (IRI) model to simulate temporal and spatial distributions of global E region electron densities retrieved by the FORMOSAT-3/COSMIC satellites by means of GPS radio occultation (RO) technique. Despite regional discrepancies in the magnitudes of the E region electron density, the IRI model simulations can, on the whole, describe the COSMIC measurements in quality and quantity. On the basis of global ionosonde network and the IRI model, the retrieval errors of the global COSMIC-measured E region peak electron density (NmE) from July 2006 to July 2011 are examined and simulated. The COSMIC measurement and the IRI model simulation both reveal that the magnitudes of the percentage error (PE) and root mean-square-error (RMSE) of the relative RO retrieval errors of the NmE values are dependent on local time (LT) and geomagnetic latitude, with minimum in the early morning and at high latitudes and maximum in the afternoon and at middle latitudes. In addition, the seasonal variation of PE and RMSE values seems to be latitude dependent. After removing the IRI model-simulated GPS RO retrieval errors from the original COSMIC measurements, the average values of the annual and monthly mean percentage errors of the RO retrieval errors of the COSMIC-measured E region electron density are, respectively, substantially reduced by a factor of about 2.95 and 3.35, and the corresponding root-mean-square errors show averaged decreases of 15.6% and 15.4%, respectively. It is found that, with this process, the largest reduction in the PE and RMSE of the COSMIC-measured NmE occurs at the equatorial anomaly latitudes 10°N-30°N in the afternoon from 14 to 18 LT, with a factor of 25 and 2, respectively. 
Statistics show that the residual errors that remained in the corrected COSMIC-measured NmE vary in a range of -20% to 38%, which are comparable to or larger than the percentage errors of the IRI-predicted NmE fluctuating in a range of -6.5% to 20%.
Sensitivity of Fit Indices to Misspecification in Growth Curve Models
ERIC Educational Resources Information Center
Wu, Wei; West, Stephen G.
2010-01-01
This study investigated the sensitivity of fit indices to model misspecification in within-individual covariance structure, between-individual covariance structure, and marginal mean structure in growth curve models. Five commonly used fit indices were examined, including the likelihood ratio test statistic, root mean square error of…
Metameric MIMO-OOK transmission scheme using multiple RGB LEDs.
Bui, Thai-Chien; Cusani, Roberto; Scarano, Gaetano; Biagi, Mauro
2018-05-28
In this work, we propose a novel visible light communication (VLC) scheme utilizing multiple different red green and blue triplets each with a different emission spectrum of red, green and blue for mitigating the effect of interference due to different colors using spatial multiplexing. On-off keying modulation is considered and its effect on light emission in terms of flickering, dimming and color rendering is discussed so as to demonstrate how metameric properties have been considered. At the receiver, multiple photodiodes with color filter-tuned on each transmit light emitting diode (LED) are employed. Three different detection mechanisms of color zero forcing, minimum mean square error estimation and minimum mean square error equalization are then proposed. The system performance of the proposed scheme is evaluated both with computer simulations and tests with an Arduino board implementation.
A comparative study of optimum and suboptimum direct-detection laser ranging receivers
NASA Technical Reports Server (NTRS)
Abshire, J. B.
1978-01-01
A summary of previously proposed receiver strategies for direct-detection laser ranging receivers is presented. Computer simulations are used to compare performance of candidate implementation strategies in the 1- to 100-photoelectron region. Under the condition of no background radiation, the maximum-likelihood and minimum mean-square error estimators were found to give the same performance for both bell-shaped and rectangular optical-pulse shapes. For signal energies greater than 100 photoelectrons, the root-mean-square range error is shown to decrease as Q to the -1/2 power for bell-shaped pulses and Q to the -1 power for rectangular pulses, where Q represents the average pulse energy. Of several receiver implementations presented, the matched-filter peak detector was found to be preferable. A similar configuration, using a constant-fraction discriminator, exhibited a signal-level dependent time bias.
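The two scaling laws stated in this abstract can be written compactly. This is only a restatement of the abstract's claim for signal energies above 100 photoelectrons; the symbol $\sigma_R$ for the root-mean-square range error is an assumed notation, not taken from the report.

```latex
\sigma_R \propto Q^{-1/2} \quad \text{(bell-shaped pulses)}, \qquad
\sigma_R \propto Q^{-1} \quad \text{(rectangular pulses)}

% Consequence of doubling the average pulse energy Q:
\frac{\sigma_R(2Q)}{\sigma_R(Q)} = 2^{-1/2} \approx 0.71 \quad \text{(bell-shaped)}, \qquad
\frac{\sigma_R(2Q)}{\sigma_R(Q)} = 2^{-1} = 0.5 \quad \text{(rectangular)}
```

Under these laws, doubling the pulse energy cuts the range error by about 29% for bell-shaped pulses and by 50% for rectangular pulses.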
NASA Astrophysics Data System (ADS)
Bunai, Tasya; Rokhmatuloh; Wibowo, Adi
2018-05-01
In this paper, two methods for retrieving the Land Surface Temperature (LST) from thermal infrared data supplied by bands 10 and 11 of the Thermal Infrared Sensor (TIRS) onboard Landsat 8 are compared. The first is the mono window algorithm developed by Qin et al. and the second is the split window algorithm by Rozenstein et al. The purpose of this study is to map the spatial distribution of land surface temperature and to determine the more accurate algorithm for retrieving land surface temperature by calculating the root mean square error (RMSE). Finally, we present a comparison of the spatial distribution of land surface temperature from both algorithms; the more accurate is the split window algorithm, with a root mean square error (RMSE) of 7.69 °C.
NASA Astrophysics Data System (ADS)
Zimina, S. V.
2015-06-01
We present the results of statistical analysis of an adaptive antenna array tuned using the least-mean-square error algorithm with quadratic constraint on the useful-signal amplification with allowance for the weight-coefficient fluctuations. Using the perturbation theory, the expressions for the correlation function and power of the output signal of the adaptive antenna array, as well as the formula for the weight-vector covariance matrix are obtained in the first approximation. The fluctuations are shown to lead to the signal distortions at the antenna-array output. The weight-coefficient fluctuations result in the appearance of additional terms in the statistical characteristics of the antenna array. It is also shown that the weight-vector fluctuations are isotropic, i.e., identical in all directions of the weight-coefficient space.
The Estimation of Gestational Age at Birth in Database Studies.
Eberg, Maria; Platt, Robert W; Filion, Kristian B
2017-11-01
Studies on the safety of prenatal medication use require valid estimation of the pregnancy duration. However, gestational age is often incompletely recorded in administrative and clinical databases. Our objective was to compare different approaches to estimating the pregnancy duration. Using data from the Clinical Practice Research Datalink and Hospital Episode Statistics, we examined the following four approaches to estimating missing gestational age: (1) generalized estimating equations for longitudinal data; (2) multiple imputation; (3) estimation based on fetal birth weight and sex; and (4) conventional approaches that assigned a fixed value (39 weeks for all or 39 weeks for full term and 35 weeks for preterm). The gestational age recorded in Hospital Episode Statistics was considered the gold standard. We conducted a simulation study comparing the described approaches in terms of estimated bias and mean square error. A total of 25,929 infants from 22,774 mothers were included in our "gold standard" cohort. The smallest average absolute bias was observed for the generalized estimating equation that included birth weight, while the largest absolute bias occurred when assigning 39-week gestation to all those with missing values. The smallest mean square errors were detected with generalized estimating equations while multiple imputation had the highest mean square errors. The use of generalized estimating equations resulted in the most accurate estimation of missing gestational age when birth weight information was available. In the absence of birth weight, assignment of fixed gestational age based on term/preterm status may be the optimal approach.
Schleier, Jerome J.; Peterson, Robert K.D.; Irvine, Kathryn M.; Marshall, Lucy M.; Weaver, David K.; Preftakes, Collin J.
2012-01-01
One of the more effective ways of managing high densities of adult mosquitoes that vector human and animal pathogens is ultra-low-volume (ULV) aerosol applications of insecticides. The U.S. Environmental Protection Agency uses models that are not validated for ULV insecticide applications and exposure assumptions to perform their human and ecological risk assessments. Currently, there is no validated model that can accurately predict deposition of insecticides applied using ULV technology for adult mosquito management. In addition, little is known about the deposition and drift of small droplets like those used under conditions encountered during ULV applications. The objective of this study was to perform field studies to measure environmental concentrations of insecticides and to develop a validated model to predict the deposition of ULV insecticides. The final regression model was selected by minimizing the Bayesian Information Criterion and its prediction performance was evaluated using k-fold cross validation. Density of the formulation and the density and CMD interaction coefficients were the largest in the model. The results showed that as density of the formulation decreases, deposition increases. The interaction of density and CMD showed that higher density formulations and larger droplets resulted in greater deposition. These results are supported by the aerosol physics literature. A k-fold cross validation demonstrated that the mean square error of the selected regression model is not biased, and the mean square error and mean square prediction error indicated good predictive ability.
Very-short-term wind power prediction by a hybrid model with single- and multi-step approaches
NASA Astrophysics Data System (ADS)
Mohammed, E.; Wang, S.; Yu, J.
2017-05-01
Very-short-term wind power prediction (VSTWPP) plays an essential role in the operation of electric power systems. This paper aims at improving and applying a hybrid method of VSTWPP based on historical data. The hybrid method combines multiple linear regression and least squares (MLR&LS) and is intended to reduce prediction errors. The predicted values are obtained through two sub-processes: 1) transform the time-series data of actual wind power into the power ratio, and then predict the power ratio; 2) use the predicted power ratio to predict the wind power. The proposed method can also include two prediction approaches: single-step prediction (SSP) and multi-step prediction (MSP). WPP is tested against an auto-regressive moving average (ARMA) model in terms of predicted values and errors. The validity of the proposed hybrid method is confirmed by error analysis using the probability density function (PDF), mean absolute percent error (MAPE) and mean square error (MSE). Meanwhile, comparison of the correlation coefficients between the actual and predicted values for different prediction times and windows confirms that the MSP approach using the hybrid model is the most accurate compared with the SSP approach and ARMA. The MLR&LS is accurate and promising for solving problems in WPP.
Model assessment using a multi-metric ranking technique
NASA Astrophysics Data System (ADS)
Fitzpatrick, P. J.; Lau, Y.; Alaka, G.; Marks, F.
2017-12-01
Validation comparisons of multiple models present challenges when skill levels are similar, especially in regimes dominated by the climatological mean. Assessing skill separation will require advanced validation metrics and identifying adeptness in extreme events, while maintaining simplicity for management decisions. Flexibility for operations is also an asset. This work postulates a weighted tally and consolidation technique which ranks results by multiple types of metrics. Variables include absolute error, bias, acceptable absolute error percentages, outlier metrics, model efficiency, Pearson correlation, Kendall's Tau, reliability Index, multiplicative gross error, and root mean squared differences. Other metrics, such as root mean square difference and rank correlation were also explored, but removed when the information was discovered to be generally duplicative to other metrics. While equal weights are applied, weights could be altered depending on preferred metrics. Two examples are shown comparing ocean models' currents and tropical cyclone products, including experimental products. The importance of using magnitude and direction for tropical cyclone track forecasts instead of distance, along-track, and cross-track is discussed. Tropical cyclone intensity and structure prediction are also assessed. Vector correlations are not included in the ranking process, but were found useful in an independent context, and will be briefly reported.
Wu, Jibo
2016-01-01
In this article, a generalized difference-based ridge estimator is proposed for the vector parameter in a partial linear model when the errors are dependent. It is supposed that some additional linear constraints may hold to the whole parameter space. Its mean-squared error matrix is compared with the generalized restricted difference-based estimator. Finally, the performance of the new estimator is explained by a simulation study and a numerical example.
The microcomputer scientific software series 3: general linear model--analysis of variance.
Harold M. Rauscher
1985-01-01
A BASIC language set of programs, designed for use on microcomputers, is presented. This set of programs will perform the analysis of variance for any statistical model describing either balanced or unbalanced designs. The program computes and displays the degrees of freedom, Type I sum of squares, and the mean square for the overall model, the error, and each factor...
The GEOS Ozone Data Assimilation System: Specification of Error Statistics
NASA Technical Reports Server (NTRS)
Stajner, Ivanka; Riishojgaard, Lars Peter; Rood, Richard B.
2000-01-01
A global three-dimensional ozone data assimilation system has been developed at the Data Assimilation Office of the NASA/Goddard Space Flight Center. The Total Ozone Mapping Spectrometer (TOMS) total ozone and the Solar Backscatter Ultraviolet (SBUV) or (SBUV/2) partial ozone profile observations are assimilated. The assimilation, into an off-line ozone transport model, is done using the global Physical-space Statistical Analysis Scheme (PSAS). This system became operational in December 1999. A detailed description of the statistical analysis scheme, and in particular, the forecast and observation error covariance models is given. A new global anisotropic horizontal forecast error correlation model accounts for a varying distribution of observations with latitude. Correlations are largest in the zonal direction in the tropics where data is sparse. The forecast error variance model is proportional to the ozone field. The forecast error covariance parameters were determined by maximum likelihood estimation. The error covariance models are validated using chi-squared statistics. The analyzed ozone fields in the winter 1992 are validated against independent observations from ozone sondes and HALOE. There is better than 10% agreement between mean Halogen Occultation Experiment (HALOE) and analysis fields between 70 and 0.2 hPa. The global root-mean-square (RMS) difference between TOMS observed and forecast values is less than 4%. The global RMS difference between SBUV observed and analyzed ozone between 50 and 3 hPa is less than 15%.
Kong, W W; Zhang, C; Liu, F; Gong, A P; He, Y
2013-08-01
The objective of this study was to examine the possibility of applying visible and near-infrared spectroscopy to the quantitative detection of irradiation dose of irradiated milk powder. A total of 150 samples were used: 100 for the calibration set and 50 for the validation set. The samples were irradiated at 5 different dose levels in the dose range 0 to 6.0 kGy. Six different pretreatment methods were compared. The prediction results of full spectra given by linear and nonlinear calibration methods suggested that Savitzky-Golay smoothing and first derivative were suitable pretreatment methods in this study. Regression coefficient analysis was applied to select effective wavelengths (EW). Less than 10 EW were selected and they were useful for portable detection instrument or sensor development. Partial least squares, extreme learning machine, and least squares support vector machine were used. The best prediction performance was achieved by the EW-extreme learning machine model with first-derivative spectra, and correlation coefficients=0.97 and root mean square error of prediction=0.844. This study provided a new approach for the fast detection of irradiation dose of milk powder. The results could be helpful for quality detection and safety monitoring of milk powder. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Rapid detection of talcum powder in tea using FT-IR spectroscopy coupled with chemometrics
Li, Xiaoli; Zhang, Yuying; He, Yong
2016-01-01
This paper investigated the feasibility of Fourier transform infrared transmission (FT-IR) spectroscopy to detect talcum powder illegally added in tea based on chemometric methods. Firstly, 210 samples of tea powder with 13 dose levels of talcum powder were prepared for FT-IR spectra acquirement. In order to highlight the slight variations in FT-IR spectra, smoothing, normalize and standard normal variate (SNV) were employed to preprocess the raw spectra. Among them, SNV preprocessing had the best performance with high correlation of prediction (RP = 0.948) and low root mean square error of prediction (RMSEP = 0.108) of partial least squares (PLS) model. Then 18 characteristic wavenumbers were selected based on a hybrid of backward interval partial least squares (biPLS) regression, competitive adaptive reweighted sampling (CARS) algorithm and successive projections algorithm (SPA). These characteristic wavenumbers only accounted for 0.64% of the full wavenumbers. Following that, 18 characteristic wavenumbers were used to build linear and nonlinear determination models by PLS regression and extreme learning machine (ELM), respectively. The optimal model with RP = 0.963 and RMSEP = 0.137 was achieved by ELM algorithm. These results demonstrated that FT-IR spectroscopy with chemometrics could be used successfully to detect talcum powder in tea. PMID:27468701
Controlled sound field with a dual layer loudspeaker array
NASA Astrophysics Data System (ADS)
Shin, Mincheol; Fazi, Filippo M.; Nelson, Philip A.; Hirono, Fabio C.
2014-08-01
Controlled sound interference has been extensively investigated using a prototype dual layer loudspeaker array comprised of 16 loudspeakers. Results are presented for measures of array performance such as input signal power, directivity of sound radiation and accuracy of sound reproduction resulting from the application of conventional control methods such as minimization of error in mean squared pressure, maximization of energy difference and minimization of weighted pressure error and energy. Procedures for selecting the tuning parameters have also been introduced. With these conventional concepts aimed at the production of acoustically bright and dark zones, all the control methods used require a trade-off between radiation directivity and reproduction accuracy in the bright zone. An alternative solution is proposed which can achieve better performance based on the measures presented simultaneously by inserting a low priority zone named as the “gray” zone. This involves the weighted minimization of mean-squared errors in both bright and dark zones together with the gray zone in which the minimization error is given less importance. This results in the production of directional bright zone in which the accuracy of sound reproduction is maintained with less required input power. The results of simulations and experiments are shown to be in excellent agreement.
New Insights into Handling Missing Values in Environmental Epidemiological Studies
Roda, Célina; Nicolis, Ioannis; Momas, Isabelle; Guihenneuc, Chantal
2014-01-01
Missing data are unavoidable in environmental epidemiologic surveys. The aim of this study was to compare methods for handling large amounts of missing values: omission of missing values, single and multiple imputations (through linear regression or partial least squares regression), and a fully Bayesian approach. These methods were applied to the PARIS birth cohort, where indoor domestic pollutant measurements were performed in a random sample of babies' dwellings. A simulation study was conducted to assess performances of different approaches with a high proportion of missing values (from 50% to 95%). Different simulation scenarios were carried out, controlling the true value of the association (odds ratio of 1.0, 1.2, and 1.4), and varying the health outcome prevalence. When a large amount of data is missing, omitting these missing data reduced statistical power and inflated standard errors, which affected the significance of the association. Single imputation underestimated the variability, and considerably increased risk of type I error. All approaches were conservative, except the Bayesian joint model. In the case of a common health outcome, the fully Bayesian approach is the most efficient approach (low root mean square error, reasonable type I error, and high statistical power). Nevertheless for a less prevalent event, the type I error is increased and the statistical power is reduced. The estimated posterior distribution of the OR is useful to refine the conclusion. Among the methods handling missing values, no approach is absolutely the best but when usual approaches (e.g. single imputation) are not sufficient, joint modelling approach of missing process and health association is more efficient when large amounts of data are missing. PMID:25226278
NASA Astrophysics Data System (ADS)
Gao, Jing; Burt, James E.
2017-12-01
This study investigates the usefulness of a per-pixel bias-variance error decomposition (BVD) for understanding and improving spatially-explicit data-driven models of continuous variables in environmental remote sensing (ERS). BVD is a model evaluation method that originated in machine learning and has not been examined for ERS applications. Demonstrated with a showcase regression tree model mapping land imperviousness (0-100%) using Landsat images, our results showed that BVD can reveal sources of estimation errors, map how these sources vary across space, reveal the effects of various model characteristics on estimation accuracy, and enable in-depth comparison of different error metrics. Specifically, BVD bias maps can help analysts identify and delineate model spatial non-stationarity; BVD variance maps can indicate potential effects of ensemble methods (e.g. bagging), and inform efficient training sample allocation - training samples should capture the full complexity of the modeled process, and more samples should be allocated to regions with more complex underlying processes rather than regions covering larger areas. Through examining the relationships between model characteristics and their effects on estimation accuracy revealed by BVD for both absolute and squared errors (i.e. error is the absolute or the squared value of the difference between observation and estimate), we found that the two error metrics embody different diagnostic emphases, can lead to different conclusions about the same model, and may suggest different solutions for performance improvement. We emphasize BVD's strength in revealing the connection between model characteristics and estimation accuracy, as understanding this relationship empowers analysts to effectively steer performance through model adjustments.
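For the squared-error case, the per-pixel decomposition described in this abstract can be illustrated with the textbook identity: mean squared error equals squared bias plus variance. The sketch below is an assumption-laden illustration rather than the authors' procedure. It treats the observation as the true value (no separate noise term), the function name `bias_variance_squared` is hypothetical, and the decomposition the paper uses for absolute error is not reproduced here.

```python
def bias_variance_squared(predictions, observed):
    """Decompose squared error at one pixel.

    predictions: estimates for the same pixel from models trained on
                 different resamples of the training data (assumed setup).
    observed:    the reference value for that pixel, treated as the truth.
    Returns (bias_squared, variance, mean_squared_error).
    """
    n = len(predictions)
    if n == 0:
        raise ValueError("need at least one prediction")
    mean_pred = sum(predictions) / n
    # Systematic offset of the average model from the observation
    bias_sq = (mean_pred - observed) ** 2
    # Spread of the individual models around their own average
    variance = sum((p - mean_pred) ** 2 for p in predictions) / n
    mse = sum((p - observed) ** 2 for p in predictions) / n
    return bias_sq, variance, mse


# Illustrative imperviousness estimates (percent) for one pixel
bias_sq, variance, mse = bias_variance_squared([40.0, 50.0, 60.0], 45.0)
```

Repeating this calculation at every pixel yields the bias and variance maps the abstract refers to. Large variance suggests that ensemble methods such as bagging may help, while large bias points at model structure or non-stationarity.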
NASA Technical Reports Server (NTRS)
Canavos, G. C.
1974-01-01
A study is made of the extent to which the size of the sample affects the accuracy of a quadratic or a cubic polynomial approximation of an experimentally observed quantity, and the trend with regard to improvement in the accuracy of the approximation as a function of sample size is established. The task is made possible through a simulated analysis carried out by the Monte Carlo method in which data are simulated by using several transcendental or algebraic functions as models. Contaminated data of varying amounts are fitted to either quadratic or cubic polynomials, and the behavior of the mean-squared error of the residual variance is determined as a function of sample size. Results indicate that the effect of the size of the sample is significant only for relatively small sizes and diminishes drastically for moderate and large amounts of experimental data.
Soriano, Vincent V; Tesoro, Eljim P; Kane, Sean P
2017-08-01
The Winter-Tozer (WT) equation has been shown to reliably predict free phenytoin levels in healthy patients. In patients with end-stage renal disease (ESRD), phenytoin-albumin binding is altered and, thus, affects interpretation of total serum levels. Although an ESRD WT equation was historically proposed for this population, there is a lack of data evaluating its accuracy. The objective of this study was to determine the accuracy of the ESRD WT equation in predicting free serum phenytoin concentration in patients with ESRD on hemodialysis (HD). A retrospective analysis of adult patients with ESRD on HD and concurrent free and total phenytoin concentrations was conducted. Each patient's true free phenytoin concentration was compared with a calculated value using the ESRD WT equation and a revised version of the ESRD WT equation. A total of 21 patients were included for analysis. The ESRD WT equation produced a percentage error of 75% and a root mean square error of 1.76 µg/mL. Additionally, 67% of the samples had an error >50% when using the ESRD WT equation. A revised equation was found to have high predictive accuracy, with only 5% of the samples demonstrating >50% error. The ESRD WT equation was not accurate in predicting free phenytoin concentration in patients with ESRD on HD. A revised ESRD WT equation was found to be significantly more accurate. Given the small study sample, further studies are required to fully evaluate the clinical utility of the revised ESRD WT equation.
Mishra, Vishal
2015-01-01
The interchange of the protons with the cell wall-bound calcium and magnesium ions at the interface of solution/bacterial cell surface in the biosorption system at various concentrations of protons has been studied in the present work. A mathematical model for establishing the correlation between concentration of protons and active sites was developed and optimized. The sporadic limited residence time reactor was used to titrate the calcium and magnesium ions at the individual data point. The accuracy of the proposed mathematical model was estimated using error functions such as nonlinear regression, adjusted nonlinear regression coefficient, the chi-square test, P-test and F-test. The values of the chi-square test (0.042-0.017), P-test (<0.001-0.04), sum of square errors (0.061-0.016), root mean square error (0.01-0.04) and F-test (2.22-19.92) reported in the present research indicated the suitability of the model over a wide range of proton concentrations. The zeta potential of the bacterium surface at various concentrations of protons was observed to validate the denaturation of active sites.
Modeling error analysis of stationary linear discrete-time filters
NASA Technical Reports Server (NTRS)
Patel, R.; Toda, M.
1977-01-01
The performance of Kalman-type, linear, discrete-time filters in the presence of modeling errors is considered. The discussion is limited to stationary performance, and bounds are obtained for the performance index, the mean-squared error of estimates for suboptimal and optimal (Kalman) filters. The computation of these bounds requires information on only the model matrices and the range of errors for these matrices. Consequently, a designer can easily compare the performance of a suboptimal filter with that of the optimal filter, when only the range of errors in the elements of the model matrices is available.
Sloat, J.V.; Gain, W.S.
1995-01-01
Index-velocity data collected with acoustic velocity meters, stage data, and cross-sectional area data were used to calculate discharge at three low-velocity, tidal streamflow stations in north-east Florida. Discharge at three streamflow stations was computed as the product of the channel cross-sectional area and the mean velocity as determined from an index velocity measured in the stream using an acoustic velocity meter. The tidal streamflow stations used in the study were: Six Mile Creek near Picolata, Fla.; Dunns Creek near Satsuma, Fla.; and the St. Johns River at Buffalo Bluff. Cross-sectional areas at the measurement sections ranged from about 3,000 square feet at Six Mile Creek to about 18,500 square feet at St. Johns River at Buffalo Bluff. Physical characteristics for all three streams were similar except for drainage area. The topography primarily is low-relief, swampy terrain; stream velocities ranged from about -2 to 2 feet per second; and the average change in stage was about 1 foot. Instantaneous discharge was measured using a portable acoustic current meter at each of the three streams to develop a relation between the mean velocity in the stream and the index velocity measured by the acoustic velocity meter. Using least-squares linear regression, a simple linear relation between mean velocity and index velocity was determined. Index velocity was the only significant linear predictor of mean velocity for Six Mile Creek and St. Johns River at Buffalo Bluff. For Dunns Creek, both index velocity and stage were used to develop a multiple-linear predictor of mean velocity. Stage-area curves for each stream were developed from bathymetric data. Instantaneous discharge was computed by multiplying results of relations developed for cross-sectional area and mean velocity.
Principal sources of error in the estimated discharge are identified as: (1) instrument errors associated with measurement of stage and index velocity, (2) errors in the representation of mean daily stage and index velocity due to natural variability over time and space, and (3) errors in cross-sectional area and mean-velocity ratings based on stage and index velocity. Standard errors for instantaneous discharge for the median cross-sectional area for Six Mile Creek, Dunns Creek, and St. Johns River at Buffalo Bluff were 94, 360, and 1,980 cubic feet per second, respectively. Standard errors for mean daily discharge for the median cross-sectional area for Six Mile Creek, Dunns Creek, and St. Johns River at Buffalo Bluff were 25, 65, and 455 cubic feet per second, respectively. Mean daily discharge at the three sites ranged from about -500 to 1,500 cubic feet per second at Six Mile Creek and Dunns Creek and from about -500 to 15,000 cubic feet per second on the St. Johns River at Buffalo Bluff. For periods of high discharge, the AVM index-velocity method tended to produce estimates accurate to within 2 to 6 percent. For periods of moderate discharge, errors in discharge may increase to more than 50 percent. At low flows, errors as a percentage of discharge increase toward infinity.
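The rating procedure described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the calibration pairs are hypothetical, and only the single-predictor case (index velocity alone) is shown. The last function illustrates why a fixed standard error becomes an unbounded percentage error as discharge approaches zero.

```python
import numpy as np

def fit_index_rating(index_velocity, mean_velocity):
    """Least-squares line: mean_velocity ~ slope * index_velocity + intercept."""
    slope, intercept = np.polyfit(index_velocity, mean_velocity, deg=1)
    return slope, intercept

def discharge(area_sqft, index_velocity, slope, intercept):
    """Discharge (ft^3/s) = cross-sectional area (ft^2) * rated mean velocity (ft/s)."""
    return area_sqft * (slope * index_velocity + intercept)

def percent_error(standard_error_cfs, discharge_cfs):
    """Standard error as a percentage of discharge; grows without bound near zero flow."""
    return 100.0 * standard_error_cfs / abs(discharge_cfs)

# Hypothetical calibration pairs in ft/s (assumed values, not data from the study).
vi = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
vm = 0.9 * vi + 0.05
slope, intercept = fit_index_rating(vi, vm)

# Area of 3,000 ft^2 is the approximate Six Mile Creek figure quoted above.
q = discharge(3000.0, 1.0, slope, intercept)
```

With a standard error of 25 cubic feet per second, `percent_error(25.0, 1500.0)` is under 2 percent, while `percent_error(25.0, 50.0)` is already 50 percent.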
Quality assessment of MEG-to-MRI coregistrations
NASA Astrophysics Data System (ADS)
Sonntag, Hermann; Haueisen, Jens; Maess, Burkhard
2018-04-01
For high precision in source reconstruction of magnetoencephalography (MEG) or electroencephalography data, high accuracy of the coregistration of sources and sensors is mandatory. Usually, the source space is derived from magnetic resonance imaging (MRI). In most cases, however, no quality assessment is reported for sensor-to-MRI coregistrations. If any, typically root mean squares (RMS) of point residuals are provided. It has been shown, however, that RMS of residuals do not correlate with coregistration errors. We suggest using target registration error (TRE) as criterion for the quality of sensor-to-MRI coregistrations. TRE measures the effect of uncertainty in coregistrations at all points of interest. In total, 5544 data sets with sensor-to-head and 128 head-to-MRI coregistrations, from a single MEG laboratory, were analyzed. An adaptive Metropolis algorithm was used to estimate the optimal coregistration and to sample the coregistration parameters (rotation and translation). We found an average TRE between 1.3 and 2.3 mm at the head surface. Further, we observed a mean absolute difference in coregistration parameters between the Metropolis and iterative closest point algorithm of (1.9 +/- 15)° and (1.1 +/- 9) m. A paired sample t-test indicated a significant improvement in goal function minimization by using the Metropolis algorithm. The sampled parameters allowed computation of TRE on the entire grid of the MRI volume. Hence, we recommend the Metropolis algorithm for head-to-MRI coregistrations.
Liu, Xiaofeng Steven
2011-05-01
The use of covariates is commonly believed to reduce the unexplained error variance and the standard error for the comparison of treatment means, but the reduction in the standard error is neither guaranteed nor uniform over different sample sizes. The covariate mean differences between the treatment conditions can inflate the standard error of the covariate-adjusted mean difference and can actually produce a larger standard error for the adjusted mean difference than that for the unadjusted mean difference. When the covariate observations are conceived of as randomly varying from one study to another, the covariate mean differences can be related to a Hotelling's T(2) . Using this Hotelling's T(2) statistic, one can always find a minimum sample size to achieve a high probability of reducing the standard error and confidence interval width for the adjusted mean difference. ©2010 The British Psychological Society.
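The mechanism in this abstract can be made explicit with the textbook two-group, single-covariate case. The following is a sketch under that assumption, not the article's general derivation. The symbols are introduced here for illustration: $\sigma_y^2$ and $\sigma_{y\mid x}^2$ are the unadjusted and covariate-adjusted error variances, $SS_x$ is the pooled within-group sum of squares of the covariate, and $s_x^2 = SS_x/(n_1+n_2-2)$.

```latex
\begin{align}
\mathrm{SE}^2_{\text{unadj}} &= \sigma_y^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right),\\
\mathrm{SE}^2_{\text{adj}} &= \sigma_{y\mid x}^2\left(\frac{1}{n_1}+\frac{1}{n_2}
   +\frac{(\bar{x}_1-\bar{x}_2)^2}{SS_x}\right)
 = \sigma_{y\mid x}^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)
   \left(1+\frac{T^2}{n_1+n_2-2}\right),\\
T^2 &= \frac{(\bar{x}_1-\bar{x}_2)^2}{s_x^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}.
\end{align}
```

The adjusted standard error is therefore smaller than the unadjusted one only when the variance reduction $\sigma_{y\mid x}^2/\sigma_y^2$ outweighs the inflation factor $1 + T^2/(n_1+n_2-2)$, which shrinks as the sample size grows.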
Quentin, A G; Rodemann, T; Doutreleau, M-F; Moreau, M; Davies, N W; Millard, Peter
2017-01-31
Near-infrared reflectance spectroscopy (NIRS) is frequently used for the assessment of key nutrients of forage or crops but remains underused in ecological and physiological studies, especially to quantify non-structural carbohydrates. The aim of this study was to develop calibration models to assess the content in soluble sugars (fructose, glucose, sucrose) and starch in foliar material of Eucalyptus globulus. A partial least squares (PLS) regression was used on the sample spectral data and was compared to the contents measured using standard wet chemistry methods. The calibration models were validated using a completely independent set of samples. We used key indicators such as the ratio of prediction to deviation (RPD) and the range error ratio to give an assessment of the performance of the calibration models. Accurate calibration models were obtained for fructose and sucrose content (R2 > 0.85, root mean square error of prediction (RMSEP) of 0.95%–1.26% in the validation models), followed by sucrose and total soluble sugar content (R2 ~ 0.70 and RMSEP > 2.3%). In comparison to the others, calibration of the starch model performed very poorly with RPD = 1.70. This study establishes the ability of the NIRS calibration model to infer soluble sugar content in foliar samples of E. globulus in a rapid and cost-effective way. We suggest a complete redevelopment of the starch analysis using more specific quantification such as an HPLC-based technique to reach higher performance in the starch model. Overall, NIRS could serve as a high-throughput phenotyping tool to study plant response to stress factors.
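The two headline indicators in this abstract are simple to compute. The sketch below assumes the definition commonly used in NIRS work, RPD = standard deviation of the reference values divided by RMSEP; the abstract itself does not state the formula, and the numbers are made up for illustration.

```python
import numpy as np

def rmsep(y_ref, y_pred):
    """Root mean square error of prediction on a validation set."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))

def rpd(y_ref, y_pred):
    """Assumed definition: sample SD of reference values divided by RMSEP."""
    return float(np.std(np.asarray(y_ref, float), ddof=1) / rmsep(y_ref, y_pred))

# Hypothetical wet-chemistry reference values and NIRS predictions (% dry mass).
reference = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = reference + np.array([0.5, -0.5, 0.5, -0.5, 0.5])

err = rmsep(reference, predicted)
ratio = rpd(reference, predicted)
```

Under this definition, a larger RPD means the prediction error is small relative to the natural spread of the samples.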
Ebrahimi-Najafabadi, Heshmatollah; Leardi, Riccardo; Oliveri, Paolo; Casolino, Maria Chiara; Jalali-Heravi, Mehdi; Lanteri, Silvia
2012-09-15
The current study presents an application of near infrared spectroscopy for identification and quantification of the fraudulent addition of barley in roasted and ground coffee samples. Nine different types of coffee including pure Arabica, Robusta and mixtures of them at different roasting degrees were blended with four types of barley. The blending degrees were between 2 and 20 wt% of barley. D-optimal design was applied to select 100 and 30 experiments to be used as calibration and test set, respectively. Partial least squares regression (PLS) was employed to build the models aimed at predicting the amounts of barley in coffee samples. In order to obtain simplified models, taking into account only informative regions of the spectral profiles, a genetic algorithm (GA) was applied. A completely independent external set was also used to test the model performances. The models showed excellent predictive ability with root mean square errors (RMSE) for the test and external set equal to 1.4% w/w and 0.8% w/w, respectively. Copyright © 2012 Elsevier B.V. All rights reserved.
Luo, Yu; Li, Wen-Long; Huang, Wen-Hua; Liu, Xue-Hua; Song, Yan-Gang; Qu, Hai-Bin
2017-05-01
A near infrared spectroscopy (NIRS) approach was established for quality control of the alcohol precipitation liquid in the manufacture of Codonopsis Radix. By applying NIRS with multivariate analysis, it was possible to build variation into the calibration sample set, and the Plackett-Burman design, Box-Behnken design, and a concentrating-diluting method were used to obtain the sample set covered with sufficient fluctuation of process parameters and extended concentration information. NIR data were calibrated to predict the four quality indicators using partial least squares regression (PLSR). In the four calibration models, the root mean squares errors of prediction (RMSEPs) were 1.22 μg/ml, 10.5 μg/ml, 1.43 μg/ml, and 0.433% for lobetyolin, total flavonoids, pigments, and total solid contents, respectively. The results indicated that multi-components quantification of the alcohol precipitation liquid of Codonopsis Radix could be achieved with an NIRS-based method, which offers a useful tool for real-time release testing (RTRT) of intermediates in the manufacture of Codonopsis Radix.
A robust nonparametric framework for reconstruction of stochastic differential equation models
NASA Astrophysics Data System (ADS)
Rajabzadeh, Yalda; Rezaie, Amir Hossein; Amindavar, Hamidreza
2016-05-01
In this paper, we employ a nonparametric framework to robustly estimate the functional forms of drift and diffusion terms from discrete stationary time series. The proposed method significantly improves the accuracy of the parameter estimation. In this framework, drift and diffusion coefficients are modeled through orthogonal Legendre polynomials. We employ the least squares regression approach along with the Euler-Maruyama approximation method to learn coefficients of stochastic model. Next, a numerical discrete construction of mean squared prediction error (MSPE) is established to calculate the order of Legendre polynomials in drift and diffusion terms. We show numerically that the new method is robust against the variation in sample size and sampling rate. The performance of our method in comparison with the kernel-based regression (KBR) method is demonstrated through simulation and real data. In case of real dataset, we test our method for discriminating healthy electroencephalogram (EEG) signals from epilepsy ones. We also demonstrate the efficiency of the method through prediction in the financial data. In both simulation and real data, our algorithm outperforms the KBR method.
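The drift-estimation step described above can be illustrated on a simulated Ornstein-Uhlenbeck process. This is a simplified sketch, not the authors' algorithm: the polynomial degree is fixed by hand rather than chosen by MSPE, only the drift term is estimated, and all parameter values are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

# Simulate dX = -theta * X dt + sigma dW with the Euler-Maruyama scheme.
rng = np.random.default_rng(0)
theta, sigma, dt, n = 1.0, 0.5, 0.01, 200_000
x = np.empty(n)
x[0] = 0.0
noise = rng.standard_normal(n - 1) * np.sqrt(dt)
for k in range(n - 1):
    x[k + 1] = x[k] - theta * x[k] * dt + sigma * noise[k]

# Legendre polynomials are orthogonal on [-1, 1], so rescale the state first.
lo, hi = x.min(), x.max()

def to_unit(v):
    return 2.0 * (np.asarray(v, dtype=float) - lo) / (hi - lo) - 1.0

# Euler-Maruyama implies E[(X_{k+1} - X_k) / dt | X_k] ~ drift(X_k),
# so regress the scaled increments on a Legendre basis by least squares.
degree = 3  # fixed here; the paper selects the order via MSPE
basis = legendre.legvander(to_unit(x[:-1]), degree)
target = np.diff(x) / dt
coef, *_ = np.linalg.lstsq(basis, target, rcond=None)

def drift_hat(v):
    """Estimated drift at state v (original scale)."""
    return legendre.legval(to_unit(v), coef)
```

For this process the true drift is `-theta * x`, so `drift_hat(0.3)` should come out near -0.3.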
Koch, Cosima; Posch, Andreas E; Goicoechea, Héctor C; Herwig, Christoph; Lendl, Bernhard
2014-01-07
This paper presents the quantification of Penicillin V and phenoxyacetic acid, a precursor, inline during Penicillium chrysogenum fermentations by FTIR spectroscopy and partial least squares (PLS) regression and multivariate curve resolution - alternating least squares (MCR-ALS). First, the applicability of an attenuated total reflection FTIR fiber optic probe was assessed offline by measuring standards of the analytes of interest and investigating matrix effects of the fermentation broth. Then measurements were performed inline during four fed-batch fermentations with online HPLC for the determination of Penicillin V and phenoxyacetic acid as reference analysis. PLS and MCR-ALS models were built using these data and validated by comparison of single analyte spectra with the selectivity ratio of the PLS models and the extracted spectral traces of the MCR-ALS models, respectively. The achieved root mean square errors of cross-validation for the PLS regressions were 0.22 g L(-1) for Penicillin V and 0.32 g L(-1) for phenoxyacetic acid and the root mean square errors of prediction for MCR-ALS were 0.23 g L(-1) for Penicillin V and 0.15 g L(-1) for phenoxyacetic acid. A general work-flow for building and assessing chemometric regression models for the quantification of multiple analytes in bioprocesses by FTIR spectroscopy is given. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.
2011-01-01
A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…
A fast least-squares algorithm for population inference
2013-01-01
Background Population inference is an important problem in genetics used to remove population stratification in genome-wide association studies and to detect migration patterns or shared ancestry. An individual’s genotype can be modeled as a probabilistic function of ancestral population memberships, Q, and the allele frequencies in those populations, P. The parameters, P and Q, of this binomial likelihood model can be inferred using slow sampling methods such as Markov Chain Monte Carlo methods or faster gradient based approaches such as sequential quadratic programming. This paper proposes a least-squares simplification of the binomial likelihood model motivated by a Euclidean interpretation of the genotype feature space. This results in a faster algorithm that easily incorporates the degree of admixture within the sample of individuals and improves estimates without requiring trial-and-error tuning. Results We show that the expected value of the least-squares solution across all possible genotype datasets is equal to the true solution when part of the problem has been solved, and that the variance of the solution approaches zero as its size increases. The Least-squares algorithm performs nearly as well as Admixture for these theoretical scenarios. We compare least-squares, Admixture, and FRAPPE for a variety of problem sizes and difficulties. For particularly hard problems with a large number of populations, small number of samples, or greater degree of admixture, least-squares performs better than the other methods. On simulated mixtures of real population allele frequencies from the HapMap project, Admixture estimates sparsely mixed individuals better than Least-squares. The least-squares approach, however, performs within 1.5% of the Admixture error. On individual genotypes from the HapMap project, Admixture and least-squares perform qualitatively similarly and within 1.2% of each other. 
Significantly, the least-squares approach nearly always converges 1.5- to 6-times faster. Conclusions The computational advantage of the least-squares approach along with its good estimation performance warrants further research, especially for very large datasets. As problem sizes increase, the difference in estimation performance between all algorithms decreases. In addition, when prior information is known, the least-squares approach easily incorporates the expected degree of admixture to improve the estimate. PMID:23343408
A fast least-squares algorithm for population inference.
Parry, R Mitchell; Wang, May D
2013-01-23
Population inference is an important problem in genetics used to remove population stratification in genome-wide association studies and to detect migration patterns or shared ancestry. An individual's genotype can be modeled as a probabilistic function of ancestral population memberships, Q, and the allele frequencies in those populations, P. The parameters, P and Q, of this binomial likelihood model can be inferred using slow sampling methods such as Markov Chain Monte Carlo methods or faster gradient based approaches such as sequential quadratic programming. This paper proposes a least-squares simplification of the binomial likelihood model motivated by a Euclidean interpretation of the genotype feature space. This results in a faster algorithm that easily incorporates the degree of admixture within the sample of individuals and improves estimates without requiring trial-and-error tuning. We show that the expected value of the least-squares solution across all possible genotype datasets is equal to the true solution when part of the problem has been solved, and that the variance of the solution approaches zero as its size increases. The Least-squares algorithm performs nearly as well as Admixture for these theoretical scenarios. We compare least-squares, Admixture, and FRAPPE for a variety of problem sizes and difficulties. For particularly hard problems with a large number of populations, small number of samples, or greater degree of admixture, least-squares performs better than the other methods. On simulated mixtures of real population allele frequencies from the HapMap project, Admixture estimates sparsely mixed individuals better than Least-squares. The least-squares approach, however, performs within 1.5% of the Admixture error. On individual genotypes from the HapMap project, Admixture and least-squares perform qualitatively similarly and within 1.2% of each other. Significantly, the least-squares approach nearly always converges 1.5- to 6-times faster. 
The computational advantage of the least-squares approach along with its good estimation performance warrants further research, especially for very large datasets. As problem sizes increase, the difference in estimation performance between all algorithms decreases. In addition, when prior information is known, the least-squares approach easily incorporates the expected degree of admixture to improve the estimate.
Le, Quang A; Doctor, Jason N
2011-05-01
As quality-adjusted life years have become the standard metric in health economic evaluations, mapping health-profile or disease-specific measures onto preference-based measures to obtain quality-adjusted life years has become a solution when health utilities are not directly available. However, current mapping methods are limited due to their predictive validity, reliability, and/or other methodological issues. We employ probability theory together with a graphical model, called a Bayesian network, to convert health-profile measures into preference-based measures and to compare the results to those estimated with current mapping methods. A sample of 19,678 adults who completed both the 12-item Short Form Health Survey (SF-12v2) and EuroQoL 5D (EQ-5D) questionnaires from the 2003 Medical Expenditure Panel Survey was split into training and validation sets. Bayesian networks were constructed to explore the probabilistic relationships between each EQ-5D domain and 12 items of the SF-12v2. The EQ-5D utility scores were estimated on the basis of the predicted probability of each response level of the 5 EQ-5D domains obtained from the Bayesian inference process using the following methods: Monte Carlo simulation, expected utility, and most-likely probability. Results were then compared with current mapping methods including multinomial logistic regression, ordinary least squares, and censored least absolute deviations. The Bayesian networks consistently outperformed other mapping models in the overall sample (mean absolute error=0.077, mean square error=0.013, and R overall=0.802), in different age groups, number of chronic conditions, and ranges of the EQ-5D index. Bayesian networks provide a new robust and natural approach to map health status responses into health utility measures for health economic evaluations.
Lamadrid-Figueroa, Héctor; Téllez-Rojo, Martha M; Angeles, Gustavo; Hernández-Ávila, Mauricio; Hu, Howard
2011-01-01
In-vivo measurement of bone lead by means of K-X-ray fluorescence (KXRF) is the preferred biological marker of chronic exposure to lead. Unfortunately, considerable measurement error associated with KXRF estimations can introduce bias in estimates of the effect of bone lead when this variable is included as the exposure in a regression model. Estimates of uncertainty reported by the KXRF instrument reflect the variance of the measurement error and, although they can be used to correct the measurement error bias, they are seldom used in epidemiological statistical analyses. Errors-in-variables regression (EIV) allows for correction of bias caused by measurement error in predictor variables, based on the knowledge of the reliability of such variables. The authors propose a way to obtain reliability coefficients for bone lead measurements from uncertainty data reported by the KXRF instrument and compare, by the use of Monte Carlo simulations, results obtained using EIV regression models vs. those obtained by the standard procedures. Results of the simulations show that Ordinary Least Squares (OLS) regression models provide severely biased estimates of effect, and that EIV provides nearly unbiased estimates. Although EIV effect estimates are more imprecise, their mean squared error is much smaller than that of OLS estimates. In conclusion, EIV is a better alternative than OLS to estimate the effect of bone lead when measured by KXRF. Copyright © 2010 Elsevier Inc. All rights reserved.
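The attenuation effect and its correction can be shown with a small Monte Carlo run. This is a sketch for a single predictor only, using the classical method-of-moments correction (OLS slope divided by reliability). The reliability formula below, one minus the mean squared reported uncertainty over the observed variance, is an assumption made for illustration; the abstract does not give the authors' formula, and all values are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 20_000, 2.0

x_true = rng.normal(0.0, 2.0, n)       # true exposure, variance 4
u = np.full(n, 1.0)                    # hypothetical instrument-reported uncertainty (SD)
x_obs = x_true + rng.normal(0.0, u)    # measured exposure with error
y = beta * x_true + rng.normal(0.0, 1.0, n)

# Assumed reliability estimate: share of observed variance that is not measurement error.
reliability = 1.0 - np.mean(u ** 2) / np.var(x_obs, ddof=1)

# Naive OLS slope on the error-prone predictor is attenuated toward zero.
beta_ols = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# Method-of-moments errors-in-variables correction for a single predictor.
beta_eiv = beta_ols / reliability
```

Here the reliability is about 0.8, so the OLS slope lands near 1.6 while the corrected slope recovers a value close to the true 2.0.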
Li, Libo; Bentler, Peter M
2011-06-01
MacCallum, Browne, and Cai (2006) proposed a new framework for evaluation and power analysis of small differences between nested structural equation models (SEMs). In their framework, the null and alternative hypotheses for testing a small difference in fit and its related power analyses were defined by some chosen root-mean-square error of approximation (RMSEA) pairs. In this article, we develop a new method that quantifies those chosen RMSEA pairs and allows a quantitative comparison of them. Our method proposes the use of single RMSEA values to replace the choice of RMSEA pairs for model comparison and power analysis, thus avoiding the differential meaning of the chosen RMSEA pairs inherent in the approach of MacCallum et al. (2006). With this choice, the conventional cutoff values in model overall evaluation can directly be transferred and applied to the evaluation and power analysis of model differences. © 2011 American Psychological Association
Confidence intervals in Flow Forecasting by using artificial neural networks
NASA Astrophysics Data System (ADS)
Panagoulia, Dionysia; Tsekouras, George
2014-05-01
One of the major inadequacies in implementation of Artificial Neural Networks (ANNs) for flow forecasting is the development of confidence intervals, because the relevant estimation cannot be implemented directly, in contrast to the classical forecasting methods. The variation in the ANN output is a measure of uncertainty in the model predictions based on the training data set. Different methods for uncertainty analysis, such as bootstrap, Bayesian, Monte Carlo, have already been proposed for hydrologic and geophysical models, while methods for confidence intervals, such as error output, re-sampling, multi-linear regression adapted to ANN have been used for power load forecasting [1-2]. The aim of this paper is to present the re-sampling method for ANN prediction models and to develop this for flow forecasting of the next day. The re-sampling method is based on the ascending sorting of the errors between real and predicted values for all input vectors. The cumulative sample distribution function of the prediction errors is calculated and the confidence intervals are estimated by keeping the intermediate value, rejecting the extreme values according to the desired confidence levels, and holding the intervals symmetrical in probability. For application of the confidence intervals issue, input vectors are used from the Mesochora catchment in western-central Greece. The ANN's training algorithm is the stochastic training back-propagation process with decreasing functions of learning rate and momentum term, for which an optimization process is conducted regarding the crucial parameters values, such as the number of neurons, the kind of activation functions, the initial values and time parameters of learning rate and momentum term etc.
Input variables are historical data of previous days, such as flows, nonlinearly weather related temperatures and nonlinearly weather related rainfalls based on correlation analysis between the under prediction flow and each implicit input variable of different ANN structures [3]. The performance of each ANN structure is evaluated by the voting analysis based on eleven criteria, which are the root mean square error (RMSE), the correlation index (R), the mean absolute percentage error (MAPE), the mean percentage error (MPE), the mean error (ME), the percentage volume in errors (VE), the percentage error in peak (MF), the normalized mean bias error (NMBE), the normalized root mean square error (NRMSE), the Nash-Sutcliffe model efficiency coefficient (E) and the modified Nash-Sutcliffe model efficiency coefficient (E1). The next day flow for the test set is calculated using the best ANN structure's model. Consequently, the confidence intervals of various confidence levels for training, evaluation and test sets are compared in order to explore the generalisation dynamics of confidence intervals from training and evaluation sets. [1] H.S. Hippert, C.E. Pedreira, R.C. Souza, "Neural networks for short-term load forecasting: A review and evaluation," IEEE Trans. on Power Systems, vol. 16, no. 1, 2001, pp. 44-55. [2] G. J. Tsekouras, N.E. Mastorakis, F.D. Kanellos, V.T. Kontargyri, C.D. Tsirekis, I.S. Karanasiou, Ch.N. Elias, A.D. Salis, P.A. Kontaxis, A.A. Gialketsi: "Short term load forecasting in Greek interconnected power system using ANN: Confidence Interval using a novel re-sampling technique with corrective Factor", WSEAS International Conference on Circuits, Systems, Electronics, Control & Signal Processing, (CSECS '10), Vouliagmeni, Athens, Greece, December 29-31, 2010. [3] D. Panagoulia, I. Trichakis, G. J. 
Tsekouras: "Flow Forecasting via Artificial Neural Networks - A Study for Input Variables conditioned on atmospheric circulation", European Geosciences Union, General Assembly 2012 (NH1.1 / AS1.16 - Extreme meteorological and hydrological events induced by severe weather and climate change), Vienna, Austria, 22-27 April 2012.
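The re-sampling interval described in this abstract amounts to taking symmetric-in-probability quantiles of the empirical error distribution. The sketch below is one plausible reading of that description, not the authors' implementation: it omits any corrective factor, defines the error as real minus predicted, and uses made-up error values.

```python
import numpy as np

def error_interval(errors, confidence=0.95):
    """Central interval of the empirical error distribution.

    Errors are sorted in ascending order and the extreme tails are rejected,
    leaving equal probability (1 - confidence) / 2 on each side.
    """
    errors = np.sort(np.asarray(errors, dtype=float))
    alpha = 1.0 - confidence
    lower, upper = np.quantile(errors, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lower), float(upper)

def prediction_interval(predicted_flow, errors, confidence=0.95):
    """Shift a point forecast by the error quantiles (error = real - predicted)."""
    lower, upper = error_interval(errors, confidence)
    return predicted_flow + lower, predicted_flow + upper

# Hypothetical training-set errors in m^3/s (assumed values).
train_errors = np.linspace(-1.0, 1.0, 101)
low_e, high_e = error_interval(train_errors, confidence=0.90)
low_q, high_q = prediction_interval(10.0, train_errors, confidence=0.90)
```

Computing the interval separately from training, evaluation and test errors, as the abstract describes, shows how well intervals built on seen data carry over to unseen data.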
Yehia, Ali M; Mohamed, Heba M
2016-01-05
Three advanced chemometric-assisted spectrophotometric methods, namely Concentration Residuals Augmented Classical Least Squares (CRACLS), Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) and Principal Component Analysis-Artificial Neural Networks (PCA-ANN), were developed, validated and benchmarked against PLS calibration to resolve the severely overlapped spectra and simultaneously determine Paracetamol (PAR), Guaifenesin (GUA) and Phenylephrine (PHE) in their ternary mixture and in the presence of p-aminophenol (AP), the main degradation product and synthesis impurity of Paracetamol. The analytical performance of the proposed methods was described by percentage recoveries, root mean square error of calibration and standard error of prediction. The four multivariate calibration methods could be directly used without any preliminary separation step and were successfully applied for pharmaceutical formulation analysis, showing no excipients' interference. Copyright © 2015 Elsevier B.V. All rights reserved.
Rasch fit statistics and sample size considerations for polytomous data.
Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael
2008-05-29
Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.
Rasch fit statistics and sample size considerations for polytomous data
Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael
2008-01-01
Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722
About an adaptively weighted Kaplan-Meier estimate.
Plante, Jean-François
2009-09-01
The minimum averaged mean squared error nonparametric adaptive weights use data from m possibly different populations to infer about one population of interest. The definition of these weights is based on the properties of the empirical distribution function. We use the Kaplan-Meier estimate to let the weights accommodate right-censored data and use them to define the weighted Kaplan-Meier estimate. The proposed estimate is smoother than the usual Kaplan-Meier estimate and converges uniformly in probability to the target distribution. Simulations show that the performances of the weighted Kaplan-Meier estimate on finite samples exceed that of the usual Kaplan-Meier estimate. A case study is also presented.
As-built design specification for proportion estimate software subsystem
NASA Technical Reports Server (NTRS)
Obrien, S. (Principal Investigator)
1980-01-01
The Proportion Estimate Processor evaluates four estimation techniques in order to get an improved estimate of the proportion of a scene that is planted in a selected crop. The four techniques to be evaluated were provided by the techniques development section and are: (1) random sampling; (2) proportional allocation, relative count estimate; (3) proportional allocation, Bayesian estimate; and (4) sequential Bayesian allocation. The user is given two options for computation of the estimated mean square error. These are referred to as the cluster calculation option and the segment calculation option. The software for the Proportion Estimate Processor is operational on the IBM 3031 computer.
New equations improve NIR prediction of body fat among high school wrestlers.
Oppliger, R A; Clark, R R; Nielsen, D H
2000-09-01
Methodologic study to derive prediction equations for percent body fat (%BF). To develop valid regression equations using NIR to assess body composition among high school wrestlers. Clinicians need a portable, fast, and simple field method for assessing body composition among wrestlers. Near-infrared photospectrometry (NIR) meets these criteria, but its efficacy has been challenged. Subjects were 150 high school wrestlers from 2 Midwestern states with mean +/- SD age of 16.3 +/- 1.1 yrs, weight of 69.5 +/- 11.7 kg, and height of 174.4 +/- 7.0 cm. Relative body fatness (%BF) determined from hydrostatic weighing was the criterion measure, and NIR optical density (OD) measurements at multiple sites, plus height, weight, and body mass index (BMI) were the predictor variables. Four equations were developed with multiple R2s that varied from .530 to .693, root mean squared errors varied from 2.8% BF to 3.4% BF, and prediction errors varied from 2.9% BF to 3.1% BF. The best equation used OD measurements at the biceps, triceps, and thigh sites, BMI, and age. The root mean squared error and prediction error for all 4 equations were equal to or smaller than for a skinfold equation commonly used with wrestlers. The results substantiate the validity of NIR for predicting % BF among high school wrestlers. Cross-validation of these equations is warranted.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, B; Miften, M
2014-06-15
Purpose: Cone-beam CT (CBCT) projection images provide anatomical data in real-time over several respiratory cycles, forming a comprehensive picture of tumor movement. We developed a method using these projections to determine the trajectory and dose of highly mobile tumors during each fraction of treatment. Methods: CBCT images of a respiration phantom were acquired, where the trajectory mimicked a lung tumor with high amplitude (2.4 cm) and hysteresis. A template-matching algorithm was used to identify the location of a steel BB in each projection. A Gaussian probability density function for tumor position was calculated which best fit the observed trajectory of the BB in the imager geometry. Two methods to improve the accuracy of tumor track reconstruction were investigated: first, using respiratory phase information to refine the trajectory estimation, and second, using the Monte Carlo method to sample the estimated Gaussian tumor position distribution. 15 clinically-drawn abdominal/lung CTV volumes were used to evaluate the accuracy of the proposed methods by comparing the known and calculated BB trajectories. Results: With all methods, the mean position of the BB was determined with accuracy better than 0.1 mm, and root-mean-square (RMS) trajectory errors were lower than 5% of marker amplitude. Use of respiratory phase information decreased RMS errors by 30%, and decreased the fraction of large errors (>3 mm) by half. Mean dose to the clinical volumes was calculated with an average error of 0.1% and average absolute error of 0.3%. Dosimetric parameters D90/D95 were determined within 0.5% of maximum dose. Monte-Carlo sampling increased RMS trajectory and dosimetric errors slightly, but prevented over-estimation of dose in trajectories with high noise. Conclusions: Tumor trajectory and dose-of-the-day were accurately calculated using CBCT projections.
This technique provides a widely-available method to evaluate highly-mobile tumors, and could facilitate better strategies to mitigate or compensate for motion during SBRT.
Liu, Tingting; Zhang, Ling; Wang, Shutao; Cui, Yaoyao; Wang, Yutian; Liu, Lingfei; Yang, Zhe
2018-03-15
Qualitative and quantitative analysis of polycyclic aromatic hydrocarbons (PAHs) was carried out by three-dimensional fluorescence spectroscopy combined with Alternating Weighted Residue Constraint Quadrilinear Decomposition (AWRCQLD). The experimental subjects were acenaphthene (ANA) and naphthalene (NAP). First, to reduce the redundant information in the three-dimensional fluorescence spectral data, the wavelet transform was used to compress the data during preprocessing. Then, four-dimensional data were constructed from the excitation-emission fluorescence spectra of PAHs at different concentrations. The sample data were obtained in three solvents: methanol, ethanol and ultrapure water. The four-dimensional spectral data were analyzed by AWRCQLD, and the recovery rates of PAHs in the three solvents were obtained and compared. On one hand, the results showed that PAHs can be measured more accurately with the higher-order data, and the recovery rate was higher. On the other hand, the results showed that AWRCQLD better reflects the advantage of the four-dimensional algorithm over second-order calibration and other third-order calibration algorithms. The recovery rate of ANA was 96.5%~103.3% and the root mean square error of prediction was 0.04 μg L⁻¹. The recovery rate of NAP was 96.7%~115.7% and the root mean square error of prediction was 0.06 μg L⁻¹. Copyright © 2017 Elsevier B.V. All rights reserved.
Zhao, Guo; Wang, Hui; Liu, Gang
2017-07-03
In this study, a novel method based on a Bi/glassy carbon electrode (Bi/GCE) for quantitatively and directly detecting Cd²⁺ in the presence of Cu²⁺ without further electrode modifications by combining square-wave anodic stripping voltammetry (SWASV) and a back-propagation artificial neural network (BP-ANN) has been proposed. The influence of the Cu²⁺ concentration on the stripping response to Cd²⁺ was studied. In addition, the effect of the ferrocyanide concentration on the SWASV detection of Cd²⁺ in the presence of Cu²⁺ was investigated. A BP-ANN with two inputs and one output was used to establish the nonlinear relationship between the concentration of Cd²⁺ and the stripping peak currents of Cu²⁺ and Cd²⁺. The factors affecting the SWASV detection of Cd²⁺ and the key parameters of the BP-ANN were optimized. Moreover, the direct calibration model (i.e., adding 0.1 mM ferrocyanide before detection), the BP-ANN model and other prediction models were compared to verify the prediction performance of these models in terms of their mean absolute errors (MAEs), root mean square errors (RMSEs) and correlation coefficients. The BP-ANN model exhibited higher prediction accuracy than the direct calibration model and the other prediction models. Finally, the proposed method was used to detect Cd²⁺ in soil samples with satisfactory results.
Biomass energy inventory and mapping system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kasile, J.D.
1993-12-31
A four-stage biomass energy inventory and mapping system was conducted for the entire State of Ohio. The product is a set of maps and an inventory of the State of Ohio. The set of maps and the inventory of the State's energy biomass resource are on a one-kilometer grid square basis on the Universal Transverse Mercator (UTM) system. Each square kilometer is identified and mapped showing total British Thermal Unit (BTU) energy availability. Land cover percentages and BTU values are provided for each of nine biomass strata types for each one-kilometer grid square. LANDSAT satellite data were used as the primary stratifier. The second-stage sampling was the photointerpretation of randomly selected one-kilometer grid squares that exactly corresponded to the LANDSAT one-kilometer grid square classification orientation. Field sampling comprised the third stage of the energy biomass inventory system and was combined with the fourth-stage sample of laboratory biomass energy analysis using a bomb calorimeter; the combined results were then used to assign BTU values to the photointerpretation and to adjust the LANDSAT classification. The sampling error for the whole system was 3.91%.
Yingying, Zhang; Jiancheng, Lai; Cheng, Yin; Zhenhua, Li
2009-03-01
The dependence of the surface plasmon resonance (SPR) phase difference curve on the complex refractive index of a sample in Kretschmann configuration is discussed comprehensively, based on which a new method is proposed to measure the complex refractive index of turbid liquid. A corresponding experiment setup was constructed to measure the SPR phase difference curve, and the complex refractive index of turbid liquid was determined. By using the setup, the complex refractive indices of Intralipid solutions with concentrations of 5%, 10%, 15%, and 20% are obtained to be 1.3377 + 0.0005i, 1.3427 + 0.0028i, 1.3476 + 0.0034i, and 1.3496 + 0.0038i, respectively. Furthermore, the error analysis indicates that the root-mean-square errors of both the real and the imaginary parts of the measured complex refractive index are less than 5 × 10⁻⁵.
Wang, Lu; Xu, Lisheng; Feng, Shuting; Meng, Max Q-H; Wang, Kuanquan
2013-11-01
Analysis of pulse waveform is a low cost, non-invasive method for obtaining vital information related to the conditions of the cardiovascular system. In recent years, different Pulse Decomposition Analysis (PDA) methods have been applied to disclose the pathological mechanisms of the pulse waveform. All these methods decompose single-period pulse waveform into a constant number (such as 3, 4 or 5) of individual waves. Furthermore, those methods do not pay much attention to the estimation error of the key points in the pulse waveform. The estimation of human vascular conditions depends on the key points' positions of pulse wave. In this paper, we propose a Multi-Gaussian (MG) model to fit real pulse waveforms using an adaptive number (4 or 5 in our study) of Gaussian waves. The unknown parameters in the MG model are estimated by the Weighted Least Squares (WLS) method and the optimized weight values corresponding to different sampling points are selected by using the Multi-Criteria Decision Making (MCDM) method. Performance of the MG model and the WLS method has been evaluated by fitting 150 real pulse waveforms of five different types. The resulting Normalized Root Mean Square Error (NRMSE) was less than 2.0% and the estimation accuracy for the key points was satisfactory, demonstrating that our proposed method is effective in compressing, synthesizing and analyzing pulse waveforms. Copyright © 2013 Elsevier Ltd. All rights reserved.
Jiménez-Carvelo, Ana M; González-Casado, Antonio; Cuadros-Rodríguez, Luis
2017-03-01
A new analytical method was developed for the quantification of olive oil and palm oil in blends with other edible vegetable oils (canola, safflower, corn, peanut, seeds, grapeseed, linseed, sesame and soybean) using normal phase liquid chromatography and chemometric tools. The procedure for obtaining the chromatographic fingerprint from the methyl-transesterified fraction of each blend is described. The multivariate quantification methods used were Partial Least Squares Regression (PLS-R) and Support Vector Regression (SVR). The quantification results were evaluated by several parameters, such as the Root Mean Square Error of Validation (RMSEV), Mean Absolute Error of Validation (MAEV) and Median Absolute Error of Validation (MdAEV). Notably, with the proposed method the chromatographic analysis takes only eight minutes, and the results obtained show the potential of the method and allowed quantification of mixtures of olive oil and palm oil with other vegetable oils. Copyright © 2016 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morley, Steven
The PyForecastTools package provides Python routines for calculating metrics for model validation, forecast verification and model comparison. For continuous predictands the package provides functions for calculating bias (mean error, mean percentage error, median log accuracy, symmetric signed bias), and for calculating accuracy (mean squared error, mean absolute error, mean absolute scaled error, normalized RMSE, median symmetric accuracy). Convenience routines to calculate the component parts (e.g. forecast error, scaled error) of each metric are also provided. To compare models the package provides: generic skill score; percent better. Robust measures of scale including median absolute deviation, robust standard deviation, robust coefficient of variation and the Sn estimator are all provided by the package. Finally, the package implements Python classes for NxN contingency tables. In the case of a multi-class prediction, accuracy and skill metrics such as proportion correct and the Heidke and Peirce skill scores are provided as object methods. The special case of a 2x2 contingency table inherits from the NxN class and provides many additional metrics for binary classification: probability of detection, probability of false detection, false alarm ratio, threat score, equitable threat score, bias. Confidence intervals for many of these quantities can be calculated using either the Wald method or Agresti-Coull intervals.
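As an illustration of the continuous-predictand metrics listed in this abstract, the following is a minimal NumPy sketch of mean error, mean squared error, RMSE and mean absolute error. It does not use or reproduce the PyForecastTools API: the function name `forecast_metrics`, the dictionary keys, and the sign convention (forecast minus observation) are assumptions made here for illustration only.

```python
import numpy as np


def forecast_metrics(predicted, observed):
    """Basic bias and accuracy metrics (hypothetical helper, not PyForecastTools)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Assumed convention: forecast error = predicted - observed
    error = predicted - observed
    mse = np.mean(error ** 2)
    return {
        "mean_error": np.mean(error),      # bias: positive means over-prediction
        "mse": mse,                        # mean squared error
        "rmse": np.sqrt(mse),              # root mean squared error
        "mae": np.mean(np.abs(error)),     # mean absolute error
    }


# Small made-up example: errors are 1, 0 and -2
metrics = forecast_metrics([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Keeping the signed mean error alongside the squared and absolute measures separates systematic bias from overall accuracy, which is the distinction the abstract draws between its bias and accuracy metrics.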
Instrumental variables vs. grouping approach for reducing bias due to measurement error.
Batistatou, Evridiki; McNamee, Roseanne
2008-01-01
Attenuation of the exposure-response relationship due to exposure measurement error is often encountered in epidemiology. Given that error cannot be totally eliminated, bias correction methods of analysis are needed. Many methods require more than one exposure measurement per person to be made, but the 'group mean OLS method,' in which subjects are grouped into several a priori defined groups followed by ordinary least squares (OLS) regression on the group means, can be applied with one measurement. An alternative approach is to use an instrumental variable (IV) method in which both the single error-prone measure and an IV are used in IV analysis. In this paper we show that the 'group mean OLS' estimator is equal to an IV estimator with the group mean used as IV, but that the variance estimators for the two methods are different. We derive a simple expression for the bias in the common estimator which is a simple function of group size, reliability and contrast of exposure between groups, and show that the bias can be very small when group size is large. We compare this method with a new proposal (group mean ranking method), also applicable with a single exposure measurement, in which the IV is the rank of the group means. When there are two independent exposure measurements per subject, we propose a new IV method (EVROS IV) and compare it with Carroll and Stefanski's (CS IV) proposal in which the second measure is used as an IV; the new IV estimator combines aspects of the 'group mean' and 'CS' strategies. All methods are evaluated in terms of bias, precision and root mean square error via simulations and a dataset from occupational epidemiology. The 'group mean ranking method' does not offer much improvement over the 'group mean method.' Compared with the 'CS' method, the 'EVROS' method is less affected by low reliability of exposure.
We conclude that the group IV methods we propose may provide a useful way to handle mismeasured exposures in epidemiology with or without replicate measurements. Our finding may also have implications for the use of aggregate variables in epidemiology to control for unmeasured confounding.
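The stated equality between the group mean OLS estimator and an IV estimator that uses the group mean as instrument can be checked numerically. The sketch below is a simulation under assumptions that are not taken from the paper: five equal-sized groups, normally distributed exposure and errors, a true slope of 2, and the simple ratio-of-covariances form of each estimator. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed simulation design: 5 a priori groups of 200 subjects each
n_groups, group_size = 5, 200
group = np.repeat(np.arange(n_groups), group_size)

true_x = rng.normal(loc=group.astype(float), scale=1.0)   # true exposure, differs by group
w = true_x + rng.normal(scale=1.0, size=true_x.size)      # single error-prone measurement
y = 2.0 * true_x + rng.normal(scale=1.0, size=true_x.size)  # response, true slope = 2

# Group mean of the error-prone measure, assigned back to every subject
group_means = np.bincount(group, weights=w) / np.bincount(group)
z = group_means[group]


def cov(a, b):
    """Population covariance of two equal-length arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))


beta_group_ols = cov(z, y) / cov(z, z)   # OLS of y on the group means
beta_iv = cov(z, y) / cov(z, w)          # IV estimator with the group mean as instrument
beta_naive = cov(w, y) / cov(w, w)       # naive OLS on the error-prone measure

print(beta_group_ols, beta_iv, beta_naive)
```

The two group-based slopes coincide because the covariance between a measurement and its own group mean equals the variance of the group mean. The naive slope is included only to show the attenuation that motivates the correction.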
A Constrained Least Squares Approach to Mobile Positioning: Algorithms and Optimality
NASA Astrophysics Data System (ADS)
Cheung, KW; So, HC; Ma, W.-K.; Chan, YT
2006-12-01
The problem of locating a mobile terminal has received significant attention in the field of wireless communications. Time-of-arrival (TOA), received signal strength (RSS), time-difference-of-arrival (TDOA), and angle-of-arrival (AOA) are commonly used measurements for estimating the position of the mobile station. In this paper, we present a constrained weighted least squares (CWLS) mobile positioning approach that encompasses all the above described measurement cases. The advantages of CWLS include performance optimality and capability of extension to hybrid measurement cases (e.g., mobile positioning using TDOA and AOA measurements jointly). Assuming zero-mean uncorrelated measurement errors, we show by mean and variance analysis that all the developed CWLS location estimators achieve zero bias and the Cramér-Rao lower bound approximately when measurement error variances are small. The asymptotic optimum performance is also confirmed by simulation results.
Prediction of BP reactivity to talking using hybrid soft computing approaches.
Kaur, Gurmanik; Arora, Ajat Shatru; Jain, Vijender Kumar
2014-01-01
High blood pressure (BP) is associated with an increased risk of cardiovascular diseases. Therefore, optimal precision in measurement of BP is appropriate in clinical and research studies. In this work, anthropometric characteristics including age, height, weight, body mass index (BMI), and arm circumference (AC) were used as independent predictor variables for the prediction of BP reactivity to talking. Principal component analysis (PCA) was fused with artificial neural network (ANN), adaptive neurofuzzy inference system (ANFIS), and least square-support vector machine (LS-SVM) models to remove the multicollinearity effect among anthropometric predictor variables. The statistical tests in terms of coefficient of determination (R²), root mean square error (RMSE), and mean absolute percentage error (MAPE) revealed that the PCA based LS-SVM (PCA-LS-SVM) model produced a more efficient prediction of BP reactivity as compared to other models. This assessment presents the importance and advantages posed by PCA fused prediction models for prediction of biological variables.
A successive overrelaxation iterative technique for an adaptive equalizer
NASA Technical Reports Server (NTRS)
Kosovych, O. S.
1973-01-01
An adaptive strategy for the equalization of pulse-amplitude-modulated signals in the presence of intersymbol interference and additive noise is reported. The successive overrelaxation iterative technique is used as the algorithm for the iterative adjustment of the equalizer coefficients during a training period for the minimization of the mean square error. With 2-cyclic and nonnegative Jacobi matrices substantial improvement is demonstrated in the rate of convergence over the commonly used gradient techniques. The Jacobi theorems are also extended to nonpositive Jacobi matrices. Numerical examples strongly indicate that the improvements obtained for the special cases are possible for general channel characteristics. The technique is analytically demonstrated to decrease the mean square error at each iteration for a large range of parameter values for light or moderate intersymbol interference and for small intervals for general channels. Analytically, convergence of the relaxation algorithm was proven in a noisy environment and the coefficient variance was demonstrated to be bounded.
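To make the iteration concrete, the sketch below applies generic successive overrelaxation (SOR) to the normal equations R c = p, whose solution minimizes a quadratic mean square error. It is not the report's algorithm: the matrix `R`, the vector `p`, the relaxation factor `omega = 1.2` and the stopping rule are illustrative assumptions. SOR is known to converge for symmetric positive definite matrices when 0 < omega < 2, which is the case constructed here.

```python
import numpy as np


def sor_solve(A, b, omega=1.2, tol=1e-10, max_iter=10_000):
    """Solve A x = b by successive overrelaxation (generic textbook form)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(len(b)):
            # Off-diagonal sum uses already-updated entries for j < i
            sigma = A[i] @ x - A[i, i] * x[i]
            gauss_seidel = (b[i] - sigma) / A[i, i]
            # Relaxation: blend the old value with the Gauss-Seidel update
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x


# Hypothetical 3-tap example: R plays the role of an input autocorrelation
# matrix and p of a cross-correlation vector (values made up for illustration).
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
p = np.array([0.9, 0.4, 0.1])

coeffs = sor_solve(R, p)
print(coeffs)
print(np.linalg.solve(R, p))  # direct solution for comparison
```

Setting `omega = 1` reduces the update to Gauss-Seidel; values above 1 over-correct each coordinate, which can speed convergence for suitable matrices.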
Damage level prediction of non-reshaped berm breakwater using ANN, SVM and ANFIS models
NASA Astrophysics Data System (ADS)
Mandal, Sukomal; Rao, Subba; N., Harish; Lokesha
2012-06-01
The damage analysis of coastal structures is very important as it involves many design parameters to be considered for the better and safer design of the structure. In the present study experimental data for non-reshaped berm breakwater are collected from Marine Structures Laboratory, Department of Applied Mechanics and Hydraulics, NITK, Surathkal, India. Soft computing techniques like Artificial Neural Network (ANN), Support Vector Machine (SVM) and Adaptive Neuro Fuzzy Inference system (ANFIS) models are constructed using experimental data sets to predict the damage level of non-reshaped berm breakwater. The experimental data are used to train ANN, SVM and ANFIS models and results are determined in terms of statistical measures like mean square error, root mean square error, correlation coefficient and scatter index. The results show that soft computing techniques, i.e., ANN, SVM and ANFIS, can be efficient tools in predicting damage levels of non-reshaped berm breakwater.
pKa prediction of monoprotic small molecules the SMARTS way.
Lee, Adam C; Yu, Jing-Yu; Crippen, Gordon M
2008-10-01
Realizing favorable absorption, distribution, metabolism, elimination, and toxicity profiles is a necessity due to the high attrition rate of lead compounds in drug development today. The ability to accurately predict bioavailability can help save time and money during the screening and optimization processes. As several robust programs already exist for predicting logP, we have turned our attention to the fast and robust prediction of pKa for small molecules. Using curated data from the Beilstein Database and Lange's Handbook of Chemistry, we have created a decision tree based on a novel set of SMARTS strings that can accurately predict the pKa for monoprotic compounds with R² of 0.94 and root mean squared error of 0.68. Leave-some-out (10%) cross-validation achieved Q² of 0.91 and root mean squared error of 0.80.
NASA Astrophysics Data System (ADS)
Dikmen, Erkan; Ayaz, Mahir; Gül, Doğan; Şahin, Arzu Şencan
2017-07-01
The determination of drying behavior of herbal plants is a complex process. In this study, a gene expression programming (GEP) model was used to determine the drying behavior of herbal plants such as fresh sweet basil, parsley and dill leaves. Time and drying temperature are the input parameters for the estimation of the moisture ratio of herbal plants. The results of the GEP model are compared with experimental drying data. Statistical values such as mean absolute percentage error, root-mean-squared error and R-square are used to quantify the difference between the values predicted by the GEP model and the values actually observed in the experimental study. It was found that the results of the GEP model and the experimental study are in moderately good agreement. The results have shown that the GEP model can be considered an efficient modelling technique for the prediction of the moisture ratio of herbal plants.
NASA Astrophysics Data System (ADS)
Xiong, Qiufen; Hu, Jianglin
2013-05-01
The minimum/maximum (Min/Max) temperature in the Yangtze River valley is decomposed into the climatic mean and anomaly component. A spatial interpolation is developed which combines the 3D thin-plate spline scheme for climatological mean and the 2D Barnes scheme for the anomaly component to create a daily Min/Max temperature dataset. The climatic mean field is obtained by the 3D thin-plate spline scheme because the relationship between the decreases in Min/Max temperature with elevation is robust and reliable on a long time-scale. The characteristics of the anomaly field tend to be related to elevation variation weakly, and the anomaly component is adequately analyzed by the 2D Barnes procedure, which is computationally efficient and readily tunable. With this hybridized interpolation method, a daily Min/Max temperature dataset that covers the domain from 99°E to 123°E and from 24°N to 36°N with 0.1° longitudinal and latitudinal resolution is obtained by utilizing daily Min/Max temperature data from three kinds of station observations, which are national reference climatological stations, the basic meteorological observing stations and the ordinary meteorological observing stations in 15 provinces and municipalities in the Yangtze River valley from 1971 to 2005. The error estimation of the gridded dataset is assessed by examining cross-validation statistics. The results show that the statistics of daily Min/Max temperature interpolation not only have high correlation coefficient (0.99) and interpolation efficiency (0.98), but also the mean bias error is 0.00 °C. For the maximum temperature, the root mean square error is 1.1 °C and the mean absolute error is 0.85 °C. For the minimum temperature, the root mean square error is 0.89 °C and the mean absolute error is 0.67 °C. 
Thus, the new dataset provides the distribution of Min/Max temperature over the Yangtze River valley with realistic, successive gridded data with 0.1° × 0.1° spatial resolution and daily temporal scale. The primary factors influencing the dataset precision are elevation and terrain complexity. In general, the gridded dataset has a relatively high precision in plains and flatlands and a relatively low precision in mountainous areas.
Ten years of preanalytical monitoring and control: Synthetic Balanced Score Card Indicator
López-Garrigós, Maite; Flores, Emilio; Santo-Quiles, Ana; Gutierrez, Mercedes; Lugo, Javier; Lillo, Rosa; Leiva-Salinas, Carlos
2015-01-01
Introduction Preanalytical control and monitoring continue to be an important issue for clinical laboratory professionals. The aim of the study was to evaluate a monitoring system of preanalytical errors regarding not suitable samples for analysis, based on different indicators; to compare such indicators in different phlebotomy centres; and finally to evaluate a single synthetic preanalytical indicator that may be included in the balanced scorecard management system (BSC). Materials and methods We collected individual and global preanalytical errors in haematology, coagulation, chemistry, and urine samples analysis. We also analyzed a synthetic indicator that represents the sum of all types of preanalytical errors, expressed in a sigma level. We studied the evolution of those indicators over time and compared indicator results by way of the comparison of proportions and Chi-square. Results There was a decrease in the number of errors along the years (P < 0.001). This pattern was confirmed in primary care patients, inpatients and outpatients. In blood samples, fewer errors occurred in outpatients, followed by inpatients. Conclusion We present a practical and effective methodology to monitor unsuitable sample preanalytical errors. The synthetic indicator results summarize overall preanalytical sample errors, and can be used as part of BSC management system. PMID:25672466
Achievable accuracy of hip screw holding power estimation by insertion torque measurement.
Erani, Paolo; Baleani, Massimiliano
2018-02-01
To ensure stability of proximal femoral fractures, the hip screw must firmly engage into the femoral head. Some studies suggested that screw holding power into trabecular bone could be evaluated, intraoperatively, through measurement of screw insertion torque. However, those studies used synthetic bone, instead of trabecular bone, as host material or they did not evaluate accuracy of predictions. We determined prediction accuracy, also assessing the impact of screw design and host material. We measured, under highly-repeatable experimental conditions, disregarding clinical procedure complexities, insertion torque and pullout strength of four screw designs, both in 120 synthetic and 80 trabecular bone specimens of variable density. For both host materials, we calculated the root-mean-square error and the mean-absolute-percentage error of predictions based on the best fitting model of torque-pullout data, in both single-screw and merged dataset. Predictions based on screw-specific regression models were the most accurate. Host material impacts on prediction accuracy: the replacement of synthetic with trabecular bone decreased both root-mean-square errors, from 0.54 ÷ 0.76 kN to 0.21 ÷ 0.40 kN, and mean-absolute-percentage errors, from 14 ÷ 21% to 10 ÷ 12%. However, holding power predicted on low insertion torque remained inaccurate, with errors up to 40% for torques below 1 Nm. In poor-quality trabecular bone, tissue inhomogeneities likely affect pullout strength and insertion torque to different extents, limiting the predictive power of the latter. This bias decreases when the screw engages good-quality bone. Under this condition, predictions become more accurate although this result must be confirmed by close in-vitro simulation of the clinical procedure. Copyright © 2018 Elsevier Ltd. All rights reserved.
Li, Kaiyue; Wang, Weiying; Liu, Yanping; Jiang, Su; Huang, Guo; Ye, Liming
2017-01-01
The active ingredients and thus pharmacological efficacy of traditional Chinese medicine (TCM) at different degrees of parching process vary greatly. Near-infrared spectroscopy (NIR) was used to develop a new method for rapid online analysis of TCM parching process, using two kinds of chemical indicators (5-(hydroxymethyl) furfural [5-HMF] content and 420 nm absorbance) as reference values which were obviously observed and changed in most TCM parching process. Three representative TCMs, Areca ( Areca catechu L.), Malt ( Hordeum Vulgare L.), and Hawthorn ( Crataegus pinnatifida Bge.), were used in this study. With partial least squares regression, calibration models of NIR were generated based on two kinds of reference values, i.e. 5-HMF contents measured by high-performance liquid chromatography (HPLC) and 420 nm absorbance measured by ultraviolet-visible spectroscopy (UV/Vis), respectively. In the optimized models for 5-HMF, the root mean square errors of prediction (RMSEP) for Areca, Malt, and Hawthorn was 0.0192, 0.0301, and 0.2600 and correlation coefficients ( R cal ) were 99.86%, 99.88%, and 99.88%, respectively. Moreover, in the optimized models using 420 nm absorbance as reference values, the RMSEP for Areca, Malt, and Hawthorn was 0.0229, 0.0096, and 0.0409 and R cal were 99.69%, 99.81%, and 99.62%, respectively. NIR models with 5-HMF content and 420 nm absorbance as reference values can rapidly and effectively identify three kinds of TCM in different parching processes. This method has great promise to replace current subjective color judgment and time-consuming HPLC or UV/Vis methods and is suitable for rapid online analysis and quality control in TCM industrial manufacturing process. 
Near-infrared spectroscopy (NIR) was used to develop a new method for online analysis of the traditional Chinese medicine (TCM) parching process. Calibration and validation models of Areca, Malt, and Hawthorn were generated by partial least squares regression using 5-(hydroxymethyl) furfural contents and 420 nm absorbance as reference values, respectively, which were the main indicator components during the parching process of most TCM. The established NIR models of the three TCMs had low root mean square errors of prediction and high correlation coefficients. The NIR method has great promise for use in TCM industrial manufacturing processes for rapid online analysis and quality control. Abbreviations used: NIR: Near-infrared Spectroscopy; TCM: Traditional Chinese medicine; Areca: Areca catechu L.; Hawthorn: Crataegus pinnatifida Bge.; Malt: Hordeum vulgare L.; 5-HMF: 5-(hydroxymethyl) furfural; PLS: Partial least squares; D: Dimension faction; SLS: Straight line subtraction; MSC: Multiplicative scatter correction; VN: Vector normalization; RMSECV: Root mean square errors of cross-validation; RMSEP: Root mean square errors of validation; R cal : Correlation coefficients; RPD: Residual predictive deviation; PAT: Process analytical technology; FDA: Food and Drug Administration; ICH: International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use.
Mixed effects versus fixed effects modelling of binary data with inter-subject variability.
Murphy, Valda; Dunne, Adrian
2005-04-01
The question of whether or not a mixed effects model is required when modelling binary data with inter-subject variability and within subject correlation was reported in this journal by Yano et al. (J. Pharmacokin. Pharmacodyn. 28:389-412 [2001]). That report used simulation experiments to demonstrate that, under certain circumstances, the use of a fixed effects model produced more accurate estimates of the fixed effect parameters than those produced by a mixed effects model. The Laplace approximation to the likelihood was used when fitting the mixed effects model. This paper repeats one of those simulation experiments, with two binary observations recorded for every subject, and uses both the Laplace and the adaptive Gaussian quadrature approximations to the likelihood when fitting the mixed effects model. The results show that the estimates produced using the Laplace approximation include a small number of extreme outliers. This was not the case when using the adaptive Gaussian quadrature approximation. Further examination of these outliers shows that they arise in situations in which the Laplace approximation seriously overestimates the likelihood in an extreme region of the parameter space. It is also demonstrated that when the number of observations per subject is increased from two to three, the estimates based on the Laplace approximation no longer include any extreme outliers. The root mean squared error is a combination of the bias and the variability of the estimates. Increasing the sample size is known to reduce the variability of an estimator with a consequent reduction in its root mean squared error. The estimates based on the fixed effects model are inherently biased and this bias acts as a lower bound for the root mean squared error of these estimates. 
Consequently, it might be expected that for data sets with a greater number of subjects the estimates based on the mixed effects model would be more accurate than those based on the fixed effects model. This is borne out by the results of a further simulation experiment with an increased number of subjects in each set of data. The difference in the interpretation of the parameters of the fixed and mixed effects models is discussed. It is demonstrated that the mixed effects model and parameter estimates can be used to estimate the parameters of the fixed effects model but not vice versa.
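The remark that the root mean squared error combines bias and variability, and that bias acts as a lower bound, can be written out with the standard decomposition. The symbols below ($\hat\theta$ for an estimator of a parameter $\theta$) are introduced here for illustration and are not taken from the paper.

```latex
\mathrm{MSE}(\hat\theta)
  = \mathbb{E}\big[(\hat\theta - \theta)^2\big]
  = \mathrm{Var}(\hat\theta) + \big(\mathrm{Bias}(\hat\theta)\big)^2,
\qquad
\mathrm{Bias}(\hat\theta) = \mathbb{E}[\hat\theta] - \theta,
\qquad
\mathrm{RMSE}(\hat\theta) = \sqrt{\mathrm{MSE}(\hat\theta)} \;\ge\; \big|\mathrm{Bias}(\hat\theta)\big| .
```

Because the variance term typically shrinks as the number of subjects grows while a bias that is inherent to the model does not, the RMSE of a biased estimator cannot fall below the magnitude of its bias, which is consistent with the expectation stated above for larger data sets.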
Comparing least-squares and quantile regression approaches to analyzing median hospital charges.
Olsen, Cody S; Clark, Amy E; Thomas, Andrea M; Cook, Lawrence J
2012-07-01
Emergency department (ED) and hospital charges obtained from administrative data sets are useful descriptors of injury severity and the burden to EDs and the health care system. However, charges are typically positively skewed due to costly procedures, long hospital stays, and complicated or prolonged treatment for few patients. The median is not affected by extreme observations and is useful in describing and comparing distributions of hospital charges. A least-squares analysis employing a log transformation is one approach for estimating median hospital charges, corresponding confidence intervals (CIs), and differences between groups; however, this method requires certain distributional properties. An alternate method is quantile regression, which allows estimation and inference related to the median without making distributional assumptions. The objective was to compare the log-transformation least-squares method to the quantile regression approach for estimating median hospital charges, differences in median charges between groups, and associated CIs. The authors performed simulations using repeated sampling of observed statewide ED and hospital charges and charges randomly generated from a hypothetical lognormal distribution. The median and 95% CI and the multiplicative difference between the median charges of two groups were estimated using both least-squares and quantile regression methods. Performance of the two methods was evaluated. In contrast to least squares, quantile regression produced estimates that were unbiased and had smaller mean square errors in simulations of observed ED and hospital charges. Both methods performed well in simulations of hypothetical charges that met least-squares method assumptions. When the data did not follow the assumed distribution, least-squares estimates were often biased, and the associated CIs had lower than expected coverage as sample size increased. 
Quantile regression analyses of hospital charges provide unbiased estimates even when lognormal and equal variance assumptions are violated. These methods may be particularly useful in describing and analyzing hospital charges from administrative data sets. © 2012 by the Society for Academic Emergency Medicine.
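The contrast between the two approaches can be illustrated in the simplest, intercept-only case, where the log-transformation least-squares estimate of the median reduces to the back-transformed mean of the log charges and the quantile-regression estimate at the 0.5 quantile reduces to the sample median. The sketch below uses simulated charges only; the gamma distribution is a hypothetical stand-in for data that violate the lognormal assumption and is not the statewide data used by the authors.

```python
import numpy as np

def median_via_log_ls(charges):
    """Intercept-only log least squares: exp of the mean log charge."""
    return float(np.exp(np.mean(np.log(charges))))

def median_via_quantile(charges):
    """Intercept-only quantile regression at tau = 0.5: the sample median."""
    return float(np.median(charges))

rng = np.random.default_rng(1)
n = 200_000

# Case 1: lognormal charges, for which the true median is exp(8).
lognormal = rng.lognormal(mean=8.0, sigma=1.0, size=n)
ln_ls = median_via_log_ls(lognormal)
ln_q = median_via_quantile(lognormal)

# Case 2: hypothetical skewed but non-lognormal charges (gamma, shape 0.5).
gamma = rng.gamma(shape=0.5, scale=4000.0, size=n)
ga_ls = median_via_log_ls(gamma)
ga_q = median_via_quantile(gamma)

print(f"lognormal: log-LS {ln_ls:.0f}, median {ln_q:.0f}")
print(f"gamma:     log-LS {ga_ls:.0f}, median {ga_q:.0f}")
```

Under the lognormal model the two estimates agree closely; under the gamma stand-in the back-transformed log mean falls well below the sample median, which is the kind of bias the abstract describes.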
A new open-loop fiber optic gyro error compensation method based on angular velocity error modeling.
Zhang, Yanshun; Guo, Yajing; Li, Chunyu; Wang, Yixin; Wang, Zhanqing
2015-02-27
With the open-loop fiber optic gyro (OFOG) model, output voltage and angular velocity can effectively compensate OFOG errors. However, the model cannot reflect the characteristics of OFOG errors well at large dynamic angular velocities. This paper puts forward a modeling scheme with OFOG output voltage u and temperature T as the input variables and angular velocity error Δω as the output variable. First, the angular velocity error Δω is extracted from OFOG output signals, and then the output voltage u, temperature T and angular velocity error Δω are used as the learning samples to train a Radial-Basis-Function (RBF) neural network model. The nonlinear mapping model over T, u and Δω is thus established, and Δω can be calculated automatically to compensate OFOG errors according to T and u. The results of the experiments show that the established model can be used to compensate the nonlinear OFOG errors. The maximum, the minimum and the mean square error of OFOG angular velocity are decreased by 97.0%, 97.1% and 96.5% relative to their initial values, respectively. Compared with the direct modeling of gyro angular velocity, which we researched before, the errors obtained with the compensating method proposed in this paper are further reduced by 1.6%, 1.4% and 1.2%, respectively, so this method performs better than direct modeling of gyro angular velocity.
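The mapping from (u, T) to Δω described above can be sketched with a small RBF network. This is a minimal illustration on synthetic data, assuming fixed Gaussian centres and a linear output layer solved by least squares; the number of centres, the width and the simulated error surface are all hypothetical and are not taken from the paper.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian RBF activations: one row per sample, one column per centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf(X, y, centers, width):
    """Solve for the linear output weights (plus a bias term) by least squares."""
    phi = np.column_stack([rbf_design(X, centers, width), np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return weights

def predict_rbf(X, centers, width, weights):
    phi = np.column_stack([rbf_design(X, centers, width), np.ones(len(X))])
    return phi @ weights

rng = np.random.default_rng(0)
# Synthetic, normalised inputs: column 0 plays the role of u, column 1 of T.
X = rng.uniform(-1.0, 1.0, size=(400, 2))
# Hypothetical nonlinear error surface plus noise; NOT measured OFOG data.
dw = 0.5 * X[:, 0] ** 3 + 0.2 * np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.01, 400)

centers = X[rng.choice(len(X), size=40, replace=False)]
weights = fit_rbf(X[:300], dw[:300], centers, width=0.5)

# Compensation on held-out samples: subtract the predicted error.
residual = dw[300:] - predict_rbf(X[300:], centers, 0.5, weights)
rms_before = np.sqrt(np.mean(dw[300:] ** 2))
rms_after = np.sqrt(np.mean(residual ** 2))
print(f"RMS error before {rms_before:.4f}, after compensation {rms_after:.4f}")
```

Training the centres and widths as well, as a full RBF training procedure might, is omitted here to keep the sketch short.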
A New Open-Loop Fiber Optic Gyro Error Compensation Method Based on Angular Velocity Error Modeling
Zhang, Yanshun; Guo, Yajing; Li, Chunyu; Wang, Yixin; Wang, Zhanqing
2015-01-01
With the open-loop fiber optic gyro (OFOG) model, output voltage and angular velocity can effectively compensate OFOG errors. However, the model cannot reflect the characteristics of OFOG errors well at large dynamic angular velocities. This paper puts forward a modeling scheme with OFOG output voltage u and temperature T as the input variables and angular velocity error Δω as the output variable. First, the angular velocity error Δω is extracted from OFOG output signals, and then the output voltage u, temperature T and angular velocity error Δω are used as the learning samples to train a Radial-Basis-Function (RBF) neural network model. The nonlinear mapping model over T, u and Δω is thus established, and Δω can be calculated automatically to compensate OFOG errors according to T and u. The results of the experiments show that the established model can be used to compensate the nonlinear OFOG errors. The maximum, the minimum and the mean square error of OFOG angular velocity are decreased by 97.0%, 97.1% and 96.5% relative to their initial values, respectively. Compared with the direct modeling of gyro angular velocity, which we researched before, the errors obtained with the compensating method proposed in this paper are further reduced by 1.6%, 1.4% and 1.2%, respectively, so this method performs better than direct modeling of gyro angular velocity. PMID:25734642
Nuopponen, Mari H; Birch, Gillian M; Sykes, Rob J; Lee, Steve J; Stewart, Derek
2006-01-11
Sitka spruce (Picea sitchensis) samples (491) from 50 different clones as well as 24 different tropical hardwoods and 20 Scots pine (Pinus sylvestris) samples were used to construct diffuse reflectance mid-infrared Fourier transform (DRIFT-MIR) based partial least squares (PLS) calibrations on lignin, cellulose, and wood resin contents and densities. Calibrations for density, lignin, and cellulose were established for all wood species combined into one data set as well as for the separate Sitka spruce data set. Relationships between wood resin and MIR data were constructed for the Sitka spruce data set as well as the combined Scots pine and Sitka spruce data sets. Calibrations containing only five wavenumbers instead of spectral ranges 4000-2800 and 1800-700 cm(-1) were also established. In addition, chemical factors contributing to wood density were studied. Chemical composition and density assessed from DRIFT-MIR calibrations had R2 and Q2 values in the ranges of 0.6-0.9 and 0.6-0.8, respectively. The PLS models gave root mean square error of prediction (RMSEP) values of 1.6-1.9, 2.8-3.7, and 0.4 for lignin, cellulose, and wood resin contents, respectively. Density test sets had RMSEP values ranging from 50 to 56. A reduced number of wavenumbers can be used to predict the chemical composition and density of wood, which should allow measurements of these properties using a hand-held device. MIR spectral data indicated that low-density samples had somewhat higher lignin contents than high-density samples. Correspondingly, high-density samples contained slightly more polysaccharides than low-density samples. This observation was consistent with the wet chemical data.
Development of uncertainty-based work injury model using Bayesian structural equation modelling.
Chatterjee, Snehamoy
2014-01-01
This paper proposes a Bayesian structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of the SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading parameters and structural parameters of the SEM. In the first approach, the prior distributions were taken as fixed distribution functions with specific parameter values, whereas in the second approach, the prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov Chain Monte Carlo sampling in the form of Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of the structural and measurement model parameters are statistically significant with the expert-opinion-based priors, whereas two coefficients are not statistically significant when the fixed priors are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit of work injury, with a high coefficient of determination (0.91) and a lower mean squared error than traditional SEM.
Photogrammetric Method and Software for Stream Planform Identification
NASA Astrophysics Data System (ADS)
Stonedahl, S. H.; Stonedahl, F.; Lohberg, M. M.; Lusk, K.; Miller, D.
2013-12-01
Accurately characterizing the planform of a stream is important for many purposes, including recording measurement and sampling locations, monitoring change due to erosion or volumetric discharge, and spatial modeling of stream processes. While expensive surveying equipment or high resolution aerial photography can be used to obtain planform data, our research focused on developing a close-range photogrammetric method (and accompanying free/open-source software) to serve as a cost-effective alternative. This method involves securing and floating a wooden square frame on the stream surface at several locations, taking photographs from numerous angles at each location, and then post-processing and merging data from these photos using the corners of the square for reference points, unit scale, and perspective correction. For our test field site we chose a ~35m reach along Black Hawk Creek in Sunderbruch Park (Davenport, IA), a small, slow-moving stream with overhanging trees. To quantify error we measured 88 distances between 30 marked control points along the reach. We calculated error by comparing these 'ground truth' distances to the corresponding distances extracted from our photogrammetric method. We placed the square at three locations along our reach and photographed it from multiple angles. The square corners, visible control points, and visible stream outline were hand-marked in these photos using the GIMP (open-source image editor). We wrote an open-source GUI in Java (hosted on GitHub), which allows the user to load marked-up photos, designate square corners and label control points. The GUI also extracts the marked pixel coordinates from the images. 
We also wrote several scripts (currently in MATLAB) that correct the pixel coordinates for radial distortion using Brown's lens distortion model, correct for perspective by forcing the four square corner pixels to form a parallelogram in 3-space, and rotate the points in order to correctly orient all photos of the same square location. Planform data from multiple photos (and multiple square locations) are combined using weighting functions that mitigate the error stemming from the markup-process, imperfect camera calibration, etc. We have used our (beta) software to mark and process over 100 photos, yielding an average error of only 1.5% relative to our 88 measured lengths. Next we plan to translate the MATLAB scripts into Python and release their source code, at which point only free software, consumer-grade digital cameras, and inexpensive building materials will be needed for others to replicate this method at new field sites. Three sample photographs of the square with the created planform and control points
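The role of the four square corners as reference points for scale and perspective can be illustrated with a planar homography. Note that this swaps in a standard technique (a direct linear transform from four point pairs) rather than reproducing the authors' parallelogram-in-3-space correction, and it omits the lens-distortion step; the pixel coordinates and the 1.0 side length are hypothetical.

```python
import numpy as np

def homography_from_square(corners_px, side=1.0):
    """Estimate the 3x3 homography mapping four pixel corners to a side x side square."""
    src = np.asarray(corners_px, dtype=float)
    dst = np.array([[0, 0], [side, 0], [side, side], [0, side]], dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of this 8x9 system,
    # i.e. the last right-singular vector.
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 3)

def to_plane(H, points_px):
    """Map pixel coordinates into the plane of the square (units of `side`)."""
    pts = np.asarray(points_px, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical marked corners, listed in the same order around the square as `dst`.
corners = [(100.0, 200.0), (400.0, 220.0), (380.0, 500.0), (90.0, 450.0)]
H = homography_from_square(corners, side=1.0)

# Distance between two hypothetical marked control points, in frame side lengths.
p, q = to_plane(H, [(150.0, 300.0), (350.0, 420.0)])
print(np.linalg.norm(p - q))
```

Distances computed this way are only meaningful for points lying in the plane of the floating frame, that is, on the water surface.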
Zhang, Xike; Zhang, Qiuwen; Zhang, Gui; Nie, Zhiping; Gui, Zifan; Que, Huafei
2018-01-01
Daily land surface temperature (LST) forecasting is of great significance for application in climate-related, agricultural, eco-environmental, or industrial studies. Hybrid data-driven prediction models using Ensemble Empirical Mode Decomposition (EEMD) coupled with Machine Learning (ML) algorithms are useful for achieving these purposes because they can reduce the difficulty of modeling, require less historical data, are easy to develop, and are less complex than physical models. In this article, a computationally simple, less data-intensive, fast and efficient novel hybrid data-driven model called the EEMD Long Short-Term Memory (LSTM) neural network, namely EEMD-LSTM, is proposed to reduce the difficulty of modeling and to improve prediction accuracy. The daily LST data series from the Mapoling and Zhijiang stations in the Dongting Lake basin, central south China, from 1 January 2014 to 31 December 2016 is used as a case study. The EEMD is first employed to decompose the original daily LST data series into many Intrinsic Mode Functions (IMFs) and a single residue item. Then, the Partial Autocorrelation Function (PACF) is used to obtain the number of input data sample points for LSTM models. Next, the LSTM models are constructed to predict the decompositions. All the predicted results of the decompositions are aggregated as the final daily LST. Finally, the prediction performance of the hybrid EEMD-LSTM model is assessed in terms of the Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Pearson Correlation Coefficient (CC) and Nash-Sutcliffe Coefficient of Efficiency (NSCE).
To validate the hybrid data-driven model, the hybrid EEMD-LSTM model is compared with the Recurrent Neural Network (RNN), LSTM and Empirical Mode Decomposition (EMD) coupled with RNN, EMD-LSTM and EEMD-RNN models, and their comparison results demonstrate that the hybrid EEMD-LSTM model performs better than the other five models. The scatterplots of the predicted results of the six models versus the original daily LST data series show that the hybrid EEMD-LSTM model is superior to the other five models. It is concluded that the proposed hybrid EEMD-LSTM model in this study is a suitable tool for temperature forecasting. PMID:29883381
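The six evaluation measures named above can be computed as in the following sketch, which uses their usual textbook definitions (the abstract does not spell them out, so the exact forms are an assumption). The temperature values are made up for illustration.

```python
import numpy as np

def forecast_metrics(observed, predicted):
    """Return MSE, MAE, MAPE (%), RMSE, Pearson CC and Nash-Sutcliffe efficiency."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / obs)),  # undefined if any observation is 0
        "RMSE": np.sqrt(mse),
        "CC": np.corrcoef(obs, pred)[0, 1],
        "NSCE": 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
    }

# Made-up daily temperatures (degrees Celsius), for illustration only.
obs = [10.0, 12.0, 15.0, 11.0]
pred = [11.0, 12.0, 14.0, 12.0]
m = forecast_metrics(obs, pred)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

MAPE is unreliable for temperatures near 0 °C because the observed value appears in the denominator, which is one reason to report several measures side by side.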
Zhang, Xike; Zhang, Qiuwen; Zhang, Gui; Nie, Zhiping; Gui, Zifan; Que, Huafei
2018-05-21
Daily land surface temperature (LST) forecasting is of great significance for application in climate-related, agricultural, eco-environmental, or industrial studies. Hybrid data-driven prediction models using Ensemble Empirical Mode Decomposition (EEMD) coupled with Machine Learning (ML) algorithms are useful for achieving these purposes because they can reduce the difficulty of modeling, require less historical data, are easy to develop, and are less complex than physical models. In this article, a computationally simple, less data-intensive, fast and efficient novel hybrid data-driven model called the EEMD Long Short-Term Memory (LSTM) neural network, namely EEMD-LSTM, is proposed to reduce the difficulty of modeling and to improve prediction accuracy. The daily LST data series from the Mapoling and Zhijiang stations in the Dongting Lake basin, central south China, from 1 January 2014 to 31 December 2016 is used as a case study. The EEMD is first employed to decompose the original daily LST data series into many Intrinsic Mode Functions (IMFs) and a single residue item. Then, the Partial Autocorrelation Function (PACF) is used to obtain the number of input data sample points for LSTM models. Next, the LSTM models are constructed to predict the decompositions. All the predicted results of the decompositions are aggregated as the final daily LST. Finally, the prediction performance of the hybrid EEMD-LSTM model is assessed in terms of the Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Pearson Correlation Coefficient (CC) and Nash-Sutcliffe Coefficient of Efficiency (NSCE).
To validate the hybrid data-driven model, the hybrid EEMD-LSTM model is compared with the Recurrent Neural Network (RNN), LSTM and Empirical Mode Decomposition (EMD) coupled with RNN, EMD-LSTM and EEMD-RNN models, and their comparison results demonstrate that the hybrid EEMD-LSTM model performs better than the other five models. The scatterplots of the predicted results of the six models versus the original daily LST data series show that the hybrid EEMD-LSTM model is superior to the other five models. It is concluded that the proposed hybrid EEMD-LSTM model in this study is a suitable tool for temperature forecasting.
Kolodziejczyk, Julia K; Norman, Gregory J; Roesch, Scott C; Rock, Cheryl L; Arredondo, Elva M; Madanat, Hala; Patrick, Kevin
2015-01-01
There is a need for a self-report measure that assesses use of recommended strategies related to weight management. Cross-sectional analysis. Universities, community. Exploratory factor analysis (EFA) involved data from 404 overweight/obese young adults (mean age = 22 years, 48% non-Hispanic white, 68% ethnic minority). Confirmatory factor analysis (CFA) involved data from 236 overweight/obese adults (mean age = 42 years, 63% non-Hispanic white, 84% ethnic minority). The Strategies for Weight Management (SWM) measure is a 35-item questionnaire that assesses use of recommended behavioral strategies for reducing energy intake and increasing energy expenditure in overweight/obese adults. EFA and CFA were conducted on the SWM. Correlate models assessed the associations between SWM factor/total scores and demographics by using linear regressions. EFA suggested a four-factor model: strategies categorized as targeting (1) energy intake, (2) energy expenditure, (3) self-monitoring, and (4) self-regulation. CFA indicated good model fit (χ(2)/df = 2.0, comparative fit index = .90, standardized root mean square residual = .06, and root mean square error of approximation = .07, confidence interval = .06-.08, R(2) = .11-.74). The fourth factor had the lowest loadings, possibly because the items cover a wide domain. The final model included 20 items. Correlate models revealed weak associations between the SWM scores and age, gender, Hispanic ethnicity, and relationship status in both samples, with the models explaining only 1% to 8% of the variance (betas = -.04 to .29, p < .05). The SWM has promising psychometric qualities in two diverse samples.
Modeling and forecasting of KLCI weekly return using WT-ANN integrated model
NASA Astrophysics Data System (ADS)
Liew, Wei-Thong; Liong, Choong-Yeun; Hussain, Saiful Izzuan; Isa, Zaidi
2013-04-01
The forecasting of weekly return is one of the most challenging tasks in investment since the time series are volatile and non-stationary. In this study, an integrated model of wavelet transform and artificial neural network, WT-ANN, is studied for modeling and forecasting of KLCI weekly return. First, the WT is applied to decompose the weekly return time series in order to eliminate noise. Then, a mathematical model of the time series is constructed using the ANN. The performance of the suggested model is evaluated by root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The results show that the WT-ANN model can be considered a feasible and powerful model for time series modeling and prediction.
[Application of wavelet neural networks model to forecast incidence of syphilis].
Zhou, Xian-Feng; Feng, Zi-Jian; Yang, Wei-Zhong; Li, Xiao-Song
2011-07-01
To apply a Wavelet Neural Networks (WNN) model to forecast the incidence of syphilis. Back Propagation Neural Network (BPNN) and WNN models were developed based on the monthly incidence of syphilis in Sichuan province from 2004 to 2008. The accuracy of forecast was compared between the two models. In the training approximation, the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) were 0.0719, 0.0862 and 11.52% respectively for WNN, and 0.0892, 0.1183 and 14.87% respectively for BPNN. The three indexes for generalization of the models were 0.0497, 0.0513 and 4.60% for WNN, and 0.0816, 0.1119 and 7.25% for BPNN. WNN is a better model for short-term forecasting of syphilis.
Least-squares dual characterization for ROI assessment in emission tomography
NASA Astrophysics Data System (ADS)
Ben Bouallègue, F.; Crouzet, J. F.; Dubois, A.; Buvat, I.; Mariano-Goulart, D.
2013-06-01
Our aim is to describe an original method for estimating the statistical properties of regions of interest (ROIs) in emission tomography. Drawn upon the works of Louis on the approximate inverse, we propose a dual formulation of the ROI estimation problem to derive the ROI activity and variance directly from the measured data without any image reconstruction. The method requires the definition of an ROI characteristic function that can be extracted from a co-registered morphological image. This characteristic function can be smoothed to optimize the resolution-variance tradeoff. An iterative procedure is detailed for the solution of the dual problem in the least-squares sense (least-squares dual (LSD) characterization), and a linear extrapolation scheme is described to compensate for sampling partial volume effect and reduce the estimation bias (LSD-ex). LSD and LSD-ex are compared with classical ROI estimation using pixel summation after image reconstruction and with Huesman's method. For this comparison, we used Monte Carlo simulations (GATE simulation tool) of 2D PET data of a Hoffman brain phantom containing three small uniform high-contrast ROIs and a large non-uniform low-contrast ROI. Our results show that the performances of LSD characterization are at least as good as those of the classical methods in terms of root mean square (RMS) error. For the three small tumor regions, LSD-ex allows a reduction in the estimation bias by up to 14%, resulting in a reduction in the RMS error of up to 8.5%, compared with the optimal classical estimation. For the large non-specific region, LSD using appropriate smoothing could intuitively and efficiently handle the resolution-variance tradeoff.
Flügge, Tabea V; Schlager, Stefan; Nelson, Katja; Nahles, Susanne; Metzger, Marc C
2013-09-01
Digital impression devices are used alternatively to conventional impression techniques and materials. The aims of this study were to evaluate the precision of digital intraoral scanning under clinical conditions (iTero; Align Technologies, San Jose, Calif) and to compare it with the precision of extraoral digitization. One patient received 10 full-arch intraoral scans with the iTero and conventional impressions with a polyether impression material (Impregum Penta; 3M ESPE, Seefeld, Germany). Stone cast models manufactured from the impressions were digitized 10 times with an extraoral scanner (D250; 3Shape, Copenhagen, Denmark) and 10 times with the iTero. Virtual models provided by each method were roughly aligned, and the model edges were trimmed with cutting planes to create common borders (Rapidform XOR; Inus Technologies, Seoul, Korea). A second model alignment was then performed along the closest distances of the surfaces (Artec Studio software; Artec Group, Luxembourg, Luxembourg). To assess precision, deviations between corresponding models were compared. Repeated intraoral scanning was evaluated in group 1, repeated extraoral model scanning with the iTero was assessed in group 2, and repeated model scanning with the D250 was assessed in group 3. Deviations between models were measured and expressed as maximums, means, medians, and root mean square errors for quantitative analysis. Color-coded displays of the deviations allowed qualitative visualization of the deviations. The greatest deviations and therefore the lowest precision were in group 1, with mean deviations of 50 μm, median deviations of 37 μm, and root mean square errors of 73 μm. Group 2 showed a higher precision, with mean deviations of 25 μm, median deviations of 18 μm, and root mean square errors of 51 μm. Scanning with the D250 had the highest precision, with mean deviations of 10 μm, median deviations of 5 μm, and root mean square errors of 20 μm. 
Intraoral and extraoral scanning with the iTero resulted in deviations at the facial surfaces of the anterior teeth and the buccal molar surfaces. Scanning with the iTero is less accurate than scanning with the D250. Intraoral scanning with the iTero is less accurate than model scanning with the iTero, suggesting that the intraoral conditions (saliva, limited spacing) contribute to the inaccuracy of a scan. For treatment planning and manufacturing of tooth-supported appliances, virtual models created with the iTero can be used. An extended scanning protocol could improve the scanning results in some regions. Copyright © 2013 American Association of Orthodontists. Published by Mosby, Inc. All rights reserved.
Near infrared spectroscopy for prediction of antioxidant compounds in the honey.
Escuredo, Olga; Seijo, M Carmen; Salvador, Javier; González-Martín, M Inmaculada
2013-12-15
The selection of antioxidant variables in honey is considered for the first time using the near infrared (NIR) spectroscopic technique. A total of 60 honey samples were used to develop the calibration models using the modified partial least squares (MPLS) regression method and 15 samples were used for external validation. Calibration models on the honey matrix for the estimation of phenols, flavonoids, vitamin C, antioxidant capacity (DPPH), oxidation index and copper using NIR spectroscopy were satisfactorily obtained. These models were optimised by cross-validation, and the best model was evaluated according to the multiple correlation coefficient (RSQ), standard error of cross-validation (SECV), ratio performance deviation (RPD) and root mean square error (RMSE) in the prediction set. The results of these statistics suggested that the equations developed could be used for rapid determination of antioxidant compounds in honey. This work shows that near infrared spectroscopy can be considered a rapid tool for the nondestructive measurement of antioxidant constituents such as phenols, flavonoids, vitamin C and copper, and also of the antioxidant capacity of honey. Copyright © 2013 Elsevier Ltd. All rights reserved.
Comparison of laser ray-tracing and skiascopic ocular wavefront-sensing devices
Bartsch, D-UG; Bessho, K; Gomez, L; Freeman, WR
2009-01-01
Purpose: To compare two wavefront-sensing devices based on different principles. Methods: Thirty-eight healthy eyes of 19 patients were measured five times in the reproducibility study. Twenty eyes of 10 patients were measured in the comparison study. The Tracey Visual Function Analyzer (VFA), based on the ray-tracing principle, and the Nidek optical pathway difference (OPD)-Scan, based on the dynamic skiascopy principle, were compared. The standard deviation (SD) of root mean square (RMS) errors was compared to verify the reproducibility. We evaluated RMS errors, Zernike terms and conventional refractive indexes (Sph, Cyl, Ax, and spherical equivalent). Results: For RMS error readings, both devices showed similar ratios of SD to the mean measurement value (VFA: 57.5±11.7%, OPD-Scan: 53.9±10.9%). Comparison on the same eye showed that almost all terms were significantly greater using the VFA than using the OPD-Scan. However, certain high spatial frequency aberrations (tetrafoil, pentafoil, and hexafoil) were consistently measured near zero with the OPD-Scan. Conclusion: Both devices showed a similar level of reproducibility; however, there was a considerable difference in the wavefront readings between machines when measuring the same eye. Differences in the number of sample points, centration, and measurement algorithms between the two instruments may explain our results. PMID:17571088
Skinner, Kenneth D.
2011-01-01
High-quality elevation data in riverine environments are important for fisheries management applications and the accuracy of such data needs to be determined for its proper application. The Experimental Advanced Airborne Research LiDAR (Light Detection and Ranging)-or EAARL-system was used to obtain topographic and bathymetric data along the Deadwood and South Fork Boise Rivers in west-central Idaho. The EAARL data were post-processed into bare earth and bathymetric raster and point datasets. Concurrently with the EAARL surveys, real-time kinematic global positioning system surveys were made in three areas along each of the rivers to assess the accuracy of the EAARL elevation data in different hydrogeomorphic settings. The accuracies of the EAARL-derived raster elevation values, determined in open, flat terrain, to provide an optimal vertical comparison surface, had root mean square errors ranging from 0.134 to 0.347 m. Accuracies in the elevation values for the stream hydrogeomorphic settings had root mean square errors ranging from 0.251 to 0.782 m. The greater root mean square errors for the latter data are the result of complex hydrogeomorphic environments within the streams, such as submerged aquatic macrophytes and air bubble entrainment; and those along the banks, such as boulders, woody debris, and steep slopes. These complex environments reduce the accuracy of EAARL bathymetric and topographic measurements. Steep banks emphasize the horizontal location discrepancies between the EAARL and ground-survey data and may not be good representations of vertical accuracy. The EAARL point to ground-survey comparisons produced results with slightly higher but similar root mean square errors than those for the EAARL raster to ground-survey comparisons, emphasizing the minimized horizontal offset by using interpolated values from the raster dataset at the exact location of the ground-survey point as opposed to an actual EAARL point within a 1-meter distance. 
The average error for the wetted stream channel surface areas was -0.5 percent, while the average error for the wetted stream channel volume was -8.3 percent. The volume of the wetted river channel was underestimated by an average of 31 percent in half of the survey areas, and overestimated by an average of 14 percent in the remainder of the survey areas. The EAARL system is an efficient way to obtain topographic and bathymetric data in large areas of remote terrain. The elevation accuracy of the EAARL system varies throughout the area depending upon the hydrogeomorphic setting, preventing the use of a single accuracy value to describe the EAARL system. The elevation accuracy variations should be kept in mind when using the data, such as for hydraulic modeling or aquatic habitat assessments.
The Drag-based Ensemble Model (DBEM) for Coronal Mass Ejection Propagation
NASA Astrophysics Data System (ADS)
Dumbović, Mateja; Čalogović, Jaša; Vršnak, Bojan; Temmer, Manuela; Mays, M. Leila; Veronig, Astrid; Piantschitsch, Isabell
2018-02-01
The drag-based model for heliospheric propagation of coronal mass ejections (CMEs) is a widely used analytical model that can predict CME arrival time and speed at a given heliospheric location. It is based on the assumption that the propagation of CMEs in interplanetary space is solely under the influence of magnetohydrodynamical drag, where CME propagation is determined based on CME initial properties as well as the properties of the ambient solar wind. We present an upgraded version, the drag-based ensemble model (DBEM), that covers ensemble modeling to produce a distribution of possible ICME arrival times and speeds. Multiple runs using uncertainty ranges for the input values can be performed in almost real-time, within a few minutes. This allows us to define the most likely ICME arrival times and speeds, quantify prediction uncertainties, and determine forecast confidence. The performance of the DBEM is evaluated and compared to that of ensemble WSA-ENLIL+Cone model (ENLIL) using the same sample of events. It is found that the mean error is ME = ‑9.7 hr, mean absolute error MAE = 14.3 hr, and root mean square error RMSE = 16.7 hr, which is somewhat higher than, but comparable to ENLIL errors (ME = ‑6.1 hr, MAE = 12.8 hr and RMSE = 14.4 hr). Overall, DBEM and ENLIL show a similar performance. Furthermore, we find that in both models fast CMEs are predicted to arrive earlier than observed, most likely owing to the physical limitations of models, but possibly also related to an overestimation of the CME initial speed for fast CMEs.
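The ensemble idea described above can be sketched as follows. The abstract does not state the equation of motion; the code assumes the commonly quoted drag-based form, acceleration a = -γ (v - w)|v - w| with constant solar wind speed w and drag parameter γ, and uses its closed-form solution. The Gaussian input distributions, their means and spreads, and the 20 solar radii starting distance are hypothetical choices for illustration, not values from the paper.

```python
import math
import random

AU_KM = 1.496e8    # 1 au in km
RSUN_KM = 6.957e5  # solar radius in km

def dbm_distance(t, r0, v0, w, gamma):
    """Heliocentric distance (km) after t seconds under quadratic drag."""
    d = v0 - w
    if d == 0.0:
        return r0 + w * t
    sign = 1.0 if d > 0 else -1.0
    return r0 + w * t + sign / gamma * math.log1p(gamma * abs(d) * t)

def dbm_arrival(r0, v0, w, gamma, target=AU_KM):
    """Return (transit time in s, arrival speed in km/s); needs v0, w, gamma > 0."""
    # The speed always stays between v0 and w, so this upper bound brackets the root.
    lo, hi = 0.0, (target - r0) / min(v0, w)
    for _ in range(80):  # bisection on the monotonic distance curve
        mid = 0.5 * (lo + hi)
        if dbm_distance(mid, r0, v0, w, gamma) < target:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    speed = w + (v0 - w) / (1.0 + gamma * abs(v0 - w) * t)
    return t, speed

rng = random.Random(0)
r0 = 20.0 * RSUN_KM
times_hr = []
for _ in range(1000):
    # Hypothetical input uncertainties (km/s, km/s, 1/km), clipped to stay positive.
    v0 = max(rng.gauss(1000.0, 100.0), 1.0)
    w = max(rng.gauss(400.0, 50.0), 1.0)
    gamma = max(rng.gauss(0.1e-7, 0.02e-7), 1e-10)
    t, speed = dbm_arrival(r0, v0, w, gamma)
    times_hr.append(t / 3600.0)

times_hr.sort()
median_hr = times_hr[len(times_hr) // 2]
spread_hr = times_hr[int(0.975 * len(times_hr))] - times_hr[int(0.025 * len(times_hr))]
print(f"median transit time {median_hr:.1f} hr, 95% spread {spread_hr:.1f} hr")
```

The sorted ensemble of transit times gives a most likely arrival time (here the median) and an uncertainty range, which is the kind of output the abstract describes.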
Mutual information estimation for irregularly sampled time series
NASA Astrophysics Data System (ADS)
Rehfeld, K.; Marwan, N.; Heitzig, J.; Kurths, J.
2012-04-01
For the automated, objective and joint analysis of time series, similarity measures are crucial. Used in the analysis of climate records, they allow for a complementary, unbiased view onto sparse datasets. The irregular sampling of many of these time series, however, makes it necessary to either perform signal reconstruction (e.g. interpolation) or to develop and use adapted measures. Standard linear interpolation comes with an inevitable loss of information and bias effects. We have recently developed a Gaussian kernel-based correlation algorithm with which the interpolation error can be substantially lowered, but this would not work should the functional relationship in a bivariate setting be non-linear. We therefore propose an algorithm to estimate lagged auto and cross mutual information from irregularly sampled time series. We have extended the standard and adaptive binning histogram estimators and use Gaussian distributed weights in the estimation of the (joint) probabilities. To test our method we have simulated linear and nonlinear auto-regressive processes with Gamma-distributed inter-sampling intervals. We have then performed a sensitivity analysis for the estimation of actual coupling length, the lag of coupling and the decorrelation time in the synthetic time series and contrast our results to the performance of a signal reconstruction scheme. Finally we applied our estimator to speleothem records. We compare the estimated memory (or decorrelation time) to that from a least-squares estimator based on fitting an auto-regressive process of order 1. The calculated (cross) mutual information results are compared for the different estimators (standard or adaptive binning) and contrasted with results from signal reconstruction. We find that the kernel-based estimator has a significantly lower root mean square error and less systematic sampling bias than the interpolation-based method.
It is possible that these encouraging results could be further improved by using non-histogram mutual information estimators, like k-Nearest Neighbor or Kernel-Density estimators, but for short (<1000 points) and irregularly sampled datasets the proposed algorithm is already a great improvement.
Improving Bandwidth Utilization in a 1 Tbps Airborne MIMO Communications Downlink
2013-03-21
number of transmitters). $C = \log_2 \left| I_{N_r} + \frac{E_s}{N_t N_0} H H^H \right|$ (2.32) In the signal to noise ratio, $E_s$ represents the total energy from all transmitters...channel matrix pseudo-inverse is computed by (2.36) [6, p. 970]: $H^{+} = (H^H H)^{-1} H^H$ (2.36). 2.6.5 Minimum Mean-Squared Error Detection. Minimum Mean Squared...$H^{\dagger} = \left( H^H H + \frac{N_t}{\mathrm{SNR}} I \right)^{-1} H^H$ (3.14). Equation (3.14) was defined in [2] as an implementation of a MMSE equalizer, and was applied to the received
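The pseudo-inverse of (2.36) and the MMSE equalizer of (3.14) quoted in this snippet differ only by a regularization term. The sketch below is an illustration using NumPy, not code from the thesis; the function names are hypothetical, H is assumed to be an Nr x Nt complex channel matrix with full column rank, and `snr` is assumed to be a linear (not dB) ratio.

```python
import numpy as np

def zero_forcing_equalizer(H):
    """Pseudo-inverse (H^H H)^{-1} H^H, as in equation (2.36)."""
    Hh = H.conj().T
    return np.linalg.inv(Hh @ H) @ Hh

def mmse_equalizer(H, snr):
    """Regularized inverse (H^H H + (Nt/SNR) I)^{-1} H^H, as in equation (3.14)."""
    nt = H.shape[1]                 # number of transmitters (columns of H)
    Hh = H.conj().T
    return np.linalg.inv(Hh @ H + (nt / snr) * np.eye(nt)) @ Hh
```

As the SNR grows, the term Nt/SNR vanishes and the MMSE equalizer approaches the pseudo-inverse; at low SNR the added diagonal term limits noise amplification.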
Sample allocation balancing overall representativeness and stratum precision.
Diaz-Quijano, Fredi Alexander
2018-05-07
In large-scale surveys, it is often necessary to distribute a preset sample size among a number of strata. Researchers must make a decision between prioritizing overall representativeness or precision of stratum estimates. Hence, I evaluated different sample allocation strategies based on stratum size. The strategies evaluated herein included allocation proportional to stratum population; equal sample for all strata; and proportional to the natural logarithm, cubic root, and square root of the stratum population. This study considered the fact that, from a preset sample size, the dispersion index of stratum sampling fractions is correlated with the population estimator error and the dispersion index of stratum-specific sampling errors would measure the inequality in precision distribution. Identification of a balanced and efficient strategy was based on comparing both dispersion indices. Balance and efficiency of the strategies changed depending on overall sample size. As the sample to be distributed increased, the most efficient allocation strategies were equal sample for each stratum; proportional to the logarithm, to the cubic root, to square root; and that proportional to the stratum population, respectively. Depending on sample size, each of the strategies evaluated could be considered in optimizing the sample to keep both overall representativeness and stratum-specific precision. Copyright © 2018 Elsevier Inc. All rights reserved.
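The five allocation strategies named in this abstract can all be written as "weight each stratum by a function of its population, then split the sample in proportion to the weights". The sketch below assumes that reading; the function name, the strategy labels, and the choice to return unrounded allocations are illustrative assumptions, not the author's implementation.

```python
import math

def allocate_sample(stratum_sizes, total_sample, strategy="proportional"):
    """Split total_sample across strata according to a size-based weight.

    Returns fractional allocations; rounding rules are left to the caller.
    The "log" strategy assumes every stratum population is greater than 1.
    """
    transforms = {
        "proportional": lambda n: float(n),
        "equal": lambda n: 1.0,
        "log": math.log,                       # natural logarithm
        "cubic_root": lambda n: n ** (1.0 / 3.0),
        "square_root": math.sqrt,
    }
    weights = [transforms[strategy](n) for n in stratum_sizes]
    total_weight = sum(weights)
    return [total_sample * w / total_weight for w in weights]
```

For strata of 100, 400 and 2500 units and a sample of 600, the square-root weights are 10, 20 and 50, so the small stratum receives a larger share than under proportional allocation, which is the trade-off between stratum precision and overall representativeness that the abstract discusses.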
Maximum Likelihood Time-of-Arrival Estimation of Optical Pulses via Photon-Counting Photodetectors
NASA Technical Reports Server (NTRS)
Erkmen, Baris I.; Moision, Bruce E.
2010-01-01
Many optical imaging, ranging, and communications systems rely on the estimation of the arrival time of an optical pulse. Recently, such systems have been increasingly employing photon-counting photodetector technology, which changes the statistics of the observed photocurrent. This requires time-of-arrival estimators to be developed and their performances characterized. The statistics of the output of an ideal photodetector, which are well modeled as a Poisson point process, were considered. An analytical model was developed for the mean-square error of the maximum likelihood (ML) estimator, demonstrating two phenomena that cause deviations from the minimum achievable error at low signal power. An approximation was derived to the threshold at which the ML estimator essentially fails to provide better than a random guess of the pulse arrival time. Comparing the analytic model performance predictions to those obtained via simulations, it was verified that the model accurately predicts the ML performance over all regimes considered. There is little prior art that attempts to understand the fundamental limitations to time-of-arrival estimation from Poisson statistics. This work establishes both a simple mathematical description of the error behavior, and the associated physical processes that yield this behavior. Previous work on mean-square error characterization for ML estimators has predominantly focused on additive Gaussian noise. This work demonstrates that the discrete nature of the Poisson noise process leads to a distinctly different error behavior.
Optical pattern recognition architecture implementing the mean-square error correlation algorithm
Molley, Perry A.
1991-01-01
An optical architecture implementing the mean-square error correlation algorithm, $\mathrm{MSE} = \sum [I - R]^2$, for discriminating the presence of a reference image R in an input image scene I by computing the mean-square error between a time-varying reference image signal $s_1(t)$ and a time-varying input image signal $s_2(t)$ includes a laser diode light source which is temporally modulated by a double-sideband suppressed-carrier source modulation signal $I_1(t)$ having the form $I_1(t) = A_1 [1 + \sqrt{2}\, m_1 s_1(t) \cos(2\pi f_o t)]$; the modulated light output from the laser diode source is diffracted by an acousto-optic deflector. The resultant intensity of the +1 diffracted order from the acousto-optic device is given by $I_2(t) = A_2 [1 + 2 m_2^2 s_2^2(t) - 2\sqrt{2}\, m_2 s_2(t) \cos(2\pi f_o t)]$. The time integration of the two signals $I_1(t)$ and $I_2(t)$ on the CCD deflector plane produces the result $R(\tau)$ of the mean-square error having the form $R(\tau) = A_1 A_2 \{ [T] + [2 m_2^2 \int s_2^2(t-\tau)\,dt] - [2 m_1 m_2 \cos(2\pi f_o \tau) \int s_1(t) s_2(t-\tau)\,dt] \}$, where $s_1(t)$ is the signal input to the diode modulation source; $s_2(t)$ is the signal input to the AOD modulation source; $A_1$ is the light intensity; $A_2$ is the diffraction efficiency; $m_1$ and $m_2$ are constants that determine the signal-to-bias ratio; $f_o$ is the frequency offset between the oscillator at $f_c$ and the modulation at $f_c + f_o$; and $a_o$ and $a_1$ are constants chosen to bias the diode source and the acousto-optic deflector into their respective linear operating regions so that the diode source exhibits a linear intensity characteristic and the AOD exhibits a linear amplitude characteristic.
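The optical system described above evaluates the MSE expression for every shift of the reference against the input. A discrete, purely digital analogue of that idea is sketched below for one-dimensional signals; it is an illustration of the sum-of-squared-differences criterion only, not a model of the optical hardware, and the function name is hypothetical.

```python
def mse_correlation(signal, reference):
    """Sum of squared differences between reference and each window of signal.

    Returns one score per shift; the smallest score marks the best match.
    """
    m = len(reference)
    scores = []
    for shift in range(len(signal) - m + 1):
        window = signal[shift:shift + m]
        scores.append(sum((x - r) ** 2 for x, r in zip(window, reference)))
    return scores
```

Expanding the square gives the sum of the window energy, the reference energy, and minus twice the cross-correlation, which mirrors the structure of the integrated result in the abstract: an energy term in $s_2$ and a cross term in $s_1 s_2$.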
da Silva, Neirivaldo Cavalcante; Honorato, Ricardo Saldanha; Pimentel, Maria Fernanda; Garrigues, Salvador; Cervera, Maria Luisa; de la Guardia, Miguel
2015-09-01
There is an increasing demand for herbal medicines in weight loss treatment. Some synthetic chemicals, such as sibutramine (SB), have been detected as adulterants in herbal formulations. In this study, two strategies using near infrared (NIR) spectroscopy have been developed to evaluate potential adulteration of herbal medicines with SB: a qualitative screening approach and a quantitative methodology based on multivariate calibration. Samples comprised products commercialized as herbal medicines as well as laboratory-adulterated samples. Spectra were obtained in the range of 14,000-4000 cm⁻¹. Using PLS-DA, a correct classification of 100% was achieved for the external validation set. In the quantitative approach, the root mean square error of prediction (RMSEP), for both PLS and MLR models, was 0.2% w/w. The results prove the potential of NIR spectroscopy and multivariate calibration in quantifying sibutramine in adulterated herbal medicine samples. © 2015 American Academy of Forensic Sciences.
[Determination of acidity and vitamin C in apples using portable NIR analyzer].
Yang, Fan; Li, Ya-Ting; Gu, Xuan; Ma, Jiang; Fan, Xing; Wang, Xiao-Xuan; Zhang, Zhuo-Yong
2011-09-01
Near infrared (NIR) spectroscopy technology based on a portable NIR analyzer, combined with kernel Isomap algorithm and generalized regression neural network (GRNN), was applied to establish quantitative models for prediction of acidity and vitamin C in six kinds of apple samples. The obtained results demonstrated that the fitting and the predictive accuracy of the models with kernel Isomap algorithm were satisfactory. The correlation between actual and predicted values of calibration samples (R(c)) obtained by the acidity model was 0.9994, and for prediction samples (R(p)) was 0.9799. The root mean square error of prediction set (RMSEP) was 0.0558. For the vitamin C model, R(c) was 0.9891, R(p) was 0.9272, and RMSEP was 4.0431. Results proved that the portable NIR analyzer can be a feasible tool for the determination of acidity and vitamin C in apples.
A Matlab Program for Textural Classification Using Neural Networks
NASA Astrophysics Data System (ADS)
Leite, E. P.; de Souza, C.
2008-12-01
A new MATLAB code that provides tools to perform classification of textural images for applications in the Geosciences is presented. The program, here coined TEXTNN, comprises the computation of variogram maps in the frequency domain for specific lag distances in the neighborhood of a pixel. The result is then converted back to spatial domain, where directional or omnidirectional semivariograms are extracted. Feature vectors are built with textural information composed of the semivariance values at these lag distances and, moreover, with histogram measures of mean, standard deviation and weighted fill-ratio. This procedure is applied to a selected group of pixels or to all pixels in an image using a moving window. A feed-forward back-propagation Neural Network can then be designed and trained on feature vectors of predefined classes (training set). The training phase minimizes the mean-squared error on the training set. Additionally, at each iteration, the mean-squared error for every validation is assessed and a test set is evaluated. The program also calculates contingency matrices, global accuracy and kappa coefficient for the three data sets, allowing a quantitative appraisal of the predictive power of the Neural Network models. The interpreter is able to select the best model obtained from a k-fold cross-validation or to use a unique split-sample data set for classification of all pixels in a given textural image. The code is open to the geoscientific community and is very flexible, allowing the experienced user to modify it as necessary. The performance of the algorithms and the end-user program were tested using synthetic images, orbital SAR (RADARSAT) imagery for oil seepage detection, and airborne, multi-polarimetric SAR imagery for geologic mapping. The overall results proved very promising.
Computation of misalignment and primary mirror astigmatism figure error of two-mirror telescopes
NASA Astrophysics Data System (ADS)
Gu, Zhiyuan; Wang, Yang; Ju, Guohao; Yan, Changxiang
2018-01-01
At present, active optics usually relies on computation models based on numerical methods to correct misalignments and figure errors. These methods can hardly lead to any insight into the aberration field dependencies that arise in the presence of the misalignments. An analytical alignment model based on third-order nodal aberration theory is presented for this problem, which can be utilized to compute the primary mirror astigmatic figure error and misalignments for two-mirror telescopes. Alignment simulations are conducted for an R-C telescope based on this analytical alignment model. It is shown that in the absence of wavefront measurement errors, wavefront measurements at only two field points are enough, and the correction process can be completed with only one alignment action. In the presence of wavefront measurement errors, increasing the number of field points for wavefront measurements can enhance the robustness of the alignment model. Monte Carlo simulation shows that, when -2 mm ≤ linear misalignment ≤ 2 mm, -0.1 deg ≤ angular misalignment ≤ 0.1 deg, and -0.2 λ ≤ astigmatism figure error (expressed as fringe Zernike coefficients C5 / C6, λ = 632.8 nm) ≤ 0.2 λ, the misaligned systems can be corrected to be close to nominal state without wavefront testing error. In addition, the root mean square deviation of RMS wavefront error of all the misaligned samples after being corrected is linearly related to wavefront testing error.
Choosing the Number of Clusters in K-Means Clustering
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.
2011-01-01
Steinley (2007) provided a lower bound for the sum-of-squares error criterion function used in K-means clustering. In this article, on the basis of the lower bound, the authors propose a method to distinguish between 1 cluster (i.e., a single distribution) versus more than 1 cluster. Additionally, conditional on indicating there are multiple…
Optimal Sampling to Provide User-Specific Climate Information.
NASA Astrophysics Data System (ADS)
Panturat, Suwanna
The types of weather-related world problems which are of socio-economic importance selected in this study as representative of three different levels of user groups include: (i) a regional problem concerned with air pollution plumes which lead to acid rain in the north eastern United States, (ii) a state-level problem in the form of winter wheat production in Oklahoma, and (iii) an individual-level problem involving reservoir management given errors in rainfall estimation at Lake Ellsworth, upstream from Lawton, Oklahoma. The study is aimed at designing optimal sampling networks which are based on customer value systems and also abstracting from data sets that information which is most cost-effective in reducing the climate-sensitive aspects of a given user problem. The three process models used in this study to interpret climate variability in terms of the variables of importance to the user comprise: (i) the HEFFTER-SAMSON diffusion model as the climate transfer function for acid rain, (ii) the CERES-MAIZE plant process model for winter wheat production and (iii) the AGEHYD streamflow model selected as "a black box" for reservoir management. A state-of-the-art Non Linear Program (NLP) algorithm for minimizing an objective function is employed to determine the optimal number and location of various sensors. Statistical quantities considered in determining sensor locations include Bayes Risk, the chi-squared value, the probability of the Type I error (alpha), the probability of the Type II error (beta) and the noncentrality parameter delta^2.
Moreover, the number of years required to detect a climate change resulting in a given bushel per acre change in mean wheat production is determined; the number of seasons of observations required to reduce the standard deviation of the error variance of the ambient sulfur dioxide to less than a certain percent of the mean is found; and finally the policy of maintaining pre-storm flood pools at selected levels is examined given information from the optimal sampling network as defined by the study.
Artificial Intelligence Techniques for Predicting and Mapping Daily Pan Evaporation
NASA Astrophysics Data System (ADS)
Arunkumar, R.; Jothiprakash, V.; Sharma, Kirty
2017-09-01
In this study, Artificial Intelligence techniques such as Artificial Neural Network (ANN), Model Tree (MT) and Genetic Programming (GP) are used to develop daily pan evaporation time-series (TS) prediction and cause-effect (CE) mapping models. Ten years of observed daily meteorological data such as maximum temperature, minimum temperature, relative humidity, sunshine hours, dew point temperature and pan evaporation are used for developing the models. For each technique, several models are developed by changing the number of inputs and other model parameters. The performance of each model is evaluated using standard statistical measures such as Mean Square Error, Mean Absolute Error, Normalized Mean Square Error and correlation coefficient (R). The results showed that the daily TS-GP (4) model predicted better than the other TS models, with a correlation coefficient of 0.959. Among the various CE models, CE-ANN (6-10-1) performed better than the MT and GP models, with a correlation coefficient of 0.881. Because of the complex non-linear inter-relationship among various meteorological variables, CE mapping models could not achieve the performance of TS models. From this study, it was found that GP performs better for recognizing a single pattern (time series modelling), whereas ANN is better for modelling multiple patterns (cause-effect modelling) in the data.
Curran, Christopher A.; Eng, Ken; Konrad, Christopher P.
2012-01-01
Regional low-flow regression models for estimating Q7,10 at ungaged stream sites are developed from the records of daily discharge at 65 continuous gaging stations (including 22 discontinued gaging stations) for the purpose of evaluating explanatory variables. By incorporating the base-flow recession time constant τ as an explanatory variable in the regression model, the root-mean square error for estimating Q7,10 at ungaged sites can be lowered to 72 percent (for known values of τ), which is 42 percent less than if only basin area and mean annual precipitation are used as explanatory variables. If partial-record sites are included in the regression data set, τ must be estimated from pairs of discharge measurements made during continuous periods of declining low flows. Eight measurement pairs are optimal for estimating τ at partial-record sites, and result in a lowering of the root-mean square error by 25 percent. A low-flow survey strategy that includes paired measurements at partial-record sites requires additional effort and planning beyond a standard strategy, but could be used to enhance regional estimates of τ and potentially reduce the error of regional regression models for estimating low-flow characteristics at ungaged sites.
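The abstract does not state how τ is computed from a pair of discharge measurements. A common assumption, used here purely for illustration, is exponential base-flow recession, $Q(t) = Q_0 e^{-t/\tau}$, which gives $\tau = \Delta t / \ln(Q_1 / Q_2)$ for two measurements taken $\Delta t$ apart during declining flow. The function names and the use of a simple mean across pairs are hypothetical choices, not the authors' procedure.

```python
import math

def recession_time_constant(q_early, q_late, days_between):
    """Estimate tau (days) from one measurement pair, assuming Q(t) = Q0 * exp(-t / tau).

    Requires q_early > q_late > 0, i.e. flow is declining between measurements.
    """
    if not (q_early > q_late > 0):
        raise ValueError("flow must be positive and declining between measurements")
    return days_between / math.log(q_early / q_late)

def mean_recession_time_constant(pairs):
    """Average tau over several (q_early, q_late, days_between) measurement pairs."""
    estimates = [recession_time_constant(*pair) for pair in pairs]
    return sum(estimates) / len(estimates)
```

Averaging over several pairs reduces the influence of measurement noise in any single pair, which is consistent with the abstract's finding that multiple measurement pairs are needed at partial-record sites.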
Santos-Martins, Diogo; Fernandes, Pedro Alexandrino; Ramos, Maria João
2016-11-01
In the context of SAMPL5, we submitted blind predictions of the cyclohexane/water distribution coefficient (D) for a series of 53 drug-like molecules. Our method is purely empirical and based on the additive contribution of each solute atom to the free energy of solvation in water and in cyclohexane. The contribution of each atom depends on the atom type and on the exposed surface area. Compared with similar methods in the literature, we used a very small set of atomic parameters: only 10 for solvation in water and 1 for solvation in cyclohexane. As a result, the method is protected from overfitting and the error in the blind predictions could be reasonably estimated. Moreover, this approach is fast: it takes only 0.5 s to predict the distribution coefficient for all 53 SAMPL5 compounds, allowing its application in virtual screening campaigns. The performance of our approach (submission 49) is modest but satisfactory in view of its efficiency: the root mean square error (RMSE) was 3.3 log D units for the 53 compounds, while the RMSE of the best performing method (using COSMO-RS) was 2.1 (submission 16). Our method is implemented as a Python script available at https://github.com/diogomart/SAMPL5-DC-surface-empirical .
Lipiäinen, Tiina; Fraser-Miller, Sara J; Gordon, Keith C; Strachan, Clare J
2018-02-05
This study considers the potential of low-frequency (terahertz) Raman spectroscopy in the quantitative analysis of ternary mixtures of solid-state forms. Direct comparison between low-frequency and mid-frequency spectral regions for quantitative analysis of crystal form mixtures, without confounding sampling and instrumental variations, is reported for the first time. Piroxicam was used as a model drug, and the low-frequency spectra of piroxicam forms β, α2 and monohydrate are presented for the first time. These forms show clear spectral differences in both the low- and mid-frequency regions. Both spectral regions provided quantitative models suitable for predicting the mixture compositions using partial least squares regression (PLSR), but the low-frequency data gave better models, based on lower errors of prediction (2.7, 3.1 and 3.2% root-mean-square errors of prediction [RMSEP] values for the β, α2 and monohydrate forms, respectively) than the mid-frequency data (6.3, 5.4 and 4.8%, for the β, α2 and monohydrate forms, respectively). The better performance of low-frequency Raman analysis was attributed to larger spectral differences between the solid-state forms, combined with a higher signal-to-noise ratio. Copyright © 2017 Elsevier B.V. All rights reserved.
Eliciting Naturalistic Cortical Responses with a Sensory Prosthesis via Optimized Microstimulation
2016-08-12
error and correlation as metrics amenable to highly efficient convex optimization. This study concentrates on characterizing the neural responses to both...spiking signal. For LFP, distance measures such as the traditional mean-squared error and cross-correlation can be used, whereas distances between spike...with parameters that describe their associated temporal dynamics and relations to the observed output. A description of the model follows, but we
Two-body potential model based on cosine series expansion for ionic materials
Oda, Takuji; Weber, William J.; Tanigawa, Hisashi
2015-09-23
We examine a method to construct a two-body potential model for ionic materials with a Fourier series basis. In this method, the coefficients of cosine basis functions are uniquely determined by solving simultaneous linear equations to minimize the sum of weighted mean square errors in energy, force and stress, where first-principles calculation results are used as the reference data. As a validation test of the method, potential models for magnesium oxide are constructed. The mean square errors appropriately converge with respect to the truncation of the cosine series. This result mathematically indicates that the constructed potential model is sufficiently close to the one that is achieved with the non-truncated Fourier series and demonstrates that this potential virtually provides minimum error from the reference data within the two-body representation. The constructed potential models work appropriately in both molecular statics and dynamics simulations, especially if a two-step correction to revise errors expected in the reference data is performed, and the models clearly outperform two existing Buckingham potential models that were tested. Moreover, the good agreement over a broad range of energies and forces with first-principles calculations should enable the prediction of materials behavior away from equilibrium conditions, such as a system under irradiation.
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-25
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and the plain partial least squares regression. Both R² and root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV calculated using the models constructed by other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS model constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.
Limited sampling strategy models for estimating the AUC of gliclazide in Chinese healthy volunteers.
Huang, Ji-Han; Wang, Kun; Huang, Xiao-Hui; He, Ying-Chun; Li, Lu-Jin; Sheng, Yu-Cheng; Yang, Juan; Zheng, Qing-Shan
2013-06-01
The aim of this work is to reduce the cost of required sampling for the estimation of the area under the gliclazide plasma concentration versus time curve within 60 h (AUC0-60t). The limited sampling strategy (LSS) models were established and validated by the multiple regression model within 4 or fewer gliclazide concentration values. Absolute prediction error (APE), root mean square error (RMSE) and visual prediction check were used as criteria. The results of Jack-Knife validation showed that 10 (25.0 %) of the 40 LSS based on the regression analysis were not within an APE of 15 % using one concentration-time point. 90.2, 91.5 and 92.4 % of the 40 LSS models were capable of prediction using 2, 3 and 4 points, respectively. Limited sampling strategies were developed and validated for estimating AUC0-60t of gliclazide. This study indicates that the implementation of an 80 mg dosage regimen enabled accurate predictions of AUC0-60t by the LSS model. This study shows that 12, 6, 4, 2 h after administration are the key sampling times. The combination of (12, 2 h), (12, 8, 2 h) or (12, 8, 4, 2 h) can be chosen as sampling hours for predicting AUC0-60t in practical application according to requirement.
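A limited sampling model is judged against a reference AUC computed from the full concentration-time profile. The sketch below shows one conventional way to obtain such a reference (the linear trapezoidal rule) and a percentage form of the absolute prediction error. Both choices are assumptions for illustration: the abstract does not specify how the reference AUC was computed or the exact APE formula, and the function names are hypothetical.

```python
def auc_trapezoid(times, concentrations):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    points = list(zip(times, concentrations))
    return sum(
        (t1 - t0) * (c0 + c1) / 2.0
        for (t0, c0), (t1, c1) in zip(points, points[1:])
    )

def absolute_prediction_error(predicted_auc, reference_auc):
    """Absolute prediction error as a percentage of the reference AUC."""
    return abs(predicted_auc - reference_auc) / reference_auc * 100.0
```

Under this reading, an LSS prediction would meet the 15 % criterion mentioned above when `absolute_prediction_error` for that subject does not exceed 15.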
NIR spectroscopic measurement of moisture content in Scots pine seeds.
Lestander, Torbjörn A; Geladi, Paul
2003-04-01
When tree seeds are used for seedling production it is important that they are of high quality in order to be viable. One of the factors influencing viability is moisture content and an ideal quality control system should be able to measure this factor quickly for each seed. Seed moisture content within the range 3-34% was determined by near-infrared (NIR) spectroscopy on Scots pine (Pinus sylvestris L.) single seeds and on bulk seed samples consisting of 40-50 seeds. The models for predicting water content from the spectra were made by partial least squares (PLS) and ordinary least squares (OLS) regression. Different conditions were simulated involving both using fewer wavelengths and going from samples to single seeds. Reflectance and transmission measurements were used. Different spectral pretreatment methods were tested on the spectra. Including bias, the lowest prediction errors for PLS models based on reflectance within 780-2280 nm from bulk samples and single seeds were 0.8% and 1.9%, respectively. Reduction of the single seed reflectance spectrum to 850-1048 nm gave higher biases and prediction errors in the test set. In transmission (850-1048 nm) the prediction error was 2.7% for single seeds. OLS models based on a simulated 4-sensor single seed system consisting of optical filters with Gaussian transmission indicated more than 3.4% error in prediction. A practical F-test based on test sets to differentiate models is introduced.
Empirical State Error Covariance Matrix for Batch Estimation
NASA Technical Reports Server (NTRS)
Frisbee, Joe
2015-01-01
State estimation techniques effectively provide mean state estimates. However, the theoretical state error covariance matrices provided as part of these techniques often suffer from a lack of confidence in their ability to describe the uncertainty in the estimated states. By a reinterpretation of the equations involved in the weighted batch least squares algorithm, it is possible to directly arrive at an empirical state error covariance matrix. The proposed empirical state error covariance matrix will contain the effect of all error sources, known or not. This empirical error covariance matrix may be calculated as a side computation for each unique batch solution. Results based on the proposed technique will be presented for a simple, two observer and measurement error only problem.
Optimal estimation of large structure model errors. [in Space Shuttle controller design
NASA Technical Reports Server (NTRS)
Rodriguez, G.
1979-01-01
In-flight estimation of large structure model errors is usually required as a means of detecting inevitable deficiencies in large structure controller/estimator models. The present paper deals with a least-squares formulation which seeks to minimize a quadratic functional of the model errors. The properties of these error estimates are analyzed. It is shown that an arbitrary model error can be decomposed as the sum of two components that are orthogonal in a suitably defined function space. Relations between true and estimated errors are defined. The estimates are found to be approximations that retain many of the significant dynamics of the true model errors. Current efforts are directed toward application of the analytical results to a reference large structure model.
Baltieri, Danilo Antonio; Luísa de Souza Gatti, Ana; Henrique de Oliveira, Vitor; Junqueira Aguiar, Ana Saito; Almeida de Souza Aranha e Silva, Renata
2016-02-01
Although men constitute the widest consumer group of pornography, the Internet has facilitated both the production of and access to pornographic material by women as well. However, few measures are available to examine pornography-use constructs, which can compromise the reliability of statements regarding the harmful use of pornography. Our study aimed to confirm the factorial validity and internal consistency of the Pornography Consumption Inventory (PCI) in a sample of female university students in Brazil. The PCI is a four-factor, 15-item, five-point Likert-type scale. After translation and back-translation of the PCI, it was administered to 105 female medical students. Exploratory and confirmatory factor analyses were conducted to examine the construct validity. The results supported the four-factor model of the PCI. The model showed adequate internal reliability and good fit indices (comparative fit index (CFI) = 0.95, Tucker-Lewis index (TLI) = 0.94, root mean square error of approximation (RMSEA) = 0.07 (95% confidence interval (CI) = 0.04-0.09), and standardized root mean square residual (SRMR) = 0.08). Overall, the findings from this study support the use of the PCI in Portuguese-speaking women. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
NASA Astrophysics Data System (ADS)
Ali, Mumtaz; Deo, Ravinesh C.; Downs, Nathan J.; Maraseni, Tek
2018-07-01
Forecasting drought by means of the World Meteorological Organization-approved Standardized Precipitation Index (SPI) is considered to be a fundamental task to support socio-economic initiatives and effectively mitigate climate risk. This study aims to develop a robust drought modelling strategy to forecast multi-scalar SPI in drought-rich regions of Pakistan where statistically significant lagged combinations of antecedent SPI are used to forecast future SPI. With ensemble-Adaptive Neuro Fuzzy Inference System ('ensemble-ANFIS') executed via a 10-fold cross-validation procedure, a model is constructed from randomly partitioned input-target data. This yields 10-member ensemble-ANFIS outputs, judged by mean square error and correlation coefficient in the training period; the optimal forecasts are attained by averaging the simulations, and the model is benchmarked against M5 Model Tree and Minimax Probability Machine Regression (MPMR). The results show that the proposed ensemble-ANFIS model's precision was notably better (in terms of the root mean square error and mean absolute error, as well as Willmott's, Nash-Sutcliffe and Legates-McCabe's indices) for the 6- and 12-month forecasts than for the 3-month forecasts, as verified by the largest error proportions registering in the smallest error band. Applying 10-member simulations, the ensemble-ANFIS model was validated for its ability to forecast severity (S), duration (D) and intensity (I) of drought (including the error bound). This enabled uncertainty between multi-models to be rationalized more efficiently, leading to a reduction in forecast error caused by stochasticity in drought behaviours. Through cross-validations at diverse sites, a geographic signature in modelled uncertainties was also calculated.
Considering the superiority of ensemble-ANFIS approach and its ability to generate uncertainty-based information, the study advocates the versatility of a multi-model approach for drought-risk forecasting and its prime importance for estimating drought properties over confidence intervals to generate better information for strategic decision-making.
Automated body weight prediction of dairy cows using 3-dimensional vision.
Song, X; Bokkers, E A M; van der Tol, P P J; Groot Koerkamp, P W G; van Mourik, S
2018-05-01
The objectives of this study were to quantify the error of body weight prediction using automatically measured morphological traits in a 3-dimensional (3-D) vision system and to assess the influence of various sources of uncertainty on body weight prediction. In this case study, an image acquisition setup was created in a cow selection box equipped with a top-view 3-D camera. Morphological traits of hip height, hip width, and rump length were automatically extracted from the raw 3-D images taken of the rump area of dairy cows (n = 30). These traits combined with days in milk, age, and parity were used in multiple linear regression models to predict body weight. To find the best prediction model, an exhaustive feature selection algorithm was used to build intermediate models (n = 63). Each model was validated by leave-one-out cross-validation, giving the root mean square error and mean absolute percentage error. The model consisting of hip width (measurement variability of 0.006 m), days in milk, and parity was the best model, with the lowest errors of 41.2 kg of root mean square error and 5.2% mean absolute percentage error. Our integrated system, including the image acquisition setup, image analysis, and the best prediction model, predicted the body weights with a performance similar to that achieved using semi-automated or manual methods. Moreover, the variability of our simplified morphological trait measurement showed a negligible contribution to the uncertainty of body weight prediction. We suggest that dairy cow body weight prediction can be improved by incorporating more predictive morphological traits and by improving the prediction model structure. The Authors. Published by FASS Inc. and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
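The leave-one-out validation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a single predictor (hip width) instead of the multiple regression used in the study, and the function names and data values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def loocv_errors(xs, ys):
    """Leave-one-out cross-validation; returns (RMSE, MAPE in percent)."""
    squared, relative = [], []
    for i in range(len(xs)):
        # Refit the model on every sample except the i-th one.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residual = ys[i] - (a + b * xs[i])
        squared.append(residual ** 2)
        relative.append(abs(residual / ys[i]))
    n = len(xs)
    return math.sqrt(sum(squared) / n), 100.0 * sum(relative) / n


# Hypothetical hip widths (m) and body weights (kg), for illustration only.
hip_width = [0.50, 0.52, 0.55, 0.57, 0.60, 0.62]
weight = [580.0, 610.0, 640.0, 655.0, 700.0, 715.0]
rmse, mape = loocv_errors(hip_width, weight)
print(f"LOOCV RMSE = {rmse:.1f} kg, MAPE = {mape:.1f}%")
```

Each sample is predicted by a model that never saw it, so both error measures describe out-of-sample performance, which matters when only a few dozen animals are available.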
Quantification of trace metals in infant formula premixes using laser-induced breakdown spectroscopy
NASA Astrophysics Data System (ADS)
Cama-Moncunill, Raquel; Casado-Gavalda, Maria P.; Cama-Moncunill, Xavier; Markiewicz-Keszycka, Maria; Dixit, Yash; Cullen, Patrick J.; Sullivan, Carl
2017-09-01
Infant formula is a human milk substitute generally based upon fortified cow milk components. In order to mimic the composition of breast milk, trace elements such as copper, iron and zinc are usually added in a single operation using a premix. The correct addition of premixes must be verified to ensure that the target levels in infant formulae are achieved. In this study, a laser-induced breakdown spectroscopy (LIBS) system was assessed as a fast validation tool for trace element premixes. LIBS is a promising emission spectroscopic technique for elemental analysis, which offers real-time analyses, little to no sample preparation and ease of use. LIBS was employed for copper and iron determinations of premix samples ranging approximately from 0 to 120 mg/kg Cu/1640 mg/kg Fe. LIBS spectra are affected by several parameters, hindering subsequent quantitative analyses. This work aimed at testing three matrix-matched calibration approaches (simple-linear regression, multi-linear regression and partial least squares regression (PLS)) as means for precision and accuracy enhancement of LIBS quantitative analysis. All calibration models were first developed using a training set and then validated with an independent test set. PLS yielded the best results. For instance, the PLS model for copper provided a coefficient of determination (R2) of 0.995 and a root mean square error of prediction (RMSEP) of 14 mg/kg. Furthermore, LIBS was employed to penetrate through the samples by repetitively measuring the same spot. Consequently, LIBS spectra can be obtained as a function of sample layers. This information was used to explore whether measuring deeper into the sample could reduce possible surface-contaminant effects and provide better quantifications.
Estimating Root Mean Square Errors in Remotely Sensed Soil Moisture over Continental Scale Domains
NASA Technical Reports Server (NTRS)
Draper, Clara S.; Reichle, Rolf; de Jeu, Richard; Naeimi, Vahid; Parinussa, Robert; Wagner, Wolfgang
2013-01-01
Root Mean Square Errors (RMSE) in the soil moisture anomaly time series obtained from the Advanced Scatterometer (ASCAT) and the Advanced Microwave Scanning Radiometer (AMSR-E; using the Land Parameter Retrieval Model) are estimated over a continental scale domain centered on North America, using two methods: triple colocation (RMSE_TC) and error propagation through the soil moisture retrieval models (RMSE_EP). In the absence of an established consensus for the climatology of soil moisture over large domains, presenting a RMSE in soil moisture units requires that it be specified relative to a selected reference data set. To avoid the complications that arise from the use of a reference, the RMSE is presented as a fraction of the time series standard deviation (fRMSE). For both sensors, the fRMSE_TC and fRMSE_EP show similar spatial patterns of relatively high/low errors, and the mean fRMSE for each land cover class is consistent with expectations. Triple colocation is also shown to be surprisingly robust to representativity differences between the soil moisture data sets used, and it is believed to accurately estimate the fRMSE in the remotely sensed soil moisture anomaly time series. Comparing the ASCAT and AMSR-E fRMSE_TC shows that both data sets have very similar accuracy across a range of land cover classes, although the AMSR-E accuracy is more directly related to vegetation cover. In general, both data sets have good skill up to moderate vegetation conditions.
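A fractional RMSE from triple colocation can be sketched with the commonly used covariance form of the method. This is an assumption made for illustration and not necessarily the authors' implementation: it presumes three data sets that are linearly related to the same truth, with errors that are mutually uncorrelated and uncorrelated with the truth. The function names and synthetic series are hypothetical.

```python
import math
import random


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


def tc_frmse(x, y, z):
    """fRMSE of x: triple-colocation error std divided by the std of x."""
    # Error variance of x from the three pairwise covariances.
    err_var = cov(x, x) - cov(x, y) * cov(x, z) / cov(y, z)
    err_var = max(err_var, 0.0)  # guard against small negative estimates
    return math.sqrt(err_var / cov(x, x))


# Synthetic anomaly series: a common truth plus independent noise (assumed values).
rng = random.Random(0)
truth = [rng.gauss(0.0, 1.0) for _ in range(20000)]
x = [t + rng.gauss(0.0, 0.5) for t in truth]
y = [t + rng.gauss(0.0, 0.3) for t in truth]
z = [t + rng.gauss(0.0, 0.4) for t in truth]

frmse_x = tc_frmse(x, y, z)
print(f"fRMSE of x = {frmse_x:.3f}")
```

Because the result is a ratio to the series' own standard deviation, it does not depend on the units or climatology of any reference data set, which is the motivation for fRMSE given in the abstract.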
Grelet, C; Bastin, C; Gelé, M; Davière, J-B; Johan, M; Werner, A; Reding, R; Fernandez Pierna, J A; Colinet, F G; Dardenne, P; Gengler, N; Soyeurt, H; Dehareng, F
2016-06-01
To manage negative energy balance and ketosis in dairy farms, rapid and cost-effective detection is needed. Among the milk biomarkers that could be useful for this purpose, acetone and β-hydroxybutyrate (BHB) have been proved as molecules of interest regarding ketosis and citrate was recently identified as an early indicator of negative energy balance. Because Fourier transform mid-infrared spectrometry can provide rapid and cost-effective predictions of milk composition, the objective of this study was to evaluate the ability of this technology to predict these biomarkers in milk. Milk samples were collected in commercial and experimental farms in Luxembourg, France, and Germany. Acetone, BHB, and citrate contents were determined by flow injection analysis. Milk mid-infrared spectra were recorded and standardized for all samples. After edits, a total of 548 samples were used in the calibration and validation data sets for acetone, 558 for BHB, and 506 for citrate. Acetone content ranged from 0.020 to 3.355 mmol/L with an average of 0.103 mmol/L; BHB content ranged from 0.045 to 1.596 mmol/L with an average of 0.215 mmol/L; and citrate content ranged from 3.88 to 16.12 mmol/L with an average of 9.04 mmol/L. Acetone and BHB contents were log-transformed and a part of the samples with low values was randomly excluded to approach a normal distribution. The 3 edited data sets were then randomly divided into a calibration data set (3/4 of the samples) and a validation data set (1/4 of the samples). Prediction equations were developed using partial least square regression. The coefficient of determination (R²) of cross-validation was 0.73 for acetone, 0.71 for BHB, and 0.90 for citrate with root mean square error of 0.248, 0.109, and 0.70 mmol/L, respectively. Finally, the external validation was performed and R² obtained were 0.67 for acetone, 0.63 for BHB, and 0.86 for citrate, with respective root mean square error of validation of 0.196, 0.083, and 0.76 mmol/L.
Although the practical usefulness of the equations developed should be further verified with other field data, results from this study demonstrated the potential of Fourier transform mid-infrared spectrometry to predict citrate content with good accuracy and to supply indicative contents of BHB and acetone in milk, thereby providing rapid and cost-effective tools to manage ketosis and negative energy balance in dairy farms. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Straub, D.E.
1998-01-01
The streamflow-gaging station network in Ohio was evaluated for its effectiveness in providing regional streamflow information. The analysis involved application of the principles of generalized least squares regression between streamflow and climatic and basin characteristics. Regression equations were developed for three flow characteristics: (1) the instantaneous peak flow with a 100-year recurrence interval (P100), (2) the mean annual flow (Qa), and (3) the 7-day, 10-year low flow (7Q10). All active and discontinued gaging stations with 5 or more years of unregulated-streamflow data with respect to each flow characteristic were used to develop the regression equations. The gaging-station network was evaluated for the current (1996) condition of the network and estimated conditions of various network strategies if an additional 5 and 20 years of streamflow data were collected. Any active or discontinued gaging station with (1) less than 5 years of unregulated-streamflow record, (2) previously defined basin and climatic characteristics, and (3) the potential for collection of more unregulated-streamflow record were included in the network strategies involving the additional 5 and 20 years of data. The network analysis involved use of the regression equations, in combination with location, period of record, and cost of operation, to determine the contribution of the data for each gaging station to regional streamflow information. The contribution of each gaging station was based on a cost-weighted reduction of the mean square error (average sampling-error variance) associated with each regional estimating equation. All gaging stations included in the network analysis were then ranked according to their contribution to the regional information for each flow characteristic. 
The predictive ability of the regression equations developed from the gaging station network could be improved for all three flow characteristics with the collection of additional streamflow data. The addition of new gaging stations to the network would result in an even greater improvement of the accuracy of the regional regression equations. Typically, continued data collection at stations with unregulated streamflow for all flow conditions that had less than 11 years of record with drainage areas smaller than 200 square miles contributed the largest cost-weighted reduction to the average sampling-error variance of the regional estimating equations. The results of the network analyses can be used to prioritize the continued operation of active gaging stations or the reactivation of discontinued gaging stations if the objective is to maximize the regional information content in the streamflow-gaging station network.
Sampling plantations to determine white-pine weevil injury
Robert L. Talerico; Robert W., Jr. Wilson
1973-01-01
Use of 1/10-acre square plots to obtain estimates of the proportion of never-weeviled trees necessary for evaluating and scheduling white-pine weevil control is described. The optimum number of trees to observe per plot is estimated from data obtained from sample plantations in the Northeast, and a table is given of sample size required to achieve a standard error of...
Tissue resistivity estimation in the presence of positional and geometrical uncertainties.
Baysal, U; Eyüboğlu, B M
2000-08-01
Geometrical uncertainties (organ boundary variation and electrode position uncertainties) are the biggest sources of error in estimating electrical resistivity of tissues from body surface measurements. In this study, in order to decrease estimation errors, the statistically constrained minimum mean squared error estimation algorithm (MiMSEE) is constrained with a priori knowledge of the geometrical uncertainties in addition to the constraints based on geometry, resistivity range, linearization and instrumentation errors. The MiMSEE calculates an optimum inverse matrix, which maps the surface measurements to the unknown resistivity distribution. The required data are obtained from four-electrode impedance measurements, similar to injected-current electrical impedance tomography (EIT). In this study, the surface measurements are simulated by using a numerical thorax model. The data are perturbed with additive instrumentation noise. Simulated surface measurements are then used to estimate the tissue resistivities by using the proposed algorithm. The results are compared with the results of conventional least squares error estimator (LSEE). Depending on the region, the MiMSEE yields an estimation error between 0.42% and 31.3% compared with 7.12% to 2010% for the LSEE. It is shown that the MiMSEE is quite robust even in the case of geometrical uncertainties.
Anandakrishnan, Ramu; Onufriev, Alexey
2008-03-01
In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application (computation of the average charge of ionizable amino acids in proteins) is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
Prediction of valid acidity in intact apples with Fourier transform near infrared spectroscopy.
Liu, Yan-De; Ying, Yi-Bin; Fu, Xia-Ping
2005-03-01
To develop nondestructive acidity prediction for intact Fuji apples, the potential of Fourier transform near infrared (FT-NIR) method with fiber optics in interactance mode was investigated. Interactance in the 800 nm to 2619 nm region was measured for intact apples, harvested from early to late maturity stages. Spectral data were analyzed by two multivariate calibration techniques including partial least squares (PLS) and principal component regression (PCR) methods. A total of 120 Fuji apples were tested and 80 of them were used to form a calibration data set. The influences of different data preprocessing and spectra treatments were also quantified. Calibration models based on smoothed spectra were slightly worse than those based on derivative spectra, and the best result was obtained when the segment length was 5 nm and the gap size was 10 points. Depending on data preprocessing and PLS method, the best prediction model yielded a coefficient of determination (r²) of 0.759, a low root mean square error of prediction (RMSEP) of 0.0677, and a low root mean square error of calibration (RMSEC) of 0.0562. The results indicated the feasibility of FT-NIR spectral analysis for predicting apple valid acidity in a nondestructive way.
Prediction of valid acidity in intact apples with Fourier transform near infrared spectroscopy*
Liu, Yan-de; Ying, Yi-bin; Fu, Xia-ping
2005-01-01
To develop nondestructive acidity prediction for intact Fuji apples, the potential of Fourier transform near infrared (FT-NIR) method with fiber optics in interactance mode was investigated. Interactance in the 800 nm to 2619 nm region was measured for intact apples, harvested from early to late maturity stages. Spectral data were analyzed by two multivariate calibration techniques including partial least squares (PLS) and principal component regression (PCR) methods. A total of 120 Fuji apples were tested and 80 of them were used to form a calibration data set. The influences of different data preprocessing and spectra treatments were also quantified. Calibration models based on smoothed spectra were slightly worse than those based on derivative spectra, and the best result was obtained when the segment length was 5 nm and the gap size was 10 points. Depending on data preprocessing and PLS method, the best prediction model yielded a coefficient of determination (r²) of 0.759, a low root mean square error of prediction (RMSEP) of 0.0677, and a low root mean square error of calibration (RMSEC) of 0.0562. The results indicated the feasibility of FT-NIR spectral analysis for predicting apple valid acidity in a nondestructive way. PMID:15682498
Niioka, Takenori; Uno, Tsukasa; Yasui-Furukori, Norio; Takahata, Takenori; Shimizu, Mikiko; Sugawara, Kazunobu; Tateishi, Tomonori
2007-04-01
The aim of this study was to determine the pharmacokinetics of low-dose nedaplatin combined with paclitaxel and radiation therapy in patients having non-small-cell lung carcinoma and establish the optimal dosage regimen for low-dose nedaplatin. We also evaluated predictive accuracy of reported formulas to estimate the area under the plasma concentration-time curve (AUC) of low-dose nedaplatin. A total of 19 patients were administered a constant intravenous infusion of 20 mg/m(2) body surface area (BSA) nedaplatin for an hour, and blood samples were collected at 1, 2, 3, 4, 6, 8, and 19 h after the administration. Plasma concentrations of unbound platinum were measured, and the actual value of platinum AUC (actual AUC) was calculated based on these data. The predicted value of platinum AUC (predicted AUC) was determined by three predictive methods reported in previous studies, consisting of Bayesian method, limited sampling strategies with plasma concentration at a single time point, and simple formula method (SFM) without measured plasma concentration. Three error indices, mean prediction error (ME, measure of bias), mean absolute error (MAE, measure of accuracy), and root mean squared prediction error (RMSE, measure of precision), were obtained from the difference between the actual and the predicted AUC, to compare the accuracy between the three predictive methods. The AUC showed more than threefold inter-patient variation, and there was a favorable correlation between nedaplatin clearance and creatinine clearance (Ccr) (r = 0.832, P < 0.01). In three error indices, MAE and RMSE showed significant difference between the three AUC predictive methods, and the method of SFM had the most favorable results, in which %ME, %MAE, and %RMSE were 5.5, 10.7, and 15.4, respectively. The dosage regimen of low-dose nedaplatin should be established based on Ccr rather than on BSA. 
Since prediction accuracy of SFM, which did not require measured plasma concentration, was most favorable among the three methods evaluated in this study, SFM could be the most practical method to predict AUC of low-dose nedaplatin in a clinical situation judging from its high accuracy in predicting AUC without measured plasma concentration.
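The three error indices named in this abstract can be computed as in the sketch below. It assumes that the percentage versions (%ME, %MAE, %RMSE) are taken relative to the actual AUC, which the abstract does not state explicitly; the function name and the AUC values are hypothetical.

```python
import math


def error_indices(actual, predicted):
    """Return (%ME, %MAE, %RMSE) of predictions relative to actual values."""
    # Signed prediction error of each patient as a percentage of the actual AUC.
    rel = [100.0 * (p - a) / a for a, p in zip(actual, predicted)]
    n = len(rel)
    me = sum(rel) / n                              # bias
    mae = sum(abs(e) for e in rel) / n             # accuracy
    rmse = math.sqrt(sum(e * e for e in rel) / n)  # precision
    return me, mae, rmse


# Hypothetical actual and predicted AUC values, for illustration only.
actual_auc = [2.0, 2.5, 3.1, 4.0, 5.2]
predicted_auc = [2.2, 2.4, 3.4, 3.7, 5.6]
me, mae, rmse = error_indices(actual_auc, predicted_auc)
print(f"%ME = {me:.1f}, %MAE = {mae:.1f}, %RMSE = {rmse:.1f}")
```

Positive and negative errors cancel in ME but not in MAE or RMSE, so a method can show little bias and still predict individual patients poorly.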
[Prediction of schistosomiasis infection rates of population based on ARIMA-NARNN model].
Ke-Wei, Wang; Yu, Wu; Jin-Ping, Li; Yu-Yu, Jiang
2016-07-12
To explore the effect of the autoregressive integrated moving average model-nonlinear auto-regressive neural network (ARIMA-NARNN) model on predicting schistosomiasis infection rates of population. The ARIMA model, NARNN model and ARIMA-NARNN model were established based on monthly schistosomiasis infection rates from January 2005 to February 2015 in Jiangsu Province, China. The fitting and prediction performances of the three models were compared. Compared to the ARIMA model and NARNN model, the mean square error (MSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) of the ARIMA-NARNN model were the least with the values of 0.0111, 0.0900 and 0.2824, respectively. The ARIMA-NARNN model could effectively fit and predict schistosomiasis infection rates of population, which might have a great application value for the prevention and control of schistosomiasis.
A variable-step-size robust delta modulator.
NASA Technical Reports Server (NTRS)
Song, C. L.; Garodnick, J.; Schilling, D. L.
1971-01-01
Description of an analytically obtained optimum adaptive delta modulator-demodulator configuration. The device utilizes two past samples to obtain a step size which minimizes the mean square error for a Markov-Gaussian source. The optimum system is compared, using computer simulations, with a linear delta modulator and an enhanced Abate delta modulator. In addition, the performance is compared to the rate distortion bound for a Markov source. It is shown that the optimum delta modulator is neither quantization nor slope-overload limited. The highly nonlinear equations obtained for the optimum transmitter and receiver are approximated by piecewise-linear equations in order to obtain system equations which can be transformed into hardware. The derivation of the experimental system is presented.
An information theory of image gathering
NASA Technical Reports Server (NTRS)
Fales, Carl L.; Huck, Friedrich O.
1991-01-01
Shannon's mathematical theory of communication is extended to image gathering. Expressions are obtained for the total information that is received with a single image-gathering channel and with parallel channels. It is concluded that the aliased signal components carry information even though these components interfere with the within-passband components in conventional image gathering and restoration, thereby degrading the fidelity and visual quality of the restored image. An examination of the expression for minimum mean-square-error, or Wiener-matrix, restoration from parallel image-gathering channels reveals a method for unscrambling the within-passband and aliased signal components to restore spatial frequencies beyond the sampling passband out to the spatial frequency response cutoff of the optical aperture.
Covariance Matrix Estimation for Massive MIMO
NASA Astrophysics Data System (ADS)
Upadhya, Karthik; Vorobyov, Sergiy A.
2018-04-01
We propose a novel pilot structure for covariance matrix estimation in massive multiple-input multiple-output (MIMO) systems in which each user transmits two pilot sequences, with the second pilot sequence multiplied by a random phase-shift. The covariance matrix of a particular user is obtained by computing the sample cross-correlation of the channel estimates obtained from the two pilot sequences. This approach relaxes the requirement that all the users transmit their uplink pilots over the same set of symbols. We derive expressions for the achievable rate and the mean-squared error of the covariance matrix estimate when the proposed method is used with staggered pilots. The performance of the proposed method is compared with existing methods through simulations.
Flood loss model transfer: on the value of additional data
NASA Astrophysics Data System (ADS)
Schröter, Kai; Lüdtke, Stefan; Vogel, Kristin; Kreibich, Heidi; Thieken, Annegret; Merz, Bruno
2017-04-01
The transfer of models across geographical regions and flood events is a key challenge in flood loss estimation. Variations in local characteristics and continuous system changes require regional adjustments and continuous updating with current evidence. However, acquiring data on damage influencing factors is expensive and therefore assessing the value of additional data in terms of model reliability and performance improvement is of high relevance. The present study utilizes empirical flood loss data on direct damage to residential buildings available from computer aided telephone interviews that were carried out after the floods in 2002, 2005, 2006, 2010, 2011 and 2013 mainly in the Elbe and Danube catchments in Germany. Flood loss model performance is assessed for incrementally increased numbers of loss data which are differentiated according to region and flood event. Two flood loss modeling approaches are considered: (i) a multi-variable flood loss model approach using Random Forests and (ii) a uni-variable stage damage function. Both model approaches are embedded in a bootstrapping process which allows evaluating the uncertainty of model predictions. Predictive performance of both models is evaluated with regard to mean bias, mean absolute and mean squared errors, as well as hit rate and sharpness. Mean bias and mean absolute error give information about the accuracy of model predictions; mean squared error and sharpness give information about precision; and hit rate is an indicator of model reliability. The results of incremental, regional and temporal updating demonstrate the usefulness of additional data to improve model predictive performance and increase model reliability, particularly in a spatial-temporal transfer setting.
Modeling number of claims and prediction of total claim amount
NASA Astrophysics Data System (ADS)
Acar, Aslıhan Şentürk; Karabey, Uǧur
2017-07-01
In this study we focus on annual number of claims of a private health insurance data set which belongs to a local insurance company in Turkey. In addition to Poisson model and negative binomial model, zero-inflated Poisson model and zero-inflated negative binomial model are used to model the number of claims in order to take into account excess zeros. To investigate the impact of different distributional assumptions for the number of claims on the prediction of total claim amount, predictive performances of candidate models are compared by using root mean square error (RMSE) and mean absolute error (MAE) criteria.
NASA Astrophysics Data System (ADS)
Shastri, Niket; Pathak, Kamlesh
2018-05-01
The water vapor content of the atmosphere plays a very important role in climate. This paper discusses the application of GPS signals in meteorology, a useful technique for estimating the precipitable water vapor of the atmosphere. Various algorithms, including artificial neural network, support vector machine and multiple linear regression, are used to predict precipitable water vapor. Comparative studies in terms of root mean square error and mean absolute error are also carried out for all the algorithms.
Smith, Erik A.; Kiesling, Richard L.; Ziegeweid, Jeffrey R.
2017-07-20
Fish habitat can degrade in many lakes due to summer blue-green algal blooms. Predictive models are needed to better manage and mitigate loss of fish habitat due to these changes. The U.S. Geological Survey (USGS), in cooperation with the Minnesota Department of Natural Resources, developed predictive water-quality models for two agricultural land-use dominated lakes in Minnesota (Madison Lake and Pearl Lake, which are part of Minnesota's sentinel lakes monitoring program) to assess algal community dynamics, water quality, and fish habitat suitability of these two lakes under recent (2014) meteorological conditions. The interactions of basin processes with these two lakes, through the delivery of nutrient loads, were simulated using CE-QUAL-W2, a carbon-based, laterally averaged, two-dimensional water-quality model that predicts distribution of temperature and oxygen from interactions between nutrient cycling, primary production, and trophic dynamics. The CE-QUAL-W2 models successfully predicted water temperature and dissolved oxygen on the basis of the two metrics of mean absolute error and root mean square error. For Madison Lake, the mean absolute error and root mean square error were 0.53 and 0.68 degree Celsius, respectively, for the vertical temperature profile comparisons; for Pearl Lake, the mean absolute error and root mean square error were 0.71 and 0.95 degree Celsius, respectively, for the vertical temperature profile comparisons. Temperature and dissolved oxygen were key metrics for calibration targets. These calibrated lake models also simulated algal community dynamics and water quality.
The model simulations presented potential explanations for persistently large total phosphorus concentrations in Madison Lake, key differences in nutrient concentrations between these lakes, and summer blue-green algal bloom persistence. Fish habitat suitability simulations for cool-water and warm-water fish indicated that, in general, both lakes contained a large proportion of good-growth habitat and a sustained period of optimal growth habitat in the summer, without any periods of lethal oxythermal habitat. For Madison and Pearl Lakes, examples of important cool-water fish, particularly game fish, include northern pike (Esox lucius), walleye (Sander vitreus), and black crappie (Pomoxis nigromaculatus); examples of important warm-water fish include bluegill (Lepomis macrochirus), largemouth bass (Micropterus salmoides), and smallmouth bass (Micropterus dolomieu). Sensitivity analyses were completed to understand lake response effects through the use of controlled departures on certain calibrated model parameters and input nutrient loads. These sensitivity analyses also operated as land-use change scenarios because alterations in agricultural practices, for example, could potentially increase or decrease nutrient loads.
Parameter Estimation for Thurstone Choice Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vojnovic, Milan; Yun, Seyoung
We consider the estimation accuracy of individual strength parameters of a Thurstone choice model when each input observation consists of a choice of one item from a set of two or more items (so-called top-1 lists). This model accommodates the well-known choice models such as the Luce choice model for comparison sets of two or more items and the Bradley-Terry model for pair comparisons. We provide a tight characterization of the mean squared error of the maximum likelihood parameter estimator. We also provide similar characterizations for parameter estimators defined by a rank-breaking method, which amounts to deducing one or more pair comparisons from a comparison of two or more items, assuming independence of these pair comparisons, and maximizing a likelihood function derived under these assumptions. We also consider a related binary classification problem where each individual parameter takes value from a set of two possible values and the goal is to correctly classify all items within a prescribed classification error. The results of this paper shed light on how the parameter estimation accuracy depends on the given Thurstone choice model and the structure of comparison sets. In particular, we found that for unbiased input comparison sets of a given cardinality, when in expectation each comparison set of given cardinality occurs the same number of times, for a broad class of Thurstone choice models, the mean squared error decreases with the cardinality of comparison sets, but only marginally according to a diminishing returns relation. On the other hand, we found that there exist Thurstone choice models for which the mean squared error of the maximum likelihood parameter estimator can decrease much faster with the cardinality of comparison sets. We report empirical evaluation of some claims and key parameters revealed by theory using both synthetic and real-world input data from some popular sport competitions and online labor platforms.
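For the pair-comparison special case mentioned above (the Bradley-Terry model), maximum likelihood strengths can be sketched with the classical fixed-point (minorization-maximization) update. This is an illustration only and is not the authors' analysis or code: the function name, the win counts, and the iteration count are hypothetical, and the sketch assumes every item has at least one win and one loss so the estimate exists.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a matrix of pairwise win counts.

    wins[i][j] is the number of times item i beat item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each opponent contributes (games played) / (sum of both strengths).
            denominator = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denominator)
        total = sum(updated)
        strengths = [s / total for s in updated]
    return strengths


# Hypothetical win counts for three items (synthetic, for illustration).
wins = [
    [0, 3, 4],
    [1, 0, 3],
    [1, 2, 0],
]
strengths = fit_bradley_terry(wins)
print([round(s, 3) for s in strengths])
```

Under the model, item i beats item j with probability strengths[i] / (strengths[i] + strengths[j]), so only the ratios of the fitted values matter; the normalization to sum to 1 is a convenience.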
Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.
Wei, Runmin; Wang, Jingye; Su, Mingming; Jia, Erik; Chen, Shaoqiu; Chen, Tianlu; Ni, Yan
2018-01-12
Missing values exist widely in mass-spectrometry (MS) based metabolomics data. Various methods have been applied for handling missing values, but the selection can significantly affect following data analyses. Typically, there are three types of missing values, missing not at random (MNAR), missing at random (MAR), and missing completely at random (MCAR). Our study comprehensively compared eight imputation methods (zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC)) for different types of missing values using four metabolomics datasets. Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate imputation accuracy. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes analysis were used to evaluate the overall sample distribution. Student's t-test followed by correlation analysis was conducted to evaluate the effects on univariate statistics. Our findings demonstrated that RF performed the best for MCAR/MAR and QRILC was the favored one for left-censored MNAR. Finally, we proposed a comprehensive strategy and developed a public-accessible web-tool for the application of missing value imputation in metabolomics ( https://metabolomics.cc.hawaii.edu/software/MetImp/ ).
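The NRMSE criterion used above to score imputation accuracy can be sketched as below. The normalization assumed here (root of the mean squared error over the masked entries divided by the variance of the true values) is one common definition and may differ from the exact form used in the study; the function name and the data are hypothetical.

```python
import math
from statistics import pvariance


def nrmse(true_values, imputed_values):
    """Normalized RMSE over entries that were masked and then imputed."""
    if len(true_values) != len(imputed_values) or len(true_values) < 2:
        raise ValueError("need two or more paired values")
    mse = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / len(true_values)
    return math.sqrt(mse / pvariance(true_values))


# Hypothetical true intensities at positions that were masked for the test.
true_values = [2.0, 4.0, 6.0, 8.0]

# Two simple strategies: fill with zero, or fill with a single constant (5.0).
zero_imputed = [0.0] * len(true_values)
constant_imputed = [5.0] * len(true_values)

print("zero:", round(nrmse(true_values, zero_imputed), 3))
print("constant:", round(nrmse(true_values, constant_imputed), 3))
```

A lower value means the imputed entries sit closer to the held-out truth, so ranking methods by this score per dataset gives the kind of sum-of-ranks comparison the abstract describes.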
NASA Astrophysics Data System (ADS)
Mignani, Anna G.; Ciaccheri, Leonardo; Mencaglia, Andrea A.; Tuccio, Lorenza; Agati, Giovanni
2015-05-01
Nondestructive in situ determination of the antioxidant lycopene in fresh tomato fruits is of great interest to growers who want to optimize the harvest time for high-quality products. For this purpose, we developed a portable LED-based colorimeter that measures reflectance spectra of whole tomatoes in the 400-750 nm range. The tomato skins from the same samples were then frozen in liquid nitrogen, extracted with an acetone/ethanol/hexane mixture and analyzed by means of a spectrophotometer for their lycopene content. Lycopene concentration varied between 70 and 550 mg/kg fresh weight skin. Partial Least Squares regression was used to correlate spectral data to the tomato lycopene content. The multivariate processing of the reflectance data showed that lycopene content could be predicted with a coefficient of determination R2=0.945 and a root mean square error of cross-validation RMSECV=57 mg/kg skin fresh weight. These results suggest that portable, low-cost and compact LED-based sensors are promising instruments for the nondestructive assessment of tomato lycopene even in the field.
NASA Astrophysics Data System (ADS)
Dillner, A. M.; Takahama, S.
2015-10-01
Elemental carbon (EC) is an important constituent of atmospheric particulate matter because it absorbs solar radiation influencing climate and visibility and it adversely affects human health. The EC measured by thermal methods such as thermal-optical reflectance (TOR) is operationally defined as the carbon that volatilizes from quartz filter samples at elevated temperatures in the presence of oxygen. Here, methods are presented to accurately predict TOR EC using Fourier transform infrared (FT-IR) absorbance spectra from atmospheric particulate matter collected on polytetrafluoroethylene (PTFE or Teflon) filters. This method is similar to the procedure developed for OC in prior work (Dillner and Takahama, 2015). Transmittance FT-IR analysis is rapid, inexpensive and nondestructive to the PTFE filter samples which are routinely collected for mass and elemental analysis in monitoring networks. FT-IR absorbance spectra are obtained from 794 filter samples from seven Interagency Monitoring of PROtected Visual Environment (IMPROVE) sites collected during 2011. Partial least squares regression is used to calibrate sample FT-IR absorbance spectra to collocated TOR EC measurements. The FT-IR spectra are divided into calibration and test sets. Two calibrations are developed: one developed from uniform distribution of samples across the EC mass range (Uniform EC) and one developed from a uniform distribution of Low EC mass samples (EC < 2.4 μg, Low Uniform EC). A hybrid approach which applies the Low EC calibration to Low EC samples and the Uniform EC calibration to all other samples is used to produce predictions for Low EC samples that have mean error on par with parallel TOR EC samples in the same mass range and an estimate of the minimum detection limit (MDL) that is on par with TOR EC MDL. 
For all samples, this hybrid approach leads to precise and accurate TOR EC predictions by FT-IR as indicated by high coefficient of determination (R2; 0.96), no bias (0.00 μg m-3, a concentration value based on the nominal IMPROVE sample volume of 32.8 m3), low error (0.03 μg m-3) and reasonable normalized error (21 %). These performance metrics can be achieved with various degrees of spectral pretreatment (e.g., including or excluding substrate contributions to the absorbances) and are comparable in precision and accuracy to collocated TOR measurements. Only the normalized error is higher for the FT-IR EC measurements than for collocated TOR. FT-IR spectra are also divided into calibration and test sets by the ratios OC/EC and ammonium/EC to determine the impact of OC and ammonium on EC prediction. We conclude that FT-IR analysis with partial least squares regression is a robust method for accurately predicting TOR EC in IMPROVE network samples, providing complementary information to TOR OC predictions (Dillner and Takahama, 2015) and the organic functional group composition and organic matter estimated previously from the same set of sample spectra (Ruthenburg et al., 2014).
NASA Astrophysics Data System (ADS)
Dillner, A. M.; Takahama, S.
2015-06-01
Elemental carbon (EC) is an important constituent of atmospheric particulate matter because it absorbs solar radiation influencing climate and visibility and it adversely affects human health. The EC measured by thermal methods such as Thermal-Optical Reflectance (TOR) is operationally defined as the carbon that volatilizes from quartz filter samples at elevated temperatures in the presence of oxygen. Here, methods are presented to accurately predict TOR EC using Fourier Transform Infrared (FT-IR) absorbance spectra from atmospheric particulate matter collected on polytetrafluoroethylene (PTFE or Teflon) filters. This method is similar to the procedure tested and developed for OC in prior work (Dillner and Takahama, 2015). Transmittance FT-IR analysis is rapid, inexpensive, and non-destructive to the PTFE filter samples which are routinely collected for mass and elemental analysis in monitoring networks. FT-IR absorbance spectra are obtained from 794 filter samples from seven Interagency Monitoring of PROtected Visual Environment (IMPROVE) sites collected during 2011. Partial least squares regression is used to calibrate sample FT-IR absorbance spectra to collocated TOR EC measurements. The FT-IR spectra are divided into calibration and test sets. Two calibrations are developed, one which is developed from uniform distribution of samples across the EC mass range (Uniform EC) and one developed from a uniform distribution of low EC mass samples (EC < 2.4 μg, Low Uniform EC). A hybrid approach which applies the low EC calibration to low EC samples and the Uniform EC calibration to all other samples is used to produce predictions for low EC samples that have mean error on par with parallel TOR EC samples in the same mass range and an estimate of the minimum detection limit (MDL) that is on par with TOR EC MDL.
For all samples, this hybrid approach leads to precise and accurate TOR EC predictions by FT-IR as indicated by high coefficient of determination (R2; 0.96), no bias (0.00 μg m-3, concentration value based on the nominal IMPROVE sample volume of 32.8 m3), low error (0.03 μg m-3) and reasonable normalized error (21 %). These performance metrics can be achieved with various degrees of spectral pretreatment (e.g., including or excluding substrate contributions to the absorbances) and are comparable in precision and accuracy to collocated TOR measurements. Only the normalized error is higher for the FT-IR EC measurements than for collocated TOR. FT-IR spectra are also divided into calibration and test sets by the ratios OC/EC and ammonium/EC to determine the impact of OC and ammonium on EC prediction. We conclude that FT-IR analysis with partial least squares regression is a robust method for accurately predicting TOR EC in IMPROVE network samples, providing complementary information to TOR OC predictions (Dillner and Takahama, 2015) and the organic functional group composition and organic matter (OM) estimated previously from the same set of sample spectra (Ruthenburg et al., 2014).
Torrecilla, José S; García, Julián; García, Silvia; Rodríguez, Francisco
2011-03-04
The combination of lag-k autocorrelation coefficients (LCCs) and thermogravimetric analyzer (TGA) equipment is defined here as a tool to detect and quantify adulterations of extra virgin olive oil (EVOO) with refined olive (ROO), refined olive pomace (ROPO), sunflower (SO) or corn (CO) oils, when the adulterating agent concentrations are less than 14%. The LCC is calculated from TGA scans of adulterated EVOO samples. Then, the standardized skewness of this coefficient has been applied to classify pure and adulterated samples of EVOO. In addition, this chaotic parameter has also been used to quantify the concentration of adulterant agents, by using successful linear correlation of LCCs and ROO, ROPO, SO or CO in 462 EVOO adulterated samples. In the case of detection, more than 82% of adulterated samples have been correctly classified. In the case of quantification of adulterant concentration, by an external validation process, the LCC/TGA approach estimates the adulterant agent concentration with a mean correlation coefficient (estimated versus real adulterant agent concentration) greater than 0.90 and a mean square error less than 4.9%. Copyright © 2011 Elsevier B.V. All rights reserved.
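The lag-k autocorrelation coefficient at the core of the approach above can be sketched with the standard sample estimator. The paper's exact estimator, its choice of k, and the standardized-skewness step are not reproduced here; the function name and the toy series are hypothetical and only stand in for a TGA scan.

```python
def lag_autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Hypothetical signal sampled at equal steps (not measured TGA data).
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
print(lag_autocorrelation(signal, 1))
```

With this estimator the lag-0 coefficient is always 1, and coefficients at higher lags shrink toward zero as neighboring points become less related.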
Patwary, Nurmohammed; Doblas, Ana; Preza, Chrysanthe
2018-01-01
The performance of structured illumination microscopy (SIM) is hampered in many biological applications due to the inability to modulate the light when imaging deep into the sample. This is in part because sample-induced aberration reduces the modulation contrast of the structured pattern. In this paper, we present an image restoration approach suitable for processing raw incoherent-grid-projection SIM data with a low fringe contrast. Restoration results from simulated and experimental ApoTome SIM data show results with improved signal-to-noise ratio (SNR) and optical sectioning compared to the results obtained from existing methods, such as 2D demodulation and 3D SIM deconvolution. Our proposed method provides satisfactory results (quantified by the achieved SNR and normalized mean square error) even when the modulation contrast of the illumination pattern is as low as 7%. PMID:29675307
An estimator of the survival function based on the semi-Markov model under dependent censorship.
Lee, Seung-Yeoun; Tsai, Wei-Yann
2005-06-01
Lee and Wolfe (Biometrics vol. 54 pp. 1176-1178, 1998) proposed the two-stage sampling design for testing the assumption of independent censoring, which involves further follow-up of a subset of lost-to-follow-up censored subjects. They also proposed an adjusted estimator for the survivor function for a proportional hazards model under the dependent censoring model. In this paper, a new estimator for the survivor function is proposed for the semi-Markov model under the dependent censorship on the basis of the two-stage sampling data. The consistency and the asymptotic distribution of the proposed estimator are derived. The estimation procedure is illustrated with an example of lung cancer clinical trial and simulation results are reported of the mean squared errors of estimators under a proportional hazards and two different nonproportional hazards models.
Huh, S.; Dickey, D.A.; Meador, M.R.; Ruhl, K.E.
2005-01-01
A temporal analysis of the number and duration of exceedences of high- and low-flow thresholds was conducted to determine the number of years required to detect a level shift using data from Virginia, North Carolina, and South Carolina. Two methods were used - ordinary least squares assuming a known error variance and generalized least squares without a known error variance. Using ordinary least squares, the mean number of years required to detect a one standard deviation level shift in measures of low-flow variability was 57.2 (28.6 on either side of the break), compared to 40.0 years for measures of high-flow variability. These means become 57.6 and 41.6 when generalized least squares is used. No significant relations between years and elevation or drainage area were detected (P>0.05). Cluster analysis did not suggest geographic patterns in years related to physiography or major hydrologic regions. Referring to the number of observations required to detect a one standard deviation shift as 'characterizing' the variability, it appears that at least 20 years of record on either side of a shift may be necessary to adequately characterize high-flow variability. A longer streamflow record (about 30 years on either side) may be required to characterize low-flow variability. © 2005 Elsevier B.V. All rights reserved.
Comparison of Sleep Models for Score Fatigue Model Integration
2015-04-01
In order to obtain sleepiness, the Karolinska Sleepiness Scale (KSS) was applied using the following equation. = − ( ∗ ) (8) Where a = 10.3... Karolinska Sleepiness Scale MSE Mean Square Error St Homeostatic sleep pressure TPM Three-Process Model U Ultradian component
Does positivity mediate the relation of extraversion and neuroticism with subjective happiness?
Lauriola, Marco; Iani, Luca
2015-01-01
Recent theories suggest an important role of neuroticism, extraversion, attitudes, and global positive orientations as predictors of subjective happiness. We examined whether positivity mediates the hypothesized relations in a community sample of 504 adults between the ages of 20 and 60 years old (females = 50%). A model with significant paths from neuroticism to subjective happiness, from extraversion and neuroticism to positivity, and from positivity to subjective happiness fitted the data (Satorra-Bentler scaled chi-square (38) = 105.91; Comparative Fit Index = .96; Non-Normed Fit Index = .95; Root Mean Square Error of Approximation = .060; 90% confidence interval = .046, .073). The percentage of subjective happiness variance accounted for by personality traits was only about 48%, whereas adding positivity as a mediating factor increased the explained amount of subjective happiness to 78%. The mediation model was invariant by age and gender. The results show that the effect of extraversion on happiness was fully mediated by positivity, whereas the effect of neuroticism was only partially mediated. Implications for happiness studies are also discussed.
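The Root Mean Square Error of Approximation reported above can be checked against the chi-square, degrees of freedom, and sample size given in the same abstract. The sketch uses a common textbook form of the point estimate with N - 1 in the denominator; some software uses N instead, and the study's software may apply a correction for the scaled statistic, so this is only an approximate consistency check. The function name is hypothetical.

```python
import math


def rmsea(chi_square, degrees_of_freedom, sample_size):
    """Point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    excess = max(chi_square - degrees_of_freedom, 0.0)
    return math.sqrt(excess / (degrees_of_freedom * (sample_size - 1)))


# Values taken from the abstract: chi-square(38) = 105.91, N = 504.
estimate = rmsea(105.91, 38, 504)
print(round(estimate, 3))
```

Under these assumptions the estimate rounds to the reported .060.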
Analytic Method for Computing Instrument Pointing Jitter
NASA Technical Reports Server (NTRS)
Bayard, David
2003-01-01
A new method of calculating the root-mean-square (rms) pointing jitter of a scientific instrument (e.g., a camera, radar antenna, or telescope) is introduced based on a state-space concept. In comparison with the prior method of calculating the rms pointing jitter, the present method involves significantly less computation. The rms pointing jitter of an instrument (the square root of the jitter variance shown in the figure) is an important physical quantity which impacts the design of the instrument, its actuators, controls, sensory components, and sensor-output-sampling circuitry. Using the Sirlin, San Martin, and Lucke definition of pointing jitter, the prior method of computing the rms pointing jitter involves a frequency-domain integral of a rational polynomial multiplied by a transcendental weighting function, necessitating the use of numerical-integration techniques. In practice, numerical integration complicates the problem of calculating the rms pointing error. In contrast, the state-space method provides exact analytic expressions that can be evaluated without numerical integration.
Thermal Property Measurement of Semiconductor Melt using Modified Laser Flash Method
NASA Technical Reports Server (NTRS)
Lin, Bochuan; Zhu, Shen; Ban, Heng; Li, Chao; Scripa, Rosalla N.; Su, Ching-Hua; Lehoczky, Sandor L.
2003-01-01
This study further developed the standard laser flash method to measure multiple thermal properties of semiconductor melts. The modified method can determine thermal diffusivity, thermal conductivity, and specific heat capacity of the melt simultaneously. The transient heat transfer process in the melt and its quartz container was numerically studied in detail. A fitting procedure based on numerical simulation results and the least root-mean-square error fitting to the experimental data was used to extract the values of specific heat capacity, thermal conductivity and thermal diffusivity. This modified method is a step forward from the standard laser flash method, which is usually used to measure thermal diffusivity of solids. The results for tellurium (Te) at 873 K (specific heat capacity 300.2 Joules per kilogram K, thermal conductivity 3.50 Watts per meter K, thermal diffusivity 2.04 × 10^-6 square meters per second) are within the range reported in the literature. The uncertainty analysis showed the quantitative effect of sample geometry, transient temperature measured, and the energy of the laser pulse.
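The three properties reported above are linked by the standard definition of thermal diffusivity, alpha = k / (rho * c_p). As a rough consistency check, the sketch below computes the melt density implied by the three reported values. The density is not stated in the abstract, so the result is an inferred quantity and not a measurement; the variable names are hypothetical.

```python
# Values reported in the abstract for tellurium at 873 K.
specific_heat = 300.2          # J / (kg K)
thermal_conductivity = 3.50    # W / (m K)
thermal_diffusivity = 2.04e-6  # m^2 / s

# Rearranging alpha = k / (rho * c_p) gives rho = k / (alpha * c_p).
implied_density = thermal_conductivity / (thermal_diffusivity * specific_heat)
print(f"implied density: {implied_density:.0f} kg/m^3")
```

Because the relation ties the three quantities together, fixing the density leaves only two of them independent, which is why a single fitting procedure can return all three.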